Re: Tez UI

2016-10-16 Thread Stephen Sprague
thanks Allan.  so i enabled DEBUG,console on the ATS.  I see this in that
log:

16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
16/10/16 21:07:59 DEBUG security.TimelineACLsManager: Verifying the access of yarn on the timeline entity { id: appattempt_1476593404620_0211_01, type: YARN_APPLICATION_ATTEMPT }
16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing the entity { id: appattempt_1476593404620_0211_01, type: YARN_APPLICATION_ATTEMPT }, JSON-style content:
{"events":[{"timestamp":1476677279325,"eventtype":"YARN_APPLICATION_ATTEMPT_REGISTERED"}],"entity":"appattempt_1476593404620_0211_01","entitytype":"YARN_APPLICATION_ATTEMPT","domain":"DEFAULT"}
16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing entities: { id: appattempt_1476593404620_0211_01, type: YARN_APPLICATION_ATTEMPT }
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/  200
16/10/16 21:07:59 DEBUG mortbay.log: REQUEST /ws/v1/timeline/ on org.mortbay.jetty.HttpConnection@7d134e03
16/10/16 21:07:59 DEBUG mortbay.log: sessionManager=org.mortbay.jetty.servlet.HashSessionManager@350aac89
16/10/16 21:07:59 DEBUG mortbay.log: session=null
16/10/16 21:07:59 DEBUG mortbay.log: servlet=default
16/10/16 21:07:59 DEBUG mortbay.log: chain=NoCacheFilter->NoCacheFilter->safety->Timeline Authentication Filter->Cross Origin Filter->static_user_filter->guice->default
16/10/16 21:07:59 DEBUG mortbay.log: servlet holder=default
16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
16/10/16 21:07:59 DEBUG mortbay.log: call filter safety
16/10/16 21:07:59 DEBUG mortbay.log: call filter Timeline Authentication Filter
16/10/16 21:07:59 DEBUG server.AuthenticationFilter: Request [http://dwrdevnn1.sv2.trulia.com:8188/ws/v1/timeline/] user [dwr] authenticated
16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/  404
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/  200
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:02 DEBUG mortbay.log: EOF
16/10/16 21:08:02 DEBUG mortbay.log: EXCEPTION
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.mortbay.io.nio.ChannelEndPoint.fill(ChannelEndPoint.java:132)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:290)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


again not sure how to read it.

so far this seems to be the smoking gun to me from the Tez AM.

2016-10-16 16:14:06,106 [DEBUG] [HistoryEventHandlingThread]
|impl.TimelineClientImpl|: HTTP error code: 404 Server response :
{"exception":"UnrecognizedPropertyException","message":"Unrecognized
field \"eventinfo\"



On Sun, Oct 16, 2016 at 5:53 PM, Allan Wilson  wrote:

> I can send you my TEZ file later
>
> Sent from my iPhone
>
> On Oct 16, 2016, at 1:32 PM, Stephen Sprague  wrote:
>
> Hi Hitesh,
> Bingo!
>
> Log Type: syslog_dag_1476593404620_0001_1
>
> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
>
> Log Length: 75813
>
> Showing 4096 bytes of 75813 total. Click here for the full log.
>
> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_50, asking it to die
> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_08, asking it to die
> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_07, asking it to die
> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on

Re: Tez UI

2016-10-16 Thread Allan Wilson
I can send you my TEZ file later

Sent from my iPhone

> On Oct 16, 2016, at 1:32 PM, Stephen Sprague  wrote:
> 
> Hi Hitesh,
> Bingo!
> 
> Log Type: syslog_dag_1476593404620_0001_1
> 
> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
> 
> Log Length: 75813
> 
> Showing 4096 bytes of 75813 total. Click here for the full log.
> 
> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_50, asking it to die
> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_08, asking it to die
> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_07, asking it to die
> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_11, asking it to die
> 2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> server.
> 2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread] 
> |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_58, asking it to die
> 2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_51, asking it to die
> 2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> server.
> 2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread] 
> |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>   at java.lang.Thread.run(Thread.java:745)
> 
> 
> i'm running the hive cli on host=dwrdevnn1.
> 
> i updated yarn-site.xml on dwrdevnn1.
> 
> i restarted the ATS service on dwrdevnn1. sudo -u yarn -- yarn-daemon.sh 
> --config /etc/hadoop/conf  start timelineserver
> 
> netstat is showing 8188 as being alive. i can also telnet to dwrdevnn1 8188.  
> also port 10200 is LISTENing.
> 
> $ sudo netstat -lanp | grep 31168
> tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
> 
> 
> might there be a debug log level i can set on impl.TimelineClientImpl to see 
> what is happening on the connection event?  
> 
> thank you again!
> 
> Cheers,
> Stephen.
> 
> 
> 
> 
>> On Sun, Oct 16, 2016 at 9:54 AM, Hitesh Shah  wrote:
>> Hello Stephen,
>> 
>> yarn-site.xml needs to be updated wherever the Tez client is used. i.e if 
>> you are using Hive, then wherever you launch the Hive CLI and also where the 
>> HiveServer2 is installed ( HS2 will need a restart ).
>> 
>> To see if the connection to timeline is/was an issue, please check the yarn 
>> app logs for any Tez application ( the application master logs to be more 
>> specific: syslog_dag* files) to see if there are any warnings/exceptions 
>> being logged related to hist

Re: Tez UI

2016-10-16 Thread Stephen Sprague
Hi Hitesh,

thanks for this - much appreciated. so I tried the easy one first. ie.
setting tez.am.log.level=DEBUG

$ hive -hiveconf hive.execution.engine=tez -hiveconf tez.am.log.level=DEBUG

then i examined the syslog_dag* file and found this:

2016-10-16 16:14:06,081 [DEBUG] [HistoryEventHandlingThread]
|ats.ATSHistoryLoggingService|: Sending event batch to Timeline,
batchSize=1
2016-10-16 16:14:06,081 [DEBUG] [HistoryEventHandlingThread]
|security.UserGroupInformation|: PrivilegedAction as:dwr (auth:SIMPLE)
from:org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:498)
2016-10-16 16:14:06,083 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.BlockReaderLocal|: dfs.client.use.legacy.blockreader.local =
false
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.BlockReaderLocal|: dfs.client.read.shortcircuit = false
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.BlockReaderLocal|: dfs.client.domain.socket.data.traffic = false
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.BlockReaderLocal|: dfs.domain.socket.path =
/var/lib/hadoop-hdfs/dn_socket
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.DFSClient|: No KeyProvider found.
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|retry.RetryUtils|: multipleLinearRandomRetry = null
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|ipc.Client|: getting client out of cache:
org.apache.hadoop.ipc.Client@5f9b2141
2016-10-16 16:14:06,084 [DEBUG] [IPC Server handler 0 on 38365]
|sasl.DataTransferSaslUtil|: DataTransferProtocol not using
SaslPropertiesResolver, no QOP found in configuration for
dfs.data.transfer.protection
2016-10-16 16:14:06,087 [DEBUG] [IPC Server handler 0 on 38365]
|ipc.Client|: The ping interval is 6 ms.
2016-10-16 16:14:06,087 [DEBUG] [IPC Server handler 0 on 38365]
|ipc.Client|: Connecting to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020
2016-10-16 16:14:06,088 [DEBUG] [IPC Server handler 0 on 38365]
|security.UserGroupInformation|: PrivilegedAction as:dwr (auth:SIMPLE)
from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
2016-10-16 16:14:06,088 [DEBUG] [IPC Server handler 0 on 38365]
|security.SaslRpcClient|: Sending sasl message state: NEGOTIATE

2016-10-16 16:14:06,088 [DEBUG] [IPC Server handler 0 on 38365]
|security.SaslRpcClient|: Received SASL message state: NEGOTIATE
auths {
  method: "TOKEN"
  mechanism: "DIGEST-MD5"
  protocol: ""
  serverId: "default"
  challenge: 
"realm=\"default\",nonce=\"Ovwi9Zs6J6KbO1e4TzhzthbzfaUtDaBCJyBaprgs\",qop=\"auth\",charset=utf-8,algorithm=md5-sess"
}
auths {
  method: "SIMPLE"
  mechanism: ""
}

2016-10-16 16:14:06,089 [DEBUG] [IPC Server handler 0 on 38365]
|security.SaslRpcClient|: Get token info proto:interface
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB
info:@org.apache.hadoop.security.token.TokenInfo(value=class
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSelector)
2016-10-16 16:14:06,089 [DEBUG] [IPC Server handler 0 on 38365]
|security.SaslRpcClient|: Use SIMPLE authentication for protocol
ClientNamenodeProtocolPB
2016-10-16 16:14:06,089 [DEBUG] [IPC Server handler 0 on 38365]
|security.SaslRpcClient|: Sending sasl message state: INITIATE
auths {
  method: "SIMPLE"
  mechanism: ""
}

2016-10-16 16:14:06,089 [DEBUG] [IPC Parameter Sending Thread #0]
|ipc.Client|: IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr sending #37
2016-10-16 16:14:06,089 [DEBUG] [IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr] |ipc.Client|:
IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr: starting,
having connections 3
2016-10-16 16:14:06,090 [DEBUG] [IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr] |ipc.Client|:
IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr got value #37
2016-10-16 16:14:06,090 [DEBUG] [IPC Server handler 0 on 38365]
|ipc.ProtobufRpcEngine|: Call: getFileInfo took 3ms
2016-10-16 16:14:06,096 [DEBUG] [IPC Parameter Sending Thread #0]
|ipc.Client|: IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr sending #38
2016-10-16 16:14:06,096 [DEBUG] [IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr] |ipc.Client|:
IPC Client (1462044018) connection to
dwrdevnn1.sv2.trulia.com/172.19.103.136:8020 from dwr got value #38
2016-10-16 16:14:06,096 [DEBUG] [IPC Server handler 0 on 38365]
|ipc.ProtobufRpcEngine|: Call: getBlockLocations took 1ms
2016-10-16 16:14:06,099 [DEBUG] [IPC Server handler 0 on 38365]
|hdfs.DFSClient|: newInfo = LocatedBlocks{
  fileLength=284761
  underConstruction=false
  
blocks=[LocatedBlock{BP-1307833058-172.19.103.136-1380908958225:blk_111613174

Re: Tez UI

2016-10-16 Thread Hitesh Shah
For the timeline daemon: 
export YARN_ROOT_LOGGER="DEBUG,RFA" (or the appropriate appender) and then 
restart the timeline server. 
Or, if you don't want a restart, try http://timelinehost:8188/logLevel and use 
that, if available, to set the log level to DEBUG for org.apache.hadoop.

For the Tez AM:
Set tez.am.log.level to DEBUG to get more info from the AM logs; this will also 
set the log level to DEBUG for the timeline client running inside the AM. 
Depending on which version of Tez is in use, you can set the log level to 
something like "DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO;" 
where everything except the ipc and security package classes logs at DEBUG. You 
can use the same format to set the log level to 
"INFO;timeline.client.package=DEBUG;" 

— Hitesh
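
[Editor's sketch, not part of the original message] The per-package log-level format described above can be assembled and passed to the Hive CLI roughly as follows. This is a minimal sketch under stated assumptions: build_tez_log_level is a hypothetical helper name, the value is joined with semicolons without the trailing one shown in the message, and the script only prints the hive command rather than running it.

```shell
#!/bin/sh
# build_tez_log_level is a hypothetical helper (not part of Tez or Hive).
# It joins a root level and optional per-package overrides with ';',
# following the "LEVEL;package=LEVEL;..." format described in the message.
build_tez_log_level() {
    value=$1
    shift
    for override in "$@"; do
        value="${value};${override}"
    done
    printf '%s\n' "$value"
}

level=$(build_tez_log_level DEBUG \
    org.apache.hadoop.ipc=INFO \
    org.apache.hadoop.security=INFO)

# Print the command instead of running it, so the sketch works without Hive.
# The -hiveconf argument is quoted because an unquoted ';' would end the
# shell command early.
printf 'hive -hiveconf hive.execution.engine=tez -hiveconf "tez.am.log.level=%s"\n' "$level"
```

Quoting is the main point here: passing the value unquoted on a shell command line would split it at the first semicolon.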

> On Oct 16, 2016, at 12:32 PM, Stephen Sprague  wrote:
> 
> Hi Hitesh,
> Bingo!
> 
> Log Type: syslog_dag_1476593404620_0001_1
> 
> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
> 
> Log Length: 75813
> 
> Showing 4096 bytes of 75813 total. Click here for the full log.
> 
> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_50, asking it to die
> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_08, asking it to die
> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_07, asking it to die
> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_11, asking it to die
> 2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> server.
> 2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread] 
> |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_58, asking it to die
> 2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353] 
> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container 
> with id: container_1476593404620_0001_01_51, asking it to die
> 2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> server.
> 2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread] 
> |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>   at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>   at java.lang.Thread.run(Thread.java:745)
> 
> 
> 
> i'm running the hive cli on host=dwrdevnn1.
> 
> i updated yarn-site.xml on dwrdevnn1.
> 
> i restarted the ATS service on dwrdevnn1. sudo -u yarn -- yarn-daemon.sh 
> --config /etc/hadoop/conf  start timelineserver
> 
> netstat is showing 8188 as being alive. i can also telnet to dwrdevnn1 8188.  
> also port 10200 is LISTENing.
> 
> $ sudo netstat -lanp | grep 31168
> tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
> 
> 
> might there be a 

Re: Tez UI

2016-10-16 Thread Stephen Sprague
Hi Hitesh,
Bingo!

Log Type: syslog_dag_1476593404620_0001_1

Log Upload Time: Sat Oct 15 22:03:47 -0700 2016

Log Length: 75813

Showing 4096 bytes of 75813 total. Click here for the full log.

6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_50, asking it
to die
2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_08, asking it
to die
2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_07, asking it
to die
2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_11, asking it
to die
2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread]
|impl.TimelineClientImpl|: Failed to get the response from the
timeline server.
2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread]
|ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the
response from the timeline server.
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
at java.lang.Thread.run(Thread.java:745)
2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_58, asking it
to die
2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353]
|app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown
container with id: container_1476593404620_0001_01_51, asking it
to die
2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread]
|impl.TimelineClientImpl|: Failed to get the response from the
timeline server.
2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread]
|ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the
response from the timeline server.
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
at java.lang.Thread.run(Thread.java:745)



i'm running the hive cli on host=dwrdevnn1.

i updated yarn-site.xml on dwrdevnn1.

i restarted the ATS service on dwrdevnn1. sudo -u yarn -- yarn-daemon.sh
--config /etc/hadoop/conf  start timelineserver

netstat is showing 8188 as being alive. i can also telnet to dwrdevnn1
8188.  also port 10200 is LISTENing.

$ sudo netstat -lanp | grep 31168
tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
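
[Editor's sketch, not part of the original message] The port check above can be reduced to just the listening port numbers for one process. This is a rough sketch: listening_ports is a hypothetical helper, and it assumes the usual seven-column `netstat -lanp` layout for TCP rows (protocol, two queue sizes, local address, foreign address, state, PID/program). The sample rows are taken from the thread.

```shell
#!/bin/sh
# listening_ports is a hypothetical helper: it reads `netstat -lanp` style
# rows on stdin and prints the local port of every LISTEN socket owned by
# the given PID (column 7 starts with "<pid>/").
listening_ports() {
    awk -v pid="$1" '
        $6 == "LISTEN" && index($7, pid "/") == 1 {
            n = split($4, parts, ":")   # local address is host:port
            print parts[n]
        }'
}

# Demo on sample rows; on a live host one would pipe in
# `sudo netstat -lanp` instead.
listening_ports 31168 <<'EOF'
tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
tcp        0      0 172.19.103.136:8188     172.19.103.136:45299    ESTABLISHED 31168/java
EOF
```

A listening socket only shows that the daemon is up; it says nothing about whether the requests sent to it succeed.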


might there be a debug log level i can set on impl.TimelineClientImpl to
see what is happening on the connection event?

thank you again!

Cheers,
Stephen.




On Sun, Oct 16, 2016 at 9:54 AM, Hitesh Shah  wrote:

> Hello Stephen,
>
> yarn-site.xml needs to be updated wherever the Tez client is used. i.e if
> you are using Hive, then wherever you launch the Hive CLI and also where
> the HiveServer2 is installed ( HS2 will need a restart ).
>
> To see if the connection to timeline is/was an issue, please check the
> yarn app logs for any Tez application ( the application master logs to be
> more specific: syslog_dag* files) to see if there are any
> warnings/exceptions being logged related to history event handling.
>
> thanks
> — Hitesh
>
> > On Oct 15, 2016, at 9:58 PM, Stephen Sprague  wrote:
> >
> > hmm... made that change to yarn-

Re: Tez UI

2016-10-16 Thread Hitesh Shah
Hello Stephen,

yarn-site.xml needs to be updated wherever the Tez client is used, i.e., if you 
are using Hive, then wherever you launch the Hive CLI and also where 
HiveServer2 is installed (HS2 will need a restart). 

To see if the connection to timeline is/was an issue, please check the yarn app 
logs for any Tez application ( the application master logs to be more specific: 
syslog_dag* files) to see if there are any warnings/exceptions being logged 
related to history event handling. 

thanks
— Hitesh
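
[Editor's sketch, not part of the original message] One way to carry out the check described above is to filter the aggregated application logs for warnings and errors logged by the history event handling thread. This is a sketch under assumptions: history_event_problems is a hypothetical helper, and the pattern relies on the `[LEVEL] [thread name]` layout seen in the AM log excerpts elsewhere in this thread.

```shell
#!/bin/sh
# history_event_problems is a hypothetical helper: it reads Tez AM log text
# on stdin and keeps only WARN/ERROR lines from HistoryEventHandlingThread.
history_event_problems() {
    grep -E '\[(WARN|ERROR)\] \[HistoryEventHandlingThread\]'
}

# Typical use on a cluster (requires the yarn CLI and log aggregation):
#   yarn logs -applicationId <application id> | history_event_problems

# Demo on two sample lines in the format shown in this thread.
history_event_problems <<'EOF'
2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container
2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
EOF
```

Only the second sample line passes the filter. Stack-trace lines that follow a match are not printed, so the full log is still needed for details.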

> On Oct 15, 2016, at 9:58 PM, Stephen Sprague  wrote:
> 
> hmm... made that change to yarn-site.xml and restarted the timelineserver and 
> RM.
> 
> $ sudo netstat -lanp | grep 31168 #timelineserver
> 
> tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45299    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45298    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45322    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45297    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45316    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45318    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45317    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45321    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45326    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45314    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45315    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45313    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45320    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45324    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45325    ESTABLISHED 31168/java
> tcp        0      0 172.19.103.136:8188     172.19.103.136:45319    ESTABLISHED 31168/java
> unix  2  [ ] STREAM CONNECTED 1455259739 31168/java
> unix  2  [ ] STREAM CONNECTED 1455253313 31168/java
> 
> 
> still no dice though.  same error.   i only changed yarn-site.xml on the 
> namenode though.  you think i need to copy it to all the datanodes and 
> restart the NM's too?
> 
> any other suggestions?
> 
> 'ppreciate the help!
> 
> 
> Cheers,
> Stephen.
> 
> On Sat, Oct 15, 2016 at 8:46 PM, Allan Wilson  wrote:
> Just saw Gopals response...that def needs updating too.
> 
> Sent from my iPhone
> 
> On Oct 15, 2016, at 9:31 PM, Stephen Sprague  wrote:
> 
>> thanks guys. lemme answer.
>> 
>> Sreenath-
>> 1. yarn.acl.enable = false  (ie. i did not set it)
>> 2.  this:  http://dwrdevnn1.sv2.trulia.com:9766 displays index.html with an 
>> *empty* list
>> 
>> Gopal-
>> 3. i'll replace 0.0.0.0 with dwrdevnn1.sv2.trulia.com and see what happens...
>> 
>> Allan-
>> 4. yes, metrics are enabled.
>> 
>> 
>> I'll let you know what happens with Gopal's suggestion.
>> 
>> 
>> Cheers,
>> Stephen.
>> 
>> On Sat, Oct 15, 2016 at 8:20 PM, Allan Wilson  wrote:
>> Are you emitting metrics to the ATS? 
>> 
>> yarn.timeline-service.enabled=true
>> 
>> Sent from my iPhone
>> 
>> On Oct 15, 2016, at 8:36 PM, Sreenath Somarajapuram 
>>  wrote:
>> 
>>> Hi Stephen,
>>> 
>>> The error message is coming from ATS, and it says that the application data 
>>> is not available.
>>> And yes, tez_application_1476574340629_0001 is a legit value. It can be 
>>> considered as the id for Tez application details.
>>> 
>>> Please help me with these:
>>> 1. Are you having yarn.acl.enable = true in yarn-site.xml ?
>>> 2. On going to http://dwrdevnn1.sv2.trulia.com:9766 from your browser 
>>> window, the UI is supposed to display a list of DAGs. Are you able to view 
>>> them?
>>> 
>>> Thanks,
>>> Sreenath
>>> 
>>> From: Stephen Sprague 
>>> Reply-To: "user@tez.apache.org" 
>>> Date: Sunday, October 16, 2016 at 7:16 AM
>>> To: "user@tez.apache.org" 
>>> Subject: Tez UI
>>> 
>>> hey guys,
>>> i'm having a hard time getting the Tez UI to work.  I'm sure i'm doing 
>>> something wrong but i can't seem to figure it out.  Here's my scenario.
>>> 
>>> 1. i'm using nginx as the webserver. port 9766.   using that port without 
>>> params correctly displays index.html.  (i followed the instructions on 
>>> unzipping the war file - that seems ok - i'm using tez-ui2 )
>>> 
>>> 
>>> 2. i run a Tez job. It runs fine.
>>> 
>>> 
>>> 3. i click on the "History" hyperlink in the RM UI at 8088.
>>> 
>>> 
>>> 4. it attempts to r