Thanks Hitesh. i'll look into this tonight.

On Mon, Oct 17, 2016 at 10:31 AM, Hitesh Shah <hit...@apache.org> wrote:
> Hello Stephen,
>
> I checked branch-2.4.0 of hadoop just to make sure - it does contain
> “eventinfo” as a member of the TimelineEvent class, so this does not seem to
> indicate a potential mismatch or a missing patch in the version of hadoop
> that you are running.
>
> Based on the logs, YARN_APPLICATION_ATTEMPT is data being written by the
> YARN RM into YARN Timeline and that seems to be working. What is not
> working is the Tez AM talking to YARN Timeline. I have not come across the
> property not found issue in the past. One guess I have is that this
> could potentially be due to something incompatible with the timeline
> client class on Tez AM’s classpath and/or the combination of
> jackson/jersey jars in use.
>
> There are a few things you should look into and update this thread with
> the following info:
> - what version of hadoop you are running
> - what version of Tez ( and also what version of hadoop it was compiled
>   against )
> - check the hadoop classpath for jackson/jersey jars and compare the
>   versions in it to the versions in the tez tarball.
>
> thanks
> -- Hitesh
>
>
> On Oct 16, 2016, at 9:24 PM, Stephen Sprague <sprag...@gmail.com> wrote:
> >
> > thanks Allan. so i enabled DEBUG,console on the ATS.
> > I see this in that log:
> >
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
> > 16/10/16 21:07:59 DEBUG security.TimelineACLsManager: Verifying the access of yarn on the timeline entity { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }
> > 16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing the entity { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }, JSON-style content: {"events":[{"timestamp":1476677279325,"eventtype":"YARN_APPLICATION_ATTEMPT_REGISTERED"}],"entity":"appattempt_1476593404620_0211_000001","entitytype":"YARN_APPLICATION_ATTEMPT","domain":"DEFAULT"}
> > 16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing entities: { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }
> > 16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 200
> > 16/10/16 21:07:59 DEBUG mortbay.log: REQUEST /ws/v1/timeline/ on org.mortbay.jetty.HttpConnection@7d134e03
> > 16/10/16 21:07:59 DEBUG mortbay.log: sessionManager=org.mortbay.jetty.servlet.HashSessionManager@350aac89
> > 16/10/16 21:07:59 DEBUG mortbay.log: session=null
> > 16/10/16 21:07:59 DEBUG mortbay.log: servlet=default
> > 16/10/16 21:07:59 DEBUG mortbay.log: chain=NoCacheFilter->NoCacheFilter->safety->Timeline Authentication Filter->Cross Origin Filter->static_user_filter->guice->default
> > 16/10/16 21:07:59 DEBUG mortbay.log: servlet holder=default
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter safety
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter Timeline Authentication Filter
> > 16/10/16 21:07:59 DEBUG server.AuthenticationFilter: Request [http://dwrdevnn1.sv2.trulia.com:8188/ws/v1/timeline/] user [dwr] authenticated
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
> > 16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 404
> > 16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 200
> > 16/10/16 21:08:00 DEBUG mortbay.log: EOF
> > 16/10/16 21:08:00 DEBUG mortbay.log: EOF
> > 16/10/16 21:08:00 DEBUG mortbay.log: EOF
> > 16/10/16 21:08:02 DEBUG mortbay.log: EOF
> > 16/10/16 21:08:02 DEBUG mortbay.log: EXCEPTION
> > java.io.IOException: Connection reset by peer
> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >     at org.mortbay.io.nio.ChannelEndPoint.fill(ChannelEndPoint.java:132)
> >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:290)
> >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >
> >
> > again not sure how to read it.
> >
> > so far this seems to be the smoking gun to me from the Tez AM.
> >
> > 2016-10-16 16:14:06,106 [DEBUG] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: HTTP error code: 404 Server response :
> > {"exception":"UnrecognizedPropertyException","message":"Unrecognized field \"eventinfo\"
> >
> >
> > On Sun, Oct 16, 2016 at 5:53 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:
> > I can send you my TEZ file later
> >
> > Sent from my iPhone
> >
> > On Oct 16, 2016, at 1:32 PM, Stephen Sprague <sprag...@gmail.com> wrote:
> >
> >> Hi Hitesh,
> >> Bingo!
> >>
> >> Log Type: syslog_dag_1476593404620_0001_1
> >>
> >> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
> >>
> >> Log Length: 75813
> >>
> >> Showing 4096 bytes of 75813 total. Click here for the full log.
> >>
> >> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000050, asking it to die
> >> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000008, asking it to die
> >> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000007, asking it to die
> >> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000011, asking it to die
> >> 2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
> >> 2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
> >> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
> >>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
> >>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
> >>     at java.lang.Thread.run(Thread.java:745)
> >> 2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000058, asking it to die
> >> 2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000051, asking it to die
> >> 2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
> >> 2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
> >> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
> >>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
> >>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
> >>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
> >>     at java.lang.Thread.run(Thread.java:745)
> >>
> >>
> >> i'm running the hive cli on host=dwrdevnn1.
> >>
> >> i updated yarn-site.xml on dwrdevnn1.
> >>
> >> i restarted the ATS service on dwrdevnn1. sudo -u yarn -- yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
> >>
> >> netstat is showing 8188 as being alive. i can also telnet to dwrdevnn1 8188. also port 10200 is LISTENing.
> >>
> >> $ sudo netstat -lanp | grep 31168
> >> tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> >> tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
> >>
> >>
> >> might there be a debug log level i can set on impl.TimelineClientImpl to see what is happening on the connection event?
> >>
> >> thank you again!
> >>
> >> Cheers,
> >> Stephen.
> >>
> >>
> >> On Sun, Oct 16, 2016 at 9:54 AM, Hitesh Shah <hit...@apache.org> wrote:
> >> Hello Stephen,
> >>
> >> yarn-site.xml needs to be updated wherever the Tez client is used, i.e. if you are using Hive, then wherever you launch the Hive CLI and also where the HiveServer2 is installed ( HS2 will need a restart ).
> >>
> >> To see if the connection to timeline is/was an issue, please check the yarn app logs for any Tez application ( the application master logs to be more specific: syslog_dag* files) to see if there are any warnings/exceptions being logged related to history event handling.
> >>
> >> thanks
> >> -- Hitesh
> >>
> >> > On Oct 15, 2016, at 9:58 PM, Stephen Sprague <sprag...@gmail.com> wrote:
> >> >
> >> > hmm... made that change to yarn-site.xml and restarted the timelineserver and RM.
> >> >
> >> > $ sudo netstat -lanp | grep 31168 #timelineserver
> >> >
> >> > tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> >> > tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45299    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45298    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45322    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45297    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45316    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45318    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45317    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45321    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45326    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45314    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45315    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45313    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45320    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45324    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45325    ESTABLISHED 31168/java
> >> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45319    ESTABLISHED 31168/java
> >> > unix  2      [ ]         STREAM     CONNECTED     1455259739 31168/java
> >> > unix  2      [ ]         STREAM     CONNECTED     1455253313 31168/java
> >> >
> >> >
> >> > still no dice though. same error. i only changed yarn-site.xml on the namenode though. you think i need to copy it to all the datanodes and restart the NM's too?
> >> >
> >> > any other suggestions?
> >> >
> >> > 'ppreciate the help!
> >> >
> >> >
> >> > Cheers,
> >> > Stephen.
> >> >
> >> > On Sat, Oct 15, 2016 at 8:46 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:
> >> > Just saw Gopal's response...that def needs updating too.
> >> >
> >> > Sent from my iPhone
> >> >
> >> > On Oct 15, 2016, at 9:31 PM, Stephen Sprague <sprag...@gmail.com> wrote:
> >> >
> >> >> thanks guys. lemme answer.
> >> >>
> >> >> Sreenath-
> >> >> 1. yarn.acl.enable = false (ie. i did not set it)
> >> >> 2. this: http://dwrdevnn1.sv2.trulia.com:9766 displays index.html with an *empty* list
> >> >>
> >> >> Gopal-
> >> >> 3. i'll replace 0.0.0.0 with dwrdevnn1.sv2.trulia.com and see what happens...
> >> >>
> >> >> Allan-
> >> >> 4. yes, metrics are enabled.
> >> >>
> >> >>
> >> >> I'll let you know what happens with Gopal's suggestion.
> >> >>
> >> >>
> >> >> Cheers,
> >> >> Stephen.
> >> >>
> >> >> On Sat, Oct 15, 2016 at 8:20 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:
> >> >> Are you emitting metrics to the ATS?
> >> >>
> >> >> yarn.timeline-service.enabled=true
> >> >>
> >> >> Sent from my iPhone
> >> >>
> >> >> On Oct 15, 2016, at 8:36 PM, Sreenath Somarajapuram <ssomarajapu...@hortonworks.com> wrote:
> >> >>
> >> >>> Hi Stephen,
> >> >>>
> >> >>> The error message is coming from ATS, and it says that the application data is not available.
> >> >>> And yes, tez_application_1476574340629_0001 is a legit value. It can be considered as the id for Tez application details.
> >> >>>
> >> >>> Please help me with these:
> >> >>> 1. Do you have yarn.acl.enable = true in yarn-site.xml?
> >> >>> 2. On going to http://dwrdevnn1.sv2.trulia.com:9766 from your browser window, the UI is supposed to display a list of DAGs. Are you able to view them?
> >> >>>
> >> >>> Thanks,
> >> >>> Sreenath
> >> >>>
> >> >>> From: Stephen Sprague <sprag...@gmail.com>
> >> >>> Reply-To: "user@tez.apache.org" <user@tez.apache.org>
> >> >>> Date: Sunday, October 16, 2016 at 7:16 AM
> >> >>> To: "user@tez.apache.org" <user@tez.apache.org>
> >> >>> Subject: Tez UI
> >> >>>
> >> >>> hey guys,
> >> >>> i'm having a hard time getting the Tez UI to work. I'm sure i'm doing something wrong but i can't seem to figure it out. Here's my scenario.
> >> >>>
> >> >>> 1. i'm using nginx as the webserver. port 9766. using that port without params correctly displays index.html. (i followed the instructions on unzipping the war file - that seems ok - i'm using tez-ui2 )
> >> >>>
> >> >>> 2. i run a Tez job. It runs fine.
> >> >>>
> >> >>> 3. i click on the "History" hyperlink in the RM UI at 8088.
> >> >>>
> >> >>> 4. it attempts to run http://dwrdevnn1.sv2.trulia.com:8088/proxy/application_1476574340629_0001/#/tez-app/application_1476574340629_0001
> >> >>>
> >> >>> 5. which yields this error:
> >> >>>
> >> >>> <image.png>
> >> >>>
> >> >>> i see "id: tez_application_1476574340629_0001" is that "tez_" prefix legit?
> >> >>>
> >> >>> 6. the ATS is running on port 8188. I've modified the file config/configs.env as well: cf. timeline: "http://dwrdevnn1.sv2.trulia.com:8188",
> >> >>>
> >> >>> 7. here are those details:
> >> >>>
> >> >>> yarn 29762 1 12 18:10 pts/5 00:00:11 /usr/lib/jvm/java-8-oracle/jre/bin/java -Dproc_timelineserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-timelineserver-dwrdevnn1.log -Dyarn.log.file=yarn-yarn-timelineserver-dwrdevnn1.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-timelineserver-dwrdevnn1.log -Dyarn.log.file=yarn-yarn-timelineserver-dwrdevnn1.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/opt/pepperdata/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/timelineserver-config/log4j.properties org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
> >> >>>
> >> >>> $ sudo netstat -lanp |grep 29762
> >> >>> tcp        0      0 0.0.0.0:10200           0.0.0.0:*               LISTEN      29762/java
> >> >>> tcp        0      0 0.0.0.0:8188            0.0.0.0:*               LISTEN      29762/java
> >> >>>
> >> >>> 8. the configs in yarn-site.xml
> >> >>> <property>
> >> >>>   <name>yarn.timeline-service.hostname</name>
> >> >>>   <value>0.0.0.0</value>
> >> >>> </property>
> >> >>> <property>
> >> >>>   <name>yarn.timeline-service.enabled</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>> <property>
> >> >>>   <name>yarn.timeline-service.webapp.address</name>
> >> >>>   <value>0.0.0.0:8188</value>
> >> >>> </property>
> >> >>> <property>
> >> >>>   <name>yarn.timeline-service.http-cross-origin.enabled</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>> <property>
> >> >>>   <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>>
> >> >>> 9. and tez-site.xml are as follows:
> >> >>> <property>
> >> >>>   <description>Enable Tez to use the Timeline Server for History Logging</description>
> >> >>>   <name>tez.history.logging.service.class</name>
> >> >>>   <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
> >> >>> </property>
> >> >>>
> >> >>> <!-- port 9766 defined in nginx config file -->
> >> >>> <property>
> >> >>>   <description>URL for where the Tez UI is hosted</description>
> >> >>>   <name>tez.tez-ui.history-url.base</name>
> >> >>>   <value>http://dwrdevnn1.sv2.trulia.com:9766</value>
> >> >>> </property>
> >> >>>
> >> >>> <!-- from tez-ui README.txt -->
> >> >>> <property>
> >> >>>   <name>tez.runtime.convert.user-payload.to.history-text</name>
> >> >>>   <value>true</value>
> >> >>>   <description>Should be enabled to get the configuration options. If enabled, the config options are set as
> >> >>>     userpayload per input/output.
> >> >>>   </description>
> >> >>> </property>
> >> >>>
> >> >>> <property>
> >> >>>   <name>tez.allow.disabled.timeline-domains</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>>
> >> >>>
> >> >>> So i don't get it. Any ideas why this fails?
> >> >>>
> >> >>> thanks,
> >> >>> Stephen.
> >> >>>
> >> >>> <image.png>
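
For the jackson/jersey check Hitesh suggests at the top of the thread, here is a minimal sketch of how the comparison could be scripted. It assumes the jars follow the usual `name-version.jar` naming; the function names and the sample jar versions in the demo are hypothetical and only illustrate the check, they are not taken from this cluster.

```python
import re
from pathlib import Path

# Matches names such as "jackson-core-asl-1.9.13.jar" or "jersey-json-1.9.jar".
# Assumption: the version is the first dash-separated part that starts with a digit.
JAR_RE = re.compile(
    r"^(?P<name>(?:jackson|jersey)[\w.-]*?)-(?P<version>\d[\w.-]*)\.jar$"
)


def parse_jar(filename):
    """Return (artifact, version) for a jackson/jersey jar name, else None."""
    match = JAR_RE.match(filename)
    return (match.group("name"), match.group("version")) if match else None


def collect_versions(filenames):
    """Map artifact name -> set of versions seen in an iterable of jar names."""
    versions = {}
    for filename in filenames:
        parsed = parse_jar(filename)
        if parsed:
            versions.setdefault(parsed[0], set()).add(parsed[1])
    return versions


def find_mismatches(hadoop_jars, tez_jars):
    """Return {artifact: (hadoop_versions, tez_versions)} for artifacts that
    appear on both sides with different version sets."""
    hadoop = collect_versions(hadoop_jars)
    tez = collect_versions(tez_jars)
    return {
        artifact: (sorted(hadoop[artifact]), sorted(tez[artifact]))
        for artifact in sorted(hadoop.keys() & tez.keys())
        if hadoop[artifact] != tez[artifact]
    }


def jar_names(directory):
    """List the jar file names directly under a directory."""
    return [path.name for path in Path(directory).glob("*.jar")]


if __name__ == "__main__":
    # Hypothetical file lists, for illustration only.
    sample_hadoop = ["jackson-core-asl-1.9.13.jar", "jersey-json-1.9.jar"]
    sample_tez = ["jackson-core-asl-1.8.8.jar", "jersey-json-1.9.jar"]
    print(find_mismatches(sample_hadoop, sample_tez))
```

To run it against real directories, pass `jar_names("/usr/lib/hadoop/lib")` (a path from the timeline server classpath quoted above) as the first argument and `jar_names(...)` for wherever the Tez tarball was unpacked; that second location is not given in the thread. Any artifact reported with two different versions would be a candidate for the kind of incompatibility Hitesh describes, not proof of it.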