thanks Allan. so i enabled DEBUG,console on the ATS. I see this in that log:
16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
16/10/16 21:07:59 DEBUG security.TimelineACLsManager: Verifying the access of yarn on the timeline entity { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }
16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing the entity { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }, JSON-style content: {"events":[{"timestamp":1476677279325,"eventtype":"YARN_APPLICATION_ATTEMPT_REGISTERED"}],"entity":"appattempt_1476593404620_0211_000001","entitytype":"YARN_APPLICATION_ATTEMPT","domain":"DEFAULT"}
16/10/16 21:07:59 DEBUG timeline.TimelineDataManager: Storing entities: { id: appattempt_1476593404620_0211_000001, type: YARN_APPLICATION_ATTEMPT }
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 200
16/10/16 21:07:59 DEBUG mortbay.log: REQUEST /ws/v1/timeline/ on org.mortbay.jetty.HttpConnection@7d134e03
16/10/16 21:07:59 DEBUG mortbay.log: sessionManager=org.mortbay.jetty.servlet.HashSessionManager@350aac89
16/10/16 21:07:59 DEBUG mortbay.log: session=null
16/10/16 21:07:59 DEBUG mortbay.log: servlet=default
16/10/16 21:07:59 DEBUG mortbay.log: chain=NoCacheFilter->NoCacheFilter->safety->Timeline Authentication Filter->Cross Origin Filter->static_user_filter->guice->default
16/10/16 21:07:59 DEBUG mortbay.log: servlet holder=default
16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
16/10/16 21:07:59 DEBUG mortbay.log: call filter NoCacheFilter
16/10/16 21:07:59 DEBUG mortbay.log: call filter safety
16/10/16 21:07:59 DEBUG mortbay.log: call filter Timeline Authentication Filter
16/10/16 21:07:59 DEBUG server.AuthenticationFilter: Request [http://dwrdevnn1.sv2.trulia.com:8188/ws/v1/timeline/] user [dwr] authenticated
16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter static_user_filter
16/10/16 21:07:59 DEBUG mortbay.log: call filter guice
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 404
16/10/16 21:07:59 DEBUG mortbay.log: RESPONSE /ws/v1/timeline/ 200
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:00 DEBUG mortbay.log: EOF
16/10/16 21:08:02 DEBUG mortbay.log: EOF
16/10/16 21:08:02 DEBUG mortbay.log: EXCEPTION
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.mortbay.io.nio.ChannelEndPoint.fill(ChannelEndPoint.java:132)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:290)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

again not sure how to read it. so far this seems to be the smoking gun to me from the Tez AM.

2016-10-16 16:14:06,106 [DEBUG] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: HTTP error code: 404 Server response : {"exception":"UnrecognizedPropertyException","message":"Unrecognized field \"eventinfo\"

On Sun, Oct 16, 2016 at 5:53 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:

> I can send you my TEZ file later
>
> Sent from my iPhone
>
> On Oct 16, 2016, at 1:32 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>
> Hi Hitesh,
> Bingo!
>
> Log Type: syslog_dag_1476593404620_0001_1
>
> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
>
> Log Length: 75813
>
> Showing 4096 bytes of 75813 total.
> Click here
> <http://dwrdevnn1.sv2.trulia.com:19888/jobhistory/logs/dwrdevdn17.sv2.trulia.com:46114/container_1476593404620_0001_01_000001/container_1476593404620_0001_01_000001/dwr/syslog_dag_1476593404620_0001_1/?start=0>
> for the full log.
>
> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000050, asking it to die
> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000008, asking it to die
> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000007, asking it to die
> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000011, asking it to die
> 2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
> 2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000058, asking it to die
> 2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353] |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown container with id: container_1476593404620_0001_01_000051, asking it to die
> 2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
> 2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
>     at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
>     at java.lang.Thread.run(Thread.java:745)
>
> i'm running the hive cli on host=dwrdevnn1.
>
> i updated yarn-site.xml on dwrdevnn1.
>
> i restarted the ATS service on dwrdevnn1.
> sudo -u yarn -- yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
>
> netstat is showing 8188 as being alive. i can also telnet to dwrdevnn1 8188. also port 10200 is LISTENing.
>
> $ sudo netstat -lanp | grep 31168
> tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
> tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
>
> might there be a debug log level i can set on impl.TimelineClientImpl to see what is happening on the connection event?
>
> thank you again!
>
> Cheers,
> Stephen.
>
> On Sun, Oct 16, 2016 at 9:54 AM, Hitesh Shah <hit...@apache.org> wrote:
>
>> Hello Stephen,
>>
>> yarn-site.xml needs to be updated wherever the Tez client is used, i.e. if you are using Hive, then wherever you launch the Hive CLI and also where the HiveServer2 is installed (HS2 will need a restart).
>>
>> To see if the connection to timeline is/was an issue, please check the yarn app logs for any Tez application (the application master logs to be more specific: syslog_dag* files) to see if there are any warnings/exceptions being logged related to history event handling.
>>
>> thanks
>> -- Hitesh
>>
>> > On Oct 15, 2016, at 9:58 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>> >
>> > hmm... made that change to yarn-site.xml and restarted the timelineserver and RM.
>> >
>> > $ sudo netstat -lanp | grep 31168 #timelineserver
>> >
>> > tcp        0      0 172.19.103.136:10200    0.0.0.0:*               LISTEN      31168/java
>> > tcp        0      0 172.19.103.136:8188     0.0.0.0:*               LISTEN      31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45299    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45298    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45322    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45297    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45316    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45318    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45317    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45321    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45326    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45314    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45315    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45313    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45320    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45324    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45325    ESTABLISHED 31168/java
>> > tcp        0      0 172.19.103.136:8188     172.19.103.136:45319    ESTABLISHED 31168/java
>> > unix  2      [ ]         STREAM     CONNECTED     1455259739 31168/java
>> > unix  2      [ ]         STREAM     CONNECTED     1455253313 31168/java
>> >
>> > still no dice though. same error. i only changed yarn-site.xml on the namenode though. you think i need to copy it to all the datanodes and restart the NM's too?
>> >
>> > any other suggestions?
>> >
>> > 'ppreciate the help!
>> >
>> > Cheers,
>> > Stephen.
>> >
>> > On Sat, Oct 15, 2016 at 8:46 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:
>> > Just saw Gopal's response...that def needs updating too.
>> >
>> > Sent from my iPhone
>> >
>> > On Oct 15, 2016, at 9:31 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>> >
>> >> thanks guys. lemme answer.
>> >>
>> >> Sreenath-
>> >> 1. yarn.acl.enable = false (ie. i did not set it)
>> >> 2. this: http://dwrdevnn1.sv2.trulia.com:9766 displays index.html with an *empty* list
>> >>
>> >> Gopal-
>> >> 3. i'll replace 0.0.0.0 with dwrdevnn1.sv2.trulia.com and see what happens...
>> >>
>> >> Allan-
>> >> 4. yes, metrics are enabled.
>> >>
>> >> I'll let you know what happens with Gopal's suggestion.
>> >>
>> >> Cheers,
>> >> Stephen.
>> >>
>> >> On Sat, Oct 15, 2016 at 8:20 PM, Allan Wilson <wilsoncr...@gmail.com> wrote:
>> >> Are you emitting metrics to the ATS?
>> >>
>> >> yarn.timeline-service.enabled=true
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Oct 15, 2016, at 8:36 PM, Sreenath Somarajapuram <ssomarajapu...@hortonworks.com> wrote:
>> >>
>> >>> Hi Stephen,
>> >>>
>> >>> The error message is coming from ATS, and it says that the application data is not available.
>> >>> And yes, tez_application_1476574340629_0001 is a legit value. It can be considered as the id for Tez application details.
>> >>>
>> >>> Please help me with these:
>> >>> 1. Do you have yarn.acl.enable = true in yarn-site.xml?
>> >>> 2. On going to http://dwrdevnn1.sv2.trulia.com:9766 from your browser window, the UI is supposed to display a list of DAGs. Are you able to view them?
>> >>>
>> >>> Thanks,
>> >>> Sreenath
>> >>>
>> >>> From: Stephen Sprague <sprag...@gmail.com>
>> >>> Reply-To: "user@tez.apache.org" <user@tez.apache.org>
>> >>> Date: Sunday, October 16, 2016 at 7:16 AM
>> >>> To: "user@tez.apache.org" <user@tez.apache.org>
>> >>> Subject: Tez UI
>> >>>
>> >>> hey guys,
>> >>> i'm having a hard time getting the Tez UI to work. I'm sure i'm doing something wrong but i can't seem to figure it out. Here's my scenario.
>> >>>
>> >>> 1. i'm using nginx as the webserver. port 9766. using that port without params correctly displays index.html. (i followed the instructions on unzipping the war file - that seems ok - i'm using tez-ui2)
>> >>>
>> >>> 2. i run a Tez job. It runs fine.
>> >>>
>> >>> 3. i click on the "History" hyperlink in the RM UI at 8088.
>> >>>
>> >>> 4. it attempts to run http://dwrdevnn1.sv2.trulia.com:8088/proxy/application_1476574340629_0001/#/tez-app/application_1476574340629_0001
>> >>>
>> >>> 5. which yields this error:
>> >>>
>> >>> <image.png>
>> >>>
>> >>> i see "id: tez_application_1476574340629_0001". is that "tez_" prefix legit?
>> >>>
>> >>> 6. the ATS is running on port 8188. I've modified the file config/configs.env as well: cf. timeline: "http://dwrdevnn1.sv2.trulia.com:8188",
>> >>>
>> >>> 7. here are those details:
>> >>>
>> >>> yarn 29762 1 12 18:10 pts/5 00:00:11
>> >>>   /usr/lib/jvm/java-8-oracle/jre/bin/java -Dproc_timelineserver -Xmx1000m
>> >>>   -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn
>> >>>   -Dhadoop.log.file=yarn-yarn-timelineserver-dwrdevnn1.log
>> >>>   -Dyarn.log.file=yarn-yarn-timelineserver-dwrdevnn1.log -Dyarn.home.dir=
>> >>>   -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA
>> >>>   -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native
>> >>>   -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn
>> >>>   -Dyarn.log.dir=/var/log/hadoop-yarn
>> >>>   -Dhadoop.log.file=yarn-yarn-timelineserver-dwrdevnn1.log
>> >>>   -Dyarn.log.file=yarn-yarn-timelineserver-dwrdevnn1.log
>> >>>   -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn
>> >>>   -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
>> >>>   -Djava.library.path=/usr/lib/hadoop/lib/native -classpath
>> >>>   /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/opt/pepperdata/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/timelineserver-config/log4j.properties
>> >>>   org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>> >>>
>> >>> $ sudo netstat -lanp |grep 29762
>> >>> tcp        0      0 0.0.0.0:10200           0.0.0.0:*               LISTEN      29762/java
>> >>> tcp        0      0 0.0.0.0:8188            0.0.0.0:*               LISTEN      29762/java
>> >>>
>> >>> 8. the configs in yarn-site.xml
>> >>> <property>
>> >>>   <name>yarn.timeline-service.hostname</name>
>> >>>   <value>0.0.0.0</value>
>> >>> </property>
>> >>> <property>
>> >>>   <name>yarn.timeline-service.enabled</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>> <property>
>> >>>   <name>yarn.timeline-service.webapp.address</name>
>> >>>   <value>0.0.0.0:8188</value>
>> >>> </property>
>> >>> <property>
>> >>>   <name>yarn.timeline-service.http-cross-origin.enabled</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>> <property>
>> >>>   <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>>
>> >>> 9. and the configs in tez-site.xml are as follows:
>> >>> <property>
>> >>>   <description>Enable Tez to use the Timeline Server for History Logging</description>
>> >>>   <name>tez.history.logging.service.class</name>
>> >>>   <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
>> >>> </property>
>> >>>
>> >>> <!-- port 9766 defined in nginx config file -->
>> >>> <property>
>> >>>   <description>URL for where the Tez UI is hosted</description>
>> >>>   <name>tez.tez-ui.history-url.base</name>
>> >>>   <value>http://dwrdevnn1.sv2.trulia.com:9766</value>
>> >>> </property>
>> >>>
>> >>> <!-- from tez-ui README.txt -->
>> >>> <property>
>> >>>   <name>tez.runtime.convert.user-payload.to.history-text</name>
>> >>>   <value>true</value>
>> >>>   <description>Should be enabled to get the configuration options. If enabled, the config options are set as userpayload per input/output.</description>
>> >>> </property>
>> >>>
>> >>> <property>
>> >>>   <name>tez.allow.disabled.timeline-domains</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>>
>> >>> So i don't get it. Any ideas why this fails?
>> >>>
>> >>> thanks,
>> >>> Stephen.
>> >>>
>> >>> <image.png>
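
One way to check whether the ATS actually holds any Tez data, independent of nginx and the UI, is to query the timeline REST endpoint directly. The sketch below is not from this thread. It assumes the ATS v1 REST layout (`/ws/v1/timeline/<entityType>` with a `limit` query parameter) and the `TEZ_DAG_ID` entity type; the host and port are the ones from the configs above, and the function names are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Host and port taken from the configs.env / yarn-site.xml values in this thread.
ATS_BASE = "http://dwrdevnn1.sv2.trulia.com:8188"


def timeline_url(base, entity_type="TEZ_DAG_ID", limit=5):
    """Build the URL that lists entities of one type (assumed ATS v1 layout)."""
    query = urlencode({"limit": limit})
    return f"{base.rstrip('/')}/ws/v1/timeline/{quote(entity_type)}?{query}"


def entity_ids(body):
    """Pull the "entity" ids out of a timeline JSON response body."""
    doc = json.loads(body)
    return [e.get("entity") for e in doc.get("entities", [])]


def fetch_entity_ids(base=ATS_BASE, entity_type="TEZ_DAG_ID", limit=5, timeout=10):
    """Query the timeline server and return the ids it reports (needs network access)."""
    with urlopen(timeline_url(base, entity_type, limit), timeout=timeout) as resp:
        return entity_ids(resp.read().decode("utf-8"))
```

Calling `fetch_entity_ids()` from the host that runs the Hive CLI and getting an empty list back would suggest that the AM's history events never reached the ATS, which would be consistent with the "Could not handle history events" warnings. Getting DAG ids back would point at the UI side instead.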