ah.

2016-12-14 14:05:07,855 [WARN] [AMShutdownThread] |ats.ATSHistoryLoggingService|: ATSService being stopped, eventQueueBacklog=14820, maxTimeLeftToFlush=-1, waitForever=true
2016-12-14 14:05:37,877 [ERROR] [AMShutdownThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing

so it looks like something is wonky with the timeline service. and yet:

$ ps -ef | grep timeline
spragues 14326 19414 99 16:43 pts/1 00:02:02 /usr/lib/jvm/java-8-oracle/jre/bin/java -Dproc_timelineserver -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop-yarn/logs -Dyarn.log.dir=/usr/lib/hadoop-yarn/logs -Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/lib/hadoop-yarn/logs -Dyarn.log.dir=/usr/lib/hadoop-yarn/logs -Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/apache-tez-0.8.4-bin/conf:/usr/lib/apache-tez-0.8.4-bin/*:/usr/lib/apache-tez-0.8.4-bin/lib/*:/opt/pepperdata/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/timelineserver-config/log4j.properties org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer

$ sudo netstat -lanp | grep 14326 | grep LISTEN
tcp        0      0 172.19.73.136:10200     0.0.0.0:*               LISTEN      14326/java
tcp        0      0 172.19.73.136:8188      0.0.0.0:*               LISTEN      14326/java

so i'm pretty sure it's up and running. i ran the test tez job again just now and looked at a syslog file on the DN. found this again:

spragues@dwrdevdn13:~$ sudo cat /storage7/hadoop/yarn/logs/application_1481520856023_2250/container_1481520856023_2250_01_000001/syslog
2016-12-14 16:46:21,177 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
2016-12-14 16:46:21,178 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
        at java.lang.Thread.run(Thread.java:745)
2016-12-14 16:46:21,541 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
2016-12-14 16:46:21,541 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
        at java.lang.Thread.run(Thread.java:745)

mapreduce.job.emit-timeline-data=false
yarn.timeline-service.address=${yarn.timeline-service.hostname}:10200
yarn.timeline-service.client.max-retries=30
yarn.timeline-service.client.retry-interval-ms=1000
yarn.timeline-service.enabled=true
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.hostname=dwrdevnn1.sv2.trulia.com
yarn.timeline-service.http-authentication.simple.anonymous.allowed=true
yarn.timeline-service.http-authentication.type=simple
yarn.timeline-service.http-cross-origin.enabled=true
yarn.timeline-service.keytab=/etc/krb5.keytab
yarn.timeline-service.leveldb-timeline-store.path=${hadoop.tmp.dir}/yarn/timeline
yarn.timeline-service.leveldb-timeline-store.read-cache-size=104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
yarn.timeline-service.ttl-enable=false
yarn.timeline-service.ttl-ms=604800000
yarn.timeline-service.webapp.address=dwrdevnn1.sv2.trulia.com:8188
yarn.timeline-service.webapp.https.address=${yarn.timeline-service.hostname}:8190

I think I must be missing something obvious, but if the timeline service is running and tez is using ATSHistoryLoggingService one would think it would work,
no? thanks again for your help!

Cheers,
Stephen.

On Wed, Dec 14, 2016 at 4:23 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> > looking at the stderr of that one container hanging around we have this below.
>
> Look in the syslog for a log line which starts with
>
> ATSService being stopped, eventQueueBacklog=<number>…, waitForever=true
>
> Cheers,
> Gopal
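One check that might narrow this down: the netstat output only shows that the server is listening on 172.19.73.136, while the failing client runs in the AM container on the DN and is configured to reach dwrdevnn1.sv2.trulia.com. Whether the DN can actually open a connection to that hostname on ports 10200 and 8188 is not shown anywhere above. Below is a minimal sketch of such a reachability test, meant to be run on the DN. The helper name can_connect and the 3-second timeout are made up for illustration; it only tests plain TCP connectivity and says nothing about whether the timeline server answers requests correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper, not part of Hadoop or Tez. DNS failures,
    refused connections and timeouts all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are subclasses of OSError
        return False
```

Calling can_connect("dwrdevnn1.sv2.trulia.com", 8188) and can_connect("dwrdevnn1.sv2.trulia.com", 10200) from the DN would show whether a name-resolution or firewall problem sits between the AM and the timeline server. If both return True, the cause is more likely on the server side.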