first pass:

1. changing yarn.timeline-service.ttl-enable to false didn't seem to work. i restarted the TLS (timeline server), HS2 and the RM, and the query still stuck around.

2. figure i'd try using RollingLevelDbTimelineStore but got class not found so i'll dig around for that later today.

current settings for "yarn.timeline-service.*" vars are now this:

yarn.timeline-service.address=${yarn.timeline-service.hostname}:10200
yarn.timeline-service.client.max-retries=30
yarn.timeline-service.client.retry-interval-ms=1000
yarn.timeline-service.enabled=true
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.hostname=XXXXX.sv2.trulia.com
yarn.timeline-service.http-authentication.simple.anonymous.allowed=true
yarn.timeline-service.http-authentication.type=simple
yarn.timeline-service.http-cross-origin.enabled=true
yarn.timeline-service.keytab=/etc/krb5.keytab
yarn.timeline-service.leveldb-timeline-store.path=${hadoop.tmp.dir}/yarn/timeline
yarn.timeline-service.leveldb-timeline-store.read-cache-size=104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore   <-- need to find jar with this class
yarn.timeline-service.ttl-enable=false   <-- change to false
yarn.timeline-service.ttl-ms=604800000   <-- one week?
yarn.timeline-service.webapp.address=XXXX.sv2.trulia.com:8188
yarn.timeline-service.webapp.https.address=${yarn.timeline-service.hostname}:8190

looking at the stderr of that one container hanging around we have this below.

2016-12-14 13:58:38 Running Dag: dag_1481520856023_2137_1
Dec 14, 2016 1:58:51 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton.
While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
2016-12-14 13:59:06 Completed Dag: dag_1481520856023_2137_1
2016-12-14 13:59:11 Running Dag: dag_1481520856023_2137_2
2016-12-14 13:59:19 Completed Dag: dag_1481520856023_2137_2
2016-12-14 13:59:25 Running Dag: dag_1481520856023_2137_3
2016-12-14 13:59:39 Completed Dag: dag_1481520856023_2137_3
2016-12-14 13:59:43 Running Dag: dag_1481520856023_2137_4
2016-12-14 13:59:54 Completed Dag: dag_1481520856023_2137_4
2016-12-14 13:59:56 Running Dag: dag_1481520856023_2137_5
2016-12-14 14:00:08 Completed Dag: dag_1481520856023_2137_5
2016-12-14 14:00:10 Running Dag: dag_1481520856023_2137_6
2016-12-14 14:03:21 Completed Dag: dag_1481520856023_2137_6
2016-12-14 14:03:26 Running Dag: dag_1481520856023_2137_7
2016-12-14 14:03:44 Completed Dag: dag_1481520856023_2137_7
2016-12-14 14:03:47 Running Dag: dag_1481520856023_2137_8
2016-12-14 14:04:04 Completed Dag: dag_1481520856023_2137_8
2016-12-14 14:04:11 Running Dag: dag_1481520856023_2137_9
2016-12-14 14:04:35 Completed Dag: dag_1481520856023_2137_9
2016-12-14 14:04:48 Running Dag: dag_1481520856023_2137_10
2016-12-14 14:04:54 Completed Dag: dag_1481520856023_2137_10

and this is stdout:

spragues@dwrdevdn27:~$ sudo ls -l /storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
-rw-rw-r-- 1 yarn yarn 655355 Dec 14 14:54 /storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout

spragues@dwrdevdn27:~$ sudo tail /storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Free CSet: 0.4 ms]
   [Eden: 215.0M(215.0M)->0.0B(157.0M) Survivors: 3072.0K->28.0M Heap: 331.0M(373.0M)->149.4M(373.0M)]
 [Times: user=0.11 sys=0.00, real=0.03 secs]
Heap
 garbage-first heap total 381952K, used 229430K [0x00000000ccc00000, 0x00000000e4100000, 0x0000000100000000)
  region size 1024K, 102 young (104448K), 28 survivors (28672K)
 Metaspace used 52193K, capacity 52928K, committed 52952K, reserved 1095680K
  class space used 5696K, capacity 5868K, committed 5888K, reserved 1048576K

spragues@dwrdevdn27:~$ sudo tail -30 /storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
3352.102: [GC pause (G1 Evacuation Pause) (young), 0.0283240 secs]
   [Parallel Time: 5.7 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 3352102.4, Avg: 3352102.5, Max: 3352102.6, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 0.8, Avg: 1.0, Max: 2.2, Diff: 1.5, Sum: 17.7]
      [Update RS (ms): Min: 1.1, Avg: 2.7, Max: 4.4, Diff: 3.2, Sum: 49.0]
         [Processed Buffers: Min: 1, Avg: 3.1, Max: 10, Diff: 9, Sum: 56]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [Object Copy (ms): Min: 0.1, Avg: 1.2, Max: 1.5, Diff: 1.5, Sum: 21.4]
      [Termination (ms): Min: 0.0, Avg: 0.5, Max: 0.6, Diff: 0.6, Sum: 8.4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [GC Worker Total (ms): Min: 5.3, Avg: 5.5, Max: 5.6, Diff: 0.3, Sum: 98.5]
      [GC Worker End (ms): Min: 3352107.9, Avg: 3352108.0, Max: 3352108.0, Diff: 0.1]
   [Code Root Fixup: 0.4 ms]
   [Code Root Migration: 0.8 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 21.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 20.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Free CSet: 0.4 ms]
   [Eden: 215.0M(215.0M)->0.0B(157.0M) Survivors: 3072.0K->28.0M Heap: 331.0M(373.0M)->149.4M(373.0M)]
 [Times: user=0.11 sys=0.00, real=0.03 secs]
Heap
 garbage-first heap total 381952K, used 229430K [0x00000000ccc00000, 0x00000000e4100000, 0x0000000100000000)
  region size 1024K, 102 young (104448K), 28 survivors (28672K)
 Metaspace used 52193K, capacity 52928K, committed 52952K, reserved 1095680K
  class space used 5696K, capacity 5868K, committed 5888K, reserved 1048576K

So definitely looks GC-ish related, yeah?

okay, onward looking for that RollingLevelDb class next...

Cheers,
Stephen.

On Wed, Dec 14, 2016 at 10:03 AM, Stephen Sprague <sprag...@gmail.com> wrote:

> Thanks Gopal. I'll set the ttl flag to false and see what gives.
>
> Cheers,
> Stephen
>
> On Tue, Dec 13, 2016 at 10:48 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
>> > yarn.timeline-service.ttl-enable=true
>>
>> Let us validate that this is due to the TTL GC kicking in and disable the
>> TTL flag & leave it running for a day.
>>
>> Better to also verify the Tez logs of sessions hanging along waiting for
>> the ATS to collect events (look for the last _post log file in the AM logs
>> link).
>>
>> > you propose that setting that to "RollingLevelDbTimelineStore" might
>> > fix the issue?
>>
>> Yes, but you would lose all the existing history, so not yet - but it
>> will be what you need to do to get out of the TTL.
>>
>> Cheers,
>> Gopal
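
fwiw, a quick way to turn the Running/Completed pairs in that stderr into per-DAG run times. this is only a rough sketch in python and not something from the thread: the name dag_durations and the short inline sample are made up for illustration, and the four sample lines are copied from the stderr above.

```python
import re
from datetime import datetime

# Matches lines such as
# "2016-12-14 13:58:38 Running Dag: dag_1481520856023_2137_1"
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (Running|Completed) Dag: (\S+)"
)


def dag_durations(text):
    """Return {dag_id: seconds between its Running and Completed lines}.

    Hypothetical helper for illustration; it only looks at the two
    timestamps the stderr prints and ignores every other line.
    """
    started = {}
    durations = {}
    for stamp, event, dag_id in LINE_RE.findall(text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "Running":
            started[dag_id] = when
        elif dag_id in started:
            durations[dag_id] = (when - started[dag_id]).total_seconds()
    return durations


# Four lines copied from the stderr quoted above.
sample = """\
2016-12-14 13:59:56 Running Dag: dag_1481520856023_2137_5
2016-12-14 14:00:08 Completed Dag: dag_1481520856023_2137_5
2016-12-14 14:00:10 Running Dag: dag_1481520856023_2137_6
2016-12-14 14:03:21 Completed Dag: dag_1481520856023_2137_6
"""

for dag_id, secs in dag_durations(sample).items():
    print(dag_id, secs)
```

run against the full stderr above it gives 191 seconds for dag_1481520856023_2137_6 and under 30 seconds for each of the other nine.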