> well we are seeing these sessions sitting around for over an hour 

A stuck ATS (the YARN Application Timeline Server) could be one of the causes
of this. Tez won't kill a session until all of its ATS info has been submitted
out of the process.

RollingLevelDbTimelineStore & EntityGroupFSTimelineStore were written to fix 
this issue, but AFAIK those are not the default in the Apache Hadoop installs 
(though Ambari does set them up).

Check yarn.timeline-service.store-class in your yarn-site.xml. If it says 
LeveldbTimelineStore, you might see this behavior exactly 30 days after the 
cluster goes operational.
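
If you'd rather script that check than eyeball the file, something along
these lines should do. It is only a rough sketch: the helper names are made
up for illustration, and the path in the usage note is an assumption about
where your install keeps its config.

```python
import xml.etree.ElementTree as ET

PROPERTY = "yarn.timeline-service.store-class"


def timeline_store_class(yarn_site_xml):
    """Return the configured store class from yarn-site.xml text, or None if unset."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY:
            return (prop.findtext("value") or "").strip()
    return None


def uses_plain_leveldb(store_class):
    """True when the simple class name is exactly LeveldbTimelineStore.

    Only the part after the last dot is compared, so the rolling LevelDB
    store (a differently named class) is not flagged.
    """
    return store_class.rsplit(".", 1)[-1] == "LeveldbTimelineStore"
```

To use it, read the file (for example /etc/hadoop/conf/yarn-site.xml, if
that is where yours lives), pass the text to timeline_store_class(), and
feed a non-None result to uses_plain_leveldb(). A None result just means the
property is not set in that file, so whatever default your Hadoop version
ships with applies.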

Cheers,
Gopal


