> The failures are always intermittent. Any idea why this happens?

First up, you should try 0.7.1, because of TEZ-2663.

> Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has
> already shutdown. Application application_1444019975627_0001 failed 2
> times due to AM Container for appattempt_1444019975627_0001_000002 exited
> with exitCode: 255

Can you say what is printed in the AppMaster logs?

I've seen this occasionally happen due to a bad setup of the cluster's per-user limits (ulimits). Ambari sets those up in a file named /etc/security/limits.d/yarn.conf (template: yarn.conf.j2). Check for that file.

Otherwise, as the shuffle handler spawns threads, the container launchers will start to fail intermittently (the default is 1024 threads per user; yarn.conf raises this to 65,000 threads).

Cheers,
Gopal
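
If it helps with checking that file, here is a minimal sketch in Python that pulls the nproc (per-user process/thread) entries out of limits.conf-style text. The function name and the sample content are made up for illustration; the sample is not the actual Ambari-generated file, so compare against whatever your own /etc/security/limits.d/yarn.conf contains.

```python
def parse_nproc_limits(text):
    """Return {(domain, limit_type): value} for each nproc entry.

    Expects limits.conf-style lines: <domain> <type> <item> <value>.
    """
    limits = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a 4-field entry
        domain, limit_type, item, value = fields
        if item == "nproc":
            limits[(domain, limit_type)] = value
    return limits


# Hypothetical content, for illustration only.
SAMPLE = """\
# example entries
yarn   -   nofile   32768
yarn   -   nproc    65000
"""

if __name__ == "__main__":
    print(parse_nproc_limits(SAMPLE))
```

To run it against the real file, read /etc/security/limits.d/yarn.conf and pass its contents to parse_nproc_limits. An empty result for the yarn user would suggest the raised limit is not in place. Running `ulimit -u` as that user shows the limit actually in effect.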
