Re: Samza job killed by left orphaned on YARN

2016-05-19 Thread Yi Pan
Hi, David and all, The "ultimate" solution is probably to implement SAMZA-871, which would allow the Samza JobCoordinator to directly identify whether a container is alive or not, without depending on the cluster management system. This is also considered
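To illustrate the idea of a coordinator tracking container liveness itself, here is a minimal heartbeat sketch. It is not the SAMZA-871 design or any Samza API: the class `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a coordinator-side table of last-seen heartbeats.
// None of these names come from Samza; they are assumptions for illustration.
public class HeartbeatMonitor {
    private final long timeoutMs;
    private final Map<String, Long> lastSeenMs = new ConcurrentHashMap<>();

    public HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a container reports in. The caller supplies the clock value.
    public void recordHeartbeat(String containerId, long nowMs) {
        lastSeenMs.put(containerId, nowMs);
    }

    // A container counts as alive only if it has reported within the timeout window.
    public boolean isAlive(String containerId, long nowMs) {
        Long last = lastSeenMs.get(containerId);
        return last != null && nowMs - last <= timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(1000);
        monitor.recordHeartbeat("container-1", 0);
        System.out.println(monitor.isAlive("container-1", 500));
        System.out.println(monitor.isAlive("container-1", 1501));
    }
}
```

With a table like this, the coordinator would not need to ask YARN whether a container still exists; a container that stops reporting is simply treated as dead once the timeout passes.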

Re: Samza job killed by left orphaned on YARN

2016-05-19 Thread David Yu
Just stumbled upon this post, and it seems to be the same issue: https://issues.apache.org/jira/browse/SAMZA-498 We followed the fix to create a wrapper kill script and everything works. Do we have a plan to fix this in the next version of Samza? Thanks, David On Wed, May 18, 2016 at 11:53 AM,

Re: Samza job killed by left orphaned on YARN

2016-05-18 Thread Jacob Maes
Hmm, could there be something in your job holding up the container shutdown process? Perhaps something ignoring SIGTERM/Thread.interrupt, by chance? Also, I think there's a YARN property specifying the amount of time the NM waits between sending a SIGTERM and a SIGKILL, though I can't find it at
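As an illustration of the "ignoring Thread.interrupt" case, here is a minimal sketch of a worker loop that does honor interruption. The class name `InterruptibleWorker` and the sleep-based stand-in for work are assumptions for the example, not code from the job discussed in this thread.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical worker showing interrupt-aware shutdown; not taken from any Samza job.
public class InterruptibleWorker implements Runnable {
    private volatile boolean finishedCleanly = false;

    @Override
    public void run() {
        try {
            // Check the interrupt flag on every iteration so a shutdown request is noticed.
            while (!Thread.currentThread().isInterrupted()) {
                // Stand-in for real work; sleep throws InterruptedException if interrupted.
                TimeUnit.MILLISECONDS.sleep(50);
            }
        } catch (InterruptedException e) {
            // Restore the flag instead of swallowing it, so callers can still see it.
            Thread.currentThread().interrupt();
        } finally {
            finishedCleanly = true;
        }
    }

    public boolean finishedCleanly() {
        return finishedCleanly;
    }

    public static void main(String[] args) throws InterruptedException {
        InterruptibleWorker worker = new InterruptibleWorker();
        Thread thread = new Thread(worker, "worker");
        thread.start();
        thread.interrupt();   // simulate the shutdown request
        thread.join(2000);    // wait up to two seconds for the worker to exit
        System.out.println("worker alive after interrupt: " + thread.isAlive());
    }
}
```

A loop that instead catches `InterruptedException` inside the `while` body and carries on would never exit, which is the kind of behavior that can hold up container shutdown. For the SIGTERM-to-SIGKILL delay, I believe the property is `yarn.nodemanager.sleep-delay-before-sigkill.ms`, but it is worth checking against the yarn-default.xml for your Hadoop version.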

Re: Samza job killed by left orphaned on YARN

2016-05-18 Thread David Yu
From the NM log, I'm seeing:
2016-05-18 06:29:06,248 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e01_1463512986427_0007_01_02
2016-05-18 06:29:06,265 INFO