Joseph Francis commented on YARN-4331:

[~jlowe] Setting yarn.nodemanager.recovery.enabled=true does solve the issue 
with orphaned containers.
Note that the SIGKILL was only done locally to emulate a few production issues we 
had that caused nodemanagers to fall over.
Thanks very much for your clear explanation!
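
For anyone else hitting this: enabling NM recovery is a yarn-site.xml change. A minimal sketch, assuming a local recovery directory of your choosing (the path below is an example); per the NodeManager restart documentation, yarn.nodemanager.address should also be pinned to a fixed port so a restarted NM can reclaim its running containers:

```xml
<!-- yarn-site.xml: enable NodeManager work-preserving restart -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Local FS path where the NM persists container state; example path -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- Pin the NM RPC port so recovered containers can reconnect after restart -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```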

> Restarting NodeManager leaves orphaned containers
> -------------------------------------------------
>                 Key: YARN-4331
>                 URL: https://issues.apache.org/jira/browse/YARN-4331
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, yarn
>    Affects Versions: 2.7.1
>            Reporter: Joseph Francis
>            Priority: Critical
> We are seeing a lot of orphaned containers running in our production clusters.
> I tried to simulate this locally on my machine and can replicate the issue by 
> killing the nodemanager.
> I'm running YARN 2.7.1 with RM state stored in ZooKeeper and deploying Samza 
> jobs.
> Steps:
> {quote}1. Deploy a job 
> 2. Issue a kill -9 signal to nodemanager 
> 3. We should see the AM and its container running without nodemanager
> 4. AM should die but the container still keeps running
> 5. Restarting nodemanager brings up a new AM and container but leaves the 
> orphaned container running in the background
> {quote}
> This is effectively causing double processing of data.
