Yuliya Feldman commented on YARN-3803:

This situation is easily reproducible by running any M/R job as a user with
id < 500 on a cluster with a single NM that uses LinuxContainerExecutor.

So far, the only solution I have found is to proceed with localization in
DuplicateFetchResourceTransition if ref == 0.
This solution does not look very clean in terms of the state transitions,
but otherwise there is no evidence that the previous container localization
failed.
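
The reasoning behind the ref == 0 check can be illustrated with a toy model.
This is a minimal sketch, not Hadoop code. The class, enum and method names
below are all hypothetical. It also assumes that a failed first attempt leaves
the resource in DOWNLOADING with its reference count back at 0, which is the
situation the proposed fix is aimed at. Without the check, the second request
is treated as a duplicate and waits on a download that nobody is performing.
With the check, ref == 0 is taken as the only available signal that the
earlier attempt was abandoned, and a new download is started.

```java
// Hypothetical, heavily simplified model of per-resource localization state.
// None of these names are Hadoop APIs.
public class LocalizationSketch {

    enum State { INIT, DOWNLOADING }

    static final class Resource {
        State state = State.INIT;
        int ref = 0;             // containers currently waiting on this resource
        int fetchesStarted = 0;  // downloads actually handed to a localizer

        // A container requests the resource.
        void request(boolean proposedFix) {
            if (state == State.INIT) {
                state = State.DOWNLOADING;
                fetchesStarted++;
            } else if (state == State.DOWNLOADING && proposedFix && ref == 0) {
                // Duplicate-fetch case with nobody else waiting: assume the
                // earlier download was abandoned and start a new one.
                fetchesStarted++;
            }
            // Without the fix, a duplicate request only joins the waiters.
            ref++;
        }

        // The requesting container goes away, e.g. because its localizer failed.
        // ASSUMPTION: the resource is left in DOWNLOADING rather than reset.
        void release() {
            if (ref > 0) {
                ref--;
            }
        }
    }

    // Two localization attempts on the same NM, where the first one fails.
    // Returns how many downloads were started in total.
    static int simulate(boolean proposedFix) {
        Resource r = new Resource();
        r.request(proposedFix);  // attempt 1 starts a download
        r.release();             // attempt 1 fails: ref is 0, state stays DOWNLOADING
        r.request(proposedFix);  // attempt 2 arrives
        return r.fetchesStarted;
    }

    public static void main(String[] args) {
        System.out.println("fetches without fix: " + simulate(false)); // prints 1
        System.out.println("fetches with fix:    " + simulate(true));  // prints 2
    }
}
```

In the "without fix" run only one download is ever started, so the second
attempt has nothing to wait for, which matches the hang described in the issue.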

I would appreciate any comments or thoughts on this.

> Application hangs after more than one localization attempt fails on the same
> NM
> -------------------------------------------------------------------------------
>                 Key: YARN-3803
>                 URL: https://issues.apache.org/jira/browse/YARN-3803
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0, 2.5.1
>            Reporter: Yuliya Feldman
>            Assignee: Yuliya Feldman
>            Priority: Minor
> In a sandbox (single-node) environment with LinuxContainerExecutor, when the
> first application localization attempt fails, the second attempt cannot
> proceed, and the application then hangs until the RM kills it as
> non-responsive.

This message was sent by Atlassian JIRA