[ 
https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113318#comment-16113318
 ] 

Arun Suresh commented on YARN-6920:
-----------------------------------

Without this patch, the test case times out (the test case timeout is 200s), 
and you should see the following in the logs:
{noformat}
....
2017-08-03 11:58:04,094 INFO  container.ContainerImpl 
(ContainerImpl.java:transition(1382)) - Relaunching Container 
[container_1501786677410_0001_01_000002] for re-initialization !!
2017-08-03 11:58:04,094 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1691)) - Container 
container_1501786677410_0001_01_000002 transitioned from REINITIALIZING to 
SCHEDULED
2017-08-03 11:58:04,094 WARN  scheduler.ContainerScheduler 
(ContainerScheduler.java:pickOpportunisticContainersToKill(384)) - There are no 
sufficient resources to start guaranteed 
[container_1501786677410_0001_01_000002]at the moment. Opportunistic containers 
are in the process ofbeing killed to make room.
....
{noformat}

With the patch, if the test still fails for you, it is probably due to some 
other assertion failure, not a timeout, and you should not see the above call 
to {{pickOpportunisticContainersToKill()}} in the logs.

Reason:
During container re-initialization, the container process is killed and 
re-launched. This transfers control back to the ContainerScheduler, which, 
after YARN-6706, always checks whether resources are available to launch the 
container, irrespective of whether queuing is turned on or off. Unfortunately, 
when the container was killed for re-initialization, we neglected to subtract 
(reclaim) the container's resources from the utilization tracker, so the 
aforementioned check fails on re-launch. This patch makes sure the resources 
are reclaimed.
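
To make the accounting problem concrete, here is a minimal sketch of the 
behaviour described above. It is not the NodeManager code: the class 
{{SchedulerSketch}}, its methods, and the memory figures are all hypothetical, 
and the real ContainerScheduler queues the container rather than returning 
false.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the resource check; not the real
// ContainerScheduler or its utilization tracker.
class SchedulerSketch {
    private final int capacityMb;
    private int usedMb = 0;
    private final Map<String, Integer> running = new HashMap<>();

    SchedulerSketch(int capacityMb) {
        this.capacityMb = capacityMb;
    }

    // Launch only if the tracked utilization leaves enough room.
    boolean tryLaunch(String containerId, int memoryMb) {
        if (usedMb + memoryMb > capacityMb) {
            return false;
        }
        usedMb += memoryMb;
        running.put(containerId, memoryMb);
        return true;
    }

    // Kill the process for re-initialization. The reclaim flag models the
    // patched (true) versus unpatched (false) behaviour.
    void killForReinit(String containerId, boolean reclaim) {
        Integer memoryMb = running.remove(containerId);
        if (memoryMb != null && reclaim) {
            usedMb -= memoryMb;
        }
    }

    int usedMb() {
        return usedMb;
    }

    public static void main(String[] args) {
        for (boolean reclaim : new boolean[] {false, true}) {
            SchedulerSketch scheduler = new SchedulerSketch(4096);
            scheduler.tryLaunch("container_02", 3072);
            scheduler.killForReinit("container_02", reclaim);
            boolean relaunched = scheduler.tryLaunch("container_02", 3072);
            System.out.println("reclaim=" + reclaim
                + " relaunched=" + relaunched);
        }
    }
}
```

Without the reclaim step the tracker still counts the killed container's 
memory, so the re-launch is charged twice and the check fails even though the 
node is otherwise idle.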


> Fix TestNMClient failure due to YARN-6706
> -----------------------------------------
>
>                 Key: YARN-6920
>                 URL: https://issues.apache.org/jira/browse/YARN-6920
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-6920.001.patch, YARN-6920.002.patch, 
> YARN-6920.003.patch, YARN-6920.004.patch
>
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA 
> to track the fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
