[ 
https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564112#comment-14564112
 ] 

Xianyin Xin commented on YARN-3558:
-----------------------------------

After checking the logs, I think I have found the cause.
First, note that the pending container count held by the AM and the one held 
by the scheduler may be inconsistent at times. For example, at time 01'', the 
app submits a request for 3 containers; the scheduler allocates 2, leaving 1 
pending. At time 02'', the AM updates the request, where #containers is still 
3; after updating the request info, the AM gets back the 2 allocated 
containers. But now the pending container count in AppSchedulingInfo is still 
3, even though the real outstanding request is 1. In the next heartbeat, at 
03'', the AM updates the request to 1 container. If there are many tasks, this 
inconsistency is corrected to some extent.
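
To make the timeline above concrete, here is a toy model of the three 
heartbeats. It is only a sketch under assumptions: the class and method names 
(ToyScheduler, ToyAM, node_update, simulate) are made up for illustration and 
are not YARN APIs, and it assumes that each ask carries an absolute container 
count that overwrites the scheduler's pending count, and that the AM computes 
its ask before it sees the containers returned in the same heartbeat.

```python
class ToyScheduler:
    """Hypothetical stand-in for the scheduler's per-app bookkeeping."""

    def __init__(self):
        self.pending = 0      # what the scheduler believes is still requested
        self.unfetched = 0    # allocated, but not yet handed to the AM

    def allocate(self, ask):
        # Assumption: the ask is absolute and overwrites the pending count.
        self.pending = ask
        granted, self.unfetched = self.unfetched, 0
        return granted

    def node_update(self, free_slots):
        # A node heartbeat: allocate as many pending containers as fit.
        n = min(free_slots, self.pending)
        self.pending -= n
        self.unfetched += n


class ToyAM:
    """Hypothetical stand-in for the application master."""

    def __init__(self, needed):
        self.outstanding = needed  # containers the AM still really needs

    def heartbeat(self, scheduler):
        # The ask is built before the response of this heartbeat is seen.
        ask = self.outstanding
        self.outstanding -= scheduler.allocate(ask)


def simulate():
    scheduler, am = ToyScheduler(), ToyAM(needed=3)
    trace = []

    am.heartbeat(scheduler)              # 01'': ask for 3
    scheduler.node_update(free_slots=2)  # scheduler allocates 2, 1 pending
    trace.append(("01", scheduler.pending, am.outstanding))

    am.heartbeat(scheduler)              # 02'': ask is still 3, AM fetches 2
    trace.append(("02", scheduler.pending, am.outstanding))

    am.heartbeat(scheduler)              # 03'': ask corrected to 1
    trace.append(("03", scheduler.pending, am.outstanding))
    return trace


if __name__ == "__main__":
    for time, sched_pending, am_outstanding in simulate():
        print(time, "scheduler pending:", sched_pending,
              "AM really needs:", am_outstanding)
```

In this model the snapshot taken at 02'' shows the scheduler believing 3 
containers are pending while the AM really needs 1; any node heartbeat that 
arrives before 03'' would act on the stale count.
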
However, this is not the only cause of this JIRA. Near the end of the map 
tasks, at 15:10:38,606, the AM updates the request with 1 container (in fact 
this container had already been allocated, but had not yet been fetched by the 
AM), and the scheduler makes two reservations on two nodes for this container 
request (containers 19 and 20). However, this 1 container has already been 
fulfilled, so at 15:10:39,622 the AM updates the request with 0 containers. 
But during this one second, 19 and 20 are reserved.
There are two problems here: 1, the requests held by the AM and the scheduler 
are not consistent; 2, reservations are made on many nodes. We should consider 
whether these two behaviors are reasonable, especially the first.
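
The second problem can be sketched in a few lines. This is a toy model, not 
FairScheduler code: reserve_on_nodes and the limit_to_pending flag are 
hypothetical names, and the model assumes that every node that heartbeats is 
too full to allocate and that making a reservation does not decrement the 
pending count. The flag is only one conceivable policy to compare against, not 
a proposed patch.

```python
def reserve_on_nodes(pending, nodes, limit_to_pending=False):
    """Return the nodes on which a reservation is made for one request.

    pending          -- container count the scheduler believes is outstanding
    nodes            -- node names, in the order their heartbeats arrive
    limit_to_pending -- hypothetical policy: stop once the number of
                        reservations equals the pending count
    """
    reserved = []
    for node in nodes:
        if pending <= 0:
            break  # nothing is requested, so nothing is reserved
        if limit_to_pending and len(reserved) >= pending:
            break  # already holding one reservation per pending container
        # Assumption: a reservation leaves the pending count untouched,
        # so the next node sees the same request again.
        reserved.append(node)
    return reserved


if __name__ == "__main__":
    # Stale request for 1 container, two nodes heartbeat within the second.
    print(reserve_on_nodes(1, ["NODE1", "NODE2"]))
    # Same situation with the hypothetical cap.
    print(reserve_on_nodes(1, ["NODE1", "NODE2"], limit_to_pending=True))
    # After the AM corrects the request to 0 containers.
    print(reserve_on_nodes(0, ["NODE1", "NODE2"]))
```

With a stale pending count of 1, the model reserves on both nodes, which 
matches the two unused containers 19 and 20; with a correct count of 0 it 
reserves on none, which is why the first problem looks like the more 
important one.
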

> Additional containers getting reserved from RM in case of Fair scheduler
> ------------------------------------------------------------------------
>
>                 Key: YARN-3558
>                 URL: https://issues.apache.org/jira/browse/YARN-3558
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 2.7.0
>         Environment: OS :Suse 11 Sp3
> Setup : 2 RM 2 NM
> Scheduler : Fair scheduler
>            Reporter: Bibin A Chundatt
>         Attachments: Amlog.txt, rm.log
>
>
> Submit PI job with 16 maps
> Total container expected : 16 MAPS + 1 Reduce  + 1 AM
> Total containers reserved by RM is 21
> Below set of containers are not being used for execution
> container_1430213948957_0001_01_000020
> container_1430213948957_0001_01_000019
> RM Containers reservation and states
> {code}
>  Processing container_1430213948957_0001_01_000001 of type START
>  Processing container_1430213948957_0001_01_000001 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000001 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000002 of type START
>  Processing container_1430213948957_0001_01_000003 of type START
>  Processing container_1430213948957_0001_01_000002 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000003 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000004 of type START
>  Processing container_1430213948957_0001_01_000005 of type START
>  Processing container_1430213948957_0001_01_000004 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000005 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000002 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000004 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000006 of type RESERVED
>  Processing container_1430213948957_0001_01_000003 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000005 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000007 of type START
>  Processing container_1430213948957_0001_01_000007 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000007 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000008 of type RESERVED
>  Processing container_1430213948957_0001_01_000002 of type FINISHED
>  Processing container_1430213948957_0001_01_000006 of type START
>  Processing container_1430213948957_0001_01_000006 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000006 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000004 of type FINISHED
>  Processing container_1430213948957_0001_01_000009 of type START
>  Processing container_1430213948957_0001_01_000009 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000009 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000010 of type RESERVED
>  Processing container_1430213948957_0001_01_000003 of type FINISHED
>  Processing container_1430213948957_0001_01_000008 of type START
>  Processing container_1430213948957_0001_01_000008 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000008 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000005 of type FINISHED
>  Processing container_1430213948957_0001_01_000011 of type START
>  Processing container_1430213948957_0001_01_000011 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000011 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000007 of type FINISHED
>  Processing container_1430213948957_0001_01_000012 of type START
>  Processing container_1430213948957_0001_01_000012 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000012 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000013 of type RESERVED
>  Processing container_1430213948957_0001_01_000006 of type FINISHED
>  Processing container_1430213948957_0001_01_000010 of type START
>  Processing container_1430213948957_0001_01_000010 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000010 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000009 of type FINISHED
>  Processing container_1430213948957_0001_01_000014 of type START
>  Processing container_1430213948957_0001_01_000014 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000014 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000015 of type RESERVED
>  Processing container_1430213948957_0001_01_000008 of type FINISHED
>  Processing container_1430213948957_0001_01_000013 of type START
>  Processing container_1430213948957_0001_01_000016 of type RESERVED
>  Processing container_1430213948957_0001_01_000013 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000013 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000011 of type FINISHED
>  Processing container_1430213948957_0001_01_000016 of type START
>  Processing container_1430213948957_0001_01_000010 of type FINISHED
>  Processing container_1430213948957_0001_01_000015 of type START
>  Processing container_1430213948957_0001_01_000016 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000015 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000016 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000012 of type FINISHED
>  Processing container_1430213948957_0001_01_000017 of type START
>  Processing container_1430213948957_0001_01_000015 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000014 of type FINISHED
>  Processing container_1430213948957_0001_01_000018 of type START
>  Processing container_1430213948957_0001_01_000017 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000018 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000017 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000019 of type RESERVED
>  Processing container_1430213948957_0001_01_000020 of type RESERVED
>  Processing container_1430213948957_0001_01_000018 of type RELEASED
>  Processing container_1430213948957_0001_01_000015 of type FINISHED
>  Processing container_1430213948957_0001_01_000013 of type FINISHED
>  Processing container_1430213948957_0001_01_000016 of type FINISHED
>  Processing container_1430213948957_0001_01_000017 of type FINISHED
>  Processing container_1430213948957_0001_01_000021 of type START
>  Processing container_1430213948957_0001_01_000021 of type ACQUIRED
>  Processing container_1430213948957_0001_01_000021 of type LAUNCHED
>  Processing container_1430213948957_0001_01_000021 of type FINISHED
>  Processing container_1430213948957_0001_01_000001 of type FINISHED
> {code}
> AM assignment to Nodes
> {code}
> ---------- Find in open document(s) ----------
> Assigning container container_1430213948957_0001_01_000002 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000003 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000004 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000005 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000007 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000006 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000009 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000008 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000011 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000012 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000010 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000014 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000013 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000016 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000015 with priority 20 
> to NM <NODE1>:64318
> Assigning container container_1430213948957_0001_01_000017 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000018 with priority 20 
> to NM <NODE2>:64318
> Assigning container container_1430213948957_0001_01_000021 with priority 10 
> to NM <NODE2>:64318
> 18 occurrences have been found.
> Output completed (0 sec consumed)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
