[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler

2015-05-28 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564112#comment-14564112
 ] 

Xianyin Xin commented on YARN-3558:
---

After checking the log, I think I have found the cause. 
First, note that the pending container request held by the AM and the one held 
by the scheduler may be inconsistent at times. For example, at time 01'', the 
app submits a request for 3 containers; the scheduler allocates 2 and leaves 1 
pending. At time 02'', the AM updates the request with #containers still at 3, 
and only after that update does it get back the 2 allocated containers. At this 
point AppSchedulingInfo still shows 3 pending containers, even though the real 
outstanding request is 1. In the next heartbeat, at 03'', the AM updates the 
request to 1 container. If there are many tasks, this inconsistency is 
corrected to some extent.
However, this is not the only cause of this JIRA. Near the end of the map 
tasks, at 15:10:38,606, the AM updates the request with 1 container (in fact 
this container had already been allocated, but had not yet been fetched by the 
AM), and the scheduler makes two reservations on two nodes for this container 
request (containers 19 and 20). Since this 1 container has already been 
fulfilled, at 15:10:39,622 the AM updates the request to 0 containers. But 
during that one second, 19 and 20 were reserved.
There are two problems here: 1, the requests held by the AM and the scheduler 
are not consistent; 2, reservations are made on many nodes. We should consider 
whether these two behaviors are reasonable, especially the first.
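
To make the sequence above concrete, here is a toy model of the race. It is only a sketch and not FairScheduler code: the class ToyScheduler, its methods, the node names nm1/nm2 and the slot counts are all hypothetical, and it assumes that a reservation does not reduce the pending count, which is how one stale ask can end up reserved on more than one node.

```python
# Toy model of the stale-ask race; NOT actual YARN code.
# Every name here (ToyScheduler, nm1, nm2, ...) is hypothetical.

class ToyScheduler:
    def __init__(self, free_slots_per_node):
        self.free = dict(free_slots_per_node)  # node -> free container slots
        self.pending = 0                       # scheduler's view of the ask
        self.allocated = []                    # allocated, not yet fetched by AM
        self.reserved = []                     # nodes holding a reservation

    def update_request(self, num_containers):
        # The AM's ask simply overwrites the scheduler's pending count.
        self.pending = num_containers

    def node_heartbeat(self, node):
        # On a node heartbeat, try to satisfy one pending container.
        if self.pending <= 0:
            return
        if self.free[node] > 0:
            self.free[node] -= 1
            self.pending -= 1
            self.allocated.append(node)
        elif node not in self.reserved:
            # Node is full: reserve on it. Assumption of this model:
            # pending is not decremented by a reservation.
            self.reserved.append(node)

    def fetch_allocated(self):
        got, self.allocated = self.allocated, []
        return got


def run_scenario():
    sched = ToyScheduler({"nm1": 1, "nm2": 0})
    sched.update_request(1)            # AM asks for the last map container
    sched.node_heartbeat("nm1")        # scheduler allocates it on nm1
    sched.update_request(1)            # AM repeats the ask before fetching
    fetched = sched.fetch_allocated()  # AM now receives the container
    stale_pending = sched.pending      # scheduler still believes 1 is needed
    sched.node_heartbeat("nm1")        # nm1 is full -> reservation
    sched.node_heartbeat("nm2")        # nm2 is full -> second reservation
    sched.update_request(0)            # next AM heartbeat corrects the ask
    return {
        "fetched": fetched,
        "pending_after_stale_ask": stale_pending,
        "reserved": list(sched.reserved),
        "pending_final": sched.pending,
    }


if __name__ == "__main__":
    print(run_scenario())
```

In this model the two reservations are left behind even though the final ask is 0, which matches the shape of what is described for containers 19 and 20.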

 Additional containers getting reserved from RM in case of Fair scheduler
 

 Key: YARN-3558
 URL: https://issues.apache.org/jira/browse/YARN-3558
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.7.0
 Environment: OS :Suse 11 Sp3
 Setup : 2 RM 2 NM
 Scheduler : Fair scheduler
Reporter: Bibin A Chundatt
 Attachments: Amlog.txt, rm.log


 Submit PI job with 16 maps
 Total container expected : 16 MAPS + 1 Reduce  + 1 AM
 Total containers reserved by RM is 21
 Below set of containers are not being used for execution
 container_1430213948957_0001_01_20
 container_1430213948957_0001_01_19
 RM Containers reservation and states
 {code}
  Processing container_1430213948957_0001_01_01 of type START
  Processing container_1430213948957_0001_01_01 of type ACQUIRED
  Processing container_1430213948957_0001_01_01 of type LAUNCHED
  Processing container_1430213948957_0001_01_02 of type START
  Processing container_1430213948957_0001_01_03 of type START
  Processing container_1430213948957_0001_01_02 of type ACQUIRED
  Processing container_1430213948957_0001_01_03 of type ACQUIRED
  Processing container_1430213948957_0001_01_04 of type START
  Processing container_1430213948957_0001_01_05 of type START
  Processing container_1430213948957_0001_01_04 of type ACQUIRED
  Processing container_1430213948957_0001_01_05 of type ACQUIRED
  Processing container_1430213948957_0001_01_02 of type LAUNCHED
  Processing container_1430213948957_0001_01_04 of type LAUNCHED
  Processing container_1430213948957_0001_01_06 of type RESERVED
  Processing container_1430213948957_0001_01_03 of type LAUNCHED
  Processing container_1430213948957_0001_01_05 of type LAUNCHED
  Processing container_1430213948957_0001_01_07 of type START
  Processing container_1430213948957_0001_01_07 of type ACQUIRED
  Processing container_1430213948957_0001_01_07 of type LAUNCHED
  Processing container_1430213948957_0001_01_08 of type RESERVED
  Processing container_1430213948957_0001_01_02 of type FINISHED
  Processing container_1430213948957_0001_01_06 of type START
  Processing container_1430213948957_0001_01_06 of type ACQUIRED
  Processing container_1430213948957_0001_01_06 of type LAUNCHED
  Processing container_1430213948957_0001_01_04 of type FINISHED
  Processing container_1430213948957_0001_01_09 of type START
  Processing container_1430213948957_0001_01_09 of type ACQUIRED
  Processing container_1430213948957_0001_01_09 of type LAUNCHED
  Processing container_1430213948957_0001_01_10 of type RESERVED
  Processing container_1430213948957_0001_01_03 of type FINISHED
  Processing container_1430213948957_0001_01_08 of type START
  Processing container_1430213948957_0001_01_08 of type ACQUIRED
  Processing container_1430213948957_0001_01_08 of type LAUNCHED
  Processing container_1430213948957_0001_01_05 of type FINISHED
  Processing container_1430213948957_0001_01_11 of type START
  Processing container_1430213948957_0001_01_11 of type ACQUIRED
  Processing container_1430213948957_0001_01_11 of type LAUNCHED
  Processing 

[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler

2015-05-28 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563007#comment-14563007
 ] 

Bibin A Chundatt commented on YARN-3558:


[~sunilg], as discussed offline, I tried the same scenario with only a 
single node.
With a single node, the number of containers reserved equals the number of 
required containers.
Only 18 containers were reserved by the RM. 
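
The counts quoted in this thread (18 expected, 21 seen with two nodes) could be checked against rm.log with a small tally script. The following is only a sketch: it assumes the log contains lines in the "Processing <container id> of type <EVENT>" form shown in the issue description, and the function names are made up for illustration. The sample lines are copied from that description.

```python
import re
from collections import defaultdict

# Matches lines such as
# "Processing container_1430213948957_0001_01_06 of type RESERVED".
# search() is used so that a timestamp or logger prefix does not matter.
EVENT_RE = re.compile(r"Processing (container_\S+) of type (\w+)")

# From the issue description: 16 maps + 1 reduce + 1 AM.
EXPECTED_CONTAINERS = 16 + 1 + 1


def tally_events(lines):
    """Map each container id to the ordered list of event types seen."""
    events = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events[match.group(1)].append(match.group(2))
    return dict(events)


def never_started(events):
    """Containers that were RESERVED but never reached START."""
    return sorted(
        cid for cid, seen in events.items()
        if "RESERVED" in seen and "START" not in seen
    )


# A few lines copied from the issue description, in their original order.
SAMPLE = [
    "Processing container_1430213948957_0001_01_06 of type RESERVED",
    "Processing container_1430213948957_0001_01_07 of type START",
    "Processing container_1430213948957_0001_01_07 of type ACQUIRED",
    "Processing container_1430213948957_0001_01_07 of type LAUNCHED",
    "Processing container_1430213948957_0001_01_08 of type RESERVED",
    "Processing container_1430213948957_0001_01_06 of type START",
]

if __name__ == "__main__":
    events = tally_events(SAMPLE)
    print("containers seen:", len(events), "expected:", EXPECTED_CONTAINERS)
    print("reserved but never started:", never_started(events))
```

On this short excerpt container 08 shows up as never started only because its later START line is not included; run over the complete log, the same check would be expected to single out the containers that stay reserved, such as 19 and 20 in the two-node run.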

[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler

2015-05-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551884#comment-14551884
 ] 

Sunil G commented on YARN-3558:
---

Hi [~bibinchundatt],
Could you please upload the RM logs?

 Additional containers getting reserved from RM in case of Fair scheduler
 

 Key: YARN-3558
 URL: https://issues.apache.org/jira/browse/YARN-3558
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.7.0
 Environment: OS :Suse 11 Sp3
 Setup : 2 RM 2 NM
 Scheduler : Fair scheduler
Reporter: Bibin A Chundatt

 Submit PI job with 16 maps
 Total container expected : 16 MAPS + 1 Reduce  + 1 AM
 Total containers reserved by RM is 21
 Below set of containers are not being used for execution
 container_1430213948957_0001_01_20
 container_1430213948957_0001_01_19
 RM Containers reservation and states
 {code}
  Processing container_1430213948957_0001_01_01 of type START
  Processing container_1430213948957_0001_01_01 of type ACQUIRED
  Processing container_1430213948957_0001_01_01 of type LAUNCHED
  Processing container_1430213948957_0001_01_02 of type START
  Processing container_1430213948957_0001_01_03 of type START
  Processing container_1430213948957_0001_01_02 of type ACQUIRED
  Processing container_1430213948957_0001_01_03 of type ACQUIRED
  Processing container_1430213948957_0001_01_04 of type START
  Processing container_1430213948957_0001_01_05 of type START
  Processing container_1430213948957_0001_01_04 of type ACQUIRED
  Processing container_1430213948957_0001_01_05 of type ACQUIRED
  Processing container_1430213948957_0001_01_02 of type LAUNCHED
  Processing container_1430213948957_0001_01_04 of type LAUNCHED
  Processing container_1430213948957_0001_01_06 of type RESERVED
  Processing container_1430213948957_0001_01_03 of type LAUNCHED
  Processing container_1430213948957_0001_01_05 of type LAUNCHED
  Processing container_1430213948957_0001_01_07 of type START
  Processing container_1430213948957_0001_01_07 of type ACQUIRED
  Processing container_1430213948957_0001_01_07 of type LAUNCHED
  Processing container_1430213948957_0001_01_08 of type RESERVED
  Processing container_1430213948957_0001_01_02 of type FINISHED
  Processing container_1430213948957_0001_01_06 of type START
  Processing container_1430213948957_0001_01_06 of type ACQUIRED
  Processing container_1430213948957_0001_01_06 of type LAUNCHED
  Processing container_1430213948957_0001_01_04 of type FINISHED
  Processing container_1430213948957_0001_01_09 of type START
  Processing container_1430213948957_0001_01_09 of type ACQUIRED
  Processing container_1430213948957_0001_01_09 of type LAUNCHED
  Processing container_1430213948957_0001_01_10 of type RESERVED
  Processing container_1430213948957_0001_01_03 of type FINISHED
  Processing container_1430213948957_0001_01_08 of type START
  Processing container_1430213948957_0001_01_08 of type ACQUIRED
  Processing container_1430213948957_0001_01_08 of type LAUNCHED
  Processing container_1430213948957_0001_01_05 of type FINISHED
  Processing container_1430213948957_0001_01_11 of type START
  Processing container_1430213948957_0001_01_11 of type ACQUIRED
  Processing container_1430213948957_0001_01_11 of type LAUNCHED
  Processing container_1430213948957_0001_01_07 of type FINISHED
  Processing container_1430213948957_0001_01_12 of type START
  Processing container_1430213948957_0001_01_12 of type ACQUIRED
  Processing container_1430213948957_0001_01_12 of type LAUNCHED
  Processing container_1430213948957_0001_01_13 of type RESERVED
  Processing container_1430213948957_0001_01_06 of type FINISHED
  Processing container_1430213948957_0001_01_10 of type START
  Processing container_1430213948957_0001_01_10 of type ACQUIRED
  Processing container_1430213948957_0001_01_10 of type LAUNCHED
  Processing container_1430213948957_0001_01_09 of type FINISHED
  Processing container_1430213948957_0001_01_14 of type START
  Processing container_1430213948957_0001_01_14 of type ACQUIRED
  Processing container_1430213948957_0001_01_14 of type LAUNCHED
  Processing container_1430213948957_0001_01_15 of type RESERVED
  Processing container_1430213948957_0001_01_08 of type FINISHED
  Processing container_1430213948957_0001_01_13 of type START
  Processing container_1430213948957_0001_01_16 of type RESERVED
  Processing container_1430213948957_0001_01_13 of type ACQUIRED
  Processing container_1430213948957_0001_01_13 of type LAUNCHED
  Processing container_1430213948957_0001_01_11 of