[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564112#comment-14564112 ]

Xianyin Xin commented on YARN-3558:
---

After checking the log, I think I have found the cause.

First, note that the number of pending containers tracked by the AM and by the scheduler may be inconsistent at times. For example, at time 01'' the app submits a request for 3 containers; the scheduler allocates 2 and leaves 1 pending. At time 02'' the AM updates the request with the container count still at 3, and after updating the request info it gets back the 2 allocated containers. At this point the pending count in AppSchedulingInfo is still 3, even though the real outstanding request is 1. In the next heartbeat, at 03'', the AM updates the request to 1 container. If there are many tasks, this inconsistency is corrected to some extent.

However, this is not the only cause of this JIRA. Near the end of the map tasks, at 15:10:38,606, the AM updates the request with 1 container (in fact this container had already been allocated, but had not yet been fetched by the AM), and the scheduler makes two reservations on two nodes for this container request (containers 19 and 20). Since this 1 container had already been fulfilled, the AM updates the request to 0 containers at 15:10:39,622. But during that one second, 19 and 20 were reserved.

There are two problems here:
1. The requests held by the AM and the scheduler are not consistent.
2. Reservations are made on many nodes.

We should consider whether each of these is reasonable, especially the first.
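To make the timing in the comment above easier to follow, here is a minimal Python sketch. It is a toy model and not YARN code: the class `ToyScheduler`, its methods, and the node names `nm1`/`nm2` are hypothetical, and it assumes that an AM heartbeat simply overwrites the scheduler's pending count and that every full node places its own reservation while the pending count is non-zero.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the RM-side bookkeeping (not AppSchedulingInfo)."""
    nodes: list
    pending: int = 0                 # scheduler's view of outstanding containers
    allocated_unfetched: int = 0     # allocated, but not yet picked up by the AM
    reservations: list = field(default_factory=list)

    def am_heartbeat(self, ask):
        """AM sends its current ask, then fetches containers allocated so far."""
        self.pending = ask           # assumption: the ask overwrites the pending count
        fetched = self.allocated_unfetched
        self.allocated_unfetched = 0
        return fetched

    def allocate(self, count):
        """Scheduler satisfies up to `count` pending containers."""
        granted = min(count, self.pending)
        self.pending -= granted
        self.allocated_unfetched += granted
        return granted

    def node_heartbeat(self, node, has_room):
        """A node reports in; a full node gets a reservation if anything is pending."""
        if self.pending == 0:
            return
        if has_room:
            self.allocate(1)
        else:
            self.reservations.append(node)


def stale_request_demo():
    s = ToyScheduler(nodes=["nm1", "nm2"])
    s.am_heartbeat(3)                # 01'': AM asks for 3
    s.allocate(2)                    # scheduler allocates 2, 1 stays pending
    pending_after_alloc = s.pending
    fetched = s.am_heartbeat(3)      # 02'': AM still asks for 3, then fetches 2
    stale_pending = s.pending        # scheduler now believes 3 are pending
    real_need = 3 - fetched          # the AM really needs only 1 more
    s.am_heartbeat(real_need)        # 03'': AM corrects the ask
    return pending_after_alloc, stale_pending, real_need, s.pending


def tail_reservation_demo():
    s = ToyScheduler(nodes=["nm1", "nm2"])
    s.am_heartbeat(1)
    s.allocate(1)                    # last container allocated, not yet fetched
    s.am_heartbeat(1)                # AM repeats the ask for 1 (already fulfilled)
    for node in s.nodes:             # both nodes are full during this window
        s.node_heartbeat(node, has_room=False)
    s.am_heartbeat(0)                # one heartbeat later the ask drops to 0
    return list(s.reservations)


if __name__ == "__main__":
    print(stale_request_demo())
    print(tail_reservation_demo())
```

In this model the second scenario leaves one reservation per node for a request that was already satisfied, which matches the two extra containers (19 and 20) described in the comment. Whether the real scheduler behaves exactly this way is not confirmed by the sketch.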
Additional containers getting reserved from RM in case of Fair scheduler

Key: YARN-3558
URL: https://issues.apache.org/jira/browse/YARN-3558
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler, resourcemanager
Affects Versions: 2.7.0
Environment: OS: Suse 11 Sp3
Setup: 2 RM, 2 NM
Scheduler: Fair scheduler
Reporter: Bibin A Chundatt
Attachments: Amlog.txt, rm.log

Submit PI job with 16 maps
Total containers expected: 16 maps + 1 reduce + 1 AM
Total containers reserved by RM: 21
The following containers are not used for execution:
container_1430213948957_0001_01_20
container_1430213948957_0001_01_19

RM container reservations and states
{code}
Processing container_1430213948957_0001_01_01 of type START
Processing container_1430213948957_0001_01_01 of type ACQUIRED
Processing container_1430213948957_0001_01_01 of type LAUNCHED
Processing container_1430213948957_0001_01_02 of type START
Processing container_1430213948957_0001_01_03 of type START
Processing container_1430213948957_0001_01_02 of type ACQUIRED
Processing container_1430213948957_0001_01_03 of type ACQUIRED
Processing container_1430213948957_0001_01_04 of type START
Processing container_1430213948957_0001_01_05 of type START
Processing container_1430213948957_0001_01_04 of type ACQUIRED
Processing container_1430213948957_0001_01_05 of type ACQUIRED
Processing container_1430213948957_0001_01_02 of type LAUNCHED
Processing container_1430213948957_0001_01_04 of type LAUNCHED
Processing container_1430213948957_0001_01_06 of type RESERVED
Processing container_1430213948957_0001_01_03 of type LAUNCHED
Processing container_1430213948957_0001_01_05 of type LAUNCHED
Processing container_1430213948957_0001_01_07 of type START
Processing container_1430213948957_0001_01_07 of type ACQUIRED
Processing container_1430213948957_0001_01_07 of type LAUNCHED
Processing container_1430213948957_0001_01_08 of type RESERVED
Processing container_1430213948957_0001_01_02 of type FINISHED
Processing container_1430213948957_0001_01_06 of type START
Processing container_1430213948957_0001_01_06 of type ACQUIRED
Processing container_1430213948957_0001_01_06 of type LAUNCHED
Processing container_1430213948957_0001_01_04 of type FINISHED
Processing container_1430213948957_0001_01_09 of type START
Processing container_1430213948957_0001_01_09 of type ACQUIRED
Processing container_1430213948957_0001_01_09 of type LAUNCHED
Processing container_1430213948957_0001_01_10 of type RESERVED
Processing container_1430213948957_0001_01_03 of type FINISHED
Processing container_1430213948957_0001_01_08 of type START
Processing container_1430213948957_0001_01_08 of type ACQUIRED
Processing container_1430213948957_0001_01_08 of type LAUNCHED
Processing container_1430213948957_0001_01_05 of type FINISHED
Processing container_1430213948957_0001_01_11 of type START
Processing container_1430213948957_0001_01_11 of type ACQUIRED
Processing container_1430213948957_0001_01_11 of type LAUNCHED
Processing container_1430213948957_0001_01_07 of type FINISHED
Processing container_1430213948957_0001_01_12 of type START
Processing container_1430213948957_0001_01_12 of type ACQUIRED
Processing container_1430213948957_0001_01_12 of type LAUNCHED
Processing container_1430213948957_0001_01_13 of type RESERVED
Processing container_1430213948957_0001_01_06 of type FINISHED
Processing container_1430213948957_0001_01_10 of type START
Processing container_1430213948957_0001_01_10 of type ACQUIRED
Processing container_1430213948957_0001_01_10 of type LAUNCHED
Processing container_1430213948957_0001_01_09 of type FINISHED
Processing container_1430213948957_0001_01_14 of type START
Processing container_1430213948957_0001_01_14 of type ACQUIRED
Processing container_1430213948957_0001_01_14 of type LAUNCHED
Processing container_1430213948957_0001_01_15 of type RESERVED
Processing container_1430213948957_0001_01_08 of type FINISHED
Processing container_1430213948957_0001_01_13 of type START
Processing container_1430213948957_0001_01_16 of type RESERVED
Processing container_1430213948957_0001_01_13 of type ACQUIRED
Processing container_1430213948957_0001_01_13 of type LAUNCHED
Processing container_1430213948957_0001_01_11 of
[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563007#comment-14563007 ]

Bibin A Chundatt commented on YARN-3558:

[~sunilg], as discussed offline, I tried the same scenario with only a single node. With a single node, the number of containers reserved equals the number of required containers: only 18 containers were reserved by the RM.
[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551884#comment-14551884 ]

Sunil G commented on YARN-3558:
---

Hi [~bibinchundatt],

Could you please upload the RM logs?