[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044746#comment-14044746 ] Hudson commented on YARN-2171: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1813 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1813/]) YARN-2171. Improved CapacityScheduling to not lock on nodemanager-count when AMs heartbeat in. Contributed by Jason Lowe. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605616) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
> AMs block on the CapacityScheduler lock during allocate()
> ---------------------------------------------------------
>
> Key: YARN-2171
> URL: https://issues.apache.org/jira/browse/YARN-2171
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Fix For: 2.5.0
>
> Attachments: YARN-2171.patch, YARN-2171v2.patch
>
> When AMs heartbeat into the RM via the allocate() call they are blocking on
> the CapacityScheduler lock when trying to get the number of nodes in the
> cluster via getNumClusterNodes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044702#comment-14044702 ] Hudson commented on YARN-2171: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1786 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1786/]) YARN-2171. Improved CapacityScheduling to not lock on nodemanager-count when AMs heartbeat in. Contributed by Jason Lowe. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605616) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044565#comment-14044565 ] Hudson commented on YARN-2171: -- FAILURE: Integrated in Hadoop-Yarn-trunk #595 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/595/]) YARN-2171. Improved CapacityScheduling to not lock on nodemanager-count when AMs heartbeat in. Contributed by Jason Lowe. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605616) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044110#comment-14044110 ] Hudson commented on YARN-2171: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5780 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5780/]) YARN-2171. Improved CapacityScheduling to not lock on nodemanager-count when AMs heartbeat in. Contributed by Jason Lowe. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605616) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044100#comment-14044100 ] Vinod Kumar Vavilapalli commented on YARN-2171: --- +1, looks good. Checking this in..
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034456#comment-14034456 ] Hadoop QA commented on YARN-2171: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650880/YARN-2171v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4016//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4016//console This message is automatically generated. 
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034060#comment-14034060 ] Vinod Kumar Vavilapalli commented on YARN-2171: --- The code changes look fine enough to me. The test is not so useful beyond validating this ticket, but that's okay. I see that we don't have any test validating the number of nodes itself explicitly, shall we add that here?
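A test that validates the node count explicitly, as suggested in the comment above, could look roughly like the following sketch. It is plain Java with a small check helper rather than the project's actual TestCapacityScheduler, and NodeRegistry, addNode, removeNode and the node-ID strings are hypothetical names; only getNumClusterNodes comes from this discussion. The assumed design mutates node membership under the object's monitor lock and mirrors the size in an AtomicInteger, so the count itself can be read without taking the lock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

final class NodeCountCheck {

    // Hypothetical node registry (not the real CapacityScheduler): membership changes
    // happen under the monitor lock, and a lock-free counter mirrors the set's size.
    static final class NodeRegistry {
        private final Set<String> nodes = new HashSet<>();   // guarded by "this"
        private final AtomicInteger numNodes = new AtomicInteger(0);

        synchronized void addNode(String nodeId) {
            if (nodes.add(nodeId)) {          // only count nodes not already registered
                numNodes.incrementAndGet();
            }
        }

        synchronized void removeNode(String nodeId) {
            if (nodes.remove(nodeId)) {       // ignore unknown or already-removed nodes
                numNodes.decrementAndGet();
            }
        }

        // Read path used by heartbeating callers; never touches the monitor lock.
        int getNumClusterNodes() {
            return numNodes.get();
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        NodeRegistry registry = new NodeRegistry();
        check(registry.getNumClusterNodes() == 0, "empty cluster");

        registry.addNode("host1:1234");
        registry.addNode("host2:1234");
        check(registry.getNumClusterNodes() == 2, "two nodes added");

        registry.addNode("host2:1234");
        check(registry.getNumClusterNodes() == 2, "duplicate add is not double-counted");

        registry.removeNode("host1:1234");
        check(registry.getNumClusterNodes() == 1, "one node removed");

        registry.removeNode("unknown:1234");
        check(registry.getNumClusterNodes() == 1, "removing an unknown node is a no-op");

        System.out.println("node count checks passed");
    }
}
```

The duplicate-add and unknown-remove cases are included because a separately maintained counter can drift from the real membership if it is incremented or decremented unconditionally.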
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034034#comment-14034034 ] Hadoop QA commented on YARN-2171: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650819/YARN-2171.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4014//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4014//console This message is automatically generated. 
[jira] [Commented] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033864#comment-14033864 ] Jason Lowe commented on YARN-2171: -- When the CapacityScheduler scheduler thread is running full-time due to a constant stream of events (e.g., a large number of running applications with a large number of cluster nodes), the CapacityScheduler lock is held by that scheduler loop most of the time. As AMs heartbeat into the RM to get their resources, the CapacityScheduler code goes out of its way to avoid having the AMs grab the scheduler lock. Unfortunately, the getNumClusterNodes call, which only fetches a single integer value, was missed. As a result, the AMs pile up on the scheduler lock, filling all of the IPC handlers of the ApplicationMasterService, while the remaining AM calls back up on the call queue. Once the scheduler releases the lock, it quickly tries to grab it again, so only a few AMs get through the "gate" before the IPC handlers fill again with the next batch of AMs blocking on the scheduler lock. This causes the average RPC response times for AMs to skyrocket. AMs experience large delays getting their allocations, which in turn leads to lower cluster utilization and longer application runtimes.
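The blocking pattern described in the comment above can be reproduced in miniature. This is a sketch under stated assumptions, not the committed patch: Scheduler is a hypothetical stand-in whose monitor lock plays the role of the CapacityScheduler lock, one thread holds that lock the way a busy scheduler loop would, and a second thread plays an AM's allocate() call asking for the node count. A synchronized getter stalls until the lock is released, while a getter backed by an AtomicInteger returns immediately, which illustrates the general idea behind the commit's "not lock on nodemanager-count".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class SchedulerLockSketch {

    // Hypothetical scheduler stand-in; its monitor lock models the scheduler lock.
    static final class Scheduler {
        private int lockedNodeCount = 0;                         // guarded by "this"
        private final AtomicInteger numNodeManagers = new AtomicInteger(0);

        synchronized void addNode() {
            lockedNodeCount++;
            numNodeManagers.incrementAndGet();                   // writers stay serialized
        }

        // Problem pattern: callers must acquire the scheduler lock just to read one int.
        synchronized int getNumClusterNodesLocked() {
            return lockedNodeCount;
        }

        // Fix pattern: callers read the counter without touching the scheduler lock.
        int getNumClusterNodes() {
            return numNodeManagers.get();
        }
    }

    // True if the reader finishes within 200 ms while another thread holds the lock.
    static boolean completesWhileLockHeld(Scheduler s, boolean lockFree) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Simulates the scheduler loop holding the lock for a long stretch.
            pool.submit(() -> {
                synchronized (s) {
                    lockHeld.countDown();
                    release.await();
                }
                return null;
            });
            lockHeld.await();
            // Simulates an AM heartbeat asking for the number of cluster nodes.
            Future<Integer> reader = pool.submit(() ->
                    lockFree ? s.getNumClusterNodes() : s.getNumClusterNodesLocked());
            try {
                reader.get(200, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            } finally {
                release.countDown();   // let the "scheduler" go so a blocked reader can finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Scheduler s = new Scheduler();
        s.addNode();
        s.addNode();
        System.out.println("node count: " + s.getNumClusterNodes());
        System.out.println("locked getter completes while lock held:    "
                + completesWhileLockHeld(s, false));
        System.out.println("lock-free getter completes while lock held: "
                + completesWhileLockHeld(s, true));
    }
}
```

Running main should report that the locked getter does not finish inside the 200 ms window while the lock-free one does. In this sketch the counter is still updated inside the synchronized writer, so only readers skip the lock; a reader may see a value that lags a concurrent add by a moment, which is acceptable when callers only need a recent count.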