[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Priority: Major (was: Critical)

> AMs block on the CapacityScheduler lock during allocate()
> ---------------------------------------------------------
>
>                 Key: YARN-2171
>                 URL: https://issues.apache.org/jira/browse/YARN-2171
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-2171.patch, YARN-2171v2.patch
>
>
> When AMs heartbeat into the RM via the allocate() call they are blocking on
> the CapacityScheduler lock when trying to get the number of nodes in the
> cluster via getNumClusterNodes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Priority: Critical (was: Major)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Target Version/s: 2.5.0 (was: 0.23.11, 2.5.0)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-2171:
    Attachment: YARN-2171v2.patch

The point of the unit test was to catch regressions at a high level. If anyone
changes the code such that calling allocate() will grab the scheduler lock then
the test will fail, whether that's a regression in this particular method or
some new method that's added that ApplicationMasterService or CapacityScheduler
itself calls and grabs the lock. I added a separate unit test to exercise the
getNumClusterNodes method.

The AHS unit test failure seems unrelated, and it passes for me locally even
with this change.
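The kind of high-level regression check described in this comment might look roughly like the sketch below: one thread holds the scheduler's monitor while another thread performs the read, and the check fails if the read does not return within a timeout. This is not the test from YARN-2171v2.patch; `LockFreeReadCheck`, `FakeScheduler`, and `readsWithoutLock` are hypothetical names, and the real test exercises allocate() against the actual CapacityScheduler.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class LockFreeReadCheck {
    /** Hypothetical stand-in for the scheduler; not the real CapacityScheduler. */
    static class FakeScheduler {
        private final AtomicInteger numNodes = new AtomicInteger(3);

        // Not synchronized, so callers never need the scheduler monitor.
        int getNumClusterNodes() {
            return numNodes.get();
        }
    }

    /** Returns true if the read completes while another thread holds the monitor. */
    static boolean readsWithoutLock(FakeScheduler scheduler, long timeoutMillis) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Holder thread: grabs the scheduler monitor and keeps it until released.
        Thread holder = new Thread(() -> {
            synchronized (scheduler) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();

        ExecutorService caller = Executors.newSingleThreadExecutor();
        try {
            lockHeld.await(); // make sure the monitor is really held first
            Callable<Integer> read = scheduler::getNumClusterNodes;
            caller.submit(read).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;  // the read did not need the monitor
        } catch (TimeoutException e) {
            return false; // the read blocked on the monitor
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the holder thread exit
            caller.shutdown();
        }
    }
}
```

Because the check only observes whether the call blocks, it should also catch a lock acquisition introduced anywhere along the call path, which matches the intent stated above.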
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-2171:
    Attachment: YARN-2171.patch

Patch to use AtomicInteger for the number of nodes so we can avoid grabbing the
lock to access the value. I also added a unit test to verify allocate doesn't
try to grab the capacity scheduler lock.
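A minimal sketch of the approach named in this comment, assuming a simplified scheduler: the node count lives in an AtomicInteger, so the getter can return it without entering a synchronized method. `NodeCountSketch`, `addNode`, and `removeNode` are hypothetical names for illustration; this is not the content of YARN-2171.patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a scheduler; not the actual CapacityScheduler code. */
class NodeCountSketch {
    // Lock-free counter: readers never need the scheduler monitor.
    private final AtomicInteger numNodes = new AtomicInteger(0);

    // In this sketch, membership changes still take the scheduler lock,
    // on the assumption that they update other scheduler state as well.
    synchronized void addNode() {
        numNodes.incrementAndGet();
    }

    synchronized void removeNode() {
        numNodes.decrementAndGet();
    }

    // Deliberately not synchronized: a heartbeat thread can read the count
    // even while another thread holds the scheduler lock.
    int getNumClusterNodes() {
        return numNodes.get();
    }
}
```

The trade-off is that a reader may see a count that is momentarily out of step with other scheduler state, which is presumably acceptable for a value read on every heartbeat.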