[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()

2014-06-18 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-2171:


Priority: Major  (was: Critical)

> AMs block on the CapacityScheduler lock during allocate()
> ---------------------------------------------------------
>
> Key: YARN-2171
> URL: https://issues.apache.org/jira/browse/YARN-2171
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-2171.patch, YARN-2171v2.patch
>
>
> When AMs heartbeat into the RM via the allocate() call, they block on the
> CapacityScheduler lock while trying to get the number of nodes in the
> cluster via getNumClusterNodes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()

2014-06-18 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-2171:


Priority: Critical  (was: Major)

> AMs block on the CapacityScheduler lock during allocate()
> ---------------------------------------------------------
>
> Key: YARN-2171
> URL: https://issues.apache.org/jira/browse/YARN-2171
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: YARN-2171.patch, YARN-2171v2.patch
>
>
> When AMs heartbeat into the RM via the allocate() call, they block on the
> CapacityScheduler lock while trying to get the number of nodes in the
> cluster via getNumClusterNodes.





[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()

2014-06-18 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-2171:


Target Version/s: 2.5.0  (was: 0.23.11, 2.5.0)






[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()

2014-06-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2171:
-----------------------------

Attachment: YARN-2171v2.patch

The point of the unit test was to catch regressions at a high level.  If anyone
changes the code so that calling allocate() grabs the scheduler lock, the test
will fail, whether the regression is in this particular method or in some new
method that ApplicationMasterService or CapacityScheduler itself calls and that
grabs the lock.
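
A minimal sketch of how such a high-level check could work, assuming the
scheduler lock is an ordinary Java monitor.  This is not the test from the
attached patch; the class and method names (LockRegressionCheck,
completesWhileLocked) are hypothetical.  The checking thread holds the lock
and runs the call under test on a second thread; if the call needs the lock,
it cannot finish before the timeout.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from YARN-2171.patch.
class LockRegressionCheck {
    // Returns true if `call` finishes on another thread while the current
    // thread holds the monitor of `lock`; false if it is still running
    // (for example, blocked on that monitor) after timeoutMs milliseconds.
    static boolean completesWhileLocked(Object lock, Runnable call, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            synchronized (lock) {
                Future<?> result = pool.submit(call);
                try {
                    result.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            // The monitor is released by now, so a blocked call can finish.
            pool.shutdownNow();
        }
    }
}
```

A test built on this idea would pass the scheduler instance as the lock and
an allocate() invocation as the call, and fail when the helper returns false.
The timeout is a trade-off: too short risks false failures on a slow machine.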

I added a separate unit test to exercise the getNumClusterNodes method.

The AHS unit test failure seems unrelated, and it passes for me locally even 
with this change.






[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()

2014-06-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2171:
-----------------------------

Attachment: YARN-2171.patch

Patch to use AtomicInteger for the number of nodes so we can avoid grabbing the
lock to access the value.  I also added a unit test to verify that allocate()
doesn't try to grab the CapacityScheduler lock.
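
A minimal sketch of the AtomicInteger approach described above.  It is not
the patch itself: SchedulerSketch and its addNode/removeNode methods are
hypothetical stand-ins, and only getNumClusterNodes is a name taken from the
issue.  Writers still update the count under the scheduler lock, while the
getter reads it without synchronizing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a scheduler; not the real CapacityScheduler.
class SchedulerSketch {
    // Node count lives in an AtomicInteger so readers never need the lock.
    private final AtomicInteger numNodes = new AtomicInteger(0);

    // Mutations still run under the scheduler lock with other bookkeeping.
    synchronized void addNode() {
        numNodes.incrementAndGet();
    }

    synchronized void removeNode() {
        numNodes.decrementAndGet();
    }

    // Deliberately not synchronized: a heartbeat can read the count even
    // while another thread holds the scheduler lock.
    int getNumClusterNodes() {
        return numNodes.get();
    }
}
```

The reader may see a count that is about to change, which seems acceptable
for a value that is only a snapshot of cluster size anyway.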

> AMs block on the CapacityScheduler lock during allocate()
> ---------------------------------------------------------
>
> Key: YARN-2171
> URL: https://issues.apache.org/jira/browse/YARN-2171
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: YARN-2171.patch
>
>
> When AMs heartbeat into the RM via the allocate() call, they block on the
> CapacityScheduler lock while trying to get the number of nodes in the
> cluster via getNumClusterNodes.


