[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2017-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006467#comment-16006467
 ] 

Jason Lowe commented on YARN-3137:
--

Seeing this lock contention quite a bit in 2.8, and I guess I'm still a little 
confused why we need a full CapacityScheduler lock to do this.  Even if we have 
a lock, we're either checking just before or just after some kind of hierarchy 
change.  Queue lookup is not by path but simply a concurrent hashmap lookup, so 
that's not an issue.   Either the queue is there or it's not.  Then we can 
invoke checkAccess on the queue itself.  The queue knows its path in the 
hierarchy, and again either we're checking before the queue is moved or after.  
As long as we lock the specific queue when trying to check the ACL we should be 
fine, but maybe I'm missing something.

Can someone elaborate the scenario where moving the checkAccess lock from the 
highly-contended CapacityScheduler lock to a queue-specific lock causes 
problems when the hierarchy changes?

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307036#comment-14307036
 ] 

Varun Saxena commented on YARN-3137:


Agree, we can keep a specific R/W lock to handle queue hierarchy.
We however have another sub task of YARN-3091 i.e. YARN-3139 which looks at 
improving locks of Capacity Scheduler.
These 2 look like dependent JIRAs'.

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306130#comment-14306130
 ] 

Jason Lowe commented on YARN-3137:
--

bq. Queue hierarchy may change at any point of time due to refreshQueues.

True.  I don't think we'd ever crash while walking the hierarchy as it changes, 
but I think we could walk a path that never actually existed before or after 
the refresh if queue parents were radically shifted.  We can probably leverage 
a read-write lock designed to control the queue hierarchy.  The portion of node 
updates and things like checkAccess would be readers when they need to walk the 
hierarchy, and only the super-rare reinitialize would be a writer.

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306040#comment-14306040
 ] 

Vinod Kumar Vavilapalli commented on YARN-3137:
---

bq. At best we need to lock individual queues when processing the access check, 
but I don't see why we need to grab the highly-contested scheduler lock to 
lookup which queue to use for the hasAccess call.
Queue hierarchy may change at any point of time due to refreshQueues. We need a 
way to lock the queue-hierarchy itself.

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305881#comment-14305881
 ] 

Jason Lowe commented on YARN-3137:
--

Thanks for the pointer, Wangda!  I missed that JIRA when it was filed, and I 
agree it makes sense to put these as subtasks.

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305763#comment-14305763
 ] 

Wangda Tan commented on YARN-3137:
--

Agree, I have noticed similar issues around this as well. Do you think 
YARN-3136/YARN-3137 should be tracked as sub-JIRA of YARN-3091?

> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3137) CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock

2015-02-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305758#comment-14305758
 ] 

Jason Lowe commented on YARN-3137:
--

This can be aggravated by the ClientRMService.getQueueInfo implementation which 
will call checkAccess for every application returned.  That could potentially 
be every application the RM knows about, which can often be over 10,000 
attempts to grab the scheduler lock.


> CapacityScheduler.checkAccess unnecessarily grabs the scheduler lock
> 
>
> Key: YARN-3137
> URL: https://issues.apache.org/jira/browse/YARN-3137
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>
> The queues are stored in a ConcurrentHashMap and the code already checks for 
> a null queue.  At best we need to lock individual queues when processing the 
> access check, but I don't see why we need to grab the highly-contested 
> scheduler lock to lookup which queue to use for the hasAccess call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)