[
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Naganarasimha G R updated YARN-4416:
------------------------------------
Attachment: YARN-4416.v1.001.patch
Thanks for the comments [~wangda],
bq. queueCapacity, resource-usage has their own read/write lock.
Hence I have removed the unneeded synchronization on those methods.
bq. numContainers is volatile.
As it is volatile, I have removed the synchronization on the get method.
There was also an unnecessary override of getNumContainers in LeafQueue,
which I have removed.
bq. read/write lock could be added to OrderingPolicy. Read operations don't
need synchronized. So getNumApplications doesn't need synchronized.
I have added locks around access to {{schedulableEntities}} in
*AbstractComparatorOrderingPolicy*, but I am not completely sure about the
modifications, as there is already synchronization on {{entitiesToReorder}}.
I would like an additional (focused) review of this part in particular.
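For illustration only, a minimal sketch of the read/write lock pattern discussed above: a sorted entity set whose read operations take the read lock instead of being synchronized. The class {{GuardedEntities}} and its methods are hypothetical names and are not taken from the patch or from *AbstractComparatorOrderingPolicy*.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an ordering policy's entity collection;
// this is not the actual YARN class.
class GuardedEntities<T> {
    private final TreeSet<T> entities;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedEntities(Comparator<? super T> comparator) {
        this.entities = new TreeSet<>(comparator);
    }

    // Mutations take the exclusive write lock.
    boolean add(T entity) {
        lock.writeLock().lock();
        try {
            return entities.add(entity);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean remove(T entity) {
        lock.writeLock().lock();
        try {
            return entities.remove(entity);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read-only access takes the shared read lock, so concurrent
    // readers do not block each other.
    int size() {
        lock.readLock().lock();
        try {
            return entities.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, callers of size() contend only with writers, not with each other and not with the monitor of the enclosing object.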
> Deadlock due to synchronised get Methods in AbstractCSQueue
> -----------------------------------------------------------
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.1
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Minor
> Attachments: YARN-4416.v1.001.patch, deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario in which I needed to
> know the name of the queue, but every time I tried to inspect the queue the
> debugger hung. The stack showed a deadlock, but on analysis I found that it
> was caused only by *queue.toString()* being invoked during debugging, as
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized
> and would be better handled through read and write locks.