[
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117468#comment-15117468
]
MENG DING commented on YARN-4519:
---------------------------------
Hi, [~leftnoteasy]
bq. IIUC, after this patch, the increase/decrease container logic needs to acquire
LeafQueue's lock. Since container allocation/release acquires LeafQueue's lock
too, race conditions on containers/resources will be avoided.
Yes, exactly.
bq. One question unrelated to the patch: it looks safe to remove the synchronized
lock on CS#completedContainerInternal, correct?
I think we don't need to synchronize the entire function on the CapacityScheduler
lock, only the part that updates {{schedulerHealth}}. If you think this is worth
fixing, I will file a separate ticket.
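
To illustrate what narrowing the lock scope could look like, here is a minimal
sketch. It is not the actual CapacityScheduler code: the class, fields, and
methods below are hypothetical stand-ins, and it assumes the queue-level work is
already protected by its own lock (for example LeafQueue's).

```java
class NarrowLockSketch {
    // Hypothetical stand-in for the scheduler-wide monitor.
    private final Object schedulerLock = new Object();

    // Hypothetical stand-ins for the state kept in schedulerHealth.
    private int releaseCount;
    private String lastReleasedContainer;

    // Sketch of a completed-container path: only the health update is
    // guarded by the scheduler lock; the queue work runs outside it.
    void completeContainer(String containerId, Runnable queueWork) {
        // Assumption: queueWork does its own locking (e.g. on the queue).
        queueWork.run();

        synchronized (schedulerLock) {
            releaseCount++;
            lastReleasedContainer = containerId;
        }
    }

    // True if the calling thread currently holds the scheduler lock.
    boolean holdsSchedulerLock() {
        return Thread.holdsLock(schedulerLock);
    }

    int releaseCount() {
        synchronized (schedulerLock) {
            return releaseCount;
        }
    }

    String lastReleasedContainer() {
        synchronized (schedulerLock) {
            return lastReleasedContainer;
        }
    }

    public static void main(String[] args) {
        NarrowLockSketch s = new NarrowLockSketch();
        s.completeContainer("container_1",
            () -> System.out.println("scheduler lock held during queue work: "
                + s.holdsSchedulerLock()));
        System.out.println("releases recorded: " + s.releaseCount());
    }
}
```

The point of the sketch is only that the critical section shrinks to the shared
state that really needs the scheduler lock; whether that is safe in the real
method would need to be checked against the code.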
> potential deadlock of CapacityScheduler between decrease container and assign
> containers
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: sandflee
> Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate(), the thread first gets the FiCaSchedulerApp
> sync lock, and may then get CapacityScheduler's sync lock in decreaseContainer().
> The scheduler thread first gets CapacityScheduler's sync lock in
> allocateContainersToNode(), and may then get the FiCaSchedulerApp sync lock in
> FiCaSchedulerApp.assignContainers().
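
The two paths in the description above take the same pair of locks in opposite
order. A minimal sketch of that pattern follows. It is not Hadoop code: the lock
names are hypothetical stand-ins, and it uses ReentrantLock.tryLock() where the
real code uses synchronized, so that the inversion can be observed without the
program actually hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderDemo {
    // Hypothetical stand-ins for the two monitors named in the description.
    static final ReentrantLock appLock = new ReentrantLock();
    static final ReentrantLock schedulerLock = new ReentrantLock();

    // Takes `first`, waits until the peer thread holds its own first lock,
    // then probes `second` without blocking. Returns true if `second` was free.
    static boolean probe(ReentrantLock first, ReentrantLock second,
                         CyclicBarrier barrier) throws Exception {
        first.lock();
        try {
            barrier.await();               // both threads now hold one lock
            boolean got = second.tryLock();
            barrier.await();               // keep holding until both have probed
            if (got) {
                second.unlock();
            }
            return got;
        } finally {
            first.unlock();
        }
    }

    // Runs both acquisition orders concurrently. Returns true if neither
    // thread could get its second lock, i.e. synchronized would deadlock here.
    static boolean bothBlocked() throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(2);
        boolean[] got = {true, true};      // a failed thread must not look "blocked"
        Thread allocate = new Thread(() -> {
            try {
                got[0] = probe(appLock, schedulerLock, barrier);   // app -> scheduler
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        Thread schedule = new Thread(() -> {
            try {
                got[1] = probe(schedulerLock, appLock, barrier);   // scheduler -> app
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        allocate.start();
        schedule.start();
        allocate.join();
        schedule.join();
        return !got[0] && !got[1];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("both threads blocked on second lock: " + bothBlocked());
    }
}
```

With blocking acquisition, each thread would wait forever for the lock the other
holds. Making every path acquire the locks in one agreed order, or routing both
paths through a single shared lock, removes the cycle.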
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)