[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252343#comment-14252343 ]
Anubhav Dhoot commented on YARN-2975:
-------------------------------------

Yes, I am worried about getting it wrong for maxRunningEnforcer. Before the change, we removed the app inside a single lock, whether it was in the runnable list or not, and could be reasonably sure of the result. Now the removal is split into two non-atomic steps outside, as I listed above, and also into two steps inside:

{noformat}
return removeRunnableApp(app) || removeNonRunnableApp(app)
{noformat}

This might make things worse, because each call releases the lock before the other acquires it. The application could be missed completely if it moves from non-runnable to runnable in between.

How about making removeApp try to remove from both the runnable and non-runnable lists inside a single write lock? We can remove the redundancy with removeRunnableApp and removeNonRunnableApp by adding a fourth, internal method that all three delegate to, with flags that limit where to look for the app.

> FSLeafQueue app lists are accessed without required locks
> ---------------------------------------------------------
>
>                 Key: YARN-2975
>                 URL: https://issues.apache.org/jira/browse/YARN-2975
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed
> without locks in other places.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
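
The single-write-lock removal suggested in the comment above could look roughly like the sketch below. It is not the FSLeafQueue code or the attached patch: the class name LeafQueueSketch, the String app ids, addApp, makeRunnable, removeAppInternal and the flag names are all hypothetical, and the boolean return value (true if the app was found in either list) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for FSLeafQueue; apps are plain String ids here.
class LeafQueueSketch {
  private final List<String> runnableApps = new ArrayList<>();
  private final List<String> nonRunnableApps = new ArrayList<>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  void addApp(String app, boolean runnable) {
    rwLock.writeLock().lock();
    try {
      (runnable ? runnableApps : nonRunnableApps).add(app);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  // Moves an app from the non-runnable list to the runnable list atomically.
  boolean makeRunnable(String app) {
    rwLock.writeLock().lock();
    try {
      if (nonRunnableApps.remove(app)) {
        runnableApps.add(app);
        return true;
      }
      return false;
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  // The three public entry points only differ in which lists they search.
  boolean removeApp(String app) {
    return removeAppInternal(app, true, true);
  }

  boolean removeRunnableApp(String app) {
    return removeAppInternal(app, true, false);
  }

  boolean removeNonRunnableApp(String app) {
    return removeAppInternal(app, false, true);
  }

  // Fourth, internal method: both lists are examined under ONE write lock,
  // so an app cannot switch lists between the two checks.
  private boolean removeAppInternal(String app, boolean checkRunnable,
      boolean checkNonRunnable) {
    rwLock.writeLock().lock();
    try {
      if (checkRunnable && runnableApps.remove(app)) {
        return true;
      }
      return checkNonRunnable && nonRunnableApps.remove(app);
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
```

Because makeRunnable and removeAppInternal take the same write lock, a concurrent move from non-runnable to runnable happens either entirely before or entirely after the removal, so removeApp cannot miss the app the way the two separately locked calls could.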