[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves updated YARN-460:
-------------------------------
Attachment: YARN-460.patch
So I think we can simply track whether the application has been stopped and
then check that in the allocate() call before really processing it.
All the stopping/removing of the application happens in CS.doneApplication, and
the race is really between the calls in that function and the fact that
allocate() isn't synchronized. No other paths I could find should cause issues,
since most of the other functions in CS are synchronized and wouldn't run
while doneApplication is happening.
Here is a preliminary patch that I am going to do some more testing on. The
checks for stopped in the SchedulerApp are extra; I was just being paranoid.
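
For readers without the attachment handy, the idea described above can be
sketched roughly as follows. This is not the attached patch and not the real
CapacityScheduler code: SketchApp, SketchQueue and SketchScheduler are
hypothetical stand-ins, and locking on the app object to make the check and
the update atomic is an assumption of this sketch only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a scheduler application (not the real SchedulerApp).
class SketchApp {
    final String user;
    private boolean stopped = false;

    SketchApp(String user) { this.user = user; }

    synchronized boolean isStopped() { return stopped; }
    synchronized void stop() { stopped = true; }
}

// Hypothetical stand-in for the queue's list of active users.
class SketchQueue {
    private final Set<String> activeUsers = new HashSet<>();

    synchronized void activate(String user) { activeUsers.add(user); }
    synchronized void deactivate(String user) { activeUsers.remove(user); }
    synchronized int activeUserCount() { return activeUsers.size(); }
}

// Hypothetical stand-in for the scheduler.
class SketchScheduler {
    final SketchQueue queue = new SketchQueue();

    // As in the discussion above, allocate() does not hold the scheduler lock,
    // so it can run while doneApplication() is in progress.
    boolean allocate(SketchApp app) {
        synchronized (app) {
            // Check the stopped flag before doing any real processing.
            if (app.isStopped()) {
                return false;
            }
            // Without the check above, a late allocate() could re-add the user
            // after doneApplication() had already removed it.
            queue.activate(app.user);
            return true;
        }
    }

    // Stopping and removing the application happens here, under the scheduler lock.
    synchronized void doneApplication(SketchApp app) {
        synchronized (app) {
            app.stop();
            queue.deactivate(app.user);
        }
    }
}
```

In this sketch a call to allocate() that arrives after doneApplication()
returns false and leaves the active user set untouched, which is the
behaviour the stopped check is meant to guarantee.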
> CS user left in list of active users for the queue even when application
> finished
> ---------------------------------------------------------------------------------
>
> Key: YARN-460
> URL: https://issues.apache.org/jira/browse/YARN-460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Priority: Critical
> Attachments: YARN-460.patch
>
>
> We have seen a user get left in the queue's list of active users even though
> the application was removed. This can cause everyone else in the queue to get
> fewer resources if using the minimum user limit percent config.