[ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
-------------------------------

    Attachment: YARN-460.patch

I think we can simply track whether the application has been stopped and then 
check that flag in the allocate() call before doing any real processing.  

All the stopping/removing of the application happens in CS.doneApplication, and 
the race is really between the calls in that function and the fact that 
allocate() isn't synchronized. No other paths I could find should cause issues, 
since most of the other functions in CS are synchronized and wouldn't run 
while doneApplication is in progress. 

Here is a preliminary patch; I am going to do some more testing on it.  The 
checks for stopped in the SchedulerApp are extra; I was just being paranoid.  
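
To illustrate the idea, here is a rough sketch. It is not the attached patch: 
the App, Queue and Scheduler classes are simplified, hypothetical stand-ins for 
SchedulerApp, the queue and CS, and taking the per-application lock in both 
paths is an assumption about how the check could be made safe. 

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; NOT the real CapacityScheduler classes.
class StoppedFlagSketch {

  // Stand-in for SchedulerApp: remembers whether the application was stopped.
  static class App {
    final String user;
    private boolean stopped = false;

    App(String user) { this.user = user; }

    synchronized boolean isStopped() { return stopped; }
    synchronized void stop() { stopped = true; }
  }

  // Stand-in for the queue: tracks which users currently count as active.
  static class Queue {
    private final Set<String> activeUsers = new HashSet<>();

    synchronized void activateUser(String user) { activeUsers.add(user); }
    synchronized void deactivateUser(String user) { activeUsers.remove(user); }
    synchronized boolean isActive(String user) { return activeUsers.contains(user); }
  }

  // Stand-in for CS.
  static class Scheduler {
    final Queue queue = new Queue();

    // Deliberately not synchronized on the scheduler, like allocate() above.
    // Returns false if the request was ignored because the app is stopped.
    boolean allocate(App app) {
      synchronized (app) {            // assumption: per-application lock
        if (app.isStopped()) {
          return false;               // app already done, do not re-activate user
        }
        queue.activateUser(app.user); // the "real processing" in this sketch
        return true;
      }
    }

    // Synchronized on the scheduler, like doneApplication above.
    synchronized void doneApplication(App app) {
      synchronized (app) {            // same lock, so check-and-act cannot interleave
        app.stop();
        queue.deactivateUser(app.user);
      }
    }
  }

  public static void main(String[] args) {
    Scheduler cs = new Scheduler();
    App app = new App("alice");
    System.out.println(cs.allocate(app));          // true
    cs.doneApplication(app);
    System.out.println(cs.allocate(app));          // false
    System.out.println(cs.queue.isActive("alice")); // false
  }
}
```

In this sketch a late allocate() that arrives after doneApplication sees the 
stopped flag and returns without putting the user back in the active list. 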
                
> CS user left in list of active users for the queue even when application 
> finished
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-460
>                 URL: https://issues.apache.org/jira/browse/YARN-460
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>            Priority: Critical
>         Attachments: YARN-460.patch
>
>
> We have seen a user get left in the queue's list of active users even though 
> the application was removed. This can cause everyone else in the queue to get 
> fewer resources if using the minimum user limit percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
