[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765733#comment-16765733
 ] 

Wilfred Spiegelenburg commented on YARN-8655:
---------------------------------------------

Looking at how we get to adding an application to the starved list I don't 
think this is a thread safety issue.

I do agree that we could process the application twice. Fair share starvation 
and min share starvation are two different things. The queue is starved for min 
share and the application is starved for fair share. This does not mean that it 
is a problem. If the application is starved for fair share the calculation of 
the queue min share starvation already takes that fact into account.

The {{updateStarvedAppsMinshare()}} deducts any fair share starvation already 
processed for applications from the possible min share starvation. This means 
two things for an application that is marked for min share starvation
# the application fair share starvation is less than the distributed min share 
starvation of the queue 
# the application has an outstanding demand that is higher than its fair share 
starvation

The chance that an application is starved for fair share with a demand that is 
higher than its fair share starvation combined with the distributed queue 
minimum share that is higher than the fair share starvation is small.

It could be worth the fix if it has a high impact. Looking at the way you are 
proposing to fix it in the patch is however not the way. You introduce a 
{{Thread.sleep()}} call in the pre-emption thread which is not correct. 
Currently the pre-emption will happen when a starved app is added and no 
pre-emption is in progress. With the change there is only 1 pre-emption per 
second. This is a high impact change and I think we need to come up with a 
smarter way to handle this case with less of an impact on the pre-emption 
itself.


> FairScheduler: FSStarvedApps is not thread safe
> -----------------------------------------------
>
>                 Key: YARN-8655
>                 URL: https://issues.apache.org/jira/browse/YARN-8655
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.0.0
>            Reporter: Zhaohui Xin
>            Assignee: Zhaohui Xin
>            Priority: Major
>         Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe, this may make one starve app is processed 
> for two times continuously.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken but appBeingProcessed is not yet 
> update to app1. At the moment, app1 is *starved by min share*, so this app 
> is added to appsToProcess again! Because appBeingProcessed is null and 
> appsToProcess also have not this one. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to