[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649591#comment-13649591 ]

Sandy Ryza commented on YARN-568:
---------------------------------

Thanks for addressing my questions, Carlo.  Some responses and more thoughts:

bq. This translate in a deterministic choice of containers across invocations 
of the preemption procedures...

The current ordering isn't really guaranteed to be deterministic, as most 
containers will have the same priorities.  This behavior doesn't really make 
sense and I'd like to change it for that reason and a few others (see YARN-596).

It's true that, with those changes, the same set of containers will usually be 
near the bottom, but I can think of a few cases that would lead to 
inconsistencies.  Regarding the changing cluster conditions, this might make 
things fairer to the containers being preempted, but it could mean that, if 
apps aren't obedient, we need to wait another whole maxWaitTimeBeforeKill to 
make the container available to the starved application.  I haven't given this 
an enormous amount of thought, but I think we care more about guarantees for 
starved applications than any notion of fairness in the containers we're 
preempting. (Although hopefully in the case of a big job finishing, the starved 
apps would get the newly available resources, and the containers would not need 
to be preempted anyway.)  While these consequences aren't terrible, is there a 
reason not to maintain, in the FairScheduler class, a data structure of the 
containers that have already been marked, and process those first?
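To make the suggestion concrete, here is a rough sketch of what I have in mind 
(class and method names are made up, not from the patch or the FairScheduler 
code): remember the containers we have already asked to release, and put them 
at the front of candidate selection on the next invocation.

{code:java}
import java.util.*;

// Stand-in sketch, not actual FairScheduler code: remember which containers we
// already asked to be preempted, and consider those first on the next pass.
class PreemptionTracker {
  // Container ids we have already sent preemption requests for.
  private final Set<String> marked = new HashSet<String>();

  // candidateIds is the usual preemption ordering; memMb gives each container's
  // memory; memNeededMb is how much we still need to free for starved apps.
  List<String> pick(List<String> candidateIds, Map<String, Integer> memMb,
      int memNeededMb) {
    List<String> picked = new ArrayList<String>();
    int freed = 0;
    // First pass: containers marked in a previous invocation.
    for (String id : candidateIds) {
      if (freed < memNeededMb && marked.contains(id)) {
        picked.add(id);
        freed += memMb.get(id);
      }
    }
    // Second pass: mark new containers only if the marked set is not enough.
    for (String id : candidateIds) {
      if (freed < memNeededMb && !marked.contains(id)) {
        picked.add(id);
        marked.add(id);
        freed += memMb.get(id);
      }
    }
    return picked;
  }

  // Called when a container exits (preempted, killed, or finished normally).
  void containerFinished(String id) {
    marked.remove(id);
  }
}
{code}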

bq. toPreempt is decremented in all three cases because we would otherwise 
double-kill for the same resource needs: imagine you want 5 containers and send 
corresponding preemption requests...

I'm now convinced.  Thanks for spelling it out for me.
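For anyone else following along, the argument as I now understand it, as a toy 
sketch (stand-in names, not the patch code): whichever of the three cases 
applies to a selected container, its resources come off toPreempt, so the next 
invocation does not pick extra containers for a deficit that is already being 
serviced.

{code:java}
// Stand-in sketch of the accounting, not the patch code.
class PreemptionAccounting {
  int handle(int toPreemptMb, int containerMb, long markedAtMs, long nowMs,
      long maxWaitMs) {
    if (markedAtMs < 0) {
      requestPreemptionFromAM();        // case 1: newly selected, ask the AM nicely
    } else if (nowMs - markedAtMs >= maxWaitMs) {
      killContainer();                  // case 3: AM did not give it up in time
    }                                   // case 2: already asked, still waiting
    return toPreemptMb - containerMb;   // decremented in all three cases
  }

  private void requestPreemptionFromAM() { /* hypothetical stub */ }
  private void killContainer() { /* hypothetical stub */ }
}
{code}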

bq. It is probably good to have a "no-preemption" mode in which we simply 
straight kill...

Agreed that in most cases, setting the right preemptionInterval and 
maxWaitTimeBeforeKill would be a desirable way to satisfy these needs.  As a 
user, though, I think that if I set maxWaitTimeBeforeKill to 0, I would expect 
it to kill immediately.  A separate mode, as you suggest, would work for this 
as well.  An issue that you touched upon in 2) is that changing the 
preemptionInterval is not as neutral as changing something like the scheduler 
update interval.  There's no guarantee on the time it takes for the cluster 
state to be updated with the results of a preemption.  If there is a starved 
application that is also picky, a short preemption interval could lead to a 
higher volume of (unnecessary) preemptions.  While this may be a deficiency in 
the current way of doing things, we'll have to work with it for now.
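
Roughly the behavior I would expect, sketched with stand-in names (not a 
proposal for the actual patch code): maxWaitTimeBeforeKill == 0 degenerates 
into plain preemption-by-killing, which would also cover the proposed 
"no-preemption" mode.

{code:java}
// Stand-in sketch of the expected user-facing behavior, not the patch code.
class KillPolicySketch {
  void preemptContainer(String containerId, long maxWaitTimeBeforeKillMs) {
    if (maxWaitTimeBeforeKillMs <= 0) {
      kill(containerId);                // kill immediately, the old behavior
    } else {
      warnAM(containerId);              // give the AM a chance to checkpoint/release
      scheduleKill(containerId, maxWaitTimeBeforeKillMs);  // kill only if it doesn't
    }
  }

  private void kill(String id) { /* hypothetical stub */ }
  private void warnAM(String id) { /* hypothetical stub */ }
  private void scheduleKill(String id, long delayMs) { /* hypothetical stub */ }
}
{code}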
                
> FairScheduler: support for work-preserving preemption 
> ------------------------------------------------------
>
>                 Key: YARN-568
>                 URL: https://issues.apache.org/jira/browse/YARN-568
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-568.patch, YARN-568.patch
>
>
> In the attached patch, we modified the FairScheduler to substitute its 
> preemption-by-killing with a work-preserving version of preemption (followed 
> by killing if the AMs do not respond quickly enough). This should allow us to 
> run preemption checking more often, but kill less often (proper tuning to be 
> investigated).  Depends on YARN-567 and YARN-45; related to YARN-569.
