[ 
https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186492#comment-15186492
 ] 

Peng Zhang commented on YARN-3054:
----------------------------------

Preemption happens on the lowest-priority containers, and in MapReduce a reduce 
task gets a higher priority than a map task so that it is scheduled first, even 
though it has a data dependency on the map tasks. 
So preempting the map tasks, which have lower priority, may cause the job to 
never make progress.

Detailed scenario described below: 
1. Assume 10 resources in the cluster (map and reduce tasks request the same 
amount of memory and CPU, 1 resource per task), and two queues (q1 and q2). 
2. q1 has one job, which gets all 10 resources while q2 is idle.
3. The job in q1 runs 5 map tasks and 5 reduce tasks.
4. When q2 gets a new job, the job in q1 is preempted and 5 containers are 
reclaimed.
5. According to the container preemption policy, the 5 map tasks, having lower 
priority, are all preempted (and all progress on these tasks is lost).
6. After the container preemption, the job in q1 sees its new resource headroom, 
recomputes the ratio between map and reduce tasks, and the AM then preempts 
reduce tasks to make room for map tasks. So the 5 reduce tasks are killed and 5 
new map tasks start.
7. When q2 becomes idle again, the job in q1 gets the 5 resources back and 5 new 
reduce tasks start. 
This is back to step 3, and the cycle may repeat indefinitely (for example, when 
the map tasks of the job in q1 run for a long time): the map tasks can never 
finish before their containers are preempted, so the job makes no progress.
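The cycle above can be sketched with a small simulation (a hypothetical model, 
not YARN code; the tick counts `MAP_RUNTIME` and `PREEMPT_EVERY` are assumed 
values chosen to reproduce the livelock):

```python
MAP_RUNTIME = 8      # scheduling ticks a map task needs to finish
PREEMPT_EVERY = 5    # q2 demands resources back before the maps can finish

def simulate(cycles):
    map_progress = [0] * 5   # progress of the 5 map tasks
    for _ in range(cycles):
        # steps 2-3: the job in q1 holds all 10 resources; maps run
        for _ in range(PREEMPT_EVERY):
            map_progress = [p + 1 for p in map_progress]
        # steps 4-5: fair-share preemption kills the lowest-priority
        # containers -- the maps -- and all their progress is lost
        map_progress = [0] * 5
        # steps 6-7: the AM kills reduces to reschedule maps; cycle repeats
    return any(p >= MAP_RUNTIME for p in map_progress)

print(simulate(100))   # False: no map ever finishes, whatever the cycle count
```

Because the preemption interval is shorter than the map runtime and each 
preemption discards all map progress, the job never completes no matter how 
many cycles run.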

> Preempt policy in FairScheduler may cause mapreduce job never finish
> --------------------------------------------------------------------
>
>                 Key: YARN-3054
>                 URL: https://issues.apache.org/jira/browse/YARN-3054
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>
> The preemption policy is currently tied to the scheduling policy. Using the 
> scheduling policy's comparator to find preemption candidates cannot guarantee 
> that some subset of containers is never preempted, and this may cause tasks 
> to be preempted periodically before they finish, so the job cannot make any 
> progress. 
> I think preemption in YARN should give the following assurances:
> 1. MapReduce jobs can get additional resources when other queues are idle;
> 2. MapReduce jobs for one user in one queue can still make progress with 
> their min share when others preempt resources back.
> Maybe always preempting the latest app and container would achieve this? 
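The "preempt the latest container" idea quoted above could look like the 
following sketch (hypothetical code, not the FairScheduler implementation; 
the `Container` fields and `pick_victims` helper are illustrative names):

```python
from dataclasses import dataclass

@dataclass
class Container:
    container_id: str
    priority: int    # lower number = higher priority in YARN
    start_time: int  # launch timestamp

def pick_victims(containers, need):
    # Sort newest-first and reclaim from the newest: the oldest containers
    # (e.g. long-running maps) are never preempted while newer ones exist,
    # so the work lost to preemption is minimized.
    victims = sorted(containers, key=lambda c: c.start_time, reverse=True)
    return victims[:need]

running = [
    Container("map_1", priority=20, start_time=100),
    Container("map_2", priority=20, start_time=100),
    Container("reduce_1", priority=10, start_time=400),
    Container("reduce_2", priority=10, start_time=500),
]
print([c.container_id for c in pick_victims(running, 2)])
# -> ['reduce_2', 'reduce_1']
```

Unlike a comparator based on task priority, this ordering leaves the old map 
containers untouched, which breaks the preemption cycle described in the 
comment.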



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
