[ https://issues.apache.org/jira/browse/YARN-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876567#comment-15876567 ]

Chris Douglas commented on YARN-6191:
-------------------------------------

bq. However there's still an issue because the preemption message is too 
general. For example, if the message says "going to preempt 60GB of resources" 
and the AM kills 10 reducers that are 6GB each on 6 different nodes, the RM can 
still kill the maps because the RM needed 60GB of contiguous resources.

I haven't followed the modifications to the preemption policy, so I don't know 
whether the AM will be selected as a victim again even after it has satisfied the 
contract (it should not be). If that is the current behavior, the preemption 
message should be made expressive enough to encode it. For instance, if the RM 
will only accept 60GB of resources from a single node, then that can be encoded 
in a ResourceRequest in the preemption message.
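
To make that concrete, here's a rough AM-side sketch of reading such a contract. The PreemptionMessage, PreemptionContract, PreemptionResourceRequest, and ResourceRequest records and their getters are the actual YARN API; the wrapper class and the logging are only illustrative, assuming the RM has populated the negotiable contract with a node-specific request:

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.PreemptionContract;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;
import org.apache.hadoop.yarn.api.records.PreemptionResourceRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class PreemptionContractInspector {

  /**
   * Inspect the negotiable preemption contract and report whether the RM is
   * asking for resources on a specific node, so the AM can pick victims
   * co-located on that node instead of spreading the kill across the cluster.
   */
  static void inspect(AllocateResponse response) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null || msg.getContract() == null) {
      return; // nothing is being asked back from this application
    }
    PreemptionContract contract = msg.getContract();
    List<PreemptionResourceRequest> asks = contract.getResourceRequest();
    if (asks == null) {
      return;
    }
    for (PreemptionResourceRequest ask : asks) {
      ResourceRequest rr = ask.getResourceRequest();
      String location = rr.getResourceName(); // "*" (ANY), a rack, or a host
      if (!ResourceRequest.ANY.equals(location)) {
        // Location-specific contract: freeing the same amount of resources
        // scattered across other nodes will not discharge the obligation.
        System.out.println("RM wants " + rr.getNumContainers() + " x "
            + rr.getCapability() + " on " + location);
      } else {
        System.out.println("RM wants " + rr.getNumContainers() + " x "
            + rr.getCapability() + " anywhere");
      }
    }
  }
}
{code}

If the resource name is a host rather than ANY, the AM knows that freeing 60GB scattered across six nodes will not discharge the contract, and it can choose victims co-located on that host instead.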

Even if everything behaves badly, killing the reducers is still correct, right? 
If the job is still entitled to resources, then it should reschedule the map 
tasks before the reducers. There are still interleavings of requests that could 
result in the same behavior described in this JIRA, but they'd be stunningly 
unlucky.
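
As a sketch of that ordering from the AM side (the AMRMClient and ContainerRequest calls are the real YARN client API; the priority values, the gating condition, and the method itself are illustrative assumptions, not the MapReduce AM's actual ramp-down logic):

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RescheduleAfterPreemption {

  // Illustrative priority values only: in YARN a numerically lower priority
  // is served first, and maps and reduces are requested at distinct priorities.
  private static final Priority MAP_PRIORITY = Priority.newInstance(20);
  private static final Priority REDUCE_PRIORITY = Priority.newInstance(10);

  /**
   * Decide what to ask for on this heartbeat: re-request the preempted maps
   * immediately, and hold back reduce requests until no maps remain to run,
   * so the freed space goes to finishing the map phase first.
   */
  static void requestNextWave(AMRMClient<ContainerRequest> amrmClient,
      int preemptedMaps, Resource mapSize,
      int pendingReduces, Resource reduceSize,
      int runningMaps) {
    for (int i = 0; i < preemptedMaps; i++) {
      amrmClient.addContainerRequest(
          new ContainerRequest(mapSize, null, null, MAP_PRIORITY));
    }
    // Gate reducer re-requests on the map phase being done.
    if (preemptedMaps == 0 && runningMaps == 0) {
      for (int i = 0; i < pendingReduces; i++) {
        amrmClient.addContainerRequest(
            new ContainerRequest(reduceSize, null, null, REDUCE_PRIORITY));
      }
    }
  }
}
{code}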

bq. I still wonder about the logic of preferring lower container priorities 
regardless of how long they've been running. I'm not sure container priority 
always translates well to how important a container is to the application, and 
we might be better served by preferring to minimize total lost work regardless 
of container priority.

All of the options [~sunilg] suggests are fine heuristics, but the application 
has the best view of the tradeoffs. For example, a long-running container might 
be amortizing the cost of scheduling short-lived tasks, and might actually be 
cheap to kill. If the preemption message is not accurately reporting the 
contract the RM is enforcing, then we should absolutely fix that. But I think 
this is a MapReduce problem, ultimately.
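
For comparison, here is a standalone sketch of the "minimize total lost work" heuristic mentioned above; this is not the CapacityScheduler's victim-selection code, and every name in it is hypothetical:

{code:java}
import java.util.Comparator;
import java.util.List;

/**
 * Standalone sketch of a "minimize total lost work" ordering: candidates are
 * sorted by (elapsed runtime x memory) so the cheapest-to-kill containers are
 * sacrificed first, regardless of their container priority.
 */
public class LostWorkVictimOrder {

  /** Minimal view of a running container; all fields are illustrative. */
  static class Candidate {
    final String containerId;
    final int priority;        // application-assigned container priority
    final long memoryMb;       // allocated memory
    final long runningMillis;  // how long the container has been running

    Candidate(String containerId, int priority, long memoryMb, long runningMillis) {
      this.containerId = containerId;
      this.priority = priority;
      this.memoryMb = memoryMb;
      this.runningMillis = runningMillis;
    }

    /** Rough proxy for the work thrown away if this container is killed. */
    long lostWork() {
      return memoryMb * runningMillis;
    }
  }

  /** Order candidates so the least lost work is sacrificed first. */
  static void sortByLostWork(List<Candidate> candidates) {
    candidates.sort(Comparator.comparingLong(Candidate::lostWork));
  }
}
{code}

Even this ordering bakes in a guess (memory times runtime as the cost of a kill) that the application could second-guess, which is the point above.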

> CapacityScheduler preemption by container priority can be problematic for MapReduce
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6191
>                 URL: https://issues.apache.org/jira/browse/YARN-6191
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Jason Lowe
>
> A MapReduce job with thousands of reducers and just a couple of maps left to 
> go was running in a preemptable queue.  Periodically other queues would get 
> busy and the RM would preempt some resources from the job, but it _always_ 
> picked the job's map tasks first because they use the lowest priority 
> containers.  Even though the reducers had a shorter running time, most were 
> spared but the maps were always shot.  Since the map tasks ran for a longer 
> time than the preemption period, the job was in a perpetual preemption loop.


