Varun Saxena commented on YARN-4362:

bq. We should restrict assigning to partition 2.
I think it depends on how we treat a non-exclusive partition. 
Does it make sense to have a non-exclusive partition that is not accessible 
from any queue and still assign to it? I am not sure there is a use case for 
this. If we consider it a free-for-all kind of partition and it is all right 
to assign containers to it, then we need to fix the preemption logic, because 
such assignments will be preempted even though only one app is running.
However, if we consider that such a partition has no real meaning unless it 
is accessible from a queue, you are correct that we should not assign to it. 
In that case one could ask why a node is assigned to a partition that is not 
accessible from any queue at all, but I guess there can be scenarios where 
such a situation arises.
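
To make the second option concrete, here is a minimal sketch of the check it 
implies. This is not CapacityScheduler code: the class, the method names and 
the queue-to-label map (queues "a", "b" and "default") are hypothetical and 
only model the setup from the description, where queues are linked to 
partitions 1 and 3 and nothing is linked to partition 2.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in YARN.
class PartitionAssignmentSketch {

    // True if at least one queue lists the partition among its accessible labels.
    static boolean accessibleFromAnyQueue(String partition,
                                          Map<String, Set<String>> queueLabels) {
        for (Set<String> labels : queueLabels.values()) {
            if (labels.contains(partition)) {
                return true;
            }
        }
        return false;
    }

    // Models a request from a queue that has no access to the partition itself
    // (like the default-queue app in the report). Under the second option it may
    // borrow the partition only if the partition is non-exclusive AND some queue
    // can access it.
    static boolean mayBorrow(String partition, boolean exclusive,
                             Map<String, Set<String>> queueLabels) {
        return !exclusive && accessibleFromAnyQueue(partition, queueLabels);
    }

    public static void main(String[] args) {
        // Assumed queue layout: only partitions 1 and 3 are linked to queues.
        Map<String, Set<String>> queueLabels = Map.of(
                "a", Set.of("1"),
                "b", Set.of("3"),
                "default", Set.<String>of());

        for (String partition : new String[] {"1", "2", "3"}) {
            System.out.println("partition " + partition + " borrowable: "
                    + mayBorrow(partition, false, queueLabels));
        }
    }
}
```

With this layout the check would skip partition 2 and still allow borrowing 
from partitions 1 and 3. Under the first option the check goes away and the 
preemption logic has to change instead.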

Wangda, thoughts on this?

> Too many preemption activity when nodelabels are non exclusive
> --------------------------------------------------------------
>                 Key: YARN-4362
>                 URL: https://issues.apache.org/jira/browse/YARN-4362
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: Preemptedpartition.log, ProportionalDefaultQueue.log, 
> ProportionalPolicy.log, capacity-scheduler.xml
> Steps to reproduce
> ===============
> 1. Configure an HA cluster with 6 nodes and 3 partitions (1, 2, 3), all 
> non-exclusive.
> *Partition configuration is as follows*
> NMs 1 and 2 are mapped to label 1
> NM 3 is mapped to label 2
> NMs 4 and 5 are mapped to label 3
> NM 6 is in the DEFAULT partition
> In the capacity scheduler, the queues are linked only to partitions 1 and 3.
> NM 3 with label 2 is a backup node for any partition; its label will be 
> changed whenever required.
> Submit an application/job with 200 containers to the default queue.
> All containers that get assigned to partition 2 get preempted.
> The application/map task execution takes more time since 30-40 tasks get 
> assigned to partition 2, then get preempted, and all of them need to be 
> relaunched.

This message was sent by Atlassian JIRA
