Carlo Curino commented on YARN-2022:

Hi Sunil, I read the doc with [~chris.douglas] and [~subru], and we agree with 
the general direction, though you will have to be very careful to test this 
thoroughly as you are enforcing rather tricky invariants.

A couple of specific concerns:

1) The yarn.resourcemanager.monitor.capacity.preemption.am_container_limit you 
propose seems like overkill to me. I understand the intent to allow for a 
more tunable preemption of AMs, but I worry this parameter is so esoteric 
that people will not know how to use it. I personally would have to think very 
hard to figure out exactly what different settings of it would give me in 
terms of increasing/decreasing the chances of an AM surviving preemption, and 
in terms of improving overall cluster efficiency. I propose to enforce only 
the existing invariants (am-percentage, max-apps, etc.), as the 
semantics are crisper: the preemption policy will re-establish the invariants 
of the queue, no more, no less.

2) Preserving the correct user mix of jobs in the queue is also a good 
addition, though again I am worried this is tricky code to write, so I strongly 
encourage you to write many, many unit tests and to test the policy extensively 
on a cluster before it gets committed.
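
To make the ordering under discussion concrete, here is a minimal sketch of 
AM-last victim selection, applied to the scenario in the issue description 
quoted below. It is not the ProportionalCapacityPreemptionPolicy code: the 
Container class, the select_victims function, and the newest-application-first 
walk are hypothetical names and assumptions chosen only for illustration. The 
4GB to reclaim is what J4 asks for (2GB AM + 2 x 1GB maps).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    app: str
    mem_gb: int
    is_am: bool


def select_victims(apps, to_reclaim_gb):
    """Pick containers to preempt, touching AM containers only as a last resort.

    `apps` is a list of per-application container lists, oldest application
    first. Assumption: applications are visited newest first.
    """
    victims = []
    skipped_ams = []
    remaining = to_reclaim_gb

    # First pass: take only non-AM containers, newest application first.
    for containers in reversed(apps):
        for c in containers:
            if remaining <= 0:
                return victims
            if c.is_am:
                skipped_ams.append(c)
                continue
            victims.append(c)
            remaining -= c.mem_gb

    # Second pass: fall back to AM containers only if still short.
    for c in skipped_ams:
        if remaining <= 0:
            break
        victims.append(c)
        remaining -= c.mem_gb
    return victims


def job(name, maps):
    # One 2GB AM plus `maps` 1GB map containers, as in the issue description.
    return [Container(name, 2, True)] + [Container(name, 1, False)] * maps


queue_a = [job("J1", 4), job("J2", 4), job("J3", 2)]  # 16GB in total
victims = select_victims(queue_a, to_reclaim_gb=4)
print([(c.app, c.mem_gb, c.is_am) for c in victims])
```

Under these assumptions the 4GB comes from two maps of J3 and two maps of J2, 
and all three AMs survive. The unit tests I am asking for should cover exactly 
the cases this sketch glosses over, such as a queue whose remaining containers 
are all AMs.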

> Preempting an Application Master container can be kept as least priority when 
> multiple applications are marked for preemption by 
> ProportionalCapacityPreemptionPolicy
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2022
>                 URL: https://issues.apache.org/jira/browse/YARN-2022
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch
> Cluster Size = 16GB [2NM's]
> Queue A Capacity = 50%
> Queue B Capacity = 50%
> Consider 3 applications running in Queue A, which together have taken the 
> full cluster capacity. 
> J1 = 2GB AM + 1GB * 4 Maps
> J2 = 2GB AM + 1GB * 4 Maps
> J3 = 2GB AM + 1GB * 2 Maps
> Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
> Currently, in this scenario, job J3 will get killed, including its AM.
> It would be better if AM containers were given the lowest preemption priority 
> among multiple applications. 
> In this same scenario, map tasks from J3 and J2 can be preempted instead.
> Later, when the cluster is free, maps can be allocated to these jobs again.

This message was sent by Atlassian JIRA
