[jira] [Commented] (YARN-5864) Capacity Scheduler preemption for fragmented cluster

Carlo Curino (JIRA) Wed, 09 Nov 2016 23:40:09 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653315#comment-15653315
 ]


Carlo Curino commented on YARN-5864:
------------------------------------

[~wangda] I understand the need for this feature, but my general concern is 
that the collection of features in CS has very poorly defined interactions and, 
worse, the features violate each other's invariants left, right and center. For 
example, non-preemptable queues, when in use, break the fair over-capacity 
sharing semantics. Similarly, locality and node labels have heavy and not fully 
clear redundancies, and user-limits / app priorities / request priorities / 
container types / etc. further complicate this space. The mental model 
associated with the system is growing disproportionately for both users and 
operators, and this is a bad sign.

The new feature you propose seems to push us further down this slippery slope, 
where the semantics of what a tenant gets for his/her money are very unclear. 
Until this feature, the one invariant we had not yet violated was this: if I 
paid for capacity C, and I am within capacity C, my containers will not be 
disturbed (regardless of other tenants' desires). Now a queue may or may not be 
preempted within its capacity to accommodate some other queue's large 
containers. 

This opens up many abuses; one that comes to mind:
 # I request a large container on node N1, 
 # preemption kicks out some other tenant, 
 # I get the container on N1, 
 # I reduce the size of the container on N1 to a normal size, 
 # (I repeat until I have grabbed all the nodes I want).  
Through this little trick a nasty user can simply bully his/her way into the 
nodes he/she wants, regardless of the container size he/she really needs and 
his/her capacity standing w.r.t. other tenants. I am sure that if we squint 
hard enough there is a combination of configurations that can prevent this, but 
the general concern remains.
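
The trick can be made concrete with a toy model. This is only a sketch under 
stated assumptions, not CS code: Node, request_with_preemption, shrink and 
bully are hypothetical names, and the preemption policy (evict other tenants' 
containers on the target node until the request fits) is a deliberate 
simplification of what the proposal would allow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int
    # Maps a container id to (tenant, size). Hypothetical toy structure.
    containers: dict = field(default_factory=dict)

    def used(self):
        return sum(size for _, size in self.containers.values())

    def free(self):
        return self.capacity - self.used()


def request_with_preemption(node, tenant, cid, size):
    """Assumed policy: evict other tenants' containers until `size` fits."""
    victims = [c for c, (t, _) in node.containers.items() if t != tenant]
    while node.free() < size and victims:
        del node.containers[victims.pop()]
    if node.free() < size:
        return False
    node.containers[cid] = (tenant, size)
    return True


def shrink(node, cid, new_size):
    """Reduce a container to `new_size` (it never grows here)."""
    tenant, size = node.containers[cid]
    node.containers[cid] = (tenant, min(size, new_size))


def bully(nodes, tenant, big, normal):
    """Steps 1-5 above: ask big, trigger preemption, shrink, repeat."""
    for node in nodes:
        cid = f"{tenant}-{node.name}"
        if request_with_preemption(node, tenant, cid, big):
            shrink(node, cid, normal)


# The victim tenant holds only 10 per node, well within any 50:50 share.
nodes = [
    Node("N1", 50, {"v1": ("victim", 10)}),
    Node("N2", 50, {"v2": ("victim", 10)}),
]
bully(nodes, "bully", big=45, normal=5)
for n in nodes:
    print(n.name, n.containers)
```

In this toy run the bully ends up holding only 5 per node, yet the victim has 
lost every container despite never having exceeded its capacity.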


Bottom line: I don't want to stand in the way of progress and important 
features, but I don't see this ending well. 

I see two paths forward:
# A deep refactoring to make the code manageable, and an analysis that produces 
crisp semantics for each of the N! combinations of our features; this should 
ideally lead to cutting all "nice on the box" features that are rarely/never 
used, or that have undefined semantics. 
# Keep CS for legacy, and create a new <constraint language + solver>-based 
scheduler for which we can prove clear semantics, and that allows 
users/operators to have a simple mental model of what the system is supposed to 
deliver.

(2) would be my favorite option if I had a choice.


> Capacity Scheduler preemption for fragmented cluster 
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one 
> case in which a large container cannot be allocated even if all queues are 
> under their limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50 
> Two nodes: n1 and n2, each of them have 50 resource 
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45. 
> {code} 
> The container could be reserved on either host, but no preemption will 
> happen because all queues are under their limits. 
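
The arithmetic of the quoted example can be checked with a few lines. This is 
a minimal sketch, not scheduler code: all names below are hypothetical, and the 
rule "only queues over their limit are preemption candidates" is taken from the 
description above rather than from the CS implementation.

```python
# Numbers from the quoted example; every name here is illustrative.
QUEUE_CAPACITY = {"a": 50, "b": 50}
NODE_CAPACITY = {"n1": 50, "n2": 50}
usage = {("a", "n1"): 10, ("a", "n2"): 10}  # (queue, node) -> resource used
req_queue, req_size = "b", 45


def queue_used(queue):
    return sum(v for (q, _), v in usage.items() if q == queue)


def node_free(node):
    return NODE_CAPACITY[node] - sum(v for (_, n), v in usage.items() if n == node)


# The request keeps queue-b within its own capacity ...
within_capacity = queue_used(req_queue) + req_size <= QUEUE_CAPACITY[req_queue]
# ... and the cluster as a whole has enough free resource ...
total_free = sum(node_free(n) for n in NODE_CAPACITY)
# ... but no single node can host the container.
fits_somewhere = any(node_free(n) >= req_size for n in NODE_CAPACITY)
# No queue is over its limit, so there are no preemption candidates.
over_limit = [q for q in QUEUE_CAPACITY if queue_used(q) > QUEUE_CAPACITY[q]]
# Resource that would have to be freed on the best node.
shortfall = req_size - max(node_free(n) for n in NODE_CAPACITY)

print(within_capacity, total_free, fits_somewhere, over_limit, shortfall)
```

The cluster has 80 free in total but only 40 on each node, so the 45-unit 
container is 5 short on either node while no queue is over its limit.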



