[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653315#comment-15653315 ]
Carlo Curino commented on YARN-5864:
------------------------------------

[~wangda] I understand the need for this feature, but my general concern is that the features in CS have very poorly defined interactions, and worse, they violate each other's invariants left, right and center. For example, non-preemptable queues, when in use, break the fair over-capacity sharing semantics. Similarly, locality and node labels have heavy and not fully clear redundancies, and user-limits, app priorities, request priorities, container types, etc. further complicate this space. The mental model associated with the system is growing disproportionately for both users and operators, and this is a bad sign.

The new feature you propose seems to push us further down this slippery slope, where the semantics of what a tenant gets for his/her money are very unclear. Until this feature, the one invariant we had not yet violated was: if I paid for capacity C and I am within capacity C, my containers will not be disturbed (regardless of other tenants' desires). Now a queue may or may not be preempted within its capacity to accommodate some other queue's large containers. This opens up many abuses. One that comes to mind:
# I request a large container on node N1.
# Preemption kicks out some other tenant.
# I get the container on N1.
# I reduce the size of the container on N1 to a normal-size container.
# I repeat until I grab all the nodes I want.

Through this little trick, a nasty user can simply bully his/her way onto the nodes he/she wants, regardless of the container size he/she really needs and of his/her capacity standing w.r.t. other tenants. I am sure that if we squint hard enough there is a combination of configurations that prevents this, but the general concern remains. Bottom line: I don't want to stand in the way of progress and important features, but I don't see this ending well.
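The abuse sequence above can be illustrated with a toy simulation. This is a hypothetical model, not the Capacity Scheduler's actual preemption code: the `Node` class, the `request_large_then_shrink` helper, the node size of 50 and the rule "evict other tenants' containers on the target node until the large request fits" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: fixed capacity plus a list of (tenant, size) containers."""
    name: str
    capacity: int
    containers: list = field(default_factory=list)

    def used(self) -> int:
        return sum(size for _, size in self.containers)

    def free(self) -> int:
        return self.capacity - self.used()


def request_large_then_shrink(node: Node, attacker: str, normal_size: int) -> list:
    """Hypothetical model of the trick: ask for a container as large as the
    rest of the node, let (assumed) preemption evict other tenants, then
    shrink to a normal size. Returns the evicted (tenant, size) containers."""
    attacker_used = sum(s for t, s in node.containers if t == attacker)
    large = node.capacity - attacker_used  # step 1: request the whole remaining node
    evicted = []
    # Step 2 (assumed rule): evict other tenants' containers until the request fits.
    while node.free() < large:
        victim = next(c for c in node.containers if c[0] != attacker)
        node.containers.remove(victim)
        evicted.append(victim)
    # Steps 3 and 4: the large container is granted, then immediately shrunk.
    node.containers.append((attacker, normal_size))
    return evicted


# Tenant "b" runs small containers, all within its capacity.
nodes = [Node("n1", 50, [("b", 10), ("b", 10)]), Node("n2", 50, [("b", 10)])]
for n in nodes:  # step 5: repeat on every node the attacker wants
    request_large_then_shrink(n, attacker="a", normal_size=5)
print([(n.name, n.containers) for n in nodes])
# [('n1', [('a', 5)]), ('n2', [('a', 5)])]
```

In this toy model the attacker ends up holding only 10 units in total, yet tenant "b" has lost every container on both nodes even though it never exceeded its capacity.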
I see two paths forward:
# A deep refactoring to make the code manageable, plus an analysis that produces crisp semantics for each of the N! combinations of our features. Ideally this should lead to cutting all "nice on the box" features that are rarely or never used, or that have undefined semantics.
# Keep CS for legacy, and create a new <constraint language + solver>-based scheduler for which we can prove clear semantics, and which gives users/operators a simple mental model of what the system is supposed to deliver.

(2) would be my favorite option if I had a choice.

> Capacity Scheduler preemption for fragmented cluster
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one case in which a large container cannot be allocated even though all queues are under their limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50
> Two nodes: n1 and n2, each of them have 50 resource
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45.
> {code}
> The container could be reserved on either host, but no preemption will happen because all queues are under their limits.
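The arithmetic of the fragmentation example can be checked with a short sketch. The single-dimension resource model and the helper names `fits_anywhere` and `over_capacity_queues` are hypothetical, chosen for illustration; this does not reproduce the Capacity Scheduler's actual allocation or preemption logic.

```python
def fits_anywhere(node_free: dict, request: int) -> bool:
    """True if some single node has enough free resource for the request."""
    return any(free >= request for free in node_free.values())


def over_capacity_queues(usage: dict, capacity: dict) -> list:
    """Queues using more than their guaranteed capacity, i.e. the only
    candidates for (assumed) capacity-based preemption."""
    return [q for q in usage if usage[q] > capacity[q]]


node_size = 50
node_used = {"n1": 10, "n2": 10}              # queue-a: 10 on n1 and 10 on n2
node_free = {n: node_size - u for n, u in node_used.items()}  # 40 free on each
total = node_size * len(node_used)           # 100 in the cluster
capacity = {"a": total // 2, "b": total // 2}  # 50:50 split
usage = {"a": 20, "b": 0}
request = 45                                 # queue-b's single container

print(sum(node_free.values()) >= request)    # True: 80 free cluster-wide
print(fits_anywhere(node_free, request))     # False: at most 40 free per node
print(over_capacity_queues(usage, capacity)) # []: no queue is over capacity
```

Even after the 45-unit container is granted, queue-b would stay within its 50-unit guarantee, yet no single node can host it, and with no queue over capacity there is no victim for capacity-based preemption to select.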