[
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653315#comment-15653315
]
Carlo Curino commented on YARN-5864:
------------------------------------
[~wangda] I understand the need for this feature, but my general concern is
that the collection of features in CS has very poorly defined interactions,
and worse, the features violate each other's invariants left, right, and
center. For example, non-preemptable queues, when in use, break the fair
over-capacity sharing semantics. Similarly, locality and node labels have heavy
and not fully clear redundancies, and user-limits / app priorities / request
priorities / container types / etc. further complicate this space. The
mental model associated with the system is growing disproportionately for both
users and operators, and this is a bad sign.
The new feature you propose seems to push us further down this slippery slope,
where the semantics of what a tenant gets for his/her money are very
unclear. Until this feature, the one invariant we had not yet violated
was: if I paid for capacity C, and I am within capacity C, my containers
will not be disturbed (regardless of other tenants' desires). Now a queue may or
may not be preempted within its capacity to accommodate some other queue's large
containers.
This opens up many abuses. One that comes to mind:
# I request a large container on node N1,
# preemption kicks out some other tenant,
# I get the container on N1,
# I reduce the size of the container on N1 to a normal container size,
# (I repeat until I grab all the nodes I want).
Through this little trick a nasty user can simply bully his/her way into the
nodes he/she wants, regardless of the container size he/she really needs and
his/her capacity standing w.r.t. other tenants. I am sure that if we squint hard
enough there is a combination of configurations that can prevent this, but the
general concern remains.
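
To make the trick concrete, here is a toy simulation of the steps listed above. It is only a sketch under stated assumptions, not Capacity Scheduler code: the node size and container sizes are borrowed from the issue's example, the preemption rule (evict other tenants' containers until the large request fits, even if they are within capacity) is my simplified reading of the proposal, and the names `Node`, `usage`, and `grab_node` are hypothetical.

```python
from dataclasses import dataclass, field

NODE_SIZE = 50  # resources per node (assumed, mirrors the issue's example)
LARGE = 45      # size of the "large" container the bully requests (assumed)
NORMAL = 5      # size the bully shrinks to afterwards (assumed)


@dataclass
class Node:
    name: str
    # Each container is a two-element list: [tenant, size].
    containers: list = field(default_factory=list)

    def free(self):
        return NODE_SIZE - sum(size for _, size in self.containers)


def usage(nodes, tenant):
    """Total resources held by one tenant across all nodes."""
    return sum(size for n in nodes for t, size in n.containers if t == tenant)


def grab_node(node, bully):
    """One round of the trick on a single node (toy preemption rule)."""
    # Steps 1-2: the large request does not fit, so other tenants'
    # containers are preempted until it does.
    while node.free() < LARGE:
        victims = [c for c in node.containers if c[0] != bully]
        if not victims:
            return False  # nothing left to preempt on this node
        node.containers.remove(victims[0])
    # Step 3: the bully gets the large container.
    container = [bully, LARGE]
    node.containers.append(container)
    # Step 4: the bully shrinks it to a normal size and keeps the foothold.
    container[1] = NORMAL
    return True


# The victim uses 10 on each node, i.e. 20 in total, well within a capacity of 50.
nodes = [Node("n1", [["victim", 10]]), Node("n2", [["victim", 10]])]
victim_before = usage(nodes, "victim")

# Step 5: repeat on every node the bully wants.
for n in nodes:
    grab_node(n, "bully")

victim_after = usage(nodes, "victim")
bully_after = usage(nodes, "bully")
print(victim_before, victim_after, bully_after)
```

In this toy model the victim never exceeded its capacity yet loses every container, while the bully ends up holding only small containers on the nodes it targeted.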
Bottom line: I don't want to stand in the way of progress and important
features, but I don't see this ending well.
I see two paths forward:
# A deep refactoring to make the code manageable, and an analysis that produces
crisp semantics associated with each of the N! combinations of our
features. This should ideally lead to cutting all "nice on the box" features
that are rarely/never used, or have undefined semantics.
# Keep CS for legacy, and create a new <constraint language + solver>-based
scheduler for which we can prove clear semantics, and that allows
users/operators to have a simple mental model of what the system is supposed to
deliver.
(2) would be my favorite option if I had a choice.
> Capacity Scheduler preemption for fragmented cluster
> -----------------------------------------------------
>
> Key: YARN-5864
> URL: https://issues.apache.org/jira/browse/YARN-5864
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one case
> where a large container cannot be allocated even if all queues are under their
> limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50
> Two nodes: n1 and n2, each of them have 50 resource
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45.
> {code}
> The container could be reserved on either host, but no preemption will
> happen because all queues are under their limits.
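
The arithmetic behind the quoted example can be checked with a few lines. This is a minimal sketch that models only the numbers given in the description. The variable and function names are hypothetical, and it does not reflect how the scheduler actually tracks resources.

```python
NODE_SIZE = 50                         # each node has 50 resource
queue_capacity = {"a": 50, "b": 50}    # capacity split 50:50
# node -> queue -> resources in use (queue-a uses 10 on n1 and 10 on n2)
used = {"n1": {"a": 10}, "n2": {"a": 10}}
request = 45                           # queue-b asks for one container of 45


def free_on(node):
    """Free resources on a single node."""
    return NODE_SIZE - sum(used[node].values())


def queue_usage(queue):
    """Resources a queue holds across all nodes."""
    return sum(per_node.get(queue, 0) for per_node in used.values())


total_free = sum(free_on(n) for n in used)
fits_somewhere = any(free_on(n) >= request for n in used)
over_capacity = [q for q in queue_capacity if queue_usage(q) > queue_capacity[q]]
b_within_capacity = queue_usage("b") + request <= queue_capacity["b"]

print("free per node:", {n: free_on(n) for n in used})
print("total free:", total_free)
print("request fits on a single node:", fits_somewhere)
print("queues over capacity:", over_capacity)
print("queue-b would stay within capacity:", b_within_capacity)
```

The cluster has 80 free in total and queue-b would stay within its capacity, but each node has only 40 free, so the request of 45 fits nowhere. Since no queue is over its capacity, a preemption policy that only targets over-capacity queues has nothing to preempt.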
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)