Craig Welch commented on YARN-3318:

Looking again at using ResourceUsage instead of the initial use of application 
demand and consumption, while it may be preferable for future cases like queues 
with node label aware policies, there are deficiencies which need to be 
addressed to use it for the initial case, and it makes it more complex to do 
so.  In fact, for the initial case, this approach is inferior.

ResourceUsage is still a bit rough and incomplete: get does not properly handle 
the ANY/ALL case, which is what we need for application fairness - otherwise, 
applications whose resource requests are labeled something other than NO_LABEL 
will be erroneously preferred for scheduling in the fair case.  The prior 
approach was working with full consumption and demand and did not have this 
issue and did not require additional change to support fairness properly.

Even supporting ANY/ALL in ResourceUsage is a little tricky, as I see no reason 
why someone could not set values on ResourceUsage using the ANY label 
definition, and then there is a question as to what is the proper behavior for 
an ANY get request - should it sum all the values for all labels (which is, in 
some sense, correct), or just return the previously set ANY value? Should we 
disallow setting ANY? (That seems a bit arbitrary...) My suggestion is that we 
introduce explicit getAll(Used, Pending, etc.) methods, not an ALL 
CommonNodeLabelsManager constant, which I think just moves/replicates the 
existing problem.  There would be no corresponding setAll.  getAll(XYZ) would 
iterate all labels in ResourceUsage for the passed ResourceType and return a 

For OrderingPolicy, the values should be cached on ResourceUsage instead of in 
SchedulableEntity for cases where that is needed - cloning an entire 
ResourceUsage will be expensive, inefficient, and unnecessary.  We could add a 
separate cache collection in ResourceUsage, but I think it would actually be 
better to add values to the ResourceType enum, SCHEDULING_USED, 

When updating the cached value for Used, OrderingPolicy would then call 
getAllUsed() on ResourceUsage and set the resulting value with set (ANY node 
label expression, SCHEDULING_USED ResourceType), and for demand, 
getAllPending() and then set ANY node label expression, SCHEDULING_PENDING

When getting the cached value, OrderingPolicy would call getUsed(ANY 
nlexpression, SCHEDULING_USED ResourceType) and for pending, getPending(ANY, 

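As a rough illustration of the getAll / cached-value idea described above, here 
is a minimal Java sketch.  It is not the actual ResourceUsage class: the class 
name ResourceUsageSketch, the use of plain long values instead of Resource 
objects, the "*" and "" label strings, and the refreshSchedulingCache helper are 
all assumptions made for the example.  It also assumes getAll sums the values 
across every label, which is only one of the behaviors discussed above.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ResourceUsage; names and types are illustrative only.
public class ResourceUsageSketch {
    // Assumed placeholders for the ANY node label expression and NO_LABEL.
    static final String ANY = "*";
    static final String NO_LABEL = "";

    // SCHEDULING_USED / SCHEDULING_PENDING are the cache slots proposed in the comment.
    enum ResourceType { USED, PENDING, SCHEDULING_USED, SCHEDULING_PENDING }

    // label -> (type -> amount); a long keeps the sketch short.
    private final Map<String, EnumMap<ResourceType, Long>> usages = new HashMap<>();

    void set(String label, ResourceType type, long value) {
        usages.computeIfAbsent(label, k -> new EnumMap<>(ResourceType.class))
              .put(type, value);
    }

    long get(String label, ResourceType type) {
        EnumMap<ResourceType, Long> byType = usages.get(label);
        return byType == null ? 0L : byType.getOrDefault(type, 0L);
    }

    // Assumption: getAll sums the given type over every label; there is no setAll.
    long getAll(ResourceType type) {
        long total = 0L;
        for (EnumMap<ResourceType, Long> byType : usages.values()) {
            total += byType.getOrDefault(type, 0L);
        }
        return total;
    }

    // Hypothetical helper: what an OrderingPolicy might do when it updates
    // its cached values, storing them under the ANY label expression.
    void refreshSchedulingCache() {
        set(ANY, ResourceType.SCHEDULING_USED, getAll(ResourceType.USED));
        set(ANY, ResourceType.SCHEDULING_PENDING, getAll(ResourceType.PENDING));
    }

    public static void main(String[] args) {
        ResourceUsageSketch usage = new ResourceUsageSketch();
        usage.set(NO_LABEL, ResourceType.USED, 1024L);
        usage.set("gpu", ResourceType.USED, 2048L);
        usage.set("gpu", ResourceType.PENDING, 512L);
        usage.refreshSchedulingCache();
        System.out.println(usage.get(ANY, ResourceType.SCHEDULING_USED));
        System.out.println(usage.get(ANY, ResourceType.SCHEDULING_PENDING));
    }
}
```

Because the cached totals live under separate ResourceType values, a later 
change to a per-label USED value does not alter the value the policy compares 
on until refreshSchedulingCache is called again, and no clone of the whole 
structure is needed.
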
I'm inclined to roll forward with using ResourceUsage despite this additional 
scope to ease future use cases, but we need to be very careful about continuing 
to pull in additional change and complexity which is not required right now, 
and should avoid doing so again this iteration.  It's good to aim for a stable 
API, but it's also good to complete the initial functionality, and to realize 
that it's not possible to anticipate all future needs and that it is highly 
likely there will be some change to APIs like this as the system evolves.

> Create Initial OrderingPolicy Framework and FifoOrderingPolicy
> --------------------------------------------------------------
>                 Key: YARN-3318
>                 URL: https://issues.apache.org/jira/browse/YARN-3318
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>         Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, 
> YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, 
> YARN-3318.53.patch, YARN-3318.56.patch
> Create the initial framework required for using OrderingPolicies and an 
> initial FifoOrderingPolicy

This message was sent by Atlassian JIRA
