Craig Welch created YARN-2848:
---------------------------------
Summary: (FICA) Applications should maintain an application
specific 'cluster' resource to calculate headroom and userlimit
Key: YARN-2848
URL: https://issues.apache.org/jira/browse/YARN-2848
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
Likely solutions to [YARN-1680] (properly handling node and rack blacklisting
with cluster level node additions and removals) will entail managing an
application-level "slice" of the cluster resource available to the application
for use in accurately calculating the application headroom and user limit.
This approach assumes that events which affect this resource occur less
frequently than headroom, userlimit, etc. need to be calculated (a valid
assumption, given that the calculation occurs on every allocation heartbeat).
Given that, the application should (with assistance from cluster-level code...)
detect changes to the composition of the cluster (node addition, removal) and,
when those have occurred, calculate an application-specific cluster resource by
comparing cluster nodes to its own blacklist (both rack and individual node). I
think it makes sense to include nodelabel considerations in this calculation,
as it will be efficient to do both at the same time, and the single resource
value reflecting both constraints could then be used for efficient, frequent
headroom and userlimit calculations while remaining highly accurate. The
application would need to be made aware of the nodelabel changes it is
interested in (the addition or removal of labels of interest to the application
to/from nodes). For this purpose, the application submission's nodelabel
expression would be used to determine the nodelabel impact on the resource used
to calculate userlimit and headroom. (Cases where the application elects to
request resources without using the application-level label expression are out
of scope for this, but for the common use case of an application which uses a
particular expression throughout, userlimit and headroom would be accurate.)
This could also provide an overall mechanism for handling application-specific
resource constraints which might be added in the future.
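
As a rough illustration of the idea above, here is a minimal Java sketch of an
application-level view which caches its own "cluster" resource and recomputes
it only when the cluster composition, node labels, or the application's
blacklist change. All of the names here (AppClusterResourceSketch, Node,
Cluster, AppView, relabel, and so on) are hypothetical and are not actual
CapacityScheduler classes. The label expression is simplified to a single
label, and treating a null label as matching every node is an assumption made
only for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are actual YARN classes.
final class AppClusterResourceSketch {

    /** Minimal stand-in for a cluster node: rack, labels, and capacity. */
    static final class Node {
        final String rack;
        final Set<String> labels;
        final long memoryMb;
        final int vcores;

        Node(String rack, Set<String> labels, long memoryMb, int vcores) {
            this.rack = rack;
            this.labels = new HashSet<>(labels);
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Cluster-level view; the version is bumped on every node or label change. */
    static final class Cluster {
        private final Map<String, Node> nodes = new HashMap<>();
        private long version = 0;

        void addNode(String host, Node node) {
            nodes.put(host, node);
            version++;
        }

        void removeNode(String host) {
            if (nodes.remove(host) != null) {
                version++;
            }
        }

        /** Replaces the labels on an existing node (assumed path for label changes). */
        void relabel(String host, Set<String> labels) {
            Node old = nodes.get(host);
            if (old != null) {
                nodes.put(host, new Node(old.rack, labels, old.memoryMb, old.vcores));
                version++;
            }
        }
    }

    /** Per-application "slice" of the cluster, recomputed only when stale. */
    static final class AppView {
        private final Cluster cluster;
        private final String label; // simplification: one label; null matches every node
        private final Set<String> blacklistedHosts = new HashSet<>();
        private final Set<String> blacklistedRacks = new HashSet<>();
        private long seenVersion = -1;
        private boolean blacklistChanged = true;
        private long memoryMb;
        private int vcores;

        AppView(Cluster cluster, String label) {
            this.cluster = cluster;
            this.label = label;
        }

        void blacklistHost(String host) {
            if (blacklistedHosts.add(host)) {
                blacklistChanged = true;
            }
        }

        void blacklistRack(String rack) {
            if (blacklistedRacks.add(rack)) {
                blacklistChanged = true;
            }
        }

        /** Cheap on the per-heartbeat path: returns the cached value unless stale. */
        long memoryMb() {
            refreshIfStale();
            return memoryMb;
        }

        int vcores() {
            refreshIfStale();
            return vcores;
        }

        private void refreshIfStale() {
            if (!blacklistChanged && seenVersion == cluster.version) {
                return; // nothing relevant changed since the last calculation
            }
            long mem = 0;
            int cores = 0;
            for (Map.Entry<String, Node> e : cluster.nodes.entrySet()) {
                Node n = e.getValue();
                boolean blacklisted = blacklistedHosts.contains(e.getKey())
                        || blacklistedRacks.contains(n.rack);
                boolean labelMatches = label == null || n.labels.contains(label);
                if (!blacklisted && labelMatches) {
                    mem += n.memoryMb;
                    cores += n.vcores;
                }
            }
            memoryMb = mem;
            vcores = cores;
            seenVersion = cluster.version;
            blacklistChanged = false;
        }
    }
}
```

In this sketch the headroom and userlimit calculations would read memoryMb()
and vcores() on each heartbeat, and the full scan over nodes happens only after
a node addition, removal, relabel, or blacklist update. A real implementation
would also need to handle blacklist removals and full label expressions, which
are left out here.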
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)