[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208941#comment-14208941
 ] 

Craig Welch commented on YARN-2848:
-----------------------------------

bq. IIUC, this JIRA is to tackle the cases which app has some special 
requirements on resource requests (including but not limited to black list 
nodes, node labels expression, etc.) and RM want to return headroom considering 
such factors to AM.

Well, yes... although the extent to which they are "special" isn't clear. 
[YARN-1680] surfaces this as a bug (something of a design miss...) for 
blacklisting of resources, which has been around for some time. Node labels 
were added recently, but with an eye to being used by processes which will 
want accurate headroom, userlimit, etc. So the problem already exists; it's 
not "something new" we're choosing to introduce. It's rather a way of 
resolving inconsistencies which exist because some functionality is perhaps 
not fully complete wrt the rest of the system. Insofar as we want 
applications to work with constraints on the nodes they use, we will need to 
solve this problem in some way, or do away with headroom and / or user limits 
as such, which is not a very attractive choice.

bq. My major concern of this is it will bring more computation complexity in RM 
side – we already have very heavy computation when trying to allocate 
containers, like locality/hierarchy-of-queues/user-limit/headroom/node-labels

The idea is to minimize the calculation needed during allocation by adjusting 
resources only in response to external events, which should be relatively 
infrequent with respect to any given application.

bq.  if we try to resolve the problem by handling events (such as node label 
change, black node list change, etc.) at app level, it will be very 
problematic, since some of the operations cannot even be done in O( n ) time.
bq. So I think if some operation has a complexity of O( n ), (n can be as large 
as #app in the cluster), we should be very careful with such an operation.

So, the suggestion is not to have the activity which accepts a node label 
change, or a node addition to or removal from the cluster, synchronously notify 
all applications of that change. Rather, applications check for changes 
relevant to them: changes to the nodes held by a label they care about (label 
level info), and node additions or removals relevant to their blacklisting 
(cluster level info). An application adjusts its resource view only when it 
determines it is necessary to do so. At the cluster level, handling the 
addition or removal of a node, or a change to the nodes for a node label, 
requires nothing more than recording an indication of "last change" for the 
resource. Applications simply check the "change indications" they care about 
and take action as needed. This should be as efficient and lightweight as 
possible, and would not impose any O ( n ) (where n=#app in cluster) operations 
on any single/synchronous code path.
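
To make the "change indication" idea concrete, here is a minimal sketch. It is 
not CapacityScheduler code: ChangeStamps, AppResourceView and every method name 
below are hypothetical, and the actual recalculation is reduced to a counter. 
The point is only that a cluster-side change is a single counter increment, 
while each application compares the stamps it cares about on its own path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ChangeStampSketch {

    /** Cluster side (hypothetical): a change only bumps a counter; no app is notified. */
    static final class ChangeStamps {
        private final AtomicLong clusterStamp = new AtomicLong();
        private final Map<String, AtomicLong> labelStamps = new ConcurrentHashMap<>();

        /** Called when a node joins or leaves the cluster. O(1). */
        void nodeAddedOrRemoved() {
            clusterStamp.incrementAndGet();
        }

        /** Called when the set of nodes carrying a label changes. O(1). */
        void labelMembershipChanged(String label) {
            labelStamps.computeIfAbsent(label, k -> new AtomicLong()).incrementAndGet();
        }

        long clusterStamp() {
            return clusterStamp.get();
        }

        long labelStamp(String label) {
            AtomicLong stamp = labelStamps.get(label);
            return stamp == null ? 0L : stamp.get();
        }
    }

    /** Application side (hypothetical): polls only the stamps it cares about. */
    static final class AppResourceView {
        private final String label;
        private long seenClusterStamp = -1;
        private long seenLabelStamp = -1;
        int recomputations;

        AppResourceView(String label) {
            this.label = label;
        }

        /** Returns true if the view was stale and had to be recalculated. */
        boolean refreshIfStale(ChangeStamps stamps) {
            // Read the stamps before recalculating, so that a change arriving
            // during the recalculation is picked up by the next check.
            long cluster = stamps.clusterStamp();
            long labelled = stamps.labelStamp(label);
            if (cluster == seenClusterStamp && labelled == seenLabelStamp) {
                return false;
            }
            seenClusterStamp = cluster;
            seenLabelStamp = labelled;
            recomputations++; // placeholder for the real resource recalculation
            return true;
        }
    }

    public static void main(String[] args) {
        ChangeStamps stamps = new ChangeStamps();
        AppResourceView app = new AppResourceView("gpu");
        app.refreshIfStale(stamps);             // first check always recalculates
        app.refreshIfStale(stamps);             // nothing changed, no work
        stamps.labelMembershipChanged("ssd");   // a label this app does not use
        app.refreshIfStale(stamps);             // still no work
        stamps.nodeAddedOrRemoved();            // cluster-level change
        app.refreshIfStale(stamps);             // recalculates
        System.out.println("recomputations = " + app.recomputations);
    }
}
```

In this sketch a change to an unrelated label costs the application nothing, 
and no code path ever iterates over all applications.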


> (FICA) Applications should maintain an application specific 'cluster' 
> resource to calculate headroom and userlimit
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2848
>                 URL: https://issues.apache.org/jira/browse/YARN-2848
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>
> Likely solutions to [YARN-1680] (properly handling node and rack blacklisting 
> with cluster level node additions and removals) will entail managing an 
> application-level "slice" of the cluster resource available to the 
> application for use in accurately calculating the application headroom and 
> user limit.  There is an assumption that events which impact this resource 
> will occur less frequently than the need to calculate headroom, userlimit, 
> etc (which is a valid assumption given that the calculation occurs on every 
> allocation heartbeat). 
>  Given that, the application should (with assistance from cluster-level 
> code...) detect changes to the composition of the cluster (node addition, 
> removal) and when those have occurred, calculate an application specific 
> cluster resource by comparing cluster nodes to its own blacklist (both rack 
> and individual node).  I think it makes sense to include nodelabel 
> considerations into this calculation as it will be efficient to do both at 
> the same time and the single resource value reflecting both constraints could 
> then be used for efficient frequent headroom and userlimit calculations while 
> remaining highly accurate.  The application would need to be made aware of 
> nodelabel changes it is interested in (the application or removal of labels 
> of interest to the application to/from nodes).  For this purpose, the 
> application submission's nodelabel expression would be used to determine the 
> nodelabel impact on the resource used to calculate userlimit and headroom.  
> (Cases where the application elected to request resources not using the 
> application level label expression are out of scope for this, but for the 
> common usecase of an application which uses a particular expression 
> throughout, userlimit and headroom would be accurate.)  This could also provide 
> an overall mechanism for handling application-specific resource constraints 
> which might be added in the future.
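
As an illustration of the calculation described in the issue above, the 
following is a rough sketch under simplifying assumptions: the Node class and 
appClusterMemoryMb are hypothetical, resources are reduced to memory in MB 
rather than YARN's Resource type, and the label expression is treated as a 
single label where null means "no label constraint". It computes the 
application specific cluster resource in one pass over the nodes, applying the 
node blacklist, the rack blacklist and the label together.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AppClusterSliceSketch {

    /** Hypothetical, simplified node record (memory only). */
    static final class Node {
        final String host;
        final String rack;
        final long memoryMb;
        final Set<String> labels;

        Node(String host, String rack, long memoryMb, String... labels) {
            this.host = host;
            this.rack = rack;
            this.memoryMb = memoryMb;
            this.labels = new HashSet<>(Arrays.asList(labels));
        }
    }

    /**
     * Sums the memory of every node the application could actually use:
     * not blacklisted by host, not on a blacklisted rack, and carrying the
     * application's label (a null label means no label constraint here).
     */
    static long appClusterMemoryMb(List<Node> nodes,
                                   Set<String> blacklistedHosts,
                                   Set<String> blacklistedRacks,
                                   String label) {
        long total = 0;
        for (Node node : nodes) {
            if (blacklistedHosts.contains(node.host)) {
                continue;
            }
            if (blacklistedRacks.contains(node.rack)) {
                continue;
            }
            if (label != null && !node.labels.contains(label)) {
                continue;
            }
            total += node.memoryMb;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("n1", "rackA", 8192, "gpu"),
            new Node("n2", "rackA", 4096),
            new Node("n3", "rackB", 8192, "gpu"),
            new Node("n4", "rackC", 16384, "gpu"));
        Set<String> hosts = new HashSet<>(Arrays.asList("n1"));
        Set<String> racks = new HashSet<>(Arrays.asList("rackB"));
        System.out.println(appClusterMemoryMb(nodes, hosts, racks, "gpu"));
    }
}
```

The result would be cached by the application and recalculated only when a 
relevant change is detected, so the per-heartbeat headroom and userlimit 
calculations can use a single precomputed value.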



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
