[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2848:
------------------------------
    Description: Likely solutions to [YARN-1680] (properly handling node and 
rack blacklisting with cluster level node additions and removals) will entail 
managing an application-level "slice" of the cluster resource available to the 
application for use in accurately calculating the application headroom and user 
limit.  There is an assumption that events which impact this resource will 
occur less frequently than the need to calculate headroom, userlimit, etc. 
(a valid assumption, given that those calculations occur on every allocation 
heartbeat).  
Given that, the application should (with assistance from cluster-level code...) 
detect changes to the composition of the cluster (node addition, removal) and 
when those have occurred, calculate an application specific cluster resource by 
comparing cluster nodes to its own blacklist (both rack and individual node).  
I think it makes sense to include nodelabel considerations in this 
calculation as it will be efficient to do both at the same time and the single 
resource value reflecting both constraints could then be used for efficient 
frequent headroom and userlimit calculations while remaining highly accurate.  
The application would need to be made aware of nodelabel changes it is 
interested in (the application or removal of labels of interest to the 
application to/from nodes).  For this purpose, the application submission's 
nodelabel expression would be used to determine the nodelabel impact on the 
resource used to calculate userlimit and headroom.  (Cases where the application 
elected to request resources not using the application level label expression 
are out of scope for this, but for the common use case of an application which 
uses a particular expression throughout, userlimit and headroom would be 
accurate.)  This could also provide an overall mechanism for handling 
application-specific resource constraints which might be added in the future.  
(was: Likely solutions to [YARN-1680] (properly handling node and rack 
blacklisting with cluster level node additions and removals) will entail 
managing an application-level "slice" of the cluster resource available to the 
application for use in accurately calculating the application headroom and user 
limit.  There is an assumption that events which impact this resource will 
change less frequently than the need to calculate headroom, userlimit, etc 
(which is a valid assumption given that occurs per-allocation heartbeat).  
Given that, the application should (with assistance from cluster-level code...) 
detect changes to the composition of the cluster (node addition, removal) and 
when those have occurred, calculate a application specific cluster resource by 
comparing cluster nodes to it's own blacklist (both rack and individual node).  
I think it makes sense to include nodelabel considerations into this 
calculation as it will be efficient to do both at the same time and the single 
resource value reflecting both constraints could then be used for efficient 
frequent headroom and userlimit calculations while remaining highly accurate.  
The application would need to be made aware of nodelabel changes it is 
interested in (the application or removal of labels of interest to the 
application to/from nodes).  For this purpose, the application submissions's 
nodelabel expression would be used to determine the nodelabel impact on the 
resource used to calculate userlimit and headroom (Cases where application 
elected to request resources not using the application level label expression 
are out of scope for this - but for the common usecase of an application which 
uses a particular expression throughout, userlimit and headroom would be 
accurate) This could also provide an overall mechanism for handling 
application-specific resource constraints which might be added in the future.)

> (FICA) Applications should maintain an application specific 'cluster' 
> resource to calculate headroom and userlimit
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2848
>                 URL: https://issues.apache.org/jira/browse/YARN-2848
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>
> Likely solutions to [YARN-1680] (properly handling node and rack blacklisting 
> with cluster level node additions and removals) will entail managing an 
> application-level "slice" of the cluster resource available to the 
> application for use in accurately calculating the application headroom and 
> user limit.  There is an assumption that events which impact this resource 
> will occur less frequently than the need to calculate headroom, userlimit, 
> etc. (a valid assumption, given that those calculations occur on every 
> allocation heartbeat). 
>  Given that, the application should (with assistance from cluster-level 
> code...) detect changes to the composition of the cluster (node addition, 
> removal) and when those have occurred, calculate an application specific 
> cluster resource by comparing cluster nodes to its own blacklist (both rack 
> and individual node).  I think it makes sense to include nodelabel 
> considerations in this calculation as it will be efficient to do both at 
> the same time and the single resource value reflecting both constraints could 
> then be used for efficient frequent headroom and userlimit calculations while 
> remaining highly accurate.  The application would need to be made aware of 
> nodelabel changes it is interested in (the application or removal of labels 
> of interest to the application to/from nodes).  For this purpose, the 
> application submission's nodelabel expression would be used to determine the 
> nodelabel impact on the resource used to calculate userlimit and headroom.  
> (Cases where the application elected to request resources not using the 
> application level label expression are out of scope for this, but for the 
> common use case of an application which uses a particular expression 
> throughout, userlimit and headroom would be accurate.)  This could also provide 
> an overall mechanism for handling application-specific resource constraints 
> which might be added in the future.
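
The approach described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not the CapacityScheduler
implementation: AppClusterSlice, Node, Cluster and every method name are
hypothetical, memory stands in for the full resource vector, and the label
expression is simplified to a single required label. The point it shows is
that the per-application value is recomputed only when the cluster version or
the blacklist has changed, so the per-heartbeat lookup stays cheap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names are real YARN classes. */
final class AppClusterSlice {

    /** Minimal stand-in for a cluster node. */
    static final class Node {
        final String host;
        final String rack;
        final Set<String> labels;
        final long memoryMb;

        Node(String host, String rack, Set<String> labels, long memoryMb) {
            this.host = host;
            this.rack = rack;
            this.labels = labels;
            this.memoryMb = memoryMb;
        }
    }

    /** Stand-in for cluster-level state; the version bumps on each node change. */
    static final class Cluster {
        private final List<Node> nodes = new ArrayList<>();
        private long version = 0;

        void addNode(Node n) {
            nodes.add(n);
            version++;
        }

        void removeNode(String host) {
            if (nodes.removeIf(n -> n.host.equals(host))) {
                version++;
            }
        }

        long version() { return version; }
        List<Node> nodes() { return nodes; }
    }

    private final Set<String> blacklistedHosts = new HashSet<>();
    private final Set<String> blacklistedRacks = new HashSet<>();
    private final String requiredLabel; // assumption: null means "no label constraint"
    private long cachedMemoryMb = 0;
    private long seenVersion = -1;
    private boolean blacklistChanged = true;
    private int recomputations = 0;

    AppClusterSlice(String requiredLabel) {
        this.requiredLabel = requiredLabel;
    }

    void blacklistHost(String host) {
        if (blacklistedHosts.add(host)) blacklistChanged = true;
    }

    void blacklistRack(String rack) {
        if (blacklistedRacks.add(rack)) blacklistChanged = true;
    }

    /** Called on every heartbeat; walks the node list only if something changed. */
    long availableMemoryMb(Cluster cluster) {
        if (blacklistChanged || seenVersion != cluster.version()) {
            long total = 0;
            for (Node n : cluster.nodes()) {
                boolean blacklisted = blacklistedHosts.contains(n.host)
                        || blacklistedRacks.contains(n.rack);
                boolean labelOk = requiredLabel == null
                        || n.labels.contains(requiredLabel);
                if (!blacklisted && labelOk) {
                    total += n.memoryMb;
                }
            }
            cachedMemoryMb = total;
            seenVersion = cluster.version();
            blacklistChanged = false;
            recomputations++;
        }
        return cachedMemoryMb;
    }

    /** Number of full recalculations so far, exposed to illustrate the caching. */
    int recomputations() { return recomputations; }
}
```

Headroom and userlimit calculations would then read availableMemoryMb(...) in
place of the raw cluster total. Notifying the application of label changes on
nodes is not modelled here; in this sketch it would amount to one more event
that bumps the cluster version.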



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
