[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984167#comment-13984167 ]

Steve Loughran commented on YARN-1996:
--------------------------------------

This sounds good, not just as failure handling but for cluster management.

This may be a duplicate of YARN-914, "graceful decommission of NM", and/or YARN-671.

For long-lived services:
# it would be nice to have a notification from the NM to the AM that the node 
is draining, so that the AM can react: YARN-1394
# the drain process must have a (configurable?) timeout, after which all 
outstanding containers are killed without being counted as any kind of failure 
(i.e. the container-loss event from NM -> AM should indicate this); a sketch of 
this timeout logic follows after the list
# the AM itself needs to receive a "your own node is being drained" event and 
do any best-effort pre-restart operations (e.g. transition to passive), and the 
RM must not count the AM termination/restart as an AM failure
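
To make point 2 concrete, here is a minimal, hypothetical sketch of the 
drain-timeout idea; {{DrainMonitor}} and {{killContainer()}} are placeholders, 
not existing YARN APIs:

{code}
// Hypothetical sketch: after a configurable drain timeout, kill whatever
// containers are still running, reporting them as drained rather than failed.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DrainMonitor {
  private final Set<String> liveContainers = ConcurrentHashMap.newKeySet();
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  /** Called when the node enters the draining state. */
  void startDrain(long drainTimeoutMs) {
    timer.schedule(this::killRemaining, drainTimeoutMs, TimeUnit.MILLISECONDS);
  }

  private void killRemaining() {
    for (String containerId : liveContainers) {
      // The kill should surface to the AM as a drain-related loss, not a
      // container failure, so it is not charged against retry budgets.
      killContainer(containerId);
    }
  }

  void killContainer(String containerId) { /* placeholder */ }
}
{code}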


> Provide alternative policies for UNHEALTHY nodes.
> -------------------------------------------------
>
>                 Key: YARN-1996
>                 URL: https://issues.apache.org/jira/browse/YARN-1996
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, scheduler
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: YARN-1996.v01.patch
>
>
> Currently, UNHEALTHY nodes can significantly prolong the execution of large, 
> expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade cluster health 
> even further through [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]: 
> the very container set that may have made the node unhealthy in the first 
> place starts spreading across the cluster, because the node is declared 
> unusable and all of its containers are killed and rescheduled on different 
> nodes.
> To mitigate this, we are experimenting with a patch that allows containers 
> already running on a node that turns UNHEALTHY to run to completion (drain), 
> while no new containers are assigned to the node until it turns healthy again.
> This mechanism can also be used for graceful decommissioning of an NM. To this 
> end, we have to write a health script that can deterministically report 
> UNHEALTHY, for example:
> {code}
> #!/bin/sh
> # Report ERROR (which the NM health checker treats as UNHEALTHY)
> # whenever the marker file passed as the first argument exists.
> if [ -e "$1" ] ; then
>   echo "ERROR Node decommissioning via health script hack"
> fi
> {code}
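> For illustration: the NM runs such a script via the existing properties 
> {{yarn.nodemanager.health-checker.script.path}} and 
> {{yarn.nodemanager.health-checker.script.opts}} (the latter can pass the 
> marker-file path as {{$1}}), so touching the marker file makes the node 
> report UNHEALTHY and start draining, while removing it returns the node to 
> service.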
> In the current version of the patch, the behavior is controlled by a boolean 
> property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile 
> policies are possible as future work. Currently, the health state of a node 
> is binary, determined from the disk checker and the health script's ERROR 
> output. However, we could also interpret health script output along the lines 
> of Java logging levels (of which ERROR is one), such as WARN and FATAL, and 
> treat each level differently, e.g.:
> - FATAL: unusable, as today
> - ERROR: drain
> - WARN: halve the node capacity
> complemented with equivalence rules such as 3 WARN messages == ERROR, 
> 2*ERROR == FATAL, etc. 
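> For illustration only, here is a hypothetical sketch of such a level-based 
> policy; {{HealthLevel}}, {{NodeAction}}, and {{HealthPolicy}} do not exist in 
> the patch, and the thresholds are just the equivalence rules above:
> {code}
> import java.util.List;
>
> enum HealthLevel { WARN, ERROR, FATAL }
> enum NodeAction { NONE, HALVE_CAPACITY, DRAIN, UNUSABLE }
>
> class HealthPolicy {
>   /** Fold reported levels into one action using the equivalence rules. */
>   static NodeAction decide(List<HealthLevel> report) {
>     int warns = 0, errors = 0, fatals = 0;
>     for (HealthLevel level : report) {
>       switch (level) {
>         case WARN:  warns++;  break;
>         case ERROR: errors++; break;
>         case FATAL: fatals++; break;
>       }
>     }
>     errors += warns / 3;   // 3 WARN messages == ERROR
>     fatals += errors / 2;  // 2*ERROR == FATAL
>     if (fatals > 0)     return NodeAction.UNUSABLE;      // like today
>     if (errors % 2 > 0) return NodeAction.DRAIN;
>     if (warns % 3 > 0)  return NodeAction.HALVE_CAPACITY;
>     return NodeAction.NONE;
>   }
> }
> {code}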



