Hadoop QA commented on YARN-1996:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment against trunk revision 8caf537.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

      {color:red}-1 javac{color}.  The applied patch generated 1223 javac 
compiler warnings (more than the trunk's current 1219 warnings).

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed core unit tests.


    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
Javac warnings: 
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5927//console

This message is automatically generated.

> Provide alternative policies for UNHEALTHY nodes.
> -------------------------------------------------
>                 Key: YARN-1996
>                 URL: https://issues.apache.org/jira/browse/YARN-1996
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, scheduler
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: YARN-1996-2.patch, YARN-1996.v01.patch
> Currently, UNHEALTHY nodes can significantly prolong the execution of large,
> expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade cluster health
> even further due to [positive
> feedback|http://en.wikipedia.org/wiki/Positive_feedback]: the set of containers
> that may have made the node unhealthy in the first place starts spreading
> across the cluster, because the node is declared unusable and all of its
> containers are killed and rescheduled on different nodes.
> To mitigate this, we are experimenting with a patch that allows containers
> already running on a node that turns UNHEALTHY to run to completion (drain),
> while no new containers are assigned to the node until it turns healthy again.
> This mechanism can also be used for graceful decommissioning of an NM. To this
> end, we write a health script that can be made to deterministically report
> UNHEALTHY, for example:
> {code}
> # $1 is the path of a decommission marker file passed as a script argument
> if [ -e "$1" ]; then
>   echo "ERROR Node decommissioning via health script hack"
> fi
> {code}
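> A minimal usage sketch (the marker path is an assumption, not part of the
> patch): the script above is registered as the NM health script and handed the
> marker path via the standard health-checker properties, so touching the marker
> drains the node and removing it restores it on the next health check:
> {code}
> # Assumed wiring in yarn-site.xml:
> #   yarn.nodemanager.health-checker.script.path = /usr/local/bin/nm-health.sh
> #   yarn.nodemanager.health-checker.script.opts = /var/run/nm-decommission
> touch /var/run/nm-decommission  # script prints ERROR; node turns UNHEALTHY and drains
> rm /var/run/nm-decommission     # script prints nothing; node reports healthy again
> {code}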
> In the current version of the patch, the behavior is controlled by a boolean
> property {{yarn.nodemanager.unhealthy.drain.containers}}. More versatile
> policies are possible in future work. Currently, the health state of a node is
> determined in a binary fashion, based on the disk checker and on ERROR lines
> in the health script output. However, we could also interpret the health
> script output similarly to Java logging levels (one of which is ERROR), such
> as WARN and FATAL. Each level could then be treated differently, e.g.,
> - FATAL: unusable, as today
> - ERROR: drain
> - WARN: halve the node capacity
> complemented with equivalence rules such as 3 WARN messages == 1 ERROR,
> 2 * ERROR == FATAL, etc.
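> To illustrate the leveled idea, here is a minimal sketch of a health script
> that emits the proposed prefixes (the checks, thresholds, and disk path are
> made-up examples; today's NM only reacts to lines starting with ERROR):
> {code}
> #!/usr/bin/env bash
> # Hypothetical leveled health script. Under the proposed policy the NM would map
> # the level to an action: FATAL -> unusable, ERROR -> drain, WARN -> halve capacity.
> disk_use=$(df --output=pcent /var/lib/hadoop-yarn 2>/dev/null | tail -1 | tr -d ' %')
> load=$(cut -d' ' -f1 /proc/loadavg)
> if [ "${disk_use:-0}" -ge 95 ]; then
>   echo "FATAL local dirs nearly full: ${disk_use}%"
> elif [ "${disk_use:-0}" -ge 90 ]; then
>   echo "ERROR local dirs filling up: ${disk_use}%"
> elif [ "${load%%.*}" -ge 16 ]; then
>   echo "WARN load average high: ${load}"
> fi
> {code}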

This message was sent by Atlassian JIRA
