[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313379#comment-14313379 ]
Hadoop QA commented on YARN-1996:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12683446/YARN-1996-2.patch
against trunk revision af08425.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1153 javac
compiler warnings (4 more than trunk's current 1149).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Test results:
https://builds.apache.org/job/PreCommit-YARN-Build/6565//testReport/
Javac warnings:
https://builds.apache.org/job/PreCommit-YARN-Build/6565//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6565//console
This message is automatically generated.
> Provide alternative policies for UNHEALTHY nodes.
> -------------------------------------------------
>
> Key: YARN-1996
> URL: https://issues.apache.org/jira/browse/YARN-1996
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager, scheduler
> Affects Versions: 2.4.0
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1996-2.patch, YARN-1996.v01.patch
>
>
> Currently, UNHEALTHY nodes can significantly prolong the execution of large,
> expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade cluster
> health even further through
> [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A set of
> containers that may have caused the node to be deemed unhealthy in the first
> place starts spreading across the cluster, because the node is declared
> unusable and all of its containers are killed and rescheduled on other nodes.
> To mitigate this, we are experimenting with a patch that lets containers
> already running on a node that turns UNHEALTHY run to completion (drain),
> while no new containers are assigned to the node until it becomes healthy
> again.
> This mechanism can also be used for graceful decommissioning of an NM. To
> this end, the health script must be able to report UNHEALTHY
> deterministically, for example:
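> {code}
> #!/bin/sh
> # Report UNHEALTHY while the marker file passed as $1 exists.
> # The NodeManager treats output lines starting with ERROR as unhealthy.
> # Quoting "$1" matters: an unquoted empty argument makes [ -e ] always true.
> if [ -e "$1" ]; then
>   echo "ERROR Node decommissioning via health script hack"
> fi
> {code}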
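As an illustration, the shell sketch below wraps the health-script logic in a function and walks through the operator workflow: create a marker file so the node reports UNHEALTHY, then remove it so the node reports healthy again. It assumes the NodeManager runs the script via {{yarn.nodemanager.health-checker.script.path}} and passes the marker path through {{yarn.nodemanager.health-checker.script.opts}}; the function name {{health_check}} and the marker location are hypothetical. Whether running containers drain or are killed while the node is UNHEALTHY depends on the patch's drain property being enabled.

```shell
#!/bin/sh
# Hypothetical sketch of the decommission workflow; health_check and the
# marker path are illustrative names, not part of Hadoop or the patch.
health_check() {
  # Print an ERROR line while the marker file exists; print nothing otherwise.
  if [ -e "$1" ]; then
    echo "ERROR Node decommissioning via health script hack"
  fi
}

demo_dir="$(mktemp -d)"
marker="$demo_dir/decommission"

health_check "$marker"   # no output: node reported healthy
touch "$marker"          # operator requests decommissioning
health_check "$marker"   # ERROR line: node reported UNHEALTHY
rm -f "$marker"          # operator cancels decommissioning
health_check "$marker"   # no output: node reported healthy again

rmdir "$demo_dir"
```

In practice the NodeManager, not the operator, invokes the script periodically; the operator only creates or removes the marker file.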
> In the current version of the patch, the behavior is controlled by the boolean
> property {{yarn.nodemanager.unhealthy.drain.containers}}. More versatile
> policies are possible as future work. Currently, a node's health state is
> binary, determined by the disk checker and by ERROR lines in the health
> script output. However, we could also interpret health script output in
> terms of Java logging levels (ERROR is already one of them), adding levels
> such as WARN and FATAL, and treat each level differently, e.g.:
> - FATAL: unusable, as today
> - ERROR: drain
> - WARN: halve the node capacity
> These could be complemented by equivalence rules such as
> 3 WARN messages == 1 ERROR, 2 ERROR messages == 1 FATAL, etc.
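The level-based policy above is proposed future work and is not implemented by the patch. As a hypothetical illustration only, the shell sketch below folds health-script output into a single effective level using the example equivalence rules (3 WARN == 1 ERROR, 2 ERROR == 1 FATAL) and maps that level to the example action. The function names {{effective_level}} and {{node_action}} are invented for this sketch.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed multi-level health policy.
# Reads health-script output on stdin and counts lines by leading level.
effective_level() {
  warn=0; error=0; fatal=0
  # The "|| [ -n ... ]" clause also processes a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      FATAL*) fatal=$((fatal + 1)) ;;
      ERROR*) error=$((error + 1)) ;;
      WARN*)  warn=$((warn + 1)) ;;
    esac
  done
  # Promote every 3 WARN lines to one ERROR, then every 2 ERRORs to one FATAL.
  error=$((error + warn / 3))
  fatal=$((fatal + error / 2))
  if   [ "$fatal" -gt 0 ]; then echo FATAL
  elif [ "$error" -gt 0 ]; then echo ERROR
  elif [ "$warn"  -gt 0 ]; then echo WARN
  else echo HEALTHY
  fi
}

# Map an effective level to the example action from the proposal.
node_action() {
  case "$1" in
    FATAL) echo "unusable (current behavior)" ;;
    ERROR) echo "drain: no new containers" ;;
    WARN)  echo "halve node capacity" ;;
    *)     echo "normal scheduling" ;;
  esac
}
```

For example, piping three WARN lines into {{effective_level}} yields ERROR, so under this policy the node would drain rather than merely lose half its capacity.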
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)