[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792601#comment-13792601 ]
Hudson commented on YARN-1265:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1549 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1549/])
YARN-1265. Fair Scheduler chokes on unhealthy node reconnect (Sandy Ryza)
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531146)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
> Fair Scheduler chokes on unhealthy node reconnect
> -------------------------------------------------
>
> Key: YARN-1265
> URL: https://issues.apache.org/jira/browse/YARN-1265
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, scheduler
> Affects Versions: 2.1.1-beta
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Fix For: 2.2.1
>
> Attachments: YARN-1265-1.patch, YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers. When a node
> reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if
> it's not in the RUNNING state. The FairScheduler doesn't guard against
> removal of a node it isn't tracking.
> I think the best way to fix this is to check whether a node is RUNNING
> before telling the scheduler to remove it.
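
The check proposed in the description can be illustrated with a minimal, self-contained sketch. This is not the committed patch and does not use the real Hadoop classes: ReconnectGuardSketch, ToyScheduler, NodeState and handleReconnect are hypothetical stand-ins, assumed only for illustration. The toy scheduler tracks nodes in a map and fails when asked to remove a node it does not know about; the reconnect handler forwards the removal only for nodes in the RUNNING state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the real
// RMNodeImpl / FairScheduler classes from Hadoop YARN.
class ReconnectGuardSketch {

    // Simplified node states (assumed subset, for the sketch only).
    enum NodeState { NEW, RUNNING, UNHEALTHY, LOST }

    // Toy scheduler that tracks a set of nodes and their memory.
    static class ToyScheduler {
        private final Map<String, Integer> trackedNodes = new HashMap<>();

        void addNode(String nodeId, int memoryMb) {
            trackedNodes.put(nodeId, memoryMb);
        }

        void removeNode(String nodeId) {
            Integer memoryMb = trackedNodes.remove(nodeId);
            if (memoryMb == null) {
                // A scheduler without its own guard fails on untracked nodes.
                throw new IllegalStateException("node not tracked: " + nodeId);
            }
        }

        boolean isTracked(String nodeId) {
            return trackedNodes.containsKey(nodeId);
        }
    }

    // Reconnect handling with the proposed check: only a RUNNING node is
    // known to the scheduler, so only then is the removal forwarded.
    // Returns true if the scheduler was asked to remove the node.
    static boolean handleReconnect(ToyScheduler scheduler, String nodeId, NodeState state) {
        if (state != NodeState.RUNNING) {
            return false; // nothing to remove; the scheduler never tracked it
        }
        scheduler.removeNode(nodeId);
        return true;
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("node-a", 4096);

        // RUNNING node reconnects: the removal is forwarded.
        System.out.println(handleReconnect(scheduler, "node-a", NodeState.RUNNING));
        // UNHEALTHY node reconnects: the removal is skipped, no exception.
        System.out.println(handleReconnect(scheduler, "node-b", NodeState.UNHEALTHY));
    }
}
```

Placing the check in the caller keeps the scheduler's invariant (it only ever hears about nodes it tracks) in one place; a defensive null check inside the scheduler would be an alternative design.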
--
This message was sent by Atlassian JIRA
(v6.1#6144)