[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975495#comment-14975495 ]
Hudson commented on YARN-4284:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #540 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/540/])
YARN-4284. condition for AM blacklisting is too narrow. Contributed by (jlowe: rev 33a03af3c396097929b9cd9c790d7f52eddc13e0)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

> condition for AM blacklisting is too narrow
> -------------------------------------------
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: YARN-4284.001.patch, YARN-4284.002.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in
> blacklisting the nodes on *any failure* (although it might lead to
> blacklisting the node more aggressively than necessary). I would propose
> locating the next app attempt to a different node on any failure.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
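
For readers skimming the issue description, the difference between the narrow condition and the proposed one can be sketched roughly as below. This is only an illustration, not the code in RMAppAttemptImpl.java: the class name, the `ExitCause` enum and both method names are hypothetical, and of the enum values only DISKS_FAILED is named in the issue text. The sketch also assumes that "any failure" means any outcome other than success.

```java
public class AmBlacklistSketch {

    /** Hypothetical causes for an AM container exit; only DISKS_FAILED appears in the issue. */
    enum ExitCause { SUCCESS, DISKS_FAILED, DISK_FULL, JVM_CRASH, OUT_OF_MEMORY }

    /** Narrow condition described in the issue: only DISKS_FAILED blacklists the node. */
    static boolean narrowCondition(ExitCause cause) {
        return cause == ExitCause.DISKS_FAILED;
    }

    /** Proposed condition (assumed reading): any non-success exit blacklists the node. */
    static boolean proposedCondition(ExitCause cause) {
        return cause != ExitCause.SUCCESS;
    }

    public static void main(String[] args) {
        // Print both decisions side by side for every hypothetical cause.
        for (ExitCause cause : ExitCause.values()) {
            System.out.printf("%-14s narrow=%b proposed=%b%n",
                    cause, narrowCondition(cause), proposedCondition(cause));
        }
    }
}
```

Under the narrow condition a JVM crash or a full disk would leave the node eligible for the next AM attempt, while the proposed condition would steer the next attempt elsewhere, at the cost of possibly blacklisting a healthy node for that one app.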