[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046300#comment-14046300 ]
Xuan Gong commented on YARN-614:
--------------------------------

I am not sure why org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions fails; it passed on my local machine.
The org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter failure is not related to this patch.
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart fails because of a timeout. I added more logic to the test case, so I need to increase the timeout.

Submitted a new patch to kick Jenkins again.

> Separate AM failures from hardware failure or YARN error and do not count
> them to AM retry count
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch,
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch,
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch,
> YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be
> retried unnecessarily. The only reason YARN should retry an attempt is when
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk
> errors are the hardware errors that come to mind.

--
This message was sent by Atlassian JIRA
(v6.2#6252)