[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045121#comment-14045121
 ] 

Jian He commented on YARN-614:
------------------------------

- shouldNotCountFailureToAttemptLimit: how about renaming it to 
shouldCountTowardsAttemptFailure?
- getNumNonPreemptedAppAttempts -> getNumAttemptFailures? Please also fix all 
the other existing variable/method names and code comments associated with 
preemption accordingly, to avoid confusion.
- testshouldNotCountFailureToAttemptLimitOnRMRestart/
testShouldNotCountFailureToAttemptLimit: rename the tests accordingly. Also, 
given that preemptions are already tested in previous test cases, can you 
change the tests to focus only on the cases that should be tested in this 
jira?
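
To illustrate how the suggested names would read, here is a rough sketch. It is not the code from the patch: the class AttemptFailureSketch and the enum ExitCause are hypothetical and exist only for this example. Only the two method names come from the suggestions above.

```java
import java.util.Arrays;
import java.util.List;

public class AttemptFailureSketch {

    // Hypothetical exit causes, invented for this illustration only.
    enum ExitCause { AM_ERROR, PREEMPTED, NODE_LOST, DISK_FAILED }

    // Positive naming: true only when the failure is the AM's own fault.
    // Preemption and hardware/YARN-side causes do not count.
    static boolean shouldCountTowardsAttemptFailure(ExitCause cause) {
        return cause == ExitCause.AM_ERROR;
    }

    // Number of past attempts that count against the AM retry limit.
    static int getNumAttemptFailures(List<ExitCause> attempts) {
        int failures = 0;
        for (ExitCause cause : attempts) {
            if (shouldCountTowardsAttemptFailure(cause)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<ExitCause> attempts = Arrays.asList(
            ExitCause.AM_ERROR, ExitCause.PREEMPTED,
            ExitCause.DISK_FAILED, ExitCause.AM_ERROR);
        // Only the two AM_ERROR attempts are counted.
        System.out.println(getNumAttemptFailures(attempts)); // prints 2
    }
}
```

With the positive name, the caller reads as "count it if it should count", with no double negation.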

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.7.patch, YARN-614.8.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
