qus-jiawei created YARN-1469:
--------------------------------
Summary: ApplicationMaster crashes because TaskAttemptImpl cannot handle
TA_TOO_MANY_FETCH_FAILURE in the KILLED state
Key: YARN-1469
URL: https://issues.apache.org/jira/browse/YARN-1469
Project: Hadoop YARN
Issue Type: Bug
Reporter: qus-jiawei
This bug can occur when the decommission command is used to decommission a
NodeManager. The details are below:
1. A job is running normally on the YARN cluster. Some MapTasks finish on
machine A, and the job begins scheduling its reduce tasks. At this point those
MapTasks are in the SUCCEEDED state.
2. The Hadoop admin decommissions the NodeManager on machine A.
3. The ApplicationMaster finds that some MapTasks had finished on a
decommissioned NodeManager and changes the state of those MapTasks to KILLED.
4. Some running ReduceTasks cannot fetch the map output from those MapTasks,
so a TA_TOO_MANY_FETCH_FAILURE event is sent to TaskAttemptImpl.
5. TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED state
and throws an exception, which causes the ApplicationMaster to go into the
ERROR state.
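The sequence above can be modeled with a toy transition table. This is a hypothetical, heavily simplified sketch, not Hadoop's TaskAttemptImpl: the class name FetchFailureCrashDemo and the three-state enum are assumptions for illustration, and IllegalStateException stands in for the invalid-transition exception the real state machine throws. Any (state, event) pair without an entry is rejected, which is what happens in step 5.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of the task-attempt state machine in steps 1-5.
// Not Hadoop code; IllegalStateException stands in for the real invalid-transition exception.
public class FetchFailureCrashDemo {

    enum State { SUCCEEDED, KILLED, FAILED }

    enum Event { TA_KILL, TA_TOO_MANY_FETCH_FAILURE }

    // Transition table: (current state, event) -> next state.
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<>(Event.class));
        }
        // Step 3: the node is decommissioned, so a finished map attempt is killed.
        TRANSITIONS.get(State.SUCCEEDED).put(Event.TA_KILL, State.KILLED);
        // Fetch failures against a still-successful attempt mark it FAILED (simplified).
        TRANSITIONS.get(State.SUCCEEDED).put(Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        // Nothing is registered for KILLED, so any event arriving there is invalid.
    }

    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = State.SUCCEEDED;                      // step 1
        s = handle(s, Event.TA_KILL);                   // steps 2-3
        System.out.println("After decommission: " + s);
        try {
            handle(s, Event.TA_TOO_MANY_FETCH_FAILURE); // steps 4-5
        } catch (IllegalStateException e) {
            System.out.println("AM would go to ERROR: " + e.getMessage());
        }
    }
}
```

Running main prints the KILLED state after the decommission and then reports the rejected TA_TOO_MANY_FETCH_FAILURE event, mirroring the crash.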
I think TaskAttemptImpl could simply ignore the TA_TOO_MANY_FETCH_FAILURE event
in the KILLED state.
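The proposed fix amounts to registering a KILLED -> KILLED self-transition for TA_TOO_MANY_FETCH_FAILURE, so the late event is absorbed instead of rejected. The sketch below is a hypothetical, self-contained model of that idea, not a patch against TaskAttemptImpl: the class IgnoreFetchFailureWhenKilled and its addTransition helper are illustrative assumptions, written only in the spirit of the multi-event transition registration that YARN's state-machine factory supports.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed fix (not Hadoop code): in KILLED,
// TA_TOO_MANY_FETCH_FAILURE becomes a no-op self-transition instead of an invalid event.
public class IgnoreFetchFailureWhenKilled {

    enum State { SUCCEEDED, KILLED, FAILED }

    enum Event { TA_KILL, TA_TOO_MANY_FETCH_FAILURE }

    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    // Registers pre -> post for every event in the set (illustrative helper).
    private static void addTransition(State pre, State post, Set<Event> events) {
        for (Event e : events) {
            TRANSITIONS.get(pre).put(e, post);
        }
    }

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<>(Event.class));
        }
        addTransition(State.SUCCEEDED, State.KILLED, EnumSet.of(Event.TA_KILL));
        addTransition(State.SUCCEEDED, State.FAILED, EnumSet.of(Event.TA_TOO_MANY_FETCH_FAILURE));
        // The proposed change: a fetch-failure report that arrives after the
        // attempt was killed is ignored (KILLED stays KILLED).
        addTransition(State.KILLED, State.KILLED, EnumSet.of(Event.TA_TOO_MANY_FETCH_FAILURE));
    }

    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = handle(State.SUCCEEDED, Event.TA_KILL);
        s = handle(s, Event.TA_TOO_MANY_FETCH_FAILURE); // no exception any more
        System.out.println("State after late fetch failure: " + s);
    }
}
```

The ignore rule is scoped to KILLED only: a fetch failure against an attempt that is still SUCCEEDED keeps its normal transition, so genuine lost map output is still acted on, while the killed attempt (which is already being rescheduled) no longer brings down the ApplicationMaster.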
--
This message was sent by Atlassian JIRA
(v6.1#6144)