[
https://issues.apache.org/jira/browse/YARN-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
qus-jiawei updated YARN-1469:
-----------------------------
Attachment: job_1384857622207_222215-amlog.txt
Looking at the attached log file and following task attempt
attempt_1384857622207_222215_m_000006_0 shows clearly what happened.
> ApplicationMaster crashes because TaskAttemptImpl cannot handle
> TA_TOO_MANY_FETCH_FAILURE at KILLED
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-1469
> URL: https://issues.apache.org/jira/browse/YARN-1469
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: qus-jiawei
> Attachments: job_1384857622207_222215-amlog.txt
>
>
> This bug can happen when the decommission command is used to decommission
> a NodeManager. The details are below:
> 1. A job is running on the YARN cluster and some map tasks finish on
> machine A, after which the reduce tasks are scheduled. At this point the
> map tasks' state is SUCCEEDED.
> 2. The Hadoop admin decommissions machine A's NodeManager.
> 3. The ApplicationMaster finds that some map tasks had finished on the
> decommissioned NodeManager and changes those map tasks' state to KILLED.
> 4. Some running reduce tasks cannot fetch the map output from those map
> tasks, so a TA_TOO_MANY_FETCH_FAILURE event is sent to the TaskAttemptImpl.
> 5. TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED
> state and throws an exception, causing the ApplicationMaster to go to the
> ERROR state.
> I think TaskAttemptImpl could simply ignore the TA_TOO_MANY_FETCH_FAILURE
> event in the KILLED state, as sketched below.
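> A minimal sketch of the proposed fix, assuming the StateMachineFactory
> transition table in
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl (only the
> relevant builder call is shown; the rest of the table is elided):
> {code:java}
> // Assumed fragment of TaskAttemptImpl's transition table: a
> // KILLED -> KILLED self-loop with no transition hook makes the
> // state machine silently drop a late TA_TOO_MANY_FETCH_FAILURE
> // instead of throwing an invalid-transition exception that sends
> // the ApplicationMaster to ERROR.
> .addTransition(
>     TaskAttemptStateInternal.KILLED,
>     TaskAttemptStateInternal.KILLED,
>     EnumSet.of(
>         TaskAttemptEventType.TA_KILL,
>         TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE))
> {code}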
--
This message was sent by Atlassian JIRA
(v6.1#6144)