[ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996066#comment-14996066
 ] 

Rohith Sharma K S commented on YARN-4324:
-----------------------------------------

Once the AM is launched and registered, the RM expects it to send heartbeats 
in a timely manner. If the RM does not receive a heartbeat from the AM for a 
certain time (the default is 10 min), the RM kills the AM. This is expected 
behavior. 
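
To illustrate the expiry rule, here is a toy model of a liveness monitor. It 
is only a sketch and not the RM's actual code: the class and method names are 
made up for illustration. The 600000 ms constant mirrors the 10 min default 
mentioned above, which I believe is set through 
yarn.am.liveness-monitor.expiry-interval-ms (please verify against your 
yarn-default.xml).

```python
# Toy model of heartbeat expiry. Hypothetical names, not YARN's implementation.
EXPIRY_INTERVAL_MS = 600_000  # 10 minutes, the default mentioned above


class LivenessMonitor:
    """Remember the last heartbeat per AM and report the ones that expired."""

    def __init__(self, expiry_interval_ms=EXPIRY_INTERVAL_MS):
        self.expiry_interval_ms = expiry_interval_ms
        self.last_heartbeat_ms = {}

    def register(self, am_id, now_ms):
        # Registration counts as the first sign of life.
        self.last_heartbeat_ms[am_id] = now_ms

    def heartbeat(self, am_id, now_ms):
        # Heartbeats from unknown (unregistered) AMs are ignored.
        if am_id in self.last_heartbeat_ms:
            self.last_heartbeat_ms[am_id] = now_ms

    def expired(self, now_ms):
        # An AM expires once it has been silent for longer than the interval.
        return sorted(
            am_id
            for am_id, last in self.last_heartbeat_ms.items()
            if now_ms - last > self.expiry_interval_ms
        )


monitor = LivenessMonitor()
monitor.register("am_1", now_ms=0)
monitor.heartbeat("am_1", now_ms=1_000)
print(monitor.expired(now_ms=600_000))  # still within the interval
print(monitor.expired(now_ms=601_001))  # silent for more than 10 minutes
```

The point of the sketch is that only the time since the last heartbeat 
matters: an AM that is busy but stops heartbeating is treated the same as one 
that has died.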

In your scenario, try to find out which expiry happened: AM heartbeat expiry 
or container expiry. You will find this information in the ResourceManager 
log. Was the NodeManager restarted? If yes, and NM recovery is not enabled, 
then this is expected behavior.
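
As a quick sanity check before digging into the ResourceManager log, the 
sketch below measures how long the AM log quoted in the issue description is 
silent between the JOB_COMMIT line and the "received a signal" line. The two 
timestamps are copied from that log; the helper name gap_seconds is made up 
for illustration.

```python
from datetime import datetime

# Timestamp layout used by the quoted AM log, e.g. "2015-11-02 01:14:54,176".
FMT = "%Y-%m-%d %H:%M:%S,%f"


def gap_seconds(earlier, later):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


job_commit_ts = "2015-11-02 01:14:54,176"  # "Processing the event ... JOB_COMMIT"
signal_ts = "2015-11-02 01:24:15,851"      # "MRAppMaster received a signal"

gap = gap_seconds(job_commit_ts, signal_ts)
print(f"AM log silent for {gap:.3f} s ({gap / 60:.1f} min)")
```

This only shows how long the AM logged nothing. It does not show when the 
last heartbeat reached the RM, so the RM log is still needed to tell which 
expiry actually fired.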

> AM hang more than 10 min was kill by RM
> ---------------------------------------
>
>                 Key: YARN-4324
>                 URL: https://issues.apache.org/jira/browse/YARN-4324
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: tangshangwen
>
> These are my logs:
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transitioned from 
> UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT                  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> The Hive map phase ran to 100%, then went back to map 0%, and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
