Hi experts! I'm seeing strange behaviour from Hadoop while my tasks execute. It re-runs a task attempt that has already completed with SUCCEEDED status (see the log below for attempt_1447029285980_0001_m_000012_0).
I don't know why, but this task repeats with attempt numbers 0, 1, 2, 3, 4 and then 2000. The same happens with some other tasks. I also see on screen that the map progress sometimes decreases after a task has executed. I am not using preemption or speculative execution, and I don't see any exceptions or time-outs in the YARN log (except the last line, "Container killed on request. Exit code is 143"). How can I find the cause? I am using version 2.6.0 in the Azure cloud (HDInsight).

2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1447029285980_0001_m_000012_0 is : 1.0
2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1447029285980_0001_m_000012_0
2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e04_1447029285980_0001_01_002951 taskAttempt attempt_1447029285980_0001_m_000012_0
2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1447029285980_0001_m_000012_0
2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : 10.0.0.8:30050
2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1447029285980_0001_m_000012_0
2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to SUCCEEDED
2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 4
2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8 AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0 HostLocal:0 RackLocal:16
2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e04_1447029285980_0001_01_002951
2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1447029285980_0001_m_000012_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
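
To follow what happens to each attempt across the whole ApplicationMaster log, the state transitions can be pulled out with a small script. The following is only a minimal sketch in Python, assuming the log layout shown above (each entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp and transitions read `... TaskAttempt Transitioned from X to Y`). The function name `attempt_transitions` and the shortened `sample` string are illustrative and not part of Hadoop.

```python
import re
from collections import defaultdict

# An entry starts at every timestamp; splitting on a lookahead keeps the
# timestamp with its entry and also works when entries share one line.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")

# Assumed wording of a transition entry, taken from the log in this post.
TRANSITION = re.compile(
    r"(attempt_\d+_\d+_[mr]_\d+_\d+) TaskAttempt Transitioned from (\w+) to (\w+)"
)


def attempt_transitions(log_text):
    """Map each attempt id to its ordered (timestamp, from, to) transitions."""
    history = defaultdict(list)
    for entry in ENTRY_START.split(log_text):
        match = TRANSITION.search(entry)
        if match:
            attempt, src, dst = match.groups()
            timestamp = entry[:23]  # length of "2015-11-09 19:57:45,592"
            history[attempt].append((timestamp, src, dst))
    return dict(history)


# Two entries copied from the log above, with the class names shortened.
sample = (
    "2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler] TaskAttemptImpl: "
    "attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned "
    "from RUNNING to SUCCESS_CONTAINER_CLEANUP "
    "2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler] TaskAttemptImpl: "
    "attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned "
    "from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED"
)

for attempt, steps in attempt_transitions(sample).items():
    print(attempt)
    for timestamp, src, dst in steps:
        print(f"  {timestamp}  {src} -> {dst}")
```

Run against the full log, an attempt that shows further transitions after SUCCEEDED would point to the moment the completed map output was thrown away, which is the place to look for the cause.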
