Hi, yes, there are several "failed" map, because of 600 sec time-out.
I also found a lot messages like this in the log: 2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node 10.0.0.5:30050. AttemptId:attempt_1447029285980_0001_m_000043_1 Different nodes got unusable status very often. Do you know something about possible reason? Maybe changing some time-out params in communication between nodes could help? As I already said I work in the cloud on Azure HDInsight. 2015-11-11 20:33 GMT+03:00 Namikaze Minato <[email protected]>: > Hi. > > Do you also have "failed" map attempts? > Killed map attempts won't help us understand why your job is failing. > > Regards, > LLoyd > > On 11 November 2015 at 16:37, Sergey <[email protected]> wrote: > > > > Hi experts! > > > > I see strange behaviour of Hadoop while execution of my tasks. > > It re-runs task attempt which has completed with SUCCEEDED status > > (see the log below about attempt_1447029285980_0001_m_000012_0). > > > > I don't know why but this tasks repeats in attempts numbers 0,1,2,3,4 and > > than 2000. > > > > The same story with some other tasks.. > > A also see on screen after execution of task that some times map > progress is > > decreasing... > > > > I don't use preemption, speculative execution and don't see any > exceptions, > > time-outs in yarn log > > (except last line "Container killed on request. Exit code is 143"). > > > > How to catch the reason? > > > > I use version 2.6.0 in Azure cloud (HDInsight) > > > > > > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153] > > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > > attempt_1447029285980_0001_m_000012_0 is : 1.0 > > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153] > > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement > from > > attempt_1447029285980_0001_m_000012_0 > > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from > RUNNING > > to SUCCESS_CONTAINER_CLEANUP > > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] > > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: > > Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container > > container_e04_1447029285980_0001_01_002951 taskAttempt > > attempt_1447029285980_0001_m_000012_0 > > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] > > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: > KILLING > > attempt_1447029285980_0001_m_000012_0 > > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] > > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: > > Opening proxy : 10.0.0.8:30050 > > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from > > SUCCESS_CONTAINER_CLEANUP to SUCCEEDED > > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with > > attempt attempt_1447029285980_0001_m_000012_0 > > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: > > task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to > SUCCEEDED > > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed > Tasks: 4 > > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator] > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > > Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8 > > AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0 > > HostLocal:0 RackLocal:16 > > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received > > completed container container_e04_1447029285980_0001_01_002951 > > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated > > containers 1 > > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to > > reduce > > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler] > > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > > report from attempt_1447029285980_0001_m_000012_0: Container killed by > the > > ApplicationMaster. > > Container killed on request. Exit code is 143 > > Container exited with a non-zero exit code 143 >
