Hi Sergey,

This indicates that one or more of your NodeManagers may have gone down or been marked unhealthy. The RM reports such nodes to the AM in the allocate response. If a map task ran on such a node, its output is considered unusable, because map output is kept on the local disk of the node where the task ran. This applies even though the map task was previously marked as successful. Such a map task attempt is then KILLED and a new attempt is launched.
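A minimal Java sketch of this sequence follows. It is a simplified model for illustration, not Hadoop's actual code. The class `UnusableNodeSketch`, the `MapAttempt` type and the `onUnusableNodes` method are hypothetical. The sketch assumes the AM only re-runs previously SUCCEEDED map attempts while some reducers still have to fetch map output. For simplicity, it also assumes the killed attempt was the task's latest one, so the replacement gets the next attempt number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the AM's reaction to unusable nodes.
// These are NOT Hadoop's real classes or method names.
public class UnusableNodeSketch {

    enum State { SUCCEEDED, KILLED }

    // A completed map attempt and the NodeManager (host:port) it ran on.
    static final class MapAttempt {
        final String taskId;
        final int attempt;
        final String node;
        State state = State.SUCCEEDED;

        MapAttempt(String taskId, int attempt, String node) {
            this.taskId = taskId;
            this.attempt = attempt;
            this.node = node;
        }

        String id() {
            return taskId + "_" + attempt;
        }
    }

    /**
     * Called when the allocate response reports some nodes as unusable.
     * Kills every SUCCEEDED map attempt that ran on one of those nodes
     * (its output lived on that node's local disk) and returns the ids of
     * the replacement attempts to schedule.
     * Assumption: nothing is re-run once all reducers have completed.
     */
    static List<String> onUnusableNodes(List<MapAttempt> succeeded,
                                        Set<String> unusableNodes,
                                        boolean allReducersComplete) {
        List<String> relaunched = new ArrayList<>();
        if (allReducersComplete) {
            return relaunched; // no reducer needs the lost map output any more
        }
        for (MapAttempt a : succeeded) {
            if (a.state == State.SUCCEEDED && unusableNodes.contains(a.node)) {
                a.state = State.KILLED;
                System.out.println("TaskAttempt killed because it ran on unusable node "
                        + a.node + ". AttemptId:" + a.id());
                // Simplification: assumes the killed attempt was the task's latest.
                relaunched.add(a.taskId + "_" + (a.attempt + 1));
            }
        }
        return relaunched;
    }

    public static void main(String[] args) {
        List<MapAttempt> done = List.of(
                new MapAttempt("m_000012", 0, "10.0.0.8:30050"),
                new MapAttempt("m_000043", 1, "10.0.0.5:30050"));
        List<String> next = onUnusableNodes(done, Set.of("10.0.0.5:30050"), false);
        System.out.println("Relaunch: " + next);
    }
}
```

Running `main` prints the kill message for `m_000043_1` and then `Relaunch: [m_000043_2]`. The kill message is modelled on the JobImpl log line quoted further down in this thread. The attempt on the healthy node 10.0.0.8:30050 keeps its SUCCEEDED state.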
Regards,
Varun Saxena.

On Wed, Nov 11, 2015 at 11:44 PM, Sergey wrote:

> Hi,
>
> yes, there are several "failed" map attempts, because of the 600 sec time-out.
>
> I also found a lot of messages like this in the log:
>
> 2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node 10.0.0.5:30050. AttemptId:attempt_1447029285980_0001_m_000043_1
>
> Different nodes get the unusable status very often.
>
> Do you know anything about the possible reason? Maybe changing some time-out
> parameters for communication between nodes could help?
>
> As I already said, I work in the cloud on Azure HDInsight.
>
> 2015-11-11 20:33 GMT+03:00 Namikaze Minato:
>
>> Hi.
>>
>> Do you also have "failed" map attempts?
>> Killed map attempts won't help us understand why your job is failing.
>>
>> Regards,
>> LLoyd
>>
>> On 11 November 2015 at 16:37, Sergey wrote:
>> >
>> > Hi experts!
>> >
>> > I see strange behaviour from Hadoop while my tasks are executing.
>> > It re-runs a task attempt that has already completed with SUCCEEDED status
>> > (see the log below about attempt_1447029285980_0001_m_000012_0).
>> >
>> > I don't know why, but this task repeats in attempt numbers 0, 1, 2, 3, 4
>> > and then 2000.
>> >
>> > The same happens with some other tasks.
>> > I also see on screen, after a task has executed, that sometimes map
>> > progress decreases...
>> >
>> > I don't use preemption or speculative execution, and I don't see any
>> > exceptions or time-outs in the YARN log
>> > (except the last line "Container killed on request. Exit code is 143").
>> >
>> > How can I find the reason?
>> >
>> > I use version 2.6.0 in the Azure cloud (HDInsight).
>> >
>> > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1447029285980_0001_m_000012_0 is : 1.0
>> > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e04_1447029285980_0001_01_002951 taskAttempt attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : 10.0.0.8:30050
>> > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to SUCCEEDED
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 4
>> > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8 AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0 HostLocal:0 RackLocal:16
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e04_1447029285980_0001_01_002951
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
>> > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1447029285980_0001_m_000012_0: Container killed by the ApplicationMaster.
>> > Container killed on request. Exit code is 143
>> > Container exited with a non-zero exit code 143
>> >
