Re: Re-execution of map task

Sergey Wed, 11 Nov 2015 10:14:57 -0800

Hi,

Yes, there are several "failed" map attempts, caused by the 600 sec time-out.


I also found a lot of messages like this in the log:

2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed
because it ran on unusable node 10.0.0.5:30050.
AttemptId:attempt_1447029285980_0001_m_000043_1

Different nodes were marked as unusable very often.
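
To see which nodes are affected most often, these messages could be counted
per node. Below is a minimal sketch in Python, assuming the ApplicationMaster
log has been saved as plain text. The names UNUSABLE_RE and
count_unusable_nodes are hypothetical, and the pattern is based only on the
message quoted above, so it may need adjusting for other log layouts.

```python
import re
import sys
from collections import Counter

# Pattern derived from the JobImpl message quoted above (an assumption about
# the general log layout). \s+ and \s* tolerate line wrapping by a mail client.
UNUSABLE_RE = re.compile(
    r"TaskAttempt\s+killed\s+because\s+it\s+ran\s+on\s+unusable\s+node\s+"
    r"(?P<node>\S+?)\.\s*AttemptId:(?P<attempt>attempt_\w+)"
)


def count_unusable_nodes(log_text):
    """Return a Counter mapping node address -> number of attempts killed there."""
    return Counter(m.group("node") for m in UNUSABLE_RE.finditer(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_unusable.py am_syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for node, killed in count_unusable_nodes(fh.read()).most_common():
            print(node, killed)
```

If one or two nodes dominate the output, the problem is probably local to
those machines; an even spread would point more towards a cluster-wide cause.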

Do you know anything about a possible reason? Maybe changing some time-out
params for the communication between nodes could help?
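
Before changing anything, the current values could be read from the site
configuration files. Below is a minimal sketch in Python. The two property
names are only my assumption about which settings may be relevant here:
mapreduce.task.timeout (600000 ms by default, which would match the 600 sec
above) and yarn.nm.liveness-monitor.expiry-interval-ms. The function name
read_properties is hypothetical.

```python
import xml.etree.ElementTree as ET

# Candidate properties; whether these are the ones that matter in this case
# is an assumption, not something confirmed by the logs.
CANDIDATE_KEYS = (
    "mapreduce.task.timeout",
    "yarn.nm.liveness-monitor.expiry-interval-ms",
)


def read_properties(xml_text, keys=CANDIDATE_KEYS):
    """Return {name: value} for the requested keys found in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in keys:
            found[name] = (prop.findtext("value") or "").strip()
    return found
```

A key missing from the result only means it is not set explicitly in that
file, so the built-in default would apply.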

As I already said, I work in the cloud on Azure HDInsight.






2015-11-11 20:33 GMT+03:00 Namikaze Minato <[email protected]>:

> Hi.
>
> Do you also have "failed" map attempts?
> Killed map attempts won't help us understand why your job is failing.
>
> Regards,
> LLoyd
>
> On 11 November 2015 at 16:37, Sergey <[email protected]> wrote:
> >
> > Hi experts!
> >
> > I see strange behaviour of Hadoop during execution of my tasks.
> > It re-runs a task attempt which has already completed with SUCCEEDED status
> > (see the log below about attempt_1447029285980_0001_m_000012_0).
> >
> > I don't know why, but this task repeats with attempt numbers 0, 1, 2, 3, 4 and
> > then 2000.
> >
> > The same story with some other tasks.
> > I also see on screen, after execution of a task, that sometimes the map
> > progress is decreasing...
> >
> > I don't use preemption or speculative execution, and I don't see any
> > exceptions or time-outs in the yarn log
> > (except the last line "Container killed on request. Exit code is 143").
> >
> > How can I find the reason?
> >
> > I use version 2.6.0 in the Azure cloud (HDInsight).
> >
> >
> > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153]
> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
> > attempt_1447029285980_0001_m_000012_0 is : 1.0
> > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153]
> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement
> > from attempt_1447029285980_0001_m_000012_0
> > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from
> > RUNNING to SUCCESS_CONTAINER_CLEANUP
> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
> > Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container
> > container_e04_1447029285980_0001_01_002951 taskAttempt
> > attempt_1447029285980_0001_m_000012_0
> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
> > KILLING attempt_1447029285980_0001_m_000012_0
> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
> > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
> > Opening proxy : 10.0.0.8:30050
> > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from
> > SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
> > attempt attempt_1447029285980_0001_m_000012_0
> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
> > task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to
> > SUCCEEDED
> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed
> > Tasks: 4
> > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator]
> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
> > Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8
> > AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0
> > HostLocal:0 RackLocal:16
> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received
> > completed container container_e04_1447029285980_0001_01_002951
> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
> > containers 1
> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to
> > reduce
> > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler]
> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> > report from attempt_1447029285980_0001_m_000012_0: Container killed by
> > the ApplicationMaster.
> > Container killed on request. Exit code is 143
> > Container exited with a non-zero exit code 143
>
