Re: Re-execution of map task

Varun Saxena Wed, 11 Nov 2015 10:24:27 -0800

Hi Sergey,

This indicates that one or more of your NodeManagers may have gone down.
The RM (ResourceManager) reports this to the AM (ApplicationMaster) in the
allocate response. If a map task ran on such a node, its output is considered
unusable even though the map task was previously marked as successful.
Such a map task is then KILLED and a new attempt is launched.
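
A minimal Java sketch of the behaviour described above, under simplifying assumptions. The class, field, and method names are hypothetical and are not Hadoop's actual JobImpl or RMContainerAllocator code. The node addresses and task numbers are sample data borrowed from the logs in this thread, except 10.0.0.6, which is made up. Every SUCCEEDED map attempt that ran on a node reported as unusable is marked KILLED and a replacement attempt is queued. This would also explain why the number of completed maps, and with it the map progress shown on screen, can go down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of an AM reacting to nodes reported as
// unusable in an allocate response. Not Hadoop's real classes or API.
class UnusableNodeSketch {

    enum AttemptState { RUNNING, SUCCEEDED, KILLED }

    // One map attempt: its task, attempt number, node, and current state.
    static final class MapAttempt {
        final int taskId;
        final int attemptNumber;
        final String node; // null while the replacement is not yet placed
        AttemptState state;

        MapAttempt(int taskId, int attemptNumber, String node, AttemptState state) {
            this.taskId = taskId;
            this.attemptNumber = attemptNumber;
            this.node = node;
            this.state = state;
        }
    }

    // Number of SUCCEEDED attempts; stands in for "CompletedMaps".
    static long completedMaps(List<MapAttempt> attempts) {
        return attempts.stream().filter(a -> a.state == AttemptState.SUCCEEDED).count();
    }

    // Kills each SUCCEEDED map attempt that ran on an unusable node, because its
    // output sits on that node's local disk, and returns the replacement attempts.
    // Simplification: the replacement simply gets attemptNumber + 1, and running
    // attempts on the unusable node are not handled here.
    static List<MapAttempt> handleUnusableNodes(List<MapAttempt> attempts, Set<String> unusableNodes) {
        List<MapAttempt> reruns = new ArrayList<>();
        for (MapAttempt a : attempts) {
            if (a.state == AttemptState.SUCCEEDED && unusableNodes.contains(a.node)) {
                a.state = AttemptState.KILLED;
                reruns.add(new MapAttempt(a.taskId, a.attemptNumber + 1, null, AttemptState.RUNNING));
            }
        }
        return reruns;
    }

    public static void main(String[] args) {
        List<MapAttempt> attempts = new ArrayList<>();
        attempts.add(new MapAttempt(12, 0, "10.0.0.8:30050", AttemptState.SUCCEEDED));
        attempts.add(new MapAttempt(43, 1, "10.0.0.5:30050", AttemptState.SUCCEEDED));
        attempts.add(new MapAttempt(44, 0, "10.0.0.6:30050", AttemptState.RUNNING));

        System.out.println("CompletedMaps before: " + completedMaps(attempts));
        List<MapAttempt> reruns = handleUnusableNodes(attempts, Set.of("10.0.0.5:30050"));
        System.out.println("CompletedMaps after: " + completedMaps(attempts));
        for (MapAttempt r : reruns) {
            System.out.println("re-run task " + r.taskId + " as attempt " + r.attemptNumber);
        }
    }
}
```

Running it prints the completed-map count dropping from 2 to 1 and a re-run of task 43 as attempt 2. Only map output is invalidated in this way because map output stays on the NodeManager's local disk until the reducers fetch it, whereas reduce output is written to HDFS.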


Regards,
Varun Saxena.

On Wed, Nov 11, 2015 at 11:44 PM, Sergey <[email protected]> wrote:

> Hi,
>
> Yes, there are several "failed" map attempts, caused by the 600-second timeout.
>
> I also found a lot of messages like this in the log:
>
> 2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed
> because it ran on unusable node 10.0.0.5:30050.
> AttemptId:attempt_1447029285980_0001_m_000043_1
>
> Different nodes get the unusable status very often.
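
To see which nodes are affected and how often, one option is to count these kill messages per node in the AM log. The Java sketch below assumes that in the real AM syslog each message sits on one line (the e-mail wraps it) in the form "...ran on unusable node HOST:PORT. AttemptId:ID"; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts the AM's "ran on unusable node" kill messages per node.
// Assumes one message per line, formatted as in the log quoted in this thread.
class UnusableNodeCounter {

    private static final Pattern UNUSABLE_NODE =
            Pattern.compile("ran on unusable node (\\S+)\\. AttemptId:");

    static Map<String, Integer> countByNode(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by node for stable output
        for (String line : lines) {
            Matcher m = UNUSABLE_NODE.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the AM log file as the first argument; otherwise fall back to
        // the single sample line quoted in this thread.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler] "
                        + "org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed "
                        + "because it ran on unusable node 10.0.0.5:30050. "
                        + "AttemptId:attempt_1447029285980_0001_m_000043_1");
        countByNode(lines).forEach((node, n) -> System.out.println(node + "\t" + n));
    }
}
```

Nodes with high counts are the ones whose NodeManager logs and health status (for example on the Nodes page of the ResourceManager web UI) are worth checking first.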
>
> Do you know what the possible reason is? Could changing some timeout
> parameters for communication between nodes help?
>
> As I already said, I work in the cloud on Azure HDInsight.
>
> 2015-11-11 20:33 GMT+03:00 Namikaze Minato <[email protected]>:
>
>> Hi.
>>
>> Do you also have "failed" map attempts?
>> Killed map attempts won't help us understand why your job is failing.
>>
>> Regards,
>> LLoyd
>>
>>
>> On 11 November 2015 at 16:37, Sergey <[email protected]> wrote:
>> >
>> > Hi experts!
>> >
>> > I see strange behaviour from Hadoop while my tasks are executing.
>> > It re-runs a task attempt that has already completed with SUCCEEDED status
>> > (see the log below about attempt_1447029285980_0001_m_000012_0).
>> >
>> > I don't know why, but this task repeats as attempt numbers 0, 1, 2, 3, 4
>> > and then 2000.
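
Each re-run of a task appears as a new attempt number in the last field of the attempt id, so attempts 0 to 4 of task 000012 mean that task was launched at least five times. The Java sketch below splits an id into its fields with a hand-written regex for illustration; Hadoop itself provides `TaskAttemptID.forName(String)` for this, and the class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits a MapReduce task-attempt id into its fields (illustration only).
class AttemptIdSketch {

    // attempt_<clusterTimestamp>_<jobNumber>_<taskType>_<taskNumber>_<attemptNumber>
    private static final Pattern ATTEMPT_ID =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([a-z])_(\\d+)_(\\d+)");

    static final class Parts {
        final long clusterTimestamp; // ResourceManager start time, epoch millis
        final int jobNumber;
        final char taskType;         // 'm' = map, 'r' = reduce
        final int taskNumber;
        final int attemptNumber;     // 0 for the first attempt; later attempts get higher numbers

        Parts(long clusterTimestamp, int jobNumber, char taskType, int taskNumber, int attemptNumber) {
            this.clusterTimestamp = clusterTimestamp;
            this.jobNumber = jobNumber;
            this.taskType = taskType;
            this.taskNumber = taskNumber;
            this.attemptNumber = attemptNumber;
        }
    }

    static Parts parse(String id) {
        Matcher m = ATTEMPT_ID.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a task-attempt id: " + id);
        }
        return new Parts(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2)),
                m.group(3).charAt(0), Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        String[] ids = {
            "attempt_1447029285980_0001_m_000012_0",
            "attempt_1447029285980_0001_m_000043_1"
        };
        for (String id : ids) {
            Parts p = parse(id);
            System.out.println("task " + p.taskNumber + " (" + p.taskType + "), attempt " + p.attemptNumber);
        }
    }
}
```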
>> >
>> > The same thing happens with some other tasks.
>> > I also see on screen, after a task finishes, that the map progress
>> > sometimes decreases...
>> >
>> > I don't use preemption or speculative execution, and I don't see any
>> > exceptions or timeouts in the YARN log
>> > (except the last line "Container killed on request. Exit code is 143").
>> >
>> > How can I find the cause?
>> >
>> > I use version 2.6.0 in the Azure cloud (HDInsight).
>> >
>> >
>> > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153]
>> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
>> > attempt_1447029285980_0001_m_000012_0 is : 1.0
>> > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153]
>> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from
>> > attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
>> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from RUNNING
>> > to SUCCESS_CONTAINER_CLEANUP
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
>> > Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container
>> > container_e04_1447029285980_0001_01_002951 taskAttempt
>> > attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING
>> > attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>> > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
>> > Opening proxy : 10.0.0.8:30050
>> > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
>> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from
>> > SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
>> > attempt attempt_1447029285980_0001_m_000012_0
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
>> > task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to SUCCEEDED
>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 4
>> > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator]
>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
>> > Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8
>> > AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0
>> > HostLocal:0 RackLocal:16
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received
>> > completed container container_e04_1447029285980_0001_01_002951
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
>> > containers 1
>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to
>> > reduce
>> > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler]
>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
>> > report from attempt_1447029285980_0001_m_000012_0: Container killed by the
>> > ApplicationMaster.
>> > Container killed on request. Exit code is 143
>> > Container exited with a non-zero exit code 143
>>
>
>
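
In the log quoted above, exit code 143 comes right after the attempt's transition to SUCCESS_CONTAINER_CLEANUP and the diagnostic "Container killed by the ApplicationMaster", so it appears to be the normal cleanup of a finished container rather than the cause of the re-runs. Under the usual Unix convention, a process terminated by signal N is reported with exit status 128 + N, so 143 = 128 + 15, which is SIGTERM. A small Java sketch that decodes an exit code this way; the helper names are hypothetical.

```java
// Decodes a container exit code under the common Unix convention
// "exit status = 128 + signal number" (an assumption, not a YARN API).
class ExitCodeSketch {

    // Returns the signal number, or -1 if the code does not look like a signal exit.
    static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode <= 128 + 64) ? exitCode - 128 : -1;
    }

    static String signalName(int signal) {
        if (signal == 9) return "SIGKILL";
        if (signal == 15) return "SIGTERM";
        return signal > 0 ? "signal " + signal : "not a signal exit";
    }

    public static void main(String[] args) {
        int code = 143; // from "Container killed on request. Exit code is 143"
        System.out.println(code + " -> " + signalName(signalFromExitCode(code)));
    }
}
```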
