Hi Varun, thank you! But the main question is still the reason why.
I did check the log files with "yarn logs" as well as the local logs on the
nodes, and I don't see any reason why Hadoop would consider those nodes
unusable.

Regards,
Sergey.

2015-11-11 21:23 GMT+03:00 Varun Saxena <[email protected]>:

> Hi Sergey,
>
> This indicates that one or more of your Node Managers may have gone down.
> The RM indicates this to the AM in the allocate response.
> If a map task ran on such a node, its output is considered unusable, even
> though the map task had previously been marked as successful.
> Such a map task is then KILLED and a new attempt is launched.
>
> Regards,
> Varun Saxena.
>
> On Wed, Nov 11, 2015 at 11:44 PM, Sergey <[email protected]> wrote:
>
>> Hi,
>>
>> Yes, there are several "failed" maps, because of the 600-second time-out.
>>
>> I also found a lot of messages like this in the log:
>>
>> 2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler]
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed
>> because it ran on unusable node 10.0.0.5:30050.
>> AttemptId:attempt_1447029285980_0001_m_000043_1
>>
>> Different nodes get "unusable" status very often.
>>
>> Do you know anything about the possible reason? Maybe changing some
>> time-out params in the communication between the nodes could help?
>>
>> As I already said, I work in the cloud on Azure HDInsight.
>>
>> 2015-11-11 20:33 GMT+03:00 Namikaze Minato <[email protected]>:
>>
>>> Hi.
>>>
>>> Do you also have "failed" map attempts?
>>> Killed map attempts won't help us understand why your job is failing.
>>>
>>> Regards,
>>> LLoyd
>>>
>>> On 11 November 2015 at 16:37, Sergey <[email protected]> wrote:
>>> >
>>> > Hi experts!
>>> >
>>> > I see strange behaviour from Hadoop during execution of my tasks:
>>> > it re-runs a task attempt that has already completed with SUCCEEDED
>>> > status (see the log below about attempt_1447029285980_0001_m_000012_0).
>>> >
>>> > I don't know why, but this task repeats as attempt numbers 0, 1, 2, 3, 4
>>> > and then 2000.
>>> >
>>> > The same story with some other tasks.
>>> > I also see on screen during execution that the map progress sometimes
>>> > decreases...
>>> >
>>> > I don't use preemption or speculative execution, and I don't see any
>>> > exceptions or time-outs in the yarn log
>>> > (except the last line "Container killed on request. Exit code is 143").
>>> >
>>> > How can I catch the reason?
>>> >
>>> > I use version 2.6.0 in the Azure cloud (HDInsight).
>>> >
>>> > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153]
>>> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
>>> > attempt_1447029285980_0001_m_000012_0 is : 1.0
>>> > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153]
>>> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from
>>> > attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
>>> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from RUNNING
>>> > to SUCCESS_CONTAINER_CLEANUP
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>>> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
>>> > Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container
>>> > container_e04_1447029285980_0001_01_002951 taskAttempt
>>> > attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>>> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING
>>> > attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4]
>>> > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
>>> > Opening proxy : 10.0.0.8:30050
>>> > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
>>> > attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from
>>> > SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
>>> > attempt attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
>>> > task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to SUCCEEDED
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 4
>>> > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator]
>>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
>>> > Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8
>>> > AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0
>>> > HostLocal:0 RackLocal:16
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received
>>> > completed container container_e04_1447029285980_0001_01_002951
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
>>> > containers 1
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator]
>>> > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to
>>> > reduce
>>> > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler]
>>> > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
>>> > report from attempt_1447029285980_0001_m_000012_0: Container killed by the
>>> > ApplicationMaster.
>>> > Container killed on request. Exit code is 143
>>> > Container exited with a non-zero exit code 143
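[Editor's note for readers hitting the same symptom: the "time-out params" Sergey asks about are ordinary yarn-site.xml properties (the 600-second map failure he mentions separately comes from mapreduce.task.timeout in mapred-site.xml, default 600000 ms). Below is a minimal sketch of the knobs that decide when a node becomes "unusable"; the property names are real YARN 2.x settings, but the values shown are the stock defaults for orientation, not a recommended tuning.]

```xml
<!-- yarn-site.xml: settings behind "unusable node" events.
     Values shown are the 2.6 defaults, not recommendations. -->

<!-- The RM marks a NodeManager LOST if no heartbeat arrives within this
     window; completed map output on that node is then considered unusable. -->
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>

<!-- How often the NM runs the (optional) node health script; a failing
     script flips the node to UNHEALTHY. -->
<property>
  <name>yarn.nodemanager.health-checker.interval-ms</name>
  <value>600000</value>
</property>

<!-- Disk health: the node is marked UNHEALTHY when fewer than this fraction
     of its local/log dirs are good, e.g. when a disk fills up mid-job. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
```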

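[To tell the cases apart before touching any of the intervals above: `yarn node -list -all` prints each node's current state (LOST vs UNHEALTHY, with a health report), and the ResourceManager log should record the node's state transition, and its cause, at roughly the same timestamps as the AM's "ran on unusable node" messages.]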