[ 
https://issues.apache.org/jira/browse/TEZ-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112117#comment-14112117
 ] 

Jeff Zhang edited comment on TEZ-1506 at 8/27/14 10:49 AM:
-----------------------------------------------------------

[~rajesh.balamohan] I check the logs and find that 
*attempt_1409114848645_0041_1_01_000000_0* get the wrong TaskSpec 
(*attempt_1409114848645_0041_1_00_000000_0*) in its EventRouterThread, which 
cause *attempt_1409114848645_0041_1_00_000000_0* fail but actually it succeeded 
before. What tez jobs did you run ? Could you attach the job code ? And you 
could turn debug on to get more info in logs.

here's the log of syslog_attempt_1409114848645_0041_1_01_000000_0 that get the 
wrong task spec, but get the right task spec in its 
LogicalIOProcessorRuntimeTask's init. ( very weird )
{code}
2014-08-27 01:27:31,858 INFO [main] 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Initializing 
LogicalIOProcessorRuntimeTask with TaskSpec: DAGName : TEZ_1494, VertexName: 
Reducer_3, VertexParallelism: 2, 
TaskAttemptID:attempt_1409114848645_0041_1_01_000000_0, 
processorName=org.apache.tez.Tez_1494$EchoProcessor, inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Map_2, 
physicalEdgeCount=1, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Map_5, physicalEdgeCount=10, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}, ]
2014-08-27 01:27:31,858 INFO [main] 
org.apache.tez.runtime.common.resources.MemoryDistributor: 
InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=1, 
numOutputs=1, JVM.maxFree=764411904

2014-08-27 01:27:31,910 WARN 
[TezTaskEventRouter[attempt_1409114848645_0041_1_00_000000_0]] 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Failed to handle event
java.lang.RuntimeException: java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"tez-vm/127.0.1.1"; destination host is: "tez-vm":56565;
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:181)
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:120)
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:86)
    at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:146)
    at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:573)
    at 
org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:525)
    at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:493)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:610)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:90)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.run(LogicalIOProcessorRuntimeTask.java:673)
    at java.lang.Thread.run(Thread.java:744)
{code}


was (Author: zjffdu):
[~rajesh.balamohan] I check the logs and find that 
*attempt_1409114848645_0041_1_01_000000_0* get the wrong TaskSpec 
(*attempt_1409114848645_0041_1_00_000000_0*) in its EventRouterThread, which 
cause *attempt_1409114848645_0041_1_00_000000_0* fail but actually it succeeded 
before. What tez jobs did you run ? Could you attach the job code ? And you 
could turn debug on to get more info in logs.

here's the log of syslog_attempt_1409114848645_0041_1_01_000000_0 that get the 
wrong task spect, but get the right task spec in its 
LogicalIOProcessorRuntimeTask's init. 
{code}
2014-08-27 01:27:31,858 INFO [main] 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Initializing 
LogicalIOProcessorRuntimeTask with TaskSpec: DAGName : TEZ_1494, VertexName: 
Reducer_3, VertexParallelism: 2, 
TaskAttemptID:attempt_1409114848645_0041_1_01_000000_0, 
processorName=org.apache.tez.Tez_1494$EchoProcessor, inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Map_2, 
physicalEdgeCount=1, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Map_5, physicalEdgeCount=10, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}, ]
2014-08-27 01:27:31,858 INFO [main] 
org.apache.tez.runtime.common.resources.MemoryDistributor: 
InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=1, 
numOutputs=1, JVM.maxFree=764411904

2014-08-27 01:27:31,910 WARN 
[TezTaskEventRouter[attempt_1409114848645_0041_1_00_000000_0]] 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Failed to handle event
java.lang.RuntimeException: java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"tez-vm/127.0.1.1"; destination host is: "tez-vm":56565;
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:181)
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:120)
    at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:86)
    at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:146)
    at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:573)
    at 
org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:525)
    at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:493)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:610)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:90)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.run(LogicalIOProcessorRuntimeTask.java:673)
    at java.lang.Thread.run(Thread.java:744)
{code}

> Possible statemachine bug : TaskAttemptImpl.handle()-->Invalid event: 
> TA_FAILED at SUCCEEDED
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1506
>                 URL: https://issues.apache.org/jira/browse/TEZ-1506
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: applogs.tar.gz
>
>
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_FAILED at SUCCEEDED
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:720)
>         at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:106)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1623)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1608)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> Will attach the app logs soon. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to