Chris,
Is this running with multiple local-dirs in YARN, and are they on different
disks? Under normal operation this should only show up if a disk goes bad.
Do you see the log line "Finished spill 0" in the logs?
Otherwise, I would need access to more logs to figure out what's going on.
Also, is this easily reproducible?
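
If it helps with the local-dirs question, here is a minimal sketch (plain
Java, not Tez code) that walks a set of directories and lists every
*_spill_<n>.out file for one attempt, so you can see which disk, if any, holds
spill 0. The class name SpillFileFinder and its findSpillFiles method are
hypothetical helpers; the sketch assumes you pass in the directories from
yarn.nodemanager.local-dirs yourself, and that spill files are named
<attempt prefix>_spill_<n>.out as in the exception you posted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical diagnostic helper; not part of Tez or Hadoop.
final class SpillFileFinder {

    // Returns every regular file under the given local dirs whose name is
    // attemptPrefix + "_spill_" + <anything> + ".out".
    static List<Path> findSpillFiles(List<Path> localDirs, String attemptPrefix)
            throws IOException {
        String prefix = attemptPrefix + "_spill_";
        List<Path> found = new ArrayList<>();
        for (Path dir : localDirs) {
            if (!Files.isDirectory(dir)) {
                // A configured dir that is missing is worth noting on its own.
                System.err.println("Not a directory, skipping: " + dir);
                continue;
            }
            // Recursively walk the dir; the stream must be closed after use.
            try (Stream<Path> walk = Files.walk(dir)) {
                walk.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith(prefix) && name.endsWith(".out");
                    })
                    .forEach(found::add);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SpillFileFinder <attempt-prefix> <local-dir>...");
            return;
        }
        List<Path> dirs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dirs.add(Paths.get(args[i]));
        }
        for (Path p : findSpillFiles(dirs, args[0])) {
            System.out.println(p);
        }
    }
}
```

Run it with the attempt prefix from the exception
(attempt_1400280851085_0001_1_00_000000_0_10003) followed by each local dir.
If only *_spill_1.out shows up, that matches what you are seeing by hand.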

Thanks
- Sid


On Fri, May 16, 2014 at 4:06 PM, Chris K Wensel <[email protected]> wrote:

> Hey all
>
> I'm sure I'll sort this out 2 min after sending this email..
>
> but i'm getting the following exception on a simple scatter/gather, bits
> below..
>
> the salient piece is that it's looking for:
>
> Could not find attempt_1400280851085_0001_1_00_000000_0_10003_spill_0.out
>
> but i can only find *_1.out in ../usercache/user/appcache/application_.../
>
> not a *_0.out.
>
> i'm on commit:
>
> * c247a3b - (HEAD, apache-github/master, apache-github/HEAD, master) TEZ-1102.
> Abstract out connection management logic in shuffle code.
> Contributed by Rajesh Balamohan. (3 days ago)
>
>
>       outputClassName = OnFileSortedOutput.class.getName();
>       inputClassName = ShuffledMergedInputLegacy.class.getName();
>
>       movementType = EdgeProperty.DataMovementType.SCATTER_GATHER;
>       sourceType = EdgeProperty.DataSourceType.PERSISTED;
>       schedulingType = EdgeProperty.SchedulingType.SEQUENTIAL;
>
>
> 2014-05-16 15:54:40,929 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400280851085_0001_1][Event:DAG_FINISHED]:
> dagId=dag_1400280851085_0001_1, startTime=1400280860264,
> finishTime=1400280880908, timeTaken=20644, status=FAILED,
> diagnostics=Vertex failed, vertexName=BBE3E81575B143109A08968E135658C5,
> vertexId=vertex_1400280851085_0001_1_00, diagnostics=[Task failed,
> taskId=task_1400280851085_0001_1_00_000000,
> diagnostics=[AttemptID:attempt_1400280851085_0001_1_00_000000_0 Info:Error:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: *Could not find
> attempt_1400280851085_0001_1_00_000000_0_10003_spill_0.out* in any of the
> configured local directories
> at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:445)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at
> org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFile(TezTaskOutputFiles.java:168)
> at
> org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.mergeParts(DefaultSorter.java:988)
> at
> org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.flush(DefaultSorter.java:633)
> at
> org.apache.tez.runtime.library.output.OnFileSortedOutput.close(OnFileSortedOutput.java:124)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:331)
> at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:584)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:570)
>
>
> --
> Chris K Wensel
> [email protected]
> http://concurrentinc.com
>
>
