[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode
[ https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137036#comment-14137036 ]

Rajesh Balamohan commented on TEZ-1587:
---

Thanks Prakash Ramachandran. Committed to master and branch-0.5.

Some tez-examples fail in local mode

Key: TEZ-1587
URL: https://issues.apache.org/jira/browse/TEZ-1587
Project: Apache Tez
Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Prakash Ramachandran
Fix For: 0.5.1
Attachments: tez-1587.1.patch

*JoinExample runs indefinitely and does not finish*
{code}
19:13:58,703 - Thread(Fetcher [hashSide] #1) - (HttpConnection.java:273) - Closing connection on fetcher [hashSide] 114
19:13:58,703 - Thread(ShuffleRunner [hashSide]) - (ShuffleManager.java:270) - Scheduling fetch for inputHost: jzhangMBPr.local:0
19:13:58,704 - Thread(ShuffleRunner [hashSide]) - (ShuffleManager.java:333) - Created Fetcher for host: jzhangMBPr.local, with inputs: []
19:14:03,599 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
19:14:03,601 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: hashSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
19:14:03,602 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: streamingSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
19:14:03,604 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: joiner Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
19:14:08,629 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
19:14:08,631 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: hashSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
19:14:08,632 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: streamingSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
19:14:08,633 - Thread( main) - (DAGClientRPCImpl.java:444) - VertexStatus: VertexName: joiner Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
19:14:13,658 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
{code}

*WordCount and OrderedWordCount fail due to the following exception*
{code}
19:16:47,499 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG completed. FinalState=FAILED
WordCount failed with diagnostics: [Vertex re-running, vertexName=Tokenizer, vertexId=vertex_1410779802886_0001_1_00, Vertex failed, vertexName=Summation, vertexId=vertex_1410779802886_0001_1_01, diagnostics=[Task failed, taskId=task_1410779802886_0001_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$ShuffleError: error in shuffle in fetcher [Tokenizer] #1
  at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:335)
  at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:1)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:695)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  at org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:375)
  at org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler.copyFailed(ShuffleScheduler.java:292)
  at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:274)
  at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
, Container container_1410779802886_0001_00_02 finished with diagnostics set to [TaskExecutionFailure: error in shuffle in fetcher [Tokenizer] #1]], TaskAttempt 1 failed, info=[Error: Failure while running task:org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$ShuffleError: error in shuffle in fetcher [Tokenizer] #2
  at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:335)
  at
[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode
[ https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134733#comment-14134733 ]

Prakash Ramachandran commented on TEZ-1587:
---

Working on this. OrderedPartitionedKVEdgeConfig does not seem to pick up configuration from the command line or values modified by the user. It does seem to pick up values from tez-site.
[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode
[ https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134755#comment-14134755 ]

Rajesh Balamohan commented on TEZ-1587:
---

OrderedPartitionedKVEdgeConfig does not seem to pick up config from the command line or the values modified by the user.

{code}
OrderedPartitionedKVEdgeConfig summationEdgeConf = OrderedPartitionedKVEdgeConfig
    .newBuilder(Text.class.getName(), IntWritable.class.getName(),
        HashPartitioner.class.getName()).build();
{code}

If we add setFromConfiguration(tezConf) to the builder chain, the command-line options would be visible to the edge.
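
To illustrate the behaviour described in the comments (a builder that is never handed the user's configuration can only fall back on its own defaults), here is a minimal, self-contained sketch. It is a toy model and not Tez code: `EdgeConfigDemo`, `ToyEdgeConfig`, and the key `example.option` are hypothetical names invented for this illustration. Only the method name `setFromConfiguration` is borrowed from the comment above, and a plain `Map` stands in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are part of Tez.
final class EdgeConfigDemo {
    // Hypothetical key, used purely for illustration.
    static final String EXAMPLE_KEY = "example.option";

    // Immutable result of the builder.
    static final class ToyEdgeConfig {
        private final Map<String, String> settings;

        private ToyEdgeConfig(Map<String, String> settings) {
            this.settings = settings;
        }

        String get(String key) {
            return settings.get(key);
        }
    }

    static final class Builder {
        private final Map<String, String> settings = new HashMap<>();

        Builder() {
            // Built-in default, used unless the caller supplies a value.
            settings.put(EXAMPLE_KEY, "default");
        }

        // Overlay caller-supplied values (e.g. parsed from the command line)
        // on top of the defaults.
        Builder setFromConfiguration(Map<String, String> conf) {
            settings.putAll(conf);
            return this;
        }

        ToyEdgeConfig build() {
            // Copy so later builder changes do not leak into the result.
            return new ToyEdgeConfig(new HashMap<>(settings));
        }
    }

    static Builder newBuilder() {
        return new Builder();
    }

    public static void main(String[] args) {
        // Stand-in for options the user passed on the command line.
        Map<String, String> userConf = new HashMap<>();
        userConf.put(EXAMPLE_KEY, "from-command-line");

        // The builder never sees userConf, so only the default is visible.
        ToyEdgeConfig withoutConf = newBuilder().build();

        // The user's value is overlaid before build().
        ToyEdgeConfig withConf = newBuilder().setFromConfiguration(userConf).build();

        System.out.println("without setFromConfiguration: " + withoutConf.get(EXAMPLE_KEY));
        System.out.println("with setFromConfiguration: " + withConf.get(EXAMPLE_KEY));
    }
}
```

In the examples themselves the corresponding change would presumably be a `setFromConfiguration(tezConf)` call on the builder chain before `build()`, as suggested in the comment; the attached patch is the authoritative version of the fix.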