[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode

2014-09-17 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137036#comment-14137036
 ] 

Rajesh Balamohan commented on TEZ-1587:
---

Thanks Prakash Ramachandran.  Committed to master and branch-0.5.

 Some tez-examples fail in local mode
 

 Key: TEZ-1587
 URL: https://issues.apache.org/jira/browse/TEZ-1587
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Jeff Zhang
 Assignee: Prakash Ramachandran
 Fix For: 0.5.1

 Attachments: tez-1587.1.patch


 *JoinExample runs indefinitely and does not finish*
 {code}
 19:13:58,703 - Thread(Fetcher [hashSide] #1) - (HttpConnection.java:273) - 
 Closing connection on fetcher [hashSide] 114
 19:13:58,703 - Thread(ShuffleRunner [hashSide]) - (ShuffleManager.java:270) - 
 Scheduling fetch for inputHost: jzhangMBPr.local:0
 19:13:58,704 - Thread(ShuffleRunner [hashSide]) - (ShuffleManager.java:333) - 
 Created Fetcher for host: jzhangMBPr.local, with inputs: []
 19:14:03,599 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: 
 RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
 19:14:03,601 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: hashSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 
 Failed: 0 Killed: 0
 19:14:03,602 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: streamingSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 
 Failed: 0 Killed: 0
 19:14:03,604 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: joiner Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 
 0 Killed: 0
 19:14:08,629 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: 
 RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
 19:14:08,631 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: hashSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 
 Failed: 0 Killed: 0
 19:14:08,632 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: streamingSide Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 
 Failed: 0 Killed: 0
 19:14:08,633 - Thread( main) - (DAGClientRPCImpl.java:444) -  VertexStatus: 
 VertexName: joiner Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 
 0 Killed: 0
 19:14:13,658 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG: State: 
 RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
 {code} 
 *WordCount and OrderedWordCount fail due to the following exception*
 {code}
 19:16:47,499 - Thread( main) - (DAGClientRPCImpl.java:444) - DAG completed. 
 FinalState=FAILED
 WordCount failed with diagnostics: [Vertex re-running, vertexName=Tokenizer, 
 vertexId=vertex_1410779802886_0001_1_00, Vertex failed, vertexName=Summation, 
 vertexId=vertex_1410779802886_0001_1_01, diagnostics=[Task failed, 
 taskId=task_1410779802886_0001_1_01_00, diagnostics=[TaskAttempt 0 
 failed, info=[Error: Failure while running 
 task:org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$ShuffleError: 
 error in shuffle in fetcher [Tokenizer] #1
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:335)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:695)
 Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; 
 bailing-out.
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:375)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler.copyFailed(ShuffleScheduler.java:292)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:274)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
 , Container container_1410779802886_0001_00_02 finished with diagnostics 
 set to [TaskExecutionFailure: error in shuffle in fetcher [Tokenizer] #1]], 
 TaskAttempt 1 failed, info=[Error: Failure while running 
 task:org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$ShuffleError: 
 error in shuffle in fetcher [Tokenizer] #2
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:335)
   at 
 

[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode

2014-09-15 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134733#comment-14134733
 ] 

Prakash Ramachandran commented on TEZ-1587:
---

Working on this. OrderedPartitionedKVEdgeConfig does not seem to pick up 
configuration from the command line or settings modified by the user. It does 
seem to pick up values from tez-site. 


[jira] [Commented] (TEZ-1587) Some tez-examples fail in local mode

2014-09-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134755#comment-14134755
 ] 

Rajesh Balamohan commented on TEZ-1587:
---


OrderedPartitionedKVEdgeConfig does not seem to pick up config from command 
line or the ones modified by user. 

{code}
OrderedPartitionedKVEdgeConfig summationEdgeConf = OrderedPartitionedKVEdgeConfig
    .newBuilder(Text.class.getName(), IntWritable.class.getName(),
        HashPartitioner.class.getName())
    .build();
{code}

If we add setFromConfiguration(tezConf) to the builder, command-line options 
would be visible to the edge.
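
To illustrate why the edge would miss command-line options, here is a minimal, self-contained sketch of the builder pattern involved. It is not the Tez code: EdgeConfigSketch, EdgeConfig, its Builder, and the key "example.shuffle.setting" are hypothetical stand-ins, and a plain Map stands in for the configuration object. Only the method name setFromConfiguration is taken from the comment above. In the example itself, the corresponding change would presumably be to chain setFromConfiguration(tezConf) before build().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in classes; these are NOT the Tez implementation.
class EdgeConfigSketch {

    static final String KEY = "example.shuffle.setting"; // made-up key for illustration

    static final class EdgeConfig {
        private final Map<String, String> conf;

        private EdgeConfig(Map<String, String> conf) {
            this.conf = conf;
        }

        String get(String key) {
            return conf.get(key);
        }

        static Builder newBuilder() {
            return new Builder();
        }

        static final class Builder {
            // The builder starts from its own defaults only.
            private final Map<String, String> conf = new HashMap<String, String>();

            Builder() {
                conf.put(KEY, "default");
            }

            // Overlay user-supplied settings (e.g. parsed command-line options).
            Builder setFromConfiguration(Map<String, String> userConf) {
                conf.putAll(userConf);
                return this;
            }

            EdgeConfig build() {
                return new EdgeConfig(new HashMap<String, String>(conf));
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a configuration carrying a user override.
        Map<String, String> tezConf = new HashMap<String, String>();
        tezConf.put(KEY, "user-override");

        // Without setFromConfiguration the override never reaches the edge.
        EdgeConfig withoutUserConf = EdgeConfig.newBuilder().build();

        // With setFromConfiguration the override is visible.
        EdgeConfig withUserConf = EdgeConfig.newBuilder()
            .setFromConfiguration(tezConf)
            .build();

        System.out.println(withoutUserConf.get(KEY));
        System.out.println(withUserConf.get(KEY));
    }
}
```

The point of the sketch is only that a builder which is never handed the user's configuration can see nothing beyond its own defaults.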
