[jira] [Commented] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

2014-09-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142985#comment-14142985
 ] 

Bikas Saha commented on TEZ-978:


looks good. committing this.

 Enhance auto parallelism tuning for queries having empty outputs or data 
 skewness
 -

 Key: TEZ-978
 URL: https://issues.apache.org/jira/browse/TEZ-978
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch, 
 TEZ-978.4.patch, TEZ-978.4.wip.patch, TEZ-978.5.patch, TEZ-978.6.patch


 Running tpcds (query-92) with auto-tuning
 (tez.am.shuffle-vertex-manager.enable.auto-parallel) degraded performance
 compared to the original run.
 The query has many empty outputs, and the tasks producing them tend to
 complete much faster than the others. Tez computes the parallelism from the
 information available at that point (where most of the output is empty) and
 sets the reducers to 1. When the other tasks complete, the single reducer
 has to do the heavy lifting, which causes the performance degradation.
 Map 1: 2/181  Map 5: 16/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 22/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 25/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 30/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 35/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 36/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 2/181  Map 5: 39/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 3/181  Map 5: 43/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/166
 Map 1: 5/181  Map 5: 46/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1  === ShuffleVertexManager changing parallelism
 Map 1: 5/181  Map 5: 63/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 7/181  Map 5: 72/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 7/181  Map 5: 83/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 8/181  Map 5: 95/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 8/181  Map 5: 104/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 9/181  Map 5: 116/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 12/181  Map 5: 123/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 13/181  Map 5: 127/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 16/181  Map 5: 127/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 17/181  Map 5: 128/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 18/181  Map 5: 131/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 19/181  Map 5: 131/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 25/181  Map 5: 132/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 33/181  Map 5: 132/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1
 Map 1: 42/181  Map 5: 134/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/109  Reducer 3: 0/137  Reducer 4: 0/1  Reducer 6: 0/1  === ShuffleVertexManager changing parallelism
 Map 1: 51/181  Map 5: 135/179  Map 7: 1/1  Map 8: 1/1  Reducer 2: 0/1  Reducer 3: 0/137  Reducer 4:
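
The collapse to a single reducer described above can be reproduced with a small sketch. This is not the actual ShuffleVertexManager code: the class name, the method name, and the 100 MB desired input size are assumptions made for illustration. It sizes the reducer count from only the output observed so far, which is the behaviour the description complains about.

```java
// Hypothetical sketch; not the actual ShuffleVertexManager code.
final class NaiveParallelismEstimate {

    /** Reducers needed when only the output observed so far is considered. */
    static int reducersFor(long observedOutputBytes,
                           long desiredTaskInputBytes,
                           int currentParallelism) {
        // Ceiling division: how many reducers of the desired size cover the output.
        long needed = (observedOutputBytes + desiredTaskInputBytes - 1) / desiredTaskInputBytes;
        // Never go below 1 and never exceed the configured parallelism.
        return (int) Math.max(1L, Math.min((long) currentParallelism, needed));
    }

    public static void main(String[] args) {
        long desired = 100L * 1024 * 1024; // assumed 100 MB per reducer
        // Only fast, empty-output tasks have finished: the estimate collapses.
        System.out.println(reducersFor(0L, desired, 166));
        // Once the heavy tasks have reported (assumed 20 GB), 166 would be kept.
        System.out.println(reducersFor(20L * 1024 * 1024 * 1024, desired, 166));
    }
}
```

With empty early outputs the estimate is clamped to 1, matching the "Reducer 6: 0/166" to "0/1" transition in the log, even though the same vertex would keep all 166 reducers once the slower tasks report their output.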

[jira] [Commented] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

2014-09-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142988#comment-14142988
 ] 

Bikas Saha commented on TEZ-978:


Looks good. Please commit this to 0.5 also. Thanks!


[jira] [Commented] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

2014-09-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124737#comment-14124737
 ] 

Bikas Saha commented on TEZ-978:


Typo ("wait wait"):
askInputSize, we can wait wait for some more data to be a

This should ideally come after the if-statement check, right?
{code}
+long expectedTotalSourceTasksOutputSize =
+    (totalNumSourceTasks * completedSourceTasksOutputSize) / numVertexManagerEventsReceived;
{code}
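
One reading of this comment is that the extrapolation divides by numVertexManagerEventsReceived, so it is only safe once the zero check has passed. A minimal sketch under that assumption follows; the class name, the method name, and the -1 sentinel are hypothetical, and only the formula itself comes from the patch.

```java
// Hypothetical sketch; only the formula is taken from the patch excerpt.
final class OutputSizeExtrapolation {

    /**
     * Extrapolates the total source output from the tasks that have reported.
     * Returns -1 when no events have arrived, so the caller can defer.
     */
    static long expectedTotalOutputSize(int totalNumSourceTasks,
                                        long completedSourceTasksOutputSize,
                                        int numVertexManagerEventsReceived) {
        if (numVertexManagerEventsReceived == 0) {
            // Nothing to extrapolate from; this also avoids a division by zero.
            return -1L;
        }
        // int * long is evaluated as long, so the product does not overflow int.
        return (totalNumSourceTasks * completedSourceTasksOutputSize)
                / numVertexManagerEventsReceived;
    }

    public static void main(String[] args) {
        // Made-up numbers: 46 of 179 tasks reported 4600 bytes in total.
        System.out.println(expectedTotalOutputSize(179, 4600L, 46));
        // No events yet: the sentinel is returned instead of throwing.
        System.out.println(expectedTotalOutputSize(179, 0L, 0));
    }
}
```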

This condition double-checks something that has already been checked in the
call stack (min src fraction). Doing it again does not seem useful.

These conditions check for odd cases where no useful info is present (even
though at this point it should be present), e.g. ShuffleVertexManager used
with an Output that does not send events, or slow-start min fraction set to 0.
So it is best not to change parallelism. That is why they did nothing, and
they should now return true. Does that sound correct?
{code}
 if(numSourceTasksCompleted == 0) {
-  return;
+  //special case: source tasks having zero tasks and pending tasks > 0.
+  return (totalNumSourceTasks == 0 && pendingTasks.size() > 0) ? true : false;
 }

 if(numVertexManagerEventsReceived == 0) {
-  return;
+  return false;
 }
{code}
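
The suggestion can be sketched as follows. The contract is an assumption, not something the patch states: true is taken to mean "the decision is final, do not revisit" and false to mean "call again when more data arrives". The parameter list and the "too little data" rule are likewise hypothetical.

```java
// Hypothetical sketch of the reviewer's suggestion; not the committed code.
final class ParallelismDecision {

    /**
     * Assumed contract: true = decision is final, false = retry later.
     */
    static boolean determineParallelismAndApply(int numSourceTasksCompleted,
                                                int numVertexManagerEventsReceived,
                                                long expectedTotalSourceTasksOutputSize,
                                                long desiredTaskInputDataSize,
                                                boolean moreSourceTasksPending) {
        if (numSourceTasksCompleted == 0 || numVertexManagerEventsReceived == 0) {
            // No usable statistics (e.g. an Output that sends no events):
            // leave parallelism unchanged and stop trying.
            return true;
        }
        if (expectedTotalSourceTasksOutputSize < desiredTaskInputDataSize
                && moreSourceTasksPending) {
            // Assumed rule: too little data so far, wait for more output.
            return false;
        }
        // A real implementation would compute and apply the new parallelism here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(determineParallelismAndApply(0, 0, 0L, 100L, true));
        System.out.println(determineParallelismAndApply(5, 5, 10L, 100L, true));
    }
}
```

Note that an expression of the form `cond ? true : false`, as in the patch excerpt, is equivalent to `cond` on its own.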



 Enhance auto parallelism tuning for queries having empty outputs or data 
 skewness
 -

 Key: TEZ-978
 URL: https://issues.apache.org/jira/browse/TEZ-978
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch, 
 TEZ-978.4.patch, TEZ-978.4.wip.patch, TEZ-978.5.patch



[jira] [Commented] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

2014-09-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120893#comment-14120893
 ] 

Bikas Saha commented on TEZ-978:


The ";" separator is not consistent with the existing text in that log line.
{code}
++ " desiredTaskInputSize: " + desiredTaskInputDataSize + "; max slow start tasks=" +
+(totalNumSourceTasks * slowStartMaxSrcCompletionFraction) + "; num sources completed=" +
{code}

Can this new state variable be replaced by a return value from 
determineParallelismAndApply()?
{code}+  boolean canDetermineParallelismLater = false;{code}

I don't see parallelismDetermined being set to true anywhere in the patch. So
how does the if statement prevent multiple determinations of parallelism? We
could end up calling determineParallelismAndApply() multiple times and
changing parallelism multiple times.
{code}
 if(enableAutoParallelism && !parallelismDetermined) {
-  // do this once
-  parallelismDetermined = true;  THIS LINE
{code}
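
Both points (dropping the extra state variable and keeping the once-only guard) could be combined roughly as below. This is a sketch under assumptions: the field names mirror the patch excerpts, but the class, the callback wiring, and the meaning of the return value (true = parallelism settled) are hypothetical.

```java
// Hypothetical sketch; field names mirror the patch, the wiring is assumed.
final class OnceOnlyParallelism {
    private final boolean enableAutoParallelism;
    private final java.util.function.BooleanSupplier determineParallelismAndApply;
    private boolean parallelismDetermined = false;

    OnceOnlyParallelism(boolean enableAutoParallelism,
                        java.util.function.BooleanSupplier determineParallelismAndApply) {
        this.enableAutoParallelism = enableAutoParallelism;
        this.determineParallelismAndApply = determineParallelismAndApply;
    }

    /** Called whenever a source task completes. */
    void onSourceTaskCompleted() {
        if (enableAutoParallelism && !parallelismDetermined) {
            // The return value takes the place of a separate
            // canDetermineParallelismLater field: false means "try again".
            parallelismDetermined = determineParallelismAndApply.getAsBoolean();
        }
    }

    boolean isParallelismDetermined() {
        return parallelismDetermined;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The stand-in callback settles on its third invocation.
        OnceOnlyParallelism manager = new OnceOnlyParallelism(true, () -> ++calls[0] >= 3);
        for (int i = 0; i < 6; i++) {
            manager.onSourceTaskCompleted();
        }
        System.out.println("determinations attempted: " + calls[0]);
    }
}
```

Once the callback reports that parallelism is settled, the guard stops further attempts, so parallelism cannot be changed a second time.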

Extra whitespace
{code}-float percentRange = slowStartMaxSrcCompletionFraction - 
+float percentRange = slowStartMaxSrcCompletionFraction -{code}

Tests? There is a TestShuffleVertexManager that has existing tests for this 
code. Tests could be added to that.

 Enhance auto parallelism tuning for queries having empty outputs or data 
 skewness
 -

 Key: TEZ-978
 URL: https://issues.apache.org/jira/browse/TEZ-978
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch, 
 TEZ-978.4.patch, TEZ-978.4.wip.patch



[jira] [Commented] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

2014-07-31 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081648#comment-14081648
 ] 

Hitesh Shah commented on TEZ-978:
-

[~bikassaha], mind taking a look?

 Enhance auto parallelism tuning for queries having empty outputs or data 
 skewness
 -

 Key: TEZ-978
 URL: https://issues.apache.org/jira/browse/TEZ-978
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch

