Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-16 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Jan. 16, 2013, 7 p.m.)


Review request for hive.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 15fce1e 
  conf/hive-default.xml.template cdc1afd 
  ql/if/queryplan.thrift 4427929
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 7c4c413
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 7f9ad24
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 69fff0e
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 18b5540
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 09f8139
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java 2f5140a
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java cb43d84
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 3df029f
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 9ac7d8c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9cbb2e6 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml f38ae1f 
  ql/src/test/results/compiler/plan/groupby2.q.xml 72bd2ff 
  ql/src/test/results/compiler/plan/groupby3.q.xml e62439a 
  ql/src/test/results/compiler/plan/groupby5.q.xml 632d88e 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-15 Thread Yin Huai


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 275
> > 
> >
> > Does this imply CLSRSOperator cannot have more than one child operator 
> > in any case? If so, can you please add a comment stating that, along with 
> > a short description of why?
> 
> Yin Huai wrote:
> I thought that, in the original plan, a ReduceSinkOperator can only have 
> 1 child. Because a CLSRSOperator replaces a ReduceSinkOperator, it also has a 
> single child. Is my understanding correct?
> 
> Ashutosh Chauhan wrote:
> I think you are correct. Since it's a terminal operator, RSOperator can 
> only have 1 child. It will be good to add this as a comment there.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java, line 1451
> > 
> >
> > I don't get the reason for introducing rowNumber in Operator. It doesn't 
> > look like it's required for the optimization to take place correctly. Can you 
> > elaborate on the need for this variable and the associated methods that 
> > are introduced along with it?

If a query plan is optimized by the correlation optimizer, multiple operation paths 
can be merged into a single Map phase. CorrelationCompositeOperator is used to 
evaluate results from different paths and forward a single row to the 
ReduceSinkOperator. CorrelationCompositeOperator first buffers the rows 
forwarded to it from different paths. rowNumber is introduced to let 
CorrelationCompositeOperator know when a row has been processed by all paths. 
When a new rowNumber is set through setRowNumber (meaning a new row is coming), 
CorrelationCompositeOperator evaluates its current buffer and tags the current 
row with the operation paths it belongs to. I will add a comment to explain it.
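To make the rowNumber mechanism concrete, here is a minimal Java sketch of the buffering described above. It is a simplified, hypothetical model, not the patch's CorrelationCompositeOperator: the class name CompositeRowTagger, the process(row, pathIndex) method and the integer bitmask used as the path tag are all assumptions for illustration, and the sketch assumes every operation path either forwards the same input row or drops it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of rowNumber-driven buffering.
 * Not Hive's CorrelationCompositeOperator; all names are illustrative.
 */
class CompositeRowTagger {
  /** A forwarded row plus a bitmask of the operation paths it belongs to. */
  static final class TaggedRow {
    final Object row;
    final int pathMask;

    TaggedRow(Object row, int pathMask) {
      this.row = row;
      this.pathMask = pathMask;
    }
  }

  private final List<TaggedRow> output = new ArrayList<>();
  private long currentRowNumber = -1;
  private Object bufferedRow;
  private int bufferedMask;

  /** Signals that a new input row is entering the merged Map phase. */
  void setRowNumber(long rowNumber) {
    if (rowNumber != currentRowNumber) {
      flush(); // every path has finished with the previous row
      currentRowNumber = rowNumber;
    }
  }

  /**
   * Called when operation path pathIndex (0..31 in this sketch) forwards the
   * current row. Assumes all paths forward the same row object.
   */
  void process(Object row, int pathIndex) {
    bufferedRow = row;
    bufferedMask |= 1 << pathIndex;
  }

  /** Forwards the buffered row once, tagged with every path that produced it. */
  void flush() {
    if (bufferedMask != 0) {
      output.add(new TaggedRow(bufferedRow, bufferedMask));
    }
    bufferedRow = null;
    bufferedMask = 0;
  }

  List<TaggedRow> output() {
    return output;
  }
}
```

In this sketch, a row reaching the operator through paths 0 and 1 is forwarded once with mask 3 (binary 11), and a row that no path forwards produces no output. A final flush() is needed when input ends so that the last row is not lost.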


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, 
> > line 134
> > 
> >
> > Can you add comments clarifying what's the reason for this?

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, 
> > line 270
> > 
> >
> > If I am reading this correctly, then map-side groupbys are converted to 
> > reduce-side groupbys. I don't see any fundamental reason why the correlation 
> > optimizer cannot work with map-side groupbys. Is this just a limitation 
> > of the current implementation, or is there a fundamental reason for it? Please 
> > add comments whatever the case is.

For example, if an MR job for an aggregation operator shares the input table 
with an MR job for a join operator, then with reduce-side aggregation (the RS-GBY 
pattern) the ReduceSinkOperator for the merged Map phase only needs to emit a 
single row for both the aggregation operator and the join operator. Thus, I 
decided to convert map-side groupbys to reduce-side groupbys.
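One way to see the trade-off in the reply above is the following hypothetical Java sketch (not Hive code; MergedMapPhaseSketch, its consumer tags and the record layout are all assumptions). With a reduce-side GROUP BY, the join and the aggregation consume the same raw row, so one shuffled record can carry both tags; with a map-side GROUP BY, the aggregation path ships partial aggregates instead, so the merged Map phase has to emit two kinds of records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration, not Hive code: one input table feeds both a join
 * and a GROUP BY on the same key column inside a merged Map phase.
 */
class MergedMapPhaseSketch {
  static final int JOIN = 1; // consumer tag for the join
  static final int GBY = 2;  // consumer tag for the aggregation

  /** One record sent through the shuffle: key, payload, and its consumers. */
  static final class Shuffled {
    final String key;
    final Object payload;
    final int consumers;

    Shuffled(String key, Object payload, int consumers) {
      this.key = key;
      this.payload = payload;
      this.consumers = consumers;
    }
  }

  /** Reduce-side GROUP BY (RS-GBY): each raw row is emitted once for both consumers. */
  static List<Shuffled> emitWithReduceSideGroupBy(List<String[]> rows) {
    List<Shuffled> out = new ArrayList<>();
    for (String[] row : rows) {
      out.add(new Shuffled(row[0], row, JOIN | GBY));
    }
    return out;
  }

  /**
   * Map-side GROUP BY: the aggregation path ships per-key partial counts, a
   * different payload from the raw rows the join needs, so the two paths emit
   * separate records.
   */
  static List<Shuffled> emitWithMapSideGroupBy(List<String[]> rows) {
    List<Shuffled> out = new ArrayList<>();
    Map<String, Long> partialCounts = new LinkedHashMap<>();
    for (String[] row : rows) {
      out.add(new Shuffled(row[0], row, JOIN));
      partialCounts.merge(row[0], 1L, Long::sum);
    }
    partialCounts.forEach((key, count) -> out.add(new Shuffled(key, count, GBY)));
    return out;
  }
}
```

For three input rows over two distinct keys, the reduce-side variant emits three records, each tagged for both consumers, while the map-side variant emits three join records plus two partial counts. This only illustrates why emitting a single row is simpler; it does not show that map-side aggregation is fundamentally incompatible with the optimization.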


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 94
> > 
> >
> > Can you add comments explaining why it is not compatible with skew-join and 
> > groupby-skew optimizations?

It seems that skew optimizations split a single MR job into 2 jobs. I have not 
carefully thought about how to apply the correlation optimization to plans 
optimized by skew optimizations, so I'd suggest we evaluate this issue in a 
follow-up jira.


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java, line 393
> > 
> >
> > The Explain annotation should have worked, no? Having this working will be 
> > very useful for debugging.

Enabled. We need to re-generate some test results (probably all results 
involving "EXPLAIN"?).


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, 
> > line 175
> > 
> >
> > Seems like the logic of auto conversion of join to map-join is the same as 
> > the one used in CommonJoinResolver. If that's the case, instead of 
> > replicating the logic here, it would be good to refactor that logic out of 
> > CommonJoinResolver into a util function in some util class and then use 
> > that function here. That logic will likely change over time and 
> > risks diverging if we duplicate instead of reuse.

I am working on it.


> On Jan. 10, 2013, 2:24 a.m.

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-12 Thread Ashutosh Chauhan


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 275
> > 
> >
> > Does this imply CLSRSOperator cannot have more than one child operator 
> > in any case? If so, can you please add a comment stating that, along with 
> > a short description of why?
> 
> Yin Huai wrote:
> I thought that, in the original plan, a ReduceSinkOperator can only have 
> 1 child. Because a CLSRSOperator replaces a ReduceSinkOperator, it also has a 
> single child. Is my understanding correct?

I think you are correct. Since it's a terminal operator, RSOperator can only 
have 1 child. It will be good to add this as a comment there.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
---


On Nov. 19, 2012, 7:51 p.m., Yin Huai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> ---
> 
> (Updated Nov. 19, 2012, 7:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> ---
> 
> This optimizer exploits intra-query correlations and merges multiple 
> correlated MapReduce jobs into one job. I opened a new request since I have been 
> working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
> https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
>   conf/hive-default.xml.template f332f3a
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 7c4c413
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
>   ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
>   ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
>   ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
> 

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-10 Thread Yin Huai


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java,
> >  line 32
> > 
> >
> > I don't see any modifications to ql/if/queryplan.thrift. Please modify that 
> > file appropriately and add the generated code to the patch.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, 
> > line 36
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, 
> > line 47
> > 
> >
> > Please add javadocs explaining this class, as well as the fact that 
> > there are currently two classes which extend it. Also, describe the difference in 
> > behavior between those two classes that necessitates this base 
> > class.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 25
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 26
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 155
> > 
> >
> > Please return "CorrelationComposite" here instead of "CCO".

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 224
> > 
> >
> > It seems like you are serializing and then immediately deserializing 
> > keys and values here, which I think is required for ReduceSinkOperator 
> > since keys and values are transferred from the mapper process to the reducer 
> > process. This is redundant in CLSReduceSinkOp since it's all running inline 
> > in one operator pipeline in the same memory. So, it looks like this could be 
> > avoided. 
> > I guess doing this keeps the implementation easier, but if this is true, we 
> > should take this up in a follow-up jira as a performance improvement.

Yes, it was for easier implementation. I will add a comment indicating it will 
be addressed in a follow-up jira.
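The redundancy pointed out in the review can be illustrated with a small, hypothetical Java sketch. LocalReduceSinkSketch and its KV type are illustrative names, and DataOutputStream/DataInputStream stand in for Hive's actual serialization; both paths yield an equal key/value pair, but only the first pays for encoding and decoding bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch; not Hive's serialization path. */
class LocalReduceSinkSketch {
  /** A key/value pair as a reduce sink might emit it. */
  static final class KV {
    final String key;
    final long value;

    KV(String key, long value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Serialize, then immediately deserialize: needed only across a process boundary. */
  static KV roundTrip(KV kv) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        out.writeUTF(kv.key);
        out.writeLong(kv.value);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return new KV(in.readUTF(), in.readLong());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for in-memory streams
    }
  }

  /** In-process alternative: copy the fields without going through bytes. */
  static KV directCopy(KV kv) {
    return new KV(kv.key, kv.value);
  }
}
```

The sketch copies the fields rather than passing the same reference, on the assumption that upstream operators may reuse row objects between calls; whether a copy is actually required would have to be checked against the real operator pipeline.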


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 275
> > 
> >
> > Does this imply CLSRSOperator cannot have more than one child operator 
> > in any case? If so, can you please add a comment stating that, along with 
> > a short description of why?

I thought that, in the original plan, a ReduceSinkOperator can only have 1 
child. Because a CLSRSOperator replaces a ReduceSinkOperator, it also has a 
single child. Is my understanding correct?
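The single-child invariant can be made explicit at the point where the replacement happens. The Java sketch below is hypothetical (SketchOperator and SingleChildCheck are not Hive classes): it carries the one child of a ReduceSink-like operator over to its replacement and fails fast if that assumption is ever violated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an operator node; not Hive's Operator class. */
class SketchOperator {
  final String name;
  final List<SketchOperator> children = new ArrayList<>();

  SketchOperator(String name) {
    this.name = name;
  }
}

class SingleChildCheck {
  /**
   * Replaces a ReduceSink-like operator with a local simulative one that takes
   * over its only child. Throws if the single-child invariant does not hold.
   */
  static SketchOperator replaceReduceSink(SketchOperator rs, String replacementName) {
    if (rs.children.size() != 1) {
      throw new IllegalStateException(
          rs.name + " must have exactly one child, found " + rs.children.size());
    }
    SketchOperator replacement = new SketchOperator(replacementName);
    replacement.children.add(rs.children.get(0));
    return replacement;
  }
}
```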


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 285
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how startGroup() is taken care of in processOp().

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 290
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how endGroup() is dealt with in processOp().

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 294
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how this is taken care of in processOp().

added a comment to explain this method.
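The review asks how group boundaries are handled when startGroup() and endGroup() are empty. One plausible approach, sketched below in Java under stated assumptions, is to detect key changes inside processOp(): when the key differs from the previous one, close the old group and open a new one. GroupAwareChild, RecordingChild and LocalSinkGroupSketch are hypothetical names, this is not the patch's code, and the sketch assumes rows arrive grouped by key, as they do on the reduce side after the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical child-operator callbacks used only in this sketch. */
interface GroupAwareChild {
  void startGroup();

  void process(String key, Object value);

  void endGroup();
}

/** Test double that records the callbacks it receives. */
class RecordingChild implements GroupAwareChild {
  final List<String> events = new ArrayList<>();

  public void startGroup() { events.add("start"); }

  public void process(String key, Object value) { events.add(key); }

  public void endGroup() { events.add("end"); }
}

/** Hypothetical local reduce sink that derives group boundaries from keys. */
class LocalSinkGroupSketch {
  private final GroupAwareChild child;
  private String currentKey;
  private boolean inGroup;

  LocalSinkGroupSketch(GroupAwareChild child) {
    this.child = child;
  }

  // Deliberately empty: external group signals are ignored because
  // processOp() opens and closes groups itself when the key changes.
  void startGroup() {}

  void endGroup() {}

  /** Assumes rows arrive grouped by key. */
  void processOp(String key, Object value) {
    if (!inGroup || !Objects.equals(key, currentKey)) {
      if (inGroup) {
        child.endGroup(); // close the previous key's group
      }
      child.startGroup();
      currentKey = key;
      inGroup = true;
    }
    child.process(key, value);
  }

  /** Closes the last open group when input is exhausted. */
  void close() {
    if (inGroup) {
      child.endGroup();
      inGroup = false;
    }
  }
}
```

Feeding the keys a, a, b through processOp() and then calling close() produces the child callbacks start, a, a, end, start, b, end.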


- Yin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
---


On Nov. 19, 20

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-09 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
---



ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java


I don't see any modifications to ql/if/queryplan.thrift. Please modify that 
file appropriately and add the generated code to the patch.



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java


Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java


Please add javadocs explaining this class, as well as the fact that there 
are currently two classes which extend it. Also, describe the difference in behavior 
between those two classes that necessitates this base class.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Please return "CorrelationComposite" here instead of "CCO".



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


It seems like you are serializing and then immediately deserializing keys 
and values here, which I think is required for ReduceSinkOperator since keys 
and values are transferred from the mapper process to the reducer process. This is 
redundant in CLSReduceSinkOp since it's all running inline in one operator 
pipeline in the same memory. So, it looks like this could be avoided. 
I guess doing this keeps the implementation easier, but if this is true, we 
should take this up in a follow-up jira as a performance improvement.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Does this imply CLSRSOperator cannot have more than one child operator in 
any case? If so, can you please add a comment stating that, along with a short 
description of why?



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Please add comments here about why it's overridden with an empty implementation 
and how startGroup() is taken care of in processOp().



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Please add comments here about why it's overridden with an empty implementation 
and how endGroup() is dealt with in processOp().



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Please add comments here about why it's overridden with an empty implementation 
and how this is taken care of in processOp().



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java


I don't get the reason for introducing rowNumber in Operator. It doesn't 
look like it's required for the optimization to take place correctly. Can you elaborate 
on the need for this variable and the associated methods that are introduced 
along with it?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


Can you add comments clarifying what's the reason for this?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


Seems like the logic of auto conversion of join to map-join is the same as the 
one used in CommonJoinResolver. If that's the case, instead of replicating the 
logic here, it would be good to refactor that logic out of CommonJoinResolver into 
a util function in some util class and then use that function here. That 
logic will likely change over time and risks diverging if we 
duplicate instead of reuse.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


If I am reading this correctly, then map-side groupbys are converted to 
reduce-side groupbys. I don't see any fundamental reason why the correlation 
optimizer cannot work with map-side groupbys. Is this just a limitation of the 
current implementation, or is there a fundamental reason for it? Please add 
comments whatever the case is.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Nov. 19, 2012, 7:51 p.m.)


Review request for hive.


Changes
---

The correlation optimizer now guesses which join operators at the bottom of the 
plan (those whose input tables are not intermediate tables) will be converted to 
map joins by auto join conversion, and excludes those join operators from 
correlation optimization.
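A rough Java sketch of the kind of guess this change describes, assuming the prediction follows a size-threshold rule similar to auto join conversion: a join qualifies only if none of its inputs is an intermediate table, and some choice of big table leaves the other inputs, in total, under a small-table threshold. AutoConvertGuessSketch, JoinInput and smallTableThresholdBytes are hypothetical names; the actual auto-conversion logic is in CommonJoinResolver and may differ in detail.

```java
import java.util.List;

/** Hypothetical sketch of guessing whether a bottom join will become a map join. */
class AutoConvertGuessSketch {
  /** Minimal stand-in for one join input. */
  static final class JoinInput {
    final long sizeBytes;
    final boolean intermediate; // true if produced by an earlier job in the plan

    JoinInput(long sizeBytes, boolean intermediate) {
      this.sizeBytes = sizeBytes;
      this.intermediate = intermediate;
    }
  }

  /**
   * Returns true if every input is a base table and some choice of big table
   * leaves the remaining inputs, in total, within the small-table threshold.
   */
  static boolean likelyConvertedToMapJoin(List<JoinInput> inputs, long smallTableThresholdBytes) {
    long total = 0;
    for (JoinInput input : inputs) {
      if (input.intermediate) {
        return false; // only joins at the bottom of the plan are considered
      }
      total += input.sizeBytes;
    }
    for (JoinInput bigTable : inputs) {
      if (total - bigTable.sizeBytes <= smallTableThresholdBytes) {
        return true;
      }
    }
    return false;
  }
}
```

Under this sketch, joins for which the check returns true would be left out of correlation optimization.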


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
  conf/hive-default.xml.template f332f3a
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 7c4c413
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
  ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
  ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
  ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-11-12 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Nov. 12, 2012, 10:21 p.m.)


Review request for hive.


Changes
---

Updated the patch for the latest trunk.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ca33fc1 
  conf/hive-default.xml.template f332f3a
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 7c4c413
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 53c5b21
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
  ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
  ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
  ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-11-02 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Nov. 3, 2012, 1:50 a.m.)


Review request for hive.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b8230e4 
  conf/hive-default.xml.template 855f758
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 74b7d2b
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml e1eb209 
  ql/src/test/results/compiler/plan/groupby2.q.xml 268eb90 
  ql/src/test/results/compiler/plan/groupby3.q.xml 3e02d15 
  ql/src/test/results/compiler/plan/groupby5.q.xml 89e1207 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-10-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Oct. 19, 2012, 3:37 p.m.)


Review request for hive.


Changes
---

diff7 was not the correct one; diff8 is the update.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f86d6a7 
  conf/hive-default.xml.template 4a59fb6
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 652d81c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 587fe33 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 01b0728 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 40cc7ed 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java bdae9d5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-10-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Oct. 19, 2012, 3:24 p.m.)


Review request for hive.


Changes
---

update test results according to HIVE-3495 and update comments


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f86d6a7 
  conf/hive-default.xml.template 4a59fb6
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 652d81c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 587fe33 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 01b0728 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 40cc7ed 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java bdae9d5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-10-02 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Oct. 2, 2012, 3:43 p.m.)


Review request for hive.


Changes
---

remove the first phase of the optimizer


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8064c73 
  conf/hive-default.xml.template 23762af
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 652d81c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 8a5df6f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 01b0728 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 40cc7ed 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java bce2a06 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 322f20b 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-26 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 26, 2012, 2:26 p.m.)


Review request for hive.


Changes
---

changed my local configuration and tested all the cases that trunk did not pass before.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  conf/hive-default.xml.template 72bf4d7
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing (updated)
---

All tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-25 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 25, 2012, 3:23 p.m.)


Review request for hive.


Changes
---

address Carl's comments


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  conf/hive-default.xml.template 72bf4d7
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 8c9bd26
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

I could not test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
TestRemoteHiveMetaStore, TestSetUGIOnBothClientServer, TestSetUGIOnOnlyClient, 
and TestSetUGIOnOnlyServer, or testNegativeCliDriver_local_mapred_error_cache in 
TestNegativeCliDriver, because trunk fails these tests on my machine. In 
addition, trunk produces the result rows in a different order for queries 
skewjoinopt1.q to skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to 
skewjoinopt17.q, and skewjoinopt19.q to skewjoinopt20.q, so I could not test 
these queries on my machine either. All other tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-25 Thread Yin Huai


> On Sept. 24, 2012, 9:52 p.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java, line 33
> > 
> >
> > No raw types on LHS, and why is the classname fully qualified?

I have removed the fully qualified name. The raw types in BaseReduceSinkDesc, 
CorrelationLocalSimulativeReduceSinkDesc, and ReduceSinkDesc come from trunk and 
are used to expose the "clone" method (List is not Cloneable). I have removed all 
LHS raw types introduced by my patch.


> On Sept. 24, 2012, 9:52 p.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/correlationoptimizer4.q, line 23
> > 
> >
> > Combining these two queries with a UNION ALL would make it easier to 
> > visually verify the results.

In the CorrelationOptimizer test cases, every query is executed twice: once with 
the optimizer disabled and once with it enabled. The results of the two runs are 
written to dest_co1 and dest_co2, respectively. What I actually want to check 
here is whether dest_co1 and dest_co2 contain the same rows. Is there a good way 
to do that? Thanks.
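One way to check whether dest_co1 and dest_co2 hold the same data, sketched under the assumption that the rows of both tables have already been fetched as strings (one tab-separated string per row; how they are fetched is not shown), is to compare them as multisets: row order is ignored, but duplicate rows still count. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: order-insensitive, duplicate-aware comparison of two result sets.
class ResultSetComparator {

  // Counts how many times each row string occurs.
  static Map<String, Integer> countRows(List<String> rows) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String row : rows) {
      counts.merge(row, 1, Integer::sum);
    }
    return counts;
  }

  // True when both lists contain exactly the same rows with the same multiplicities.
  static boolean sameRows(List<String> destCo1, List<String> destCo2) {
    return destCo1.size() == destCo2.size()
        && countRows(destCo1).equals(countRows(destCo2));
  }

  public static void main(String[] args) {
    List<String> optimizerOff = Arrays.asList("1\t10", "2\t20", "2\t20");
    List<String> optimizerOn = Arrays.asList("2\t20", "1\t10", "2\t20");
    System.out.println(sameRows(optimizerOff, optimizerOn)); // true
  }
}
```

Comparing sorted copies of the two lists would work equally well; counting avoids the sort and makes the handling of duplicate rows explicit.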


- Yin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review11858
---


On Sept. 24, 2012, 3:53 p.m., Yin Huai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> ---
> 
> (Updated Sept. 24, 2012, 3:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> ---
> 
> This optimizer exploits intra-query correlations and merges multiple 
> correlated MapReduce jobs into one job. I opened a new request because I have 
> been working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
> https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out P

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review11858
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


Need to add this to conf/hive-default.xml.template along with a description.



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java


Every instance variable should be explicitly public, private, or protected. 
Please add "protected" where necessary. Also, please maintain consistent order, 
e.g. "protected transient .." instead of "transient protected ..."



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Formatting: this line needs to be indented.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


switch "CorrelationCompositeOperator" with "Driver".



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


s/firstRow/isFirstRow/



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Please don't use raw types on the left hand side or in parameter lists, e.g.:

List operationPathTags = new ArrayList();

instead of

ArrayList operationPathTags = new ArrayList();
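With the type arguments written out (Integer as the element type and the method copyTags are hypothetical choices for illustration), the convention is: declare variables, parameters, and return types with the parameterized interface type, and name the concrete class only on the right-hand side of new.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OperationPathTagsExample {

  // Parameter and return type use List<Integer>: no raw types, no concrete class.
  static List<Integer> copyTags(List<Integer> tags) {
    // The concrete class appears only after "new"; explicit <Integer> also compiles on Java 6.
    List<Integer> operationPathTags = new ArrayList<Integer>(tags);
    return operationPathTags;
  }

  public static void main(String[] args) {
    List<Integer> tags = copyTags(Arrays.asList(0, 1));
    System.out.println(tags); // [0, 1]
  }
}
```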




ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Should this be ignored? And if so, I think it should be logged instead of 
dumping the stack to stderr.
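If the exception really can be ignored, logging it instead of calling printStackTrace() might look like the sketch below. The method and its parsing logic are hypothetical; java.util.logging is used only so the example runs stand-alone, whereas Hive classes generally log through their own LOG field (Apache Commons Logging).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example of an exception that is deliberately ignored but still recorded.
class IgnoredExceptionExample {
  private static final Logger LOG = Logger.getLogger(IgnoredExceptionExample.class.getName());

  static int parseTagOrDefault(String s, int defaultTag) {
    try {
      return Integer.parseInt(s);
    } catch (NumberFormatException e) {
      // Instead of e.printStackTrace(): log the message and the stack trace at WARNING.
      LOG.log(Level.WARNING, "Ignoring unparsable tag '" + s + "', using " + defaultTag, e);
      return defaultTag;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseTagOrDefault("3", -1)); // 3
    System.out.println(parseTagOrDefault("x", -1)); // -1, plus a logged warning
  }
}
```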



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


This needs to return something other than null.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Formatting: indent



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Raw type on LHS.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Log this?



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java


This is really hard to read. Please use temporary variables instead of 
repeatedly calling the same getters.
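A sketch of the suggested refactoring, with a hypothetical descriptor and getter chain (the actual code in CorrelationReducerDispatchOperator is not shown in this review): evaluate the chain once, store the result in a well-named local variable, and use that variable from then on. Both methods below behave identically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DispatchRefactorExample {

  // Hypothetical descriptor: maps an input tag to the indexes of child operators.
  static final class DispatchDesc {
    private final Map<Integer, List<Integer>> dispatchConf = new HashMap<Integer, List<Integer>>();

    Map<Integer, List<Integer>> getDispatchConf() {
      return dispatchConf;
    }
  }

  // Hard to read: the same getter chain is evaluated on every use.
  static List<String> routeVerbose(DispatchDesc desc, int inputTag, String row) {
    List<String> out = new ArrayList<String>();
    for (int i = 0; i < desc.getDispatchConf().get(inputTag).size(); i++) {
      out.add("child" + desc.getDispatchConf().get(inputTag).get(i) + ":" + row);
    }
    return out;
  }

  // Same behavior: look the list up once and give it a descriptive name.
  static List<String> route(DispatchDesc desc, int inputTag, String row) {
    List<Integer> childIndexes = desc.getDispatchConf().get(inputTag);
    List<String> out = new ArrayList<String>(childIndexes.size());
    for (int childIndex : childIndexes) {
      out.add("child" + childIndex + ":" + row);
    }
    return out;
  }

  public static void main(String[] args) {
    DispatchDesc desc = new DispatchDesc();
    desc.getDispatchConf().put(0, Arrays.asList(1, 2));
    System.out.println(route(desc, 0, "r1")); // [child1:r1, child2:r1]
  }
}
```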



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java


This comment doesn't add much. Please remove.



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java


Rawtype on the LHS. Please fix all occurrences of this problem in the code 
that you have added/modified.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java


Methods should not return rawtypes. Please return List instead. 
Please correct this issue in any other code that is modified/added in this 
patch.



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java


Is it really necessary to make this public?




ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


Please remove the HTML javadoc formatting. Most of the folks who are going 
to read this will look at the actual source instead of the javadoc, and for 
them the HTML tags are a distraction.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


Formatting: variable names (with the exception of public static final 
variables) begin with lowercase letters, e.g. aliasToTabName instead of 
AliastoTabName.



ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java


No raw types on LHS, and why is the classname fully qualified?



ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java


Please don't use raw types in parameter lists (and why is ArrayList fully 
qualified?)



ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java


Too many input parameters. Please replace this with a Builder or a no-arg 
constructor and setters.
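A minimal Builder sketch under stated assumptions: the class and its four fields (key columns, value columns, tag, number of reducers) are hypothetical stand-ins, since the actual constructor parameters of CorrelationLocalSimulativeReduceSinkDesc are not shown in this review. Each setter returns the builder so calls can be chained, and fields left unset keep explicit defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical descriptor; fields are illustrative, not the patch's actual ones.
final class SimulativeReduceSinkDescSketch {
  private final List<String> keyCols;
  private final List<String> valueCols;
  private final int tag;
  private final int numReducers;

  private SimulativeReduceSinkDescSketch(Builder b) {
    // Defensive, read-only copies so the built object cannot change later.
    this.keyCols = Collections.unmodifiableList(new ArrayList<String>(b.keyCols));
    this.valueCols = Collections.unmodifiableList(new ArrayList<String>(b.valueCols));
    this.tag = b.tag;
    this.numReducers = b.numReducers;
  }

  List<String> getKeyCols() { return keyCols; }
  List<String> getValueCols() { return valueCols; }
  int getTag() { return tag; }
  int getNumReducers() { return numReducers; }

  static final class Builder {
    private List<String> keyCols = new ArrayList<String>();
    private List<String> valueCols = new ArrayList<String>();
    private int tag = -1;         // -1 means "not set" in this sketch
    private int numReducers = -1; // -1 means "not set" in this sketch

    Builder keyCols(List<String> cols) { this.keyCols = cols; return this; }
    Builder valueCols(List<String> cols) { this.valueCols = cols; return this; }
    Builder tag(int tag) { this.tag = tag; return this; }
    Builder numReducers(int n) { this.numReducers = n; return this; }

    SimulativeReduceSinkDescSketch build() {
      return new SimulativeReduceSinkDescSketch(this);
    }
  }

  public static void main(String[] args) {
    SimulativeReduceSinkDescSketch desc = new Builder()
        .keyCols(Arrays.asList("key"))
        .valueCols(Arrays.asList("value"))
        .tag(0)
        .build();
    System.out.println(desc.getKeyCols() + " tag=" + desc.getTag()); // [key] tag=0
  }
}
```

One design caveat: if these desc objects are serialized as Java beans (the compiler/plan/*.q.xml expected outputs in this patch suggest XML bean serialization of plans), the no-arg constructor plus setters option is probably the better fit, because an immutable class with a private constructor like this sketch cannot be reconstructed that way.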



ql/src/test/queries/clientpositive/correlationoptimizer4.q


Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 24, 2012, 3:53 p.m.)


Review request for hive.


Changes
---

bug fix


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request because I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

I cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver, because 
trunk itself fails these tests on my machine. Also, because trunk generates 
results with rows in a different order for queries skewjoinopt1.q to 
skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to skewjoinopt17.q, and 
skewjoinopt19.q to skewjoinopt20.q, I cannot test these queries on my machine 
either. All other tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-24 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 24, 2012, 2:33 p.m.)


Review request for hive.


Changes
---

bug fix + 2 new tests


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

I cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver, because 
trunk itself fails these tests on my machine. Also, because trunk generates 
results with rows in a different order for queries skewjoinopt1.q to 
skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to skewjoinopt17.q, and 
skewjoinopt19.q to skewjoinopt20.q, I cannot test these queries on my machine 
either. All other tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 19, 2012, 2:45 p.m.)


Review request for hive.


Changes
---

Updated the description of the tests I have run.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 05a399d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6bc5fe4 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 63e8ff2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing (updated)
---

I cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver, because 
trunk itself fails these tests on my machine. Also, because trunk generates 
results with rows in a different order for queries skewjoinopt1.q to 
skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to skewjoinopt17.q, and 
skewjoinopt19.q to skewjoinopt20.q, I cannot test these queries on my machine 
either. All other tests pass.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-18 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Sept. 18, 2012, 5:43 p.m.)


Review request for hive.


Changes
---

bug fix + 3 test cases


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 05a399d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6bc5fe4 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 63e8ff2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

Cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver. This 
patch should pass all other tests. 

When the optimizer is enabled (it is currently disabled by default), several 
test cases fail. One is optimized by the optimizer. One is not suitable for 
this correlation optimizer. Two are due to potential bugs in trunk. The other 
failures are parse cases (XML plans); they are caused by my minor changes in 
SemanticAnalyzer, which generate several redundant operators for the 
correlation optimizer. Overall, these failures are not closely related to the 
patch. Please see 
https://issues.apache.org/jira/browse/HIVE-2206?focusedCommentId=13456171&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456171
 for details.


Thanks,

Yin Huai



Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-09-15 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

Review request for hive.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working on 
hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5efae89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java e3ed13a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java f0c35e7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java a2caeed 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java dffdd7b 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6bc5fe4 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 67d3a99 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a65b0e4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

Cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, 
TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, 
testSynchronized in TestRemoteHiveMetaStore, testSynchronized in 
TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, 
testSynchronized in TestSetUGIOnOnlyServer, and 
testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver. This 
patch should pass all other tests. 

When the optimizer is enabled (it is currently disabled by default), several 
test cases fail. One is optimized by the optimizer. One is not suitable for 
this correlation optimizer. Two are due to potential bugs in trunk. The other 
failures are parse cases (XML plans); they are caused by my minor changes in 
SemanticAnalyzer, which generate several redundant operators for the 
correlation optimizer. Overall, these failures are not closely related to the 
patch. Please see 
https://issues.apache.org/jira/browse/HIVE-2206?focusedCommentId=13456171&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456171
 for details.


Thanks,

Yin Huai



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-02-10 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/
---

(Updated 2012-02-10 20:49:01.177796)


Review request for hive.


Changes
---

Updated the patch against revision 1237253. I will generate the patch based on 
the latest trunk later.


Summary
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 
1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 
1237326 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java
 PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
 PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1237326 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1237326 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1237326 
  trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1237326 
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1237326 
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1237326 
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1237326 

Diff: https://reviews.apache.org/r/2001/diff


Testing
---


Thanks,

Yin



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-02-10 Thread Yin Huai


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > I've started reviewing this, here's my comments so far.  I'll continue to 
> > look over it.

I will update this patch soon. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 453
> > 
> >
> > Does this have to default to false, does anything break if it's true?
> > 
> > Similarly, have you tried running the tests with this set to true?

I have not tried running the tests with this set to true. I will do so once I 
find a revision that passes all unit tests (by the way, do you have a 
suggestion for which revision I should use?). In my opinion, since this 
optimizer is fairly complicated and still under development, it is safer to 
default it to false and let users decide when to use it than to default it to true.
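A minimal sketch of the opt-in behaviour discussed above: an optimizer pass that is added only when a boolean setting is explicitly true, and is off when the setting is absent. The property key and the pass names are hypothetical placeholders, not the actual HiveConf variable or Optimizer code from the patch; java.util.Properties stands in for the configuration object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch of an opt-in optimizer pass; key and pass names are hypothetical.
public class OptInOptimizerSketch {
  static final String CORRELATION_KEY = "example.optimize.correlation";

  static List<String> buildPasses(Properties conf) {
    List<String> passes = new ArrayList<String>();
    passes.add("ExistingPass");
    // Default to "false": the pass runs only when a user explicitly enables it.
    if (Boolean.parseBoolean(conf.getProperty(CORRELATION_KEY, "false"))) {
      passes.add("CorrelationOptimizer");
    }
    return passes;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(buildPasses(conf)); // [ExistingPass]
    conf.setProperty(CORRELATION_KEY, "true");
    System.out.println(buildPasses(conf)); // [ExistingPass, CorrelationOptimizer]
  }
}
```

Running the test suite once with the setting forced to true, as Kevin asks, would exercise the optional branch that the default leaves untested.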


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 101
> > 
> >
> > It's not clear to me why we need both setRowNumber and processOp.

Since a CorrelationCompositeOperator may have multiple parents, I use a buffer 
to store the output of its parents (see the processOp method). The 
TableScanOperator triggers the setRowNumber method; the 
CorrelationCompositeOperator then determines the operationPathTags of the 
buffered row based on the buffer contents and forwards that row to its child. 
So setRowNumber is used here to evaluate the operationPathTags of the buffered 
row before the CorrelationCompositeOperator receives the new row. 
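A simplified model of the buffering explained in this reply may help: output from each parent is recorded but not forwarded, and when the scan announces a new row number, the buffered row's tags are final, so it is forwarded. This is a hypothetical sketch, not Hive's Operator API and not the patch's code; "tags" here are simply the indices of the parents that produced the row. The single flushBuffer helper, shared by setRowNumber and close, also illustrates the helper-method refactoring Kevin suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: buffer parent output until the next input row starts.
// Not Hive's Operator API; tags are the indices of parents that emitted the row.
public class CompositeBufferSketch {
  private final int numParents;
  private final boolean[] seenFromParent; // parents that emitted the buffered row
  private Object bufferedRow;              // row waiting to be forwarded
  private long currentRowNumber = -1;
  final List<String> forwarded = new ArrayList<String>(); // what the child received

  public CompositeBufferSketch(int numParents) {
    this.numParents = numParents;
    this.seenFromParent = new boolean[numParents];
  }

  // Analogous to processOp: record the row and which parent sent it; do not forward yet.
  public void process(Object row, int parentTag) {
    bufferedRow = row;
    seenFromParent[parentTag] = true;
  }

  // Analogous to setRowNumber: a new input row starts, so the previous row's
  // tags are final and it can be forwarded.
  public void setRowNumber(long rowNumber) {
    if (rowNumber != currentRowNumber) {
      flushBuffer();
      currentRowNumber = rowNumber;
    }
  }

  // Called at end of input so the last buffered row is not lost.
  public void close() {
    flushBuffer();
  }

  // Single helper shared by setRowNumber and close: compute tags, forward, reset.
  private void flushBuffer() {
    if (bufferedRow == null) {
      return;
    }
    StringBuilder tags = new StringBuilder();
    for (int i = 0; i < numParents; i++) {
      if (seenFromParent[i]) {
        if (tags.length() > 0) {
          tags.append(',');
        }
        tags.append(i);
      }
      seenFromParent[i] = false;
    }
    forwarded.add(bufferedRow + " tags=[" + tags + "]");
    bufferedRow = null;
  }

  public static void main(String[] args) {
    CompositeBufferSketch op = new CompositeBufferSketch(2);
    op.setRowNumber(0);
    op.process("r0", 0);
    op.process("r0", 1);
    op.setRowNumber(1); // forwards r0 with both tags
    op.process("r1", 1);
    op.close();         // forwards r1 with tag 1 only
    System.out.println(op.forwarded); // [r0 tags=[0,1], r1 tags=[1]]
  }
}
```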


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  lines 150-177
> > 
> >
> > Putting this code in a helper method would be better than having it 
> > both here and in setRowNumber.

I will do it. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 274
> > 
> >
> > Does this commented out code need to be kept?

This commented out code is not needed. I will delete it. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java, line 1337
> > 
> >
> > I couldn't find a CorrelationFakeReduceSinkOperator class.

CorrelationLocalSimulativeReduceSinkOperator was previously named 
CorrelationFakeReduceSinkOperator. I will use the right name in the comment. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java, 
> > line 273
> > 
> >
> > Tabs are bad, could you change them to spaces, at least in the new code 
> > you're introducing.

I will change the format of my code. Thanks for letting me know.


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
> >  line 239
> > 
> >
> > I take it from this line it's a requirement that in order for this 
> > correlation optimization to be attempted every reduce sink has to be 
> > followed only by children with a single child.
> > 
> > Could this be relaxed?  Could the optimization simply not be applied if 
> > there is an operator between two ReduceSinks that has more than one child?
> > 
> > Also, if there is a ReduceSink which is not followed by another 
> > ReduceSink, but is followed by an operator with more than one child, this 
> > prevents the optimization from being used, even though it shouldn't have an 
> > effect.
> > 
> > Also, regarding checking if the size <=1, if the size <1 the next line 
> > will throw an exception.

Only "assert op.getChildOperators().size() > 0;" is needed here. Thank you 
for letting me know. 
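Kevin's point is that get(0) on an empty child list throws. A minimal sketch of an explicit guard, using a hypothetical Node class rather than Hive's Operator: an explicit check is shown instead of `assert` because Java assertions are skipped unless the JVM is started with -ea (-enableassertions), so an assert alone would not protect a production run.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical minimal operator node; this does not use Hive's Operator class.
public class ChildGuardSketch {
  static class Node {
    final String name;
    final List<Node> children;

    Node(String name, List<Node> children) {
      this.name = name;
      this.children = children;
    }
  }

  // Returns the first child, failing with a clear message when there is none,
  // instead of an IndexOutOfBoundsException from get(0).
  static Node firstChild(Node op) {
    if (op.children.isEmpty()) {
      throw new IllegalStateException(op.name + " has no child operators");
    }
    return op.children.get(0);
  }

  public static void main(String[] args) {
    Node fs = new Node("FS", Collections.<Node>emptyList());
    Node sel = new Node("SEL", Collections.singletonList(fs));
    System.out.println(firstChild(sel).name); // FS
    try {
      firstChild(fs);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // FS has no child operators
    }
  }
}
```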


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
> >  line 335
> > 
> >
> > findNextChildReduceSinkOperator can return null, do you need to check 
> > for this?

findNextChildReduceSinkOperator will not return null since its input will not 
be the last ReduceSinkOperator before the FileSinkOperator. For example, 
suppose that we have a plan tree like (some operators)->RS1->(some 
operators)->RS2->(some operators)->FS. The input of 
findNex

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-02-10 Thread Kevin Wilfong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/#review4912
---


I've started reviewing this, here's my comments so far.  I'll continue to look 
over it.


trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java


Does this have to default to false, does anything break if it's true?

Similarly, have you tried running the tests with this set to true?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


It's not clear to me why we need both setRowNumber and processOp.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java


Putting this code in a helper method would be better than having it both 
here and in setRowNumber.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java


Does this commented out code need to be kept?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java


I couldn't find a CorrelationFakeReduceSinkOperator class.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java


Tabs are bad, could you change them to spaces, at least in the new code 
you're introducing.



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


I take it from this line it's a requirement that in order for this 
correlation optimization to be attempted every reduce sink has to be followed 
only by children with a single child.

Could this be relaxed?  Could the optimization simply not be applied if 
there is an operator between two ReduceSinks that has more than one child?

Also, if there is a ReduceSink which is not followed by another ReduceSink, 
but is followed by an operator with more than one child, this prevents the 
optimization from being used, even though it shouldn't have an effect.

Also, regarding checking if the size <=1, if the size <1 the next line will 
throw an exception.



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java


findNextChildReduceSinkOperator can return null, do you need to check for 
this?
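The last two comments both concern walking from one ReduceSink to the next. A hypothetical sketch (not Hive's Operator API and not the patch's findNextChildReduceSinkOperator) that makes the outcomes explicit instead of returning a bare null: FOUND when the next ReduceSink is reached along single-child operators; BRANCHES when a multi-child operator sits before a later ReduceSink, so only that pair would be skipped; and NONE when no later ReduceSink exists, including the case Kevin mentions of a final ReduceSink followed by a branching operator. Making NONE explicit keeps the caller safe even if the invariant in Yin's reply (the input is never the last ReduceSink) stops holding.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical operator-tree model for illustration only.
public class NextReduceSinkSketch {
  static class Node {
    final String name;
    final boolean isReduceSink;
    final List<Node> children;

    Node(String name, boolean isReduceSink, List<Node> children) {
      this.name = name;
      this.isReduceSink = isReduceSink;
      this.children = children;
    }
  }

  enum Kind { FOUND, BRANCHES, NONE }

  static final class Result {
    final Kind kind;
    final Node reduceSink; // non-null only when kind == FOUND

    Result(Kind kind, Node reduceSink) {
      this.kind = kind;
      this.reduceSink = reduceSink;
    }
  }

  // True if any operator strictly below op is a ReduceSink.
  static boolean hasReduceSinkBelow(Node op) {
    for (Node child : op.children) {
      if (child.isReduceSink || hasReduceSinkBelow(child)) {
        return true;
      }
    }
    return false;
  }

  // Walks down from rs, following single-child operators, to the next ReduceSink.
  static Result findNextReduceSink(Node rs) {
    Node cur = rs;
    while (true) {
      if (cur.children.isEmpty()) {
        return new Result(Kind.NONE, null); // e.g. the path ends at a FileSink
      }
      if (cur.children.size() > 1) {
        // Branching only matters if some branch leads to another ReduceSink.
        return new Result(hasReduceSinkBelow(cur) ? Kind.BRANCHES : Kind.NONE, null);
      }
      cur = cur.children.get(0);
      if (cur.isReduceSink) {
        return new Result(Kind.FOUND, cur);
      }
    }
  }

  public static void main(String[] args) {
    List<Node> none = Collections.emptyList();
    // RS1 -> GBY -> RS2 -> SEL -> FS
    Node rs2 = new Node("RS2", true, Collections.singletonList(
        new Node("SEL", false, Collections.singletonList(new Node("FS", false, none)))));
    Node rs1 = new Node("RS1", true, Collections.singletonList(
        new Node("GBY", false, Collections.singletonList(rs2))));
    // RS3 -> FORK -> {FS1, FS2}: branches, but no ReduceSink below.
    Node rs3 = new Node("RS3", true, Collections.singletonList(new Node("FORK", false,
        Arrays.asList(new Node("FS1", false, none), new Node("FS2", false, none)))));

    Result r = findNextReduceSink(rs1);
    System.out.println(r.kind + " " + r.reduceSink.name); // FOUND RS2
    System.out.println(findNextReduceSink(rs2).kind);     // NONE
    System.out.println(findNextReduceSink(rs3).kind);     // NONE
  }
}
```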


- Kevin


On 2012-01-29 17:56:48, Yin Huai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2001/
> ---
> 
> (Updated 2012-01-29 17:56:48)
> 
> 
> Review request for hive.
> 
> 
> Summary
> ---
> 
> This optimizer exploits intra-query correlations and merges multiple 
> correlated MapReduce jobs into one job.
> 
> 
> This addresses bug HIVE-2206.
> https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1237326
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
>   trunk/ql/src/java/org/apache/h

Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-01-29 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/
---

(Updated 2012-01-29 17:56:48.704757)


Review request for hive.


Changes
---

Made the patch compatible with the latest trunk (revision 1237253).


Summary
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job.
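As a rough intuition for "merging correlated jobs", here is a toy model of one simplified kind of correlation: a shuffle whose partitioning keys equal those of the previous shuffle in a linear chain does not need a new job. The `Shuffle` class and `jobsAfterMerging` are hypothetical illustrations, not the optimizer's actual algorithm, which checks many more conditions (sort order, reducer counts, shared inputs, and so on).

```java
import java.util.Arrays;
import java.util.List;

public class CorrelationMergeSketch {

  /** Hypothetical description of one shuffle (ReduceSink) in a linear chain of jobs. */
  static final class Shuffle {
    final String label;
    final List<String> partitionKeys;

    Shuffle(String label, String... partitionKeys) {
      this.label = label;
      this.partitionKeys = Arrays.asList(partitionKeys);
    }
  }

  /**
   * Counts jobs for a linear chain when a shuffle whose partitioning keys equal
   * those of the previous shuffle is folded into the previous job.
   * Toy criterion only: key equality is the sole condition checked here.
   */
  static int jobsAfterMerging(List<Shuffle> chain) {
    int jobs = 0;
    List<String> previousKeys = null;
    for (Shuffle s : chain) {
      if (!s.partitionKeys.equals(previousKeys)) {
        jobs++; // keys changed: a new shuffle, and therefore a new job, is needed
      }
      previousKeys = s.partitionKeys;
    }
    return jobs;
  }

  public static void main(String[] args) {
    // A join on key, an aggregation on the same key, then an aggregation on another column.
    List<Shuffle> chain = Arrays.asList(
        new Shuffle("JOIN t1, t2 ON key", "key"),
        new Shuffle("GROUP BY key", "key"),
        new Shuffle("GROUP BY value", "value"));
    System.out.println(chain.size() + " jobs -> " + jobsAfterMerging(chain) + " jobs"); // 3 jobs -> 2 jobs
  }
}
```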


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1237326
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1237326
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1237326
  trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1237326
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1237326
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1237326
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1237326

Diff: https://reviews.apache.org/r/2001/diff


Testing
---


Thanks,

Yin



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2011-12-29 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/
---

(Updated 2011-12-29 18:50:12.277210)


Review request for hive.


Summary
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1224666
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1224666
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1224666
  trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1224666
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1224666
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1224666
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1224666

Diff: https://reviews.apache.org/r/2001/diff


Testing (updated)
---


Thanks,

Yin



Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2011-12-05 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/
---

(Updated 2011-12-05 19:12:23.087778)


Review request for hive.


Changes
---

CorrelationReduceSinkOperator has been merged into ReduceSinkOperator. Detailed 
comments have been added to the new operators.


Summary
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationFakeReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationManualForwardOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationFakeReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationManualForwardDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1210283
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1210283
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1210283
  trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1210283
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1210283
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1210283
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1210283

Diff: https://reviews.apache.org/r/2001/diff


Testing (updated)
---

The previous version of the diff passed all unit tests. Because the latest trunk 
(r1209696) cannot complete the full unit test suite, the latest version of the 
diff has not been tested.


Thanks,

Yin



Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2011-09-21 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/
---

Review request for hive.


Summary
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1173271
  trunk/ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationDispatchOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationFakeReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationManualForwardOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReduceSinkOperator.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationGenMRRedSink1.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationDispatchDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationFakeReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationManualForwardDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReduceSinkDesc.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1173271
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCountDistinct.java PRE-CREATION
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1173271
  trunk/ql/src/test/results/clientpositive/show_functions.q.out 1173271
  trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1173271
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1173271
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1173271
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1173271

Diff: https://reviews.apache.org/r/2001/diff


Testing
---

Ran all unit tests.


Thanks,

Yin