Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-24 Thread chengxiang li


 On 六月 23, 2015, 1:31 p.m., Xuefu Zhang wrote:
  ql/src/test/results/clientpositive/spark/groupby10.q.out, line 60
  https://reviews.apache.org/r/34757/diff/3-4/?file=988071#file988071line60
 
  Interesting. How come we got more stages now?

Not sure, introduced by latest merge from trunk.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88966
---


On 六月 23, 2015, 7:24 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated 六月 23, 2015, 7:24 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-24 Thread chengxiang li


 On 六月 19, 2015, 1:47 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 98
  https://reviews.apache.org/r/34757/diff/3/?file=988066#file988066line98
 
  I think the recursion should go on even if there is only one child for 
  a given work. For examle, if we have:
  
   w1
   |
   w2
   |
   w3
   / \
  w4  w5
  
  Even if each of w1 and w2 has only one child, it's still possible that 
  we can combine w4 and w5.

created HIVE-11082 to track this.


 On 六月 19, 2015, 1:47 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 207
  https://reviews.apache.org/r/34757/diff/3/?file=988066#file988066line207
 
  Could you explain the reason here?

add comments in latest patch. While combine multi equivalent works into single 
one, we need to update all the references to the replaced works. leave works 
output should be read by further SparkWork/FetchWork, we does not able to 
update work reference across SparkWork, so combine leave works may lead to 
error.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88537
---


On 六月 23, 2015, 7:24 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated 六月 23, 2015, 7:24 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   

Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-24 Thread Xuefu Zhang


 On June 23, 2015, 1:31 p.m., Xuefu Zhang wrote:
  ql/src/test/results/clientpositive/spark/groupby10.q.out, line 60
  https://reviews.apache.org/r/34757/diff/3-4/?file=988071#file988071line60
 
  Interesting. How come we got more stages now?
 
 chengxiang li wrote:
 Not sure, introduced by latest merge from trunk.

Okay. Got it. Thanks.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88966
---


On June 23, 2015, 7:24 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 23, 2015, 7:24 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-24 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review89155
---

Ship it!


- Xuefu Zhang


On June 23, 2015, 7:24 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 23, 2015, 7:24 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-23 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88966
---



ql/src/test/results/clientpositive/spark/groupby10.q.out (line 60)
https://reviews.apache.org/r/34757/#comment141575

Interesting. How come we got more stages now?


- Xuefu Zhang


On June 23, 2015, 7:24 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 23, 2015, 7:24 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-23 Thread chengxiang li


 On 六月 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 207
  https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207
 
  I think in SparkWork, there couldn't be two parents connectting to the 
  same child. UnionWork wold be such a child, but SparkWork doesn't have 
  UnionWork, if I'm not mistaken.
  
  I don't think SparkPlan has a limitation of only link between to trans. 
  If there are two links between a parent to a child, the input will be self 
  unioned and the result is the input to the child.
 
 chengxiang li wrote:
 Take self-join for example, there would be 2 MapWork connect to same 
 ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would 
 throw exception during SparkPlan generation.
 
 Xuefu Zhang wrote:
 I see. Thanks for the explanation. However, I'm wondering if we should 
 remove the restriction. Otherwise, certain cases such as self join will not 
 take the advantage of this feature, right?

Yes, this is a further optimization we can continue to work on, i would create 
a following up JIRA to research on this.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
---


On 六月 19, 2015, 7:22 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated 六月 19, 2015, 7:22 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   

Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-23 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/
---

(Updated June 23, 2015, 7:24 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

fix Xuefu's second round comments.


Bugs: HIVE-10844
https://issues.apache.org/jira/browse/HIVE-10844


Repository: hive-git


Description
---

Some Hive queries(like TPCDS Q39) may share the same subquery, which translated 
into sperate, but equivalent Works in SparkWork, combining these equivalent 
Works into a single one would help to benifit from following dynamic RDD 
caching optimization.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
  ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
  ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
  
ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
 2cb126d 
  ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 
  ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
  ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
  ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
  ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
  ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
  ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
  ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
  ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
  ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
  ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
  ql/src/test/results/clientpositive/spark/union5.q.out afee988 
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
  ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
  ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
  ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
  ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
  ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
  ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
  ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
  ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 

Diff: https://reviews.apache.org/r/34757/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-19 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/
---

(Updated June 19, 2015, 7:22 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

fix xuefu's first round review.


Bugs: HIVE-10844
https://issues.apache.org/jira/browse/HIVE-10844


Repository: hive-git


Description
---

Some Hive queries(like TPCDS Q39) may share the same subquery, which translated 
into sperate, but equivalent Works in SparkWork, combining these equivalent 
Works into a single one would help to benifit from following dynamic RDD 
caching optimization.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
  ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
  ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
  
ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
 2cb126d 
  ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
  ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
  ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
  ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
  ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
  ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
  ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
  ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
  ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
  ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
  ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
  ql/src/test/results/clientpositive/spark/union5.q.out afee988 
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
  ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
  ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
  ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
  ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
  ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
  ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
  ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
  ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 

Diff: https://reviews.apache.org/r/34757/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-19 Thread chengxiang li


 On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  !. First round review. Only at a high level.
  2. Patch looks very good and clean.
  3. It will be better if we can add some test cases for self union, 
  self-join, CWE, and repeated sub-queries. This can be a followup task, 
  though.

Created HIVE-11053 to add more tests.


 On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 157
  https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line157
 
  Could parents be null, in case of top-level works? Same for children.

SparkWork always return not null List now, but it may changes, so it always not 
harm to add null verification.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
---


On June 17, 2015, 8:59 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 17, 2015, 8:59 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-19 Thread chengxiang li


 On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 207
  https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207
 
  I think in SparkWork, there couldn't be two parents connectting to the 
  same child. UnionWork wold be such a child, but SparkWork doesn't have 
  UnionWork, if I'm not mistaken.
  
  I don't think SparkPlan has a limitation of only link between to trans. 
  If there are two links between a parent to a child, the input will be self 
  unioned and the result is the input to the child.

Take self-join for example, there would be 2 MapWork connect to same 
ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would 
throw exception during SparkPlan generation.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
---


On June 17, 2015, 8:59 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 17, 2015, 8:59 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
 
 Diff: https://reviews.apache.org/r/34757/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li
 




Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-19 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88537
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
(line 334)
https://reviews.apache.org/r/34757/#comment141118

Nit: extra line



ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
(line 398)
https://reviews.apache.org/r/34757/#comment141117

Nit: remove the extra line.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 72)
https://reviews.apache.org/r/34757/#comment141119

Should we just compare the name itself? In case of the hashcode collides, 
it might still produce an inconsistent plan.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 98)
https://reviews.apache.org/r/34757/#comment141122

I think the recursion should go on even if there is only one child for a 
given work. For examle, if we have:

 w1
 |
 w2
 |
 w3
 / \
w4  w5

Even if each of w1 and w2 has only one child, it's still possible that we 
can combine w4 and w5.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 207)
https://reviews.apache.org/r/34757/#comment141123

Could you explain the reason here?


- Xuefu Zhang


On June 19, 2015, 7:22 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 19, 2015, 7:22 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
   

Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-19 Thread Xuefu Zhang


 On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 207
  https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207
 
  I think in SparkWork, there couldn't be two parents connectting to the 
  same child. UnionWork wold be such a child, but SparkWork doesn't have 
  UnionWork, if I'm not mistaken.
  
  I don't think SparkPlan has a limitation of only link between to trans. 
  If there are two links between a parent to a child, the input will be self 
  unioned and the result is the input to the child.
 
 chengxiang li wrote:
 Take self-join for example, there would be 2 MapWork connect to same 
 ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would 
 throw exception during SparkPlan generation.

I see. Thanks for the explanation. However, I'm wondering if we should remove 
the restriction. Otherwise, certain cases such as self join will not take the 
advantage of this feature, right?


 On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java,
   line 157
  https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line157
 
  Could parents be null, in case of top-level works? Same for children.
 
 chengxiang li wrote:
 SparkWork always return not null List now, but it may changes, so it 
 always not harm to add null verification.

Yeah, if that's the case, the original code is cleaner and easier to read. If 
some changes, the tests might just catch the NPE.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
---


On June 19, 2015, 7:22 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 19, 2015, 7:22 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
   

Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-18 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
---


!. First round review. Only at a high level.
2. Patch looks very good and clean.
3. It will be better if we can add some test cases for self union, self-join, 
CWE, and repeated sub-queries. This can be a followup task, though.


ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
(line 398)
https://reviews.apache.org/r/34757/#comment141051

This doesn't seem efficient in that the comparison will not terminate early 
when result becomes false. Something like this should offers better performance:

if ( compareObject(desc1.getFilterMapString(), desc2.getFilterMapString()) 

 compareObject(desc1.getKeysString(), desc2.getKeysString()) 
 (desc1.getPosBigTable() == desc2.getPosBigTable())  ...) {
 return true;
   } else {
 return false;
   }



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 101)
https://reviews.apache.org/r/34757/#comment141052

There seems an efficiency issue: once we divide the first level works into 
sets, each set containing equivalent works, we only need to compare the next 
level works for each set. There is no point to compare works from the next 
level across sets. For instance, top level has w1, w2, w3, and w4. We get two 
sets: {w1, w2}, {w3, w4}. To further compare and combine, we only need to 
evaluate the children of w1 and w2 together, and the children for w3 and w4 
tother. However, we don't need to evaluate the children for w1 and the children 
for w3.

Am I missing something?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 157)
https://reviews.apache.org/r/34757/#comment141053

Could parents be null, in case of top-level works? Same for children.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 207)
https://reviews.apache.org/r/34757/#comment141055

I think in SparkWork, there couldn't be two parents connectting to the same 
child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if 
I'm not mistaken.

I don't think SparkPlan has a limitation of only link between to trans. If 
there are two links between a parent to a child, the input will be self unioned 
and the result is the input to the child.


- Xuefu Zhang


On June 17, 2015, 8:59 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34757/
 ---
 
 (Updated June 17, 2015, 8:59 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-10844
 https://issues.apache.org/jira/browse/HIVE-10844
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Some Hive queries(like TPCDS Q39) may share the same subquery, which 
 translated into sperate, but equivalent Works in SparkWork, combining these 
 equivalent Works into a single one would help to benifit from following 
 dynamic RDD caching optimization.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
   
 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
  2cb126d 
   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
   

Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-17 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/
---

(Updated June 17, 2015, 8:59 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

improve the compare algorithm and update qfile output


Bugs: HIVE-10844
https://issues.apache.org/jira/browse/HIVE-10844


Repository: hive-git


Description
---

Some Hive queries(like TPCDS Q39) may share the same subquery, which translated 
into sperate, but equivalent Works in SparkWork, combining these equivalent 
Works into a single one would help to benifit from following dynamic RDD 
caching optimization.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
  ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
  ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
  
ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
 2cb126d 
  ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
  ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
  ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
  ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
  ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
  ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
  ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
  ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
  ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
  ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
  ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
  ql/src/test/results/clientpositive/spark/union5.q.out afee988 
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
  ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
  ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
  ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
  ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
  ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
  ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
  ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
  ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 

Diff: https://reviews.apache.org/r/34757/diff/


Testing
---


Thanks,

chengxiang li