Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On 六月 23, 2015, 1:31 p.m., Xuefu Zhang wrote: ql/src/test/results/clientpositive/spark/groupby10.q.out, line 60 https://reviews.apache.org/r/34757/diff/3-4/?file=988071#file988071line60 Interesting. How come we got more stages now? Not sure, introduced by latest merge from trunk. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88966 --- On 六月 23, 2015, 7:24 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated 六月 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On 六月 19, 2015, 1:47 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 98 https://reviews.apache.org/r/34757/diff/3/?file=988066#file988066line98 I think the recursion should go on even if there is only one child for a given work. For examle, if we have: w1 | w2 | w3 / \ w4 w5 Even if each of w1 and w2 has only one child, it's still possible that we can combine w4 and w5. created HIVE-11082 to track this. On 六月 19, 2015, 1:47 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 207 https://reviews.apache.org/r/34757/diff/3/?file=988066#file988066line207 Could you explain the reason here? add comments in latest patch. While combine multi equivalent works into single one, we need to update all the references to the replaced works. leave works output should be read by further SparkWork/FetchWork, we does not able to update work reference across SparkWork, so combine leave works may lead to error. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88537 --- On 六月 23, 2015, 7:24 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated 六月 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On June 23, 2015, 1:31 p.m., Xuefu Zhang wrote: ql/src/test/results/clientpositive/spark/groupby10.q.out, line 60 https://reviews.apache.org/r/34757/diff/3-4/?file=988071#file988071line60 Interesting. How come we got more stages now? chengxiang li wrote: Not sure, introduced by latest merge from trunk. Okay. Got it. Thanks. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88966 --- On June 23, 2015, 7:24 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review89155 --- Ship it! - Xuefu Zhang On June 23, 2015, 7:24 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88966 --- ql/src/test/results/clientpositive/spark/groupby10.q.out (line 60) https://reviews.apache.org/r/34757/#comment141575 Interesting. How come we got more stages now? - Xuefu Zhang On June 23, 2015, 7:24 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On 六月 19, 2015, 3:42 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 207 https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207 I think in SparkWork, there couldn't be two parents connectting to the same child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if I'm not mistaken. I don't think SparkPlan has a limitation of only link between to trans. If there are two links between a parent to a child, the input will be self unioned and the result is the input to the child. chengxiang li wrote: Take self-join for example, there would be 2 MapWork connect to same ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would throw exception during SparkPlan generation. Xuefu Zhang wrote: I see. Thanks for the explanation. However, I'm wondering if we should remove the restriction. Otherwise, certain cases such as self join will not take the advantage of this feature, right? Yes, this is a further optimization we can continue to work on, i would create a following up JIRA to research on this. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88484 --- On 六月 19, 2015, 7:22 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated 六月 19, 2015, 7:22 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 23, 2015, 7:24 a.m.) Review request for hive and Xuefu Zhang. Changes --- fix Xuefu's second round comments. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out dd9d9fe ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out c249b61 ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out 2fb1d73 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 19, 2015, 7:22 a.m.) Review request for hive and Xuefu Zhang. Changes --- fix xuefu's first round review. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote: !. First round review. Only at a high level. 2. Patch looks very good and clean. 3. It will be better if we can add some test cases for self union, self-join, CWE, and repeated sub-queries. This can be a followup task, though. Created HIVE-11053 to add more tests. On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 157 https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line157 Could parents be null, in case of top-level works? Same for children. SparkWork always return not null List now, but it may changes, so it always not harm to add null verification. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88484 --- On June 17, 2015, 8:59 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 17, 2015, 8:59 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 207 https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207 I think in SparkWork, there couldn't be two parents connectting to the same child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if I'm not mistaken. I don't think SparkPlan has a limitation of only link between to trans. If there are two links between a parent to a child, the input will be self unioned and the result is the input to the child. Take self-join for example, there would be 2 MapWork connect to same ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would throw exception during SparkPlan generation. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88484 --- On June 17, 2015, 8:59 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 17, 2015, 8:59 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88537 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java (line 334) https://reviews.apache.org/r/34757/#comment141118 Nit: extra line ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java (line 398) https://reviews.apache.org/r/34757/#comment141117 Nit: remove the extra line. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 72) https://reviews.apache.org/r/34757/#comment141119 Should we just compare the name itself? In case of the hashcode collides, it might still produce an inconsistent plan. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 98) https://reviews.apache.org/r/34757/#comment141122 I think the recursion should go on even if there is only one child for a given work. For examle, if we have: w1 | w2 | w3 / \ w4 w5 Even if each of w1 and w2 has only one child, it's still possible that we can combine w4 and w5. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 207) https://reviews.apache.org/r/34757/#comment141123 Could you explain the reason here? - Xuefu Zhang On June 19, 2015, 7:22 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 19, 2015, 7:22 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 207 https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207 I think in SparkWork, there couldn't be two parents connectting to the same child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if I'm not mistaken. I don't think SparkPlan has a limitation of only link between to trans. If there are two links between a parent to a child, the input will be self unioned and the result is the input to the child. chengxiang li wrote: Take self-join for example, there would be 2 MapWork connect to same ReduceWork. if we combine these 2 MapWorks into 1, SparkPlan::connect would throw exception during SparkPlan generation. I see. Thanks for the explanation. However, I'm wondering if we should remove the restriction. Otherwise, certain cases such as self join will not take the advantage of this feature, right? On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 157 https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line157 Could parents be null, in case of top-level works? Same for children. chengxiang li wrote: SparkWork always return not null List now, but it may changes, so it always not harm to add null verification. Yeah, if that's the case, the original code is cleaner and easier to read. If some changes, the tests might just catch the NPE. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88484 --- On June 19, 2015, 7:22 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 19, 2015, 7:22 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/#review88484 --- !. First round review. Only at a high level. 2. Patch looks very good and clean. 3. It will be better if we can add some test cases for self union, self-join, CWE, and repeated sub-queries. This can be a followup task, though. ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java (line 398) https://reviews.apache.org/r/34757/#comment141051 This doesn't seem efficient in that the comparison will not terminate early when result becomes false. Something like this should offers better performance: if ( compareObject(desc1.getFilterMapString(), desc2.getFilterMapString()) compareObject(desc1.getKeysString(), desc2.getKeysString()) (desc1.getPosBigTable() == desc2.getPosBigTable()) ...) { return true; } else { return false; } ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 101) https://reviews.apache.org/r/34757/#comment141052 There seems an efficiency issue: once we divide the first level works into sets, each set containing equivalent works, we only need to compare the next level works for each set. There is no point to compare works from the next level across sets. For instance, top level has w1, w2, w3, and w4. We get two sets: {w1, w2}, {w3, w4}. To further compare and combine, we only need to evaluate the children of w1 and w2 together, and the children for w3 and w4 tother. However, we don't need to evaluate the children for w1 and the children for w3. Am I missing something? ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 157) https://reviews.apache.org/r/34757/#comment141053 Could parents be null, in case of top-level works? Same for children. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java (line 207) https://reviews.apache.org/r/34757/#comment141055 I think in SparkWork, there couldn't be two parents connectting to the same child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if I'm not mistaken. I don't think SparkPlan has a limitation of only link between to trans. If there are two links between a parent to a child, the input will be self unioned and the result is the input to the child. - Xuefu Zhang On June 17, 2015, 8:59 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 17, 2015, 8:59 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 17, 2015, 8:59 a.m.) Review request for hive and Xuefu Zhang. Changes --- improve the compare algorithm and update qfile output Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries(like TPCDS Q39) may share the same subquery, which translated into sperate, but equivalent Works in SparkWork, combining these equivalent Works into a single one would help to benifit from following dynamic RDD caching optimization. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li