Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Vineet Garg


> On Nov. 7, 2017, 12:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
> > Lines 1790-1792 (original)
> > 
> >
> > I checked and this rule does look useful. Maybe leave a TODO to bring 
> > this back.

Will do


> On Nov. 7, 2017, 12:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/subquery_select.q.out
> > Line 2052 (original), 2051 (patched)
> > 
> >
> > Can you double check this plan change? The only thing changing is Left 
> > outer to inner, which isn't semantically equivalent. Either the earlier 
> > semantics were incorrect or the newer ones are.

HiveFilterJoinRule/FilterJoinRule is the rule converting this left outer join 
into an inner join. I am trying to understand why it does that.
Edit: FilterJoinRule uses RelOptUtil's simplifyJoin, which "Simplifies outer 
joins if filter above would reject nulls". For this particular query we get a 
plan where a filter is generated on top of the outer join, and that filter 
rejects the null-padded rows coming from the right side, hence the join is 
simplified to an inner join. So this is safe to do.
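
To make the argument above concrete, here is a minimal Python sketch. It is 
not Hive or Calcite code: the helpers (prefixed, join, null_rejecting_filter), 
the column names and the sample rows are all made up for illustration. It only 
checks that a left outer join followed by a filter that can never be true for 
NULL right-side columns returns the same rows as an inner join followed by the 
same filter.

```python
# Rows are plain dicts; None stands in for SQL NULL.

def prefixed(row, cols):
    """Right-side columns with an r_ prefix; row=None pads them with NULLs."""
    return {"r_" + c: (None if row is None else row[c]) for c in cols}

def join(left, right, key, right_cols, outer):
    """Equi-join on `key`; with outer=True, unmatched left rows are null-padded."""
    out = []
    for l in left:
        matches = [r for r in right if l[key] is not None and l[key] == r[key]]
        out.extend({**l, **prefixed(r, right_cols)} for r in matches)
        if outer and not matches:
            out.append({**l, **prefixed(None, right_cols)})
    return out

def null_rejecting_filter(rows):
    """WHERE r_val > 10: never true when r_val is NULL, so padded rows are dropped."""
    return [row for row in rows if row["r_val"] is not None and row["r_val"] > 10]

left = [{"key": 1}, {"key": 2}, {"key": 3}, {"key": None}]
right = [{"key": 1, "val": 5}, {"key": 2, "val": 20}, {"key": 2, "val": 30}]
cols = ["key", "val"]

outer_rows = join(left, right, "key", cols, outer=True)
inner_rows = join(left, right, "key", cols, outer=False)

# The joins themselves differ: the outer join keeps the unmatched left rows.
assert len(outer_rows) == 5 and len(inner_rows) == 3
# Under the null-rejecting filter both plans produce the same result.
assert null_rejecting_filter(outer_rows) == null_rejecting_filter(inner_rows)
```

The equivalence relies on the filter rejecting NULLs; a predicate such as 
"r_val IS NULL" keeps the padded rows, and the rewrite would not be valid.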


> On Nov. 7, 2017, 12:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out
> > Lines 197 (patched)
> > 
> >
> > Follow-up to transform Gby->LSJ to LSJ ?

Will open a jira for this


- Vineet


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/#review190247
---


On Nov. 6, 2017, 7:53 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63470/
> ---
> 
> (Updated Nov. 6, 2017, 7:53 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-17767
> https://issues.apache.org/jira/browse/HIVE-17767
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch does the following:
> * Added back earlier patch to generate value generator
> * Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
> * Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
> left join, which might not be a semantically correct thing to do)
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 9b0bace8bf 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java
>  3a1897f4aa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
>  62125f0fb7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
>  2dca6a25ac 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java
>  5e8a994873 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
>   ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
>   ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
>   ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
>   ql/src/test/results/clientpositive/llap/constprog_semijoin.q.out 998a5df264 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 
> 87e08fbcde 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
>   ql/src/test/results/clientpositive/llap/subquery_in_having.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 
> 4e6f00f6b7 
>   ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
>   ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
>   ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
>   ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
>   ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
>   ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
>   ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
>   ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
> 

Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Vineet Garg


> On Nov. 2, 2017, 4:40 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/subquery_multi.q.out
> > Lines 2947-2950 (patched)
> > 
> >
> > Extra scan and Gby.

The ordering is a bit different, so you see an extra scan and Gby. Earlier 
this was done in map9.


> On Nov. 2, 2017, 4:40 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/subquery_select.q.out
> > Line 2052 (original), 2051 (patched)
> > 
> >
> > Is this change expected?

The left outer join is transformed into an inner join by some rule in the 
pre-join optimization phase. This is probably because it is a self join and 
the keys are non-nullable. It is safe in this case.


> On Nov. 2, 2017, 4:40 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out
> > Lines 209-212 (original), 220-223 (patched)
> > 
> >
> > This Gby is wasteful.

This GBY corresponds to the distinct in the subquery. Earlier it was probably 
merged and gotten rid of by the inner + gby = semi join rule. We will need to 
write an optimization rule to get rid of this group by. I'll open a jira for 
this.
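
As a rough illustration of why this group by is redundant, here is a small 
Python sketch. It is not Hive code; left_semi_join, group_by_key, 
inner_join_left_cols and the sample rows are hypothetical. A semi join only 
tests whether a matching key exists on the right side, so de-duplicating the 
right side first does not change its result, whereas an inner join does need 
the distinct to avoid duplicated left rows.

```python
def left_semi_join(left, right, key):
    """Keep a left row (at most once) if some right row has the same non-NULL key."""
    keys = {r[key] for r in right if r[key] is not None}
    return [l for l in left if l[key] in keys]

def group_by_key(rows, key):
    """GROUP BY key with no aggregates, i.e. SELECT DISTINCT key."""
    return [{key: k} for k in dict.fromkeys(r[key] for r in rows)]

def inner_join_left_cols(left, right, key):
    """Inner equi-join that projects only the left columns."""
    return [l for l in left for r in right
            if l[key] is not None and l[key] == r[key]]

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 2}, {"id": 2}, {"id": 3}, {"id": 9}]

semi = left_semi_join(left, right, "id")
semi_over_gby = left_semi_join(left, group_by_key(right, "id"), "id")
inner_over_gby = inner_join_left_cols(left, group_by_key(right, "id"), "id")
inner_no_gby = inner_join_left_cols(left, right, "id")

# Gby beneath a semi join changes nothing, so it could be dropped.
assert semi == semi_over_gby
# inner join + gby is the pattern that equals a semi join ...
assert inner_over_gby == semi
# ... while an inner join without the gby duplicates the row with id 2.
assert len(inner_no_gby) == 3
```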


- Vineet


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/#review189924
---



Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/#review190247
---




ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 1790-1792 (original)


I checked and this rule does look useful. Maybe leave a TODO to bring this 
back.



ql/src/test/results/clientpositive/llap/subquery_select.q.out
Line 2052 (original), 2051 (patched)


Can you double check this plan change? The only thing changing is Left outer 
to inner, which isn't semantically equivalent. Either the earlier semantics 
were incorrect or the newer ones are.



ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out
Lines 197 (patched)


Follow-up to transform Gby->LSJ to LSJ ?


- Ashutosh Chauhan



Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/
---

(Updated Nov. 6, 2017, 7:53 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Addresses review comments + test output updates


Bugs: HIVE-17767
https://issues.apache.org/jira/browse/HIVE-17767


Repository: hive-git


Description
---

This patch does the following:
* Added back earlier patch to generate value generator
* Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
* Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
left join, which might not be a semantically correct thing to do)
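
For readers unfamiliar with the rewrite named in the second bullet, the 
following Python sketch shows the equivalence it relies on. It is only an 
illustration, not the logic in HiveSubQueryRemoveRule or HiveRelDecorrelator; 
the function names and sample rows are assumptions. A correlated EXISTS 
evaluated once per outer row returns the same rows as a left semi join on the 
correlation predicate. NOT EXISTS and NOT IN are not covered here.

```python
# Rows are plain dicts; None stands in for SQL NULL.

def where_exists(outer_rows, inner_rows, col):
    """SELECT * FROM o WHERE EXISTS (SELECT 1 FROM i WHERE i.col = o.col),
    evaluated naively: the subquery is re-run for every outer row."""
    result = []
    for o in outer_rows:
        subquery = [i for i in inner_rows
                    if o[col] is not None and i[col] == o[col]]
        if subquery:
            result.append(o)
    return result

def left_semi_join(outer_rows, inner_rows, col):
    """SELECT o.* FROM o LEFT SEMI JOIN i ON o.col = i.col."""
    keys = {i[col] for i in inner_rows if i[col] is not None}
    return [o for o in outer_rows if o[col] in keys]

outer_rows = [{"k": 1}, {"k": 2}, {"k": None}, {"k": 4}]
inner_rows = [{"k": 2}, {"k": 2}, {"k": None}, {"k": 4}]

by_exists = where_exists(outer_rows, inner_rows, "k")
by_semi_join = left_semi_join(outer_rows, inner_rows, "k")

# Same rows either way; duplicates and NULLs on the inner side add nothing.
assert by_exists == by_semi_join
```

The semi join form scans the inner side once instead of once per outer row, 
which is the usual motivation for this kind of rewrite.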


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 9b0bace8bf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java 3a1897f4aa 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java 62125f0fb7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java 2dca6a25ac 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java 5e8a994873 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
  ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
  ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
  ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
  ql/src/test/results/clientpositive/llap/constprog_semijoin.q.out 998a5df264 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 87e08fbcde 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
  ql/src/test/results/clientpositive/llap/subquery_in_having.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 4e6f00f6b7 
  ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
  ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
  ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
  ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
  ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
  ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
  ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
  ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
  ql/src/test/results/clientpositive/perf/tez/query10.q.out cf3651b35b 
  ql/src/test/results/clientpositive/perf/tez/query14.q.out b2a45f155a 
  ql/src/test/results/clientpositive/perf/tez/query16.q.out a7b710d6e1 
  ql/src/test/results/clientpositive/perf/tez/query23.q.out 7112de61d9 
  ql/src/test/results/clientpositive/perf/tez/query35.q.out a72f57816e 
  ql/src/test/results/clientpositive/perf/tez/query69.q.out 591f3fcdb0 
  ql/src/test/results/clientpositive/perf/tez/query94.q.out 7674aa7f7c 
  ql/src/test/results/clientpositive/semijoin5.q.out 533c077f58 
  ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out b89f9f5905 
  ql/src/test/results/clientpositive/spark/constprog_semijoin.q.out 1c6e38002d 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 76c74d9ab7 
  ql/src/test/results/clientpositive/spark/subquery_exists.q.out dafe5b6d5b 
  ql/src/test/results/clientpositive/spark/subquery_in.q.out 471c2ccd94 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out ff519fda09 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 1b2c0880ae 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out de005ada82 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 7d3a16b6ee 
  ql/src/test/results/clientpositive/spark/subquery_views.q.out 91e39913a7 
  ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 81af937e97 
  ql/src/test/results/clientpositive/subquery_exists.q.out c9f2a79041 
  ql/src/test/results/clientpositive/subquery_exists_having.q.out 2c41ff6c33 
  ql/src/test/results/clientpositive/subquery_in_having.q.out 6893442b61 
  ql/src/test/results/clientpositive/subquery_notexists.q.out 329573e8e1 
  ql/src/test/results/clientpositive/subquery_notexists_having.q.out 4d2b2fc873 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out c321fe69ed 
  ql/src/test/results/clientp

Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-02 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/#review189924
---




ql/src/test/results/clientpositive/llap/subquery_in.q.out
Lines 685 (patched)


Wasteful Gby.



ql/src/test/results/clientpositive/llap/subquery_multi.q.out
Lines 2947-2950 (patched)


Extra scan and Gby.



ql/src/test/results/clientpositive/llap/subquery_multi.q.out
Lines 3406-3409 (patched)


Extra Gby.



ql/src/test/results/clientpositive/llap/subquery_multi.q.out
Lines 3869-3871 (patched)


Extra Gby.



ql/src/test/results/clientpositive/llap/subquery_select.q.out
Line 2052 (original), 2051 (patched)


Is this change expected?



ql/src/test/results/clientpositive/subquery_in_having.q.out
Lines 1799-1808 (original), 1805-1815 (patched)


Let's move this test to llaplocal only.



ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out
Lines 209-212 (original), 220-223 (patched)


This Gby is wasteful.


- Ashutosh Chauhan


On Nov. 2, 2017, 4:51 a.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63470/
> ---
> 
> (Updated Nov. 2, 2017, 4:51 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-17767
> https://issues.apache.org/jira/browse/HIVE-17767
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch does the following:
> * Added back earlier patch to generate value generator
> * Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
> * Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
> left join, which might not be a semantically correct thing to do)
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 462f332e99 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java
>  3a1897f4aa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
>  62125f0fb7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
>  2dca6a25ac 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java
>  5e8a994873 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
>   ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
>   ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
>   ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 
> 87e08fbcde 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 
> 4e6f00f6b7 
>   ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
>   ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
>   ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
>   ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
>   ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
>   ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
>   ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
>   ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
>   ql/src/test/results/clientpositive/perf/tez/query10.q.out cf3651b35b 
>   ql/src/test/results/clientpositive/perf/tez/query14.q.out b2a45f155a 
>   ql/src/test/results/clientpositive/perf/tez/query16.q.out a7b710d6e1 
>   ql/src/test/results/clientpositive/perf/tez/query23.q.out 7112de61d9 
>   ql/src/test/results/clientpositive/perf/tez/query35.q.out a72f57816e 
>   ql/src/test/results/clientpositive/perf/tez/query69.q.out 591f3fcdb0 
>   ql/src/test/results/clientpositive/perf/t

Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-01 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/
---

(Updated Nov. 2, 2017, 4:51 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Addressed review comments


Bugs: HIVE-17767
https://issues.apache.org/jira/browse/HIVE-17767


Repository: hive-git


Description
---

This patch does the following:
* Added back earlier patch to generate value generator
* Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
* Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
left join, which might not be a semantically correct thing to do)


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 462f332e99 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java 3a1897f4aa 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java 62125f0fb7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java 2dca6a25ac 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java 5e8a994873 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
  ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
  ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
  ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 87e08fbcde 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 4e6f00f6b7 
  ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
  ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
  ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
  ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
  ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
  ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
  ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
  ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
  ql/src/test/results/clientpositive/perf/tez/query10.q.out cf3651b35b 
  ql/src/test/results/clientpositive/perf/tez/query14.q.out b2a45f155a 
  ql/src/test/results/clientpositive/perf/tez/query16.q.out a7b710d6e1 
  ql/src/test/results/clientpositive/perf/tez/query23.q.out 7112de61d9 
  ql/src/test/results/clientpositive/perf/tez/query35.q.out a72f57816e 
  ql/src/test/results/clientpositive/perf/tez/query69.q.out 591f3fcdb0 
  ql/src/test/results/clientpositive/perf/tez/query94.q.out 7674aa7f7c 
  ql/src/test/results/clientpositive/semijoin5.q.out 533c077f58 
  ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out b89f9f5905 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 76c74d9ab7 
  ql/src/test/results/clientpositive/spark/subquery_exists.q.out dafe5b6d5b 
  ql/src/test/results/clientpositive/spark/subquery_in.q.out 471c2ccd94 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out ff519fda09 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 1b2c0880ae 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out de005ada82 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 7d3a16b6ee 
  ql/src/test/results/clientpositive/spark/subquery_views.q.out 91e39913a7 
  ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 81af937e97 
  ql/src/test/results/clientpositive/subquery_exists.q.out c9f2a79041 
  ql/src/test/results/clientpositive/subquery_exists_having.q.out 2c41ff6c33 
  ql/src/test/results/clientpositive/subquery_in_having.q.out 6893442b61 
  ql/src/test/results/clientpositive/subquery_notexists.q.out 329573e8e1 
  ql/src/test/results/clientpositive/subquery_notexists_having.q.out 4d2b2fc873 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out c321fe69ed 
  ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 5c306f6b47 
  ql/src/test/results/clientpositive/vector_mapjoin_reduce.q.out ddea584990 


Diff: https://reviews.apache.org/r/63470/diff/2/

Changes: https://reviews.apache.org/r/63470/diff/1-2/


Testing
---


Thanks,

Re: Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-01 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/#review189825
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
Lines 345 (patched)


We shall initialize valuegen to true during visit. That will be safer.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
Lines 1177 (patched)


Shall leave a TODO to remove this restriction.



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 1790-1797 (original), 1790 (patched)


Any reason to drop this rule?



ql/src/test/results/clientpositive/constprog_partitioner.q.out
Lines 84 (patched)


The extra Gby is not needed, correct? Is this where we need a rule to 
eliminate Gby beneath Left semi join?



ql/src/test/results/clientpositive/semijoin5.q.out
Line 122 (original), 122 (patched)


Seems like we are not able to propagate constants over semi-join. Shall 
track this in a separate jira.



ql/src/test/results/clientpositive/subquery_in_having.q.out
Line 1819 (original), 1826 (patched)


Let's move this test to minillaplocal only.


- Ashutosh Chauhan


On Nov. 1, 2017, 6:23 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63470/
> ---
> 
> (Updated Nov. 1, 2017, 6:23 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-17767
> https://issues.apache.org/jira/browse/HIVE-17767
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch does the following:
> * Added back earlier patch to generate value generator
> * Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
> * Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
> left join, which might not be a semantically correct thing to do)
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 462f332e99 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java
>  3a1897f4aa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
>  62125f0fb7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
>  2dca6a25ac 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
>   ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
>   ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
>   ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 
> 87e08fbcde 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
>   ql/src/test/results/clientpositive/llap/lineage3.q.out 66cc6ad5a0 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 4e6f00f6b7 
>   ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
>   ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
>   ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
>   ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
>   ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
>   ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
>   ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
>   ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
>   ql/src/test/results/clientpositive/perf/tez/query10.q.out cf3651b35b 
>   ql/src/test/results/clientpositive/perf/tez/query14.q.out b2a45f155a 
>   ql/src/test/results/clientpositive/perf/tez/query16.q.out a7b710d6e1 
>   ql/src/test/results/clientpositive/perf/tez/query23.q.out 7112de61d9 
>   ql/src/test/results/clientpositive/perf/tez/query35.q.out a72f57816e 
>   ql/src/test/results/clientpositive/perf/

Review Request 63470: HIVE-17767 Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-01 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63470/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-17767
https://issues.apache.org/jira/browse/HIVE-17767


Repository: hive-git


Description
---

This patch does the following:
* Added back earlier patch to generate value generator
* Added logic to rewrite EXISTS/IN correlated subqueries into LEFT SEMI JOIN
* Removed SemiJoinTransposeRule (this rule pushes a semi join underneath its 
left join, which might not be semantically correct)
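
For readers unfamiliar with the rewrite in the second bullet, here is a minimal 
sketch of why a correlated EXISTS filter and a LEFT SEMI JOIN return the same 
rows, while an inner join does not. It is plain Python over lists of dicts, not 
Hive or Calcite code; the table names, column names, and helper functions are 
all hypothetical, and SQL NULL semantics are not modeled.

```python
# Illustrative only: plain-Python stand-ins for tables, not Hive or Calcite code.
# All names below (part, other, p_size, the helpers) are hypothetical.

def exists_filter(left, right, key):
    """Correlated EXISTS: keep a left row if any right row matches on key."""
    return [l for l in left if any(r[key] == l[key] for r in right)]


def left_semi_join(left, right, key):
    """LEFT SEMI JOIN: emit each matching left row once, left columns only."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]


def inner_join(left, right, key):
    """Inner join, for contrast: one output row per matching (left, right) pair."""
    return [l for l in left for r in right if r[key] == l[key]]


part = [
    {"p_size": 1, "p_name": "a"},
    {"p_size": 2, "p_name": "b"},
    {"p_size": 3, "p_name": "c"},
]
# The right side has a duplicate key (1) to expose the difference from an inner join.
other = [{"p_size": 1}, {"p_size": 1}, {"p_size": 3}]

via_exists = exists_filter(part, other, "p_size")
via_semi = left_semi_join(part, other, "p_size")
via_inner = inner_join(part, other, "p_size")
```

In this sketch `via_exists` and `via_semi` hold the same two rows, whereas 
`via_inner` repeats the row with `p_size` 1 because the right side has two 
matches for it.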


Diffs
---

  itests/src/test/resources/testconfiguration.properties 462f332e99 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java 3a1897f4aa 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java 62125f0fb7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java 2dca6a25ac 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 80351bef87 
  ql/src/test/queries/clientpositive/subquery_exists.q 19c42f0c29 
  ql/src/test/queries/clientpositive/subquery_in.q 4ba170a706 
  ql/src/test/results/clientpositive/constprog_partitioner.q.out 87618df902 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out 87e08fbcde 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 6e55acf0d8 
  ql/src/test/results/clientpositive/llap/lineage3.q.out 66cc6ad5a0 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out e206f0851e 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out af42131bc2 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out 96fe17a05a 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 8e2ca937af 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out c89d053b4a 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 118f6ebccf 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out a9a81133b5 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 4e6f00f6b7 
  ql/src/test/results/clientpositive/masking_12.q.out 540c53e825 
  ql/src/test/results/clientpositive/masking_3.q.out 1114c80676 
  ql/src/test/results/clientpositive/masking_4.q.out 527da21610 
  ql/src/test/results/clientpositive/perf/spark/query10.q.out eb3a2f6699 
  ql/src/test/results/clientpositive/perf/spark/query16.q.out b74d721d41 
  ql/src/test/results/clientpositive/perf/spark/query35.q.out 8759b71b8c 
  ql/src/test/results/clientpositive/perf/spark/query69.q.out e4430beaac 
  ql/src/test/results/clientpositive/perf/spark/query94.q.out 43b8c77bdc 
  ql/src/test/results/clientpositive/perf/tez/query10.q.out cf3651b35b 
  ql/src/test/results/clientpositive/perf/tez/query14.q.out b2a45f155a 
  ql/src/test/results/clientpositive/perf/tez/query16.q.out a7b710d6e1 
  ql/src/test/results/clientpositive/perf/tez/query23.q.out 7112de61d9 
  ql/src/test/results/clientpositive/perf/tez/query35.q.out a72f57816e 
  ql/src/test/results/clientpositive/perf/tez/query69.q.out 591f3fcdb0 
  ql/src/test/results/clientpositive/perf/tez/query94.q.out 7674aa7f7c 
  ql/src/test/results/clientpositive/semijoin5.q.out 533c077f58 
  ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out b89f9f5905 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 76c74d9ab7 
  ql/src/test/results/clientpositive/spark/subquery_exists.q.out dafe5b6d5b 
  ql/src/test/results/clientpositive/spark/subquery_in.q.out 471c2ccd94 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out ff519fda09 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 1b2c0880ae 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out de005ada82 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 7d3a16b6ee 
  ql/src/test/results/clientpositive/spark/subquery_views.q.out 91e39913a7 
  ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 81af937e97 
  ql/src/test/results/clientpositive/subquery_exists.q.out c9f2a79041 
  ql/src/test/results/clientpositive/subquery_exists_having.q.out 2c41ff6c33 
  ql/src/test/results/clientpositive/subquery_in_having.q.out 6893442b61 
  ql/src/test/results/clientpositive/subquery_notexists.q.out 329573e8e1 
  ql/src/test/results/clientpositive/subquery_notexists_having.q.out 4d2b2fc873 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out c321fe69ed 
  ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 5c306f6b47 
  ql/src/test/results/clientpositive/vector_mapjoin_reduce.q.out ddea584990 


Diff: https://reviews.apache.org/r/63470/diff/1/


Testing
---


Thanks,

Vineet Garg