[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-08 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244529#comment-16244529
 ] 

Zoltan Haindrich commented on HIVE-17767:
-

[~vgarg] I think you developed this on a separate branch that was forked 
before the "cross product" warning stabilization, so I've added an addendum to 
change back the order for the following files:
{code}
ql/src/test/results/clientpositive/perf/tez/query23.q.out
ql/src/test/results/clientpositive/perf/tez/query14.q.out
{code}


> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> ---
>
> Key: HIVE-17767
> URL: https://issues.apache.org/jira/browse/HIVE-17767
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17767.1.patch, HIVE-17767.2.patch, 
> HIVE-17767.3.patch, HIVE-17767.4.patch, HIVE-17767.5.patch, 
> HIVE-17767.6.patch, HIVE-17767.7.patch
>
>
> Currently such queries are rewritten into a GROUP BY plus an inner join with 
> a value generator, which is inefficient. The value generator consists of a 
> join with the outer query to fetch all correlated values. It could be 
> eliminated completely if such queries were instead rewritten into a LEFT 
> SEMI JOIN.
> Note that to do this, Hive first needs to support LEFT SEMI JOIN with a 
> non-equi condition (HIVE-17766).
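
For readers following the thread, the equivalence the description relies on can be sketched on toy data. This is only an illustration of the semantics, not Hive's planner code: the table contents, column names, and all three function names are hypothetical, and the "value generator" plan is modeled loosely from the description above (collect the correlated values from the outer side, evaluate the subquery per value, then inner join back).

```python
# Hypothetical toy tables; names and rows are illustrative only.
OUTER = [
    {"id": 1, "k": 10},
    {"id": 2, "k": 20},
    {"id": 3, "k": 10},
    {"id": 4, "k": None},  # NULL key never matches under equality
]
INNER = [
    {"k": 10, "v": "a"},
    {"k": 10, "v": "b"},  # duplicate key must not duplicate outer rows
    {"k": 30, "v": "c"},
]


def correlated_exists(outer, inner):
    """SELECT * FROM outer o WHERE EXISTS (SELECT 1 FROM inner i WHERE i.k = o.k)."""
    return [
        o for o in outer
        if any(i["k"] is not None and i["k"] == o["k"] for i in inner)
    ]


def value_generator_plan(outer, inner):
    """Assumed shape of the older plan: value generator + GROUP BY + inner join."""
    # Value generator: an extra pass over the outer side to fetch the
    # distinct correlated values.
    corr_values = {o["k"] for o in outer if o["k"] is not None}
    # Subquery evaluated against those values, grouped to one row per value.
    matched = {i["k"] for i in inner if i["k"] in corr_values}
    # Inner join of the outer side with the grouped subquery result.
    return [o for o in outer if o["k"] in matched]


def left_semi_join(outer, inner):
    """SELECT o.* FROM outer o LEFT SEMI JOIN inner i ON o.k = i.k."""
    inner_keys = {i["k"] for i in inner if i["k"] is not None}
    # Each outer row is emitted at most once, with no pass over the outer
    # side to build a value generator.
    return [o for o in outer if o["k"] in inner_keys]


def ids(rows):
    return [r["id"] for r in rows]
```

All three functions return the same outer rows on this data; the semi join simply reaches that result without the extra scan of the outer side.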



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241527#comment-16241527
 ] 

Ashutosh Chauhan commented on HIVE-17767:
-

+1



[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241519#comment-16241519
 ] 

Vineet Garg commented on HIVE-17767:


Removed those unwanted changes in the latest patch.



[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241503#comment-16241503
 ] 

Ashutosh Chauhan commented on HIVE-17767:
-

Seems like there are unrelated changes in the latest patch.
+1 once those changes are removed.

> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> ---
>
> Key: HIVE-17767
> URL: https://issues.apache.org/jira/browse/HIVE-17767
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17767.1.patch, HIVE-17767.2.patch, 
> HIVE-17767.3.patch, HIVE-17767.4.patch, HIVE-17767.5.patch, HIVE-17767.6.patch


[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241362#comment-16241362
 ] 

Hive QA commented on HIVE-17767:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12896253/HIVE-17767.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11354 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=174)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7669/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7669/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7669/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12896253 - PreCommit-HIVE-Build

> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> ---
>
> Key: HIVE-17767
> URL: https://issues.apache.org/jira/browse/HIVE-17767
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17767.1.patch, HIVE-17767.2.patch, 
> HIVE-17767.3.patch, HIVE-17767.4.patch, HIVE-17767.5.patch


[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237779#comment-16237779
 ] 

Ashutosh Chauhan commented on HIVE-17767:
-

Looks like a few tests need updating.

> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> ---
>
> Key: HIVE-17767
> URL: https://issues.apache.org/jira/browse/HIVE-17767
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-17767.1.patch, HIVE-17767.2.patch, 
> HIVE-17767.3.patch, HIVE-17767.4.patch


[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-11-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237207#comment-16237207
 ] 

Hive QA commented on HIVE-17767:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12895342/HIVE-17767.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 11352 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_semijoin] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in_having] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_semijoin] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAmPoolInteractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanUserMapping (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAsyncSessionInitFailures (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testClusterFractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testDestroyAndReturn (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testQueueing (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReopen (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuse (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithDifferentPool (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithQueueing (batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7603/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7603/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7603/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12895342 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

2017-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223457#comment-16223457
 ] 

Hive QA commented on HIVE-17767:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894448/HIVE-17767.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 55 failed/errored test(s), 11341 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_12] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_4] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[semijoin5] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists_having] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqualcolumnrefs] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=173)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_corr_in_agg] (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_in_implicit_gby] (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query10] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query35] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query69] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] (batchId=244)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query69] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] (batchId=242)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow