[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244529#comment-16244529 ]

Zoltan Haindrich commented on HIVE-17767:

[~vgarg] I think you developed this on a separate branch that was forked before the "cross product" warning stabilization, so I've added an addendum to change the order back for the following files:
{code}
ql/src/test/results/clientpositive/perf/tez/query23.q.out
ql/src/test/results/clientpositive/perf/tez/query14.q.out
{code}

> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> ---
>
> Key: HIVE-17767
> URL: https://issues.apache.org/jira/browse/HIVE-17767
> Project: Hive
> Issue Type: Improvement
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-17767.1.patch, HIVE-17767.2.patch, HIVE-17767.3.patch, HIVE-17767.4.patch, HIVE-17767.5.patch, HIVE-17767.6.patch, HIVE-17767.7.patch
>
> Currently such queries are rewritten into a group by + inner join with a value generator, which is inefficient. The value generator consists of a join with the outer query to fetch all correlated values. This value generator could be completely eliminated if such queries were instead rewritten into LEFT SEMI JOIN.
> Note that to do this, Hive first needs to support LEFT SEMI JOIN with non-equi conditions (HIVE-17766).

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
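As a rough illustration of the rewrite the issue describes, the following is a minimal sketch in plain Python, not Hive code. It assumes two hypothetical toy tables (`outer_rows`, `inner_rows`) and hypothetical helper functions: one evaluates a correlated EXISTS directly, one mimics the "group by + inner join" shape, and one mimics a LEFT SEMI JOIN with an arbitrary (possibly non-equi) condition. The value generator and everything else about Hive's actual planner are left out.

```python
# Hypothetical toy tables; names and data are illustrative only,
# they are not taken from the Hive test suite.
outer_rows = [{"id": 1, "k": 10}, {"id": 2, "k": 20}, {"id": 3, "k": 30}]
inner_rows = [{"k": 10}, {"k": 10}, {"k": 30}]


def exists_subquery(outer, inner):
    """SELECT * FROM outer o WHERE EXISTS (SELECT 1 FROM inner i WHERE i.k = o.k)"""
    return [o for o in outer if any(i["k"] == o["k"] for i in inner)]


def group_by_inner_join(outer, inner):
    """Deduplicate the inner keys first (GROUP BY k), then inner join on k."""
    distinct_keys = {i["k"] for i in inner}
    return [o for o in outer for k in distinct_keys if o["k"] == k]


def left_semi_join(outer, inner, condition):
    """Emit each outer row at most once, as soon as any inner row matches."""
    result = []
    for o in outer:
        for i in inner:
            if condition(o, i):
                result.append(o)
                break  # one match is enough; duplicates on the inner side are ignored
    return result


def equal_keys(o, i):
    """Equi condition: i.k = o.k"""
    return i["k"] == o["k"]


def inner_key_greater(o, i):
    """Non-equi condition: i.k > o.k"""
    return i["k"] > o["k"]


semi_equi = left_semi_join(outer_rows, inner_rows, equal_keys)
semi_non_equi = left_semi_join(outer_rows, inner_rows, inner_key_greater)
```

With the equality condition, all three functions return the same outer rows even though `inner_rows` contains a duplicate key. The non-equi condition at the end is the kind of case that, per the issue, depends on HIVE-17766.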
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241527#comment-16241527 ]

Ashutosh Chauhan commented on HIVE-17767:

+1
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241519#comment-16241519 ]

Vineet Garg commented on HIVE-17767:

Removed those unwanted changes in latest patch.
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241503#comment-16241503 ]

Ashutosh Chauhan commented on HIVE-17767:

seems like there are unrelated changes in latest patch. +1 post removing those changes.
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241362#comment-16241362 ]

Hive QA commented on HIVE-17767:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12896253/HIVE-17767.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11354 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=174)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7669/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7669/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7669/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12896253 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237779#comment-16237779 ]

Ashutosh Chauhan commented on HIVE-17767:

Looks like few tests need update.
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237207#comment-16237207 ]

Hive QA commented on HIVE-17767:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12895342/HIVE-17767.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 11352 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_semijoin] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in_having] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_semijoin] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAmPoolInteractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanUserMapping (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAsyncSessionInitFailures (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testClusterFractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testDestroyAndReturn (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testQueueing (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReopen (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuse (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithDifferentPool (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithQueueing (batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7603/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7603/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7603/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12895342 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223457#comment-16223457 ]

Hive QA commented on HIVE-17767:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894448/HIVE-17767.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 55 failed/errored test(s), 11341 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_12] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_4] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[semijoin5] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists_having] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqualcolumnrefs] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=173)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_corr_in_agg] (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_in_implicit_gby] (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query10] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query35] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query69] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] (batchId=244)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query69] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] (batchId=242)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow