[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784796#comment-16784796 ]

Vineet Garg commented on HIVE-21340:

Thanks [~jcamachorodriguez]. Follow-up JIRA HIVE-21395

> CBO: Prune non-key columns feeding into a SemiJoin
> --------------------------------------------------
>
> Key: HIVE-21340
> URL: https://issues.apache.org/jira/browse/HIVE-21340
> Project: Hive
> Issue Type: Bug
> Components: CBO, Query Planning
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Vineet Garg
> Priority: Major
> Attachments: HIVE-21340.1.patch, HIVE-21340.2.patch, HIVE-21340.3.patch
>
> {code}
> explain cbo
> with ss as
> (select count(1), ss_item_sk, ss_ticket_number from
> store_sales group by ss_item_sk, ss_ticket_number
> having count(1) > 1)
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}}
> Only ss_item_sk is relevant for the HiveSemiJoin
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
>     HiveProject(i_item_sk=[$0])
>       HiveFilter(condition=[IS NOT NULL($0)])
>         HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, item]], table:alias=[item])
>     HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>       HiveFilter(condition=[>($2, 1)])
>         HiveAggregate(group=[{1, 8}], agg#0=[count()])
>           HiveFilter(condition=[IS NOT NULL($1)])
>             HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, store_sales]], table:alias=[store_sales])
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784633#comment-16784633 ]

Jesus Camacho Rodriguez commented on HIVE-21340:

Rb link did not work (permission denied), but I went through the patch and LGTM. +1

[~vgarg], please create the follow-up to avoid using HepVertex, that will also help to move this change to Calcite eventually.
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784044#comment-16784044 ]

Hive QA commented on HIVE-21340:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961065/HIVE-21340.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15817 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16334/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16334/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16334/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12961065 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784030#comment-16784030 ]

Hive QA commented on HIVE-21340:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 35s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16334/dev-support/hive-personality.sh |
| git revision | master / fc3eefa |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16334/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784026#comment-16784026 ]

Jesus Camacho Rodriguez commented on HIVE-21340:

[~vgarg], can you create a PR / RB? Thanks
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784028#comment-16784028 ]

Vineet Garg commented on HIVE-21340:

[~jcamachorodriguez] RB link: https://reviews.apache.org/r/70125/
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784020#comment-16784020 ]

Vineet Garg commented on HIVE-21340:

[~jcamachorodriguez] Can you take a look please?
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781498#comment-16781498 ]

Hive QA commented on HIVE-21340:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960682/HIVE-21340.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15824 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[semijoin] (batchId=121)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16306/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16306/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16306/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12960682 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781466#comment-16781466 ]

Hive QA commented on HIVE-21340:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 24s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 45s{color} | {color:red} ql: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 4m 57s{color} | {color:red} ql generated 1 new + 2251 unchanged - 0 fixed = 2252 total (was 2251) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 52s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | Dead store to newRightKeys in org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSemiJoinRule.perform(RelOptRuleCall, ImmutableBitSet, RelNode, Join, RelNode, Aggregate) At HiveSemiJoinRule.java:org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSemiJoinRule.perform(RelOptRuleCall, ImmutableBitSet, RelNode, Join, RelNode, Aggregate) At HiveSemiJoinRule.java:[line 131] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16306/dev-support/hive-personality.sh |
| git revision | master / 6831b08 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus/diff-checkstyle-ql.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus/new-findbugs-ql.html |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780906#comment-16780906 ]

Hive QA commented on HIVE-21340:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960521/HIVE-21340.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15824 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[semijoin] (batchId=121)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query14] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query83] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query23] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query83] (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16296/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16296/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16296/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12960521 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780786#comment-16780786 ]

Vineet Garg commented on HIVE-21340:

[~jcamachorodriguez] That is a good suggestion. I'll update this rule in a follow-up patch, since the current patch doesn't introduce the HepRelVertex code; it only moves it around.
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780768#comment-16780768 ]

Jesus Camacho Rodriguez commented on HIVE-21340:

[~vgarg], I believe you could just add a RelNode.class in the operand matcher and pass it as a parameter to {{perform}}, so you can avoid casting to HepRelVertex altogether (rule can remain generic, which makes it easier to contribute it back to Calcite too).
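The suggestion in this comment can be pictured with a small toy model. This is not Calcite or Hive code: `Node`, `Vertex`, `match`, and `perform` are hypothetical names, and the model assumes only that a planner hands a rule the nodes named in its operand pattern already unwrapped, while unmatched children stay behind a planner-specific wrapper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical plan node; children are held through Vertex wrappers."""
    kind: str
    inputs: List["Vertex"] = field(default_factory=list)


@dataclass
class Vertex:
    """Stand-in for a planner-specific wrapper around a node."""
    current: Node


# A pattern is (kind, child_patterns); the kind "*" matches any node.
Pattern = Tuple[str, list]


def match(node: Node, pattern: Pattern) -> Optional[List[Node]]:
    """Return the matched nodes in pre-order, or None if there is no match."""
    kind, child_patterns = pattern
    if kind != "*" and node.kind != kind:
        return None
    bound = [node]
    if child_patterns:
        if len(child_patterns) != len(node.inputs):
            return None
        for vertex, child_pattern in zip(node.inputs, child_patterns):
            # The matcher unwraps the vertex, so the rule never has to.
            sub = match(vertex.current, child_pattern)
            if sub is None:
                return None
            bound.extend(sub)
    return bound


def perform(join: Node, left: Node, aggregate: Node) -> str:
    """Rule body that only ever sees plain nodes passed in as parameters."""
    return f"SemiJoin(left={left.kind}, keys from {aggregate.kind})"


left = Node("Project")
agg = Node("Aggregate", [Vertex(Node("Filter"))])
join = Node("Join", [Vertex(left), Vertex(agg)])

# Pattern that names the left input generically as well as the Aggregate.
bound = match(join, ("Join", [("*", []), ("Aggregate", [])]))
result = perform(*bound)
print(result)
```

With a pattern that names only the join, `match` would bind just that one node, and the rule body would have to reach through `join.inputs[0]` and unwrap the wrapper itself, which is the cast the comment proposes to avoid.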
[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin
[ https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780025#comment-16780025 ]

Vineet Garg commented on HIVE-21340:

The problem is with HiveSemiJoinRule. Column pruning is occurring; e.g. the plan just before HiveSemiJoinRule is:
{code:sql}
HiveAggregate(group=[{}], agg#0=[count()])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(i_item_sk=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[perf, item]], table:alias=[item])
    HiveAggregate(group=[{0}])
      HiveFilter(condition=[>($2, 1)])
        HiveAggregate(group=[{2, 9}], agg#0=[count()])
          HiveFilter(condition=[IS NOT NULL($2)])
            HiveTableScan(table=[[perf, store_sales]], table:alias=[store_sales])
{code}
HiveSemiJoinRule rewrites the HiveJoin + HiveAggregate into a HiveSemiJoin. It does not introduce a HiveProject as a replacement for the HiveAggregate; as a result, the schema changes to whatever the HiveAggregate's input is (HiveFilter in this case).
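The schema change described in this comment can be illustrated with a toy model. It is a sketch under stated assumptions, not the Hive rule or the attached patch: `Rel` and both rewrite functions are hypothetical, and a plan node is reduced to its operator kind, output column names, and inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rel:
    """Hypothetical plan node: operator kind, output columns, inputs."""
    kind: str
    columns: List[str]
    inputs: Tuple["Rel", ...] = ()


def semijoin_dropping_aggregate(left: Rel, aggregate: Rel) -> Rel:
    # Models the behaviour described above: the Aggregate is removed and
    # its input is used directly, so every input column leaks through.
    return Rel("SemiJoin", list(left.columns), (left, aggregate.inputs[0]))


def semijoin_with_project(left: Rel, aggregate: Rel) -> Rel:
    # One possible alternative: a Project keeps only the columns the
    # Aggregate used to expose, so the right side stays pruned.
    keys = Rel("Project", list(aggregate.columns), (aggregate.inputs[0],))
    return Rel("SemiJoin", list(left.columns), (left, keys))


# Right side of the join, following the plan in the comment above.
item = Rel("Project", ["i_item_sk"])
having = Rel("Filter", ["ss_item_sk", "ss_ticket_number", "$f2"])
distinct_keys = Rel("Aggregate", ["ss_item_sk"], (having,))

leaky = semijoin_dropping_aggregate(item, distinct_keys)
pruned = semijoin_with_project(item, distinct_keys)

print(leaky.inputs[1].columns)   # ['ss_item_sk', 'ss_ticket_number', '$f2']
print(pruned.inputs[1].columns)  # ['ss_item_sk']
```

In the first rewrite the right input carries all three columns of the HAVING filter, which matches the three-column HiveProject in the reported plan; in the second only the join key survives.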