[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784796#comment-16784796
 ] 

Vineet Garg commented on HIVE-21340:


Thanks [~jcamachorodriguez]. The follow-up JIRA is HIVE-21395.

> CBO: Prune non-key columns feeding into a SemiJoin
> --------------------------------------------------
>
>                 Key: HIVE-21340
>                 URL: https://issues.apache.org/jira/browse/HIVE-21340
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Gopal V
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-21340.1.patch, HIVE-21340.2.patch, HIVE-21340.3.patch
>
>
> {code}
> explain cbo 
> with ss as 
> (select count(1), ss_item_sk, ss_ticket_number from 
> store_sales group by ss_item_sk, ss_ticket_number 
> having count(1) > 1) 
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}} 
> Only ss_item_sk is relevant for the HiveSemiJoin
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
>     HiveProject(i_item_sk=[$0])
>       HiveFilter(condition=[IS NOT NULL($0)])
>         HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, item]], table:alias=[item])
>     HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>       HiveFilter(condition=[>($2, 1)])
>         HiveAggregate(group=[{1, 8}], agg#0=[count()])
>           HiveFilter(condition=[IS NOT NULL($1)])
>             HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, store_sales]], table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784633#comment-16784633
 ] 

Jesus Camacho Rodriguez commented on HIVE-21340:


The RB link did not work (permission denied), but I went through the patch and it LGTM.

+1

[~vgarg], please create the follow-up to avoid using HepRelVertex; that will 
also help to move this change to Calcite eventually.



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784044#comment-16784044
 ] 

Hive QA commented on HIVE-21340:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961065/HIVE-21340.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15817 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16334/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16334/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16334/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961065 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784030#comment-16784030
 ] 

Hive QA commented on HIVE-21340:


| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 11m 14s | master passed |
| +1 | compile | 1m 21s | master passed |
| +1 | checkstyle | 0m 46s | master passed |
| 0 | findbugs | 4m 35s | ql in master has 2251 extant Findbugs warnings. |
| +1 | javadoc | 1m 7s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 46s | the patch passed |
| +1 | compile | 1m 16s | the patch passed |
| +1 | javac | 1m 16s | the patch passed |
| +1 | checkstyle | 0m 46s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 4m 48s | the patch passed |
| +1 | javadoc | 1m 6s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 29m 52s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16334/dev-support/hive-personality.sh |
| git revision | master / fc3eefa |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16334/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784026#comment-16784026
 ] 

Jesus Camacho Rodriguez commented on HIVE-21340:


[~vgarg], can you create a PR / RB? Thanks



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-04 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784028#comment-16784028
 ] 

Vineet Garg commented on HIVE-21340:


[~jcamachorodriguez] RB link: https://reviews.apache.org/r/70125/



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-04 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784020#comment-16784020
 ] 

Vineet Garg commented on HIVE-21340:


[~jcamachorodriguez] Can you take a look please?



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781498#comment-16781498
 ] 

Hive QA commented on HIVE-21340:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960682/HIVE-21340.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15824 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[semijoin] (batchId=121)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16306/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16306/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16306/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12960682 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781466#comment-16781466
 ] 

Hive QA commented on HIVE-21340:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 9m 17s | master passed |
| +1 | compile | 1m 22s | master passed |
| +1 | checkstyle | 0m 48s | master passed |
| 0 | findbugs | 4m 24s | ql in master has 2251 extant Findbugs warnings. |
| +1 | javadoc | 1m 12s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 46s | the patch passed |
| +1 | compile | 1m 22s | the patch passed |
| +1 | javac | 1m 22s | the patch passed |
| -1 | checkstyle | 0m 45s | ql: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 4m 57s | ql generated 1 new + 2251 unchanged - 0 fixed = 2252 total (was 2251) |
| +1 | javadoc | 1m 11s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 27m 52s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | Dead store to newRightKeys in org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSemiJoinRule.perform(RelOptRuleCall, ImmutableBitSet, RelNode, Join, RelNode, Aggregate) At HiveSemiJoinRule.java:org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSemiJoinRule.perform(RelOptRuleCall, ImmutableBitSet, RelNode, Join, RelNode, Aggregate) At HiveSemiJoinRule.java:[line 131] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16306/dev-support/hive-personality.sh |
| git revision | master / 6831b08 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus/diff-checkstyle-ql.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus/new-findbugs-ql.html |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16306/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-02-28 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780906#comment-16780906
 ] 

Hive QA commented on HIVE-21340:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960521/HIVE-21340.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15824 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[semijoin] (batchId=121)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query14] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query83] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query23] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query83] (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16296/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16296/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16296/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12960521 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-02-28 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780786#comment-16780786
 ] 

Vineet Garg commented on HIVE-21340:


[~jcamachorodriguez] That is a good suggestion. I'll update this rule in a 
follow-up patch, since the current patch doesn't introduce the HepRelVertex 
code; it only moves it around.



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-02-28 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780768#comment-16780768
 ] 

Jesus Camacho Rodriguez commented on HIVE-21340:


[~vgarg], I believe you could just add a RelNode.class in the operand matcher 
and pass it as a parameter to {{perform}}, so you can avoid casting to 
HepRelVertex altogether (rule can remain generic, which makes it easier to 
contribute it back to Calcite too).



[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-02-27 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780025#comment-16780025
 ] 

Vineet Garg commented on HIVE-21340:


The problem is with HiveSemiJoinRule. Column pruning is occurring; e.g., the 
plan just before HiveSemiJoinRule is:

{code:sql}
HiveAggregate(group=[{}], agg#0=[count()])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(i_item_sk=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[perf, item]], table:alias=[item])
    HiveAggregate(group=[{0}])
      HiveFilter(condition=[>($2, 1)])
        HiveAggregate(group=[{2, 9}], agg#0=[count()])
          HiveFilter(condition=[IS NOT NULL($2)])
            HiveTableScan(table=[[perf, store_sales]], table:alias=[store_sales])
{code}

HiveSemiJoinRule rewrites the HiveJoin + HiveAggregate into a HiveSemiJoin. It 
does not introduce a HiveProject as a replacement for the HiveAggregate; as a 
result, the schema of the semi-join's right input changes to whatever the 
HiveAggregate's input is (HiveFilter in this case).
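
To make the schema change concrete, here is a minimal, self-contained Java sketch. It is a toy model only, not Calcite or Hive code and not the attached patch: the class name {{SemiJoinPruneSketch}} and its helpers are hypothetical, and the column names are taken from the HiveProject in the reported plan. It shows that dropping the HiveAggregate(group=[{0}]) exposes all three columns of the HiveFilter below it, while projecting onto the join key ordinals keeps only ss_item_sk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical, not Calcite/Hive code) of the semi-join's right input schema. */
final class SemiJoinPruneSketch {

  /** Returns the columns of inputSchema at the given key ordinals, in key order. */
  static List<String> projectKeys(List<String> inputSchema, List<Integer> keyOrdinals) {
    List<String> projected = new ArrayList<>();
    for (int ordinal : keyOrdinals) {
      projected.add(inputSchema.get(ordinal));
    }
    return projected;
  }

  /** After projecting onto the keys, the i-th key sits at position i of the new schema. */
  static List<Integer> remapKeys(List<Integer> keyOrdinals) {
    List<Integer> remapped = new ArrayList<>();
    for (int i = 0; i < keyOrdinals.size(); i++) {
      remapped.add(i);
    }
    return remapped;
  }

  public static void main(String[] args) {
    // Output columns of the HiveFilter that sat below the removed HiveAggregate(group=[{0}]).
    List<String> filterSchema = Arrays.asList("ss_item_sk", "ss_ticket_number", "$f2");
    // The aggregate grouped on ordinal 0, which is the semi-join key on the right side.
    List<Integer> rightKeys = Arrays.asList(0);

    // Without a replacement project, the semi-join sees every column of the filter.
    System.out.println("without project: " + filterSchema);
    // With a project onto the keys, only the join key survives.
    System.out.println("with project:    " + projectKeys(filterSchema, rightKeys));
  }
}
```

In this model the first line printed lists all three columns and the second lists only ss_item_sk, which mirrors the difference between the reported plan and the desired one.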
