Gopal V created HIVE-21340:
------------------------------

             Summary: CBO: Prune non-key columns feeding into a SemiJoin
                 Key: HIVE-21340
                 URL: https://issues.apache.org/jira/browse/HIVE-21340
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 4.0.0
            Reporter: Gopal V

{code}
explain cbo
with ss as (
  select count(1), ss_item_sk, ss_ticket_number
  from store_sales
  group by ss_item_sk, ss_ticket_number
  having count(1) > 1
)
select count(1)
from item
where i_item_sk IN (select ss_item_sk from ss);
{code}

Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}} feeding the right side of the HiveSemiJoin. Only ss_item_sk is relevant for the HiveSemiJoin, so ss_ticket_number and $f2 could be pruned from that projection.

{code}
CBO PLAN:
HiveAggregate(group=[{}], agg#0=[count()])
  HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
    HiveProject(i_item_sk=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, item]], table:alias=[item])
    HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
      HiveFilter(condition=[>($2, 1)])
        HiveAggregate(group=[{1, 8}], agg#0=[count()])
          HiveFilter(condition=[IS NOT NULL($1)])
            HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, store_sales]], table:alias=[store_sales])
{code}
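
To illustrate why the extra columns are safe to drop, here is a minimal Python sketch. It is a toy model, not Hive or Calcite code. The helper names {{semi_join}} and {{prune_to_key}}, the {{cnt}} column, and the sample rows are all hypothetical. It assumes the rows in {{ss}} have already passed the {{having count(1) > 1}} filter, so only the projection above that filter is being narrowed.

```python
# Toy illustration (not Hive code): a semi-join reads only the join key
# from its right-hand input, so other right-side columns can be pruned.

def semi_join(left_rows, right_rows, left_key, right_key):
    """Return the left rows whose key value appears in the right input."""
    right_keys = {row[right_key] for row in right_rows}
    return [row for row in left_rows if row[left_key] in right_keys]


def prune_to_key(rows, key):
    """Project every row down to the single join-key column."""
    return [{key: row[key]} for row in rows]


# Hypothetical sample data standing in for `item` and the filtered aggregate.
item = [{"i_item_sk": 1}, {"i_item_sk": 2}, {"i_item_sk": 3}]
ss = [
    {"ss_item_sk": 1, "ss_ticket_number": 10, "cnt": 2},
    {"ss_item_sk": 1, "ss_ticket_number": 11, "cnt": 3},
    {"ss_item_sk": 3, "ss_ticket_number": 12, "cnt": 2},
]

# Semi-join against the wide right input (three columns per row).
wide = semi_join(item, ss, "i_item_sk", "ss_item_sk")

# Semi-join against the pruned right input (join key only).
narrow = semi_join(item, prune_to_key(ss, "ss_item_sk"), "i_item_sk", "ss_item_sk")
```

Both calls select the same rows from {{item}}, which is the property the requested pruning relies on. The grouping and the count are still needed below the filter. Only the columns carried from the filter into the semi-join are redundant.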