Gopal V created HIVE-21340:
------------------------------

             Summary: CBO: Prune non-key columns feeding into a SemiJoin
                 Key: HIVE-21340
                 URL: https://issues.apache.org/jira/browse/HIVE-21340
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 4.0.0
            Reporter: Gopal V


{code}
explain cbo
with ss as (
  select count(1), ss_item_sk, ss_ticket_number
  from store_sales
  group by ss_item_sk, ss_ticket_number
  having count(1) > 1
)
select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
{code}

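Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}} on the right input of the HiveSemiJoin: it still passes all three columns through.

Only ss_item_sk is referenced by the HiveSemiJoin condition, so ss_ticket_number and $f2 could be pruned from that projection.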

{code}
CBO PLAN:
HiveAggregate(group=[{}], agg#0=[count()])
  HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
    HiveProject(i_item_sk=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, item]], table:alias=[item])
    HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
      HiveFilter(condition=[>($2, 1)])
        HiveAggregate(group=[{1, 8}], agg#0=[count()])
          HiveFilter(condition=[IS NOT NULL($1)])
            HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, store_sales]], table:alias=[store_sales])
{code}
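The toy sketch below illustrates the pruning this issue asks for. It is not Hive's implementation, and Hive's actual fix may work differently. It makes a few assumptions. Plan expressions are plain strings with Calcite-style {{$N}} input references. Indices below the left input's field count belong to the left side, and the rest belong to the right side. The helper name {{prune_semijoin_right}} is hypothetical. The function keeps only the right-side columns that the semijoin condition reads, then renumbers the condition's references to match. In the plan above, the HiveFilter on $f2 sits below the HiveProject, so dropping $f2 from the projection does not affect that filter.

```python
import re

# Matches Calcite-style input references such as $0, $1 (but not $f2).
REF = re.compile(r"\$(\d+)")


def prune_semijoin_right(condition, left_count, right_fields):
    """Hypothetical helper: keep only the right-input columns a semijoin reads.

    condition    -- join condition string, e.g. "=($0, $1)"
    left_count   -- number of fields produced by the left input
    right_fields -- names of the fields produced by the right input
    Returns (pruned_right_fields, rewritten_condition).
    """
    refs = {int(n) for n in REF.findall(condition)}
    # Right-side refs are offset by the left input's width.
    keep = sorted(r - left_count for r in refs if r >= left_count)
    # Old right-side index -> new right-side index after pruning.
    remap = {old: new for new, old in enumerate(keep)}

    def rewrite(match):
        idx = int(match.group(1))
        if idx < left_count:
            return match.group(0)  # left-side refs are unchanged
        return "$" + str(left_count + remap[idx - left_count])

    return [right_fields[i] for i in keep], REF.sub(rewrite, condition)


# Mirrors the plan above: left input is [i_item_sk], condition is =($0, $1).
fields, cond = prune_semijoin_right(
    "=($0, $1)", 1, ["ss_item_sk", "ss_ticket_number", "$f2"]
)
print(fields, cond)  # prints: ['ss_item_sk'] =($0, $1)
```

Here the right-side projection shrinks from three columns to {{ss_item_sk}} alone, and the condition is unchanged because {{ss_item_sk}} was already the first right-side field. If the join had been on {{$2}} (ss_ticket_number), the condition would have been rewritten to {{=($0, $1)}} after pruning.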



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
