[ https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535756#comment-14535756 ]

Sushanth Sowmyan commented on HIVE-10412:
-----------------------------------------

Removing the fix version of 1.2.0 in preparation for the release, since this is
not a blocker for 1.2.0.

> CBO : Calculate join selectivity when computing HiveJoin cost
> -------------------------------------------------------------
>
>                 Key: HIVE-10412
>                 URL: https://issues.apache.org/jira/browse/HIVE-10412
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Mostafa Mokhtar
>            Assignee: Laljo John Pullokkaran
>
> This is from TPC-DS Q7.
> Because we don't compute the selectivity of sub-expressions in a HiveJoin, we
> assume that selective and non-selective joins have similar cost.
> {code}
> select  i_item_id, 
>         avg(ss_quantity) agg1,
>         avg(ss_list_price) agg2,
>         avg(ss_coupon_amt) agg3,
>         avg(ss_sales_price) agg4 
>  from store_sales, customer_demographics, item
>  where store_sales.ss_item_sk = item.i_item_sk and
>        store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
>        cd_gender = 'F' and 
>        cd_marital_status = 'W' and
>        cd_education_status = 'Primary'
>  group by i_item_id
>  order by i_item_id
>  limit 100
> {code}
> Table cardinalities (rows):
> {code}
> item 462,000
> customer_demographics 1,920,800
> store_sales 82,510,879,939
> {code}
> Join-key NDVs (number of distinct values):
> {code}
> item.i_item_sk 439501
> customer_demographics.cd_demo_sk 1835839
> store_sales.ss_cdemo_sk 1835839
> {code}
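The cardinalities and NDVs above are enough for the standard textbook equi-join estimate, in which selectivity is 1 / max(NDV of left key, NDV of right key) and output rows are |left| x |right| x selectivity. The Python sketch below applies it to store_sales joined with customer_demographics on ss_cdemo_sk = cd_demo_sk. It is an illustration, not Hive's implementation: the helper names are hypothetical, the filtered customer_demographics row count is a hypothetical input because it is not listed above, and capping the filtered side's key NDV at its row count is an assumed heuristic.

```python
# Sketch of the textbook equi-join cardinality estimate (not Hive's code).
# Helper names are hypothetical; statistics are copied from the issue text.

STORE_SALES_ROWS = 82_510_879_939
CD_ROWS = 1_920_800
SS_CDEMO_SK_NDV = 1_835_839
CD_DEMO_SK_NDV = 1_835_839


def equi_join_selectivity(ndv_left: float, ndv_right: float) -> float:
    """Fraction of the cross product kept by an equi-join: 1 / max(NDV)."""
    return 1.0 / max(ndv_left, ndv_right)


def equi_join_rows(rows_left: float, rows_right: float,
                   ndv_left: float, ndv_right: float) -> float:
    """Estimated output rows: |left| * |right| * selectivity."""
    return rows_left * rows_right * equi_join_selectivity(ndv_left, ndv_right)


def filtered_cd_join_rows(filtered_cd_rows: float) -> float:
    """store_sales JOIN a filtered customer_demographics.

    Assumption: the filter is independent of cd_demo_sk, so the key's NDV on
    the filtered side is capped by the number of surviving rows.
    """
    filtered_ndv = min(CD_DEMO_SK_NDV, filtered_cd_rows)
    return equi_join_rows(STORE_SALES_ROWS, filtered_cd_rows,
                          SS_CDEMO_SK_NDV, filtered_ndv)


unfiltered = equi_join_rows(STORE_SALES_ROWS, CD_ROWS,
                            SS_CDEMO_SK_NDV, CD_DEMO_SK_NDV)
print(f"unfiltered join: {unfiltered:.3e} rows")

# Hypothetical filter result sizes, only to show how the output shrinks.
for kept in (CD_ROWS // 10, CD_ROWS // 100):
    print(f"{kept:>9} cd rows -> {filtered_cd_join_rows(kept):.3e} rows")
```

Once the filtered row count drops below the key NDV, the estimate scales linearly with it, so a join against the filtered dimension emits far fewer rows than store_sales itself. That difference in output size is what the issue says the join cost does not capture.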
> From the CBO debug logs:
> {code}
> 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
> HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available])
>   HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}])
>     HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
>       HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
>     HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
>       HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
>         HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
>   HiveProject(i_item_sk=[$0], i_item_id=[$1])
>     HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}
> 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
> 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
> HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available])
>   HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}])
>     HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
>       HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
>     HiveProject(i_item_sk=[$0], i_item_id=[$1])
>       HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
>   HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
>     HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
>       HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
> 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.251089518344444E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
> 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}
> {code}
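The logged numbers can be checked by hand. Assuming (an inference from these log lines, not from the HiveCostModel source) that the rows component of a join's cost is the sum of its input row counts, 82,510,879,939 (store_sales) + 462,000 (item) reproduces the 8.2511341939E10 charged to store_sales JOIN item exactly. Applying the same reading to the other figures recovers the estimated size of the filtered customer_demographics input and of the store_sales JOIN customer_demographics output. The Python sketch below does this arithmetic; the constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the HiveCostModel DEBUG figures above.
# Hypothesis (inferred, not confirmed): rows cost = left rows + right rows.

STORE_SALES_ROWS = 82_510_879_939
ITEM_ROWS = 462_000
CD_ROWS = 1_920_800

# "rows" components copied from the log.
COST_SS_JOIN_ITEM = 8.2511341939e10         # store_sales JOIN item
COST_SS_JOIN_CD = 8.251089518344444e10      # store_sales JOIN filtered cd
COST_TOP_AFTER_SS_CD = 6.553102534841269e8  # (store_sales JOIN cd) JOIN item


def rows_cost(left_rows: float, right_rows: float) -> float:
    """Selectivity-blind rows cost: every input row is charged once."""
    return left_rows + right_rows


# The hypothesis reproduces the store_sales JOIN item figure exactly.
assert rows_cost(STORE_SALES_ROWS, ITEM_ROWS) == COST_SS_JOIN_ITEM

# Solve the other two entries for their unknown input.
filtered_cd_rows = COST_SS_JOIN_CD - STORE_SALES_ROWS   # filtered cd estimate
ss_cd_output_rows = COST_TOP_AFTER_SS_CD - ITEM_ROWS    # output of ss JOIN cd

print(f"filtered customer_demographics: {filtered_cd_rows:,.1f} rows")
print(f"store_sales JOIN cd output:     {ss_cd_output_rows:,.1f} rows")
print(f"fraction of store_sales kept:   {ss_cd_output_rows / STORE_SALES_ROWS:.4%}")

# Both bottom-level joins are charged about 8.25e10 rows even though one of
# them keeps well under 1% of store_sales.
print(f"cost ratio ss JOIN item / ss JOIN cd: {COST_SS_JOIN_ITEM / COST_SS_JOIN_CD:.6f}")
```

Under the same reading, the filtered input works out to 1,920,800 / 126 rows and the join output to 82,510,879,939 / 126 rows, so the estimated filter selectivity on customer_demographics appears to carry through to the join's output row count but not into the cost of the join that applies it.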



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
