[ https://issues.apache.org/jira/browse/CALCITE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051680#comment-15051680 ]
Julian Hyde commented on CALCITE-1017:
--------------------------------------

For future reference, I think you can move issues from one project to another.

> hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
> -----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-1017
> URL: https://issues.apache.org/jira/browse/CALCITE-1017
> Project: Calcite
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Julian Hyde
>
> {code}
> Vertex dependency in root stage
> Reducer 10 <- Reducer 9 (SIMPLE_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 11 (SIMPLE_EDGE)
> Reducer 3 <- Map 12 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Map 13 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Map 15 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
> Reducer 7 <- Map 16 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
> Reducer 8 <- Map 17 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Stage-0 > Fetch Operator > limit:100 > Stage-1 > Reducer 10 > File Output Operator [FS_63] > compressed:false > Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE > Column stats: NONE > table:{"input > format:":"org.apache.hadoop.mapred.TextInputFormat","output > format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} > Limit [LIM_62] > Number of rows:100 > Statistics:Num rows: 100 Data size: 143600 Basic stats: > COMPLETE Column stats: NONE > Select Operator [SEL_61] > | > outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"] > | Statistics:Num rows: 127050 Data size: 182479129 Basic > stats: COMPLETE Column stats: NONE > |<-Reducer 9 [SIMPLE_EDGE] > Reduce Output Operator [RS_60] > key expressions:_col0 
(type: string), _col1 (type: > string), _col2 (type: string) > sort order:+++ > Statistics:Num rows: 127050 Data size: 182479129 Basic > stats: COMPLETE Column stats: NONE > value expressions:_col3 (type: bigint), _col4 (type: > double), _col5 (type: double), _col6 (type: double), _col7 (type: bigint), > _col8 (type: double), _col9 (type: double), _col10 (type: double), _col11 > (type: bigint), _col12 (type: double), _col13 (type: double) > Select Operator [SEL_58] > > outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > Statistics:Num rows: 127050 Data size: 182479129 > Basic stats: COMPLETE Column stats: NONE > Group By Operator [GBY_57] > | > aggregations:["count(VALUE._col0)","avg(VALUE._col1)","stddev_samp(VALUE._col2)","count(VALUE._col3)","avg(VALUE._col4)","stddev_samp(VALUE._col5)","count(VALUE._col6)","avg(VALUE._col7)","stddev_samp(VALUE._col8)"] > | keys:KEY._col0 (type: string), KEY._col1 (type: > string), KEY._col2 (type: string) > | > outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"] > | Statistics:Num rows: 127050 Data size: 182479129 > Basic stats: COMPLETE Column stats: NONE > |<-Reducer 8 [SIMPLE_EDGE] > Reduce Output Operator [RS_56] > key expressions:_col0 (type: string), _col1 > (type: string), _col2 (type: string) > Map-reduce partition columns:_col0 (type: > string), _col1 (type: string), _col2 (type: string) > sort order:+++ > Statistics:Num rows: 254100 Data size: > 364958258 Basic stats: COMPLETE Column stats: NONE > value expressions:_col3 (type: bigint), _col4 > (type: struct<count:bigint,sum:double,input:int>), _col5 (type: > struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint), _col7 > (type: struct<count:bigint,sum:double,input:int>), _col8 (type: > struct<count:bigint,sum:double,variance:double>), _col9 (type: bigint), > _col10 (type: 
struct<count:bigint,sum:double,input:int>), _col11 (type: > struct<count:bigint,sum:double,variance:double>) > Group By Operator [GBY_55] > > aggregations:["count(_col5)","avg(_col5)","stddev_samp(_col5)","count(_col10)","avg(_col10)","stddev_samp(_col10)","count(_col14)","avg(_col14)","stddev_samp(_col14)"] > keys:_col22 (type: string), _col24 (type: > string), _col25 (type: string) > > outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"] > Statistics:Num rows: 254100 Data size: > 364958258 Basic stats: COMPLETE Column stats: NONE > Select Operator [SEL_54] > > outputColumnNames:["_col22","_col24","_col25","_col5","_col10","_col14"] > Statistics:Num rows: 254100 Data size: > 364958258 Basic stats: COMPLETE Column stats: NONE > Merge Join Operator [MERGEJOIN_113] > | condition map:[{"":"Inner Join 0 to > 1"}] > | keys:{"0":"_col1 (type: > int)","1":"_col0 (type: int)"} > | > outputColumnNames:["_col5","_col10","_col14","_col22","_col24","_col25"] > | Statistics:Num rows: 254100 Data size: > 364958258 Basic stats: COMPLETE Column stats: NONE > |<-Map 17 [SIMPLE_EDGE] > | Reduce Output Operator [RS_52] > | key expressions:_col0 (type: int) > | Map-reduce partition columns:_col0 > (type: int) > | sort order:+ > | Statistics:Num rows: 231000 Data > size: 331780228 Basic stats: COMPLETE Column stats: NONE > | value expressions:_col1 (type: > string), _col2 (type: string) > | Select Operator [SEL_18] > | > outputColumnNames:["_col0","_col1","_col2"] > | Statistics:Num rows: 231000 Data > size: 331780228 Basic stats: COMPLETE Column stats: NONE > | Filter Operator [FIL_106] > | predicate:i_item_sk is not > null (type: boolean) > | Statistics:Num rows: 231000 > Data size: 331780228 Basic stats: COMPLETE Column stats: NONE > | TableScan [TS_17] > | alias:item > | Statistics:Num rows: > 462000 Data size: 663560457 Basic stats: COMPLETE Column stats: NONE > |<-Reducer 7 [SIMPLE_EDGE] > Reduce Output Operator 
[RS_50] > key expressions:_col1 (type: int) > Map-reduce partition columns:_col1 > (type: int) > sort order:+ > Statistics:Num rows: 26735 Data > size: 29919145 Basic stats: COMPLETE Column stats: NONE > value expressions:_col5 (type: > int), _col10 (type: int), _col14 (type: int), _col22 (type: string) > Merge Join Operator [MERGEJOIN_112] > | condition map:[{"":"Inner Join 0 > to 1"}] > | keys:{"0":"_col3 (type: > int)","1":"_col0 (type: int)"} > | > outputColumnNames:["_col1","_col5","_col10","_col14","_col22"] > | Statistics:Num rows: 26735 Data > size: 29919145 Basic stats: COMPLETE Column stats: NONE > |<-Map 16 [SIMPLE_EDGE] > | Reduce Output Operator [RS_47] > | key expressions:_col0 (type: > int) > | Map-reduce partition > columns:_col0 (type: int) > | sort order:+ > | Statistics:Num rows: 852 Data > size: 1628138 Basic stats: COMPLETE Column stats: NONE > | value expressions:_col1 > (type: string) > | Select Operator [SEL_16] > | > outputColumnNames:["_col0","_col1"] > | Statistics:Num rows: 852 > Data size: 1628138 Basic stats: COMPLETE Column stats: NONE > | Filter Operator [FIL_105] > | predicate:s_store_sk is > not null (type: boolean) > | Statistics:Num rows: > 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE > | TableScan [TS_15] > | alias:store > | Statistics:Num rows: > 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: NONE > |<-Reducer 6 [SIMPLE_EDGE] > Reduce Output Operator [RS_45] > key expressions:_col3 (type: > int) > Map-reduce partition > columns:_col3 (type: int) > sort order:+ > Statistics:Num rows: 24305 > Data size: 27199223 Basic stats: COMPLETE Column stats: NONE > value expressions:_col1 > (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int) > Merge Join Operator > [MERGEJOIN_111] > | condition map:[{"":"Inner > Join 0 to 1"}] > | keys:{"0":"_col11 (type: > int)","1":"_col0 (type: int)"} > | > outputColumnNames:["_col1","_col3","_col5","_col10","_col14"] > | Statistics:Num rows: 24305 > 
Data size: 27199223 Basic stats: COMPLETE Column stats: NONE > |<-Map 15 [SIMPLE_EDGE] > | Reduce Output Operator > [RS_42] > | key expressions:_col0 > (type: int) > | Map-reduce partition > columns:_col0 (type: int) > | sort order:+ > | Statistics:Num rows: > 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE > | Select Operator [SEL_14] > | > outputColumnNames:["_col0"] > | Statistics:Num rows: > 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE > | Filter Operator > [FIL_104] > | > predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk > is not null) (type: boolean) > | Statistics:Num > rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE > | TableScan [TS_12] > | alias:d1 > | Statistics:Num > rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE > |<-Reducer 5 [SIMPLE_EDGE] > Reduce Output Operator > [RS_40] > key expressions:_col11 > (type: int) > Map-reduce partition > columns:_col11 (type: int) > sort order:+ > Statistics:Num rows: > 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE > value expressions:_col1 > (type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col14 > (type: int) > Merge Join Operator > [MERGEJOIN_110] > | condition > map:[{"":"Inner Join 0 to 1"}] > | keys:{"0":"_col6 > (type: int)","1":"_col0 (type: int)"} > | > outputColumnNames:["_col1","_col3","_col5","_col10","_col11","_col14"] > | Statistics:Num rows: > 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE > |<-Map 14 [SIMPLE_EDGE] > | Reduce Output > Operator [RS_37] > | key > expressions:_col0 (type: int) > | Map-reduce > partition columns:_col0 (type: int) > | sort order:+ > | Statistics:Num > rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE > | Select Operator > [SEL_11] > | > outputColumnNames:["_col0"] > | Statistics:Num > rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE > | Filter > 
Operator [FIL_103] > | > predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk > is not null) (type: boolean) > | > Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column > stats: NONE > | TableScan > [TS_9] > | alias:d1 > | > Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column > stats: NONE > |<-Reducer 4 > [SIMPLE_EDGE] > Reduce Output > Operator [RS_35] > key > expressions:_col6 (type: int) > Map-reduce > partition columns:_col6 (type: int) > sort order:+ > Statistics:Num > rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE > value > expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 > (type: int), _col11 (type: int), _col14 (type: int) > Merge Join > Operator [MERGEJOIN_109] > | condition > map:[{"":"Inner Join 0 to 1"}] > | > keys:{"0":"_col0 (type: int)","1":"_col0 (type: int)"} > | > outputColumnNames:["_col1","_col3","_col5","_col6","_col10","_col11","_col14"] > | Statistics:Num > rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE > |<-Map 13 > [SIMPLE_EDGE] > | Reduce Output > Operator [RS_32] > | key > expressions:_col0 (type: int) > | Map-reduce > partition columns:_col0 (type: int) > | sort order:+ > | > Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column > stats: NONE > | Select > Operator [SEL_8] > | > outputColumnNames:["_col0"] > | > Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column > stats: NONE > | Filter > Operator [FIL_102] > | > predicate:((d_quarter_name = '2000Q1') and d_date_sk is not null) (type: > boolean) > | > Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column > stats: NONE > | > TableScan [TS_6] > | > alias:d1 > | > Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column > stats: NONE > |<-Reducer 3 > [SIMPLE_EDGE] > Reduce Output > Operator [RS_30] > key > expressions:_col0 (type: int) > Map-reduce > 
partition columns:_col0 (type: int) > sort order:+ > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > value > expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 > (type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int) > Merge Join > Operator [MERGEJOIN_108] > | > condition map:[{"":"Inner Join 0 to 1"}] > | > keys:{"0":"_col8 (type: int), _col7 (type: int)","1":"_col1 (type: int), > _col2 (type: int)"} > | > outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col10","_col11","_col14"] > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > |<-Map 12 > [SIMPLE_EDGE] > | Reduce > Output Operator [RS_27] > | key > expressions:_col1 (type: int), _col2 (type: int) > | > Map-reduce partition columns:_col1 (type: int), _col2 (type: int) > | sort > order:++ > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > | value > expressions:_col0 (type: int), _col3 (type: int) > | > Select Operator [SEL_5] > | > outputColumnNames:["_col0","_col1","_col2","_col3"] > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > | > Filter Operator [FIL_101] > | > predicate:((cs_bill_customer_sk is not null and cs_item_sk is not null) and > cs_sold_date_sk is not null) (type: boolean) > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > | > TableScan [TS_4] > | > alias:catalog_sales > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > |<-Reducer > 2 [SIMPLE_EDGE] > Reduce > Output Operator [RS_25] > key > expressions:_col8 (type: int), _col7 (type: int) > > Map-reduce partition columns:_col8 (type: int), _col7 (type: int) > sort > order:++ > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > value > expressions:_col0 (type: int), _col1 (type: int), _col3 (type: int), _col5 > (type: int), _col6 (type: int), _col10 (type: int) > Merge > Join 
Operator [MERGEJOIN_107] > | > condition map:[{"":"Inner Join 0 to 1"}] > | > keys:{"0":"_col2 (type: int), _col1 (type: int), _col4 (type: > int)","1":"_col2 (type: int), _col1 (type: int), _col3 (type: int)"} > | > outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col7","_col8","_col10"] > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > > |<-Map 1 [SIMPLE_EDGE] > | > Reduce Output Operator [RS_20] > | > key expressions:_col2 (type: int), _col1 (type: int), _col4 (type: int) > | > Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col4 > (type: int) > | > sort order:+++ > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > | > value expressions:_col0 (type: int), _col3 (type: int), _col5 (type: int) > | > Select Operator [SEL_1] > | > outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5"] > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > | > Filter Operator [FIL_99] > | > predicate:((((ss_customer_sk is not null and ss_item_sk is not null) > and ss_ticket_number is not null) and ss_sold_date_sk is not null) and > ss_store_sk is not null) (type: boolean) > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: > NONE > | > TableScan [TS_0] > | > alias:store_sales > | > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column > stats: NONE > > |<-Map 11 [SIMPLE_EDGE] > > Reduce Output Operator [RS_22] > > key expressions:_col2 (type: int), _col1 (type: int), _col3 (type: int) > > Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col3 > (type: int) > > sort order:+++ > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > > value expressions:_col0 (type: int), _col4 (type: int) > > Select Operator [SEL_3] > > outputColumnNames:["_col0","_col1","_col2","_col3","_col4"] > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE > > Filter 
Operator [FIL_100] > > predicate:(((sr_customer_sk is not null and sr_item_sk is not null) and > sr_ticket_number is not null) and sr_returned_date_sk is not null) (type: > boolean) > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: > NONE > > TableScan [TS_2] > > alias:store_returns > > Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column > stats: NONE
> {code}
> The query is:
> {code}
> explain select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id
>   ,sum(ws_ext_sales_price) as itemrevenue
>   ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio
> from web_sales ,item ,date_dim
> where web_sales.ws_item_sk = item.i_item_sk
>   and item.i_category in ('Jewelry', 'Sports', 'Books')
>   and web_sales.ws_sold_date_sk = date_dim.d_date_sk
>   and date_dim.d_date between '2001-01-12' and '2001-02-11'
> group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price
> order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio
> limit 100;
> {code}
> It seems that in SemanticAnalyzer.genJoinReduceSinkChild() we look for join
> predicates only in the 'ON' clause. If the join condition appears in the 'WHERE'
> clause of the query, we aggressively throw an exception in strict mode, assuming
> the join is a cartesian product. We should delay this check until after the
> physical optimizer has run and the plan is complete.
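
To make the distinction in the description concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Hive (an assumption, chosen only so the example is self-contained), and the tiny tables and rows are hypothetical; only the table and column names echo the query above. It shows that a comma join whose condition sits in the WHERE clause returns the same rows as the explicit ON form, and fewer rows than a true cartesian product.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (i_item_sk INTEGER, i_category TEXT);
    CREATE TABLE web_sales (ws_item_sk INTEGER, ws_ext_sales_price REAL);
    INSERT INTO item VALUES (1, 'Books'), (2, 'Sports'), (3, 'Music');
    INSERT INTO web_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Join condition in the WHERE clause (comma join), as in the reported query.
where_form = conn.execute("""
    SELECT i_item_sk, ws_ext_sales_price
    FROM web_sales, item
    WHERE web_sales.ws_item_sk = item.i_item_sk
    ORDER BY 1, 2
""").fetchall()

# The same condition written in an explicit ON clause.
on_form = conn.execute("""
    SELECT i_item_sk, ws_ext_sales_price
    FROM web_sales JOIN item ON web_sales.ws_item_sk = item.i_item_sk
    ORDER BY 1, 2
""").fetchall()

# A genuine cartesian product: no join condition anywhere.
cross_count = conn.execute("SELECT COUNT(*) FROM web_sales, item").fetchone()[0]

print(where_form == on_form, len(where_form), cross_count)
```

Both forms describe the same equi-join, so a check that inspects only the ON clause would misclassify the first query; a check run on the finished plan would not.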