[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746846#comment-15746846 ] Nattavut Sutyanyong commented on SPARK-18814:

As this JIRA will be brought to a close shortly, I'd like to take this opportunity to thank [~ekhliang], who reported the first case of the regression, and [~kiszk], who reported the second. You helped shorten the life of a regression in my code. I also thank [~smilegator] and [~hvanhovell], who shared their ideas on the fix and reviewed my code, and lastly [~rxin], who nudged me toward this problem.

> CheckAnalysis rejects TPCDS query 32
>
> Key: SPARK-18814
> URL: https://issues.apache.org/jira/browse/SPARK-18814
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Eric Liang
> Assignee: Herman van Hovell
> Priority: Blocker
>
> It seems the CheckAnalysis rule introduced by SPARK-18504 is incorrectly rejecting this TPCDS query, which ran fine in Spark 2.0. There doesn't seem to be any obvious error in the query or the check rule, though: in the plan below, the scalar subquery's condition field is "scalar-subquery#24 [(cs_item_sk#39#111 = i_item_sk#59)]", which should reference cs_item_sk#39. Nonetheless, CheckAnalysis complains that cs_item_sk#39 is not referenced by the scalar subquery predicates.
>
> analysis error:
> {code}
> == Query: q32-v1.4 ==
> Can't be analyzed: org.apache.spark.sql.AnalysisException: a GROUP BY clause in a scalar correlated subquery cannot contain non-correlated columns: cs_item_sk#39;;
> GlobalLimit 100
> +- LocalLimit 100
>    +- Aggregate [sum(cs_ext_discount_amt#46) AS excess discount amount#23]
>       +- Filter ((((i_manufact_id#72 = 977) && (i_item_sk#59 = cs_item_sk#39)) && ((d_date#83 >= 2000-01-27) && (d_date#83 <= cast(cast(cast(cast(2000-01-27 as date) as timestamp) + interval 12 weeks 6 days as date) as string)))) && ((d_date_sk#81 = cs_sold_date_sk#58) && (cast(cs_ext_discount_amt#46 as decimal(14,7)) > cast(scalar-subquery#24 [(cs_item_sk#39#111 = i_item_sk#59)] as decimal(14,7)))))
>          :  +- Project [(CAST(1.3 AS DECIMAL(11,6)) * CAST(avg(cs_ext_discount_amt) AS DECIMAL(11,6)))#110, cs_item_sk#39 AS cs_item_sk#39#111]
>          :     +- Aggregate [cs_item_sk#39], [CheckOverflow((promote_precision(cast(1.3 as decimal(11,6))) * promote_precision(cast(avg(cs_ext_discount_amt#46) as decimal(11,6)))), DecimalType(14,7)) AS (CAST(1.3 AS DECIMAL(11,6)) * CAST(avg(cs_ext_discount_amt) AS DECIMAL(11,6)))#110, cs_item_sk#39]
>          :        +- Filter (((d_date#83 >= 2000-01-27) && (d_date#83 <= cast(cast(cast(cast(2000-01-27 as date) as timestamp) + interval 12 weeks 6 days as date) as string))) && (d_date_sk#81 = cs_sold_date_sk#58))
>          :           +- Join Inner
>          :              :- SubqueryAlias catalog_sales
>          :              :  +- Relation[cs_sold_time_sk#25,cs_ship_date_sk#26,cs_bill_customer_sk#27,cs_bill_cdemo_sk#28,cs_bill_hdemo_sk#29,cs_bill_addr_sk#30,cs_ship_customer_sk#31,cs_ship_cdemo_sk#32,cs_ship_hdemo_sk#33,cs_ship_addr_sk#34,cs_call_center_sk#35,cs_catalog_page_sk#36,cs_ship_mode_sk#37,cs_warehouse_sk#38,cs_item_sk#39,cs_promo_sk#40,cs_order_number#41,cs_quantity#42,cs_wholesale_cost#43,cs_list_price#44,cs_sales_price#45,cs_ext_discount_amt#46,cs_ext_sales_price#47,cs_ext_wholesale_cost#48,... 10 more fields] parquet
>          :              +- SubqueryAlias date_dim
>          :                 +- Relation[d_date_sk#81,d_date_id#82,d_date#83,d_month_seq#84,d_week_seq#85,d_quarter_seq#86,d_year#87,d_dow#88,d_moy#89,d_dom#90,d_qoy#91,d_fy_year#92,d_fy_quarter_seq#93,d_fy_week_seq#94,d_day_name#95,d_quarter_name#96,d_holiday#97,d_weekend#98,d_following_holiday#99,d_first_dom#100,d_last_dom#101,d_same_day_ly#102,d_same_day_lq#103,d_current_day#104,... 4 more fields] parquet
>          +- Join Inner
>             :- Join Inner
>             :  :- SubqueryAlias catalog_sales
>             :  :  +- Relation[cs_sold_time_sk#25,cs_ship_date_sk#26,cs_bill_customer_sk#27,cs_bill_cdemo_sk#28,cs_bill_hdemo_sk#29,cs_bill_addr_sk#30,cs_ship_customer_sk#31,cs_ship_cdemo_sk#32,cs_ship_hdemo_sk#33,cs_ship_addr_sk#34,cs_call_center_sk#35,cs_catalog_page_sk#36,cs_ship_mode_sk#37,cs_warehouse_sk#38,cs_item_sk#39,cs_promo_sk#40,cs_order_number#41,cs_quantity#42,cs_wholesale_cost#43,cs_list_price#44,cs_sales_price#45,cs_ext_discount_amt#46,cs_ext_sales_price#47,cs_ext_wholesale_cost#48,... 10 more fields] parquet
>             :  +- SubqueryAlias item
>             :     +- Relation[i_item_sk#59,i_item_id#60,i_rec_start_date#61,i_rec_end_date#62,i_item_desc#63,i_current_price#64,i_wholesale_cost#65,i_brand_id#66,i_brand#67,i_class_id#68,i_class#69,i_category_id#70,i_category#71,i_manufact_id#72,i_manufact#73,i_size#74,i_formulation#75,i_color#76,i_units#77,i_container#78,i_manager_id#79,i_product_name#80] parquet
>             +- SubqueryAlias date_dim
>                +- Relation[d_date_sk#81,d_date_id#82,d_date#83,d_month_seq#84,d_week_seq#85,d_quarter_seq#86,d_year#87,d_dow#88,d_moy#89,d_dom#90,d_qoy#91,d_fy_year#92,d_fy_quarter_seq#93,d_fy_week_seq#94,d_day_name#95,d_quarter_name#96,d_holiday#97,d_weekend#98,d_following_holiday#99,d_first_dom#100,d_last_dom#101,d_same_day_ly#102,d_same_day_lq#103,d_current_day#104,... 4 more fields] parquet
> {code}
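For reference, this is the rough shape of q32 recovered from the plan above (a reconstruction, not the verbatim TPCDS text; the 90-day window corresponds to the plan's "interval 12 weeks 6 days"):

{code}
// Reconstructed from the analyzed plan above; not the verbatim TPCDS q32 text.
sql("""
  select sum(cs_ext_discount_amt) as `excess discount amount`
  from catalog_sales, item, date_dim
  where i_manufact_id = 977
    and i_item_sk = cs_item_sk
    and d_date between '2000-01-27' and date_add(cast('2000-01-27' as date), 90)
    and d_date_sk = cs_sold_date_sk
    and cs_ext_discount_amt > (
      select 1.3 * avg(cs_ext_discount_amt)
      from catalog_sales, date_dim
      where cs_item_sk = i_item_sk   -- correlated to the outer item table
        and d_date between '2000-01-27' and date_add(cast('2000-01-27' as date), 90)
        and d_date_sk = cs_sold_date_sk)
  limit 100""").show()
{code}

The correlated predicate cs_item_sk = i_item_sk is what the analyzer turns into the GROUP BY on cs_item_sk seen in the Aggregate above, and catalog_sales appearing on both the parent and subquery sides is what triggers the deduplication discussed in the comments below.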
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742879#comment-15742879 ] Nattavut Sutyanyong commented on SPARK-18814:

q92 has the same pattern as q32 and my simplified version. If possible, could you try patching my PR to verify that the problem is resolved?
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742828#comment-15742828 ] Kazuaki Ishizaki commented on SPARK-18814:

I found the same error {{org.apache.spark.sql.AnalysisException: a GROUP BY clause in a scalar correlated subquery cannot contain non-correlated columns: ws_item_sk#1081;;}} when I ran q92 using the master branch.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738058#comment-15738058 ] Apache Spark commented on SPARK-18814:

User 'nsyca' has created a pull request for this issue: https://github.com/apache/spark/pull/16246
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737865#comment-15737865 ] Nattavut Sutyanyong commented on SPARK-18814:

I have a potential fix; it works, but it's not pretty. I want to step back and think about it more, but if a fix is urgently needed, I can submit a PR for it.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737710#comment-15737710 ] Nattavut Sutyanyong commented on SPARK-18814:

[~cloud_fan] FYI, part of the problem here is the deDuplicate logic that rewires the group-by column to a new ExprId in the middle of the checking, a problem similar to the one tracked by SPARK-17154.

{code}
// Make sure the inner and the outer query attributes do not collide.
val outputSet = outer.map(_.outputSet).reduce(_ ++ _)
val duplicates = basePlan.outputSet.intersect(outputSet)
val (plan, deDuplicatedConditions) = if (duplicates.nonEmpty) {
  val aliasMap = AttributeMap(duplicates.map { dup =>
    dup -> Alias(dup, dup.toString)()
  }.toSeq)
  val aliasedExpressions = basePlan.output.map { ref =>
    aliasMap.getOrElse(ref, ref)
  }
  val aliasedProjection = Project(aliasedExpressions, basePlan)
  val aliasedConditions = baseConditions.map(_.transform {
    case ref: Attribute => aliasMap.getOrElse(ref, ref).toAttribute
  })
  (aliasedProjection, aliasedConditions)
} else {
  (basePlan, baseConditions)
}
{code}
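To make the rewiring concrete, here is a minimal sketch (my own illustration using Catalyst classes directly, with a stand-in attribute name; not code from the PR) of what the aliasMap above does to a group-by column:

{code}
import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeMap, AttributeReference}
import org.apache.spark.sql.types.IntegerType

// Stand-in for the subquery's group-by column, e.g. cs_item_sk#39.
val ck = AttributeReference("ck", IntegerType)()

// The deduplication block maps dup -> Alias(dup, dup.toString)().
val aliasMap = AttributeMap(Seq(ck -> Alias(ck, ck.toString)()))

// Conditions are rewritten with aliasMap.getOrElse(ref, ref).toAttribute,
// which carries the Alias's freshly minted ExprId (the "#111" suffix).
val rewired = aliasMap.getOrElse(ck, ck).toAttribute

assert(rewired.exprId != ck.exprId)   // a new ExprId after rewiring ...
assert(!rewired.semanticEquals(ck))   // ... so canonicalized comparison fails
{code}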
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737686#comment-15737686 ] Nattavut Sutyanyong commented on SPARK-18814:

I can reproduce this with a simple script now.

{code}
Seq((1,1)).toDF("pk","pv").createOrReplaceTempView("p")
Seq((1,1)).toDF("ck","cv").createOrReplaceTempView("c")
sql("select * from p,c where p.pk=c.ck and c.cv = (select avg(c1.cv) from c c1 where c1.ck = p.pk)").show
{code}

The requirements are:
1. We need to reference the same table twice, in both the parent and the subquery. Here it is the table c.
2. We need a correlated predicate, but to a different table. Here it goes from c (as c1) in the subquery to p in the parent.
3. We will then "deduplicate" c1.ck in the subquery to {{ck##}} at the {{Project}} above the {{Aggregate}} of {{avg}}. When we then compare {{ck##}} and the original group-by column {{ck#}} by their canonicalized forms, we get # != #. That's how we trigger the exception I added.

I will continue working on a fix.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737640#comment-15737640 ] Nattavut Sutyanyong commented on SPARK-18814:

To get the extra `#111` of `cs_item_sk#39#111`, we need to reference the same table on both the parent side and the subquery side (as with catalog_sales in Q32), so that the query runs through the deduplicate logic, which adds `#111` to distinguish the same column in the two contexts.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737148#comment-15737148 ] Eric Liang commented on SPARK-18814:

It seems that the references of an Alias expression should include the referenced attribute, so I would expect #39 to still show up. I could be misunderstanding the behavior of Alias, though.
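Eric's expectation about `references` does hold; a minimal sketch (my own illustration, not code from Spark's test suite) of where the mismatch actually creeps in:

{code}
import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference}
import org.apache.spark.sql.types.IntegerType

val csItemSk = AttributeReference("cs_item_sk", IntegerType)()   // stand-in for cs_item_sk#39
val aliased = Alias(csItemSk, csItemSk.toString)()               // cs_item_sk#39 AS cs_item_sk#39#111

// As expected, the Alias's references still include the original attribute ...
assert(aliased.references.contains(csItemSk))

// ... but the attribute the deduplicating Project exposes upward is
// aliased.toAttribute, which carries the Alias's own, fresh ExprId, so it
// no longer matches cs_item_sk#39 when compared by ExprId.
assert(aliased.toAttribute.exprId != csItemSk.exprId)
{code}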
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737142#comment-15737142 ] Nattavut Sutyanyong commented on SPARK-18814:

It looks like the `Project` between `Aggregate` and `Filter scalar-subquery` maps `cs_item_sk#39` to `cs_item_sk#39#111`. The logic in the code is not robust enough to recognize that the two symbols are equivalent. I tried to simplify the problem to

{code}
Seq[(java.lang.Integer, scala.math.BigDecimal)]((1,BigDecimal(1.0))).toDF("k","v").createOrReplaceTempView("P")
Seq[(java.lang.Integer, scala.math.BigDecimal)]((1,BigDecimal(1.0))).toDF("k1","v1").createOrReplaceTempView("C")
sql("select * from p where v = (select 1.1 * avg(v1) from c where c.k1=p.k)").explain(true)
{code}

This should have all the elements required to reproduce the problem, but somehow I could not get the required `Project` operator, so there is no mapping of the column p.k as there is in TPCDS Q32. I will keep trying.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737079#comment-15737079 ] Nattavut Sutyanyong commented on SPARK-18814:

I am looking at this.
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737074#comment-15737074 ] Reynold Xin commented on SPARK-18814:

cc [~hvanhovell] and [~nsyca]
[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737068#comment-15737068 ] Eric Liang commented on SPARK-18814:

[~rxin]