HyukjinKwon commented on a change in pull request #30965:
URL: https://github.com/apache/spark/pull/30965#discussion_r604675022



##########
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt
##########
@@ -6,71 +6,71 @@ TakeOrderedAndProject 
[ext_price,brand,brand_id,i_manufact_id,i_manufact]
           WholeStageCodegen (12)
             HashAggregate 
[i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum]
               Project 
[ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip]
-                  InputAdapter
-                    WholeStageCodegen (5)
-                      Sort [ss_customer_sk]
-                        InputAdapter
-                          Exchange [ss_customer_sk] #2
-                            WholeStageCodegen (4)
-                              Project 
[ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]
-                                BroadcastHashJoin [ss_store_sk,s_store_sk]
-                                  Project 
[ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                    BroadcastHashJoin 
[ss_sold_date_sk,d_date_sk]
-                                      Project 
[ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                        BroadcastHashJoin 
[ss_item_sk,i_item_sk]
-                                          Filter 
[ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk]
-                                            ColumnarToRow
-                                              InputAdapter
-                                                Scan parquet 
default.store_sales 
[ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price]
+                BroadcastHashJoin [ss_item_sk,i_item_sk]

Review comment:
       This actually caused a significant regression at q19 in TPC-DS benchmark 
(performed internally). Can we just revert it for now, and do it with actual 
performance numbers? I think that's safer and easier for everybody here. Seems 
like at least the plan change here was overlooked and the performance of this 
had to be clarified.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to