cloud-fan commented on a change in pull request #30965:
URL: https://github.com/apache/spark/pull/30965#discussion_r604631512



##########
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt
##########
@@ -6,71 +6,71 @@ TakeOrderedAndProject 
[ext_price,brand,brand_id,i_manufact_id,i_manufact]
           WholeStageCodegen (12)
             HashAggregate 
[i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum]
               Project 
[ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip]
-                  InputAdapter
-                    WholeStageCodegen (5)
-                      Sort [ss_customer_sk]
-                        InputAdapter
-                          Exchange [ss_customer_sk] #2
-                            WholeStageCodegen (4)
-                              Project 
[ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]
-                                BroadcastHashJoin [ss_store_sk,s_store_sk]
-                                  Project 
[ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                    BroadcastHashJoin 
[ss_sold_date_sk,d_date_sk]
-                                      Project 
[ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                        BroadcastHashJoin 
[ss_item_sk,i_item_sk]
-                                          Filter 
[ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk]
-                                            ColumnarToRow
-                                              InputAdapter
-                                                Scan parquet 
default.store_sales 
[ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price]
+                BroadcastHashJoin [ss_item_sk,i_item_sk]

Review comment:
       I think q19 exposes a problem. Previously this `BroadcastHashJoin` is 
run before the `SortMergeJoin`, which reduces the input data of shuffle, 
because this  `BroadcastHashJoin` has a filter on the right side and can likely 
prune many data.
   
   @tanelk , if the idea from @wzhfy doesn't look good to you, can you try with 
some other ideas and see if we can fix this issue?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to