wangyum commented on pull request #31193:
URL: https://github.com/apache/spark/pull/31193#issuecomment-760735189


   Pushing down over Aggregate can hurt performance. For example:
   ```sql
   SELECT i_item_sk ss_item_sk
     FROM item,
       (SELECT DISTINCT
         iss.i_brand_id brand_id,
         iss.i_class_id class_id,
         iss.i_category_id category_id
       FROM store_sales, item iss, date_dim d1
       WHERE ss_item_sk = iss.i_item_sk
         AND ss_sold_date_sk = d1.d_date_sk
         AND d1.d_year BETWEEN 1999 AND 1999 + 2
       INTERSECT
       SELECT DISTINCT
         ics.i_brand_id,
         ics.i_class_id,
         ics.i_category_id
       FROM catalog_sales, item ics, date_dim d2
       WHERE cs_item_sk = ics.i_item_sk
         AND cs_sold_date_sk = d2.d_date_sk
         AND d2.d_year BETWEEN 1999 AND 1999 + 2
       INTERSECT
       SELECT DISTINCT
         iws.i_brand_id,
         iws.i_class_id,
         iws.i_category_id
       FROM web_sales, item iws, date_dim d3
       WHERE ws_item_sk = iws.i_item_sk
         AND ws_sold_date_sk = d3.d_date_sk
         AND d3.d_year BETWEEN 1999 AND 1999 + 2) x
     WHERE i_brand_id = brand_id
       AND i_class_id = class_id
       AND i_category_id = category_id;
   ```
   
   With push down (CBO enabled), the query takes 3.5 minutes, but it takes
   only 25 seconds with push down disabled (CBO disabled). So it is hard to
   say that pushing down is always beneficial.
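
   A minimal sketch of how the two runs could be compared. It assumes that
   "enable CBO" refers to the `spark.sql.cbo.enabled` setting and that table
   statistics were collected beforehand. The exact settings behind the
   timings above are not stated here, so treat this as an illustration only:
   ```sql
   -- Assumption: CBO needs table/column statistics to estimate sizes.
   -- Repeat for the other tables referenced by the query.
   ANALYZE TABLE item COMPUTE STATISTICS FOR ALL COLUMNS;
   ANALYZE TABLE store_sales COMPUTE STATISTICS FOR ALL COLUMNS;

   -- Run 1: CBO enabled (assumed to correspond to the "push down" case).
   SET spark.sql.cbo.enabled=true;
   -- ... run the query above and record the elapsed time ...

   -- Run 2: CBO disabled (assumed to correspond to the "no push down" case).
   SET spark.sql.cbo.enabled=false;
   -- ... run the same query again and compare ...
   ```

   Prefixing the query with `EXPLAIN COST` in each run should show the
   optimized plan, which makes it possible to check whether the join actually
   ended up below the Aggregate.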


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.