wangyum commented on pull request #31193:
URL: https://github.com/apache/spark/pull/31193#issuecomment-760735189
Pushing down over Aggregate can hurt performance. For example:
```sql
SELECT i_item_sk ss_item_sk
FROM item,
  (SELECT DISTINCT
     iss.i_brand_id brand_id,
     iss.i_class_id class_id,
     iss.i_category_id category_id
   FROM store_sales, item iss, date_dim d1
   WHERE ss_item_sk = iss.i_item_sk
     AND ss_sold_date_sk = d1.d_date_sk
     AND d1.d_year BETWEEN 1999 AND 1999 + 2
   INTERSECT
   SELECT DISTINCT
     ics.i_brand_id,
     ics.i_class_id,
     ics.i_category_id
   FROM catalog_sales, item ics, date_dim d2
   WHERE cs_item_sk = ics.i_item_sk
     AND cs_sold_date_sk = d2.d_date_sk
     AND d2.d_year BETWEEN 1999 AND 1999 + 2
   INTERSECT
   SELECT DISTINCT
     iws.i_brand_id,
     iws.i_class_id,
     iws.i_category_id
   FROM web_sales, item iws, date_dim d3
   WHERE ws_item_sk = iws.i_item_sk
     AND ws_sold_date_sk = d3.d_date_sk
     AND d3.d_year BETWEEN 1999 AND 1999 + 2) x
WHERE i_brand_id = brand_id
  AND i_class_id = class_id
  AND i_category_id = category_id;
```
With push down (CBO enabled) the query takes 3.5 minutes, but with push down
disabled (CBO disabled) it takes only 25 seconds. So it is hard to say that
pushing down is always beneficial.
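
For anyone who wants to reproduce the comparison, here is a rough sketch of how the two runs could be set up in a Spark SQL session. It assumes the toggle is the standard `spark.sql.cbo.enabled` setting and that table statistics have already been collected; the comment does not say exactly which settings were used, so treat this as one possible setup rather than the actual one.
```sql
-- Assumption: CBO is toggled through spark.sql.cbo.enabled.
-- Assumption: statistics exist, e.g. collected beforehand with
--   ANALYZE TABLE store_sales COMPUTE STATISTICS FOR ALL COLUMNS;

-- Run 1: CBO enabled (the configuration where the push down applies).
SET spark.sql.cbo.enabled=true;
-- Run the query above here and record the elapsed time.

-- Run 2: CBO disabled (no push down).
SET spark.sql.cbo.enabled=false;
-- Run the same query again and compare.
```
Prefixing the query with `EXPLAIN COST` in each run should show whether the optimized plan actually differs between the two settings.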