peter-toth edited a comment on issue #23802: [SPARK-26893][SQL] Allow partition 
pruning with subquery filters on file source
URL: https://github.com/apache/spark/pull/23802#issuecomment-465186995
 
 
   It seems that none of the TPCDS queries would benefit from this change.
   (Unfortunately, I didn't find any that filter on partitioning columns with a 
subquery. However, quite a lot of them use partitioning columns in joins, and 
I think this PR can help to address 
https://issues.apache.org/jira/browse/SPARK-26769, after which even TPCDS 
could benefit from it.)
   
   So, to run a benchmark, I came up with this query, which counts the 
distinct items sold on the web channel since 2000:
   ```sql
   SELECT COUNT(DISTINCT ws_item_sk) FROM web_sales WHERE ws_sold_date_sk >= (
     SELECT MIN(d_date_sk) FROM date_dim WHERE d_year >= 2000
   )
   ```
   Results are in ms:
   
   |run|master|this PR|
   |-|-|-|
   |1|8724|5846|
   |2|3852|3172|
   |3|3385|2432|
   |4|3320|2251|
   |5|3013|2178|
   |6|2959|2227|
   |7|3038|2353|
   |8|3157|2224|
   |9|2929|1967|
   |10|2943|2048|
   |11|2973|1967|
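
The table can be summarized with a little arithmetic. The sketch below copies the timings from the table and computes the per-run and median reduction in elapsed time; the variable names are illustrative only, and the median is used so that the much slower first run does not dominate the summary.

```python
from statistics import median

# Timings in ms, copied from the table above (runs 1 to 11).
master = [8724, 3852, 3385, 3320, 3013, 2959, 3038, 3157, 2929, 2943, 2973]
this_pr = [5846, 3172, 2432, 2251, 2178, 2227, 2353, 2224, 1967, 2048, 1967]

# Per-run relative reduction in elapsed time (0.25 means 25% faster).
reductions = [(m - p) / m for m, p in zip(master, this_pr)]

# Median over all runs; less sensitive to the slow first run than the mean.
median_master = median(master)
median_pr = median(this_pr)
median_reduction = (median_master - median_pr) / median_master

for run, r in enumerate(reductions, start=1):
    print(f"run {run}: {r:.1%} less time than master")
print(f"median master: {median_master} ms, median this PR: {median_pr} ms")
print(f"median reduction: {median_reduction:.1%}")
```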
   
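
As a rough illustration of why pruning can help for a query of this shape, here is a toy model in plain Python. It is not Spark's implementation: the `date_dim` rows, the partition layout, and both helper functions are hypothetical. It only shows the idea that once the scalar subquery has been evaluated, partitions of `web_sales` whose `ws_sold_date_sk` falls below the result never need to be read.

```python
# Hypothetical (d_date_sk, d_year) rows standing in for date_dim.
date_dim = [(1, 1998), (2, 1999), (3, 2000), (4, 2001)]

# Hypothetical web_sales layout: partition key ws_sold_date_sk -> ws_item_sk values.
web_sales_partitions = {
    1: [10, 11],
    2: [11, 12],
    3: [12, 13],
    4: [13, 14, 10],
}


def min_date_sk(rows, year):
    """Evaluate the scalar subquery: MIN(d_date_sk) WHERE d_year >= year."""
    return min(sk for sk, y in rows if y >= year)


def count_distinct_items(partitions, threshold):
    """Scan only partitions whose key passes the filter.

    Returns the distinct item count and the list of partition keys scanned.
    """
    scanned = [key for key in sorted(partitions) if key >= threshold]
    items = {item for key in scanned for item in partitions[key]}
    return len(items), scanned


threshold = min_date_sk(date_dim, 2000)
distinct_items, scanned = count_distinct_items(web_sales_partitions, threshold)
print(f"threshold={threshold}, scanned partitions={scanned}, "
      f"distinct items={distinct_items}")
```

Without pruning, every partition would be read and the filter applied row by row; the result is the same, but the scan touches more data.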

