stczwd commented on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1056374269
> Could we add the evidence of Parquet skipping files/row-groups (either a micro benchmark or some logs during execution or code pointers), when we push down partition filter here?

@c21 I have added some benchmark tests in `FilterPushdownBenchmark` and run them in GitHub Actions. The test code can be found [here](https://github.com/stczwd/spark/blob/SPARK-38041-2/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L81).

Test results:

```
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Data filter with partitions: ((a = 10 and part = 0) or (a = 10240 and part = 1) or (part = 2)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Parquet Vectorized with partition                                                                         3039           3157         122          5.2         193.2       1.0X
Parquet Vectorized with partition (Pushdown)                                                              1548           1568          15         10.2          98.4       2.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Data filter with partitions: ((a > 10 and part = 0) or (a <= 10 and part >= 1 and part < 3)):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Parquet Vectorized with partition                                                                         2942           2997          40          5.3         187.1       1.0X
Parquet Vectorized with partition (Pushdown)                                                              1497           1513          15         10.5          95.2       2.0X
```
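For intuition on why pushdown roughly halves the time here: once the partition value is known for a given file, the combined data+partition filter can be simplified so that only the residual data predicate reaches the Parquet reader (which can then skip row groups via statistics), and for partitions matched unconditionally no data filter is evaluated at all. The sketch below is not Spark's implementation, just a minimal standalone model of that per-partition simplification using the first benchmark filter:

```scala
// Minimal sketch (assumed types, not Spark's Expression API) of simplifying
// a mixed data/partition predicate once the partition value is fixed.
sealed trait Expr
case class EqPart(v: Int) extends Expr  // part = v (partition column)
case class EqData(v: Int) extends Expr  // a = v (data column)
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case object True extends Expr
case object False extends Expr

// Substitute the known partition value and constant-fold.
def bind(e: Expr, part: Int): Expr = e match {
  case EqPart(v) => if (v == part) True else False
  case d: EqData => d
  case And(l, r) => (bind(l, part), bind(r, part)) match {
    case (False, _) | (_, False) => False
    case (True, x)               => x
    case (x, True)               => x
    case (x, y)                  => And(x, y)
  }
  case Or(l, r) => (bind(l, part), bind(r, part)) match {
    case (True, _) | (_, True) => True
    case (False, x)            => x
    case (x, False)            => x
    case (x, y)                => Or(x, y)
  }
  case t => t
}

// The first benchmark filter:
// (a = 10 and part = 0) or (a = 10240 and part = 1) or (part = 2)
val filter = Or(
  Or(And(EqData(10), EqPart(0)), And(EqData(10240), EqPart(1))),
  EqPart(2))

assert(bind(filter, 0) == EqData(10))     // only "a = 10" reaches the reader
assert(bind(filter, 1) == EqData(10240))  // only "a = 10240"
assert(bind(filter, 2) == True)           // whole partition kept, no data filter
```

Without the simplification, the full disjunction is evaluated row by row for every partition, which matches the roughly 2x gap the benchmark shows.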
