LuciferYang commented on pull request #35669:
URL: https://github.com/apache/spark/pull/35669#issuecomment-1058062770


   > > Could we add evidence that Parquet skips files/row groups (a micro benchmark, some logs from execution, or code pointers) when we push down the partition filter here?
   > 
   > @c21 I have added some benchmark tests to FilterPushdownBenchmark and ran them in GitHub Actions. The test code can be found [here](https://github.com/stczwd/spark/blob/SPARK-38041-2/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L81).
   > 
   > Test result
   > 
   > ```
   > OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
   > Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
   > Data filter with partitions: ((a = 10 and part = 0) or (a = 10240 and part = 1) or (part = 2)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   > Parquet Vectorized with partition                                                                         3039           3157         122          5.2         193.2       1.0X
   > Parquet Vectorized with partition (Pushdown)                                                              1548           1568          15         10.2          98.4       2.0X
   > 
   > OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
   > Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
   > Data filter with partitions: ((a > 10 and part = 0) or (a <= 10 and part >=1 and part < 3)):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   > Parquet Vectorized with partition                                                                      2942           2997          40          5.3         187.1       1.0X
   > Parquet Vectorized with partition (Pushdown)                                                           1497           1513          15         10.5          95.2       2.0X
   > ```
   
   @stczwd Can you add the benchmark code to this PR and use GitHub Actions (GA) to produce the benchmark results?
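
As a quick sanity check on the quoted numbers, the sketch below recomputes the speedup from the Best Time(ms) column. It assumes the Relative column is the ratio of the baseline best time to each case's best time; the `speedup` helper and the dictionary keys are hypothetical names used only for illustration.

```python
# Best Time (ms) values copied from the quoted benchmark output above.
# The dictionary keys are shorthand labels, not names used by the benchmark.
best_times_ms = {
    "equality filters": {"baseline": 3039, "pushdown": 1548},
    "range filters": {"baseline": 2942, "pushdown": 1497},
}


def speedup(baseline_ms: float, pushdown_ms: float) -> float:
    """Return how many times faster the pushdown run is than the baseline.

    Assumption: this mirrors the Relative column (baseline / case).
    """
    return baseline_ms / pushdown_ms


for label, times in best_times_ms.items():
    ratio = speedup(times["baseline"], times["pushdown"])
    print(f"{label}: {ratio:.2f}x")
```

Both ratios come out just under 2, which is consistent with the 2.0X reported in the Relative column after rounding to one decimal place.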
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


