sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683451979


   > The Parquet reader is a performance-critical component in Spark SQL. We 
had better make sure this change causes no performance regression. Should we run 
a benchmark to check?
   
   @viirya I ran `FilterPushdownBenchmark` before and after this change and I 
don't see much difference. Here are the first few results:
   
   Before:
   ```
   [info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
   [info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
   [info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                             5632           5752         120          2.8         358.1       1.0X
   [info] Parquet Vectorized (Pushdown)                                   491            506          18         32.0          31.2      11.5X
   [info] Native ORC Vectorized                                          4300           4335          25          3.7         273.4       1.3X
   [info] Native ORC Vectorized (Pushdown)                                525            530           6         30.0          33.4      10.7X
   
   [info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
   [info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
   [info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 5594           5757         101          2.8         355.7       1.0X
   [info] Parquet Vectorized (Pushdown)                       472            491          14         33.3          30.0      11.9X
   [info] Native ORC Vectorized                              4320           4387          42          3.6         274.7       1.3X
   [info] Native ORC Vectorized (Pushdown)                    512            524           8         30.7          32.6      10.9X
   ```
   After:
   ```
   [info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
   [info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
   [info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -----------------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                             5539           5635          87          2.8         352.1       1.0X
   [info] Parquet Vectorized (Pushdown)                                   456            461           6         34.5          29.0      12.1X
   [info] Native ORC Vectorized                                          4243           4282          35          3.7         269.8       1.3X
   [info] Native ORC Vectorized (Pushdown)                                511            523          13         30.8          32.5      10.8X
   
   [info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
   [info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
   [info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Parquet Vectorized                                 5509           5576          55          2.9         350.2       1.0X
   [info] Parquet Vectorized (Pushdown)                       462            475           8         34.1          29.4      11.9X
   [info] Native ORC Vectorized                              4230           4294          44          3.7         268.9       1.3X
   [info] Native ORC Vectorized (Pushdown)                    501            510          11         31.4          31.8      11.0X
   ```
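
To put a number on "not much difference", here is a minimal sketch that computes the percent change in Best Time between the two runs for the "Select 0 string row" case. The figures are copied from the tables above; the helper name `pct_change` and the dictionary layout are illustrative only and are not part of `FilterPushdownBenchmark`.

```python
# Best Time (ms) for "Select 0 string row", copied from the Before/After output.
before = {
    "Parquet Vectorized": 5632,
    "Parquet Vectorized (Pushdown)": 491,
    "Native ORC Vectorized": 4300,
    "Native ORC Vectorized (Pushdown)": 525,
}
after = {
    "Parquet Vectorized": 5539,
    "Parquet Vectorized (Pushdown)": 456,
    "Native ORC Vectorized": 4243,
    "Native ORC Vectorized (Pushdown)": 511,
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


# Hypothetical summary structure: case name -> percent change in best time.
changes = {name: pct_change(before[name], after[name]) for name in before}

for name, change in changes.items():
    print(f"{name:<34} {change:+.1f}%")
```

On these figures every case comes out somewhat faster in the After run, by roughly 1% to 7% on best time, and that includes the ORC rows. Differences of this size are plausibly run-to-run noise rather than an effect of the change, though a single run on a laptop cannot settle that.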
   
   
   

