sadikovi commented on PR #37485:
URL: https://github.com/apache/spark/pull/37485#issuecomment-1212765192

   I reran the benchmarks on a 4x larger dataset (I changed the size in 
DataSourceReadBenchmark). The numbers are still very similar, with the patch 
performing slightly better than the current code. I don't quite understand how 
that is possible, unless the benchmark does not exercise the encoding.
   
   ### Before
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   ParquetReader Vectorized: DataPageV1                   672            707          45         93.6          10.7       1.0X
   ParquetReader Vectorized: DataPageV2                   945           1012          95         66.6          15.0       0.7X
   ParquetReader Vectorized -> Row: DataPageV1            383            432          28        164.4           6.1       1.8X
   ParquetReader Vectorized -> Row: DataPageV2            670            678           8         93.9          10.6       1.0X

   OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   ParquetReader Vectorized: DataPageV1                   931            935           4         67.6          14.8       1.0X
   ParquetReader Vectorized: DataPageV2                  1475           1477           4         42.7          23.4       0.6X
   ParquetReader Vectorized -> Row: DataPageV1            638            650          14         98.5          10.1       1.5X
   ParquetReader Vectorized -> Row: DataPageV2           1172           1173           2         53.7          18.6       0.8X
   ```
   
   ### After
   ```
   [info] OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   [info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   [info] Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] ParquetReader Vectorized: DataPageV1                   656            704          60         95.9          10.4       1.0X
   [info] ParquetReader Vectorized: DataPageV2                   888            898          12         70.9          14.1       0.7X
   [info] ParquetReader Vectorized -> Row: DataPageV1            393            435          24        160.2           6.2       1.7X
   [info] ParquetReader Vectorized -> Row: DataPageV2            667            681          12         94.3          10.6       1.0X

   [info] OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   [info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   [info] Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] ParquetReader Vectorized: DataPageV1                   935            953          16         67.3          14.9       1.0X
   [info] ParquetReader Vectorized: DataPageV2                  1437           1440           4         43.8          22.8       0.7X
   [info] ParquetReader Vectorized -> Row: DataPageV1            717            731          12         87.7          11.4       1.3X
   [info] ParquetReader Vectorized -> Row: DataPageV2           1176           1185          13         53.5          18.7       0.8X
   ```
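
To put "very similar" into numbers, here is a rough sketch (not part of the PR or of DataSourceReadBenchmark) that computes the relative change in Best Time(ms) between the two runs. The values are copied from the Before and After tables above; the `best_ms` dictionary and the `pct_change` helper are names made up for this illustration.

```python
# Best Time (ms) copied from the Before/After tables: (before, after).
best_ms = {
    ("INT", "Vectorized: DataPageV1"): (672, 656),
    ("INT", "Vectorized: DataPageV2"): (945, 888),
    ("INT", "Vectorized -> Row: DataPageV1"): (383, 393),
    ("INT", "Vectorized -> Row: DataPageV2"): (670, 667),
    ("BIGINT", "Vectorized: DataPageV1"): (931, 935),
    ("BIGINT", "Vectorized: DataPageV2"): (1475, 1437),
    ("BIGINT", "Vectorized -> Row: DataPageV1"): (638, 717),
    ("BIGINT", "Vectorized -> Row: DataPageV2"): (1172, 1176),
}


def pct_change(before, after):
    """Relative change in percent; negative means the patched run was faster."""
    return (after - before) / before * 100.0


# Hypothetical helper output: one signed percentage per benchmark case.
changes = {case: pct_change(b, a) for case, (b, a) in best_ms.items()}

for (column, name), delta in changes.items():
    print(f"{column:6s} {name:32s} {delta:+6.1f}%")
```

Negative values mean the patched run had a lower best time. Given the Stdev(ms) values reported in the tables, differences of a few percent may well be run-to-run noise rather than an effect of the patch.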

