sunchao commented on pull request #33695:
URL: https://github.com/apache/spark/pull/33695#issuecomment-898598261


   > Do we have benchmark result?
   
   Sorry for the slightly late response. Yes, the benchmark results are as follows:
   
   ```
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Nested Column Scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL ORC MR                                               11927          12314         215          0.1       11374.3       1.0X
   SQL ORC Vectorized (Disabled Nested Column)              11834          12561         431          0.1       11285.5       1.0X
   SQL ORC Vectorized (Enabled Nested Column)                7431           7556         102          0.1        7086.6       1.6X
   SQL Parquet MR                                            7561           7692         103          0.1        7210.9       1.6X
   SQL Parquet Vectorized (Disabled Nested Column)           7839           8165         299          0.1        7475.9       1.5X
   SQL Parquet Vectorized (Enabled Nested Column)            5325           5400          84          0.2        5078.0       2.2X
   
   
   
   ================================================================================================
   SQL Single Numeric Column Scan
   ================================================================================================
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single TINYINT Column Scan in Struct:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1490           1503          18         10.6          94.7       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           1881           1893          17          8.4         119.6       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)             107            128          42        146.6           6.8      13.9X
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single SMALLINT Column Scan in Struct:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1659           1662           4          9.5         105.5       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           2115           2116           1          7.4         134.5       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)             145            191          34        108.5           9.2      11.4X
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single INT Column Scan in Struct:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1670           1685          21          9.4         106.2       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           2082           2106          34          7.6         132.4       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)             100            110           8        156.5           6.4      16.6X
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single BIGINT Column Scan in Struct:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1671           1686          22          9.4         106.2       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           2168           2174           9          7.3         137.8       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)             144            161          17        109.3           9.2      11.6X
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single FLOAT Column Scan in Struct:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1579           1588          13         10.0         100.4       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           2070           2070           0          7.6         131.6       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)              94            106          11        167.0           6.0      16.8X
   
   OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
   Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   SQL Single DOUBLE Column Scan in Struct:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet MR                                            1798           1808          15          8.8         114.3       1.0X
   SQL Parquet Vectorized (Disabled Nested Column)           2238           2251          18          7.0         142.3       0.8X
   SQL Parquet Vectorized (Enabled Nested Column)             131            149          18        119.7           8.4      13.7X
   ```
   
   So for reading array of struct/map columns, the speedup is about 1.5x, and for reading fields within structs, it is about 14x on average.
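
The arithmetic behind these two summary figures can be sketched as follows. This is a minimal Python check, not part of the benchmark itself. The variable names are hypothetical, and it assumes that "1.5x" compares the vectorized reader with nested-column support enabled against the same reader with it disabled, which is one plausible reading of the summary. All numbers are copied from the benchmark output.

```python
from statistics import mean

# "Relative" values for "SQL Parquet Vectorized (Enabled Nested Column)",
# one per "Single <TYPE> Column Scan in Struct" table (baseline: SQL Parquet MR).
struct_field_speedup = {
    "TINYINT": 13.9,
    "SMALLINT": 11.4,
    "INT": 16.6,
    "BIGINT": 11.6,
    "FLOAT": 16.8,
    "DOUBLE": 13.7,
}

# Best Time(ms) from the "SQL Nested Column Scan" table.
nested_best_ms = {
    "ORC": {"disabled": 11834, "enabled": 7431},
    "Parquet": {"disabled": 7839, "enabled": 5325},
}

# Average speedup for reading a single field inside a struct.
avg_struct_speedup = mean(struct_field_speedup.values())

# Assumed comparison: vectorized reader with nested columns enabled vs. disabled.
nested_speedup = {
    fmt: times["disabled"] / times["enabled"]
    for fmt, times in nested_best_ms.items()
}

print(f"struct field scan, average speedup: {avg_struct_speedup:.1f}x")
for fmt, ratio in nested_speedup.items():
    print(f"{fmt} nested column scan, enabled vs. disabled: {ratio:.2f}x")
```

Comparing best times rather than average times is a choice made here for simplicity. The average-time column would give slightly different ratios.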
   
   I'll also run the benchmark using the GitHub workflow and add the results to the PR later.




