sunchao commented on pull request #33695:
URL: https://github.com/apache/spark/pull/33695#issuecomment-898598261

> Do we have benchmark result?

Sorry for the slightly late response. Yes, the benchmark results are as follows:

```
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Nested Column Scan:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL ORC MR                                              11927          12314         215          0.1       11374.3       1.0X
SQL ORC Vectorized (Disabled Nested Column)             11834          12561         431          0.1       11285.5       1.0X
SQL ORC Vectorized (Enabled Nested Column)               7431           7556         102          0.1        7086.6       1.6X
SQL Parquet MR                                           7561           7692         103          0.1        7210.9       1.6X
SQL Parquet Vectorized (Disabled Nested Column)          7839           8165         299          0.1        7475.9       1.5X
SQL Parquet Vectorized (Enabled Nested Column)           5325           5400          84          0.2        5078.0       2.2X

================================================================================================
SQL Single Numeric Column Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single TINYINT Column Scan in Struct:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1490           1503          18         10.6          94.7       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          1881           1893          17          8.4         119.6       0.8X
SQL Parquet Vectorized (Enabled Nested Column)            107            128          42        146.6           6.8      13.9X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single SMALLINT Column Scan in Struct:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1659           1662           4          9.5         105.5       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          2115           2116           1          7.4         134.5       0.8X
SQL Parquet Vectorized (Enabled Nested Column)            145            191          34        108.5           9.2      11.4X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single INT Column Scan in Struct:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1670           1685          21          9.4         106.2       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          2082           2106          34          7.6         132.4       0.8X
SQL Parquet Vectorized (Enabled Nested Column)            100            110           8        156.5           6.4      16.6X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single BIGINT Column Scan in Struct:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1671           1686          22          9.4         106.2       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          2168           2174           9          7.3         137.8       0.8X
SQL Parquet Vectorized (Enabled Nested Column)            144            161          17        109.3           9.2      11.6X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single FLOAT Column Scan in Struct:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1579           1588          13         10.0         100.4       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          2070           2070           0          7.6         131.6       0.8X
SQL Parquet Vectorized (Enabled Nested Column)             94            106          11        167.0           6.0      16.8X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16
Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
SQL Single DOUBLE Column Scan in Struct:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
SQL Parquet MR                                           1798           1808          15          8.8         114.3       1.0X
SQL Parquet Vectorized (Disabled Nested Column)          2238           2251          18          7.0         142.3       0.8X
SQL Parquet Vectorized (Enabled Nested Column)            131            149          18        119.7           8.4      13.7X
```

So for reading array of struct/map columns, it is about a 1.5x speedup, and for reading fields within structs, it is a 14x speedup on average.

I'll also run the benchmark using the GitHub workflow and add the results as part of the PR later.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
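
The two summary figures in the comment ("about 1.5x" and "14x on average") can be re-derived from the tables. The sketch below is only an arithmetic check, not part of the PR: the variable names are hypothetical, and the numbers are copied by hand from the benchmark output. It assumes the 14x figure is the mean of the `Relative` column for the "Enabled Nested Column" rows (which is measured against the `SQL Parquet MR` baseline row), and that the 1.5x figure compares the vectorized reader with nested column support enabled against the same reader with it disabled, using `Best Time(ms)`.

```python
from statistics import mean

# "Relative" values of the "Enabled Nested Column" rows in the
# struct-field scans, copied from the benchmark output (baseline: Parquet MR).
struct_field_relative = {
    "TINYINT": 13.9,
    "SMALLINT": 11.4,
    "INT": 16.6,
    "BIGINT": 11.6,
    "FLOAT": 16.8,
    "DOUBLE": 13.7,
}

# Best Time(ms) of the nested column scan as (disabled, enabled) pairs
# for the vectorized reader, copied from the first table.
nested_scan_best_ms = {
    "ORC": (11834, 7431),
    "Parquet": (7839, 5325),
}

# Mean speedup over Parquet MR when reading a single field inside a struct.
avg_struct_speedup = mean(struct_field_relative.values())

# Speedup of enabled vs. disabled nested column support, per file format.
nested_speedup = {
    fmt: disabled / enabled
    for fmt, (disabled, enabled) in nested_scan_best_ms.items()
}

print(f"struct field scan, mean speedup vs. MR: {avg_struct_speedup:.1f}x")
for fmt, speedup in nested_speedup.items():
    print(f"{fmt} nested scan, enabled vs. disabled: {speedup:.2f}x")
```

Note that the two figures use different baselines under these assumptions: the struct-field number is relative to the MR reader, while the nested-scan number is relative to the vectorized reader with nested column support turned off.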
