sunchao commented on a change in pull request #34611:
URL: https://github.com/apache/spark/pull/34611#discussion_r752461643
##########
File path: sql/core/benchmarks/DataSourceReadBenchmark-results.txt
##########
@@ -1,252 +1,275 @@
+================================================================================================
+SQL Single Boolean Column Scan
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+SQL Single BOOLEAN Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                           13472          13878         574          1.2         856.5       1.0X
+SQL Json                                          10036          10477         623          1.6         638.0       1.3X
+SQL Parquet Vectorized                              144            167          12        109.2           9.2      93.5X
+SQL Parquet MR                                     2224           2230           7          7.1         141.4       6.1X
+SQL ORC Vectorized                                  191            203           6         82.3          12.2      70.5X
+SQL ORC MR                                         1865           1870           7          8.4         118.6       7.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Parquet Reader Single BOOLEAN Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                             119            125           8        131.9           7.6       1.0X
+ParquetReader Vectorized -> Row                       60             63           2        260.2           3.8       2.0X
+
+
================================================================================================
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                           15943          15956          18          1.0        1013.6       1.0X
-SQL Json                                           9109           9158          70          1.7         579.1       1.8X
-SQL Parquet Vectorized                              168            191          16         93.8          10.7      95.1X
-SQL Parquet MR                                     1938           1950          17          8.1         123.2       8.2X
-SQL ORC Vectorized                                  191            199           6         82.2          12.2      83.3X
-SQL ORC MR                                         1523           1537          20         10.3          96.8      10.5X
-
-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+SQL CSV                                           16820          16859          54          0.9        1069.4       1.0X
+SQL Json                                          11583          11586           4          1.4         736.4       1.5X
+SQL Parquet Vectorized                              164            177          11         96.0          10.4     102.7X
+SQL Parquet MR                                     2839           2857          25          5.5         180.5       5.9X
+SQL ORC Vectorized                                  150            161           7        104.8           9.5     112.1X
+SQL ORC MR                                         1915           1923          12          8.2         121.7       8.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Parquet Reader Single TINYINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                             203            206           3         77.5          12.9       1.0X
-ParquetReader Vectorized -> Row                       97            100           2        161.6           6.2       2.1X
+ParquetReader Vectorized                             211            218           5         74.6          13.4       1.0X
+ParquetReader Vectorized -> Row                      286            293           7         55.1          18.2       0.7X
Review comment:
It looks like noise. In the new run:
```
Parquet Reader Single TINYINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             179            185           9         88.0          11.4       1.0X
ParquetReader Vectorized -> Row                       91            101           3        172.6           5.8       2.0X
```
But come to think of it, I wonder how this makes sense: shouldn't
`ParquetReader Vectorized -> Row` always be more expensive than
`ParquetReader Vectorized`? It basically does the same thing as the latter,
plus the extra columnar-to-row conversion.
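
For reference, the `Relative` values in both runs can be reproduced from the `Best Time(ms)` column. This is only a minimal Python sketch: it assumes `Relative` is the baseline row's best time divided by the case's best time (inferred from the numbers in the tables, not checked against the benchmark harness), and the helper name `relative_speedup` is hypothetical.

```python
def relative_speedup(baseline_best_ms: float, case_best_ms: float) -> float:
    """Assumed definition: baseline best time divided by the case's best time."""
    return baseline_best_ms / case_best_ms


# Best Time(ms) for (Vectorized, Vectorized -> Row), copied from the tables.
runs = {
    "results in this diff": (211, 286),
    "new run": (179, 91),
}

for label, (vectorized_ms, to_row_ms) in runs.items():
    rel = relative_speedup(vectorized_ms, to_row_ms)
    # One decimal place, matching how the tables print the Relative column.
    print(f"{label}: Vectorized -> Row = {rel:.1f}X")
```

Under that assumption the sketch gives 0.7X for the results in this diff and 2.0X for the new run, matching the tables. A value above 1.0X means the case took less time than the baseline, so the 2.0X figure is exactly the ordering questioned above.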
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]