LuciferYang commented on code in PR #47310:
URL: https://github.com/apache/spark/pull/47310#discussion_r1677272149
##########
sql/core/benchmarks/DataSourceReadBenchmark-results.txt:
##########
@@ -1,431 +1,438 @@
-DataSourceReadBenchmark-jdk21-results.txt================================================================================================
+================================================================================================
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1023-azure
AMD EPYC 7763 64-Core Processor
SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV 10363 10364 2 1.5 658.9 1.0X
-SQL Json 8667 8699 46 1.8 551.0 1.2X
-SQL Parquet Vectorized: DataPageV1 103 114 8 153.3 6.5 101.0X
-SQL Parquet Vectorized: DataPageV2 101 111 6 155.4 6.4 102.4X
-SQL Parquet MR: DataPageV1 1809 1813 6 8.7 115.0 5.7X
-SQL Parquet MR: DataPageV2 1715 1720 8 9.2 109.0 6.0X
-SQL ORC Vectorized 139 146 5 113.1 8.8 74.5X
-SQL ORC MR 1508 1511 5 10.4 95.8 6.9X
-
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+SQL CSV 10854 10862 12 1.4 690.1 1.0X
+SQL Json 8728 8896 238 1.8 554.9 1.2X
+SQL Json with UnsafeRow 9797 9841 62 1.6 622.9 1.1X
+SQL Parquet Vectorized: DataPageV1 105 119 8 149.2 6.7 103.0X
+SQL Parquet Vectorized: DataPageV2 108 115 6 146.2 6.8 100.9X
+SQL Parquet MR: DataPageV1 1861 1872 16 8.5 118.3 5.8X
+SQL Parquet MR: DataPageV2 1770 1771 1 8.9 112.5 6.1X
+SQL ORC Vectorized 147 154 3 107.2 9.3 74.0X
+SQL ORC MR 1650 1650 0 9.5 104.9 6.6X
+
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1023-azure
AMD EPYC 7763 64-Core Processor
Parquet Reader Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 88 90 2 178.9 5.6 1.0X
-ParquetReader Vectorized: DataPageV2 95 96 1 166.2 6.0 0.9X
-ParquetReader Vectorized -> Row: DataPageV1 73 74 1 215.3 4.6 1.2X
-ParquetReader Vectorized -> Row: DataPageV2 81 83 1 193.1 5.2 1.1X
+ParquetReader Vectorized: DataPageV1 96 97 1 163.7 6.1 1.0X
+ParquetReader Vectorized: DataPageV2 102 104 4 154.4 6.5 0.9X
+ParquetReader Vectorized -> Row: DataPageV1 75 77 1 208.5 4.8 1.3X
+ParquetReader Vectorized -> Row: DataPageV2 82 83 2 192.8 5.2 1.2X
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1023-azure
AMD EPYC 7763 64-Core Processor
SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV 11538 11589 73 1.4 733.5 1.0X
-SQL Json 9586 9596 14 1.6 609.5 1.2X
-SQL Parquet Vectorized: DataPageV1 109 116 6 144.8 6.9 106.2X
-SQL Parquet Vectorized: DataPageV2 110 118 8 142.6 7.0 104.6X
-SQL Parquet MR: DataPageV1 1901 1953 74 8.3 120.9 6.1X
-SQL Parquet MR: DataPageV2 1817 1832 22 8.7 115.5 6.4X
-SQL ORC Vectorized 118 126 7 133.6 7.5 98.0X
-SQL ORC MR 1505 1535 43 10.5 95.7 7.7X
-
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+SQL CSV 10361 10395 48 1.5 658.7 1.0X
+SQL Json 9825 9848 32 1.6 624.7 1.1X
+SQL Json with UnsafeRow 10692 10700 11 1.5 679.8 1.0X
+SQL Parquet Vectorized: DataPageV1 108 115 6 145.6 6.9 95.9X
+SQL Parquet Vectorized: DataPageV2 106 115 6 147.9 6.8 97.4X
+SQL Parquet MR: DataPageV1 1924 1937 18 8.2 122.4 5.4X
+SQL Parquet MR: DataPageV2 1841 1858 25 8.5 117.0 5.6X
+SQL ORC Vectorized 113 117 4 138.8 7.2 91.4X
+SQL ORC MR 1554 1564 14 10.1 98.8 6.7X
+
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1023-azure
AMD EPYC 7763 64-Core Processor
Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 93 94 1 169.9 5.9 1.0X
-ParquetReader Vectorized: DataPageV2 93 94 1 169.1 5.9 1.0X
-ParquetReader Vectorized -> Row: DataPageV1 61 62 1 258.0 3.9 1.5X
-ParquetReader Vectorized -> Row: DataPageV2 61 62 1 258.4 3.9 1.5X
+ParquetReader Vectorized: DataPageV1 85 88 4 185.9 5.4 1.0X
+ParquetReader Vectorized: DataPageV2 84 86 2 186.5 5.4 1.0X
+ParquetReader Vectorized -> Row: DataPageV1 62 64 1 252.7 4.0 1.4X
+ParquetReader Vectorized -> Row: DataPageV2 62 63 1 253.9 3.9 1.4X
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1023-azure
AMD EPYC 7763 64-Core Processor
SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV 12200 12203 5 1.3 775.7 1.0X
-SQL Json 9813 9854 57 1.6 623.9 1.2X
-SQL Parquet Vectorized: DataPageV1 101 107 6 156.1 6.4 121.0X
-SQL Parquet Vectorized: DataPageV2 129 135 6 122.3 8.2 94.9X
-SQL Parquet MR: DataPageV1 1968 1989 29 8.0 125.1 6.2X
-SQL Parquet MR: DataPageV2 1913 1916 3 8.2 121.6 6.4X
-SQL ORC Vectorized 130 135 6 120.8 8.3 93.7X
-SQL ORC MR 1593 1600 10 9.9 101.3 7.7X
-
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+SQL CSV 10958 10970 18 1.4 696.7 1.0X
+SQL Json 10164 10169 7 1.5 646.2 1.1X
+SQL Json with UnsafeRow 11113 11137 33 1.4 706.5 1.0X
Review Comment:
~So the `SQL Json with UnsafeRow` case is slower than plain `SQL Json`? Is that a side effect of the memory savings?~
I have seen the PR description.
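
For reference, a rough sketch of the arithmetic behind the struck-through question above, using only the Best Time(ms) values from the added rows in this diff. The names `best_ms` and `slowdown_pct` are hypothetical helpers for illustration and are not part of the Spark benchmark code; a single benchmark run like this is also noisy, so the percentages are indicative at best.

```python
# Best Time(ms) values copied from the "+" rows of the diff above.
# Keys and structure are illustrative only.
best_ms = {
    "BOOLEAN": {"json": 8728, "json_unsafe_row": 9797},
    "TINYINT": {"json": 9825, "json_unsafe_row": 10692},
    "SMALLINT": {"json": 10164, "json_unsafe_row": 11113},
}


def slowdown_pct(baseline_ms, candidate_ms):
    """Percent by which the candidate is slower than the baseline (negative if faster)."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


for column_type, times in best_ms.items():
    pct = slowdown_pct(times["json"], times["json_unsafe_row"])
    print(f"{column_type}: Json with UnsafeRow is {pct:.1f}% slower than Json")
```

On these three scans the `SQL Json with UnsafeRow` case comes out roughly 9 to 12 percent slower than `SQL Json` by best time.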
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]