c21 commented on a change in pull request #35090:
URL: https://github.com/apache/spark/pull/35090#discussion_r778575137
##########
File path: sql/hive/benchmarks/OrcReadBenchmark-results.txt
##########
@@ -3,220 +3,220 @@ SQL Single Numeric Column Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 832 1153
453 18.9 52.9 1.0X
-Native ORC Vectorized 148 189
24 106.5 9.4 5.6X
-Hive built-in ORC 986 1028
59 15.9 62.7 0.8X
+Native ORC MR 1102 1123
30 14.3 70.1 1.0X
+Native ORC Vectorized 177 254
47 89.0 11.2 6.2X
+Hive built-in ORC 1356 1396
57 11.6 86.2 0.8X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 868 913
60 18.1 55.2 1.0X
-Native ORC Vectorized 133 150
21 118.6 8.4 6.5X
-Hive built-in ORC 1098 1102
6 14.3 69.8 0.8X
+Native ORC MR 1030 1054
33 15.3 65.5 1.0X
+Native ORC Vectorized 218 245
19 72.3 13.8 4.7X
+Hive built-in ORC 1511 1543
45 10.4 96.0 0.7X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single INT Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 898 917
24 17.5 57.1 1.0X
-Native ORC Vectorized 155 175
16 101.4 9.9 5.8X
-Hive built-in ORC 1114 1126
17 14.1 70.8 0.8X
+Native ORC MR 1151 1162
16 13.7 73.2 1.0X
+Native ORC Vectorized 224 256
20 70.1 14.3 5.1X
+Hive built-in ORC 1589 1661
103 9.9 101.0 0.7X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 897 981
117 17.5 57.0 1.0X
-Native ORC Vectorized 182 224
40 86.2 11.6 4.9X
-Hive built-in ORC 1194 1368
247 13.2 75.9 0.8X
+Native ORC MR 1097 1194
136 14.3 69.8 1.0X
+Native ORC Vectorized 248 274
23 63.4 15.8 4.4X
+Hive built-in ORC 1601 1615
19 9.8 101.8 0.7X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 968 987
23 16.2 61.6 1.0X
-Native ORC Vectorized 219 251
41 71.8 13.9 4.4X
-Hive built-in ORC 1229 1477
351 12.8 78.1 0.8X
+Native ORC MR 1132 1132
1 13.9 71.9 1.0X
+Native ORC Vectorized 263 287
21 59.8 16.7 4.3X
+Hive built-in ORC 1499 1509
15 10.5 95.3 0.8X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 1006 1010
5 15.6 64.0 1.0X
-Native ORC Vectorized 245 265
20 64.2 15.6 4.1X
-Hive built-in ORC 1220 1228
12 12.9 77.6 0.8X
+Native ORC MR 1220 1236
22 12.9 77.6 1.0X
+Native ORC Vectorized 280 316
40 56.2 17.8 4.4X
+Hive built-in ORC 1581 1679
138 9.9 100.5 0.8X
================================================================================================
Int and String Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Int and String Scan: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 1906 1923
25 5.5 181.8 1.0X
-Native ORC Vectorized 1057 1067
14 9.9 100.8 1.8X
-Hive built-in ORC 2183 2248
92 4.8 208.2 0.9X
+Native ORC MR 2327 2375
68 4.5 221.9 1.0X
+Native ORC Vectorized 1428 1438
14 7.3 136.2 1.6X
+Hive built-in ORC 2811 2865
76 3.7 268.1 0.8X
================================================================================================
Partitioned Table Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Partitioned Table: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Data column - Native ORC MR 1039 1107
95 15.1 66.1 1.0X
-Data column - Native ORC Vectorized 181 205
27 86.7 11.5 5.7X
-Data column - Hive built-in ORC 1344 1353
13 11.7 85.4 0.8X
-Partition column - Native ORC MR 686 699
12 22.9 43.6 1.5X
-Partition column - Native ORC Vectorized 54 64
6 291.4 3.4 19.3X
-Partition column - Hive built-in ORC 945 956
13 16.6 60.1 1.1X
-Both columns - Native ORC MR 1107 1115
11 14.2 70.4 0.9X
-Both columns - Native ORC Vectorized 199 258
52 79.2 12.6 5.2X
-Both columns - Hive built-in ORC 1383 1386
5 11.4 87.9 0.8X
+Data column - Native ORC MR 1288 1317
40 12.2 81.9 1.0X
+Data column - Native ORC Vectorized 265 302
28 59.3 16.9 4.9X
+Data column - Hive built-in ORC 1710 1753
60 9.2 108.7 0.8X
+Partition column - Native ORC MR 855 891
46 18.4 54.4 1.5X
+Partition column - Native ORC Vectorized 84 96
12 187.7 5.3 15.4X
+Partition column - Hive built-in ORC 1244 1254
15 12.6 79.1 1.0X
+Both columns - Native ORC MR 1460 1482
31 10.8 92.8 0.9X
+Both columns - Native ORC Vectorized 301 326
23 52.3 19.1 4.3X
+Both columns - Hive built-in ORC 1780 1830
70 8.8 113.2 0.7X
================================================================================================
Repeated String Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Repeated String: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 908 916
8 11.5 86.6 1.0X
-Native ORC Vectorized 180 218
42 58.4 17.1 5.1X
-Hive built-in ORC 1156 1165
13 9.1 110.3 0.8X
+Native ORC MR 1143 1161
26 9.2 109.0 1.0X
+Native ORC Vectorized 261 298
49 40.1 24.9 4.4X
+Hive built-in ORC 1520 1579
84 6.9 145.0 0.8X
================================================================================================
String with Nulls Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (0.0%): Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 1666 1719
75 6.3 158.9 1.0X
-Native ORC Vectorized 484 501
15 21.7 46.1 3.4X
-Hive built-in ORC 1985 1989
5 5.3 189.3 0.8X
+Native ORC MR 2155 2176
30 4.9 205.5 1.0X
+Native ORC Vectorized 640 684
43 16.4 61.0 3.4X
+Hive built-in ORC 2592 2654
88 4.0 247.2 0.8X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (50.0%): Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 1567 1635
96 6.7 149.5 1.0X
-Native ORC Vectorized 641 662
30 16.4 61.1 2.4X
-Hive built-in ORC 1885 1888
5 5.6 179.7 0.8X
+Native ORC MR 1706 1789
117 6.1 162.7 1.0X
+Native ORC Vectorized 721 814
137 14.5 68.8 2.4X
+Hive built-in ORC 2262 2283
29 4.6 215.7 0.8X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (95.0%): Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 845 851
6 12.4 80.6 1.0X
-Native ORC Vectorized 244 258
16 43.0 23.2 3.5X
-Hive built-in ORC 1107 1162
77 9.5 105.6 0.8X
+Native ORC MR 951 1001
45 11.0 90.7 1.0X
+Native ORC Vectorized 254 285
19 41.3 24.2 3.7X
+Hive built-in ORC 1352 1382
44 7.8 128.9 0.7X
================================================================================================
Single Column Scan From Wide Columns
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 100 columns: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 124 148
27 8.5 118.2 1.0X
-Native ORC Vectorized 71 82
11 14.8 67.4 1.8X
-Hive built-in ORC 782 804
35 1.3 745.6 0.2X
+Native ORC MR 173 205
23 6.0 165.3 1.0X
+Native ORC Vectorized 99 116
17 10.5 94.8 1.7X
+Hive built-in ORC 980 1042
87 1.1 934.4 0.2X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 200 columns: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 155 184
31 6.8 147.9 1.0X
-Native ORC Vectorized 101 130
24 10.4 96.2 1.5X
-Hive built-in ORC 1477 1494
25 0.7 1408.7 0.1X
+Native ORC MR 200 239
35 5.2 190.5 1.0X
+Native ORC Vectorized 145 174
23 7.3 137.8 1.4X
+Hive built-in ORC 1809 1945
193 0.6 1725.0 0.1X
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 300 columns: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 191 227
29 5.5 182.4 1.0X
-Native ORC Vectorized 135 153
18 7.7 129.2 1.4X
-Hive built-in ORC 2085 2085
0 0.5 1988.1 0.1X
+Native ORC MR 278 336
52 3.8 264.7 1.0X
+Native ORC Vectorized 209 232
21 5.0 199.0 1.3X
+Hive built-in ORC 2679 2828
211 0.4 2554.8 0.1X
================================================================================================
Struct scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1022-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Struct Column Scan with 10 Fields: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Native ORC MR 1126 1149
33 0.9 1073.7 1.0X
-Native ORC Vectorized 1136 1141
7 0.9 1083.4 1.0X
-Hive built-in ORC 589 595
8 1.8 561.4 1.9X
+Native ORC MR 363 427
78 2.9 346.2 1.0X
+Native ORC Vectorized 375 442
88 2.8 357.6 1.0X
Review comment:
Just FYI @bersprockets: for the vectorized scan of nested columns, I have
fixed the benchmark to enable vectorization. See
https://github.com/apache/spark/commit/4a2ba5b22e84fc79b44604c60320aa5ae679e13a
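
For readers following the struct-scan rows in the hunk above, here is a minimal sketch of how the Relative column appears to be derived. It assumes Relative is the baseline row's Best Time divided by each row's Best Time; that is inferred from the numbers in this diff, not taken from Spark's benchmark code. The helper name `relative_speedup` is hypothetical.

```python
def relative_speedup(baseline_best_ms: float, case_best_ms: float) -> float:
    """Ratio of the baseline's best time to this case's best time.

    Assumption: this mirrors the "Relative" column in the results file,
    as inferred from the figures quoted in the diff.
    """
    if case_best_ms <= 0:
        raise ValueError("best time must be positive")
    return baseline_best_ms / case_best_ms


# Best Time(ms) values copied from the "+" rows of the
# "Single Struct Column Scan with 10 Fields" table in the hunk.
struct_scan = {"Native ORC MR": 363, "Native ORC Vectorized": 375}

baseline = struct_scan["Native ORC MR"]
for name, best_ms in struct_scan.items():
    print(f"{name}: {relative_speedup(baseline, best_ms):.1f}X")
```

Under that assumption, the vectorized struct scan lands at roughly the same speed as the MR reader, whereas the flat TINYINT scan (1102 ms against 177 ms) comes out several times faster. That gap is consistent with vectorization not being active for the nested-column case in these results.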
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]