dtenedor commented on code in PR #39370:
URL: https://github.com/apache/spark/pull/39370#discussion_r1060966061
##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1552,7 +1552,6 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
test("INSERT rows, ALTER TABLE ADD COLUMNS with DEFAULTs, then SELECT them") {
case class Config(
sqlConf: Option[(String, String)],
- insertNullsToStorage: Boolean = true,
Review Comment:
> Thank you, @dtenedor . I'll verify the perf today from my side too.
I re-ran the ORC benchmark. Here are the results with this PR:
```
================================================================================================
SQL Single Numeric Column Scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1552           1609          80         10.1          98.7       1.0X
Native ORC MR                                      1383           1384           2         11.4          87.9       1.1X
Native ORC Vectorized                               196            245          56         80.3          12.4       7.9X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1651           1666          22          9.5         105.0       1.0X
Native ORC MR                                      1267           1269           4         12.4          80.5       1.3X
Native ORC Vectorized                               163            202          49         96.4          10.4      10.1X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1776           1812          51          8.9         112.9       1.0X
Native ORC MR                                      1371           1399          40         11.5          87.2       1.3X
Native ORC Vectorized                               211            274          72         74.7          13.4       8.4X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1767           1915         209          8.9         112.4       1.0X
Native ORC MR                                      1334           1395          86         11.8          84.8       1.3X
Native ORC Vectorized                               224            302          74         70.2          14.2       7.9X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1795           1876         114          8.8         114.1       1.0X
Native ORC MR                                      1513           1523          14         10.4          96.2       1.2X
Native ORC Vectorized                               279            323          66         56.4          17.7       6.4X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1713           1762          70          9.2         108.9       1.0X
Native ORC MR                                      1372           1398          38         11.5          87.2       1.2X
Native ORC Vectorized                               309            334          18         50.9          19.7       5.5X
================================================================================================
Int and String Scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  3571           3711         197          2.9         340.6       1.0X
Native ORC MR                                      2750           2824         104          3.8         262.3       1.3X
Native ORC Vectorized                              1749           1827         111          6.0         166.8       2.0X
================================================================================================
Partitioned Table Scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Partitioned Table:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Data column - Hive built-in ORC                    1976           2115         197          8.0         125.6       1.0X
Data column - Native ORC MR                        1670           1769         140          9.4         106.2       1.2X
Data column - Native ORC Vectorized                 223            271          47         70.4          14.2       8.8X
Partition column - Hive built-in ORC               1352           1395          61         11.6          86.0       1.5X
Partition column - Native ORC MR                   1054           1113          84         14.9          67.0       1.9X
Partition column - Native ORC Vectorized             67            120          58        235.3           4.3      29.6X
Both columns - Hive built-in ORC                   2130           2139          13          7.4         135.4       0.9X
Both columns - Native ORC MR                       1642           1646           6          9.6         104.4       1.2X
Both columns - Native ORC Vectorized                235            253          15         66.9          14.9       8.4X
================================================================================================
Repeated String Scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1617           1685          96          6.5         154.2       1.0X
Native ORC MR                                      1292           1292           0          8.1         123.2       1.3X
Native ORC Vectorized                               243            255           8         43.1          23.2       6.6X
================================================================================================
String with Nulls Scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  2735           2816         114          3.8         260.8       1.0X
Native ORC MR                                      2157           2248         129          4.9         205.7       1.3X
Native ORC Vectorized                               677            686          15         15.5          64.5       4.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  2610           2620          14          4.0         248.9       1.0X
Native ORC MR                                      2134           2181          67          4.9         203.5       1.2X
Native ORC Vectorized                               962            983          32         10.9          91.8       2.7X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1606           1632          37          6.5         153.1       1.0X
Native ORC MR                                      1219           1239          28          8.6         116.3       1.3X
Native ORC Vectorized                               284            292           7         37.0          27.0       5.7X
================================================================================================
Single Column Scan From Wide Columns
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  1384           1466         117          0.8        1319.7       1.0X
Native ORC MR                                       204            303          94          5.1         194.7       6.8X
Native ORC Vectorized                                99            116          17         10.6          94.7      13.9X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Column Scan from 200 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  2313           2437         176          0.5        2205.5       1.0X
Native ORC MR                                       252            314          90          4.2         239.9       9.2X
Native ORC Vectorized                               177            280          97          5.9         169.0      13.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Column Scan from 300 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  3332           3456         175          0.3        3178.1       1.0X
Native ORC MR                                       336            348          12          3.1         320.2       9.9X
Native ORC Vectorized                               232            288          80          4.5         221.6      14.3X
================================================================================================
Struct scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Struct Column Scan with 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                   550            611          87          1.9         524.5       1.0X
Native ORC MR                                       456            488          38          2.3         434.7       1.2X
Native ORC Vectorized                               209            215           9          5.0         199.1       2.6X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Struct Column Scan with 100 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                   3925           4114         268          0.3        3743.0       1.0X
Native ORC MR                                       3635           3638           5          0.3        3466.4       1.1X
Native ORC Vectorized                               1840           1860          27          0.6        1755.2       2.1X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Struct Column Scan with 300 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  12554          12628         105          0.1       11972.9       1.0X
Native ORC MR                                      14810          14823          19          0.1       14123.6       0.8X
Native ORC Vectorized                              14279          14405         178          0.1       13617.3       0.9X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Single Struct Column Scan with 600 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                  26014          26116         145          0.0       24808.7       1.0X
Native ORC MR                                      39066          39790        1023          0.0       37256.4       0.7X
Native ORC Vectorized                              39048          39152         148          0.0       37238.7       0.7X
================================================================================================
Nested Struct scan
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Nested Struct Scan with 10 Elements, 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                        4719           4800         115          0.2        4500.1       1.0X
Native ORC MR                                            5100           5214         161          0.2        4864.0       0.9X
Native ORC Vectorized                                    1181           1185           5          0.9        1126.6       4.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Nested Struct Scan with 30 Elements, 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                       12792          12976         260          0.1       12199.6       1.0X
Native ORC MR                                           12832          12952         170          0.1       12237.6       1.0X
Native ORC Vectorized                                    3209           3214           7          0.3        3060.1       4.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
Apple M1 Max
Nested Struct Scan with 10 Elements, 30 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------
Hive built-in ORC                                       10448          10517          98          0.1        9963.6       1.0X
Native ORC MR                                           13308          13308           0          0.1       12691.7       0.8X
Native ORC Vectorized                                    3292           3337          64          0.3        3139.1       3.2X
```
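For anyone comparing these numbers against another run: Rate(M/s), Per Row(ns), and Relative can all be derived from Best Time(ms) and the number of rows scanned, so they are a quick sanity check. The sketch below recomputes them for the TINYINT table. The row count of 15 × 1024 × 1024 is an assumption inferred from the TINYINT row (1552 ms at 98.7 ns per row is about 15.7 million rows), not something stated above, and `BenchmarkColumns` is a hypothetical helper, not part of Spark's benchmark framework.

```scala
// Hypothetical helper (not Spark API): recomputes the derived columns of one
// benchmark row from its Best Time(ms), to sanity-check the printed values.
object BenchmarkColumns {
  // Assumption: the single-column scans read 15 * 1024 * 1024 rows. This is
  // inferred from the TINYINT numbers, not taken from the benchmark source.
  val NumRows: Long = 15L * 1024 * 1024

  /** Rate(M/s): millions of rows processed per second at the best time. */
  def rateMPerSec(bestMs: Double, rows: Long = NumRows): Double =
    rows / (bestMs / 1000.0) / 1e6

  /** Per Row(ns): nanoseconds spent per row at the best time. */
  def perRowNs(bestMs: Double, rows: Long = NumRows): Double =
    bestMs * 1e6 / rows

  /** Relative: speedup versus the baseline (first) case, as a time ratio. */
  def relative(bestMs: Double, baselineBestMs: Double): Double =
    baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    // Best times (ms) copied from the "SQL Single TINYINT Column Scan" table.
    val hive = 1552.0
    val cases = Seq(
      "Hive built-in ORC" -> hive,
      "Native ORC MR" -> 1383.0,
      "Native ORC Vectorized" -> 196.0)
    for ((name, best) <- cases) {
      println(f"$name%-24s ${rateMPerSec(best)}%6.1f M/s " +
        f"${perRowNs(best)}%6.1f ns/row ${relative(best, hive)}%4.1fX")
    }
  }
}
```

Because Best Time is printed rounded to whole milliseconds, recomputed values can be off by about 0.1: 196 ms gives roughly 80.2 M/s and 12.5 ns per row for Native ORC Vectorized, versus the printed 80.3 and 12.4. The wide-column sections appear to scan fewer rows (1384 ms at 1319.7 ns per row is about one million), which is why `rows` is a parameter rather than a fixed constant.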
--
This is an automated message from the Apache Git Service.