dtenedor commented on code in PR #39370:
URL: https://github.com/apache/spark/pull/39370#discussion_r1060966061


##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1552,7 +1552,6 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
   test("INSERT rows, ALTER TABLE ADD COLUMNS with DEFAULTs, then SELECT them") {
     case class Config(
         sqlConf: Option[(String, String)],
-        insertNullsToStorage: Boolean = true,

Review Comment:
   > Thank you, @dtenedor . I'll verify the perf today from my side too.
   
   I re-ran the ORC benchmark. Here are the results with this PR:
   
   ```
   ================================================================================================
   SQL Single Numeric Column Scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1552           1609          80         10.1          98.7       1.0X
   Native ORC MR                                      1383           1384           2         11.4          87.9       1.1X
   Native ORC Vectorized                               196            245          56         80.3          12.4       7.9X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1651           1666          22          9.5         105.0       1.0X
   Native ORC MR                                      1267           1269           4         12.4          80.5       1.3X
   Native ORC Vectorized                               163            202          49         96.4          10.4      10.1X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1776           1812          51          8.9         112.9       1.0X
   Native ORC MR                                      1371           1399          40         11.5          87.2       1.3X
   Native ORC Vectorized                               211            274          72         74.7          13.4       8.4X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1767           1915         209          8.9         112.4       1.0X
   Native ORC MR                                      1334           1395          86         11.8          84.8       1.3X
   Native ORC Vectorized                               224            302          74         70.2          14.2       7.9X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1795           1876         114          8.8         114.1       1.0X
   Native ORC MR                                      1513           1523          14         10.4          96.2       1.2X
   Native ORC Vectorized                               279            323          66         56.4          17.7       6.4X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1713           1762          70          9.2         108.9       1.0X
   Native ORC MR                                      1372           1398          38         11.5          87.2       1.2X
   Native ORC Vectorized                               309            334          18         50.9          19.7       5.5X
   
   ================================================================================================
   Int and String Scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  3571           3711         197          2.9         340.6       1.0X
   Native ORC MR                                      2750           2824         104          3.8         262.3       1.3X
   Native ORC Vectorized                              1749           1827         111          6.0         166.8       2.0X
   
   ================================================================================================
   Partitioned Table Scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Partitioned Table:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Data column - Hive built-in ORC                    1976           2115         197          8.0         125.6       1.0X
   Data column - Native ORC MR                        1670           1769         140          9.4         106.2       1.2X
   Data column - Native ORC Vectorized                 223            271          47         70.4          14.2       8.8X
   Partition column - Hive built-in ORC               1352           1395          61         11.6          86.0       1.5X
   Partition column - Native ORC MR                   1054           1113          84         14.9          67.0       1.9X
   Partition column - Native ORC Vectorized             67            120          58        235.3           4.3      29.6X
   Both columns - Hive built-in ORC                   2130           2139          13          7.4         135.4       0.9X
   Both columns - Native ORC MR                       1642           1646           6          9.6         104.4       1.2X
   Both columns - Native ORC Vectorized                235            253          15         66.9          14.9       8.4X
   
   ================================================================================================
   Repeated String Scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1617           1685          96          6.5         154.2       1.0X
   Native ORC MR                                      1292           1292           0          8.1         123.2       1.3X
   Native ORC Vectorized                               243            255           8         43.1          23.2       6.6X
   
   ================================================================================================
   String with Nulls Scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  2735           2816         114          3.8         260.8       1.0X
   Native ORC MR                                      2157           2248         129          4.9         205.7       1.3X
   Native ORC Vectorized                               677            686          15         15.5          64.5       4.0X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  2610           2620          14          4.0         248.9       1.0X
   Native ORC MR                                      2134           2181          67          4.9         203.5       1.2X
   Native ORC Vectorized                               962            983          32         10.9          91.8       2.7X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1606           1632          37          6.5         153.1       1.0X
   Native ORC MR                                      1219           1239          28          8.6         116.3       1.3X
   Native ORC Vectorized                               284            292           7         37.0          27.0       5.7X
   
   ================================================================================================
   Single Column Scan From Wide Columns
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  1384           1466         117          0.8        1319.7       1.0X
   Native ORC MR                                       204            303          94          5.1         194.7       6.8X
   Native ORC Vectorized                                99            116          17         10.6          94.7      13.9X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Column Scan from 200 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  2313           2437         176          0.5        2205.5       1.0X
   Native ORC MR                                       252            314          90          4.2         239.9       9.2X
   Native ORC Vectorized                               177            280          97          5.9         169.0      13.0X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Column Scan from 300 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  3332           3456         175          0.3        3178.1       1.0X
   Native ORC MR                                       336            348          12          3.1         320.2       9.9X
   Native ORC Vectorized                               232            288          80          4.5         221.6      14.3X
   
   ================================================================================================
   Struct scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Struct Column Scan with 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                   550            611          87          1.9         524.5       1.0X
   Native ORC MR                                       456            488          38          2.3         434.7       1.2X
   Native ORC Vectorized                               209            215           9          5.0         199.1       2.6X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Struct Column Scan with 100 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                   3925           4114         268          0.3        3743.0       1.0X
   Native ORC MR                                       3635           3638           5          0.3        3466.4       1.1X
   Native ORC Vectorized                               1840           1860          27          0.6        1755.2       2.1X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Struct Column Scan with 300 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  12554          12628         105          0.1       11972.9       1.0X
   Native ORC MR                                      14810          14823          19          0.1       14123.6       0.8X
   Native ORC Vectorized                              14279          14405         178          0.1       13617.3       0.9X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Single Struct Column Scan with 600 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                  26014          26116         145          0.0       24808.7       1.0X
   Native ORC MR                                      39066          39790        1023          0.0       37256.4       0.7X
   Native ORC Vectorized                              39048          39152         148          0.0       37238.7       0.7X
   
   ================================================================================================
   Nested Struct scan
   ================================================================================================
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Nested Struct Scan with 10 Elements, 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                        4719           4800         115          0.2        4500.1       1.0X
   Native ORC MR                                            5100           5214         161          0.2        4864.0       0.9X
   Native ORC Vectorized                                    1181           1185           5          0.9        1126.6       4.0X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Nested Struct Scan with 30 Elements, 10 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                       12792          12976         260          0.1       12199.6       1.0X
   Native ORC MR                                           12832          12952         170          0.1       12237.6       1.0X
   Native ORC Vectorized                                    3209           3214           7          0.3        3060.1       4.0X
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.16
   Apple M1 Max
   Nested Struct Scan with 10 Elements, 30 Fields:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------
   Hive built-in ORC                                       10448          10517          98          0.1        9963.6       1.0X
   Native ORC MR                                           13308          13308           0          0.1       12691.7       0.8X
   Native ORC Vectorized                                    3292           3337          64          0.3        3139.1       3.2X
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

