Hisoka-X commented on code in PR #41078:
URL: https://github.com/apache/spark/pull/41078#discussion_r1191261080


##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -3,121 +3,121 @@ Benchmark for performance of JSON parsing
 ================================================================================================
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        2973           3233         291          1.7         594.7       1.0X
-UTF-8 is set                                       4375           4796         430          1.1         874.9       0.7X
+No encoding                                        3871           3914          69          1.3         774.2       1.0X
+UTF-8 is set                                       5539           5563          26          0.9        1107.8       0.7X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        2359           2404          39          2.1         471.8       1.0X
-UTF-8 is set                                       3814           3885         101          1.3         762.8       0.6X
+No encoding                                        2984           2999          24          1.7         596.9       1.0X
+UTF-8 is set                                       4875           4928          46          1.0         975.0       0.6X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        4630           4969         347          0.2        4630.4       1.0X
-UTF-8 is set                                       8963           9040          82          0.1        8963.4       0.5X
+No encoding                                        6353           6446         143          0.2        6353.4       1.0X
+UTF-8 is set                                      10548          10647         163          0.1       10547.8       0.6X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       15252          15481         329          0.0      305030.9       1.0X
-UTF-8 is set                                      16349          16961         627          0.0      326988.8       0.9X
+No encoding                                       18807          18880          66          0.0      376130.9       1.0X
+UTF-8 is set                                      20530          20554          23          0.0      410593.2       0.9X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                  2290           2296           6          0.4        2289.6       1.0X
-Select 1 column                                    1636           1652          15          0.6        1635.6       1.4X
+Select 10 columns                                  2741           2749          12          0.4        2740.6       1.0X
+Select 1 column                                    1916           1925           8          0.5        1916.5       1.4X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                       661            673          12          1.5         661.1       1.0X
-Short column with UTF-8                             950            978          26          1.1         950.1       0.7X
-Wide column without encoding                      11106          11297         179          0.1       11106.4       0.1X
-Wide column with UTF-8                            13743          13762          18          0.1       13743.3       0.0X
+Short column without encoding                       901            934          29          1.1         900.8       1.0X
+Short column with UTF-8                            1320           1343          31          0.8        1319.9       0.7X
+Wide column without encoding                      13446          13544         103          0.1       13445.8       0.1X
+Wide column with UTF-8                            17770          17854          76          0.1       17770.0       0.1X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           119            131          15          8.4         119.5       1.0X
-from_json                                          2475           2493          18          0.4        2474.9       0.0X
-json_tuple                                         2680           2745          57          0.4        2680.3       0.0X
-get_json_object                                    2549           2630          88          0.4        2549.3       0.0X
+Text read                                           159            167           9          6.3         159.2       1.0X
+from_json                                          2844           2863          25          0.4        2844.1       0.1X
+json_tuple                                         3137           3161          23          0.3        3136.7       0.1X
+get_json_object                                    2874           2884           9          0.3        2874.2       0.1X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           545            567          29          9.2         109.0       1.0X
-schema inferring                                   2460           2498          42          2.0         492.1       0.2X
-parsing                                            2618           2656          36          1.9         523.6       0.2X
+Text read                                           732            745          11          6.8         146.3       1.0X
+schema inferring                                   3260           3265           6          1.5         652.0       0.2X
+parsing                                            3592           3645          46          1.4         718.4       0.2X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           884            897          16          5.7         176.8       1.0X
-Schema inferring                                   3016           3029          21          1.7         603.2       0.3X
-Parsing without charset                            3251           3267          14          1.5         650.2       0.3X
-Parsing with UTF-8                                 4892           5020         118          1.0         978.3       0.2X
+Text read                                          1092           1100          11          4.6         218.4       1.0X
+Schema inferring                                   3814           3826          15          1.3         762.8       0.3X
+Parsing without charset                            4153           4184          32          1.2         830.7       0.3X
+Parsing with UTF-8                                 6014           6035          22          0.8        1202.9       0.2X
 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                      163            164           2          6.1         162.6       1.0X
-to_json(timestamp)                                 1307           1383          92          0.8        1307.4       0.1X
-write timestamps to files                          1044           1090          40          1.0        1044.5       0.2X
-Create a dataset of dates                           195            207          10          5.1         195.2       0.8X
-to_json(date)                                       915            934          19          1.1         914.8       0.2X
-write dates to files                                717            727           9          1.4         717.3       0.2X
+Create a dataset of timestamps                      193            198           4          5.2         193.5       1.0X
+to_json(timestamp)                                 1566           1582          14          0.6        1566.4       0.1X
+write timestamps to files                          1265           1274          14          0.8        1265.1       0.2X
+Create a dataset of dates                           232            239          10          4.3         231.9       0.8X
+to_json(date)                                      1037           1058          18          1.0        1037.2       0.2X
+write dates to files                                766            770           7          1.3         765.6       0.3X
 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
 Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Read dates and timestamps:                                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -----------------------------------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                                                   270            280           9          3.7         270.4       1.0X
-read timestamps from files                                                      2623           2789         159          0.4        2623.1       0.1X
-infer timestamps from files                                                     6416           7147         703          0.2        6415.7       0.0X
-read date text from files                                                        233            234           1          4.3         233.3       1.2X
-read date from files                                                             948            969          24          1.1         948.2       0.3X
-timestamp strings                                                                335            347          14          3.0         334.9       0.8X
-parse timestamps from Dataset[String]                                           2961           2993          41          0.3        2960.6       0.1X
-infer timestamps from Dataset[String]                                           7139           7314         158          0.1        7139.1       0.0X
-date strings                                                                     384            397          15          2.6         383.6       0.7X
-parse dates from Dataset[String]                                                1325           1347          24          0.8        1325.0       0.2X
-from_json(timestamp)                                                            4774           4788          13          0.2        4773.6       0.1X
-from_json(date)                                                                 3078           3090          11          0.3        3078.5       0.1X
-infer error timestamps from Dataset[String] with default format                 2025           2058          28          0.5        2025.0       0.1X
-infer error timestamps from Dataset[String] with user-provided format          20261          20338          95          0.0       20260.6       0.0X
-infer error timestamps from Dataset[String] with legacy format                  5495           5528          38          0.2        5495.4       0.0X
+read timestamp text from files                                                   283            289           6          3.5         283.1       1.0X
+read timestamps from files                                                      3364           3431          60          0.3        3363.6       0.1X
+infer timestamps from files                                                     8913           8935          38          0.1        8912.6       0.0X
+read date text from files                                                        263            267           4          3.8         262.9       1.1X
+read date from files                                                            1102           1116          12          0.9        1101.7       0.3X
+timestamp strings                                                                412            426          14          2.4         412.0       0.7X
+parse timestamps from Dataset[String]                                           3941           3956          14          0.3        3940.8       0.1X
+infer timestamps from Dataset[String]                                           9334           9383          43          0.1        9333.8       0.0X
+date strings                                                                     469            484          24          2.1         469.3       0.6X
+parse dates from Dataset[String]                                                1565           1572          11          0.6        1564.8       0.2X
+from_json(timestamp)                                                            5825           5917          88          0.2        5824.5       0.0X
+from_json(date)                                                                 3553           3574          19          0.3        3553.1       0.1X
+infer error timestamps from Dataset[String] with default format                 2590           2609          19          0.4        2589.9       0.1X
+infer error timestamps from Dataset[String] with user-provided format           2517           2551          30          0.4        2516.8       0.1X

Review Comment:
   @MaxGekk The benchmark results have been updated. `infer error timestamps from Dataset[String] with user-provided format` is now much faster: its best time dropped from 20261 ms to 2517 ms.
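
   For anyone checking the numbers, here is a minimal sketch of the arithmetic behind that claim. It is not part of the PR or of Spark's benchmark tooling; the names `best_time_ms` and `speedup` are made up for illustration, and the figures are the Best Time(ms) values copied from the diff in this comment.

```python
# Best Time (ms) values copied from the "Read dates and timestamps" table
# in this diff, as (before, after). The dict and helper names are hypothetical.
best_time_ms = {
    "infer error timestamps, user-provided format": (20261, 2517),
    "infer error timestamps, default format": (2025, 2590),
    "from_json(timestamp)": (4774, 5825),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return before/after; values above 1 mean the new run is faster."""
    return before_ms / after_ms


for name, (before, after) in best_time_ms.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

   A ratio above 1 means the row got faster and a ratio below 1 means it got slower. Only the user-provided-format row is well above 1 here; the other two rows in the sketch are somewhat slower in the new run, and this comment does not say why.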



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

