dongjoon-hyun commented on a change in pull request #28966:
URL: https://github.com/apache/spark/pull/28966#discussion_r448621700



##########
File path: sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
##########
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 
11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-No encoding                                       68879          68993         
116          1.5         688.8       1.0X
-UTF-8 is set                                     115270         115602         
455          0.9        1152.7       0.6X
+No encoding                                       69219          69342         
116          1.4         692.2       1.0X
+UTF-8 is set                                     143950         143986         
 55          0.7        1439.5       0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-No encoding                                       47452          47538         
113          2.1         474.5       1.0X
-UTF-8 is set                                      77330          77354         
 30          1.3         773.3       0.6X
+No encoding                                       57828          57913         
136          1.7         578.3       1.0X
+UTF-8 is set                                      83649          83711         
 60          1.2         836.5       0.7X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-No encoding                                       60470          60900         
534          0.2        6047.0       1.0X
-UTF-8 is set                                     104733         104931         
189          0.1       10473.3       0.6X
+No encoding                                       64560          65193        
1023          0.2        6456.0       1.0X
+UTF-8 is set                                     102925         103174         
216          0.1       10292.5       0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-No encoding                                      130302         131072         
976          0.0      260604.6       1.0X
-UTF-8 is set                                     150860         151284         
377          0.0      301720.1       0.9X
+No encoding                                      131002         132316        
1160          0.0      262003.1       1.0X
+UTF-8 is set                                     152128         152371         
332          0.0      304256.5       0.9X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                 18619          18684         
 99          0.5        1861.9       1.0X
-Select 1 column                                   24227          24270         
 38          0.4        2422.7       0.8X
+Select 10 columns                                 19376          19514         
160          0.5        1937.6       1.0X
+Select 1 column                                   24089          24156         
 58          0.4        2408.9       0.8X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                      7947           7971         
 21          1.3         794.7       1.0X
-Short column with UTF-8                           12700          12753         
 58          0.8        1270.0       0.6X
-Wide column without encoding                      92632          92955         
463          0.1        9263.2       0.1X
-Wide column with UTF-8                           147013         147170         
188          0.1       14701.3       0.1X
+Short column without encoding                      8131           8219         
103          1.2         813.1       1.0X
+Short column with UTF-8                           13464          13508         
 44          0.7        1346.4       0.6X
+Wide column without encoding                     108012         108598         
914          0.1       10801.2       0.1X
+Wide column with UTF-8                           150988         151369         
412          0.1       15098.8       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Text read                                           713            734         
 19         14.0          71.3       1.0X
-from_json                                         22019          22429         
456          0.5        2201.9       0.0X
-json_tuple                                        27987          28047         
 74          0.4        2798.7       0.0X
-get_json_object                                   21468          21870         
350          0.5        2146.8       0.0X
+Text read                                           753            765         
 18         13.3          75.3       1.0X
+from_json                                         23182          23446         
230          0.4        2318.2       0.0X
+json_tuple                                        31129          31304         
181          0.3        3112.9       0.0X
+get_json_object                                   22821          23073         
225          0.4        2282.1       0.0X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Text read                                          2887           2910         
 24         17.3          57.7       1.0X
-schema inferring                                  31793          31843         
 43          1.6         635.9       0.1X
-parsing                                           36791          37104         
294          1.4         735.8       0.1X
+Text read                                          3078           3101         
 26         16.2          61.6       1.0X
+schema inferring                                  30225          30434         
333          1.7         604.5       0.1X
+parsing                                           32237          32308         
 63          1.6         644.7       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Text read                                         10570          10611         
 45          4.7         211.4       1.0X
-Schema inferring                                  48729          48763         
 41          1.0         974.6       0.2X
-Parsing without charset                           35490          35648         
141          1.4         709.8       0.3X
-Parsing with UTF-8                                63853          63994         
163          0.8        1277.1       0.2X
+Text read                                         10835          10900         
 86          4.6         216.7       1.0X
+Schema inferring                                  37720          37805         
110          1.3         754.4       0.3X
+Parsing without charset                           35464          35538         
100          1.4         709.3       0.3X
+Parsing with UTF-8                                67311          67738         
381          0.7        1346.2       0.2X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                     2187           2190         
  5          4.6         218.7       1.0X
-to_json(timestamp)                                16262          16503         
323          0.6        1626.2       0.1X
-write timestamps to files                         11679          11692         
 12          0.9        1167.9       0.2X
-Create a dataset of dates                          2297           2310         
 12          4.4         229.7       1.0X
-to_json(date)                                     10904          10956         
 46          0.9        1090.4       0.2X
-write dates to files                               6610           6645         
 35          1.5         661.0       0.3X
+Create a dataset of timestamps                     2208           2222         
 14          4.5         220.8       1.0X
+to_json(timestamp)                                14299          14570         
285          0.7        1429.9       0.2X
+write timestamps to files                         12955          12969         
 13          0.8        1295.5       0.2X
+Create a dataset of dates                          2297           2323         
 30          4.4         229.7       1.0X
+to_json(date)                                      8509           8561         
 74          1.2         850.9       0.3X
+write dates to files                               6786           6827         
 45          1.5         678.6       0.3X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 
4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                     2524           2530         
  9          4.0         252.4       1.0X
-read timestamps from files                        41002          41052         
 59          0.2        4100.2       0.1X
-infer timestamps from files                       84621          84939         
526          0.1        8462.1       0.0X
-read date text from files                          2292           2302         
  9          4.4         229.2       1.1X
-read date from files                              16954          16976         
 21          0.6        1695.4       0.1X
-timestamp strings                                  3067           3077         
 13          3.3         306.7       0.8X
-parse timestamps from Dataset[String]             48690          48971         
243          0.2        4869.0       0.1X
-infer timestamps from Dataset[String]             97463          97786         
338          0.1        9746.3       0.0X
-date strings                                       3952           3956         
  3          2.5         395.2       0.6X
-parse dates from Dataset[String]                  24210          24241         
 30          0.4        2421.0       0.1X
-from_json(timestamp)                              71710          72242         
629          0.1        7171.0       0.0X
-from_json(date)                                   42465          42481         
 13          0.2        4246.5       0.1X
+read timestamp text from files                     2598           2613         
 18          3.8         259.8       1.0X
+read timestamps from files                        42007          42028         
 19          0.2        4200.7       0.1X
+infer timestamps from files                       18102          18120         
 28          0.6        1810.2       0.1X

Review comment:
       Sorry, but this test case seems to be supposed to run `infer timestamp` 
always. Why does this become faster?
   ```
   - infer timestamps from files                       84621          84939     
    526          0.1        8462.1       0.0X
   + infer timestamps from files                       18102          18120     
     28          0.6        1810.2       0.1X
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to