Hisoka-X commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1191261080
########## sql/core/benchmarks/JsonBenchmark-results.txt: ########## @@ -3,121 +3,121 @@ Benchmark for performance of JSON parsing ================================================================================================ Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 2973 3233 291 1.7 594.7 1.0X -UTF-8 is set 4375 4796 430 1.1 874.9 0.7X +No encoding 3871 3914 69 1.3 774.2 1.0X +UTF-8 is set 5539 5563 26 0.9 1107.8 0.7X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 2359 2404 39 2.1 471.8 1.0X -UTF-8 is set 3814 3885 101 1.3 762.8 0.6X +No encoding 2984 2999 24 1.7 596.9 1.0X +UTF-8 is set 4875 4928 46 1.0 975.0 0.6X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 4630 4969 347 0.2 4630.4 1.0X -UTF-8 is set 8963 9040 82 0.1 8963.4 0.5X +No encoding 6353 6446 143 0.2 6353.4 1.0X +UTF-8 is set 10548 10647 163 0.1 10547.8 0.6X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 15252 15481 329 0.0 305030.9 1.0X -UTF-8 is set 16349 16961 627 0.0 326988.8 0.9X +No encoding 18807 18880 66 0.0 376130.9 1.0X +UTF-8 is set 20530 20554 23 0.0 410593.2 0.9X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Select 10 columns 2290 2296 6 0.4 2289.6 1.0X -Select 1 column 1636 1652 15 0.6 1635.6 1.4X +Select 10 columns 2741 2749 12 0.4 2740.6 1.0X +Select 1 column 1916 1925 8 0.5 1916.5 1.4X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Short column without encoding 661 673 12 1.5 661.1 1.0X -Short column with UTF-8 950 978 26 1.1 950.1 0.7X -Wide column without encoding 11106 11297 179 0.1 11106.4 0.1X -Wide column with UTF-8 13743 13762 18 0.1 13743.3 0.0X +Short column without encoding 901 934 29 1.1 900.8 1.0X +Short column with UTF-8 1320 1343 31 0.8 1319.9 0.7X +Wide column without encoding 13446 13544 103 0.1 13445.8 0.1X +Wide column with UTF-8 17770 17854 76 0.1 17770.0 0.1X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 119 131 15 8.4 119.5 1.0X -from_json 2475 2493 18 0.4 2474.9 0.0X -json_tuple 2680 2745 57 0.4 2680.3 0.0X -get_json_object 2549 2630 88 0.4 2549.3 0.0X +Text read 159 167 9 6.3 159.2 1.0X +from_json 2844 2863 25 0.4 2844.1 0.1X +json_tuple 3137 3161 23 0.3 3136.7 0.1X +get_json_object 2874 2884 9 0.3 2874.2 0.1X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 545 567 29 9.2 109.0 1.0X -schema inferring 2460 2498 42 2.0 492.1 0.2X -parsing 2618 2656 36 1.9 523.6 0.2X +Text read 732 745 11 6.8 146.3 1.0X +schema inferring 3260 3265 6 1.5 652.0 0.2X +parsing 3592 3645 46 1.4 718.4 0.2X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 884 897 16 5.7 176.8 1.0X -Schema inferring 3016 3029 21 1.7 603.2 0.3X -Parsing without charset 3251 3267 14 1.5 650.2 0.3X -Parsing with UTF-8 4892 5020 118 1.0 978.3 0.2X +Text read 1092 1100 11 4.6 218.4 1.0X +Schema inferring 3814 3826 15 1.3 762.8 0.3X +Parsing without charset 4153 4184 32 1.2 830.7 0.3X +Parsing with UTF-8 6014 6035 22 0.8 1202.9 0.2X -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Create a dataset of timestamps 163 164 2 6.1 162.6 1.0X -to_json(timestamp) 1307 1383 92 0.8 1307.4 0.1X -write timestamps to files 1044 1090 40 1.0 1044.5 0.2X -Create a dataset of dates 195 207 10 5.1 195.2 0.8X -to_json(date) 915 934 19 1.1 914.8 0.2X -write dates to files 717 727 9 1.4 717.3 0.2X +Create a dataset of timestamps 193 198 4 5.2 193.5 1.0X +to_json(timestamp) 1566 1582 14 0.6 1566.4 0.1X +write timestamps to files 1265 1274 14 0.8 1265.1 0.2X +Create a dataset of dates 232 239 10 4.3 231.9 0.8X +to_json(date) 1037 1058 18 1.0 1037.2 0.2X +write dates to files 766 770 7 1.3 765.6 0.3X -OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1036-azure +OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ----------------------------------------------------------------------------------------------------------------------------------------------------- -read timestamp text from files 270 280 9 3.7 270.4 1.0X -read timestamps from files 2623 2789 159 0.4 2623.1 0.1X -infer timestamps from files 6416 7147 703 0.2 6415.7 0.0X -read date text from files 233 234 1 4.3 233.3 1.2X -read date from files 948 969 24 1.1 948.2 0.3X -timestamp strings 335 347 14 3.0 334.9 0.8X -parse timestamps from Dataset[String] 2961 2993 41 0.3 2960.6 0.1X -infer timestamps from Dataset[String] 7139 7314 158 0.1 7139.1 0.0X -date strings 384 397 15 2.6 383.6 0.7X -parse dates from Dataset[String] 1325 1347 24 0.8 1325.0 0.2X -from_json(timestamp) 4774 4788 13 0.2 4773.6 0.1X -from_json(date) 3078 3090 11 0.3 3078.5 0.1X -infer error timestamps from Dataset[String] with default format 2025 2058 28 0.5 2025.0 0.1X -infer error timestamps from Dataset[String] with user-provided format 20261 20338 95 0.0 20260.6 0.0X -infer error timestamps from Dataset[String] with legacy format 5495 5528 38 0.2 5495.4 0.0X +read timestamp text from files 283 289 6 3.5 283.1 1.0X +read timestamps from files 3364 3431 60 0.3 3363.6 0.1X +infer timestamps from files 8913 8935 38 0.1 8912.6 0.0X +read date text from files 263 267 4 3.8 262.9 1.1X +read date from files 1102 1116 12 0.9 1101.7 0.3X +timestamp strings 412 426 14 2.4 412.0 0.7X +parse timestamps from Dataset[String] 3941 3956 14 0.3 3940.8 0.1X +infer timestamps from Dataset[String] 9334 9383 43 0.1 9333.8 0.0X +date strings 469 484 24 2.1 469.3 0.6X +parse dates from Dataset[String] 1565 1572 11 0.6 1564.8 0.2X +from_json(timestamp) 5825 5917 88 0.2 5824.5 0.0X +from_json(date) 3553 3574 19 0.3 3553.1 0.1X +infer error timestamps from Dataset[String] with default format 2590 2609 19 0.4 2589.9 0.1X +infer error timestamps from Dataset[String] with user-provided format 2517 2551 30 0.4 2516.8 0.1X Review Comment: @MaxGekk The bencnmark updated. `infer error timestamps from Dataset[String] with user-provided format` speed already up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
