Hisoka-X commented on code in PR #41091:
URL: https://github.com/apache/spark/pull/41091#discussion_r1193139144
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -4,120 +4,120 @@ Benchmark for performance of JSON parsing

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        3871           3914          69          1.3         774.2       1.0X
-UTF-8 is set                                       5539           5563          26          0.9        1107.8       0.7X
+No encoding                                        3720           3843         121          1.3         743.9       1.0X
+UTF-8 is set                                       5412           5455          45          0.9        1082.4       0.7X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        2984           2999          24          1.7         596.9       1.0X
-UTF-8 is set                                       4875           4928          46          1.0         975.0       0.6X
+No encoding                                        3234           3254          33          1.5         646.7       1.0X
+UTF-8 is set                                       4847           4868          21          1.0         969.5       0.7X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        6353           6446         143          0.2        6353.4       1.0X
-UTF-8 is set                                      10548          10647         163          0.1       10547.8       0.6X
+No encoding                                        5702           5794         101          0.2        5702.1       1.0X
+UTF-8 is set                                       9526           9607          73          0.1        9526.1       0.6X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       18807          18880          66          0.0      376130.9       1.0X
-UTF-8 is set                                      20530          20554          23          0.0      410593.2       0.9X
+No encoding                                       18318          18448         199          0.0      366367.7       1.0X
+UTF-8 is set                                      19791          19887          99          0.0      395817.1       0.9X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                  2741           2749          12          0.4        2740.6       1.0X
-Select 1 column                                    1916           1925           8          0.5        1916.5       1.4X
+Select 10 columns                                  2531           2570          51          0.4        2531.3       1.0X
+Select 1 column                                    1867           1882          16          0.5        1867.0       1.4X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                       901            934          29          1.1         900.8       1.0X
-Short column with UTF-8                            1320           1343          31          0.8        1319.9       0.7X
-Wide column without encoding                      13446          13544         103          0.1       13445.8       0.1X
-Wide column with UTF-8                            17770          17854          76          0.1       17770.0       0.1X
+Short column without encoding                       868            875           7          1.2         868.4       1.0X
+Short column with UTF-8                            1151           1163          11          0.9        1150.9       0.8X
+Wide column without encoding                      12063          12299         205          0.1       12063.0       0.1X
+Wide column with UTF-8                            16095          16136          51          0.1       16095.3       0.1X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           159            167           9          6.3         159.2       1.0X
-from_json                                          2844           2863          25          0.4        2844.1       0.1X
-json_tuple                                         3137           3161          23          0.3        3136.7       0.1X
-get_json_object                                    2874           2884           9          0.3        2874.2       0.1X
+Text read                                           165            170           4          6.1         164.7       1.0X
+from_json                                          2339           2386          77          0.4        2338.9       0.1X
+json_tuple                                         2667           2730          55          0.4        2667.3       0.1X
+get_json_object                                    2627           2659          32          0.4        2627.1       0.1X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           732            745          11          6.8         146.3       1.0X
-schema inferring                                   3260           3265           6          1.5         652.0       0.2X
-parsing                                            3592           3645          46          1.4         718.4       0.2X
+Text read                                           700            715          20          7.1         140.1       1.0X
+schema inferring                                   3144           3166          20          1.6         628.7       0.2X
+parsing                                            3261           3271           9          1.5         652.1       0.2X

 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          1092           1100          11          4.6         218.4       1.0X
-Schema inferring                                   3814           3826          15          1.3         762.8       0.3X
-Parsing without charset                            4153           4184          32          1.2         830.7       0.3X
-Parsing with UTF-8                                 6014           6035          22          0.8        1202.9       0.2X
+Text read                                          1096           1105          12          4.6         219.1       1.0X
+Schema inferring                                   3818           3830          16          1.3         763.6       0.3X
+Parsing without charset                            4107           4137          32          1.2         821.4       0.3X
+Parsing with UTF-8                                 5717           5763          41          0.9        1143.3       0.2X

 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                      193            198           4          5.2         193.5       1.0X
-to_json(timestamp)                                 1566           1582          14          0.6        1566.4       0.1X
-write timestamps to files                          1265           1274          14          0.8        1265.1       0.2X
-Create a dataset of dates                           232            239          10          4.3         231.9       0.8X
-to_json(date)                                      1037           1058          18          1.0        1037.2       0.2X
-write dates to files                                766            770           7          1.3         765.6       0.3X
+Create a dataset of timestamps                      199            202           3          5.0         198.9       1.0X
+to_json(timestamp)                                 1458           1487          26          0.7        1458.0       0.1X
+write timestamps to files                          1232           1262          26          0.8        1232.5       0.2X
+Create a dataset of dates                           231            237           5          4.3         230.8       0.9X
+to_json(date)                                       956            966          10          1.0         956.5       0.2X
+write dates to files                                785            793          10          1.3         785.4       0.3X

 OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Read dates and timestamps:                                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -----------------------------------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                                                   283            289           6          3.5         283.1       1.0X
-read timestamps from files                                                      3364           3431          60          0.3        3363.6       0.1X
-infer timestamps from files                                                     8913           8935          38          0.1        8912.6       0.0X
-read date text from files                                                        263            267           4          3.8         262.9       1.1X
-read date from files                                                            1102           1116          12          0.9        1101.7       0.3X
-timestamp strings                                                                412            426          14          2.4         412.0       0.7X
-parse timestamps from Dataset[String]                                           3941           3956          14          0.3        3940.8       0.1X
-infer timestamps from Dataset[String]                                           9334           9383          43          0.1        9333.8       0.0X
-date strings                                                                     469            484          24          2.1         469.3       0.6X
-parse dates from Dataset[String]                                                1565           1572          11          0.6        1564.8       0.2X
-from_json(timestamp)                                                            5825           5917          88          0.2        5824.5       0.0X
-from_json(date)                                                                 3553           3574          19          0.3        3553.1       0.1X
-infer error timestamps from Dataset[String] with default format                 2590           2609          19          0.4        2589.9       0.1X
-infer error timestamps from Dataset[String] with user-provided format           2517           2551          30          0.4        2516.8       0.1X
-infer error timestamps from Dataset[String] with legacy format                  6836           6876          63          0.1        6836.1       0.0X

Review Comment:
   @MaxGekk Hi, I've updated the benchmark results; the speed has improved.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.