sadikovi commented on PR #42667: URL: https://github.com/apache/spark/pull/42667#issuecomment-1713103692
@dongjoon-hyun I reran the JSON benchmark, and it seems the previous results that I published were noisy. I confirmed there is no apparent regression in the patch. I ran only the `Json files in the per-line mode` benchmark, for 10 iterations.

Results:

Without the patch (latest master https://github.com/apache/spark/commit/eb0b09f0f2b518915421365a61d1f3d7d58b4404 with the patch reverted):

```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
[info] Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Text read                                           463            476          15         10.8          92.6       1.0X
[info] Schema inferring                                   2126           2166          48          2.4         425.1       0.2X
[info] Parsing without charset                            3195           3201           4          1.6         638.9       0.1X
[info] Parsing with UTF-8                                 4129           4140           8          1.2         825.8       0.1X
```

With the patch (latest master):

```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
[info] Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Text read                                           459            467           7         10.9          91.8       1.0X
[info] Schema inferring                                   2159           2198          45          2.3         431.7       0.2X
[info] Parsing without charset                            3106           3119          12          1.6         621.2       0.1X
[info] Parsing with UTF-8                                 4071           4090          10          1.2         814.2       0.1X
```

It seems the results are approximately the same as before. However, the benchmark results tend to fluctuate quite a lot.
For example, I reran the same benchmark without any code changes.

The second run with the patch:

```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
[info] Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Text read                                           458            469           9         10.9          91.5       1.0X
[info] Schema inferring                                   2147           2184          48          2.3         429.4       0.2X
[info] Parsing without charset                            3294           3308          10          1.5         658.8       0.1X
[info] Parsing with UTF-8                                 4437           4444           8          1.1         887.4       0.1X
```

I think it is fine: this is just noise in the benchmark, and there is no apparent regression because of the patch.
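To make the noise argument concrete, here is a minimal Python sketch that compares the `Avg Time(ms)` columns of the three runs above. It is only an illustration and not part of Spark's benchmark tooling; the names `pct_change` and `compare` are hypothetical helpers, and the numbers are copied by hand from the tables.

```python
# Avg Time(ms) values copied by hand from the three benchmark runs above.
CASES = [
    "Text read",
    "Schema inferring",
    "Parsing without charset",
    "Parsing with UTF-8",
]
WITHOUT_PATCH = [476, 2166, 3201, 4140]    # master with the patch reverted
WITH_PATCH_RUN1 = [467, 2198, 3119, 4090]  # master, first run
WITH_PATCH_RUN2 = [469, 2184, 3308, 4444]  # master, second run, same code


def pct_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return 100.0 * (new - old) / old


def compare():
    """Return (case, patch effect %, run-to-run change %) for each case."""
    rows = []
    for name, base, run1, run2 in zip(
        CASES, WITHOUT_PATCH, WITH_PATCH_RUN1, WITH_PATCH_RUN2
    ):
        patch_effect = pct_change(base, run1)  # reverted vs. patched
        rerun_change = pct_change(run1, run2)  # identical code, rerun
        rows.append((name, patch_effect, rerun_change))
    return rows


if __name__ == "__main__":
    for name, patch_effect, rerun_change in compare():
        print(f"{name:<26} patch: {patch_effect:+6.2f}%  rerun: {rerun_change:+6.2f}%")
```

For the two parsing cases, the change between two runs of identical code is larger in magnitude than the change between the reverted and patched builds, which is consistent with reading the differences as noise. With only three runs this is a rough sanity check, not a statistical test.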
