MaxGekk commented on issue #24309: [SPARK-27398][SQL] Refactoring of CreateJacksonParser.getStreamDecoder
URL: https://github.com/apache/spark/pull/24309#issuecomment-480593091

I re-ran the JSON benchmark, and unfortunately it shows a performance regression of up to roughly 2x. For example, here are the last benchmarks, where `Parsing with UTF-8` goes from 37115 ms to 66200 ms (best time).

Before:
```
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          7537           7556          26          6.6         150.7       1.0X
Schema inferring                                  27875          28306         499          1.8         557.5       0.3X
Parsing without charset                           26030          26083          67          1.9         520.6       0.3X
Parsing with UTF-8                                37115          37480         392          1.3         742.3       0.2X
```

After:
```
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          7435           7457          27          6.7         148.7       1.0X
Schema inferring                                  28307          28378          70          1.8         566.1       0.3X
Parsing without charset                           25104          25197          89          2.0         502.1       0.3X
Parsing with UTF-8                                66200          66402         216          0.8        1324.0       0.1X
```

I am closing the PR.
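
For reference, the per-case slowdown can be derived from the `Best Time(ms)` columns quoted above. The following is only a minimal sketch in Python; the names `before_ms`, `after_ms`, and `slowdown` are made up for illustration and are not part of Spark's benchmark code.

```python
# Best Time(ms) values copied from the "Before" and "After" benchmark output above.
before_ms = {
    "Text read": 7537,
    "Schema inferring": 27875,
    "Parsing without charset": 26030,
    "Parsing with UTF-8": 37115,
}
after_ms = {
    "Text read": 7435,
    "Schema inferring": 28307,
    "Parsing without charset": 25104,
    "Parsing with UTF-8": 66200,
}


def slowdown(before, after):
    """Return the after/before time ratio per case; values above 1.0 mean slower."""
    return {case: after[case] / before[case] for case in before}


ratios = slowdown(before_ms, after_ms)
for case, ratio in ratios.items():
    print(f"{case:<26}{ratio:.2f}x")
```

By this measure `Parsing with UTF-8` is about 1.78x slower after the change, while the other three cases stay within about 4% of their previous best times.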
