MaxGekk opened a new pull request #25871: [SPARK-29190][SQL] Optimize `extract`/`date_part` for the milliseconds `field`
URL: https://github.com/apache/spark/pull/25871

### What changes were proposed in this pull request?
Changed `DateTimeUtils.getMilliseconds()` to avoid the decimal division. Instead, the scale and precision are set directly while converting microseconds to the decimal type.

### Why are the changes needed?
This improves the performance of `extract` and `date_part()` for the milliseconds field by more than **50 times**:

Before:
```
Invoke extract for timestamp:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
cast to timestamp                                   397            428          45         25.2          39.7       1.0X
MILLISECONDS of timestamp                         36723          36761          63          0.3        3672.3       0.0X
```

After:
```
Invoke extract for timestamp:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
cast to timestamp                                   278            284           6         36.0          27.8       1.0X
MILLISECONDS of timestamp                           592            606          13         16.9          59.2       0.5X
```

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By the existing test suite `DateExpressionsSuite`.
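
As an illustration of the idea described under "What changes were proposed", here is a minimal sketch in plain Java. It uses `java.math.BigDecimal` as a stand-in for Spark's `Decimal` type, and the class and method names (`MillisSketch`, `viaDivision`, `viaScale`) are hypothetical, not taken from the patch. The scale of 3 is an assumption that follows from the arithmetic: dividing a microsecond count by 1000 is the same as reading the same integer with three fractional digits.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical sketch, not the Spark implementation: compares a decimal
// division against constructing the decimal from an unscaled long and a scale.
final class MillisSketch {

    // Slow path: build a decimal from the microsecond count, then divide by 1000.
    static BigDecimal viaDivision(long micros) {
        return new BigDecimal(micros).divide(new BigDecimal(1000), MathContext.DECIMAL128);
    }

    // Fast path: treat the microsecond count as the unscaled value and
    // attach a scale of 3, which shifts the decimal point without dividing.
    static BigDecimal viaScale(long micros) {
        return BigDecimal.valueOf(micros, 3);
    }

    public static void main(String[] args) {
        long micros = 12_345_678L; // 12.345678 seconds expressed in microseconds
        System.out.println(viaDivision(micros));
        System.out.println(viaScale(micros));
    }
}
```

Both methods yield numerically equal results; the second one only records where the decimal point sits, which is likely why skipping the division pays off so strongly in the benchmark above.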
