MaxGekk commented on issue #25329: [WIP][SPARK-28596][SQL] Use Java 8 time API in the date_trunc() function
URL: https://github.com/apache/spark/pull/25329#issuecomment-517976281

Unfortunately, benchmarks show that this implementation is 2-3 times slower than the current one. It spends most of its time resolving zone IDs to zone offsets. For example, in `DateTimeBenchmark` the `truncTimestamp` method is dominated by the binary search over historical zone info:

<img width="945" alt="Screen Shot 2019-08-04 at 10 43 30" src="https://user-images.githubusercontent.com/1580697/62419955-2221a480-b6a5-11e9-88bd-5da028160c88.png">

I have changed the benchmark to avoid the search over zone history, but `truncTimestamp` still spends most of its time in zone conversions. For a recent year, `2019`, it is bound by fetching zone info from an internal cache:

<img width="940" alt="Screen Shot 2019-08-04 at 10 52 16" src="https://user-images.githubusercontent.com/1580697/62420057-fce16600-b6a5-11e9-8e07-ad5d3d7b6d84.png">

I will close the PR since I don't know how to avoid the zone conversions. Maybe this will be faster in newer versions of the JDK, and we can come back to these changes then.
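For readers unfamiliar with where the zone conversions come from, here is a minimal sketch of a day-level truncation written against the Java 8 time API. It is not the code from this PR: the class `TruncSketch` and the method `truncToDayMicros` are hypothetical names used only for illustration, and the assumption is that timestamps are microseconds since the epoch. The zone ID has to be resolved to an offset once to get the local date-time and again to turn the truncated local date-time back into an instant.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not Spark's truncTimestamp.
final class TruncSketch {
    private static final long MICROS_PER_SECOND = 1_000_000L;

    // Truncates a timestamp (assumed: microseconds since the epoch) to local midnight in `zone`.
    static long truncToDayMicros(long micros, ZoneId zone) {
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        long nanos = Math.floorMod(micros, MICROS_PER_SECOND) * 1_000L;
        Instant instant = Instant.ofEpochSecond(seconds, nanos);

        // First zone resolution: instant -> offset via the zone's rules.
        ZonedDateTime local = instant.atZone(zone);

        // Second zone resolution: the truncated local date-time is mapped
        // back to an offset for the same zone.
        ZonedDateTime truncated = local.truncatedTo(ChronoUnit.DAYS);

        Instant result = truncated.toInstant();
        return Math.addExact(
            Math.multiplyExact(result.getEpochSecond(), MICROS_PER_SECOND),
            result.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        long micros = Instant.parse("2019-08-04T10:43:30Z").getEpochSecond() * MICROS_PER_SECOND;
        System.out.println(truncToDayMicros(micros, ZoneId.of("America/Los_Angeles")));
    }
}
```

Passing a fixed `ZoneOffset` instead of a region ID would sidestep the rule lookup, but it gives wrong results for zones whose offset changes over time, so it is not a general fix.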
