MaxGekk commented on issue #25329: [WIP][SPARK-28596][SQL] Use Java 8 time API 
in the date_trunc() function
URL: https://github.com/apache/spark/pull/25329#issuecomment-517976281
 
 
   Unfortunately, benchmarks show that this implementation is 2-3 times slower 
than the current one. Most of the time is spent resolving zone ids to zone 
offsets. For example, in `DateTimeBenchmark` the `truncTimestamp` method is 
bound by the binary search over historical zone info:
   <img width="945" alt="Screen Shot 2019-08-04 at 10 43 30" 
src="https://user-images.githubusercontent.com/1580697/62419955-2221a480-b6a5-11e9-88bd-5da028160c88.png">
   I have changed the benchmark to avoid the search over zone history, but 
`truncTimestamp` still spends most of its time in zone conversions. For a 
recent year, `2019`, it is bound by fetching zone info from an internal cache:
   <img width="940" alt="Screen Shot 2019-08-04 at 10 52 16" 
src="https://user-images.githubusercontent.com/1580697/62420057-fce16600-b6a5-11e9-8e07-ad5d3d7b6d84.png">
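
   To illustrate where the zone conversions come from, here is a minimal 
sketch of a day-level truncation through `java.time`. It is not the code from 
this PR: the class `TruncSketch` and the helpers `truncToDayMicros` and 
`offsetAt` are hypothetical names, and the microseconds-since-epoch input is an 
assumption about the timestamp representation. The point is only that 
`Instant.atZone` has to resolve the zone id to an offset through `ZoneRules` on 
every call, which is the lookup the profiles above point at.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

class TruncSketch {
    private static final long MICROS_PER_SECOND = 1_000_000L;
    private static final long NANOS_PER_MICRO = 1_000L;

    // Hypothetical helper: truncates a timestamp given as microseconds since
    // the epoch to the start of its day in the given time zone.
    static long truncToDayMicros(long micros, ZoneId zone) {
        // floorDiv/floorMod keep pre-1970 (negative) timestamps correct.
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        long nanos = Math.floorMod(micros, MICROS_PER_SECOND) * NANOS_PER_MICRO;
        Instant instant = Instant.ofEpochSecond(seconds, nanos);

        // atZone() resolves the zone id to an offset via the zone's rules.
        ZonedDateTime truncated = instant.atZone(zone).truncatedTo(ChronoUnit.DAYS);

        Instant result = truncated.toInstant();
        return Math.multiplyExact(result.getEpochSecond(), MICROS_PER_SECOND)
            + result.getNano() / NANOS_PER_MICRO;
    }

    // The zone-id-to-offset resolution on its own, made explicit.
    static ZoneOffset offsetAt(long micros, ZoneId zone) {
        Instant instant = Instant.ofEpochSecond(Math.floorDiv(micros, MICROS_PER_SECOND));
        return zone.getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        long micros = 1564915410123456L; // 2019-08-04T10:43:30.123456Z
        System.out.println(offsetAt(micros, zone));
        System.out.println(truncToDayMicros(micros, zone));
    }
}
```

   For a fixed-offset zone such as `ZoneOffset.UTC` the rules lookup is 
trivial; for a region id such as `America/Los_Angeles` the rules have to be 
consulted for each instant.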
   
   I will close the PR since I don't know how to avoid the zone conversions. 
Maybe newer versions of the JDK will make this faster, and we can come back to 
these changes then.
   

