GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/17933
[SPARK-20588][SQL] Cache TimeZone instances per thread.
## What changes were proposed in this pull request?
Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the
TimeZone class, concurrent calls to this method become a bottleneck.
This happens especially when casting a string value that contains timezone
info to a timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and
looks up the TimeZone instance at the call site.
This PR caches the generated TimeZone instances per thread to
avoid the synchronization.
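
The idea can be illustrated with a minimal Java sketch of a per-thread cache. This is not the code from the patch: the class name `TimeZoneCache` and the method `get` are hypothetical, and the actual change in `DateTimeUtils` may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not the class added by this PR.
final class TimeZoneCache {
    // Each thread gets its own map, so cache hits need no locking.
    private static final ThreadLocal<Map<String, TimeZone>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    private TimeZoneCache() {}

    // Returns the cached TimeZone for this thread, calling
    // TimeZone.getTimeZone(String) only on the first lookup of each ID.
    static TimeZone get(String id) {
        return CACHE.get().computeIfAbsent(id, TimeZone::getTimeZone);
    }
}
```

With this sketch, a thread calls `TimeZone.getTimeZone` once per distinct ID and reuses the result afterwards. Because `TimeZone` is mutable, callers of such a cache should treat the returned instance as read-only.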
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-20588
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17933.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17933
----
commit de79e50779c0f2e17ea26301ac7d1216b37331c9
Author: Takuya UESHIN <[email protected]>
Date: 2017-05-10T05:55:53Z
Cache TimeZone instances per thread.
----