GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/23150
[SPARK-26178][SQL] Use java.time API for parsing timestamps and dates from CSV
## What changes were proposed in this pull request?
In this PR, I propose using the **java.time API** to parse timestamps and
dates from CSV content with microsecond precision. The SQL config
`spark.sql.legacy.timeParser.enabled` allows switching back to the previous
behaviour, which uses `java.text.SimpleDateFormat`/`FastDateFormat` for
parsing/generating timestamps/dates.
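To illustrate why the parser matters for microsecond precision, here is a minimal standalone sketch in plain Java. It is not the code from this patch: the class `TimeParserSketch`, its method names, and the sample pattern and timestamp are assumptions made up for the example. It parses the same text once with `java.time.format.DateTimeFormatter` and once with `java.text.SimpleDateFormat`, using UTC in both cases.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Spark.
final class TimeParserSketch {

    // java.time path: 'S' is fraction-of-second, so six digits are
    // read as microseconds.
    static long parseMicrosJavaTime(String text, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        LocalDateTime local = LocalDateTime.parse(text, formatter);
        Instant instant = local.toInstant(ZoneOffset.UTC);
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }

    // Legacy path: 'S' is a millisecond field and java.util.Date only
    // holds milliseconds, so sub-millisecond digits cannot be preserved.
    static long parseMicrosLegacy(String text, String pattern) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return formatter.parse(text).getTime() * 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSSSS";
        String text = "2018-11-26T22:44:42.123456";
        System.out.println("java.time micros: " + parseMicrosJavaTime(text, pattern));
        System.out.println("legacy micros:    " + parseMicrosLegacy(text, pattern));
    }
}
```

With the sample input the two results differ, because the legacy formatter treats the six fraction digits as a count of milliseconds rather than as a fraction of a second.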
## How was this patch tested?
It was tested by `UnivocityParserSuite`, `CsvExpressionsSuite`,
`CsvFunctionsSuite` and `CsvSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 time-parser
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23150.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23150
----
commit 74a76c2f78ad139993f3bbe0f2ff8f1c81c3bd84
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-24T11:37:55Z
New and legacy time parser
commit 63cf6112085029c52e4aee6f9bb2e6b84ce18a96
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-24T12:00:10Z
Add config spark.sql.legacy.timeParser.enabled
commit 2a2ab83a5ecb251ce81e7f12a8c0d3067f88b2d5
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-24T13:24:07Z
Fallback legacy parser
commit 667bf9f65a90ac69b8cbbad77a17e21f9dd18733
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-24T15:54:19Z
something
commit 227a7bdc53bdd022e9c365b410810c58f56e8bea
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T12:15:15Z
Using instances
commit 73ee56088bf4d2856c454a7bbd4171b61cfe4614
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T13:52:02Z
Added generator
commit f35f6e13270eb994ac97627da79497673b4fe686
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T18:03:17Z
Refactoring of TimeFormatter
commit 1c09b58e6fe3e0fd565c852dcb73dc012fa56819
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T18:06:22Z
Renaming to DateTimeFormatter
commit 7b213d5b2ae404c87f090da622a78d3d19fee6a9
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T18:32:54Z
Added DateFormatter
commit 242ba474dcf112b48bd286811daed86a66366c39
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T19:58:08Z
Default values in parsing
commit db48ee6918eef06e19c3bdf64e3c44f4541cc294
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T21:09:08Z
Parse as date type because format for timestamp is not matched to values
commit e18841b38050ac411a507a2a2643584f2c8739ce
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T21:53:11Z
Fix tests
commit 8db023834b680f336ff5a0e08253ba2cb3b6e3b7
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-25T23:09:20Z
CSVSuite passed
commit 0b9ed92a456d60db0934340f37e0bd428b2f6a42
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-26T22:00:10Z
Fix imports
commit 799ebb3432dec7fe1e1099d68a3f1c09e714aa8e
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-26T22:03:19Z
Revert test back
commit 5a223919439e2d22814b92c0e1e572b3c318566f
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-26T22:17:11Z
Set timeZone
commit f287b77d94de9e9f466c0ff2c2370f22a46b48f7
Author: Maxim Gekk <max.gekk@...>
Date: 2018-11-26T22:44:42Z
Removing default for micros because it causes conflicts in parsing
----