GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23150

    [SPARK-26178][SQL] Use java.time API for parsing timestamps and dates from CSV

    ## What changes were proposed in this pull request?
    
    In this PR, I propose using the **java.time API** to parse timestamps and
    dates from CSV content with microsecond precision. The SQL config
    `spark.sql.legacy.timeParser.enabled` allows switching back to the previous
    behaviour, which uses `java.text.SimpleDateFormat`/`FastDateFormat` for
    parsing and generating timestamps and dates.
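
To illustrate why the java.time API matters for microsecond precision, here is a minimal Java sketch. It is not the code from this PR: the class `TimeParserSketch`, its method names, the pattern, and the sample input are assumptions made for the example, and the `legacy` flag only mimics the role of `spark.sql.legacy.timeParser.enabled`. Only JDK classes are used.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Spark.
final class TimeParserSketch {
    // Assumed pattern with six fraction digits.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSSSSS";

    // java.time treats 'S' as fraction-of-second, so ".123456"
    // is kept as 123456 microseconds.
    static long parseMicrosJavaTime(String s) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(PATTERN);
        Instant instant = LocalDateTime.parse(s, f).toInstant(ZoneOffset.UTC);
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }

    // SimpleDateFormat treats 'S' as a millisecond count, so in its default
    // lenient mode ".123456" is read as 123456 ms (about 123 seconds).
    static long parseMicrosLegacy(String s) {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return f.parse(s).getTime() * 1000L; // milliseconds to microseconds
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + s, e);
        }
    }

    // The boolean stands in for a legacy-parser switch (an assumption here).
    static long parseMicros(String s, boolean legacy) {
        return legacy ? parseMicrosLegacy(s) : parseMicrosJavaTime(s);
    }

    public static void main(String[] args) {
        String input = "2018-11-24 11:37:55.123456";
        System.out.println("java.time micros: " + parseMicros(input, false));
        System.out.println("legacy micros:    " + parseMicros(input, true));
    }
}
```

With the same pattern and input, the two paths disagree on the sub-second part, which is the kind of difference a switchable legacy parser has to preserve for existing users.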
    
    ## How was this patch tested?
    
    It was tested by `UnivocityParserSuite`, `CsvExpressionsSuite`,
    `CsvFunctionsSuite` and `CsvSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 time-parser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23150
    
----
commit 74a76c2f78ad139993f3bbe0f2ff8f1c81c3bd84
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-24T11:37:55Z

    New and legacy time parser

commit 63cf6112085029c52e4aee6f9bb2e6b84ce18a96
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-24T12:00:10Z

    Add config spark.sql.legacy.timeParser.enabled

commit 2a2ab83a5ecb251ce81e7f12a8c0d3067f88b2d5
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-24T13:24:07Z

    Fallback legacy parser

commit 667bf9f65a90ac69b8cbbad77a17e21f9dd18733
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-24T15:54:19Z

    something

commit 227a7bdc53bdd022e9c365b410810c58f56e8bea
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T12:15:15Z

    Using instances

commit 73ee56088bf4d2856c454a7bbd4171b61cfe4614
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T13:52:02Z

    Added generator

commit f35f6e13270eb994ac97627da79497673b4fe686
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T18:03:17Z

    Refactoring of TimeFormatter

commit 1c09b58e6fe3e0fd565c852dcb73dc012fa56819
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T18:06:22Z

    Renaming to DateTimeFormatter

commit 7b213d5b2ae404c87f090da622a78d3d19fee6a9
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T18:32:54Z

    Added DateFormatter

commit 242ba474dcf112b48bd286811daed86a66366c39
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T19:58:08Z

    Default values in parsing

commit db48ee6918eef06e19c3bdf64e3c44f4541cc294
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T21:09:08Z

    Parse as date type because format for timestamp is not matched to values

commit e18841b38050ac411a507a2a2643584f2c8739ce
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T21:53:11Z

    Fix tests

commit 8db023834b680f336ff5a0e08253ba2cb3b6e3b7
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-25T23:09:20Z

    CSVSuite passed

commit 0b9ed92a456d60db0934340f37e0bd428b2f6a42
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-26T22:00:10Z

    Fix imports

commit 799ebb3432dec7fe1e1099d68a3f1c09e714aa8e
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-26T22:03:19Z

    Revert test back

commit 5a223919439e2d22814b92c0e1e572b3c318566f
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-26T22:17:11Z

    Set timeZone

commit f287b77d94de9e9f466c0ff2c2370f22a46b48f7
Author: Maxim Gekk <max.gekk@...>
Date:   2018-11-26T22:44:42Z

    Removing default for micros because it causes conflicts in parsing

----

