GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/11550

    [SPARK-13667][SQL] Support for specifying custom date format for date and 
timestamp types at CSV datasource.

    ## What changes were proposed in this pull request?
    
    This PR adds support for specifying a custom date format for `DateType` and 
`TimestampType`.
    
    For `TimestampType`, the given format is used both to infer the schema and to 
convert the values.
    For `DateType`, the given format is used to convert the values.
    If `dateFormat` is not given, parsing falls back to `Timestamp.valueOf()` 
and `Date.valueOf()` for backwards compatibility.
    When it is given, `SimpleDateFormat` is used to parse the data.
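
The fallback described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code in this PR: the class `DateFormatFallbackSketch` and its methods are hypothetical, and a `null` `dateFormat` stands in for "the option was not given".

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical helper class for illustration; not part of the PR.
final class DateFormatFallbackSketch {

    // dateFormat == null is this sketch's stand-in for "option not given".
    static Timestamp parseTimestamp(String value, String dateFormat) {
        if (dateFormat == null) {
            // Backwards-compatible path: expects "yyyy-[m]m-[d]d hh:mm:ss[.f...]".
            return Timestamp.valueOf(value);
        }
        return new Timestamp(parseMillis(value, dateFormat));
    }

    static Date parseDate(String value, String dateFormat) {
        if (dateFormat == null) {
            // Backwards-compatible path: expects "yyyy-[m]m-[d]d".
            return Date.valueOf(value);
        }
        return new Date(parseMillis(value, dateFormat));
    }

    // Parses with SimpleDateFormat in the JVM default time zone.
    private static long parseMillis(String value, String dateFormat) {
        try {
            return new SimpleDateFormat(dateFormat).parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Cannot parse '" + value + "' with format '" + dateFormat + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTimestamp("2015-08-26 18:00:00", null));
        System.out.println(parseTimestamp("26/08/2015 18:00", "dd/MM/yyyy HH:mm"));
        System.out.println(parseDate("26/08/2015", "dd/MM/yyyy"));
    }
}
```

Both paths interpret the text in the JVM default time zone, so in this sketch the same wall-clock value yields the same `Timestamp` whichever path is taken.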
    
    In addition, `IntegerType`, `DoubleType` and `LongType` have a higher 
priority than `TimestampType` in type inference. This means that even if the given 
format is `yyyy` or `yyyy.MM`, matching values will be inferred as `IntegerType` or 
`DoubleType`. Since this is only type inference, I think it is okay to give the 
numeric types such precedence.
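
To illustrate why a `yyyy` or `yyyy.MM` value never reaches the timestamp check, here is a minimal Java sketch of a precedence-ordered inference. It is an assumption-laden illustration, not the inference code in this PR: the class `InferenceOrderSketch`, the method `inferField`, the string type names, and the relative order among the numeric types are all hypothetical; the only thing taken from the description is that the numeric types are tried before `TimestampType`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical sketch for illustration; not part of the PR.
final class InferenceOrderSketch {

    // Returns the name of the first type whose parser accepts the value.
    static String inferField(String value, String dateFormat) {
        // Numeric types are tried first (their relative order is an assumption).
        try {
            Integer.parseInt(value);
            return "IntegerType";
        } catch (NumberFormatException e) {
            // not an int, fall through
        }
        try {
            Long.parseLong(value);
            return "LongType";
        } catch (NumberFormatException e) {
            // not a long, fall through
        }
        try {
            Double.parseDouble(value);
            return "DoubleType";
        } catch (NumberFormatException e) {
            // not a double, fall through
        }
        // Only now is the custom date format consulted.
        try {
            new SimpleDateFormat(dateFormat).parse(value);
            return "TimestampType";
        } catch (ParseException e) {
            return "StringType";
        }
    }

    public static void main(String[] args) {
        System.out.println(inferField("2015", "yyyy"));
        System.out.println(inferField("2015.08", "yyyy.MM"));
        System.out.println(inferField("2015-08", "yyyy-MM"));
    }
}
```

In this sketch, `2015` with format `yyyy` is accepted by the integer parser and `2015.08` with format `yyyy.MM` by the double parser, so neither is ever tested against the date format; `2015-08` with format `yyyy-MM` fails every numeric parser and is therefore matched as a timestamp.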
    
    In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema`, as the JSON 
datasource has `json.InferSchema`. Although the two classes now share the same 
name, I did this because the parent package name still differentiates them.  
Accordingly, the suite name was also changed from `CSVInferSchemaSuite` to 
`InferSchemaSuite`.
    
    
    
    ## How was this patch tested?
    
    Tested with unit tests, and with `./dev/run_tests` for coding style checks.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-13667

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11550
    
----
commit 5c990cd8c996c9a624439749d8809624c2457051
Author: hyukjinkwon <[email protected]>
Date:   2016-03-07T01:16:07Z

    Support for specifying custom date format for date and timestamp types.

----


