GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/11550
[SPARK-13667][SQL] Support for specifying custom date format for date and
timestamp types at CSV datasource.
## What changes were proposed in this pull request?
This PR adds support for specifying a custom date format for `DateType` and
`TimestampType`.
For `TimestampType`, this uses the given format both to infer the schema and to
convert the values.
For `DateType`, this uses the given format to convert the values.
If `dateFormat` is not given, it falls back to `Timestamp.valueOf()`
and `Date.valueOf()` for backwards compatibility.
When it is given, `SimpleDateFormat` is used to parse the data.
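The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `DateFormatSketch` and the helpers `parseTimestamp` and `parseDate` are hypothetical names, and a `null` format stands in for "the `dateFormat` option was not given".

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DateFormatSketch {
    // Hypothetical helper: dateFormat == null means the option was not set.
    static Timestamp parseTimestamp(String value, String dateFormat) {
        if (dateFormat == null) {
            // Backwards-compatible path: expects "yyyy-[m]m-[d]d hh:mm:ss[.f...]".
            return Timestamp.valueOf(value);
        }
        try {
            // Custom path: parse with the user-supplied pattern.
            return new Timestamp(new SimpleDateFormat(dateFormat).parse(value).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse timestamp: " + value, e);
        }
    }

    // Hypothetical helper with the same fallback rule for dates.
    static Date parseDate(String value, String dateFormat) {
        if (dateFormat == null) {
            // Backwards-compatible path: expects "yyyy-[m]m-[d]d".
            return Date.valueOf(value);
        }
        try {
            return new Date(new SimpleDateFormat(dateFormat).parse(value).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse date: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTimestamp("2015-08-26 18:00:00", null));
        System.out.println(parseTimestamp("26/08/2015 18:00", "dd/MM/yyyy HH:mm"));
        System.out.println(parseDate("26/08/2015", "dd/MM/yyyy"));
    }
}
```

Both paths interpret the input in the JVM's default time zone, so the same wall-clock value yields the same result with or without a custom format.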
In addition, `IntegerType`, `DoubleType` and `LongType` have a higher
priority than `TimestampType` in type inference. This means that even if the given
format is `yyyy` or `yyyy.MM`, such values will be inferred as `IntegerType` or
`DoubleType`. Since this is type inference, I think it is okay to give the numeric
types precedence.
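To illustrate the precedence described above, here is a simplified per-value sketch. It is not the inference code in this PR: `InferenceOrderSketch` and `inferField` are hypothetical names, the relative order among the numeric types is an assumption, and real schema inference also has to merge types across rows, which is omitted here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class InferenceOrderSketch {
    // Hypothetical per-value inference: numeric types are tried before timestamps.
    static String inferField(String value, String dateFormat) {
        if (isInt(value)) return "IntegerType";
        if (isLong(value)) return "LongType";
        if (isDouble(value)) return "DoubleType";
        if (isTimestamp(value, dateFormat)) return "TimestampType";
        return "StringType";
    }

    static boolean isInt(String v) {
        try { Integer.parseInt(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isLong(String v) {
        try { Long.parseLong(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String v) {
        try { Double.parseDouble(v); return true; } catch (NumberFormatException e) { return false; }
    }

    // Only the custom-format case is sketched; dateFormat must be non-null here.
    static boolean isTimestamp(String v, String dateFormat) {
        try {
            new SimpleDateFormat(dateFormat).parse(v);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inferField("2015", "yyyy"));       // numeric check wins
        System.out.println(inferField("2015.08", "yyyy.MM")); // numeric check wins
        System.out.println(inferField("2015/08", "yyyy/MM")); // not numeric, matches the format
    }
}
```

Because the numeric checks run first, a value such as `2015` never reaches the timestamp check, regardless of the format supplied.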
In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema`, since the JSON
datasource has `json.InferSchema`. Although the two classes now share a name, I did
this because the parent package name still differentiates them.
Accordingly, the suite name was also changed from `CSVInferSchemaSuite` to
`InferSchemaSuite`.
## How was this patch tested?
Unit tests are used, along with `./dev/run_tests` for coding style checks.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-13667
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11550.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11550
----
commit 5c990cd8c996c9a624439749d8809624c2457051
Author: hyukjinkwon <[email protected]>
Date: 2016-03-07T01:16:07Z
Support for specifying custom date format for date and timestamp types.
----