GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/20621
[SPARK-23436][SQL] Infer partition as Date only if it can be casted to Date
## What changes were proposed in this pull request?
Before the patch, Spark could infer a partition value as Date even when it cannot
be cast to Date (this can happen when there are extra characters after a
valid date, like `2018-02-15AAA`).
When this happens and the input format has metadata that defines the schema
of the table, `null` is returned as the value for the partition column,
because the `cast` operator used in
`PartitioningAwareFileIndex.inferPartitioning` is unable to convert the value.
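For context, a minimal reproduction sketch of the scenario described above (not taken from the PR itself; the local path and the column names `id` and `dt` are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Assumes a local SparkSession for experimentation.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-23436-repro")
  .getOrCreate()
import spark.implicits._

val path = "/tmp/spark-23436-repro"

// Each partition value starts with a valid date but has trailing characters.
Seq((1, "2018-02-15AAA"), (2, "2018-02-16BBB"))
  .toDF("id", "dt")
  .write
  .mode("overwrite")
  .partitionBy("dt")
  .parquet(path)

// Before the patch, partition inference could pick Date for `dt`; the later
// cast of "2018-02-15AAA" to Date fails, so the column is read back as null.
// With the patch, `dt` is inferred as String and the original values survive.
spark.read.parquet(path).show()
```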
The PR makes the partition inference check that values can actually be cast to
Date or Timestamp before inferring those data types for them.
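To illustrate the kind of strict check this relies on (a standalone sketch, not the code added by the PR; the helper name `inferAsDate` is hypothetical):

```scala
import java.sql.Date
import scala.util.Try

// Only accept a raw partition value as a Date if the whole string parses;
// java.sql.Date.valueOf rejects trailing characters such as "AAA".
def inferAsDate(raw: String): Option[Date] =
  Try(Date.valueOf(raw)).toOption

inferAsDate("2018-02-15")     // Some(2018-02-15)
inferAsDate("2018-02-15AAA")  // None -> fall back to inferring the column as String
```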
## How was this patch tested?
Added a unit test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-23436
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20621.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20621
----
commit 2f05ab8e82b0940e84cbe407abe49f72cddeef11
Author: Marco Gaido <marcogaido91@...>
Date: 2018-02-15T16:59:20Z
[SPARK-23436][SQL] Infer partition as Date only if it can be casted to Date
----