GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20764

    [SPARK-23436][SQL][BACKPORT-2.3] Infer partition as Date only if it can be 
casted to Date

    This PR backports https://github.com/apache/spark/pull/20621 to branch 2.3.
    
    
    ---
    ## What changes were proposed in this pull request?
    
    Before the patch, Spark could infer a partition value as Date even when it 
cannot be cast to Date (this can happen when there are extra characters after a 
valid date, like `2018-02-15AAA`).
    
    When this happens and the input format has metadata that defines the schema 
of the table, `null` is returned as the value of the partition column, 
because the `cast` operator used in 
`PartitioningAwareFileIndex.inferPartitioning` is unable to convert the value.
    
    This PR makes partition inference check that a value can actually be cast 
to Date or Timestamp before inferring that data type for it.
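
The idea can be illustrated outside Spark with a minimal Python sketch. This is not Spark's code: the function names are hypothetical, the "lenient" check assumes a parser that ignores trailing characters, and `strict_cast_to_date` is only a stand-in for the `cast` operator, which yields no value when the whole string is not a valid date.

```python
from datetime import date, datetime
from typing import Optional, Union


def lenient_is_date(raw: str) -> bool:
    """Assumed pre-patch style check: only the first 10 characters must form a date."""
    try:
        datetime.strptime(raw[:10], "%Y-%m-%d")
        return True
    except ValueError:
        return False


def strict_cast_to_date(raw: str) -> Optional[date]:
    """Stand-in for a cast: the whole string must be a valid date, else None."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return None


def infer_partition_value(raw: str) -> Union[date, str]:
    """Infer a date only if the strict cast succeeds; otherwise keep the raw string."""
    casted = strict_cast_to_date(raw)
    return casted if casted is not None else raw


if __name__ == "__main__":
    value = "2018-02-15AAA"
    # The lenient check accepts the value, but the strict cast cannot convert it,
    # which is the mismatch that surfaced as a null partition value.
    print(lenient_is_date(value), strict_cast_to_date(value))
    # With the cast-based check, the value stays a string instead of becoming null.
    print(repr(infer_partition_value(value)))
    print(repr(infer_partition_value("2018-02-15")))
```

Tying the inference decision to the same conversion that is later applied to the value means the inferred type and the converted value cannot disagree.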
    
    ## How was this patch tested?
    
    Added unit test coverage.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark backport23436

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20764.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20764
    
----
commit a69d8d19b01438bc228deb6706c2dc59817a1cfd
Author: Marco Gaido <marcogaido91@...>
Date:   2018-02-20T05:56:38Z

    [SPARK-23436][SQL] Infer partition as Date only if it can be casted to Date
    
    ## What changes were proposed in this pull request?
    
    Before the patch, Spark could infer a partition value as Date even when it 
cannot be cast to Date (this can happen when there are extra characters after a 
valid date, like `2018-02-15AAA`).
    
    When this happens and the input format has metadata that defines the schema 
of the table, `null` is returned as the value of the partition column, 
because the `cast` operator used in 
`PartitioningAwareFileIndex.inferPartitioning` is unable to convert the value.
    
    This PR makes partition inference check that a value can actually be cast 
to Date or Timestamp before inferring that data type for it.
    
    ## How was this patch tested?
    
    Added unit test coverage.
    
    Author: Marco Gaido <marcogaid...@gmail.com>
    
    Closes #20621 from mgaido91/SPARK-23436.

----

