GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19389

    [SPARK-22165][SQL] Resolve type conflicts between decimals, dates and timestamps in partition column

    ## What changes were proposed in this pull request?
    
    This PR proposes to re-use `TypeCoercion.findWiderCommonType` when resolving type conflicts in partition values. Currently, this uses a numeric-precedence-like comparison, so it appears to resolve type conflicts between timestamps, dates and decimals incorrectly. The relevant logic is:
    
    ```scala
    private val upCastingOrder: Seq[DataType] =
      Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)
    ...
    literals.map(_.dataType).maxBy(upCastingOrder.indexOf(_))
    ```
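    The failure mode can be reproduced without Spark. In the minimal sketch below, a hypothetical `DType` hierarchy and `OldResolution` object stand in for Spark's `DataType` classes and partition-inference code; only the `upCastingOrder` list and the `maxBy(indexOf)` call mirror the excerpt above. Because `indexOf` returns -1 for any type missing from the list, decimals, dates and timestamps never win against a listed type, and a tie between two unlisted types goes to whichever one comes first.

    ```scala
    // Hypothetical stand-ins for Spark's DataType classes (no Spark dependency).
    sealed trait DType
    case object NullT extends DType
    case object IntT extends DType
    case object LongT extends DType
    case object FloatT extends DType
    case object DoubleT extends DType
    case object StringT extends DType
    case object DateT extends DType
    case object TimestampT extends DType
    final case class DecimalT(precision: Int, scale: Int) extends DType

    object OldResolution {
      // Same shape as `upCastingOrder`: dates, timestamps and decimals are missing.
      val upCastingOrder: Seq[DType] =
        Seq(NullT, IntT, LongT, FloatT, DoubleT, StringT)

      // `indexOf` yields -1 for unlisted types, so they lose to every listed type,
      // and `maxBy` keeps the first element among equal keys.
      def resolve(types: Seq[DType]): DType =
        types.maxBy(upCastingOrder.indexOf(_))
    }
    ```

    With this model, `OldResolution.resolve(Seq(IntT, DecimalT(30, 0)))` returns `IntT` and `OldResolution.resolve(Seq(DateT, TimestampT))` returns `DateT`, which is consistent with the **Before** schemas reported in this PR.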
    
    The code below:
    
    ```scala
    // Assumes an existing SparkSession named `spark`, as in spark-shell.
    import spark.implicits._
    
    val tsDf = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
    tsDf.write.format("parquet").partitionBy("ts").save("/tmp/foo")
    spark.read.load("/tmp/foo").printSchema()
    
    // "1" * 30 is a 30-digit number, too large for a long.
    val decimalDf = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")
    decimalDf.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
    spark.read.load("/tmp/bar").printSchema()
    ```
    
    produces the following output:
    
    **Before**
    
    ```
    root
     |-- i: integer (nullable = true)
     |-- ts: date (nullable = true)
    
    root
     |-- i: integer (nullable = true)
     |-- decimal: integer (nullable = true)
    ```
    
    **After**
    
    ```
    root
     |-- i: integer (nullable = true)
     |-- ts: timestamp (nullable = true)
    
    root
     |-- i: integer (nullable = true)
     |-- decimal: decimal(30,0) (nullable = true)
    ```
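    For comparison, the sketch below folds a pairwise widening function over the inferred types, in the spirit of `findWiderCommonType`. It is a simplified approximation, not a copy of Spark's rules: the `DType` classes, the `WiderResolution` object and the exact rule set are hypothetical. Integral types are viewed as decimals (`IntT` as `decimal(10,0)`, `LongT` as `decimal(20,0)`) before two decimals are widened by keeping the larger scale and the larger number of integral digits, and a date and a timestamp widen to a timestamp.

    ```scala
    // Hypothetical stand-ins for Spark's DataType classes (no Spark dependency).
    sealed trait DType
    case object NullT extends DType
    case object IntT extends DType
    case object LongT extends DType
    case object FloatT extends DType
    case object DoubleT extends DType
    case object StringT extends DType
    case object DateT extends DType
    case object TimestampT extends DType
    final case class DecimalT(precision: Int, scale: Int) extends DType

    object WiderResolution {
      // Precedence among the non-decimal numeric types in this model.
      private val numericOrder: Seq[DType] = Seq(IntT, LongT, FloatT, DoubleT)
      private val MaxPrecision = 38

      // Views an integral type as a decimal wide enough to hold all its values.
      private def asDecimal(t: DType): Option[DecimalT] = t match {
        case IntT        => Some(DecimalT(10, 0))
        case LongT       => Some(DecimalT(20, 0))
        case d: DecimalT => Some(d)
        case _           => None
      }

      // Keeps the larger scale and the larger number of integral digits.
      private def widerDecimal(a: DecimalT, b: DecimalT): DecimalT = {
        val scale = math.max(a.scale, b.scale)
        val range = math.max(a.precision - a.scale, b.precision - b.scale)
        DecimalT(math.min(range + scale, MaxPrecision), scale)
      }

      // Returns None when the two types have no common wider type in this model.
      def widen(a: DType, b: DType): Option[DType] = (a, b) match {
        case (x, y) if x == y => Some(x)
        case (NullT, t) => Some(t)
        case (t, NullT) => Some(t)
        case (DateT, TimestampT) | (TimestampT, DateT) => Some(TimestampT)
        case (StringT, _) | (_, StringT) => Some(StringT)
        case (_: DecimalT, FloatT | DoubleT) | (FloatT | DoubleT, _: DecimalT) =>
          Some(DoubleT)
        case (_: DecimalT, _) | (_, _: DecimalT) =>
          for (x <- asDecimal(a); y <- asDecimal(b)) yield widerDecimal(x, y)
        case _ if numericOrder.contains(a) && numericOrder.contains(b) =>
          Some(if (numericOrder.indexOf(a) >= numericOrder.indexOf(b)) a else b)
        case _ => None
      }

      // Folds pairwise widening over all types, starting from NullT.
      def resolve(types: Seq[DType]): Option[DType] =
        types.foldLeft(Option[DType](NullT))((acc, t) => acc.flatMap(widen(_, t)))
    }
    ```

    Here `resolve(Seq(IntT, DecimalT(30, 0)))` yields `Some(DecimalT(30, 0))` and `resolve(Seq(DateT, TimestampT))` yields `Some(TimestampT)` regardless of input order, matching the **After** schemas. A pair with no common type in this model, such as a date and an integer, yields `None`, leaving the fallback to the caller.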
    
    ## How was this patch tested?
    
    Unit tests added in `ParquetPartitionDiscoverySuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark partition-type-coercion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19389
    
----
commit 1e10336128bf1e78a889ee4438e4519bb12bd84a
Author: hyukjinkwon
Date:   2017-09-29T05:18:05Z

    Resolve type conflicts between decimals, dates and timestamps in partition column

----