GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/19389
[SPARK-22165][SQL] Resolve type conflicts between decimals, dates and
timestamps in partition column
## What changes were proposed in this pull request?
This PR proposes to reuse `TypeCoercion.findWiderCommonType` when
resolving type conflicts in partition values. Currently, this resolution uses a
numeric-precedence-like comparison, which appears to introduce failures for type
conflicts between timestamps, dates and decimals. Please see the current logic:
```scala
private val upCastingOrder: Seq[DataType] =
  Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)
...
literals.map(_.dataType).maxBy(upCastingOrder.indexOf(_))
```
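To illustrate why this ordering misbehaves, here is a minimal sketch that runs without Spark. The type objects below are hypothetical stand-ins for Spark's `DataType` classes, and `resolve` is an illustrative name that mirrors the `maxBy` call above. Since `indexOf` returns `-1` for any type missing from `upCastingOrder`, dates, timestamps and decimals all rank below every listed type and tie with each other, so `maxBy` keeps the first one it sees.

```scala
object UpCastingOrderDemo {
  // Hypothetical stand-ins for Spark's DataType classes (assumption, not Spark code).
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType
  case object DateType extends DataType
  case object TimestampType extends DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType

  val upCastingOrder: Seq[DataType] =
    Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)

  // Same shape as the existing resolution: pick the type with the highest index.
  // Types that are not in the list get index -1.
  def resolve(types: Seq[DataType]): DataType =
    types.maxBy(upCastingOrder.indexOf(_))

  def main(args: Array[String]): Unit = {
    // Both types have index -1, so the first element wins the tie.
    println(resolve(Seq(DateType, TimestampType)))
    // IntegerType has index 1, the decimal has index -1.
    println(resolve(Seq(IntegerType, DecimalType(30, 0))))
  }
}
```

In this model the first call yields `DateType` and the second yields `IntegerType`, which lines up with the date and integer columns in the "Before" schemas in this description.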
The code below:
```scala
val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
df.write.format("parquet").partitionBy("ts").save("/tmp/foo")
spark.read.load("/tmp/foo").printSchema()
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")
df.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
spark.read.load("/tmp/bar").printSchema()
```
produces the following output:
**Before**
```
root
|-- i: integer (nullable = true)
|-- ts: date (nullable = true)
root
|-- i: integer (nullable = true)
|-- decimal: integer (nullable = true)
```
**After**
```
root
|-- i: integer (nullable = true)
|-- ts: timestamp (nullable = true)
root
|-- i: integer (nullable = true)
|-- decimal: decimal(30,0) (nullable = true)
```
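The "After" result can be understood as folding a pairwise "wider type" function over the inferred types instead of ranking them by a fixed list. The sketch below is a simplified model under stated assumptions: the types are hypothetical stand-ins, `wider` and `resolve` are illustrative names, the rules cover only the cases in this description, and the fallback to a string type is an assumption of the sketch. It is not the implementation of `TypeCoercion.findWiderCommonType`.

```scala
object WiderTypeSketch {
  // Hypothetical stand-ins for Spark's DataType classes (assumption, not Spark code).
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object DateType extends DataType
  case object TimestampType extends DataType
  case object StringType extends DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType

  // Pairwise widening: a type able to hold values of both inputs, if this toy model knows one.
  def wider(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (NullType, y) => Some(y)
    case (x, NullType) => Some(x)
    case (DateType, TimestampType) | (TimestampType, DateType) => Some(TimestampType)
    // Treat a 32-bit integer as a decimal with 10 digits and no fraction.
    case (IntegerType, d: DecimalType) => wider(DecimalType(10, 0), d)
    case (d: DecimalType, IntegerType) => wider(d, DecimalType(10, 0))
    case (DecimalType(p1, s1), DecimalType(p2, s2)) =>
      // Keep enough integral and fractional digits for both sides
      // (no upper bound on precision in this sketch).
      val scale = math.max(s1, s2)
      val intDigits = math.max(p1 - s1, p2 - s2)
      Some(DecimalType(intDigits + scale, scale))
    case _ => None
  }

  // Fold the pairwise rule over all inferred types; fall back to StringType
  // when no common type exists (an assumption of this sketch).
  def resolve(types: Seq[DataType]): DataType =
    types
      .foldLeft[Option[DataType]](Some(NullType))((acc, t) => acc.flatMap(wider(_, t)))
      .getOrElse(StringType)

  def main(args: Array[String]): Unit = {
    println(resolve(Seq(DateType, TimestampType)))
    println(resolve(Seq(IntegerType, DecimalType(30, 0))))
  }
}
```

Unlike an index lookup in a fixed list, a pairwise rule can express relationships such as "a timestamp can represent any date" and "a 30-digit decimal can represent any integer", which is what the "After" schemas reflect.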
## How was this patch tested?
Unit tests added in `ParquetPartitionDiscoverySuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark partition-type-coercion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19389.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19389
----
commit 1e10336128bf1e78a889ee4438e4519bb12bd84a
Author: hyukjinkwon <[email protected]>
Date: 2017-09-29T05:18:05Z
Resolve type conflicts between decimals, dates and timestamps in partition
column
----