Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/23042#discussion_r234431689
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -138,6 +138,11 @@ object TypeCoercion {
     case (DateType, TimestampType)
       => if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)
+    // to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
+    // see SPARK-26070 for more details
+    case (n: DecimalType, s: StringType) if n.scale == 0 =>
+      Some(DecimalType(n.precision, n.scale))
--- End diff --
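
A minimal sketch of the query shape this rule targets, assuming a hypothetical
`events` view whose long IDs are stored as DECIMAL(20, 0) and compared against
a string value (the view, its values, and the object name are illustrative,
not taken from the PR):

    import org.apache.spark.sql.SparkSession

    object DecimalIdSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("decimal-id-coercion-sketch")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical table: long IDs stored as DECIMAL(20, 0) rather than BIGINT or STRING.
        spark.sql("CREATE TEMPORARY VIEW events AS " +
          "SELECT CAST(1234567890123456789 AS DECIMAL(20, 0)) AS id, 'a' AS payload")

        // The predicate compares DECIMAL(20, 0) against a STRING literal. With the
        // proposed rule the common comparison type stays DECIMAL(20, 0), so the string
        // side is cast to decimal; without it both sides are promoted to a wider type
        // (DOUBLE), which can silently lose precision for long IDs.
        spark.sql("SELECT * FROM events WHERE id = '1234567890123456789'").show()

        spark.stop()
      }
    }
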
I personally agree with @cloud-fan that there are a few coercions that are
"definitely safe", and since users are not always in control of their input
tables, I believe convenience matters more than strict schema definitions. Also,
since even count() returns a bigint, you would otherwise have to write filters
like 'count(*) > 100L', which would be a huge regression.
I believe the "definitely safe" list is very short and we should use it.
@mgaido91, in your examples I do agree that Double to Decimal is not safe, and
neither is String to almost anything.
The trivially safe pairs are things like (Long, Int), (Int, Double), and
(Decimal, Decimal) where one side can be widened to the other's precision and
scale, maybe (Date, Timestamp)..
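
To make the count() point concrete, a small sketch (the `events` view and the
object name are hypothetical): count(*) returns BIGINT while the literal 100 is
an INT, so the query relies on the safe Int-to-Long widening instead of forcing
users to write 100L or an explicit CAST.

    import org.apache.spark.sql.SparkSession

    object CountFilterSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("count-literal-coercion-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        Seq(("a", 1), ("a", 2), ("b", 3))
          .toDF("key", "value")
          .createOrReplaceTempView("events")

        // count(*) is BIGINT while the literal 100 is INT; the "definitely safe"
        // (Long, Int) widening lets this query keep working without an explicit
        // 100L or CAST on the literal.
        spark.sql(
          "SELECT key, count(*) AS cnt FROM events GROUP BY key HAVING count(*) > 100"
        ).show()

        spark.stop()
      }
    }
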
---