Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23042#discussion_r234155688
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -138,6 +138,11 @@ object TypeCoercion {
     case (DateType, TimestampType)
       => if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)
+    // to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
+    // see SPARK-26070 for more details
+    case (n: DecimalType, s: StringType) if n.scale == 0 =>
+      Some(DecimalType(n.precision, n.scale))
--- End diff --
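
For context, here is a minimal sketch of the effect of the added case (this is not Spark's actual coercion code, just an illustration built on Spark's type classes): when a scale-0 decimal is compared with a string, the decimal type is kept as the common type, so long IDs stored as Decimal(X, 0) keep their exact value in the comparison.

    import org.apache.spark.sql.types.{DataType, DecimalType, StringType}

    // Minimal sketch (not Spark's actual implementation) of the added case:
    // for a scale-0 decimal compared with a string, the decimal type wins as the common type.
    def commonTypeForComparison(left: DataType, right: DataType): Option[DataType] =
      (left, right) match {
        case (n: DecimalType, StringType) if n.scale == 0 =>
          Some(DecimalType(n.precision, n.scale))
        case _ => None // all other combinations are omitted from this sketch
      }

    // DECIMAL(20, 0) vs STRING resolves to DECIMAL(20, 0), so the string side would be cast.
    println(commonTypeForComparison(DecimalType(20, 0), StringType)) // Some(DecimalType(20,0))
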
@cloud-fan I think we have seen many issues on this. I don't think there is
a standard for these coercions; every RDBMS has different rules. The worst thing
about the current rules IMHO is that they are not even internally consistent
within Spark itself (see #19635 for instance).
The option I'd prefer is to follow PostgreSQL's behavior, i.e. no implicit cast
at all: when there is a type mismatch, the user has to choose how to cast the
values explicitly. It is a bit more effort on the user's side, but it is the
safest option IMHO. What do you think?
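
To make that alternative concrete, here is a hypothetical sketch (the column name and values are made up for illustration) of what the user would write if no implicit coercion were applied, i.e. spelling out the cast of the string literal to the column's decimal type:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lit}
    import org.apache.spark.sql.types.DecimalType

    val spark = SparkSession.builder().master("local[*]").appName("explicit-cast-sketch").getOrCreate()
    import spark.implicits._

    // A Decimal(20, 0) "id" column, as in the use case this PR targets.
    val ids = Seq("9007199254740993").toDF("id")
      .select(col("id").cast(DecimalType(20, 0)).as("id"))

    // With no implicit coercion, the user chooses the cast: the string literal is
    // explicitly cast to the column's decimal type, keeping the comparison exact.
    ids.filter(col("id") === lit("9007199254740993").cast(DecimalType(20, 0))).show()
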
---