Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23042#discussion_r234403917
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
---
@@ -138,6 +138,11 @@ object TypeCoercion {
case (DateType, TimestampType)
=> if (conf.compareDateTimestampInTimestamp) Some(TimestampType)
else Some(StringType)
+ // to support a popular use case of tables using Decimal(X, 0) for
long IDs instead of strings
+ // see SPARK-26070 for more details
+ case (n: DecimalType, s: StringType) if n.scale == 0 =>
Some(DecimalType(n.precision, n.scale))
--- End diff --
I agree with you when you say that it is "too strict", as this is an
extreme approach. But that's how Postgres works and I do believe it has some
benefits over other behaviors. I'd argue just a couple of things about what you
are suggesting:
- if you are comparing an int and a long, you most likely have to better
define your schemas (not always true, of course). For instance in Oracle if you
are not careful you end up pretty easily having a variable like YEAR being
stored as an integer in some tables, as a string in some others and so on.
Because Oracle does implicit casting so nobody realizes this bad design. With
Postgres you do realize if you have a messy schema design like that. If it is
not the case and your schemas are fine and you do need a casting, you can
always add it by the way;
- I think it is very hard to determine which are the "definitely safe"
type coercions. For instance is casting a DOUBLE to a DECIMAL definitely safe?
And an INT to STRING? We may say that casting an INT to STRING is safe, because
there is no error at all in the conversion, but if we have something like `2014
= '2014 '` should we return true or false? Most likely the users wants a true
there, and most likely we would return false, which the user may (or may even
not) realize only at the end of the job (which may mean several hours).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]