Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23042#discussion_r234403917
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
    @@ -138,6 +138,11 @@ object TypeCoercion {
         case (DateType, TimestampType)
           => if (conf.compareDateTimestampInTimestamp) Some(TimestampType) 
else Some(StringType)
     
    +    // to support a popular use case of tables using Decimal(X, 0) for 
long IDs instead of strings
    +    // see SPARK-26070 for more details
    +    case (n: DecimalType, s: StringType) if n.scale == 0 => 
Some(DecimalType(n.precision, n.scale))
    --- End diff --
    
    I agree with you when you say that it is "too strict", as this is an 
extreme approach. But that's how Postgres works and I do believe it has some 
benefits over other behaviors. I'd argue just a couple of things about what you 
are suggesting:
     - if you are comparing an int and a long, you most likely have to better 
define your schemas (not always true, of course). For instance in Oracle if you 
are not careful you end up pretty easily having a variable like YEAR being 
stored as an integer in some tables, as a string in some others and so on. 
Because Oracle does implicit casting so nobody realizes this bad design. With 
Postgres you do realize if you have a messy schema design like that. If it is 
not the case and your schemas are fine and you do need a casting, you can 
always add it by the way;
     - I think it is very hard to determine which are the "definitely safe" 
type coercions. For instance is casting a DOUBLE to a DECIMAL definitely safe? 
And an INT to STRING? We may say that casting an INT to STRING is safe, because 
there is no error at all in the conversion, but if we have something like `2014 
= '2014 '` should we return true or false? Most likely the users wants a true 
there, and most likely we would return false, which the user may (or may even 
not) realize only at the end of the job (which may mean several hours).
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to