viirya commented on a change in pull request #34038:
URL: https://github.com/apache/spark/pull/34038#discussion_r714490029
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##########
@@ -401,15 +401,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
               |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
             """.stripMargin.replace("\n", " ").trim())
         }
+        val isUnion = operator.isInstanceOf[Union]
+        val dataTypesAreCompatibleFn = if (isUnion) {
+          // `TypeCoercion` takes care of type coercion already. If any columns or nested
+          // columns are not compatible, we detect it here and throw analysis exception.
+          val typeChecker = (dt1: DataType, dt2: DataType) => {
+            !TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty
Review comment:
Oh, I spent a little time recalling why I kept the original check logic.
It is because if `TypeCoercion` fails to find compatible types for any column, it won't add casts for any of them; the logic there is all-or-nothing. So if we only checked `dt1 == dt2` here, we would be comparing the original data types even when some of them are compatible.
`AnalysisErrorSuite` has one example: one relation has `short, string, double, decimal`, the other has `string, string, string, map`. The first three column pairs are compatible; only the fourth isn't, so `TypeCoercion` fails to add casts for all of them.
If we compared `dt1 == dt2`, the error would read like "short is not compatible with string", but currently we get "decimal is not compatible with map".
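If anyone wants to double-check this behavior, here is a minimal, self-contained sketch of the per-column check (the decimal precision/scale is chosen arbitrarily; the column types are the ones from the `AnalysisErrorSuite` example):

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCoercion
import org.apache.spark.sql.types._

// Column types from the `AnalysisErrorSuite` example above
// (precision/scale of the decimal is arbitrary, for illustration only).
val left  = Seq(ShortType, StringType, DoubleType, DecimalType(10, 0))
val right = Seq(StringType, StringType, StringType, MapType(StringType, StringType))

left.zip(right).zipWithIndex.foreach { case ((dt1, dt2), i) =>
  // A pair is compatible iff `TypeCoercion` can find a wider common type for it.
  val compatible = TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isDefined
  println(s"column ${i + 1}: ${dt1.simpleString} vs ${dt2.simpleString} -> $compatible")
}
// Columns 1-3 are compatible (each pair widens to string); only column 4
// (decimal vs. map) is not, so the error should point at the fourth pair.
```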
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]