HyukjinKwon opened a new pull request #27991: [SPARK-31227][SQL] Non-nullable null type in complex types should not coerce to nullable type
URL: https://github.com/apache/spark/pull/27991
 
 
   ### What changes were proposed in this pull request?
   
    This PR prevents the non-nullable null type in complex types from being coerced to a nullable type.
   
    A non-nullable null type for a struct field, an array element, or a map entry can only arise from an empty struct, array, or map. Since those containers are empty, there is no need to force nullability when finding a common type.
   
   ### Why are the changes needed?
   
    To make type coercion coherent and consistent. Currently, we already correctly keep the nullability between non-nullable fields:
   
   ```scala
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.functions._
    spark.range(1).select(array(lit(1)).cast(ArrayType(IntegerType, false))).printSchema()
    spark.range(1).select(array(lit(1)).cast(ArrayType(DoubleType, false))).printSchema()
   ```
   ```scala
   spark.range(1).selectExpr("concat(array(1), array(1)) as arr").printSchema()
   ```
   
   ### Does this PR introduce any user-facing change?
   
   Yes.
   
   
   ```scala
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.functions._
    spark.range(1).select(array().cast(ArrayType(IntegerType, false))).printSchema()
   ```
   ```scala
   spark.range(1).selectExpr("concat(array(), array(1)) as arr").printSchema()
   ```
   
   **Before:**
   
   ```
    org.apache.spark.sql.AnalysisException: cannot resolve 'array()' due to data type mismatch: cannot cast array<null> to array<int>;;
    'Project [cast(array() as array<int>) AS array()#68]
    +- Range (0, 1, step=1, splits=Some(12))

      at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:149)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140)
      at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333)
      at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:330)
      at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237)
   ```
   
   ```
   root
    |-- arr: array (nullable = false)
    |    |-- element: integer (containsNull = true)
   ```
   
   **After:**
   
   ```
   root
    |-- array(): array (nullable = false)
    |    |-- element: integer (containsNull = false)
   ```
   
   ```
   root
    |-- arr: array (nullable = false)
    |    |-- element: integer (containsNull = false)
   ```
   
   
   ### How was this patch tested?
   
    Unit tests were added, and the change was verified manually.
