MaxGekk opened a new pull request, #56845:
URL: https://github.com/apache/spark/pull/56845

   ### What changes were proposed in this pull request?
   
   `CreateNamedStruct.dataType` builds each field with `StructField(name.toString, ...)`:
   
   ```scala
   override lazy val dataType: StructType = {
     val fields = names.zip(valExprs).map {
       case (name, expr) =>
         ...
         // NPE if name == null
         StructField(name.toString, expr.dataType, expr.nullable, metadata)
     }
     StructType(fields)
   }
   ```
   
   When a field name is `null`, `name.toString` throws a `NullPointerException`.
The encoder reaches this code eagerly while building a `RowEncoder` serializer
(`SerializerBuildHelper.createSerializerForObject` ->
`CreateNamedStruct(...).dataType`), so the crash happens before any analysis runs.
This PR makes the field-name conversion null-safe and preserves the null name:
   
   ```scala
   StructField(if (name == null) null else name.toString,
     expr.dataType, expr.nullable, metadata)
   ```
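   
   To illustrate the difference between the two conversions outside of Spark,
here is a minimal, self-contained sketch. `FieldSketch` and `NullSafeFieldName`
are hypothetical stand-ins invented for this example; they are not part of the
Spark code base and only model the field name.
   
   ```scala
   // Hypothetical stand-in for Spark's StructField; only the name is modeled.
   final case class FieldSketch(name: String)

   object NullSafeFieldName {
     // Unguarded conversion, as in the original dataType: throws on null.
     def unsafeName(name: Any): String = name.toString

     // Guarded conversion, mirroring the expression used in this PR.
     def safeName(name: Any): String =
       if (name == null) null else name.toString

     // Builds one field per evaluated name, keeping null names as null.
     def buildFields(names: Seq[Any]): Seq[FieldSketch] =
       names.map(n => FieldSketch(safeName(n)))
   }
   ```
   
   With these definitions, `NullSafeFieldName.unsafeName(null)` throws a
`NullPointerException`, while `NullSafeFieldName.buildFields(Seq("a", null))`
returns two fields and leaves the second name as `null`.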
   
   ### Why are the changes needed?
   
   A null field name is invalid input:
`CreateNamedStruct.checkInputDataTypes` already rejects it
(`names.contains(null)` -> `UNEXPECTED_NULL`). However, `dataType` dereferences
`name.toString` without waiting for type checking, and the encoder calls
`dataType` directly. Making `dataType` null-safe replaces the hard
`NullPointerException` with a well-defined result (a field whose name stays
null), consistent with SPARK-57725, which made `AttributeSeq` tolerate
null-named attributes.
   
   Minimal reproduction:
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Literal}
   import org.apache.spark.sql.types.{IntegerType, StringType}
   
   // NPE before this fix
   CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1))).dataType
   ```
   
   Note: this fixes the specific `CreateNamedStruct.dataType` NPE. The full 
`createDataFrame(schemaWithNullFieldName)` scenario hits additional, 
independent null-name sites further along (e.g. a 
`StructField.name.equalsIgnoreCase` schema comparison during resolution), which 
are separate pre-existing issues and out of scope here.
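   
   For illustration only, the kind of failure mentioned in the note can be
reproduced without Spark: calling `equalsIgnoreCase` on a null receiver throws,
while passing null as the argument simply returns `false`. The object and
helper names below are hypothetical, and `safeSameName` is just one possible
null-tolerant comparison, not something this PR or Spark implements.
   
   ```scala
   object NullNameComparison {
     // Unguarded comparison: throws NullPointerException when `left` is null.
     def unsafeSameName(left: String, right: String): Boolean =
       left.equalsIgnoreCase(right)

     // Hypothetical null-tolerant variant: two null names compare equal,
     // a null name never matches a non-null one.
     def safeSameName(left: String, right: String): Boolean =
       if (left == null || right == null) left == right
       else left.equalsIgnoreCase(right)
   }
   ```
   
   Whether two null names should count as equal is a design choice; the sketch
picks one option to keep the example short.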
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added a regression test in `ComplexTypeSuite` asserting `dataType` no longer 
throws and preserves the null field name.
   
   ```
   build/sbt 'catalyst/testOnly *ComplexTypeSuite'
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

