MaxGekk commented on PR #56840:
URL: https://github.com/apache/spark/pull/56840#issuecomment-4823680379

   Note for reviewers: unlike the master PR (#56831), I omitted the end-to-end 
`DataFrameSuite` test here.
   
   On `branch-3.5`, constructing a `DataFrame` over a null-named column 
(`Dataset.ofRows(spark, <relation>)`) eagerly builds a `RowEncoder` whose 
serializer evaluates `CreateNamedStruct.dataType` -> 
`StructField(name.toString, ...)`, which throws an NPE on the null field name 
*before* column resolution runs:
   
   ```
   java.lang.NullPointerException
     at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.$anonfun$dataType$9(complexTypeCreator.scala:457)
     ...
     at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
   ```
   
   That encoder NPE is a separate, pre-existing issue (it also reproduces on 
master via `createDataFrame` with a null field name) and is out of scope for 
this backport; I filed SPARK-57729 to track it. The catalyst-level 
`AttributeResolutionSuite` test (which directly exercises the `AttributeSeq` 
name maps fixed here) is the regression guard for this change and passes.
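
For illustration only, here is a minimal sketch of the failure mode described above. It does not use Spark: `Field`, `toField`, and `buildFields` are hypothetical stand-ins, and the only thing taken from the trace is the `name.toString` call on a possibly-null name. The point is that an eager conversion fails on the first null name, before any later step gets a chance to run.

```scala
import scala.util.Try

object NullFieldNameSketch {
  // Hypothetical stand-in for a struct field; this is NOT Spark's StructField.
  final case class Field(name: String)

  // Mirrors the `name.toString` call from the stack trace: invoking a
  // method on a null reference throws a NullPointerException.
  def toField(name: Any): Field = Field(name.toString)

  // Converts every name eagerly, so a single null name aborts the whole build.
  def buildFields(names: Seq[Any]): Try[Seq[Field]] = Try(names.map(toField))

  def main(args: Array[String]): Unit = {
    println(buildFields(Seq("a", "b")).isSuccess)  // non-null names succeed
    println(buildFields(Seq("a", null)).isFailure) // null name fails with an NPE
  }
}
```

A null-safe variant would have to decide what a null name means (reject it with a clear error, or substitute a placeholder) before calling `toString`; which of those is appropriate for the encoder is left to SPARK-57729.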


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

