MaxGekk commented on PR #56840:
URL: https://github.com/apache/spark/pull/56840#issuecomment-4823680379
Note for reviewers: unlike the master PR (#56831), I omitted the end-to-end
`DataFrameSuite` test here.
On `branch-3.5`, constructing a `DataFrame` over a null-named column
(`Dataset.ofRows(spark, <relation>)`) eagerly builds a `RowEncoder` whose
serializer evaluates `CreateNamedStruct.dataType` ->
`StructField(name.toString, ...)`, which throws an NPE on the null field name
*before* column resolution runs:
```
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.CreateNamedStruct.$anonfun$dataType$9(complexTypeCreator.scala:457)
    ...
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
```
That encoder NPE is a separate, pre-existing issue (it also reproduces on
master via `createDataFrame` with a null field name) and is out of scope for
this backport; I filed SPARK-57729 to track it. The catalyst-level
`AttributeResolutionSuite` test (which directly exercises the `AttributeSeq`
name maps fixed here) is the regression guard for this change and passes.
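
For reviewers who want to see the failure mode in isolation, here is a minimal sketch in plain Scala with no Spark dependency. `FieldSketch`, `NullFieldNameSketch`, and `fieldsFor` are hypothetical stand-ins invented for illustration, not Spark classes; the only thing borrowed from the description above is the `name.toString` call shape, which throws as soon as it is applied to a null name.

```scala
import scala.util.Try

// Hypothetical, simplified stand-in; this is NOT Spark's StructField.
final case class FieldSketch(name: String)

object NullFieldNameSketch {
  // Mirrors the `StructField(name.toString, ...)` shape quoted above:
  // `toString` on a null reference throws before any field is constructed.
  def fieldsFor(names: Seq[Any]): Seq[FieldSketch] =
    names.map(n => FieldSketch(n.toString))

  def main(args: Array[String]): Unit = {
    val ok = Try(fieldsFor(Seq("a", "b")))
    val failed = Try(fieldsFor(Seq("a", null)))

    // Non-null names build fields normally.
    println(ok.isSuccess)
    // A null name fails with a NullPointerException.
    println(failed.failed.toOption.exists(_.isInstanceOf[NullPointerException]))
  }
}
```

In this sketch the exception is raised while the field list is being built, which is analogous to the encoder failing before column resolution ever runs. It says nothing about how the real encoder should handle null names; that is left to SPARK-57729.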