MaxGekk opened a new pull request, #56840: URL: https://github.com/apache/spark/pull/56840
### What changes were proposed in this pull request?

This is a backport of #56831 to `branch-3.5`.

`AttributeSeq` builds case-insensitive name lookup maps (`direct`, `qualified`, `qualified3Part`, `qualified4Part`) by grouping the attributes on `attr.name.toLowerCase(Locale.ROOT)`. The grouping key function dereferences the name without a null check. A single attribute whose name is `null` therefore makes `groupBy(_.name.toLowerCase(...))` throw a `NullPointerException`, which aborts resolution of the whole operator with an `INTERNAL_ERROR` (SQLSTATE XX000) instead of resolving the other columns.

This PR introduces a `namedAttrs` lazy val that filters out null-named attributes, and builds the four name maps from it instead of from `attrs`. Positional and expression-id access (`apply(ordinal)`, `indexOf(exprId)`) still use the full `attrs`, so they are unaffected.

### Why are the changes needed?

A null-named attribute can arise on the JVM side: `StructField` permits a null name (there is no `require(name != null)`), and the name flows unchanged through `DataTypeUtils.toAttribute` into `AttributeReference`.

A null-named attribute cannot be addressed by any column reference (a reference's name parts are never null), so dropping it from the name maps cannot change the resolution of any legitimate reference. The change converts the hard `NullPointerException` into correct resolution of the remaining (named) attributes, or into a normal unresolved-column error if the null-named column is referenced.

### Does this PR introduce _any_ user-facing change?

No. It only turns an internal `NullPointerException` / `INTERNAL_ERROR` into normal column-resolution behavior.

### How was this patch tested?

Backported regression tests in `AttributeResolutionSuite` (covering the `direct`, `qualified`, `qualified3Part`, and `qualified4Part` maps) and an end-to-end test in `DataFrameSuite`.
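For readers unfamiliar with `AttributeSeq`, the failure mode these tests guard against can be reproduced in plain Scala. This is a minimal sketch under stated assumptions, not the actual Spark code: `Attr` is a hypothetical stand-in for `AttributeReference`, `NullNameSketch` and its methods are hypothetical names, and only the `direct` map is modeled. The `namedAttrs` filter mirrors the approach described in this PR.

```scala
import java.util.Locale

// Hypothetical stand-in for AttributeReference: a (possibly null) name and an id.
final case class Attr(name: String, exprId: Long)

object NullNameSketch {
  // Pre-fix behavior (sketch): the grouping key dereferences the name
  // unconditionally, so a null name throws a NullPointerException.
  def buildDirectUnsafe(attrs: Seq[Attr]): Map[String, Seq[Attr]] =
    attrs.groupBy(_.name.toLowerCase(Locale.ROOT))

  // Post-fix behavior (sketch): drop null-named attributes before grouping.
  def buildDirectSafe(attrs: Seq[Attr]): Map[String, Seq[Attr]] = {
    val namedAttrs = attrs.filter(_.name != null)
    namedAttrs.groupBy(_.name.toLowerCase(Locale.ROOT))
  }

  def main(args: Array[String]): Unit = {
    val attrs = Seq(Attr("id", 1L), Attr(null, 2L), Attr("Value", 3L))

    val threwNpe =
      try { buildDirectUnsafe(attrs); false }
      catch { case _: NullPointerException => true }
    println(s"unsafe grouping threw NPE: $threwNpe") // unsafe grouping threw NPE: true

    val direct = buildDirectSafe(attrs)
    println(direct.keySet.toSeq.sorted.mkString(","))        // id,value
    println(direct("value").map(_.exprId).mkString(","))     // 3

    // Positional access still uses the full sequence, including the null-named attribute.
    println(attrs(1).exprId) // 2
  }
}
```

The last line shows why the filter is confined to the name-map path: ordinal and id-based lookups over the full sequence still see the null-named attribute, while case-insensitive name lookups simply never match it.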
Note: on `branch-3.5` the end-to-end test uses `Dataset.ofRows` (the `classic.Dataset` class does not exist in 3.5); the catalyst fix and tests are otherwise identical to the master change.

```
build/sbt 'catalyst/testOnly *AttributeResolutionSuite'
build/sbt 'sql/testOnly *DataFrameSuite -- -z "SPARK-57725: resolve columns when the input plan has a null-named attribute"'
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor
