MaxGekk opened a new pull request, #56839:
URL: https://github.com/apache/spark/pull/56839

   ### What changes were proposed in this pull request?
   
   This is a backport of #56831 to `branch-4.0`.
   
   `AttributeSeq` builds case-insensitive name lookup maps (`direct`, 
`qualified`, `qualified3Part`, `qualified4Part`) by grouping the attributes on 
`attr.name.toLowerCase(Locale.ROOT)`. The grouping key function dereferences 
the name without a null check, so a single attribute whose name is `null` makes 
`groupBy(_.name.toLowerCase(...))` throw a `NullPointerException`, aborting 
resolution of the whole operator with an `INTERNAL_ERROR` (SQLSTATE XX000) 
instead of resolving the other columns.
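
   The failure mode can be reproduced outside Spark with a minimal sketch. `Attr` and `NullNameGroupBy` are hypothetical stand-ins, not Catalyst classes; only the shape of the grouping key matches the description above.

```scala
import java.util.Locale
import scala.util.Try

// Hypothetical stand-in for a Catalyst attribute; only the name matters here.
final case class Attr(name: String)

object NullNameGroupBy {
  // Same shape as the grouping key described above: no null check on the name.
  def lowerKey(a: Attr): String = a.name.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val attrs = Seq(Attr("Id"), Attr(null), Attr("value"))

    // A single null name makes the whole groupBy fail, not just one entry.
    val grouped = Try(attrs.groupBy(lowerKey))
    println(grouped.isFailure)
    println(grouped.failed.get.isInstanceOf[NullPointerException])
  }
}
```

   Because the key function runs for every element, one null name aborts the construction of the whole map, which is why no other column can be resolved.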
   
   This PR introduces a `namedAttrs` lazy val that filters out null-named 
attributes, and builds the four name maps from it instead of from `attrs`. 
Positional and expression-id access (`apply(ordinal)`, `indexOf(exprId)`) still 
use the full `attrs`, so they are unaffected.
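
   As a rough illustration of the approach (not the actual `AttributeSeq` code), the sketch below models the idea with hypothetical `SimpleAttr` and `SimpleAttributeSeq` types: name maps are built from a filtered `namedAttrs`, while ordinal and id lookups keep using the full `attrs`. Only a `direct`-style map is shown; the qualified maps are assumed to follow the same pattern.

```scala
import java.util.Locale

// Hypothetical stand-in for a Catalyst attribute: a possibly-null name plus an id.
final case class SimpleAttr(name: String, exprId: Long)

final class SimpleAttributeSeq(val attrs: Seq[SimpleAttr]) {
  // Name-based lookups only see attributes that actually have a name.
  private lazy val namedAttrs: Seq[SimpleAttr] = attrs.filter(_.name != null)

  // Case-insensitive name map, analogous to `direct` in the description.
  lazy val direct: Map[String, Seq[SimpleAttr]] =
    namedAttrs.groupBy(_.name.toLowerCase(Locale.ROOT))

  // Positional and id-based access still use the full, unfiltered sequence.
  def apply(ordinal: Int): SimpleAttr = attrs(ordinal)
  def indexOf(exprId: Long): Int = attrs.indexWhere(_.exprId == exprId)

  // Returns every attribute matching the name, ignoring case.
  def resolve(name: String): Seq[SimpleAttr] =
    direct.getOrElse(name.toLowerCase(Locale.ROOT), Seq.empty)
}

object NamedAttrsSketch {
  def main(args: Array[String]): Unit = {
    val seq = new SimpleAttributeSeq(
      Seq(SimpleAttr("Id", 1L), SimpleAttr(null, 2L), SimpleAttr("value", 3L)))

    println(seq.resolve("ID"))   // the named attribute resolves despite the null-named sibling
    println(seq.indexOf(2L))     // the null-named attribute is still reachable by id
    println(seq(1))              // and by ordinal
  }
}
```

   Filtering once in a lazy val keeps each map definition unchanged apart from its input, and leaves ordinals stable because `attrs` itself is never modified.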
   
   ### Why are the changes needed?
   
   A null-named attribute can arise on the JVM side: `StructField` permits a 
null name (no `require(name != null)`), and the name flows unchanged through 
`DataTypeUtils.toAttribute` into `AttributeReference`. A null-named attribute 
is unaddressable by any column reference (a reference's name parts are never 
null), so dropping it from the name maps cannot change resolution of any 
legitimate reference. The change converts the hard `NullPointerException` into 
correct resolution of the remaining (named) attributes, or a normal 
unresolved-column error if the null-named column is referenced.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. It only turns an internal `NullPointerException` / `INTERNAL_ERROR` into 
normal column-resolution behavior.
   
   ### How was this patch tested?
   
   Backported regression tests in `AttributeResolutionSuite` (covering the 
`direct`, `qualified`, `qualified3Part`, and `qualified4Part` maps) and an 
end-to-end test in `DataFrameSuite`. The same tests pass on the `branch-4.2` 
backport (#56837); CI runs them here.
   
   ```
   build/sbt 'catalyst/testOnly *AttributeResolutionSuite'
   build/sbt 'sql/testOnly *DataFrameSuite -- -z "SPARK-57725: resolve columns when the input plan has a null-named attribute"'
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

