sunchao opened a new pull request, #56507: URL: https://github.com/apache/spark/pull/56507
### Why are the changes needed?

After #34558 added code generation for array higher-order functions, `ArrayAggregate` began binding its accumulator with `zero.dataType.asNullable`. Generated execution needs that widened type because the merge lambda may produce null values in nested arrays, maps, or structs.

The widening currently happens while `ResolveLambdaVariables` is still resolving the nested lambda tree. For nested `transform` / `filter` / `aggregate` expressions over a strict array-of-struct accumulator, complex-type resolution can consequently inspect an unresolved field extraction and fail with `Invalid call to dataType on unresolved object`. Widening may also make an otherwise valid downstream strict cast fail type checking. Those expressions should keep the already-valid accumulator type and use interpreted execution instead of becoming analysis failures.

### What changes were proposed in this PR?

- Bind `ArrayAggregate` lambdas with the original accumulator type so the complete nested lambda tree resolves before nullability changes.
- After a logical operator resolves, widen the bound accumulator variables and rewrite downstream attributes to keep their data types consistent.
- If widening turns a valid expression into a type-check failure, preserve the strict accumulator and mark that `ArrayAggregate` as unsupported by whole-stage code generation.
- Add focused coverage for deferred binding, attribute rewriting, the production-shaped nested analyzer failure, and the interpreted fallback boundary.

### How was this PR tested?
- `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'catalyst/testOnly org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariablesSuite'`
- `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite'`
- `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'sql/testOnly org.apache.spark.sql.DataFrameComplexTypeSuite'`
- Catalyst and SQL source/test Scalastyle checks.
- `git diff --check`.
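
### Illustration (not part of the patch)

The widening and fallback described under "What changes were proposed" can be pictured with a small standalone model. This is only a sketch under stated assumptions: it is plain Python rather than Catalyst code, and every name in it (`ArrayType`, `StructField`, `as_nullable`, `choose_accumulator_type`, `strict_cast_ok`) is a hypothetical stand-in, not Spark's API. It shows that recursively widening nullability yields a different type from the strict one, and how a "widen only if the expression still type-checks" rule could fall back to the strict type with code generation disabled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class IntType:
    pass


@dataclass(frozen=True)
class StructField:
    name: str
    data_type: "DataType"
    nullable: bool


@dataclass(frozen=True)
class StructType:
    fields: Tuple[StructField, ...]


@dataclass(frozen=True)
class ArrayType:
    element_type: "DataType"
    contains_null: bool


DataType = Union[IntType, StructType, ArrayType]


def as_nullable(dt: DataType) -> DataType:
    """Toy analogue of widening: mark every nested element and field nullable."""
    if isinstance(dt, ArrayType):
        return ArrayType(as_nullable(dt.element_type), True)
    if isinstance(dt, StructType):
        return StructType(
            tuple(StructField(f.name, as_nullable(f.data_type), True) for f in dt.fields)
        )
    return dt


def choose_accumulator_type(
    strict: DataType, type_checks: Callable[[DataType], bool]
) -> Tuple[DataType, bool]:
    """Return (accumulator type, codegen supported).

    Hypothetical decision rule: prefer the widened type, but keep the strict
    type and give up on code generation if widening breaks type checking.
    """
    widened = as_nullable(strict)
    if type_checks(widened):
        return widened, True
    return strict, False


# A strict array-of-struct accumulator: no null elements, no null fields.
strict = ArrayType(StructType((StructField("a", IntType(), False),)), False)


def strict_cast_ok(dt: DataType) -> bool:
    """Stand-in for a downstream strict cast that rejects nullable elements."""
    return isinstance(dt, ArrayType) and not dt.contains_null


acc_type, codegen_supported = choose_accumulator_type(strict, strict_cast_ok)
```

In this model the strict cast rejects the widened type, so `acc_type` stays equal to `strict` and `codegen_supported` is `False`, which corresponds to the interpreted fallback. With a permissive check the widened type is selected and code generation stays enabled.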
