[PR] [SPARK-37019][SQL][FOLLOWUP] Defer ArrayAggregate accumulator widening [spark]

via GitHub Sun, 14 Jun 2026 13:23:26 -0700


sunchao opened a new pull request, #56507:
URL: https://github.com/apache/spark/pull/56507


   ### Why are the changes needed?
   
   After #34558 added code generation for array higher-order functions, 
`ArrayAggregate` began binding its accumulator with `zero.dataType.asNullable`. 
Generated execution needs that widened type because the merge lambda may 
produce null values in nested arrays, maps, or structs.
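
   As a rough illustration of what widening with `asNullable` means for a nested type, here is a standalone toy model. It is not Spark code: the classes `Atomic`, `ArrayT`, `Field`, `StructT` and the function `as_nullable` are hypothetical stand-ins for Catalyst's data types, and the sketch assumes only that widening recursively marks every nested slot as nullable.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class ArrayT:
    element: "DataType"
    contains_null: bool


@dataclass(frozen=True)
class Field:
    name: str
    data_type: "DataType"
    nullable: bool


@dataclass(frozen=True)
class StructT:
    fields: Tuple[Field, ...]


DataType = Union[Atomic, ArrayT, StructT]


def as_nullable(dt):
    """Return the same type shape with every nested slot marked nullable."""
    if isinstance(dt, ArrayT):
        return ArrayT(as_nullable(dt.element), True)
    if isinstance(dt, StructT):
        return StructT(
            tuple(Field(f.name, as_nullable(f.data_type), True) for f in dt.fields)
        )
    # Atomic types carry no nested nullability flags in this toy model.
    return dt


# A strict array-of-struct accumulator type: no null elements, no null fields.
strict = ArrayT(StructT((Field("id", Atomic("int"), False),)), contains_null=False)
widened = as_nullable(strict)
```

   In this model `widened` has the same shape as `strict`, but both the array elements and the `id` field are now allowed to be null, which is the property generated execution relies on.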
   
   The widening currently happens while `ResolveLambdaVariables` is still 
resolving the nested lambda tree. For nested `transform` / `filter` / 
`aggregate` expressions over a strict array-of-struct accumulator, complex-type 
resolution can consequently inspect an unresolved field extraction and fail 
with `Invalid call to dataType on unresolved object`.
   
   Widening may also make an otherwise valid downstream strict cast fail type 
checking. Those expressions should keep the already-valid accumulator type and 
use interpreted execution instead of becoming analysis failures.
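
   One way to picture how widening can break a cast is the toy check below. It is an assumption about what "strict cast" means here (a cast whose target array type forbids null elements), and `ArrayType`, `resolvable_nullability`, and `can_cast` are hypothetical names for this sketch, not Spark's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    element: str
    contains_null: bool


def resolvable_nullability(from_nullable: bool, to_nullable: bool) -> bool:
    # A possibly-null source cannot be narrowed to a never-null target.
    return (not from_nullable) or to_nullable


def can_cast(src: ArrayType, dst: ArrayType) -> bool:
    # Toy rule: element types must match and nullability must not be narrowed.
    return src.element == dst.element and resolvable_nullability(
        src.contains_null, dst.contains_null
    )


strict = ArrayType("int", contains_null=False)
widened = ArrayType("int", contains_null=True)
target = ArrayType("int", contains_null=False)

strict_ok = can_cast(strict, target)    # the original accumulator type passes
widened_ok = can_cast(widened, target)  # the widened accumulator type is rejected
```

   Under this assumed rule the expression type-checks with the original accumulator type and fails only after widening, which is the situation the fallback is meant to handle.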
   
   ### What changes were proposed in this PR?
   
   - Bind `ArrayAggregate` lambdas with the original accumulator type so the 
complete nested lambda tree resolves before nullability changes.
   - After a logical operator resolves, widen the bound accumulator variables 
and rewrite downstream attributes to keep their data types consistent.
   - If widening turns a valid expression into a type-check failure, preserve 
the strict accumulator and mark that `ArrayAggregate` as unsupported by 
whole-stage code generation.
   - Add focused tests for deferred binding, attribute rewriting, a nested 
analyzer failure shaped like the production query, and the boundary at which 
execution falls back to interpreted mode.
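
   The decision described in the list above can be sketched as a small standalone model. This is not the PR's implementation: `ArrayType`, `Aggregate`, `finalize_accumulator`, and the `type_checks` callback are hypothetical names, and the sketch assumes the lambda tree has already been resolved against the original accumulator type.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ArrayType:
    element: str
    contains_null: bool

    def as_nullable(self) -> "ArrayType":
        return replace(self, contains_null=True)


@dataclass(frozen=True)
class Aggregate:
    acc_type: ArrayType             # type the accumulator variable is bound with
    codegen_supported: bool = True  # False means interpreted execution only


def finalize_accumulator(
    agg: Aggregate, type_checks: Callable[[ArrayType], bool]
) -> Aggregate:
    """Widen the accumulator after resolution, or fall back if widening breaks type checks."""
    widened = agg.acc_type.as_nullable()
    if type_checks(widened):
        # Widening is safe, so the nullable type can be used everywhere.
        return replace(agg, acc_type=widened)
    # Widening would invalidate a valid expression: keep the strict type
    # and opt this aggregate out of code generation.
    return replace(agg, codegen_supported=False)


strict = ArrayType("int", contains_null=False)

# Case 1: nothing downstream objects to the widened type.
ok = finalize_accumulator(Aggregate(strict), lambda t: True)

# Case 2: a downstream check only accepts arrays without null elements.
fallback = finalize_accumulator(Aggregate(strict), lambda t: not t.contains_null)
```

   In the first case the accumulator ends up widened and still eligible for code generation. In the second it keeps its original type and is marked interpreted-only, rather than surfacing as an analysis failure.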
   
   ### How was this PR tested?
   
   - `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'catalyst/testOnly org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariablesSuite'`
   - `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite'`
   - `build/sbt -java-home /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 'sql/testOnly org.apache.spark.sql.DataFrameComplexTypeSuite'`
   - Catalyst and SQL source/test Scalastyle checks.
   - `git diff --check`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

