yadavay-amzn opened a new pull request, #56417:
URL: https://github.com/apache/spark/pull/56417

   ### What changes were proposed in this pull request?
   
   Fix a crash in the single-pass resolver (Analyzer++) when GROUP BY 
CUBE/ROLLUP/GROUPING SETS is used with HAVING or ORDER BY containing aggregate 
functions.
   
   The fix wires `GroupingAnalyticsResolver` into `AggregateResolver` to expand 
`BaseGroupingSets` into an `Expand` operator before `AggregationValidator` 
runs, and guards `ExprUtils.checkValidGroupingExprs` against calling 
`.dataType` on unresolved grouping expressions. Post-expansion validation is 
applied in both the lateral-column-alias and non-LCA paths.
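
To illustrate the ordering issue, here is a minimal toy model in Python, not Spark code. All names in it (`Column`, `GroupingSetsNode`, `expand`, `check_valid_grouping_exprs`, `UnsupportedCallError`) are hypothetical stand-ins, and the "expansion" is greatly simplified compared to the real `Expand` operator. It only shows why validation must either run after expansion or skip nodes that are not yet resolved.

```python
from dataclasses import dataclass


class UnsupportedCallError(Exception):
    """Stand-in for the UNSUPPORTED_CALL error described in this PR."""


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str  # e.g. "int" or "map"

    @property
    def resolved(self) -> bool:
        return True

    @property
    def data_type(self) -> str:
        return self.type_name


@dataclass(frozen=True)
class GroupingSetsNode:
    """Toy placeholder for an unexpanded CUBE/ROLLUP/GROUPING SETS node."""
    columns: tuple

    @property
    def resolved(self) -> bool:
        return False

    @property
    def data_type(self) -> str:
        # Throws by design: the node must be expanded before its type is asked for.
        raise UnsupportedCallError("data_type called on unexpanded grouping sets")


def expand(grouping_exprs):
    """Toy expansion: replace each placeholder node with its plain columns."""
    out = []
    for expr in grouping_exprs:
        if isinstance(expr, GroupingSetsNode):
            out.extend(expr.columns)
        else:
            out.append(expr)
    return out


def check_valid_grouping_exprs(grouping_exprs, guarded):
    """Reject map-typed grouping columns; when guarded, skip unresolved nodes."""
    for expr in grouping_exprs:
        if guarded and not expr.resolved:
            continue  # the guard: never touch data_type on an unresolved node
        if expr.data_type == "map":
            raise ValueError(f"cannot group by {expr}")


node = GroupingSetsNode((Column("a", "int"), Column("b", "int")))
check_valid_grouping_exprs([node], guarded=True)           # guard skips the node
check_valid_grouping_exprs(expand([node]), guarded=False)  # safe after expansion
```

In this sketch, calling `check_valid_grouping_exprs([node], guarded=False)` raises `UnsupportedCallError`, which corresponds to the crash this PR addresses.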
   
   ### Why are the changes needed?
   
   With `spark.sql.analyzer.singlePassResolver.enabled=true`, queries like:
   
   ```sql
   SELECT a, b, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
   GROUP BY CUBE(a, b) ORDER BY SUM(b);
   ```
   
   crash with:
   ```
   org.apache.spark.SparkUnsupportedOperationException: [UNSUPPORTED_CALL.WITHOUT_SUGGESTION]
   Cannot call the method "dataType$" of the class
   "org.apache.spark.sql.catalyst.expressions.BaseGroupingSets".
   ```
   
   The single-pass resolver called `assertValidAggregation` while 
`BaseGroupingSets` nodes were still present in the grouping expressions (the 
legacy analyzer expands them via `ResolveGroupingAnalytics` before validation). 
`BaseGroupingSets.dataType` throws by design because these nodes must be 
expanded first.
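
For readers less familiar with what "expanding" means here, the following Python sketch emulates the standard SQL semantics: CUBE and ROLLUP are shorthand for a list of grouping sets, and each input row is aggregated once per set, with NULL (`None`) in the columns that the set leaves out. The helper names (`cube_sets`, `rollup_sets`, `aggregate`) are hypothetical and this is not how Spark's `Expand` operator is implemented. It is only a small model of the result the example query should produce.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b) -> (a, b), (a), (): every prefix, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b) -> (a, b), (a), (b), (): every subset, largest first."""
    return [tuple(c) for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]


def aggregate(rows, cols, grouping_sets, sum_col):
    """Emulate SUM(sum_col) under GROUP BY GROUPING SETS.

    Each row contributes once per grouping set. Totals are keyed by the
    grouping set as well as the visible values, so groups from different
    sets are never merged.
    """
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[(gset, key)] = totals.get((gset, key), 0) + row[sum_col]
    return [key + (total,) for (_, key), total in totals.items()]


# Data from the example query: VALUES (1,10),(1,20),(2,30) AS t(a,b)
rows = [{"a": 1, "b": 10}, {"a": 1, "b": 20}, {"a": 2, "b": 30}]

# SELECT a, b, SUM(b) ... GROUP BY CUBE(a, b) ORDER BY SUM(b)
result = sorted(aggregate(rows, ["a", "b"], cube_sets(["a", "b"]), "b"),
                key=lambda r: r[2])
for r in result:
    print(r)
```

For this input the model yields nine rows, ending with the grand total `(None, None, 60)`.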
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes -- GROUPING SETS/CUBE/ROLLUP queries with HAVING/ORDER BY no longer 
crash under the single-pass resolver.
   
   ### How was this patch tested?
   
   Added tests in `SQLQuerySuite` covering CUBE/ROLLUP/GROUPING SETS with ORDER 
BY and HAVING, NULL grouping columns, empty results, multiple aggregates, a 
negative test (column not in GROUP BY still errors with `MISSING_AGGREGATION`), 
and a legacy-analyzer baseline. The new tests were confirmed to fail without the fix.
   
   Note: a separate wrong-results issue (SPARK-57346) exists for multi-column 
ROLLUP with HAVING under the single-pass resolver -- documented in a test but 
out of scope for this crash fix.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]