yadavay-amzn opened a new pull request, #56417: URL: https://github.com/apache/spark/pull/56417
### What changes were proposed in this pull request?

Fix a crash in the single-pass resolver (Analyzer++) when GROUP BY CUBE/ROLLUP/GROUPING SETS is used with HAVING or ORDER BY containing aggregate functions.

The fix wires `GroupingAnalyticsResolver` into `AggregateResolver` to expand `BaseGroupingSets` into an `Expand` operator before `AggregationValidator` runs, and guards `ExprUtils.checkValidGroupingExprs` against calling `.dataType` on unresolved grouping expressions. Post-expansion validation is applied in both the lateral-column-alias (LCA) and non-LCA paths.

### Why are the changes needed?

With `spark.sql.analyzer.singlePassResolver.enabled=true`, queries like:

```sql
SELECT a, b, SUM(b)
FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
GROUP BY CUBE(a, b)
ORDER BY SUM(b);
```

crash with:

```
org.apache.spark.SparkUnsupportedOperationException: [UNSUPPORTED_CALL.WITHOUT_SUGGESTION] Cannot call the method "dataType$" of the class "org.apache.spark.sql.catalyst.expressions.BaseGroupingSets".
```

The single-pass resolver called `assertValidAggregation` while `BaseGroupingSets` nodes were still present in the grouping expressions (the legacy analyzer expands them via `ResolveGroupingAnalytics` before validation). `BaseGroupingSets.dataType` throws by design because these nodes must be expanded first.

### Does this PR introduce _any_ user-facing change?

Yes. GROUPING SETS/CUBE/ROLLUP queries with HAVING/ORDER BY no longer crash under the single-pass resolver.

### How was this patch tested?

Added tests in `SQLQuerySuite` covering:

- CUBE/ROLLUP/GROUPING SETS with ORDER BY and HAVING
- NULL grouping columns
- empty results
- multiple aggregates
- a negative test (a column not in GROUP BY still errors with `MISSING_AGGREGATION`)
- a legacy-analyzer baseline

TDD-verified: tests fail without the fix.
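
For reviewers less familiar with grouping analytics, here is a rough plain-Python model of what "expanding" CUBE/ROLLUP into grouping sets means for the repro query above. It is only an illustrative sketch, not Spark code: the helper names `cube`, `rollup`, and `aggregate_sum` are hypothetical, and the real `Expand` operator works on logical plans rather than lists of dicts.

```python
from itertools import combinations


def cube(cols):
    """All subsets of cols, largest first: CUBE(a, b) -> (a, b), (a), (b), ()."""
    return [tuple(c) for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def rollup(cols):
    """Prefixes of cols, longest first: ROLLUP(a, b) -> (a, b), (a), ()."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def aggregate_sum(rows, cols, grouping_sets, measure):
    """Sum `measure` once per grouping set.

    Columns that are not part of the current grouping set are emitted as
    None (SQL NULL). The measure is always read from the original row.
    """
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            # Key on (grouping set, values) so a genuine NULL in the data
            # is never merged with a NULL produced by rolling a column up.
            totals[(gset, key)] = totals.get((gset, key), 0) + row[measure]
    return [key + (total,) for (_, key), total in totals.items()]


# Same data as the repro: VALUES (1,10),(1,20),(2,30) AS t(a,b)
rows = [{"a": 1, "b": 10}, {"a": 1, "b": 20}, {"a": 2, "b": 30}]

# GROUP BY CUBE(a, b) ... ORDER BY SUM(b)
result = sorted(
    aggregate_sum(rows, ["a", "b"], cube(["a", "b"]), "b"),
    key=lambda r: r[-1],
)

for a, b, total in result:
    print(a, b, total)
```

In this model, the grouping columns only have concrete values once the grouping sets have been enumerated. That mirrors why validation that inspects grouping expressions has to run after expansion rather than on the unexpanded `BaseGroupingSets` node.
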
Note: a separate wrong-results issue (SPARK-57346) exists for multi-column ROLLUP with HAVING under the single-pass resolver. It is documented in a test but is out of scope for this crash fix.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
