Re: [PR] [SPARK-57353][SQL] Fix single-pass resolver crash on GROUPING SETS with HAVING/ORDER BY [spark]

via GitHub Tue, 30 Jun 2026 23:30:34 -0700


cloud-fan commented on code in PR #56417:
URL: https://github.com/apache/spark/pull/56417#discussion_r3503718790



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala:
##########
@@ -144,25 +147,44 @@ class AggregateResolver(
       )
 
       if (resolvedAggregateExpressions.hasLateralColumnAlias) {
+        // LCA + grouping analytics (CUBE/ROLLUP/GROUPING SETS) is not yet 
supported in the
+        // single-pass resolver because Expand mints new attribute IDs that 
get out of sync
+        // with the Project operator. Fall back to legacy analyzer for correct 
results.
+        if 
(finalAggregate.groupingExpressions.exists(_.isInstanceOf[BaseGroupingSets])) {
+          throw new ExplicitlyUnsupportedResolverFeature(
+            "lateral column alias with grouping analytics")
+        }
         val aggregateWithLcaResolutionResult =
           lcaResolver.handleLcaInAggregate(finalAggregate)
+        val lcaBaseAggregate = aggregateWithLcaResolutionResult.baseAggregate
+        AggregationValidator(lcaBaseAggregate)
         AggregateResolutionResult(
           operator = aggregateWithLcaResolutionResult.resolvedOperator,
           outputList = aggregateWithLcaResolutionResult.outputList,
           groupingAttributeIds =
-            
getGroupingAttributeIds(aggregateWithLcaResolutionResult.baseAggregate),
+            getGroupingAttributeIds(lcaBaseAggregate),
           aggregateListAliases = 
aggregateWithLcaResolutionResult.aggregateListAliases,
-          baseAggregate = aggregateWithLcaResolutionResult.baseAggregate
+          baseAggregate = lcaBaseAggregate
         )
       } else {
-        AggregationValidator(finalAggregate)
+        // Validation contract: grouping analytics are expanded first, then 
the expanded
+        // Aggregate is validated by AggregationValidator (post-expansion).
+        val expandedAggregate =
+          if 
(finalAggregate.groupingExpressions.exists(_.isInstanceOf[BaseGroupingSets])) {

Review Comment:
   `GroupingAnalyticsResolver.resolve` already self-guards: 
`tryExtractGroupingAnalytics` returns `None` when the grouping expressions 
contain no `BaseGroupingSets`, and `resolve` then returns the aggregate 
unchanged (GroupingAnalyticsResolver.scala:103). So it's a no-op on plain 
aggregates and can be called unconditionally — which collapses this `if (exists 
BaseGroupingSets) expand-then-validate else validate` fork into a single 
"always expand, then validate once" path.
   
   That matters beyond tidiness: once expansion always runs, the 
`skipGroupingExprChecks` parameter added to 
`assertValidAggregation`/`AggregationValidator` has no remaining caller and can 
be dropped. That parameter is a public knob whose only effect is to silently 
skip grouping-expression validation (type-orderable, no nested aggregates in 
GROUP BY) — the exact fragility raised in the ExprUtils.scala thread. Calling 
`resolve` unconditionally removes the hazard at the root rather than 
documenting around it, and leaves the validator's contract unchanged.
   
   Concretely, the non-LCA branch becomes:
   ```suggestion
           val expandedAggregate = 
groupingAnalyticsResolver.resolve(finalAggregate)
           AggregationValidator(expandedAggregate)
   ```
   and `skipGroupingExprChecks` can then be removed from 
`assertValidAggregation`, `AggregationValidator.apply`, and the LCA-path call 
site. Not a correctness issue with the current code — a materially simpler and 
less error-prone shape.



##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -5118,6 +5118,156 @@ class SQLQuerySuite extends SharedSparkSession with 
AdaptiveSparkPlanHelper
       sql("SELECT col1 - rand() FROM VALUES(1) GROUP BY ALL")
     }
   }
+
+  test("SPARK-57353: CUBE with ORDER BY - single pass resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, SUM(b) as s FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
+            |GROUP BY CUBE(a) ORDER BY s""".stripMargin),
+        Row(1, 30) :: Row(2, 30) :: Row(null, 60) :: Nil,
+        checkToRDD = false)
+    }
+  }
+
+  test("SPARK-57353: ROLLUP with HAVING - single pass resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
+            |GROUP BY ROLLUP(a) HAVING SUM(b) > 30""".stripMargin),
+        Row(null, 60) :: Nil,
+        checkToRDD = false)
+    }
+  }
+
+  test("SPARK-57353: GROUPING SETS with ORDER BY - single pass resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, b, SUM(b) as s FROM VALUES (1,10),(1,20),(2,30) AS 
t(a,b)
+            |GROUP BY a, b GROUPING SETS ((a, b), (a)) ORDER BY 
s""".stripMargin),
+        Row(1, 10, 10) :: Row(1, 20, 20) :: Row(1, null, 30) ::
+          Row(2, 30, 30) :: Row(2, null, 30) :: Nil,
+        checkToRDD = false)
+    }
+  }
+
+  test("SPARK-57353: CUBE with NULL grouping columns - single pass resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, SUM(b) as s FROM VALUES (1,10),(null,20),(2,30) AS 
t(a,b)
+            |GROUP BY CUBE(a) ORDER BY s""".stripMargin),
+        Row(1, 10) :: Row(null, 20) :: Row(2, 30) :: Row(null, 60) :: Nil,
+        checkToRDD = false)
+    }
+  }
+
+  test("SPARK-57353: ROLLUP with HAVING filtering all rows - single pass 
resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
+            |GROUP BY ROLLUP(a) HAVING SUM(b) > 100""".stripMargin),
+        Nil,
+        checkToRDD = false)
+    }
+  }
+
+  test("SPARK-57353: CUBE with multiple aggregates in ORDER BY - single pass 
resolver") {
+    withSQLConf(SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true") {
+      QueryTest.checkAnswer(
+        sql(
+          """SELECT a, SUM(b) as s, COUNT(b) as c
+            |FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
+            |GROUP BY CUBE(a) ORDER BY COUNT(b), SUM(b)""".stripMargin),
+        Row(2, 30, 1) :: Row(1, 30, 2) :: Row(null, 60, 3) :: Nil,
+        checkToRDD = false)
+    }
+  }
+
+  // SPARK-57346: multi-column ROLLUP with HAVING produces wrong results in 
single-pass
+  // resolver (1 row instead of 4). This test is ignored until SPARK-57346 is 
fixed.
+  ignore("SPARK-57353: multi-column ROLLUP with HAVING - single pass resolver 
(SPARK-57346)") {

Review Comment:
   Under pure `singlePassResolver.enabled=true`, this shape now silently 
returns wrong results (1 row instead of 4) where it previously crashed — a 
fail-loud → fail-silent regression, and pure single-pass mode has no 
dual-run/tentative fallback to catch it. The `ignore`d test documents the bug 
but the shape still resolves and executes.
   
   Would it be safer to keep this shape throwing 
`ExplicitlyUnsupportedResolverFeature` — the same treatment LCA + grouping 
analytics gets a few lines up — until SPARK-57346 is fixed? That keeps a 
data-correctness bug loud rather than silent, and the guard can be lifted 
together with the fix.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala:
##########
@@ -144,25 +147,44 @@ class AggregateResolver(
       )
 
       if (resolvedAggregateExpressions.hasLateralColumnAlias) {
+        // LCA + grouping analytics (CUBE/ROLLUP/GROUPING SETS) is not yet 
supported in the
+        // single-pass resolver because Expand mints new attribute IDs that 
get out of sync
+        // with the Project operator. Fall back to legacy analyzer for correct 
results.
+        if 
(finalAggregate.groupingExpressions.exists(_.isInstanceOf[BaseGroupingSets])) {
+          throw new ExplicitlyUnsupportedResolverFeature(
+            "lateral column alias with grouping analytics")
+        }
         val aggregateWithLcaResolutionResult =
           lcaResolver.handleLcaInAggregate(finalAggregate)
+        val lcaBaseAggregate = aggregateWithLcaResolutionResult.baseAggregate
+        AggregationValidator(lcaBaseAggregate)

Review Comment:
   This re-validates an aggregate that was already validated: 
`handleLcaInAggregate` calls `AggregationValidator(aggregate)` internally 
(LateralColumnAliasResolver.scala:77) and returns that same aggregate as 
`baseAggregate`. Harmless since validation is idempotent, but the second call 
can be dropped.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala:
##########
@@ -144,25 +147,44 @@ class AggregateResolver(
       )
 
       if (resolvedAggregateExpressions.hasLateralColumnAlias) {
+        // LCA + grouping analytics (CUBE/ROLLUP/GROUPING SETS) is not yet 
supported in the
+        // single-pass resolver because Expand mints new attribute IDs that 
get out of sync
+        // with the Project operator. Fall back to legacy analyzer for correct 
results.
+        if 
(finalAggregate.groupingExpressions.exists(_.isInstanceOf[BaseGroupingSets])) {
+          throw new ExplicitlyUnsupportedResolverFeature(
+            "lateral column alias with grouping analytics")

Review Comment:
   This guard is only exercised indirectly — the `LCA with GROUPING SETS` test 
runs in tentative mode and asserts the (correct) fallback results, never that 
this throw fires. A direct `intercept[ExplicitlyUnsupportedResolverFeature]` 
under `enabled=true` would lock in the guard's contract so a future change 
can't silently stop throwing here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-57353][SQL] Fix single-pass resolver crash on GROUPING SETS with HAVING/ORDER BY [spark]

Reply via email to