allisonwang-db opened a new pull request #32054: URL: https://github.com/apache/spark/pull/32054
### What changes were proposed in this pull request? This PR adds two additional checks in `CheckAnalysis` for correlated scalar subquery in Aggregate. It blocks the cases that Spark do not currently support based on the rewrite logic in `RewriteCorrelatedScalarSubquery`: https://github.com/apache/spark/blob/aff6c0febb40d9713895ba00d8c77ba00f04bd16/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala#L618-L624 ### Why are the changes needed? It can be confusing to users when their queries pass the check analysis but cannot be executed. Also, the error messages are confusing: #### Case 1: correlated scalar subquery in the grouping expressions but not in aggregate expressions ```sql SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1) ``` We get this error: ``` java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; something went wrong in analysis ``` because the correlated scalar subquery is not rewritten properly: ```scala == Optimized Logical Plan == Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L] : +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93] : +- LocalRelation [c1#6, c2#7] +- LocalRelation [c1#6, c2#7] ``` #### Case 2: correlated scalar subquery in the aggregate expressions but not in the grouping expressions ```sql SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 GROUP BY c1 ``` We get this error: ``` java.lang.IllegalStateException: Couldn't find sum(c2)#69L in [c1#60,sum(c2#61)#64L] ``` because the transformed correlated scalar subquery output is not present in the grouping expression of the Aggregate: ```scala == Optimized Logical Plan == Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS sum(c2)#65L] +- Project [c1#60, c2#61, sum(c2)#69L] +- Join LeftOuter, (c1#60 = c1#60#95) :- LocalRelation [c1#60, c2#61] +- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95] +- LocalRelation [c1#60, c2#61] ``` ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? New unit tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
