attilapiros commented on code in PR #48627:
URL: https://github.com/apache/spark/pull/48627#discussion_r1859540322
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -246,6 +267,197 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
}
}
+ // Handle the case where the left-hand side of an IN-subquery contains an aggregate.
+ //
+ // This handler pulls up any expression containing such an IN-subquery into a new Project
+ // node, replacing aggregate expressions with attributes, and then re-enters
+ // RewritePredicateSubquery#apply, where the new Project node will be handled
+ // by the Unary node handler.
+ //
+ // The Unary node handler uses the left-hand side of the IN-subquery in a
+ // join condition. Thus, without this pre-transformation, the join condition
+ // contains an aggregate, which is illegal. With this pre-transformation, the
+ // join condition contains an attribute from the left-hand side of the
+ // IN-subquery contained in the Project node.
+ //
+ // For example:
+ //
+ // SELECT col1, SUM(col2) IN (SELECT c2 FROM v1) as x
+ // FROM v2 GROUP BY col1;
+ //
+ // The above query has this plan on entry to RewritePredicateSubquery#apply:
+ //
+ //   Aggregate [col1#28], [col1#28, sum(col2#29) IN (list#24 []) AS x#25]
+ //   :  +- LocalRelation [c2#35L]
+ //   +- LocalRelation [col1#28, col2#29]
+ //
+ // Note that the Aggregate node contains the IN-subquery and the left-hand
+ // side of the IN-subquery is an aggregate expression (sum(col2#28)).
+ //
+ // This handler transforms the above plan into the following:
+ //
+ //   Project [col1#28, sum(col2)#36L IN (list#24 []) AS x#25]
+ //   :  +- LocalRelation [c2#35L]
+ //   +- Aggregate [col1#28], [col1#28, sum(col2#29) AS sum(col2)#36L]
+ //      +- LocalRelation [col1#28, col2#29]
+ //
+ // The transformation pulled the IN-subquery up into a Project. The left-hand side of the
+ // IN-subquery is now an attribute (sum(col2)#36L) that refers to the actual aggregation
+ // which is still performed in the Aggregate node (sum(col2#28) AS sum(col2)#36L). The Unary
Review Comment:
```suggestion
// which is still performed in the Aggregate node (sum(col2#29) AS sum(col2)#36L). The Unary
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -246,6 +267,197 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
}
}
+ // Handle the case where the left-hand side of an IN-subquery contains an aggregate.
+ //
+ // This handler pulls up any expression containing such an IN-subquery into a new Project
+ // node, replacing aggregate expressions with attributes, and then re-enters
+ // RewritePredicateSubquery#apply, where the new Project node will be handled
+ // by the Unary node handler.
+ //
+ // The Unary node handler uses the left-hand side of the IN-subquery in a
+ // join condition. Thus, without this pre-transformation, the join condition
+ // contains an aggregate, which is illegal. With this pre-transformation, the
+ // join condition contains an attribute from the left-hand side of the
+ // IN-subquery contained in the Project node.
+ //
+ // For example:
+ //
+ // SELECT col1, SUM(col2) IN (SELECT c2 FROM v1) as x
+ // FROM v2 GROUP BY col1;
+ //
+ // The above query has this plan on entry to RewritePredicateSubquery#apply:
+ //
+ //   Aggregate [col1#28], [col1#28, sum(col2#29) IN (list#24 []) AS x#25]
+ //   :  +- LocalRelation [c2#35L]
+ //   +- LocalRelation [col1#28, col2#29]
+ //
+ // Note that the Aggregate node contains the IN-subquery and the left-hand
+ // side of the IN-subquery is an aggregate expression (sum(col2#28)).
Review Comment:
```suggestion
// side of the IN-subquery is an aggregate expression (sum(col2#29)).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]