tanelk commented on a change in pull request #34402:
URL: https://github.com/apache/spark/pull/34402#discussion_r740718765
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##########
@@ -687,10 +687,10 @@ trait CheckAnalysis extends PredicateHelper with
LookupCatalog {
case inSubqueryOrExistsSubquery =>
plan match {
- case _: Filter | _: SupportsSubquery | _: Join => // Ok
+ case _: SupportsSubquery | _: Join | _: UnaryNode => // Ok
Review comment:
The Exists/In subqueries in UnaryNodes are handled by inserting an
existance join (or joins) bellow the UnaryNode and then the subquery is
replaced with an attribute reference. This is a very generic pattern and should
not depend on the actual UnaryNode. Yes, Filter is handled separately but that
is a performance optimization, not a correctness issue (LeftSemi/LeftAnti are
better than existance join).
But if you think that we should stay safe and only allow a subset of
UnaryNodes, then I would be totally fine with supporting only the most common
nodes - Project, Aggregate, Window and perhaps Sort.
I'll add some extra test cases for aggregate.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]