tanelk commented on a change in pull request #34402:
URL: https://github.com/apache/spark/pull/34402#discussion_r740718765



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##########
@@ -687,10 +687,10 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
 
       case inSubqueryOrExistsSubquery =>
         plan match {
-          case _: Filter | _: SupportsSubquery | _: Join => // Ok
+          case _: SupportsSubquery | _: Join | _: UnaryNode => // Ok

Review comment:
       The Exists/In subqueries in UnaryNodes are handled by inserting an 
existance join (or joins) bellow the UnaryNode and then the subquery is 
replaced with an attribute reference. This is a very generic pattern and should 
not depend on the actual UnaryNode. Yes, Filter is handled separately but that 
is a performance optimization, not a correctness issue (LeftSemi/LeftAnti are 
better than existance join).
   
   But if you think that we should stay safe and only allow a subset of 
UnaryNodes, then I would be totally fine with supporting only the most common 
nodes - Project, Aggregate, Window and perhaps Sort.
   
   I'll add some extra test cases for aggregate.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to