cloud-fan commented on a change in pull request #32072:
URL: https://github.com/apache/spark/pull/32072#discussion_r609902319



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
##########
@@ -656,3 +705,301 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] with AliasHelpe
       }
   }
 }
+
+/**
+ * Decorrelate the inner query by eliminating outer references and create domain joins.
+ * The implementation is based on the paper: Unnesting Arbitrary Queries by Thomas Neumann
+ * and Alfons Kemper. https://dl.gi.de/handle/20.500.12116/2418.
+ * (1) Recursively collects outer references from the inner query until it reaches a node
+ *     that does not contain correlated value.
+ * (2) Inserts an optional [[DomainJoin]] node to indicate whether a domain (inner) join is
+ *     needed between the outer query and the specific subtree of the inner query.
+ * (3) Returns a list of join conditions with the outer query and a mapping between outer
+ *     references with references inside the inner query. The parent nodes need to preserve
+ *     the references inside the join conditions and substitute all outer references using
+ *     the mapping.
+ *
+ * E.g. decorrelate an inner query with equality predicates:
+ *
+ * Aggregate [] [min(c2)]            Aggregate [c1] [min(c2), c1]
+ * +- Filter [outer(c3) = c1]   =>   +- Relation [t]
+ *    +- Relation [t]
+ *
+ * Join conditions: [c3 = c1]
+ *
+ * E.g. decorrelate an inner query with non-equality predicates:
+ *
+ * Aggregate [] [min(c2)]            Aggregate [c3'] [min(c2), c3']
+ * +- Filter [outer(c3) > c1]   =>   +- Filter [c3' > c1]
+ *    +- Relation [t]                   +- DomainJoin [c3']
+ *                                         +- Relation [t]
+ *
+ * Join conditions: [c3 <=> c3']
+ */
+object DecorrelateInnerQuery extends PredicateHelper {

Review comment:
       Shall we move `DecorrelateInnerQuery` to a new file?






