Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12306#discussion_r60009754
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1447,3 +1450,133 @@ object EmbedSerializerInFilter extends Rule[LogicalPlan] {
}
}
}
+
+/**
+ * This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates
+ * are supported:
+ * a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter
+ *    will be pulled out as the join conditions.
+ * b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will
+ *    be pulled out as join conditions, value = selected column will also be used as join
+ *    condition.
+ */
+object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Pull out all correlated predicates from a given sub-query. This method removes the correlated
+   * predicates from sub-query [[Filter]]s and adds the references of these predicates to
+   * all intermediate [[Project]] clauses (if they are missing) in order to be able to evaluate the
+   * predicates in the join condition.
+   *
+   * This method returns the rewritten sub-query and the combined (AND) extracted predicate.
+   */
+  private def pullOutCorrelatedPredicates(
+      subquery: LogicalPlan,
+      query: LogicalPlan): (LogicalPlan, Option[Expression]) = {
+    val references: Set[Expression] = query.output.toSet
--- End diff --
It's better to use AttributeSet or ExpressionSet here instead of Set[Expression].
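
The comment gives no reason, so the following is an assumption about the likely motivation: a plain Scala Set decides membership with equals, which for a case-class-like attribute compares every field, while Spark's AttributeSet is documented to compare attributes by expression ID. A minimal sketch of that difference, using toy stand-ins (ExprId, Attr, AttrSetSketch, and Demo are hypothetical names, not Spark's classes):

```scala
// Toy stand-ins, NOT Spark's classes: they only mimic the equality rules discussed above.
final case class ExprId(id: Long)

// Plays the role of an attribute reference; case-class equality compares every field.
final case class Attr(name: String, exprId: ExprId, nullable: Boolean)

// Hypothetical AttributeSet-like wrapper: membership is decided by exprId alone.
final class AttrSetSketch(attrs: Iterable[Attr]) {
  private val ids: Set[ExprId] = attrs.map(_.exprId).toSet
  def contains(a: Attr): Boolean = ids.contains(a.exprId)
}

object Demo {
  // Output of the outer query, as it would be captured in `references`.
  val output: Seq[Attr] = Seq(Attr("a", ExprId(1L), nullable = false))

  // The same logical attribute (same exprId) seen with different nullability.
  val sameAttr: Attr = Attr("a", ExprId(1L), nullable = true)

  val plainSet: Set[Attr] = output.toSet
  val idSet: AttrSetSketch = new AttrSetSketch(output)

  // Structural equality: the differing `nullable` flag makes the lookup miss.
  val foundInPlainSet: Boolean = plainSet.contains(sameAttr)

  // exprId-based equality: the lookup still identifies the attribute.
  val foundInIdSet: Boolean = idSet.contains(sameAttr)
}
```

In this sketch the plain set misses the attribute while the ID-based set finds it, which is the kind of mismatch a reference check on query.output would presumably want to avoid.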