Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16005
@hvanhovell I would like your opinion on the following block of
`pullOutCorrelatedPredicates`:

    // Simplify the predicates before pulling them out.
    val transformed = BooleanSimplification(sub) transformUp {
      case f @ Filter(cond, child) => ...
      case p @ Project(expressions, child) => ...
      case a @ Aggregate(grouping, expressions, child) => ...
      case w: Window => ...
      case j @ Join(left, _, RightOuter, _) => ...
      case j @ Join(left, right, FullOuter, _) => ...
      case j @ Join(_, right, jt, _) if !jt.isInstanceOf[InnerLike] => ...
      case u: Union => ...
      case s: SetOperation => ...
      case e: Expand => ...
      case l: LocalLimit => ...
      case g: GlobalLimit => ...
      case s: Sample => ...
      case p =>
        failOnOuterReference(p)
        ...
    }

The code disallows operators in the sub-plan of an operator hosting a
correlation on a case-by-case basis. As it stands today, it blocks only
Union/Intersect/Except/Expand/LocalLimit/GlobalLimit/Sample, full outer join,
the right table of a left outer join, and the left table of a right outer
join. That means any LogicalPlan operator that is not in the list above is
permitted under a correlation point. Is this risky? There are many operators
derived from the LogicalPlan class (at least 30, from browsing the LogicalPlan
type hierarchy). Should we instead whitelist the operators that are allowed?
ScalarSubquery already does this: it explicitly checks that only
SubqueryAlias/Project/Filter/Aggregate are allowed (CheckAnalysis.scala,
around lines 126-165, in and after `def cleanQuery`). If we go this route, we
should allow, in addition to the ones allowed in ScalarSubquery: Join,
Distinct, Sort, and OneRowRelation. I am undecided about including Window.
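
To make the whitelist idea concrete, here is a minimal sketch of what such a
check could look like. It does not use Spark's classes: `Plan`, `Relation`,
`InnerJoin`, `Generate`, `hasOuterRef`, and the `violations` helper are all
hypothetical stand-ins invented for illustration, and outer joins are left out
to keep it short. The point is only that an operator not named in the
whitelist is rejected by default once a correlated predicate sits beneath it.

```scala
object WhitelistSketch {
  // Toy stand-ins for Catalyst operators; these are NOT Spark's classes.
  sealed trait Plan { def children: Seq[Plan] }
  case class Relation(name: String) extends Plan { def children: Seq[Plan] = Nil }
  // hasOuterRef marks a predicate that references the outer query (assumption).
  case class Filter(hasOuterRef: Boolean, child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  case class Aggregate(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  case class Sort(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  case class Distinct(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
  case class InnerJoin(left: Plan, right: Plan) extends Plan { def children: Seq[Plan] = Seq(left, right) }
  case class Union(left: Plan, right: Plan) extends Plan { def children: Seq[Plan] = Seq(left, right) }
  // An operator that neither the current block list nor the whitelist names.
  case class Generate(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

  // True if this operator or anything beneath it holds a correlated predicate.
  def containsOuterRef(p: Plan): Boolean = p match {
    case Filter(outer, child) => outer || containsOuterRef(child)
    case other                => other.children.exists(containsOuterRef)
  }

  // The whitelist: operators permitted to sit on or above a correlated predicate.
  def allowedAboveCorrelation(p: Plan): Boolean = p match {
    case _: Filter | _: Project | _: Aggregate | _: Sort | _: Distinct | _: InnerJoin => true
    case _ => false // anything not listed is rejected by default
  }

  // Collects every operator that has a correlated predicate in its subtree
  // but is not on the whitelist.
  def violations(p: Plan): Seq[Plan] = {
    val here = if (containsOuterRef(p) && !allowedAboveCorrelation(p)) Seq(p) else Nil
    here ++ p.children.flatMap(violations)
  }
}
```

With this shape, a newly added operator such as the hypothetical `Generate`
above is flagged without anyone having to remember to add it to a block list,
while a `Union` with no correlated predicate beneath it still passes.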