GitHub user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16046#discussion_r89974976
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1077,10 +1077,54 @@ class Analyzer(
     
           // Simplify the predicates before pulling them out.
           val transformed = BooleanSimplification(sub) transformUp {
    -        // WARNING:
    -        // Only Filter can host correlated expressions at this time
    -        // Anyone adding a new "case" below needs to add the call to
    -        // "failOnOuterReference" to disallow correlated expressions in it.
    +
    +        // Whitelist operators allowed in a correlated subquery
    +        // There are 4 categories:
    +        // 1. Operators that are allowed anywhere in a correlated subquery, and,
    +        //    by definition of the operators, they cannot host outer references.
    +        // 2. Operators that are allowed anywhere in a correlated subquery
    +        //    so long as they do not host outer references.
    +        // 3. Operators that need special handlings. These operators are
    +        //    Project, Filter, Join, Aggregate, and Generate.
    +        //
    +        // Any operators that are not in the above list are allowed
    +        // in a correlated subquery only if they are not on a correlation path.
    +        // In other word, these operators are allowed only under a correlation point.
    +        //
    +        // A correlation path is defined as the sub-tree of all the operators that
    +        // are on the path from the operator hosting the correlated expressions
    +        // up to the operator producing the correlated values.
    +
    +        // Category 1:
    +        // Leaf node can be anywhere in a correlated subquery.
    +        case n: LeafNode =>
    +          n
    +        // Category 2:
    +        // These operators can be anywhere in a correlated subquery.
    +        // so long as they do not host outer references in the operators.
    +        // SubqueryAlias can be anywhere in a correlated subquery.
    +        case p: SubqueryAlias =>
    --- End diff --
    
    You don't need to call `failOnOuterReference` for `SubqueryAlias`,
    `Distinct`, `Repartition` or `BroadcastHint`. These operators do not contain
    expressions, so they cannot host outer references.
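
To illustrate the point, here is a minimal toy model, not Spark's actual code. All classes below are hypothetical stand-ins that only mimic the shape of Catalyst plan nodes, and the body of `failOnOuterReference` is an assumption: it is taken to reject any operator whose own expressions contain an outer reference. Under that assumption the check is a no-op for operators with an empty expression list.

```scala
// Toy model only: these types are hypothetical stand-ins, NOT Spark's classes.
object OuterReferenceSketch {
  sealed trait Expression { def children: Seq[Expression] }
  case class Attribute(name: String) extends Expression {
    val children: Seq[Expression] = Nil
  }
  case class OuterReference(attr: Attribute) extends Expression {
    val children: Seq[Expression] = Seq(attr)
  }
  case class EqualTo(left: Expression, right: Expression) extends Expression {
    val children: Seq[Expression] = Seq(left, right)
  }

  sealed trait LogicalPlan { def expressions: Seq[Expression] }
  case class Relation(name: String) extends LogicalPlan {
    val expressions: Seq[Expression] = Nil
  }
  // Operators that carry no expressions of their own.
  case class SubqueryAlias(alias: String, child: LogicalPlan) extends LogicalPlan {
    val expressions: Seq[Expression] = Nil
  }
  case class Distinct(child: LogicalPlan) extends LogicalPlan {
    val expressions: Seq[Expression] = Nil
  }
  case class Repartition(numPartitions: Int, child: LogicalPlan) extends LogicalPlan {
    val expressions: Seq[Expression] = Nil
  }
  case class BroadcastHint(child: LogicalPlan) extends LogicalPlan {
    val expressions: Seq[Expression] = Nil
  }
  // An operator that does carry expressions (a toy Sort).
  case class Sort(order: Seq[Expression], child: LogicalPlan) extends LogicalPlan {
    val expressions: Seq[Expression] = order
  }

  // True if the expression tree contains an OuterReference anywhere.
  def containsOuter(e: Expression): Boolean = e match {
    case _: OuterReference => true
    case other             => other.children.exists(containsOuter)
  }

  // Assumed behaviour: reject an operator whose own expressions hold an outer reference.
  def failOnOuterReference(p: LogicalPlan): Unit =
    if (p.expressions.exists(containsOuter))
      throw new IllegalArgumentException(s"Outer reference not allowed in: $p")

  def main(args: Array[String]): Unit = {
    val base = Relation("t")
    // None of these can throw, because their expression lists are empty.
    Seq(SubqueryAlias("s", base), Distinct(base), Repartition(4, base), BroadcastHint(base))
      .foreach(failOnOuterReference)
    // An operator with expressions is where the check matters.
    val correlated = EqualTo(Attribute("a"), OuterReference(Attribute("b")))
    val rejected = scala.util.Try(failOnOuterReference(Sort(Seq(correlated), base))).isFailure
    println(s"Sort with outer reference rejected: $rejected")
  }
}
```

In this sketch the four expression-less operators could simply be returned as-is from the `transformUp` cases, while the check stays in place for operators that carry expressions.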

