Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16046#discussion_r90043228
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1120,47 +1173,54 @@ class Analyzer(
               } else {
                 a
               }
    -        case w : Window =>
    -          failOnOuterReference(w)
    -          failOnNonEqualCorrelatedPredicate(foundNonEqualCorrelatedPred, w)
    -          w
    -        case j @ Join(left, _, RightOuter, _) =>
    -          failOnOuterReference(j)
    -          failOnOuterReferenceInSubTree(left, "a RIGHT OUTER JOIN")
    -          j
    -        // SPARK-18578: Do not allow any correlated predicate
    -        // in a Full (Outer) Join operator and its descendants
    -        case j @ Join(_, _, FullOuter, _) =>
    -          failOnOuterReferenceInSubTree(j, "a FULL OUTER JOIN")
    -          j
    -        case j @ Join(_, right, jt, _) if !jt.isInstanceOf[InnerLike] =>
    -          failOnOuterReference(j)
    -          failOnOuterReferenceInSubTree(right, "a LEFT (OUTER) JOIN")
    +
    +        // Join can host correlated expressions.
    +        case j @ Join(left, right, joinType, _) =>
    +          joinType match {
    +            // Inner join, like Filter, can be anywhere.
    +            // LeftSemi is a special case of Inner join which returns
    +            // only the first matched row to the right table.
    +            case _: InnerLike | LeftSemi =>
    --- End diff --
    
    I have to admit that I have been reviewing a lot of PRs. However, I am quite
    sure that you cannot define a correlated predicate in the plan on the right
    hand side of a `LEFT SEMI`/`early-out join`, because we only output the
    columns of the plan on the left hand side. For example:
    ```sql
    select *
    from   tbl_a
    where  exists (select 1
                   from tbl_b
                   left semi join( select id
                                   from tbl_c
                                   where tbl_c.id = tbl_a.id) c
                    on c.id = tbl_b.id)
    ```
    In this example we could not move the correlated predicate
    `tbl_c.id = tbl_a.id` above the join, because the Left Semi join does not
    output `c.id`. BTW: In this case it would actually be OK to convert the
    Left Semi join into an Inner join.
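
    A rough sketch of what that Inner join variant might look like (this is my
    assumption about the rewritten query, not something taken from the PR). An
    Inner join outputs the columns of both sides, so `c.id` would be available
    above the join. The conversion only seems safe here because the join sits
    inside an `EXISTS`, where extra duplicate matches do not change the result:
    ```sql
    -- Hypothetical rewrite of the example above: LEFT SEMI replaced by INNER.
    select *
    from   tbl_a
    where  exists (select 1
                   from tbl_b
                   join (select id
                         from tbl_c
                         where tbl_c.id = tbl_a.id) c
                     on c.id = tbl_b.id)
    ```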

