GitHub user henryr opened a pull request:

    https://github.com/apache/spark/pull/19647

    [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

    ## What changes were proposed in this pull request?
    
    It's not safe in all cases to push down a LIMIT below a FULL OUTER
    JOIN. If the limit is pushed to one side of the FOJ, the physical
    join operator can not tell if a row in the non-limited side would have a
    match in the other side.
    
    *If* the join operator guarantees that unmatched tuples from the limited
    side are emitted before any unmatched tuples from the other side,
    pushing down the limit is safe. But this is impractical for some join
    implementations, e.g. SortMergeJoin.
    
    For now, disable limit pushdown through a FULL OUTER JOIN, and we can
    evaluate whether a more complicated solution is necessary in the future.
    
    ## How was this patch tested?
    
    Ran org.apache.spark.sql.* tests. Altered full outer join tests in
    LimitPushdownSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/henryr/spark spark-22211

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19647.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19647
    
----
commit 0e0123b0887aa33c382a63ee7843be5f64d38b41
Author: Henry Robinson <[email protected]>
Date:   2017-10-30T22:48:33Z

    [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown
    
    ## What changes were proposed in this pull request?
    
    It's not safe in all cases to push down a LIMIT below a FULL OUTER
    JOIN. If the limit is pushed to one side of the FOJ, the physical
    join operator can not tell if a row in the non-limited side would have a
    match in the other side.
    
    *If* the join operator guarantees that unmatched tuples from the limited
    side are emitted before any unmatched tuples from the other side,
    pushing down the limit is safe. But this is impractical for some join
    implementations, e.g. SortMergeJoin.
    
    For now, disable limit pushdown through a FULL OUTER JOIN, and we can
    evaluate whether a more complicated solution is necessary in the future.
    
    ## How was this patch tested?
    
    Ran org.apache.spark.sql.* tests. Altered full outer join tests in
    LimitPushdownSuite.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to