GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22250

    [SPARK-25259][SQL] left/right join support push down during-join predicates

    ## What changes were proposed in this pull request?
    Prepare data:
    ```sql
    create temporary view EMPLOYEE as select * from values
      ("000010", "HAAS", "A00"),
      ("000010", "THOMPSON", "B01"),
      ("000030", "KWAN", "C01"),
      ("000110", "LUCCHESSI", "A00"),
      ("000120", "O'CONNELL", "A00"),
      ("000130", "QUINTANA", "C01")
      as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT);
    
    create temporary view DEPARTMENT as select * from values
      ("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
      ("B01", "PLANNING", "000020"),
      ("C01", "INFORMATION CENTER", "000030"),
      ("D01", "DEVELOPMENT CENTER", null)
      as DEPARTMENT(DEPTNO, DEPTNAME, MGRNO);
    
    create temporary view PROJECT as select * from values
      ("AD3100", "ADMIN SERVICES", "D01"),
      ("IF1000", "QUERY SERVICES", "C01"),
      ("IF2000", "USER EDUCATION", "E01"),
      ("MA2100", "WELD LINE AUTOMATION", "D01"),
      ("PL2100", "WELD LINE PLANNING", "01")
      as PROJECT(PROJNO, PROJNAME, DEPTNO);
    ```
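    For the SQL below, we can push `DEPTNO='E01'` down to the right side (`DEPARTMENT`) to reduce the amount of data read, because `P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01'` implies `D.DEPTNO='E01'`: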
    ```sql
    SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
    FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
    ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
    ```
    The optimized SQL is equivalent to:
    ```sql
    SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
    FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D
    ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
    ```
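As a quick sanity check of the equivalence claim (not part of this PR), the sketch below uses Python's standard-library `sqlite3` module as a stand-in for Spark SQL. For simplicity it loads the `PROJECT` and `DEPARTMENT` sample rows into ordinary tables instead of temporary views, then asserts that the original query and the rewritten query return the same rows.

```python
import sqlite3

# Stand-in for Spark SQL: an in-memory SQLite database (an assumption, not Spark itself).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPARTMENT (DEPTNO TEXT, DEPTNAME TEXT, MGRNO TEXT)")
conn.execute("CREATE TABLE PROJECT (PROJNO TEXT, PROJNAME TEXT, DEPTNO TEXT)")
conn.executemany(
    "INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
    [
        ("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
        ("B01", "PLANNING", "000020"),
        ("C01", "INFORMATION CENTER", "000030"),
        ("D01", "DEVELOPMENT CENTER", None),
    ],
)
conn.executemany(
    "INSERT INTO PROJECT VALUES (?, ?, ?)",
    [
        ("AD3100", "ADMIN SERVICES", "D01"),
        ("IF1000", "QUERY SERVICES", "C01"),
        ("IF2000", "USER EDUCATION", "E01"),
        ("MA2100", "WELD LINE AUTOMATION", "D01"),
        ("PL2100", "WELD LINE PLANNING", "01"),
    ],
)

# The query as written: the predicate lives only in the ON clause.
ORIGINAL = """
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
ORDER BY PROJNO
"""

# The rewrite: the right input is pre-filtered with the inferred D.DEPTNO = 'E01'.
REWRITTEN = """
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO = 'E01') D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
ORDER BY PROJNO
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()

# Every PROJECT row survives (left side is preserved), and both queries agree row for row.
assert len(original_rows) == 5
assert original_rows == rewritten_rows
print(original_rows)
```

With this sample data no `DEPARTMENT` row has `DEPTNO='E01'`, so the pushed-down filter leaves the right input empty and every `DEPTNAME` comes back `NULL`, while all five `PROJECT` rows are still returned.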
    
    This PR enhances `PushPredicateThroughJoin` to support this optimization.
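To make the rewrite concrete, here is a hypothetical toy model in Python; it is not the Catalyst implementation of `PushPredicateThroughJoin`. The names `Col`, `EqLit`, `EqCol` and `push_down_left_outer` are invented for illustration, and only equality conjuncts are modeled. For a left outer join, conjuncts that reference only the right input move into a filter on the right input, conjuncts that touch the preserved left input stay in the join condition, and a left-side literal equality combined with an equi-join key yields an inferred right-side filter.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Col:
    side: str  # hypothetical: "left" or "right", i.e. which join input owns the column
    name: str


@dataclass(frozen=True)
class EqLit:
    """Conjunct of the form col = 'literal'."""
    col: Col
    value: str


@dataclass(frozen=True)
class EqCol:
    """Conjunct of the form col = col."""
    a: Col
    b: Col


Pred = Union[EqLit, EqCol]


def sides(p: Pred) -> Set[str]:
    """Join inputs referenced by a conjunct."""
    if isinstance(p, EqLit):
        return {p.col.side}
    return {p.a.side, p.b.side}


def push_down_left_outer(cond: List[Pred]) -> Tuple[List[Pred], List[Pred]]:
    """Split a LEFT OUTER JOIN condition into (remaining ON conjuncts, right-side filters)."""
    right_filters: List[Pred] = []
    remaining: List[Pred] = []
    for p in cond:
        if sides(p) == {"right"}:
            right_filters.append(p)  # right-only: safe to evaluate below the join
        else:
            remaining.append(p)  # touches the preserved left side: must stay in ON
    # Inference: left.x = 'lit' AND left.x = right.y implies right.y = 'lit',
    # so right rows failing right.y = 'lit' could never match anyway.
    left_lits = [p for p in remaining if isinstance(p, EqLit) and p.col.side == "left"]
    equi_keys = [p for p in remaining if isinstance(p, EqCol)]
    for lit in left_lits:
        for eq in equi_keys:
            for this, other in ((eq.a, eq.b), (eq.b, eq.a)):
                if this == lit.col and other.side == "right":
                    inferred = EqLit(other, lit.value)
                    if inferred not in right_filters:
                        right_filters.append(inferred)
    return remaining, right_filters


# ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
p_deptno = Col("left", "DEPTNO")
d_deptno = Col("right", "DEPTNO")
cond = [EqCol(p_deptno, d_deptno), EqLit(p_deptno, "E01")]
remaining, right_filters = push_down_left_outer(cond)
print(remaining)
print(right_filters)
```

The left-side literal stays in the join condition because filtering the preserved `PROJECT` input would drop rows the outer join must return; only the inferred copy on `DEPARTMENT` is pushed down.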
    
    ## How was this patch tested?
    
    unit tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25259

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22250
    
----
commit f9b32d5d044a899529959ad5042f8cf95c789ea8
Author: Yuming Wang <yumwang@...>
Date:   2018-08-28T06:18:05Z

    left/right join support push down during-join predicates

----

