GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/21886

    [SPARK-21274][SQL] Implement INTERSECT ALL clause

    ## What changes were proposed in this pull request?
    Implements the INTERSECT ALL clause through query rewrites that use existing operators in Spark. Please refer to the [design document](https://drive.google.com/open?id=1nyW0T0b_ajUduQoPgZLAsyHK8s3_dko3ulQuxaLpUXE) for details.
    
    Input Query
    ```SQL
    SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2
    ```
    Rewritten Query
    ```SQL
       SELECT c1
        FROM (
             SELECT replicate_row(min_count, c1) AS (min_count, c1)
             FROM (
                  SELECT c1,
                         vcol1_cnt,
                         vcol2_cnt,
                         IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
                  FROM (
                       SELECT c1, count(vcol1) as vcol1_cnt, count(vcol2) as vcol2_cnt
                       FROM (
                            SELECT c1, true as vcol1, null as vcol2 FROM ut1
                            UNION ALL
                            SELECT c1, null as vcol1, true as vcol2 FROM ut2
                            ) AS union_all
                       GROUP BY c1
                       HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
                      )
                  )
              )
    ```
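
The rewrite keeps each value as many times as it occurs on the less frequent side. The plain-Python sketch below walks through the same steps (count per side, filter, take the minimum, replicate) to illustrate those semantics. It is not Spark code: the function name `intersect_all` and the sample rows are hypothetical, and `collections.Counter` stands in for the UNION ALL plus GROUP BY stage.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def intersect_all(left: Iterable[Hashable], right: Iterable[Hashable]) -> List[Hashable]:
    """Multiset intersection, following the steps of the rewritten query."""
    # UNION ALL with marker columns, then GROUP BY c1: one count per side.
    left_cnt = Counter(left)    # plays the role of vcol1_cnt
    right_cnt = Counter(right)  # plays the role of vcol2_cnt

    result = []
    for value, vcol1_cnt in left_cnt.items():
        vcol2_cnt = right_cnt.get(value, 0)
        # HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
        if vcol2_cnt < 1:
            continue
        # IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
        min_count = vcol2_cnt if vcol1_cnt > vcol2_cnt else vcol1_cnt
        # replicate_row(min_count, c1): emit the row min_count times
        result.extend([value] * min_count)
    return result


# Hypothetical contents of ut1.c1 and ut2.c1
ut1 = [1, 1, 1, 2, 3]
ut2 = [1, 1, 2, 2, 4]
print(sorted(intersect_all(ut1, ut2)))  # prints [1, 1, 2]
```

Here `1` occurs three times in `ut1` and twice in `ut2`, so it is kept twice; `2` is kept once; `3` and `4` appear on one side only and are dropped.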
    
    ## How was this patch tested?
    Added test cases in SQLQueryTestSuite, DataFrameSuite, and SetOperationSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark dkb_intersect_all_final

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21886
    
----
commit b313782ae3aa0756b3f2046bf0b9ac4cab4870f8
Author: Dilip Biswal <dbiswal@...>
Date:   2018-07-25T06:23:28Z

    generator

commit 1039e47e98efdcccbf64e392848b6cc04156bf77
Author: Dilip Biswal <dbiswal@...>
Date:   2018-05-07T08:31:11Z

    [SPARK-21274] Implement INTERSECT ALL clause

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
