GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/21886

[SPARK-21274][SQL] Implement INTERSECT ALL clause

## What changes were proposed in this pull request?
Implements the INTERSECT ALL clause through query rewrites using existing operators in Spark.

Please refer to [Link](https://drive.google.com/open?id=1nyW0T0b_ajUduQoPgZLAsyHK8s3_dko3ulQuxaLpUXE) for the design.

Input Query
```SQL
SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2
```
Rewritten Query
```SQL
SELECT c1
FROM (
  SELECT replicate_row(min_count, c1) AS (min_count, c1)
  FROM (
    SELECT c1,
           vcol1_cnt,
           vcol2_cnt,
           IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
    FROM (
      SELECT c1,
             count(vcol1) AS vcol1_cnt,
             count(vcol2) AS vcol2_cnt
      FROM (
        SELECT c1, true AS vcol1, null AS vcol2 FROM ut1
        UNION ALL
        SELECT c1, null AS vcol1, true AS vcol2 FROM ut2
      ) AS union_all
      GROUP BY c1
      HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
    )
  )
)
```

## How was this patch tested?
Added test cases in SQLQueryTestSuite, DataFrameSuite, SetOperationSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark dkb_intersect_all_final

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21886

----
commit b313782ae3aa0756b3f2046bf0b9ac4cab4870f8
Author: Dilip Biswal <dbiswal@...>
Date:   2018-07-25T06:23:28Z

    generator

commit 1039e47e98efdcccbf64e392848b6cc04156bf77
Author: Dilip Biswal <dbiswal@...>
Date:   2018-05-07T08:31:11Z

    [SPARK-21274] Implement INTERSECT ALL clause

----
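
For readers following the rewritten query in the pull request description, the sketch below walks through the same steps in plain Python on in-memory lists: tag rows from each side, union them, count per key and per side, keep keys present on both sides, and emit each key `min_count` times. This is only an illustration of the rewrite's semantics, not the Spark implementation. The function name `intersect_all` and the single-column inputs are assumptions made for the example, and the final `extend` merely stands in for what `replicate_row` does in the query.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def intersect_all(left: Iterable[Hashable], right: Iterable[Hashable]) -> List[Hashable]:
    """Emulate the INTERSECT ALL rewrite on two single-column inputs.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # Step 1: UNION ALL of both sides with marker columns (c1, vcol1, vcol2).
    union_all = [(c1, True, None) for c1 in left] + [(c1, None, True) for c1 in right]

    # Step 2: GROUP BY c1, counting non-null markers per side
    # (SQL count(col) skips NULLs, so each counter only sees its own side).
    vcol1_cnt: Counter = Counter()
    vcol2_cnt: Counter = Counter()
    for c1, vcol1, vcol2 in union_all:
        if vcol1 is not None:
            vcol1_cnt[c1] += 1
        if vcol2 is not None:
            vcol2_cnt[c1] += 1

    result: List[Hashable] = []
    # Visit each distinct key once, in first-seen order (an SQL engine
    # gives no ordering guarantee here; this just keeps the output stable).
    for c1 in dict.fromkeys(row[0] for row in union_all):
        # Step 3: HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1.
        if vcol1_cnt[c1] >= 1 and vcol2_cnt[c1] >= 1:
            # Step 4: IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count.
            min_count = vcol2_cnt[c1] if vcol1_cnt[c1] > vcol2_cnt[c1] else vcol1_cnt[c1]
            # Step 5: stand-in for replicate_row(min_count, c1).
            result.extend([c1] * min_count)
    return result
```

For example, `intersect_all([1, 1, 2, 3, 3, 3], [1, 1, 1, 3, 3, 4])` returns `[1, 1, 3, 3]`: the value 1 appears twice on the left and three times on the right, so it is kept twice, and the value 3 appears three times and twice, so it is also kept twice. The values 2 and 4 are each missing from one side and are dropped.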