GitHub user yucai reopened a pull request: https://github.com/apache/spark/pull/21156

    [SPARK-24087][SQL] Avoid shuffle when join keys are a super-set of bucket keys

    ## What changes were proposed in this pull request?

    To improve the bucket join, when join keys are a super-set of bucket keys, we should avoid shuffle.

    ## How was this patch tested?

    Enable ignored test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark SPARK-24087

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21156

----
commit b6bfdc21ed8edf98f9a3b9ac1c253c59adb141a2
Author: yucai <yyu1@...>
Date:   2018-04-25T00:49:43Z

    [SPARK-24087][SQL] Avoid shuffle when join keys are a super-set of bucket keys

commit a59c94f5b655fc034ce8907b98022cacf6bf318e
Author: yucai <yyu1@...>
Date:   2018-04-26T04:33:08Z

    simplify the codes

commit 4e026e5e437dc7f578434244b55bb1ebe189bace
Author: yucai <yyu1@...>
Date:   2018-06-04T02:22:12Z

    Add spark.sql.sortMergeJoinExec.childrenPartitioningDetection for user to disable this feature

commit fa76a7823baf4e6eb05f33bc746ade7f65f44372
Author: yucai <yyu1@...>
Date:   2018-06-04T05:25:01Z

    enable spark.sql.sortMergeJoinExec.childrenPartitioningDetection by default

commit 946688aee3d03d37a57270e654e00bb9236f21c4
Author: yucai <yyu1@...>
Date:   2018-06-04T05:28:51Z

    should return

commit 981a0fd22d30768ce533982c9fcc701b15d4dc44
Author: yucai <yyu1@...>
Date:   2018-07-06T06:51:24Z

    skip RangePartition

commit 76e7d5f67017604c29179ce55280e0fc56574fde
Author: yucai <yyu1@...>
Date:   2018-07-09T10:14:43Z

    Merge remote-tracking branch 'origin/master' into pr21156

commit 371c3a932f4dede4aeb1be2c9db404b457547ecf
Author: yucai <yyu1@...>
Date:   2018-07-09T11:33:17Z

    improve tests

commit de2bc4de76077f257b85e6a1d58ee17fbc770c8e
Author: yucai <yyu1@...>
Date:   2018-07-12T01:43:35Z

    support shuffled hash join

commit f40606203da01efe400431ed9d2b8b70c0476fc6
Author: yucai <yyu1@...>
Date:   2018-07-26T14:33:40Z

    remove bucket table check

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
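
Note on the idea behind SPARK-24087: the reasoning in the pull request description (no shuffle is needed when the join keys are a super-set of the bucket keys) can be illustrated with a small standalone sketch. This is not Spark code and not the patch itself. It is plain Python, and the names `NUM_BUCKETS`, `bucket_of`, `join_per_bucket` and `join_global` are hypothetical helpers made up for the illustration. It assumes both sides are hash-bucketed on the same bucket keys into the same number of buckets. Under that assumption, two rows that agree on every join key also agree on the bucket keys, so they already sit in the same bucket and each bucket pair can be joined locally.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumption: both tables use the same bucket count


def bucket_of(row, bucket_keys):
    """Bucket id from the values at the bucket-key positions (hypothetical helper)."""
    return hash(tuple(row[k] for k in bucket_keys)) % NUM_BUCKETS


def bucketize(rows, bucket_keys):
    """Group rows by bucket id, as a bucketed table would store them."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row, bucket_keys)].append(row)
    return buckets


def join_per_bucket(left, right, bucket_keys, join_keys):
    """Inner join done bucket by bucket, with no redistribution of rows."""
    # The argument only holds when join keys cover the bucket keys.
    assert set(bucket_keys) <= set(join_keys)
    left_buckets = bucketize(left, bucket_keys)
    right_buckets = bucketize(right, bucket_keys)
    out = []
    for b in range(NUM_BUCKETS):
        # Index the right-side rows of this bucket by their join-key values.
        index = defaultdict(list)
        for r in right_buckets.get(b, []):
            index[tuple(r[k] for k in join_keys)].append(r)
        # Probe with the left-side rows of the same bucket only.
        for l in left_buckets.get(b, []):
            for r in index.get(tuple(l[k] for k in join_keys), []):
                out.append((l, r))
    return out


def join_global(left, right, join_keys):
    """Reference inner join over all rows, ignoring bucketing."""
    return [
        (l, r)
        for l in left
        for r in right
        if all(l[k] == r[k] for k in join_keys)
    ]


# Rows are (i, j, payload); bucketed on column 0, joined on columns 0 and 1.
left_rows = [(1, 10, "a"), (1, 20, "b"), (2, 10, "c"), (5, 50, "d")]
right_rows = [(1, 10, "x"), (2, 10, "y"), (2, 30, "z"), (5, 50, "w")]

local = sorted(join_per_bucket(left_rows, right_rows, (0,), (0, 1)))
reference = sorted(join_global(left_rows, right_rows, (0, 1)))
assert local == reference
```

The per-bucket join returns the same pairs as the global join, which is the property the pull request relies on. If the join keys did not include every bucket key, matching rows could sit in different buckets and the local join would miss them, so the sketch asserts the super-set condition up front.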