Github user wangshisan commented on the issue:

https://github.com/apache/spark/pull/21156

What is the status of this now? I think it is of great value, since it gives users more opportunities to leverage bucket joins: every join whose join keys have the bucket keys as a prefix will benefit.

We also have a further optimization here:

1. Table A(a1, a2, a3) is bucketed by (a1, a2).
2. Table B(b1, b2, b3) is bucketed by b1.
3. A joins B on (a1 = b1, a2 = b2, a3 = b3).

In this case, only table B needs an extra shuffle, and its shuffle keys are (b1, b2).
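To make the A/B case concrete, here is a rough sketch of the decision in plain Python. It is not Spark's planner code: `plan_bucket_join` is a hypothetical helper, it assumes both tables have the same number of buckets, and the rule "keep the side whose bucket keys cover the longer prefix of the join keys" is my assumption about how to pick the side that stays in place.

```python
def plan_bucket_join(left_bucket, right_bucket, join_pairs):
    """Decide which side of an equi-join needs a shuffle (illustrative only).

    left_bucket / right_bucket: bucket columns of each table, in order.
    join_pairs: list of (left_col, right_col) equality conditions, in order.
    Returns {"left": keys_or_None, "right": keys_or_None}; None means the
    side keeps its existing bucketing and is not shuffled.
    Assumes both tables use the same number of buckets.
    """
    left_keys = [l for l, _ in join_pairs]
    right_keys = [r for _, r in join_pairs]

    def prefix_len(bucket, keys):
        # Length of the bucket key list if it is a prefix of the join keys, else 0.
        n = len(bucket)
        return n if n > 0 and list(bucket) == keys[:n] else 0

    ln = prefix_len(left_bucket, left_keys)
    rn = prefix_len(right_bucket, right_keys)

    if ln == 0 and rn == 0:
        # Neither bucketing is usable: shuffle both sides on all join keys.
        return {"left": tuple(left_keys), "right": tuple(right_keys)}
    if ln == rn:
        # Both sides are bucketed on matching prefixes: no shuffle needed.
        return {"left": None, "right": None}
    if ln > rn:
        # Keep the left side; shuffle the right side on the matching columns.
        return {"left": None, "right": tuple(right_keys[:ln])}
    # Keep the right side; shuffle the left side on the matching columns.
    return {"left": tuple(left_keys[:rn]), "right": None}


plan = plan_bucket_join(
    left_bucket=("a1", "a2"),
    right_bucket=("b1",),
    join_pairs=[("a1", "b1"), ("a2", "b2"), ("a3", "b3")],
)
print(plan)
```

For the three-step example above, A's bucket keys (a1, a2) cover the first two join keys, so A stays in place and B is shuffled on the two columns paired with them.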