[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

wangshisan Mon, 08 Oct 2018 19:58:05 -0700

Github user wangshisan commented on the issue:

    https://github.com/apache/spark/pull/21156
  
    What is the status of this PR now? I think it is of great value: it gives
    users more opportunities to leverage bucket joins, because every join whose
    join keys have the bucket key as a prefix will benefit from it.
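    A minimal plain-Python sketch of why such joins need no shuffle (a simulation, not Spark code). It assumes both tables are hash-bucketed on their first join column into the same number of buckets with the same hash function. `NUM_BUCKETS`, `bucket_of`, `bucketize` and the sample rows are hypothetical, and Python's built-in `hash` stands in for Spark's own bucketing hash. Rows that match on all join keys also match on the bucket key, so they always sit in buckets with the same index. Joining bucket i of A only with bucket i of B therefore gives the same result as the full join:

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count shared by both tables


def bucket_of(key):
    # Stand-in for Spark's bucketing hash: any deterministic hash works,
    # as long as both tables use the same function and bucket count.
    return hash(key) % NUM_BUCKETS


def bucketize(rows, key_cols):
    # Group rows by the hash of their bucket-key columns.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(tuple(row[c] for c in key_cols))].append(row)
    return buckets


def join(left, right, pairs):
    # Nested-loop equi-join on (left_col, right_col) pairs.
    return [(lrow, rrow) for lrow in left for rrow in right
            if all(lrow[lc] == rrow[rc] for lc, rc in pairs)]


def canonical(result):
    # Order-independent form of a join result, for comparison.
    return sorted((tuple(sorted(lrow.items())), tuple(sorted(rrow.items())))
                  for lrow, rrow in result)


# Hypothetical tables A(a1, a2) and B(b1, b2), each bucketed on its first column.
A = [{"a1": i % 5, "a2": i % 3} for i in range(20)]
B = [{"b1": i % 5, "b2": i % 2} for i in range(20)]
A_buckets = bucketize(A, ["a1"])
B_buckets = bucketize(B, ["b1"])

# The join keys (a1 = b1, a2 = b2) are a superset of the bucket keys.
pairs = [("a1", "b1"), ("a2", "b2")]

# Join bucket i of A only with bucket i of B: no row changes buckets.
local = []
for i in range(NUM_BUCKETS):
    local += join(A_buckets.get(i, []), B_buckets.get(i, []), pairs)

full = join(A, B, pairs)
print(canonical(local) == canonical(full))  # True
```

    The per-bucket loop is the part a bucket join gets for free. Because no rows move between buckets, the exchange that a plain join on (a1, a2) would otherwise require can be skipped.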
    There is also a further optimization we could make here:
    1. Table A(a1, a2, a3) is bucketed by a1, a2
    2. Table B(b1, b2, b3) is bucketed by b1.
    3. A is joined with B on (a1 = b1, a2 = b2, a3 = b3).
    
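    In this case, only table B needs an extra shuffle, on the keys (b1, b2),
    using the same hash function and bucket count as A. This lays B out to match
    A's existing bucketing on (a1, a2), so table A can be read as-is.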
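    The optimization above can be sketched the same way, again as a plain-Python simulation rather than Spark's planner. `shuffle_keys_for_other_side` is a hypothetical helper. Given the join-key pairs and the bucket columns of the already-bucketed side, it returns the paired columns of the other side, here (b1, b2). Re-partitioning B on those columns puts every B row into the bucket where its matching A rows already live, so only B moves. This assumes B is re-partitioned with the same hash function and bucket count as A, with Python's `hash` again standing in for Spark's bucketing hash.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical: must equal the bucket count of table A


def bucket_of(key):
    # Stand-in for the bucketing hash; B must be shuffled with the same
    # function and bucket count that were used to bucket A.
    return hash(key) % NUM_BUCKETS


def shuffle_keys_for_other_side(join_pairs, bucket_cols):
    # For each bucket column of the bucketed side, pick the column it is
    # joined with on the other side, keeping the bucket-column order.
    mapping = dict(join_pairs)
    if not all(c in mapping for c in bucket_cols):
        raise ValueError("bucket columns must all appear in the join keys")
    return [mapping[c] for c in bucket_cols]


def partition(rows, key_cols):
    # Group rows by the hash of the given key columns.
    parts = defaultdict(list)
    for row in rows:
        parts[bucket_of(tuple(row[c] for c in key_cols))].append(row)
    return parts


def join(left, right, pairs):
    # Nested-loop equi-join on (left_col, right_col) pairs.
    return [(lrow, rrow) for lrow in left for rrow in right
            if all(lrow[lc] == rrow[rc] for lc, rc in pairs)]


def canonical(result):
    # Order-independent form of a join result, for comparison.
    return sorted((tuple(sorted(lrow.items())), tuple(sorted(rrow.items())))
                  for lrow, rrow in result)


# 1. Table A(a1, a2, a3) is bucketed by (a1, a2).
A = [{"a1": i % 3, "a2": i % 2, "a3": i % 4} for i in range(24)]
A_buckets = partition(A, ["a1", "a2"])

# 2./3. Join on (a1 = b1, a2 = b2, a3 = b3); B's bucketing by b1 is not reused.
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
B = [{"b1": i % 3, "b2": i % 2, "b3": i % 4} for i in range(0, 24, 2)]

keys = shuffle_keys_for_other_side(pairs, ["a1", "a2"])
print(keys)  # ['b1', 'b2']
B_shuffled = partition(B, keys)  # the only shuffle: table A stays in place

# Join bucket i of A with partition i of the shuffled B.
local = []
for i in range(NUM_BUCKETS):
    local += join(A_buckets.get(i, []), B_shuffled.get(i, []), pairs)

print(canonical(local) == canonical(join(A, B, pairs)))  # True
```

    The helper raises if any of A's bucket columns is missing from the join keys. In that case B cannot be laid out to match A's buckets, and the sketch offers no one-sided shuffle.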
