[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on an RDD/DataFra...

jiangxb1987 Fri, 26 Jan 2018 13:57:06 -0800

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20393
  
    Another simple way to ensure the correctness of RDD.repartition() is to use
HashPartitioning instead of the current RoundRobinPartitioning, but that will lead
to a regression when the input data is skewed.
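
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the helper names, the 4-partition count, and the skewed sample data are all assumptions chosen for illustration. It shows that position-based (round-robin) assignment balances partition sizes but depends on input order, while key-based (hash) assignment is order-independent but concentrates a hot key in one partition.

```python
from collections import Counter

def round_robin_partition(records, num_partitions):
    """Assign each record by its position, so the result depends on input order."""
    return [(i % num_partitions, rec) for i, rec in enumerate(records)]

def hash_partition(records, num_partitions, key):
    """Assign each record by a hash of its key, independent of input order."""
    return [(hash(key(rec)) % num_partitions, rec) for rec in records]

def partition_sizes(assignments, num_partitions):
    """Count how many records landed in each partition."""
    counts = Counter(p for p, _ in assignments)
    return [counts.get(p, 0) for p in range(num_partitions)]

# Hypothetical skewed input of (key, record_id) pairs:
# 90 records share key 0, and keys 1..10 appear once each.
records = [(0, i) for i in range(90)] + [(k, 89 + k) for k in range(1, 11)]
num_partitions = 4
by_key = lambda rec: rec[0]

rr = round_robin_partition(records, num_partitions)
hp = hash_partition(records, num_partitions, by_key)

rr_sizes = partition_sizes(rr, num_partitions)  # [25, 25, 25, 25]
hp_sizes = partition_sizes(hp, num_partitions)  # [92, 3, 3, 2]

# Feed the same records in a different order, as a retried task might.
reordered = records[::-1]
rr_stable = set(round_robin_partition(reordered, num_partitions)) == set(rr)
hp_stable = set(hash_partition(reordered, num_partitions, by_key)) == set(hp)

print("round-robin sizes:", rr_sizes, "stable under reordering:", rr_stable)
print("hash sizes:       ", hp_sizes, "stable under reordering:", hp_stable)
```

In this toy setup the hash-based assignment stays the same when the input is reordered, which is the property the comment relies on for correctness, while the partition holding key 0 receives almost all of the data, which is the skew regression it warns about.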



