GitHub user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20393
Another simple way to ensure the correctness of RDD.repartition() is to use
HashPartitioning instead of the current RoundRobinPartitioning, but that would
lead to a regression when the input data is skewed.
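
To make the trade-off concrete, here is a minimal sketch in plain Python (not Spark code). The helper names `round_robin_partition` and `hash_partition` and the sample data are hypothetical and only illustrate the idea: round-robin placement depends on a record's position in the input, while hash placement depends only on the record's value, so it is stable across retries but piles duplicate values into one partition.

```python
from collections import Counter


def round_robin_partition(records, num_partitions, start=0):
    """Return partition sizes when records are dealt out in turn.

    The target partition depends only on the record's position,
    so the sizes stay balanced regardless of the values.
    """
    sizes = Counter()
    for i, _ in enumerate(records):
        sizes[(start + i) % num_partitions] += 1
    return [sizes[p] for p in range(num_partitions)]


def hash_partition(records, num_partitions):
    """Return partition sizes when each record goes to hash(record) % n.

    The target partition depends only on the record's value,
    so the result does not change if the input order changes.
    """
    sizes = Counter()
    for rec in records:
        sizes[hash(rec) % num_partitions] += 1
    return [sizes[p] for p in range(num_partitions)]


# Hypothetical skewed input: 90 copies of one value plus 10 distinct values.
records = [0] * 90 + list(range(1, 11))

rr_sizes = round_robin_partition(records, 4)
hp_sizes = hash_partition(records, 4)

print("round-robin:", rr_sizes)  # [25, 25, 25, 25]
print("hash:       ", hp_sizes)  # [92, 3, 3, 2]
```

In this toy input the hash-based split sends 92 of the 100 records to a single partition, which is the skew regression mentioned above, whereas the round-robin split stays even but would assign records differently if the input arrived in another order.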
