GitHub user megaserg opened a pull request:
https://github.com/apache/spark/pull/18990
[SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2
## Problem
When an RDD (particularly one with a low item-per-partition ratio) is
repartitioned to a `numPartitions` that is a power of 2, the resulting
partitions are very uneven in size. This happens because the PRNG is
initialized with a fixed seed and used only once. See details in
https://issues.apache.org/jira/browse/SPARK-21782
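The Java sketch below illustrates why a power-of-2 `numPartitions` is special. It assumes that each input partition seeds its own `java.util.Random` (which `scala.util.Random` also delegates to) with its partition index and draws one starting position. This mirrors the "fixed seed, used only once" pattern described above. `FixedSeedStartDemo`, `startPosition` and `distinctStarts` are hypothetical names chosen for illustration; they are not Spark API. According to its Javadoc, `Random.nextInt(bound)` keeps only the high-order bits of `next(31)` when `bound` is a power of 2. Those bits of the first output change very little between small, consecutive seeds.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Sketch (assumption: seed = input partition index, one draw per PRNG).
final class FixedSeedStartDemo {

    // One PRNG per input partition, seeded with its index and used only once.
    static int startPosition(int index, int numPartitions) {
        return new Random(index).nextInt(numPartitions);
    }

    // Counts how many distinct starting positions numInputs partitions produce.
    static int distinctStarts(int numInputs, int numPartitions) {
        Set<Integer> seen = new HashSet<>();
        for (int index = 0; index < numInputs; index++) {
            seen.add(startPosition(index, numPartitions));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int numInputs = 1024;
        // Power-of-2 bound: nextInt keeps the high-order bits of the first
        // output, and those barely move between nearby seeds.
        System.out.println("bound 16: " + distinctStarts(numInputs, 16) + " distinct starts");
        // Other bounds: nextInt reduces with %, which depends on all bits.
        System.out.println("bound 15: " + distinctStarts(numInputs, 15) + " distinct starts");
    }
}
```

For bound 16, `main` should report only a few distinct starts. For bound 15, it should report many more. When input partitions hold only a few items each, clustered starts translate directly into a few crowded output partitions.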
## What changes were proposed in this pull request?
Instead of using a fixed seed, use the default constructor for `Random`.
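The proposed change can be compared with the old behavior using a simplified, single-JVM model of the repartition step. This model is an assumption, not Spark's actual code. Each input partition draws one starting position, then assigns its items round-robin, with an integer key `position` mapped to output partition `position mod numPartitions`. The `fixedSeed` flag switches between `new Random(index)` (the assumed old seeding) and `new Random()` (the default constructor proposed here). `RepartitionModel`, `outputSizes` and `emptyPartitions` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Simplified model: one starting draw per input partition, then round-robin.
final class RepartitionModel {

    // Returns the number of items landing in each output partition.
    // fixedSeed = true models new Random(index); false models new Random().
    static int[] outputSizes(int numInputs, int itemsPerInput, int numPartitions, boolean fixedSeed) {
        int[] sizes = new int[numPartitions];
        for (int index = 0; index < numInputs; index++) {
            Random rng = fixedSeed ? new Random(index) : new Random();
            int position = rng.nextInt(numPartitions);
            for (int i = 0; i < itemsPerInput; i++) {
                position++;
                // Integer.hashCode() is the value itself; floorMod keeps it non-negative.
                sizes[Math.floorMod(position, numPartitions)]++;
            }
        }
        return sizes;
    }

    // Counts output partitions that received no items.
    static int emptyPartitions(int[] sizes) {
        int empty = 0;
        for (int size : sizes) {
            if (size == 0) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // 1024 input partitions with only 3 items each, repartitioned to 16.
        int[] fixed = outputSizes(1024, 3, 16, true);
        int[] unseeded = outputSizes(1024, 3, 16, false);
        System.out.println("fixed seed:   " + Arrays.toString(fixed) + ", empty = " + emptyPartitions(fixed));
        System.out.println("default ctor: " + Arrays.toString(unseeded) + ", empty = " + emptyPartitions(unseeded));
    }
}
```

In this model, the fixed-seed variant piles all 3072 items into a handful of output partitions and leaves most of the 16 empty. The default-constructor variant spreads items across all of them. Because `new Random()` is seeded differently on each run, its exact sizes vary between runs.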
## How was this patch tested?
`build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite test`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/megaserg/spark repartition-skew
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18990.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18990
----
commit 2cb7550b8ecada3c504621a75c4f82d13880496b
Author: Sergey Serebryakov <[email protected]>
Date: 2017-08-18T05:47:55Z
[SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2
----