[GitHub] spark pull request #18990: [SPARK-21782][Core] Repartition creates skews whe...

2017-08-17 Thread megaserg
GitHub user megaserg opened a pull request: https://github.com/apache/spark/pull/18990 [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2 ## Problem When an RDD (particularly with a low item-per-partition ratio) is repartitioned to numPartitions

[GitHub] spark issue #18990: [SPARK-21782][Core] Repartition creates skews when numPa...

2017-08-18 Thread megaserg
GitHub user megaserg commented on the issue: https://github.com/apache/spark/pull/18990 Sorry, I edited the pull request body. @srowen's comment above was referring to the initial version, where I proposed using the default, non-deterministic constructor for `Random()`.

[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-04-13 Thread megaserg
GitHub user megaserg commented on the issue: https://github.com/apache/spark/pull/20704 Thank you @dongjoon-hyun! This was also affecting our Spark job performance! We're using `mapreduce.fileoutputcommitter.algorithm.version=2` in our Spark job config, as recommended e.g