Hi Ilya,
This seems to me like quite a complicated solution. I'm thinking an easier
(though not optimal) approach might be, for example, to heuristically use
something like RDD.coalesce(RDD.getNumPartitions() / N), but it makes me wonder
that Spark does not have something like RDD.coalesce(partitio
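To make the heuristic above concrete, here is a minimal sketch of how the new partition count could be derived before calling coalesce. The helper name and the shrink factor N are assumptions, not part of any Spark API:

```python
def coalesced_partition_count(current_partitions, shrink_factor):
    """Heuristic from the message above: divide the current partition
    count by N, flooring the result and never going below 1."""
    return max(1, current_partitions // shrink_factor)

# Usage sketch against an RDD (assumes a SparkContext `sc` exists):
#   smaller = rdd.coalesce(coalesced_partition_count(rdd.getNumPartitions(), N))
```

This only controls the *number* of partitions, not their byte size, which is presumably why it is "not optimal" as stated above.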
Hi Jan. I've actually written a function recently to do precisely that using
the RDD.randomSplit function. You just need to calculate how big each element
of your data is, then how many elements can fit in each RDD, to populate the
input to randomSplit. Unfortunately, in my case I wind up wit
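A rough sketch of the weight calculation described above, assuming you already know the total data size and a target size per split (the helper name and parameters are illustrative, not from the original function):

```python
import math

def split_weights(total_bytes, max_bytes_per_split):
    """Compute a weights list for RDD.randomSplit so that each split
    holds at most roughly max_bytes_per_split of data.

    randomSplit takes relative weights, so a list of equal weights
    yields splits of approximately (not exactly) equal size."""
    n_splits = max(1, math.ceil(total_bytes / max_bytes_per_split))
    return [1.0] * n_splits

# Usage sketch (assumes a SparkContext `sc` exists):
#   weights = split_weights(total_bytes, max_bytes_per_split)
#   parts = rdd.randomSplit(weights, seed=42)
```

Note that randomSplit assigns elements probabilistically, so the resulting splits only approximate the target size, which may be the drawback hinted at in the truncated sentence above.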