Hi Ilya,
This seems to me like quite a complicated solution. I'm thinking an easier
(though not optimal) approach might be, for example, to heuristically use
something like RDD.coalesce(RDD.getNumPartitions() / N), but it makes me wonder
that Spark does not have something like RDD.coalesce(partitio
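To make the heuristic above concrete, here is a minimal sketch of how the new partition count could be derived before calling coalesce. The helper name and the shrink factor N are assumptions, not part of any Spark API:

```python
def coalesced_partition_count(current_partitions, shrink_factor):
    """Heuristic from the message above: divide the current partition
    count by N, flooring the result and never going below 1."""
    return max(1, current_partitions // shrink_factor)

# Usage sketch against an RDD (assumes a SparkContext `sc` exists):
#   smaller = rdd.coalesce(coalesced_partition_count(rdd.getNumPartitions(), N))
```

This only controls the *number* of partitions, not their byte size, which is presumably why it is "not optimal" as stated above.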
Hi Jan. I've actually written a function recently to do precisely that using
the RDD.randomSplit function. You just need to calculate how big each element
of your data is, then how many elements can fit in each RDD, to populate the
input to randomSplit. Unfortunately, in my case I wind up wit
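A rough sketch of the weight calculation described above, assuming you already know the total data size and a target size per split (the helper name and parameters are illustrative, not from the original function):

```python
import math

def split_weights(total_bytes, max_bytes_per_split):
    """Compute a weights list for RDD.randomSplit so that each split
    holds at most roughly max_bytes_per_split of data.

    randomSplit takes relative weights, so a list of equal weights
    yields splits of approximately (not exactly) equal size."""
    n_splits = max(1, math.ceil(total_bytes / max_bytes_per_split))
    return [1.0] * n_splits

# Usage sketch (assumes a SparkContext `sc` exists):
#   weights = split_weights(total_bytes, max_bytes_per_split)
#   parts = rdd.randomSplit(weights, seed=42)
```

Note that randomSplit assigns elements probabilistically, so the resulting splits only approximate the target size, which may be the drawback hinted at in the truncated sentence above.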