Stephen is exactly correct. I just wanted to point out that in Spark 0.8.1
and above, the "repartition" function has been added as a clearer way to
accomplish what you want. ("Coalescing" into a larger number of partitions
doesn't make much linguistic sense.)

On Thu, Oct 31, 2013 at 7:48 AM, Stephen Haberman <[email protected]> wrote:

> > Is it possible to repartition RDDs other than by the coalesce method?
> > I am primarily interested in making finer-grained partitioning or
> > rebalancing an unbalanced partitioning, without coalescing.
>
> I believe if you use the shuffle=true parameter, coalesce will do what
> you want, and essentially becomes a general "repartition" method.
>
> Specifically, while shuffle=false can only produce larger partitions,
> with shuffle=true you can break your partitions up into many smaller
> partitions, with the content distributed by a hash partitioner.
>
> I believe that's what you're asking for?
>
> - Stephen
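To make the hash-partitioner point concrete: during the shuffle, each record is assigned to one of the N output partitions by hashing its key, which is why coalesce(n, shuffle = true) (or repartition(n) in 0.8.1+) can increase the partition count and rebalance skew. Here is a minimal sketch of that assignment logic in plain Python; partition_by_hash is a hypothetical helper for illustration, not Spark's actual implementation:

```python
# Sketch of hash partitioning: each (key, value) record lands in
# partition hash(key) % num_partitions. This mimics what a hash
# partitioner does during a shuffle; it is not Spark's real code.

def partition_by_hash(records, num_partitions):
    """Distribute (key, value) records into num_partitions buckets."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = hash(key) % num_partitions
        partitions[index].append((key, value))
    return partitions

# One skewed input becomes eight finer-grained partitions.
records = [(k, k * k) for k in range(100)]
result = partition_by_hash(records, 8)
```

Because the assignment depends only on the key's hash, records with the same key always land in the same output partition, regardless of how unbalanced the input partitions were.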
