Stephen is exactly correct; I just wanted to point out that in Spark 0.8.1
and above, the "repartition" function has been added as a clearer way to
accomplish what you want. ("Coalescing" into a larger number of partitions
doesn't make much linguistic sense.)
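To illustrate the hash-based redistribution Stephen describes below, here is a minimal plain-Python simulation (not Spark's actual implementation, and the function name is hypothetical): each element is reassigned to one of the new partitions by hashing, which is why a shuffle can increase the partition count as well as decrease it.

```python
# Simulates what coalesce(n, shuffle=true) / repartition(n) achieve:
# all elements are redistributed across n new partitions based on a
# hash of each element. (Illustrative sketch only, not Spark code.)

def repartition(partitions, num_partitions):
    """Redistribute every element across num_partitions by hash."""
    new_parts = [[] for _ in range(num_partitions)]
    for part in partitions:
        for elem in part:
            # Hash partitioner: element's hash picks its new partition.
            new_parts[hash(elem) % num_partitions].append(elem)
    return new_parts

# Two coarse partitions broken up into eight finer ones:
coarse = [[1, 2, 3, 4], [5, 6, 7, 8]]
fine = repartition(coarse, 8)
```

Without the shuffle, coalesce only merges existing partitions locally, so it can never produce more partitions than it started with.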


On Thu, Oct 31, 2013 at 7:48 AM, Stephen Haberman <
[email protected]> wrote:

>
> > Is it possible to repartition RDDs other than by the coalesce method.
> > I am primarily interested in making finer grained partitioning or
> > rebalancing an unbalanced partitioning, without coalescing.
>
> I believe if you use the shuffle=true parameter, coalesce will do what
> you want, and essentially becomes a general "repartition" method.
>
> Specifically, yes: while shuffle=false can only merge into larger
> partitions, with shuffle=true you can break your partitions up into many
> smaller ones, with their contents assigned by a hash partitioner.
>
> I believe that's what you're asking for?
>
> - Stephen
>
>
>
