This happened because they were integers equal to 0 mod 5, and we used the 
default hashCode implementation for integers, which will map them all to 0. 
There’s no API method that will look at the resulting partition sizes and 
rebalance them, but you could use another hash function.

Matei

On Mar 24, 2014, at 5:20 PM, Walrus theCat <walrusthe...@gmail.com> wrote:

> Hi,
> 
> sc.parallelize(Array.tabulate(100)(i=>i)).filter( _ % 20 == 0 
> ).coalesce(5,true).glom.collect  yields
> 
> Array[Array[Int]] = Array(Array(0, 20, 40, 60, 80), Array(), Array(), 
> Array(), Array())
> 
> How do I get something more like:
> 
>  Array(Array(0), Array(20), Array(40), Array(60), Array(80))
> 
> Thanks

Reply via email to