Is shuffle stable?

2014-06-14 Thread Daniel Darabos
What I mean is, let's say I run this: sc.parallelize(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3).partitionBy(new HashPartitioner(3)).collect Will the result always be Array((0,3), (0,2), (0,1))? Or could I possibly get a different order? I'm pretty sure the shuffle files are taken in the order of the source

Re: Is shuffle stable?

2014-06-14 Thread Matei Zaharia
The order is actually not guaranteed, only which keys end up in each partition. Reducers may fetch data from map tasks in an arbitrary order, depending on which ones are available first. If you’d like a specific order, you should sort each partition. Here you might be getting it because each
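To illustrate the point about fetch order, here is a minimal, self-contained Scala simulation. It is not Spark's shuffle code; `ShuffleOrderSketch`, `reduce` and `reduceSorted` are hypothetical names. Each of three map tasks has written one block for the same reduce partition, and the reducer simply concatenates blocks in the order it happens to fetch them. Note that sorting by key alone would not help in this example, because all keys are 0 and a stable sort keeps ties in arrival order; the sketch therefore sorts by the whole (key, value) tuple.

```scala
// Hypothetical simulation, not Spark code: three map tasks each wrote one
// block for reduce partition 0, and the reducer concatenates the blocks in
// whatever order it fetches them.
object ShuffleOrderSketch {
  // Output block of each map task (index = map task ID) for partition 0.
  val mapOutputs: Vector[Vector[(Int, Int)]] =
    Vector(Vector(0 -> 3), Vector(0 -> 2), Vector(0 -> 1))

  // Concatenate blocks in the given fetch order (a permutation of map IDs).
  def reduce(fetchOrder: Seq[Int]): Vector[(Int, Int)] =
    fetchOrder.toVector.flatMap(i => mapOutputs(i))

  // Sort the whole partition by (key, value), so the result no longer
  // depends on the fetch order.
  def reduceSorted(fetchOrder: Seq[Int]): Vector[(Int, Int)] =
    reduce(fetchOrder).sorted

  def main(args: Array[String]): Unit = {
    println(reduce(Seq(0, 1, 2)))       // blocks arrive in map-task order
    println(reduce(Seq(2, 0, 1)))       // same data, different order
    println(reduceSorted(Seq(2, 0, 1))) // deterministic after sorting
  }
}
```

In an actual RDD job, the analogous step would be sorting inside `mapPartitions`, for example `rdd.mapPartitions(_.toVector.sorted.iterator)`, with the caveat that this buffers each whole partition in memory.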

Re: Is shuffle stable?

2014-06-14 Thread Daniel Darabos
Thanks Matei! In the example all three items have the same key, so they go to the same partition:

scala> sc.parallelize(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3).partitionBy(new HashPartitioner(3)).glom.collect
Array(Array((0,3), (0,2), (0,1)), Array(), Array())

I guess the apparent stability is just due to the
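The glom output above follows from how hash partitioning picks a partition. The sketch below is a stand-alone re-implementation, assumed to mirror Spark's `HashPartitioner` (partition = non-negative `key.hashCode` mod `numPartitions`, null keys to partition 0); `HashPartitionSketch`, `partitionFor` and `assign` are hypothetical names, not Spark API. Key `0` has hash code 0, so all three records land in partition 0 and the other two partitions stay empty. The sketch only models *which* partition a record goes to: it keeps input order within a partition because it runs sequentially, which is exactly the ordering Spark does not promise.

```scala
// Hypothetical stand-alone re-implementation of hash partitioning, assumed
// to mirror Spark's HashPartitioner: partition = non-negative
// (key.hashCode mod numPartitions), with null keys sent to partition 0.
object HashPartitionSketch {
  // Like x % mod, but shifted into [0, mod) for negative x.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  // Group records per partition, like glom would show them. Order within a
  // partition here is just input order; Spark itself does not guarantee it.
  def assign[K, V](records: Seq[(K, V)], numPartitions: Int): Vector[Vector[(K, V)]] =
    Vector.tabulate(numPartitions) { p =>
      records.filter { case (k, _) => partitionFor(k, numPartitions) == p }.toVector
    }
}
```

For example, `HashPartitionSketch.assign(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3)` puts all three pairs in the first inner vector and leaves the other two empty, matching the shape of the glom result.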