What I mean is, let's say I run this:
sc.parallelize(Seq(0->3, 0->2, 0->1), 3).partitionBy(new HashPartitioner(3)).collect
Will the result always be Array((0,3), (0,2), (0,1))? Or could I
possibly get a different order?
I'm pretty sure the shuffle files are taken in the order of the source
The order is not guaranteed actually, only which keys end up in each partition.
Reducers may fetch data from map tasks in an arbitrary order, depending on
which ones are available first. If you’d like a specific order, you should sort
each partition. Here you might be getting it because each
Thanks Matei!
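For reference, the suggestion to sort each partition could be sketched roughly as below. This is only a minimal sketch for spark-shell, assuming `sc` is the shell's SparkContext. The names `pairs`, `partitioned` and `sorted` are made up for illustration, and sorting by (key, value) is an assumed choice, since the thread does not say which order is wanted.

```scala
// Meant for spark-shell, where `sc` (the SparkContext) is already defined.
import org.apache.spark.HashPartitioner

// Same data as in the question: three pairs that all share the key 0.
val pairs = sc.parallelize(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3)

// partitionBy only decides which partition each key lands in;
// it says nothing about the order of records inside a partition.
val partitioned = pairs.partitionBy(new HashPartitioner(3))

// Sort every partition locally so the result no longer depends on the
// order in which shuffle blocks happened to be fetched.
// Assumption: natural (key, value) tuple ordering is the order wanted.
val sorted = partitioned.mapPartitions(
  iter => iter.toArray.sorted.iterator,
  preservesPartitioning = true // keys are unchanged, so keep the partitioner
)

// glom shows the contents of each partition as a separate array.
sorted.glom.collect
```

Note that this buffers each partition in memory before sorting, so it is only reasonable when a single partition fits comfortably in an executor's memory.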
In the example all three items have the same key, so they go to the same
partition:
scala> sc.parallelize(Seq(0->3, 0->2, 0->1), 3).partitionBy(new HashPartitioner(3)).glom.collect
Array(Array((0,3), (0,2), (0,1)), Array(), Array())
I guess the apparent stability is just due to the