Yes, the order is preserved for these map-like operations. The only time it
isn't is when you change the RDD's partitioner, e.g. by doing sortByKey or
groupByKey. It would definitely be good to document this more formally.
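
To make the distinction concrete, here is a minimal sketch in plain Python that models an RDD as a list of partitions. It is not Spark code: `collect`, `toy_map`, `toy_filter`, `toy_sort_by_key` and `toy_hash_shuffle` are hypothetical stand-ins, and the range bounds and hash routing are simplified assumptions. Narrow operations like map() and filter() work partition by partition, so a range-partitioned, sorted result stays sorted. A hash shuffle like the one groupByKey does re-routes records and loses the order. Note that distinct() is itself implemented with a shuffle (reduceByKey under the hood), so it falls into the second group.

```python
from bisect import bisect_left
from typing import Callable, List, Tuple

# Toy model (NOT the Spark API): an "RDD" is a list of partitions, each a list
# of (key, value) pairs. collect() concatenates partitions in partition order.
Pair = Tuple[int, str]
Parts = List[List[Pair]]


def collect(parts: Parts) -> List[Pair]:
    return [x for p in parts for x in p]


def toy_map(parts: Parts, f: Callable[[Pair], Pair]) -> Parts:
    # Narrow, map-like: each partition is transformed element by element,
    # so partition boundaries and in-partition order are unchanged.
    return [[f(x) for x in p] for p in parts]


def toy_filter(parts: Parts, pred: Callable[[Pair], bool]) -> Parts:
    # Also narrow: dropping elements cannot reorder the survivors.
    return [[x for x in p if pred(x)] for p in parts]


def toy_sort_by_key(parts: Parts, bounds: List[int]) -> Parts:
    # Rough imitation of sortByKey: range-partition on the key using sorted
    # upper bounds (hypothetical), then sort each partition by key.
    out: Parts = [[] for _ in range(len(bounds) + 1)]
    for kv in collect(parts):
        out[bisect_left(bounds, kv[0])].append(kv)
    return [sorted(p, key=lambda kv: kv[0]) for p in out]


def toy_hash_shuffle(parts: Parts, n: int) -> Parts:
    # Rough imitation of a hash-partitioned shuffle (as in groupByKey):
    # records are routed by hash(key) % n, ignoring their current order.
    out: Parts = [[] for _ in range(n)]
    for kv in collect(parts):
        out[hash(kv[0]) % n].append(kv)
    return out


if __name__ == "__main__":
    data: Parts = [[(5, "e"), (3, "c"), (9, "i")], [(1, "a"), (7, "g"), (2, "b")]]
    s = toy_sort_by_key(data, bounds=[4])
    print([k for k, _ in collect(s)])  # [1, 2, 3, 5, 7, 9]
    m = toy_filter(toy_map(s, lambda kv: (kv[0], kv[1].upper())),
                   lambda kv: kv[0] != 3)
    print([k for k, _ in collect(m)])  # [1, 2, 5, 7, 9], still sorted
    h = toy_hash_shuffle(s, 2)
    print([k for k, _ in collect(h)])  # [2, 1, 3, 5, 7, 9], order lost
```

In real Spark the partitions live on different machines, but collect() still returns them concatenated in partition order, which is the property this model relies on.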

Matei

On Oct 3, 2013, at 3:33 PM, Mingyu Kim <[email protected]> wrote:

> Hi all,
> 
> Is the sort order guaranteed if you apply operations like map(), filter() or
> distinct() after a sort in a distributed setting (run on a cluster of machines
> backed by HDFS)? In other words, does rdd.sortByKey().map() have the same
> sort order as rdd.sortByKey()? If so, is it documented somewhere which
> operations preserve sort order and which don't?
> 
> Thanks,
> Mingyu
