Got it. Thanks a lot!

From:  Matei Zaharia <[email protected]>
Reply-To:  "[email protected]" <[email protected]>
Date:  Thursday, October 3, 2013 6:00 PM
To:  "[email protected]" <[email protected]>
Subject:  Re: Sort order of RDD rows

Yes, the sort order is preserved for these map-like operations. The only
time it isn't is when you change the RDD's partitioner, e.g. by calling
sortByKey or groupByKey. It would definitely be good to document this more
formally.
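
To illustrate the distinction, here is a minimal sketch in plain Python. It
is a toy model, not the Spark API: an "RDD" is assumed to be just a list of
partitions, and every function name below (sort_by_key, map_rows,
filter_rows, hash_repartition, collect) is hypothetical. It only shows why
per-partition operations keep the order while a change of partitioner does
not.

```python
# Toy model only: an "RDD" here is a list of partitions (plain lists).
# None of these functions are Spark calls; all names are hypothetical.

def sort_by_key(partitions, num_partitions):
    """Sort all rows by key and split them into contiguous key ranges."""
    rows = sorted((row for part in partitions for row in part),
                  key=lambda kv: kv[0])
    size = -(-len(rows) // num_partitions)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(num_partitions)]

def map_rows(partitions, f):
    """Map-like: each partition is transformed in place, row by row."""
    return [[f(row) for row in part] for part in partitions]

def filter_rows(partitions, pred):
    """Map-like: rows are dropped, but the survivors keep their positions."""
    return [[row for row in part if pred(row)] for part in partitions]

def hash_repartition(partitions, num_partitions):
    """Stand-in for a partitioner change: rows move by key modulo n."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[key % num_partitions].append((key, value))
    return out

def collect(partitions):
    """Concatenate partitions in partition order."""
    return [row for part in partitions for row in part]

data = [[(5, "e"), (1, "a")], [(4, "d"), (2, "b"), (3, "c")]]

sorted_rdd = sort_by_key(data, 2)
mapped = map_rows(sorted_rdd, lambda kv: (kv[0], kv[1].upper()))
filtered = filter_rows(mapped, lambda kv: kv[0] != 3)
shuffled = hash_repartition(sorted_rdd, 2)

print([k for k, _ in collect(mapped)])    # keys still ascending
print([k for k, _ in collect(filtered)])  # keys still ascending
print([k for k, _ in collect(shuffled)])  # keys no longer ascending
```

In this model map_rows and filter_rows never move a row to another
partition, so the order set up by sort_by_key survives. hash_repartition
redistributes rows by key, which is the situation where the order is lost.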

Matei

On Oct 3, 2013, at 3:33 PM, Mingyu Kim <[email protected]> wrote:

> Hi all,
> 
> Is the sort order guaranteed if you apply operations like map(), filter() or
> distinct() after sort in a distributed setting (run on a cluster of machines
> backed by HDFS)? In other words, does rdd.sortByKey().map() have the same sort
> order as rdd.sortByKey()? If so, is it documented somewhere which operations
> preserve sort order and which don't?
> 
> Thanks,
> Mingyu


