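See the scaladoc for sortByKey in OrderedRDDFunctions.scala:

    Sort the RDD by key, so that each partition contains a sorted range of
    the elements. Calling `collect` or `save` on the resulting RDD will
    return or output an ordered list of records (in the `save` case, they
    will be written to multiple `part-X` files in the filesystem, in order
    of the keys).

In other words, the result is sorted in total, not just within each partition: every key in part-00000 comes before every key in part-00001, which matches your second example.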
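
To illustrate why the part files come out globally ordered, here is a minimal pure-Python sketch of a range-partitioned sort. It is not Spark code: the function name `range_partition_sort` is hypothetical, and the partition bounds are read directly off the sorted keys, whereas Spark estimates them by sampling, so the real partition sizes may be uneven.

```python
from bisect import bisect_left


def range_partition_sort(keys, num_partitions):
    """Simulate a range-partitioned sort (illustrative, not Spark's code).

    Every key in partition i is <= every key in partition i + 1, and each
    partition is sorted, so reading the partitions in order yields a
    totally ordered list.
    """
    ordered = sorted(keys)
    if not ordered:
        return [[] for _ in range(num_partitions)]

    # Pick num_partitions - 1 upper bounds. Assumption: we take them from
    # the fully sorted data; Spark samples the RDD to estimate them.
    step = len(ordered) / num_partitions
    bounds = []
    for i in range(num_partitions - 1):
        idx = min(len(ordered) - 1, max(0, int(step * (i + 1)) - 1))
        bounds.append(ordered[idx])

    # Route each key to the partition whose range contains it.
    # bisect_left sends a key equal to a bound into the lower partition.
    partitions = [[] for _ in range(num_partitions)]
    for k in keys:
        partitions[bisect_left(bounds, k)].append(k)

    # Sort within each partition.
    return [sorted(p) for p in partitions]


parts = range_partition_sort([4, 2, 3, 6, 5, 7], 2)
for i, p in enumerate(parts):
    print(f"part-{i:05d}: {p}")
```

With the input from the question, the keys are first split by range and then sorted within each range, so concatenating the partitions in order gives the fully sorted list.
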

Cheers

On Wed, Apr 8, 2015 at 3:01 PM, Tom <thubregt...@gmail.com> wrote:

> Hi,
>
> If I perform a sortByKey(true, 2).saveAsTextFile("filename") on a cluster,
> will the data be sorted per partition, or in total? (And is this
> guaranteed?)
>
> Example:
> Input 4,2,3,6,5,7
>
> Sorted per partition:
> part-00000: 2,3,7
> part-00001: 4,5,6
>
> Sorted in total:
> part-00000: 2,3,4
> part-00001: 5,6,7
>
> Thanks,
>
> Tom
>
> P.S. (I know that the data might not end up being uniformly distributed,
> example: 4 elements in part-00000 and 2 in part-00001)