Re: sortByKey with multiple partitions

Ted Yu Wed, 08 Apr 2015 16:09:56 -0700

See the scaladoc from OrderedRDDFunctions.scala:

   * Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling
   * `collect` or `save` on the resulting RDD will return or output an ordered list of records
   * (in the `save` case, they will be written to multiple `part-X` files in the filesystem, in
   * order of the keys).
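To make the scaladoc concrete, here is a simplified, hypothetical model of the two steps behind sortByKey, written in plain Scala with no Spark dependency. First, each key is assigned to a partition by key range. Then each partition is sorted locally. Reading part-00000, part-00001, ... in order therefore gives a total order. The names `RangeSortSketch`, `rangeBounds`, `partitionFor` and `sortByKeySketch` are illustrative only and are not Spark API. Spark's actual `RangePartitioner` estimates the range bounds by sampling the keys, so real partition sizes can be uneven, as the P.S. below anticipates.

```scala
// Hypothetical, simplified model of sortByKey with range partitioning.
// NOT Spark code: Spark's RangePartitioner estimates bounds by sampling keys,
// while this sketch computes them exactly from all keys.
object RangeSortSketch {
  // Pick numPartitions - 1 upper bounds from the sorted keys so that each
  // key range receives roughly the same number of keys.
  def rangeBounds(keys: Seq[Int], numPartitions: Int): Vector[Int] = {
    val sorted = keys.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until numPartitions).map { i =>
      sorted(math.max(0, (i * sorted.size) / numPartitions - 1))
    }.toVector
  }

  // A key goes to the first partition whose upper bound is >= the key;
  // keys above every bound go to the last partition.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx == -1) bounds.size else idx
  }

  // "Shuffle" keys into range partitions, then sort each partition locally.
  def sortByKeySketch(keys: Seq[Int], numPartitions: Int): Vector[Vector[Int]] = {
    val bounds = rangeBounds(keys, numPartitions)
    val buckets = keys.groupBy(k => partitionFor(k, bounds))
    (0 until numPartitions).map { p =>
      buckets.getOrElse(p, Seq.empty).sorted.toVector
    }.toVector
  }

  def main(args: Array[String]): Unit = {
    val parts = sortByKeySketch(Seq(4, 2, 3, 6, 5, 7), numPartitions = 2)
    // Print one line per partition, named like saveAsTextFile output files.
    parts.zipWithIndex.foreach { case (p, i) =>
      println(f"part-$i%05d: ${p.mkString(",")}")
    }
  }
}
```

On Tom's input this prints `part-00000: 2,3,4` and `part-00001: 5,6,7`, which is the "sorted in total" layout. On a real pair RDD, one way to inspect the layout is `rdd.sortByKey(true, 2).glom().collect()`, which returns one array per partition.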


Cheers

On Wed, Apr 8, 2015 at 3:01 PM, Tom <thubregt...@gmail.com> wrote:

> Hi,
>
> If I perform a sortByKey(true, 2).saveAsTextFile("filename") on a cluster,
> will the data be sorted per partition or in total? (And is this
> guaranteed?)
>
> Example:
> Input 4,2,3,6,5,7
>
> Sorted per partition:
> part-00000: 2,3,7
> part-00001: 4,5,6
>
> Sorted in total:
> part-00000: 2,3,4
> part-00001: 5,6,7
>
> Thanks,
>
> Tom
>
> P.S. I know that the data might not end up uniformly distributed,
> e.g. 4 elements in part-00000 and 2 in part-00001.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/sortByKey-with-multiple-partitions-tp22426.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
