I think the repartitionAndSortWithinPartitions() method in [1] may be what I'm looking for - at least it sounds like it. Will this method allow me to deal with sorted partitions even when a partition doesn't fit into memory?
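In case it helps anyone else reading along, here's a minimal sketch of how I'd expect to use it. The two-part (group, sortKey) key shape, the GroupPartitioner, the app name, and the output path are all placeholders for my actual use case, and my understanding (still to be verified) is that the sort is performed inside the shuffle rather than by buffering a whole partition in memory:

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // OrderedRDDFunctions implicits on Spark 1.2

// Hypothetical partitioner: route records by the "group" half of the key so
// every record for a group lands in the same partition.
class GroupPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = key match {
    case (group: String, _) => ((group.hashCode % parts) + parts) % parts
  }
}

object SortedPartitionsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sorted-partitions-sketch"))

    // Stand-in data keyed by (group, sortKey).
    val records = sc.parallelize(Seq(
      (("groupA", "key2"), "v1"),
      (("groupB", "key1"), "v2"),
      (("groupA", "key1"), "v3")
    ))

    // Partition by group, but sort each partition by the full (group, sortKey)
    // tuple via the implicit Ordering on the key. The sorting is pushed into
    // the shuffle machinery instead of materializing the partition as an
    // in-memory array.
    val sorted = records.repartitionAndSortWithinPartitions(new GroupPartitioner(2))

    sorted.saveAsTextFile("/tmp/sorted-output") // placeholder path
    sc.stop()
  }
}
```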
[1] https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala

On Wed, Jan 28, 2015 at 9:16 AM, Corey Nolet <cjno...@gmail.com> wrote:

> I'm looking at the ShuffledRDD code and it looks like there is a
> setKeyOrdering() method - is this guaranteed to order everything in the
> partition? I'm on Spark 1.2.0.
>
> On Wed, Jan 28, 2015 at 9:07 AM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> In all of the solutions I've found thus far, sorting has been done by
>> casting the partition iterator into an array and sorting the array. That
>> is not going to work for my case, as the amount of data in each partition
>> may not necessarily fit into memory. Any ideas?
>>
>> On Wed, Jan 28, 2015 at 1:29 AM, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> I wanted to update this thread for others who may be looking for a
>>> solution to this as well. I found [1] and I'm going to investigate
>>> whether it is a viable solution.
>>>
>>> [1]
>>> http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
>>>
>>> On Wed, Jan 28, 2015 at 12:51 AM, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>>> I need to be able to take an input RDD[Map[String,Any]] and split it
>>>> into several different RDDs based on some partitionable piece of the
>>>> key (groups), then send each partition to a separate set of files in
>>>> different folders in HDFS.
>>>>
>>>> 1) Would running the RDD through a custom partitioner be the best way
>>>> to go about this, or should I split the RDD into different RDDs and
>>>> call saveAsHadoopFile() on each?
>>>> 2) I need the resulting partitions sorted by key - they also need to
>>>> be written to the underlying files in sorted order.
>>>> 3) The number of keys in each partition will almost always be too big
>>>> to fit into memory.
>>>>
>>>> Thanks.
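For anyone who finds this thread later: the gist of the StackOverflow link quoted above is to let a single saveAsHadoopFile() call fan records out into one folder per key by overriding MultipleTextOutputFormat. A rough sketch of that approach as I read it - the class name, paths, and partition count here are made up, not from the answer verbatim:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.2

// Write each record under a subdirectory named after its key, all from a
// single Hadoop output job.
class KeyBasedOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Omit the key from the file contents; only the value is written.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // "name" is the leaf file (e.g. part-00000); prefixing it with the key
  // gives each key its own folder under the base output path.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"${key.toString}/$name"
}

// Usage sketch on an RDD[(String, String)] keyed by group:
// rdd.partitionBy(new HashPartitioner(8))
//    .saveAsHadoopFile("/output/base", classOf[String], classOf[String],
//      classOf[KeyBasedOutputFormat])
```

Note this only solves the fan-out-by-key part of my question; on its own it doesn't guarantee the within-file sort order I'm after.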