It looks like OrderedRDDFunctions (https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.OrderedRDDFunctions), which defines sortByKey(), is constructed with an implicit view of K as Ordered[K], so you could explicitly construct an OrderedRDDFunctions with your own Ordered. You might also be able to define an implicit Ordered[K] conversion that takes precedence over the default ordering in the scope where you call sortByKey().
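For example, a minimal sketch, assuming a Spark version where the pair-RDD conversion takes an implicit Ordering[K] (rather than the 0.8-era view bound to Ordered[K]); the (String, Long) key shape and the sample data are made up for illustration:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // brings in the pair-RDD conversions

    val sc = new SparkContext("local", "custom-ordering-sketch")
    // Illustrative data: RDD[((String, Long), Int)]
    val pairs = sc.parallelize(Seq((("b", 2L), 1), (("a", 9L), 2), (("a", 1L), 3)))

    // A local implicit shadows the default tuple ordering, so sortByKey()
    // below orders keys by the String field only and ignores the Long.
    implicit val firstFieldOnly: Ordering[(String, Long)] = Ordering.by(_._1)

    val sorted = pairs.sortByKey()  // the two ("a", _) keys tie; ("b", 2L) comes last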
On Wed, Dec 4, 2013 at 1:09 PM, Reynold Xin <[email protected]> wrote:
> Spark's expressiveness allows you to do this fairly easily on your own.
>
> sortByKey is implemented in a few lines of code, so it would be fairly easy
> to implement your own variant: replace the partitioner in sortByKey with a
> hash partitioner on the key, and then define a separate way to sort each
> partition after the hash partitioning.
>
> On Wed, Dec 4, 2013 at 10:58 AM, Archit Thakur <[email protected]> wrote:
>>
>> Hi,
>>
>> Was just curious. In Hadoop, you have the flexibility to choose your own
>> classes for the SortComparator and the GroupingComparator. I have figured
>> out that there are functions like sortByKey and reduceByKey. But what if
>> I want to customize which part of the key is used for sorting and which
>> part for grouping (i.e., which records should go to a single reducer as
>> sharing the same key)? Is there any way that could be achieved, where we
>> can specify our own SortComparator and GroupingComparator?
>>
>> Thanks and Regards,
>> Archit Thakur.
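To make that suggestion concrete: a minimal sketch, assuming a composite (String, Long) key where the String field plays the role of Hadoop's GroupingComparator (it alone picks the partition) and the full key plays the role of the SortComparator. All names here are illustrative, not Spark API:

    import org.apache.spark.{HashPartitioner, Partitioner}
    import org.apache.spark.rdd.RDD

    // Partition on the grouping field only, like a GroupingComparator.
    class GroupPartitioner(partitions: Int) extends Partitioner {
      private val hash = new HashPartitioner(partitions)
      def numPartitions: Int = partitions
      def getPartition(key: Any): Int = key match {
        case (group: String, _) => hash.getPartition(group)
      }
    }

    // Hash-partition on the grouping field, then sort each partition on the
    // full key, like a SortComparator. The sort buffers a whole partition in
    // memory, which is fine for a sketch but not for very large partitions.
    def sortWithinGroups[V](rdd: RDD[((String, Long), V)],
                            partitions: Int): RDD[((String, Long), V)] =
      rdd.partitionBy(new GroupPartitioner(partitions))
         .mapPartitions(_.toArray.sortBy(_._1).iterator,
                        preservesPartitioning = true)

After this, all records sharing a grouping field sit in one partition, sorted by the full key, so a single mapPartitions pass can walk runs of equal group values much like a Hadoop reducer would. (Later Spark releases expose essentially this pattern as repartitionAndSortWithinPartitions.)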
