Spark's expressiveness allows you to do this fairly easily on your own. sortByKey is implemented in a few lines of code, so it would be straightforward to write your own variant: replace the range partitioner in sortByKey with a hash partitioner on the grouping part of the key, and then sort the records within each partition after the hash partitioning.
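To make the idea concrete, here is a minimal, dependency-free Scala sketch of that two-step approach (it simulates the logic rather than using the Spark API; in Spark itself this would roughly be rdd.partitionBy with a custom Partitioner followed by mapPartitions to sort each partition). The (groupPart, sortPart) key shape and all names here are hypothetical:

```scala
// Sketch: hash-partition on the "grouping" half of a composite key
// (the GroupingComparator role), then sort each partition on the
// "sorting" half (the SortComparator role). Pure Scala, no Spark.
object SortWithinHashPartitions {
  type Key = (String, Int) // hypothetical (groupPart, sortPart) key

  def partitionAndSort(
      records: Seq[(Key, String)],
      numPartitions: Int): Vector[Vector[(Key, String)]] = {
    val builders = Vector.fill(numPartitions)(Vector.newBuilder[(Key, String)])
    // Route each record by the hash of the group part only, so all
    // records sharing a group land in the same partition.
    for (rec <- records) {
      val p = Math.floorMod(rec._1._1.hashCode, numPartitions)
      builders(p) += rec
    }
    // Within each partition, order records by the sort part of the key.
    builders.map(_.result().sortBy(_._1._2))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((("a", 2), "x"), (("a", 1), "y"), (("b", 3), "z"))
    partitionAndSort(data, 2).zipWithIndex.foreach {
      case (part, i) => println(s"partition $i: $part")
    }
  }
}
```

In real Spark code, the routing step would be a custom Partitioner whose getPartition looks only at the group part of the key, and the sorting step would run inside mapPartitions with preservesPartitioning = true.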
On Wed, Dec 4, 2013 at 10:58 AM, Archit Thakur <[email protected]> wrote:

> Hi,
>
> Was just curious. In Hadoop, you have the flexibility to choose your
> class for SortComparator and GroupingComparator. I have figured out that
> there are functions like sortByKey and reduceByKey.
> But what if I want to customize which part of the key to use for
> sorting and which part for grouping (i.e., which records should go to a
> single reducer corresponding to the same key)? Is there any way that
> could be achieved, wherein we can specify our own SortComparator and
> GroupingComparator?
>
> Thanks and Regards,
> Archit Thakur.
