I think this Spark Package may be what you're looking for! http://spark-packages.org/package/tresata/spark-sorted
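
To make the goal concrete, here is a minimal plain-Python sketch (not Spark code, and not how spark-sorted is implemented) of the result Marco describes: values reduced by key, keys range-partitioned across partitions, and each partition sorted by key, all in one pass over the records. The names `range_bounds` and `reduce_and_sort` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left


def range_bounds(keys, num_partitions):
    """Pick up to num_partitions - 1 upper bounds from the distinct sorted keys."""
    distinct = sorted(set(keys))
    if not distinct or num_partitions <= 1:
        return []
    return [
        distinct[max(0, i * len(distinct) // num_partitions - 1)]
        for i in range(1, num_partitions)
    ]


def reduce_and_sort(records, num_partitions, combine):
    """Reduce by key, range-partition the keys, and sort each partition by key.

    Assumes num_partitions >= 1 and that keys are mutually comparable.
    """
    records = list(records)
    bounds = range_bounds([key for key, _ in records], num_partitions)
    partitions = [dict() for _ in range(num_partitions)]
    for key, value in records:
        # Partition i holds the keys <= bounds[i]; the last one holds the rest.
        part = partitions[bisect_left(bounds, key)]
        part[key] = combine(part[key], value) if key in part else value
    # Keys are unique within a partition, so sorting the items sorts by key.
    return [sorted(part.items()) for part in partitions]


if __name__ == "__main__":
    data = [("d", 1), ("a", 2), ("c", 3), ("a", 4),
            ("f", 5), ("b", 6), ("e", 7), ("d", 8)]
    for index, part in enumerate(reduce_and_sort(data, 3, lambda x, y: x + y)):
        print(index, part)
```

Each record is routed once, merged into its target partition, and the partition is sorted at the end, which is the single-shuffle behaviour the question asks about, as opposed to reducing first and repartitioning afterwards.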

Best,
Burak

On Mon, May 4, 2015 at 12:56 PM, Imran Rashid <[email protected]> wrote:

> oh wow, that is a really interesting observation, Marco & Jerry.
> I wonder if this is worth exposing in combineByKey()? I think Jerry's
> proposed workaround is all you can do for now -- use reflection to
> side-step the fact that the methods you need are private.
>
> On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <[email protected]>
> wrote:
>
>> Hi Marco,
>>
>> As far as I know, the current combineByKey() does not expose an argument
>> that lets you set keyOrdering on the ShuffledRDD, since ShuffledRDD is
>> package private. If you can get hold of the ShuffledRDD through reflection
>> or some other way, the keyOrdering you set will be pushed down to the
>> shuffle. If you use a combination of transformations instead, the result
>> will be the same but the efficiency may differ: some transformations are
>> split into different stages, which introduces an additional shuffle.
>>
>> Thanks
>> Jerry
>>
>>
>> 2015-04-27 19:00 GMT+08:00 Marco <[email protected]>:
>>
>>> Hi,
>>>
>>> I'm trying, after reducing by key, to get data ordered among partitions
>>> (like RangePartitioner) and within partitions (like sortByKey or
>>> repartitionAndSortWithinPartitions), pushing the sorting down to the
>>> shuffle machinery of the reducing phase.
>>>
>>> I think, but maybe I'm wrong, that the correct way to do that is for
>>> combineByKey to call the setKeyOrdering function on the ShuffledRDD that
>>> it returns.
>>>
>>> Am I wrong? Can it be done by a combination of other transformations
>>> with the same efficiency?
>>>
>>> Thanks,
>>> Marco
