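A local sort per partition only gives a globally sorted result if the data is already range partitioned, that is, if every key in one partition is less than or equal to every key in the next. Otherwise each partition is sorted on its own, but the partitions still overlap.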
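
To illustrate the point, here is a minimal sketch that uses plain Scala collections in place of an RDD, so it runs without Spark. The nested Seq stands in for the partitions, and the names LocalSortDemo, sortWithinPartitions and isGloballySorted are made up for this example; they are not part of any Spark API.

```scala
object LocalSortDemo {
  // Sort each "partition" independently, which is what
  // data.mapPartitions(_.toList.sorted.toIterator) would do on an RDD.
  def sortWithinPartitions(parts: Seq[Seq[Long]]): Seq[Seq[Long]] =
    parts.map(_.sorted)

  // Concatenate the partitions in order and check that no element
  // is greater than its successor.
  def isGloballySorted(parts: Seq[Seq[Long]]): Boolean = {
    val flat = parts.flatten
    flat.zip(flat.drop(1)).forall { case (a, b) => a <= b }
  }

  def main(args: Array[String]): Unit = {
    // Range partitioned: all keys in partition 0 <= all keys in partition 1.
    val rangePartitioned = Seq(Seq(3L, 1L, 2L), Seq(5L, 4L, 6L))
    // Not range partitioned: the key ranges of the two partitions overlap.
    val overlapping = Seq(Seq(1L, 5L, 3L), Seq(2L, 6L, 4L))

    println(isGloballySorted(sortWithinPartitions(rangePartitioned))) // true
    println(isGloballySorted(sortWithinPartitions(overlapping)))      // false
  }
}
```

In the second case each partition ends up sorted, but timestamp 5 in the first partition still precedes timestamp 2 in the second, so the concatenated output is not in order.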

On 25.10.2013 16:11, Nathan Kronenfeld wrote:
> Since no one else has answered...
> I assume:
>
> data.mapPartitions(_.toList.sortBy(...).toIterator)
>
> would work, but I also suspect there's a better way.
>
>
> On Fri, Oct 25, 2013 at 5:01 AM, Arun Kumar <[email protected]> wrote:
>
>> Hi,
>>
>> I am trying to process some logs and the data is sorted(*almost*) by
>> timestamp.
>> If I do a full sort it takes a lot of time. Is there some way to sort more
>> efficiently (like restricting sort to per partition).
>>
>> Thanks in advance
>>
