Sorting locally within each partition only gives a globally sorted result
if the data is already range partitioned, i.e. every key in one partition
is no larger than any key in the next partition.
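
A small sketch of what I mean. To keep it self-contained, plain Scala
collections stand in for the partitions here (that is an assumption for
illustration only, and the names LocalSortDemo, sortWithinPartitions and
isSorted are made up for this example, not part of any Spark API):

```scala
// Plain Scala collections stand in for RDD partitions (an assumption for
// illustration; no Spark dependency is needed to see the effect).
object LocalSortDemo {
  // Sort each partition on its own, then read the partitions back in order.
  def sortWithinPartitions(parts: Seq[Seq[Int]]): Seq[Int] =
    parts.flatMap(_.sorted)

  // True if every element is <= its successor.
  def isSorted(xs: Seq[Int]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a <= b }

  // Range partitioned: every key in partition i is <= every key in partition i + 1.
  val rangePartitioned: Seq[Seq[Int]] =
    Seq(Seq(3, 1, 2), Seq(6, 4, 5), Seq(9, 7, 8))

  // Not range partitioned: the key ranges of the partitions overlap.
  val overlapping: Seq[Seq[Int]] =
    Seq(Seq(3, 1, 7), Seq(6, 2, 5), Seq(9, 4, 8))

  def main(args: Array[String]): Unit = {
    println(isSorted(sortWithinPartitions(rangePartitioned))) // true
    println(isSorted(sortWithinPartitions(overlapping)))      // false
  }
}
```

In the second case each partition is sorted internally, but the result
1, 3, 7, 2, 5, 6, 4, 8, 9 is not sorted as a whole. Logs that are only
almost sorted by timestamp may well have such overlaps at the partition
boundaries.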

On 25.10.2013 16:11, Nathan Kronenfeld wrote:
> Since no one else has answered...
> I assume:
> 
>     data.mapPartitions(_.toList.sortBy(...).toIterator)
> 
> would work, but I also suspect there's a better way.
> 
> 
> On Fri, Oct 25, 2013 at 5:01 AM, Arun Kumar <[email protected]> wrote:
> 
>> Hi,
>>
>> I am trying to process some logs and the data is sorted (*almost*) by
>> timestamp.
>> If I do a full sort it takes a lot of time. Is there some way to sort more
>> efficiently (like restricting the sort to each partition)?
>>
>> Thanks in advance
>>
> 
> 
> 
