Adaptive sorting algorithms (https://en.wikipedia.org/wiki/Adaptive_sort)
can benefit from presortedness in their inputs, so that might be a helpful
search term for researching this problem.


On Fri, Oct 25, 2013 at 7:23 AM, Nathan Kronenfeld <
[email protected]> wrote:

> I suspect from his description the difference is negligible for his case.
>  However, there are ways around that anyway.
>
> Assuming a fixed data set (as opposed to something like a streaming
> example, where there is no last element), one can take 3 passes to:
>
>    1. get the last element of each partition
>    2. take elements from each partition that fall before the last element
>    of the previous partition, separate them from the rest of their partition
>    3. and add them to the previous (whichever previous is appropriate, in
>    really degenerate cases, which it sounds like he doesn't have) in the right
>    location
>
>
>
>
> On Fri, Oct 25, 2013 at 10:17 AM, Sebastian Schelter <[email protected]>wrote:
>
>> Using a local sort per partition only gives a correct result if the data
>> is already range partitioned.
>>
>> On 25.10.2013 16:11, Nathan Kronenfeld wrote:
>> > Since no one else has answered...
>> > I assume:
>> >
>> >     data.mapPartitions(_.toList.sortBy(...).toIterator)
>> >
>> > would work, but I also suspect there's a better way.
>> >
>> >
>> > On Fri, Oct 25, 2013 at 5:01 AM, Arun Kumar <[email protected]>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am trying to process some logs and the data is sorted(*almost*) by
>> >> timestamp.
>> >> If I do a full sort it takes a lot of time. Is there some way to sort
>> more
>> >> efficiently (like restricting sort to per partition).
>> >>
>> >> Thanks in advance
>> >>
>> >
>> >
>> >
>>
>>
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  [email protected]
>

Reply via email to