Adaptive sorting algorithms (https://en.wikipedia.org/wiki/Adaptive_sort) can benefit from presortedness in their inputs, so that might be a helpful search term for researching this problem.
On Fri, Oct 25, 2013 at 7:23 AM, Nathan Kronenfeld < [email protected]> wrote: > I suspect from his description the difference is negligible for his case. > However, there are ways around that anyway. > > Assuming a fixed data set (as opposed to something like a streaming > example, where there is no last element), one can take 3 passes to: > > 1. get the last element of each partition > 2. take elements from each partition that fall before the last element > of the previous partition, separate them from the rest of their partition > 3. and add them to the previous (whichever previous is appropriate, in > really degenerate cases, which it sounds like he doesn't have) in the right > location > > > > > On Fri, Oct 25, 2013 at 10:17 AM, Sebastian Schelter <[email protected]>wrote: > >> Using a local sort per partition only gives a correct result if the data >> is already range partitioned. >> >> On 25.10.2013 16:11, Nathan Kronenfeld wrote: >> > Since no one else has answered... >> > I assume: >> > >> > data.mapPartitions(_.toList.sortBy(...).toIterator) >> > >> > would work, but I also suspect there's a better way. >> > >> > >> > On Fri, Oct 25, 2013 at 5:01 AM, Arun Kumar <[email protected]> >> wrote: >> > >> >> Hi, >> >> >> >> I am trying to process some logs and the data is sorted(*almost*) by >> >> timestamp. >> >> If I do a full sort it takes a lot of time. Is there some way to sort >> more >> >> efficiently (like restricting sort to per partition). >> >> >> >> Thanks in advance >> >> >> > >> > >> > >> >> > > > -- > Nathan Kronenfeld > Senior Visualization Developer > Oculus Info Inc > 2 Berkeley Street, Suite 600, > Toronto, Ontario M5A 4J5 > Phone: +1-416-203-3003 x 238 > Email: [email protected] >
