On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith <sanel...@gmail.com> wrote:
> And the problem I have with the below is that I've discovered that the
> input logfiles aren't strictly ordered - ie there is variance by a
> second or so in some of the entries.

Within a given set of 10 lines, are the first line and last line "in
order" - i.e. 1 2 4 3 5 8 7 6 9 10?

> I can sort the biggest logfile (800M) using unix sort in about 1.5
> mins on my workstation. That's not really fast enough, with
> potentially 12 other files....

If that's the case, then I'm pretty sure you can create a sort of queue
system, and it should cut down on the sorting time considerably. Python's
default list sort (Timsort) is adaptive, so re-sorting a 10-element list
that is already nearly in order is very cheap - you'd be looking at
roughly constant work per line by doing something such as this:

log_generator = (d for d in logdata)
mylist = [log_generator.next() for _ in range(10)]  # buffer the first ten values
while True:
    try:
        mylist.sort()                        # cheap: the buffer is almost in order
        nextdata = mylist.pop(0)             # the smallest buffered entry is safe to emit
        # Do something with nextdata
        mylist.append(log_generator.next())  # refill the buffer; raises StopIteration at EOF
    except StopIteration:
        break
for nextdata in sorted(mylist):  # drain the last entries left in the buffer, in order
    pass                         # Do something with nextdata
print 'done'

Or now that I look, python has a priority queue (
http://docs.python.org/library/heapq.html ) that you could use instead.
Just push the next value into the queue and pop one out - you give it
some initial quantity - 10 or so, and then it will always give you the
smallest value.

HTH,
Wayne
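P.S. Here's a rough, untested sketch of the heapq version I mean. It
assumes your log lines compare correctly as plain strings (i.e. each line
starts with its timestamp), and the nearly_sorted name and the buffer_size
of 10 are just placeholders for illustration:

import heapq

def nearly_sorted(lines, buffer_size=10):
    """Yield lines in sorted order, assuming no line is more than
    buffer_size positions out of place in the input."""
    heap = []
    for line in lines:
        heapq.heappush(heap, line)       # push the next raw line onto the heap
        if len(heap) > buffer_size:
            yield heapq.heappop(heap)    # once the buffer is full, pop the smallest
    while heap:                          # input exhausted: drain the buffer in order
        yield heapq.heappop(heap)

You'd just feed it an open file and iterate, e.g.
for line in nearly_sorted(open('some.log')): ... and it only ever keeps
buffer_size + 1 lines in memory at a time.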