Stephen Nelson-Smith wrote:
> I think I'm having a major understanding failure.

Perhaps this will help ...
http://www.learningpython.com/2009/02/23/iterators-iterables-and-generators-oh-my/

<snip>
> So in essence this:
>
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
>
> Gives me a list of LogFiles - each of which has a getline() method,
> which returns a tuple.
>
> I thought I could merge iterables using Kent's recipe, or just with
> heapq.merge()

But, at this point, are your LogFile instances even iterable? AFAICT, the answer is no, and I think they need to be in order to use heapq.merge. Have a look at the documentation (http://docs.python.org/library/stdtypes.html#iterator-types) and then re-read Kent's advice, in your previous thread ('Logfile multiplexing'), about "using the iterator protocol" (__iter__).

And, judging by the heapq docs (http://docs.python.org/library/heapq.html#heapq.merge) ...

"""
Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an iterator over the sorted values.
"""

... using heapq.merge appears to be a reasonable approach.

You might also be interested to know that, while heapq.merge is (was) new in 2.6, its implementation is very similar (read: nearly identical) to one of the cookbook recipes referenced by Kent.

It's unclear from your previous posts (to me at least) -- are the individual log files already sorted, in chronological order? I'd imagine they are, being log files. But, let's say you were to run your hypothetical merge script against only one file -- would the output be identical to the input? If not, then you'll want to sort the inputs first, because heapq.merge assumes each input is already sorted.

>
> But how do I get from a method that can produce a tuple, to some
> mergable iterables?
>

I'm going to re-word this question slightly to "How can I modify the LogFile class so that its instances are usable by heapq.merge?" and make an attempt to answer. The following borrows heavily from Kent's iterator example, but removes your additional line filtering (if self.stamp.startswith(date), etc.) to, hopefully, make it clearer.

import time, gzip, heapq

def timestamp(line):
    # replace with your own timestamp function
    # this appears to work with the sample logs I chose
    stamp = ' '.join(line.split(' ', 3)[:-1])
    return time.strptime(stamp, '%b %d %H:%M:%S')

class LogFile(object):
    def __init__(self, filename):
        # on Python 3, use mode 'rt' so that lines are str, not bytes
        self.logfile = gzip.open(filename, 'r')

    def __iter__(self):
        # yield (timestamp, line) tuples so heapq.merge orders by timestamp
        for logline in self.logfile:
            yield (timestamp(logline), logline)

logs = [
    LogFile("/home/stephen/qa/ded1353/quick_log.gz"),
    LogFile("/home/stephen/qa/ded1408/quick_log.gz"),
    LogFile("/home/stephen/qa/ded1409/quick_log.gz")
]

# heapq.merge is lazy: it pulls lines from each log only as they are needed
merged = heapq.merge(*logs)
with open('/tmp/merged_log', 'w') as output:
    for stamp, line in merged:
        output.write(line)

Will it be fast enough? I have no clue.

Good luck!
Marty
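
P.S. In case it helps to try the idea without the gzip files, here is a minimal sketch of the same pattern on in-memory data. LineSource and the sample lines are made up for illustration (they are not from your logs), and the timestamp format is only an assumption; the point is just that any object whose __iter__ yields (timestamp, line) tuples in sorted order can be handed to heapq.merge.

```python
import heapq
import time

def timestamp(line):
    # Assumes the first three space-separated fields look like
    # "Nov 04 10:00:01"; replace with your own timestamp function.
    stamp = ' '.join(line.split(' ', 3)[:3])
    return time.strptime(stamp, '%b %d %H:%M:%S')

class LineSource(object):
    """Hypothetical stand-in for LogFile: wraps any iterable of lines."""
    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # Same shape as LogFile.__iter__: yield (timestamp, line) tuples.
        for line in self.lines:
            yield (timestamp(line), line)

# Made-up sample data; each list is already in chronological order.
log_a = ["Nov 04 10:00:01 host-a started\n",
         "Nov 04 10:00:05 host-a ready\n"]
log_b = ["Nov 04 10:00:03 host-b started\n",
         "Nov 04 10:00:09 host-b ready\n"]

# heapq.merge calls iter() on each source and interleaves the tuples.
merged = [line for stamp, line in
          heapq.merge(LineSource(log_a), LineSource(log_b))]
```

The merged list interleaves the two sources by timestamp. If either input list were out of order, the output would be too, since heapq.merge does not sort its inputs.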