On 7 January 2013 22:10, Victor Hooi <victorh...@gmail.com> wrote:
> Hi,
>
> I'm trying to compare two logfiles in Python.
>
> One logfile will have lines recording the message being sent:
>
>     05:00:06 Message sent - Value A: 5.6, Value B: 6.2, Value C: 9.9
>
> the other logfile has line recording the message being received
>
>     05:00:09 Message received - Value A: 5.6, Value B: 6.2, Value C: 9.9
>
> The goal is to compare the time stamp between the two - we can safely assume
> the timestamp on the message being received is later than the timestamp on
> transmission.
>
> If it was a direct line-by-line, I could probably use itertools.izip(), right?
>
> However, it's not a direct line-by-line comparison of the two files - the
> lines I'm looking for are interspersed among other loglines, and the time
> difference between sending/receiving is quite variable.
>
> So the idea is to iterate through the sending logfile - then iterate through
> the receiving logfile from that timestamp forwards, looking for the matching
> pair. Obviously I want to minimise the amount of back-forth through the file.
>
> Also, there is a chance that certain messages could get lost - so I assume
> there's a threshold after which I want to give up searching for the matching
> received message, and then just try to resync to the next sent message.
>
> Is there a Pythonic way, or some kind of idiom that I can use to approach
> this problem?

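Assuming that you can impose a maximum time between the send and
receive timestamps, something like the following might work
(untested):

    def find_matching(logfile1, logfile2, maxdelta):
        # msg.key identifies a message; msg.time is its timestamp
        buf = {}  # received messages read ahead but not yet matched
        logfile2 = iter(logfile2)  # shared iterator: logfile2 is never rewound
        for msg1 in logfile1:
            # Already seen while searching for an earlier message?
            if msg1.key in buf:
                yield msg1, buf.pop(msg1.key)
                continue
            maxtime = msg1.time + maxdelta
            for msg2 in logfile2:
                if msg2.key == msg1.key:
                    yield msg1, msg2
                    break
                buf[msg2.key] = msg2
                if msg2.time > maxtime:
                    # Past the threshold: give up on msg1 and resync
                    # (nothing is yielded for msg1 in this case)
                    break
            else:
                # logfile2 ran out before a match or a timeout
                yield msg1, 'No match'

Oscar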
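
For anyone who wants to try the idea on real lines, here is a minimal
self-contained sketch. It is a variant of the function above rather than
the same code: it reports an unmatched message as (msg1, None) both when
the threshold is exceeded and when the receiving log runs out. The Msg
record, the LINE_RE pattern, and the parse() and delays() helpers are all
made up for illustration; the pattern assumes the line layout of the two
sample lines in the question, that the whole "Value A: ..." text can serve
as the key, and that the logs do not cross midnight.

```python
import re
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical record type: `key` identifies a message, `time` is its timestamp.
Msg = namedtuple("Msg", "time key")

# Assumed line layout, based only on the two sample lines in the question.
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d) Message (?:sent|received) - (.*)$")


def parse(lines):
    """Yield a Msg for each message line, skipping unrelated log lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield Msg(datetime.strptime(m.group(1), "%H:%M:%S"), m.group(2))


def find_matching(sent, received, maxdelta):
    """Pair each sent message with a received one, or with None."""
    buf = {}  # received messages read ahead but not yet claimed
    received = iter(received)  # shared iterator: the file is never rewound
    for msg1 in sent:
        # First check messages buffered during an earlier search.
        match = buf.pop(msg1.key, None)
        if match is None:
            maxtime = msg1.time + maxdelta
            for msg2 in received:
                if msg2.key == msg1.key:
                    match = msg2
                    break
                buf[msg2.key] = msg2
                if msg2.time > maxtime:
                    break  # past the threshold: give up on msg1
        yield msg1, match


def delays(sent_lines, received_lines, max_seconds):
    """Return the delay in seconds per sent message, None if unmatched."""
    pairs = find_matching(parse(sent_lines), parse(received_lines),
                          timedelta(seconds=max_seconds))
    return [None if m2 is None else (m2.time - m1.time).total_seconds()
            for m1, m2 in pairs]
```

Since open file objects iterate line by line, something like
delays(open(sent_path), open(received_path), 5) should work, where the two
path names are placeholders. Note that buf is never pruned in this sketch,
so on long logs with many lost messages it will keep growing.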