On Thu, Feb 11, 2010 at 4:56 AM, Lao Mao <[email protected]> wrote:
> Hi,
>
> I have 3 servers which generate about 2G of webserver logfiles in a day.
> These are available on my machine over NFS.
>
> I would like to draw up some stats which shows, for a given keyword, how
> many times it appears in the logs, per hour, over the previous week.
>
> So the behavior might be:
>
> $ ./webstats --keyword downloader
>
> Which would read from the logs (which it has access to) and produce
> something like:
>
> Monday:
> 0000: 12
> 0100: 17
>
> etc
>
> I'm not sure how best to get started. My initial idea would be to filter
> the logs first, pulling out the lines with matching keywords, then check
> the timestamp - maybe incrementing a dictionary if the logfile was within
> a certain time?
I would use itertools.groupby() to group lines by hour, then look for the
keywords and increment a count. The technique of stacking generators as a
processing pipeline might be useful. See David Beazley's "Generator Tricks
for System Programmers":
http://www.dabeaz.com/generators-uk/index.html

Loghetti might also be useful as a starting point or code reference:
http://code.google.com/p/loghetti/

Kent
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
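A minimal sketch of the groupby-plus-generator-pipeline idea described above.
It assumes Apache-style log lines with timestamps like
[11/Feb/2010:04:56:02 +0000]; the file names, the keyword, and the helper
names (read_lines, hour_of, keyword_counts_by_hour) are all hypothetical,
and a real script would need to adapt hour_of() to the actual log format:

```python
import itertools

def read_lines(paths):
    """Chain several logfiles into one lazy stream of lines."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line

def hour_of(line):
    """Return the day-and-hour part of an Apache-style timestamp.

    Assumes the timestamp is the first bracketed field on the line,
    e.g. '[11/Feb/2010:04:56:02 +0000]' -> '11/Feb/2010:04'.
    """
    start = line.index('[') + 1
    return line[start:start + 14]

def keyword_counts_by_hour(lines, keyword):
    """Count lines containing keyword, grouped by hour."""
    counts = {}
    # groupby() only groups *consecutive* items, which is fine here
    # because log lines are already in chronological order.
    for hour, group in itertools.groupby(lines, key=hour_of):
        counts[hour] = sum(keyword in line for line in group)
    return counts
```

Usage would be something like
`keyword_counts_by_hour(read_lines(["access.log"]), "downloader")`, giving a
dict you can then print per day and hour in the format the original post
shows.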
