On Thu, Feb 11, 2010 at 4:56 AM, Lao Mao <[email protected]> wrote:
> Hi,
>
> I have 3 servers which generate about 2G of webserver logfiles in a day.
> These are available on my machine over NFS.
>
> I would like to draw up some stats which show, for a given keyword, how
> many times it appears in the logs, per hour, over the previous week.
>
> So the behavior might be:
>
> $ ./webstats --keyword downloader
>
> Which would read from the logs (which it has access to) and produce
> something like:
>
> Monday:
> 0000: 12
> 0100: 17
>
> etc
>
> I'm not sure how best to get started. My initial idea would be to filter
> the logs first, pulling out the lines with matching keywords, then check
> the timestamp - maybe incrementing a dictionary if the logfile was within
> a certain time?
>
> I'm not looking for people to write it for me, but I'd appreciate some
> guidance as to the approach and algorithm. Also, what the simplest
> presentation model would be. Or even whether it would make sense to stick
> it in a database! I'll post back my progress.
>
> Thanks,
>
> Laomao
>
> _______________________________________________
> Tutor maillist - [email protected]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
You may also find this link useful on parsing logs efficiently with Python: http://effbot.org/zone/wide-finder.htm
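The filter-then-count approach you describe works well here: scan each line, skip lines without the keyword, pull out the timestamp, and increment a dictionary keyed by (weekday, hour). A minimal sketch of that idea is below. It assumes the logs use the Apache Common Log Format timestamp, e.g. `[11/Feb/2010:04:56:01 +0000]` - adjust the regex and `strptime` format if your servers log differently. The function name `count_keyword_by_hour` is just for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes Common Log Format timestamps, e.g. [11/Feb/2010:04:56:01 +0000].
# Captures only day/month/year:hour, since we bucket per hour.
TIMESTAMP_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2}')

def count_keyword_by_hour(lines, keyword):
    """Return {(weekday_name, 'HH00'): count} for lines containing keyword."""
    counts = defaultdict(int)
    for line in lines:
        if keyword not in line:
            continue  # cheap substring test before the regex
        m = TIMESTAMP_RE.search(line)
        if not m:
            continue  # line has no recognizable timestamp; skip it
        ts = datetime.strptime(m.group(1), '%d/%b/%Y:%H')
        counts[(ts.strftime('%A'), ts.strftime('%H00'))] += 1
    return dict(counts)
```

For 2G/day of logs this streams one line at a time (pass in an open file object), so memory stays flat regardless of file size - no database needed for a simple per-hour report. Printing the dictionary sorted by key gives you the "Monday: / 0000: 12" presentation directly.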
