Lao Mao wrote:
Hi,
I have 3 servers which generate about 2G of webserver logfiles in a
day. These are available on my machine over NFS.
I would like to draw up some stats which show, for a given keyword,
how many times it appears in the logs, per hour, over the previous week.
So the behavior might be:
$ ./webstats --keyword downloader
It would read from the logs (which it has access to) and produce
something like:
Monday:
0000: 12
0100: 17
etc
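A minimal sketch of that command line and output format follows. It assumes the counts have already been collected into a dictionary keyed by (date, hour). The names `parse_args` and `format_report` are hypothetical, chosen only for illustration.

```python
import argparse
from collections import Counter
from datetime import date


def parse_args(argv=None):
    """Parse the hypothetical `webstats --keyword KEYWORD` command line."""
    parser = argparse.ArgumentParser(description="Hourly keyword counts from web logs")
    parser.add_argument("--keyword", required=True, help="text to count in each log line")
    return parser.parse_args(argv)


def format_report(counts):
    """Render {(date, hour): count} as a 'Weekday:' header followed by 'HH00: count' lines."""
    lines = []
    for day in sorted({d for d, _hour in counts}):
        lines.append(day.strftime("%A") + ":")
        for hour in range(24):
            # hours with no matches are printed as 0 rather than skipped
            lines.append(f"{hour:02d}00: {counts.get((day, hour), 0)}")
    return "\n".join(lines)


# Example with made-up counts for Monday 2 March 2009
args = parse_args(["--keyword", "downloader"])
sample = Counter({(date(2009, 3, 2), 0): 12, (date(2009, 3, 2), 1): 17})
report = format_report(sample)
print(report)
```

When run as a real script, calling parse_args() with no argument reads sys.argv. That means `./webstats --keyword downloader` would set args.keyword to "downloader".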
I'm not sure how best to get started. My initial idea would be to
filter the logs first, pulling out the lines with matching keywords,
then check the timestamp, maybe incrementing a counter in a dictionary
if the log entry falls within the time window?
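The filter-then-check-timestamp idea above might look roughly like the sketch below. It makes several assumptions. The logs are taken to use the Apache common/combined format, with timestamps like `[02/Mar/2009:00:15:01 +0000]`. The function name `hourly_counts` is hypothetical. The timezone offset is ignored. The file is iterated line by line, so 2G of logs never has to fit in memory.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumes Apache common/combined log timestamps such as
# [02/Mar/2009:00:15:01 +0000]; adjust the pattern for other formats.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) ")


def hourly_counts(lines, keyword, now, days=7):
    """Count lines containing `keyword`, bucketed by (date, hour), over the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for line in lines:  # iterating a file object streams it line by line
        if keyword not in line:  # cheap substring filter before any parsing
            continue
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # no recognisable timestamp on this line
        # %b expects English month abbreviations (the default C locale).
        # The UTC offset is ignored: times are taken as written in the log.
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        if cutoff <= stamp <= now:
            counts[(stamp.date(), stamp.hour)] += 1
    return counts


# Example with made-up log lines; with real logs, pass an open file instead.
sample = [
    '1.2.3.4 - - [20/Feb/2009:12:00:00 +0000] "GET /downloader HTTP/1.0" 200 512',
    '1.2.3.4 - - [02/Mar/2009:00:15:01 +0000] "GET /downloader HTTP/1.0" 200 512',
    '1.2.3.4 - - [02/Mar/2009:00:45:10 +0000] "GET /downloader HTTP/1.0" 200 512',
    '1.2.3.4 - - [02/Mar/2009:01:05:00 +0000] "GET /index.html HTTP/1.0" 200 512',
]
counts = hourly_counts(sample, "downloader", now=datetime(2009, 3, 3))
print(counts)
```

For several servers, you could call hourly_counts once per open log file and merge the resulting Counters with counts.update(...).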
I'm not looking for people to write it for me, but I'd appreciate some
guidance as to the approach and algorithm, and also on what the simplest
presentation model would be. Or even whether it would make sense to stick
it in a database! I'll post back my progress.
Thanks,
Laomao
------------------------------------------------------------------------
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
grep -c <keyword> <file-mask, e.g. *.log>
or, if you are only looking for today's entries, for example:
grep <date> <file-mask> | grep -c <keyword>
That would be the simplest implementation. For a Python implementation,
think about nested dictionaries like {Date: {Keyword1: Count,
Keyword2: Count}}. Essentially you would just iterate over the
file, check whether each line contains the keyword(s) you are looking
for, and then increment the counter for it.
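A rough sketch of that nested dictionary follows, built with collections.defaultdict(Counter) so that every count starts at zero. It assumes the date can be pulled from each line with an Apache-style pattern such as `[02/Mar/2009:...`. The function names `count_keywords_by_date` and `count_files` are hypothetical.

```python
import glob
import re
from collections import Counter, defaultdict

# Assumed Apache-style date field, e.g. [02/Mar/2009:00:15:01 +0000]
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def count_keywords_by_date(lines, keywords):
    """Build {date_string: {keyword: count}} from an iterable of log lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = DATE_RE.search(line)
        if match is None:
            continue  # skip lines without a recognisable date
        for keyword in keywords:
            if keyword in line:
                counts[match.group(1)][keyword] += 1
    return counts


def count_files(file_mask, keywords):
    """Apply count_keywords_by_date to every file matching a mask like '*.log'."""
    total = defaultdict(Counter)
    for path in sorted(glob.glob(file_mask)):
        with open(path, errors="replace") as f:
            for day, per_keyword in count_keywords_by_date(f, keywords).items():
                total[day].update(per_keyword)  # Counter.update adds counts
    return total


# Example with made-up log lines
sample = [
    '1.2.3.4 - - [02/Mar/2009:00:15:01 +0000] "GET /downloader HTTP/1.0" 200 512',
    '1.2.3.4 - - [02/Mar/2009:09:00:00 +0000] "GET /search?q=downloader HTTP/1.0" 200 80',
    '1.2.3.4 - - [03/Mar/2009:10:30:00 +0000] "GET /search HTTP/1.0" 200 80',
]
result = count_keywords_by_date(sample, ["downloader", "search"])
print(dict(result))
```

The date keys here are the strings exactly as they appear in the log. They do not sort chronologically, so for ordered output you would parse them with datetime.strptime first.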
--
Kind Regards,
Christian Witts
Business Intelligence
C o m p u s c a n | Confidence in Credit