On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld <alan.ga...@btinternet.com> wrote:
> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!

An Apache logfile entry looks like this:

89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812 HTTP/1.1" 200 50 "-" "-"

I want to extract 24 hrs of data based on timestamps like this:

[04/Nov/2009:04:02:10 +0000]

I also need to do some filtering (e.g. I actually don't want anything with service.php), and I also have to do some substitutions. That's trivial, other than not knowing the optimum place to do it: should I do multiple passes, or should I try to do all the work at once, only viewing each line once?

Also, what about reading from compressed files? The data comes in as 6 gzipped logfiles which expand to 6G in total.

S.
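
To make the single-pass option concrete, here is a rough sketch of how it might look, not a definitive answer. It assumes Python 3, that the timestamp always appears in the bracketed form shown above, and that the 24-hour window is given as a timezone-aware start time. The names parse_timestamp, read_gzipped and filter_lines are made up for illustration, and the substitutions are placeholders for whatever rewriting is actually needed. Each line is read straight from the gzipped file, filtered, and rewritten once, so the 6G of expanded data never has to sit on disk or in memory.

```python
import gzip
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp, e.g. [04/Nov/2009:04:02:10 +0000]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_timestamp(line):
    """Return a timezone-aware datetime for the line, or None if not found."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def read_gzipped(paths):
    """Yield decoded lines from each gzipped file in turn, one at a time."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            yield from fh


def filter_lines(lines, start, hours=24, exclude="service.php", substitutions=()):
    """Single pass: drop unwanted lines, keep the time window, then substitute.

    `start` must be timezone-aware; `substitutions` is an iterable of
    (compiled_pattern, replacement) pairs (hypothetical, supply your own).
    """
    end = start + timedelta(hours=hours)
    for line in lines:
        # Cheap substring test first, so excluded lines skip timestamp parsing.
        if exclude in line:
            continue
        stamp = parse_timestamp(line)
        if stamp is None or not (start <= stamp < end):
            continue
        for pattern, replacement in substitutions:
            line = pattern.sub(replacement, line)
        yield line
```

Usage would be along the lines of filter_lines(read_gzipped(list_of_paths), start), writing each yielded line to the output as it arrives. Because both functions are generators, the filtering and substitution happen in the same pass as the decompression.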