Sorry - forgot to include the list.

On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith <sanel...@gmail.com> wrote:
> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD <alan.ga...@btinternet.com> wrote:
>>
>>> An apache logfile entry looks like this:
>>>
>>> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
>>> HTTP/1.1" 200 50 "-" "-"
>>>
>>> I want to extract 24 hrs of data based on timestamps like this:
>>>
>>> [04/Nov/2009:04:02:10 +0000]
>>
>> OK, it looks like you could use a regex to extract the first
>> thing you find between square brackets, then convert that to a time.
>
> I'm currently thinking I can just use a string comparison after the
> first entry for the day - that saves date arithmetic.
>
>> I'd opt for doing it all in one pass. With such large files you really
>> want to minimise the amount of time spent reading the file.
>> Plus, with such large files you will need/want to process them
>> line by line anyway, rather than reading the whole thing into memory.
>
> How do I handle concurrency? I have six log files which I need to turn
> into one time-sequenced log.
>
> I guess I need to switch between each log depending on whether its
> next entry is the next chronological entry among all six. Then, on a
> per-line basis, I can also reject a line if it matches the stuff I want
> to throw out, substitute it if I need to, and then write it out to the
> new file.
>
> S.
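The plan described above - regex out the bracketed timestamp, keep only a 24-hour window, and do a lazy k-way merge of the (already time-sorted) log files - might be sketched like this in Python 3. The file paths, window bounds, and reject pattern here are illustrative placeholders, not anything from the thread:

```python
import heapq
import re
from datetime import datetime

# Matches the first bracketed field in a combined-format Apache line,
# e.g. [04/Nov/2009:04:02:10 +0000]
TS_RE = re.compile(r'\[([^\]]+)\]')
TS_FMT = '%d/%b/%Y:%H:%M:%S %z'

def entry_time(line):
    """Return the entry's timestamp as an aware datetime, or None."""
    m = TS_RE.search(line)
    if m is None:
        return None
    try:
        return datetime.strptime(m.group(1), TS_FMT)
    except ValueError:
        return None  # bracketed field wasn't a timestamp

def timed(lines):
    """Pair each parsable line with its timestamp; skip the rest."""
    for line in lines:
        t = entry_time(line)
        if t is not None:
            yield t, line

def merge_logs(paths, start, end, reject=None):
    """Yield lines from all logs in chronological order in one pass,
    keeping only entries with start <= timestamp < end and dropping
    lines that match the optional `reject` regex."""
    files = [open(p) for p in paths]
    try:
        # heapq.merge does the k-way merge lazily, line by line;
        # it relies on each individual file already being sorted
        # by time, which log files normally are.
        for t, line in heapq.merge(*(timed(f) for f in files)):
            if not (start <= t < end):
                continue
            if reject is not None and reject.search(line):
                continue
            yield line
    finally:
        for f in files:
            f.close()
```

Because `heapq.merge` pulls one line at a time from each open file, memory use stays constant no matter how large the six logs are, which matches the "one pass, line by line" advice above. Substitution could be added as a `line = pattern.sub(...)` step just before the `yield`.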
--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor