Jay Paulson (CE CEN) wrote:

Hello everyone!  I've been given the responsibility of coding an Apache 
access_log parser.  My task is to return the number of hits for certain file 
extensions on certain dates from specific IP addresses.

As of now I'm only going back 7 days in the log looking for this information, 
and I'm only looking for 5 file types (.doc, .pdf, .html, .php, and .flv).  I'm 
using the fgets() function so I can read the file line by line, do the matches 
I need, and increment the counters as needed.  Right now I have 3 loops looking 
for everything, which doesn't seem like the best way of doing this.  I've also 
found that a line may contain a file extension I want only because that file is 
the referrer (source) of a request for another file (see the example below).
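Instead of three loops, a single pass over the file can update every counter at 
once.  Below is a minimal sketch in Python rather than PHP, assuming the Apache 
combined log format; `count_hits`, `WANTED` and `LINE_RE` are hypothetical names.  
A dictionary keyed by (date, IP, extension) stands in for the separate counters, 
and the extension is taken only from the requested path, so a referrer never 
counts.

```python
import posixpath
import re
from collections import Counter
from datetime import timedelta, datetime

# Extensions of interest, taken from the post.
WANTED = {".doc", ".pdf", ".html", ".php", ".flv"}

# Assumed format: IP, identd, user, [timestamp], "METHOD /path PROTOCOL" ...
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "\S+ (?P<path>\S+)')

def count_hits(lines, now, days=7):
    """Count hits per (date, ip, extension) in one pass (hypothetical helper).

    `now` must be timezone-aware, because the parsed log timestamps are.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip malformed lines
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if ts < cutoff:
            continue  # outside the reporting window
        # Look only at the requested path (minus any query string),
        # never at the referrer field later in the line.
        path = m.group("path").split("?", 1)[0]
        ext = posixpath.splitext(path)[1].lower()
        if ext in WANTED:
            counts[(ts.date().isoformat(), m.group("ip"), ext)] += 1
    return counts
```

Passing an open file object (for example `open("access_log")`) streams it line 
by line, much like fgets().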

Log file example:
I want the first line but not the second.  The second line is a request for a 
.css file that was used by the .html page, so I don't want it.  I do want the 
first line, which is a request for the .html file itself and references no 
other file.

10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" 200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css HTTP/1.1" 200 2381 "http://wfmu.wfm.pvt/home.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
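The difference between these two lines shows up once you parse the request 
field instead of searching the whole line for ".html".  A small sketch, again in 
Python and assuming the combined log format; `requested_extension` and 
`REQUEST_RE` are hypothetical names:

```python
import posixpath
import re

# The request line is the first quoted field after the timestamp:
# "METHOD /path PROTOCOL".  The referrer is a later quoted field, so
# anchoring on '] "' skips it.
REQUEST_RE = re.compile(r'\] "\S+ (\S+)')

def requested_extension(line):
    """Return the lower-cased extension of the requested file, or None."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    path = m.group(1).split("?", 1)[0]  # ignore any query string
    return posixpath.splitext(path)[1].lower()

first = ('10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" '
         '200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"')
second = ('10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css HTTP/1.1" '
          '200 2381 "http://wfmu.wfm.pvt/home.html" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"')

print(requested_extension(first))   # .html
print(requested_extension(second))  # .css
print(".html" in second)            # True: a naive substring test would count it
```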

At any rate, here's some of my pseudocode/code for what I'm trying to 
accomplish.  I know there has to be a better way to do this, and I'm looking 
for suggestions!

<snip>

Save yourself a ton of work. Dump the raw logs into a db, and you can do all the queries on the db. Something like this...
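A rough sketch of the load-then-query approach, using Python's built-in sqlite3 
module as a stand-in for whatever database you actually use.  The `hits` table 
schema and the `load_log`/`hits_per_day` helpers are hypothetical, and the regex 
assumes the combined log format.  Once the rows are in a table, the per-day, 
per-IP, per-extension report is a single GROUP BY.

```python
import posixpath
import re
import sqlite3
from datetime import datetime

# Combined format: ip identd user [time] "request" status bytes "referrer" "agent"
COMBINED_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

def load_log(conn, lines):
    """Parse log lines and insert them into a hypothetical `hits` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " ip TEXT, day TEXT, path TEXT, ext TEXT, status INTEGER,"
        " referrer TEXT, agent TEXT)"
    )
    rows = []
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip malformed lines
        ip, ts, path, status, _size, referrer, agent = m.groups()
        day = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
        path = path.split("?", 1)[0]  # drop any query string
        ext = posixpath.splitext(path)[1].lower()
        rows.append((ip, day, path, ext, int(status), referrer, agent))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

def hits_per_day(conn, since):
    """Hits per (day, ip, ext) for the wanted extensions, from an ISO date on."""
    return conn.execute(
        "SELECT day, ip, ext, COUNT(*) FROM hits"
        " WHERE day >= ? AND ext IN ('.doc', '.pdf', '.html', '.php', '.flv')"
        " GROUP BY day, ip, ext ORDER BY day, ip, ext",
        (since,),
    ).fetchall()
```

For example, `conn = sqlite3.connect("logs.db")`, then `load_log(conn, open("access_log"))` 
once, and afterwards any report is just another query against `hits`.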

I took your idea and did a search on Google and found that this has already 
been done for me!  Check it out!

http://www.php-scripts.com/php_diary/012103.php3

Very cool :)

This is the script I wrote when we first started this project a few months ago to parse the 2+ years of log files and initially load them into the db. If you want to use parts of it, feel free.

http://john.nichel.net/parse.phps

--
John C. Nichel IV
Programmer/System Admin (ÜberGeek)
Dot Com Holdings of Buffalo
716.856.9675
[EMAIL PROTECTED]

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
