I don't know if this is the right place to ask; if not, sorry.

As the title says, I need to be able to deduce web crawler behavior from the
access log.
In particular, I need to understand what this means:

xx.xx.xx.x - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php
HTTP/1.0" 404 1123 "-" "-"

xx.xx.x.xx - - [12/Jun/2008:21:10:31 +0100] "GET /phpMyAdmin/main.php
HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:31 +0100] "GET /db/main.php HTTP/1.0"
404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /web/main.php HTTP/1.0"
404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0"
404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /admin/main.php
HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /dbadmin/main.php
HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /PMA2006/main.php
HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /pma2006/main.php
HTTP/1.0" 404 1123 "-" "-"

xx.xx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /sqlmanager/main.php
HTTP/1.0" 404 1123 "-" "-"


where I replaced the IP addresses with x's for privacy's sake.

This is just an extract. There are probably over 200 similar lines in
which the crawler requests a main.php file from hundreds of different
paths, most of them containing a folder named phpmyadmin or something
similar.
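
To show the pattern across the whole log rather than just this extract, here
is a rough sketch of how such lines could be grouped per client. It assumes
the log is in Apache's "combined" format with one entry per line (the extract
above is wrapped by the mail client); the function name summarize_probes and
the min_misses threshold are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format, one entry per line.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_probes(lines, min_misses=5):
    """Map each client to the distinct paths it requested that returned 404.

    Only clients with at least `min_misses` distinct 404 paths are kept.
    Both the name and the threshold are hypothetical, not from any tool.
    """
    misses = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "404":
            misses[m.group("host")].add(m.group("path"))
    return {host: sorted(paths)
            for host, paths in misses.items()
            if len(paths) >= min_misses}


if __name__ == "__main__":
    # "access.log" is a placeholder path; adjust to the real log location.
    with open("access.log") as fh:
        for host, paths in summarize_probes(fh).items():
            print(host, len(paths), paths[:5])
```

A client that shows up here with many distinct missing paths and an empty
user agent is requesting guessed URLs rather than following links.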

Is this an attempt to attack the machine? Why does it want the main.php file
so badly?

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/deducing-web-crawler-behavior-from-access.log-files-tp18269957p18269957.html
Sent from the Nutch - User mailing list archive at Nabble.com.
