I don't know if this is the right place, but if not, sorry. Like the title says, I need to be able to deduce web crawler behavior from the access log. In particular, I need to understand what this means:

xx.xx.xx.x - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"
xx.xx.x.xx - - [12/Jun/2008:21:10:31 +0100] "GET /phpMyAdmin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:31 +0100] "GET /db/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /web/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /admin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /dbadmin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /PMA2006/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /pma2006/main.php HTTP/1.0" 404 1123 "-" "-"
xx.xx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /sqlmanager/main.php HTTP/1.0" 404 1123 "-" "-"

(I replaced the IP addresses with x's for privacy's sake.)

This is just an extract; there are probably over 200 similar lines in which the crawler tries to get a main.php file from hundreds of different paths, most of them including a folder named phpmyadmin or something similar. Is this an attempt to attack the machine? Why does he want the main.php file so badly?

Thanks in advance.

--
View this message in context: http://www.nabble.com/deducing-web-crawler-behavior-from-access.log-files-tp18269957p18269957.html
Sent from the Nutch - User mailing list archive at Nabble.com.
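To pull this pattern out of a large log automatically, here is a minimal Python sketch. It assumes the lines are in the Apache "combined" log format shown above (host, ident, user, [time], "request", status, bytes, "referer", "user-agent"). The function name find_probing_hosts, the min_misses threshold and the sample addresses are hypothetical. It groups 404 responses by client host and counts which file names each host kept asking for. A host that requests the same file (e.g. main.php) under many different directories and gets 404 every time looks more like an automated scanner than an ordinary crawler.

```python
import re
from collections import Counter, defaultdict

# Assumes Apache "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_probing_hosts(lines, min_misses=3):
    """Return {host: Counter of requested file names} for hosts with many 404s.

    Hypothetical helper: it only looks at 404 counts and requested file
    names; the referer and user-agent fields are parsed but not used.
    """
    misses = defaultdict(list)
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None or m.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        misses[m.group("host")].append(m.group("path"))

    report = {}
    for host, paths in misses.items():
        if len(paths) >= min_misses:
            # Keep only the last path segment, e.g. "/PMA/main.php" -> "main.php"
            report[host] = Counter(p.rsplit("/", 1)[-1] for p in paths)
    return report


if __name__ == "__main__":
    # Hypothetical sample; 192.0.2.10 and 198.51.100.5 are documentation-only addresses.
    sample = [
        '192.0.2.10 - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"',
        '192.0.2.10 - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0" 404 1123 "-" "-"',
        '192.0.2.10 - - [12/Jun/2008:21:10:33 +0100] "GET /dbadmin/main.php HTTP/1.0" 404 1123 "-" "-"',
        '198.51.100.5 - - [12/Jun/2008:21:11:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    print(find_probing_hosts(sample))
```

On the sample this reports only 192.0.2.10, with main.php requested three times; the 200 response from the other host is ignored. With a real log you would read the file line by line and raise min_misses to something that fits your traffic.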
