Hi, I am trying to write a small program that will scan my Apache access log file and update iptables to block anyone looking for things they are not supposed to.
The code:

    #!/usr/bin/python
    import sys
    import re

    def extractoffendingip(filename):
        f = open(filename,'r')
        filecontents = f.read()
        #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
        tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
        iplist = []
        for items in tuples:
            (ip, getstring) = items
            print ip,getstring
            #print item
            if ip not in iplist:
                iplist.append(ip)
        for item in iplist:
            print item
        #ipmatch = re.search(r'', filecontents)

    def main():
        extractoffendingip('access_log.1')

    if __name__ == '__main__':
        main()

Log file: http://pastebin.com/F3RXDYBW

I could probably have used ranges to be more correct about finding IPs, but I thought that Apache should take care of that. I am assuming a level of integrity in the log file with regard to data...

The first problem I ran into was that I added a ^ to my search string:

    re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)

but that finds only two results, a lot fewer than I am expecting. I am a little bit confused. At first I thought it might be because the string I am searching is now only one line, due to the way I load the file, so the ^ should find only one instance. But instead it finds two?

Removing the ^ works much better and I get mostly correct results, but I also get a number of IPs with an empty GET string, although I thought there should be only one in the log file.

I would really appreciate any pointers as to what is going on here.

Regards

--
Gerhardus Geldenhuis
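
P.S. In case it helps anyone reproduce the ^ behaviour in isolation, here is a minimal sketch. It uses Python 3 syntax and two made-up log lines rather than my real log; the names SAMPLE_LOG, PATTERN and find_requests are just ones I picked for illustration. As far as I can tell from the re documentation, ^ only matches at the very start of the string unless re.MULTILINE is passed, which may be relevant here, although it would not explain why I see two matches rather than one on the real file.

```python
import re

# Two made-up lines in the same shape as an Apache access log entry.
# This is sample data for illustration, not taken from the real log file.
SAMPLE_LOG = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0"\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:02 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"\n'
)

# Same pattern as in the post, including the leading ^.
PATTERN = r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP'


def find_requests(text, flags=0):
    """Return (ip, path) pairs found in text, with whitespace stripped from the path."""
    return [(ip, path.strip()) for ip, path in re.findall(PATTERN, text, flags)]


# Without flags, ^ anchors only at the start of the whole string.
whole_string = find_requests(SAMPLE_LOG)

# With re.MULTILINE, ^ also anchors immediately after each newline.
per_line = find_requests(SAMPLE_LOG, re.MULTILINE)

print(whole_string)
print(per_line)
```

On this sample the first call returns one pair and the second returns one pair per line, so the flag (or looping over the file line by line instead of calling f.read()) looks like the thing to try next.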