Hi
I am trying to write a small program that scans my Apache access log and
updates iptables to block anyone requesting things they are not supposed
to.

The code:
#!/usr/bin/python
import sys
import re

def extractoffendingip(filename):
  f = open(filename,'r')
  filecontents = f.read()
  # Sample log line (a single line in the file, wrapped here):
  # 193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET
  # /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible;
  # MSIE 6.0; Windows 98)"
  tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
  iplist = []
  for items in tuples:
    (ip, getstring) = items
    print ip,getstring
    #print item
    if ip not in iplist:
      iplist.append(ip)
  for item in iplist:
    print item
  #ipmatch = re.search(r'', filecontents)

def main():
  extractoffendingip('access_log.1')

if __name__ == '__main__':
  main()

Log file: http://pastebin.com/F3RXDYBW


I could probably have used ranges to be more precise about matching IPs, but
I thought that Apache should take care of that. I am assuming a level of
integrity in the log file with regard to its data.
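
For reference, a minimal sketch of what such a range check might look like.
The name is_valid_ipv4 is hypothetical and not part of my script; it only
checks that each of the four numbers falls between 0 and 255.

```python
import re

# Hypothetical helper, not part of the script above.
# Four groups of one to three digits separated by dots, nothing else.
IP_PATTERN = re.compile(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\Z')

def is_valid_ipv4(text):
    """Return True if text is four dot-separated numbers, each 0-255."""
    match = IP_PATTERN.match(text)
    if match is None:
        return False
    # \d{1,3} alone would accept 999, so check the numeric range too.
    return all(0 <= int(octet) <= 255 for octet in match.groups())
```

Something like is_valid_ipv4(ip) could then be called on each captured
address before adding it to iplist.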

The first problem I ran into was that I added a ^ to my search string:
re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)

but that finds only two results, far fewer than I am expecting. I am a
little confused. At first I thought it might be because the file is loaded
as one single string, so the ^ should match only once, but instead it finds
two?
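
A minimal sketch of how ^ behaves, as far as I understand the re module:
by default it matches only at the very start of the string, and with the
re.MULTILINE flag it also matches right after each newline. The two log
lines in sample are made up for illustration and are not taken from my log.

```python
import re

# Two made-up log lines in the same format as the sample in the script.
sample = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0"\n'
    '10.0.0.5 - - [11/Jun/2011:13:59:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"\n'
)

pattern = r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP'

# Without a flag, ^ matches only at the very start of the whole string.
default_matches = re.findall(pattern, sample)

# With re.MULTILINE, ^ also matches immediately after every newline.
multiline_matches = re.findall(pattern, sample, re.MULTILINE)

print(len(default_matches))
print(len(multiline_matches))
```

On this sample the first call can only match at the first line, while the
second call gets a chance to match once per line.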

Removing the ^ works much better and I get mostly correct results, but I
also get a number of IPs with an empty GET string, even though I thought
there should be only one such line in the log file. I would really
appreciate any pointers as to what is going on here.
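
I have not worked out which lines produce the empty GET strings. One
stricter variant I could try, sketched below, matches line by line, anchors
the IP at the start of each line, and requires a non-empty request path.
The names REQUEST, extract_requests and unique_ips and the sample lines are
made up for illustration; this is an assumption about what I want to match,
not a confirmed fix.

```python
import re

# Assumed pattern: IP at the start of the line, then the first "GET request.
# \S+ requires a non-empty path without spaces; other lines are skipped.
REQUEST = re.compile(r'^(\d+\.\d+\.\d+\.\d+) .*?"GET (\S+) HTTP')

def extract_requests(lines):
    """Return a list of (ip, path) tuples for lines that fit the pattern."""
    results = []
    for line in lines:
        match = REQUEST.match(line)
        if match:
            results.append(match.groups())
    return results

def unique_ips(requests):
    """Return each IP once, in order of first appearance."""
    seen = []
    for ip, _path in requests:
        if ip not in seen:
            seen.append(ip)
    return seen

# Made-up lines; the third has no request path and is skipped.
sample_lines = [
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0"',
    '193.6.135.21 - - [11/Jun/2011:13:58:02 +0000] '
    '"GET /pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0"',
    '10.0.0.7 - - [11/Jun/2011:13:58:30 +0000] '
    '"GET HTTP/1.1" 400 226 "-" "-"',
    '10.0.0.5 - - [11/Jun/2011:13:59:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]

requests = extract_requests(sample_lines)
ips = unique_ips(requests)
```

Iterating over the open file object instead of calling f.read() would give
the same line-by-line behaviour on the real log.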

Regards

-- 
Gerhardus Geldenhuis