Looking at your regex for matching an IP address, I think you need to limit the length of each of the four octets you are searching for, as each one is between 1 and 3 digits long.

For example: r = re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",line) has worked for me.
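
A minimal sketch of the bounded quantifier in use. The sample line is the one from the log excerpt in the quoted message; IP_PATTERN and find_ip are names made up for this illustration. Two things to keep in mind: re.match only matches at the start of the string, whereas re.search scans the whole line, and \d{1,3} limits the length only, so it still accepts something like 999.999.999.999.

```python
import re

# Each octet is limited to 1-3 digits; \b keeps the match on word boundaries.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_ip(line):
    """Return the first IP-like string in line, or None (hypothetical helper)."""
    m = IP_PATTERN.search(line)
    return m.group(0) if m else None

sample = ('193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
          '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304')
print(find_ip(sample))
```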

I am no expert on regex (it scares me!). I got the example above from:
http://www.regular-expressions.info/examples.html


Hope my semi-coherent ramblings have been of some help

Regards

Peter

On 19/06/11 12:25, Gerhardus Geldenhuis wrote:
Hi
I am trying to write a small program that will scan my access.conf file and update iptables to block anyone looking for stuff that they are not supposed to.

The code:
#!/usr/bin/python
import sys
import re

def extractoffendingip(filename):
  f = open(filename,'r')
  filecontents = f.read()
#193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
  tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
  iplist = []
  for items in tuples:
    (ip, getstring) = items
    print ip,getstring
    #print item
    if ip not in iplist:
      iplist.append(ip)
  for item in iplist:
    print item
  #ipmatch = re.search(r'', filecontents)

def main():
  extractoffendingip('access_log.1')

if __name__ == '__main__':
  main()

logfile=http://pastebin.com/F3RXDYBW


I could probably have used ranges to be more correct about finding IPs, but I thought that Apache should take care of that. I am assuming a level of integrity in the log file with regard to data...

The first problem I ran into was that I added a ^ to my search string:
re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)

but that finds only two results, a lot fewer than I am expecting. I am a little bit confused: at first I thought it might be because the string I am searching is now only one line because of the method of loading, so the ^ should only find one instance, but instead it finds two?
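
One point that may be relevant here: by default ^ matches only at the very start of the string, so on text read with f.read() it does not match at the start of each line unless re.MULTILINE is passed. A minimal sketch with two log lines joined into one string (the second line, including its IP and path, is invented for illustration). Whether this accounts for the count seen above depends on the actual log file.

```python
import re

# Two log lines in one string, as f.read() would return them.
# The second line is invented for this example.
text = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304\n'
    '10.0.0.5 - - [11/Jun/2011:13:59:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 512\n'
)

pattern = r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP'

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
default_hits = re.findall(pattern, text)

# With re.MULTILINE, ^ also matches immediately after each newline.
multiline_hits = re.findall(pattern, text, re.MULTILINE)

print(len(default_hits))
print(len(multiline_hits))
```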

Removing the ^ works much better, and I now get mostly correct results, but I also get a number of IPs with an empty GET string, although I thought there should be only one in the log file. I would really appreciate any pointers as to what is going on here.
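
One way to narrow this down, sketched under assumptions rather than as a diagnosis: process the log one line at a time and require a non-empty path after GET, so that lines without a well-formed GET request are skipped instead of producing partial matches. LINE_PATTERN and extract_requests are hypothetical names, and the second sample line is invented.

```python
import re

# Stricter pattern: IP at the start of the line, then a GET request
# with a non-empty, whitespace-free path.
LINE_PATTERN = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}) .*?"GET (\S+) HTTP/')

def extract_requests(lines):
    """Return (ip, path) pairs for lines containing a well-formed GET request."""
    results = []
    for line in lines:
        m = LINE_PATTERN.match(line)
        if m:
            results.append((m.group(1), m.group(2)))
    return results

lines = [
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304',
    # Invented non-GET line; it should be skipped.
    '10.0.0.5 - - [11/Jun/2011:13:59:10 +0000] "POST /login HTTP/1.1" 200 128',
]

pairs = extract_requests(lines)
# A set removes duplicate IPs without the "if ip not in iplist" check.
unique_ips = sorted(set(ip for ip, _ in pairs))
print(pairs)
print(unique_ips)
```

With a real file, the same function can be fed the open file object directly, since iterating over a file yields one line at a time.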

Regards

--
Gerhardus Geldenhuis


_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


--
LinkedIn Profile: http://linkedin.com/in/pmjlavelle
Twitter: http://twitter.com/pmjlavelle
