Re: [Tutor] regex woes in finding an ip and GET string

Peter Otten Mon, 20 Jun 2011 00:59:30 -0700

Gerhardus Geldenhuis wrote:

> I am trying to write a small program that will scan my access.conf file
> and update iptables to block anyone looking for stuff that they are not
> supposed to.
> 
> The code:
> #!/usr/bin/python
> import sys
> import re
> 
> def extractoffendingip(filename):
>   f = open(filename,'r')
>   filecontents = f.read()
> #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
>   tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',
>   filecontents)


If you want to process the whole file at once, you have to pass the
re.MULTILINE flag so that ^ matches at the start of every line instead of
only at the start of the whole text:

    tuples = re.compile(r'...', re.MULTILINE).findall(filecontents)
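A quick demo of the difference, using your sample line plus a second,
made-up log line (the second line and the variable names are only for
illustration):

```python
import re

# Two access-log lines: the first is the sample from your comment,
# the second is invented for this demo.
log = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:12 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.21"\n'
)

pattern = r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP'

# Without MULTILINE, ^ only matches at the very start of the string,
# so only the first line can ever match.
without_flag = re.findall(pattern, log)

# With MULTILINE, ^ also matches right after every newline.
with_flag = re.compile(pattern, re.MULTILINE).findall(log)

print(len(without_flag))  # 1
print(len(with_flag))     # 2
```

Note that the second group includes the spaces around the path, e.g.
' /index.html ', so you may want to strip() it.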

But I think it's better to process the file one line at a time; that way
you never have to hold the whole log in memory.

>   iplist = []
    [snip]
>     if ip not in iplist:
>       iplist.append(ip)

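So you want every unique ip to appear only once in iplist. Python offers a
data structure for that, the set: checking membership in a set takes roughly
constant time, whereas "ip not in iplist" has to scan the whole list. With
these changes your function becomes something like (untested):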

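import re

def extractoffendingips(filename):
    # compile once; .match only tries the pattern at the start of each line
    match = re.compile(r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP').match
    ipset = set()  # a set stores each ip only once
    with open(filename) as f:
        for line in f:  # read the file one line at a time
            m = match(line)
            if m is not None:
                ip, getstring = m.groups()
                ipset.add(ip)
    for item in ipset:
        print(item)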
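If you want to feed the result into the part that updates iptables, it may
be handier to return the set instead of printing it. Here is a sketch of
that variant; the name offending_ips and the sample lines are made up, and
it accepts any iterable of lines, so an open file works just as well as a
list:

```python
import re

# Compile once; .match anchors the pattern at the start of each line.
match = re.compile(r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP').match

def offending_ips(lines):
    """Return the set of IPs found on lines that contain a GET request.

    `lines` can be an open file object or any other iterable of strings.
    """
    ipset = set()
    for line in lines:
        m = match(line)
        if m is not None:
            ipset.add(m.group(1))  # group 1 is the ip address
    return ipset

# Invented sample: the same ip twice, plus a POST that should not match.
sample = [
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"\n',
    '193.6.135.21 - - [11/Jun/2011:13:58:02 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"\n',
    '10.0.0.7 - - [11/Jun/2011:13:59:12 +0000] "POST /login HTTP/1.1" 200 512 "-" "curl/7.21"\n',
]

print(sorted(offending_ips(sample)))  # ['193.6.135.21']
```

With a real file you would call it as
"with open('access.log') as f: ips = offending_ips(f)", where 'access.log'
stands in for whatever your log file is called.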


_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
