Looking at the regex you are using to match an IP address, I think you
need to put a length limit on each of the four octets you are searching
for, as each one is between 1 and 3 digits long.
For example, this has worked for me:
r = re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", line)
Note that re.match only looks at the start of the string, and that
\d{1,3} limits the number of digits, not the value of each octet.
I am no expert on regex (it scares me!). I got the example above from:
http://www.regular-expressions.info/examples.html
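The pattern above limits each octet to three digits but not to the range
0-255, so a string such as 999.1.1.1 would still match. A minimal sketch
of one way to add that check in Python follows. The name find_valid_ips
and the sample line are made up for illustration and do not come from
the page linked above.

```python
import re

# Same pattern as in the example above, with one capture group per octet.
IP_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def find_valid_ips(text):
    """Return the dotted quads in text whose four octets are all 0-255."""
    valid = []
    for match in IP_PATTERN.finditer(text):
        # The regex only guarantees 1-3 digits; check the numeric range here.
        if all(0 <= int(octet) <= 255 for octet in match.groups()):
            valid.append(match.group(0))
    return valid


# Hypothetical sample text: one real-looking address and one bogus one.
sample = '193.6.135.21 - - "GET /x HTTP/1.1" 404 and bogus 999.1.1.1'
print(find_valid_ips(sample))
```

Doing the range check in Python rather than inside the regex keeps the
pattern short, at the cost of one extra loop over the matches.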
Hope my semi-coherent ramblings have been of some help
Regards
Peter
On 19/06/11 12:25, Gerhardus Geldenhuis wrote:
Hi
I am trying to write a small program that will scan my access.conf
file and update iptables to block anyone looking for stuff that they
are not supposed to.
The code:
#!/usr/bin/python
import sys
import re

def extractoffendingip(filename):
    f = open(filename,'r')
    filecontents = f.read()
    #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET
    #/admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0
    #(compatible; MSIE 6.0; Windows 98)"
    tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',
                        filecontents)
    iplist = []
    for items in tuples:
        (ip, getstring) = items
        print ip,getstring
        #print item
        if ip not in iplist:
            iplist.append(ip)
    for item in iplist:
        print item
    #ipmatch = re.search(r'', filecontents)

def main():
    extractoffendingip('access_log.1')

if __name__ == '__main__':
    main()
logfile=http://pastebin.com/F3RXDYBW
I could probably have used ranges to be more precise about finding
IPs, but I thought that Apache should take care of that. I am assuming
a level of integrity in the log file with regard to the data...
The first problem I ran into came when I added a ^ to my search string:
re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
That finds only two results, a lot fewer than I am expecting. I am a
little confused: at first I thought it might be because the string I
am searching is now treated as one line, because of the way the file
is loaded, so the ^ should find only one instance, but instead it
finds two?
Removing the ^ works much better. I now get mostly correct results,
but I also get a number of IPs with an empty GET string, although I
thought there should be only one in the log file. I would really
appreciate any pointers as to what is going on here.
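One likely factor in the ^ behaviour: by default ^ matches only at the
very start of the string, and f.read() returns the whole file as one
string. Passing re.MULTILINE makes ^ match at the start of every line
as well. The sketch below shows the difference on two made-up log lines
in the same format. It is an illustration under that assumption and
does not explain why two matches, rather than one, were reported.

```python
import re

# Two shortened, made-up lines in the style of the access log above.
log_text = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 512\n'
)

PATTERN = r'^(\d+\.\d+\.\d+\.\d+).*"GET(.*)HTTP'

# Without a flag, ^ only matches at the very start of the whole string.
first_only = re.findall(PATTERN, log_text)

# With re.MULTILINE, ^ also matches immediately after each newline.
every_line = re.findall(PATTERN, log_text, re.MULTILINE)

print(len(first_only))   # 1
print(len(every_line))   # 2
```

Reading the file line by line and applying the pattern to each line
would be another way to get one match per log entry.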
Regards
--
Gerhardus Geldenhuis
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
--
LinkedIn Profile: http://linkedin.com/in/pmjlavelle
Twitter: http://twitter.com/pmjlavelle