Stripping scripts from HTML with regular expressions

Michel Bouwmans Wed, 09 Apr 2008 12:42:49 -0700

Hey everyone,

I'm trying to strip all script blocks from an HTML file using a regex.


I tried the following in Python:

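import re

# Read the whole file into one string.
testfile = open('testfile')
testhtml = testfile.read()
testfile.close()
# Raw string, so that \b is a word boundary and not a backspace character.
regex = re.compile(r'<script\b[^>]*>(.*?)</script>', re.DOTALL)
result = regex.sub('', testhtml)
print(result)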

This strips away far more than just the script blocks. Am I missing
something about Python's regex implementation, or am I doing something
else wrong?
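
For reference, here is a minimal self-contained sketch of what the pattern is
meant to do. It runs on a made-up sample string rather than the real testfile,
and the names SCRIPT_RE, strip_scripts and sample are only illustrative. One
detail worth checking is the string literal: in a normal Python string '\b' is
the backspace character, so the sketch assumes a raw string is what was
intended.

```python
import re

# Raw string: \b reaches the regex engine as a word boundary.
SCRIPT_RE = re.compile(r'<script\b[^>]*>(.*?)</script>', re.DOTALL)

# In a normal string literal, '\b' is the backspace character (0x08),
# while the raw form keeps the backslash and the letter b.
assert '\b' == '\x08'
assert r'\b' == '\\b'


def strip_scripts(html):
    """Remove every <script>...</script> block from html."""
    return SCRIPT_RE.sub('', html)


# Hypothetical sample input, not the contents of the real testfile.
sample = ('<p>a</p><script type="text/javascript">\nalert(1);\n</script>'
          '<p>b</p><script>x()</script>')
print(strip_scripts(sample))
```

The non-greedy (.*?) together with re.DOTALL lets each match stop at the first
closing tag, even when the script body spans several lines. The pattern is
case-sensitive as written, so an uppercase <SCRIPT> tag would only be matched
if re.IGNORECASE were added to the flags.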

greetz
MFB
-- 
http://mail.python.org/mailman/listinfo/python-list
