"Iain King" <[EMAIL PROTECTED]> writes:

> I have some code that converts html into xhtml.  For example, convert
> all <i> tags into <em>.  Right now I need to do to string.replace calls
> for every tag:
>
> html = html.replace('<i>','<em>')
> html = html.replace('</i>','</em>')
>
> I can change this to a single call to re.sub:
>
> html = re.sub('<([/]*)i>', r'<\1em>', html)
>
> Would this be a quicker/better way of doing it?

Maybe. You could measure it and see. But neither will work in the face
of attributes or whitespace in the tag.

If you're going to parse [X]HTML, you really should use tools that are
designed for the job. If you have well-formed HTML, you can use the
htmllib parser in the standard library. If you have the usual crap one
finds on the web, I recommend BeautifulSoup.

      <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>                  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to