"Tim Arnold" <[EMAIL PROTECTED]> writes: > hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to > create CHM files. That application really hates xhtml, so I need to convert > self-ending tags (e.g. <br />) to plain html (e.g. <br>). > > Seems simple enough, but I'm having some trouble with it. regexps trip up > because I also have to take into account 'img', 'meta', 'link' tags, not > just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do > that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not > enough of a regexp pro to figure out that lookahead stuff.
Hi, I'm not sure if this is very helpful but the following works on the very simple example below. >>> import re >>> xhtml = '<p>hello <img src="/img.png"/> spam <br/> bye </p>' >>> xtag = re.compile(r'<([^>]*?)/>') >>> xtag.sub(r'<\1>', xhtml) '<p>hello <img src="/img.png"> spam <br> bye </p>' -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list