"Arnaud Delobelle" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > "Tim Arnold" <[EMAIL PROTECTED]> writes: > >> hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop >> to >> create CHM files. That application really hates xhtml, so I need to >> convert >> self-ending tags (e.g. <br />) to plain html (e.g. <br>). >> >> Seems simple enough, but I'm having some trouble with it. regexps trip up >> because I also have to take into account 'img', 'meta', 'link' tags, not >> just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to >> do >> that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm >> not >> enough of a regexp pro to figure out that lookahead stuff. > > Hi, I'm not sure if this is very helpful but the following works on > the very simple example below. > >>>> import re >>>> xhtml = '<p>hello <img src="/img.png"/> spam <br/> bye </p>' >>>> xtag = re.compile(r'<([^>]*?)/>') >>>> xtag.sub(r'<\1>', xhtml) > '<p>hello <img src="/img.png"> spam <br> bye </p>' > > > -- > Arnaud
Thanks for that. It is helpful; I guess I had a brain malfunction. I'm pretty
sure your example will work for me, except in cases where the IMG alt text
contains a '>' sign. I'm not sure that's even possible, so maybe this will do
the job.

thanks,
--Tim

--
http://mail.python.org/mailman/listinfo/python-list
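
On the alt-text concern: XML only requires '<' and '&' to be escaped inside attribute values, so a literal '>' in alt text is legal XHTML. The simple pattern above stops at the first '>', so such a tag would be left unconverted. What follows is a minimal sketch, not a real parser, of a stricter pattern that steps over quoted attribute values. The names XTAG and xhtml_to_html are made up for illustration, and it assumes every attribute value is quoted, as XHTML requires. Comments, CDATA sections and script content are not handled.

```python
import re

# Hypothetical helper names; this is a sketch, not a full XHTML parser.
# Group 1: the tag name.
# Group 2: zero or more attributes whose values are quoted, so a '>' or '/'
#          inside the quotes cannot end the match early.
XTAG = re.compile(
    r'''<(\w+)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*/>'''
)

def xhtml_to_html(text):
    """Rewrite self-closing tags such as <br /> as <br>."""
    return XTAG.sub(r'<\1\2>', text)

xhtml = '<p>hello <img src="/img.png" alt="a > b"/> spam <br /> bye </p>'
print(xhtml_to_html(xhtml))
# prints: <p>hello <img src="/img.png" alt="a > b"> spam <br> bye </p>
```

Like the original pattern, this rewrites every self-closed tag it finds, including non-void ones such as <div/>, so it may be worth restricting the tag name to br, hr, img, meta and link if the pages contain those.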