On 2008-04-24 19:16, John Krukoff wrote:
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:python-[EMAIL PROTECTED] On Behalf Of Tim Arnold
Sent: Thursday, April 24, 2008 9:34 AM
To: python-list@python.org
Subject: convert xhtml back to html
hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to
create CHM files. That application really hates xhtml, so I need to convert
self-ending tags (e.g. <br />) to plain html (e.g. <br>).
Seems simple enough, but I'm having some trouble with it. regexps trip up
because I also have to take into account 'img', 'meta', 'link' tags, not
just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
enough of a regexp pro to figure out that lookahead stuff.
I'm not sure where to start now; I looked at BeautifulSoup and
BeautifulStoneSoup, but I can't see how to modify the actual tag.
You could filter the XHTML through mxTidy and set the hide_endtags option to 1:
http://www.egenix.com/products/python/mxExperimental/mxTidy/
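
If an extra dependency is not an option, a plain regexp may also be enough for
this narrow job. The pattern quoted above fails because [^(/>)] is a character
class: it excludes each of the characters (, /, > and ) individually, so any
attribute value containing a slash (src="images/a.png") stops the match. What
follows is only a rough sketch using the standard re module, not anything
taken from mxTidy; the names VOID_TAGS, SELF_CLOSING and xhtml_to_html are
made up for illustration, and the tag list is an assumption to adjust for your
pages. It will not cope with a literal ">" inside an attribute value.

```python
import re

# Tags that XHTML writes as self-closing (assumed list; extend as needed).
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input")

# Match "<tag ...attributes... />" for the tags above.
# [^<>]*? allows "/" inside attribute values, unlike [^(/>)]+.
SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)

def xhtml_to_html(text):
    """Rewrite self-closing void tags (e.g. <br />) without the slash."""
    return SELF_CLOSING.sub(r"<\1\2>", text)

if __name__ == "__main__":
    print(xhtml_to_html('<p>a<br />b<img src="images/a.png" alt="x" /></p>'))
```

Tags outside VOID_TAGS are left untouched, so ordinary paired elements such as
<p>...</p> pass through unchanged.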
--
Marc-Andre Lemburg
eGenix.com