Re: convert xhtml back to html

M.-A. Lemburg Thu, 24 Apr 2008 10:43:42 -0700

On 2008-04-24 19:16, John Krukoff wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:python-
[EMAIL PROTECTED] On Behalf Of Tim Arnold
Sent: Thursday, April 24, 2008 9:34 AM
To: python-list@python.org
Subject: convert xhtml back to html


hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to
create CHM files. That application really hates xhtml, so I need to convert
self-ending tags (e.g. <br />) to plain html (e.g. <br>).

Seems simple enough, but I'm having some trouble with it. Regexps trip up
because I also have to take into account 'img', 'meta', 'link' tags, not
just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
enough of a regexp pro to figure out that lookahead stuff.

I'm not sure where to start now; I looked at BeautifulSoup and
BeautifulStoneSoup, but I can't see how to modify the actual tag.


You could filter the XHTML through mxTidy with the hide_endtags option set to 1:

http://www.egenix.com/products/python/mxExperimental/mxTidy/
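
If adding a dependency is not an option, a plain regexp may also be enough
for the tags mentioned above. The pattern <img[^(/>)]+/> fails because
[^(/>)] is a character class that excludes every "/", so any attribute
value containing a slash (such as a src path) stops the match. Below is a
minimal sketch, not a general XHTML parser: it assumes well-formed input
whose attribute values contain no "<" or ">". The names VOID_TAGS,
SELF_CLOSING and xhtml_to_html are made up for illustration, and the tag
list is an assumption that should be adjusted to the pages at hand.

```python
import re

# Elements that plain HTML writes without a closing slash (assumed list;
# extend or trim it to match the tags that occur in your pages).
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input",
             "area", "base", "col", "param")

# Match <tag ...attributes... /> for the listed elements only.
# [^<>]* allows "/" inside attribute values (e.g. src="a/b.png"),
# and \b keeps "col" from matching the start of "colgroup".
SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)


def xhtml_to_html(text):
    """Rewrite <br /> style tags as <br>; leave all other markup as-is."""
    return SELF_CLOSING.sub(r"<\1\2>", text)
```

Calling xhtml_to_html() on the contents of each file and writing the
result back out would then be the whole conversion. Anything the pattern
does not recognise is passed through unchanged, so it is worth checking
the output on a few pages before running it over the full set.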

--
Marc-Andre Lemburg
eGenix.com
