I actually had to do this, and did it mostly by hand, for the newfangled option of viewing the KLI's Klingon-only journal (Qo'noS QonoS) in Klingon lettering (using the PUA assignments in the CSUR), e.g. http://www.kli.org/QQ/QQ0402.html?mode=UTF
In order to simplify things, I naturally made things more complicated. :) So when you request an issue in pIqaD mode, the program fetches itself from the server (since its raw source code is PHP and not HTML; this way it gets the true HTML version of itself), and then replaces the text with pIqaD, being careful not to replace the contents of the HTML tags. The things you have to go through...
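For the curious, the "replace the text but not the tags" step might look roughly like the sketch below. This is not the actual code behind the KLI page (that is PHP); it is a minimal Python illustration. The names convert_html and to_piqad are made up for the example, and the code points assume the CSUR layout of 26 pIqaD letters in Klingon alphabetical order starting at U+F8D0, so check them against the registry before relying on them. Leaving character entities such as &amp; untouched is an extra precaution added here, not something described above.

```python
import re

# ASSUMPTION: CSUR places the 26 pIqaD letters in Klingon alphabetical
# order starting at U+F8D0; verify against the registry before use.
LETTERS = ["a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
           "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'"]
PIQAD = {latin: chr(0xF8D0 + i) for i, latin in enumerate(LETTERS)}

# Try longer spellings first so "tlh" is not read as "t" followed by junk.
TOKEN = re.compile("|".join(re.escape(s)
                            for s in sorted(LETTERS, key=len, reverse=True)))

# HTML tags and character entities, captured so re.split() keeps them.
MARKUP = re.compile(r"(<[^>]*>|&[#\w]+;)")


def to_piqad(text):
    """Transliterate romanized Klingon; anything unrecognized passes through."""
    return TOKEN.sub(lambda m: PIQAD[m.group(0)], text)


def convert_html(html):
    """Convert text nodes only, leaving tags and entities exactly as found."""
    parts = MARKUP.split(html)
    # With one capturing group, odd-indexed parts are the markup matches.
    return "".join(p if i % 2 else to_piqad(p) for i, p in enumerate(parts))
```

Calling convert_html("<b class='tlh'>tlhIngan</b>") leaves the tag and its attribute alone and converts only "tlhIngan". The sketch is naive in at least two ways: it always reads "ng" as one letter, which is wrong for words where n is followed by gh, and it would happily convert the contents of a script or style element.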
~mark
John Cowan wrote:
> Mark E. Shoulson scripsit:
>> Heh... I've occasionally caught myself almost wishing for this kind of setup, ridiculous though it be. It would be nice to be able to get just the *content* of the text without having to bother with all that mucking about with HTML rendering engines and whatnot.
> TSaxon (http://www.ccil.org/~cowan/XML/tagsoup/tsaxon) is the ticket here, with a trivial stylesheet that just specifies text output. Use the -H switch to allow arbitrary HTML input.

