Oh, for crying out loud. Even after switching to plain text, Hotmail still strips out my included XML :-( Let's try again: replace the square brackets below with the appropriate less-than and greater-than symbols.

> From: [EMAIL PROTECTED]
> Date: Fri, 31 Aug 2007 14:06:59 +0000
>
> Tobia Conforto <tobia.conforto <at> linux.it> writes:
>
>> I have a data source from which I get SAX text nodes into my pipeline
>> that contain escaped HTML entities and tags. In Java syntax:
>>
>> "Lorem ipsum &mdash; dolor sit amet. &lt;br&gt; Consectetuer"
>>
>> or, in XML syntax:
>>
>> Lorem ipsum &amp;mdash; dolor sit amet. &amp;lt;br&amp;gt; Consectetuer
>>
>> As you can see, the entities and <br> tags are escaped and part of the
>> text node.
>>
>> I cannot change this data source component, therefore I need a
>> transformer to examine every text node in the stream, split it at the
>> fake "<br>" tags, substitute them with <xhtml:br/> elements, and
>> replace every escaped entity with the relevant Unicode character.
>
> That's one of the rare cases where I consider <xsl:text
> disable-output-escaping="yes"> a valid approach [1]. I don't know if there is
> something comparable directly on the Java side.

Unless I'm mistaken, doing that on his example would result in an invalid document, as there's no matching [/br] element...? It would be okay if it can be guaranteed that the included text is nice well-formed XHTML, but if it's plain old HTML then it sounds to me more like a job for the jtidy or neko-based HTML transformers.

We have something similar in our application. I arrange the early part of the pipeline so that the escaped HTML appears within a unique element, e.g.

  [some_escaped_html]Lorem ipsum &lt;br&gt; dolor[/some_escaped_html]

pass it through the html transformer

  [map:transform type="html"]
    [map:parameter name="tags" value="some_escaped_html"/]
  [/map:transform]

and follow that with a small xsl transformation to strip out the some_escaped_html elements (and the html & body elements that JTidy inserts)

  [xsl:template match="some_escaped_html"]
    [xsl:apply-templates select="html/body/*"/]
  [/xsl:template]

+ the usual "passthrough" templates for all other nodes.

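For anyone who wants to try that last stripping step outside Cocoon, here is a minimal sketch using the XSLT processor that ships with the JDK. The class name StripWrapperSketch, the [doc] root element and the sample input are all made up for illustration; the sample only stands in for the kind of thing the html transformer might emit, it is not captured output. The "passthrough" part is written as the standard identity template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Cocoon: applies the wrapper-stripping
// stylesheet to an XML string with the JDK's built-in XSLT 1.0 processor.
class StripWrapperSketch {

    // Stylesheet: unwrap some_escaped_html/html/body, copy everything else.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        // Replace the wrapper by the element children of html/body.
        + "<xsl:template match='some_escaped_html'>"
        + "<xsl:apply-templates select='html/body/*'/>"
        + "</xsl:template>"
        // Identity ("passthrough") template for all other nodes.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Assumed sample input: a wrapper whose content has already been parsed
    // into html/body by an HTML-cleaning step.
    static final String SAMPLE =
          "<doc><some_escaped_html><html><body>"
        + "<p>Lorem ipsum<br/>dolor</p>"
        + "</body></html></some_escaped_html></doc>";

    static String strip(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints the document with the wrapper, html and body elements gone.
        System.out.println(strip(SAMPLE));
    }
}
```

Note that html/body/* selects only the element children of body. If the cleaned-up HTML can have bare text directly under body, html/body/node() would keep that text as well.
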
Net result, the same SAX stream but with the HTML unescaped and cleaned up so it's well-formed again.

Andrew.
