Hello,

I have a data source that feeds SAX text nodes into my pipeline, and those text nodes contain escaped HTML entities and <br> tags. In Java syntax:

  "Lorem ipsum &mdash; dolor sit amet. <br> Consectetuer"

or, in XML syntax:

  Lorem ipsum &amp;mdash; dolor sit amet. &lt;br&gt; Consectetuer

As you can see, the entities and <br> tags are escaped and are part of the text node. I cannot change this data source component, so I need a transformer that examines every text node in the stream, splits it at the fake "<br>" tags, substitutes them with <xhtml:br/> elements, and replaces every escaped entity with the corresponding Unicode character.

I tried doing it with the Parser transformer, but it's too slow. I tried using the HTML transformer, but I couldn't get it to work.

My question is: what do you suggest I use on the Java side? Is there anything like PHP's html_entity_decode() available in a library that Cocoon is already using, something that can parse and convert HTML 4.0 entities in a single pass over the string?

Tobia

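P.S. To make the requirement concrete, here is a minimal sketch of the string handling I have in mind. It is not taken from Cocoon or any existing library: the class name EscapedHtmlText, the methods decodeEntities() and splitAtBr(), and the small entity table are all made up for illustration, and a real version would need the full HTML 4.0 entity list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Cocoon.
final class EscapedHtmlText {

    // Literal "<br>", "<BR>", "<br/>" or "<br />" inside the text node.
    private static final Pattern BR =
            Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);

    // Illustrative subset of the HTML 4.0 named entities (assumption:
    // a real table would contain all of them).
    private static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", 0x26);
        ENTITIES.put("lt", 0x3C);
        ENTITIES.put("gt", 0x3E);
        ENTITIES.put("quot", 0x22);
        ENTITIES.put("nbsp", 0xA0);
        ENTITIES.put("eacute", 0xE9);
        ENTITIES.put("ndash", 0x2013);
        ENTITIES.put("mdash", 0x2014);
        ENTITIES.put("hellip", 0x2026);
    }

    private EscapedHtmlText() { }

    /** Decodes entity references in one left-to-right pass.
     *  Unknown references are copied to the output unchanged. */
    static String decodeEntities(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i + 1) : -1;
            // Cap the lookahead: the longest HTML 4.0 name is "thetasym".
            if (semi > i + 1 && semi - i <= 10) {
                int cp = lookup(s.substring(i + 1, semi));
                if (cp >= 0) {
                    out.appendCodePoint(cp);
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Returns the code point for "name", "#123" or "#x7B", or -1 if unknown.
    private static int lookup(String name) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.length() > 1
                        && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
                int cp = hex ? Integer.parseInt(name.substring(2), 16)
                             : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp) ? cp : -1;
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        Integer cp = ENTITIES.get(name);
        return cp == null ? -1 : cp;
    }

    /** Splits the text at the fake <br> tags first, then decodes each piece,
     *  so that an escaped "&lt;br&gt;" stays plain text. */
    static List<String> splitAtBr(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : BR.split(text, -1)) {
            parts.add(decodeEntities(piece));
        }
        return parts;
    }
}
```

Calling EscapedHtmlText.splitAtBr() on the sample string above should give two pieces, with the entity in the first one replaced by U+2014. Inside a transformer, I would presumably send each piece through ContentHandler.characters() and emit an xhtml:br startElement()/endElement() pair between consecutive pieces. If a library already on the Cocoon classpath does the decoding part, I would much rather use that than maintain my own entity table.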