I'm not sure which statement is giving you the problem, but if you have xsl:output's encoding set to UTF-8, are using disable-output-escaping="yes" (e.g., on xsl:value-of or xsl:text), and still see the escaping, then in the cases where I've seen this problem, it turned out that the data wasn't actually UTF-8.
This happened to me just recently when I was creating a Xerces text node, and the DOM_String (Xerces 1.6!) was constructed with a char* that pointed to UTF-8, instead of a wchar_t* pointing to UTF-16. What happens is that Xerces interprets a char* as a *multibyte* character set and transcodes it to its internal Unicode representation using the local code page. If the data is plain ASCII, no harm is done, but if it's really UTF-8 (encoded Japanese, for instance) on a Japanese system, the UTF-8 bytes are treated as Shift-JIS and "converted" (corrupted). When that is output, you'll get escaped characters, because Xalan correctly determines that the byte stream is not valid UTF-8. I don't know whether this digression applies to your case, but make sure you've still got UTF-8 before using Xalan to process it. When the data really is UTF-8, I haven't seen a problem.
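
As a rough illustration of "make sure you've still got UTF-8", here is a minimal sketch of a well-formedness check you could run on a byte buffer before handing it to the parser. It uses only the standard C++ library, not Xerces or Xalan, and the names is_valid_utf8 and example_checks are hypothetical helpers made up for this example. Note the limitation: it catches data that was never UTF-8 (raw Shift-JIS bytes, say), but it cannot detect text that has already been mis-transcoded into well-formed but wrong UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces or Xalan): returns true if `bytes`
// is well-formed UTF-8. Rejects stray or missing continuation bytes,
// overlong forms, UTF-16 surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;     // continuation bytes expected after `lead`
        unsigned long cp = 0;      // code point decoded so far
        unsigned long min_cp = 0;  // smallest code point legal for this length

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // continuation byte in lead position, or 0xF8..0xFF

        if (extra > n - i - 1) return false;  // sequence truncated at end of buffer

        for (std::size_t j = 1; j <= extra; ++j) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + j]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode

        i += extra + 1;
    }
    return true;
}

// Usage sketch: the kinds of input the check accepts and rejects.
void example_checks() {
    assert(is_valid_utf8("plain ASCII"));
    assert(is_valid_utf8("\xE6\x97\xA5"));  // U+65E5 encoded as UTF-8
    assert(!is_valid_utf8("\x93\xFA"));     // Shift-JIS style bytes, not UTF-8
    assert(!is_valid_utf8("\xE6\x97"));     // truncated three-byte sequence
}
```

If a check like this fails on data you believed was UTF-8, the encoding was lost somewhere upstream of the transform, and that is the place to look rather than the stylesheet.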
Nick Bastin <[EMAIL PROTECTED]> wrote:
Xalan is output-escaping UTF-8 text that should most definitely NOT be
escaped in HTML output. It's escaping all of the character 'bytes' as
if they were characters themselves. Is there something that has to be
specifically set in the stylesheet to avoid this? It seems to me that
it should know that the characters are more than 1 byte wide, and
should leave them alone when outputting them. (Note: it works fine
when we transform XML->XML, but XML->HTML results in the escaping - is
there any way to avoid this?)
--
Nick
