Re: Avoiding the escaping UTF-8 unicode text

Nick Bastin 8 Mar 2004 19:28:15 -0000


On Mar 8, 2004, at 2:18 PM, [EMAIL PROTECTED] wrote:

Yes, I was confused by the fact that you said an XML to XML transformation worked correctly, but XML to HTML did not. Clearly, they must have been run with different data sets, so the comparison was not relevant.

Well, *we* didn't think they were different data sets, but the output XML went through an unintended change between transformations.

I'm curious as to how invalid UTF-8 byte sequences would get into a transformation. If the parser did not detect these, that's a problem. Did you paste these sequences into a document and parse it? What was the encoding declaration on the document?

The problem was that we copied the output XML into a copy-buffer, and pasted it into a new document in a new application. This copy-paste operation altered the characters such that they were no longer valid UTF-8. The UTF-8 encoding declaration was in the document. I was passing the documents through the XalanTransform sample program, which I believe dumps all errors to STDOUT, and we didn't see anything reported by the transformer.
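
For anyone who hits the same thing, a quick check outside the transformer would probably have caught this sooner. Below is a minimal sketch in Python (not part of Xalan; the function name first_invalid_utf8 is made up for illustration) that strictly decodes the raw bytes of a document and reports where the first malformed UTF-8 sequence starts.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; it relies only on Python's
    built-in strict UTF-8 decoder.
    """
    try:
        data.decode("utf-8")  # strict mode raises on malformed sequences
    except UnicodeDecodeError as err:
        return err.start  # offset of the first offending byte
    return None


# Properly encoded text passes the check.
good = "caf\u00e9".encode("utf-8")
print(first_invalid_utf8(good))

# A lone 0xE9 byte (an assumed example of what a paste through a
# single-byte encoding might leave behind) is not valid UTF-8.
bad = b"caf\xe9 au lait"
print(first_invalid_utf8(bad))
```

To check a real file, read it in binary mode and pass the bytes to the function. This only verifies the byte sequences; it says nothing about whether the encoding declaration in the document matches.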

--
Nick
