Re: Avoiding the escaping UTF-8 unicode text

Keith Rogers 8 Mar 2004 17:10:33 -0000

Actually, David, it did apply - if you think you've got UTF-8 but really don't, then the output won't be what you expect.

We're just talking semantics here. I've had a [u]string class (for about 4 years now) that encapsulates both UTF-8 and UTF-16, because it seemed like the X/X transcoders leaked like sieves back when I started using this stuff. The case I was describing just popped out last because a web form that had previously provided a drop-down list was changed to allow free-form input. The cast to the UTF-8 string (in the DOM_String constructor) had always worked before because the input was always ASCII - when that ceased to be true, it exposed a (gasp) bug in the code. I just changed the cast to the UTF-16 stream and it worked.
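
For anyone wondering why the ASCII-only input hid the bug, here is a minimal sketch of the failure mode. It does not use Xerces-C at all: widenAsLatin1 is a hypothetical stand-in for a constructor that reads each byte as one character of a single-byte local code page (ISO-8859-1 is assumed purely for illustration), and isAscii and checkWidening are likewise names made up for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: mimics code that treats every input byte as one
// character of a single-byte local code page (ISO-8859-1 assumed here).
std::u16string widenAsLatin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));  // one byte -> one UTF-16 unit
    }
    return out;
}

// True when every byte is below 0x80, i.e. the input is plain ASCII.
bool isAscii(const std::string& bytes) {
    for (unsigned char b : bytes) {
        if (b >= 0x80) return false;
    }
    return true;
}

// Sanity checks illustrating why the bug stayed hidden.
void checkWidening() {
    // ASCII survives: the bytes mean the same thing in UTF-8 and Latin-1.
    assert(isAscii("abc"));
    assert(widenAsLatin1("abc") == u"abc");

    // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
    // Byte-wise widening turns it into two wrong characters, U+00C3 U+00A9.
    const std::string eAcute = "\xC3\xA9";
    const std::u16string garbled = {char16_t(0x00C3), char16_t(0x00A9)};
    assert(!isAscii(eAcute));
    assert(widenAsLatin1(eAcute) == garbled);
}
```

As long as every byte is below 0x80 the two interpretations agree, which is why a drop-down list of ASCII values never showed the problem and free-form input did.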

[EMAIL PROTECTED] wrote:

This digression does not apply, because what Keith is doing in his code is
wrong. Xerces-C is not "interpreting" his string in any way. He is using
a constructor that expects a character string encoded in the local code
page to create a text node. What he should do is use a UTF-8 transcoder to
transcode the text to UTF-16 and create a text node using the transcoded
string.
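
The transcoding step described above can be sketched roughly as follows. This is only an illustration of what a UTF-8 to UTF-16 transcoder does at the byte level, written against the C++ standard library alone; utf8ToUtf16 is a hypothetical helper, not a Xerces-C API, and in real code the library's own transcoder would presumably be the better choice. The resulting UTF-16 string is what would then be handed to the text-node creation call.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Xerces-C): decode UTF-8 bytes into
// UTF-16 code units, throwing std::invalid_argument on malformed input.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in six payload bits from each continuation byte (10xxxxxx).
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        static const char32_t minForLen[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, the bytes C3 A9 decode to the single unit U+00E9, whereas a code-page constructor that reads one character per byte would produce two unrelated characters.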
