Actually, David, it did apply - if you think you've got UTF-8 but really don't, then the output won't be what you expect.
 
We're just talking semantics here. I've had a [u]string class (for about 4 years now) that encapsulates both UTF-8 and UTF-16, because it seemed like the X/X transcoders leaked like sieves back when I started using this stuff. The case I was describing only just surfaced because a web form that had previously provided a drop-down list was changed to allow free-form input. The cast to the UTF-8 string (in the DOM_String constructor) had always worked before because the input was always ASCII. When that ceased to be true, it exposed a (gasp) bug in the code. I just changed the cast to the UTF-16 stream and it worked.

[EMAIL PROTECTED] wrote:
This digression does not apply, because what Keith is doing in his code is
wrong. Xerces-C is not "interpreting" his string in any way. He is using
a constructor that expects a character string encoded in the local code
page to create a text node. What he should do is use a UTF-8 transcoder to
transcode the text to UTF-16 and create a text node using the transcoded
string.
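
To make the distinction concrete, here is a minimal standalone sketch of why ASCII input hides this kind of bug and non-ASCII input exposes it. It does not use Xerces-C at all: widenAsLatin1 and utf8ToUtf16 are hypothetical helpers written for illustration, and the byte-per-character widening only stands in for a Latin-1 style local code page, which is an assumption about the platform rather than something stated in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Xerces-C API): widens each byte to one UTF-16
// code unit, which is how a Latin-1 style local code page conversion behaves.
std::u16string widenAsLatin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Hypothetical helper (not a Xerces-C API): decodes UTF-8 into UTF-16.
// Simplified: it rejects bad lead/continuation bytes and truncated input,
// but does not check for overlong forms or encoded surrogates.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Code points beyond the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For pure ASCII input the two functions return the same result, which is why the drop-down version of the form never showed a problem. Once a byte sequence such as 0xC3 0xA9 arrives, the byte-wise widening yields two wrong characters where the UTF-8 decoder yields one. In real code the decoding step would be done with the library's own UTF-8 transcoder, as the quoted message recommends, rather than a hand-rolled decoder like this one.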
