> hi,
>     i am using xalan-c++ to perform XPath queries on an XML document, All
> works fine except some non ascii characters when encoded as UTF-8 cause an
> exception in theliaison->parseXMLStream();

I suggest you catch the exception and take a look at the error message.
Without that, it will be impossible to diagnose the problem.  Start with
catching SAXParseException, because that's probably what's being thrown.

> An example problematic character is the german umlaut. The XML is
> transported over http/SOAP from a VB application to Xalan-C++ using gSOAP.
> looking at the encoding of the umlaut character shows it is sent from VB
> as two bytes (hex) C3 84  - (decimal) 195 132

The two bytes C3 84 in UTF-8 encode the Unicode character U+00C4, Latin
Capital Letter A With Diaeresis, or capital A with an umlaut.  Is that the
character you're expecting?
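
If you want to check the arithmetic yourself, here is a minimal sketch of a UTF-8 decoder in plain C++. The name decodeFirstUtf8 is something I made up for illustration, it is not part of Xerces-C++ or Xalan-C++, and it does only basic validation (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode the first UTF-8 sequence in `bytes`
// and return the Unicode code point it encodes.
char32_t decodeFirstUtf8(const std::string& bytes)
{
    if (bytes.empty())
        throw std::invalid_argument("empty input");

    const unsigned char lead = static_cast<unsigned char>(bytes[0]);
    std::size_t length = 0;
    char32_t codePoint = 0;

    // The lead byte says how long the sequence is and carries the high bits.
    if (lead < 0x80)                { length = 1; codePoint = lead; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; codePoint = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; codePoint = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; codePoint = lead & 0x07; }
    else
        throw std::invalid_argument("invalid lead byte");

    if (bytes.size() < length)
        throw std::invalid_argument("truncated sequence");

    // Each continuation byte has the form 10xxxxxx and adds six bits.
    for (std::size_t i = 1; i < length; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("invalid continuation byte");
        codePoint = (codePoint << 6) | (c & 0x3F);
    }
    return codePoint;
}
```

For the bytes C3 84, the lead byte contributes 0x03 and the continuation byte contributes 0x04, so decodeFirstUtf8("\xC3\x84") gives (0x03 << 6) | 0x04 = 0xC4.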

> however if i return the same character created from the Xerces-C++ DOM
> this character is encoded as Ã&#132.

What do you mean by "if i return the same character created from the
Xerces-C++ DOM?"  How did you create this instance?  Did you parse it?  If
not, that DOM instance probably isn't relevant to the discussion.  Do you
mean you are serializing an instance of the DOM, and you are getting those
two characters?  If that's the case, you have an encoding problem, because,
in UTF-16, you are getting U+00C3 (Latin Capital Letter A With Tilde) and
U+0084 (decimal 132), which is a control character.
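
To show how that pair of characters can arise, here is a small sketch of the mistake I suspect: each UTF-8 byte is widened to its own 16-bit code unit, as if the input were ISO-8859-1. The function name widenBytesAsLatin1 is hypothetical, and I am only guessing that something equivalent happens somewhere in your code path.

```cpp
#include <string>

// Hypothetical helper reproducing the suspected bug: treat every byte
// as a separate character instead of decoding multi-byte UTF-8 sequences.
std::u16string widenBytesAsLatin1(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes)
    {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Feeding it the two bytes C3 84 produces two code units, 0x00C3 and 0x0084 (decimal 132), rather than the single character U+00C4.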

My understanding of VB, which is extremely limited, is that strings are
encoded in UCS-2, not UTF-8.  You may have a problem with parsing a
document which contains an encoding declaration asserting the document is
in UTF-8, when it really is UCS-2.
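
One way to check is to look at the first few bytes of what actually arrives before it is handed to the parser. The sketch below follows the byte patterns in the encoding autodetection appendix of the XML 1.0 recommendation. The names GuessedEncoding and guessXmlEncoding are hypothetical, and the check is deliberately incomplete (it ignores UTF-32 and EBCDIC, for example).

```cpp
#include <cstddef>
#include <string>

enum class GuessedEncoding { Utf8, Utf16LittleEndian, Utf16BigEndian, Unknown };

// Hypothetical helper: guess the encoding of an XML document from its
// leading bytes (byte order mark, or the shape of "<?" at the start).
GuessedEncoding guessXmlEncoding(const std::string& data)
{
    // Returns the byte at index i, or 0x100 (never a valid byte) past the end.
    auto byteAt = [&data](std::size_t i) -> unsigned {
        return i < data.size() ? static_cast<unsigned char>(data[i]) : 0x100u;
    };
    const unsigned b0 = byteAt(0), b1 = byteAt(1), b2 = byteAt(2), b3 = byteAt(3);

    // Byte order marks.
    if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return GuessedEncoding::Utf8;
    if (b0 == 0xFF && b1 == 0xFE) return GuessedEncoding::Utf16LittleEndian;
    if (b0 == 0xFE && b1 == 0xFF) return GuessedEncoding::Utf16BigEndian;

    // No byte order mark: "<?" stored as 16-bit units has interleaved zero bytes.
    if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00)
        return GuessedEncoding::Utf16LittleEndian;
    if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F)
        return GuessedEncoding::Utf16BigEndian;

    // "<?xm" as single bytes: UTF-8 or another ASCII-compatible encoding.
    if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D)
        return GuessedEncoding::Utf8;

    return GuessedEncoding::Unknown;
}
```

If this reports one of the UTF-16 values for a document whose declaration says encoding="UTF-8", the declaration and the bytes disagree, which is the situation I described above.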

Dave
