I have an XML document (UTF-8 encoded) that contains German characters (e.g. ä). In the example below, the umlaut character 'ä' is stored UTF-8 encoded (shown here as "Ã¤"). The XML file looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- This came from sample poll servlet -->
    <!DOCTYPE X >
    <X Attrib1="Attrib1Info" Attrib2="Attrib2Data" Attrib3="Attrib3Info" >
      <CD>
        <C attrib1="testattrib1" attrib2="testattrib2" >UTF Encoded Umlaut character Ã¤</C>
      </CD>
    </X>

When I parse the document with Xerces (SAX), the parser does not return the character ä in the characters(char ch[], int start, int length) callback method. What I expect to receive in the characters array is "UTF Encoded Umlaut character ä", either in one long chunk or in several chunks. Instead I get the chars exactly as they appear in the XML doc: "UTF Encoded Umlaut character Ã¤".

Why is the parser not able to return the correct Unicode characters when all parsers are supposed to support UTF-8 encoding?

If, instead of the UTF-8 bytes for ä, I escape it with a character reference (such as &#228;), then the parser recognizes it and returns the correct string in two chunks:

    char array chunk 1: UTF Encoded Umlaut character
    char array chunk 2: ä

When I used IE or other XML viewers to view the XML, they correctly interpreted the UTF-8 encoding and displayed the XML with German characters.

Is there a bug in Xerces SAX, or am I missing something?

Thanks,
Ashish
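For anyone trying to reproduce this, here is a minimal sketch of the scenario described above. It is not the original code: the class name Utf8SaxDemo and the helper parseText are made up for illustration, and it uses the JAXP SAXParserFactory rather than Xerces classes directly. It assumes the parser is handed a raw byte stream, so that the encoding is taken from the XML declaration. The second half of main shows one possible way the "Ã¤" pair can arise, namely decoding the UTF-8 bytes as ISO-8859-1 somewhere before or after parsing; whether that is what happens in the setup above is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Utf8SaxDemo {

    /** Parses the given XML bytes and returns all character data, concatenated. */
    static String parseText(byte[] xmlBytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver the text in several chunks, so append each one.
                text.append(ch, start, length);
            }
        };
        try {
            // A byte stream (not a Reader) lets the parser decode the input
            // according to the encoding named in the XML declaration.
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<C>UTF Encoded Umlaut character \u00E4</C>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        // Parsed from UTF-8 bytes, the text ends with the single char U+00E4.
        String parsed = parseText(utf8);
        System.out.println(parsed.endsWith("\u00E4"));        // prints "true"

        // Decoding the same bytes as ISO-8859-1 turns the umlaut into
        // the two chars U+00C3 U+00A4, which display as "Ã¤".
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.contains("\u00C3\u00A4"));   // prints "true"
    }
}
```

If the first check prints true on your machine but your real handler still sees two characters, it may be worth checking how the file is opened (a Reader with the platform default encoding, for instance) and how the received chars are printed or logged.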
