Jakub,

When a character is expressed as a numeric character reference, the parser is not allowed to change the numeric value of the character. So, when using numeric references, it is important to use the Unicode character values. Since ASCII values are also Unicode values, it is always safe to use a numeric reference for an ASCII character.

But for non-ASCII characters, you need to be more careful. Some, like the copyright symbol © (hex A9) and the circled-R registered symbol ® (hex AE), have the same value in both the Windows character set *and* in Unicode. So a reference such as &#xA9; often works *by accident* in XML documents, whereas the trademark (TM) character (153, hex 99, in Windows-1252 but U+2122 in Unicode) is not the same in Windows and Unicode and is often found to be the source of problems in XML documents originating on Windows.

The best thing is to avoid using numeric character references and just encode the character as a UTF-8 byte sequence (or the appropriate byte sequence for the charset in effect). That way, XML parsers and serializers are free to translate the character as appropriate for the charset in effect.

-- fas

F. Andy Seidl, Co-founder
MyST Technology Partners
http://myst-technology.com | http://blogsite.com
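The Windows-versus-Unicode mismatch described above can be checked with a few lines of Java. This is only a minimal sketch: the class and method names (CodePointCheck, decode) are made up for illustration, and it assumes the JDK in use ships the "windows-1252" charset, which is common but not guaranteed by the Java specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Xerces or any library.
class CodePointCheck {
    // Decode one byte with the named charset and return the Unicode code point it maps to.
    static int decode(int byteValue, String charsetName) {
        String s = new String(new byte[] { (byte) byteValue }, Charset.forName(charsetName));
        return s.codePointAt(0);
    }

    // Encode a single code point as UTF-8 and return the bytes as upper-case hex pairs.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: "windows-1252" is available in this JDK.
        // Copyright sign: same value in Windows-1252 and Unicode.
        System.out.printf("0xA9 -> U+%04X%n", decode(0xA9, "windows-1252")); // 0xA9 -> U+00A9
        // Trademark sign: byte 0x99 in Windows-1252, but U+2122 in Unicode.
        System.out.printf("0x99 -> U+%04X%n", decode(0x99, "windows-1252")); // 0x99 -> U+2122
        // The UTF-8 byte sequence a document would contain instead of a numeric reference.
        System.out.println("U+2122 as UTF-8: " + utf8Hex(0x2122)); // U+2122 as UTF-8: E284A2
    }
}
```

Under these assumptions, a reference written as &#153; would name the control character U+0099 rather than the trademark sign, which is the kind of problem described for documents originating on Windows.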
-----Original Message-----
From: Jakub Kahovec [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 3:09 PM
To: [EMAIL PROTECTED]
Subject: utf-8 characters problem

Hi,

when I parse an XML document (with Xerces 2.6.2) whose XML declaration specifies UTF-8 encoding and which contains characters in character reference form (&#xxxx;), the parser replaces these references with ASCII characters. For some characters this is OK, but InvisibleTimes, for instance, is changed into some incorrect, strange character sequence.

I'd like to know whether it is possible to prevent the parser from changing characters out of character reference form, or whether there is some recommendation on how to handle these characters.

Here is a piece of my 'problematic' XML document:

<?xml version="1.0" encoding="UTF-8"?>
<mathDoc>
  <p>Factorise the following quadratic expression:
    <math>
      <mrow>
        <msup>
          <mrow>
            <mi>x</mi>
          </mrow>
          <mrow>
            <mn>2</mn>
          </mrow>
        </msup>
        <mo>&#x002B;</mo> <!-- replaced with the character + -->
        <mi>p</mi>
        <mo>&#x2062;</mo> <!-- here is InvisibleTimes -->
        <mi>x</mi>
        <mo>&#x002B;</mo> <!-- replaced with the character + -->
        <mi>q</mi>
      </mrow>
    </math>
  </p>
</mathDoc>

Thanks so much,
Jakub

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
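To see what a parser does with the InvisibleTimes reference from the sample document, a small check along the following lines may help. It is a sketch under stated assumptions: it uses the JAXP parser bundled with the JDK rather than Xerces 2.6.2 itself, and the class and method names (CharRefParse, rootText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class CharRefParse {
    // Parse an XML string and return the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers do not need a throws clause.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The numeric reference names Unicode code point U+2062 (INVISIBLE TIMES).
        String text = rootText("<mo>&#x2062;</mo>");
        System.out.printf("length=%d, code point=U+%04X%n", text.length(), text.codePointAt(0));
        // prints: length=1, code point=U+2062
    }
}
```

If the parsed value is U+2062 as shown, the parser has kept the character's numeric value. A "strange" sequence seen afterwards would then more likely come from how the result is written out or displayed, for example UTF-8 bytes being viewed in a different charset.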