> Jakub,
> When a character is expressed as a numeric character reference, the parser is not allowed to change the numeric value of the character. [snip]
A parser is never allowed to change the numeric value of a character.
> The best thing is to avoid using numeric character references and just encode the character as a UTF-8 byte sequence (or the appropriate byte sequence for the charset in effect). That way, XML parsers and serializers are free to translate the character as appropriate for the charset in effect.
Conforming parsers deliver characters in Unicode. The result is the same whether a character is encoded as a UTF-8 sequence or as a character reference.
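To make that point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in parser (an assumption for illustration; the thread itself concerns Xerces, which is not exercised here). It parses the same <mo> element written once with a numeric character reference and once with the raw character encoded as UTF-8, then compares what the parser delivers.

```python
import xml.etree.ElementTree as ET

# The same element written two ways: with a numeric character reference,
# and with the raw character (U+2062 INVISIBLE TIMES) encoded as UTF-8 bytes.
as_reference = b'<?xml version="1.0" encoding="UTF-8"?><mo>&#x2062;</mo>'
as_utf8 = '<?xml version="1.0" encoding="UTF-8"?><mo>\u2062</mo>'.encode("utf-8")

text_from_reference = ET.fromstring(as_reference).text
text_from_utf8 = ET.fromstring(as_utf8).text

# Both inputs yield the same single Unicode character.
print(text_from_reference == text_from_utf8)   # True
print(f"U+{ord(text_from_reference):04X}")     # U+2062
```

If the character looks wrong after parsing, the likely culprit is how the application later displays or re-encodes it, not the parser changing its value.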
Bob Foster
-- fas
F. Andy Seidl, Co-founder
MyST Technology Partners
http://myst-technology.com | http://blogsite.com
-----Original Message-----
From: Jakub Kahovec [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 3:09 PM
To: [EMAIL PROTECTED]
Subject: utf-8 characters problem
Hi,
when I parse an XML document (with Xerces 2.6.2) that specifies UTF-8 encoding in its XML declaration and contains characters in character reference form (&#xxxx;), the parser replaces these references with ASCII characters. For some characters that is fine, but InvisibleTimes, for instance, is changed into a strange, incorrect character sequence.
I'd like to know whether it is possible to prevent the parser from converting characters out of character reference form. Or is there a recommended way to handle these characters?
Here is a piece of my 'problematic' XML document:
<?xml version="1.0" encoding="UTF-8"?>
<mathDoc>
  <p>Factorise the following quadratic expression:
    <math>
      <mrow>
        <msup>
          <mrow> <mi>x</mi> </mrow>
          <mrow> <mn>2</mn> </mrow>
        </msup>
        <mo>&#x2B;</mo> <!-- replaced with the character + -->
        <mi>p</mi>
        <mo>&#x2062;</mo> <!-- here is InvisibleTimes -->
        <mi>x</mi>
        <mo>&#x2B;</mo> <!-- replaced with the character + -->
        <mi>q</mi>
      </mrow>
    </math>
  </p>
</mathDoc>
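One way to get character references back in the output is to choose an ASCII output encoding when serializing, so that every non-ASCII character has to be written as a reference. The sketch below shows this with Python's standard xml.etree.ElementTree, which is an assumption made for illustration; it is not the Xerces API, and whether a given Java serializer behaves the same way would need to be checked separately. The fragment is a shortened version of the document above.

```python
import xml.etree.ElementTree as ET

# Shortened fragment containing InvisibleTimes (U+2062) as a character reference.
source = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<mrow><mi>p</mi><mo>&#x2062;</mo><mi>x</mi></mrow>')
root = ET.fromstring(source)

# With an ASCII output encoding, non-ASCII characters are written as
# numeric character references (ElementTree uses the decimal form).
ascii_bytes = ET.tostring(root, encoding="us-ascii")

# With a UTF-8 output encoding, the same character is written as raw bytes.
utf8_bytes = ET.tostring(root, encoding="utf-8")

print(ascii_bytes)  # b'<mrow><mi>p</mi><mo>&#8290;</mo><mi>x</mi></mrow>'
print(utf8_bytes)   # b'<mrow><mi>p</mi><mo>\xe2\x81\xa2</mo><mi>x</mi></mrow>'
```

Both outputs parse back to the same character, so the choice only affects how the file looks in an editor, not what a conforming parser reports.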
Thanks so much
Jakub