Jakub,
When a character is expressed as a numeric character reference, the parser
is not allowed to change the numeric value of the character.  So, when using
numeric references, it is important to use the Unicode character values.
Since ASCII values are also Unicode, it is always safe to reference an ASCII
character by its number.  But for non-ASCII characters, you need to be more
careful.  Some, like the circled-C (C) copyright symbol, are hex A9 in both
the Windows character set (Windows-1252) *and* in Unicode.  So, &#xA9; often
works *by accident* in XML documents, whereas the trademark TM character is
hex 99 (decimal 153) in Windows-1252 but U+2122 in Unicode, and is often
found to be the source of problems in XML documents originating on Windows.
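
To make the mismatch concrete, here is a small sketch in Python (used only
for illustration, not Xerces itself).  It assumes that "the Windows character
set" means Windows-1252, which Python calls cp1252, and it uses the standard
library's ElementTree parser to show that a numeric reference is read as a
Unicode code point, whatever the author's local charset was.

```python
import xml.etree.ElementTree as ET

# Assumption: "Windows character set" here means Windows-1252 (cp1252).
copyright_cp1252 = bytes([0xA9]).decode("cp1252")
trademark_cp1252 = bytes([0x99]).decode("cp1252")

# Byte 0xA9 maps to the same code point in Unicode, so &#xA9; works by accident.
print(hex(ord(copyright_cp1252)))

# Byte 0x99 (decimal 153) maps to U+2122 in Unicode, not to U+0099.
print(hex(ord(trademark_cp1252)))

# A numeric reference is always interpreted as a Unicode code point.
wrong = ET.fromstring("<t>&#153;</t>").text    # U+0099, a C1 control character
right = ET.fromstring("<t>&#x2122;</t>").text  # U+2122, the trademark sign
print(repr(wrong), repr(right))
```

So a document that carries the Windows byte value over into a reference
(&#153;) ends up with a control character rather than the trademark sign.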
The best thing is to avoid using numeric character references and just
encode the character as a UTF-8 byte sequence (or the appropriate byte
sequence for the charset in effect).  That way, XML parsers and serializers
are free to translate the character as appropriate for the charset in
effect.
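
A rough sketch of that point, again in Python with the standard library's
ElementTree rather than Xerces (so the serializer behaviour shown is
ElementTree's, and only an analogy for what a Xerces serializer might do).
It writes U+2062 (InvisibleTimes) once as a numeric reference and once as a
literal UTF-8 character, then serializes the result to two different
charsets.

```python
import xml.etree.ElementTree as ET

# U+2062 (INVISIBLE TIMES) written two ways: as a numeric reference,
# and as the literal character encoded in UTF-8.
as_reference = b'<?xml version="1.0" encoding="UTF-8"?><mo>&#x2062;</mo>'
as_literal = '<?xml version="1.0" encoding="UTF-8"?><mo>\u2062</mo>'.encode("utf-8")

ref_elem = ET.fromstring(as_reference)
lit_elem = ET.fromstring(as_literal)

# After parsing, both documents hold the same single character.
same = ref_elem.text == lit_elem.text == "\u2062"

# The serializer picks a representation that suits the output charset.
utf8_out = ET.tostring(lit_elem, encoding="utf-8")      # literal UTF-8 bytes
ascii_out = ET.tostring(lit_elem, encoding="us-ascii")  # falls back to a reference
print(same, utf8_out, ascii_out)
```

Either input form gives the application the same character; which form
appears in the output depends on the encoding the serializer is asked to
produce.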
  -- fas
 F. Andy Seidl, Co-founder
MyST Technology Partners
http://myst-technology.com | http://blogsite.com
 
 

-----Original Message-----
From: Jakub Kahovec [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 28, 2005 3:09 PM
To: [EMAIL PROTECTED]
Subject: utf-8 characters problem

Hi,
when I parse an XML document (with Xerces 2.6.2) that specifies UTF-8
encoding in its XML declaration and contains UTF-8 characters in
character reference form (&#xxxx;), the parser replaces these
references with ASCII characters. For some characters that is fine,
but InvisibleTimes, for instance, turns into an incorrect, strange
character sequence.
I'd like to know whether it is possible to stop the parser from
converting characters out of character reference form. Or is there a
recommendation on how to handle these characters?

Here is a piece of my 'problematic' xml document

<?xml version="1.0" encoding="UTF-8"?>
<mathDoc>

<p>Factorise the following quadratic expression:
        <math>
          <mrow>
            <msup>
              <mrow>
                <mi>x</mi>
              </mrow>
              <mrow>
                <mn>2</mn>
              </mrow>
            </msup>
            <mo>&#x002b;</mo> <!-- replaced with the character + -->
            <mi>p</mi>
            <mo>&#x2062;</mo> <!-- here is InvisibleTimes -->
            <mi>x</mi>
            <mo>&#x002b;</mo> <!-- replaced with the character + -->
            <mi>q</mi>
          </mrow>
        </math>
</p>

</mathDoc>

Thanks so much

Jakub

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





