>> A parser is not ever allowed to change the numeric value of a
character.<<
Yes, agreed.  I really meant the serialized bytes representing the
character, not the character value itself.  (Never time to say it right...
always time to re-say it.  <g>)
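
To make the bytes-versus-value distinction concrete, here is a minimal sketch in plain Java (the class and method names are made up for illustration). It shows that U+2062 (InvisibleTimes) has one fixed numeric value but serializes to three bytes in UTF-8, and that decoding those bytes with the wrong charset yields three unrelated characters. That this is what produced Jakub's "strange" output is only a guess on my part.

```java
import java.nio.charset.StandardCharsets;

class InvisibleTimesBytes {
    // U+2062 INVISIBLE TIMES, the character written as &#x2062; in the document.
    static final String INVISIBLE_TIMES = "\u2062";

    // Hex dump of the UTF-8 serialization of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode the UTF-8 bytes with the wrong charset, as a misconfigured
    // viewer or editor might (hypothetical scenario).
    static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(INVISIBLE_TIMES));                  // E2 81 A2
        System.out.println(misreadAsLatin1(INVISIBLE_TIMES).length()); // 3
        System.out.println(utf8Hex("+"));                              // 2B
    }
}
```

The character value never changes; only the byte sequence chosen to represent it (and how a tool decodes that sequence) does.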

  -- fas
 F. Andy Seidl, Co-founder
MyST Technology Partners
http://myst-technology.com | http://blogsite.com
 

-----Original Message-----
From: Bob Foster [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 28, 2005 5:31 PM
To: [EMAIL PROTECTED]
Subject: Re: utf-8 characters problem

F. Andy Seidl wrote:
> Jakub,
> When a character is expressed as a numeric entity, the parser is not
> allowed to change the numeric value of the character. [snip]

A parser is not ever allowed to change the numeric value of a character.

> The best thing is to avoid using numeric character entities and just
> encode the character as a UTF-8 byte sequence (or the appropriate character
> sequence for the charset in effect).  That way, XML parsers and
> serializers are free to translate the character as appropriate for the
> charset in effect.

Conforming parsers deliver characters in Unicode. The result is the same 
whether a character is encoded as a UTF-8 sequence or a character reference.
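
A small sketch of that point, assuming a current JDK and the standard JAXP API (whichever parser the JDK supplies, not necessarily Xerces 2.6.2). The class name and the one-element test documents are made up for illustration. It parses the same element twice, once with the character reference and once with the literal character encoded as UTF-8, and compares what the parser delivers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CharRefVsLiteral {
    // Parse a UTF-8 encoded XML string and return the text of its root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = builder.parse(new ByteArrayInputStream(utf8));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Same character, written as a reference and as a literal.
        String viaReference = rootText(header + "<mo>&#x2062;</mo>");
        String viaLiteral = rootText(header + "<mo>\u2062</mo>");

        System.out.println(viaReference.equals(viaLiteral));                  // true
        System.out.println(Integer.toHexString(viaReference.codePointAt(0))); // 2062
    }
}
```

Either way the application sees the single character U+2062; the reference form does not survive parsing.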

Bob Foster

>   -- fas
>  F. Andy Seidl, Co-founder
> MyST Technology Partners
> http://myst-technology.com | http://blogsite.com
>  
>  
> 
> -----Original Message-----
> From: Jakub Kahovec [mailto:[EMAIL PROTECTED] 
> Sent: Monday, February 28, 2005 3:09 PM
> To: [EMAIL PROTECTED]
> Subject: utf-8 characters problem
> 
> Hi,
> when I parse an XML document (with Xerces 2.6.2) that specifies UTF-8
> encoding in its XML declaration and contains characters in character
> reference form (&#xxxx;), the parser replaces these references with
> ASCII characters. For some characters that is fine, but InvisibleTimes,
> for instance, turns into an incorrect, strange character sequence.
> Is it possible to prevent the parser from replacing characters given in
> character reference form? Or is there a recommendation for how to handle
> these characters?
> 
> Here is a piece of my 'problematic' xml document
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <mathDoc>
> 
> <p>Factorise the following quadratic expression:
>         <math>
>           <mrow>
>             <msup>
>               <mrow>
>                 <mi>x</mi>
>               </mrow>
>               <mrow>
>                 <mn>2</mn>
>               </mrow>
>             </msup>
>             <mo>&#x002b;</mo> <!-- replaced with the character + -->
>             <mi>p</mi>
>             <mo>&#x2062;</mo> <!-- here is InvisibleTimes -->
>             <mi>x</mi>
>             <mo>&#x002b;</mo> <!-- replaced with the character + -->
>             <mi>q</mi>
>           </mrow>
>         </math>
> </p>
> 
> </mathDoc>
> 
> Thanks so much
> 
> Jakub



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





