>> A parser is not ever allowed to change the numeric value of a character.<<

Yes, agreed. I really meant the serialized bytes representing the character, not the character value itself. (Never time to say it right... always time to re-say it. <g>)
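To make the bytes-versus-value distinction concrete, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree rather than Xerces, purely as an assumption for illustration, and the variable names are made up. It parses U+2062 (InvisibleTimes) once as a numeric character reference and once as literal UTF-8 bytes, then serializes the result in two encodings:

```python
import xml.etree.ElementTree as ET

# The same element content written two ways: as a numeric character
# reference, and as the literal character encoded in UTF-8.
# U+2062 is INVISIBLE TIMES.
as_reference = b'<?xml version="1.0" encoding="UTF-8"?><mo>&#x2062;</mo>'
as_utf8_bytes = '<?xml version="1.0" encoding="UTF-8"?><mo>\u2062</mo>'.encode("utf-8")

ref_text = ET.fromstring(as_reference).text
raw_text = ET.fromstring(as_utf8_bytes).text

# The character value reported by the parser is compared here;
# it does not depend on how the input spelled the character.
print(ref_text == raw_text, hex(ord(ref_text)))

# Only the serialized bytes depend on the output encoding: UTF-8 can
# carry the character directly, while US-ASCII needs a character reference.
elem = ET.fromstring(as_reference)
utf8_out = ET.tostring(elem, encoding="utf-8")
ascii_out = ET.tostring(elem, encoding="us-ascii")
print(utf8_out)
print(ascii_out)

# Reparsing either serialization gives back the same character value.
print(ET.fromstring(utf8_out).text == ET.fromstring(ascii_out).text)
```

If the rule discussed in this thread holds, both parses should report the same character, and only the output bytes should differ between the UTF-8 and US-ASCII serializations.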
-- fas

F. Andy Seidl, Co-founder
MyST Technology Partners
http://myst-technology.com | http://blogsite.com

-----Original Message-----
From: Bob Foster [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 5:31 PM
To: [EMAIL PROTECTED]
Subject: Re: utf-8 characters problem

F. Andy Seidl wrote:
> Jakub,
> When a character is expressed as a numeric entity, the parser is not allowed
> to change the numeric value of the character.
[snip]

A parser is not ever allowed to change the numeric value of a character.

> The best thing is to avoid using numeric character entities and just encode
> the character as a UTF-8 byte sequence (or the appropriate character
> sequence for the charset in effect). That way, XML parsers and serializers
> are free to translate the character as appropriate for the charset in
> effect.

Conforming parsers deliver characters in Unicode. The result is the same
whether a character is encoded as a UTF-8 sequence or a character reference.

Bob Foster

> -- fas
> F. Andy Seidl, Co-founder
> MyST Technology Partners
> http://myst-technology.com | http://blogsite.com
>
> -----Original Message-----
> From: Jakub Kahovec [mailto:[EMAIL PROTECTED]
> Sent: Monday, February 28, 2005 3:09 PM
> To: [EMAIL PROTECTED]
> Subject: utf-8 characters problem
>
> Hi,
> when I parse an XML document (with Xerces 2.6.2) that specifies UTF-8
> encoding in its XML declaration and contains UTF-8 characters in
> character reference form &#xxxx;, the parser replaces these characters
> with ASCII characters. For some characters that is OK, but InvisibleTimes,
> for instance, is changed to some incorrect, strange character sequence.
> I'd like to know whether it is possible to prevent characters in
> character reference form from being changed. Or is there some
> recommendation for how to treat these characters?
>
> Here is a piece of my 'problematic' XML document:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <mathDoc>
>
> <p>Factorise the following quadratic expression:
> <math>
>   <mrow>
>     <msup>
>       <mrow>
>         <mi>x</mi>
>       </mrow>
>       <mrow>
>         <mn>2</mn>
>       </mrow>
>     </msup>
>     <mo>+</mo> <!-- replaces with character + -->
>     <mi>p</mi>
>     <mo>⁢</mo> <!-- here is InvisibleTimes -->
>     <mi>x</mi>
>     <mo>+</mo> <!-- replaces with character + -->
>     <mi>q</mi>
>   </mrow>
> </math>
> </p>
>
> </mathDoc>
>
> Thanks so much
>
> Jakub

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]