Alexander ten Bruggencate wrote:
> 
> when I save an xml file (utf-8 encoding) in XXE V2p2, I get different
> results from win98 and linux:
> 
> saved with xxe running on windows 98:
> <superscript>?,??</superscript>
> 
> saved with xxe running on linux:
> <superscript>??</superscript>
> 
> can anyone explain this and tell me how to work around this?

What follows is a scenario which could explain such behavior:

Let's call the XML file bad.xml. This XML file contains UTF-8 *bytes*
such as ??. I'll make no assumption about its XML declaration (i.e. <?xml
version="1.0" encoding="???" ?>).

XXE loads bad.xml under Windows 98. For an unknown reason, it thinks
that its encoding is Windows-1252. This means that, for XXE, ?? are
*two* Unicode characters: character ? followed by character ?. When
these 2 ``characters'' are saved back as UTF-8, this gives 4 UTF-8
bytes, which are ?,??.

XXE loads bad.xml under Linux. On Linux, it guesses the encoding
correctly: UTF-8. This means that, for XXE, ?? is a *single* Unicode
character. When this single character is saved back as UTF-8, this
gives the same 2 UTF-8 bytes: ??.
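
To make the arithmetic concrete, here is a minimal sketch of both round
trips in Java. It assumes the two UTF-8 bytes are 0xC2 0xB2 (the
encoding of U+00B2, SUPERSCRIPT TWO); the actual bytes in bad.xml may
be different.

  import java.nio.charset.Charset;
  import java.nio.charset.StandardCharsets;

  public class RoundTrip {
      public static void main(String[] args) {
          // Hypothetical pair of UTF-8 bytes (the encoding of U+00B2);
          // the real bytes in bad.xml are unknown.
          byte[] utf8Bytes = { (byte) 0xC2, (byte) 0xB2 };

          // Linux scenario: decoded as UTF-8, the 2 bytes form 1 character,
          // and re-encoding gives back the same 2 bytes.
          String good = new String(utf8Bytes, StandardCharsets.UTF_8);
          System.out.println(good.length());                                // 1
          System.out.println(good.getBytes(StandardCharsets.UTF_8).length); // 2

          // Windows 98 scenario: mis-decoded as Windows-1252, each byte
          // becomes its own character, and re-encoding as UTF-8 needs
          // 2 bytes per character, i.e. 4 bytes total.
          String bad = new String(utf8Bytes, Charset.forName("windows-1252"));
          System.out.println(bad.length());                                 // 2
          System.out.println(bad.getBytes(StandardCharsets.UTF_8).length);  // 4
      }
  }

Run as is, it prints 1, 2, 2 and 4, which matches the byte counts
described above.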

Now, I cannot fix a problem that I cannot reproduce.

I cannot reproduce this behavior between Linux and Windows 2000. We
don't have Windows 98 here because XXE is not supported on the Windows
9x family, only on the NT family.
