Alexander ten Bruggencate wrote:
> When I save an XML file (UTF-8 encoding) in XXE V2p2, I get different
> results from Win98 and Linux:
>
> Saved with XXE running on Windows 98:
> <superscript>?,??</superscript>
>
> Saved with XXE running on Linux:
> <superscript>??</superscript>
>
> Can anyone explain this and tell me how to work around it?
What follows is a scenario which could explain such behavior.

Let's call the XML file bad.xml. This XML file contains UTF-8 *bytes* such as ??. I'll make no supposition about its XML declaration (i.e. <?xml version="1.0" encoding="???" ?>).

XXE loads bad.xml under Windows 98. For an unknown reason, it thinks that its encoding is Windows-1252. This means that, for XXE, ?? are *two* Unicode characters: character ? followed by character ?. When these two "characters" are saved back to UTF-8, this gives four UTF-8 bytes, which are ?,??.

XXE loads bad.xml under Linux. On Linux, it guesses the encoding right: UTF-8. This means that, for XXE, ?? is a single Unicode character. When this single character is saved back to UTF-8, this gives the same two UTF-8 bytes: ??.

Now, I cannot fix problems that I cannot reproduce, and I cannot reproduce such behavior between Linux and Windows 2000. We don't have Windows 98 because XXE is not supported on the Windows 9x family, only on the NT family.
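To make the scenario concrete, here is a minimal sketch in Java of the double-encoding round trip described above. The character U+00E9 (e with acute accent) is only a hypothetical stand-in; the actual characters from the original message are unknown.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // Hypothetical character: U+00E9 encodes to two UTF-8 bytes, 0xC3 0xA9.
        String original = "\u00E9";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("Original UTF-8 byte count: " + utf8Bytes.length); // 2

        // Windows 98 scenario: the two UTF-8 bytes are wrongly decoded as
        // Windows-1252, yielding *two* characters instead of one.
        String misread = new String(utf8Bytes, Charset.forName("windows-1252"));
        System.out.println("Misread as Windows-1252: " + misread); // 2 characters

        // Re-saving those two characters in UTF-8 now produces four bytes:
        // the document has been double-encoded.
        byte[] doubled = misread.getBytes(StandardCharsets.UTF_8);
        System.out.println("Re-saved UTF-8 byte count: " + doubled.length); // 4

        // Linux scenario: the bytes are correctly decoded as UTF-8, giving back
        // the single original character; re-saving yields the same two bytes.
        String correct = new String(utf8Bytes, StandardCharsets.UTF_8);
        byte[] unchanged = correct.getBytes(StandardCharsets.UTF_8);
        System.out.println("Round-tripped UTF-8 byte count: " + unchanged.length); // 2
    }
}

Each misdetected load-and-save cycle doubles the damage, which is why the Windows 98 output is twice as long as the Linux output.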

