Dirk,

> This is interesting to read. I was somewhat new to XML (and I'm still
> not an expert) when I researched this. I expected the behavior that you
> state above. I made a XML file with "encoding=windows-1252" and entered
> a few of the problematic bytes/characters (in the windows-1252
> codepage). I expected the file to be valid XML, since in the encoding I
> used all bytes are allowed and defined. To verify I opened the file in
> XMLSpy and the tool complained about invalid characters. Regardless
> whether I used the direct character or the XML byte encoding. Therefore
> I concluded to interpret it as "problematic bytes" and not as
> "problematic codepoints".
I'm afraid there is no such thing as an "XML byte encoding". If you are referring to the numerical character reference syntax (&#1234;) then be aware that this syntax refers to codepoints, not bytes in any particular encoding. So this indeed includes discouraged characters, regardless of the encoding:

    <?xml version="1.0" encoding="anything" ?>
    <data>€SV</data>

This might explain part of your unexpected results, if you indeed worked under this misunderstanding.

When using the "direct character", it is difficult to say whether you could have made a mistake in the experiment without knowing the details of your procedure. Here is one that works for me:

1. Open Notepad.
2. Paste the following XML:

       <?xml version="1.0" encoding="windows-1252"?>
       <data>€</data>

3. File/Save as "euro.xml", making sure the "ANSI" encoding is selected in Notepad's save dialog (I'm assuming you're running under the 1252 codepage).
4. Open "euro.xml" in XML Spy. XML Spy does not complain and shows the euro character.
5. Open "euro.xml" in Notepad again. Delete the euro sign, and in its place type ALT+0129. This inserts a small square: 0x81 is an invalid character in windows-1252. Save again.
6. Open the file in XML Spy. It now says that this byte is invalid, correctly.

Note however that MSXML will work just fine with such broken XML. I don't know what the Perl parsers do...

Cheers,
--Jonathan
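For anyone who wants to check the byte versus codepoint distinction without Notepad or XML Spy, here is a minimal sketch in Python. Python is not one of the tools discussed in this thread, so treat it as an assumption. The helper names (cp1252_char, parse_data_text) and the two sample documents are made up for illustration, and the parser is the standard library's ElementTree, which may not behave the same way as XML Spy, MSXML or the Perl parsers.

```python
import xml.etree.ElementTree as ET


def cp1252_char(byte_value):
    """Return the character for one windows-1252 byte, or None if the byte is undefined."""
    try:
        return bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None


def parse_data_text(document):
    """Parse a small XML document and return the text of its root element."""
    return ET.fromstring(document).text


# Hypothetical sample documents: one uses a numeric character reference,
# the other contains the raw byte 0x80 in the declared encoding.
ref_doc = b'<?xml version="1.0" encoding="windows-1252"?><data>&#128;</data>'
raw_doc = b'<?xml version="1.0" encoding="windows-1252"?><data>\x80</data>'

if __name__ == "__main__":
    # Byte 0x80 is the euro sign in windows-1252; byte 0x81 is not assigned.
    print(repr(cp1252_char(0x80)))
    print(repr(cp1252_char(0x81)))
    # A character reference names a codepoint, whatever the declared encoding.
    print(hex(ord(parse_data_text(ref_doc))))
    # A raw byte is decoded through the declared encoding first.
    print(hex(ord(parse_data_text(raw_doc))))
```

The character reference yields codepoint U+0080 even though the document declares windows-1252, while the raw byte 0x80 is decoded through the declared encoding and becomes U+20AC, the euro sign.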