Hi Jonathan,
> Please note that XML discourages or forbids some Unicode codepoints, not bytes in specific codepages. Specifically, windows-1252 does not map any byte to a codepoint in the range [0x80-0x9F] (see http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx). For example, 0x80 in windows-1252 maps to Unicode U+20AC (Euro sign).
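The mapping described above can be checked with a short script. This sketch uses Python's built-in cp1252 codec (Python is just a convenient choice; nothing in this thread depends on it). It decodes each byte from 0x80 to 0x9F and records the resulting code point, or None where windows-1252 leaves the byte undefined:

```python
# For each byte 0x80-0x9F, record the Unicode code point that Python's
# cp1252 codec decodes it to, or None where windows-1252 leaves it undefined.
def cp1252_high_bytes():
    mapping = {}
    for b in range(0x80, 0xA0):
        try:
            mapping[b] = ord(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            mapping[b] = None  # byte has no mapping in windows-1252
    return mapping

mapping = cp1252_high_bytes()

# 0x80 decodes to U+20AC (EURO SIGN), as stated above.
print(f"U+{mapping[0x80]:04X}")  # U+20AC

# No defined byte decodes into the C1 control range U+0080..U+009F.
print(any(cp is not None and 0x80 <= cp <= 0x9F for cp in mapping.values()))  # False

# The bytes windows-1252 leaves undefined.
print([f"0x{b:02X}" for b in sorted(mapping) if mapping[b] is None])
# ['0x81', '0x8D', '0x8F', '0x90', '0x9D']
```

A decoded byte is therefore either undefined or lands outside U+0080..U+009F. In a correctly declared windows-1252 file, a defined byte in that range should never turn into one of the discouraged C1 code points.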
This is interesting to read. I was fairly new to XML when I researched this (and I'm still not an expert), and I expected the behavior you describe above. I made an XML file with encoding="windows-1252" and entered a few of the problematic bytes/characters of the windows-1252 codepage. Since every byte I used is allowed and defined in that encoding, I expected the file to be valid XML. To verify, I opened the file in XMLSpy, and the tool complained about invalid characters, regardless of whether I entered the character directly or used the XML character reference for the byte. I therefore concluded that the restriction applies to "problematic bytes" rather than "problematic codepoints".
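One possible factor, offered as a guess rather than a diagnosis of that XMLSpy run: an XML numeric character reference always names a Unicode code point, whatever encoding the document declares. So &#128; means U+0080, a C1 control character that XML 1.0 discourages, and not the euro sign that byte 0x80 stands for in windows-1252. The sketch below shows the difference with Python's xml.etree.ElementTree. The element names `raw` and `ref` are made up for the illustration:

```python
import xml.etree.ElementTree as ET

# A windows-1252 document containing byte 0x80 written two ways:
# as the raw byte, and as the numeric character reference &#128;.
doc = (b'<?xml version="1.0" encoding="windows-1252"?>'
       b'<r><raw>\x80</raw><ref>&#128;</ref></r>')

root = ET.fromstring(doc)
raw = root.find("raw").text
ref = root.find("ref").text

# The raw byte is decoded through windows-1252: U+20AC (EURO SIGN).
print(f"U+{ord(raw):04X}")  # U+20AC

# The character reference ignores the declared encoding and names the
# Unicode code point directly: U+0080, a C1 control character.
print(f"U+{ord(ref):04X}")  # U+0080
```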
Perhaps I did something else completely wrong at that time. Thanks for the information and the clarification.

Dirk

_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user