Hi Jonathan,

> Please note that XML discourages or forbids some Unicode codepoints,
> not bytes in specific codepages. Specifically, windows-1252 does not
> map any byte to a codepoint in the range [0x80-0x9F].
> (see http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx)
> For example, 0x80 in windows-1252 maps to Unicode 0x20AC (Euro sign).

This is interesting to read. I was fairly new to XML when I researched this (and I'm still not an expert), and I expected exactly the behavior you describe above. I created an XML file with "encoding=windows-1252" and entered a few of the problematic bytes/characters (in the windows-1252 codepage). I expected the file to be valid XML, since in the encoding I used all bytes are allowed and defined. To verify, I opened the file in XMLSpy, and the tool complained about invalid characters, regardless of whether I entered the character directly or used the XML byte encoding. I therefore concluded that the restriction applied to "problematic bytes" and not to "problematic codepoints".
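
For what it's worth, the byte/codepoint distinction can be sketched in Python roughly as follows. This is only an illustration: it assumes Python's built-in cp1252 codec is a fair stand-in for windows-1252 and uses the Char production and the "discouraged" control ranges from the XML 1.0 specification. The helper names are made up for this example and have nothing to do with XMLSpy or vss2svn.

```python
from typing import Optional


def decode_cp1252_byte(byte_value: int) -> Optional[int]:
    """Return the Unicode codepoint for one windows-1252 byte, or None if
    Python's cp1252 codec leaves that byte undefined."""
    try:
        return ord(bytes([byte_value]).decode("cp1252"))
    except UnicodeDecodeError:
        return None


def is_xml10_char(codepoint: int) -> bool:
    """True if the codepoint matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


def is_discouraged_control(codepoint: int) -> bool:
    """True for the control ranges XML 1.0 allows but discourages
    (0x7F-0x84 and 0x86-0x9F; other discouraged ranges are omitted here)."""
    return 0x7F <= codepoint <= 0x84 or 0x86 <= codepoint <= 0x9F


if __name__ == "__main__":
    # Walk the byte range under discussion and show what XML actually sees.
    for byte_value in range(0x80, 0xA0):
        codepoint = decode_cp1252_byte(byte_value)
        if codepoint is None:
            print(f"byte 0x{byte_value:02X}: undefined in cp1252")
        else:
            print(
                f"byte 0x{byte_value:02X} -> U+{codepoint:04X}, "
                f"XML 1.0 Char: {is_xml10_char(codepoint)}, "
                f"discouraged: {is_discouraged_control(codepoint)}"
            )
```

If this sketch is right, a literal byte 0x80 in a windows-1252 file is the Euro sign and should be acceptable, whereas a numeric reference such as &#x80; names the codepoint U+0080 itself, which XML 1.0 permits but discourages. Python's codec also leaves a few bytes in that range undefined, which may or may not match what other tools do.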

Perhaps I did something else completely wrong at the time.

Thanks for the info and the clarification.

Dirk

_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user
