Dirk,
> This is interesting to read. I was somewhat new to XML (and I'm still
> not an expert) when I researched this. I expected the behavior that you
> state above. I made a XML file with "encoding=windows-1252" and entered
> a few of the problematic bytes/characters (in the windows-1252
> codepage). I expected the file to be valid XML, since in the encoding I
> used all bytes are allowed and defined. To verify I opened the file in
> XMLSpy and the tool complained about invalid characters. Regardless
> whether I used the direct character or the XML byte encoding. Therefore
> I concluded to interpret it as "problematic bytes" and not as
> "problematic codepoints".

I'm afraid there is no such thing as an "XML byte encoding". If you are
referring to the numeric character reference syntax (&#1234;) then
be aware that this syntax refers to codepoints, not to bytes in any
particular encoding. So it does produce discouraged characters,
regardless of the declared encoding:

<?xml version="1.0" encoding="anything" ?>
<data>&#x80;&#x83;&#x86;</data>

This might explain part of your unexpected results, if you indeed
worked under this misunderstanding.
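
To illustrate the point about codepoints, here is a minimal Python sketch (not part of the original experiment). It assumes the standard library's xml.etree.ElementTree parser, and the helper name parse_reference is made up for this example. It parses the same character reference under three different declared encodings and prints the resulting codepoint:

```python
import xml.etree.ElementTree as ET


def parse_reference(encoding):
    """Parse <data>&#x80;</data> with the given declared encoding.

    Hypothetical helper for illustration only. The document itself
    contains nothing but ASCII bytes, so only the declaration differs.
    """
    doc = '<?xml version="1.0" encoding="{}"?><data>&#x80;</data>'.format(encoding)
    return ET.fromstring(doc.encode("ascii")).text


for enc in ("utf-8", "iso-8859-1", "windows-1252"):
    text = parse_reference(enc)
    # The reference names codepoint U+0080 in every case; it is never
    # reinterpreted as byte 0x80 of the declared encoding.
    print(enc, [hex(ord(ch)) for ch in text])
```

In particular, under windows-1252 the reference does not turn into the euro sign, even though byte 0x80 in that codepage is the euro sign.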

As for the "direct character" case, without knowing the details of
your procedure it is difficult to say whether something went wrong
in the experiment. Here is a procedure that works for me:

1. Open Notepad
2. Paste the following XML :

<?xml version="1.0" encoding="windows-1252"?>
<data>€</data>

3. File/Save as "euro.xml", making sure the "ANSI" encoding is
selected in Notepad's save dialog (I'm assuming you're running
under the 1252 codepage).

4. Open "euro.xml" in XML Spy. XML Spy does not complain and
shows the euro character.

5. Open "euro.xml" in Notepad again. Delete the euro sign,
and in its place type ALT+0129. This inserts a small square:
byte 0x81 is not assigned to any character in windows-1252.
Save again.

6. Open the file in XML Spy. It now reports, correctly, that
this byte is invalid. Note however that MSXML will work
just fine with such broken XML. I don't know what the Perl
parsers do...

Cheers,
--Jonathan 
