Kenneth Porter wrote:
--On Thursday, August 17, 2006 3:03 AM +0200 Dirk <[EMAIL PROTECTED]> wrote:

Yes, this should also work, except for the discouraged characters. So we
might transfer junk that must be filtered on the vss2svn side before it is
passed to the XML reader.

This is what the "isprint()" call in the TinyXML formatter did, before I bypassed it. It bases its decision on the locale, and I couldn't figure out how to easily install and select the 1252 locale on Linux.

Not really. isprint checks whether a character is printable, not whether it is a discouraged character. Technically it would not make sense to include non-printable characters in the output, so isprint would be OK. But I don't know how isprint treats characters that are not defined in a specific codepage. They are probably unprintable, so this shouldn't be a problem either, but we still had problems reading the XML file.

Do you have access to a Windows system? If so, I will send you a version of ssphys with "Windows ANSI to UTF-8" encoding later this evening.
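As a rough illustration of the conversion being discussed, the following Python sketch decodes bytes as Windows codepage 1252 and re-encodes them as UTF-8. The function name `cp1252_to_utf8` is hypothetical and this is not how ssphys does it. It assumes the input really is codepage 1252, and it replaces the few byte values that codepage 1252 leaves undefined (such as 0x81) with U+FFFD instead of failing.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode codepage 1252 bytes as UTF-8.

    Bytes that codepage 1252 leaves undefined (e.g. 0x81) are replaced
    by U+FFFD REPLACEMENT CHARACTER rather than raising an error.
    """
    text = raw.decode("cp1252", errors="replace")
    return text.encode("utf-8")


# 0x80 is the euro sign in codepage 1252; in UTF-8 it takes three bytes.
euro_utf8 = cp1252_to_utf8(b"\x80")

# 0xE4 is "a with diaeresis"; plain ASCII bytes pass through unchanged.
umlaut_utf8 = cp1252_to_utf8(b"K\xe4se")

# 0x81 is undefined in codepage 1252 and becomes U+FFFD.
undefined_utf8 = cp1252_to_utf8(b"\x81")
```

Note that this only changes the encoding. Control characters below 32 pass through unchanged, so they would still have to be filtered separately.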

And that still wouldn't protect you from codes below 32, as those are hex-encoded (i.e. "&#x15;") before the isprint() call.

My experience was that even when I hex-encoded the discouraged characters, a few XML readers still complained about bad input.
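The behaviour described here can be reproduced with a small Python sketch. It uses Python's standard-library xml.etree.ElementTree parser, not the Perl parser used by vss2svn, so it only shows that one conforming parser rejects such input. The helpers `hex_escape_controls` and `parses` are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET


def hex_escape_controls(text: str) -> str:
    """Hypothetical helper: write code points below 32 as hex character references."""
    return "".join(f"&#x{ord(ch):X};" if ord(ch) < 32 else ch for ch in text)


def parses(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# 0x15 is a control character that XML 1.0 does not allow, even as "&#x15;".
doc_bad = "<comment>" + hex_escape_controls("a\x15b") + "</comment>"

# A tab (0x9) is one of the allowed sub-32 code points, so "&#x9;" is fine.
doc_ok = "<comment>" + hex_escape_controls("a\tb") + "</comment>"

bad_accepted = parses(doc_bad)
ok_accepted = parses(doc_ok)
```

With this parser, `ok_accepted` is True and `bad_accepted` is False, so hex-encoding alone does not make the control character acceptable.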

Just another question: do you know whether it is possible to specify the
ThreadCodepage? All the functions I can find specify the locale, and with it
the codepage. But there is no function to specify the codepage itself,
like SetACP or SetThreadACP.

Unknown. I'm still relatively green to I18N and know essentially nothing about the Windows support for it.

I can't claim to be an old fox in this area ;-) either. This problem drives me crazy, too. It also seems that everybody has the same problem and that everybody is reinventing the wheel over and over again.

What I'd really like to know is why the Perl XML parser considers a sub-32 codepoint invalid when hex-encoded. Or if that is indeed the issue being reported in the LOADVSSNAMES thread. I don't want to solve a problem that isn't there.

Yup. Good question.

According to http://www.w3.org/TR/REC-xml/#charsets, not all characters are valid in XML. Of the sub-32 code points, only "#x9 | #xA | #xD" are allowed.

That is my experience, too. You can hex-encode the values or do whatever you want. The character is still not allowed in an XML stream.
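One way to do the filtering on the reading side is to drop every character that falls outside the Char production of the XML 1.0 specification before the text is written as XML. The Python sketch below is only an illustration under that assumption. The names `is_xml10_char` and `strip_invalid_xml_chars` are hypothetical, and it does not treat the "discouraged" ranges specially, since those are still legal XML characters.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Hypothetical helper: replace characters XML 1.0 forbids (default: drop them)."""
    return "".join(ch if is_xml10_char(ch) else replacement for ch in text)


# The control character 0x15 is removed; the tab is kept.
cleaned = strip_invalid_xml_chars("a\x15b\tc")

# Alternatively, mark the removed character so the loss stays visible.
marked = strip_invalid_xml_chars("a\x15b\tc", replacement="?")
```

Dropping the characters loses information silently, so passing a visible replacement may be the better choice when the original comment text matters.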

Dirk
_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user
