Re: Unicode again

2006-08-17 Thread Kenneth Porter
--On Thursday, August 17, 2006 12:42 PM +0200 Dirk <[EMAIL PROTECTED]> wrote: If you read here http://www.w3.org/TR/REC-xml/#charsets not all characters are valid in XML. Only "#x9 | #xA | #xD" for the sub-32 codepoints. That is my experience also. You can hex-encode the values or do whatever y
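The sub-32 rule quoted above can be sketched as a small filter. This is only an illustration, not code from vss2svn: the helper names `is_allowed_in_xml` and `strip_invalid_xml_controls` are made up here, and the sketch assumes the input is already UTF-8, where bytes below 0x20 only ever occur as the control characters themselves. It does not check the other exclusions in the XML 1.0 Char production (surrogates, U+FFFE, U+FFFF).

```cpp
#include <cassert>
#include <string>

// True if the byte passes the sub-32 rule of XML 1.0: below 0x20 only
// TAB (0x9), LF (0xA) and CR (0xD) are allowed. Bytes >= 0x20 are not
// examined any further in this sketch.
inline bool is_allowed_in_xml(unsigned char byte) {
    if (byte >= 0x20) return true;
    return byte == 0x09 || byte == 0x0A || byte == 0x0D;
}

// Hypothetical helper: returns a copy of the UTF-8 text with the
// forbidden control bytes dropped.
std::string strip_invalid_xml_controls(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (unsigned char byte : utf8) {
        if (is_allowed_in_xml(byte)) out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

Dropping the bytes is the simplest policy; hex-encoding them instead, as suggested in the message, would keep the information at the cost of a private escaping convention.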

Re: Unicode again

2006-08-17 Thread Dirk
Kenneth Porter wrote: --On Thursday, August 17, 2006 3:03 AM +0200 Dirk <[EMAIL PROTECTED]> wrote: Yes. This should work also, except for the discouraged characters. So it is possible that we transfer junk, which must be filtered on the vss2svn side before passing it to the XML reader.

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Thursday, August 17, 2006 3:03 AM +0200 Dirk <[EMAIL PROTECTED]> wrote: Yes. This should work also, except for the discouraged characters. So it is possible that we transfer junk, which must be filtered on the vss2svn side before passing it to the XML reader. This is what the "ispri

Re: Unicode again

2006-08-16 Thread Dirk
Does Cyrillic, or any other codepage, use the low 32 code points (i.e. control characters) for language characters? As far as I can see, the error in the LOADVSSNAMES thread was due to that, not due to an incorrect choice of codepage. I don't know, and you are right, the conversion does not

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Thursday, August 17, 2006 1:29 AM +0200 Dirk <[EMAIL PROTECTED]> wrote: I have Cyrillic text in codepage 855. If I set the codepage to 855 before calling this function, my text will be converted into UTF-8. I have loaded the resulting XML file in a UTF-8-aware editor and the results looke

Re: Unicode again

2006-08-16 Thread Dirk
I just found a comment that Windows UNICODE is UCS-2. What do you think about the following specific code for Windows to convert from the decoded ANSI input to UTF-8: // Convert file ANSI to Windows UNICODE (AKA UCS-2) MultiByteToWideChar(CP_ACP,0,); // now convert from Windows UN
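The snippet above is cut off, so the second call is not shown. On Windows, the wide-to-UTF-8 step is normally done with WideCharToMultiByte and CP_UTF8; that this is what the message went on to propose is an assumption. As a rough illustration of what that step does, here is a portable sketch; the helpers `append_utf8` and `utf16_to_utf8` are hypothetical names, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For text that came from a single-byte codepage such as 855, every character lands in the Basic Multilingual Plane, so the surrogate branch would never run; it is included only so the sketch is correct for UTF-16 in general.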

Re: Unicode again

2006-08-16 Thread Dirk
While I'm searching for more information, have you got an idea about the encoding of what Windows thinks is UNICODE. Is it UTF16 or UCS2? Terminology issues here: Unicode is a character set. UTF16 and UCS2 are encodings of that character set. Yep, you are right, I was a little unclear here
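The practical difference between the two encodings can be shown in a few lines. This is a sketch with made-up helper names (`fits_in_ucs2`, `encode_utf16`): UCS-2 stores every character in exactly one 16-bit unit, so it stops at U+FFFF, while UTF-16 uses a surrogate pair for anything above that.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// True if the code point fits in a single 16-bit unit, which is all
// that UCS-2 can represent. The surrogate range is not a character.
bool fits_in_ucs2(std::uint32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encodes one code point as UTF-16. The caller is assumed to pass a
// valid code point (at most 0x10FFFF and not a surrogate).
std::u16string encode_utf16(std::uint32_t cp) {
    std::u16string out;
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        std::uint32_t v = cp - 0x10000;  // 20 bits, split 10 + 10
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For characters inside the Basic Multilingual Plane, which includes Cyrillic, both encodings produce the same single unit.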

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Thursday, August 17, 2006 12:35 AM +0200 Dirk <[EMAIL PROTECTED]> wrote: I just found a comment that Windows UNICODE is UCS-2. What do you think about the following specific code for Windows to convert from the decoded ANSI input to UTF-8: // Convert file ANSI to Windows UNICODE (AKA UC

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Thursday, August 17, 2006 12:20 AM +0200 Dirk <[EMAIL PROTECTED]> wrote: While I'm searching for more information, have you got an idea about the encoding of what Windows thinks is UNICODE. Is it UTF16 or UCS2? Terminology issues here: Unicode is a character set. UTF16 and UCS2 are encod

Re: Unicode again

2006-08-16 Thread Dirk
Hi, I just found a comment that Windows UNICODE is UCS-2. What do you think about the following specific code for Windows to convert from the decoded ANSI input to UTF-8: // Convert file ANSI to Windows UNICODE (AKA UCS-2) MultiByteToWideChar(CP_ACP,0,); // now convert from Windows U

Re: Unicode again

2006-08-16 Thread Dirk
And also look here for TinyXML's support for UTF-8: http://www.grinninglizard.com/tinyxmldocs/index.html We're using TinyXML just for writing, so it doesn't need recognition code. The converter may run on a platform other than that used to create the VSS DB, so the VSS locale may not be avail

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Wednesday, August 16, 2006 11:38 PM +0200 Dirk <[EMAIL PROTECTED]> wrote: And also look here for TinyXML's support for UTF-8: http://www.grinninglizard.com/tinyxmldocs/index.html We're using TinyXML just for writing, so it doesn't need recognition code. The converter may run on a platfor

Re: Unicode again

2006-08-16 Thread Dirk
1.) It seems that stream I/O, Unicode/UTF-8, the console, and Microsoft do not align very well. 2.) There are ways to solve the problem ;-) without linking to iconv or other conversion libraries. I note the fact that UTF-8 works with files even though it doesn't work with the console. So how abo

Re: Unicode again

2006-08-16 Thread Kenneth Porter
--On Wednesday, August 16, 2006 10:37 PM +0200 Dirk <[EMAIL PROTECTED]> wrote: 1.) It seems that stream I/O, Unicode/UTF-8, the console, and Microsoft do not align very well. 2.) There are ways to solve the problem ;-) without linking to iconv or other conversion libraries. I note the fact that