Re: [Mono-list] unicode trouble

2004-02-09 Thread gabor
On Sun, Feb 08, 2004 at 10:37:45PM -0800, Chris Mullins wrote:
> .NET has the ability to:
> 1) Iterate over strings by graphemes so that regardless of encoding, developers can treat Unicode combining characters and surrogate pairs as a single entity.
> 2) Build and manipulate strings that
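
A minimal sketch of the grapheme-level iteration the quoted message describes, assuming it refers to System.Globalization.StringInfo (the quoted text names no API, so this is a guess). The class and method names below are made up for illustration, and the counts in the comments are what Microsoft's runtime is expected to print; other runtimes may differ.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, not something from the thread.
static class TextElementSketch
{
    // Count text elements (surrogate pairs and base+combining sequences
    // each count once) instead of counting 16-bit chars.
    public static int CountTextElements(string s)
    {
        int n = 0;
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
            n++;
        return n;
    }

    static void Main()
    {
        // 'a', U+10300 as a surrogate pair, then 'e' + COMBINING ACUTE ACCENT
        string s = "a\uD800\uDF00e\u0301";
        Console.WriteLine(s.Length);              // 5 (UTF-16 code units)
        Console.WriteLine(CountTextElements(s));  // 3 (text elements)
    }
}
```

The point is only that String.Length counts 16-bit code units, while the enumerator groups them into units a user would think of as characters.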

Re: [Mono-list] unicode trouble

2004-02-09 Thread Jonathan Pryor
On Mon, 2004-02-09 at 02:22, gabor wrote:
<snip/>
> i just can't understand why the designers of dotnet didn't look at the unicode standards. i can understand that java has this problem, but java is much older than dotnet. maybe it's because winapi uses 16-bit characters?
I imagine it's due to

Re: [Mono-list] unicode trouble

2004-02-09 Thread max
> No, Gabor is not confused. Unicode has grown. It is now 20 bits, not 16. See for example http://www.terena.nl/library/multiling/unicode/utf16.html (which I just found by googling; it looks a bit out-of-date).
I had absolutely no clue about this ;) I've been using unicode for years and I

Re: [Mono-list] unicode trouble

2004-02-09 Thread Marcus
As I recall, when the CM3 Modula-3 compiler added support for unicode, they used a hybrid scheme where TEXTs (their equivalent of System.String) can contain both 8-bit and 16-bit chars. So only the portions of the string that require more than 8 bits use it. Something similar could be done with

Re: [Mono-list] unicode trouble

2004-02-09 Thread Jonathan Pryor
On Mon, 2004-02-09 at 19:21, Marcus wrote:
> As I recall, when the CM3 Modula-3 compiler added support for unicode, they used a hybrid scheme where TEXTs (their equivalent of System.String) can contain both 8-bit and 16-bit chars. So only the portions of the string that require more than 8

[Mono-list] unicode trouble

2004-02-08 Thread gabor
hi, as i understand, characters in .net are 16-bit values. but what about unicode characters, that are simply above the 16-bit limit? for example: OLD ITALIC LETTER A (unicode code: 10300). how do you represent those in .net? i tried to open a textfile containing this old-italic-a: - the

Re: [Mono-list] unicode trouble

2004-02-08 Thread max
Hi Gabor, I think you're confused. Characters in .NET are 16 bits BECAUSE they are unicode. 16 bits = 2 bytes = 65536 values. a way to check that is simple. here's some C# example code:
string s = "a";
s += (char)10300;
Console.WriteLine("s = " + s);
Console.WriteLine("len =
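
For reference, a small sketch of how a code point above U+FFFF maps onto two 16-bit chars in UTF-16. It assumes gabor's "10300" is hexadecimal (U+10300, OLD ITALIC LETTER A); the class and method names are invented for illustration and are not taken from any message in the thread. Note that `(char)10300` in the snippet above is a decimal literal, so it yields U+283C rather than U+10300.

```csharp
using System;

// Hypothetical helper class, not something from the thread.
static class SurrogateSketch
{
    // Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
    public static char[] ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");
        int offset = codePoint - 0x10000;               // fits in 20 bits
        char high = (char)(0xD800 + (offset >> 10));    // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    static void Main()
    {
        string s = new string(ToSurrogates(0x10300));
        Console.WriteLine("len = " + s.Length);                    // len = 2
        Console.WriteLine("{0:X4} {1:X4}", (int)s[0], (int)s[1]);  // D800 DF00
        // Decimal 10300 is 0x283C, a different character entirely.
        Console.WriteLine("{0:X4}", (int)(char)10300);             // 283C
    }
}
```

So a single .NET char cannot hold U+10300, but a string can, as two chars with Length 2.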

Re: [Mono-list] unicode trouble

2004-02-08 Thread Fergus Henderson
On 08-Feb-2004, max [EMAIL PROTECTED] wrote:
> Hi Gabor, I think you're confused. Characters in .NET are 16 bits BECAUSE they are unicode. 16 bits = 2 bytes = 65536 values.
No, Gabor is not confused. Unicode has grown. It is now 20 bits, not 16. See for example

RE: [Mono-list] unicode trouble

2004-02-08 Thread Fabio Montoya
represent those in .net? Cheers! Fabio Montoya
| -Original Message-
| From: [EMAIL PROTECTED] On Behalf Of max
| Sent: Sunday, February 08, 2004 10:04 PM
| To: gabor; [EMAIL PROTECTED]
| Subject: Re: [Mono-list] unicode trouble
|
| Hi Gabor,
| I think

RE: [Mono-list] unicode trouble

2004-02-08 Thread Fabio Montoya
| Subject: RE: [Mono-list] unicode trouble
|
| Gabor is right Max! The Unicode standard defines characters in a 32 bit space, The Unicode Character Space in 32 bits or UCS-32.
|
| For practical reasons, the Unicode standard defines transformation formats, i.e.:
|
| UTF