RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Marco Cimarosti
Doug Ewell wrote: A *script* like Latin or Cyrillic typically has many more characters than any one language will ever use. An *alphabet* is, by definition, language-specific. Hmmm... We probably all agree that Chinese, Japanese and Korean share the "CJK script". But would you say,

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Peter_Constable
On 02/26/2001 03:21:15 AM Marco Cimarosti wrote: But would you say, following your definition, that the subset of the CJK Script used to write Mandarin in Mainland China should be called "The Chinese Simplified *Alphabet*"? It is a writing system, but not an alphabetic one. - "Alphabet" is a

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread John Cowan
Marco Cimarosti wrote: - "Script" is a generic term meaning a writing system of any kind, its inventory of signs and its orthographic rules. - "Alphabet" is a specific class of scripts, whose principal characteristic is that it tends to map each sign to one of the language's phonemes. I

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Michael Everson
At 07:25 -0800 2001-02-26, John Cowan wrote: Marco Cimarosti wrote: - "Script" is a generic term meaning a writing system of any kind, its inventory of signs and its orthographic rules. - "Alphabet" is a specific class of scripts, whose principal characteristic is that it tends to map each sign to

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Marco Cimarosti
Michael Everson wrote: Yes, an alphabet proper is usually a subset of an alphabetic script. Armenian seems to be the exception, as it is only used for one language; Georgian, Latin, Cyrillic, Ogham, Runic, and Greek have been used for other languages. I'm missing the other language(s) for

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Michael Everson
At 18:44 +0100 2001-02-26, Marco Cimarosti wrote: Michael Everson wrote: Yes, an alphabet proper is usually a subset of an alphabetic script. Armenian seems to be the exception, as it is only used for one language; Georgian, Latin, Cyrillic, Ogham, Runic, and Greek have been used for

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-26 Thread Kenneth Whistler
Doug Ewell asked, on this hopelessly wandering thread: (Is there an English-language term for the subset of the CJK ideographic script that is used by a given language, say, Japanese?) Well, since "kanji" by now has been borrowed into English, at least among a rather large class of

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-25 Thread Joel Rees
Hi John, Or consider IPv6 network addresses. There are 340,282,366,920,938,463,463,374,607,431,768,211,456 of them. They won't be assigned densely according to current plans. There are actually still quite a few 32-bit IP addresses not in use. (Does there exist one computer attached to the
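
The arithmetic here is easy to verify. A quick sanity check in Python (the variable names are mine; the figures come from the posts above):

    # 128-bit IPv6 address space versus 32-bit IPv4 and the Unicode code space.
    ipv6_addresses = 2 ** 128
    assert ipv6_addresses == 340282366920938463463374607431768211456

    ipv4_addresses = 2 ** 32        # 4,294,967,296 addresses
    unicode_code_points = 0x110000  # 1,114,112 code points, U+0000..U+10FFFF

    # The IPv6 space dwarfs both.
    print(ipv6_addresses // unicode_code_points)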

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-23 Thread Peter_Constable
On 02/22/2001 01:38:24 PM Tom Lord wrote: [EMAIL PROTECTED] wrote: "Unicode is a character set encoding standard which currently provides for its entire character repertoire to be represented using 8-bit, 16-bit or 32-bit encodings." Please say "encoding forms". OK, but I'm more

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-23 Thread Ayers, Mike
I advocate taking it one step further, and referring to Unicode as "21 bits and counting". Sure, it should be a long, long time before more space is needed, but it's a good idea to prepare the audience now. After all, pretty much every ceiling ever established in computing has been
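
The "21 bits" figure is easy to confirm: the highest Unicode code point is U+10FFFF, which takes 21 bits to represent. A minimal Python check:

    # U+10FFFF is the largest code point; bit_length() gives the minimum
    # number of bits needed to hold it.
    max_code_point = 0x10FFFF
    print(max_code_point.bit_length())  # 21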

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-23 Thread John Cowan
Ayers, Mike wrote: After all, pretty much every ceiling ever established in computing has been broken through, and there is no reason to believe that it won't happen again! On the contrary. There *are* reasons to believe that it won't happen in the case of character encoding. As for

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-23 Thread Ayers, Mike
From: John Cowan [mailto:[EMAIL PROTECTED]] Ayers, Mike wrote: After all, pretty much every ceiling ever established in computing has been broken through, and there is no reason to believe that it won't happen again! On the contrary. There *are* reasons to believe that it won't

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Peter_Constable
What exactly _would_ be wrong with calling UNICODE a thirty-two-bit encoding? In part, it's the ambiguity or lack of clarity involved when we say "an encoding". What's an encoding? I think most people (I certainly used to) think of a character encoding as a collection of characters each

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Peter_Constable
On 02/21/2001 10:55:09 PM "Joel Rees" wrote: Now I happen to be of the opinion that the attempt to proclaim the set closed at 17 planes is a little premature. For better or worse, not only is it not premature, it's a done deal! - Peter
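
For reference, the arithmetic behind the 17-plane ceiling, as a small Python sketch (the figures come from the standard; the names are mine):

    # The BMP plus 16 supplementary planes reachable via UTF-16 surrogate
    # pairs, each plane holding 2**16 code points.
    planes = 1 + 16
    code_points = planes * 2 ** 16
    print(code_points)  # 1114112, i.e. U+0000..U+10FFFF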

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Peter_Constable
On 02/22/2001 12:11:49 PM "P. T. Rourke" wrote: What about saying that it's "an encoding standard which can currently be represented by 8-bit, 16-bit, and 32-bit encodings?" As I revise (and revise, and revise) my page, that's the answer I'm leaning toward. Yes, that should work. I'd make

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Carl W. Brown
Joel, Your comment about Microsoft having pie in its face is a bit puzzling. They based NT on Unicode 1.0, and Windows 2000, which was sent to manufacturing 15 months ago, has surrogate support. For all its faults, MS has been a big promoter of Unicode. What burns me up is Sun implementing a

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Tom Lord
[EMAIL PROTECTED] wrote: "Unicode is a character set encoding standard which currently provides for its entire character repertoire to be represented using 8-bit, 16-bit or 32-bit encodings." Please say "encoding forms". There are three distinct terms that sound similar, and
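
The three encoding forms are easy to see side by side. A minimal Python 3 sketch (using the standard codecs; big-endian forms are chosen here to avoid a BOM):

    # One supplementary-plane character, U+10000, in all three encoding forms.
    ch = "\U00010000"
    print(ch.encode("utf-8"))     # b'\xf0\x90\x80\x80'  four 8-bit units
    print(ch.encode("utf-16-be")) # b'\xd8\x00\xdc\x00'  a surrogate pair
    print(ch.encode("utf-32-be")) # b'\x00\x01\x00\x00'  one 32-bit unit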

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Tex Texin
Peter, good points. What's clear from this discussion is that when somebody asks about the encoding of Unicode, the right response is "Why do you want to know?", not this elaboration of terminology etc. If they want to know the maximum character count, tell them 1M+. If they want to know whether it's
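
The "1M+" figure breaks down as follows; a quick Python check (the constants are from the standard, the names are mine):

    total = 0x110000              # 1,114,112 code points, U+0000..U+10FFFF
    surrogates = 0xE000 - 0xD800  # 2,048 code points reserved for UTF-16
    noncharacters = 66            # U+FDD0..U+FDEF plus two per plane
    print(total - surrogates)                  # 1112064 scalar values
    print(total - surrogates - noncharacters)  # 1111998 assignable as characters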

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Peter_Constable
On 02/22/2001 04:15:02 PM "Tex Texin" wrote: What's clear from this discussion is that when somebody asks about the encoding of Unicode, the right response is "Why do you want to know?" Yes, that's probably the best response. - Peter

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Joel Rees
Hi, Carl, Joel, Your comment about Microsoft having pie in its face is a bit puzzling. They based NT on Unicode 1.0, and Windows 2000, which was sent to manufacturing 15 months ago, has surrogate support. For all its faults, MS has been a big promoter of Unicode. Sometimes I run off at the

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Joel Rees
Ken, Thanks for the consideration. I threw my ego away years ago. Joel, Note that I am just sending a response to you, not to the list. I wouldn't mind this being on the list. I was making bad assumptions about Sun's and others' reasons for wanting to do perverse things with

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Marco Cimarosti
Peter Constable: On 02/20/2001 03:34:28 AM Marco Cimarosti wrote: "Unicode is now a 32-bit character encoding standard, although only about one million codes actually exist, [...] Well, it's probably a better answer to say that Unicode is a 20.1-bit encoding, since the direct

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Joel Rees
Marco: Would you mind if I re-post my reply that I forgot to cc to the list? --- missing post What exactly _would_ be wrong with calling UNICODE a thirty-two-bit encoding with a couple of ways to represent most of the characters in a smaller

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Tom Lord
What exactly _would_ be wrong with calling UNICODE a thirty-two-bit encoding? If I have a 32-bit integer type holding a Unicode code point, I have 11 bits left over to hold other data. That's worth knowing. Btw, saying approximately 20.087 bits (Am I calculating that
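
Tom's calculation checks out: log2 of the total number of code points is just over 20. A one-line verification in Python:

    import math
    # 0x110000 == 1114112 code points -> about 20.087 "bits".
    print(math.log2(0x110000))  # ~20.0875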

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Antoine Leca
Marco Cimarosti wrote: Doug Ewell wrote: "A 16-bit character encoding standard [...] By contrast, 8-bit ASCII [...] These two statements are regularly found together, but it is the second one that makes me despair. If nearly half a century was not enough time for people to learn

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Peter_Constable
On 02/20/2001 03:34:28 AM Marco Cimarosti wrote: How about considering UTF-32 as the default Unicode form, in order to be able to provide a short answer of this kind: "Unicode is now a 32-bit character encoding standard, although only about one million codes actually exist, and there

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Cathy Wissink
The people who are responsible for this text have been made aware of the problem. This will be updated for Windows XP. Cathy