Doug Ewell wrote:
A *script* like Latin or Cyrillic typically has many more
characters than any one language will ever use.
An *alphabet* is, by definition, language-specific.
Hmmm...
We probably all agree that Chinese, Japanese and Korean share the "CJK
script".
But would you say,
On 02/26/2001 03:21:15 AM Marco Cimarosti wrote:
But would you say, following your definition, that the subset of the CJK
Script used to write Mandarin in Mainland China should be called "The
Chinese Simplified *Alphabet*"?
It is a writing system, but not an alphabetic one.
- "Alphabet" is a
Marco Cimarosti wrote:
- "Script" is a generic term meaning a writing system of any kind, its
inventory of signs and its orthographic rules.
- "Alphabet" is a specific class of scripts, whose principal characteristic
is that it tends to map each sign to one of the language's phonemes.
I
At 07:25 -0800 2001-02-26, John Cowan wrote:
Marco Cimarosti wrote:
- "Script" is a generic term meaning a writing system of any kind, its
inventory of signs and its orthographic rules.
- "Alphabet" is a specific class of scripts, whose principal characteristic
is that it tends to map each sign to
Michael Everson wrote:
Yes, an alphabet proper is usually the subset of an alphabetic
script. Armenian seems to be the exception, as it is only used for
one language; Georgian, Latin, Cyrillic, Ogham, Runic, and Greek have
been used for other languages.
I'm missing the other language(s) for
At 18:44 +0100 2001-02-26, Marco Cimarosti wrote:
Michael Everson wrote:
Yes, an alphabet proper is usually the subset of an alphabetic
script. Armenian seems to be the exception, as it is only used for
one language; Georgian, Latin, Cyrillic, Ogham, Runic, and Greek have
been used for
Doug Ewell asked, on this hopelessly wandering thread:
(Is
there an English-language term for the subset of the CJK ideographic script
that is used by a given language, say, Japanese?)
Well, since "kanji" by now has been borrowed into English, at least among
a rather large class of
Hi John,
Or consider IPv6 network addresses. There are
340,282,366,920,938,463,463,374,607,431,768,211,456 of them. They
won't be assigned densely according to current plans,
There are actually still quite a few 32-bit IP addresses not in use. (Does
there exist one computer attached to the
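The quoted IPv6 figure is just 2^128; a quick Python sanity check (assuming the 128-bit address width defined for IPv6):

```python
# IPv6 addresses are 128 bits wide, so the address space is 2**128.
count = 2 ** 128
print(count)  # 340282366920938463463374607431768211456
assert count == 340_282_366_920_938_463_463_374_607_431_768_211_456
```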
On 02/22/2001 01:38:24 PM Tom Lord wrote:
[EMAIL PROTECTED] wrote:
"Unicode is a character set encoding standard which currently provides for
its entire character repertoire to be represented using 8-bit, 16-bit or
32-bit encodings."
Please say "encoding forms".
OK, but I'm more
I advocate taking it one step farther, and referring to Unicode as
"21 bits and counting". Sure, it should be a long long time before more
space is needed, but it's a good idea to prepare the audience now. After
all, pretty much every ceiling ever established in computing has been
Ayers, Mike wrote:
After all, pretty much every ceiling ever established in computing has been
broken through, and there is no reason to believe that it won't happen again!
On the contrary. There *are* reasons to believe that it won't happen
in the case of character encoding.
As for
From: John Cowan [mailto:[EMAIL PROTECTED]]
Ayers, Mike wrote:
After all, pretty much every ceiling ever established in computing has been
broken through, and there is no reason to believe that it won't happen again!
On the contrary. There *are* reasons to believe that it won't
What exactly _would_ be wrong with calling UNICODE a
thirty-two bit encoding
In part, it's the ambiguity or lack of clarity involved when we say "an
encoding". What's an encoding? I think most people (I certainly used to)
think of a character encoding as a collection of characters each
On 02/21/2001 10:55:09 PM "Joel Rees" wrote:
Now I happen to be of the opinion that the attempt to proclaim the set
closed at 17 planes is a little premature.
For better or worse, not only is it not premature, it's a done deal!
- Peter
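The "done deal" of 17 planes yields the familiar code-point ceiling; a minimal Python check (the plane count, plane size, and surrogate range are the values fixed by the Unicode standard):

```python
# 17 planes of 65,536 (0x10000) code points each
planes = 17
total = planes * 0x10000
print(total)              # 1114112 code points: U+0000 .. U+10FFFF
assert total == 0x110000

# The UTF-16 surrogate range U+D800..U+DFFF is excluded from
# scalar values, leaving the usable total slightly lower.
scalars = total - (0xE000 - 0xD800)
print(scalars)            # 1112064
```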
On 02/22/2001 12:11:49 PM "P. T. Rourke" wrote:
What about saying that it's "an encoding standard which can currently be
represented by 8 bit, 16 bit, and 32 bit encodings?" As I revise (and
revise, and revise) my page, that's the answer I'm leaning toward.
Yes, that should work. I'd make
Joel,
Your comment about Microsoft having pie in its face is a bit puzzling. They
based NT on Unicode 1.0, and Windows 2000, which was sent to manufacturing 15
months ago, has surrogate support. For all its faults, MS has been a big
promoter of Unicode.
What burns me up is Sun implementing a
[EMAIL PROTECTED] wrote:
"Unicode is a character set encoding standard which currently provides for
its entire character repertoire to be represented using 8-bit, 16-bit or
32-bit encodings."
Please say "encoding forms".
There are three distinct terms that sound similar, and
Peter, good points.
What's clear from this discussion is that when somebody asks about
the encoding of Unicode, the right response is "Why do you want to
know?", not this elaboration of terminology, etc.
If they want to know maximum character count, tell them 1M+.
If they want to know whether it's
On 02/22/2001 04:15:02 PM "Tex Texin" wrote:
What's clear from this discussion is that when somebody asks about
the encoding of Unicode, the right response is "Why do you want to
know?"
Yes, that's probably the best response.
- Peter
Hi, Carl,
Joel,
Your comment about Microsoft having pie in its face is a bit puzzling. They
based NT on Unicode 1.0, and Windows 2000, which was sent to manufacturing 15
months ago, has surrogate support. For all its faults, MS has been a big
promoter of Unicode.
Sometimes I run off at the
Ken,
Thanks for the consideration. I threw my ego away years ago.
Joel,
Note that I am just sending a response to you, not to the list.
I wouldn't mind this being on the list. I was making bad assumptions about
Sun's and others' reasons for wanting to do perverse things with
Peter Constable:
On 02/20/2001 03:34:28 AM Marco Cimarosti wrote:
"Unicode is now a 32-bit character encoding standard,
although only about one million codes actually exist,
[...]
Well, it's probably a better answer to say that Unicode is a 20.1-bit
encoding since the direct
Marco:
Would you mind if I re-post my reply that I forgot to cc to the list?
--- missing post
What exactly _would_ be wrong with calling UNICODE a thirty-two bit encoding
with a couple of ways to represent most of the characters in a smaller
What exactly _would_ be wrong with calling UNICODE a
thirty-two bit encoding
If I have a 32 bit integer type, holding a Unicode code point, I have
11 bits left over to hold other data. That's worth knowing.
Btw, saying approximately 20.087 bits (Am I calculating that
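The "approximately 20.087 bits" arithmetic checks out; a minimal Python sketch of both calculations (bits needed for the code space, and bits left over in a 32-bit integer):

```python
import math

# Unicode code space: 17 planes * 65,536 code points
code_points = 17 * 0x10000            # 1,114,112
bits_needed = math.log2(code_points)  # 16 + log2(17)
print(round(bits_needed, 3))          # 20.087

# The largest code point, U+10FFFF, fits in 21 bits,
# so a 32-bit integer holding one leaves 32 - 21 = 11 spare bits.
assert 0x10FFFF < 2 ** 21
```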
Marco Cimarosti wrote:
Doug Ewell wrote:
"A 16-bit character encoding standard [...]
By contrast, 8-bit ASCII [...]
These two statements are regularly found together, but it is the second one
that makes me despair.
If nearly half a century was not enough time for people to learn
On 02/20/2001 03:34:28 AM Marco Cimarosti wrote:
How about considering UTF-32 as the default Unicode form, in order to be
able to provide a short answer of this kind:
"Unicode is now a 32-bit character encoding standard, although only
about one million codes actually exist, and there
The people who are responsible for this text have been made aware of the
problem. This will be updated for WindowsXP.
Cathy
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 20, 2001 8:04 AM
To: Unicode List
Subject: Re: Perception that