Paul Keinanen wrote:
Regarding how to describe Unicode in public, I think it is best to say
that it can encode more than a million characters, of which about 10% (as
of 3.1) are used. It is better to defer the discussion of any
transformation forms to a much later stage.
I don't agree.
Joel Rees [EMAIL PROTECTED]
I'm telling you that 17 planes is not enough, and it _will_ become a painful
constraint in your lifetime.
How? It looks likely to me that Unicode now encodes more than half of the
characters known by living people. Do you think people are going to expand
their
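The "17 planes" figure debated in this thread is simple arithmetic; a minimal Python sketch of the code space size (the specific counts are standard Unicode constants, not from the posts themselves):

```python
# The Unicode code space: 17 planes of 65,536 code points each.
planes = 17
per_plane = 0x10000
total = planes * per_plane
print(total)                     # 1114112 code points

# The 2,048 surrogate code points (U+D800..U+DFFF) are reserved for
# the UTF-16 mechanism and can never encode characters:
surrogates = 0xE000 - 0xD800
print(total - surrogates)        # 1112064 usable code points

# The whole space fits in 21 bits:
print((total - 1).bit_length())  # 21
```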
On 2001.02.23 19:42 Arnt Gulbrandsen [EMAIL PROTECTED] asked:
Joel Rees [EMAIL PROTECTED]
I'm telling you that 17 planes is not enough, and it _will_ become a
painful constraint in your lifetime.
How? It looks likely to me that Unicode now encodes more than half of the
characters known
On 02/22/2001 01:38:24 PM Tom Lord wrote:
[EMAIL PROTECTED] wrote:
"Unicode is a character set encoding standard which currently provides for
its entire character repertoire to be represented using 8-bit, 16-bit or
32-bit encodings."
Please say "encoding forms".
OK, but I'm more
many comments
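The distinction Lord is asked to draw — one repertoire, several encoding forms — can be shown with a minimal Python sketch (the choice of U+212B is arbitrary):

```python
# One character, three encoding forms: the code point is the same,
# but the serialization into code units differs.
ch = "\u212B"  # ANGSTROM SIGN

print(f"U+{ord(ch):04X}")            # U+212B   (the code point itself)
print(ch.encode("utf-8").hex())      # e284ab   (three 8-bit code units)
print(ch.encode("utf-16-be").hex())  # 212b     (one 16-bit code unit)
print(ch.encode("utf-32-be").hex())  # 0000212b (one 32-bit code unit)
```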
- Original Message -
From: "Tom Lord" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Wednesday, February 21, 2001 21:15
Subject: An Absurdly Brief Introduction to Unicode (was Re: Perception ...)
We've seen several posts about the perception that Unicode is
At 6:28 AM -0800 2/23/01, [EMAIL PROTECTED] wrote:
The unlikelihood of you or anybody coming up with sufficient
evidence to make that case is such that I'd be willing to put less
constraint on you: present clear evidence that more than 880,790 characters
will *ever* be in wide use and will merit
Mark Davis wrote:
A _code_point_ is an integer value which is assigned to an abstract
character. Each character receives a unique code point.
This is inaccurate. Multiple *abstract characters* can have a single code point;
multiple code points can correspond to a single *abstract character*.
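Cowan's many-to-one and one-to-many point can be demonstrated with Python's standard unicodedata module (a minimal sketch):

```python
import unicodedata

# One abstract character ("A with ring above") reachable via two
# different code point sequences:
precomposed = "\u00C5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

print(precomposed == decomposed)   # False: different code point sequences
# ...yet canonically equivalent: normalization makes them identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```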
I advocate taking it one step farther, and referring to Unicode as
"21 bits and counting". Sure, it should be a long long time before more
space is needed, but it's a good idea to prepare the audience now. After
all, pretty much every ceiling ever established in computing has been
From: David Starner [mailto:[EMAIL PROTECTED]]
The second example I would like to raise is the "Square Words" or "New
English Calligraphy"[6] (I don't know which name is more appropriate,
but I will refer to it hereafter as "NEC"), which is a Sinoform script.
NEC is a system where
On Wed, Feb 21, 2001 at 10:58:06PM -0800, Thomas Chan wrote:
First, there are the 4000 new[4] "CJK Ideographs" that he created solely
for a work called _Tianshu_ (A Book from the Sky)[5] (1987-1991), which Xu
spent three years carving movable wooden type for. There is no doubt that
In somewhat more detail:
In general, a single abstract character corresponds to a single code point.
However, due to the requirement of compatibility with legacy code sets, plus
some inherent fuzziness in what constitutes abstract characters, there are
cases where this is not true:
- one
At 8:27 AM -0800 2/23/01, Dan Kolis wrote:
Well, if you have no cultural bias and you encode Klingon, you pretty well
have to include anything.
Klingon is not likely to be encoded any time soon. The basic problem
here is that the Klingon Language Institute has shown little interest
in
Ayers, Mike wrote:
After all, pretty much every ceiling ever established in computing has been
broken through, and there is no reason to believe that it won't happen again!
On the contrary. There *are* reasons to believe that it won't happen
in the case of character encoding.
As for
Since folks are debating whether 21 bits is really enough for Unicode
forever, I thought I should toss in these gems from my quotation
collection, about previous mistakes when people thought something was
big enough:
\QUOTATION{
There is only one mistake that can be made in computer design that
Gentlepeople,
I'm surprised that nobody whose responses I've seen has taken the trouble
to actually go to ANSI to see what "ASCII" means to that
standards-publishing body.
A quick search at http://webstore.ansi.org for the word "ASCII" (without
the quotes, of course) shows the following two
On Fri, 23 Feb 2001, Ayers, Mike wrote:
This, however, is absurd - one of those 1,000,000 words is
"antidisestablishmentarianism", and there's a whole bunch half that long or
longer. Show me the glyphs for them! This NEC thingy may make cute artsy
stuff, but it would be useless for
On Fri, Feb 23, 2001 at 08:11:51AM -0800, Ayers, Mike wrote:
Besides, does anyone
really believe that alphabetic writers would decide that they'd rather learn
thousands of glyphs? We're getting deeply fictional here...
All it would take is some small dictator-run communist country whose
Mark said:
In somewhat more detail:
In general, a single abstract character corresponds to a single code point.
However, due to the requirement of compatibility with legacy code sets, plus
some inherent fuzziness in what constitutes abstract characters, there are
cases where this is not
On 02/23/2001 09:58:55 AM John Cowan wrote:
Mark Davis wrote:
A _code_point_ is an integer value which is assigned to an abstract
character. Each character receives a unique code point.
This is inaccurate. Multiple *abstract characters* can have a single code point;
multiple code points can
On 02/23/2001 10:34:05 AM "Mark Davis" wrote:
In somewhat more detail:
In general, a single abstract character corresponds to a single code point.
However, due to the requirement of compatibility with legacy code sets, plus
some inherent fuzziness in what constitutes abstract characters, there
On 02/23/2001 01:28:07 PM Kenneth Whistler wrote:
- one abstract character can correspond to two different code points
{a with ring above} == U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
                    == U+212B ANGSTROM SIGN (singleton canonical equivalence)
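Whistler's singleton example can be verified with Python's unicodedata module (a minimal sketch):

```python
import unicodedata

angstrom = "\u212B"  # ANGSTROM SIGN
a_ring = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# The singleton canonical decomposition maps U+212B to U+00C5, so
# normalization folds the Angstrom sign away entirely:
print(unicodedata.normalize("NFC", angstrom) == a_ring)     # True
# Full decomposition goes one step further, to A + combining ring:
print(unicodedata.normalize("NFD", angstrom) == "A\u030A")  # True
```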
At 10:51 PM 2/22/01, Joel Rees wrote:
So Plane 9, say, can be nothing but surrogates-of-surrogates, to some
64- or 128-bit code space.
You do mean for UTF-16, don't you?
Let me be somewhat more explicit, now that I've thought about it for a
while. IIRC there is an entire private use
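For context, the standard UTF-16 surrogate mechanism that the "surrogates-of-surrogates" idea extrapolates from works as sketched below (the extrapolation itself has no standard counterpart; U+10400 is an arbitrary example code point):

```python
# UTF-16 surrogate pair arithmetic for a supplementary code point.
cp = 0x10400                 # DESERET CAPITAL LETTER LONG I
v = cp - 0x10000             # 20 bits to distribute over two code units
high = 0xD800 + (v >> 10)    # high (lead) surrogate: top 10 bits
low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate: bottom 10 bits
print(hex(high), hex(low))   # 0xd801 0xdc00

# Round-trip the pair back to the code point:
decoded = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000
print(hex(decoded))          # 0x10400
```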
From: John Cowan [mailto:[EMAIL PROTECTED]]
Ayers, Mike wrote:
After all, pretty much every ceiling ever established in computing has been
broken through, and there is no reason to believe that it won't happen again!
On the contrary. There *are* reasons to believe that it won't
Sorry, I tuned out for a moment: is there a URL for the final version of
Tex's tabulation of benefits?
Also, I'd appreciate any similar links that might be used in a page of
info for the uninitiated.
Best,
Richard
Richard,
The list is attached. The page contains some links which would help
someone get started.
I did mean to make a couple more small changes that I haven't gotten
to yet. In particular, someone wrote me that the item:
"Standards insure interoperability and portability by prescribing
Mark Davis wrote:
that must be made about what counts as an abstract character and what
does not; and the generally acknowledged desirability of supporting
bijective mappings between a variety of older character sets and
while I like bijective, it is not a commonly understood term.
I