In a message dated 2001-06-06 9:35:45 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> we see that Unicode does not *exclude* D800 and DC00 from the
> codespace for the CCS, and therefore it would seem that UTF-8 sequence
> would have to be interpreted (in the encoding form level of in
Thanks. That's Markus's invention.
Mark
- Original Message -
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 06, 2001 11:08
Subject: RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)
> Mark,
>
> I like the clever ICU technique for sorting in code point
Actually, Hanyu only means Chinese speech (or language) where "Han" (as you
mention) refers to the majority ethnic Chinese group. It does not refer
specifically to any dialect, however, and the Han people speak a large
variety.
The official national language of China is called "Putonghua" and ref
Duly noted, Gentlemen, and not surprising at all. The only question
remains, I suppose, LTR > RTL > LTR or ? There's gotta be at least one
math historian on this list
> From: Edward Cherlin <[EMAIL PROTECTED]>
>
> This is an FAQ. Europeans call them Arabic numerals because that's
> whe
This is an FAQ. Europeans call them Arabic numerals because that's
where Europe got them from. Same problem as native American Turkeys,
which went to Spain, then Turkey, then England, where they were
naturally known as Turkey cocks and Turkey hens. (Side note: 1492 is
the year the Jews in Spai
On 06/06/2001 17:20:50 Peter Constable wrote:
> >Peter Constable replied:
> >> That has to do with XML conformance, not Unicode. You were
> >> looking in the wrong spec.
> >
> >I did not grasp that Mark was talking about XML
>
> I made a wrong assumption about what Mark meant. He used "str
Mark,
I like the clever ICU technique for sorting in code point order.
U_CAPI int32_t U_EXPORT2
u_strcmpCodePointOrder(const UChar *s1, const UChar *s2) {
static const UChar utf16Fixup[32]={
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
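The idea of comparing UTF-16 strings in code point order can be sketched roughly as follows. This is not ICU's code: the names fixup_unit and compare_code_point_order are hypothetical, the strings are assumed to be NUL-terminated, well-formed UTF-16, and plain range checks stand in for a lookup table. At the first differing code unit, surrogates (D800..DFFF) are moved above the rest of the BMP (E000..FFFF) before the two units are compared.

```c
#include <stdint.h>

/* Hypothetical helper: remap one UTF-16 code unit so that comparing the
   remapped values agrees with code point order. */
static uint16_t fixup_unit(uint16_t c) {
    if (c >= 0xE000) {
        return (uint16_t)(c - 0x800);   /* E000..FFFF -> D800..F7FF */
    }
    if (c >= 0xD800) {
        return (uint16_t)(c + 0x2000);  /* D800..DFFF -> F800..FFFF */
    }
    return c;                           /* 0000..D7FF unchanged */
}

/* Hypothetical comparison: negative, zero, or positive, like strcmp.
   Assumes NUL-terminated, well-formed UTF-16 input. */
int compare_code_point_order(const uint16_t *s1, const uint16_t *s2) {
    uint16_t c1, c2;

    /* Skip the common prefix. */
    for (;;) {
        c1 = *s1++;
        c2 = *s2++;
        if (c1 != c2) {
            break;
        }
        if (c1 == 0) {
            return 0;                   /* both strings ended: equal */
        }
    }

    /* Compare the first differing units after the fix-up. */
    return (int)fixup_unit(c1) - (int)fixup_unit(c2);
}
```

With this sketch, U+FF61 sorts before U+10000 (encoded as D800 DC00), whereas a plain code unit comparison would order them the other way round.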
> U = (C(subscript: H) ? D800(subscript: 16)) * 400(subscript: 16) + (C
> (subscript: L) ? DC00(subscript: 16)) + 1(subscript: 16)
That didn't survive very well, did it? I think you probably got the gist.
The "?" are supposed to be minus signs. I didn't explain that CL and CH are
low- and h
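For reference, the surrogate-pair formula quoted above can be written out as a small C sketch. It reads the "?" as minus signs, as the message says, and assumes the final term is 10000 hex, as in the standard UTF-16 definition. The function name decode_surrogate_pair and the use of UINT32_MAX as an error value are choices made only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: combine a high (lead) and low (trail) surrogate
   into one code point:
       U = (CH - D800) * 400 + (CL - DC00) + 10000   (all hex)
   Returns UINT32_MAX if the inputs are not a valid surrogate pair. */
uint32_t decode_surrogate_pair(uint16_t high, uint16_t low) {
    if (high < 0xD800 || high > 0xDBFF) {
        return UINT32_MAX;              /* not a high surrogate */
    }
    if (low < 0xDC00 || low > 0xDFFF) {
        return UINT32_MAX;              /* not a low surrogate */
    }
    return (uint32_t)(high - 0xD800) * 0x400u
         + (uint32_t)(low - 0xDC00)
         + 0x10000u;
}
```

For example, the pair D800 DC00 gives U+10000 and the pair DBFF DFFF gives U+10FFFF, the two ends of the supplementary range.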
>Peter Constable replied:
>> That has to do with XML conformance, not Unicode. You were
>> looking in the wrong spec.
>
>I did not grasp that Mark was talking about XML
I made a wrong assumption about what Mark meant. He used "strict" in
a way that I don't really see supported in the defin
On Wed, 6 Jun 2001, John Cowan wrote:
> Marco Cimarosti scripsit:
> > Or "Hanyu", in fact, which is the normal name for "Mandarin" in Mandarin.
>
> I believe, however, that this term is relatively recent in its current
> sense, and is part of the effort the PRC government makes to distinguish
>
In Pashto it is called "Pashto Alefbe", with 15 extra characters,
while in Urdu it is called "Hurruf Tahajee", with up to 6 extra characters.
As Mr. Roozbeh said, "There are few people who refer to it as Arabic script,
namely some linguists and some Unicodies".
Liwal
- Original Message -
At 11:35 AM +0200 6/6/01, Marco Cimarosti wrote:
>
>If I remember correctly (Thomas Chan can help us here), the normal Cantonese
>word for "hanzi" is spelled with the same two characters as in all other
>"CJK" languages, but I don't think they think they write their language in
>"Mandarin characte
Jeff Guevin wrote:
> My understanding, coming from a professor of Arabic in Cairo,
> is that Arabs
> (or Egyptians, at least) say that the numerals they use are called
> "hindi"--ie, "Indian"--in Arabic. Granted, however, he was unable to
> explain why Europeans' digits are called "Arabic" and w
On Wed, 6 Jun 2001, Marco Cimarosti wrote:
> I wonder what non-Arab users of the Arabic script call it. Perhaps Roozbeh
> Pournander and N.R. Liwal can help us: what is the "Arabic alphabet" called
> in Farsi, Urdu, and Pashtun?
Persian speakers call it "alefbaa-ye faarsi" (Persian alphabet). The
My understanding, coming from a professor of Arabic in Cairo, is that Arabs
(or Egyptians, at least) say that the numerals they use are called
"hindi"--ie, "Indian"--in Arabic. Granted, however, he was unable to
explain why Europeans' digits are called "Arabic" and why Arabs don't use
Arabic nume
On Tue, 5 Jun 2001 [EMAIL PROTECTED] wrote:
> In the Arabic block, there are four characters with compatibility
> decompositions: 0675 - 0678. The names list says that these are for Kazakh.
> I'm wondering why these have compatibility decompositions and not canonical
> decompositions (possibly exc
I (Marco Cimarosti) asked:
> 2) According to the Unicode Standard (with no higher-level
> protocols in
> action), what code point(s) correspond(s) to the sequence of
> UTF-32BE octets
> <00 00 00 00 D8 00 00 00 00 00 DC 00>:
> A) ?
> B) ?
>
Ooops! Too many octets. That would rather
On Tue, 5 Jun 2001, Thomas Chan wrote:
> of "ideographs". Actually, what's worse is that "hangul" is often used
> rather the name of the language--one'll see a list of choices like
> "English, Francais, Deutsch, Nihongo, Hangul..." (the last not in Latin
> script, of course, but its own.
You'
Kenneth Whistler wrote:
> Because of their origin -- they are JIS X 0213 compatibility
> characters.
> They got into JIS X 0213 because they are used in a native Japanese
> katakana-based phonetic transcription system for Ainu. That phonetic
> transcription system starts with katakana and then ad
Marco Cimarosti scripsit:
> Or "Hanyu", in fact, which is the normal name for "Mandarin" in Mandarin.
I believe, however, that this term is relatively recent in its current
sense, and is part of the effort the PRC government makes to distinguish
between "zhongguo" as a political term and "han" a
> or "Latin Quarter" (= the area of Paris where Italian
>was spoken?).
>
No it's because it's a quarter of Paris where you used to find scholars
and universities (La Sorbonne) and they spoke Latin! Today they don't
speak Latin anymore but you still find universities and students
(and touris
Mike Ayers wrote:
> 1. When told that I was using a writing system that
> was named the same as another language, I was not
> inordinately confused. A better example would be the
> Latin alphabet used for many European languages (yes,
> Latin's a dead language, but a language nonetheless)
Edward Cherlin wrote:
> Um, Han doesn't mean Chinese. It means the Han dynasty and its
> cultural and ethnic successors. These are *Han* characters in all
> three languages.
There are many words to say "Chinese" in Chinese. "Han" is one of these,
although it also has the more restricted meanings
Eliotte Rusty Harold wrote (on [EMAIL PROTECTED]):
> Today's European digits like 0, 1, 2, and 3 are actually closer to
> the original Hindu glyphs from 1000 years ago than to true Arabic
> numerals. Both Arabic and European digits derive from the original
> sources in India. However, the Arabi
I (Marco Cimarosti) asked:
> >1) The difference between "lenient" vs. "strict" parsers.
Mark Davis replied:
> 1. By strict, I meant "excludes irregular sequences"
Peter Constable replied:
> That has to do with XML conformance, not Unicode. You were
> looking in the wrong spec.
I did not grasp
At 3:46 PM -0400 6/4/01, Jungshik Shin wrote:
>On Mon, 4 Jun 2001, Ayers, Mike wrote:
>
>> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>>
>> > For the Han characters, I have found in the past that people
>> > whose native
>> > language does not use these characters usually refer to th
> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml
I decided to be courteous this time and let others burn this article to a
crisp before stepping in to blow away the ashes.
There's something rewarding about reading an anti-Unicode article that
starts, in the first paragraph, b
27 matches