Re: UTF-8 syntax

2001-06-06 Thread DougEwell2
In a message dated 2001-06-06 9:35:45 Pacific Daylight Time, [EMAIL PROTECTED] writes: > we see that Unicode does not *exclude* D800 and DC00 from the > codespace for the CCS, and therefore it would seem that that UTF-8 sequence > would have to be interpreted (in the encoding form level of in

Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

2001-06-06 Thread Mark Davis
Thanks. That's Markus's invention. Mark - Original Message - From: "Carl W. Brown" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, June 06, 2001 11:08 Subject: RE: UTF-8S (was: Re: ISO vs Unicode UTF-8) > Mark, > > I like the clever ICU technique for sorting in code point

Re: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread David Gallardo
Actually, Hanyu only means Chinese speech (or language) where "Han" (as you mention) refers to the majority ethnic Chinese group. It does not refer specifically to any dialect, however, and the Han people speak a large variety. The official national language of China is called "Putonghua" and ref

Re: Digits shapes

2001-06-06 Thread Jeff Guevin
Duly noted, Gentleman, and not surprising at all. The only question remains, I suppose, LTR > RTL > LTR or ? There's gotta be at least one math historian on this list > From: Edward Cherlin <[EMAIL PROTECTED]> > > This is an FAQ. Europeans call them Arabic numerals because that's > whe

Re: Digits shapes

2001-06-06 Thread Edward Cherlin
This is an FAQ. Europeans call them Arabic numerals because that's where Europe got them from. Same problem as native American Turkeys, which went to Spain, then Turkey, then England, where they were naturally known as Turkey cocks and Turkey hens. (Side note: 1492 is the year the Jews in Spai

RE: UTF-8 syntax

2001-06-06 Thread Misha Wolf
On 06/06/2001 17:20:50 Peter Constable wrote: > >Peter Constable replied: > >> That has to do with XML conformance, not Unicode. You were > >> looking in the wrong spec. > > > >I did not grasp that Mark was talking about XML > > I made a wrong assumption about what Mark was meaning. He used "str

RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)

2001-06-06 Thread Carl W. Brown
Mark, I like the clever ICU technique for sorting in code point order. U_CAPI int32_t U_EXPORT2 u_strcmpCodePointOrder(const UChar *s1, const UChar *s2) { static const UChar utf16Fixup[32]={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
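The idea behind comparing UTF-16 strings in code point order can be sketched as follows. This is a minimal illustration, not the ICU implementation (the truncated snippet above shows ICU using a 32-entry fixup table whose contents are cut off here). The names `fixup_unit` and `compare_code_point_order` are hypothetical. The sketch assumes NUL-terminated, well-formed UTF-16 and remaps each code unit so that surrogates sort above U+E000..U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remap a UTF-16 code unit so that plain numeric
 * comparison of the result matches code point order.
 *   0x0000..0xD7FF  unchanged
 *   0xE000..0xFFFF  shifted down to 0xD800..0xF7FF
 *   0xD800..0xDFFF  (surrogates) shifted up to 0xF800..0xFFFF */
static uint16_t fixup_unit(uint16_t c) {
    if (c >= 0xE000) {
        return (uint16_t)(c - 0x0800);
    }
    if (c >= 0xD800) {
        return (uint16_t)(c + 0x2000);
    }
    return c;
}

/* Hypothetical comparison: returns <0, 0 or >0 like strcmp, ordering
 * the NUL-terminated UTF-16 strings by code point, not by code unit. */
int compare_code_point_order(const uint16_t *s1, const uint16_t *s2) {
    assert(s1 != NULL && s2 != NULL);

    /* Skip the common prefix. */
    while (*s1 == *s2) {
        if (*s1 == 0) {
            return 0; /* both strings ended: equal */
        }
        ++s1;
        ++s2;
    }

    /* Compare the first differing units after the fixup. */
    return (int)fixup_unit(*s1) - (int)fixup_unit(*s2);
}
```

With this sketch, the string U+FF61 compares lower than the string U+10000 (encoded as D800 DC00), whereas a raw code unit comparison would place it higher.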

RE: UTF-8 syntax

2001-06-06 Thread Peter_Constable
> U = (C(subscript: H) ? D800(subscript: 16)) * 400(subscript: 16) + (C > (subscript: L) ? DC00(subscript: 16)) + 1(subscript: 16) That didn't survive very well did it. I think you probably got the gist. The "?" are supposed to be minus signs. I didn't explain that CL and CH are low- and h
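For readers who want the surrogate-pair arithmetic in runnable form, here is a minimal sketch. It uses the standard UTF-16 offset of 10000 (hex) as the final term, with the high surrogate contributing the upper 10 bits and the low surrogate the lower 10 bits. The function name `surrogates_to_code_point` and the `INVALID_CODE_POINT` sentinel are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical sentinel returned when the inputs are not a valid pair. */
#define INVALID_CODE_POINT 0xFFFFFFFFu

/* Combine a high (lead) and low (trail) surrogate into a code point:
 *   U = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000 */
uint32_t surrogates_to_code_point(uint16_t high, uint16_t low) {
    if (high < 0xD800 || high > 0xDBFF) {
        return INVALID_CODE_POINT; /* not a high surrogate */
    }
    if (low < 0xDC00 || low > 0xDFFF) {
        return INVALID_CODE_POINT; /* not a low surrogate */
    }
    return (uint32_t)(high - 0xD800) * 0x400u
         + (uint32_t)(low - 0xDC00)
         + 0x10000u;
}
```

Under this formula the pair D800 DC00 yields U+10000 and the pair DBFF DFFF yields U+10FFFF, the two ends of the supplementary range.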

RE: UTF-8 syntax

2001-06-06 Thread Peter_Constable
>Peter Constable replied: >> That has to do with XML conformance, not Unicode. You were >> looking in the wrong spec. > >I did not grasp that Mark was talking about XML I made a wrong assumption about what Mark was meaning. He used "strict" in a way that I don't really see supported in the defin

Re: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Thomas Chan
On Wed, 6 Jun 2001, John Cowan wrote: > Marco Cimarosti scripsit: > > Or "Hanyu", in fact, which is the normal name for "Mandarin" in Mandarin. > > I believe, however, that this term is relatively recent in its current > sense, and is part of the effort the PRC government makes to distinguish >

Re: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread N.R.Liwal
In Pashto they call it "Pashto Alefbe", which has 15 extra characters, while in Urdu it is called "Hurruf Tahajee" and has up to 6 extra characters. As Mr. Roozbeh said: "There are few people who refer to it as Arabic script, namely some linguists and some Unicodies". Liwal - Original Message -

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread John H. Jenkins
At 11:35 AM +0200 6/6/01, Marco Cimarosti wrote: > >If I remember correctly (Thomas Chan can help us here), the normal Cantonese >word for "hanzi" is spelled with the same two characters as in all other >"CJK" languages, but I don't think they think they write their language in >"Mandarin characte

RE: Digits shapes

2001-06-06 Thread Marco Cimarosti
Jeff Guevin wrote: > My understanding, coming from a professor of Arabic in Cairo, > is that Arabs > (or Egyptians, at least) say that the numerals they use are called > "hindi"--ie, "Indian"--in Arabic. Granted, however, he was unable to > explain why Europeans' digits are called "Arabic" and w

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Roozbeh Pournader
On Wed, 6 Jun 2001, Marco Cimarosti wrote: > I wonder what non-Arab users of the Arabic script call it. Perhaps Roozbeh > Pournader and N.R. Liwal can help us: what is the "Arabic alphabet" called > in Farsi, Urdu, and Pashtun? Persian speakers call it "alefbaa-ye faarsi", i.e. Persian alphabet. The

Re: Digits shapes

2001-06-06 Thread Jeff Guevin
My understanding, coming from a professor of Arabic in Cairo, is that Arabs (or Egyptians, at least) say that the numerals they use are called "hindi"--ie, "Indian"--in Arabic. Granted, however, he was unable to explain why Europeans' digits are called "Arabic" and why Arabs don't use Arabic nume

Re: compatibility characters in Arabic block

2001-06-06 Thread Roozbeh Pournader
On Tue, 5 Jun 2001 [EMAIL PROTECTED] wrote: > In the Arabic block, there are four characters with compatibility > decompositions: 0675 - 0678. The names list says that these are for Kazakh. > I'm wondering why these have compatibility decompositions and not canonical > decompositions (possibly exc

RE: UTF-8 syntax

2001-06-06 Thread Marco Cimarosti
I (Marco Cimarosti) asked: > 2) According to the Unicode Standard (with no higher-level > protocols in > action), what code point(s) correspond(s) to the sequence of > UTF-32BE octets > <00 00 00 00 D8 00 00 00 00 00 DC 00>: > A) ? > B) ? > Ooops! Too many octets. That would rather
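To see why the sequence as quoted has "too many octets", it helps to group it into 32-bit units. The sketch below only reads big-endian 32-bit values and flags those that fall in the surrogate range. It takes no position on how the Standard says such values should be interpreted, which is the open question in this thread. The names `read_utf32be` and `is_surrogate_value` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: groups octets into big-endian 32-bit values.
 * Writes at most out_cap values and returns how many were written.
 * Trailing octets that do not fill a whole unit are ignored. */
size_t read_utf32be(const uint8_t *octets, size_t n_octets,
                    uint32_t *out, size_t out_cap) {
    size_t count = 0;
    for (size_t i = 0; i + 4 <= n_octets && count < out_cap; i += 4) {
        out[count++] = ((uint32_t)octets[i] << 24)
                     | ((uint32_t)octets[i + 1] << 16)
                     | ((uint32_t)octets[i + 2] << 8)
                     | (uint32_t)octets[i + 3];
    }
    return count;
}

/* True if the value lies in the surrogate range D800..DFFF. */
int is_surrogate_value(uint32_t v) {
    return v >= 0xD800u && v <= 0xDFFFu;
}
```

Applied to the twelve octets exactly as quoted, the reader yields three units: 00000000, D8000000 and 0000DC00. Only the last lies in the surrogate range, and the middle one is far outside the codespace.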

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Jungshik Shin
On Tue, 5 Jun 2001, Thomas Chan wrote: > of "ideographs". Actually, what's worse is that "hangul" is often used > rather the name of the language--one'll see a list of choices like > "English, Francais, Deutsch, Nihongo, Hangul..." (the last not in Latin > script, of course, but its own. You'

RE: New kana letters (was RE: Oriyan Language)

2001-06-06 Thread Marco Cimarosti
Kenneth Whistler wrote: > Because of their origin -- they are JIS X 0213 compatibility > characters. > They got into JIS X 0213 because they are used in a native Japanese > katakana-based phonetic transcription system for Ainu. That phonetic > transcription system starts with katakana and then ad

Re: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread John Cowan
Marco Cimarosti scripsit: > Or "Hanyu", in fact, which is the normal name for "Mandarin" in Mandarin. I believe, however, that this term is relatively recent in its current sense, and is part of the effort the PRC government makes to distinguish between "zhongguo" as a political term and "han" a

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Bertrand Laidain
> or "Latin Quarter" (= the area of Paris where Italian >was spoken?). > No, it's because it's a quarter of Paris where you used to find scholars and universities (La Sorbonne), and they spoke Latin! Today they don't speak Latin anymore, but you still find universities and students (and touris

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Marco Cimarosti
Mike Ayers wrote: > 1. When told that I was using a writing system that > was named the same as another language, I was not > inordinately confused. A better example would be the > Latin alphabet used for many European languages (yes, > Latin's a dead language, but a language nonetheless)

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Marco Cimarosti
Edward Cherlin wrote: > Um, Han doesn't mean Chinese. It means the Han dynasty and its > cultural and ethnic successors. These are *Han* characters in all > three languages. There are many words to say "Chinese" in Chinese. "Han" is one of these, although it also has the more restricted meanings

Digits shapes (was RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET))

2001-06-06 Thread Marco Cimarosti
Eliotte Rusty Harold wrote (on [EMAIL PROTECTED]): > Today's European digits like 0, 1, 2, and 3 are actually closer to > the original Hindu glyphs from 1000 years ago than to true Arabic > numerals. Both Arabic and European digits derive from the original > sources in India. however, the Arabi

RE: UTF-8 syntax

2001-06-06 Thread Marco Cimarosti
I (Marco Cimarosti) asked: > >1) The difference between "lenient" vs. "strict" parsers. Mark Davis replied: > 1. By strict, I meant "excludes irregular sequences" Peter Constable replied: > That has to do with XML conformance, not Unicode. You were > looking in the wrong spec. I did not grasp

RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET)

2001-06-06 Thread Edward Cherlin
At 3:46 PM -0400 6/4/01, Jungshik Shin wrote: >On Mon, 4 Jun 2001, Ayers, Mike wrote: > >> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] >> >> > For the Han characters, I have found in the past that people >> > whose native >> > language does not use these characters usually refer to th

Re: Unicode under fire again

2001-06-06 Thread DougEwell2
> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml I decided to be courteous this time and let others burn this article to a crisp before stepping in to blow away the ashes. There's something rewarding about reading an anti-Unicode article that starts, in the first paragraph, b