Re: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Andy Heninger
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED] Why would UTF-16 be easier for internal processing than UTF-8? Both are variable-length encodings. Performance tuning is easier with UTF-16. You can optimize for BMP characters, knowing that surrogate pairs are sufficiently uncommon that
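The BMP fast path described here can be sketched in Python; this is an illustrative sketch with names of my own, not code from the thread:

```python
def is_lead_surrogate(unit):
    """True for a UTF-16 lead (high) surrogate, 0xD800-0xDBFF."""
    return 0xD800 <= unit <= 0xDBFF

def code_point_at(units, i):
    """Decode the code point starting at index i of a UTF-16 code-unit list.

    BMP characters take the common fast path; only the rare surrogate
    pair falls through to the slow path, as the message describes.
    """
    u = units[i]
    if not is_lead_surrogate(u):
        return u                        # fast path: BMP character
    trail = units[i + 1]                # slow path: combine the surrogate pair
    return 0x10000 + ((u - 0xD800) << 10) + (trail - 0xDC00)
```

For example, `code_point_at([0xD801, 0xDC00], 0)` decodes the pair to U+10400.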

[OT] What happened to the OpenType list?

2001-09-24 Thread Charlie Jolly

Re: Missing Arabic and Syriac characters in Unicode

2001-09-24 Thread Michael Everson
Miikka-Markus, I'd suggest that you write this up as a PDF document (with scanned examples) and submit it to the UTC and WG2 for consideration.

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Ayers, Mike
From: Asmus Freytag [mailto:[EMAIL PROTECTED]] Sent: Sunday, September 23, 2001 02:24 AM The typical situation involves cases where large data sets are cached in memory, for immediate access. Going to UTF-32 reduces the cache effectively by a factor of two, with no comparable

Re: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Mark Davis
For this situation you have a good point. For others, however, the extra data space of UTF-32 is bound to be lower cost than having to check every character for special meaning (i.e. surrogate) before passing it on. First, it is generally faster to test something in a cache than it is to

Re: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Tom Emerson
Andy Heninger writes: Performance tuning is easier with UTF-16. You can optimize for BMP characters, knowing that surrogate pairs are sufficiently uncommon that it's OK for them to take a bail-out slow path. Sure, but if you are using UTF-16 (or any other multibyte encoding) you lose the

Re: Position of 1 and 0

2001-09-24 Thread Michael Everson
I forwarded Carl's note to a Typewriter list, and received this response. At 12:49 -0500 2001-09-24, Eric Fischer wrote: Michael Everson [EMAIL PROTECTED] quotes Carl W. Brown: This is logical. Originally typewriters had no 1 or 0. You could use the letters l and O. They look the same so

OT: a joke

2001-09-24 Thread Michael Everson
Three fonts walk into a bar. The barman, wiping a glass, shakes his head and says to them: I'll have none of your type in here.

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Carl W. Brown
Mike, The typical situation involves cases where large data sets are cached in memory, for immediate access. Going to UTF-32 reduces the cache effectively by a factor of two, with no comparable increase in processing efficiency to balance out the extra cache misses. This is because
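The cache-size argument is easy to check: the same BMP-only text costs twice the bytes in UTF-32 as in UTF-16. A small Python sketch (the sample string is mine), using the little-endian codecs so no BOM is added:

```python
text = "Internationalization" * 1000     # BMP-only sample data

utf16 = text.encode("utf-16-le")         # 2 bytes per BMP character
utf32 = text.encode("utf-32-le")         # 4 bytes per character, always

# A fixed-size cache therefore holds half as many characters in UTF-32.
assert len(utf32) == 2 * len(utf16)
```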

RE: a joke

2001-09-24 Thread Suzanne M. Topping
-Original Message- From: Michael Everson [mailto:[EMAIL PROTECTED]] Three fonts walk into a bar. The barman, wiping a glass, shakes his head and says to them: I'll have none of your type in here. Gee, and I thought he was going to say: Why the long face?

RE: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Carl W. Brown
Tom, Andy Heninger writes: Performance tuning is easier with UTF-16. You can optimize for BMP characters, knowing that surrogate pairs are sufficiently uncommon that it's OK for them to take a bail-out slow path. Sure, but if you are using UTF-16 (or any other multibyte encoding) you

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Ayers, Mike
If you think you have the answer to all the problems, then you don't know all the problems. I tried to make a point, and apparently made it poorly. I will try again. It seems that some people are arguing that UTF-16 is the ideal solution for all computing, and that UTF-8 and

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Michael Everson
At 13:59 -0500 2001-09-24, Ayers, Mike wrote: It seems that some people are arguing that UTF-16 is the ideal solution for all computing, and that UTF-8 and UTF-32 exist only for network transport. I tend to think that because I have to make web pages using UTF-8, I wish that I had better

Re: a joke

2001-09-24 Thread Michael (michka) Kaplan
From: Suzanne M. Topping [EMAIL PROTECTED] From: Michael Everson [mailto:[EMAIL PROTECTED]] Three fonts walk into a bar. The barman, wiping a glass, shakes his head and says to them: I'll have none of your type in here. Gee, and I thought he was going to say: Why the long face?

[OT] Computer input of numbers

2001-09-24 Thread てんどうりゅうじ
Actually, there is no need for the digits at all! Why even have them? With my Japanese IME, I simply type the number as words (Japanese number words are much shorter than the English) and convert! Or, you can say, why in English do we have the words "two", "three", etc., when we can merely write

RE: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Tom Emerson
Carl W. Brown writes: If you implement an array that is directly indexed by Unicode code point it would have to have 1114111 entries. (I love the number) I don't think that many applications can afford to have over a megabyte of storage per byte of table width. If nothing else it would be
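A flat array indexed by code point is normally replaced by a two-stage (trie-style) table: a small block index plus only the distinct blocks, so the megabytes Brown mentions collapse whenever most blocks are identical. A rough Python sketch (names and block size are my own choices):

```python
BLOCK_SHIFT = 8                    # 256 code points per block
BLOCK_SIZE = 1 << BLOCK_SHIFT
NUM_CODE_POINTS = 0x110000         # U+0000 .. U+10FFFF

def build_two_stage(prop):
    """Compress a per-code-point property table by sharing identical blocks."""
    index, blocks, seen = [], [], {}
    for start in range(0, NUM_CODE_POINTS, BLOCK_SIZE):
        block = tuple(prop(cp) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:              # store each distinct block once
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, cp):
    """Two memory accesses instead of one, but a fraction of the storage."""
    return blocks[index[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1)]
```

With a sparse property such as "is an ASCII digit", only two distinct 256-entry blocks survive, versus over a million flat entries.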

RE: Position of 1 and 0

2001-09-24 Thread Carl W. Brown
Michael, I was oversimplifying. If you look at the older teletype keyboards you will notice that the shift is between letters (mono case) and figures. You will also see three rows of keys. With 5 bit encoding you had a letters and figures shift. If I remember correctly the space and carriage

Re: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Michael (michka) Kaplan
From: Ayers, Mike [EMAIL PROTECTED] Analyze problem. Pick solution. In that order. Wiser advice was ne'er spoken, on *this* topic at least. I wonder if there is some way that a policy decision can be made to declare a moratorium on the whole *My* UTF is better than *your* UTF for a while?

Re: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Michael (michka) Kaplan
From: Tom Emerson [EMAIL PROTECTED] But if I have a text string, and that string is encoded in UTF-16, and I want to access Unicode character values, then I cannot index that string in constant time. To find character n I have to walk all of the 16-bit values in that string accounting for
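The walk being described can be sketched like this (hypothetical helper name), over a list of UTF-16 code units:

```python
def char_index_to_unit_index(units, n):
    """Return the code-unit index where the n-th code point starts.

    Surrogate pairs occupy two code units, so every unit before the
    target must be inspected: O(n), not constant time.  In UTF-32 this
    would be a plain array index.
    """
    i = 0
    for _ in range(n):
        # a lead surrogate means this character spans two code units
        i += 2 if 0xD800 <= units[i] <= 0xDBFF else 1
    return i
```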

RE: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-24 Thread Carl W. Brown
Mike, If you think you have the answer to all the problems, then you don't know all the problems. I tried to make a point, and apparently made it poorly. I will try again. It seems that some people are arguing that UTF-16 is the ideal solution for all computing, and that

Re: 3rd-party cross-platform UTF-8 support

2001-09-24 Thread Tom Emerson
Michael (michka) Kaplan writes: To find character n I have to walk all of the 16-bit values in that string accounting for surrogates. If I use UTF-32 I don't need to do that. This very issue came up during the discussion of how to handle surrogates in Python. Would this not be the

Re: GB18030

2001-09-24 Thread Markus Scherer
Yung-Fong Tang wrote: basically GB18030 is designed to encode all Unicode BMP in an encoding which is backward compatible with GB2312 and GBK. Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. markus

Re: GB18030

2001-09-24 Thread Yung-Fong Tang
Markus Scherer wrote: Yung-Fong Tang wrote: basically GB18030 is designed to encode all Unicode BMP in an encoding which is backward compatible with GB2312 and GBK. Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. Does

Re: GB18030

2001-09-24 Thread David Starner
On Mon, Sep 24, 2001 at 06:18:19PM -0700, Yung-Fong Tang wrote: Markus Scherer wrote: Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. Does GB18030 DEFINE the mapping between GB18030 and the rest of the planes? I don't

Re: GB18030

2001-09-24 Thread Tom Emerson
Yung-Fong Tang writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of the planes? I don't think so, since Unicode has not defined them yet, right? Sure it does. We know what the code points are, even if they don't have characters assigned to them yet. This allows GB18030 to
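The mapping can be fully defined because it is algorithmic: GB18030's four-byte sequences from 90 30 81 30 upward map linearly onto U+10000..U+10FFFF, with no per-character table required. A quick check against Python's built-in gb18030 codec (the helper function is my sketch):

```python
# U+10000, the first supplementary-plane code point, is the first
# four-byte sequence of the supplementary range.
assert chr(0x10000).encode("gb18030") == b"\x90\x30\x81\x30"

def gb18030_four_byte(cp):
    """Encode a supplementary-plane code point (U+10000..U+10FFFF) by the
    linear rule: byte offsets are digits in a mixed base of 10, 126, 10."""
    n = cp - 0x10000
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    b1, b2 = divmod(n, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])

# Agrees with the codec at the top of the range, U+10FFFF.
assert gb18030_four_byte(0x10FFFF) == chr(0x10FFFF).encode("gb18030")
```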

Re: GB18030

2001-09-24 Thread DougEwell2
In a message dated 2001-09-24 20:50:25 Pacific Daylight Time, [EMAIL PROTECTED] writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of the planes? I don't think so, since Unicode has not defined them yet, right? Unicode defined all the planes, a long long time ago. It's