Re: PDUTR #26 posted

2001-09-28 Thread jgo
> "Carl W. Brown" wrote: >> "Michael (michka) Kaplan", Mon, 2001-09-17 12:07:19 -0700 wrote: >>> "Carl W. Brown" wrote: >>> It would seem to be that if you either have to change the UTF-8 code to >>> support CESU-8 or change the UTF-16 compare logic then changing the UTF-16 >>> logic to do code

Re: PDUTR #26 posted

2001-09-19 Thread Michael (michka) Kaplan
From: "Ayers, Mike" <[EMAIL PROTECTED]> > > From: John Cowan [mailto:[EMAIL PROTECTED]] > > [EMAIL PROTECTED] scripsit: > > > Oops! One of two "Unicode 101" mistakes I made in the > > > same day. Where was my brain? > > Unicode Ate Your Brain, of course! (See my tutorial at > > Orlando thi

RE: PDUTR #26 posted

2001-09-19 Thread Ayers, Mike
> From: John Cowan [mailto:[EMAIL PROTECTED]] > > [EMAIL PROTECTED] scripsit: > > > Oops! One of two "Unicode 101" mistakes I made in the same > day. Where was > > my brain? > > Unicode Ate Your Brain, of course! (See my tutorial at > Orlando this year.) Nah, UTF ate it!

Re: PDUTR #26 posted

2001-09-19 Thread John Cowan
[EMAIL PROTECTED] scripsit: > Oops! One of two "Unicode 101" mistakes I made in the same day. Where was > my brain? Unicode Ate Your Brain, of course! (See my tutorial at Orlando this year.) -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] Please leave your

Re: PDUTR #26 posted

2001-09-18 Thread DougEwell2
In a message dated 2001-09-18 9:22:17 Pacific Daylight Time, [EMAIL PROTECTED] writes: > Doug Ewell wrote: >> All Unicode code points of the form U+FE and U+FF are >> special, in >> that they are non-characters and can be treated in a special way by >> applications (e.g. as sentinels

Re: PDUTR #26 posted

2001-09-18 Thread DougEwell2
David Hopwood and Carl Brown graciously corrected me: >> I don't agree that irregular UTF-8 sequences in general can only decode to >> characters above 0xFFFF. > > That's why I specifically referred to irregular sequences as defined by > Unicode 3.1 (i.e. UAX #27). I stand corrected. That's w

RE: PDUTR #26 posted

2001-09-18 Thread Marco Cimarosti
Doug Ewell wrote: > All Unicode code points of the form U+FE and U+FF are > special, in > that they are non-characters and can be treated in a special way by > applications (e.g. as sentinels). I think this should be "All Unicode code points of the form U+xxFFFE and U+xxFFFF are specia
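Marco's correction matches the rule that the last two code points of every plane are noncharacters. A minimal Python sketch of the check follows; the function name `is_noncharacter` is hypothetical, and the sketch also covers the U+FDD0..U+FDEF block of noncharacters, which this message does not mention.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is a Unicode noncharacter code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The last two code points of every plane: U+xxFFFE and U+xxFFFF.
    if (cp & 0xFFFE) == 0xFFFE:
        return True
    # The contiguous block of 32 noncharacters U+FDD0..U+FDEF.
    return 0xFDD0 <= cp <= 0xFDEF


# U+FFFE and U+10FFFF are noncharacters; U+00FE (thorn) is an ordinary letter.
print(is_noncharacter(0xFFFE), is_noncharacter(0x10FFFF), is_noncharacter(0x00FE))
# True True False
```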

Re: PDUTR #26 posted

2001-09-18 Thread David Hopwood
[EMAIL PROTECTED] wrote: > In a message dated 2001-09-17 16:24:05 Pacific Daylight Time, > [EMAIL PROTECTED] writes: > > > It doesn't reopen that specific type of security hole, because irregular > > UTF-8 sequences (as defined by Unicode 3.1) can only decode t

RE: PDUTR #26 posted

2001-09-18 Thread Carl W. Brown
Doug, > > It is true that the *specific* irregular UTF-8 sequences introduced (and > required) by CESU-8 decode to characters above 0xFFFF when interpreted as > CESU-8, and to pairs of surrogate code points when (incorrectly) > interpreted > as UTF-8. Since definition D29, arguably my least favo
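To make this concrete: CESU-8 encodes each UTF-16 code unit as though it were a code point, so a supplementary character becomes two 3-byte surrogate sequences instead of one 4-byte UTF-8 sequence. The sketch below assumes Python 3; `cesu8_encode` is a hypothetical helper, and it relies on Python's `surrogatepass` error handler to produce the 3-byte surrogate forms.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: UTF-8 applied to UTF-16 code units.

    BMP characters encode exactly as in UTF-8; each supplementary
    character becomes two 3-byte sequences, one per surrogate.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            units = [cp]
        else:
            # Split into a high and a low surrogate, as UTF-16 does.
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        for unit in units:
            # 'surrogatepass' lets Python's UTF-8 codec emit the 3-byte
            # form of a surrogate code point, which strict UTF-8 forbids.
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


print(cesu8_encode("\U00010000").hex(" "))       # CESU-8 form
print("\U00010000".encode("utf-8").hex(" "))     # UTF-8 form
```

For U+10000 the first line gives `ed a0 80 ed b0 80` and the second `f0 90 80 80`. Feeding the CESU-8 bytes to a strict UTF-8 decoder, such as Python's own, raises an error, which is exactly the mismatch Carl describes.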

Re: PDUTR #26 posted

2001-09-17 Thread DougEwell2
In a message dated 2001-09-17 16:24:05 Pacific Daylight Time, [EMAIL PROTECTED] writes: > It doesn't reopen that specific type of security hole, because irregular UTF-8 > sequences (as defined by Unicode 3.1) can only decode to characters above > 0xFFFF, and those characters are unlikely to be

Re: PDUTR #26 posted

2001-09-17 Thread David Starner
On Sun, Sep 16, 2001 at 09:28:34PM +0100, David Hopwood wrote: > It doesn't reopen that specific type of security hole, because irregular > UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above > 0xFFFF, and those characters are unlikely to be "special" for any applicatio
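The decoding direction illustrates David Hopwood's claim: once the 3-byte surrogate sequences are paired back up, they can only yield code points from U+10000 to U+10FFFF, never ASCII characters such as '/'. A sketch, assuming Python 3 and a hypothetical `cesu8_decode` name. Note that it is lenient: it also accepts ordinary 4-byte UTF-8 sequences, which a conforming CESU-8 decoder should reject.

```python
def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes, joining surrogate pairs into supplementary characters."""
    # Step 1: decode 3-byte surrogate forms to lone surrogate code points.
    raw = data.decode("utf-8", "surrogatepass")
    # Step 2: round-trip through UTF-16 so each high+low surrogate pair is
    # combined into one character; an unpaired surrogate makes the strict
    # UTF-16 decode raise UnicodeDecodeError.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


s = cesu8_decode(b"\xed\xa0\x80\xed\xb0\x80")
print(hex(ord(s)))
```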

Re: PDUTR #26 posted

2001-09-17 Thread David Hopwood
"Carl W. Brown" wrote: > Doug, > > But if people start compromising their UTF-8 parsers to accommodate > > CESU-8 "adaptively," it would be a great blow to UTF-8. It would > > essentially undo all the tightening-up that was accomplished by the > > Corrigendum,

Re: PDUTR #26 posted

2001-09-17 Thread David Hopwood
Mark Davis wrote: > A few notes: > > - IANA is a registry. I believe the only legitimate grounds that they have > for denying a registration is that it is incompletely specified or has a > misleading name. No, that's not correct. The relevant document defining

Re: PDUTR #26 posted

2001-09-17 Thread Markus Scherer
One technical nit: The document says: 2.1 c. The bit pattern 11110xxx is illegal in any CESU-8 byte, ... In fact, this should say "The bit patterns 1111xxxx are illegal in ..." The changes are subtle: one '0' replaced by 'x' - you want to forbid all bytes >=0xf0 (f0..ff), not just f0..f7. (The
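Markus's nit can be checked mechanically. The Python sketch below lists the bytes matched by the pattern 11110xxx and by 1111xxxx, and rejects any input byte in the range 0xF0..0xFF; `check_cesu8_bytes`, `PATTERN_11110XXX`, and `PATTERN_1111XXXX` are hypothetical names.

```python
# Byte values matched by each bit pattern.
PATTERN_11110XXX = [b for b in range(256) if b & 0b11111000 == 0b11110000]
PATTERN_1111XXXX = [b for b in range(256) if b & 0b11110000 == 0b11110000]


def check_cesu8_bytes(data: bytes) -> None:
    """Raise ValueError if any byte matches 1111xxxx (0xF0..0xFF)."""
    for i, b in enumerate(data):
        if b & 0b11110000 == 0b11110000:
            raise ValueError(f"byte {b:#04x} at offset {i} is not allowed in CESU-8")


print([hex(b) for b in PATTERN_11110XXX])  # only f0..f7
print([hex(b) for b in PATTERN_1111XXXX])  # all of f0..ff
```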

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Mark, > - Just because it is in IANA does *not* mean that everyone will > support it. > There are many encodings in IANA supported by very few people. Nor does it > mean that it is intended for widespread public use. The IANA registry is > also used as a general purpose registry, even for encodi

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: "Carl W. Brown" <[EMAIL PROTECTED]> > It would seem to be that if you either have to change the UTF-8 code to > support CESU-8 or change the UTF-16 compare logic then changing the UTF-16 > logic to do code point order compares is a much more containable change with > a much lower processing
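For reference, comparing UTF-16 strings in code point order needs only an adjustment to the first differing code unit, so that surrogates sort above U+E000..U+FFFF. The Python sketch below uses a well-known remapping trick (ICU uses a similar fix-up); the function names are hypothetical, and code units are modeled as plain lists of integers.

```python
from __future__ import annotations


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s as integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def _fixup(unit: int) -> int:
    """Remap a code unit so that binary order matches code point order."""
    if unit >= 0xE000:
        return unit - 0x800    # U+E000..U+FFFF -> 0xD800..0xF7FF
    if unit >= 0xD800:
        return unit + 0x2000   # surrogates     -> 0xF800..0xFFFF
    return unit                # U+0000..U+D7FF unchanged


def compare_code_point_order(a: list[int], b: list[int]) -> int:
    """Compare UTF-16 sequences in code point order: <0, 0, or >0."""
    for x, y in zip(a, b):
        if x != y:
            # Only the first difference matters, so only it is remapped.
            return _fixup(x) - _fixup(y)
    return len(a) - len(b)


a, b = utf16_units("\uFF61"), utf16_units("\U00010000")
print(a > b)                              # binary UTF-16 order: U+FF61 sorts last
print(compare_code_point_order(a, b) < 0)  # code point order: U+FF61 sorts first
```

The change is confined to the comparison loop. No transcoding or per-unit work happens until the first mismatch, which is the containment Carl is arguing for.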

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: "John Cowan" <[EMAIL PROTECTED]> > False. > > IANA's registry is merely de facto: what they register is not in fact > encodings, but *names* of encodings. The charset name "ISO646-DE" is > legal as an XML encoding, but it would astonish me if any extant > XML parser supports it. (This is

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Doug, > But if people start compromising their UTF-8 parsers to > accommodate CESU-8 > "adaptively," it would be a great blow to UTF-8. It would > essentially undo > all the tightening-up that was accomplished by the Corrigendum, > and it would > revive all the old Bruce Schneier-style skeptici
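As an illustration of the "tightening-up" under discussion, here is a sketch of a strict UTF-8 validator. It rejects overlong forms, 3-byte surrogate sequences (the CESU-8 case), and anything above U+10FFFF. It follows the well-formed byte sequence ranges published in later versions of the Unicode Standard rather than the Unicode 3.1 text; `is_strict_utf8` is a hypothetical name.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 with no surrogates or overlongs."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:
            i += 1
            continue
        # Pick the sequence length and the allowed range of the second byte.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF   # excludes overlong 3-byte forms
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F   # excludes surrogates U+D800..U+DFFF
        elif 0xE1 <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF   # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F   # stops at U+10FFFF
        else:
            return False                     # 80..C1 or F5..FF as a lead byte
        if i + length > n or not lo <= data[i + 1] <= hi:
            return False
        # Any remaining bytes must be plain continuation bytes.
        if any(not 0x80 <= data[j] <= 0xBF for j in range(i + 2, i + length)):
            return False
        i += length
    return True
```

A parser made "adaptive" for CESU-8 would in effect widen the `0xED` row back to `0x80..0xBF`, which is the loosening Doug warns against.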

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, > Also, Toby was not attempting to be deceitful, AFAIK. The > original proposal > he submitted (still called UTF-8S) was not in any way > contradictory but many > people objected to various issues within it and the way many things were > presented. The current proposal was a very rushed a

Re: PDUTR #26 posted

2001-09-17 Thread David Starner
On Mon, Sep 17, 2001 at 08:45:59AM -0700, Michael (michka) Kaplan wrote: > Actually, once its in IANA then it is legal in XML and other places, and > *everyone* will have to support it, whether they want to or not. What is I think that's a little excessive. UTF-1, NATS-DANO, GOST_19768-74 and SCS

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: "Carl W. Brown" <[EMAIL PROTECTED]> > In actuality it would be difficult for IANA to deny a character set for any > "official" character set so the decision is actually up to the Unicode > committee. I concur. > I don't believe that the idea of registering CESU-8 with IANA came from the >

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: "Mark Davis" <[EMAIL PROTECTED]> > - A significant reason for CESU-8 garnering enough support was that its > introduction allows the definition of UTF-8 itself to be tightened, to > formally exclude the 3-byte surrogates both in reading and writing. I do not see this as a valid argument at

Re: PDUTR #26 posted

2001-09-17 Thread Mark Davis
Μαργίτῃ [http://www.macchiato.com] ----- Original Message ----- From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, September 17, 2001 8:45 AM Subject: Re: PDUT

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, > Actually, once its in IANA then it is legal in XML and other places, and > *everyone* will have to support it, whether they want to or not. What is > supposedly private will become quite public. IANA, after all, > does not have > charsets that they register for people to "not use" and n

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: <[EMAIL PROTECTED]> > If Michka is referring to non-compliant CESU-8 parsers, I really > wouldn't care much because CESU-8 is supposed to live in its > own little private world. But if people start compromising their > UTF-8 parsers to accommodate CESU-8 "adaptively," it would > be a great

Re: PDUTR #26 posted

2001-09-17 Thread DougEwell2
In a message dated 2001-09-17 4:25:47 Pacific Daylight Time, [EMAIL PROTECTED] writes: >> How should an UTF-8 application behave if it accidentally receives >> a CESU-8 surrogate sequence? How does an application which >> relies on CESU-8 binary sorting behave if it accidentally receives an >>
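One possible answer to the first question, sketched in Python: a UTF-8 consumer can at least locate 3-byte surrogate sequences (0xED followed by 0xA0..0xBF and a continuation byte) and reject or report them rather than silently decoding them. Because 0xED can never be a continuation byte, a simple linear scan suffices. Both the policy and the name `find_cesu8_surrogates` are assumptions; the thread did not settle on either.

```python
from __future__ import annotations


def find_cesu8_surrogates(data: bytes) -> list[int]:
    """Return offsets of 3-byte surrogate sequences (ED A0..BF 80..BF)."""
    hits = []
    for i in range(len(data) - 2):
        if (data[i] == 0xED and 0xA0 <= data[i + 1] <= 0xBF
                and 0x80 <= data[i + 2] <= 0xBF):
            hits.append(i)
    return hits


# "a", then U+10000 in CESU-8 form (two surrogate sequences), then "b".
print(find_cesu8_surrogates(b"a\xed\xa0\x80\xed\xb0\x80b"))
```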

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: "Marco Cimarosti" <[EMAIL PROTECTED]> > Does renaming "UTF-8S" to "CESU-8" fix all the issues that were > discussed on this mailing list at the beginning of last spring? In my opinion (and the opinion of some others), no. But they do represent the *attempt* to answer them. > Specifically:

RE: PDUTR #26 posted

2001-09-17 Thread Marco Cimarosti
Julie Doll Allen wrote: > Proposed Draft Unicode Technical Report #26: Compatibility Encoding > Scheme for UTF-16: 8-Bit (CESU-8) is now available at: > http://www.unicode.org/unicode/reports/tr26/ Does renaming "UTF-8S" to "CESU-8" fix all the issues that were discussed on this mailing list at t

RE: Anti-UTF-16 Rant (was: Re: PDUTR #26 posted)

2001-09-14 Thread Carl W. Brown
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Kenneth Whistler > Sent: Friday, September 14, 2001 10:43 AM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Anti-UTF-16 Rant (was: Re: PDUTR #26 posted) > > > Marcin, > > > IMHO Unicode wou

Anti-UTF-16 Rant (was: Re: PDUTR #26 posted)

2001-09-14 Thread Kenneth Whistler
Marcin, > IMHO Unicode would have been a better standard if UTF-16 > hadn't existed. I won't repeat Asmus' rebuttal, which adequately addresses this claim. However, if people want to go off on the "UTF-16 sucks, UTF-8/UTF-32, the way Linux likes it, is far better, and besides, Windows sucks" t

Re: PDUTR #26 posted

2001-09-14 Thread Michael (michka) Kaplan
From: "Ayers, Mike" <[EMAIL PROTECTED]> > Not in the best mood, am I? Well, you did forget the all important "My encoding is better than your encoding!" at the end. :-) MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/

RE: PDUTR #26 posted

2001-09-14 Thread Ayers, Mike
> From: Marcin 'Qrczak' Kowalczyk [mailto:[EMAIL PROTECTED]] > Sent: Friday, September 14, 2001 02:11 AM > Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag > <[EMAIL PROTECTED]> pisze: > > > UTF-32 does have the same byte order issues as UTF-16, except that > > byte order is recognizable withou

Re: PDUTR #26 posted

2001-09-14 Thread Igor Bukanov
Michael (michka) Kaplan wrote: > Actually, most "internal" mechanisms do not need cch calculations; for them > the count of code points is fine. Since you do not work much with platforms > that use it, I guess you would not have run across this fact? IMO if you need just code points which are no
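The distinction argued here, UTF-16 code units versus code points, comes down to whether a surrogate pair counts as one item or two. A minimal Python sketch with hypothetical names; counting an unpaired surrogate as one code point is one choice among several.

```python
from __future__ import annotations


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s as integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def count_code_points(units: list[int]) -> int:
    """Count code points in UTF-16; a valid surrogate pair counts once."""
    count, i = 0, 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2   # high + low surrogate: one supplementary code point
        else:
            i += 1   # BMP code point, or an unpaired surrogate on its own
        count += 1
    return count


units = utf16_units("a\U00010000b")
print(len(units), count_code_points(units))   # code units vs. code points
```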

Re: PDUTR #26 posted

2001-09-14 Thread Michael (michka) Kaplan
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> > Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]> > > utf-8 cannot as readily be used as internal format. > > It's as easy as UTF-16. Unless you want a broken implementation which > treats surrogates as pairs of characters.

Re: PDUTR #26 posted

2001-09-14 Thread Marcin 'Qrczak' Kowalczyk
Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]> pisze: > UTF-32 does have the same byte order issues as UTF-16, except that > byte order is recognizable without a BOM. UTF-8 would be used for external communication almost exclusively. Especially as it's compatible with ASCII a
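The remark that UTF-32 byte order is recognizable without a BOM rests on every valid value being at most 0x10FFFF, so reading the data in the wrong byte order usually produces out-of-range values. A heuristic Python sketch (hypothetical name `guess_utf32_byte_order`); it returns `None` when both orders are plausible, for example for text consisting only of U+0000.

```python
from typing import Optional


def _valid_scalar(v: int) -> bool:
    """True for Unicode scalar values: <= 0x10FFFF and not a surrogate."""
    return v <= 0x10FFFF and not 0xD800 <= v <= 0xDFFF


def guess_utf32_byte_order(data: bytes) -> Optional[str]:
    """Guess 'big' or 'little' for BOM-less UTF-32; None if ambiguous or invalid."""
    if not data or len(data) % 4:
        return None
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    be_ok = all(_valid_scalar(int.from_bytes(c, "big")) for c in chunks)
    le_ok = all(_valid_scalar(int.from_bytes(c, "little")) for c in chunks)
    if be_ok and not le_ok:
        return "big"
    if le_ok and not be_ok:
        return "little"
    return None


print(guess_utf32_byte_order("Hi".encode("utf-32-be")))
print(guess_utf32_byte_order("Hi".encode("utf-32-le")))
```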

Re: PDUTR #26 posted

2001-09-14 Thread Marcin 'Qrczak' Kowalczyk
Two things I forgot to add: Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]> pisze: >>IMHO Unicode would have been a better standard if UTF-16 >>hadn't existed. > > Decidedly not. In fact, Unicode would not be widely implemented today. It's much simpler to migrate from byte e

Re: PDUTR #26 posted

2001-09-13 Thread Asmus Freytag
At 11:42 AM 9/13/01 +0000, Marcin 'Qrczak' Kowalczyk wrote: >IMHO Unicode would have been a better standard if UTF-16 >hadn't existed. Decidedly not. In fact, Unicode would not be widely implemented today. >Just UTF-8 and UTF-32, code points in the range >U+0000..7FFFFFFF, no surrogates, no conf
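For comparison with the proposal quoted here: UTF-8 as originally defined for the 31-bit UCS-4 space (RFC 2279) used up to six bytes per code point, whereas Unicode's ceiling of U+10FFFF never requires more than four. A small Python sketch of the length rule; the function name is hypothetical.

```python
def utf8_sequence_length(cp: int) -> int:
    """Bytes needed for cp in the original 31-bit UTF-8 (up to 6 bytes)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside U+0000..U+7FFFFFFF")
    # Each step up adds one byte and 5 more payload bits (6 for the first step).
    for length, limit in ((1, 0x80), (2, 0x800), (3, 0x10000),
                          (4, 0x200000), (5, 0x4000000), (6, 0x80000000)):
        if cp < limit:
            return length
    raise AssertionError("unreachable")


print(utf8_sequence_length(0x10FFFF), utf8_sequence_length(0x7FFFFFFF))
```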

Re: PDUTR #26 posted

2001-09-13 Thread Marcin 'Qrczak' Kowalczyk
Wed, 12 Sep 2001 11:08:41 -0700, Julie Doll Allen <[EMAIL PROTECTED]> pisze: > Proposed Draft Unicode Technical Report #26: Compatibility Encoding > Scheme for UTF-16: 8-Bit (CESU-8) is now available at: > http://www.unicode.org/unicode/reports/tr26/ IMHO Unicode would have been a better standar