> "Carl W. Brown" wrote:
>> "Michael (michka) Kaplan", Mon, 2001-09-17 12:07:19 -0700 wrote:
>>> "Carl W. Brown" wrote:
>>> It would seem to be that if you either have to change the UTF-8 code to
>>> support CESU-8 or change the UTF-16 compare logic then changing the UTF-16
>>> logic to do code
From: "Ayers, Mike" <[EMAIL PROTECTED]>
> > From: John Cowan [mailto:[EMAIL PROTECTED]]
> > [EMAIL PROTECTED] scripsit:
> > > Oops! One of two "Unicode 101" mistakes I made in the
> > > same day. Where was my brain?
> > Unicode Ate Your Brain, of course! (See my tutorial at
> > Orlando thi
> From: John Cowan [mailto:[EMAIL PROTECTED]]
>
> [EMAIL PROTECTED] scripsit:
>
> > Oops! One of two "Unicode 101" mistakes I made in the same
> > day. Where was my brain?
>
> Unicode Ate Your Brain, of course! (See my tutorial at
> Orlando this year.)
Nah, UTF ate it!
[EMAIL PROTECTED] scripsit:
> Oops! One of two "Unicode 101" mistakes I made in the same day. Where was
> my brain?
Unicode Ate Your Brain, of course! (See my tutorial at Orlando this year.)
--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
Please leave your
In a message dated 2001-09-18 9:22:17 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> Doug Ewell wrote:
>> All Unicode code points of the form U+FE and U+FF are
>> special, in
>> that they are non-characters and can be treated in a special way by
>> applications (e.g. as sentinels
David Hopwood and Carl Brown graciously corrected me:
>> I don't agree that irregular UTF-8 sequences in general can only decode to
>> characters above 0xFFFF.
>
> That's why I specifically referred to irregular sequences as defined by
> Unicode 3.1 (i.e. UAX #27).
I stand corrected. That's w
Doug Ewell wrote:
> All Unicode code points of the form U+FE and U+FF are
> special, in
> that they are non-characters and can be treated in a special way by
> applications (e.g. as sentinels).
I think this should be "All Unicode code points of the form U+xxFFFE and
U+xxFFFF are specia
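As a rough illustration of the corrected pattern (not code from the thread), the check below treats a code point as a plane-final noncharacter when its low 16 bits are FFFE or FFFF, i.e. U+xxFFFE or U+xxFFFF in any of the 17 planes. The function name is made up for this sketch, and it deliberately ignores the separate noncharacter block U+FDD0..U+FDEF.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True for U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Clearing bit 0 maps both xxFFFE and xxFFFF onto xxFFFE; then look at the low 16 bits.
    return (cp & 0xFFFE) == 0xFFFE

# U+00FE (thorn) is an ordinary letter, which is why "U+xxFFFE" matters.
for cp in (0x00FE, 0xFFFD, 0xFFFE, 0xFFFF, 0x1FFFE, 0x10FFFF):
    print(f"U+{cp:04X}", is_plane_end_noncharacter(cp))
```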
-----BEGIN PGP SIGNED MESSAGE-----
[EMAIL PROTECTED] wrote:
> In a message dated 2001-09-17 16:24:05 Pacific Daylight Time,
> [EMAIL PROTECTED] writes:
>
> > It doesn't reopen that specific type of security hole, because irregular
> > UTF-8 sequences (as defined by Unicode 3.1) can only decode t
Doug,
>
> It is true that the *specific* irregular UTF-8 sequences introduced (and
> required) by CESU-8 decode to characters above 0xFFFF when interpreted as
> CESU-8, and to pairs of surrogate code points when (incorrectly)
> interpreted
> as UTF-8. Since definition D29, arguably my least favo
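To make the quoted point concrete, here is a minimal CESU-8 encoder sketch, assuming CESU-8 simply applies the UTF-8 bit patterns to each UTF-16 code unit. The function name `cesu8_encode` is hypothetical. A character above U+FFFF becomes two 3-byte sequences (6 bytes instead of UTF-8's 4), and reading those bytes leniently as UTF-8 yields two surrogate code points rather than one character.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical sketch: encode each UTF-16 code unit with UTF-8 bit patterns."""
    units = text.encode("utf-16-be")  # supplementary characters become surrogate pairs
    out = bytearray()
    for i in range(0, len(units), 2):
        u = (units[i] << 8) | units[i + 1]
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Each surrogate code unit gets its own 3-byte sequence.
            out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, above U+FFFF
print(cesu8_encode(clef).hex(" "))   # 6 bytes
print(clef.encode("utf-8").hex(" "))  # 4 bytes
# Python's "surrogatepass" handler lets the UTF-8 codec accept encoded surrogates,
# which exposes the two separate surrogate code points.
print([hex(ord(c)) for c in cesu8_encode(clef).decode("utf-8", "surrogatepass")])
```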
In a message dated 2001-09-17 16:24:05 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> It doesn't reopen that specific type of security hole, because irregular
> UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above
> 0xFFFF, and those characters are unlikely to be
On Sun, Sep 16, 2001 at 09:28:34PM +0100, David Hopwood wrote:
> It doesn't reopen that specific type of security hole, because irregular
> UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above
> 0xFFFF, and those characters are unlikely to be "special" for any applicatio
-----BEGIN PGP SIGNED MESSAGE-----
"Carl W. Brown" wrote:
> Doug,
> > But if people start compromising their UTF-8 parsers to accommodate
> > CESU-8 "adaptively," it would be a great blow to UTF-8. It would
> > essentially undo all the tightening-up that was accomplished by the
> > Corrigendum,
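For contrast with an "adaptive" parser, here is a sketch of a strict UTF-8 validator that refuses CESU-8 style surrogate sequences along with overlong forms. It is not from the thread: Unicode 3.1 still classed the surrogate sequences as "irregular", and the byte ranges below follow the stricter well-formed byte sequence table of later versions of the standard. The function name is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical strict check: no overlongs, no encoded surrogates, nothing above U+10FFFF."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1
            continue
        # Choose the sequence length and the allowed range for the SECOND byte.
        if 0xC2 <= lead <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif lead == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects U+D800..U+DFFF (CESU-8 surrogates)
        elif 0xE1 <= lead <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= lead <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif lead == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects values above U+10FFFF
        else:
            return False  # C0, C1, F5..FF, or a stray continuation byte
        if i + length > n or not lo <= data[i + 1] <= hi:
            return False
        if any(not 0x80 <= b <= 0xBF for b in data[i + 2:i + length]):
            return False
        i += length
    return True

print(is_well_formed_utf8(b"\xf0\x9d\x84\x9e"))          # UTF-8 for U+1D11E
print(is_well_formed_utf8(b"\xed\xa0\xb4\xed\xb4\x9e"))  # CESU-8 for U+1D11E
print(is_well_formed_utf8(b"\xc0\xaf"))                  # overlong "/"
```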
-----BEGIN PGP SIGNED MESSAGE-----
Mark Davis wrote:
> A few notes:
>
> - IANA is a registry. I believe the only legitimate grounds that they have
> for denying a registration is that it is incompletely specified or has a
> misleading name.
No, that's not correct. The relevant document defining
One technical nit:
The document says:
2.1 c. The bit pattern 11110xxx is illegal in any CESU-8 byte, ...
In fact, this should say "The bit patterns 1111xxxx are illegal in ..."
The changes are subtle: one '0' replaced by 'x' - you want to forbid all bytes >=0xf0
(f0..ff), not just f0..f7.
(The
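The nit above comes down to which bit mask is tested. In this sketch (helper names are hypothetical), the pattern 1111xxxx matches every byte 0xF0..0xFF, while 11110xxx matches only 0xF0..0xF7; a CESU-8 byte string should contain no byte matching the wider pattern.

```python
def matches_1111xxxx(b: int) -> bool:
    """Top four bits set: bytes 0xF0..0xFF."""
    return (b & 0xF0) == 0xF0

def matches_11110xxx(b: int) -> bool:
    """Top five bits 11110: bytes 0xF0..0xF7 only."""
    return (b & 0xF8) == 0xF0

def has_forbidden_cesu8_byte(data: bytes) -> bool:
    """Necessary (not sufficient) check: CESU-8 never uses bytes 0xF0..0xFF."""
    return any(matches_1111xxxx(b) for b in data)

print(hex(0xF8), matches_1111xxxx(0xF8), matches_11110xxx(0xF8))
print(has_forbidden_cesu8_byte(b"\xf0\x90\x80\x80"))  # 4-byte UTF-8 form
```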
Mark,
> - Just because it is in IANA does *not* mean that everyone will
> support it.
> There are many encodings in IANA supported by very few people. Nor does it
> mean that it is intended for widespread public use. The IANA registry is
> also used as a general purpose registry, even for encodi
From: "Carl W. Brown" <[EMAIL PROTECTED]>
> It would seem to be that if you either have to change the UTF-8 code to
> support CESU-8 or change the UTF-16 compare logic then changing the UTF-16
> logic to do code point order compares is a much more containable change with
> a much lower processing
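One way to do a code point order compare directly on UTF-16, sketched here as an illustration rather than anything proposed in the thread: at the first differing code unit, rotate the values so that surrogates (D800..DFFF) rank above E000..FFFF, then compare. The rotation constants are one common choice; the function names are hypothetical.

```python
from typing import List, Sequence

def utf16_units(text: str) -> List[int]:
    """UTF-16 code units of a Python string (supplementary characters become pairs)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

def _rotate(unit: int) -> int:
    """Map E000..FFFF down to D800..F7FF and D800..DFFF up to F800..FFFF."""
    if unit >= 0xE000:
        return unit - 0x800
    if unit >= 0xD800:
        return unit + 0x2000
    return unit

def compare_code_point_order(a: Sequence[int], b: Sequence[int]) -> int:
    """Compare two UTF-16 code unit sequences in code point order; returns -1, 0 or 1."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if _rotate(x) < _rotate(y) else 1
    return (len(a) > len(b)) - (len(a) < len(b))

small, big = "\uff61", "\U00010000"
print(utf16_units(small) > utf16_units(big))                         # raw code unit order
print(compare_code_point_order(utf16_units(small), utf16_units(big)))  # code point order
```

Only the first differing unit is adjusted, so the loop stays a plain binary compare in the common case.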
From: "John Cowan" <[EMAIL PROTECTED]>
> False.
>
> IANA's registry is merely de facto: what they register is not in fact
> encodings, but *names* of encodings. The charset name "ISO646-DE" is
> legal as an XML encoding, but it would astonish me if any extant
> XML parser supports it. (This is
Doug,
> But if people start compromising their UTF-8 parsers to accommodate CESU-8
> "adaptively," it would be a great blow to UTF-8. It would essentially undo
> all the tightening-up that was accomplished by the Corrigendum, and it would
> revive all the old Bruce Schneier-style skeptici
MichKa,
> Also, Toby was not attempting to be deceitful, AFAIK. The original proposal
> he submitted (still called UTF-8S) was not in any way contradictory but many
> people objected to various issues within it and the way many things were
> presented. The current proposal was a very rushed a
On Mon, Sep 17, 2001 at 08:45:59AM -0700, Michael (michka) Kaplan wrote:
> Actually, once it's in IANA then it is legal in XML and other places, and
> *everyone* will have to support it, whether they want to or not. What is
I think that's a little excessive. UTF-1, NATS-DANO, GOST_19768-74 and
SCS
From: "Carl W. Brown" <[EMAIL PROTECTED]>
> In actuality it would be difficult for IANA to deny a character set for any
> "official" character set so the decision is actually up to the Unicode
> committee.
I concur.
> I don't believe that the idea of registering CESU-8 with IANA came from the
>
From: "Mark Davis" <[EMAIL PROTECTED]>
> - A significant reason for CESU-8 garnering enough support was that its
> introduction allows the definition of UTF-8 itself to be tightened, to
> formally exclude the 3-byte surrogates both in reading and writing.
I do not see this as a valid argument at
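The "writing" half of the tightening quoted above can be sketched as an encoder that refuses surrogate code points outright instead of emitting 3-byte surrogate sequences. This is an illustrative helper with a hypothetical name, not an implementation from the thread.

```python
def encode_utf8_strict(cp: int) -> bytes:
    """Hypothetical sketch: encode one code point as UTF-8, refusing surrogates."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

print(encode_utf8_strict(0x1D11E).hex(" "))
try:
    encode_utf8_strict(0xD834)  # a lone high surrogate
except ValueError as err:
    print(err)
```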
Μαργίτῃ
[http://www.macchiato.com]
----- Original Message -----
From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, September 17, 2001 8:45 AM
Subject: Re: PDUT
MichKa,
> Actually, once it's in IANA then it is legal in XML and other places, and
> *everyone* will have to support it, whether they want to or not. What is
> supposedly private will become quite public. IANA, after all, does not have
> charsets that they register for people to "not use" and n
From: <[EMAIL PROTECTED]>
> If Michka is referring to non-compliant CESU-8 parsers, I really
> wouldn't care much because CESU-8 is supposed to live in its
> own little private world. But if people start compromising their
> UTF-8 parsers to accommodate CESU-8 "adaptively," it would be a great
In a message dated 2001-09-17 4:25:47 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
>> How should a UTF-8 application behave if it accidentally receives
>> a CESU-8 surrogate sequence? How does an application which
>> relies on CESU-8 binary sorting behave if it accidentally receives an
>>
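The "CESU-8 binary sorting" mentioned above refers to the fact that byte-wise comparison of CESU-8 reproduces UTF-16 code unit order, while byte-wise UTF-8 reproduces code point order. The sketch below shows the two orders disagreeing for U+FF61 versus U+10000; `cesu8_encode` is a hypothetical minimal encoder that assumes CESU-8 applies UTF-8 bit patterns to each UTF-16 code unit.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical sketch: UTF-8 bit patterns applied to each UTF-16 code unit."""
    units = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(units), 2):
        u = (units[i] << 8) | units[i + 1]
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)

words = ["\uff61", "\U00010000"]  # HALFWIDTH IDEOGRAPHIC FULL STOP, LINEAR B SYLLABLE B008 A
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))       # code point order
by_utf16 = sorted(words, key=lambda s: s.encode("utf-16-be"))  # code unit order
by_cesu8 = sorted(words, key=cesu8_encode)
print([f"U+{ord(w):04X}" for w in by_utf8])
print([f"U+{ord(w):04X}" for w in by_cesu8], by_cesu8 == by_utf16)
```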
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
> Does renaming "UTF-8S" to "CESU-8" fix all the issues that were
> discussed on this mailing list at the beginning of last spring?
In my opinion (and the opinion of some others), no. But they do represent
the *attempt* to answer them.
> Specifically:
Julie Doll Allen wrote:
> Proposed Draft Unicode Technical Report #26: Compatibility Encoding
> Scheme for UTF-16: 8-Bit (CESU-8) is now available at:
> http://www.unicode.org/unicode/reports/tr26/
Does renaming "UTF-8S" to "CESU-8" fix all the issues that were discussed on
this mailing list at t
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Kenneth Whistler
> Sent: Friday, September 14, 2001 10:43 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Anti-UTF-16 Rant (was: Re: PDUTR #26 posted)
>
>
> Marcin,
>
> > IMHO Unicode wou
Marcin,
> IMHO Unicode would have been a better standard if UTF-16
> hadn't existed.
I won't repeat Asmus' rebuttal, which adequately addresses this
claim.
However, if people want to go off on the "UTF-16 sucks,
UTF-8/UTF-32, the way Linux likes it, is far better, and
besides, Windows sucks" t
From: "Ayers, Mike" <[EMAIL PROTECTED]>
> Not in the best mood, am I?
Well, you did forget the all important "My encoding is better than your
encoding!" at the end. :-)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
> From: Marcin 'Qrczak' Kowalczyk [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 14, 2001 02:11 AM
> Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag
> <[EMAIL PROTECTED]> pisze:
>
> > UTF-32 does have the same byte order issues as UTF-16, except that
> > byte order is recognizable withou
Michael (michka) Kaplan wrote:
> Actually, most "internal" mechanisms do not need cch calculations; for them
> the count of code points is fine. Since you do not work much with platforms
> that use it, I guess you would not have run across this fact?
IMO if you need just code points which are no
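To illustrate the difference being argued about, assuming "cch" here means the Win32-style count of UTF-16 code units (WCHARs): the two counts only diverge for characters above U+FFFF. In Python, `len()` on a `str` counts code points; the helper name below is hypothetical.

```python
def utf16_code_unit_count(text: str) -> int:
    """Number of UTF-16 code units (each is 2 bytes in UTF-16LE without a BOM)."""
    return len(text.encode("utf-16-le")) // 2

s = "G clef: \U0001D11E"
print(len(s), utf16_code_unit_count(s))  # code points vs UTF-16 code units
```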
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]>
> > utf-8 cannot as readily be used as internal format.
>
> It's as easy as UTF-16. Unless you want a broken implementation which
> treats surrogates as pairs of characters.
Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]> pisze:
> UTF-32 does have the same byte order issues as UTF-16, except that
> byte order is recognizable without a BOM.
UTF-8 would be used for external communication almost exclusively.
Especially as it's compatible with ASCII a
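The point quoted above about UTF-32 byte order being recognizable without a BOM rests on the fact that no valid value exceeds 0x10FFFF, so a wrong byte order usually produces out-of-range values. This heuristic sketch (hypothetical names) tries both orders and reports one only when exactly one of them yields valid scalar values throughout.

```python
from typing import Optional

def _is_scalar_value(v: int) -> bool:
    """A Unicode scalar value: at most 0x10FFFF and not a surrogate."""
    return v <= 0x10FFFF and not 0xD800 <= v <= 0xDFFF

def guess_utf32_byte_order(data: bytes) -> Optional[str]:
    """Return "big" or "little" if only one byte order is plausible, else None."""
    if len(data) % 4 != 0:
        return None
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    big_ok = all(_is_scalar_value(int.from_bytes(c, "big")) for c in chunks)
    little_ok = all(_is_scalar_value(int.from_bytes(c, "little")) for c in chunks)
    if big_ok != little_ok:
        return "big" if big_ok else "little"
    return None  # ambiguous (e.g. empty input) or not UTF-32 at all

print(guess_utf32_byte_order("Hello".encode("utf-32-be")))
print(guess_utf32_byte_order("Hello".encode("utf-32-le")))
```

Very short inputs can still be ambiguous: the bytes 00 00 01 00 read as U+0100 big-endian and as U+10000 little-endian.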
Two things I forgot to add:
Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag <[EMAIL PROTECTED]> pisze:
>>IMHO Unicode would have been a better standard if UTF-16
>>hadn't existed.
>
> Decidedly not. In fact, Unicode would not be widely implemented today.
It's much simpler to migrate from byte e
At 11:42 AM 9/13/01 +, Marcin 'Qrczak' Kowalczyk wrote:
>IMHO Unicode would have been a better standard if UTF-16
>hadn't existed.
Decidedly not. In fact, Unicode would not be widely implemented today.
>Just UTF-8 and UTF-32, code points in the range
>U+0000..7FFFFFFF, no surrogates, no conf
Wed, 12 Sep 2001 11:08:41 -0700, Julie Doll Allen <[EMAIL PROTECTED]> pisze:
> Proposed Draft Unicode Technical Report #26: Compatibility Encoding
> Scheme for UTF-16: 8-Bit (CESU-8) is now available at:
> http://www.unicode.org/unicode/reports/tr26/
IMHO Unicode would have been a better standar