RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

てんどうりゅうじ Tue, 29 May 2001 18:44:50 -0700

You can just say, "Screw the number 8, let's use 21-bit bytes."


★じゅういっちゃん★

EKYWY TXLY NPZ P MPVD XPHYV LPWWQY
NKT ZPN XT WYPZTX PE PMM ET HPWWD
"EYX EKTSZPXV'Z HTWY GSX
P XSHOYW EKPX TXY
PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD"


--- Original Message ---
From: "Carl W. Brown" <[EMAIL PROTECTED]>;
To: [EMAIL PROTECTED];
Cc: 
Date: 01/05/30 0:46
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

>Ken,
>
>I suspect that Oracle is specifically pushing for this standard because of
>its unique database design.  In a sense Oracle almost picks itself up by
>its own bootstraps.  It has always tried to minimize actual code.  Therefore
>it was a natural choice to implement Unicode with UTF-8 because it is easy
>to reuse the multibyte support with minor changes to handle a different
>character length algorithm.  This has been one of the reasons that Oracle
>has been successful.  Its Tinkertoy-like design has enabled them to quickly
>adapt and add new features.  Now, however, they should take the time to "do
>it right".  Its UTF-8 storage creates problems for database designers
>because they cannot predict field sizes.  This is a problem with MBCS code
>pages but UTF-8s will make it worse.  There will be lots of wasted storage
>when characters can vary in size from 1 to 6 bytes.
>
>Most other database systems require specific code to support Unicode.  As a
>consequence most have implemented using UCS-2.  Their migration is obviously
>to use UTF-16.  UTF-8s buys them nothing but headaches.
>
>Carl
>
>-----Original Message-----
>From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, May 29, 2001 3:47 PM
>To: [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
>email)
>
>
>Carl,
>
>> Ken,
>>
>> UTF-8s is essentially a way to ignore surrogate processing.  It allows a
>> company to encode UTF-16 with UCS-2 logic.
>>
>> The problem is that by not implementing surrogate support you can
>> introduce subtle errors.  For example it is common to break buffers
>> apart into segments.  These segments may be reconcatenated but they may
>> be processed individually.
>
>You are preaching to the choir here. I didn't state that *I* was in
>favor of UTF-8S -- only that we have to be careful not to assume that
>UTC will obviously not support it. The proponents of UTF-8S are
>vigorously and actively campaigning for their proposal. In
>standardization committees, proposals that have committed, active
>proponents who can aim for the long haul, often have a way of getting
>adopted in one form or another, unless there are equally committed
>and active opponents of the proposal. It is just the nature of
>consensus politicking in these committees, whether corporate based
>or national body based.
>
>Also, I consider the stated position of "near-universal agreement
>among the database vendors" to be largely a rhetorical device by
>the proponents. Oracle is clearly pushing the proposal. NCR has
>stated it is not in favor of the proposal. The other big enterprise
>database vendors are hedging their positions somewhat -- in
>particular, the standards people in those companies may not be
>entirely in agreement with some of their database engine developers, for
>example. And the small database vendors are either not playing
>in this space or are part of desktop systems that will just follow
>the behavior of the platforms.
>
>--Ken
>
>
>
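
For readers following the quoted discussion, here is a rough sketch of the two
technical points Carl raises: the 6-byte worst case and the split-buffer
problem. It assumes UTF-8S means "encode every UTF-16 code unit, surrogates
included, as its own 1 to 3 byte sequence", which is how the thread describes
it. The function names (utf8s_encode, utf16_units, splits_surrogate_pair) are
made up for illustration and are not from any vendor's code.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf8s_encode(text: str) -> bytes:
    """Hypothetical UTF-8S encoder: each UTF-16 code unit is encoded
    separately, so a surrogate pair becomes two 3-byte sequences."""
    out = bytearray()
    for unit in utf16_units(text):
        if unit < 0x80:
            out.append(unit)
        elif unit < 0x800:
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Surrogates (0xD800-0xDFFF) land here, as in UCS-2 logic.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def splits_surrogate_pair(units: list[int], index: int) -> bool:
    """True if cutting the buffer just before units[index] would separate
    a high surrogate from the low surrogate that follows it."""
    if index <= 0 or index >= len(units):
        return False
    high = 0xD800 <= units[index - 1] <= 0xDBFF
    low = 0xDC00 <= units[index] <= 0xDFFF
    return high and low


supplementary = "\U00010000"                   # first character beyond the BMP
print(len(supplementary.encode("utf-8")))      # 4
print(len(utf8s_encode(supplementary)))        # 6

units = utf16_units("a\U00010000b")
print(splits_surrogate_pair(units, 2))         # True: the cut falls inside the pair
```

For BMP-only text the two encodings give identical bytes, so the difference
only shows up once supplementary characters appear. Code that segments
buffers with UCS-2 logic never performs a check like splits_surrogate_pair,
which is where the subtle errors mentioned above can creep in.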
