UTF8 is not UTF-8 (was Re: UTF8 vs AL32UTF8)

Edward Cherlin Sat, 09 Jun 2001 20:13:08 -0700
OK, the bottom line here is that Oracle goofed in implementing UTF-8, 
and instead of fixing the mistake, either by renaming their 
proprietary format or getting the data converted to the correct form, 
they want to pass the error off on us, thus making things even worse.

I have a suggestion.

Say, "Oops!" loudly and publicly.

Name UTF-8 correctly in the next release, and rename UTF8 to 
something that shows its proprietary nature.

Tell your customers that "UTF8" is not UTF-8, even if it sorts 
quicker, and must be converted whenever it is exported.

Offer to convert any "UTF8" data to UTF-8 for free.

Drop the idea of getting a Unicode standard around your error. If you 
and other database providers want to create such a standard, do it 
yourselves, and jolly good luck to you! (You'll need it.)


I have an alternative suggestion.

We all know that the internal formats in a database are frequently 
different from those presented to users. So you can keep your format 
as long as you *always* convert it to UTF-8 externally. In this case, 
too, you can leave us out of it.
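
A minimal sketch of that boundary conversion, in Python. It assumes the "UTF8" format stores each supplementary character as a surrogate pair, with each surrogate written as its own three-byte sequence (the form discussed on this list as UTF-8S). The function name is hypothetical and this is not anything Oracle ships.

```python
def utf8s_to_utf8(data: bytes) -> bytes:
    """Convert surrogate-pair-style "UTF8" bytes to standard UTF-8."""
    # 'surrogatepass' lets the decoder accept the three-byte surrogate
    # sequences that a strict UTF-8 decoder rejects.
    text = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so each high/low surrogate pair is joined
    # into a single supplementary code point. The strict UTF-16 decode
    # raises UnicodeDecodeError on an unpaired surrogate instead of
    # letting it slip through.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return text.encode("utf-8")


# U+10000 stored as two three-byte surrogate sequences (six bytes).
internal = b"\xed\xa0\x80\xed\xb0\x80"
# The same character as standard UTF-8 (four bytes).
external = utf8s_to_utf8(internal)
```

Data that is already legal UTF-8 passes through unchanged, so a conversion like this could run on every export without first asking which form the bytes are in.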

At 3:56 PM -0700 6/8/01, Jianping Yang wrote:
>Carl,
>
>"Carl W. Brown" wrote:
>  > Looking at your documentation you call UTF-8s UTF8 and standard UTF-8
>  > AL32UTF8.  To me this is very misleading.
>
>We clearly documented what character set definition for UTF8 and 
>AL32UTF8 in our manual. If you look at them

Oh, you have customers who read and understand the documentation? Can 
I have some of them?

>you should easy map UTF8 to UTF-8S and AL32UTF8 to UTF-8.

You refuse to abide by the Unicode standard, and you want *us* to 
"fix" your problem?!? I don't think so.

It has been made plain on this list that you have technical answers 
to your desiderata (which are *not* requirements). You claim that you 
can save processing time by storing data in a non-standard format and 
lying about it to your customers. I claim that any such savings will 
be lost many times over due to errors in identifying the encodings 
and to otherwise unnecessary conversions.
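
To make the claimed saving concrete, here is a sketch in Python. It assumes the point of the surrogate-pair format is that a plain byte-wise sort gives the same order as a UTF-16 code-unit sort. The helper name and the two sample characters are chosen for illustration only.

```python
import struct


def utf8s_encode(text: str) -> bytes:
    """Encode text in the surrogate-pair style (hypothetical helper)."""
    # Split the text into 16-bit UTF-16 code units, surrogates included.
    raw = text.encode("utf-16-be")
    units = struct.unpack(f">{len(raw) // 2}H", raw)
    # Write each code unit as if it were a code point of its own.
    return "".join(map(chr, units)).encode("utf-8", "surrogatepass")


# One BMP character above the surrogate range, one supplementary character.
samples = ["\uff5e", "\U00010000"]

by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))
by_utf8s = sorted(samples, key=utf8s_encode)
```

Here `by_utf8s` agrees with `by_utf16`, while `by_utf8` puts the supplementary character last. The two orders differ only where supplementary characters meet U+E000..U+FFFF, and the bytes that buy this agreement are still rejected by a strict UTF-8 decoder.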

Now if you want to store data in your format, but always pass it 
around in legal UTF-8, you get your internal performance benefit 
without bugging any of us. I don't know of any way to handle 
transfers between databases other than asking what the encodings are 
at source and destination, and doing the conversion if necessary. 
*You* can set up so that databases in your "UTF8" format can exchange 
data directly, although I don't know why you would need to.
-- 

Edward Cherlin, Generalist
"It isn't what you don't know that hurts you, it's what you know for
certain that just ain't so."--Mark Twain, Josh Billings, Edwin Howard
Armstrong, Will Rogers, Satchel Paige (following Thomas Jefferson)