Mark Davis wrote:

> You are correct about the published definitions. As I recall, though, we
> were referring to UTF-FSS as UTF-8 in the UTC meetings before it was changed
> to account for UTF-16.
>
> In any event, I don't know whether Oracle was involved in those discussions
> or not, or whether they introduced their tag "UTF8" before or after the
> definition was changed.

As a matter of fact, Oracle supported UTF-8 well before surrogates and the 4-byte encoding were introduced. As a database vendor, Oracle has taken full advantage of Unicode, and has also been a victim of Unicode where compatibility is concerned. Because a database storing Unicode on its server carries no burden of fonts or IME issues, Oracle supported a very early version of Unicode in its Oracle 7 release, as the database character set AL24UTFFSS, which means a 3-byte encoding for UTF-FSS.

When Unicode reached version 2.1, we found that AL24UTFFSS had trouble with 2.1 because of the Hangul reallocation, and we could not simply update AL24UTFFSS to the 2.1 definition, as that would have corrupted the data already in existing users' databases. So we came up with a new character set, UTF8, which is still a 3-byte encoding, to support Unicode 2.1. The choice of a 3-byte encoding was also tied to the AL24UTFFSS implementation, so that nothing would break when users migrated from AL24UTFFSS to UTF8.

In the 9i release, we could not easily expand UTF8 to 4 bytes, for reasons of backward compatibility. Although we specifically documented in 8i that UTF8 does not support supplementary characters, users can still input surrogates through UCS-2 into a UTF8 database, where each pair is stored as a pair of 3-byte sequences (this is true of other database vendors as well). That makes it hard for us to simply change the UTF8 definition to 4 bytes. If we had made this simple update, a pair of surrogates from an 8i UTF8 database would be stored into 9i UTF8 without character set conversion, resulting in irregular forms in AL32UTF8, which could make migration even harder, as there would be two different versions of UTF8 in a distributed system.
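To make the "pair of 3-byte" point concrete, here is a minimal Python sketch. It is only an illustration, not Oracle's implementation: the function names `encode_3byte_pairs` and `to_standard_utf8` are hypothetical, and U+10400 is simply an arbitrary supplementary code point chosen as an example. The first function encodes each UTF-16 code unit on its own, which is roughly what happens when surrogates arrive through UCS-2; the second recombines such pairs into the standard 4-byte form.

```python
def encode_3byte_pairs(text: str) -> bytes:
    """Encode each UTF-16 code unit separately, so every unit takes at most
    3 bytes and a supplementary character becomes two 3-byte sequences.
    (Hypothetical helper, for illustration only.)"""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the bytes for a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def to_standard_utf8(data: bytes) -> bytes:
    """Recombine 3-byte surrogate pairs into standard 4-byte UTF-8.
    (Hypothetical helper, for illustration only.)"""
    units = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low surrogate pair
    # back into a single supplementary code point.
    joined = units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")
    return joined.encode("utf-8")


ch = "\U00010400"                      # arbitrary supplementary character
pair_form = encode_3byte_pairs(ch)     # two 3-byte sequences, 6 bytes
standard_form = ch.encode("utf-8")     # one 4-byte sequence
print(pair_form.hex(" "))
print(standard_form.hex(" "))
print(to_standard_utf8(pair_form) == standard_form)
```

The two byte strings differ for the same character, which is why storing the pair form unconverted in a character set defined as 4-byte UTF-8 would leave two incompatible representations in circulation.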
So what we did in Oracle 9i was to introduce a new character set, AL32UTF8, for standard UTF-8 with up to 4-byte encoding. Users can easily migrate from UTF8 to AL32UTF8, either as part of a database version migration or in a distributed environment.

People may argue that, since no supplementary characters were defined before Unicode 3.1, it should be fine to simply update UTF8 to support the 4-byte encoding without any compatibility issue. That is not the case, because we cannot force every Oracle customer to migrate their database to 9i, which means there will be a certain period during which Oracle 8i and 9i co-exist. You have to consider their compatibility, and that's the price we have to pay to support Unicode.

Regards,
Jianping.
begin:vcard
n:Yang;Jianping
tel;fax:650-506-7225
tel;work:650-506-4865
x-mozilla-html:FALSE
org:Server Globalization Technology;Server Technology
version:2.1
email;internet:[EMAIL PROTECTED]
title:Senior Development Manager
adr;quoted-printable:;;500 Oracle Parkway=0D=0AM/S 659407;Redwood Shores;CA;94065;
fn:Jianping Yang
end:vcard