[OT] RE: UTF8 vs. Unicode (UTF16) in code

Ienup Sung Thu, 08 Mar 2001 18:18:11 -0800
Well, please ignore my previous email on the coding space. I somehow
read Mr. Ayers as saying that UTF-32 has a coding space of 4G, which
he absolutely didn't mean. I just blame the "super fast reading" class
that I took. ;-)

With regards,

Ienup

 
] Date: Thu, 08 Mar 2001 20:02:39 -0600
] From: "Ayers, Mike" <[EMAIL PROTECTED]>
] Subject: RE: UTF8 vs. Unicode (UTF16) in code
] To: 'Ienup Sung' <[EMAIL PROTECTED]>, Unicode List <[EMAIL PROTECTED]>
] MIME-version: 1.0
] 
] 
]       If you really want to finish the job, there's always UTF-32, which
] should do rather nicely until we meet the space aliens with the
] 4,293,853,186 character alphabet!
] 
] 
] /|/|ike
] 
] P.S.  No, they're not Klingons!
] 
] > From: Ienup Sung [mailto:[EMAIL PROTECTED]]
] > 
] > I think we shouldn't advocate ignoring such characters and the
] > additional planes of UTF-16/32 in Unicode 3.1 just because there will
] > be only 43K CJK characters in the SIP, about 1.6K characters in the
] > SMP, and 97 tag characters in the SSP. Furthermore, when you're doing
] > the first i18n pass on existing programs, you can do the whole thing
] > at once at minor additional cost if you choose to support UTF-16 while
] > you're at it, rather than doing it only for BMP/UCS-2 now and making
] > another round of changes later, although in my opinion that is for
] > each team or company doing the i18n to decide.
] > 
] > And, as we all know, we can no longer claim that UTF-16 is fixed
] > width, since it is now variable width, like UTF-8; in my opinion we
] > will just have to deal with it.
] > 
] > With regards,
] > 
] > Ienup
] > 
] > 
] > ] Date: Fri, 09 Mar 2001 10:48:52 -0800 (PST)
] > ] From: [EMAIL PROTECTED]
] > ] Subject: Re: UTF8 vs. Unicode (UTF16) in code
] > ] X-Sender: [EMAIL PROTECTED]
] > ] To: Ienup Sung <[EMAIL PROTECTED]>
] > ] Cc: Unicode List <[EMAIL PROTECTED]>
] > ] MIME-version: 1.0
] > ] 
] > ] Well....
] > ] 
] > ] Actually, there is a significant difference between being "UTF-8
] > ] ignorant" and "UTF-16 ignorant". A "UTF-16 ignorant" program thinks
] > ] that surrogate pairs are just two characters with undefined
] > ] properties. Since currently there are no characters "up there" this
] > ] isn't a really big deal. Shortly, when Unicode 3.1 is official, there
] > ] will be 40K or so characters in the supplemental planes... but
] > ] they'll be relatively rare.
] > ] 
] > ] In most cases where one has a "character pointer", one is not
] > ] performing casing, line breaking, or other text interpretation that
] > ] requires significant awareness of the meaning of the text. Of course,
] > ] it depends on the instance and the application how true that is ;-).
] > ] But in many cases you *can* ignore the fact that a high- or
] > ] low-surrogate character is really part of something else.
] > ] 
] > ] With UTF-8, however, it is impossible to ignore the multi-byte
] > ] sequences and they can never really be treated as separate
] > ] characters. So I guess all I'm saying is that, depending on what you
] > ] need to do and what level of awareness your application needs to
] > ] achieve, a pure "UCS-2 port" might be a better choice than UTF-8,
] > ] since the specific details overlooked are of a different quality.
] > ] 
] > ] Best Regards,
] > ] 
] > ] Addison
] > ] 
] > ] ===============================================================
] > ] Addison P. Phillips                     Globalization Architect
] > ] webMethods, Inc                       http://www.webmethods.com
] > ] Sunnyvale, CA, USA              mailto:[EMAIL PROTECTED]
] > ] 
] > ] +1 408.210.3569 (mobile)                  +1 408.962.5487 (ofc)
] > ] ===============================================================
] > ] "Internationalization is not a feature. It is an architecture."
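
[Illustration, not part of the original messages] The point made in
this thread about surrogate pairs can be sketched in a few lines of
Python. This is only a rough sketch: the helper names
utf16_code_units and count_code_points, and the sample string, are
assumptions made up for the example, and the counting helper assumes
well-formed UTF-16. It shows that a character outside the BMP (here
U+20000, in the SIP) takes two 16-bit code units in UTF-16, which a
"UTF-16 ignorant" program would count as two characters.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def count_code_points(units):
    """Count code points in well-formed UTF-16 by skipping low surrogates."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


# Hypothetical sample: ASCII "A" followed by U+20000 (a SIP ideograph).
sample = "A\U00020000"
units = utf16_code_units(sample)

print([hex(u) for u in units])       # ['0x41', '0xd840', '0xdc00']
print(len(units))                    # 3 code units ("UTF-16 ignorant" count)
print(count_code_points(units))      # 2 code points (surrogate-aware count)
print(len(sample.encode("utf-8")))   # 5 bytes in UTF-8 (1 + 4)
```

A program that only copies or compares the code units never needs the
second count; one that does casing or line breaking does.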