Frank Yung-Fong Tang writes:
> But how about the UTF-16 vs UCS4 battle?

Forget it: almost nobody uses UCS-4, except purely internally for string processing at the character level. For whole strings, nearly everybody uses UTF-16, because it performs better and costs less memory, and because UCS-4 is not needed.
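To put a rough number on the memory argument, here is a minimal Python sketch that compares the encoded size of the same text in UTF-16 and in UCS-4 (UTF-32). The helper name `encoded_sizes` is only illustrative, and the byte counts exclude any BOM or terminator.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-16 size, UTF-32 size) in bytes, without a BOM."""
    return len(text.encode("utf-16-le")), len(text.encode("utf-32-le"))


# Three BMP characters: 2 bytes each in UTF-16, 4 bytes each in UTF-32.
bmp_sizes = encoded_sizes("\u65e5\u672c\u8a9e")

# One ASCII letter plus one supplementary character (U+10400):
# UTF-16 needs 2 + 4 bytes (one surrogate pair), UTF-32 needs 4 + 4 bytes.
mixed_sizes = encoded_sizes("a\U00010400")
```

For text that stays mostly in the BMP, the UTF-16 form is about half the size; the gap only closes for text made up mainly of supplementary characters.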
Handling surrogates found in strings is quite simple. In fact, they are even simpler to detect and manage than MBCS-encoded strings in 8-bit applications for Asian markets. Today, MBCS 8-bit processing is usually performed by first transforming the text into equivalent internal 16-bit code positions, or sometimes by transcoding it to Unicode with UTF-16.

So I do think that applications that can handle East-Asian DBCS 8-bit text (EUC-*, ISO2022-*, JIS) can very easily be modified to work internally with UTF-16. The interoperability of Unicode code points with these DBCS charsets is excellent: the transcoding is unambiguous and bijective, needs no code reordering, and consists of a simple mapping table that is now implemented in all OSes localized for Asian markets.

East-Asian developers learned long ago how to cope with DBCS-encoded strings. With UTF-16, handling surrogates found in strings is even simpler, because UTF-16 allows bidirectional and random access to any position in a string, which means better performance and less tricky algorithms for text processing...
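To illustrate why surrogates are easy to handle, here is a minimal Python sketch working on a list of 16-bit code units. All function names are made up for this example, and it assumes well-formed or near-well-formed input; an unpaired surrogate is simply returned as is. The point is that a single range check tells you whether a code unit is a lead surrogate, a trail surrogate, or a complete character, so you can resynchronize from any position by looking at most one unit back.

```python
def is_high_surrogate(unit: int) -> bool:
    """True for a lead (high) surrogate, 0xD800..0xDBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    """True for a trail (low) surrogate, 0xDC00..0xDFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def char_start(units: list[int], i: int) -> int:
    """Return the index where the character containing units[i] begins."""
    if i > 0 and is_low_surrogate(units[i]) and is_high_surrogate(units[i - 1]):
        return i - 1
    return i


def code_point_at(units: list[int], i: int) -> tuple[int, int]:
    """Decode the character starting at index i.

    Returns (code point, index of the next character). An unpaired
    surrogate is returned unchanged as a single unit.
    """
    unit = units[i]
    if (is_high_surrogate(unit) and i + 1 < len(units)
            and is_low_surrogate(units[i + 1])):
        cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
        return cp, i + 2
    return unit, i + 1


def to_utf16_units(text: str) -> list[int]:
    """Helper for the example: split a Python string into UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


# "a", U+10400 (one surrogate pair), "b"
units = to_utf16_units("a\U00010400b")
```

Landing on index 2 (the trail surrogate) and calling `char_start(units, 2)` steps back to index 1, from where `code_point_at` decodes the full character. A DBCS string offers no such local test, since a given byte value may be either a lead byte or a trail byte depending on what precedes it.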

