Simo Sorce wrote:
> Hi metze,
> on top of the first doc I see you state that all strings should be utf8.
> I heartily disagree, I would rather like to see all internal strings
> on new code to be UCS-2.
> Utf8 has many disadvantages:
> 1. require any RPC requests that comes from clients to be converted
> forth and back (UCS-2->UTF8->UCS-2)

Some "conversion" will always be required, not only for byte order
issues (remember that UCS-2 strings can contain byte-order overrides)
but also for normalization forms that may be required. Also, some SMB
clients now use UTF-16 (a superset of UCS-2 that supports code points
in other Unicode planes) instead of UCS-2. Finally, most UNIX
filesystems only support the UTF-8 representation of Unicode, so at
some point UCS-2/UTF-16 will have to be converted to UTF-8 anyway...

> 2. Is difficult to manipulate UTF8 strings as they are variable length
> multibyte chars and sometimes uppercase chars have different length than
> lowercase chars.
> ...

UCS-2 can have different byte orders, and with UTF-16 you also need to
keep track of the current plane, which makes life even more fun. In
addition, no matter what Unicode representation is used, you still
have to deal with different representations of the "same" character
(is it a single character "a" with an umlaut, or "a" plus a combining
umlaut character?, etc.)

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products [EMAIL PROTECTED]
Printing Software for UNIX http://www.easysw.com
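
For anyone who wants to poke at the points above, here is a minimal
sketch in Python (not Samba code; the helper names same_text and
encoded_sizes are made up for illustration). It assumes NFC is the
normalization form of interest, which the thread does not specify. It
shows that byte order changes the encoded bytes, that a code point
outside the BMP takes two 16-bit units in UTF-16, that case mapping
can change the UTF-8 length, and that composed and decomposed forms
only compare equal after normalization.

```python
import unicodedata

# Two representations of the "same" character: a with umlaut.
composed = "\u00e4"      # single precomposed code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def encoded_sizes(s):
    """Return the byte length of s in UTF-8 and in UTF-16 (little endian)."""
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16-le": len(s.encode("utf-16-le")),
    }


# Byte order matters for 16-bit encodings: same character, different bytes.
little = "a".encode("utf-16-le")   # b"a\x00"
big = "a".encode("utf-16-be")      # b"\x00a"

# A code point outside the BMP (MUSICAL SYMBOL G CLEF) needs a surrogate
# pair in UTF-16, so it is not a fixed-width encoding either.
clef_sizes = encoded_sizes("\U0001D11E")

# Case mapping can change the UTF-8 length: dotless i is two bytes,
# its uppercase form "I" is one.
dotless_i = "\u0131"
case_sizes = (len(dotless_i.encode("utf-8")),
              len(dotless_i.upper().encode("utf-8")))

if __name__ == "__main__":
    print(composed == decomposed, same_text(composed, decomposed))
    print(little, big)
    print(clef_sizes)
    print(case_sizes)
```

A raw comparison of the two umlaut forms fails in every encoding, so
whichever internal representation is picked, a normalization step is
still needed before comparing names.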
