Simo Sorce wrote:
> ...
>>Also, some SMB clients are using UTF-16 now (superset of UCS-2 to
>>support code points in other Unicode planes) instead of UCS-2.
>
>
> which clients?

IIRC, MacOS X and Windows XP clients use UTF-16, although unless you
use characters outside the Basic Multilingual Plane (some Chinese
text, for example) you will never notice.

> ...
>>In addition, no matter what Unicode representation is used, you
>>still have to deal with different representations of the "same"
>>character (is it a single character "a" with an umlaut, or "a"
>>plus a combining umlaut character?, etc.)
>
>
> If for that problem it does not matter which rep to use, then better go
> with the one that eases programming (and easily avoids lots of errors,
> especially in inside-string character or string search and
> uppercasing/lowercasing).

The issue is more that clients are free to provide whichever
representation they want, and you may need to convert this to
whichever of the 4 normalization forms your local OS requires in
order to do the proper comparisons.

To make life even more interesting, case comparisons are
locale-dependent. That is, "A" with an umlaut may not compare equal
to "a" with an umlaut in some locales (or shouldn't, anyway).

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products                  [EMAIL PROTECTED]
Printing Software for UNIX                       http://www.easysw.com
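
P.S. To make the two points above concrete, here is a minimal sketch
in Python using only the standard unicodedata module. It is not how
any SMB client or server does it; the helper names same_text and
utf16_units are made up for this illustration. It shows that the
precomposed and combining forms of "a" with an umlaut only compare
equal after both are normalized to the same form, and that a code
point outside the Basic Multilingual Plane takes two 16-bit units in
UTF-16 (a surrogate pair), which UCS-2 cannot represent.

```python
import unicodedata

# Two encodings of the "same" character, as a client might send them.
precomposed = "\u00e4"   # single code point: a with diaeresis (umlaut)
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS


def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare after normalizing both to one form.

    form is one of the four normalization forms:
    "NFC", "NFD", "NFKC" or "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def utf16_units(s):
    """Hypothetical helper: count the 16-bit code units s needs in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


# A raw comparison sees two different code point sequences.
raw_equal = precomposed == decomposed

# After normalization the two representations match.
normalized_equal = same_text(precomposed, decomposed)

# A BMP character fits in one unit; U+20000 (a CJK Extension B
# ideograph) needs a surrogate pair, i.e. two units.
bmp_units = utf16_units("\u00e4")
supplementary_units = utf16_units("\U00020000")
```

Note that this only covers normalization and encoding width. It does
nothing about locale-dependent case rules, which need locale-aware
collation on top of whatever normalization form is chosen.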