On Wed, Jun 19, 2002 at 12:02:00AM +0200, Simo Sorce wrote:

> > > Yes, I think internal format (and format for tdbs) of utf8 seems
> > > like the best idea (IMHO).
> > There is a problem with utf8 for many fixed-size records in various tdbs.
> > Also, most of data is in UCS-2 already.

I don't think that's true.  Most data should be in the unix character set.

> Not only that, utf-8 is not easy to manipulate as characters are not
> fixed length and upper case and lower case ones are not guaranteed to be
> long the same amount of bytes.

Why would you need to manipulate the string on a character by character
basis?  The only case I can think of is the name mangling system.  Every
other part of Samba only cares about the total length of the string.

> So UCS-2 is more suitable for most of the manipulations, utf8 is more
> suitable to deal with unix system (file names, etc.).
>
> But, as windows yet speak ucs-2 with us, it is better to use that
> internally, so that conversions are kept to a minimum, and manipulation
> of data is much easier and faster.
>
> Relegating utf8, in the long term to an internal vfs conversion for file
> name storage purposes (yes I advocate an ucs2 vfs interface for the next
> ntfs like semantic rewrite).

Yuck.  (-:

Tim.
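
For reference, the byte-length point in the quoted text can be checked with a small Python sketch. This is only an illustration and has nothing to do with Samba's own code: the helper names are made up, and UTF-16-LE is used as a stand-in for UCS-2, which is an assumption that holds only for characters in the Basic Multilingual Plane.

```python
# Illustration only: compare the encoded size of a character and its
# lowercase form in UTF-8 and in 16-bit code units.
# Assumption: UTF-16-LE stands in for UCS-2 (identical for BMP characters).

def encoded_sizes(ch):
    """Return (utf8_bytes, ucs2_bytes) for a single BMP character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

def case_size_report(upper):
    """Return the encoded sizes of a character and of its lowercase form."""
    lower = upper.lower()
    return {
        "upper": encoded_sizes(upper),
        "lower": encoded_sizes(lower),
    }

# U+023A LATIN CAPITAL LETTER A WITH STROKE lowercases to U+2C65.
report = case_size_report("\u023a")
print(report)
```

For this character the UTF-8 size changes when the case changes, while the 16-bit size does not. Code that only needs the total byte length of a string, without changing its case, is unaffected either way.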
