On Tue, 2002-06-18 at 20:30, Alexander Bokovoy wrote:
> On Tue, Jun 18, 2002 at 11:31:16AM -0700, Jeremy Allison wrote:
> > On Tue, Jun 18, 2002 at 01:24:12PM -0500, Steve Langasek wrote:
> > >
> > > I do hope that tdb ends up going with UTF-8. UCS2 is not particularly
> > > pleasant to work with under Unix; it's not endian-neutral, it doesn't
> > > provide ASCII as a compatibility subset, and it has to be converted to
> > > something else before it can be used by the majority of Unix tools.
> > > Granted, to a certain extent this is already true with tdb because it's
> > > a binary format, but making the import/export tools more complex gives
> > > you less margin for error. Unless Samba chooses UCS-2 as an internal
> > > format for string processing (which I also don't think is the best idea
> > > in the world ;), using UCS-2 as a backend charset seems like an
> > > all-around bad idea, IMHO.
> >
> > Yes, I think internal format (and format for tdbs) of utf8 seems
> > like the best idea (IMHO).
> There is a problem with utf8 for many fixed-size records in various tdbs.
> Also, most of data is in UCS-2 already.

Not only that: UTF-8 is not easy to manipulate, because characters are
not fixed length, and an upper-case character and its lower-case
counterpart are not guaranteed to occupy the same number of bytes.

So UCS-2 is more suitable for most manipulations, while UTF-8 is more
suitable for dealing with the Unix system (file names, etc.).

But since Windows already speaks UCS-2 with us, it is better to use that
internally, so that conversions are kept to a minimum and manipulation
of data is much easier and faster. In the long term that relegates UTF-8
to an internal VFS conversion for file name storage purposes (yes, I
advocate a UCS-2 VFS interface for the next NTFS-like semantics rewrite).

Simo.

> --
> / Alexander Bokovoy
> ---
> Most people have a mind that's open by appointment only.
>
--
Simo Sorce
----------
Una scelta di liberta': Software Libero.
A choice of freedom: Free Software.
http://www.softwarelibero.it
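
To make the byte-length point concrete, here is a minimal sketch in
Python 3. It is only an illustration and not Samba code: the helper
name encoded_sizes is hypothetical, Python's built-in str.upper() and
str.lower() stand in for whatever case tables a real implementation
would use, and "utf-16-le" is used as a stand-in for UCS-2, which holds
only for characters in the Basic Multilingual Plane.

```python
# Compare the encoded size of a character with that of its
# case-mapped counterpart, in UTF-8 and in UTF-16-LE.
# Assumption: for BMP characters UTF-16-LE matches UCS-2 as used on the wire.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Each pair is (character, its case-mapped counterpart).
pairs = [
    ("a", "a".upper()),            # plain ASCII, same size either way
    ("\u0131", "\u0131".upper()),  # dotless i maps to ASCII 'I'
    ("\u023a", "\u023a".lower()),  # U+023A maps to U+2C65
]

for src, dst in pairs:
    print(f"U+{ord(src):04X} -> U+{ord(dst):04X}",
          "utf8/utf16 sizes:", encoded_sizes(src), "->", encoded_sizes(dst))
```

In the second and third pairs the UTF-8 length changes under case
conversion while the two-byte encoding stays at two bytes, so an
in-place case conversion on a fixed-size UTF-8 buffer cannot be assumed
to fit.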
