"Garrett D'Amore" <garr...@damore.org> wrote:

> On Mon, 2010-11-01 at 10:36 +0100, Joerg Schilling wrote:
> > The assumption that multi-byte characters use octets with the high order
> > bit set is only correct for so-called stateless locales.
> >
> > Locales that use shift codes behave differently.
>
> Actually, it's a safe assumption for UTF-8, which is the main concern I
> think.
>
> The bigger question here is not locales, but character encoding schemes,
> I think. Specifically we're talking about filenames, which do not
> inherently carry a locale with them, but might be encoded in one of a
> small number of locales... for UTF-8 I believe the code is fine.

How many people in Russia or China use UTF-8?

> Also, if the code is using libc's glob interfaces, it's fine too, because
> libc's glob code is sensitive to the locale and correctly handles
> stateful encodings.

The sources from Sun do not deal with cases that cause the multi-byte
state machine to go into an undefined state. I needed to fix this, e.g. in
order to port the Bourne Shell to FreeBSD and Mac OS X.

Jörg

-- 
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      j...@cs.tu-berlin.de (uni)
      joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
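
To illustrate the high-order-bit point discussed in this thread, here is a minimal sketch in Python (not taken from the shell or libc sources). It encodes the same two characters in UTF-8 and in ISO-2022-JP, an encoding that switches character sets with escape sequences, and counts the octets that have bit 0x80 set. The helper name `high_bit_bytes` and the sample string are assumptions made only for this example.

```python
def high_bit_bytes(data: bytes) -> int:
    """Count the octets in data whose high-order bit (0x80) is set."""
    return sum(1 for b in data if b & 0x80)


# Two non-ASCII characters, chosen only for illustration.
text = "日本"

utf8 = text.encode("utf-8")          # stateless encoding
iso2022 = text.encode("iso2022_jp")  # stateful: uses escape (shift) sequences

for name, data in (("UTF-8", utf8), ("ISO-2022-JP", iso2022)):
    # Print the raw octets and how many of them have the high bit set.
    print(name, data.hex(" "), "high-bit octets:", high_bit_bytes(data))
```

In the UTF-8 form every octet of the two characters has the high bit set, while the ISO-2022-JP form consists entirely of 7-bit octets, so a test for "high bit set" would not recognize it as multi-byte text.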