"Garrett D'Amore" <garr...@damore.org> wrote:

> On Mon, 2010-11-01 at 10:36 +0100, Joerg Schilling wrote:
> > The assumption that multi-byte characters use octets with the high order
> > bit set is only correct for so-called stateless locales.
> > 
> > Locales that use shift codes behave differently.
>
> Actually, it's a safe assumption for UTF-8, which is the main concern I
> think.
>
> The bigger question here is not locales, but character encoding schemes,
> I think.  Specifically we're talking about filenames, which do not
> inherently carry a locale with them, but might be encoded in one of a
> small number of locales... for UTF-8 I believe the code is fine.

How many people in Russia or China use UTF-8?

> Also, if the code is using libc's glob interfaces, it's fine too, because
> libc's glob code is sensitive to the locale and correctly handles
> stateful encodings.

The sources from Sun do not deal with cases that cause the multi-byte state 
machine to go into an undefined state. I needed to fix this, for example, in 
order to port the Bourne Shell to FreeBSD and Mac OS X.
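
To illustrate the kind of problem meant here, below is a minimal sketch in C. 
It is not the actual change made to the Bourne Shell; count_chars() is a 
hypothetical helper, and skipping one byte after an error is just one possible 
policy. The point it shows is that once mbrtowc() reports an encoding error, 
the C standard leaves the conversion state unspecified, so the caller has to 
reset the mbstate_t before it continues.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the Sun sources): count the characters
 * in buf[0..len) as seen by the current locale's encoding.
 */
size_t
count_chars(const char *buf, size_t len)
{
    mbstate_t st;
    size_t n = 0;

    memset(&st, 0, sizeof (st));    /* start in the initial state */
    while (len > 0) {
        size_t r = mbrtowc(NULL, buf, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /*
             * Invalid or truncated sequence. After (size_t)-1 the
             * conversion state is unspecified, so reset it and
             * treat the offending byte as one character.
             */
            memset(&st, 0, sizeof (st));
            r = 1;
        } else if (r == 0) {
            /* NUL character; assumed here to occupy a single byte. */
            r = 1;
        }
        buf += r;
        len -= r;
        n++;
    }
    return (n);
}
```

A caller would run setlocale(LC_CTYPE, "") first so that mbrtowc() follows the 
user's encoding. Without the reset, what happens on the calls after an error 
depends on the libc in use.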

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de                (uni)  
       joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
