Bart Smaalders wrote:
>
> Bill Sommerfeld wrote:
> > On Fri, 2007-11-02 at 08:35 -0700, Bart Smaalders wrote:
> >> packaging system's complete inability to deal with filenames with
> >> spaces, Unicode characters, etc.
> >
> > I see your point about spaces in filenames. But they also cause
> > trouble in human-readable output (trailing whitespace, in particular),
> > and continuing trouble for unix shells and shell scripts (it's just too
> > easy to make mistakes).
> >
> > But I thought UTF-8 encoding of unicode was specifically designed to
> > permit continued use of things like space-delimited or comma-delimited
> > formats -- the new characters are encoded using bytes outside the range
> > of 7-bit ASCII, so single-byte matches for 7-bit ASCII characters
> > continue to behave as expected.
>
> Spaces are of course the most common example. But other characters may
> be desirable as well.
>
> Do we encode characters in a form representable in ascii locales, or do
> we use raw byte codes? Since UTF-8 allows control characters, line
> oriented parsing is difficult for arbitrary filenames unless escaping
> is done. It's not just spaces that are the problem, it's any other
> control characters... we'll need to escape characters so that they
> can be rationally displayed in multiple locales...
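A minimal Python sketch of the two points in the quoted exchange, assuming a hypothetical listing format with one raw UTF-8 filename per line (this is not the packaging system's actual manifest format). It checks the UTF-8 property Bill describes (no non-ASCII character is ever encoded with bytes below 0x80, so splitting on an ASCII space or comma stays safe) and shows Bart's counterpoint: an unescaped control character such as a newline inside a filename still splits one record into two.

```python
# Minimal sketch; the one-filename-per-line listing is a hypothetical format.


def multibyte_bytes_are_non_ascii(text: str) -> bool:
    """True if every non-ASCII character in `text` encodes only to bytes >= 0x80.

    This is the UTF-8 property Bill describes: a byte-wise search for an
    ASCII delimiter (space, comma, newline) never matches inside a
    multibyte character.
    """
    return all(
        all(b >= 0x80 for b in ch.encode("utf-8"))
        for ch in text
        if ord(ch) >= 0x80
    )


def one_name_per_line(names: list[str]) -> bytes:
    """Serialize names naively: raw UTF-8, one name per line, no escaping."""
    return b"".join(name.encode("utf-8") + b"\n" for name in names)


def read_names(listing: bytes) -> list[str]:
    """Parse the naive listing back by splitting on newline bytes."""
    # The final element after the trailing newline is empty, so drop it.
    return [line.decode("utf-8") for line in listing.split(b"\n")[:-1]]


if __name__ == "__main__":
    names = ["caf\u00e9.txt", "\u65e5\u672c\u8a9e.txt", "with space.txt"]
    print(multibyte_bytes_are_non_ascii("".join(names)))   # True
    print(read_names(one_name_per_line(names)) == names)   # True

    # A newline inside a name silently turns one record into two.
    bad = names + ["two\nlines.txt"]
    print(len(read_names(one_name_per_line(bad))), "records for", len(bad), "names")
    # 5 records for 4 names
```

Spaces and non-ASCII characters survive this naive format; only the embedded control character breaks it, which is why some escaping is still needed for line-oriented output.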
Erm... there is the $'...'-style string literal in bash and ksh93 which
handles this stuff (e.g. spaces, control characters and random byte
sequences), and AFAIK "perl" understands this, too.

The only tricky part is that you cannot output raw UTF-8 if the current
locale is not UTF-8-based (e.g. ja_JP.PCK, zh_CN.GB18030 etc.). In that case
you'll generate an illegal byte sequence and the utilities and shells will
abort reading at that point (in some locales this error is recoverable, but
in others you're completely lost, e.g. there is no way to find the next
valid character in the input stream).

I've discussed a solution for this with April and Don; if anyone needs
sample code, I can post it later next week...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
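To illustrate the $'...' approach Roland mentions, here is a small Python sketch (a hypothetical helper, not the sample code he offered to post). It turns raw filename bytes into a $'...' literal that contains only printable 7-bit ASCII, using octal escapes of the form \ooo. Bash documents these escapes for $'...' strings; that ksh93 accepts the same form is an assumption here. Every escape is emitted with exactly three octal digits, so a following digit in the name can never be absorbed into the escape.

```python
# Hypothetical helper (not Roland's sample code): quote raw filename bytes
# as a bash/ksh93 $'...' literal made only of printable 7-bit ASCII.


def dollar_quote(name: bytes) -> str:
    """Return an ASCII-only $'...' literal that reproduces `name` byte for byte."""
    parts = ["$'"]
    for b in name:
        if b == 0x27:                   # a single quote would end the literal
            parts.append("\\'")
        elif b == 0x5C:                 # a literal backslash must be doubled
            parts.append("\\\\")
        elif 0x20 <= b <= 0x7E:         # other printable ASCII passes through
            parts.append(chr(b))
        else:                           # control bytes and every byte >= 0x80
            parts.append(f"\\{b:03o}")  # always three octal digits
    parts.append("'")
    return "".join(parts)


if __name__ == "__main__":
    samples = [
        b"with space.txt",
        "caf\u00e9.txt".encode("utf-8"),
        b"two\nlines.txt",
        b"it's",
    ]
    for raw in samples:
        print(dollar_quote(raw))
```

For example, the UTF-8 name `café.txt` becomes `$'caf\303\251.txt'` and a name containing a newline becomes `$'two\012lines.txt'`. Because the output never contains raw UTF-8 bytes, it cannot form an illegal multibyte sequence when printed in a locale such as ja_JP.PCK or zh_CN.GB18030. The trade-off is that non-ASCII names are no longer directly readable, even on a UTF-8 terminal.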
