Bart Smaalders wrote:
> 
> Bill Sommerfeld wrote:
> > On Fri, 2007-11-02 at 08:35 -0700, Bart Smaalders wrote:
> >> packaging system's complete inability to deal with filenames with
> >> spaces, Unicode characters, etc.
> >
> > I see your point about spaces in filenames.  But they also cause
> > trouble in human-readable output (trailing whitespace, in particular),
> > and continuing trouble for unix shells and shell scripts (it's just too
> > easy to make mistakes).
> >
> > But I thought UTF-8 encoding of unicode was specifically designed to
> > permit continued use of things like space-delimited or comma-delimited
> > formats -- the new characters are encoded using bytes outside the range
> > of 7-bit ASCII, so single-byte matches for 7-bit ASCII characters
> > continue to behave as expected.
> 
> Spaces are of course the most common example.  But other characters may
> be desirable as well.
> 
> Do we encode characters in a form representable in ascii locales, or do
> we use raw byte codes?  Since UTF-8 allows control characters, line
> oriented parsing is difficult for arbitrary filenames unless escaping
> is done.  It's not just spaces that are the problem, it's any other
> control characters... we'll need to escape characters so that they
> can be rationally displayed in multiple locales...

Erm... there is the $''-style string literal in bash&&ksh93 which
handles this stuff (e.g. spaces, control characters and random byte
sequences) and AFAIK "perl" understands this, too. The only tricky part
is that you cannot output raw UTF-8 if the current locale is not UTF-8
based (e.g. ja_JP.PCK, zh_CN.GB18030 etc.) - in that case you'll
generate an illegal byte sequence and the utilities&&shells will abort
reading at that point (in some locales this error is recoverable but in
others you're completely lost, e.g. there is no way to find the next
valid character in the input stream). I've discussed a solution for this
with April&&Don - if anyone needs sample code I can post it later next
week...
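
To make the idea above a bit more concrete, here is a rough sketch (not
the solution discussed with April&&Don, just an illustration) of how a
tool could emit filenames as ASCII-only $''-style literals. It assumes
that three-digit octal escapes (\NNN) plus \\ and \' are accepted inside
$'...' by the shells in question; the function names ansi_c_quote and
ansi_c_unquote are made up for this example. Since the output contains
only printable 7-bit ASCII, it never produces an illegal byte sequence
regardless of the reader's locale, and it stays on a single line even if
the filename contains newlines.

```python
def ansi_c_quote(name: bytes) -> str:
    """Return an ASCII-only $'...' literal for an arbitrary byte string.

    Assumption: the consumer understands \\NNN (three octal digits),
    \\\\ and \\' inside $'...'.
    """
    out = []
    for b in name:
        ch = chr(b)
        if ch in ("\\", "'"):
            # Backslash and single quote would end or confuse the literal.
            out.append("\\" + ch)
        elif 0x20 <= b < 0x7F:
            # Printable 7-bit ASCII (including space) passes through.
            out.append(ch)
        else:
            # Control characters and all bytes >= 0x80 (e.g. UTF-8
            # multibyte sequences) become fixed-width octal escapes.
            out.append("\\%03o" % b)
    return "$'" + "".join(out) + "'"


def ansi_c_unquote(literal: str) -> bytes:
    """Inverse of ansi_c_quote(); handles only the escapes emitted above."""
    if len(literal) < 3 or not (literal.startswith("$'") and literal.endswith("'")):
        raise ValueError("not a $'...' literal")
    body = literal[2:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(ord(body[i]))
            i += 1
        elif body[i + 1] in "\\'":
            out.append(ord(body[i + 1]))
            i += 2
        else:
            # Exactly three octal digits follow the backslash.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


if __name__ == "__main__":
    raw = "my file\tn\u00e9\n".encode("utf-8")
    print(ansi_c_quote(raw))
```

A line-oriented manifest could then carry one such literal per line;
decoding with ansi_c_unquote() gives back the original bytes unchanged.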

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
