Nicolas Williams wrote:
> On Fri, Sep 21, 2007 at 03:27:57AM +0200, Roland Mainz wrote:
> > David Powell wrote:
> > >    It simply means that if you assume the *parseable* output of this
> > >    command is using the caller's encoding and character set, you are
> > >    wrong.
> >
> > Erm, my concern is not about "parseable" output. My point is that the
> > output cannot be processed. If you put the output into a file and then
> > let the shell read it, it may not be able to recover from this kind of error
> 
> Stop right there.  If you put UTF-8 into a file that will be read
> elsewhere (or even in the same process) in a context where something
> else is expected, well, then you lose.  That has nothing to do with
> svcprop.

If files do not work and shell variables do not work either... who is
actually able to use this interface? AFAIK there is no one left, right?

> It is perfectly OK for an interface to declare that "thou shalt provide
> UTF-8 input and this shall output UTF-8" and leave it to consumers of
> that interface to do any codeset conversions that might be required.

What if the interface is not able to do the conversions unless you meet
very special conditions, like forcing the shell script to run in a
*.UTF-8 locale? Not all shells allow you to have something like
LC_MESSAGES=ja_JP.PCK while the rest of the LC_* variables use a UTF-8
locale (it works with ksh93, but that's the only shell I'm aware of
where this really works...).

> Interfaces that perform codeset conversions automatically are certainly
> easier to use, but we do need a way to get the original data in any
> context without losing data.

See my other email for how this can be handled (the "entity" thing). The
current solution AFAIK leads to a loss of data in all scenarios I can
imagine... ;-(

> So having an interface that deals only in
> UTF-8 here is a good thing.

Except that the interface doesn't work by design if you use a
non-("C"|"UTF-8")-locale.

> > >    Always emitting the UTF-8 encoded data is the only way for svcprop to
> > >    *avoid* data loss.
> 
> Exactly, though, to be fair, there are other ways: what matters is
> emitting an encoding of Unicode, such as UTF-8/16/32 or even punycode ;)

Erm, as I said in previous emails, the UTF-8 encoding won't work in
non-("C"|"UTF-8")-locales like ja_JP.PCK, ko_KR.EUC, zh_CN.GB18030 etc.
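
A minimal sketch of the two failure modes I mean, written in Python only
for illustration. It assumes Python's "shift_jis" codec is a reasonable
stand-in for the ja_JP.PCK codeset, and the property value is a
hypothetical one; nothing here is svcprop's actual code or output.

```python
# Assumption: "shift_jis" approximates the codeset of a ja_JP.PCK locale.
LOCALE_CODESET = "shift_jis"

# Hypothetical property value, stored and emitted as UTF-8.
value = "Grüße"
utf8_bytes = value.encode("utf-8")

# Failure mode 1: a consumer in the PCK locale treats the UTF-8 bytes as
# text in its own codeset and ends up with different characters.
misread = utf8_bytes.decode(LOCALE_CODESET, errors="replace")


def to_locale(text, codeset):
    """Convert text to the locale codeset, or return None if any
    character has no representation there."""
    try:
        return text.encode(codeset)
    except UnicodeEncodeError:
        return None


# Failure mode 2: converting properly still fails for characters that
# are outside the repertoire of the locale codeset.
converted_latin = to_locale(value, LOCALE_CODESET)
converted_japanese = to_locale("日本語", LOCALE_CODESET)

print("misread differs from original:", misread != value)
print("Latin value convertible:", converted_latin is not None)
print("Japanese value convertible:", converted_japanese is not None)
```

The first case is what happens when nobody converts at all; the second
is why a consumer-side conversion cannot always be lossless either.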

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
