David Powell wrote:
[CC'ing Ienup Sung to take a look; AFAIK this issue may be very, very
serious...]
> Roland Mainz wrote:
>  > What happens with the value of a "ustring" (I assume this means "unicode
>  > string", right ?) value in a locale which may not be able to represent
>  > some characters in that value, e.g. characters outside the BMP (Basic
>  > Multilingual Plane) when the current shell script runs in ja_JP.PCK ...
>  > or en_US.ISO8859-1 ?
> 
>    I assume you're asking what the output of svcprop is for a property
>    of that type?
> 
>    To answer your parenthetical question, a ustring is an "8-bit UTF-8
>    string" (intended to mean UTF-8-encoded Unicode).

Ok...

>    I believe svcprop emits the UTF-8-encoded string, with (some) special
>    characters (e.g. newlines) quoted.  If you view svcprop as a means of
>    getting values that can be reliably fed back into SCF (perhaps with
>    some processing) to reproduce the original values, this is a good
>    thing.

OUCH OUCH OUCH... erm...
... if I interpret the situation correctly you're outputting a UTF-8
encoded string, right? If that's true then there is a _serious_
problem, since such a value can be an invalid character sequence in
non-UTF-8 multibyte encodings. You may get away with it in some shells
like ksh93, but only by "accident", because one of the implementation
details of ksh93 is that it treats all things as plain strings unless it
needs to do special handling like quotes, IFS etc. In that case the
shell script will break because you hit invalid characters... which is
AFAIK bad... ;-(
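
To illustrate what I mean, here is a minimal sketch (my own code, not
taken from svcprop or from any shell) which checks whether a byte
sequence is structurally well-formed Shift-JIS. The function name
looks_like_sjis() and the byte ranges are my assumptions based on plain
Shift-JIS; the real ja_JP.PCK tables may differ in details.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if buf[0..len) is structurally
 * well-formed Shift-JIS, 0 otherwise. Only lead/trail byte ranges are
 * checked (assumed plain Shift-JIS); no code point tables are consulted.
 */
static int
looks_like_sjis(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];

        if (c <= 0x7F || (c >= 0xA1 && c <= 0xDF)) {
            /* Single byte: ASCII range or half-width katakana. */
            i++;
        } else if ((c >= 0x81 && c <= 0x9F) ||
            (c >= 0xE0 && c <= 0xFC)) {
            unsigned char t;

            /* Lead byte: a trail byte must follow. */
            if (i + 1 >= len)
                return (0);
            t = buf[i + 1];
            if (!((t >= 0x40 && t <= 0x7E) ||
                (t >= 0x80 && t <= 0xFC)))
                return (0);
            i += 2;
        } else {
            /* 0x80, 0xA0 and 0xFD-0xFF are not valid here. */
            return (0);
        }
    }
    return (1);
}
```

With this check the UTF-8 bytes E3 81 82 (HIRAGANA LETTER A) followed by
a newline are rejected, because 0x82 is taken as a lead byte and the
newline is not a valid trail byte. Other UTF-8 values (e.g. C3 A9) pass
the check but would be read as two unrelated half-width katakana, which
is not much better.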

> If you expect svcprop to emit localized output, it isn't.

No, I am not asking for "localisation", I am asking about which
encodings the strings use, e.g. "UTF-8" vs. "Shift-JIS" (ja_JP.PCK uses
"Shift-JIS" as encoding).

>    Since svcprop is primarily intended for scripting purposes, I think
>    the former argument is in line with our expectations.

Erm, not really... it seems I found something like a giant "data loss"
bug... ;-(

>    See print_value() in svcprop.c for the gory details.

I am reading
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/svc/svcprop/svcprop.c
right now ... and the first thing I found was (line 155):
-- snip --
    151 /*
    152  * Return an allocated copy of str, with the Bourne shell's metacharacters
    153  * escaped by '\'.
    154  *
    155  * What about unicode?
    156  */
    157 static char *
    158 quote_for_shell(const char *str)
    159 {
-- snip --

Erm... is it possible that the code is completely unaware of things
like "multibyte encodings" and that the system's default locale may be
something like ja_JP.PCK, zh_CN.GB18030 or en_US.ISO8859-1 (i.e. not
*.UTF-8-related or compatible)?
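
A sketch of why this matters for quoting: the function below is a
hypothetical byte-wise escaper written by me for illustration (it is
NOT the real quote_for_shell() from svcprop.c, and the metacharacter
list is my assumption). Going byte by byte is harmless for UTF-8, since
all bytes of a multibyte UTF-8 character are >= 0x80, but in Shift-JIS
the trail byte of a character can be 0x5C, i.e. the same value as '\'.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical byte-wise quoting, for illustration only. Returns a
 * malloc()ed copy of str with each assumed shell metacharacter
 * prefixed by '\', or NULL if the allocation fails.
 */
static char *
quote_bytes_for_shell(const char *str)
{
    /* Assumed metacharacter set; the real list may differ. */
    static const char meta[] = "\\\"'`$ \t\n;&|<>()*?[]";
    size_t len = strlen(str);
    char *out = malloc(2 * len + 1);    /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return (NULL);

    for (; *str != '\0'; str++) {
        /* Each byte is inspected on its own, ignoring the encoding. */
        if (strchr(meta, *str) != NULL)
            *p++ = '\\';
        *p++ = *str;
    }
    *p = '\0';
    return (out);
}
```

Feeding it the Shift-JIS bytes 83 5C (KATAKANA LETTER SO) yields
83 5C 5C, i.e. an extra backslash lands in the middle of the character,
while the UTF-8 form of the same letter (E3 82 BD) passes through
unchanged. A locale-aware version would presumably have to step through
the string character by character (e.g. with mblen(3C)) instead.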

> You can find more information on SCF types in scf_value_create(3SCF).

Thanks! :-)

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
