David Powell wrote:

[CC:'ing Ienup Sung to take a look, AFAIK this issue may be very very
serious...]

> Roland Mainz wrote:
> > What happens with the value of a "ustring" (I assume this means "unicode
> > string", right ?) value in a locale which may not be able to represent
> > some characters in that value, e.g. characters outside the BMP (Basic
> > Multilingual Plane) when the current shell script runs in ja_JP.PCK ...
> > or en_US.ISO8859-1 ?
>
> I assume you're asking what the output of svcprop is for a property
> of that type?
>
> To answer your parenthetical question, a ustring is an "8-bit UTF-8
> string" (intended to mean UTF-8-encoded Unicode).

Ok...

> I believe svcprop emits the UTF-8-encoded string, with (some) special
> characters (e.g. newlines) quoted.  If you view svcprop as a means of
> getting values that can be reliably fed back into SCF (perhaps with
> some processing) to reproduce the original values, this is a good
> thing.

OUCH OUCH OUCH... erm... if I interpret the situation correctly you
output a UTF-8-encoded string, right ? If that's "true" then there is a
_serious_ problem, since such a value would be an invalid character
sequence for non-UTF-8 multibyte encodings. You may get away with it in
some shells like ksh93, but only by "accident", because one of the
implementation details of ksh93 is that it treats all things as plain
strings unless it needs to do special handling like quotes, IFS etc.
Once it needs that special handling, the shell script will break because
it hits invalid characters... which is AFAIK bad... ;-(

> If you expect svcprop to emit localized output, it isn't.

No, I am not asking for "localisation", I am asking about which
encoding the strings use, e.g. "UTF-8" vs. "Shift-JIS" (ja_JP.PCK uses
"Shift-JIS" as its encoding).

> Since svcprop is primarily intended for scripting purposes, I think
> the former argument is in line with our expectations.

Erm, not really... it seems I found something like a giant "dataloss"
bug... ;-(

> See print_value() in svcprop.c for the gory details.

I am reading
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/svc/svcprop/svcprop.c
right now ... and the first thing I found was (line 155):
-- snip --
151 /*
152  * Return an allocated copy of str, with the Bourne shell's metacharacters
153  * escaped by '\'.
154  *
155  * What about unicode?
156  */
157 static char *
158 quote_for_shell(const char *str)
159 {
-- snip --

Erm... is it possible that the code is completely unaware of things
like "multibyte encodings" and that the system's default locale may be
something like ja_JP.PCK, zh_CN.GB18030 or en_US.ISO8859-1 (i.e. not
*.UTF-8-related or compatible) ?

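To make the concern concrete, here is a minimal sketch of what a
byte-oriented quoting routine does with non-ASCII input. This is NOT the
code from svcprop.c: the function names sketch_quote_for_shell() and
has_non_ascii() and the metacharacter list are assumptions made up for
illustration. It only shows that escaping byte by byte leaves UTF-8
sequences untouched, so whatever encoding went in comes out again
regardless of the locale of the script reading the output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed metacharacter set; the real list in svcprop.c may differ. */
static const char SKETCH_METACHARS[] = " \t\n\\\"'`$&;|<>()*?[]#~";

/*
 * Hypothetical stand-in for quote_for_shell(): returns a malloc'ed copy
 * of str with each assumed metacharacter preceded by '\'.  It works on
 * bytes, not characters, and never consults the current locale.
 */
static char *
sketch_quote_for_shell(const char *str)
{
	assert(str != NULL);

	size_t len = strlen(str);
	/* Worst case: every byte gets a backslash, plus the terminator. */
	char *out = malloc(2 * len + 1);
	if (out == NULL)
		return (NULL);

	char *dst = out;
	for (const char *src = str; *src != '\0'; src++) {
		if (strchr(SKETCH_METACHARS, *src) != NULL)
			*dst++ = '\\';
		*dst++ = *src;	/* bytes >= 0x80 are copied unchanged */
	}
	*dst = '\0';
	return (out);
}

/*
 * Returns 1 if any byte lies outside 7-bit ASCII, i.e. the meaning of
 * the value depends on which encoding the consumer assumes.
 */
static int
has_non_ascii(const char *str)
{
	const unsigned char *p = (const unsigned char *)str;

	for (; *p != '\0'; p++) {
		if (*p >= 0x80)
			return (1);
	}
	return (0);
}
```

For example, the UTF-8 bytes of U+1F600 (a character outside the BMP),
"\xF0\x9F\x98\x80", pass through sketch_quote_for_shell() unchanged and
has_non_ascii() reports 1 for them. A script running in ja_JP.PCK or
en_US.ISO8859-1 would then interpret those four bytes in its own
encoding, not as UTF-8.
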
> You can find more information on SCF types in scf_value_create(3SCF).

Thanks! :-)

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)