Hi,

I think this problem is not exclusive to environment variables. It also
affects file paths and other strings coming from the OS. So far Pharo does
not detect the locale to choose the encoding, and it would be nice to do so.
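
For example, on POSIX systems "detecting the locale" could look roughly like
the minimal C sketch below. setlocale(LC_CTYPE, "") adopts the user's locale
from the environment, nl_langinfo(CODESET) reports its encoding name, and
mbstowcs decodes C strings with that encoding. The function names are
hypothetical and not part of Pharo or its VM. Windows would need a different
path (e.g. GetEnvironmentStringsW, which returns UTF-16).

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

extern char **environ;

/* Hypothetical helper: switch LC_CTYPE to the user's locale (taken from
   LC_ALL, then LC_CTYPE, then LANG) and return the name of its character
   encoding, e.g. "UTF-8". */
const char *adopt_user_codeset(void) {
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: decode a NUL-terminated C string from the current
   LC_CTYPE encoding into a newly allocated wide string. Returns NULL if the
   bytes are invalid in that encoding or allocation fails; caller frees. */
wchar_t *decode_locale_string(const char *bytes) {
    size_t len = mbstowcs(NULL, bytes, 0);
    if (len == (size_t)-1) return NULL;
    wchar_t *out = malloc((len + 1) * sizeof *out);
    if (out != NULL) mbstowcs(out, bytes, len + 1);
    return out;
}

/* Hypothetical helper: count environment entries ("NAME=value") that
   cannot be decoded with the current locale's encoding. */
size_t count_undecodable_environ(void) {
    size_t bad = 0;
    for (char **entry = environ; *entry != NULL; entry++) {
        wchar_t *decoded = decode_locale_string(*entry);
        if (decoded == NULL) bad++;
        else free(decoded);
    }
    return bad;
}
```

Something along these lines in the VM or an FFI binding would let
OSEnvironment hand back already-decoded strings. Entries that fail to decode
have no valid interpretation in that locale at all, which matches Damien's
"just bytes" caveat.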

On Tue, Apr 17, 2018 at 10:56 AM, Henrik Sperre Johansen
<[email protected]> wrote:

> Damien Pollet wrote
> > It seems macOS normalizes UTF-8 differently from everyone else in file
> > names (I think base character + combining character instead of a
> > precomposed code point). That might affect PWD.
> > For environment variables, even if most sensible platforms should have
> > adopted UTF-8 by now, I wouldn't be surprised if there's no official
> > encoding whatsoever (i.e. they're just bytes with a 0 at the end…)
> >
> > On 17 April 2018 at 09:36, Sven Van Caekenberghe <sven@> wrote:
> >
> >> Hi,
> >>
> >> The dictionary
> >>
> >>  OSPlatform current environment
> >>
> >> contains a copy of the OS's environment variables (more precisely, those
> >> of the VM process) as key/value pairs.
> >>
> >> These are obtained via the following system calls:
> >>
> >> on macOS & *nix
> >>
> >>   LIBC environ
> >>
> >> on Windows
> >>
> >>   KERNEL32 GetEnvironmentStrings
> >>
> >> It is, however, a bit unclear how these are encoded. On macOS & *nix
> >> they seem to be UTF-8; on Windows there are some reports that they appear
> >> to be Latin-1. Both might be locale-specific, though; I don't know either
> >> way.
> >>
> >> Does anyone know for sure?
> >>
> >> Furthermore, I think that OSEnvironment and its subclasses, which make
> >> these calls, should be responsible for decoding the C strings into proper
> >> Pharo strings, rather than leaving that responsibility to their users.
> >>
> >> Fundamentally, in the following example the decoding is still not done
> >> correctly, and that is wrong/confusing IMHO.
> >>
> >> $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations'
> >> {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'.
> >> 'SHELL'->'/bin/bash'.
> >> 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'.
> >> 'FOO'->'benoît'.
> >> 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
> >> 'TERM_PROGRAM_VERSION'->'404'.
> >> 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'.
> >> 'USER'->'sven'.
> >> 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'.
> >> 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
> >> 'PWD'->'/tmp/benoît'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'.
> >> 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'.
> >> 'LC_CTYPE'->'UTF-8'.
> >> 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
> >> 'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoît'.
> >> '_'->'/tmp/benoît/pharo-vm/Pharo.app/Contents/MacOS/Pharo'.
> >> '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
> >>
> >> Of course, if we change this, we will need to fix callers.
> >>
> >> Opinions?
> >>
> >> Sven
> >>
> >> PS: Furthermore, I note a subtle difference in how $FOO and $PWD above
> >> are UTF-8 encoded: in the former, normalisation was done; in the latter,
> >> it was not. That could lead to problems when comparing or composing them.
> >> This is a difficult and complex subject (see
> >> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).
> >
> >
> > --
> > Damien Pollet
> > type less, do more [ | ] http://people.untyped.org/damien.pollet
>
> If by "different" you mean that it actually normalizes the file names, then
> yes. All Mac file names are in a well-defined form: NFD.
> On Linux, they're just arrays of bytes, and anything goes.
> That the bytes mostly happen to be valid UTF-8 strings in NFC is just a
> by-product of that being the format most programs use when calling the
> file primitives.
>
> Cheers,
> Henry
>

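To make the NFC/NFD difference discussed above concrete, here is a small C
sketch (the names benoit_nfc, benoit_nfd and dump_utf8 are hypothetical)
with the two UTF-8 byte sequences for "benoît". NFC stores "î" as the
precomposed U+00EE (bytes C3 AE, 7 bytes in total); NFD stores it as "i"
followed by U+0302 COMBINING CIRCUMFLEX ACCENT (bytes 69 CC 82, 8 bytes in
total). Both render identically, yet a byte-wise comparison such as strcmp
treats them as different, so comparing a PWD coming from the macOS file
system with a string typed by the user presumably needs normalization first.

```c
#include <stdio.h>
#include <string.h>

/* "benoît" in NFC: 'î' is the single precomposed code point U+00EE,
   encoded in UTF-8 as C3 AE. */
static const char benoit_nfc[] = "beno\xC3\xAEt";

/* "benoît" in NFD: 'i' (U+0069) followed by U+0302 COMBINING CIRCUMFLEX
   ACCENT, encoded in UTF-8 as 69 CC 82 (the decomposed form that, per
   Henrik, macOS uses for file names). */
static const char benoit_nfd[] = "benoi\xCC\x82t";

/* Print the length and the UTF-8 bytes of a string as hex. */
void dump_utf8(const char *label, const char *s) {
    printf("%-4s (%zu bytes):", label, strlen(s));
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        printf(" %02X", *p);
    printf("\n");
}
```

Calling dump_utf8("NFC", benoit_nfc) and dump_utf8("NFD", benoit_nfd) prints
the two byte sequences side by side, and strcmp(benoit_nfc, benoit_nfd)
returns non-zero even though both spell the same word.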

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13
