> On 17 Apr 2018, at 10:40, Nicolai Hess <[email protected]> wrote:
>
>
>
> 2018-04-17 10:05 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>
>
> > On 17 Apr 2018, at 09:57, Damien Pollet <[email protected]> wrote:
> >
> > It seems macOS normalizes UTF-8 differently from everyone else in file
> > names (I think base character + combining mark instead of a precomposed
> > code point). That might affect PWD.
> > For environment variables, even if most sensible platforms should have
> > adopted UTF-8 by now, I wouldn't be surprised if there's no official
> > encoding whatsoever (i.e. they're just bytes with a 0 at the end…)
>
> ;-)
>
> We can decode everything, we have all the tools, but of course, we first have
> to know what encoding is being used. Hence my question.
>
> > On 17 April 2018 at 09:36, Sven Van Caekenberghe <[email protected]> wrote:
> > Hi,
> >
> > The dictionary
> >
> > OSPlatform current environment
> >
> > contains a copy of the OS's environment variables (more precisely, those
> > of the VM process), as key/value pairs.
> >
> > These are obtained via the following system calls:
> >
> > on macOS & *nix
> >
> > LIBC environ
> >
> > on Windows
> >
> > KERNEL32 GetEnvironmentStrings
>
>
> Interestingly, this is only for the dictionary operations (asDictionary,
> keysAndValuesDo...)
> If you just access the variable with getEnv, it works:
>
> OSPlatform current environment setEnv:'FOO' value:'benoît'.
> OSPlatform current environment getEnv:'FOO'. "'benoît'"
> OSPlatform current environment asDictionary at: 'FOO'. "'benoŒt'"
Hmm, not for me (on macOS):
$ FOO=benoît ./pharo Pharo.image eval "OSPlatform current environment getEnv:'FOO'"
'benoît'
If you put the value in yourself with setEnv:value:, are you not cheating then ?
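
To make the two effects discussed in this thread concrete, here is a minimal sketch. It is in Python rather than Pharo, purely so that it can be run anywhere, and it rests on assumptions: that the environment hands over UTF-8 bytes, and that the wrong decoding treats each byte as one Latin-1 character. It does not show what the VM or OSEnvironment actually do.

```python
import unicodedata

# The value from the examples, written with an escape so the source is unambiguous.
name = "beno\u00eet"

# Assumption: the OS hands the process UTF-8 bytes for this value.
raw = name.encode("utf-8")

# Treating each byte as one Latin-1 character gives mojibake ...
wrong = raw.decode("latin-1")
# ... while decoding with the encoding that was really used round-trips.
right = raw.decode("utf-8")

# Precomposed form (NFC) versus base character + combining mark (NFD).
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

print(raw)                 # b'beno\xc3\xaet'
print(wrong == name)       # False
print(right == name)       # True
print(len(nfc), len(nfd))  # 6 7
print(nfc == nfd)          # False, although both display the same
```

The last line is the normalisation issue: two strings that look identical compare as different unless both are normalised to the same form first.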
> >
> > It is however a bit unclear how these are encoded. On macOS & *nix that
> > seems to be UTF8, on Windows there are some reports that it appears to be
> > Latin1 - but both might be locale specific, I don't know either way.
> >
> > Does anyone know for sure ?
> >
> > I furthermore think that OSEnvironment and its subclasses, which make this
> > call, should be responsible for decoding the C strings into proper Pharo
> > strings, and not leave that responsibility to their users.
> >
> > Fundamentally, in the following, the decoding is still not done correctly
> > and that is wrong/confusing IMHO.
> >
> > $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations'
> > {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'.
> > 'SHELL'->'/bin/bash'.
> > 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'.
> > 'FOO'->'benoît'.
> > 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
> > 'TERM_PROGRAM_VERSION'->'404'.
> > 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'. 'USER'->'sven'.
> > 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'.
> > 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
> > 'PWD'->'/tmp/benoît'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'.
> > 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'.
> > 'LC_CTYPE'->'UTF-8'.
> > 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
> > 'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoît'.
> > '_'->'/tmp/benoît/pharo-vm/Pharo.app/Contents/MacOS/Pharo'.
> > '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
> >
> > Of course, if we change this, we will need to fix callers.
> >
> > Opinions ?
> >
> > Sven
> >
> > PS: Furthermore, I note that there is a subtle difference in how $FOO and
> > $PWD in the above are UTF-8 encoded. In the former, normalisation was done,
> > in the latter not. Maybe that could lead to problems (when
> > comparing/composing them). This is a difficult/complex subject
> > (https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).
> >
> > --
> > Damien Pollet
> > type less, do more [ | ] http://people.untyped.org/damien.pollet