> On 17 Apr 2018, at 09:57, Damien Pollet <damien.pol...@gmail.com> wrote:
> 
> It seems macOS normalizes UTF-8 file names differently from everyone else
> (I think base character + combining mark instead of a precomposed code
> point). That might affect PWD.
> For environment variables, even if most sensible platforms should have 
> adopted UTF-8 by now, I wouldn't be surprised if there's no official encoding 
> whatsoever (i.e. they're just bytes with a 0 at the end…)

;-)

We can decode everything, we have all the tools, but of course, we first have 
to know what encoding is being used. Hence my question.
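
To make the ambiguity concrete, here is a minimal sketch in Python rather than Pharo. It shows the same bytes read under the two encodings mentioned in this thread. The helper `decode_env_bytes` is hypothetical: it tries strict UTF-8 first and falls back to Latin-1, which is only a heuristic, since Latin-1 accepts any byte sequence.

```python
# Bytes as a UTF-8 shell would place them in the environment block.
raw = "benoît".encode("utf-8")      # b'beno\xc3\xaet'

as_utf8 = raw.decode("utf-8")       # the intended text
as_latin1 = raw.decode("latin-1")   # mojibake: each byte becomes one character


def decode_env_bytes(raw, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first candidate
    encoding that decodes the bytes without error."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


print(as_utf8, as_latin1)
print(decode_env_bytes(b"beno\xeet"))  # Latin-1 bytes: strict UTF-8 rejects them
```

The order of the candidates matters: putting Latin-1 first would always "succeed" and hide UTF-8 input.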

> On 17 April 2018 at 09:36, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> Hi,
> 
> The dictionary 
> 
>  OSPlatform current environment
> 
> contains a copy of the OS's environment variables (more precisely, those of
> the VM process), as key/value pairs.
> 
> These are obtained via the following system calls:
> 
> on macOS & *nix
> 
>   LIBC environ
> 
> on Windows
> 
>   KERNEL32 GetEnvironmentStrings
> 
> It is however a bit unclear how these are encoded. On macOS & *nix that seems
> to be UTF-8; on Windows there are some reports that it appears to be Latin-1.
> Both might be locale-specific, though; I don't know either way.
> 
> Does anyone know for sure?
> 
> I furthermore think that OSEnvironment and its subclasses, which make this
> call, should be responsible for decoding the C strings into proper Pharo
> strings, and not leave that responsibility to their users.
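
As an illustration of that design, and not of Pharo's actual OSEnvironment API, here is a minimal Python sketch that decodes once at the boundary so callers only ever see proper strings. The function name `decoded_environment` and its parameters are assumptions made for the example. `os.environb` is the raw bytes view of the environment and exists only on POSIX platforms.

```python
import os


def decoded_environment(encoding="utf-8", errors="strict", raw_env=None):
    """Hypothetical wrapper: return the environment as a str -> str dict,
    decoding the raw bytes exactly once, at the boundary."""
    if raw_env is None:
        # Raw bytes view of the process environment (POSIX only).
        raw_env = os.environb
    return {
        key.decode(encoding, errors): value.decode(encoding, errors)
        for key, value in raw_env.items()
    }


# Usage with an explicit bytes mapping, so the example does not depend on
# the real environment of the machine it runs on.
env = decoded_environment(raw_env={b"FOO": b"beno\xc3\xaet"})
print(env["FOO"])
```

Callers never touch bytes, and the encoding decision lives in one place where it can be changed per platform.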
> 
> Fundamentally, in the following example, the values are still not decoded
> correctly, and that is wrong/confusing IMHO.
> 
> $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations' 
> {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'. 
> 'SHELL'->'/bin/bash'. 
> 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'. 
> 'FOO'->'benoît'. 
> 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
>  'TERM_PROGRAM_VERSION'->'404'. 
> 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'. 'USER'->'sven'. 
> 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'. 
> 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
>  'PWD'->'/tmp/benoît'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'. 
> 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'. 'LC_CTYPE'->'UTF-8'. 
> 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
>  'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoît'. 
> '_'->'/tmp/benoît/pharo-vm/Pharo.app/Contents/MacOS/Pharo'. 
> '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
> 
> Of course, if we change this, we will need to fix callers.
> 
> Opinions?
> 
> Sven
> 
> PS: Furthermore, I note that there is a subtle difference in how $FOO and 
> $PWD in the above are UTF-8 encoded. In the former, normalisation was done, 
> in the latter not. Maybe that could lead to problems (when 
> comparing/composing them). This is a difficult/complex subject 
> (https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).
> 
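
A minimal Python sketch of the comparison problem, assuming one value holds the precomposed î (U+00EE) and the other holds i followed by a combining circumflex (U+0302). Which variable carries which form is not asserted here. The helper `same_text` is hypothetical.

```python
import unicodedata

precomposed = "beno\u00eet"    # î as a single code point (NFC form)
decomposed = "benoi\u0302t"    # i + combining circumflex (NFD form)


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)           # code point comparison
print(same_text(precomposed, decomposed))  # comparison after normalization
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

The two strings render identically but differ in both code points and UTF-8 byte length, so any comparison or path composition needs to normalize both sides to the same form first.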
> 
> -- 
> Damien Pollet
> type less, do more [ | ] http://people.untyped.org/damien.pollet

