> On 19 Apr 2018, at 10:21, Guillermo Polito <[email protected]> wrote:
> 
> Hi,
> 
> I think this problem is not exclusive to environment variables. It also affects
> file paths and other things. So far Pharo does not detect the locale to choose the
> encoding, and it would be nice to do so.

Sure, it would be nice/good/helpful to detect the locale (BTW, don't we already have 
that, more or less?).

But I would be surprised if an OS API delivered differently encoded data to a 
process depending on the locale, I mean in general. That would be setting things 
up for a huge disaster, IMHO. A modern OS should just deliver UTF-8 (full 
Unicode code points) and be done with it. 
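To make the two positions concrete, here is a minimal sketch of the decoding step in Python (not Pharo) under an assumed policy: each raw `NAME=value` entry is tried as strict UTF-8 first, and only if that fails is the locale's preferred encoding used as a fallback. The names `decode_env_value` and `parse_env_block` are hypothetical, and this is not what OSEnvironment currently does. The sketch also assumes the block arrives as 8-bit bytes (NUL-terminated entries, ended by an empty entry), like the strings in libc's `environ`. A UTF-16 block, as returned by the wide Windows API, would need a different decoder.

```python
import locale


def decode_env_value(raw: bytes) -> str:
    """Decode one raw environment string into text.

    Hypothetical policy: strict UTF-8 first; only if the bytes are not valid
    UTF-8, fall back to the locale's preferred encoding, replacing bytes that
    still cannot be decoded instead of raising.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        fallback = locale.getpreferredencoding(False)
        return raw.decode(fallback, errors="replace")


def parse_env_block(block: bytes) -> dict:
    """Split a block of NUL-terminated b"NAME=value" entries, ended by an
    empty entry (a double NUL), into a dict of decoded name/value strings."""
    env = {}
    for entry in block.split(b"\0"):
        if not entry:
            break  # the empty entry marks the end of the block
        # Search for '=' from index 1 so that names starting with '='
        # (Windows blocks can contain per-drive entries such as "=C:") survive.
        idx = entry.find(b"=", 1)
        if idx == -1:
            continue  # malformed entry without a separator: skip it
        env[decode_env_value(entry[:idx])] = decode_env_value(entry[idx + 1:])
    return env


# 'benoît' encoded as UTF-8: precomposed U+00EE becomes the bytes C3 AE.
block = b"FOO=beno\xc3\xaet\0HOME=/Users/sven\0\0"
env = parse_env_block(block)
print(sorted(env))                  # ['FOO', 'HOME']
print(env["FOO"] == "beno\u00eet")  # True
```

With this shape the callers only ever see decoded strings, which is what the original proposal asks of OSEnvironment. The open question in this thread is only which fallback, if any, belongs in the `except` branch.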

> On Tue, Apr 17, 2018 at 10:56 AM, Henrik Sperre Johansen 
> <[email protected]> wrote:
> Damien Pollet wrote
> > It seems macOS normalizes UTF-8 differently from everyone else in file
> > names (I think base character + combining character instead of a precomposed
> > code point). That might affect PWD.
> > For environment variables, even if most sensible platforms should have
> > adopted UTF-8 by now, I wouldn't be surprised if there's no official
> > encoding whatsoever (i.e. they're just bytes with a 0 at the end…)
> > 
> > On 17 April 2018 at 09:36, Sven Van Caekenberghe <sven@> wrote:
> > 
> >> Hi,
> >>
> >> The dictionary
> >>
> >>  OSPlatform current environment
> >>
> >> contains a copy of the OS's environment variables (more precisely, those of
> >> the VM process), as key/value pairs.
> >>
> >> These are obtained via the following system calls:
> >>
> >> on macOS & *nix
> >>
> >>   LIBC environ
> >>
> >> on Windows
> >>
> >>   KERNEL32 GetEnvironmentStrings
> >>
> >> It is, however, a bit unclear how these are encoded. On macOS & *nix that
> >> seems to be UTF-8; on Windows there are some reports that it appears to be
> >> Latin-1. But both might be locale specific; I don't know either way.
> >>
> >> Does anyone know for sure ?
> >>
> >> I furthermore think that OSEnvironment and its subclasses, which make these
> >> calls, should be responsible for decoding the C strings into proper Pharo
> >> strings, and should not leave that responsibility to their users.
> >>
> >> Fundamentally, in the following example the decoding is still not done
> >> correctly, and that is wrong/confusing IMHO.
> >>
> >> $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations'
> >> {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'.
> >> 'SHELL'->'/bin/bash'. 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'.
> >> 'FOO'->'benoît'.
> >> 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
> >> 'TERM_PROGRAM_VERSION'->'404'.
> >> 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'.
> >> 'USER'->'sven'.
> >> 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'.
> >> 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
> >> 'PWD'->'/tmp/benoît'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'.
> >> 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'.
> >> 'LC_CTYPE'->'UTF-8'.
> >> 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
> >> 'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoît'.
> >> '_'->'/tmp/benoît/pharo-vm/Pharo.app/Contents/MacOS/Pharo'.
> >> '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
> >>
> >> Of course, if we change this, we will need to fix callers.
> >>
> >> Opinions ?
> >>
> >> Sven
> >>
> >> PS: Furthermore, I note that there is a subtle difference in how $FOO and
> >> $PWD in the above are UTF-8 encoded: in the former, normalisation was done;
> >> in the latter, it was not. Maybe that could lead to problems (when
> >> comparing/composing them). This is a difficult/complex subject
> >> (https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).
> > 
> > 
> > -- 
> > Damien Pollet
> > type less, do more [ | ] http://people.untyped.org/damien.pollet
> 
> If by "differently" you mean that it actually normalizes the file names, then
> yes.
> All Mac filenames are in a well-defined form: NFD.
> On Linux, they're just arrays of bytes, and anything goes.
> That the bytes mostly happen to be valid UTF-8 strings in NFC is just a
> by-product of that being the format most programs use when calling the
> file primitives. 
> 
> Cheers,
> Henry
> 
> 
> 
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
> 
> 
> 
> 
> -- 
>    
> Guille Polito
> Research Engineer
> 
> Centre de Recherche en Informatique, Signal et Automatique de Lille
> CRIStAL - UMR 9189
> French National Center for Scientific Research - http://www.cnrs.fr
> 
> Web: http://guillep.github.io
> Phone: +33 06 52 70 66 13
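To make the NFC/NFD difference discussed in this thread concrete, here is a small Python sketch using the standard `unicodedata` module. It assumes, based on the thread rather than on anything the snippet checks, that 'benoît' typed in a shell arrives precomposed (NFC) while the same name coming back from the macOS file system is decomposed (NFD). The two spellings then differ both as code points and as UTF-8 bytes, and they only compare equal after normalizing both to the same form. `same_text` is a hypothetical helper, not an existing Pharo or Python API.

```python
import unicodedata

# Precomposed (NFC): 'î' is the single code point U+00EE.
foo = "beno\u00eet"
# Decomposed (NFD): plain 'i' followed by U+0302 COMBINING CIRCUMFLEX ACCENT,
# the form Henrik describes for Mac file names.
pwd_component = "benoi\u0302t"

print(foo == pwd_component)                                # False
print(foo.encode("utf-8"))                                 # b'beno\xc3\xaet'
print(pwd_component.encode("utf-8"))                       # b'benoi\xcc\x82t'
print(unicodedata.normalize("NFD", foo) == pwd_component)  # True


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(foo, pwd_component))                       # True
```

Normalizing to NFC before comparing is one possible choice. Normalizing both sides to NFD works equally well, as long as the same form is applied to both.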

