On Fri, Jan 18, 2019 at 01:40:26PM +0100, Sven Van Caekenberghe wrote:
> Dave,
> 
> > On 18 Jan 2019, at 01:54, David T. Lewis via Pharo-dev 
> > <pharo-dev@lists.pharo.org> wrote:
> > 
> > 
> > From: "David T. Lewis" <le...@mail.msen.com>
> > Subject: Re: [Pharo-dev] Better management of encoding of environment 
> > variables
> > Date: 18 January 2019 at 01:54:34 GMT+1
> > To: Pharo Development List <pharo-dev@lists.pharo.org>
> > 
> > 
> > On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
> >> 
> >>> On 16 Jan 2019, at 23:23, Eliot Miranda <eliot.mira...@gmail.com> wrote:
> >>> 
> >>> On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <s...@stfx.eu> 
> >>> wrote:
> >>> 
> >>> The image side is perfectly capable of dealing with platform differences
> >>> in a clean/clear way, and at least we can then use the full power of our
> >>> language and our tools.
> >>> 
> >> Agreed.  At the same time I think it is very important that we don't rely
> >> on the FFI for environment variable access.  This is a basic cross-platform
> >> facility.  So I would like to see the environment accessed through
> >> primitives, but have the image place interpretation on the result of the
> >> primitive(s), and have the primitive(s) answer a raw result, just a
> >> sequence of uninterpreted bytes.
> >> 
> >> OK, I can understand that ENV VAR access is more fundamental than FFI
> >> (although FFI is already essential for Pharo, also during startup).
> >> 
> >>> VisualWorks takes this approach and provides a class UninterpretedBytes
> >>> that the VM is aware of.  That's always seemed like an ugly name and
> >>> overkill to me.  I would just use ByteArray and provide image level
> >>> conversion from ByteArray to String, which is what I believe we have 
> >>> anyway.
> >> 
> >> Right, bytes are always uninterpreted, else they would be something else.
> >> We have ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded, and our
> >> ByteArray inspector decodes automatically if it can.
> >> 
> > 
> > Hi Sven,
> > 
> > I am the author of the getenv primitives, and I am also sadly uninformed
> > about matters of character sets and strings in a multilingual environment.
> > 
> > The primitives answer environment variable values as ByteString
> > rather than ByteArray. This made sense to me at the time that I wrote it,
> > because ByteString is easy to display in an inspector, and because it is
> > easily converted to ByteArray.
> > 
> > For an American English speaker this seems like a good choice, but I
> > wonder now if it is a bad decision. After all, it is also trivially easy
> > to convert a ByteArray to ByteString for display in the image.
> > 
> > Would it be helpful to have getenv primitives that answer ByteArray
> > instead, and to let all conversion (including in OSProcess) be done in
> > the image?
> > 
> > Thanks,
> > Dave
> 
> Normally, the correct way to represent uninterpreted bytes is with a 
> ByteArray. Decoding these bytes as characters is the specific task of a 
> character encoder/decoder, with a deliberate choice as to which to use.
> 
> Since the getenv() C library function uses simple C strings, it is
> understandable that this was carried over. It is probably not worth changing
> that, or too risky - as long as the receiver understands that it is a raw OS
> string that needs more work.
> 
> Like with file path encoding/decoding, environment variable encoding/decoding 
> is plain messy and complex. IMHO it is better to manage that at the image 
> level where we are more agile and can better handle that complexity.
> 

Thanks Sven, that makes perfect sense to me.
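
To make the split concrete, here is a minimal sketch of the idea, written in
Python only as a stand-in for image-side code. The function names
raw_env_bytes and decode_env_value are hypothetical and are not part of
OSProcess or Pharo. The first plays the role of a primitive that answers
uninterpreted bytes, and the second is the image-level step where the
encoding is chosen deliberately.

```python
import os


def raw_env_bytes(name):
    """Answer the variable's value as uninterpreted bytes, or None if unset.

    Hypothetical helper: stands in for a getenv primitive answering a ByteArray.
    """
    if os.supports_bytes_environ:
        # On Unix-like systems Python exposes the raw bytes directly.
        return os.environb.get(name.encode("ascii"))
    # Elsewhere, re-encode the already decoded string as an approximation.
    value = os.environ.get(name)
    return None if value is None else os.fsencode(value)


def decode_env_value(raw, encoding="utf-8"):
    """Decode raw bytes with an explicitly chosen encoding (image-level step)."""
    return raw.decode(encoding)


# The same bytes give different strings depending on the decoder chosen.
sample = "caf\u00e9".encode("utf-8")           # b'caf\xc3\xa9'
as_utf8 = decode_env_value(sample)             # 'café'
as_latin1 = decode_env_value(sample, "latin-1")  # 'cafÃ©'
```

The point of the sketch is only that the bytes stay the same while the result
depends on the decoder, which is why the choice belongs where it can be made
explicitly.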

>
> BTW: using funny Unicode chars, like 🎈
> [https://www.fileformat.info/info/unicode/char/1f388/index.htm] is something 
> even English speakers do.
>

You are right. I wrote those getenv primitives 20 years ago and
back then we were still doing our emoticons like this:

;-)

Thanks,
Dave
 
