On Fri, Jan 18, 2019 at 01:40:26PM +0100, Sven Van Caekenberghe wrote:
> Dave,
>
> > On 18 Jan 2019, at 01:54, David T. Lewis via Pharo-dev
> > <pharo-dev@lists.pharo.org> wrote:
> >
> > From: "David T. Lewis" <le...@mail.msen.com>
> > Subject: Re: [Pharo-dev] Better management of encoding of environment
> > variables
> > Date: 18 January 2019 at 01:54:34 GMT+1
> > To: Pharo Development List <pharo-dev@lists.pharo.org>
> >
> > On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
> >>
> >>> On 16 Jan 2019, at 23:23, Eliot Miranda <eliot.mira...@gmail.com> wrote:
> >>>
> >>> On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <s...@stfx.eu>
> >>> wrote:
> >>>
> >>> The image side is perfectly capable of dealing with platform differences
> >>> in a clean/clear way, and at least we can then use the full power of our
> >>> language and our tools.
> >>>
> >> Agreed. At the same time I think it is very important that we don't rely
> >> on the FFI for environment variable access. This is a basic cross-platform
> >> facility. So I would like to see the environment accessed through
> >> primitives, but have the image place interpretation on the result of the
> >> primitive(s), and have the primitive(s) answer a raw result, just a
> >> sequence of uninterpreted bytes.
> >>
> >> OK, I can understand that ENV VAR access is more fundamental than FFI
> >> (although FFI is already essential for Pharo, also during startup).
> >>
> >>> VisualWorks takes this approach and provides a class UninterpretedBytes
> >>> that the VM is aware of. That's always seemed like an ugly name and
> >>> overkill to me. I would just use ByteArray and provide image level
> >>> conversion from ByteArray to String, which is what I believe we have
> >>> anyway.
> >>
> >> Right, bytes are always uninterpreted, else they would be something else.
> >> We got ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded and our
> >> ByteArray inspector decodes automatically if it can.
> >>
> >
> > Hi Sven,
> >
> > I am the author of the getenv primitives, and I am also sadly uninformed
> > about matters of character sets and strings in a multilingual environment.
> >
> > The primitives answer environment variable values as ByteString
> > rather than ByteArray. This made sense to me at the time that I wrote it,
> > because ByteString is easy to display in an inspector, and because it is
> > easily converted to ByteArray.
> >
> > For an American English speaker this seems like a good choice, but I
> > wonder now if it is a bad decision. After all, it is also trivially easy
> > to convert a ByteArray to ByteString for display in the image.
> >
> > Would it be helpful to have getenv primitives that answer ByteArray
> > instead, and to let all conversion (including in OSProcess) be done in
> > the image?
> >
> > Thanks,
> > Dave
>
> Normally, the correct way to represent uninterpreted bytes is with a
> ByteArray. Decoding these bytes as characters is the specific task of a
> character encoder/decoder, with a deliberate choice as to which to use.
>
> Since the getenv() system call uses simple C strings, it is understandable
> that this was carried over. It is probably not worth it, or too risky, to
> change that - as long as the receiver understands that it is a raw OS string
> that needs more work.
>
> Like with file path encoding/decoding, environment variable encoding/decoding
> is plain messy and complex. IMHO it is better to manage that at the image
> level where we are more agile and can better handle that complexity.
>
Thanks, Sven, that makes perfect sense to me.

>
> BTW: using funny Unicode chars, like 🎈
> [https://www.fileformat.info/info/unicode/char/1f388/index.htm] is something
> even English speakers do.
>

You are right. I wrote those getenv primitives 20 years ago, and back then we were still doing our emoticons like this: ;-)

Thanks,
Dave
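For illustration only, here is a minimal sketch of the split discussed in this thread. The OS hands back uninterpreted bytes, and the higher-level code decides how to decode them. It is written in Python rather than Pharo because Python's standard library exposes both layers directly. `os.environb` plays the role of a getenv primitive that answers raw bytes, and `bytes.decode` is the image-side step where the encoding is chosen deliberately. The helper names `getenv_raw` and `decode_env` are hypothetical. `os.environb` is only available where `os.supports_bytes_environ` is true (POSIX platforms, not Windows).

```python
import os
from typing import Optional


def getenv_raw(name: bytes) -> Optional[bytes]:
    """Answer an environment variable as uninterpreted bytes, or None if unset.

    os.environb is a bytes-to-bytes view of the process environment; it is
    only defined where os.supports_bytes_environ is true (POSIX, not Windows).
    """
    return os.environb.get(name)


def decode_env(raw: bytes, encoding: str = "utf-8") -> str:
    """Interpret raw environment bytes with an encoding the caller chooses.

    With the default errors="strict", invalid input raises UnicodeDecodeError
    instead of being silently guessed at.
    """
    return raw.decode(encoding)


if __name__ == "__main__":
    # Store U+1F388 BALLOON in the environment as its UTF-8 bytes.
    os.environb[b"GREETING"] = "\U0001F388".encode("utf-8")

    raw = getenv_raw(b"GREETING")
    print(raw)                               # the raw bytes, undecoded
    print(hex(ord(decode_env(raw))))         # one code point when read as UTF-8
    print(len(decode_env(raw, "latin-1")))   # same bytes, different decoder
```

Decoding the same four bytes as Latin-1 produces four unrelated characters instead of one balloon. This is why, in this sketch, the primitive layer only returns bytes and leaves the choice of decoder to the caller.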