Re: [Pharo-dev] Better management of encoding of environment variables

Guillermo Polito Fri, 18 Jan 2019 06:10:17 -0800

On Fri, Jan 18, 2019 at 2:46 PM Ben Coman <[email protected]> wrote:


>
>
> On Fri, 18 Jan 2019 at 21:39, Sven Van Caekenberghe <[email protected]> wrote:
>
>>
>>
>> > On 18 Jan 2019, at 14:23, Guillermo Polito <[email protected]> wrote:
>> >
>> >
>> > I think that will just overcomplicate things. Right now, all Strings in
>> > Pharo are unicode strings.
>
>
> Cool. I didn't realise that.  But to be pedantic, which unicode encoding?
> Should I presume from Sven's "UTF-8 encoding step" comment below
> and the WideString class comment  "This class represents the array of 32
> bit wide characters"
> that the WideString encoding is UTF-32?  So should its comment be updated
> to advise that?
>

None :D

That's the funny thing, they are not encoded.

Actually, you should see Strings as collections of Characters, and
Characters as defined in terms of their abstract code points.
ByteStrings are an optimized (just more compact) representation that
stores only code points that fit in a byte.
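
To make that concrete, here is a small Playground sketch. It assumes a
recent Pharo image where String>>utf8Encoded and ByteArray>>utf8Decoded
are available (they come with Zinc); the variable names are just for
illustration. Evaluate it expression by expression and inspect the
results.

```smalltalk
| latin1 wide |
"Build two one-character strings directly from abstract code points."
latin1 := String with: (Character value: 233).   "é, code point < 256"
wide := String with: (Character value: 8364).    "€, code point >= 256"

"Same abstraction, different storage: the class is only an optimization."
latin1 class.   "expected: ByteString"
wide class.     "expected: WideString"

"A Character answers its code point regardless of how it is stored."
(latin1 at: 1) codePoint.   "233"
(wide at: 1) codePoint.     "8364"

"Bytes only appear once you explicitly encode, e.g. before an FFI call."
latin1 utf8Encoded.   "#[195 169]"
wide utf8Encoded.     "#[226 130 172]"

"Decoding the bytes gives back an equal String."
wide utf8Encoded utf8Decoded = wide.   "true"
```

Note that the one-byte-per-character storage of the ByteString is not
the UTF-8 form: é is a single element in the string but two bytes once
encoded. That is the sense in which the strings themselves are "not
encoded".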


> cheers -ben
>
>> > Characters are represented with their corresponding unicode codepoint.
>> > If all characters in a string have codepoints < 256 then they are just
>> > stored in a bytestring. Otherwise they are WideStrings.
>> >
>> > I think assuming a single representation for strings, and then encode
>> > when interacting with external apps/APIs is MUCH simpler.
>>
>> Absolutely !
>>
>> (and yes I know that for outgoing FFI calls that might mean a UTF-8
>> encoding step, so be it).
>>
>

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13
