Re: [Pharo-dev] Better management of encoding of environment variables

Sven Van Caekenberghe Fri, 18 Jan 2019 06:11:44 -0800


> On 18 Jan 2019, at 14:45, Ben Coman <[email protected]> wrote:
> 
> 
> 
> On Fri, 18 Jan 2019 at 21:39, Sven Van Caekenberghe <[email protected]> wrote:
> 
> 
> > On 18 Jan 2019, at 14:23, Guillermo Polito <[email protected]> wrote:
> > 
> > 
> > I think that will just overcomplicate things. Right now, all Strings in 
> > Pharo are unicode strings.
> 
> Cool. I didn't realise that.  But to be pedantic, which unicode encoding? 
> Should I presume from Sven's "UTF-8 encoding step" comment below 
> and the WideString class comment  "This class represents the array of 32 bit 
> wide characters"
> that the WideString encoding is UTF-32?  So should its comment be updated to 
> advise that?

Not really. Pharo Strings are collections of Characters, each of which is a 
Unicode code point (yes, a 32-bit one).

An encoding projects this rather abstract notion onto a sequence of bytes.

UTF-32 (ZnUTF32Encoder, https://en.wikipedia.org/wiki/UTF-32), for example, is 
endian dependent, so the same code points can end up as different byte 
sequences.
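
To make the difference between code points and encoded bytes concrete, here 
is a minimal sketch. It is written in Python rather than Pharo, purely as an 
illustration; the sample string and the variable names are assumptions made 
up for this example, and nothing here is Pharo or Zinc API.

```python
# A string is an abstract sequence of Unicode code points.
text = "A\u20ac"  # 'A' (U+0041) followed by the euro sign (U+20AC)

# The code points themselves carry no byte layout.
code_points = [ord(ch) for ch in text]  # [0x41, 0x20AC]

# An encoding projects those code points onto a sequence of bytes.
utf8 = text.encode("utf-8")           # 41 e2 82 ac
utf32_be = text.encode("utf-32-be")   # 00 00 00 41 00 00 20 ac
utf32_le = text.encode("utf-32-le")   # 41 00 00 00 ac 20 00 00

for name, data in [("utf-8", utf8), ("utf-32-be", utf32_be), ("utf-32-le", utf32_le)]:
    print(f"{name:10} {data.hex(' ')}")
```

The two UTF-32 results hold the same code points but differ byte for byte, 
which is why "32-bit characters" describes the in-memory string and not an 
encoding.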

Read the first part of

https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html

> cheers -ben
> 
> > Characters are represented with their corresponding unicode codepoint.
> > If all characters in a string have codepoints < 256 then they are just 
> > stored in a bytestring. Otherwise they are WideStrings.
> > 
> > I think assuming a single representation for strings, and then encode when 
> > interacting with external apps/APIs is MUCH simpler.
> 
> Absolutely !
> 
> (and yes I know that for outgoing FFI calls that might mean a UTF-8 encoding 
> step, so be it).
