Hi Guille,

> On Jan 18, 2019, at 6:04 AM, Guillermo Polito <[email protected]> wrote:
> 
>> On Fri, Jan 18, 2019 at 2:46 PM Ben Coman <[email protected]> wrote:
>> 
>>> On Fri, 18 Jan 2019 at 21:39, Sven Van Caekenberghe <[email protected]> wrote:
>>> 
>>> > On 18 Jan 2019, at 14:23, Guillermo Polito <[email protected]> wrote:
>>> > 
>>> > 
>>> > I think that will just overcomplicate things. Right now, all Strings in 
>>> > Pharo are unicode strings.
>> 
>> Cool. I didn't realise that.  But to be pedantic, which unicode encoding? 
>> Should I presume from Sven's "UTF-8 encoding step" comment below 
>> and the WideString class comment  "This class represents the array of 32 bit 
>> wide characters"
>> that the WideString encoding is UTF-32?  So should its comment be updated to 
>> advise that?
> 
> None :D
> 
> That's the funny thing, they are not encoded.
> 
> Actually, you should see Strings as collections of Characters, and Characters 
> defined in terms of their abstract code points.
> ByteStrings are an optimized (just more compact) version that stores 
> codepoints that fit in a byte.

And Spur supports 16-bit strings too, which would be versions that store code 
points that fit in two bytes.
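
To make the idea above concrete, here is a rough sketch in Python rather than 
Pharo. The function names are made up for illustration and are not Pharo or 
VM API. It assumes the only thing that matters is the largest code point in 
the string, and that UTF-8 is what the external side expects.

```python
def storage_width(text: str) -> int:
    """Return the bytes per character needed to hold every code point
    of `text` unencoded (1, 2 or 4). Hypothetical helper."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:       # fits in a byte: ByteString-like storage
        return 1
    if highest < 0x10000:     # fits in 16 bits: a 16-bit string
        return 2
    return 4                  # anything else: 32-bit, WideString-like storage


def encode_for_ffi(text: str) -> bytes:
    """Encode only at the boundary, here assuming the callee wants UTF-8."""
    return text.encode("utf-8")
```

Inside the image the string stays a plain sequence of code points whatever 
width it is stored at; an encoding only comes into play in something like 
encode_for_ffi, at the boundary.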

>> cheers -ben
>> 
>>> > Characters are represented with their corresponding unicode codepoint.
>>> > If all characters in a string have codepoints < 256 then they are just 
>>> > stored in a bytestring. Otherwise they are WideStrings.
>>> > 
>>> > I think assuming a single representation for strings, and then encode 
>>> > when interacting with external apps/APIs is MUCH simpler.
>>> 
>>> Absolutely !
>>> 
>>> (and yes I know that for outgoing FFI calls that might mean a UTF-8 
>>> encoding step, so be it).
> 
> 
> -- 
>    
> Guille Polito
> Research Engineer
> Centre de Recherche en Informatique, Signal et Automatique de Lille
> CRIStAL - UMR 9189
> French National Center for Scientific Research - http://www.cnrs.fr
> 
> Web: http://guillep.github.io
> Phone: +33 06 52 70 66 13
