On Fri, Jan 18, 2019 at 2:46 PM Ben Coman <[email protected]> wrote:
> On Fri, 18 Jan 2019 at 21:39, Sven Van Caekenberghe <[email protected]> wrote:
>
>> > On 18 Jan 2019, at 14:23, Guillermo Polito <[email protected]> wrote:
>> >
>> > I think that will just overcomplicate things. Right now, all Strings in
>> > Pharo are unicode strings.
>
> Cool. I didn't realise that. But to be pedantic, which unicode encoding?
> Should I presume from Sven's "UTF-8 encoding step" comment below
> and the WideString class comment "This class represents the array of 32
> bit wide characters"
> that the WideString encoding is UTF-32? So should its comment be updated
> to advise that?

None :D
That's the funny thing, they are not encoded.
Actually, you should see Strings as collections of Characters, and
Characters defined in terms of their abstract code points.
ByteStrings are an optimized (just more compact) version that stores
codepoints that fit in a byte.

> cheers -ben
>
>> > Characters are represented with their corresponding unicode codepoint.
>> > If all characters in a string have codepoints < 256 then they are just
>> > stored in a bytestring. Otherwise they are WideStrings.
>> >
>> > I think assuming a single representation for strings, and then encode
>> > when interacting with external apps/APIs is MUCH simpler.
>>
>> Absolutely !
>>
>> (and yes I know that for outgoing FFI calls that might mean a UTF-8
>> encoding step, so be it).

-- 
Guille Polito
Research Engineer
Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr
Web: http://guillep.github.io
Phone: +33 06 52 70 66 13
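
To make the code point versus encoding distinction above concrete, here is a minimal sketch in Python rather than Pharo, so it is only an analogy. The function names are hypothetical and not part of any Pharo API. It assumes, as the thread states, that a string whose code points are all below 256 can be stored one byte per character, and that an encoding such as UTF-8 is applied only when the string leaves the image (for example, in an outgoing FFI call).

```python
def code_points(text):
    """Return the abstract Unicode code point of each character."""
    return [ord(ch) for ch in text]


def fits_in_bytes(text):
    """True when every code point is < 256, the condition the thread
    gives for compact (ByteString-like) storage."""
    return all(ord(ch) < 256 for ch in text)


def encode_for_external_call(text):
    """Encode only at the boundary; UTF-8 is assumed here as the
    target encoding for the external API."""
    return text.encode("utf-8")


narrow = "caf\u00e9"      # e-acute is code point 233, fits in a byte
wide = "price: \u20ac"    # euro sign is code point 8364, does not fit

for sample in (narrow, wide):
    print(code_points(sample), fits_in_bytes(sample),
          encode_for_external_call(sample))
```

Note that the compact in-memory form is not UTF-8: code point 233 is stored as one byte, while its UTF-8 encoding takes two bytes. That is the sense in which the strings are "not encoded" until the boundary.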
