Re: [Pharo-dev] [squeak-dev] Unicode Support

Gabriel Cotelli Sun, 06 Dec 2015 14:03:07 -0800

As far as I know, Dart also uses UTF-16 for Strings.
On Dec 6, 2015 16:33, "Mark Bestley" <s...@bestley.co.uk> wrote:


> On 06/12/2015 19:08, Sven Van Caekenberghe wrote:
>
>> BTW, does anyone know of any programming language that did go that way or
>> has a library that directly implements 'storing all strings as utf-8' ?
>>
> Java is UTF-16
>
> Python3, Go and Swift are UTF-8 as I suspect are other new languages not
> based on .Net or the JVM
>
> Mark
>
>
>>
>
> On 06 Dec 2015, at 18:45, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>
>>> Well written, Todd. I agree, the loss of indexing might not be that big
>>> a problem in practice. The only way to find out is to try an experiment, I
>>> guess.
>>>
>>> Sven
>>>
>>> On 06 Dec 2015, at 17:37, Todd Blanchard <tblanch...@mac.com> wrote:
>>>>
>>>> (Resent because of bounce notification (email handling in OS X is really
>>>> beginning to annoy me).  Sorry if it's a dup)
>>>>
>>>> I used to worry a lot about strings being indexable.  And then I
>>>> eventually let go of that and realized that it isn't a particularly
>>>> important property for them to have.
>>>>
>>>> I think you will find that UTF8 is generally the most convenient for a
>>>> lot of things but it's a bit like light in that you treat it alternately as
>>>> a wave or particle depending on what you are trying to do.
>>>>
>>>> So it goes with strings - they can be treated alternately as streams or byte
>>>> arrays (not character arrays - stop thinking in characters).  In practice,
>>>> this tends to not be a problem since a lot of the times when you want to
>>>> replace a character or pick out the nth one you are doing something very
>>>> computerish and the characters you are working with are the single byte
>>>> (ASCII legacy) variety.  You generally know when you can get away with that
>>>> and when you can't.
>>>>
>>>> Otherwise you are most likely doing things that are best dealt with in
>>>> a streaming paradigm.  For most computation, you come to realize you don't
>>>> generally care how many characters but how much space (bytes) you need to
>>>> store your chunk of text.  Collation is tricky and complicated in Unicode
>>>> in general but it isn't any worse in UTF8 than any other encoding.  You are
>>>> still going to scan each sortable item from front to back to determine its
>>>> order, regardless.
>>>>
>>>> Most of the outside world has settled on UTF8 and any ASCII file is
>>>> already UTF8 - which is why it ends up being so convenient.  Most of our
>>>> old text handling infrastructure can still handle UTF8 while it tends to
>>>> choke on wider encodings.
>>>>
>>>> -Todd Blanchard
>>>>
>>>> On Dec 6, 2015, at 07:23, H. Hirzel <hannes.hir...@gmail.com> wrote:
>>>>>
>>>>>> We do the same thing, but that doesn't mean it's a good idea to create a
>>>>>> new String-like class having its content encoded in UTF-8, because
>>>>>> UTF-8-encoded strings can't be modified like regular strings. While it
>>>>>> would be possible to implement all operations, such implementation would
>>>>>> become the next SortedCollection (bad performance due to misuse).
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>
> --
> Mark
>
>
>
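
For readers following the indexing argument in this thread, here is a minimal Python sketch of the trade-off, not anyone's proposed implementation. It assumes text held as raw UTF-8 bytes; the sample string and the helper name `nth_char_utf8` are hypothetical and chosen only for illustration. It shows that the byte count and the code point count differ, that reaching the n-th character means scanning from the front, and that replacing a single-byte ASCII character at the byte level is still safe.

```python
# Illustrative sketch only: the sample text and helper name are hypothetical.

def nth_char_utf8(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the front."""
    count = -1
    for i, b in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes; any other
        # byte starts a new code point.
        if b & 0xC0 != 0x80:
            count += 1
            if count == n:
                # Extend over the continuation bytes of this code point.
                j = i + 1
                while j < len(data) and data[j] & 0xC0 == 0x80:
                    j += 1
                return data[i:j].decode("utf-8")
    raise IndexError(n)

text = "na\u00efve caf\u00e9"   # two non-ASCII characters
data = text.encode("utf-8")

print(len(text))                # number of code points
print(len(data))                # number of bytes needed for storage
print(nth_char_utf8(data, 2))   # linear scan, not constant-time indexing

# ASCII bytes never occur inside a multi-byte sequence, so a byte-level
# replacement of an ASCII character keeps the data valid UTF-8.
print(data.replace(b" ", b"_").decode("utf-8"))
```

The linear scan is the cost of giving up constant-time character indexing, while the byte-level replacement is the "computerish" ASCII case where plain byte indexing is enough.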
