As far as I know, Dart also uses UTF-16 for Strings.

On Dec 6, 2015 16:33, "Mark Bestley" <s...@bestley.co.uk> wrote:
> On 06/12/2015 19:08, Sven Van Caekenberghe wrote:
>
>> BTW, does anyone know of any programming language that did go that way or
>> has a library that directly implements 'storing all strings as utf-8' ?
>>
> Java is UTF-16
>
> Python3, Go and Swift are UTF-8, as I suspect are other new languages not
> based on .Net or the JVM
>
> Mark
>
>> On 06 Dec 2015, at 18:45, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>
>>> Well written, Todd. I agree, the loss of indexing might not be that big
>>> a problem in practice. The only way to find out is to try an experiment, I
>>> guess.
>>>
>>> Sven
>>>
>>> On 06 Dec 2015, at 17:37, Todd Blanchard <tblanch...@mac.com> wrote:
>>>>
>>>> (Resent because of bounce notification (email handling in OS X is really
>>>> beginning to annoy me). Sorry if it's a dup)
>>>>
>>>> I used to worry a lot about strings being indexable. And then I
>>>> eventually let go of that and realized that it isn't a particularly
>>>> important property for them to have.
>>>>
>>>> I think you will find that UTF8 is generally the most convenient for a
>>>> lot of things, but it's a bit like light in that you treat it alternately
>>>> as a wave or particle depending on what you are trying to do.
>>>>
>>>> So it goes with strings - they can be treated alternately as streams or
>>>> byte arrays (not character arrays - stop thinking in characters). In
>>>> practice, this tends to not be a problem, since a lot of the time when you
>>>> want to replace a character or pick out the nth one you are doing
>>>> something very computerish and the characters you are working with are
>>>> the single byte (ASCII legacy) variety. You generally know when you can
>>>> get away with that and when you can't.
>>>>
>>>> Otherwise you are most likely doing things that are best dealt with in
>>>> a streaming paradigm. For most computation, you come to realize you don't
>>>> generally care how many characters but how much space (bytes) you need to
>>>> store your chunk of text. Collation is tricky and complicated in Unicode
>>>> in general, but it isn't any worse in UTF8 than in any other encoding. You
>>>> are still going to scan each sortable item from front to back to
>>>> determine its order, regardless.
>>>>
>>>> Most of the outside world has settled on UTF8 and any ASCII file is
>>>> already UTF8 - which is why it ends up being so convenient. Most of our
>>>> old text handling infrastructure can still handle UTF8 while it tends to
>>>> choke on wider encodings.
>>>>
>>>> -Todd Blanchard
>>>>
>>>> On Dec 6, 2015, at 07:23, H. Hirzel <hannes.hir...@gmail.com> wrote:
>>>>>
>>>>>> We do the same thing, but that doesn't mean it's a good idea to create
>>>>>> a new String-like class having its content encoded in UTF-8, because
>>>>>> UTF-8-encoded strings can't be modified like regular strings. While it
>>>>>> would be possible to implement all operations, such an implementation
>>>>>> would become the next SortedCollection (bad performance due to misuse).
>>>>>>
>
> --
> Mark
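
To make the points in Todd's message concrete, here is a minimal Python sketch, not anyone's proposed implementation. It treats UTF-8 text as a plain byte array and shows that character count and byte count differ, that ASCII bytes are already valid UTF-8, that decoding works as a stream even when a chunk boundary splits a multi-byte sequence, and that picking out the nth character needs a linear scan. The function names (`char_and_byte_counts`, `ascii_is_utf8`, `stream_decode`, `nth_code_point`) are hypothetical and chosen only for illustration; "character" here means a Unicode code point.

```python
import codecs


def char_and_byte_counts(s):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


def ascii_is_utf8(raw):
    """Pure-ASCII bytes decode to the same text under ASCII and UTF-8."""
    return raw.decode("ascii") == raw.decode("utf-8")


def stream_decode(chunks):
    """Decode UTF-8 incrementally; chunks may split a multi-byte sequence."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        piece = decoder.decode(chunk)
        if piece:
            yield piece
    # Flush; raises UnicodeDecodeError if the input ended mid-sequence.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def nth_code_point(raw, n):
    """Find the n-th code point (0-based) in UTF-8 bytes by scanning.

    There is no O(1) index: every byte before the target is inspected.
    Continuation bytes have the bit pattern 10xxxxxx and are skipped.
    """
    count = -1
    start = None
    for i, b in enumerate(raw):
        if b & 0xC0 != 0x80:  # ASCII byte or lead byte of a sequence
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return raw[start:i].decode("utf-8")
    if start is not None:
        return raw[start:].decode("utf-8")
    raise IndexError("code point index out of range")


if __name__ == "__main__":
    print(char_and_byte_counts("héllo"))  # 5 code points, 6 bytes
    print(ascii_is_utf8(b"plain ASCII"))
    print("".join(stream_decode([b"h\xc3", b"\xa9llo"])))
    print(nth_code_point("héllo".encode("utf-8"), 1))
```

The scan in `nth_code_point` is the cost H. Hirzel's quoted objection refers to, and `stream_decode` is the style of access Todd argues covers most real use.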