Re: Micro Python -- a lean and efficient implementation of Python 3

Robin Becker Wed, 04 Jun 2014 04:56:25 -0700

On 04/06/2014 12:01, Tim Chase wrote:

On 2014-06-04 00:58, Paul Rubin wrote:

Steven D'Aprano <st...@pearwood.info> writes:

Maybe there's a use-case for a microcontroller that works in
ISO-8859-5 natively, thus using only eight bits per character,

That won't even make the Russians happy, since in Russia there
are multiple incompatible legacy encodings.


I've never understood why not use UTF-8 for everything.


If you use UTF-8 for everything, then you end up in a world where
string-indexing (see ChrisA's other side thread on this topic) is no
longer an O(1) operation, but an O(N) operation.  Some of us slice
strings for a living. ;-)  I understand that using UTF-32 would allow
us to maintain O(1) indexing at the cost of every string occupying 4
bytes per character.  The FSR (again, as I understand it) allows
strings that fit in one-byte-per-character to use that, scaling up to
use wider characters internally as they're actually needed/used.
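To illustrate the trade-off described above, here is a minimal sketch in Python 3. The helper name utf8_index and the sample string are made up for illustration; this is not how any particular implementation indexes strings. It only shows that finding the n-th code point in UTF-8 bytes needs a scan, while in UTF-32 it is a fixed byte offset.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 `data` by scanning from the start (O(N))."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a new code point starts here
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


# Hypothetical sample: characters needing 1, 2, 3 and 4 bytes in UTF-8.
text = "a\u044f\u20ac\U0001F600"
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")

utf8_widths = [len(ch.encode("utf-8")) for ch in text]

# UTF-8: variable width, so we have to walk the bytes.
third = utf8_index(utf8, 2)

# UTF-32: fixed width, so the n-th code point sits at byte offset 4*n.
third_utf32 = utf32[4 * 2:4 * 3].decode("utf-32-le")

print(utf8_widths, len(utf8), len(utf32), ascii(third), ascii(third_utf32))
```

The same four characters take 10 bytes in UTF-8 and 16 in UTF-32, which is the space-versus-indexing cost being discussed.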

........

I believe that we should distinguish between glyph/character indexing and string indexing. Even in Unicode it may be hard to decide where a visual glyph starts and ends. I assume most people would like to assign one glyph to one Unicode code point, but that's not always possible with composed glyphs.


>>> for a in (u'\xc5',u'A\u030a'):
...     for o in (u'\xf6',u'o\u0308'):
...             u=a+u'ngstr'+o+u'm'
...             print("%s %s" % (repr(u),u))
...
u'\xc5ngstr\xf6m' Ångström
u'\xc5ngstro\u0308m' Ångström
u'A\u030angstr\xf6m' Ångström
u'A\u030angstro\u0308m' Ångström
>>> u'\xc5ngstr\xf6m'==u'\xc5ngstro\u0308m'
False

so even Unicode doesn't always allow for O(1) glyph indexing. I know this is artificial, but this is the same situation as UTF-8 faces; just the frequency of occurrence is different. A very large amount of computing is still Western-centric, so searching a byte string for Latin characters is still efficient; searching for an n with a tilde on top might not be so easy.
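A rough sketch of both points in Python 3, using only the standard unicodedata module. The word "mañana" is just an assumed example for the n-with-tilde case. It shows that the two spellings differ in code-point count, that normalization makes them compare equal, and that a byte search for the precomposed ñ misses the decomposed form.

```python
import unicodedata

composed = "\xc5ngstr\xf6m"          # precomposed A-ring and o-diaeresis
decomposed = "A\u030angstro\u0308m"  # base letters followed by combining marks

# Same rendered word, different numbers of code points.
lengths = (len(composed), len(decomposed))

# NFC composes where it can; NFD decomposes. Either makes the two comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# Byte-level search in UTF-8 for a precomposed n-with-tilde (U+00F1).
needle = "\xf1".encode("utf-8")
hay_composed = "ma\xf1ana".encode("utf-8")        # uses U+00F1
hay_decomposed = "man\u0303ana".encode("utf-8")   # uses n + U+0303
found = (hay_composed.find(needle), hay_decomposed.find(needle))

print(lengths, composed == decomposed, nfc_equal, nfd_equal, found)
```

So a search for such a character has to normalize both sides first, or look for every spelling.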

--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list
