Ok, true enough that dereferencing and limited linear search is still O(1).
I could have phrased that slightly more precisely.

But the trade-off part is true. Indexing into character 1 million of a
utf-32 string is just one memory offset calculation, them following the
reference. Indexing into the utf-8-with-offset-list is a couple
dereferences, and on average 128 sequential scans. So it's not worse big-O,
but it's around 100x slower... Still a lot faster than sequential scan of
all 1 million though.

What does actual CPython do currently to find that s[1_000_000], assuming
utf-8 internal representation?

On Sat, Oct 26, 2019, 11:02 PM Random832 <random...@fastmail.com> wrote:

> On Sat, Oct 26, 2019, at 20:26, David Mertz wrote:
> > Absolutely, utf-8 is a wonderful encoding. And indeed, worst case is
> > the same storage requirement as utf-16 or utf-32. For O(1) random
> > access into all strings, we have to eat 32-bits per character, one way
> > or the other, but of course there are space/speed trade-offs one could
> > make for intermediate behavior.
>
> A string representation considering of (say) a UTF-8 string, plus an
> auxiliary list of byte indices of, say, 256-codepoint-long chunks [along
> with perhaps a flag to say that the chunk is all-ASCII or not] would
> provide O(1) random access, though, of course, despite both being O(1),
> "single index access" vs "single index access then either another index
> access or up to 256 iterate-forward operations" aren't *really* the same
> speed.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/N4ONH5O443FWB7M7E2FF24QR32HXAPAD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ENN4Y3ZZOPG2NM5SEQOQMLQ2N7P6L3LI/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to