Ok, true enough that dereferencing and limited linear search is still O(1). I could have phrased that slightly more precisely.
But the trade-off part is true. Indexing into character 1 million of a utf-32 string is just one memory offset calculation, them following the reference. Indexing into the utf-8-with-offset-list is a couple dereferences, and on average 128 sequential scans. So it's not worse big-O, but it's around 100x slower... Still a lot faster than sequential scan of all 1 million though. What does actual CPython do currently to find that s[1_000_000], assuming utf-8 internal representation? On Sat, Oct 26, 2019, 11:02 PM Random832 <random...@fastmail.com> wrote: > On Sat, Oct 26, 2019, at 20:26, David Mertz wrote: > > Absolutely, utf-8 is a wonderful encoding. And indeed, worst case is > > the same storage requirement as utf-16 or utf-32. For O(1) random > > access into all strings, we have to eat 32-bits per character, one way > > or the other, but of course there are space/speed trade-offs one could > > make for intermediate behavior. > > A string representation considering of (say) a UTF-8 string, plus an > auxiliary list of byte indices of, say, 256-codepoint-long chunks [along > with perhaps a flag to say that the chunk is all-ASCII or not] would > provide O(1) random access, though, of course, despite both being O(1), > "single index access" vs "single index access then either another index > access or up to 256 iterate-forward operations" aren't *really* the same > speed. > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/N4ONH5O443FWB7M7E2FF24QR32HXAPAD/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ENN4Y3ZZOPG2NM5SEQOQMLQ2N7P6L3LI/ Code of Conduct: http://python.org/psf/codeofconduct/