On Oct 26, 2019, at 19:59, Random832 <random...@fastmail.com> wrote:
> 
> A string representation consisting of (say) a UTF-8 string, plus an 
> auxiliary list of byte indices of, say, 256-codepoint-long chunks [along with 
> perhaps a flag to say that the chunk is all-ASCII or not] would provide O(1) 
> random access, though, of course, despite both being O(1), "single index 
> access" vs "single index access then either another index access or up to 256 
> iterate-forward operations" aren't *really* the same speed.

Yes, but that means constructing a string takes linear time, because you have 
to construct that index. You can’t just take a read/recv/mmap/result of a C 
library/whatever and use it as a string without doing linear work on it first. 
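
To make the cost concrete, here is a rough sketch of the kind of chunk index 
described in the quoted proposal. The names build_chunk_index and char_at are 
made up for illustration, and the all-ASCII flag is left out. The point is 
only that building the index has to walk every byte of the buffer once, 
before any lookup can happen:

```python
CHUNK = 256  # codepoints per chunk, the figure used in the quoted proposal


def build_chunk_index(data: bytes, chunk: int = CHUNK) -> list[int]:
    """Return the byte offset of every chunk-th codepoint in UTF-8 data.

    This is the linear-time step: every byte is inspected exactly once.
    """
    offsets = []
    count = 0
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a codepoint starts here
            if count % chunk == 0:
                offsets.append(pos)
            count += 1
    return offsets


def char_at(data: bytes, offsets: list[int], i: int, chunk: int = CHUNK) -> str:
    """Return codepoint i: one index lookup, then at most chunk-1 forward steps."""
    pos = offsets[i // chunk]
    for _ in range(i % chunk):
        pos += 1
        while data[pos] & 0xC0 == 0x80:  # skip continuation bytes
            pos += 1
    end = pos + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[pos:end].decode("utf-8")


if __name__ == "__main__":
    raw = ("aé€😀" * 100).encode("utf-8")  # stands in for bytes from read/recv/mmap
    index = build_chunk_index(raw)         # O(len(raw)) before any indexing is possible
    print(char_at(raw, index, 258))
```

The lookup is bounded by the chunk size, but build_chunk_index is paid up 
front whether or not anyone ever indexes the string.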

And you have to do that on _every_ string, even though you only need the index 
on a small percentage of them. (Unless you can statically look ahead at the 
code and prove that a string will never be indexed—which a Haskell compiler can 
do, but I don’t think it’s remotely feasible for a language like Python.)

If you redesign your find, re.search, etc. APIs to not return character 
indexes, then I think you can get away with not having character-indexable 
strings. On the rare occasions where you need it, construct a tuple of chars. 
If that isn’t good enough, you can easily write a custom object that wraps a 
string and an index list together that acts like a string and a sequence of 
chars at the same time. There’s no need for the string type itself to do that.
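
A minimal sketch of such a wrapper, to show roughly what I mean. IndexedStr 
is a made-up name, it only handles integer indexes (no slices), and it keeps 
a full per-codepoint offset list rather than a chunked one for simplicity. 
The index is built lazily, so strings that are never indexed never pay for it:

```python
class IndexedStr:
    """Hypothetical wrapper: UTF-8 bytes plus a lazily built codepoint index."""

    def __init__(self, data: bytes):
        self._data = data    # O(1): nothing is scanned at construction time
        self._starts = None  # byte offset of each codepoint, built on first use

    def _index(self) -> list[int]:
        if self._starts is None:
            # Linear scan, but only for strings that actually get indexed.
            self._starts = [
                pos for pos, byte in enumerate(self._data) if byte & 0xC0 != 0x80
            ]
        return self._starts

    def __str__(self) -> str:
        return self._data.decode("utf-8")

    def __len__(self) -> int:
        return len(self._index())

    def __getitem__(self, i: int) -> str:
        starts = self._index()
        if i < 0:
            i += len(starts)
        if not 0 <= i < len(starts):
            raise IndexError("IndexedStr index out of range")
        end = starts[i + 1] if i + 1 < len(starts) else len(self._data)
        return self._data[starts[i]:end].decode("utf-8")


if __name__ == "__main__":
    w = IndexedStr("naïve €5".encode("utf-8"))
    print(str(w))      # usable as text without ever building the index
    print(w[2], w[-2])  # first indexing operation triggers the scan
```

Because __len__ and __getitem__ are both defined, iteration and reversed() 
work through the normal sequence protocol without any extra code.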
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KPT2TJWZ4W4JXRHAIHDV557CWS53LEPX/
Code of Conduct: http://python.org/psf/codeofconduct/
