On Mar 30, 2020, at 12:00, Paul Sokolovsky <pmis...@gmail.com> wrote:
> Roughly speaking, to support efficient appending, one need to
> be ready to over-allocate string storage, and maintain bookkeeping for
> this. Another known optimization CPython does is for stuff like "s =
> s[off:]", which requires maintaining another "offset" pointer. Even
> with this simplistic consideration, internal structure of "str" would
> be about the same as "io.StringIO" (which also needs to over-allocate
> and maintain "current offset" pointer). But why, if there's io.StringIO
> in the first place?

Because io.StringIO does _not_ need to do that. It’s documented to act like a 
TextIOWrapper around a BytesIO. And the pure-Python implementation (as used by 
some non-CPython implementations of Python) is actually implemented that way: 
https://github.com/python/cpython/blob/3.8/Lib/_pyio.py#L2637. Every read and 
write to a StringIO passes through the incremental newline processor and the 
incremental UTF-8 codec to get passed on to a BytesIO. That’s not remotely 
optimal. And it doesn’t allow you to do random-access seeks to arbitrary 
character positions.
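
To make that layering concrete, here is a rough sketch of my own (not the 
actual _pyio code) that builds the same kind of stack by hand from public io 
classes. The variable names are just for illustration; the point is only that 
every write goes through an encoder before it lands in the byte buffer:

```python
import io

# A hand-built stand-in for the documented model: a text layer over bytes.
raw = io.BytesIO()
wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Each write is encoded to UTF-8 on its way down to the BytesIO.
wrapper.write("héllo\n")
wrapper.flush()  # push any buffered text through to the byte layer

encoded = raw.getvalue()                 # what is actually stored: bytes
round_tripped = encoded.decode("utf-8")  # reading back means decoding again
```

So the stored form is UTF-8 bytes, and a character offset does not map 
directly onto a byte offset once non-ASCII text is involved.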

It’s true that the C accelerator for io.StringIO used by CPython uses a dynamic 
overallocated array of UCS4 instead, but you can’t rely on that portably any 
more than you can rely on CPython’s str += optimization portably. Plus, 
it’s optimized for typical file-like usage, not 
for typical string-like usage, so the resize rules aren’t the same; there’s no 
attempt to optimize storage for all-Latin or all-BMP text; and so on. Plus, it 
still has to deal with file-ish things like universal newline support which you 
not only don’t need, but explicitly want to not be there.
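
A small sketch of what I mean by file-ish behavior. As far as I know the 
default StringIO mode (newline="\n") does no translation, but the machinery is 
there, and in universal-newlines mode (newline=None) it rewrites the text you 
give it, which a plain join never does:

```python
import io

text = "first\r\nsecond\r\n"

# Built with the list-and-join idiom: the characters come back untouched.
joined = "".join([text])

# Built with a StringIO in universal-newlines mode (newline=None):
# "\r\n" is translated to "\n" as the text is written.
stream = io.StringIO(newline=None)
stream.write(text)
streamed = stream.getvalue()
```

Here joined is identical to text, while streamed has lost its carriage 
returns.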

> (*) Instead, there're various of practical hacks to implement it, as
> both 2006's and this thread shows.

No, there is one idiomatic way to do it: create a list of strings and join 
them. That’s not a “hack” any more than using a string builder class or a 
string stream/file class is a “hack”. The fact that the standard Python idiom, 
the standard Java idiom, and the standard C++ idiom for building strings are 
all different is not a defect in any of those three languages; they’re all 
perfectly reasonable. And changing Python to have two standard idioms instead 
of one (with the new one less efficient and more complicated) would not be an 
improvement.
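
For anyone skimming the thread, the idiom looks like this. The function name 
and the input data are made up for the example:

```python
def build_report(rows):
    """Build one string from many pieces using the list-and-join idiom."""
    parts = []  # accumulate the pieces; list.append is amortized O(1)
    for name, value in rows:
        parts.append(f"{name}={value}\n")
    return "".join(parts)  # a single pass allocates the final string once


report = build_report([("a", 1), ("b", 2)])
```

The total work is linear in the size of the result on any Python 
implementation, with no reliance on implementation-specific optimizations.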

