On Mar 30, 2020, at 08:29, Joao S. O. Bueno <jsbu...@python.org.br> wrote:
> 
> 
> I agree with the arguments the OP brings forward.
> 
> Maybe, it should be the case of having an `StringIO` and `BytesIO` subclass?
> Or better yet, just a class that wraps those, and hide away the other
> file-like methods and behaviors?

Why? What’s the benefit of building a mutable string around a virtual file 
object wrapped around a buffer (with all the extra complexities and performance 
costs that involves, like incremental Unicode encoding and decoding) instead of 
just building it around a buffer directly?

Also, how can you implement an efficient randomly-accessible mutable string 
object on top of a text file object? Text files don’t do constant-time 
random-access seek to character positions; they can only seek to the opaque 
tokens returned by tell(). (This should be obvious if you think about how you 
could seek to the 137th character in a UTF-8 file without reading all of the 
136 characters before it.) In fact, recent versions of CPython optimize 
StringIO so it only fakes being a TextIOWrapper around a BytesIO and actually 
uses a Py_UCS4* buffer for storage, but that’s CPython-specific, not 
guaranteed, and not accessible from Python even in CPython.
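
To make the UTF-8 point concrete, here is a rough sketch of what finding a 
character position in UTF-8 bytes involves. The function name 
byte_offset_of_char is made up for illustration; the only fact it relies on is 
that UTF-8 continuation bytes have the bit pattern 10xxxxxx. Note that it has 
to walk every byte before the target, which is exactly the linear scan a text 
file cannot avoid:

```python
def byte_offset_of_char(data: bytes, index: int) -> int:
    """Return the byte offset at which character `index` (0-based) starts.

    Hypothetical helper, for illustration only: it scans from the beginning
    because UTF-8 characters occupy a variable number of bytes.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts
        # a new character.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


text = "naïve café ☕ " * 20
data = text.encode("utf-8")
offset = byte_offset_of_char(data, 136)  # the 137th character
# Decoding from that offset yields the same tail as slicing the str.
assert data[offset:].decode("utf-8") == text[136:]
```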

And, even if that were a good idea for implementation reasons, why should the 
user care? If they need a mutable string, why do they care whether you give 
them one that inherits from or delegates to a StringIO instead of a list or an 
array.array of int32 or the CPython string buffer API (whether accessed via a C 
extension or ctypes.pythonapi) or a pure C library with its own implementation 
and optimizations?
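
As a rough illustration of the "build it around a buffer directly" option, 
here is a minimal sketch on top of array.array. The class name MutableText and 
its methods are hypothetical, not an existing API, and the sketch only covers 
indexing, assignment and appending:

```python
from array import array


class MutableText:
    """Hypothetical mutable string storing one code point per array slot."""

    def __init__(self, initial: str = "") -> None:
        # Typecode 'L' is an unsigned integer of at least 4 bytes, which is
        # wide enough for any Unicode code point.
        self._points = array("L", map(ord, initial))

    def __len__(self) -> int:
        return len(self._points)

    def __getitem__(self, index: int) -> str:
        # Constant-time random access by character position.
        return chr(self._points[index])

    def __setitem__(self, index: int, char: str) -> None:
        # ord() rejects anything that is not a single character.
        self._points[index] = ord(char)

    def append(self, text: str) -> None:
        self._points.extend(map(ord, text))

    def __str__(self) -> str:
        return "".join(map(chr, self._points))


s = MutableText("hello world")
s[0] = "J"
s.append("!")
assert str(s) == "Jello world!"
```

The caller only ever sees the indexing and str() behavior; whether the 
storage is an array, a list, or something in C stays an implementation detail.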

More generally, a StringIO is neither the obvious way nor the fastest way nor 
the recommended way to build strings on the fly in Python, so why do you agree 
with the OP that we need to make it better for that purpose? Just to benefit 
people who want to write C++ instead of Python? If the goal is to cater to 
people who won’t read the docs to learn the right way, the obvious solution is 
to mandate the non-quadratic string concatenation of CPython for all 
implementations, not to give them yet another way of doing it and hope they’ll 
guess or look up that one even though they didn’t guess or look up the 
long-standing existing one.
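
For anyone reading along, I am assuming the long-standing idiom here is the 
usual one: collect the pieces in a list and call str.join once at the end. A 
minimal sketch (render_rows is just an illustrative name):

```python
def render_rows(rows):
    """Build one string from many pieces without repeated concatenation."""
    parts = []
    for name, value in rows:
        # Appending to a list is cheap; no intermediate strings are copied.
        parts.append(f"{name}={value}")
    # A single join allocates the final string once.
    return "\n".join(parts)


assert render_rows([("a", 1), ("b", 2)]) == "a=1\nb=2"
```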

> That would keep the new class semantically as a string,
> and they could implement all of the str/bytes methods and attributes 
> so as to be a drop-in replacement 

Sadly, this isn’t possible. Large amounts of C code, including builtins and 
stdlib, won’t let you duck type as a string: it does a type check and expects 
an actual str (and if you subclass str, it ignores your methods and uses the 
PyUnicode APIs to get your base class’s storage directly as a buffer 
instead). So, no type, either C or Python, can really be a drop-in replacement 
for str. At best you can have something that you have to call str() on half the 
time. That’s why there’s no MutableStr on PyPI, and no UTF8Str, no EncodedStr 
that can act as both a bytes and a str by remembering its encoding (Nick 
Coghlan’s motivating example for changing this back in the early 3.x days), etc.
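
Both halves of that are easy to see with str.join, which is implemented in C. 
In this small sketch, collections.UserString is the real stdlib class; Shouty 
and join_with_c are made-up names for illustration:

```python
from collections import UserString


class Shouty(str):
    """str subclass whose override is not consulted by str.join."""

    def __str__(self):
        return "LOUD"


def join_with_c(item):
    # str.join type-checks each item and reads str storage directly.
    return ", ".join([item])


# A string-like object that is not a str is rejected outright.
try:
    join_with_c(UserString("quiet"))
    duck_typed_ok = True
except TypeError:
    duck_typed_ok = False

# A str subclass is accepted, but its __str__ override is bypassed.
subclass_result = join_with_c(Shouty("quiet"))

assert duck_typed_ok is False
assert subclass_result == "quiet"
assert str(Shouty("quiet")) == "LOUD"
```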

Fixing this cleanly would probably require splitting the string C API into 
abstract and concrete versions a la sequence and then changing a ton of code to 
respect abstract strings (to only optimize for concrete ones rather than 
requiring them, again like sequences). Fixing it slightly less cleanly with a 
hookable API might be more feasible (I’m pretty sure Nick Coghlan looked into 
it before the 3.3 string redesign; I don’t know if anyone has since), but it’s 
still probably a major change.


_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3EPSLFWDAOHKBXST6HYZIXPJHPNNMB6R/
Code of Conduct: http://python.org/psf/codeofconduct/
