[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

Christopher Barker Mon, 30 Mar 2020 16:28:20 -0700

As others have pointed out, the OP started in a bit of an oblique way, but
it may come down to this:

There are some use-cases for a mutable string type. And one could certainly
write one.

presto: here is one:

https://github.com/Daniil-Kost/mutable_strings

Which looks to me to be more of a toy than anything, but maybe the author is
seriously using it... (it does look like it has an indexing bug if there
are non-ASCII characters)

And yet, as far as I know, there has never been one that was carefully
written and optimized, which would be a bit of a trick, because of how
Python strings handle Unicode. (it would have been a lot easier with
Python2 :-) )

So why not?

1) As pointed out, high performance strings are key to a lot of coding, so
Python's str is very baked-in to a LOT of code, and can't be duck-typed. I
know that pretty much the only time I ever type check (as opposed to simple
duck typing / EAFP) is for str. So if one were to make a mutable string
type, you'd have to convert it to a str a lot in order to use most other
libraries.
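
A small illustration of that point, using the standard library's
collections.UserString as a stand-in for a str look-alike (it is not a
mutable string, and the helper name join_rejected is made up for this
sketch). It quacks like a string in pure Python, but fails type checks and
is rejected by str.join until it is converted:

```python
from collections import UserString

s = UserString("spam")

# Behaves like a string for most pure-Python purposes...
assert s.upper() == "SPAM"
assert s + "!" == "spam!"

# ...but it is not a str, so an explicit type check fails,
assert not isinstance(s, str)


def join_rejected(item):
    """Return True if str.join refuses the item (hypothetical helper)."""
    try:
        ", ".join([item])
    except TypeError:
        return True
    return False


# and so do built-in APIs that insist on real str items.
rejected = join_rejected(s)

# An explicit conversion is needed before handing it to such APIs.
converted = ", ".join([str(s), "eggs"])
```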

That being said, one could write a mutable string that mirrored the
CPython string type as much as possible, and it could be pretty efficient,
even for making regular strings out of it.

2) Maybe it's really not that useful. Other than building up a long string
from a bunch of small ones (which can be done fine with .join()), I'm not
sure I've had much of a use case -- it would buy you a tiny bit of
performance for, say, altering strings in ways that don't change their
length, but I doubt there are many (if any) applications that would see any
meaningful benefit from that.
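
For reference, a minimal sketch of the existing ways to do both things with
the builtins and the standard library. The sample data and variable names
are only illustrative:

```python
import io

pieces = ["==%d==" % i for i in range(5)]

# Idiom 1: collect the pieces in a list, then join once at the end.
joined = "".join(pieces)

# Idiom 2: write the pieces into an in-memory text buffer.
buf = io.StringIO()
for p in pieces:
    buf.write(p)
buffered = buf.getvalue()

# Same-length "in place" edits: mutate a list of characters,
# then build the final str once.
chars = list("hello world")
chars[0] = "J"
edited = "".join(chars)
```

Both build-up idioms produce the same str; the list of characters is the
usual workaround when individual positions need to be changed.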

So I'd say it hasn't been done because (1) it's a lot of work and (2) it
would be a bit of a pain to use, and not gain much at all.

A kind-of-related anecdote:

numpy arrays are mutable, but you cannot change their length in place. So,
similar to strings, if you want to build up an array from a lot of little
pieces, then the best way is to put all the pieces in a list, and then make
an array out of it when you are done.
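
A minimal sketch of that list-then-array pattern, assuming one float per
line of input. The function name read_values and the sample lines are made
up for illustration:

```python
import numpy as np


def read_values(lines):
    """Accumulate parsed floats in a list, then convert to an array once."""
    values = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            values.append(float(line))
    # A single conversion at the end, once the final size is known.
    return np.array(values, dtype=np.float64)


data = read_values(["1.5\n", "2.0\n", "\n", "3.25\n"])
```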

I had a need to do that fairly often (reading data from files of unknown
size) so I actually took the time to write an array that could be extended.

Turns out that:

1) it really wasn't much faster (than using a list) in the usual use-cases
anyway :-)
2) it did save memory -- which only mattered for monster arrays, and I'd
likely need to do something smarter anyway in those cases.
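
For anyone curious what such an extendable array can look like, here is a
rough sketch using capacity doubling. This is NOT the class described above,
just one common way to do it. The name GrowableArray and its methods are
hypothetical:

```python
import numpy as np


class GrowableArray:
    """Hypothetical 1-D accumulator backed by an over-allocated numpy buffer."""

    def __init__(self, dtype=np.float64, capacity=16):
        # Keep at least one slot so that doubling always grows the buffer.
        self._buf = np.empty(max(1, capacity), dtype=dtype)
        self._len = 0

    def append(self, value):
        if self._len == self._buf.shape[0]:
            # Out of room: allocate a buffer twice the size and copy over.
            new_buf = np.empty(2 * self._buf.shape[0], dtype=self._buf.dtype)
            new_buf[: self._len] = self._buf
            self._buf = new_buf
        self._buf[self._len] = value
        self._len += 1

    def __len__(self):
        return self._len

    def to_array(self):
        # Return a copy so later appends cannot alias the result.
        return self._buf[: self._len].copy()


acc = GrowableArray(capacity=2)
for i in range(10):
    acc.append(i * 0.5)
result = acc.to_array()
```

The buffer holds the raw values directly instead of one Python object per
element, which is where any memory saving over a list would come from.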

I even took some time to write a Cython-optimized version, which only
helped a little. I offered it up to the numpy community.

But in the end: no one expressed much interest. And I haven't used it
myself for anything in a long while.

Moral of the story: not much point in a special class to do something that
can already be done almost as well with the builtins.

-CHB

On Mon, Mar 30, 2020 at 2:06 PM Paul Sokolovsky <pmis...@gmail.com> wrote:

> Hello,
>
> On Tue, 31 Mar 2020 07:40:01 +1100
> Chris Angelico <ros...@gmail.com> wrote:
>
> > On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky <pmis...@gmail.com>
> > wrote:
> > >     for i in range(50000):
> > >         v = u"==%d==" % i
> > >         # All individual strings will be kept in the list and
> > >         # can't be GCed before teh final join.
> > >         sz += sys.getsizeof(v)
> > >         sb.append(v)
> > >     s = "".join(sb)
> > >     sz += sys.getsizeof(sb)
> > >     sz += sys.getsizeof(s)
> > >     print(sz)
> > >
> >
> > > ... about order of magnitude more memory ...
> >
> > I suspect you may be multiply-counting some of your usage here. Rather
> > than this, it would be more reliable to use the resident set size (on
> > platforms where you can query that).
>
> I may humbly suggest a different process too: get any hardware
> board with MicroPython and see how much data you can collect in a
> StringIO and in a list of strings. Well, you actually don't need a
> dedicated hardware, just get a Linux or Windows version and run it
> with a specific heap size using a -X heapsize= switch, e.g. -X
> heapsize=100K.
>
> Please don't stop there, we talk multiple implementations, try it on
> CPython too. There must be a similar option there (because how
> otherwise you can perform any memory-related testing!), I just forgot
> which.
>
> The results should be very apparent, and only forgotten option may
> obfuscate it.
>
> []
>
> --
> Best regards,
>  Paul                          mailto:pmis...@gmail.com
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZWKHUVQUMTUIGKXHGXG2AA3F35VUD2Y4/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CXKA2FVQSFZHSOE2RBDK5RBKPG5HFM3A/
Code of Conduct: http://python.org/psf/codeofconduct/
