Tim Peters <t...@python.org> added the comment:

Well, that's annoying ;-) In context, the OP was saving a list of 10 million 
splits. So each overallocation by a single element burned 80 million bytes of 
RAM (one wasted 8-byte pointer slot per list). Overallocating by 7 burned 560 
million bytes.
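
For concreteness, the gap is visible from pure Python. A rough sketch (sizes 
vary by build and version, and an interpreter with this fixed will show no 
difference):

    import sys

    parts = "a,b,c,d,e".split(",")     # 5 elements, possibly overallocated
    exact = ["a", "b", "c", "d", "e"]  # a list display allocates exactly

    # On an interpreter where split() overallocates, the first number is
    # larger; every wasted pointer slot costs 8 bytes per list, so 10
    # million lists waste 10_000_000 * 8 = 80 MB per extra slot.
    print(sys.getsizeof(parts), sys.getsizeof(exact))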

Which is unusual. Usually a split result is short-lived: consumed once, then 
thrown away.

OTOH, the overwhelming motivation for overallocating at all is to achieve O(1) 
amortized time after a long _sequence_ of appends, and split results typically 
aren't appended to at all. split() appears to be using it as a timing 
micro-optimization for tiny lists instead.
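
The amortized-O(1) motivation is easy to watch from Python, for what it's 
worth. A quick sketch (the exact growth pattern is a CPython implementation 
detail):

    import sys

    lst = []
    last = sys.getsizeof(lst)
    for _ in range(1000):
        lst.append(None)
        size = sys.getsizeof(lst)
        if size != last:
            # Each resize buys room for proportionally more appends,
            # which is what makes a long run of appends O(1) amortized.
            print(f"resized at len={len(lst)}: {size} bytes")
            last = size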

So, like I said, it's annoying ;-) For "small" lists, split() really shouldn't 
overallocate at all (because, as before, split results are rarely appended to). 
A compromise could be to save pointers to the first N (12, whatever) instances 
of the splitting string in a stack ("auto") vector, before any list object (or 
result string object) is created. If the scan runs out of input before 
reaching N, fine, build a result out of exactly what was found. If there's 
more to do, build a result from the first N, and go on as currently (letting 
PyList_Append deal with it - overallocation is huge in percentage terms when 
the list is short, but shrinks in relative terms as the list gets longer).
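
Here's a rough Python model of that compromise, purely illustrative - the real 
change would live in CPython's C splitting helpers, and the function name and 
N here are made up:

    def split_model(s, sep, n=12):
        # Record up to n separator positions before building any result,
        # so short inputs can get an exactly-sized list. Assumes a
        # non-empty sep, and ignores maxsplit for simplicity.
        hits = []                  # stands in for the C stack ("auto") vector
        i = s.find(sep)
        while i >= 0 and len(hits) < n:
            hits.append(i)
            i = s.find(sep, i + len(sep))

        out = []
        start = 0
        for h in hits:
            out.append(s[start:h])
            start = h + len(sep)

        if len(hits) < n:
            # Fewer than n separators: the result is complete, and at the
            # C level it could be allocated to exactly this size.
            out.append(s[start:])
            return out

        # More fields remain: keep appending and accept the growth
        # pattern's overallocation, which is small in percentage terms
        # once the list is this long.
        while True:
            j = s.find(sep, start)
            if j < 0:
                out.append(s[start:])
                return out
            out.append(s[start:j])
            start = j + len(sep)

    assert split_model("a,b,c,d", ",", n=2) == "a,b,c,d".split(",")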

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46990>
_______________________________________