I am interested in creating a patch to make deleting elements from the front of
Python list work in O(1) time by advancing the ob_item pointer.
The patch will probably be rejected, but I would like to try it anyway as an
exercise in digging into the CPython source, and working through the process.
My goal is to accompany the proposed patch with a PEP (which I also expect to
be initially rejected, but which will hopefully be a useful contribution in
terms of documenting the decision.)
The reason I am posting here is that there appears to be some history behind
similar patches that I am not aware of, so if anybody can refer me to earlier
patches, PEPS, discussion threads, etc., I would be grateful.
I am aware of PEP 3128, which has similar goals to what I'm trying to achieve,
but there are also some differences.
The blist implementation described in PEP 3128 achieves the goal of reducing
time complexity for some operations, but necessarily at the expense of other
operations, notably list access.
The patch that I am considering would not affect time complexity for any other
operations, nor memory complexity, but it would, of course, have marginal costs
in certain operations, notably any operation that eventually calls
list_resize(). I am confident that the patch would not impact performance of
list accesses at all. The memory cost for the list itself would be an
additional pointer or integer, which I think should be considered negligible
compared to the cost of the list itself [O(N)] and the elements in the list
[O(N)]. I haven't completely worked out the best strategy to eventually
release the memory taken up by the pointers of the unreleased elements, but the
worst case scenario is that the unused memory only gets wasted until the time
that the list itself gets garbage collected. I think I can do better than
that, at some cost of complicating list_resize. From a memory standpoint, the
benefits of encouraging deleting items from the front
of the list should outweigh any disadvantages with respect to lazily releasing
memory from the pointers at the front of the list, since in deleting elements,
you allow the elements themselves to be garbage collected earlier, as well as
objects that might be referenced by those elements.
My goal would be to target the patch at 3.x, and if I was lucky enough to get
it accepted, I think it could eventually be backported to 2.x. The proposal
does not affect the definition of the language itself, of course; it merely
attempts to improve the performance of the CPython implementation.
The instructions that I have found for setting up your development environment
seemed to be targeted at 2.x trunk, which is fine with me. I will attempt the
patch off the 2.x trunk to get an initial feel for the issues involved, unless
somebody counsels me to work off 3.x right from the start.
http://www.python.org/dev/setup/
I have not been able to locate the source code for 3.x. Is the implementation
of list more or less the same there?
There is a long thread entitled "list.pop(0) vs. collections.dequeue" on
comp.lang.python that addresses alternatives to list.pop(0), but none of them
really fit my use case.
Here is a sketch of the PEP that I would propose:
Proposal:
Improve list's implementation so that deleting elements from
the front of the list does not require an O(N) memmove operation.
Rationale:
Some Python programs that process lists have multiple
methods that consume the first element of the list and pop it off.
The pattern comes up with parsers in particular, but there are other
examples. It is possible now, of course, to use a data structure in
Python that has O(1) for deleting off the top of the list, but none of
the alternatives fully replicate the benefits of list itself.
Specification:
Improving CPython's performance does not affect the
language itself, so there are no bikeshed arguments to be had with
respect to syntax, etc. Any patch would, of course, affect the
performance of nearly every Python program in existence, so any patch
would have to, at a bare minimum:
1) Not increase the time or memory complexity of any other list
operation.
2) Not affect list access at all.
3) Minimally affect list operations that mutate the list.
4) Be reasonably simple within CPython itself.
5) Not be grossly wasteful of memory.
Backwards Compatibility:
See above. An implementation of this PEP would not change the
definition of the language in any way, but it would have to minimally
impact the performance of lists for the normal use cases.
Implementation:
There are two ways to make deleting the first item of the list run
more efficiently.
The most ambitious proposal is to fix the memory manager itself to
allow the release of memory from the start of the chunk. The
advantage of this proposal is that it would simplify the changes to
list itself, and possibly have collateral benefits for other CPython
internal data structures. The disadvantage of the proposal is that
there is a strong tradition in CPython to use native memory
management, particularly with respect to the fact that it runs on many
platforms.
The less ambitious proposal is to change the memory management scheme
within list itself. There is already precedent in list_resize() to
optimistically allocate memory, so it is not a great break from
tradition to optimistically defer the release of memory. But it would
complicate things.
Under both alternatives, ob_item would continue to point to the first active
element in the list as it does now.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com