I am interested in creating a patch to make deleting elements from the front of 
Python list work in O(1) time by advancing the ob_item pointer.

The patch will probably be rejected, but I would like to try it anyway as an 
exercise in digging into the CPython source, and working through the process.  
My goal is to accompany the proposed patch with a PEP (which I also expect to 
be initially rejected, but which will hopefully be a useful contribution in 
terms of documenting the decision.)

The reason I am posting here is that there appears to be some history behind 
similar patches that I am not aware of, so if anybody can refer me to earlier 
patches, PEPS, discussion threads, etc., I would be grateful.  

I am aware of PEP 3128, which has similar goals to what I'm trying to achieve, 
but there are also some differences.

The blist implementation described in PEP 3128 achieves the goal of reducing 
time complexity for some operations, but necessarily at the expense of other 
operations, notably list access.

The patch that I am considering would not affect time complexity for any other 
operations, nor memory complexity, but it would, of course, have marginal costs 
in certain operations, notably any operation that eventually calls 
list_resize().  I am confident that the patch would not impact performance of 
list accesses at all.  The memory cost for the list itself would be an 
additional pointer or integer, which I think should be considered negligible 
compared to the cost of the list itself [O(N)] and the elements in the list 
[O(N)].  I haven't completely worked out the best strategy to eventually 
release the memory taken up by the pointers of the unreleased elements, but the 
worst case scenario is that the unused memory only gets wasted until the time 
that the list itself gets garbage collected.  I think I can do better than 
that, at some cost of complicating list_resize.  From a memory standpoint, the 
benefits of encouraging deleting items from the front
 of the list should outweigh any disadvantages with respect to lazily releasing 
memory from the pointers at the front of the list, since in deleting elements, 
you allow the elements themselves to be garbage collected earlier, as well as 
objects that might be referenced by those elements.

My goal would be to target the patch at 3.x, and if I was lucky enough to get 
it accepted, I think it could eventually be backported to 2.x.  The proposal 
does not affect the definition of the language itself, of course; it merely 
attempts to improve the performance of the CPython implementation.

The instructions that I have found for setting up your development environment 
seemed to be targeted at 2.x trunk, which is fine with me.  I will attempt the 
patch off  the 2.x trunk to get an initial feel for the issues involved, unless 
somebody counsels me to work off 3.x right from the start.

http://www.python.org/dev/setup/

I have not been able to locate the source code for 3.x.  Is the implementation 
of list more or less the same there?

There is a long thread entitled "list.pop(0) vs. collections.dequeue" on 
comp.lang.python that addresses alternatives to list.pop(0), but none of them 
really fit my use case.

Here is a sketch of the PEP that I would propose:


Proposal: 


Improve list's implementation so that deleting elements from 
the front of the list does not require an O(N) memmove operation. 


Rationale: 


Some Python programs that process lists have multiple 
methods that consume the first element of the list and pop it off. 
The pattern comes up with parsers in particular, but there are other 
examples.  It is possible now, of course, to use a data structure in 
Python that has O(1) for deleting off the top of the list, but none of 
the alternatives fully replicate the benefits of list itself. 


Specification: 


Improving CPython's performance does not affect the 
language itself, so there are no bikeshed arguments to be had with 
respect to syntax, etc.  Any patch would, of course, affect the 
performance of nearly every Python program in existence, so any patch 
would have to, at a bare minimum: 


  1) Not increase the time or memory complexity of any other list 
operation. 
  2) Not affect list access at all. 
  3) Minimally affect list operations that mutate the list. 
  4) Be reasonably simple within CPython itself. 
  5) Not be grossly wasteful of memory. 


Backwards Compatibility: 


See above.  An implementation of this PEP would not change the 
definition of the language in any way, but it would have to minimally 
impact the performance of lists for the normal use cases. 


Implementation: 


There are two ways to make deleting the first item of the list run 
more efficiently. 


The most ambitious proposal is to fix the memory manager itself to 
allow the release of memory from the start of the chunk.  The 
advantage of this proposal is that it would simplify the changes to 
list itself, and possibly have collateral benefits for other CPython 
internal data structures.  The disadvantage of the proposal is that 
there is a strong tradition in CPython to use native memory 
management, particularly with respect to the fact that it runs on many 
platforms.

The less ambitious proposal is to change the memory management scheme 
within list itself.  There is already precedent in list_resize() to 
optimistically allocate memory, so it is not a great break from 
tradition to optimistically defer the release of memory.  But it would 
complicate things. 

Under both alternatives, ob_item would continue to point to the first active 
element in the list as it does now.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to