--- On Mon, 1/25/10, Nick Coghlan <ncogh...@gmail.com> wrote:

> From: Nick Coghlan <ncogh...@gmail.com>
> Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
> To: "Michael Foord" <fuzzy...@voidspace.org.uk>
> Cc: "Christian Heimes" <li...@cheimes.de>, python-dev@python.org
> Date: Monday, January 25, 2010, 6:32 PM
> Michael Foord wrote:
> > On 26/01/2010 00:28, Christian Heimes wrote:
> >> I favor this approach over an integer offset
> because doesn't change the
> >> semantic of ob_item.
> >>    
> > Well, on the face of it this doesn't sound like a huge
> increase in
> > complexity. Not that I'm qualified to judge.
> 
> Christian's approach is a good one for minimising the
> semantic changes,
> and compared to an offset based approach actually has a
> decent chance of
> working without breaking too much C code (you'd have to
> change the way
> sys.getsizeof() worked for lists, but I can't think of
> anything else off
> the top of my head that would definitely break).

As I'm diving into this, it is clear that you want to preserve the semantics of 
ob_item and ob_size, as they are used in a whole bunch of places.  For now I am 
tracking a var called orphans, which subtly changes one invariant:

  0 <= ob_size + orphans <= allocated

I think Christian covered most of the places that would need to change, and 
list_dealloc would also need to change.
 
> However, as Cameron pointed out, the O() value for an
> operation is an
> important characteristic of containers, and having people
> get used to an
> O(1) list.pop(0) in CPython could create problems not only
> for other
> current Python implementations but also for future versions
> of CPython
> itself.

I hadn't thought of that.

Here are the objections that I've heard or thought of myself:

 * The simplicity of the current implementation is important beyond the normal 
benefits of simplicity, since it is also a reference implementation for other 
ports of Python.
 * People who got used to O(1) in one version of Python might have unpleasant 
surprises when they went to other versions.
 * Alternatives to list already exist, such as deque and blist 
 * An O(1) solution would increase the size of PyListObject.
 * An O(1) solution would postpone the release of the memory from the orphaned 
pointers.
 * An O(1) solution would slow down calls to list_resize, PyList_new, and 
list_dealloc.
 * For small and medium sized lists, memmove()'s penalty is usually drowned out 
by other operations on the list elements.
 * The use case of popping elements off a large list is not that common 
(although this might be somewhat driven by the documented slowness)
 * There may be third party code that relies on the current internal 
implementation

Did I leave anything out?

Here are the benefits of an O(1) implementation.

 * O(1) is faster than O(N) for some, presumably quite small, value of N.
 * Performance benefits tend to compound.  If you have P processes doing pop(0) 
in a loop on an N-element list, you are saving P * N memmoves of size kN.
 * The technique required to make O(1) work is simple at its core--advance a 
pointer forward instead of moving the memory backward.
 * Encouraging the use of pop(0) will lead to leaner Python programs that 
release memory earlier in the process.
 * While not super common, there do exist programs today that pop from the top, 
either using pop itself or del, including programs in the standard library
 * The language moratorium provides a good window for performance improvements 
in general (even if this one does not pass the litmus test for other reasons)

Did I leave anything out?


 


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to