--- On Mon, 1/25/10, Nick Coghlan <ncogh...@gmail.com> wrote: > From: Nick Coghlan <ncogh...@gmail.com> > Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time > To: "Michael Foord" <fuzzy...@voidspace.org.uk> > Cc: "Christian Heimes" <li...@cheimes.de>, python-dev@python.org > Date: Monday, January 25, 2010, 6:32 PM > Michael Foord wrote: > > On 26/01/2010 00:28, Christian Heimes wrote: > >> I favor this approach over an integer offset > because doesn't change the > >> semantic of ob_item. > >> > > Well, on the face of it this doesn't sound like a huge > increase in > > complexity. Not that I'm qualified to judge. > > Christian's approach is a good one for minimising the > semantic changes, > and compared to an offset based approach actually has a > decent chance of > working without breaking too much C code (you'd have to > change the way > sys.getsizeof() worked for lists, but I can't think of > anything else off > the top of my head that would definitely break).
As I'm diving into this, it is clear that you want to preserve the semantics of ob_item and ob_size, as they are used in a whole bunch of places. For now I am tracking a var called orphans, which subtly changes one invariant: 0 <= ob_size + orphans <= allocated I think Christian covered most of the places that would need to change, and list_dealloc would also need to change. > However, as Cameron pointed out, the O() value for an > operation is an > important characteristic of containers, and having people > get used to an > O(1) list.pop(0) in CPython could create problems not only > for other > current Python implementations but also for future versions > of CPython > itself. I hadn't thought of that. Here are the objections that I've heard or thought of myself: * The simplicity of the current implementation is important beyond the normal benefits of simplicity, since it is also a reference implementation for other ports of Python. * People who got used to O(1) in one version of Python might have unpleasant surprises when they went to other versions. * Alternatives to list already exist, such as deque and blist * An O(1) solution would increase the size of PyListObject. * An O(1) solution would postpone the release of the memory from the orphaned pointers. * An O(1) solution would slow down calls to list_resize, PyList_new, and list_dealloc. * For small and medium sized lists, memmove()'s penalty is usually drowned out by other operations on the list elements. * The use case of popping elements off a large list is not that common (although this might be somewhat driven by the documented slowness) * There may be third party code that relies on the current internal implementation Did I leave anything out? Here are the benefits of an O(1) implementation. * O(1) is faster than O(N) for some, presumably quite small, value of N. * Performance benefits tend to compound. If you have P processes doing pop(0) in a loop on an N-element list, you are saving P * N memmoves of size kN. * The technique required to make O(1) work is simple at its core--advance a pointer forward instead of moving the memory backward. * Encouraging the use of pop(0) will lead to leaner Python programs that release memory earlier in the process. * While not super common, there do exist programs today that pop from the top, either using pop itself or del, including programs in the standard library * The language moratorium provides a good window for performance improvements in general (even if this one does not pass the litmus test for other reasons) Did I leave anything out? _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com