Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Josiah Carlson
On Fri, Jan 29, 2010 at 11:25 PM, Stephen J. Turnbull
step...@xemacs.org wrote:
 Josiah Carlson writes:

   Lisp lists are really stacks

 No, they're really (ie, concretely) singly-linked lists.

 Now, stacks are an abstract data type, and singly-linked lists provide
 an efficient implementation of stacks.  But that's not what linked
 lists really are.  For example, singly-linked lists are also a
 reasonable way to implement inverted trees (ie, the node knows its
 parent, but not its children), which is surely not a stack.

 The Python use of list to denote what is concretely a dynamically
 extensible one-dimensional array confused me a bit.  But what the
 heck, Guido needed a four-letter word to denote a concrete type used
 to implement a mutable sequence ADT, and he wasn't going to borrow one
 from that French guy on the ramparts, right?  No big deal.  Ahem...

 So the confusion here is that in Python, list denotes a particular
 concrete data type, while Steve H. is using a more abstract idea of
 list as mutable sequence to suggest there's a reason for optimizing
 certain mutations that Python's data type isn't good at.  I don't
 think that's an effective way for him to make his point, unfortunately.
 But both usages are consistent with Python's usage; mutability is the
 usual way that lists are distinguished from tuples, for example, and
 the underlying dynamic array implementation is rarely mentioned.

My experience with Lisp is limited to mzScheme and DrScheme, but
AFAIR, neither of them had mutable lists.  Both had list semantics
that were equivalent (in terms of limitations and functionality) to
the structure I described using tuples in my earlier post.  If other
Lisp implementations have mutable lists, I'd be surprised to learn
that.

However, now we are well into the weeds, far off the track of whether
or not Steve H's feature is worth saddling Python lists with cruft.

 - Josiah
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Josiah Carlson
On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull
step...@xemacs.org wrote:
 Josiah Carlson writes:
   On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote:

    What do you think of LISP, and car in particular (apart from
    the stupidly cryptic name)?

   Apples and oranges.

 True, but speaking of Lisp lists, here's some possibly relevant
 experience.  About 10 years ago, XEmacs converted its cons type from a
 special immediate representation (ie, cons == (car, cdr)) to a generic
 record representation (ie, cons == (pointer to type descriptor, car,
 cdr)).  This resulted in a perceptible increase in VM usage and disk
 usage.  A typical running XEmacs instance for me contains about 0.75
 million conses and uses 200MB of VM, so with 32-bit pointers that's
 about 3MB extra, or 1.5%, and with 64-bit pointers it's 6MB extra,
 about 3%.  However, I tend to have several big buffers (20-50MB) of
 pure character data; people who work with smaller buffers on 64-bit
 machines have reported as much as 10% extra overhead.  On disk, the
 binary is typically about 9MB stripped.  That contains about 50,000
 conses, or an extra 200KB/400KB with the new structure, somewhat more
 than my experience (2% or 4%).

 Some people complained, but we considered this well worthwhile (moving
 one type bit from the car to the header allowed Lisp integers to
 cover the range -1G to +1G, and there are a surprising number of
 people who would like to use XEmacs on files 512MB).  I suppose that
 Steve's proposal probably has similar impact on binaries and running
 instances of Python, but he hasn't given any use cases for list.pop(0)
 to compared to doubling the size of usable buffers.

The choice that emacs made is great for emacs; as you stated, it
allowed emacs to do something it was previously unable to do.  Steve
H's proposed change would not allow Python to do anything it wasn't
able to do before, and would (as TJR stated in this and other threads)
saddle Python with overhead so as to make more convenient the use a
structure for which it was not intended (paraphrased, of course).
Again; no good use-case, means no problem, means no reason to try to
solve the perceived problem.

It's great that you support Steve H's proposal, but can we keep the
discussion on why this would be good for Python, rather than why
changing a structure that is identical in name (but only similar in
functionality) to Python's list was good for another language?

 - Josiah
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Stephen J. Turnbull
Josiah Carlson writes:
  On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull
  step...@xemacs.org wrote:

   Some people complained, but we considered this well worthwhile (moving
   one type bit from the car to the header allowed Lisp integers to
   cover the range -1G to +1G, and there are a surprising number of
   people who would like to use XEmacs on files 512MB).  I suppose that
   Steve's proposal probably has similar impact on binaries and running
   instances of Python, but he hasn't given any use cases for list.pop(0)
   to compared to doubling the size of usable buffers.

  The choice that emacs made is great for emacs;

Emacs hasn't made that choice, XEmacs did.  I believe Emacs is still
restricted to 128MB, or maybe 256MB, buffers.  They recently had an
opportunity to increase integer size, and thus maximum buffer size,
but refused it.  It's not a no-brainer.

  It's great that you support Steve H's proposal, but can we keep the
  discussion on why this would be good for Python,

I don't support it or oppose it (I wouldn't notice the increased
overhead myself, but I have no use case for O(1) list.pop(0)).  I'm
giving some figures on a similar change (adding a single pointer to a
previously low-overhead structure used in large numbers in some
applications), and pointing out that this was good for XEmacs only
because there was a rather big increase in capability in a use-case
that people can sympathize with even if they don't need it themselves.

I hope that this example will help Steve H understand why he needs to
give real use-cases, or if he doesn't know of any, give up.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Stephen J. Turnbull
Minor erratum:

Stephen J. Turnbull writes:

  Emacs hasn't made that choice, XEmacs did.  I believe Emacs is still
  restricted to 128MB, or maybe 256MB, buffers.  They recently had an
  opportunity to increase integer size, and thus maximum buffer size,
  but refused it.  It's not a no-brainer.

I stand corrected.  Emacs did make some changes which increased
integer size from 28 bits to 30, allowing a maximum signed value of
512M, but refused the tradeoff I described of making the cons type be
indicated by a pointer to a type description record rather than a type
bit in one of the pointers.  That would have allowed 31 bits for
integers, as in XEmacs.  The basic thrust of my argument was correct,
though.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Steve Howell
--- On Fri, 1/29/10, Stephen J. Turnbull step...@xemacs.org wrote:

 
   Lisp lists are really stacks
 
 No, they're really (ie, concretely) singly-linked
 lists.  
 
 Now, stacks are an abstract data type, and singly-linked
 lists provide
 an efficient implementation of stacks.  But that's not
 what linked
 lists really are.  For example, singly-linked lists
 are also a
 reasonable way to implement inverted trees (ie, the node
 knows its
 parent, but not its children), which is surely not a
 stack.

I like your distinction between abstract data types and concrete 
implementations.
 
From a mutability perspective, the concrete implementation of Python lists 
shares a performance characteristic with most concrete implementations of 
stacks, in that inserts/pops at the top are cheap.

Unlike most stacks, Python lists do at least semantically allow queue-like 
behavior for removing elements from the bottom, but I don't think it's unfair 
of me to say that removes from the bottom are discouraged under the current 
implementation.  (I can cite the tutorial, for example).  So, to the extent 
that removes from the bottom are frowned upon, Python again has the same 
mutability characteristics as a stack.

The abstract data type stack does not allow for random access of elements 
AFAIK, so Python lists are definitely more than a stack, especially since 
random accesses are not only possible, they are quite efficient.

So I guess they are an array.

I don't know whether or not arrays are considered to be an abstract data type 
or not, but my de facto concept of an array is something that supports fast 
random access, cheap mutation at the top, and no guarantees at the bottom.  I 
am guessing that from a big-O perspective, Python lists have the exact same 
performance characteristics as the data structures that Perl, Ruby, and 
Javascript all call array.  Also, Python lists are built on top of a C array, 
and while it would be a bit of an overstatement to say that lists are just a 
nicely sugared encapsulation of C arrays, I think it would be a fair statement 
to say that Python lists only give O(1) performance for the same operations as 
the underlying C array; all the other operations are there just for convenience 
where performance is not a driving concern.

Also, we can go back to the example of LISP, the one language that I know of 
that shares the term list.  Whatever a list denotes from an abstract 
perspective, Python and LISP do not agree upon the definition.  Python lists 
are more like right-side-up stacks with fast random access, while LISP lists 
are more like an upside-down stack without iteration.

 The Python use of list to denote what is concretely a
 dynamically
 extensible one-dimensional array confused me a bit. 
 But what the
 heck, Guido needed a four-letter word to denote a concrete
 type used
 to implement a mutable sequence ADT, and he wasn't going to
 borrow one
 from that French guy on the ramparts, right?  No big
 deal.  Ahem...

Probably the same reason he didn't call dictionaries hashes, right? :)


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread geremy condra
On Fri, Jan 29, 2010 at 12:48 AM, Terry Reedy tjre...@udel.edu wrote:
 On 1/28/2010 6:30 PM, Josiah Carlson wrote:

 I would also point out that the way these things are typically done is
 that programmers/engineers have use-cases that are not satisfied by
 existing structures, they explain the issues they have with existing
 structures, and they propose modifications.  So far, Steve has not
 offered any use-cases for why his proposed change is necessary; merely

 Use of a list as a queue rather than as a stack, as in breadth-first search,
 where one only needs to pop off the front but never push to the front. That
 is not to say that this is common or that a deque or other options may no be
 pretty satisfactory. But it would certainly be easier, when presenting such
 algorithms, to just be able to use a list, which has already been taught,
 than to introduce another structure. Currently a deque is not a drop-in
 replacement for a list in that one cannot use all list methods with a deque.

 As I understand it, his proposal is simpler than the one rejected a couple
 of years ago is that it does not include intentional over-allocation at the
 front of the list, as would be needed for guaranteed O(1) behavior for
 deque-like insertion at the front. I may consider a Python version of his
 idea for one of my needs, where speed is not an issue.

 I agree that the discussion has gone on too long here and that some of
 Steve's rhetoric has been unnecessarily abrasive and off-putting. He has
 been told this and acknowledged it once on Python-list, but habits die hard.
 For both reasons, I suggested a few days ago that further discussion should
 focus on the patch and be moved to the issue on the tracker. So I will not
 say more here.

 Terry Jan Reedy

Excellently put.

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-29 Thread Georg Brandl
Am 28.01.2010 05:30, schrieb Steve Howell:

 If you want tools that are easy to use correctly, make them bug-free and
 document their behavior.  If you want tools that are easy to use well, then
 make them perform better.  I am not sure how my patch contradicts either of
 these goals.
 
 You keep making the argument that deque is a better alternative to list in
 many situations.  I actually agree with you.  Most programming problems are
 best modelled by a queue.  I am not sure why Python lists get all the syntax
 sugar and promotion over deque, when in reality, Python lists implement a
 pretty useless data structure.  Python lists are a glorification of a C array
 built on top of a memory-upward-biased memory allocator.  As such, they
 optimize list appends (good) but fail awfully on list prepops (bad).  They
 are much better as stacks than queues, even though queues are more useful for
 the most common programming known to man--work through a work queue and
 delete tasks when they are done.
 
 It is not surprising that Python lists are starting to show their lack of
 versatility in 2010.  They're based on 1970's technology.  Python lists are
 really just a thin encapsulation of C arrays built on top of an asymmetrical
 memory manager.
 
 In 2010 you could improve Python lists by releasing from the constraints of
 1970s semantics.  But I am starting to think a more modern approach would be
 to take more useful data structures like deques and give them more sugar.

This post made me laugh, and skip the rest of the thread.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-29 Thread Josiah Carlson
On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote:
 --- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote:
 [...] in the decade+ that I've been using
 Python and
 needed an ordered sequence; lists were the right solution
 99% of the
 time [...]

 What do you think of LISP, and car in particular (apart from the stupidly 
 cryptic name)?

Apples and oranges.  Lisp lists are really stacks, and are analogous
in Python to...
lisp_list = (item1, (item2, (item3, (item4, ()
lisp_car, lisp_cdr = lisp_list

In many typical uses of lisp lists, car/cdr are used as a replacement
for the equivalent of iteration in other languages.

 - Josiah
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-29 Thread Josiah Carlson
On Thu, Jan 28, 2010 at 9:48 PM, Terry Reedy tjre...@udel.edu wrote:
 On 1/28/2010 6:30 PM, Josiah Carlson wrote:

 I would also point out that the way these things are typically done is
 that programmers/engineers have use-cases that are not satisfied by
 existing structures, they explain the issues they have with existing
 structures, and they propose modifications.  So far, Steve has not
 offered any use-cases for why his proposed change is necessary; merely

 Use of a list as a queue rather than as a stack, as in breadth-first search,
 where one only needs to pop off the front but never push to the front. That
 is not to say that this is common or that a deque or other options may no be
 pretty satisfactory. But it would certainly be easier, when presenting such
 algorithms, to just be able to use a list, which has already been taught,
 than to introduce another structure. Currently a deque is not a drop-in
 replacement for a list in that one cannot use all list methods with a deque.

Being able to use a list and get good performance straight off would
be *convenient*.  But that's it.  People use it as a queue, discover
that it is slow, ask (or research), and discover the deque.  The
use-cases where having the full range of list methods *and* deque
behavior are fairly slim (perhaps none in my experience), and I argue
are better covered by structures that are neither a Python list (even
with the modifications offered by Steve) nor a deque.

 - Josiah
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-29 Thread Stephen J. Turnbull
Josiah Carlson writes:

  Lisp lists are really stacks

No, they're really (ie, concretely) singly-linked lists.  

Now, stacks are an abstract data type, and singly-linked lists provide
an efficient implementation of stacks.  But that's not what linked
lists really are.  For example, singly-linked lists are also a
reasonable way to implement inverted trees (ie, the node knows its
parent, but not its children), which is surely not a stack.

The Python use of list to denote what is concretely a dynamically
extensible one-dimensional array confused me a bit.  But what the
heck, Guido needed a four-letter word to denote a concrete type used
to implement a mutable sequence ADT, and he wasn't going to borrow one
from that French guy on the ramparts, right?  No big deal.  Ahem...

So the confusion here is that in Python, list denotes a particular
concrete data type, while Steve H. is using a more abstract idea of
list as mutable sequence to suggest there's a reason for optimizing
certain mutations that Python's data type isn't good at.  I don't
think that's an effective way for him to make his point, unfortunately.
But both usages are consistent with Python's usage; mutability is the
usual way that lists are distinguished from tuples, for example, and
the underlying dynamic array implementation is rarely mentioned.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-29 Thread Stephen J. Turnbull
Josiah Carlson writes:
  On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote:

   What do you think of LISP, and car in particular (apart from
   the stupidly cryptic name)?

  Apples and oranges.

True, but speaking of Lisp lists, here's some possibly relevant
experience.  About 10 years ago, XEmacs converted its cons type from a
special immediate representation (ie, cons == (car, cdr)) to a generic
record representation (ie, cons == (pointer to type descriptor, car,
cdr)).  This resulted in a perceptible increase in VM usage and disk
usage.  A typical running XEmacs instance for me contains about 0.75
million conses and uses 200MB of VM, so with 32-bit pointers that's
about 3MB extra, or 1.5%, and with 64-bit pointers it's 6MB extra,
about 3%.  However, I tend to have several big buffers (20-50MB) of
pure character data; people who work with smaller buffers on 64-bit
machines have reported as much as 10% extra overhead.  On disk, the
binary is typically about 9MB stripped.  That contains about 50,000
conses, or an extra 200KB/400KB with the new structure, somewhat more
than my experience (2% or 4%).

Some people complained, but we considered this well worthwhile (moving
one type bit from the car to the header allowed Lisp integers to
cover the range -1G to +1G, and there are a surprising number of
people who would like to use XEmacs on files 512MB).  I suppose that
Steve's proposal probably has similar impact on binaries and running
instances of Python, but he hasn't given any use cases for list.pop(0)
to compared to doubling the size of usable buffers.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-28 Thread Ulrich Eckhardt
On Tuesday 26 January 2010, Steve Howell wrote:
 Here are the benefits of an O(1) implementation.
[...]
 Did I leave anything out?

Good summary, Steve, thanks!

Anyway, you left two out:

 * Inserting at the front gets the same complexity as inserting at the back.

 * Inserting and erasing anywhere in the middle can memmove() the smaller 
tail, so it changes from O(N) to O(N/2).


Finally, if you drop the invariant first=last, you can simply wrap around the 
pointers. This allows creating a producer/consumer queue which never needs to 
move its content.


Cheers!

Uli

**
Sator Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Deutschland
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
**
   Visit our website at http://www.satorlaser.de/
**
Diese E-Mail einschließlich sämtlicher Anhänge ist nur für den Adressaten 
bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen 
Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empfänger sein 
sollten. Die E-Mail ist in diesem Fall zu löschen und darf weder gelesen, 
weitergeleitet, veröffentlicht oder anderweitig benutzt werden.
E-Mails können durch Dritte gelesen werden und Viren sowie nichtautorisierte 
Änderungen enthalten. Sator Laser GmbH ist für diese Folgen nicht 
verantwortlich.
**

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-28 Thread Josiah Carlson
Having read the entirety of the thread (which is a rare case these
days, I need more spare time), and being that I'm feeling particularly
snarky today, I'm going to agree 100% with everything that Raymond has
said in this message and his few subsequent messages.  Snarky comments
to follow.

I would also point out that the way these things are typically done is
that programmers/engineers have use-cases that are not satisfied by
existing structures, they explain the issues they have with existing
structures, and they propose modifications.  So far, Steve has not
offered any use-cases for why his proposed change is necessary; merely
a (paraphrased) It would be great if list.pop(0) was O(1) instead of
O(n).  It's a solution looking for a problem that doesn't exist.
That said, even if he were to magically come up with twenty examples
where neither a deque nor a list were the right solution, I'd still be
a -1, because in the decade+ that I've been using Python and
needed an ordered sequence; lists were the right solution 99% of the
time, deques got .99%, and the remaining .01% would not have been
satisfied with what Steve is proposing (this is obvious hyperbole and
made-up numbers, but the number of sequence types I've created (easily
10-20) is still a few orders of magnitude lower than how often I use
plain lists and deques).

Don't get me wrong, I'm all about nifty optimizations (I still get a
chuckle out of proper string lengths with surrogate pairs, string
views, Some, ...).  But in this case there is no problem; merely the
use of a data structure for something it was not designed.

It's great that Steve wants to help.  And I 3 innovation in data
structures.  But this patch isn't innovation, and it isn't helping
99.99% of use-cases.
/snark

Going back to not having free time,
 - Josiah

On Tue, Jan 26, 2010 at 1:52 PM, Raymond Hettinger
raymond.hettin...@gmail.com wrote:

 Ah, but that applies for *large* lists. Adding 8 bytes to

 each list

 object applies to *all* lists. I betcha that many programs

 use many

 tiny lists.


 Even most tiny-ish lists are wasting memory, though, according to this
 sequence:

 4, 8, 16, 25, ...


 That is only is they are being grown with list.append().
 Otherwise, list sizes are exact.  For example, [0,1,2]
 uses space for just three entries and [] for none.
 I get the impression that 1) you've already made up your
 mind and are ignoring input from the other developers,
 2) that you're ignoring the python-dev discussions of long ago
 where this very idea was rejected and deques came in to
 being instead, 3) over-estimating the prevalence of use
 cases for pop(0), and 4) under-estimating the impact on
 small lists, 5) under-estimating the impact on psyco or
 other implementations of Python, 6) under-estimating the
 impact on third-party extensions and debugging tools.
 Deques already provide a solution to the FIFO problem
 and they do so without huge wastes of memory or calls
 to memmove().  They handle both insertion and deletion
 from the front and back.  In comparison, the proposed
 changes to lists seem like a complete hack.  Just because
 it can be done, doesn't mean that it should.
 ISTM that you're attacking an already solved problem
 and that you're playing fast and loose with a core data
 structure.

 Raymond

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 http://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-28 Thread Steve Howell
--- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote:
 [...] in the decade+ that I've been using
 Python and
 needed an ordered sequence; lists were the right solution
 99% of the
 time [...]

What do you think of LISP, and car in particular (apart from the stupidly 
cryptic name)?




___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-28 Thread Terry Reedy

On 1/28/2010 6:30 PM, Josiah Carlson wrote:


I would also point out that the way these things are typically done is
that programmers/engineers have use-cases that are not satisfied by
existing structures, they explain the issues they have with existing
structures, and they propose modifications.  So far, Steve has not
offered any use-cases for why his proposed change is necessary; merely


Use of a list as a queue rather than as a stack, as in breadth-first 
search, where one only needs to pop off the front but never push to the 
front. That is not to say that this is common or that a deque or other 
options may no be pretty satisfactory. But it would certainly be easier, 
when presenting such algorithms, to just be able to use a list, which 
has already been taught, than to introduce another structure. Currently 
a deque is not a drop-in replacement for a list in that one cannot use 
all list methods with a deque.


As I understand it, his proposal is simpler than the one rejected a 
couple of years ago is that it does not include intentional 
over-allocation at the front of the list, as would be needed for 
guaranteed O(1) behavior for deque-like insertion at the front. I may 
consider a Python version of his idea for one of my needs, where speed 
is not an issue.


I agree that the discussion has gone on too long here and that some of 
Steve's rhetoric has been unnecessarily abrasive and off-putting. He has 
been told this and acknowledged it once on Python-list, but habits die 
hard. For both reasons, I suggested a few days ago that further 
discussion should focus on the patch and be moved to the issue on the 
tracker. So I will not say more here.


Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-28 Thread Scott Dial
On 1/28/2010 11:57 PM, Steve Howell wrote:
 --- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote:
 [...] in the decade+ that I've been using
 Python and
 needed an ordered sequence; lists were the right solution
 99% of the
 time [...]
 
 What do you think of LISP, and car in particular (apart from the stupidly 
 cryptic name)?

A LISP list/pair has nothing to do with a Python list. The list of LISP
is a *singly-linked* list. You cannot O(1) index a LISP list. A Python
list is more equivalent to a LISP vector, which car does not work
with; in fact, there is not even a pop operator -- all size changes of
a vector O(n) unless the implementation is playing games (like the one
you are proposing for the start and the one Python already uses for the
end of a list).

(And with this, clearly uninformed reply by you, I am now ignoring your
trolling.)

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Glenn Linderman
On approximately 1/26/2010 7:50 PM, came the following characters from 
the keyboard of Cameron Simpson:

My point was that I look on python builtins like list and dict as highly
optimised, highly efficient facilities. That means that I expect a list
to be very very much like a linear array as one would find in C-like
languages, with realloc() managment behind the scenes to handle the
resizing requirements on append/extend.
   


As a newcomer to python, I must say that I wouldn't expect a list to be 
like an array.  I'd expect it more to be like a list... many 
implementations of lists (linked lists, in particular) make it O(1) to 
add to the front or back.  An array can be used to represent a list, but 
there are known inefficiencies that result when doing so, one of which 
the Subject patch is working to address.  I guess I would have expected 
something called an array to be more like an array, rather than 
something called a list.  But one has to read the documentation to find 
out what things really mean, in a new environment.


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Nick Coghlan
Cameron Simpson wrote:
 The proposed change to make pop(0) O(1) involves adding a base offset
 to the internal list implementation. Doesn't that incur a (small)
 overhead to _every_ list operation? Doesn't that weaken list as the
 as efficient as possible implementation of choice for array-like
 things?

No, the implementation is smarter than that. The existing pointer in the
PyList_* structure will continue to point to the first element of the
list while an extra pointer is added that points to the beginning of the
allocated memory. The difference between the two memory addresses will
determine the number of orphan pointer slots that exist at the
beginning of the list.

Only operations that change the length of the list will be slowed down
at all, since they will need to take the orphans into account when
deciding whether or not to resize the allocated memory.

The big practical concern is actually the memory cost of adding yet
another pointer (potentially 8 bytes of data) to every list object, even
empty ones.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Glenn Linderman v+pyt...@g.nevcal.com wrote:
 As a newcomer to python, I must say that I wouldn't expect
 a list to be like an array.  I'd expect it more to be
 like a list... many implementations of lists (linked lists,
 in particular) make it O(1) to add to the front or
 back.  An array can be used to represent a list, but
 there are known inefficiencies that result when doing so,
 one of which the Subject patch is working to address. 
 I guess I would have expected something called an array to
 be more like an array, rather than something called a
 list.  But one has to read the documentation to find
 out what things really mean, in a new environment.
 

My concept of Python lists is that they should have at least the same 
performance characteristics as an ordinary to-do list that you make with 
pencil, paper, and an eraser.

When you complete the first task on your to-do list, you can just erase it; no 
need to recopy the whole list.   When you complete all the elements on the 
first page, throw away the paper.  As you find new tasks to complete, add them 
on the end.  When you fill up a page, get a new sheet of paper.

If you complete the first task, erase it, but then a new urgent task comes in, 
go ahead and write the new urgent task on the first line of the first page.

If, for some reason, you keep getting bombarded with tasks that are more urgent 
than the plan you originally set out for yourself, you need a more powerful 
tool than a simple pencil/paper tool list, and by analogy, you would need a 
more powerful tool than a Python list to do it electronically.

But if you are just working your way through a paper to-do list, then erasing 
elements from the top works in O(1) time with occasional compacting of paper 
when you finish a whole page of tasks.

In my mind Python's lists should have the same performance characteristics as 
the paper list (or better).


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Tue, 1/26/10, Cameron Simpson c...@zip.com.au wrote:

 From: Cameron Simpson c...@zip.com.au
 | The idea that CPython should not be improved because it
 would spoil
 | programmers strikes me as a thin, even desparate
 objection.
 
 Hey, I even used the word thin to describe my concern!
 
 My point was that I look on python builtins like list and
 dict as highly
 optimised, highly efficient facilities. That means that I
 expect a list
 to be very very much like a linear array as one would find
 in C-like
 languages, with realloc() managment behind the scenes to
 handle the
 resizing requirements on append/extend.
 
 It also means that the O() cost of operations should be
 pretty obvious.
 pop(0) obviously involves shuffling everything down one
 slot.

In my previous posting, though, I mention another plausible metaphor that 
programmers would have about lists--ordinary to-do lists where you can cross 
out or erase the first item in the list in O(1) time.
 
 
 The proposed change to make pop(0) O(1) involves adding a
 base offset
 to the internal list implementation. Doesn't that incur a
 (small)
 overhead to _every_ list operation? Doesn't that weaken
 list as the
 as efficient as possible implementation of choice for
 array-like
 things?


The patch advances the pointer itself during prepops, so accesses continue to 
work as quickly as before.   The offset between the originally allocated 
pointer and the pointer to the new first element of the list only gets 
calculated and used during list_resize and list_dealloc.  List_resize gets 
called by any operation that changes the size of the list, including inserts, 
deletes, pops, etc.  

Because I have to increase the size of PyListObject, you could argue that I 
even affect the performance of access to the extent that the greater size of 
PyListObject increases the likelihood of cache misses.  I would consider that 
also to be a very thin objection, although not completely implausible. 

 
 Yes, optimisation is nice. Higher seed is nice. But there
 are
 optimisations that simple fix inefficiencies or provide a
 special case
 for a common operation, and there are those with side
 effects elsewhere
 in the implementation (i.e. the base offset this pop(0) opt
 would incur,
 which adds a (small) cost to everything).


Your general point about optimizations having tradeoffs is valid, but I am not 
making the particular tradeoff that you mention.
 
 The other aspect, mentioned elsewhere in this thread, is
 that
 programmers should know the O() cost of these operations.
 

The tutorial does mention that the current list implementation has to move all 
the elements forward when you remove off the top.

(Cameron: Quick aside, for some reason your emails keep going to my spam 
folder.  I am sure it's just my mail client being stupid, but I thought I'd let 
you know in case it's something on your side.)


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Daniel Stutzbach
On Wed, Jan 27, 2010 at 7:13 AM, Steve Howell showel...@yahoo.com wrote:

 My concept of Python lists is that they should have at least the same
 performance characteristics as an ordinary to-do list that you make with
 pencil, paper, and an eraser.

 When you complete the first task on your to-do list, you can just erase it;
 no need to recopy the whole list.


I don't think your analogy works, unless you recopy your to-do lists
whenever you complete a task in the middle of the list. ;-)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote:

 From: Daniel Stutzbach dan...@stutzbachenterprises.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: python-dev@python.org
 Date: Wednesday, January 27, 2010, 5:32 AM
 On Wed, Jan 27,
 2010 at 7:13 AM, Steve Howell showel...@yahoo.com
 wrote:
 
 My concept of Python lists is that they should have at
 least the same performance characteristics as an ordinary
 to-do list that you make with pencil, paper, and an eraser.
 
 
 
 When you complete the first task on your to-do list, you
 can just erase it; no need to recopy the whole list.   
 
 I don't think your analogy works, unless you recopy
 your to-do lists whenever you complete a task in the middle
 of the list. ;-)
 

The bunch of stickies on my desk, and scribbled notes on the back of envelopes, 
etc. does indeed suggest a jumbled data structure that I would never want to 
reproduce electronically! :)  


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Stefan Behnel
Glenn Linderman, 27.01.2010 10:13:
 As a newcomer to python, I must say that I wouldn't expect a list to be
 like an array.  I'd expect it more to be like a list... many
 implementations of lists (linked lists, in particular) make it O(1) to
 add to the front or back.  An array can be used to represent a list, but
 there are known inefficiencies that result when doing so

Performance-wise, there are a lot more inefficiencies in linked lists for
the most common use cases than in arrays. It's hinting to see how common the

List l = new ArrayList()

idiom is in Java (plus generics, obviously). I can't remember seeing any
other kind of initialisation in ages.

That's a huge difference between Java and Python, BTW. Python optimises for
common use cases to keep you from thinking too much about implementation
details, whereas Java just leaves you alone with all possible solutions for
all possible use cases and forces you to choose the right one at each
single code line you write.

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Antoine Pitrou
Nick Coghlan ncoghlan at gmail.com writes:
 
 The big practical concern is actually the memory cost of adding yet
 another pointer (potentially 8 bytes of data) to every list object, even
 empty ones.

It needn't be, actually. Why?
You only need to store this other pointer when you have an orphaned area in the
first place. But then you can store the pointer *in* the orphaned area :-)

So let's say for example:
- when ob_size = 0, there are no orphans
- when ob_size  0, the pointer to the orphaned area is given by 
  (PyObject **) self-ob_items[-1]

This makes a couple of micro-operations slightly slower (you need to take
ob_size's absolute value to get the length (*)); and it kills code which uses
Py_SIZE() on lists. But PyList_GET_SIZE() exists for a reason.

(*) or you could use ob_size's MSB, and then a simple binary AND gets you the
length; or even ob_items's LSB, since the pointer will always be aligned on more
than 1 byte

(and, yes, this is a bit insane)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Antoine Pitrou
Daniel Stutzbach daniel at stutzbachenterprises.com writes:
 
 
 I don't think your analogy works, unless you recopy your to-do lists whenever
 you complete a task in the middle of the list. 

I think his analogy suggests that his to-do list is a doubly-linked list ;)
Or perhaps an array with lazy deletion (so that he loses O(1) random access).

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote:

 From: Guido van Rossum gu...@python.org
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org
 Date: Tuesday, January 26, 2010, 12:57 PM
 On Tue, Jan 26, 2010 at 12:46 PM,
 Steve Howell showel...@yahoo.com
 wrote:
  It seems to me that the goal of keeping lists
 super-compact from a memory standpoint is contradicted by
 the code in list_resize that optimistically preallocates
 extra memory on appends.
 
 Ah, but that applies for *large* lists. Adding 8 bytes to
 each list
 object applies to *all* lists. I betcha that many programs
 use many
 tiny lists.
 

I think that some of the large programs that you mention with many tiny lists 
have some subset of lists still in memory only due to the fact that they cannot 
be garbage collected due to dangling references.

One of the ways to eliminate dangling references is to aggressively delete 
objects after you are done processing them.

Right now the Python programmer looking to aggressively delete elements from 
the top of a list has to consider the tradeoff that the operation takes O(N) 
time and would possibly churn his memory caches with the O(N) memmove 
operation.  In some cases, the Python programmer would only have himself to 
blame for not using a deque in the first place.  But maybe he's a maintenance 
programmer, so it's not his fault, and maybe the code he inherits uses lists in 
a pervasive way that makes it hard to swap in deque after the fact.  What 
advice do you give him?

Of course, this scenario is mostly speculative.  A concrete example of a large 
Python program that keeps lots of tiny lists around would probably shed more 
light on the matter.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, John Arbash Meinel john.arbash.mei...@gmail.com wrote:

 From: John Arbash Meinel john.arbash.mei...@gmail.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: Guido van Rossum gu...@python.org, Nick Coghlan 
 ncogh...@gmail.com, python-dev@python.org
 Date: Wednesday, January 27, 2010, 7:45 AM
 
  Right now the Python programmer looking to
 aggressively delete elements from the top of a list has to
 consider the tradeoff that the operation takes O(N) time and
 would possibly churn his memory caches with the O(N) memmove
 operation.  In some cases, the Python programmer would
 only have himself to blame for not using a deque in the
 first place.  But maybe he's a maintenance programmer,
 so it's not his fault, and maybe the code he inherits uses
 lists in a pervasive way that makes it hard to swap in deque
 after the fact.  What advice do you give him?
  
 
 Or he could just set them to None.

Fair enough, but that's still wasteful of memory, keeping around a bunch of 
None elements because you can't inexpensively delete them.  I concede that you 
can break the dangling references, though, and that's often where large 
programs waste a lot of memory, so your point is well taken.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Daniel Stutzbach
On Wed, Jan 27, 2010 at 9:55 AM, Steve Howell showel...@yahoo.com wrote:

 Fair enough, but that's still wasteful of memory, keeping around a bunch of
 None elements because you can't inexpensively delete them.


Even if there are many references to it, there is only one None element.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Virgil Dupras
On Wed, Jan 27, 2010 at 4:55 PM, Steve Howell showel...@yahoo.com wrote:
 --- On Wed, 1/27/10, John Arbash Meinel john.arbash.mei...@gmail.com wrote:

 From: John Arbash Meinel john.arbash.mei...@gmail.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: Guido van Rossum gu...@python.org, Nick Coghlan 
 ncogh...@gmail.com, python-dev@python.org
 Date: Wednesday, January 27, 2010, 7:45 AM

  Right now the Python programmer looking to
 aggressively delete elements from the top of a list has to
 consider the tradeoff that the operation takes O(N) time and
 would possibly churn his memory caches with the O(N) memmove
 operation.  In some cases, the Python programmer would
 only have himself to blame for not using a deque in the
 first place.  But maybe he's a maintenance programmer,
 so it's not his fault, and maybe the code he inherits uses
 lists in a pervasive way that makes it hard to swap in deque
 after the fact.  What advice do you give him?
 

 Or he could just set them to None.

 Fair enough, but that's still wasteful of memory, keeping around a bunch of 
 None elements because you can't inexpensively delete them.  I concede that 
 you can break the dangling references, though, and that's often where large 
 programs waste a lot of memory, so your point is well taken.


Why is this thread still going? It seems to me that the case for this
change is very weak. Lists, like dicts and tuples, are used
*everywhere* and in *vast* quantities. Making them grow by 4 or 8
bytes each for such a corner case can't be an option.

I'm sure your new list class has a lot of uses, but it should be an
external class. If it stays close in behavior to the lists' behavior,
then we could even add a notice in pop()'s documentation that
recommends to use your new class in case they want a painless way to
replace list usage (to make the life of those poor developers
maintaining other people's code easier), maybe even add it in stdlib's
collections unit.

--
Virgil Dupras
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote:

 From: Daniel Stutzbach dan...@stutzbachenterprises.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: John Arbash Meinel john.arbash.mei...@gmail.com, python-dev@python.org
 Date: Wednesday, January 27, 2010, 8:20 AM
 On Wed, Jan 27,
 2010 at 9:55 AM, Steve Howell showel...@yahoo.com
 wrote:
 
 
 Fair enough, but that's still wasteful of memory,
 keeping around a bunch of None elements because you
 can't inexpensively delete them.  
 
 Even if there are many references to it, there is only one
 None element.
 

I should have been more precise and said the pointers to None, which could be 
reclaimed.  But that's a pretty minor savings--I concede on the greater point, 
you do have the alternative to break dangling references with None, so the 
expense of avoiding remove operations is only local to list itself.






___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Virgil Dupras hs...@hardcoded.net wrote:
 
 Why is this thread still going? It seems to me that the
 case for this
 change is very weak. Lists, like dicts and tuples, are
 used
 *everywhere* and in *vast* quantities. Making them grow by
 4 or 8
 bytes each for such a corner case can't be an option.
 
 I'm sure your new list class has a lot of uses, but it
 should be an
 external class. If it stays close in behavior to the lists'
 behavior,
 then we could even add a notice in pop()'s documentation
 that
 recommends to use your new class in case they want a
 painless way to
 replace list usage (to make the life of those poor
 developers
 maintaining other people's code easier), maybe even add it
 in stdlib's
 collections unit.
 


Lists are indeed used *everywhere* and *vast* quantities, which is why 
optimizations on list operations are worthwhile to pursue.

The particular optimization makes a few tradeoffs, and the consensus here 
appears to be that the ugliest tradeoff is adding the pointer to PyListObject.

There is at least some irony in opposing an optimization to a remove() 
operation on the basis of compacting memory, which isn't to say that the 
argument isn't valid.

There is also the possibility that my initial patch can be refined by somebody 
smarter than myself to eliminate the particular tradeoff.  In fact, Antoine 
Pitrou already suggested an approach, although I agree that it kind of pushes 
the boundary of sanity. :)



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Martin v. Löwis
 Lists are indeed used *everywhere* and *vast* quantities, which is
 why optimizations on list operations are worthwhile to pursue.

Unfortunately, the proposed change is a pessimisation, not an
optimisation. We probably shouldn't discuss it further, at least not
until a PEP gets written by its proponents.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Martin v. Löwis
 In my mind Python's lists should have the same performance
 characteristics as the paper list (or better).

I think you'll have to adjust your mind then. People have
proposed various data structures that *do* work efficiently as
paper lists. So if you want a paper list, use one of them, rather
than abusing the Python list for one.

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Antoine Pitrou solip...@pitrou.net wrote:

 From: Antoine Pitrou solip...@pitrou.net
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: python-dev@python.org
 Date: Wednesday, January 27, 2010, 6:15 AM
 Nick Coghlan ncoghlan at
 gmail.com writes:
  
  The big practical concern is actually the memory cost
 of adding yet
  another pointer (potentially 8 bytes of data) to every
 list object, even
  empty ones.
 
 It needn't be, actually. Why?
 You only need to store this other pointer when you have an
 orphaned area in the
 first place. But then you can store the pointer *in* the
 orphaned area :-)
 
 So let's say for example:
 - when ob_size = 0, there are no orphans
 - when ob_size  0, the pointer to the orphaned area is
 given by 
   (PyObject **) self-ob_items[-1]
 
 This makes a couple of micro-operations slightly slower
 (you need to take
 ob_size's absolute value to get the length (*)); and it
 kills code which uses
 Py_SIZE() on lists. But PyList_GET_SIZE() exists for a
 reason.
 
 (*) or you could use ob_size's MSB, and then a simple
 binary AND gets you the
 length; or even ob_items's LSB, since the pointer will
 always be aligned on more
 than 1 byte
 
 (and, yes, this is a bit insane)
 

A slightly more sane alternative would be to leave ob_size and ob_item alone 
with their current semantics, and then replace allocated with self-excess, 
where

  self-excess == excess_above * 256 + excess_below.

Right now self-allocated is only used in a few places:
  
  list_resize
  listextend
  listsort
  list_init (but only in an assert)
  list_sizeof

Those places would need to compute:

  allocated = self-ob_size + self-excess / 256 + self-excess % 256

The right hand side could be done with shifting/bitmasking inside a macro.

Excess_left would obviously not be allowed to exceed 256; excess_right would 
max out at PY_SIZE_MAX / 256, instead of PY_SIZE_MAX / 8 + 6, which is probably 
the right thing anyway.

Here is the current algorithm for allocating extra bytes on the right:

new_allocated = (newsize  3) + (newsize  9 ? 3 : 6);

On a 32-bit platform, excess_right would max out at 16M instead of the current 
512M.

Any method that wanted to find the originally allocated pointer would compute

  self-ob_item - (self-excess % 256)

To the extent that most of the ugly details could be encapsulated in macros, I 
do not think this would complicate the list code itself greatly, but it would 
complicate debugging.  Guido has expressed a strong desire to keep a hard 
pointer to the originally allocated chunk, especially with regard to future 
garbage collection changes.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Antoine Pitrou
Le mercredi 27 janvier 2010 à 11:49 -0800, Steve Howell a écrit :
 A slightly more sane alternative would be to leave ob_size and ob_item
 alone with their current semantics, and then replace allocated with
 self-excess, where
 
   self-excess == excess_above * 256 + excess_below.

Or we could use allocated's sign bit in order to store the flag (of
whether there is an orphaned area or not) and then store the orphaned
pointer as ob_items[-1]. Thus we don't have to limit the magnitude of
anything. And since allocated itself isn't used in any really critical
path, it doesn't slow down anything.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
--- On Wed, 1/27/10, Antoine Pitrou solip...@pitrou.net wrote:

 From: Antoine Pitrou solip...@pitrou.net
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: python-dev@python.org
 Date: Wednesday, January 27, 2010, 12:41 PM
 Le mercredi 27 janvier 2010 à 11:49
 -0800, Steve Howell a écrit :
  A slightly more sane alternative would be to leave
 ob_size and ob_item
  alone with their current semantics, and then replace
 allocated with
  self-excess, where
  
    self-excess == excess_above * 256
 + excess_below.
 
 Or we could use allocated's sign bit in order to store the
 flag (of
 whether there is an orphaned area or not) and then store
 the orphaned
 pointer as ob_items[-1]. Thus we don't have to limit the
 magnitude of
 anything. And since allocated itself isn't used in any
 really critical
 path, it doesn't slow down anything.
 

Ok, I will try that technique and benchmark it, as it would address one strong 
objection to the proposal. 

I think I've summarized most of the other major objections here:

http://wiki.python.org/moin/ProposalToSpeedUpListOperations


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Raymond Hettinger

On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote:

 Lists are indeed used *everywhere* and *vast* quantities, which is
 why optimizations on list operations are worthwhile to pursue.
 
 Unfortunately, the proposed change is a pessimisation, not an
 optimisation. We probably shouldn't discuss it further, at least not
 until a PEP gets written by its proponents.
 

I concur (on both points).

AFAICT, the performance of the proposal:

* increases space requirements by a small fixed amount

* s.append() performance slightly degraded.

* the s.insert(0, value) case isn't helped -- still O(n)

* repeated s.pop(0) have amortized O(1) but either
   needs to waste space indefinitely (likely not what
   the programmer intended by popping off values)
   or needs to do occasional memmoves (which isn't
   as good as a deque which never moves the data).
 
* the resize performance doesn't work well with the
   memory allocator -- while a series of appends can often
   resize in-place without a need to do an O(n) memmove,
   but a series of pop(0) calls doesn't have a resize in-place
   option.

What currently unknown is the effect on third-party extensions
and debugging tools that access the structure directly.
Also, am not sure if this affects psyco or the other 
implementations such as Jython which may implement
lists in terms of existing Java containers.

ISTM, the *only* benefit is that an occasional s.pop(0)
will perform better than it does now but not as well
as a deque which never has to move data).


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Daniel Stutzbach
On Wed, Jan 27, 2010 at 3:49 PM, Raymond Hettinger 
raymond.hettin...@gmail.com wrote:

 Also, am not sure if this affects psyco or the other
 implementations such as Jython which may implement
 lists in terms of existing Java containers.


Or Unladen Swallow.  I'm -1 on mucking with any of the fundamental data
structures until the Unladen Swallow patch lands (assuming it lands).

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell

--- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote:

 From: Raymond Hettinger raymond.hettin...@gmail.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Martin v. Löwis mar...@v.loewis.de
 Cc: Steve Howell showel...@yahoo.com, python-dev@python.org
 Date: Wednesday, January 27, 2010, 1:49 PM
 
 On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote:
 
  Lists are indeed used *everywhere* and *vast*
 quantities, which is
  why optimizations on list operations are
 worthwhile to pursue.
  
  Unfortunately, the proposed change is a pessimisation,
 not an
  optimisation. We probably shouldn't discuss it
 further, at least not
  until a PEP gets written by its proponents.
  
 
 I concur (on both points).
 
 AFAICT, the performance of the proposal:
 
 * increases space requirements by a small fixed amount
 
 * s.append() performance slightly degraded.
 
 * the s.insert(0, value) case isn't helped -- still O(n)
 
 * repeated s.pop(0) have amortized O(1) but either
    needs to waste space indefinitely (likely
 not what
    the programmer intended by popping off
 values)
    or needs to do occasional memmoves (which
 isn't
    as good as a deque which never moves the
 data).
  
 * the resize performance doesn't work well with the
    memory allocator -- while a series of
 appends can often
    resize in-place without a need to do an
 O(n) memmove,
    but a series of pop(0) calls doesn't have
 a resize in-place
    option.
 
 What currently unknown is the effect on third-party
 extensions
 and debugging tools that access the structure directly.
 Also, am not sure if this affects psyco or the other 
 implementations such as Jython which may implement
 lists in terms of existing Java containers.
 
 ISTM, the *only* benefit is that an occasional s.pop(0)
 will perform better than it does now but not as well
 as a deque which never has to move data).
 
 

I agree with most of what's said above, with a few clarifications.

The speedup is not limited to pop(0); it also considerably speeds up code like 
below:

n = 80
for i in range(n):
x = lst[:10]
del lst[:10]

Understood that it is a contrived benchmark, and real code like that could be 
replaced with a technique that Nones out the used up elements and advances a 
start pointer.  

Prepends are still O(N) for most use cases, but prepends will reclaim space at 
the front if it's there in O(1).

The biggest performance tradeoff, the extra space requirement in PyListObject, 
can be eliminated, albeit with a pretty ugly hack.

Since there is a lot of discussion about tradeoffs, I would remind folks that 
asking somebody to use a deque instead of a list also forces tradeoffs; you 
lose the comfort and familiarity of lists (not to be underestimated) as well as 
some features (you can't spell d[:10] as d[:10] for example).


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Antoine Pitrou
Raymond Hettinger raymond.hettinger at gmail.com writes:
 
 AFAICT, the performance of the proposal:
 
 * increases space requirements by a small fixed amount

Well, given my proposal (assuming it turns out ok) it doesn't.

 * s.append() performance slightly degraded.

Why?

 * the resize performance doesn't work well with the
memory allocator -- while a series of appends can often
resize in-place without a need to do an O(n) memmove,
but a series of pop(0) calls doesn't have a resize in-place
option.

Can often resize in-place is overstated. It only works if there's a chunk of
sufficient free space just after your list's storage, which is probably not the
common case (except perhaps in microbenchmarks where the list is the latest
allocated object).

 What currently unknown is the effect on third-party extensions
 and debugging tools that access the structure directly.
 Also, am not sure if this affects psyco or the other 
 implementations such as Jython which may implement
 lists in terms of existing Java containers.

Well, if psyco wants to muck with internal implementation details, it's psyco's
problem. We shouldn't refrain from making internal changes under the pretext
that it might break psyco.
Otherwise the psyco author(s) should file a PEP for inclusion in the core
interpreter ;)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Jeffrey Yasskin
On Wed, Jan 27, 2010 at 2:14 PM, Daniel Stutzbach
dan...@stutzbachenterprises.com wrote:
 On Wed, Jan 27, 2010 at 3:49 PM, Raymond Hettinger
 raymond.hettin...@gmail.com wrote:

 Also, am not sure if this affects psyco or the other
 implementations such as Jython which may implement
 lists in terms of existing Java containers.

 Or Unladen Swallow.  I'm -1 on mucking with any of the fundamental data
 structures until the Unladen Swallow patch lands (assuming it lands).

Don't block internal changes for our sake. For the most part, we've
already made plans to make them non-issues. In any cases we haven't,
we need to make them non-issues anyway to avoid slowing down python
development in the future.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell
 From: Antoine Pitrou solip...@pitrou.net
 raymond.hettinger at gmail.com writes:
  
  AFAICT, the performance of the proposal:
  
  * increases space requirements by a small fixed
 amount
 
 Well, given my proposal (assuming it turns out ok) it
 doesn't.
 
  * s.append() performance slightly degraded.
 
 Why?
 

A slight degradation is unavoidable AFAICT.  The overhead is roughly this on 
every call:


if (self-orphans = newsize) { // SKIP ALMOST ALWAYS
needed = newsize + self-orphans;

And if it's time for a realloc, you have to pay down a little more bookkeeping:

items = self-ob_item - self-orphans;
self-ob_item = items + self-orphans;
op-orphans = 0;


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Raymond Hettinger

On Jan 27, 2010, at 2:56 PM, Steve Howell wrote:

 
 --- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote:
 
 From: Raymond Hettinger raymond.hettin...@gmail.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Martin v. Löwis mar...@v.loewis.de
 Cc: Steve Howell showel...@yahoo.com, python-dev@python.org
 Date: Wednesday, January 27, 2010, 1:49 PM
 
 On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote:
 
 Lists are indeed used *everywhere* and *vast*
 quantities, which is
 why optimizations on list operations are
 worthwhile to pursue.
 
 Unfortunately, the proposed change is a pessimisation,
 not an
 optimisation. We probably shouldn't discuss it
 further, at least not
 until a PEP gets written by its proponents.
 
 
 I concur (on both points).
 
 AFAICT, the performance of the proposal:
 
 * increases space requirements by a small fixed amount
 
 * s.append() performance slightly degraded.
 
 * the s.insert(0, value) case isn't helped -- still O(n)
 
 * repeated s.pop(0) have amortized O(1) but either
needs to waste space indefinitely (likely
 not what
the programmer intended by popping off
 values)
or needs to do occasional memmoves (which
 isn't
as good as a deque which never moves the
 data).
 
 * the resize performance doesn't work well with the
memory allocator -- while a series of
 appends can often
resize in-place without a need to do an
 O(n) memmove,
but a series of pop(0) calls doesn't have
 a resize in-place
option.
 
 What currently unknown is the effect on third-party
 extensions
 and debugging tools that access the structure directly.
 Also, am not sure if this affects psyco or the other 
 implementations such as Jython which may implement
 lists in terms of existing Java containers.
 
 ISTM, the *only* benefit is that an occasional s.pop(0)
 will perform better than it does now but not as well
 as a deque which never has to move data).
 
 
 
 I agree with most of what's said above, with a few clarifications.
 
 The speedup is not limited to pop(0); it also considerably speeds up code 
 like below:
 
 n = 80
 for i in range(n):
   x = lst[:10]
   del lst[:10]
 

For this code, the slicing notation is nicer than the equivalent pop 
one-at-a-time code for deques.

The underlying performance isn't a good though -- a deque would free block as 
they become available and would never call a memmove.

The use case is a bit weird.  If people needed something like this, we would 
already see it in reverse order:

   lst.reverse()   # then delete slices from the right

ISTM, that there aren't actually any use cases for left popping in the absence 
of right appending or left inserting; otherwise you could just retrieve the 
slices directly (without doing any deletes).   In cases where the list is 
growing on one end and shrinking on another, a deque wins because it frees 
memory readily and doesn't need memmoves.

 Understood that it is a contrived benchmark, and real code like that could be 
 replaced with a technique that Nones out the used up elements and advances a 
 start pointer.  

Right.


 
 Since there is a lot of discussion about tradeoffs, I would remind folks that 
 asking somebody to use a deque instead of a list also forces tradeoffs; you 
 lose the comfort and familiarity of lists (not to be underestimated) as well 
 as some features (you can't spell d[:10] as d[:10] for example).


If it were actually needed, there is no reason deques couldn't support slicing 
notation.  But then, I've never had a request for it ever (and users typically 
haven't been shy about asking for what they want).

The familiarity argument doesn't hold much water for two reasons:

* deques seem to have a nearly zero learning curve.  There have been no posts 
or reports on people being challenged by the API.  The *only* issue that has 
ever arisen is that a fair number of people have heard of a queue but not heard 
of the term, deque.

* changing the list implementation makes it harder to decide which data 
structure to use.  Currently, it is simple -- if you need to append or pop on 
the left, use a deque; otherwise, use a list.  With the proposed change, the 
performance trade-offs are harding to understand (you can do pop(0) but not 
insert(0,v) unless you've popped first but not after a resize, a deque is 
better is most cases but it is hard to describe real use cases with list.pop(0) 
would be preferable).

* the current design encourages people to use the right data structure for a 
given need.  the proposed change makes the trades-off murky and implementation 
dependent.  you have to know a lot more in order to make the right choices.  
that's not good for usability.  we want tools that are easy to use 
correctly/well.

Raymond___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman

Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell


--- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote:

 From: Raymond Hettinger raymond.hettin...@gmail.com

 * the current design encourages people to use
 the right data structure for a given need.  the
 proposed change makes the trades-off murky and
 implementation dependent.  

Are you saying that the current slowness of list for prepops helps people to 
choose more appropriate data structures?  Really

 you have to know a lot more
 in order to make the right choices.  that's not
 good for usability.  we want tools that are easy to use
 correctly/well.

If you want tools that are easy to use correctly, make them bug-free and 
document their behavior.  If you want tools that are easy to use well, then 
make them perform better.  I am not sure how my patch contradicts either of 
these goals.

You keep making the argument that deque is a better alternative to list in many 
situations.  I actually agree with you.  Most programming problems are best 
modelled by a queue.  I am not sure why Python lists get all the syntax sugar 
and promotion over deque, when in reality, Python lists implement a pretty 
useless data structure.  Python lists are a glorification of a C array built on 
top of a memory-upward-biased memory allocator.  As such, they optimize list 
appends (good) but fail awfully on list prepops (bad).  They are much better as 
stacks than queues, even though queues are more useful for the most common 
programming known to man--work through a work queue and delete tasks when they 
are done.

It is not surprising that Python lists are starting to show their lack of 
versatility in 2010.  They're based on 1970's technology.  Python lists are 
really just a thin encapsulation of C arrays built on top of an asymmetrical 
memory manager.

In 2010 you could improve Python lists by releasing from the constraints of 
1970s semantics.  But I am starting to think a more modern approach would be to 
take more useful data structures like deques and give them more sugar.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Alex Gaynor
On Wed, Jan 27, 2010 at 11:30 PM, Steve Howell showel...@yahoo.com wrote:


 --- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote:

 From: Raymond Hettinger raymond.hettin...@gmail.com

 * the current design encourages people to use
 the right data structure for a given need.  the
 proposed change makes the trades-off murky and
 implementation dependent.

 Are you saying that the current slowness of list for prepops helps people to 
 choose more appropriate data structures?  Really

 you have to know a lot more
 in order to make the right choices.  that's not
 good for usability.  we want tools that are easy to use
 correctly/well.

 If you want tools that are easy to use correctly, make them bug-free and 
 document their behavior.  If you want tools that are easy to use well, then 
 make them perform better.  I am not sure how my patch contradicts either of 
 these goals.

 You keep making the argument that deque is a better alternative to list in 
 many situations.  I actually agree with you.  Most programming problems are 
 best modelled by a queue.  I am not sure why Python lists get all the syntax 
 sugar and promotion over deque, when in reality, Python lists implement a 
 pretty useless data structure.  Python lists are a glorification of a C array 
 built on top of a memory-upward-biased memory allocator.  As such, they 
 optimize list appends (good) but fail awfully on list prepops (bad).  They 
 are much better as stacks than queues, even though queues are more useful for 
 the most common programming known to man--work through a work queue and 
 delete tasks when they are done.

 It is not surprising that Python lists are starting to show their lack of 
 versatility in 2010.  They're based on 1970's technology.  Python lists are 
 really just a thin encapsulation of C arrays built on top of an asymmetrical 
 memory manager.

 In 2010 you could improve Python lists by releasing from the constraints of 
 1970s semantics.  But I am starting to think a more modern approach would be 
 to take more useful data structures like deques and give them more sugar.



 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/alex.gaynor%40gmail.com


Python lists implement a pretty useless data structure

It's very difficult for ideas to gain traction when they contain such
useless, and obviously wrong, rhetoric.  There's an enormous body of
code out there that begs to disagree with this assertion.

Alex

-- 
I disapprove of what you say, but I will defend to the death your
right to say it. -- Voltaire
The people's good is the highest law. -- Cicero
Code can always be simpler than you think, but never as simple as you
want -- Me
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Guido van Rossum
On Wed, Jan 27, 2010 at 8:30 PM, Steve Howell showel...@yahoo.com wrote:
 I am not sure why Python lists get all the syntax sugar and promotion over 
deque, when in reality, Python lists implement a pretty useless data 
structure.  Python lists are a glorification of a C array built on top of a 
memory-upward-biased memory allocator.  As such, they optimize list appends 
(good) but fail awfully on list prepops (bad).  They are much better as stacks 
than queues, even though queues are more useful for the most common 
programming known to man--work through a work queue and delete tasks when they 
are done.

 It is not surprising that Python lists are starting to show their lack of 
 versatility in 2010.  They're based on 1970's technology.  Python lists are 
 really just a thin encapsulation of C arrays built on top of an asymmetrical 
 memory manager.

Steve, I think you might as well stop now. I see nothing useful coming
out of pursuing this thread further.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Steve Howell

--- On Wed, 1/27/10, Alex Gaynor alex.gay...@gmail.com wrote:

 
 Python lists implement a pretty useless data structure
 
 It's very difficult for ideas to gain traction when they
 contain such
 useless, and obviously wrong, rhetoric.  There's an
 enormous body of
 code out there that begs to disagree with this assertion.
 

The statement was meant 99% tongue in cheek.  Like probably 99.99% of Python 
programmers, I use lists all the time; that's why I want them to be more 
versatile.

There's also a 1% element of truth to the statement.  To the extent that people 
are arguing for alternative data structures to list, particularly deque, I 
wonder if there actually is some merit in discouraging the use of lists in 
favor of better alternatives.





___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Guido van Rossum
On Wed, Jan 27, 2010 at 8:46 PM, Steve Howell showel...@yahoo.com wrote:
 The statement was meant 99% tongue in cheek.  Like probably 99.99% of Python 
 programmers, I use lists all the time; that's why I want them to be more 
 versatile.

 There's also a 1% element of truth to the statement.  To the extent that 
 people are arguing for alternative data structures to list, particularly 
 deque, I wonder if there actually is some merit in discouraging the use of 
 lists in favor of better alternatives.

Steve, whatever your goal is, you're wearing out your welcome pretty
quickly with your current set of tactics. Try something different.
This thread is full.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-27 Thread Cameron Simpson
On 27Jan2010 23:08, Nick Coghlan ncogh...@gmail.com wrote:
| Cameron Simpson wrote:
|  The proposed change to make pop(0) O(1) involves adding a base offset
|  to the internal list implementation. Doesn't that incur a (small)
|  overhead to _every_ list operation? Doesn't that weaken list as the
|  as efficient as possible implementation of choice for array-like
|  things?
| 
| No, the implementation is smarter than that. The existing pointer in the
| PyList_* structure will continue to point to the first element of the
| list while an extra pointer is added that points to the beginning of the
| allocated memory. The difference between the two memory addresses will
| determine the number of orphan pointer slots that exist at the
| beginning of the list.

Mmmm, that is smarter. Nice.

| Only operations that change the length of the list will be slowed down
| at all, since they will need to take the orphans into account when
| deciding whether or not to resize the allocated memory.

Nice again.

| The big practical concern is actually the memory cost of adding yet
| another pointer (potentially 8 bytes of data) to every list object, even
| empty ones.

Um, yes. How big is a an empty list already?
-- 
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

Who is the happier man, he who has braved the storm of life and lived, or he
who has stayed securely on shore and merely existed?
- Hunter S. Thompson, age seventeen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Antoine Pitrou
Terry Reedy tjreedy at udel.edu writes:
 
 The idea that CPython should not be improved because it would spoil 
 programmers strikes me as a thin, even desparate objection. One could 
 say that same thing about the recent optimization of string += string so 
 that repeated concats are O(n) instead of O(n*n).

Note that it's not even generally true. It only works if your platform's
realloc() is sufficiently smart. That's why we once have had some
Windows-specific performance problems in the py3k IO module.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Stefan Behnel
Michael Foord, 26.01.2010 01:14:
 How great is the complication? Making list.pop(0) efficient sounds like
 a worthy goal, particularly given that the reason you don't use it is
 because you *know* it is inefficient

I agree. Given a programmer the insight that lists are implemented as
arrays in CPython is essentially saying don't use list.pop(0), it's
SLOW!. So they won't use it, and even avoid it for small lists where it
doesn't really matter. It basically stinks.

Making list.pop(0) fast removes another edge where programmers must
prematurely concentrate on implementation specifics of the interpreter they
use, rather than the functionality they want to implement. I think that's
an improvement to the simplicity that Python provides. It's basically
saying: care about performance when you have to, but otherwise believe in
the core developers to make your code fast enough in most common cases.

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
--- On Tue, 1/26/10, Stefan Behnel stefan...@behnel.de wrote:
 From: Stefan Behnel stefan...@behnel.de
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: python-dev@python.org
 Date: Tuesday, January 26, 2010, 1:27 AM
 Michael Foord, 26.01.2010 01:14:
  How great is the complication? Making list.pop(0)
 efficient sounds like
  a worthy goal, particularly given that the reason you
 don't use it is
  because you *know* it is inefficient
 
 I agree. Giving a programmer the insight that lists are
 implemented as
 arrays in CPython is essentially saying don't use
 list.pop(0), it's
 SLOW!. So they won't use it, and even avoid it for small
 lists where it
 doesn't really matter. It basically stinks.
 
 Making list.pop(0) fast removes another edge where
 programmers must
 prematurely concentrate on implementation specifics of the
 interpreter they
 use, rather than the functionality they want to implement.
 I think that's
 an improvement to the simplicity that Python provides. It's
 basically
 saying: care about performance when you have to, but
 otherwise believe in
 the core developers to make your code fast enough in most
 common cases.
 

Thanks! I agree with you 100%, of course.

In fairness, I think my patch will only negligibly affect the performance of 
almost every Python program in existence.  I don't see any likelihood of 
negative effects for real world programs, but I also concede that the positive 
effects will usually be drowned out by other performance bottlenecks.

So I'd like to push for the patch on the grounds that it simply relieves Python 
programmers from worrying about a needless implementation detail.  But I am 
also prepared to have it rejected on the simple principle of keeping the 
CPython implementation less complex.  I've strived to make the changes as 
uninvasive as possible, but there is no getting around 1) increasing the size 
of PyListObject by an int, and 2) touching every major function in listobject.c 
related to memory management (w/list_resize getting the most surgery).  So 
there are certainly tradeoffs, unless I am overlooking some more clever way to 
implement this.

The core piece of the change that I made was to make deleting elements off the 
top of the list more efficient, by advancing a single pointer forward instead 
of moving an entire chunk of memory backward.  An incidental change was then to 
make inserting elements at the top of the list try to reclaim empty slots.  I 
think the former use case is valid and reasonably common (for some definition 
of common).  The latter use case is probably a lot more rare, but I 
implemented the insert optimization to avoid the spiky double-memmove that the 
delete optimization would have otherwise introduced to prepends that succeeded 
pops.

My current strategy for giving memory back to the OS is overly simple and needs 
refinement, but it basically says that the number of orphaned pointers cannot 
exceed the number of active elements in the list.  This might sound wasteful, 
but the reality is that N list elements consume a lot more memory than N 
orphaned pointers, and the greater attractiveness of list.pop(0) would of 
course lead to earlier garbage collection of the popped elements in real world 
programs.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Nick Coghlan
Steve Howell wrote:
 Ok, I fixed the obvious segfaults, and I am now able to pass all the
 tests on my debug build.  A new diff is attached.

Steve, I suggest creating a new issue at bugs.python.org to track your
proposal rather than sending diffs to the list. Putting the patch code
up on Rietveld is also a good step in helping potential reviewers (I
believe there is some info on using Rietveld in the dev FAQ).

The patch may still end up being rejected, but I know I at least am lot
less opposed to the idea than I was when this thread started.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
 From: Nick Coghlan ncogh...@gmail.com
 
 Steve, I suggest creating a new issue at bugs.python.org to
 track your
 proposal rather than sending diffs to the list. 

Agreed.  After sending out the second, still slightly broken diff, I realized 
that I probably needed to read up on the process.  You can find the issue 
tracked here:

http://bugs.python.org/issue7784

Florent Xicluna suggested applying my patch to 2.x, instead of 3.x, and was 
kind enough to fix up my tab/spacing issues.

 Putting the
 patch code
 up on Rietveld is also a good step in helping potential
 reviewers (I
 believe there is some info on using Rietveld in the dev
 FAQ).

Ok, I will look into Rietveld tomorrow.

 The patch may still end up being rejected, but I know I at
 least am lot
 less opposed to the idea than I was when this thread
 started.
 

Thanks for the encouragement.  Regardless of the outcome, this has been a good 
learning experience for me.  I haven't touched gdb in over a decade; it was 
actually kind of fun to debug some C code.  As I mentioned earlier, Python has 
totally spoiled me for other languages, including C!


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Daniel Stutzbach
On Tue, Jan 26, 2010 at 1:17 AM, Steve Howell showel...@yahoo.com wrote:

 Ok, I fixed the obvious segfaults, and I am now able to pass all the tests
 on my debug build.  A new diff is attached.


Just to be clear, when you say all tests do you mean all of the list
tests or the full Python test suite?

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell

--- On Tue, 1/26/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote:


 Just to be clear, when you say all tests do you
 mean all of the list tests or the full
 Python test suite?
 

The full suite, minus some tests that skipped on my platform.  The last patch I 
posted was broken on test_sys.py, but I have since fixed that.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell

--- On Tue, 1/26/10, Antoine Pitrou solip...@pitrou.net wrote:

 From: Antoine Pitrou solip...@pitrou.net
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: python-dev@python.org
 Date: Tuesday, January 26, 2010, 12:50 AM
 Terry Reedy tjreedy at
 udel.edu writes:
  
  The idea that CPython should not be improved because
 it would spoil 
  programmers strikes me as a thin, even desparate
 objection. One could 
  say that same thing about the recent optimization of
 string += string so 
  that repeated concats are O(n) instead of O(n*n).
 
 Note that it's not even generally true. It only works if
 your platform's
 realloc() is sufficiently smart. That's why we once have
 had some
 Windows-specific performance problems in the py3k IO
 module.
 


One thing that has struck me in working on the O(1) patch is that almost all of 
my code would be unneeded if C's memory manager were just smart enough to allow 
you to release memory off the top of a chunk equally as efficiently as 
releasing it off the bottom of a chunk.  It's not like memory is asymmetrical; 
it's just the 1980's C interface for memory management that makes us think we 
can only extend and shrink in one direction.  The memory management paradigm 
that CPython builds on top of is at least a quarter century outdated, as far as 
I am concerned.  It might even go back to the 1970s.

It's not Python's fault that OSes or compilers have not modernized their 
approach to memory management, but I do think there will come a time, maybe 25 
years from now, when CPython says enough is enough, it's time to treat memory 
like it's extendible and shrinkable in two directions.




___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Glenn Linderman
On approximately 1/26/2010 1:27 AM, came the following characters from 
the keyboard of Stefan Behnel:

Michael Foord, 26.01.2010 01:14:
   

How great is the complication? Making list.pop(0) efficient sounds like
a worthy goal, particularly given that the reason you don't use it is
because you *know* it is inefficient
 

I agree. Given a programmer the insight that lists are implemented as
arrays in CPython is essentially saying don't use list.pop(0), it's
SLOW!. So they won't use it, and even avoid it for small lists where it
doesn't really matter. It basically stinks.

Making list.pop(0) fast removes another edge where programmers must
prematurely concentrate on implementation specifics of the interpreter they
use, rather than the functionality they want to implement. I think that's
an improvement to the simplicity that Python provides. It's basically
saying: care about performance when you have to, but otherwise believe in
the core developers to make your code fast enough in most common cases.

Stefan
   


Being relatively new to Python, and not yet having discovered deque, 
I've coded a list.pop(0) just the other day.  Guess I better go find 
that, and see if my usage should be optimized somehow.


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
The patch is now on Rietveld.

http://codereview.appspot.com/194083/show

I wsa getting HTTP errors for certain operations, like trying to publish 
comments, but I am able to see the patch there.

Here is the issue tracker link:

http://bugs.python.org/issue7784

Here is a rough draft of a proposal to include the patch, which includes 
discussion of possible objections:

http://wiki.python.org/moin/ProposalToSpeedUpListOperations


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Adam Olsen
On Mon, Jan 25, 2010 at 23:57, Terry Reedy tjre...@udel.edu wrote:
 On 1/25/2010 9:32 PM, Nick Coghlan wrote:

 However, as Cameron pointed out, the O() value for an operation is an
 important characteristic of containers, and having people get used to an
 O(1) list.pop(0) in CPython could create problems not only for other
 current Python implementations but also for future versions of CPython
 itself.

 The idea that CPython should not be improved because it would spoil
 programmers strikes me as a thin, even desparate objection. One could say
 that same thing about the recent optimization of string += string so that
 repeated concats are O(n) instead of O(n*n). What a trap if people move code
 to other implementations (or older Python) without that new feature.

This is a much better optimization than the string appending
optimization, as it is both portable and robust.

I find it shocking to change a semantic I've come to see as a core
part of the language, but I can't come up with a rational reason to
oppose it.  The approach is sane and the performance impact is
(presumably) negligible.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
--- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote:

 From: Guido van Rossum gu...@python.org
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org
 Date: Tuesday, January 26, 2010, 11:17 AM
 On Tue, Jan 26, 2010 at 10:01 AM,
 Steve Howell showel...@yahoo.com
 wrote:
  The patch is now on Rietveld.
 
  http://codereview.appspot.com/194083/show
 
  I wsa getting HTTP errors for certain operations, like
 trying to publish comments, but I am able to see the patch
 there.
 
 Hey Steve,
 
 You seem to be using Rietveld in a slightly odd fashion
 which prevents
 the side-by-side diff and commenting feature fro working.
 
 Try downloading Rietveld's upload.py script
 (http://codereview.appspot.com/static/upload.py) and
 uploading your
 patch again from your SVN client, using
 
   upload.py -i 194083
 
 (This will require the email and password you used to
 upload the issue
 in the first place.)


Ok, I just ran upload.py with my patch.  It shows up on Rietveld, but I am 
still getting 500 errors when I attempt to publish my own comments, so I am not 
sure what I'm doing wrong.

The version that I just uploaded was my working copy, which was off of the 3.x 
trunk.

The version that I attempted to upload earlier, via the URL feature, was 
slightly different--Florent had taken my changes, applied them to 2.x, cleaned 
up tabs/spaces, and posted his diff to the issue tracker:

http://bugs.python.org/issue7784

The content of both patches is the same otherwise.

I suspect the patch has lots of minor cleanup that needs to be done, but I 
should point out one major design decision that probably needs to be addressed 
first.  I chose to store the number of orphans vs. storing the pointer to the 
originally allocated memory.  There was no rhyme or reason for my decision, 
other than it was just how I initially got my head around the problem.  The 
tradeoffs should be pretty obvious--there are places where it's convenient to 
work with the count, but it comes at the cost of pointer arithmetic in other 
places.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Guido van Rossum
On Tue, Jan 26, 2010 at 11:36 AM, Steve Howell showel...@yahoo.com wrote:
 --- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote:

 From: Guido van Rossum gu...@python.org
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org
 Date: Tuesday, January 26, 2010, 11:17 AM
 On Tue, Jan 26, 2010 at 10:01 AM,
 Steve Howell showel...@yahoo.com
 wrote:
  The patch is now on Rietveld.
 
  http://codereview.appspot.com/194083/show
 
  I wsa getting HTTP errors for certain operations, like
 trying to publish comments, but I am able to see the patch
 there.

 Hey Steve,

 You seem to be using Rietveld in a slightly odd fashion
 which prevents
 the side-by-side diff and commenting feature fro working.

 Try downloading Rietveld's upload.py script
 (http://codereview.appspot.com/static/upload.py) and
 uploading your
 patch again from your SVN client, using

   upload.py -i 194083

 (This will require the email and password you used to
 upload the issue
 in the first place.)


 Ok, I just ran upload.py with my patch.  It shows up on Rietveld, but I am 
 still getting 500 errors when I attempt to publish my own comments, so I am 
 not sure what I'm doing wrong.

The first Patch Set is broken, and attempts to add comments to it will
fail. As the issue's owner you can delete the first patch set.

 The version that I just uploaded was my working copy, which was off of the 
 3.x trunk.

 The version that I attempted to upload earlier, via the URL feature, was 
 slightly different--Florent had taken my changes, applied them to 2.x, 
 cleaned up tabs/spaces, and posted his diff to the issue tracker:

 http://bugs.python.org/issue7784

So here's how you can fix it: go to Edit Issue and change the
Base: field to the following:

http://svn.python.org/view/*checkout*/python/trunk/

Note the addition of trunk/ at the end.

But it's best not to use the same Rietveld Issue for versions
pertaining to different branches.

 The content of both patches is the same otherwise.

 I suspect the patch has lots of minor cleanup that needs to be done, but I 
 should point out one major design decision that probably needs to be 
 addressed first.  I chose to store the number of orphans vs. storing the 
 pointer to the originally allocated memory.  There was no rhyme or reason for 
 my decision, other than it was just how I initially got my head around the 
 problem.  The tradeoffs should be pretty obvious--there are places where it's 
 convenient to work with the count, but it comes at the cost of pointer 
 arithmetic in other places.

I'd like you to revisit this design decision. It would make life
harder if we ever switched to a conservative garbage collector. I feel
much more comfortable having a hard pointer to the actual memory block
in my hands rather than having to compute where that block starts
using pointer arithmetic.

Note that I'm not endorsing your patch -- I expect that the number of
(real) use cases that aren't already served better or equally well by
deques is very small, and I fear that the real cost of adding
sizeof(ssize_t) bytes to every list object is real (unless there are 8
bytes unused given the block size rounding happening in obmalloc). I
don't believe a microbenchmark proves much of anything.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
From: Guido van Rossum gu...@python.org
 
 So here's how you can fix it: go to Edit Issue and change
 the
 Base: field to the following:
 
 http://svn.python.org/view/*checkout*/python/trunk/


I just deleted the issue altogether for now, since the preferred approach is to 
use a pointer, and that's gonna change the patch in lots of little places, so I 
think it would be wasteful to review it in its current form.
 
 
 I'd like you to revisit this design decision. It would make
 life
 harder if we ever switched to a conservative garbage
 collector. I feel
 much more comfortable having a hard pointer to the actual
 memory block
 in my hands rather than having to compute where that block
 starts
 using pointer arithmetic.

Ok, will do.
 
 Note that I'm not endorsing your patch -- I expect that the
 number of
 (real) use cases that aren't already served better or
 equally well by
 deques is very small, and I fear that the real cost of
 adding
 sizeof(ssize_t) bytes to every list object is real (unless
 there are 8
 bytes unused given the block size rounding happening in
 obmalloc). I
 don't believe a microbenchmark proves much of anything.
 

Fair enough.

It seems to me that the goal of keeping lists super-compact from a memory 
standpoint is contradicted by the code in list_resize that optimistically 
preallocates extra memory on appends.  I'm not arguing against the tradeoff 
there, as I think that optimization has a way more compelling CPU gain for most 
applications than the optimizations I'm proposing.  What I'm really saying is 
that there is some precedent to waste a little memory to make CPython lists run 
faster.  My gut says that the trend of the future is more CPU scarcity than 
memory scarcity, but that's just a WAG.

Just more food for thought--doing the memory management within listobject.c is 
really a workaround to the limitations of the underlying memory manager.  It 
seems conceptually possible to me to design a memory manager that allows you to 
shrink and extend memory chunks on both sides, without any extra memory 
overhead for bookkeeping.  I'm sure the devils in the details, of course.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Guido van Rossum
On Tue, Jan 26, 2010 at 12:46 PM, Steve Howell showel...@yahoo.com wrote:
 It seems to me that the goal of keeping lists super-compact from a memory 
 standpoint is contradicted by the code in list_resize that optimistically 
 preallocates extra memory on appends.

Ah, but that applies for *large* lists. Adding 8 bytes to each list
object applies to *all* lists. I betcha that many programs use many
tiny lists.

If I had a program that was growing and shrinking a list at both ends,
I'd consider a deque. If I had a program that was growing and
shrinking a list at the front, I'd consider maintaining the data
backwards. Extensive use of pop(0) (and insert(0, x) I suppose) is
only defensible if you also need to access the data by index and
negative indexes are not an option. Think about that.

 I'm not arguing against the tradeoff there, as I think that optimization has 
 a way more compelling CPU gain for most applications than the optimizations 
 I'm proposing.  What I'm really saying is that there is some precedent to 
 waste a little memory to make CPython lists run faster.  My gut says that the 
 trend of the future is more CPU scarcity than memory scarcity, but that's 
 just a WAG.

I'm no expert in this area, but considering the cost of cache misses,
I'm not so sure about that.

 Just more food for thought--doing the memory management within listobject.c 
 is really a workaround to the limitations of the underlying memory manager.  
 It seems conceptually possible to me to design a memory manager that allows 
 you to shrink and extend memory chunks on both sides, without any extra 
 memory overhead for bookkeeping.  I'm sure the devils in the details, of 
 course.

Maybe, but that's a project of a different magnitude altogether. For
better or for worse, the assumption is widely made that blocks are
represented by a pointer to the start plus a length. And considering
zero to have special properties has pretty good mathematical backup of
you ask me. :-)

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Nick Coghlan
Adam Olsen wrote:
 This is a much better optimization than the string appending
 optimization, as it is both portable and robust.
 
 I find it shocking to change a semantic I've come to see as a core
 part of the language, but I can't come up with a rational reason to
 oppose it.  The approach is sane and the performance impact is
 (presumably) negligible.

Couldn't have said it better myself...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Terry Reedy

On 1/26/2010 7:51 AM, Steve Howell wrote:

From: Nick Coghlanncogh...@gmail.com

Steve, I suggest creating a new issue at bugs.python.org to
track your
proposal rather than sending diffs to the list.


I was about to suggest the same thing.

Even if your final patch is not currently accepted, it will remain on 
the tracker indefinitely, where interested parties can potentially use 
it in private customized versions that they compile themselves.


http://bugs.python.org/issue7784

Technical discussion of your patch should mostly migrate to the tracker, 
as with other issues. The issue will appear on both the new tracker 
issues list and the weekly (Friday) tracker summary, so interested 
persons can comment or just insert themselves on the nosy list.


Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Steve Howell
 From: Guido van Rossum gu...@python.org
 Steve Howell showel...@yahoo.com
 wrote:
  It seems to me that the goal of keeping lists
  super-compact from a memory standpoint is contradicted by
  the code in list_resize that optimistically preallocates
  extra memory on appends.
 
 Ah, but that applies for *large* lists. Adding 8 bytes to
 each list
 object applies to *all* lists. I betcha that many programs
 use many
 tiny lists.


Even most tiny-ish lists are wasting memory, though, according to this sequence:

4, 8, 16, 25, ...


 
 If I had a program that was growing and shrinking a list at
 both ends,
 I'd consider a deque. 

I'd use a deque on current Python, but I'd use list with the proposed patch. :)

 If I had a program that was growing
 and
 shrinking a list at the front, I'd consider maintaining the
 data
 backwards. Extensive use of pop(0) (and insert(0, x) I
 suppose) is
 only defensible if you also need to access the data by
 index and
 negative indexes are not an option. Think about that.
 

Maintaining the data backwards creates complications if you are popping off 
small chunks of the list and then subsequently operating on them.  A parser 
would be a good example.  If you reverse the original list, then grab the first 
four lines, for example, and then want to concatenate the first four lines in 
the original order, you will need to reverse them again.

Having to reverse a list to change its performance just seems backward to me.  

  [...] My
  gut says that the trend of the future is more CPU scarcity
  than memory scarcity, but that's just a WAG.
 
 I'm no expert in this area, but considering the cost of
 cache misses,
 I'm not so sure about that.

Ok, good point.
 
  Just more food for thought--doing the memory
 management within listobject.c is really a workaround to the
 limitations of the underlying memory manager.  It seems
 conceptually possible to me to design a memory manager that
 allows you to shrink and extend memory chunks on both sides,
 without any extra memory overhead for bookkeeping.  I'm
 sure the devils in the details, of course.

I asked Doug Lea about this, and he just replied to me that the devil is indeed 
in the details--it would be hard to make memory managers do what I'm proposing 
in an efficient way.


 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Daniel Stutzbach
On Tue, Jan 26, 2010 at 3:32 PM, Steve Howell showel...@yahoo.com wrote:

 Even most tiny-ish lists are wasting memory, though, according to this
 sequence:

 4, 8, 16, 25, ...


Several operations will allocate a list of exactly the right size, wasting
no memory.  In particular, a fixed-size list will always be exactly the
right size.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Raymond Hettinger
 
 Ah, but that applies for *large* lists. Adding 8 bytes to
 each list
 object applies to *all* lists. I betcha that many programs
 use many
 tiny lists.
 
 
 Even most tiny-ish lists are wasting memory, though, according to this 
 sequence:
 
 4, 8, 16, 25, ...
 


That is only is they are being grown with list.append().
Otherwise, list sizes are exact.  For example, [0,1,2]
uses space for just three entries and [] for none.

I get the impression that 1) you've already made up your
mind and are ignoring input from the other developers, 
2) that you're ignoring the python-dev discussions of long ago 
where this very idea was rejected and deques came in to
being instead, 3) over-estimating the prevalence of use
cases for pop(0), and 4) under-estimating the impact on
small lists, 5) under-estimating the impact on psyco or
other implementations of Python, 6) under-estimating the
impact on third-party extensions and debugging tools.

Deques already provide a solution to the FIFO problem
and they do so without huge wastes of memory or calls
to memmove().  They handle both insertion and deletion
from the front and back.  In comparison, the proposed
changes to lists seem like a complete hack.  Just because
it can be done, doesn't mean that it should.

ISTM that you're attacking an already solved problem
and that you're playing fast and loose with a core data
structure.


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread M.-A. Lemburg
Benjamin Peterson wrote:
 2010/1/25 Steve Howell showel...@yahoo.com:
 I am interested in creating a patch to make deleting elements from the front
 of Python list work in O(1) time by advancing the ob_item pointer.
 
 How about just using a deque?

... or a stack:

http://www.egenix.com/products/python/mxBase/mxStack/

... or a queue:

http://www.egenix.com/products/python/mxBase/mxQueue/

Specialized implementations usually give the best performance -
of course, it all depends on what you're trying to achieve.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 26 2010)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Cameron Simpson
On 26Jan2010 01:57, Terry Reedy tjre...@udel.edu wrote:
| On 1/25/2010 9:32 PM, Nick Coghlan wrote:
| However, as Cameron pointed out, the O() value for an operation is an
| important characteristic of containers, and having people get used to an
| O(1) list.pop(0) in CPython could create problems not only for other
| current Python implementations but also for future versions of CPython
| itself.
| 
| The idea that CPython should not be improved because it would spoil
| programmers strikes me as a thin, even desparate objection.

Hey, I even used the word thin to describe my concern!

My point was that I look on python builtins like list and dict as highly
optimised, highly efficient facilities. That means that I expect a list
to be very very much like a linear array as one would find in C-like
languages, with realloc() managment behind the scenes to handle the
resizing requirements on append/extend.

It also means that the O() cost of operations should be pretty obvious.
pop(0) obviously involves shuffling everything down one slot.

This means that the python programmer should expect that the common
array type operations such as __getitem__ should be as blazingly fast
as possible, and to expect that barring unusual programmer requirements,
if I'm thinking about arrays I should almost always reach for a python
list.

The proposed change to make pop(0) O(1) involves adding a base offset
to the internal list implementation. Doesn't that incur a (small)
overhead to _every_ list operation? Doesn't that weaken list as the
as efficient as possible implementation of choice for array-like
things?

| One
| could say that same thing about the recent optimization of string +=
| string so that repeated concats are O(n) instead of O(n*n). What a
| trap if people move code to other implementations (or older Python)
| without that new feature.
| 
| Of course, the whole purpose of adding a jit to CPython would be to
| spoil us.
| 
| It is a fact already that optimizing CPython code is specific to a
| particular interpreter, system, and workload.

Yes, optimisation is nice. Higher seed is nice. But there are
optimisations that simple fix inefficiencies or provide a special case
for a common operation, and there are those with side effects elsewhere
in the implementation (i.e. the base offset this pop(0) opt would incur,
which adds a (small) cost to everything).

The other aspect, mentioned elsewhere in this thread, is that
programmers should know the O() cost of these operations.

Cheers,
-- 
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

I am a kind of burr; I shall stick.
- William Shakespeare, _Measure for Measure_
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
I am interested in creating a patch to make deleting elements from the front of 
Python list work in O(1) time by advancing the ob_item pointer.

The patch will probably be rejected, but I would like to try it anyway as an 
exercise in digging into the CPython source, and working through the process.  
My goal is to accompany the proposed patch with a PEP (which I also expect to 
be initially rejected, but which will hopefully be a useful contribution in 
terms of documenting the decision.)

The reason I am posting here is that there appears to be some history behind 
similar patches that I am not aware of, so if anybody can refer me to earlier 
patches, PEPS, discussion threads, etc., I would be grateful.  

I am aware of PEP 3128, which has similar goals to what I'm trying to achieve, 
but there are also some differences.

The blist implementation described in PEP 3128 achieves the goal of reducing 
time complexity for some operations, but necessarily at the expense of other 
operations, notably list access.

The patch that I am considering would not affect time complexity for any other 
operations, nor memory complexity, but it would, of course, have marginal costs 
in certain operations, notably any operation that eventually calls 
list_resize().  I am confident that the patch would not impact performance of 
list accesses at all.  The memory cost for the list itself would be an 
additional pointer or integer, which I think should be considered negligible 
compared to the cost of the list itself [O(N)] and the elements in the list 
[O(N)].  I haven't completely worked out the best strategy to eventually 
release the memory taken up by the pointers of the unreleased elements, but the 
worst case scenario is that the unused memory only gets wasted until the time 
that the list itself gets garbage collected.  I think I can do better than 
that, at some cost of complicating list_resize.  From a memory standpoint, the 
benefits of encouraging deleting items from the front
 of the list should outweigh any disadvantages with respect to lazily releasing 
memory from the pointers at the front of the list, since in deleting elements, 
you allow the elements themselves to be garbage collected earlier, as well as 
objects that might be referenced by those elements.

My goal would be to target the patch at 3.x, and if I was lucky enough to get 
it accepted, I think it could eventually be backported to 2.x.  The proposal 
does not affect the definition of the language itself, of course; it merely 
attempts to improve the performance of the CPython implementation.

The instructions that I have found for setting up your development environment 
seemed to be targeted at 2.x trunk, which is fine with me.  I will attempt the 
patch off  the 2.x trunk to get an initial feel for the issues involved, unless 
somebody counsels me to work off 3.x right from the start.

http://www.python.org/dev/setup/

I have not been able to locate the source code for 3.x.  Is the implementation 
of list more or less the same there?

There is a long thread entitled list.pop(0) vs. collections.dequeue on 
comp.lang.python that addresses alternatives to list.pop(0), but none of them 
really fit my use case.

Here is a sketch of the PEP that I would propose:


Proposal: 


Improve list's implementation so that deleting elements from 
the front of the list does not require an O(N) memmove operation. 


Rationale: 


Some Python programs that process lists have multiple 
methods that consume the first element of the list and pop it off. 
The pattern comes up with parsers in particular, but there are other 
examples.  It is possible now, of course, to use a data structure in 
Python that has O(1) for deleting off the top of the list, but none of 
the alternatives fully replicate the benefits of list itself. 


Specification: 


Improving CPython's performance does not affect the 
language itself, so there are no bikeshed arguments to be had with 
respect to syntax, etc.  Any patch would, of course, affect the 
performance of nearly every Python program in existence, so any patch 
would have to, at a bare minimum: 


  1) Not increase the time or memory complexity of any other list 
operation. 
  2) Not affect list access at all. 
  3) Minimally affect list operations that mutate the list. 
  4) Be reasonably simple within CPython itself. 
  5) Not be grossly wasteful of memory. 


Backwards Compatibility: 


See above.  An implementation of this PEP would not change the 
definition of the language in any way, but it would have to minimally 
impact the performance of lists for the normal use cases. 


Implementation: 


There are two ways to make deleting the first item of the list run 
more efficiently. 


The most ambitious proposal is to fix the memory manager itself to 
allow the release of memory from the start of the chunk.  The 
advantage of this proposal is that it would simplify the changes to 
list itself, and possibly have collateral 

Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Daniel Stutzbach
On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote:

 I haven't completely worked out the best strategy to eventually release the
 memory taken up by the pointers of the unreleased elements, but the worst
 case scenario is that the unused memory only gets wasted until the time that
 the list itself gets garbage collected.


FWIW, for a long-running FIFO queue, it's critical to release some of the
memory along the way, otherwise the amount of wasted memory is unbounded.

Good luck :)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Raymond Hettinger

On Jan 25, 2010, at 11:22 AM, Steve Howell wrote:

 I am interested in creating a patch to make deleting elements from the front 
 of Python list work in O(1) time by advancing the ob_item pointer.

+1 on doing whatever experiments you feel like doing
-1 on putting something like this in the core

1) To many things in the Python world rely on the current implementation of 
lists.  It's not worth breaking third-party extensions, tools like psyco, work 
on unladen swallow, and other implementations of Python such as PyPy and Jython.

2). The use of lists pervades the language and it doesn't make sense to have 
the language as a whole pay a price (in terms of speed and space) for every 
list that gets created.  The corner case isn't worth it.

3).  We already got one.  The collections.deque() class was introduced 
specifically to handle inserting and popping from the front of a list 
efficiently.

4).  In the comp.lang.python thread on this subject, you've gotten nearly zero 
support for your position and have managed to offend many of the developers.


Raymond___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Benjamin Peterson
2010/1/25 Steve Howell showel...@yahoo.com:
 I am interested in creating a patch to make deleting elements from the front
 of Python list work in O(1) time by advancing the ob_item pointer.

How about just using a deque?



-- 
Regards,
Benjamin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote:

 FWIW, for a long-running FIFO queue, it's critical to
 release some of the memory along the way, otherwise the
 amount of wasted memory is unbounded.
 

Somebody implementing a long-running FIFO queue should actually be using deque 
instead of list, but I agree with your greater point that waiting until the 
list gets garbage-collected to release memory would probably be unacceptable, 
so I'll see what I can do.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
From: Raymond Hettinger raymond.hettin...@gmail.com
 On Jan 25, 2010, at 11:22 AM, Steve Howell
 wrote:
 I
 am interested in creating a patch to make deleting elements
 from the front of Python list work in O(1) time by advancing
 the ob_item pointer.

 +1 on doing whatever experiments you feel like
 doing-1 on putting something like this in the
 core
 1) To many things in the Python world rely on
 the current implementation of lists.  It's not
 worth breaking third-party extensions, tools like psyco,
 work on unladen swallow, and other implementations of Python
 such as PyPy and Jython.

I don't understand how changing the implementation of CPython would impact PyPy 
and Jython, unless you are just referring to the fact that CPython is treated 
as a reference implementation, so its simplicity is a virtue for other ports.  
Am I missing something else?


 2). The use of lists pervades the language and
 it doesn't make sense to have the language as a whole
 pay a price (in terms of speed and space) for every list
 that gets created.  The corner case isn't worth
 it.

I understand the tradeoff.

 3).  We already got one.  The
 collections.deque() class was introduced specifically to
 handle inserting and popping from the front of a list
 efficiently.

I understand that deque solves some problems that list does not.  Obviously, it 
allows you to delete elements off the front in O(1) time, but it also has other 
advantages, such as allowing you to rotate elements efficiently.  It's 
designed, I am guessing, for FIFO queues, and it's a perfectly good data 
structure, just not one that is well suited for all use cases.

 4).  In the comp.lang.python thread on this
 subject, you've gotten nearly zero support for your
 position and have managed to offend many of the
 developers.

Fair enough.  I think the idea is at least worthy of a PEP that puts forward 
the strongest case possible.

Terry Jan Reedy wrote:

'''
I am not opposed to a possible change, just hasty, ill-informed
criticism. If there is not a PEP on this issue, it would be good to have
one that recorded the proposal and the pros and cons, regardless of the
outcome, so there would be something to refer people to. If that had
been already done, it would have shortened this thread considerably.
'''

Do you agree that it is at least worthwhile to write a PEP here?  It could be 
fairly short and quickly rejected, and down the road, if more people get behind 
it, it could always be strengthened and revisited.

There seems to be at least some precedence for PEPs that only pertain to 
internal implementation details, such as this one:

http://www.python.org/dev/peps/pep-0267/

(Raymond, apologies for my double reply...the first one was meant to go to the 
list, not directly to you.)


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Benjamin Peterson benja...@python.org wrote:

 2010/1/25 Steve Howell showel...@yahoo.com:
  I am interested in creating a patch to make deleting
 elements from the front
  of Python list work in O(1) time by advancing the
 ob_item pointer.

 How about just using a deque?

Deque does not support all the operations that list does.  It is also roughly 
twice as slow for accessing elements (I've measured it).

 lst = ['foo', 'bar', 'baz']
 lst[1:]
['bar', 'baz']
 lst.insert(0, 'spam')
 lst
['spam', 'foo', 'bar', 'baz']


 d = deque(lst)
 d[2:]
Traceback (most recent call last):
  File stdin, line 1, in module
TypeError: sequence index must be integer, not 'slice'
 d.insert(0, 'eggs')
Traceback (most recent call last):
  File stdin, line 1, in module
AttributeError: 'collections.deque' object has no attribute 'insert'


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Raymond Hettinger

On Jan 25, 2010, at 12:36 PM, Steve Howell wrote:

 
 Deque does not support all the operations that list does.  It is also roughly 
 twice as slow for accessing elements (I've measured it).


ISTM that apps that want to insert or pop from the front of list are also apps 
that don't care about accessing arbitrary elements in the middle using the 
position index.  When lists are growing or shrinking from the front, the 
meaning of the i-th element changes.   So, it doesn't make sense for an 
application to track indices of objects in such a list.

   i = s.find('abc')
   s.pop(0)
   print s[i]# i no longer points at 'abc'


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
 From: Raymond Hettinger raymond.hettin...@gmail.com

 On Jan 25, 2010, at 12:36 PM, Steve Howell wrote:

 
  Deque does not support all the operations that list
 does.  It is also roughly twice as slow for accessing
 elements (I've measured it).


 ISTM that apps that want to insert or pop from the front of
 list are also apps that don't care about accessing arbitrary
 elements in the middle using the position index.  When
 lists are growing or shrinking from the front, the meaning
 of the i-th element changes.   So, it doesn't
 make sense for an application to track indices of objects in
 such a list.

i = s.find('abc')
s.pop(0)
print s[i]# i no longer
 points at 'abc'


I am not going to directly address your point, but I'd like to give a examples 
of code that uses pop(0) from the standard library.

If you look at the code for multiprocessing/connection.py, you will see that 
PipeListener creates _handle_queue as an ordinary Python list, and in line 317 
it uses pop(0) to pop the first handle off the top of the queue.

Why does that code not use a deque?  In hindsight, it probably should.  But to 
make the change now, it's not a simple matter of fixing just PipeListener, 
because PipeListener passes off _handle_queue to Finalize, which also expects a 
list.

In order to understand why Finalize expects a list, you need to look at how it 
uses args, and here is one example usage:

res = self._callback(*self._args, **self._kwargs)

Ok, so now you need to know what self._callback is doing, so now you have to 
trace through all callers of Finalize are passing in for their args.

So what seems like a trivial matter--switching over a list to a deque--actually 
requires a lot of thinking.

It turns out that the callback for PipeListener just iterates through the 
remaining handles and closes them.  So a deque would not break that code. 

If you look at difflib.py, it also does pop(0) in a loop.  Why doesn't it use a 
deque?  Simplicity, maybe?

codecs.py also deletes from the top of the list:

del self.linebuffer[0]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Benjamin Peterson
2010/1/25 Steve Howell showel...@yahoo.com:
 From: Raymond Hettinger raymond.hettin...@gmail.com

 On Jan 25, 2010, at 12:36 PM, Steve Howell wrote:

 
  Deque does not support all the operations that list
 does.  It is also roughly twice as slow for accessing
 elements (I've measured it).


 ISTM that apps that want to insert or pop from the front of
 list are also apps that don't care about accessing arbitrary
 elements in the middle using the position index.  When
 lists are growing or shrinking from the front, the meaning
 of the i-th element changes.   So, it doesn't
 make sense for an application to track indices of objects in
 such a list.

    i = s.find('abc')
    s.pop(0)
    print s[i]    # i no longer
 points at 'abc'


 I am not going to directly address your point, but I'd like to give a 
 examples of code that uses pop(0) from the standard library.

 If you look at the code for multiprocessing/connection.py, you will see that 
 PipeListener creates _handle_queue as an ordinary Python list, and in line 
 317 it uses pop(0) to pop the first handle off the top of the queue.

 Why does that code not use a deque?  In hindsight, it probably should.  But 
 to make the change now, it's not a simple matter of fixing just PipeListener, 
 because PipeListener passes off _handle_queue to Finalize, which also expects 
 a list.

 In order to understand why Finalize expects a list, you need to look at how 
 it uses args, and here is one example usage:

 res = self._callback(*self._args, **self._kwargs)

 Ok, so now you need to know what self._callback is doing, so now you have to 
 trace through all callers of Finalize are passing in for their args.

 So what seems like a trivial matter--switching over a list to a 
 deque--actually requires a lot of thinking.

 It turns out that the callback for PipeListener just iterates through the 
 remaining handles and closes them.  So a deque would not break that code.

 If you look at difflib.py, it also does pop(0) in a loop.  Why doesn't it use 
 a deque?  Simplicity, maybe?

 codecs.py also deletes from the top of the list:

 del self.linebuffer[0]

Yes, but in either of these cases is there an excellent performance
improvement to be gained and is it worth the complexity of your
optimization? I think not.


-- 
Regards,
Benjamin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Mike Klaas
On Mon, Jan 25, 2010 at 11:32 AM, Daniel Stutzbach
dan...@stutzbachenterprises.com wrote:
 On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote:

 I haven't completely worked out the best strategy to eventually release
 the memory taken up by the pointers of the unreleased elements, but the
 worst case scenario is that the unused memory only gets wasted until the
 time that the list itself gets garbage collected.

 FWIW, for a long-running FIFO queue, it's critical to release some of the
 memory along the way, otherwise the amount of wasted memory is unbounded.

 Good luck :)

It seems to me that the best way to do this is invert .append() logic:
leave at most X amount of wasted space at the beginning of the list,
where X is a constant fraction of the list size.

Whether it is worth adding a extra pointer to the data stored by a
list is another story.

-Mike
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Mike Klaas mike.kl...@gmail.com wrote:

 From: Mike Klaas mike.kl...@gmail.com
  On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com
 wrote:
 
  I haven't completely worked out the best strategy
 to eventually release
  the memory taken up by the pointers of the
 unreleased elements, but the
  worst case scenario is that the unused memory only
 gets wasted until the
  time that the list itself gets garbage collected.
 
  FWIW, for a long-running FIFO queue, it's critical to
 release some of the
  memory along the way, otherwise the amount of wasted
 memory is unbounded.
 
  Good luck :)
 
 It seems to me that the best way to do this is invert
 .append() logic:
 leave at most X amount of wasted space at the beginning of
 the list,
 where X is a constant fraction of the list size.

That's roughly what I intend to do.  The problems are not entirely symmetric.  
For appends, the wasted space exists in the sense that CPython optimistically 
get extra chunks of memory for future appends, to save CPU at the possible cost 
of needlessly allocating memory.

For pops, the bargain would be that you optimistically defer releasing memory 
to save CPU cycles in the hope that memory is not scarce.  Of course, if you 
have just popped an element off the list, then you have just made memory less 
scarce by virtue of removing the list elements themselves.

 Whether it is worth adding a extra pointer to the data
 stored by a
 list is another story.

That is definitely a point of contention.  It would certainly bloat a program 
that had millions of empty lists.  I think in most real-world programs, though, 
the amount of memory taken by PyListObjects will always be greatly exceeded by 
the amount of memory used by the list elements, or even just the pointers to 
the list elements.  It's the difference between the number of elements in a 
list, O(N), and the number of structures that define a list's state, O(1).

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Christian Heimes
Benjamin Peterson wrote:
 Yes, but in either of these cases is there an excellent performance
 improvement to be gained and is it worth the complexity of your
 optimization? I think not.

Me, too.
I already tried to explain Steve that I have used list.pop(0) in very
few cases during my seven years as a professional Python developer.
Since I knew that popping from the beginning of a list is slower than
popping from the end or just leaving the list unmodified I found ways to
alter my algorithms. The few cases left were either not performance
critical or used dequeue instead.

I vote -0.5 on the change unless Guido, Tim or Raymond think that the
size and complication impact is worth the hassle.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Michael Foord

On 26/01/2010 00:12, Christian Heimes wrote:

Benjamin Peterson wrote:
   

Yes, but in either of these cases is there an excellent performance
improvement to be gained and is it worth the complexity of your
optimization? I think not.
 

Me, too.
I already tried to explain Steve that I have used list.pop(0) in very
few cases during my seven years as a professional Python developer.
Since I knew that popping from the beginning of a list is slower than
popping from the end or just leaving the list unmodified I found ways to
alter my algorithms. The few cases left were either not performance
critical or used dequeue instead.

I vote -0.5 on the change unless Guido, Tim or Raymond think that the
size and complication impact is worth the hassle.
   


How great is the complication? Making list.pop(0) efficient sounds like 
a worthy goal, particularly given that the reason you don't use it is 
because you *know* it is inefficient (so the fact that you don't use it 
isn't evidence that it isn't wanted - merely evidence that you had to 
work around the known inefficiency).


Michael


Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
   



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Benjamin Peterson benja...@python.org wrote:

 From: Benjamin Peterson benja...@python.org
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Steve Howell showel...@yahoo.com
 Cc: python-dev@python.org
 Date: Monday, January 25, 2010, 3:15 PM
 2010/1/25 Steve Howell showel...@yahoo.com:
  From: Raymond Hettinger raymond.hettin...@gmail.com
 
  On Jan 25, 2010, at 12:36 PM, Steve Howell wrote:
 
  
   Deque does not support all the operations
 that list
  does.  It is also roughly twice as slow for
 accessing
  elements (I've measured it).
 
 
  ISTM that apps that want to insert or pop from the
 front of
  list are also apps that don't care about accessing
 arbitrary
  elements in the middle using the position index.
  When
  lists are growing or shrinking from the front, the
 meaning
  of the i-th element changes.   So, it doesn't
  make sense for an application to track indices of
 objects in
  such a list.
 
     i = s.find('abc')
     s.pop(0)
     print s[i]    # i no longer
  points at 'abc'
 
 
  I am not going to directly address your point, but I'd
 like to give a examples of code that uses pop(0) from the
 standard library.
 
  If you look at the code for
 multiprocessing/connection.py, you will see that
 PipeListener creates _handle_queue as an ordinary Python
 list, and in line 317 it uses pop(0) to pop the first handle
 off the top of the queue.
 
  Why does that code not use a deque?  In hindsight, it
 probably should.  But to make the change now, it's not a
 simple matter of fixing just PipeListener, because
 PipeListener passes off _handle_queue to Finalize, which
 also expects a list.
 
  In order to understand why Finalize expects a list,
 you need to look at how it uses args, and here is one
 example usage:
 
  res = self._callback(*self._args, **self._kwargs)
 
  Ok, so now you need to know what self._callback is
 doing, so now you have to trace through all callers of
 Finalize are passing in for their args.
 
  So what seems like a trivial matter--switching over a
 list to a deque--actually requires a lot of thinking.
 
  It turns out that the callback for PipeListener just
 iterates through the remaining handles and closes them.  So
 a deque would not break that code.
 
  If you look at difflib.py, it also does pop(0) in a
 loop.  Why doesn't it use a deque?  Simplicity, maybe?
 
  codecs.py also deletes from the top of the list:
 
  del self.linebuffer[0]
 
 Yes, but in either of these cases is there an excellent
 performance
 improvement to be gained and is it worth the complexity of
 your
 optimization? I think not.
 

I am starting to think that the optimization would be drowned out by the cost 
of processing each line, unless you had some combination of the following:
 
 1) a pretty large list (plausible)
 2) a very inexpensive operation that you were applying to each line (rare)
 3) a really slow memmove implementation (extremely doubtful)

The complexity of the optimization does not phase me for some reason.  If 
ruthless simplicity were the only goal, then I'd also simplify/remove some of 
this code:

/* Bypass realloc() when a previous overallocation is large enough
   to accommodate the newsize.  If the newsize falls lower than half
   the allocated size, then proceed with the realloc() to shrink the list.
*/
if (allocated = newsize  newsize = (allocated  1)) {
assert(self-ob_item != NULL || newsize == 0);
Py_SIZE(self) = newsize;
return 0;
}

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize  3) + (newsize  9 ? 3 : 6);

In my mind, though, the complexity within CPython does not have to leak up to 
the Python level.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Christian Heimes
Michael Foord wrote:
 How great is the complication? Making list.pop(0) efficient sounds like 
 a worthy goal, particularly given that the reason you don't use it is 
 because you *know* it is inefficient (so the fact that you don't use it 
 isn't evidence that it isn't wanted - merely evidence that you had to 
 work around the known inefficiency).

The implementation must be changed in at least four places:

* The PyListObject struct gets an additional pointer that stores a
reference to the head. I would keep the head (element 0) of the list in
**ob_item and the reference to the malloc()ed array in a new pointer
*ob_allocated.

* PyList_New() stores the pointer to the allocated memory in
op-ob_allocated and sets op-ob_item = op-ob_allocated

* listpop() moves the op-ob_item pointer by one  for the special case
of pop(0)

* list_resize() should occasionally compact the free space before the
head with memcpy() if it gets too large.

listinsert() could be optimized for 0 if the list has some free space in
front of the header, too.

I favor this approach over an integer offset because doesn't change the
semantic of ob_item.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Michael Foord

On 26/01/2010 00:28, Christian Heimes wrote:

Michael Foord wrote:
   

How great is the complication? Making list.pop(0) efficient sounds like
a worthy goal, particularly given that the reason you don't use it is
because you *know* it is inefficient (so the fact that you don't use it
isn't evidence that it isn't wanted - merely evidence that you had to
work around the known inefficiency).
 

The implementation must be changed in at least four places:

* The PyListObject struct gets an additional pointer that stores a
reference to the head. I would keep the head (element 0) of the list in
**ob_item and the reference to the malloc()ed array in a new pointer
*ob_allocated.

* PyList_New() stores the pointer to the allocated memory in
op-ob_allocated and sets op-ob_item = op-ob_allocated

* listpop() moves the op-ob_item pointer by one  for the special case
of pop(0)

* list_resize() should occasionally compact the free space before the
head with memcpy() if it gets too large.

listinsert() could be optimized for 0 if the list has some free space in
front of the header, too.

I favor this approach over an integer offset because doesn't change the
semantic of ob_item.

Christian
   
Well, on the face of it this doesn't sound like a huge increase in 
complexity. Not that I'm qualified to judge.


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Alex Gaynor
On Mon, Jan 25, 2010 at 7:32 PM, Michael Foord
fuzzy...@voidspace.org.uk wrote:
 On 26/01/2010 00:28, Christian Heimes wrote:

 Michael Foord wrote:


 How great is the complication? Making list.pop(0) efficient sounds like
 a worthy goal, particularly given that the reason you don't use it is
 because you *know* it is inefficient (so the fact that you don't use it
 isn't evidence that it isn't wanted - merely evidence that you had to
 work around the known inefficiency).


 The implementation must be changed in at least four places:

 * The PyListObject struct gets an additional pointer that stores a
 reference to the head. I would keep the head (element 0) of the list in
 **ob_item and the reference to the malloc()ed array in a new pointer
 *ob_allocated.

 * PyList_New() stores the pointer to the allocated memory in
 op-ob_allocated and sets op-ob_item = op-ob_allocated

 * listpop() moves the op-ob_item pointer by one  for the special case
 of pop(0)

 * list_resize() should occasionally compact the free space before the
 head with memcpy() if it gets too large.

 listinsert() could be optimized for 0 if the list has some free space in
 front of the header, too.

 I favor this approach over an integer offset because doesn't change the
 semantic of ob_item.

 Christian


 Well, on the face of it this doesn't sound like a huge increase in
 complexity. Not that I'm qualified to judge.

 Michael

 --
 http://www.ironpythoninaction.com/
 http://www.voidspace.org.uk/blog

 READ CAREFULLY. By accepting and reading this email you agree, on behalf of
 your employer, to release me from all obligations and waivers arising from
 any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
 shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
 non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have
 entered into with your employer, its partners, licensors, agents and
 assigns, in perpetuity, without prejudice to my ongoing rights and
 privileges. You further represent that you have the authority to release me
 from any BOGUS AGREEMENTS on behalf of your employer.


 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 http://mail.python.org/mailman/options/python-dev/alex.gaynor%40gmail.com


Does anyone know if any other language's automatic array (or whatever
they call it) special case the pop(0) case like this?

Alex

-- 
I disapprove of what you say, but I will defend to the death your
right to say it. -- Voltaire
The people's good is the highest law. -- Cicero
Code can always be simpler than you think, but never as simple as you
want -- Me
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Christian Heimes li...@cheimes.de wrote:
 From: Christian Heimes li...@cheimes.de
 Michael Foord wrote:
  How great is the complication? Making list.pop(0)
 efficient sounds like 
  a worthy goal, particularly given that the reason you
 don't use it is 
  because you *know* it is inefficient (so the fact that
 you don't use it 
  isn't evidence that it isn't wanted - merely evidence
 that you had to 
  work around the known inefficiency).
 
 The implementation must be changed in at least four
 places:
 
 * The PyListObject struct gets an additional pointer that
 stores a
 reference to the head. I would keep the head (element 0) of
 the list in
 **ob_item and the reference to the malloc()ed array in a
 new pointer
 *ob_allocated.
 
 * PyList_New() stores the pointer to the allocated memory
 in
 op-ob_allocated and sets op-ob_item =
 op-ob_allocated
 
 * listpop() moves the op-ob_item pointer by one 
 for the special case
 of pop(0)
 
 * list_resize() should occasionally compact the free space
 before the
 head with memcpy() if it gets too large.
 
 listinsert() could be optimized for 0 if the list has some
 free space in
 front of the header, too.
 
 I favor this approach over an integer offset because
 doesn't change the
 semantic of ob_item.
 

The approach that Christian outlines is exactly what I intend to accomplish, 
even if the patch does get permanently or temporarily rejected.

I am pretty confident about what needs to be done within list_ass_slice, 
including the listinsert() optimization.  I also see where I need to add the 
new pointer (ob_allocated seems like a good name to me) within the PyListObject 
struct.

Still wrestling with the other details, though.  My C is pretty rusty, and of 
course I have the extreme versatility of Python to blame for that! :)




___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Cameron Simpson
On 25Jan2010 12:34, Steve Howell showel...@yahoo.com wrote:
| From: Raymond Hettinger raymond.hettin...@gmail.com
|  1) To many things in the Python world rely on
|  the current implementation of lists.  It's not
|  worth breaking third-party extensions, tools like psyco,
|  work on unladen swallow, and other implementations of Python
|  such as PyPy and Jython.
| 
| I don't understand how changing the implementation of CPython would
| impact PyPy and Jython, unless you are just referring to the fact that
| CPython is treated as a reference implementation, so its simplicity is
| a virtue for other ports.  Am I missing something else?

I can think of something: lists traditionally have O(n) pop(0) performance
because they're normally quite simple layers on top of an array.

(And likewise for pop(1), which your approach won't help; I know you
could measure the list and decide pop(1) is a small copy of the left of
the list along with a move of the base offset, but that degrades as you
move from 0 and 1 to larger indices).

Supposing pop(0) becomes cheap in CPython and this becomes well known
(or worse, documented:-)
Someone depending on this now has code that is fundamentally inefficient
on other Pythons.

I know this is a slightly thin objection, since your change can probably
be taken to the other implementations.
-- 
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

But pessimism IS realism!   - D.L.Bahr
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Nick Coghlan
Michael Foord wrote:
 On 26/01/2010 00:28, Christian Heimes wrote:
 I favor this approach over an integer offset because doesn't change the
 semantic of ob_item.

 Well, on the face of it this doesn't sound like a huge increase in
 complexity. Not that I'm qualified to judge.

Christian's approach is a good one for minimising the semantic changes,
and compared to an offset based approach actually has a decent chance of
working without breaking too much C code (you'd have to change the way
sys.getsizeof() worked for lists, but I can't think of anything else off
the top of my head that would definitely break).

The potential of resizing and hence relocation of the storage buffer
means it was already unsafe to cache ob_item when changing the size of
the list, so code should generally be unaware that ob_item can now
change even when the buffer isn't reallocated.

However, as Cameron pointed out, the O() value for an operation is an
important characteristic of containers, and having people get used to an
O(1) list.pop(0) in CPython could create problems not only for other
current Python implementations but also for future versions of CPython
itself.

This idea changes list from a simple concept (ob_item points to the
beginning of an allocated array which may change length) to a more
complicated deque-like one (ob_item points somewhere near the beginning
of an allocated array which may change length).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Nick Coghlan ncogh...@gmail.com wrote:

 From: Nick Coghlan ncogh...@gmail.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Michael Foord fuzzy...@voidspace.org.uk
 Cc: Christian Heimes li...@cheimes.de, python-dev@python.org
 Date: Monday, January 25, 2010, 6:32 PM
 Michael Foord wrote:
  On 26/01/2010 00:28, Christian Heimes wrote:
  I favor this approach over an integer offset
 because doesn't change the
  semantic of ob_item.
     
  Well, on the face of it this doesn't sound like a huge
 increase in
  complexity. Not that I'm qualified to judge.
 
 Christian's approach is a good one for minimising the
 semantic changes,
 and compared to an offset based approach actually has a
 decent chance of
 working without breaking too much C code (you'd have to
 change the way
 sys.getsizeof() worked for lists, but I can't think of
 anything else off
 the top of my head that would definitely break).

As I'm diving into this, it is clear that you want to preserve the semantics of 
ob_item and ob_size, as they are used in a whole bunch of places.  For now I am 
tracking a var called orphans, which subtly changes one invariant:

  0 = ob_size + orphans = allocated

I think Christian covered most of the places that would need to change, and 
list_dealloc would also need to change.
 
 However, as Cameron pointed out, the O() value for an
 operation is an
 important characteristic of containers, and having people
 get used to an
 O(1) list.pop(0) in CPython could create problems not only
 for other
 current Python implementations but also for future versions
 of CPython
 itself.

I hadn't thought of that.

Here are the objections that I've heard or thought of myself:

 * The simplicity of the current implementation is important beyond the normal 
benefits of simplicity, since it is also a reference implementation for other 
ports of Python.
 * People who got used to O(1) in one version of Python might have unpleasant 
surprises when they went to other versions.
 * Alternatives to list already exist, such as deque and blist 
 * An O(1) solution would increase the size of PyListObject.
 * An O(1) solution would postpone the release of the memory from the orphaned 
pointers.
 * An O(1) solution would slow down calls to list_resize, PyList_new, and 
list_dealloc.
 * For small and medium sized lists, memmove()'s penalty is usually drowned out 
by other operations on the list elements.
 * The use case of popping elements off a large list is not that common 
(although this might be somewhat driven by the documented slowness)
 * There may be third party code that relies on the current internal 
implementation

Did I leave anything out?

Here are the benefits of an O(1) implementation.

 * O(1) is faster than O(N) for some, presumably quite small, value of N.
 * Performance benefits tend to compound.  If you have P processes doing pop(0) 
in a loop on an N-element list, you are saving P * N memmoves of size kN.
 * The technique required to make O(1) work is simple at its core--advance a 
pointer forward instead of moving the memory backward.
 * Encouraging the use of pop(0) will lead to leaner Python programs that 
release memory earlier in the process.
 * While not super common, there do exist programs today that pop from the top, 
either using pop itself or del, including programs in the standard library
 * The language moratorium provides a good window for performance improvements 
in general (even if this one does not pass the litmus test for other reasons)

Did I leave anything out?


 


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
I made enough of a patch to at least get a preliminary benchmark.

The program toward the bottom of this email runs over 100 times faster with my 
patch.  The patch still has a ways to go--I use a very primitive scheme to 
reclaim orphan pointers (1000 at a time) and I am still segfaulting when 
removing the last element of the list.  But the initial results at least 
confirm that the intended benefit is achievable.

I've attached the diff, in case anyone wants to try it out or help me figure 
out what else needs to change.

The core piece of the patch is this--everything else is memory management 
related.

+if (ilow == 0) {
+a-orphans += 1;
+a-ob_item += (-1 * d);
+}
+else {
+memmove(item[ihigh+d], item[ihigh],
+(Py_SIZE(a) - ihigh)*sizeof(PyObject *));
+}


import time

n = 8

lst = []
for i in range(n):
lst.append(i)

t = time.time()
for i in range(n-1):
del lst[0]

print('time = ' + str(time.time() - t))
print(len(lst))
print('got here at least')


show...@showell-laptop:~/PYTHON/py3k$ cat BEFORE 
0
2.52699589729

show...@showell-laptop:~/PYTHON/py3k$ cat AFTER 
time = 0.0216660499573
1
got here at least


Python 3.2a0 (py3k:77751M, Jan 25 2010, 20:25:21) 
[GCC 4.3.3] on linux2
Type help, copyright, credits or license for more information.
 2.526996 / 0.021666
116.63417335918028


DIFF
Description: Binary data
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Terry Reedy

On 1/25/2010 9:32 PM, Nick Coghlan wrote:


However, as Cameron pointed out, the O() value for an operation is an
important characteristic of containers, and having people get used to an
O(1) list.pop(0) in CPython could create problems not only for other
current Python implementations but also for future versions of CPython
itself.


The idea that CPython should not be improved because it would spoil 
programmers strikes me as a thin, even desparate objection. One could 
say that same thing about the recent optimization of string += string so 
that repeated concats are O(n) instead of O(n*n). What a trap if people 
move code to other implementations (or older Python) without that new 
feature.


Of course, the whole purpose of adding a jit to CPython would be to 
spoil us.


It is a fact already that optimizing CPython code is specific to a 
particular interpreter, system, and workload.


Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Steve Howell
--- On Mon, 1/25/10, Steve Howell showel...@yahoo.com wrote:

 From: Steve Howell showel...@yahoo.com
 Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
 To: Michael Foord fuzzy...@voidspace.org.uk, Nick Coghlan 
 ncogh...@gmail.com
 Cc: Christian Heimes li...@cheimes.de, python-dev@python.org
 Date: Monday, January 25, 2010, 8:33 PM
 I made enough of a patch to at least
 get a preliminary benchmark.
 
 The program toward the bottom of this email runs over 100
 times faster with my patch.  The patch still has a ways
 to go--I use a very primitive scheme to reclaim orphan
 pointers (1000 at a time) and I am still segfaulting when
 removing the last element of the list.  But the initial
 results at least confirm that the intended benefit is
 achievable.
 

Ok, I fixed the obvious segfaults, and I am now able to pass all the tests on 
my debug build.  A new diff is attached.

There is still at least one bug in my code in listextend, which the tests do 
not seem to expose, so I will try to at least beef up the test suite a bit.

I really like listobject.c.  Very clean code, very easy to understand.  I guess 
I shouldn't be surprised.





DIFF
Description: Binary data
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com