Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Fri, Jan 29, 2010 at 11:25 PM, Stephen J. Turnbull step...@xemacs.org wrote: Josiah Carlson writes: Lisp lists are really stacks No, they're really (ie, concretely) singly-linked lists. Now, stacks are an abstract data type, and singly-linked lists provide an efficient implementation of stacks. But that's not what linked lists really are. For example, singly-linked lists are also a reasonable way to implement inverted trees (ie, the node knows its parent, but not its children), which is surely not a stack. The Python use of list to denote what is concretely a dynamically extensible one-dimensional array confused me a bit. But what the heck, Guido needed a four-letter word to denote a concrete type used to implement a mutable sequence ADT, and he wasn't going to borrow one from that French guy on the ramparts, right? No big deal. Ahem... So the confusion here is that in Python, list denotes a particular concrete data type, while Steve H. is using a more abstract idea of list as mutable sequence to suggest there's a reason for optimizing certain mutations that Python's data type isn't good at. I don't think that's an effective way for him to make his point, unfortunately. But both usages are consistent with Python's usage; mutability is the usual way that lists are distinguished from tuples, for example, and the underlying dynamic array implementation is rarely mentioned. My experience with Lisp is limited to mzScheme and DrScheme, but AFAIR, neither of them had mutable lists. Both had list semantics that were equivalent (in terms of limitations and functionality) to the structure I described using tuples in my earlier post. If other Lisp implementations have mutable lists, I'd be surprised to learn that. However, now we are well into the weeds, far off the track of whether or not Steve H's feature is worth saddling Python lists with cruft. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull step...@xemacs.org wrote: Josiah Carlson writes: On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote: What do you think of LISP, and car in particular (apart from the stupidly cryptic name)? Apples and oranges. True, but speaking of Lisp lists, here's some possibly relevant experience. About 10 years ago, XEmacs converted its cons type from a special immediate representation (ie, cons == (car, cdr)) to a generic record representation (ie, cons == (pointer to type descriptor, car, cdr)). This resulted in a perceptible increase in VM usage and disk usage. A typical running XEmacs instance for me contains about 0.75 million conses and uses 200MB of VM, so with 32-bit pointers that's about 3MB extra, or 1.5%, and with 64-bit pointers it's 6MB extra, about 3%. However, I tend to have several big buffers (20-50MB) of pure character data; people who work with smaller buffers on 64-bit machines have reported as much as 10% extra overhead. On disk, the binary is typically about 9MB stripped. That contains about 50,000 conses, or an extra 200KB/400KB with the new structure, somewhat more than my experience (2% or 4%). Some people complained, but we considered this well worthwhile (moving one type bit from the car to the header allowed Lisp integers to cover the range -1G to +1G, and there are a surprising number of people who would like to use XEmacs on files 512MB). I suppose that Steve's proposal probably has similar impact on binaries and running instances of Python, but he hasn't given any use cases for list.pop(0) to compared to doubling the size of usable buffers. The choice that emacs made is great for emacs; as you stated, it allowed emacs to do something it was previously unable to do. Steve H's proposed change would not allow Python to do anything it wasn't able to do before, and would (as TJR stated in this and other threads) saddle Python with overhead so as to make more convenient the use a structure for which it was not intended (paraphrased, of course). Again; no good use-case, means no problem, means no reason to try to solve the perceived problem. It's great that you support Steve H's proposal, but can we keep the discussion on why this would be good for Python, rather than why changing a structure that is identical in name (but only similar in functionality) to Python's list was good for another language? - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Josiah Carlson writes: On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull step...@xemacs.org wrote: Some people complained, but we considered this well worthwhile (moving one type bit from the car to the header allowed Lisp integers to cover the range -1G to +1G, and there are a surprising number of people who would like to use XEmacs on files 512MB). I suppose that Steve's proposal probably has similar impact on binaries and running instances of Python, but he hasn't given any use cases for list.pop(0) to compared to doubling the size of usable buffers. The choice that emacs made is great for emacs; Emacs hasn't made that choice, XEmacs did. I believe Emacs is still restricted to 128MB, or maybe 256MB, buffers. They recently had an opportunity to increase integer size, and thus maximum buffer size, but refused it. It's not a no-brainer. It's great that you support Steve H's proposal, but can we keep the discussion on why this would be good for Python, I don't support it or oppose it (I wouldn't notice the increased overhead myself, but I have no use case for O(1) list.pop(0)). I'm giving some figures on a similar change (adding a single pointer to a previously low-overhead structure used in large numbers in some applications), and pointing out that this was good for XEmacs only because there was a rather big increase in capability in a use-case that people can sympathize with even if they don't need it themselves. I hope that this example will help Steve H understand why he needs to give real use-cases, or if he doesn't know of any, give up. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Minor erratum: Stephen J. Turnbull writes: Emacs hasn't made that choice, XEmacs did. I believe Emacs is still restricted to 128MB, or maybe 256MB, buffers. They recently had an opportunity to increase integer size, and thus maximum buffer size, but refused it. It's not a no-brainer. I stand corrected. Emacs did make some changes which increased integer size from 28 bits to 30, allowing a maximum signed value of 512M, but refused the tradeoff I described of making the cons type be indicated by a pointer to a type description record rather than a type bit in one of the pointers. That would have allowed 31 bits for integers, as in XEmacs. The basic thrust of my argument was correct, though. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Fri, 1/29/10, Stephen J. Turnbull step...@xemacs.org wrote: Lisp lists are really stacks No, they're really (ie, concretely) singly-linked lists. Now, stacks are an abstract data type, and singly-linked lists provide an efficient implementation of stacks. But that's not what linked lists really are. For example, singly-linked lists are also a reasonable way to implement inverted trees (ie, the node knows its parent, but not its children), which is surely not a stack. I like your distinction between abstract data types and concrete implementations. From a mutability perspective, the concrete implementation of Python lists shares a performance characteristic with most concrete implementations of stacks, in that inserts/pops at the top are cheap. Unlike most stacks, Python lists do at least semantically allow queue-like behavior for removing elements from the bottom, but I don't think it's unfair of me to say that removes from the bottom are discouraged under the current implementation. (I can cite the tutorial, for example). So, to the extent that removes from the bottom are frowned upon, Python again has the same mutability characteristics as a stack. The abstract data type stack does not allow for random access of elements AFAIK, so Python lists are definitely more than a stack, especially since random accesses are not only possible, they are quite efficient. So I guess they are an array. I don't know whether or not arrays are considered to be an abstract data type or not, but my de facto concept of an array is something that supports fast random access, cheap mutation at the top, and no guarantees at the bottom. I am guessing that from a big-O perspective, Python lists have the exact same performance characteristics as the data structures that Perl, Ruby, and Javascript all call array. Also, Python lists are built on top of a C array, and while it would be a bit of an overstatement to say that lists are just a nicely sugared encapsulation of C arrays, I think it would be a fair statement to say that Python lists only give O(1) performance for the same operations as the underlying C array; all the other operations are there just for convenience where performance is not a driving concern. Also, we can go back to the example of LISP, the one language that I know of that shares the term list. Whatever a list denotes from an abstract perspective, Python and LISP do not agree upon the definition. Python lists are more like right-side-up stacks with fast random access, while LISP lists are more like an upside-down stack without iteration. The Python use of list to denote what is concretely a dynamically extensible one-dimensional array confused me a bit. But what the heck, Guido needed a four-letter word to denote a concrete type used to implement a mutable sequence ADT, and he wasn't going to borrow one from that French guy on the ramparts, right? No big deal. Ahem... Probably the same reason he didn't call dictionaries hashes, right? :) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Fri, Jan 29, 2010 at 12:48 AM, Terry Reedy tjre...@udel.edu wrote: On 1/28/2010 6:30 PM, Josiah Carlson wrote: I would also point out that the way these things are typically done is that programmers/engineers have use-cases that are not satisfied by existing structures, they explain the issues they have with existing structures, and they propose modifications. So far, Steve has not offered any use-cases for why his proposed change is necessary; merely Use of a list as a queue rather than as a stack, as in breadth-first search, where one only needs to pop off the front but never push to the front. That is not to say that this is common or that a deque or other options may no be pretty satisfactory. But it would certainly be easier, when presenting such algorithms, to just be able to use a list, which has already been taught, than to introduce another structure. Currently a deque is not a drop-in replacement for a list in that one cannot use all list methods with a deque. As I understand it, his proposal is simpler than the one rejected a couple of years ago is that it does not include intentional over-allocation at the front of the list, as would be needed for guaranteed O(1) behavior for deque-like insertion at the front. I may consider a Python version of his idea for one of my needs, where speed is not an issue. I agree that the discussion has gone on too long here and that some of Steve's rhetoric has been unnecessarily abrasive and off-putting. He has been told this and acknowledged it once on Python-list, but habits die hard. For both reasons, I suggested a few days ago that further discussion should focus on the patch and be moved to the issue on the tracker. So I will not say more here. Terry Jan Reedy Excellently put. Geremy Condra ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Am 28.01.2010 05:30, schrieb Steve Howell: If you want tools that are easy to use correctly, make them bug-free and document their behavior. If you want tools that are easy to use well, then make them perform better. I am not sure how my patch contradicts either of these goals. You keep making the argument that deque is a better alternative to list in many situations. I actually agree with you. Most programming problems are best modelled by a queue. I am not sure why Python lists get all the syntax sugar and promotion over deque, when in reality, Python lists implement a pretty useless data structure. Python lists are a glorification of a C array built on top of a memory-upward-biased memory allocator. As such, they optimize list appends (good) but fail awfully on list prepops (bad). They are much better as stacks than queues, even though queues are more useful for the most common programming known to man--work through a work queue and delete tasks when they are done. It is not surprising that Python lists are starting to show their lack of versatility in 2010. They're based on 1970's technology. Python lists are really just a thin encapsulation of C arrays built on top of an asymmetrical memory manager. In 2010 you could improve Python lists by releasing from the constraints of 1970s semantics. But I am starting to think a more modern approach would be to take more useful data structures like deques and give them more sugar. This post made me laugh, and skip the rest of the thread. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote: --- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote: [...] in the decade+ that I've been using Python and needed an ordered sequence; lists were the right solution 99% of the time [...] What do you think of LISP, and car in particular (apart from the stupidly cryptic name)? Apples and oranges. Lisp lists are really stacks, and are analogous in Python to... lisp_list = (item1, (item2, (item3, (item4, () lisp_car, lisp_cdr = lisp_list In many typical uses of lisp lists, car/cdr are used as a replacement for the equivalent of iteration in other languages. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Thu, Jan 28, 2010 at 9:48 PM, Terry Reedy tjre...@udel.edu wrote: On 1/28/2010 6:30 PM, Josiah Carlson wrote: I would also point out that the way these things are typically done is that programmers/engineers have use-cases that are not satisfied by existing structures, they explain the issues they have with existing structures, and they propose modifications. So far, Steve has not offered any use-cases for why his proposed change is necessary; merely Use of a list as a queue rather than as a stack, as in breadth-first search, where one only needs to pop off the front but never push to the front. That is not to say that this is common or that a deque or other options may no be pretty satisfactory. But it would certainly be easier, when presenting such algorithms, to just be able to use a list, which has already been taught, than to introduce another structure. Currently a deque is not a drop-in replacement for a list in that one cannot use all list methods with a deque. Being able to use a list and get good performance straight off would be *convenient*. But that's it. People use it as a queue, discover that it is slow, ask (or research), and discover the deque. The use-cases where having the full range of list methods *and* deque behavior are fairly slim (perhaps none in my experience), and I argue are better covered by structures that are neither a Python list (even with the modifications offered by Steve) nor a deque. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Josiah Carlson writes: Lisp lists are really stacks No, they're really (ie, concretely) singly-linked lists. Now, stacks are an abstract data type, and singly-linked lists provide an efficient implementation of stacks. But that's not what linked lists really are. For example, singly-linked lists are also a reasonable way to implement inverted trees (ie, the node knows its parent, but not its children), which is surely not a stack. The Python use of list to denote what is concretely a dynamically extensible one-dimensional array confused me a bit. But what the heck, Guido needed a four-letter word to denote a concrete type used to implement a mutable sequence ADT, and he wasn't going to borrow one from that French guy on the ramparts, right? No big deal. Ahem... So the confusion here is that in Python, list denotes a particular concrete data type, while Steve H. is using a more abstract idea of list as mutable sequence to suggest there's a reason for optimizing certain mutations that Python's data type isn't good at. I don't think that's an effective way for him to make his point, unfortunately. But both usages are consistent with Python's usage; mutability is the usual way that lists are distinguished from tuples, for example, and the underlying dynamic array implementation is rarely mentioned. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Josiah Carlson writes: On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell showel...@yahoo.com wrote: What do you think of LISP, and car in particular (apart from the stupidly cryptic name)? Apples and oranges. True, but speaking of Lisp lists, here's some possibly relevant experience. About 10 years ago, XEmacs converted its cons type from a special immediate representation (ie, cons == (car, cdr)) to a generic record representation (ie, cons == (pointer to type descriptor, car, cdr)). This resulted in a perceptible increase in VM usage and disk usage. A typical running XEmacs instance for me contains about 0.75 million conses and uses 200MB of VM, so with 32-bit pointers that's about 3MB extra, or 1.5%, and with 64-bit pointers it's 6MB extra, about 3%. However, I tend to have several big buffers (20-50MB) of pure character data; people who work with smaller buffers on 64-bit machines have reported as much as 10% extra overhead. On disk, the binary is typically about 9MB stripped. That contains about 50,000 conses, or an extra 200KB/400KB with the new structure, somewhat more than my experience (2% or 4%). Some people complained, but we considered this well worthwhile (moving one type bit from the car to the header allowed Lisp integers to cover the range -1G to +1G, and there are a surprising number of people who would like to use XEmacs on files 512MB). I suppose that Steve's proposal probably has similar impact on binaries and running instances of Python, but he hasn't given any use cases for list.pop(0) to compared to doubling the size of usable buffers. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Tuesday 26 January 2010, Steve Howell wrote: Here are the benefits of an O(1) implementation. [...] Did I leave anything out? Good summary, Steve, thanks! Anyway, you left two out: * Inserting at the front gets the same complexity as inserting at the back. * Inserting and erasing anywhere in the middle can memmove() the smaller tail, so it changes from O(N) to O(N/2). Finally, if you drop the invariant first=last, you can simply wrap around the pointers. This allows creating a producer/consumer queue which never needs to move its content. Cheers! Uli ** Sator Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Deutschland Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932 ** Visit our website at http://www.satorlaser.de/ ** Diese E-Mail einschließlich sämtlicher Anhänge ist nur für den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empfänger sein sollten. Die E-Mail ist in diesem Fall zu löschen und darf weder gelesen, weitergeleitet, veröffentlicht oder anderweitig benutzt werden. E-Mails können durch Dritte gelesen werden und Viren sowie nichtautorisierte Änderungen enthalten. Sator Laser GmbH ist für diese Folgen nicht verantwortlich. ** ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Having read the entirety of the thread (which is a rare case these days, I need more spare time), and being that I'm feeling particularly snarky today, I'm going to agree 100% with everything that Raymond has said in this message and his few subsequent messages. Snarky comments to follow. I would also point out that the way these things are typically done is that programmers/engineers have use-cases that are not satisfied by existing structures, they explain the issues they have with existing structures, and they propose modifications. So far, Steve has not offered any use-cases for why his proposed change is necessary; merely a (paraphrased) It would be great if list.pop(0) was O(1) instead of O(n). It's a solution looking for a problem that doesn't exist. That said, even if he were to magically come up with twenty examples where neither a deque nor a list were the right solution, I'd still be a -1, because in the decade+ that I've been using Python and needed an ordered sequence; lists were the right solution 99% of the time, deques got .99%, and the remaining .01% would not have been satisfied with what Steve is proposing (this is obvious hyperbole and made-up numbers, but the number of sequence types I've created (easily 10-20) is still a few orders of magnitude lower than how often I use plain lists and deques). Don't get me wrong, I'm all about nifty optimizations (I still get a chuckle out of proper string lengths with surrogate pairs, string views, Some, ...). But in this case there is no problem; merely the use of a data structure for something it was not designed. It's great that Steve wants to help. And I 3 innovation in data structures. But this patch isn't innovation, and it isn't helping 99.99% of use-cases. /snark Going back to not having free time, - Josiah On Tue, Jan 26, 2010 at 1:52 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: Ah, but that applies for *large* lists. Adding 8 bytes to each list object applies to *all* lists. I betcha that many programs use many tiny lists. Even most tiny-ish lists are wasting memory, though, according to this sequence: 4, 8, 16, 25, ... That is only is they are being grown with list.append(). Otherwise, list sizes are exact. For example, [0,1,2] uses space for just three entries and [] for none. I get the impression that 1) you've already made up your mind and are ignoring input from the other developers, 2) that you're ignoring the python-dev discussions of long ago where this very idea was rejected and deques came in to being instead, 3) over-estimating the prevalence of use cases for pop(0), and 4) under-estimating the impact on small lists, 5) under-estimating the impact on psyco or other implementations of Python, 6) under-estimating the impact on third-party extensions and debugging tools. Deques already provide a solution to the FIFO problem and they do so without huge wastes of memory or calls to memmove(). They handle both insertion and deletion from the front and back. In comparison, the proposed changes to lists seem like a complete hack. Just because it can be done, doesn't mean that it should. ISTM that you're attacking an already solved problem and that you're playing fast and loose with a core data structure. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote: [...] in the decade+ that I've been using Python and needed an ordered sequence; lists were the right solution 99% of the time [...] What do you think of LISP, and car in particular (apart from the stupidly cryptic name)? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 1/28/2010 6:30 PM, Josiah Carlson wrote: I would also point out that the way these things are typically done is that programmers/engineers have use-cases that are not satisfied by existing structures, they explain the issues they have with existing structures, and they propose modifications. So far, Steve has not offered any use-cases for why his proposed change is necessary; merely Use of a list as a queue rather than as a stack, as in breadth-first search, where one only needs to pop off the front but never push to the front. That is not to say that this is common or that a deque or other options may no be pretty satisfactory. But it would certainly be easier, when presenting such algorithms, to just be able to use a list, which has already been taught, than to introduce another structure. Currently a deque is not a drop-in replacement for a list in that one cannot use all list methods with a deque. As I understand it, his proposal is simpler than the one rejected a couple of years ago is that it does not include intentional over-allocation at the front of the list, as would be needed for guaranteed O(1) behavior for deque-like insertion at the front. I may consider a Python version of his idea for one of my needs, where speed is not an issue. I agree that the discussion has gone on too long here and that some of Steve's rhetoric has been unnecessarily abrasive and off-putting. He has been told this and acknowledged it once on Python-list, but habits die hard. For both reasons, I suggested a few days ago that further discussion should focus on the patch and be moved to the issue on the tracker. So I will not say more here. Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 1/28/2010 11:57 PM, Steve Howell wrote: --- On Thu, 1/28/10, Josiah Carlson josiah.carl...@gmail.com wrote: [...] in the decade+ that I've been using Python and needed an ordered sequence; lists were the right solution 99% of the time [...] What do you think of LISP, and car in particular (apart from the stupidly cryptic name)? A LISP list/pair has nothing to do with a Python list. The list of LISP is a *singly-linked* list. You cannot O(1) index a LISP list. A Python list is more equivalent to a LISP vector, which car does not work with; in fact, there is not even a pop operator -- all size changes of a vector O(n) unless the implementation is playing games (like the one you are proposing for the start and the one Python already uses for the end of a list). (And with this, clearly uninformed reply by you, I am now ignoring your trolling.) -- Scott Dial sc...@scottdial.com scod...@cs.indiana.edu ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On approximately 1/26/2010 7:50 PM, came the following characters from the keyboard of Cameron Simpson: My point was that I look on python builtins like list and dict as highly optimised, highly efficient facilities. That means that I expect a list to be very very much like a linear array as one would find in C-like languages, with realloc() managment behind the scenes to handle the resizing requirements on append/extend. As a newcomer to python, I must say that I wouldn't expect a list to be like an array. I'd expect it more to be like a list... many implementations of lists (linked lists, in particular) make it O(1) to add to the front or back. An array can be used to represent a list, but there are known inefficiencies that result when doing so, one of which the Subject patch is working to address. I guess I would have expected something called an array to be more like an array, rather than something called a list. But one has to read the documentation to find out what things really mean, in a new environment. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Cameron Simpson wrote: The proposed change to make pop(0) O(1) involves adding a base offset to the internal list implementation. Doesn't that incur a (small) overhead to _every_ list operation? Doesn't that weaken list as the as efficient as possible implementation of choice for array-like things? No, the implementation is smarter than that. The existing pointer in the PyList_* structure will continue to point to the first element of the list while an extra pointer is added that points to the beginning of the allocated memory. The difference between the two memory addresses will determine the number of orphan pointer slots that exist at the beginning of the list. Only operations that change the length of the list will be slowed down at all, since they will need to take the orphans into account when deciding whether or not to resize the allocated memory. The big practical concern is actually the memory cost of adding yet another pointer (potentially 8 bytes of data) to every list object, even empty ones. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Glenn Linderman v+pyt...@g.nevcal.com wrote: As a newcomer to python, I must say that I wouldn't expect a list to be like an array. I'd expect it more to be like a list... many implementations of lists (linked lists, in particular) make it O(1) to add to the front or back. An array can be used to represent a list, but there are known inefficiencies that result when doing so, one of which the Subject patch is working to address. I guess I would have expected something called an array to be more like an array, rather than something called a list. But one has to read the documentation to find out what things really mean, in a new environment. My concept of Python lists is that they should have at least the same performance characteristics as an ordinary to-do list that you make with pencil, paper, and an eraser. When you complete the first task on your to-do list, you can just erase it; no need to recopy the whole list. When you complete all the elements on the first page, throw away the paper. As you find new tasks to complete, add them on the end. When you fill up a page, get a new sheet of paper. If you complete the first task, erase it, but then a new urgent task comes in, go ahead and write the new urgent task on the first line of the first page. If, for some reason, you keep getting bombarded with tasks that are more urgent than the plan you originally set out for yourself, you need a more powerful tool than a simple pencil/paper tool list, and by analogy, you would need a more powerful tool than a Python list to do it electronically. But if you are just working your way through a paper to-do list, then erasing elements from the top works in O(1) time with occasional compacting of paper when you finish a whole page of tasks. In my mind Python's lists should have the same performance characteristics as the paper list (or better). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Cameron Simpson c...@zip.com.au wrote: From: Cameron Simpson c...@zip.com.au | The idea that CPython should not be improved because it would spoil | programmers strikes me as a thin, even desparate objection. Hey, I even used the word thin to describe my concern! My point was that I look on python builtins like list and dict as highly optimised, highly efficient facilities. That means that I expect a list to be very very much like a linear array as one would find in C-like languages, with realloc() managment behind the scenes to handle the resizing requirements on append/extend. It also means that the O() cost of operations should be pretty obvious. pop(0) obviously involves shuffling everything down one slot. In my previous posting, though, I mention another plausible metaphor that programmers would have about lists--ordinary to-do lists where you can cross out or erase the first item in the list in O(1) time. The proposed change to make pop(0) O(1) involves adding a base offset to the internal list implementation. Doesn't that incur a (small) overhead to _every_ list operation? Doesn't that weaken list as the as efficient as possible implementation of choice for array-like things? The patch advances the pointer itself during prepops, so accesses continue to work as quickly as before. The offset between the originally allocated pointer and the pointer to the new first element of the list only gets calculated and used during list_resize and list_dealloc. List_resize gets called by any operation that changes the size of the list, including inserts, deletes, pops, etc. Because I have to increase the size of PyListObject, you could argue that I even affect the performance of access to the extent that the greater size of PyListObject increases the likelihood of cache misses. I would consider that also to be a very thin objection, although not completely implausible. Yes, optimisation is nice. Higher seed is nice. But there are optimisations that simple fix inefficiencies or provide a special case for a common operation, and there are those with side effects elsewhere in the implementation (i.e. the base offset this pop(0) opt would incur, which adds a (small) cost to everything). Your general point about optimizations having tradeoffs is valid, but I am not making the particular tradeoff that you mention. The other aspect, mentioned elsewhere in this thread, is that programmers should know the O() cost of these operations. The tutorial does mention that the current list implementation has to move all the elements forward when you remove off the top. (Cameron: Quick aside, for some reason your emails keep going to my spam folder. I am sure it's just my mail client being stupid, but I thought I'd let you know in case it's something on your side.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 7:13 AM, Steve Howell showel...@yahoo.com wrote: My concept of Python lists is that they should have at least the same performance characteristics as an ordinary to-do list that you make with pencil, paper, and an eraser. When you complete the first task on your to-do list, you can just erase it; no need to recopy the whole list. I don't think your analogy works, unless you recopy your to-do lists whenever you complete a task in the middle of the list. ;-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: From: Daniel Stutzbach dan...@stutzbachenterprises.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: python-dev@python.org Date: Wednesday, January 27, 2010, 5:32 AM On Wed, Jan 27, 2010 at 7:13 AM, Steve Howell showel...@yahoo.com wrote: My concept of Python lists is that they should have at least the same performance characteristics as an ordinary to-do list that you make with pencil, paper, and an eraser. When you complete the first task on your to-do list, you can just erase it; no need to recopy the whole list. I don't think your analogy works, unless you recopy your to-do lists whenever you complete a task in the middle of the list. ;-) The bunch of stickies on my desk, and scribbled notes on the back of envelopes, etc. does indeed suggest a jumbled data structure that I would never want to reproduce electronically! :) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Glenn Linderman, 27.01.2010 10:13: As a newcomer to python, I must say that I wouldn't expect a list to be like an array. I'd expect it more to be like a list... many implementations of lists (linked lists, in particular) make it O(1) to add to the front or back. An array can be used to represent a list, but there are known inefficiencies that result when doing so Performance-wise, there are a lot more inefficiencies in linked lists for the most common use cases than in arrays. It's hinting to see how common the List l = new ArrayList() idiom is in Java (plus generics, obviously). I can't remember seeing any other kind of initialisation in ages. That's a huge difference between Java and Python, BTW. Python optimises for common use cases to keep you from thinking too much about implementation details, whereas Java just leaves you alone with all possible solutions for all possible use cases and forces you to choose the right one at each single code line you write. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Nick Coghlan ncoghlan at gmail.com writes: The big practical concern is actually the memory cost of adding yet another pointer (potentially 8 bytes of data) to every list object, even empty ones. It needn't be, actually. Why? You only need to store this other pointer when you have an orphaned area in the first place. But then you can store the pointer *in* the orphaned area :-) So let's say for example: - when ob_size = 0, there are no orphans - when ob_size 0, the pointer to the orphaned area is given by (PyObject **) self-ob_items[-1] This makes a couple of micro-operations slightly slower (you need to take ob_size's absolute value to get the length (*)); and it kills code which uses Py_SIZE() on lists. But PyList_GET_SIZE() exists for a reason. (*) or you could use ob_size's MSB, and then a simple binary AND gets you the length; or even ob_items's LSB, since the pointer will always be aligned on more than 1 byte (and, yes, this is a bit insane) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Daniel Stutzbach daniel at stutzbachenterprises.com writes: I don't think your analogy works, unless you recopy your to-do lists whenever you complete a task in the middle of the list. I think his analogy suggests that his to-do list is a doubly-linked list ;) Or perhaps an array with lazy deletion (so that he loses O(1) random access). Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote: From: Guido van Rossum gu...@python.org Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org Date: Tuesday, January 26, 2010, 12:57 PM On Tue, Jan 26, 2010 at 12:46 PM, Steve Howell showel...@yahoo.com wrote: It seems to me that the goal of keeping lists super-compact from a memory standpoint is contradicted by the code in list_resize that optimistically preallocates extra memory on appends. Ah, but that applies for *large* lists. Adding 8 bytes to each list object applies to *all* lists. I betcha that many programs use many tiny lists. I think that some of the large programs that you mention with many tiny lists have some subset of lists still in memory only due to the fact that they cannot be garbage collected due to dangling references. One of the ways to eliminate dangling references is to aggressively delete objects after you are done processing them. Right now the Python programmer looking to aggressively delete elements from the top of a list has to consider the tradeoff that the operation takes O(N) time and would possibly churn his memory caches with the O(N) memmove operation. In some cases, the Python programmer would only have himself to blame for not using a deque in the first place. But maybe he's a maintenance programmer, so it's not his fault, and maybe the code he inherits uses lists in a pervasive way that makes it hard to swap in deque after the fact. What advice do you give him? Of course, this scenario is mostly speculative. A concrete example of a large Python program that keeps lots of tiny lists around would probably shed more light on the matter. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, John Arbash Meinel john.arbash.mei...@gmail.com wrote: From: John Arbash Meinel john.arbash.mei...@gmail.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: Guido van Rossum gu...@python.org, Nick Coghlan ncogh...@gmail.com, python-dev@python.org Date: Wednesday, January 27, 2010, 7:45 AM Right now the Python programmer looking to aggressively delete elements from the top of a list has to consider the tradeoff that the operation takes O(N) time and would possibly churn his memory caches with the O(N) memmove operation. In some cases, the Python programmer would only have himself to blame for not using a deque in the first place. But maybe he's a maintenance programmer, so it's not his fault, and maybe the code he inherits uses lists in a pervasive way that makes it hard to swap in deque after the fact. What advice do you give him? Or he could just set them to None. Fair enough, but that's still wasteful of memory, keeping around a bunch of None elements because you can't inexpensively delete them. I concede that you can break the dangling references, though, and that's often where large programs waste a lot of memory, so your point is well taken. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 9:55 AM, Steve Howell showel...@yahoo.com wrote: Fair enough, but that's still wasteful of memory, keeping around a bunch of None elements because you can't inexpensively delete them. Even if there are many references to it, there is only one None element. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 4:55 PM, Steve Howell showel...@yahoo.com wrote: --- On Wed, 1/27/10, John Arbash Meinel john.arbash.mei...@gmail.com wrote: From: John Arbash Meinel john.arbash.mei...@gmail.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: Guido van Rossum gu...@python.org, Nick Coghlan ncogh...@gmail.com, python-dev@python.org Date: Wednesday, January 27, 2010, 7:45 AM Right now the Python programmer looking to aggressively delete elements from the top of a list has to consider the tradeoff that the operation takes O(N) time and would possibly churn his memory caches with the O(N) memmove operation. In some cases, the Python programmer would only have himself to blame for not using a deque in the first place. But maybe he's a maintenance programmer, so it's not his fault, and maybe the code he inherits uses lists in a pervasive way that makes it hard to swap in deque after the fact. What advice do you give him? Or he could just set them to None. Fair enough, but that's still wasteful of memory, keeping around a bunch of None elements because you can't inexpensively delete them. I concede that you can break the dangling references, though, and that's often where large programs waste a lot of memory, so your point is well taken. Why is this thread still going? It seems to me that the case for this change is very weak. Lists, like dicts and tuples, are used *everywhere* and in *vast* quantities. Making them grow by 4 or 8 bytes each for such a corner case can't be an option. I'm sure your new list class has a lot of uses, but it should be an external class. If it stays close in behavior to the lists' behavior, then we could even add a notice in pop()'s documentation that recommends to use your new class in case they want a painless way to replace list usage (to make the life of those poor developers maintaining other people's code easier), maybe even add it in stdlib's collections unit. -- Virgil Dupras ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: From: Daniel Stutzbach dan...@stutzbachenterprises.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: John Arbash Meinel john.arbash.mei...@gmail.com, python-dev@python.org Date: Wednesday, January 27, 2010, 8:20 AM On Wed, Jan 27, 2010 at 9:55 AM, Steve Howell showel...@yahoo.com wrote: Fair enough, but that's still wasteful of memory, keeping around a bunch of None elements because you can't inexpensively delete them. Even if there are many references to it, there is only one None element. I should have been more precise and said the pointers to None, which could be reclaimed. But that's a pretty minor savings--I concede on the greater point, you do have the alternative to break dangling references with None, so the expense of avoiding remove operations is only local to list itself. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Virgil Dupras hs...@hardcoded.net wrote: Why is this thread still going? It seems to me that the case for this change is very weak. Lists, like dicts and tuples, are used *everywhere* and in *vast* quantities. Making them grow by 4 or 8 bytes each for such a corner case can't be an option. I'm sure your new list class has a lot of uses, but it should be an external class. If it stays close in behavior to the lists' behavior, then we could even add a notice in pop()'s documentation that recommends to use your new class in case they want a painless way to replace list usage (to make the life of those poor developers maintaining other people's code easier), maybe even add it in stdlib's collections unit. Lists are indeed used *everywhere* and *vast* quantities, which is why optimizations on list operations are worthwhile to pursue. The particular optimization makes a few tradeoffs, and the consensus here appears to be that the ugliest tradeoff is adding the pointer to PyListObject. There is at least some irony in opposing an optimization to a remove() operation on the basis of compacting memory, which isn't to say that the argument isn't valid. There is also the possibility that my initial patch can be refined by somebody smarter than myself to eliminate the particular tradeoff. In fact, Antoine Pitrou already suggested an approach, although I agree that it kind of pushes the boundary of sanity. :) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Lists are indeed used *everywhere* and *vast* quantities, which is why optimizations on list operations are worthwhile to pursue. Unfortunately, the proposed change is a pessimisation, not an optimisation. We probably shouldn't discuss it further, at least not until a PEP gets written by its proponents. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
In my mind Python's lists should have the same performance characteristics as the paper list (or better). I think you'll have to adjust your mind then. People have proposed various data structures that *do* work efficiently as paper lists. So if you want a paper list, use one of them, rather than abusing the Python list for one. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Antoine Pitrou solip...@pitrou.net wrote: From: Antoine Pitrou solip...@pitrou.net Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: python-dev@python.org Date: Wednesday, January 27, 2010, 6:15 AM Nick Coghlan ncoghlan at gmail.com writes: The big practical concern is actually the memory cost of adding yet another pointer (potentially 8 bytes of data) to every list object, even empty ones. It needn't be, actually. Why? You only need to store this other pointer when you have an orphaned area in the first place. But then you can store the pointer *in* the orphaned area :-) So let's say for example: - when ob_size = 0, there are no orphans - when ob_size 0, the pointer to the orphaned area is given by (PyObject **) self-ob_items[-1] This makes a couple of micro-operations slightly slower (you need to take ob_size's absolute value to get the length (*)); and it kills code which uses Py_SIZE() on lists. But PyList_GET_SIZE() exists for a reason. (*) or you could use ob_size's MSB, and then a simple binary AND gets you the length; or even ob_items's LSB, since the pointer will always be aligned on more than 1 byte (and, yes, this is a bit insane) A slightly more sane alternative would be to leave ob_size and ob_item alone with their current semantics, and then replace allocated with self-excess, where self-excess == excess_above * 256 + excess_below. Right now self-allocated is only used in a few places: list_resize listextend listsort list_init (but only in an assert) list_sizeof Those places would need to compute: allocated = self-ob_size + self-excess / 256 + self-excess % 256 The right hand side could be done with shifting/bitmasking inside a macro. Excess_left would obviously not be allowed to exceed 256; excess_right would max out at PY_SIZE_MAX / 256, instead of PY_SIZE_MAX / 8 + 6, which is probably the right thing anyway. Here is the current algorithm for allocating extra bytes on the right: new_allocated = (newsize 3) + (newsize 9 ? 3 : 6); On a 32-bit platform, excess_right would max out at 16M instead of the current 512M. Any method that wanted to find the originally allocated pointer would compute self-ob_item - (self-excess % 256) To the extent that most of the ugly details could be encapsulated in macros, I do not think this would complicate the list code itself greatly, but it would complicate debugging. Guido has expressed a strong desire to keep a hard pointer to the originally allocated chunk, especially with regard to future garbage collection changes. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Le mercredi 27 janvier 2010 à 11:49 -0800, Steve Howell a écrit : A slightly more sane alternative would be to leave ob_size and ob_item alone with their current semantics, and then replace allocated with self-excess, where self-excess == excess_above * 256 + excess_below. Or we could use allocated's sign bit in order to store the flag (of whether there is an orphaned area or not) and then store the orphaned pointer as ob_items[-1]. Thus we don't have to limit the magnitude of anything. And since allocated itself isn't used in any really critical path, it doesn't slow down anything. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Antoine Pitrou solip...@pitrou.net wrote: From: Antoine Pitrou solip...@pitrou.net Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: python-dev@python.org Date: Wednesday, January 27, 2010, 12:41 PM Le mercredi 27 janvier 2010 à 11:49 -0800, Steve Howell a écrit : A slightly more sane alternative would be to leave ob_size and ob_item alone with their current semantics, and then replace allocated with self-excess, where self-excess == excess_above * 256 + excess_below. Or we could use allocated's sign bit in order to store the flag (of whether there is an orphaned area or not) and then store the orphaned pointer as ob_items[-1]. Thus we don't have to limit the magnitude of anything. And since allocated itself isn't used in any really critical path, it doesn't slow down anything. Ok, I will try that technique and benchmark it, as it would address one strong objection to the proposal. I think I've summarized most of the other major objections here: http://wiki.python.org/moin/ProposalToSpeedUpListOperations ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote: Lists are indeed used *everywhere* and *vast* quantities, which is why optimizations on list operations are worthwhile to pursue. Unfortunately, the proposed change is a pessimisation, not an optimisation. We probably shouldn't discuss it further, at least not until a PEP gets written by its proponents. I concur (on both points). AFAICT, the performance of the proposal: * increases space requirements by a small fixed amount * s.append() performance slightly degraded. * the s.insert(0, value) case isn't helped -- still O(n) * repeated s.pop(0) have amortized O(1) but either needs to waste space indefinitely (likely not what the programmer intended by popping off values) or needs to do occasional memmoves (which isn't as good as a deque which never moves the data). * the resize performance doesn't work well with the memory allocator -- while a series of appends can often resize in-place without a need to do an O(n) memmove, but a series of pop(0) calls doesn't have a resize in-place option. What currently unknown is the effect on third-party extensions and debugging tools that access the structure directly. Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. ISTM, the *only* benefit is that an occasional s.pop(0) will perform better than it does now but not as well as a deque which never has to move data). Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 3:49 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. Or Unladen Swallow. I'm -1 on mucking with any of the fundamental data structures until the Unladen Swallow patch lands (assuming it lands). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote: From: Raymond Hettinger raymond.hettin...@gmail.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Martin v. Löwis mar...@v.loewis.de Cc: Steve Howell showel...@yahoo.com, python-dev@python.org Date: Wednesday, January 27, 2010, 1:49 PM On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote: Lists are indeed used *everywhere* and *vast* quantities, which is why optimizations on list operations are worthwhile to pursue. Unfortunately, the proposed change is a pessimisation, not an optimisation. We probably shouldn't discuss it further, at least not until a PEP gets written by its proponents. I concur (on both points). AFAICT, the performance of the proposal: * increases space requirements by a small fixed amount * s.append() performance slightly degraded. * the s.insert(0, value) case isn't helped -- still O(n) * repeated s.pop(0) have amortized O(1) but either needs to waste space indefinitely (likely not what the programmer intended by popping off values) or needs to do occasional memmoves (which isn't as good as a deque which never moves the data). * the resize performance doesn't work well with the memory allocator -- while a series of appends can often resize in-place without a need to do an O(n) memmove, but a series of pop(0) calls doesn't have a resize in-place option. What currently unknown is the effect on third-party extensions and debugging tools that access the structure directly. Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. ISTM, the *only* benefit is that an occasional s.pop(0) will perform better than it does now but not as well as a deque which never has to move data). I agree with most of what's said above, with a few clarifications. The speedup is not limited to pop(0); it also considerably speeds up code like below: n = 80 for i in range(n): x = lst[:10] del lst[:10] Understood that it is a contrived benchmark, and real code like that could be replaced with a technique that Nones out the used up elements and advances a start pointer. Prepends are still O(N) for most use cases, but prepends will reclaim space at the front if it's there in O(1). The biggest performance tradeoff, the extra space requirement in PyListObject, can be eliminated, albeit with a pretty ugly hack. Since there is a lot of discussion about tradeoffs, I would remind folks that asking somebody to use a deque instead of a list also forces tradeoffs; you lose the comfort and familiarity of lists (not to be underestimated) as well as some features (you can't spell d[:10] as d[:10] for example). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Raymond Hettinger raymond.hettinger at gmail.com writes: AFAICT, the performance of the proposal: * increases space requirements by a small fixed amount Well, given my proposal (assuming it turns out ok) it doesn't. * s.append() performance slightly degraded. Why? * the resize performance doesn't work well with the memory allocator -- while a series of appends can often resize in-place without a need to do an O(n) memmove, but a series of pop(0) calls doesn't have a resize in-place option. Can often resize in-place is overstated. It only works if there's a chunk of sufficient free space just after your list's storage, which is probably not the common case (except perhaps in microbenchmarks where the list is the latest allocated object). What currently unknown is the effect on third-party extensions and debugging tools that access the structure directly. Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. Well, if psyco wants to muck with internal implementation details, it's psyco's problem. We shouldn't refrain from making internal changes under the pretext that it might break psyco. Otherwise the psyco author(s) should file a PEP for inclusion in the core interpreter ;) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 2:14 PM, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: On Wed, Jan 27, 2010 at 3:49 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. Or Unladen Swallow. I'm -1 on mucking with any of the fundamental data structures until the Unladen Swallow patch lands (assuming it lands). Don't block internal changes for our sake. For the most part, we've already made plans to make them non-issues. In any cases we haven't, we need to make them non-issues anyway to avoid slowing down python development in the future. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Antoine Pitrou solip...@pitrou.net raymond.hettinger at gmail.com writes: AFAICT, the performance of the proposal: * increases space requirements by a small fixed amount Well, given my proposal (assuming it turns out ok) it doesn't. * s.append() performance slightly degraded. Why? A slight degradation is unavoidable AFAICT. The overhead is roughly this on every call: if (self-orphans = newsize) { // SKIP ALMOST ALWAYS needed = newsize + self-orphans; And if it's time for a realloc, you have to pay down a little more bookkeeping: items = self-ob_item - self-orphans; self-ob_item = items + self-orphans; op-orphans = 0; ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Jan 27, 2010, at 2:56 PM, Steve Howell wrote: --- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote: From: Raymond Hettinger raymond.hettin...@gmail.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Martin v. Löwis mar...@v.loewis.de Cc: Steve Howell showel...@yahoo.com, python-dev@python.org Date: Wednesday, January 27, 2010, 1:49 PM On Jan 27, 2010, at 11:38 AM, Martin v. Löwis wrote: Lists are indeed used *everywhere* and *vast* quantities, which is why optimizations on list operations are worthwhile to pursue. Unfortunately, the proposed change is a pessimisation, not an optimisation. We probably shouldn't discuss it further, at least not until a PEP gets written by its proponents. I concur (on both points). AFAICT, the performance of the proposal: * increases space requirements by a small fixed amount * s.append() performance slightly degraded. * the s.insert(0, value) case isn't helped -- still O(n) * repeated s.pop(0) have amortized O(1) but either needs to waste space indefinitely (likely not what the programmer intended by popping off values) or needs to do occasional memmoves (which isn't as good as a deque which never moves the data). * the resize performance doesn't work well with the memory allocator -- while a series of appends can often resize in-place without a need to do an O(n) memmove, but a series of pop(0) calls doesn't have a resize in-place option. What currently unknown is the effect on third-party extensions and debugging tools that access the structure directly. Also, am not sure if this affects psyco or the other implementations such as Jython which may implement lists in terms of existing Java containers. ISTM, the *only* benefit is that an occasional s.pop(0) will perform better than it does now but not as well as a deque which never has to move data). I agree with most of what's said above, with a few clarifications. The speedup is not limited to pop(0); it also considerably speeds up code like below: n = 80 for i in range(n): x = lst[:10] del lst[:10] For this code, the slicing notation is nicer than the equivalent pop one-at-a-time code for deques. The underlying performance isn't a good though -- a deque would free block as they become available and would never call a memmove. The use case is a bit weird. If people needed something like this, we would already see it in reverse order: lst.reverse() # then delete slices from the right ISTM, that there aren't actually any use cases for left popping in the absence of right appending or left inserting; otherwise you could just retrieve the slices directly (without doing any deletes). In cases where the list is growing on one end and shrinking on another, a deque wins because it frees memory readily and doesn't need memmoves. Understood that it is a contrived benchmark, and real code like that could be replaced with a technique that Nones out the used up elements and advances a start pointer. Right. Since there is a lot of discussion about tradeoffs, I would remind folks that asking somebody to use a deque instead of a list also forces tradeoffs; you lose the comfort and familiarity of lists (not to be underestimated) as well as some features (you can't spell d[:10] as d[:10] for example). If it were actually needed, there is no reason deques couldn't support slicing notation. But then, I've never had a request for it ever (and users typically haven't been shy about asking for what they want). The familiarity argument doesn't hold much water for two reasons: * deques seem to have a nearly zero learning curve. There have been no posts or reports on people being challenged by the API. The *only* issue that has ever arisen is that a fair number of people have heard of a queue but not heard of the term, deque. * changing the list implementation makes it harder to decide which data structure to use. Currently, it is simple -- if you need to append or pop on the left, use a deque; otherwise, use a list. With the proposed change, the performance trade-offs are harding to understand (you can do pop(0) but not insert(0,v) unless you've popped first but not after a resize, a deque is better is most cases but it is hard to describe real use cases with list.pop(0) would be preferable). * the current design encourages people to use the right data structure for a given need. the proposed change makes the trades-off murky and implementation dependent. you have to know a lot more in order to make the right choices. that's not good for usability. we want tools that are easy to use correctly/well. Raymond___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote: From: Raymond Hettinger raymond.hettin...@gmail.com * the current design encourages people to use the right data structure for a given need. the proposed change makes the trades-off murky and implementation dependent. Are you saying that the current slowness of list for prepops helps people to choose more appropriate data structures? Really you have to know a lot more in order to make the right choices. that's not good for usability. we want tools that are easy to use correctly/well. If you want tools that are easy to use correctly, make them bug-free and document their behavior. If you want tools that are easy to use well, then make them perform better. I am not sure how my patch contradicts either of these goals. You keep making the argument that deque is a better alternative to list in many situations. I actually agree with you. Most programming problems are best modelled by a queue. I am not sure why Python lists get all the syntax sugar and promotion over deque, when in reality, Python lists implement a pretty useless data structure. Python lists are a glorification of a C array built on top of a memory-upward-biased memory allocator. As such, they optimize list appends (good) but fail awfully on list prepops (bad). They are much better as stacks than queues, even though queues are more useful for the most common programming known to man--work through a work queue and delete tasks when they are done. It is not surprising that Python lists are starting to show their lack of versatility in 2010. They're based on 1970's technology. Python lists are really just a thin encapsulation of C arrays built on top of an asymmetrical memory manager. In 2010 you could improve Python lists by releasing from the constraints of 1970s semantics. But I am starting to think a more modern approach would be to take more useful data structures like deques and give them more sugar. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 11:30 PM, Steve Howell showel...@yahoo.com wrote: --- On Wed, 1/27/10, Raymond Hettinger raymond.hettin...@gmail.com wrote: From: Raymond Hettinger raymond.hettin...@gmail.com * the current design encourages people to use the right data structure for a given need. the proposed change makes the trades-off murky and implementation dependent. Are you saying that the current slowness of list for prepops helps people to choose more appropriate data structures? Really you have to know a lot more in order to make the right choices. that's not good for usability. we want tools that are easy to use correctly/well. If you want tools that are easy to use correctly, make them bug-free and document their behavior. If you want tools that are easy to use well, then make them perform better. I am not sure how my patch contradicts either of these goals. You keep making the argument that deque is a better alternative to list in many situations. I actually agree with you. Most programming problems are best modelled by a queue. I am not sure why Python lists get all the syntax sugar and promotion over deque, when in reality, Python lists implement a pretty useless data structure. Python lists are a glorification of a C array built on top of a memory-upward-biased memory allocator. As such, they optimize list appends (good) but fail awfully on list prepops (bad). They are much better as stacks than queues, even though queues are more useful for the most common programming known to man--work through a work queue and delete tasks when they are done. It is not surprising that Python lists are starting to show their lack of versatility in 2010. They're based on 1970's technology. Python lists are really just a thin encapsulation of C arrays built on top of an asymmetrical memory manager. In 2010 you could improve Python lists by releasing from the constraints of 1970s semantics. But I am starting to think a more modern approach would be to take more useful data structures like deques and give them more sugar. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/alex.gaynor%40gmail.com Python lists implement a pretty useless data structure It's very difficult for ideas to gain traction when they contain such useless, and obviously wrong, rhetoric. There's an enormous body of code out there that begs to disagree with this assertion. Alex -- I disapprove of what you say, but I will defend to the death your right to say it. -- Voltaire The people's good is the highest law. -- Cicero Code can always be simpler than you think, but never as simple as you want -- Me ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 8:30 PM, Steve Howell showel...@yahoo.com wrote: I am not sure why Python lists get all the syntax sugar and promotion over deque, when in reality, Python lists implement a pretty useless data structure. Python lists are a glorification of a C array built on top of a memory-upward-biased memory allocator. As such, they optimize list appends (good) but fail awfully on list prepops (bad). They are much better as stacks than queues, even though queues are more useful for the most common programming known to man--work through a work queue and delete tasks when they are done. It is not surprising that Python lists are starting to show their lack of versatility in 2010. They're based on 1970's technology. Python lists are really just a thin encapsulation of C arrays built on top of an asymmetrical memory manager. Steve, I think you might as well stop now. I see nothing useful coming out of pursuing this thread further. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Wed, 1/27/10, Alex Gaynor alex.gay...@gmail.com wrote: Python lists implement a pretty useless data structure It's very difficult for ideas to gain traction when they contain such useless, and obviously wrong, rhetoric. There's an enormous body of code out there that begs to disagree with this assertion. The statement was meant 99% tongue in cheek. Like probably 99.99% of Python programmers, I use lists all the time; that's why I want them to be more versatile. There's also a 1% element of truth to the statement. To the extent that people are arguing for alternative data structures to list, particularly deque, I wonder if there actually is some merit in discouraging the use of lists in favor of better alternatives. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Wed, Jan 27, 2010 at 8:46 PM, Steve Howell showel...@yahoo.com wrote: The statement was meant 99% tongue in cheek. Like probably 99.99% of Python programmers, I use lists all the time; that's why I want them to be more versatile. There's also a 1% element of truth to the statement. To the extent that people are arguing for alternative data structures to list, particularly deque, I wonder if there actually is some merit in discouraging the use of lists in favor of better alternatives. Steve, whatever your goal is, you're wearing out your welcome pretty quickly with your current set of tactics. Try something different. This thread is full. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 27Jan2010 23:08, Nick Coghlan ncogh...@gmail.com wrote: | Cameron Simpson wrote: | The proposed change to make pop(0) O(1) involves adding a base offset | to the internal list implementation. Doesn't that incur a (small) | overhead to _every_ list operation? Doesn't that weaken list as the | as efficient as possible implementation of choice for array-like | things? | | No, the implementation is smarter than that. The existing pointer in the | PyList_* structure will continue to point to the first element of the | list while an extra pointer is added that points to the beginning of the | allocated memory. The difference between the two memory addresses will | determine the number of orphan pointer slots that exist at the | beginning of the list. Mmmm, that is smarter. Nice. | Only operations that change the length of the list will be slowed down | at all, since they will need to take the orphans into account when | deciding whether or not to resize the allocated memory. Nice again. | The big practical concern is actually the memory cost of adding yet | another pointer (potentially 8 bytes of data) to every list object, even | empty ones. Um, yes. How big is a an empty list already? -- Cameron Simpson c...@zip.com.au DoD#743 http://www.cskk.ezoshosting.com/cs/ Who is the happier man, he who has braved the storm of life and lived, or he who has stayed securely on shore and merely existed? - Hunter S. Thompson, age seventeen ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Terry Reedy tjreedy at udel.edu writes: The idea that CPython should not be improved because it would spoil programmers strikes me as a thin, even desparate objection. One could say that same thing about the recent optimization of string += string so that repeated concats are O(n) instead of O(n*n). Note that it's not even generally true. It only works if your platform's realloc() is sufficiently smart. That's why we once have had some Windows-specific performance problems in the py3k IO module. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Michael Foord, 26.01.2010 01:14: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient I agree. Given a programmer the insight that lists are implemented as arrays in CPython is essentially saying don't use list.pop(0), it's SLOW!. So they won't use it, and even avoid it for small lists where it doesn't really matter. It basically stinks. Making list.pop(0) fast removes another edge where programmers must prematurely concentrate on implementation specifics of the interpreter they use, rather than the functionality they want to implement. I think that's an improvement to the simplicity that Python provides. It's basically saying: care about performance when you have to, but otherwise believe in the core developers to make your code fast enough in most common cases. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Stefan Behnel stefan...@behnel.de wrote: From: Stefan Behnel stefan...@behnel.de Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: python-dev@python.org Date: Tuesday, January 26, 2010, 1:27 AM Michael Foord, 26.01.2010 01:14: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient I agree. Giving a programmer the insight that lists are implemented as arrays in CPython is essentially saying don't use list.pop(0), it's SLOW!. So they won't use it, and even avoid it for small lists where it doesn't really matter. It basically stinks. Making list.pop(0) fast removes another edge where programmers must prematurely concentrate on implementation specifics of the interpreter they use, rather than the functionality they want to implement. I think that's an improvement to the simplicity that Python provides. It's basically saying: care about performance when you have to, but otherwise believe in the core developers to make your code fast enough in most common cases. Thanks! I agree with you 100%, of course. In fairness, I think my patch will only negligibly affect the performance of almost every Python program in existence. I don't see any likelihood of negative effects for real world programs, but I also concede that the positive effects will usually be drowned out by other performance bottlenecks. So I'd like to push for the patch on the grounds that it simply relieves Python programmers from worrying about a needless implementation detail. But I am also prepared to have it rejected on the simple principle of keeping the CPython implementation less complex. I've strived to make the changes as uninvasive as possible, but there is no getting around 1) increasing the size of PyListObject by an int, and 2) touching every major function in listobject.c related to memory management (w/list_resize getting the most surgery). So there are certainly tradeoffs, unless I am overlooking some more clever way to implement this. The core piece of the change that I made was to make deleting elements off the top of the list more efficient, by advancing a single pointer forward instead of moving an entire chunk of memory backward. An incidental change was then to make inserting elements at the top of the list try to reclaim empty slots. I think the former use case is valid and reasonably common (for some definition of common). The latter use case is probably a lot more rare, but I implemented the insert optimization to avoid the spiky double-memmove that the delete optimization would have otherwise introduced to prepends that succeeded pops. My current strategy for giving memory back to the OS is overly simple and needs refinement, but it basically says that the number of orphaned pointers cannot exceed the number of active elements in the list. This might sound wasteful, but the reality is that N list elements consume a lot more memory than N orphaned pointers, and the greater attractiveness of list.pop(0) would of course lead to earlier garbage collection of the popped elements in real world programs. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Steve Howell wrote: Ok, I fixed the obvious segfaults, and I am now able to pass all the tests on my debug build. A new diff is attached. Steve, I suggest creating a new issue at bugs.python.org to track your proposal rather than sending diffs to the list. Putting the patch code up on Rietveld is also a good step in helping potential reviewers (I believe there is some info on using Rietveld in the dev FAQ). The patch may still end up being rejected, but I know I at least am lot less opposed to the idea than I was when this thread started. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Nick Coghlan ncogh...@gmail.com Steve, I suggest creating a new issue at bugs.python.org to track your proposal rather than sending diffs to the list. Agreed. After sending out the second, still slightly broken diff, I realized that I probably needed to read up on the process. You can find the issue tracked here: http://bugs.python.org/issue7784 Florent Xicluna suggested applying my patch to 2.x, instead of 3.x, and was kind enough to fix up my tab/spacing issues. Putting the patch code up on Rietveld is also a good step in helping potential reviewers (I believe there is some info on using Rietveld in the dev FAQ). Ok, I will look into Rietveld tomorrow. The patch may still end up being rejected, but I know I at least am lot less opposed to the idea than I was when this thread started. Thanks for the encouragement. Regardless of the outcome, this has been a good learning experience for me. I haven't touched gdb in over a decade; it was actually kind of fun to debug some C code. As I mentioned earlier, Python has totally spoiled me for other languages, including C! ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Tue, Jan 26, 2010 at 1:17 AM, Steve Howell showel...@yahoo.com wrote: Ok, I fixed the obvious segfaults, and I am now able to pass all the tests on my debug build. A new diff is attached. Just to be clear, when you say all tests do you mean all of the list tests or the full Python test suite? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: Just to be clear, when you say all tests do you mean all of the list tests or the full Python test suite? The full suite, minus some tests that skipped on my platform. The last patch I posted was broken on test_sys.py, but I have since fixed that. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Antoine Pitrou solip...@pitrou.net wrote: From: Antoine Pitrou solip...@pitrou.net Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: python-dev@python.org Date: Tuesday, January 26, 2010, 12:50 AM Terry Reedy tjreedy at udel.edu writes: The idea that CPython should not be improved because it would spoil programmers strikes me as a thin, even desparate objection. One could say that same thing about the recent optimization of string += string so that repeated concats are O(n) instead of O(n*n). Note that it's not even generally true. It only works if your platform's realloc() is sufficiently smart. That's why we once have had some Windows-specific performance problems in the py3k IO module. One thing that has struck me in working on the O(1) patch is that almost all of my code would be unneeded if C's memory manager were just smart enough to allow you to release memory off the top of a chunk equally as efficiently as releasing it off the bottom of a chunk. It's not like memory is asymmetrical; it's just the 1980's C interface for memory management that makes us think we can only extend and shrink in one direction. The memory management paradigm that CPython builds on top of is at least a quarter century outdated, as far as I am concerned. It might even go back to the 1970s. It's not Python's fault that OSes or compilers have not modernized their approach to memory management, but I do think there will come a time, maybe 25 years from now, when CPython says enough is enough, it's time to treat memory like it's extendible and shrinkable in two directions. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On approximately 1/26/2010 1:27 AM, came the following characters from the keyboard of Stefan Behnel: Michael Foord, 26.01.2010 01:14: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient I agree. Given a programmer the insight that lists are implemented as arrays in CPython is essentially saying don't use list.pop(0), it's SLOW!. So they won't use it, and even avoid it for small lists where it doesn't really matter. It basically stinks. Making list.pop(0) fast removes another edge where programmers must prematurely concentrate on implementation specifics of the interpreter they use, rather than the functionality they want to implement. I think that's an improvement to the simplicity that Python provides. It's basically saying: care about performance when you have to, but otherwise believe in the core developers to make your code fast enough in most common cases. Stefan Being relatively new to Python, and not yet having discovered deque, I've coded a list.pop(0) just the other day. Guess I better go find that, and see if my usage should be optimized somehow. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
The patch is now on Rietveld. http://codereview.appspot.com/194083/show I wsa getting HTTP errors for certain operations, like trying to publish comments, but I am able to see the patch there. Here is the issue tracker link: http://bugs.python.org/issue7784 Here is a rough draft of a proposal to include the patch, which includes discussion of possible objections: http://wiki.python.org/moin/ProposalToSpeedUpListOperations ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Mon, Jan 25, 2010 at 23:57, Terry Reedy tjre...@udel.edu wrote: On 1/25/2010 9:32 PM, Nick Coghlan wrote: However, as Cameron pointed out, the O() value for an operation is an important characteristic of containers, and having people get used to an O(1) list.pop(0) in CPython could create problems not only for other current Python implementations but also for future versions of CPython itself. The idea that CPython should not be improved because it would spoil programmers strikes me as a thin, even desparate objection. One could say that same thing about the recent optimization of string += string so that repeated concats are O(n) instead of O(n*n). What a trap if people move code to other implementations (or older Python) without that new feature. This is a much better optimization than the string appending optimization, as it is both portable and robust. I find it shocking to change a semantic I've come to see as a core part of the language, but I can't come up with a rational reason to oppose it. The approach is sane and the performance impact is (presumably) negligible. -- Adam Olsen, aka Rhamphoryncus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote: From: Guido van Rossum gu...@python.org Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org Date: Tuesday, January 26, 2010, 11:17 AM On Tue, Jan 26, 2010 at 10:01 AM, Steve Howell showel...@yahoo.com wrote: The patch is now on Rietveld. http://codereview.appspot.com/194083/show I wsa getting HTTP errors for certain operations, like trying to publish comments, but I am able to see the patch there. Hey Steve, You seem to be using Rietveld in a slightly odd fashion which prevents the side-by-side diff and commenting feature fro working. Try downloading Rietveld's upload.py script (http://codereview.appspot.com/static/upload.py) and uploading your patch again from your SVN client, using upload.py -i 194083 (This will require the email and password you used to upload the issue in the first place.) Ok, I just ran upload.py with my patch. It shows up on Rietveld, but I am still getting 500 errors when I attempt to publish my own comments, so I am not sure what I'm doing wrong. The version that I just uploaded was my working copy, which was off of the 3.x trunk. The version that I attempted to upload earlier, via the URL feature, was slightly different--Florent had taken my changes, applied them to 2.x, cleaned up tabs/spaces, and posted his diff to the issue tracker: http://bugs.python.org/issue7784 The content of both patches is the same otherwise. I suspect the patch has lots of minor cleanup that needs to be done, but I should point out one major design decision that probably needs to be addressed first. I chose to store the number of orphans vs. storing the pointer to the originally allocated memory. There was no rhyme or reason for my decision, other than it was just how I initially got my head around the problem. The tradeoffs should be pretty obvious--there are places where it's convenient to work with the count, but it comes at the cost of pointer arithmetic in other places. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Tue, Jan 26, 2010 at 11:36 AM, Steve Howell showel...@yahoo.com wrote: --- On Tue, 1/26/10, Guido van Rossum gu...@python.org wrote: From: Guido van Rossum gu...@python.org Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: Nick Coghlan ncogh...@gmail.com, python-dev@python.org Date: Tuesday, January 26, 2010, 11:17 AM On Tue, Jan 26, 2010 at 10:01 AM, Steve Howell showel...@yahoo.com wrote: The patch is now on Rietveld. http://codereview.appspot.com/194083/show I wsa getting HTTP errors for certain operations, like trying to publish comments, but I am able to see the patch there. Hey Steve, You seem to be using Rietveld in a slightly odd fashion which prevents the side-by-side diff and commenting feature fro working. Try downloading Rietveld's upload.py script (http://codereview.appspot.com/static/upload.py) and uploading your patch again from your SVN client, using upload.py -i 194083 (This will require the email and password you used to upload the issue in the first place.) Ok, I just ran upload.py with my patch. It shows up on Rietveld, but I am still getting 500 errors when I attempt to publish my own comments, so I am not sure what I'm doing wrong. The first Patch Set is broken, and attempts to add comments to it will fail. As the issue's owner you can delete the first patch set. The version that I just uploaded was my working copy, which was off of the 3.x trunk. The version that I attempted to upload earlier, via the URL feature, was slightly different--Florent had taken my changes, applied them to 2.x, cleaned up tabs/spaces, and posted his diff to the issue tracker: http://bugs.python.org/issue7784 So here's how you can fix it: go to Edit Issue and change the Base: field to the following: http://svn.python.org/view/*checkout*/python/trunk/ Note the addition of trunk/ at the end. But it's best not to use the same Rietveld Issue for versions pertaining to different branches. The content of both patches is the same otherwise. I suspect the patch has lots of minor cleanup that needs to be done, but I should point out one major design decision that probably needs to be addressed first. I chose to store the number of orphans vs. storing the pointer to the originally allocated memory. There was no rhyme or reason for my decision, other than it was just how I initially got my head around the problem. The tradeoffs should be pretty obvious--there are places where it's convenient to work with the count, but it comes at the cost of pointer arithmetic in other places. I'd like you to revisit this design decision. It would make life harder if we ever switched to a conservative garbage collector. I feel much more comfortable having a hard pointer to the actual memory block in my hands rather than having to compute where that block starts using pointer arithmetic. Note that I'm not endorsing your patch -- I expect that the number of (real) use cases that aren't already served better or equally well by deques is very small, and I fear that the real cost of adding sizeof(ssize_t) bytes to every list object is real (unless there are 8 bytes unused given the block size rounding happening in obmalloc). I don't believe a microbenchmark proves much of anything. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Guido van Rossum gu...@python.org So here's how you can fix it: go to Edit Issue and change the Base: field to the following: http://svn.python.org/view/*checkout*/python/trunk/ I just deleted the issue altogether for now, since the preferred approach is to use a pointer, and that's gonna change the patch in lots of little places, so I think it would be wasteful to review it in its current form. I'd like you to revisit this design decision. It would make life harder if we ever switched to a conservative garbage collector. I feel much more comfortable having a hard pointer to the actual memory block in my hands rather than having to compute where that block starts using pointer arithmetic. Ok, will do. Note that I'm not endorsing your patch -- I expect that the number of (real) use cases that aren't already served better or equally well by deques is very small, and I fear that the real cost of adding sizeof(ssize_t) bytes to every list object is real (unless there are 8 bytes unused given the block size rounding happening in obmalloc). I don't believe a microbenchmark proves much of anything. Fair enough. It seems to me that the goal of keeping lists super-compact from a memory standpoint is contradicted by the code in list_resize that optimistically preallocates extra memory on appends. I'm not arguing against the tradeoff there, as I think that optimization has a way more compelling CPU gain for most applications than the optimizations I'm proposing. What I'm really saying is that there is some precedent to waste a little memory to make CPython lists run faster. My gut says that the trend of the future is more CPU scarcity than memory scarcity, but that's just a WAG. Just more food for thought--doing the memory management within listobject.c is really a workaround to the limitations of the underlying memory manager. It seems conceptually possible to me to design a memory manager that allows you to shrink and extend memory chunks on both sides, without any extra memory overhead for bookkeeping. I'm sure the devils in the details, of course. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Tue, Jan 26, 2010 at 12:46 PM, Steve Howell showel...@yahoo.com wrote: It seems to me that the goal of keeping lists super-compact from a memory standpoint is contradicted by the code in list_resize that optimistically preallocates extra memory on appends. Ah, but that applies for *large* lists. Adding 8 bytes to each list object applies to *all* lists. I betcha that many programs use many tiny lists. If I had a program that was growing and shrinking a list at both ends, I'd consider a deque. If I had a program that was growing and shrinking a list at the front, I'd consider maintaining the data backwards. Extensive use of pop(0) (and insert(0, x) I suppose) is only defensible if you also need to access the data by index and negative indexes are not an option. Think about that. I'm not arguing against the tradeoff there, as I think that optimization has a way more compelling CPU gain for most applications than the optimizations I'm proposing. What I'm really saying is that there is some precedent to waste a little memory to make CPython lists run faster. My gut says that the trend of the future is more CPU scarcity than memory scarcity, but that's just a WAG. I'm no expert in this area, but considering the cost of cache misses, I'm not so sure about that. Just more food for thought--doing the memory management within listobject.c is really a workaround to the limitations of the underlying memory manager. It seems conceptually possible to me to design a memory manager that allows you to shrink and extend memory chunks on both sides, without any extra memory overhead for bookkeeping. I'm sure the devils in the details, of course. Maybe, but that's a project of a different magnitude altogether. For better or for worse, the assumption is widely made that blocks are represented by a pointer to the start plus a length. And considering zero to have special properties has pretty good mathematical backup of you ask me. :-) -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Adam Olsen wrote: This is a much better optimization than the string appending optimization, as it is both portable and robust. I find it shocking to change a semantic I've come to see as a core part of the language, but I can't come up with a rational reason to oppose it. The approach is sane and the performance impact is (presumably) negligible. Couldn't have said it better myself... Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 1/26/2010 7:51 AM, Steve Howell wrote: From: Nick Coghlanncogh...@gmail.com Steve, I suggest creating a new issue at bugs.python.org to track your proposal rather than sending diffs to the list. I was about to suggest the same thing. Even if your final patch is not currently accepted, it will remain on the tracker indefinitely, where interested parties can potentially use it in private customized versions that they compile themselves. http://bugs.python.org/issue7784 Technical discussion of your patch should mostly migrate to the tracker, as with other issues. The issue will appear on both the new tracker issues list and the weekly (Friday) tracker summary, so interested persons can comment or just insert themselves on the nosy list. Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Guido van Rossum gu...@python.org Steve Howell showel...@yahoo.com wrote: It seems to me that the goal of keeping lists super-compact from a memory standpoint is contradicted by the code in list_resize that optimistically preallocates extra memory on appends. Ah, but that applies for *large* lists. Adding 8 bytes to each list object applies to *all* lists. I betcha that many programs use many tiny lists. Even most tiny-ish lists are wasting memory, though, according to this sequence: 4, 8, 16, 25, ... If I had a program that was growing and shrinking a list at both ends, I'd consider a deque. I'd use a deque on current Python, but I'd use list with the proposed patch. :) If I had a program that was growing and shrinking a list at the front, I'd consider maintaining the data backwards. Extensive use of pop(0) (and insert(0, x) I suppose) is only defensible if you also need to access the data by index and negative indexes are not an option. Think about that. Maintaining the data backwards creates complications if you are popping off small chunks of the list and then subsequently operating on them. A parser would be a good example. If you reverse the original list, then grab the first four lines, for example, and then want to concatenate the first four lines in the original order, you will need to reverse them again. Having to reverse a list to change its performance just seems backward to me. [...] My gut says that the trend of the future is more CPU scarcity than memory scarcity, but that's just a WAG. I'm no expert in this area, but considering the cost of cache misses, I'm not so sure about that. Ok, good point. Just more food for thought--doing the memory management within listobject.c is really a workaround to the limitations of the underlying memory manager. It seems conceptually possible to me to design a memory manager that allows you to shrink and extend memory chunks on both sides, without any extra memory overhead for bookkeeping. I'm sure the devils in the details, of course. I asked Doug Lea about this, and he just replied to me that the devil is indeed in the details--it would be hard to make memory managers do what I'm proposing in an efficient way. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Tue, Jan 26, 2010 at 3:32 PM, Steve Howell showel...@yahoo.com wrote: Even most tiny-ish lists are wasting memory, though, according to this sequence: 4, 8, 16, 25, ... Several operations will allocate a list of exactly the right size, wasting no memory. In particular, a fixed-size list will always be exactly the right size. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Ah, but that applies for *large* lists. Adding 8 bytes to each list object applies to *all* lists. I betcha that many programs use many tiny lists. Even most tiny-ish lists are wasting memory, though, according to this sequence: 4, 8, 16, 25, ... That is only is they are being grown with list.append(). Otherwise, list sizes are exact. For example, [0,1,2] uses space for just three entries and [] for none. I get the impression that 1) you've already made up your mind and are ignoring input from the other developers, 2) that you're ignoring the python-dev discussions of long ago where this very idea was rejected and deques came in to being instead, 3) over-estimating the prevalence of use cases for pop(0), and 4) under-estimating the impact on small lists, 5) under-estimating the impact on psyco or other implementations of Python, 6) under-estimating the impact on third-party extensions and debugging tools. Deques already provide a solution to the FIFO problem and they do so without huge wastes of memory or calls to memmove(). They handle both insertion and deletion from the front and back. In comparison, the proposed changes to lists seem like a complete hack. Just because it can be done, doesn't mean that it should. ISTM that you're attacking an already solved problem and that you're playing fast and loose with a core data structure. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Benjamin Peterson wrote: 2010/1/25 Steve Howell showel...@yahoo.com: I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. How about just using a deque? ... or a stack: http://www.egenix.com/products/python/mxBase/mxStack/ ... or a queue: http://www.egenix.com/products/python/mxBase/mxQueue/ Specialized implementations usually give the best performance - of course, it all depends on what you're trying to achieve. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 26 2010) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 26Jan2010 01:57, Terry Reedy tjre...@udel.edu wrote: | On 1/25/2010 9:32 PM, Nick Coghlan wrote: | However, as Cameron pointed out, the O() value for an operation is an | important characteristic of containers, and having people get used to an | O(1) list.pop(0) in CPython could create problems not only for other | current Python implementations but also for future versions of CPython | itself. | | The idea that CPython should not be improved because it would spoil | programmers strikes me as a thin, even desparate objection. Hey, I even used the word thin to describe my concern! My point was that I look on python builtins like list and dict as highly optimised, highly efficient facilities. That means that I expect a list to be very very much like a linear array as one would find in C-like languages, with realloc() managment behind the scenes to handle the resizing requirements on append/extend. It also means that the O() cost of operations should be pretty obvious. pop(0) obviously involves shuffling everything down one slot. This means that the python programmer should expect that the common array type operations such as __getitem__ should be as blazingly fast as possible, and to expect that barring unusual programmer requirements, if I'm thinking about arrays I should almost always reach for a python list. The proposed change to make pop(0) O(1) involves adding a base offset to the internal list implementation. Doesn't that incur a (small) overhead to _every_ list operation? Doesn't that weaken list as the as efficient as possible implementation of choice for array-like things? | One | could say that same thing about the recent optimization of string += | string so that repeated concats are O(n) instead of O(n*n). What a | trap if people move code to other implementations (or older Python) | without that new feature. | | Of course, the whole purpose of adding a jit to CPython would be to | spoil us. | | It is a fact already that optimizing CPython code is specific to a | particular interpreter, system, and workload. Yes, optimisation is nice. Higher seed is nice. But there are optimisations that simple fix inefficiencies or provide a special case for a common operation, and there are those with side effects elsewhere in the implementation (i.e. the base offset this pop(0) opt would incur, which adds a (small) cost to everything). The other aspect, mentioned elsewhere in this thread, is that programmers should know the O() cost of these operations. Cheers, -- Cameron Simpson c...@zip.com.au DoD#743 http://www.cskk.ezoshosting.com/cs/ I am a kind of burr; I shall stick. - William Shakespeare, _Measure for Measure_ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] patch to make list.pop(0) work in O(1) time
I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. The patch will probably be rejected, but I would like to try it anyway as an exercise in digging into the CPython source, and working through the process. My goal is to accompany the proposed patch with a PEP (which I also expect to be initially rejected, but which will hopefully be a useful contribution in terms of documenting the decision.) The reason I am posting here is that there appears to be some history behind similar patches that I am not aware of, so if anybody can refer me to earlier patches, PEPS, discussion threads, etc., I would be grateful. I am aware of PEP 3128, which has similar goals to what I'm trying to achieve, but there are also some differences. The blist implementation described in PEP 3128 achieves the goal of reducing time complexity for some operations, but necessarily at the expense of other operations, notably list access. The patch that I am considering would not affect time complexity for any other operations, nor memory complexity, but it would, of course, have marginal costs in certain operations, notably any operation that eventually calls list_resize(). I am confident that the patch would not impact performance of list accesses at all. The memory cost for the list itself would be an additional pointer or integer, which I think should be considered negligible compared to the cost of the list itself [O(N)] and the elements in the list [O(N)]. I haven't completely worked out the best strategy to eventually release the memory taken up by the pointers of the unreleased elements, but the worst case scenario is that the unused memory only gets wasted until the time that the list itself gets garbage collected. I think I can do better than that, at some cost of complicating list_resize. From a memory standpoint, the benefits of encouraging deleting items from the front of the list should outweigh any disadvantages with respect to lazily releasing memory from the pointers at the front of the list, since in deleting elements, you allow the elements themselves to be garbage collected earlier, as well as objects that might be referenced by those elements. My goal would be to target the patch at 3.x, and if I was lucky enough to get it accepted, I think it could eventually be backported to 2.x. The proposal does not affect the definition of the language itself, of course; it merely attempts to improve the performance of the CPython implementation. The instructions that I have found for setting up your development environment seemed to be targeted at 2.x trunk, which is fine with me. I will attempt the patch off the 2.x trunk to get an initial feel for the issues involved, unless somebody counsels me to work off 3.x right from the start. http://www.python.org/dev/setup/ I have not been able to locate the source code for 3.x. Is the implementation of list more or less the same there? There is a long thread entitled list.pop(0) vs. collections.dequeue on comp.lang.python that addresses alternatives to list.pop(0), but none of them really fit my use case. Here is a sketch of the PEP that I would propose: Proposal: Improve list's implementation so that deleting elements from the front of the list does not require an O(N) memmove operation. Rationale: Some Python programs that process lists have multiple methods that consume the first element of the list and pop it off. The pattern comes up with parsers in particular, but there are other examples. It is possible now, of course, to use a data structure in Python that has O(1) for deleting off the top of the list, but none of the alternatives fully replicate the benefits of list itself. Specification: Improving CPython's performance does not affect the language itself, so there are no bikeshed arguments to be had with respect to syntax, etc. Any patch would, of course, affect the performance of nearly every Python program in existence, so any patch would have to, at a bare minimum: 1) Not increase the time or memory complexity of any other list operation. 2) Not affect list access at all. 3) Minimally affect list operations that mutate the list. 4) Be reasonably simple within CPython itself. 5) Not be grossly wasteful of memory. Backwards Compatibility: See above. An implementation of this PEP would not change the definition of the language in any way, but it would have to minimally impact the performance of lists for the normal use cases. Implementation: There are two ways to make deleting the first item of the list run more efficiently. The most ambitious proposal is to fix the memory manager itself to allow the release of memory from the start of the chunk. The advantage of this proposal is that it would simplify the changes to list itself, and possibly have collateral
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote: I haven't completely worked out the best strategy to eventually release the memory taken up by the pointers of the unreleased elements, but the worst case scenario is that the unused memory only gets wasted until the time that the list itself gets garbage collected. FWIW, for a long-running FIFO queue, it's critical to release some of the memory along the way, otherwise the amount of wasted memory is unbounded. Good luck :) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Jan 25, 2010, at 11:22 AM, Steve Howell wrote: I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. +1 on doing whatever experiments you feel like doing -1 on putting something like this in the core 1) To many things in the Python world rely on the current implementation of lists. It's not worth breaking third-party extensions, tools like psyco, work on unladen swallow, and other implementations of Python such as PyPy and Jython. 2). The use of lists pervades the language and it doesn't make sense to have the language as a whole pay a price (in terms of speed and space) for every list that gets created. The corner case isn't worth it. 3). We already got one. The collections.deque() class was introduced specifically to handle inserting and popping from the front of a list efficiently. 4). In the comp.lang.python thread on this subject, you've gotten nearly zero support for your position and have managed to offend many of the developers. Raymond___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
2010/1/25 Steve Howell showel...@yahoo.com: I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. How about just using a deque? -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: FWIW, for a long-running FIFO queue, it's critical to release some of the memory along the way, otherwise the amount of wasted memory is unbounded. Somebody implementing a long-running FIFO queue should actually be using deque instead of list, but I agree with your greater point that waiting until the list gets garbage-collected to release memory would probably be unacceptable, so I'll see what I can do. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Raymond Hettinger raymond.hettin...@gmail.com On Jan 25, 2010, at 11:22 AM, Steve Howell wrote: I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. +1 on doing whatever experiments you feel like doing-1 on putting something like this in the core 1) To many things in the Python world rely on the current implementation of lists. It's not worth breaking third-party extensions, tools like psyco, work on unladen swallow, and other implementations of Python such as PyPy and Jython. I don't understand how changing the implementation of CPython would impact PyPy and Jython, unless you are just referring to the fact that CPython is treated as a reference implementation, so its simplicity is a virtue for other ports. Am I missing something else? 2). The use of lists pervades the language and it doesn't make sense to have the language as a whole pay a price (in terms of speed and space) for every list that gets created. The corner case isn't worth it. I understand the tradeoff. 3). We already got one. The collections.deque() class was introduced specifically to handle inserting and popping from the front of a list efficiently. I understand that deque solves some problems that list does not. Obviously, it allows you to delete elements off the front in O(1) time, but it also has other advantages, such as allowing you to rotate elements efficiently. It's designed, I am guessing, for FIFO queues, and it's a perfectly good data structure, just not one that is well suited for all use cases. 4). In the comp.lang.python thread on this subject, you've gotten nearly zero support for your position and have managed to offend many of the developers. Fair enough. I think the idea is at least worthy of a PEP that puts forward the strongest case possible. Terry Jan Reedy wrote: ''' I am not opposed to a possible change, just hasty, ill-informed criticism. If there is not a PEP on this issue, it would be good to have one that recorded the proposal and the pros and cons, regardless of the outcome, so there would be something to refer people to. If that had been already done, it would have shortened this thread considerably. ''' Do you agree that it is at least worthwhile to write a PEP here? It could be fairly short and quickly rejected, and down the road, if more people get behind it, it could always be strengthened and revisited. There seems to be at least some precedence for PEPs that only pertain to internal implementation details, such as this one: http://www.python.org/dev/peps/pep-0267/ (Raymond, apologies for my double reply...the first one was meant to go to the list, not directly to you.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Benjamin Peterson benja...@python.org wrote: 2010/1/25 Steve Howell showel...@yahoo.com: I am interested in creating a patch to make deleting elements from the front of Python list work in O(1) time by advancing the ob_item pointer. How about just using a deque? Deque does not support all the operations that list does. It is also roughly twice as slow for accessing elements (I've measured it). lst = ['foo', 'bar', 'baz'] lst[1:] ['bar', 'baz'] lst.insert(0, 'spam') lst ['spam', 'foo', 'bar', 'baz'] d = deque(lst) d[2:] Traceback (most recent call last): File stdin, line 1, in module TypeError: sequence index must be integer, not 'slice' d.insert(0, 'eggs') Traceback (most recent call last): File stdin, line 1, in module AttributeError: 'collections.deque' object has no attribute 'insert' ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Jan 25, 2010, at 12:36 PM, Steve Howell wrote: Deque does not support all the operations that list does. It is also roughly twice as slow for accessing elements (I've measured it). ISTM that apps that want to insert or pop from the front of list are also apps that don't care about accessing arbitrary elements in the middle using the position index. When lists are growing or shrinking from the front, the meaning of the i-th element changes. So, it doesn't make sense for an application to track indices of objects in such a list. i = s.find('abc') s.pop(0) print s[i]# i no longer points at 'abc' Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
From: Raymond Hettinger raymond.hettin...@gmail.com On Jan 25, 2010, at 12:36 PM, Steve Howell wrote: Deque does not support all the operations that list does. It is also roughly twice as slow for accessing elements (I've measured it). ISTM that apps that want to insert or pop from the front of list are also apps that don't care about accessing arbitrary elements in the middle using the position index. When lists are growing or shrinking from the front, the meaning of the i-th element changes. So, it doesn't make sense for an application to track indices of objects in such a list. i = s.find('abc') s.pop(0) print s[i]# i no longer points at 'abc' I am not going to directly address your point, but I'd like to give a examples of code that uses pop(0) from the standard library. If you look at the code for multiprocessing/connection.py, you will see that PipeListener creates _handle_queue as an ordinary Python list, and in line 317 it uses pop(0) to pop the first handle off the top of the queue. Why does that code not use a deque? In hindsight, it probably should. But to make the change now, it's not a simple matter of fixing just PipeListener, because PipeListener passes off _handle_queue to Finalize, which also expects a list. In order to understand why Finalize expects a list, you need to look at how it uses args, and here is one example usage: res = self._callback(*self._args, **self._kwargs) Ok, so now you need to know what self._callback is doing, so now you have to trace through all callers of Finalize are passing in for their args. So what seems like a trivial matter--switching over a list to a deque--actually requires a lot of thinking. It turns out that the callback for PipeListener just iterates through the remaining handles and closes them. So a deque would not break that code. If you look at difflib.py, it also does pop(0) in a loop. Why doesn't it use a deque? Simplicity, maybe? codecs.py also deletes from the top of the list: del self.linebuffer[0] ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
2010/1/25 Steve Howell showel...@yahoo.com: From: Raymond Hettinger raymond.hettin...@gmail.com On Jan 25, 2010, at 12:36 PM, Steve Howell wrote: Deque does not support all the operations that list does. It is also roughly twice as slow for accessing elements (I've measured it). ISTM that apps that want to insert or pop from the front of list are also apps that don't care about accessing arbitrary elements in the middle using the position index. When lists are growing or shrinking from the front, the meaning of the i-th element changes. So, it doesn't make sense for an application to track indices of objects in such a list. i = s.find('abc') s.pop(0) print s[i] # i no longer points at 'abc' I am not going to directly address your point, but I'd like to give a examples of code that uses pop(0) from the standard library. If you look at the code for multiprocessing/connection.py, you will see that PipeListener creates _handle_queue as an ordinary Python list, and in line 317 it uses pop(0) to pop the first handle off the top of the queue. Why does that code not use a deque? In hindsight, it probably should. But to make the change now, it's not a simple matter of fixing just PipeListener, because PipeListener passes off _handle_queue to Finalize, which also expects a list. In order to understand why Finalize expects a list, you need to look at how it uses args, and here is one example usage: res = self._callback(*self._args, **self._kwargs) Ok, so now you need to know what self._callback is doing, so now you have to trace through all callers of Finalize are passing in for their args. So what seems like a trivial matter--switching over a list to a deque--actually requires a lot of thinking. It turns out that the callback for PipeListener just iterates through the remaining handles and closes them. So a deque would not break that code. If you look at difflib.py, it also does pop(0) in a loop. Why doesn't it use a deque? Simplicity, maybe? codecs.py also deletes from the top of the list: del self.linebuffer[0] Yes, but in either of these cases is there an excellent performance improvement to be gained and is it worth the complexity of your optimization? I think not. -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Mon, Jan 25, 2010 at 11:32 AM, Daniel Stutzbach dan...@stutzbachenterprises.com wrote: On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote: I haven't completely worked out the best strategy to eventually release the memory taken up by the pointers of the unreleased elements, but the worst case scenario is that the unused memory only gets wasted until the time that the list itself gets garbage collected. FWIW, for a long-running FIFO queue, it's critical to release some of the memory along the way, otherwise the amount of wasted memory is unbounded. Good luck :) It seems to me that the best way to do this is invert .append() logic: leave at most X amount of wasted space at the beginning of the list, where X is a constant fraction of the list size. Whether it is worth adding a extra pointer to the data stored by a list is another story. -Mike ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Mike Klaas mike.kl...@gmail.com wrote: From: Mike Klaas mike.kl...@gmail.com On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote: I haven't completely worked out the best strategy to eventually release the memory taken up by the pointers of the unreleased elements, but the worst case scenario is that the unused memory only gets wasted until the time that the list itself gets garbage collected. FWIW, for a long-running FIFO queue, it's critical to release some of the memory along the way, otherwise the amount of wasted memory is unbounded. Good luck :) It seems to me that the best way to do this is invert .append() logic: leave at most X amount of wasted space at the beginning of the list, where X is a constant fraction of the list size. That's roughly what I intend to do. The problems are not entirely symmetric. For appends, the wasted space exists in the sense that CPython optimistically get extra chunks of memory for future appends, to save CPU at the possible cost of needlessly allocating memory. For pops, the bargain would be that you optimistically defer releasing memory to save CPU cycles in the hope that memory is not scarce. Of course, if you have just popped an element off the list, then you have just made memory less scarce by virtue of removing the list elements themselves. Whether it is worth adding a extra pointer to the data stored by a list is another story. That is definitely a point of contention. It would certainly bloat a program that had millions of empty lists. I think in most real-world programs, though, the amount of memory taken by PyListObjects will always be greatly exceeded by the amount of memory used by the list elements, or even just the pointers to the list elements. It's the difference between the number of elements in a list, O(N), and the number of structures that define a list's state, O(1). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Benjamin Peterson wrote: Yes, but in either of these cases is there an excellent performance improvement to be gained and is it worth the complexity of your optimization? I think not. Me, too. I already tried to explain Steve that I have used list.pop(0) in very few cases during my seven years as a professional Python developer. Since I knew that popping from the beginning of a list is slower than popping from the end or just leaving the list unmodified I found ways to alter my algorithms. The few cases left were either not performance critical or used dequeue instead. I vote -0.5 on the change unless Guido, Tim or Raymond think that the size and complication impact is worth the hassle. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 26/01/2010 00:12, Christian Heimes wrote: Benjamin Peterson wrote: Yes, but in either of these cases is there an excellent performance improvement to be gained and is it worth the complexity of your optimization? I think not. Me, too. I already tried to explain Steve that I have used list.pop(0) in very few cases during my seven years as a professional Python developer. Since I knew that popping from the beginning of a list is slower than popping from the end or just leaving the list unmodified I found ways to alter my algorithms. The few cases left were either not performance critical or used dequeue instead. I vote -0.5 on the change unless Guido, Tim or Raymond think that the size and complication impact is worth the hassle. How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient (so the fact that you don't use it isn't evidence that it isn't wanted - merely evidence that you had to work around the known inefficiency). Michael Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Benjamin Peterson benja...@python.org wrote: From: Benjamin Peterson benja...@python.org Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Steve Howell showel...@yahoo.com Cc: python-dev@python.org Date: Monday, January 25, 2010, 3:15 PM 2010/1/25 Steve Howell showel...@yahoo.com: From: Raymond Hettinger raymond.hettin...@gmail.com On Jan 25, 2010, at 12:36 PM, Steve Howell wrote: Deque does not support all the operations that list does. It is also roughly twice as slow for accessing elements (I've measured it). ISTM that apps that want to insert or pop from the front of list are also apps that don't care about accessing arbitrary elements in the middle using the position index. When lists are growing or shrinking from the front, the meaning of the i-th element changes. So, it doesn't make sense for an application to track indices of objects in such a list. i = s.find('abc') s.pop(0) print s[i] # i no longer points at 'abc' I am not going to directly address your point, but I'd like to give a examples of code that uses pop(0) from the standard library. If you look at the code for multiprocessing/connection.py, you will see that PipeListener creates _handle_queue as an ordinary Python list, and in line 317 it uses pop(0) to pop the first handle off the top of the queue. Why does that code not use a deque? In hindsight, it probably should. But to make the change now, it's not a simple matter of fixing just PipeListener, because PipeListener passes off _handle_queue to Finalize, which also expects a list. In order to understand why Finalize expects a list, you need to look at how it uses args, and here is one example usage: res = self._callback(*self._args, **self._kwargs) Ok, so now you need to know what self._callback is doing, so now you have to trace through all callers of Finalize are passing in for their args. So what seems like a trivial matter--switching over a list to a deque--actually requires a lot of thinking. It turns out that the callback for PipeListener just iterates through the remaining handles and closes them. So a deque would not break that code. If you look at difflib.py, it also does pop(0) in a loop. Why doesn't it use a deque? Simplicity, maybe? codecs.py also deletes from the top of the list: del self.linebuffer[0] Yes, but in either of these cases is there an excellent performance improvement to be gained and is it worth the complexity of your optimization? I think not. I am starting to think that the optimization would be drowned out by the cost of processing each line, unless you had some combination of the following: 1) a pretty large list (plausible) 2) a very inexpensive operation that you were applying to each line (rare) 3) a really slow memmove implementation (extremely doubtful) The complexity of the optimization does not phase me for some reason. If ruthless simplicity were the only goal, then I'd also simplify/remove some of this code: /* Bypass realloc() when a previous overallocation is large enough to accommodate the newsize. If the newsize falls lower than half the allocated size, then proceed with the realloc() to shrink the list. */ if (allocated = newsize newsize = (allocated 1)) { assert(self-ob_item != NULL || newsize == 0); Py_SIZE(self) = newsize; return 0; } /* This over-allocates proportional to the list size, making room * for additional growth. The over-allocation is mild, but is * enough to give linear-time amortized behavior over a long * sequence of appends() in the presence of a poorly-performing * system realloc(). * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... */ new_allocated = (newsize 3) + (newsize 9 ? 3 : 6); In my mind, though, the complexity within CPython does not have to leak up to the Python level. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Michael Foord wrote: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient (so the fact that you don't use it isn't evidence that it isn't wanted - merely evidence that you had to work around the known inefficiency). The implementation must be changed in at least four places: * The PyListObject struct gets an additional pointer that stores a reference to the head. I would keep the head (element 0) of the list in **ob_item and the reference to the malloc()ed array in a new pointer *ob_allocated. * PyList_New() stores the pointer to the allocated memory in op-ob_allocated and sets op-ob_item = op-ob_allocated * listpop() moves the op-ob_item pointer by one for the special case of pop(0) * list_resize() should occasionally compact the free space before the head with memcpy() if it gets too large. listinsert() could be optimized for 0 if the list has some free space in front of the header, too. I favor this approach over an integer offset because doesn't change the semantic of ob_item. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 26/01/2010 00:28, Christian Heimes wrote: Michael Foord wrote: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient (so the fact that you don't use it isn't evidence that it isn't wanted - merely evidence that you had to work around the known inefficiency). The implementation must be changed in at least four places: * The PyListObject struct gets an additional pointer that stores a reference to the head. I would keep the head (element 0) of the list in **ob_item and the reference to the malloc()ed array in a new pointer *ob_allocated. * PyList_New() stores the pointer to the allocated memory in op-ob_allocated and sets op-ob_item = op-ob_allocated * listpop() moves the op-ob_item pointer by one for the special case of pop(0) * list_resize() should occasionally compact the free space before the head with memcpy() if it gets too large. listinsert() could be optimized for 0 if the list has some free space in front of the header, too. I favor this approach over an integer offset because doesn't change the semantic of ob_item. Christian Well, on the face of it this doesn't sound like a huge increase in complexity. Not that I'm qualified to judge. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Mon, Jan 25, 2010 at 7:32 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: On 26/01/2010 00:28, Christian Heimes wrote: Michael Foord wrote: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient (so the fact that you don't use it isn't evidence that it isn't wanted - merely evidence that you had to work around the known inefficiency). The implementation must be changed in at least four places: * The PyListObject struct gets an additional pointer that stores a reference to the head. I would keep the head (element 0) of the list in **ob_item and the reference to the malloc()ed array in a new pointer *ob_allocated. * PyList_New() stores the pointer to the allocated memory in op-ob_allocated and sets op-ob_item = op-ob_allocated * listpop() moves the op-ob_item pointer by one for the special case of pop(0) * list_resize() should occasionally compact the free space before the head with memcpy() if it gets too large. listinsert() could be optimized for 0 if the list has some free space in front of the header, too. I favor this approach over an integer offset because doesn't change the semantic of ob_item. Christian Well, on the face of it this doesn't sound like a huge increase in complexity. Not that I'm qualified to judge. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/alex.gaynor%40gmail.com Does anyone know if any other language's automatic array (or whatever they call it) special case the pop(0) case like this? Alex -- I disapprove of what you say, but I will defend to the death your right to say it. -- Voltaire The people's good is the highest law. -- Cicero Code can always be simpler than you think, but never as simple as you want -- Me ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Christian Heimes li...@cheimes.de wrote: From: Christian Heimes li...@cheimes.de Michael Foord wrote: How great is the complication? Making list.pop(0) efficient sounds like a worthy goal, particularly given that the reason you don't use it is because you *know* it is inefficient (so the fact that you don't use it isn't evidence that it isn't wanted - merely evidence that you had to work around the known inefficiency). The implementation must be changed in at least four places: * The PyListObject struct gets an additional pointer that stores a reference to the head. I would keep the head (element 0) of the list in **ob_item and the reference to the malloc()ed array in a new pointer *ob_allocated. * PyList_New() stores the pointer to the allocated memory in op-ob_allocated and sets op-ob_item = op-ob_allocated * listpop() moves the op-ob_item pointer by one for the special case of pop(0) * list_resize() should occasionally compact the free space before the head with memcpy() if it gets too large. listinsert() could be optimized for 0 if the list has some free space in front of the header, too. I favor this approach over an integer offset because doesn't change the semantic of ob_item. The approach that Christian outlines is exactly what I intend to accomplish, even if the patch does get permanently or temporarily rejected. I am pretty confident about what needs to be done within list_ass_slice, including the listinsert() optimization. I also see where I need to add the new pointer (ob_allocated seems like a good name to me) within the PyListObject struct. Still wrestling with the other details, though. My C is pretty rusty, and of course I have the extreme versatility of Python to blame for that! :) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 25Jan2010 12:34, Steve Howell showel...@yahoo.com wrote: | From: Raymond Hettinger raymond.hettin...@gmail.com | 1) To many things in the Python world rely on | the current implementation of lists. It's not | worth breaking third-party extensions, tools like psyco, | work on unladen swallow, and other implementations of Python | such as PyPy and Jython. | | I don't understand how changing the implementation of CPython would | impact PyPy and Jython, unless you are just referring to the fact that | CPython is treated as a reference implementation, so its simplicity is | a virtue for other ports. Am I missing something else? I can think of something: lists traditionally have O(n) pop(0) performance because they're normally quite simple layers on top of an array. (And likewise for pop(1), which your approach won't help; I know you could measure the list and decide pop(1) is a small copy of the left of the list along with a move of the base offset, but that degrades as you move from 0 and 1 to larger indices). Supposing pop(0) becomes cheap in CPython and this becomes well known (or worse, documented:-) Someone depending on this now has code that is fundamentally inefficient on other Pythons. I know this is a slightly thin objection, since your change can probably be taken to the other implementations. -- Cameron Simpson c...@zip.com.au DoD#743 http://www.cskk.ezoshosting.com/cs/ But pessimism IS realism! - D.L.Bahr ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
Michael Foord wrote: On 26/01/2010 00:28, Christian Heimes wrote: I favor this approach over an integer offset because doesn't change the semantic of ob_item. Well, on the face of it this doesn't sound like a huge increase in complexity. Not that I'm qualified to judge. Christian's approach is a good one for minimising the semantic changes, and compared to an offset based approach actually has a decent chance of working without breaking too much C code (you'd have to change the way sys.getsizeof() worked for lists, but I can't think of anything else off the top of my head that would definitely break). The potential of resizing and hence relocation of the storage buffer means it was already unsafe to cache ob_item when changing the size of the list, so code should generally be unaware that ob_item can now change even when the buffer isn't reallocated. However, as Cameron pointed out, the O() value for an operation is an important characteristic of containers, and having people get used to an O(1) list.pop(0) in CPython could create problems not only for other current Python implementations but also for future versions of CPython itself. This idea changes list from a simple concept (ob_item points to the beginning of an allocated array which may change length) to a more complicated deque-like one (ob_item points somewhere near the beginning of an allocated array which may change length). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Nick Coghlan ncogh...@gmail.com wrote: From: Nick Coghlan ncogh...@gmail.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Michael Foord fuzzy...@voidspace.org.uk Cc: Christian Heimes li...@cheimes.de, python-dev@python.org Date: Monday, January 25, 2010, 6:32 PM Michael Foord wrote: On 26/01/2010 00:28, Christian Heimes wrote: I favor this approach over an integer offset because doesn't change the semantic of ob_item. Well, on the face of it this doesn't sound like a huge increase in complexity. Not that I'm qualified to judge. Christian's approach is a good one for minimising the semantic changes, and compared to an offset based approach actually has a decent chance of working without breaking too much C code (you'd have to change the way sys.getsizeof() worked for lists, but I can't think of anything else off the top of my head that would definitely break). As I'm diving into this, it is clear that you want to preserve the semantics of ob_item and ob_size, as they are used in a whole bunch of places. For now I am tracking a var called orphans, which subtly changes one invariant: 0 = ob_size + orphans = allocated I think Christian covered most of the places that would need to change, and list_dealloc would also need to change. However, as Cameron pointed out, the O() value for an operation is an important characteristic of containers, and having people get used to an O(1) list.pop(0) in CPython could create problems not only for other current Python implementations but also for future versions of CPython itself. I hadn't thought of that. Here are the objections that I've heard or thought of myself: * The simplicity of the current implementation is important beyond the normal benefits of simplicity, since it is also a reference implementation for other ports of Python. * People who got used to O(1) in one version of Python might have unpleasant surprises when they went to other versions. * Alternatives to list already exist, such as deque and blist * An O(1) solution would increase the size of PyListObject. * An O(1) solution would postpone the release of the memory from the orphaned pointers. * An O(1) solution would slow down calls to list_resize, PyList_new, and list_dealloc. * For small and medium sized lists, memmove()'s penalty is usually drowned out by other operations on the list elements. * The use case of popping elements off a large list is not that common (although this might be somewhat driven by the documented slowness) * There may be third party code that relies on the current internal implementation Did I leave anything out? Here are the benefits of an O(1) implementation. * O(1) is faster than O(N) for some, presumably quite small, value of N. * Performance benefits tend to compound. If you have P processes doing pop(0) in a loop on an N-element list, you are saving P * N memmoves of size kN. * The technique required to make O(1) work is simple at its core--advance a pointer forward instead of moving the memory backward. * Encouraging the use of pop(0) will lead to leaner Python programs that release memory earlier in the process. * While not super common, there do exist programs today that pop from the top, either using pop itself or del, including programs in the standard library * The language moratorium provides a good window for performance improvements in general (even if this one does not pass the litmus test for other reasons) Did I leave anything out? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
I made enough of a patch to at least get a preliminary benchmark. The program toward the bottom of this email runs over 100 times faster with my patch. The patch still has a ways to go--I use a very primitive scheme to reclaim orphan pointers (1000 at a time) and I am still segfaulting when removing the last element of the list. But the initial results at least confirm that the intended benefit is achievable. I've attached the diff, in case anyone wants to try it out or help me figure out what else needs to change. The core piece of the patch is this--everything else is memory management related. +if (ilow == 0) { +a-orphans += 1; +a-ob_item += (-1 * d); +} +else { +memmove(item[ihigh+d], item[ihigh], +(Py_SIZE(a) - ihigh)*sizeof(PyObject *)); +} import time n = 8 lst = [] for i in range(n): lst.append(i) t = time.time() for i in range(n-1): del lst[0] print('time = ' + str(time.time() - t)) print(len(lst)) print('got here at least') show...@showell-laptop:~/PYTHON/py3k$ cat BEFORE 0 2.52699589729 show...@showell-laptop:~/PYTHON/py3k$ cat AFTER time = 0.0216660499573 1 got here at least Python 3.2a0 (py3k:77751M, Jan 25 2010, 20:25:21) [GCC 4.3.3] on linux2 Type help, copyright, credits or license for more information. 2.526996 / 0.021666 116.63417335918028 DIFF Description: Binary data ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On 1/25/2010 9:32 PM, Nick Coghlan wrote: However, as Cameron pointed out, the O() value for an operation is an important characteristic of containers, and having people get used to an O(1) list.pop(0) in CPython could create problems not only for other current Python implementations but also for future versions of CPython itself. The idea that CPython should not be improved because it would spoil programmers strikes me as a thin, even desparate objection. One could say that same thing about the recent optimization of string += string so that repeated concats are O(n) instead of O(n*n). What a trap if people move code to other implementations (or older Python) without that new feature. Of course, the whole purpose of adding a jit to CPython would be to spoil us. It is a fact already that optimizing CPython code is specific to a particular interpreter, system, and workload. Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
--- On Mon, 1/25/10, Steve Howell showel...@yahoo.com wrote: From: Steve Howell showel...@yahoo.com Subject: Re: [Python-Dev] patch to make list.pop(0) work in O(1) time To: Michael Foord fuzzy...@voidspace.org.uk, Nick Coghlan ncogh...@gmail.com Cc: Christian Heimes li...@cheimes.de, python-dev@python.org Date: Monday, January 25, 2010, 8:33 PM I made enough of a patch to at least get a preliminary benchmark. The program toward the bottom of this email runs over 100 times faster with my patch. The patch still has a ways to go--I use a very primitive scheme to reclaim orphan pointers (1000 at a time) and I am still segfaulting when removing the last element of the list. But the initial results at least confirm that the intended benefit is achievable. Ok, I fixed the obvious segfaults, and I am now able to pass all the tests on my debug build. A new diff is attached. There is still at least one bug in my code in listextend, which the tests do not seem to expose, so I will try to at least beef up the test suite a bit. I really like listobject.c. Very clean code, very easy to understand. I guess I shouldn't be surprised. DIFF Description: Binary data ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com