On 4/25/07, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> In practice, the code for BList is somewhat complex, and its desirability
> and performance in actual applications is unproven.
>
> Fortunately, Py2.6 is still a long way off.  My recommendation is that you
> release it right away as a third-party module and let the hordes of
> Pythonistas test it in battle.  With the code base thoroughly exercised
> and some positive user feedback, it would be hard to say no to this going
> into Py2.6.
Will do.  What's the best way to release a module to make it easy for
people to find, download, and install?  (a pointer to a relevant document
would be great)

> How does BList's memory consumption compare with the current list
> over-allocation scheme?
>
> Does it take longer to instantiate an empty list?  What are the
> malloc/free patterns?

See these earlier messages in the thread:

http://mail.python.org/pipermail/python-3000/2007-April/006801.html
http://mail.python.org/pipermail/python-3000/2007-April/006804.html
http://mail.python.org/pipermail/python-3000/2007-April/006809.html

The short summary is that BList and the current list have very similar
best-case and worst-case memory consumption for a single list.  (If
copy-on-write comes into play, BList is much more memory efficient than a
regular list.)

Malloc/free operations occur when the list is large and growing or
shrinking (approximately once for every 64 elements added or removed).
They occur more often than for list(), but if you're doing a lot of
growing and shrinking, the O(log n) vs. O(n) performance boost easily pays
for the extra memory operations.  All the blocks are the same size.  The
BList code keeps a small list of recently freed nodes to recycle, which
speeds things up in certain common cases where additions and removals are
interleaved.

> > In other words, for short lists, a BList works just like Python's
> > array-based list() type.  Thus, it has the same good performance on
> > small lists.
>
> While this is a good implementation idea (similar ideas are used to
> speed up quicksort, for instance), it makes the first listed advantage
> seem like a straw man designed to mislead proposal reviewers into
> thinking that the blist structure is more efficient than it actually is.
> Perhaps rewrite it as: "To keep the performance acceptable for short
> lists, btreelists fall back to the existing implementation."
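To make the node-recycling idea above concrete, here is a minimal
pure-Python sketch of a bounded free list.  The names (`Node`,
`alloc_node`, `free_node`, `MAX_FREE`) are my own illustration, not taken
from the actual BList C source; the real implementation differs in detail.
The key point is that because every node is the same size, a freed node
can be handed back out directly instead of going through malloc again:

```python
MAX_FREE = 16  # assumed cap on the number of cached nodes

class Node:
    __slots__ = ('children', 'num_children', 'leaf')
    def __init__(self):
        self.children = [None] * 128   # fixed-size child array
        self.num_children = 0
        self.leaf = True

_free_nodes = []

def alloc_node():
    # Reuse a recently freed node when one is available.
    if _free_nodes:
        return _free_nodes.pop()
    return Node()

def free_node(node):
    # Keep only a bounded cache so memory is eventually returned.
    if len(_free_nodes) < MAX_FREE:
        node.num_children = 0
        _free_nodes.append(node)
```

Capping the cache keeps the win for interleaved add/remove patterns
without holding memory indefinitely.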
I fear that you have misunderstood me, as the BList does not fall back on
the existing implementation.  I'm sorry that you may have gotten the
impression that I was attempting to mislead.  I admit openly that the
performance for small lists is not due to any fancy data structure.
However, it's not array-like as a special case just for small lists,
either.

Each BList node contains an array of up to 128 children.  If the BList
node is a leaf, these children are pointers to items stored by the user;
otherwise, they are pointers to other BList nodes.  If the list contains
128 or fewer items, the BList can have one node which is both the root and
a leaf.  This node just has an array of the user's items, so it ends up
having the same performance as Python's array-based list.

It is true that you end up with array operations when you have a small
list and as the base case for many operations implemented via recursion.
In some sense that is similar to a quicksort that uses insertion sort for
n < threshold.  Unlike quicksort, I couldn't rewrite it *without* array
operations as the base case, and array operations also occur in the middle
of the tree.  They're fundamental, not a performance tweak.  So I don't
think the quicksort-with-insertion-sort analogy holds.

How about if I include a brief description up-front of how it works, like:
"The BList internally uses a tree structure where each node contains an
array of up to 128 children."  Then the first performance bullet can be
something like "just as fast as a Python list() when the list is small,
since BLists are also based on arrays"?

> FWIW, I prefer that the current list implementation remain in place.
> Because what we have is simple, it is not hard to maintain, its
> performance characteristics are easily understood, and we have a
> straight-forward C-API that allows module writers to have fast,
> intuitive direct access to the structure.
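The tree-of-arrays structure described above can be sketched in a few
lines of pure Python.  This is only an illustration of the idea, with
hypothetical names (`BListNode`, `getitem`); it is not the prototype
shipped in the .tar.gz.  Note how the base case of the descent is a plain
array index, exactly as described:

```python
LIMIT = 128  # maximum children per node

class BListNode:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.children = []  # up to LIMIT user items (leaf) or child nodes
        self.n = 0          # total number of user items in this subtree

def getitem(node, i):
    """Descend the tree to item i: O(log n), with a base of 128."""
    if node.leaf:
        # Base case: a plain array operation, just like list().
        return node.children[i]
    for child in node.children:
        if i < child.n:
            return getitem(child, i)
        i -= child.n
    raise IndexError(i)
```

When the whole list fits in one root-and-leaf node, `getitem` is a single
array access, which is why small BLists match list() performance without
any fallback code path.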
> OTOH, if your btreelist goes into Py2.6's collections module and users
> flock to it instead of regular lists, then there may be some basis for
> further discussion.

This sounds like a good plan to me, though how does that square with
Guido's desire to avoid a proliferation of data types?

http://mail.python.org/pipermail/python-3000/2007-April/006774.html

> BTW, please offer up a pure Python version (like we do for the heapq
> module).  That will make it easier for the curious to find out how the
> magic is done.  And, it will make life easier for Jython, IronPython, and
> PyPy implementers so they won't have to do any extra work to support
> Py2.6.

Will do.

> > There are only a few use-cases (that I can think of) where Python's
> > list() regularly outperforms the BList.  These are:
> >
> > 1. A large LIFO stack, where there are many .append() and .pop(-1)
> > operations.  These are O(1) for a Python list, but O(log n) for the
> > BList().
>
> This is a somewhat important use-case (we devote two methods to it).
>
> > 2. A large, unchanging list, where there are many getitem() calls, and
> > none of the inserts or deletes where the BList() shines.  Again, these
> > are O(1) for a Python list, but O(log n) for the BList(), and the
> > BList() is just as fast if the list is small.
>
> This is also a critical use-case.

I don't disagree.  I think the key question is this: Given that there's no
performance difference for small lists, would you rather have a type that,
for large lists:

1) Gives you O(1) speed for some critical use-cases, but O(n) for others,
or
2) Gives you O(log n) across the board?

Put another way, for a 10,000-element list, how would you feel about
taking a small performance hit (50% increase in execution time for LIFO;
5% increase in execution time for getitem) in exchange for large
performance improvements elsewhere (< 1% of the old execution time for
getslice, setslice, FIFO, etc.)?
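A back-of-the-envelope calculation (my own illustration, not from the
discussion above) shows why the O(log n) penalty stays small in practice:
with 128 children per node, the logarithm's base is 128, so even the
10,000-element example is only about two levels deep:

```python
import math

n = 10_000
fanout = 128  # children per BList node

# Height of a tree with this fanout holding n items: ceil(log_128(n)).
height = math.ceil(math.log(n, fanout))
print(height)  # a 10,000-element BList is only about 2 levels deep
```

So the "log n" in O(log n) grows very slowly: the tree would need more
than 128**3 (about 2 million) elements before reaching four levels.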
I obviously favor #2, but I acknowledge that this is a subjective question
that depends on which use-cases you care most about.  Your incremental
deployment plan (extension module, possibly in collections for 2.6, go
from there) may be a good way to evaluate how the community as a whole
feels.

> You've come a long way since then.  Congrats.
>
> Thanks for your contribution and thoughtful discussion.  This weekend, I
> look forward to reading your source code in detail.

Thanks!  I've just added my Python prototype implementation to the
.tar.gz.  It has a lengthy text section at the top that describes the
structure of the BList and the invariants that it must maintain.

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises LLC

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com