Noam Raphael <[EMAIL PROTECTED]> wrote:

> On 12/29/05, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> > Noam Raphael wrote:
> >
> > > I'm not saying that practically it must be used - I'm just saying that
> > > it can't be called a heuristic, and that it doesn't involve any "fancy
> > > overkill size hinting or history tracking". It actually means
> > > something like this:
> > > 1. If you want to insert and the table is full, resize the table to
> > > twice the current size.
> > > 2. If you delete and the number of elements turns out to be less than
> > > a quarter of the size of the table, resize the table to half of the
> > > current size.
> >
> > sure sounds like a heuristic algorithm to me... (as in "not guaranteed to
> > be optimal under all circumstances, even if it's probably quite good in all
> > practical cases")
>
> I'm not saying it's optimal, but it is really amortized O(1) per
> insert/delete. I looked it up in "Introduction to Algorithms", which
> gives a complicated proof. A simple explanation is that after every
> resize the table is exactly half-full. Say it has n elements and the
> table size is 2*n. To reach the next resize, you have to do at least
> n/2 removals or n insertions. Only after that do you pay for another
> resize. In either case, you do an O(n) resize operation only after at
> least O(n) insertions/removals, each of which is O(1). So the total
> cost stays O(n) per n simple operations, which is O(1) amortized per
> operation.
>
> I hope that if you read this slowly it makes sense...
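To make the amortized argument above concrete, here is a small, self-contained
Python sketch of the grow-at-full / shrink-at-quarter policy. The ToyTable
class and its resize_work counter are invented purely for illustration; this
is not CPython's dict implementation, it only counts how much copying the
policy would require.

    class ToyTable:
        MIN_SIZE = 8

        def __init__(self):
            self.size = self.MIN_SIZE   # allocated slots
            self.items = []             # occupied slots
            self.resize_work = 0        # elements a real resize would copy

        def insert(self, item):
            if len(self.items) == self.size:         # full -> double the table
                self.size *= 2
                self.resize_work += len(self.items)  # resize copies every element
            self.items.append(item)

        def delete(self):
            self.items.pop()
            if self.size > self.MIN_SIZE and len(self.items) < self.size // 4:
                self.size //= 2                       # below quarter-full -> halve it
                self.resize_work += len(self.items)

    n = 100000
    t = ToyTable()
    for i in range(n):
        t.insert(i)
    # well under 2.0: the O(n) resizes amortize to O(1) per insert
    print(t.resize_work / float(n))

For n inserts the counted work stays below 2*n, which is exactly the "O(n)
total work per n operations" claim in the quoted text.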
This is understood by (I would expect) most people here: hash tables are,
both theoretically and practically, average-case O(1), as you state, but
theoretically worst-case O(n), as Martin states.

For resizing, a quick-and-dirty rule of thumb is that if you are
overallocating by a factor of f(n), the amount of work you will be
performing per insertion is ~cn/f(n) (1 <= c <= 2). As per recent
discussions on the lists, Python chooses f(n) to be n/8 (at least in the
list case). This means each insertion costs ~8 memory copies if/when the
list is resized up, but practical experience has shown that this choice
also tends to minimize memory usage as the list grows.

It's all about tradeoffs. Increasing general memory usage for the sake of
a lower constant (or not) is a tradeoff, as is resizing (or not resizing)
as a list gets smaller.

Would changing the overallocation strategy change Python's performance?
Likely, though possibly not noticeably. Would it change Python's memory
usage? Yes: the vast majority of list uses would see larger allocations
than the list allocator currently performs.

Want to test it? Create a micro-benchmark which tests repeated append
performance under the two list allocation strategies, and remember to note
Python's memory usage after each test.

 - Josiah
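A sketch of the measurement half of such a micro-benchmark might look like
the following (assuming sys.getsizeof, available only in newer Pythons,
which reports the list header plus its overallocated pointer array).
Actually comparing two allocation strategies would mean rebuilding the
interpreter with a modified Objects/listobject.c, which this sketch does
not attempt.

    import sys
    import time

    def append_bench(n):
        # time n repeated appends and report how much the list overallocated
        start = time.time()
        lst = []
        for i in range(n):
            lst.append(i)
        elapsed = time.time() - start
        return elapsed, sys.getsizeof(lst)

    for n in (10**4, 10**5, 10**6):
        elapsed, size = append_bench(n)
        print("n=%d  %.4f sec  %d bytes (%.2f bytes per item)"
              % (n, elapsed, size, size / float(n)))

The bytes-per-item column shows how much slack the current growth strategy
leaves; a doubling strategy would be expected to push that number up while
shaving a little off the append time.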