On Mon, 03 Sep 2012 13:04:23 -0700, Aaron Brady wrote: > On Monday, September 3, 2012 2:30:24 PM UTC-5, Ian wrote: >> On Sun, Sep 2, 2012 at 11:43 AM, Aaron Brady <castiro...@gmail.com> >> wrote: >> >> > We could use a Python long object for the version index to prevent >> > overflow. Combined with P. Rubin's idea to count the number of open >> > iterators, most use cases still wouldn't exceed a single word >> > comparison; we could reset the counter when there weren't any. >> >> We could use a Python long; I just don't think the extra overhead is >> justified in a data structure that is already highly optimized for >> speed. Incrementing and testing a C int is *much* faster than doing >> the same with a Python long. > > I think the technique would require two python longs and a bool in the > set, and a python long in the iterator. > > One long counts the number of existing (open) iterators. Another counts > the version. The bool keeps track of whether an iterator has been > created since the last modification, in which case the next modification > requires incrementing the version and resetting the flag.
I think that is over-engineered and could be the difference between having the patch accepted and having it rejected. After all, many people will argue that the existing solution to the problem is good enough. Dicts are extremely common, and your patch increases both the memory usage of every dict, and the overhead of every write operation (__setitem__, __delitem__, update). Only a very few dicts will actually need this overhead, for the rest it is waste. It is important to keep that waste to a minimum or risk having the patch rejected. An unsigned C int can count up to 4,294,967,295. I propose that you say that is enough iterators for anyone, and use a single, simple, version counter in the dict and the iterator. If somebody exceeds that many iterators to a single dict or set, and the version field overflows by exactly 2**32 versions, the results are no worse than they are now. You won't be introducing any new bugs. Complicating the solution is, in my opinion, unnecessary. Why should every set and dict carry the cost of incrementing TWO Python longs and a flag when just a single C int is sufficient for all realistic use-cases? The opportunity for failure is extremely narrow: - I must have an iterator over a dict or set; - and I must have paused the iteration in some way; - and then I must create exactly 2**32 other iterators to the same dict or set; - and at some point modify the dict or set - and then restart the first iterator at which point some items returned by the iterator *may* be duplicated or skipped (depends on the nature of the modifications). -- Steven -- http://mail.python.org/mailman/listinfo/python-list