Josh Rosenberg added the comment: Explaining expiringdict's issue: It's two race conditions, with itself, not another thread.
Example scenario (narrow race window): 1. An entry has an expiry time of X (so it will self-delete at time X or later) 2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true) 3. Because contains returned True, at time X PyObject_GetItem is run, but because it's time X, expiringdict's __getitem__ deletes the entry and raises KeyError An alternative scenario with a *huge* race window is: 1. An entry has an expiry time of X (so it will self-delete at time X or later) 2. No lookups or membership tests or any other operations that implicitly clean the expiring dict occur for a while 3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the OrderedDict without invoking the __contains__ machinery that would delete the entry 3. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 (false), because the __contains__ machinery is invoked, and again, because no default is provided for popitem, a KeyError is raised (this time by the popitem machinery, not __getitem__) expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while). The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue). Possible correct fixes: 1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of `next(reversed(self))` and `next(iter(self))` respectively. This won't fix expiringdict as is (because it's broken by design; it doesn't override __iter__, so it will happily return long expired keys that disappear on access), but if we're going for subclass friendliness and allowing non-idempotent __contains__/__getitem__/__iter__ implementations, it's the right thing to do. If expiringdict implemented __iter__ to copy the keys, then loop over the copy, deleting expired values and yielding unexpired values, this would at least reduce the huge race window to a narrow window (because it could still yield a key that's almost expired) 2. Check for the presence of __missing__ on the type and only perform the PySequence_Contains check if __missing__ is defined (to avoid autovivification). This fixes the narrow race condition for subclasses without __missing__, but not the huge race condition 3. Explicitly document the idempotence assumptions made by OrderedDict (specifically, that all non-mutating methods of OrderedDict must not be made mutating in subclasses unless the caller also overrides all multistep operations, e.g. pop/popitem/setdefault). TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue27275> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com