[issue27275] KeyError thrown by optimised collections.OrderedDict.popitem()

Josh Rosenberg Tue, 25 Oct 2016 07:32:11 -0700

Josh Rosenberg added the comment:

Explaining expiringdict's issue: It's two race conditions, with itself, not 
another thread.


Example scenario (narrow race window):

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true)
3. Because contains returned True, at time X PyObject_GetItem is run, but 
because it's time X, expiringdict's __getitem__ deletes the entry and raises 
KeyError

An alternative scenario with a *huge* race window is:

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. No lookups or membership tests or any other operations that implicitly clean 
the expiring dict occur for a while
3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the 
OrderedDict without invoking the __contains__ machinery that would delete the 
entry
3. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 
(false), because the __contains__ machinery is invoked, and again, because no 
default is provided for popitem, a KeyError is raised (this time by the popitem 
machinery, not __getitem__)

expiringdict is unusually good at bringing this on itself. The failing popitem 
call is in __setitem__ for limited length expiringdicts, 
self.popitem(last=False), where they're intentionally removing the oldest 
entry, when the oldest entry is the most likely to have expired (and since 
__len__ isn't overridden to expire old entries, it may have been expired for 
quite a while).

The del self[next(OrderedDict.__iter__(self))] works because they didn't 
override __iter__, so it's not expiring anything to get the first item, and 
therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: 
This is also why the bug they reference has an issue with "OrderedDict mutated 
during iteration"; iteration returns long expired keys, but looking the expired 
keys up deletes them, causing the mutation issue).


Possible correct fixes:

1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, 
instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of 
`next(reversed(self))` and `next(iter(self))` respectively. This won't fix 
expiringdict as is (because it's broken by design; it doesn't override 
__iter__, so it will happily return long expired keys that disappear on 
access), but if we're going for subclass friendliness and allowing 
non-idempotent __contains__/__getitem__/__iter__ implementations, it's the 
right thing to do. If expiringdict implemented __iter__ to copy the keys, then 
loop over the copy, deleting expired values and yielding unexpired values, this 
would at least reduce the huge race window to a narrow window (because it could 
still yield a key that's almost expired)
2. Check for the presence of __missing__ on the type and only perform the 
PySequence_Contains check if __missing__ is defined (to avoid 
autovivification). This fixes the narrow race condition for subclasses without 
__missing__, but not the huge race condition
3. Explicitly document the idempotence assumptions made by OrderedDict 
(specifically, that all non-mutating methods of OrderedDict must not be made 
mutating in subclasses unless the caller also overrides all multistep 
operations, e.g. pop/popitem/setdefault).

TL;DR: expiringdict is doing terrible things, assuming the superclass will 
handle them even though the superclass has completely different assumptions, 
and therefore expiringdict has only itself to blame.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue27275>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue27275] KeyError thrown by optimised collections.OrderedDict.popitem()

Reply via email to