New submission from Ben Spiller <[email protected]>:
In performance-critical python code, it's quite common to need to get an item
from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't
yet exist. The obvious way to do this based on the documentation is to call the
dict.get() method:
> python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
5000000 loops, best of 5: 74.6 nsec per loop
... however the performance of natural approach is very poor (2.2 times
slower!) compared to the time really needed to look up the value:
>python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop
There are various ways to do this more efficiently, but they all have
significant performance or functional drawbacks:
custom dict subclass with __missing__() override: promising approach, but use
of a custom class instead of dict seems to increase [] cost significantly:
>python -m timeit -s "class mydict(dict):" -s " def
>__missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']"
5000000 loops, best of 5: 60.4 nsec per loop
get() with caching of function lookup - somewhat better but not great:
>python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
5000000 loops, best of 5: 59.8 nsec per loop
[] and "in" (inevitably a bit slow due to needing to do the lookup twice):
>python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None"
5000000 loops, best of 5: 53.9 nsec per loop
try/except approach: quickest solution if it exists, but clunky syntax, and
VERY slow if it doesn't exist
>python -m timeit -s "d={'abc':123}" "try:" " x=d['abc']" "except KeyError:
>pass"
5000000 loops, best of 5: 41.8 nsec per loop
>python -m timeit -s "d={'abc':123}" "try:" " x=d['XXX']" "except KeyError:
>pass"
1000000 loops, best of 5: 174 nsec per loop
collections.defaultdict: reasonable performance if key exists, but unwanted
behaviour of adding the key if missing (which if used with an unbounded
universe of keys could produce a memory leak):
>python -m timeit -s "import collections; d=collections.defaultdict(lambda:
>None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop
I bet we can do better!
Lots of solutions are possible - maybe some kind of peephole optimization to
make dict.get() itself perform similarly to the [] operator, or if that's
challenging perhaps providing a class or option that behaves like defaultdict
but without the auto-adding behaviour and with comparable [] performance to the
"dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even
some kind of new syntax (e.g. dict['key', default=None]). Which option would be
easiest/nicest?
----------
components: Interpreter Core
messages: 353206
nosy: benspiller
priority: normal
severity: normal
status: open
title: Need a more efficient way to perform dict.get(key, default)
type: enhancement
versions: Python 3.7
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue38278>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com