What, exactly, is frozen? My understanding is that one problem with frozen dicts in the past is deciding exactly what is mutable and what is immutable. Can you change what object a key maps to so long as the set of keys stay the same? Can you modify the contents of mutable object that is a value?
On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <marco.sulla.pyt...@gmail.com> wrote: > Let me first say that the code and discussion, as per title, is also about > possible performance improvements to the dict base type. > > TL;DR I implemented a frozendict using CPython 3.9 code. It seems that an > immutable dict *could* be faster than dict in some cases. Furthermore, some > optimization could be applied to dict too. > > Long explaining: > > Since now I have some time, I decided to experiment a little with the > creation of an immutable dict in Python. > > Unluckily, I started this experiment many months ago, so the CPython code > I used is old. Maybe some or all of my considerations are outdated. > > Initially, I wrote a quick and dirty implementation: > > https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334 > > The code was very simple, and the performance was identical to dict. So in > theory, adding a frozendict to CPython is not hard. But there's more. > > After the first implementation, I started to try to improve the > performance of frozendict. The result of the improvements are here: > > > https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt > > For benchmarks, I used simply timeit, with autorange and repeat and, as > suggested in the module documentation, I got the minimum of the results. > Here is the code: > > https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py > > I have not tested with an optimized build, since optimization data is > collected using the unit tests, and I didn't write tests for frozendict in > the official CPython test framework. > > The tests and benchmarks were done using CPython 3.9a0. CPU and other pc > resources were not restricted using pyperf or similar tools, to see the > "real" speed. CPython was compiled using gcc and g++. > > In benchmarks, I compared methods and operators using dict and frozendict. > The benchmark uses dicts with all integers and dicts with all strings. > Furthermore, I tested dicts with size 8 (the most optimized size in > CPython) and 1000 elements (maybe too much, but I wanted to see how they > perform with a high number of elements). > > Every benchmark has a line in the output. The Name describes the > benchmarked method, operator or code snippet. Size is the size of the dict, > 8 or 1000. Keys are the keys type, str or int. Type is the dictionary type, > dict or frozendict. > > In Name, the "o" represents the object itself (dict or frozendict). "d", > in benchmark with dict, is "o"; in benchmarks with frozendict is an > equivalent instance of type dict. > > Some consideration: > > 1. frozendict is very fast, as expected, at copying. But it's also faster > at creation, using a (not frozen) dict, kwargs or a sequence2. Speedups > range from 20% to 45%. > 2. frozendict is also a bit faster when you iterate over it, especially > over values, where is ~15% faster > 3. hash seems really fast, but this is because it's cached the first time > hash() is invoked > 4. where frozendict is slower is when you unpickle it and with fromkeys > and repr. This is because I wrote a very naif implementation of these > methods, without optimizing them. The other methods have a comparable speed. > > Here is the code: > https://github.com/Marco-Sulla/cpython > > Here is the diff between the CPython code and my commits: > https://github.com/python/cpython/compare/master...Marco-Sulla:master > > About code > > I coded the implementation doing a simple copy/paste of the existing dict > functions, modifying their code and renaming them. This way I'm sure dict > continues to work as before, and I can compare the speed gain. > > Some of the optimization I adopted can be implemented in `dict` too. For > example, instead of creating an empty dict and resizing it, I create it > with the "maximum" size and I fill it. It seems to work, even if I did not > explore the possibility that a mutable object can change while a frozendict > creation is in progress. > > Some problems with optimizing dict and maintaining a frozendict: > > 1. duplication of code. To gain a significant boost, I had to copy and > paste a lot of code. Functions can be remerged again, but maybe the speedup > will be reduced. > 2. split vs combined dicts. As I wrote, split dicts seem to be faster in > reading than combined dicts. For example, iterating over values is faster > with a split dict, as expected. > But writing was not tested; furthermore, some of the optimizations can be > adopted for dicts too, so the convenience of a split table can be lowered. > dict continues to maintain both split and combined tables, so this could > be not a problem. But the code could be less and more fast if only a table > layout is supported > 3. the CPython code I used is old, so some of the improvements I adopted > could be already implemented > > About frozendict > > Apart the considerations done in the [PEP 416]( > https://www.python.org/dev/peps/pep-0416/), that was rejected since there > was little gain from its implementation, I think that frozendict can be > useful as a substitute of MappingProxyType, that is really slow. > MappingProxyType is not much used, but it's present in critical parts of > CPython code, for example in _sre. We have to see if a mapping proxy type > *can* be substituted with an immutable map in some critical part of CPython > code. > > Furthermore, frozendicts could be used for implementing "immutable" > classes and modules, and can be used as a faster dict if its content does > not change. > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GAJHZEBOU5BGYJAUNQP5V66KSS6HHSQD/ Code of Conduct: http://python.org/psf/codeofconduct/