What, exactly, is frozen?  My understanding is that one problem with frozen
dicts in the past has been deciding exactly what is mutable and what is
immutable.  Can you change what object a key maps to so long as the set of
keys stays the same?  Can you modify the contents of a mutable object that is
a value?
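
To make the question concrete, here is a minimal sketch that uses the
existing types.MappingProxyType as a stand-in for a frozen mapping.  That
stand-in is my assumption for illustration only; the proposed frozendict may
well choose different semantics, and the variable names are hypothetical.

```python
from types import MappingProxyType

# A read-only view stands in for a "frozen" mapping here (an assumption,
# not the proposed frozendict itself).
frozen = MappingProxyType({"a": [1, 2], "b": [3]})

# Rebinding a key to a different object is rejected...
try:
    frozen["a"] = [9]
    rebinding_allowed = True
except TypeError:
    rebinding_allowed = False

# ...but a mutable value can still be changed in place.
frozen["a"].append(99)
contents_changed = frozen["a"] == [1, 2, 99]

# With an unhashable value, a hash built from the items cannot be computed.
try:
    hash(tuple(frozen.items()))
    hashable = True
except TypeError:
    hashable = False
```

So in this sketch only the key-to-object bindings are frozen; the objects
themselves are not.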

On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <marco.sulla.pyt...@gmail.com>
wrote:

> Let me first say that the code and discussion, as per title, is also about
> possible performance improvements to the dict base type.
>
> TL;DR I implemented a frozendict using CPython 3.9 code. It seems that an
> immutable dict *could* be faster than dict in some cases. Furthermore, some
> optimization could be applied to dict too.
>
> Long explanation:
>
> Since now I have some time, I decided to experiment a little with the
> creation of an immutable dict in Python.
>
> Unfortunately, I started this experiment many months ago, so the CPython
> code I used is old. Maybe some or all of my considerations are outdated.
>
> Initially, I wrote a quick and dirty implementation:
>
> https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334
>
> The code was very simple, and the performance was identical to dict. So in
> theory, adding a frozendict to CPython is not hard. But there's more.
>
> After the first implementation, I started trying to improve the
> performance of frozendict. The results of the improvements are here:
>
>
> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt
>
> For benchmarks, I simply used timeit, with autorange and repeat and, as
> suggested in the module documentation, I took the minimum of the results.
> Here is the code:
>
> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
>
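
For readers who do not want to open the link, the measurement approach
described above can be sketched roughly as follows.  This is not the linked
bench.py; the statement being timed, the dict contents, and the repeat count
are assumptions chosen for illustration.

```python
import timeit

# Hypothetical 8-element dict with int keys, mirroring one benchmark size.
d = {i: i for i in range(8)}

timer = timeit.Timer("d.copy()", globals={"d": d})

# autorange() picks a loop count so that one run takes at least 0.2 seconds.
number, _ = timer.autorange()

# Repeat the measurement and keep the minimum, as the timeit docs suggest.
times = timer.repeat(repeat=5, number=number)
best_per_call = min(times) / number
```

The minimum is used because higher values mostly reflect interference from
other processes rather than the speed of the code itself.
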
> I have not tested with an optimized build, since optimization data is
> collected using the unit tests, and I didn't write tests for frozendict in
> the official CPython test framework.
>
> The tests and benchmarks were done using CPython 3.9a0. CPU and other pc
> resources were not restricted using pyperf or similar tools, to see the
> "real" speed. CPython was compiled using gcc and g++.
>
> In benchmarks, I compared methods and operators using dict and frozendict.
> The benchmark uses dicts with all integers and dicts with all strings.
> Furthermore, I tested dicts with size 8 (the most optimized size in
> CPython) and 1000 elements (maybe too much, but I wanted to see how they
> perform with a high number of elements).
>
> Every benchmark has a line in the output. The Name describes the
> benchmarked method, operator or code snippet. Size is the size of the dict,
> 8 or 1000. Keys is the type of the keys, str or int. Type is the dictionary
> type, dict or frozendict.
>
> In Name, the "o" represents the object itself (dict or frozendict). In
> benchmarks with dict, "d" is "o"; in benchmarks with frozendict, it is an
> equivalent instance of type dict.
>
> Some considerations:
>
> 1. frozendict is very fast, as expected, at copying. But it's also faster
> at creation, using a (not frozen) dict, kwargs or a sequence2. Speedups
> range from 20% to 45%.
> 2. frozendict is also a bit faster when you iterate over it, especially
> over values, where it is ~15% faster.
> 3. hash seems really fast, but this is because it's cached the first time
> hash() is invoked.
> 4. frozendict is slower when you unpickle it, and with fromkeys and repr.
> This is because I wrote a very naive implementation of these methods,
> without optimizing them. The other methods have a comparable speed.
>
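
The hash caching mentioned in point 3 can be illustrated with a small
pure-Python sketch.  The class name and the choice of hashing a frozenset of
the items are assumptions of mine; the C implementation in the linked
repository may compute and store the hash differently.

```python
from collections.abc import Mapping


class FrozenDictSketch(Mapping):
    """Hypothetical pure-Python stand-in; not the C implementation."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # computed lazily on the first hash() call

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        if self._hash is None:
            # Order-independent; raises TypeError if a value is unhashable.
            self._hash = hash(frozenset(self._data.items()))
        return self._hash


fd = FrozenDictSketch(a=1, b=2)
first = hash(fd)   # computes and stores the hash
second = hash(fd)  # returns the stored value
```

Any benchmark that calls hash() repeatedly on the same object therefore
measures mostly the cached path, which is why the numbers look so good.
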
> Here is the code:
> https://github.com/Marco-Sulla/cpython
>
> Here is the diff between the CPython code and my commits:
> https://github.com/python/cpython/compare/master...Marco-Sulla:master
>
> About code
>
> I coded the implementation doing a simple copy/paste of the existing dict
> functions, modifying their code and renaming them. This way I'm sure dict
> continues to work as before, and I can compare the speed gain.
>
> Some of the optimizations I adopted can be implemented in `dict` too. For
> example, instead of creating an empty dict and resizing it, I create it
> with the "maximum" size and I fill it. It seems to work, even if I did not
> explore the possibility that a mutable object can change while a frozendict
> creation is in progress.
>
> Some problems with optimizing dict and maintaining a frozendict:
>
> 1. duplication of code. To gain a significant boost, I had to copy and
> paste a lot of code. Functions can be merged again, but maybe the speedup
> will be reduced.
> 2. split vs combined dicts. As I wrote, split dicts seem to be faster at
> reading than combined dicts. For example, iterating over values is faster
> with a split dict, as expected.
> But writing was not tested; furthermore, some of the optimizations can be
> adopted for dicts too, so the advantage of a split table could shrink.
> dict continues to maintain both split and combined tables, so this might
> not be a problem. But the code could be smaller and faster if only one
> table layout were supported.
> 3. the CPython code I used is old, so some of the improvements I adopted
> could already be implemented.
>
> About frozendict
>
> Apart from the considerations made in [PEP 416](
> https://www.python.org/dev/peps/pep-0416/), which was rejected since there
> was little gain from its implementation, I think that frozendict can be
> useful as a substitute for MappingProxyType, which is really slow.
> MappingProxyType is not much used, but it's present in critical parts of
> CPython code, for example in _sre. We have to see if a mapping proxy type
> *can* be substituted with an immutable map in some critical part of CPython
> code.
>
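
Anyone who wants to check the MappingProxyType claim on their own machine
could start from a rough sketch like this one.  The dict contents, the key
being looked up, and the loop counts are assumptions; I make no claim about
what the numbers will be on a given build.

```python
import timeit
from types import MappingProxyType

# Hypothetical 8-element dict with str keys, and a read-only proxy over it.
d = {str(i): i for i in range(8)}
proxy = MappingProxyType(d)


def best(stmt, namespace, number=100_000):
    """Return the best per-call time for stmt, in seconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=5, number=number)) / number


dict_lookup = best("m['3']", {"m": d})
proxy_lookup = best("m['3']", {"m": proxy})
```

Comparing dict_lookup with proxy_lookup gives a first impression of the
lookup overhead only; it says nothing about creation or iteration.
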
> Furthermore, frozendicts could be used for implementing "immutable"
> classes and modules, and can be used as a faster dict if its content does
> not change.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/
> Code of Conduct: http://python.org/psf/codeofconduct/
>