[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

Marco Sulla Tue, 21 Jul 2020 22:34:16 -0700

Yes. Values can be mutable, as tuple. A "pure" immutable dict is a
frozendict with only immutable values (strings, tuples of numbers etc). In
this case you also have a hash.


frozendict apart, I think some of my "tricks" could be applied to dict, if
the implementation sounds and there's a real benefit. I'm not a C expert
and my knowledge of the Python C API is very limited. But, if you core devs
think it's worth it, I could try to submit some patch in CPython trunk.

> one problem with frozen dicts in the past is deciding exactly what is
mutable and what is immutable.

Well, I had the same problem... Indeed it seems there's no way to use the
duck typing to understand if a type is mutable or not. I suppose this can
only be done using a convention, or a contract. If we know that an object
is immutable, you can adopt some cache and speedup strategies.

On Wed, 22 Jul 2020 at 06:53, Guido van Rossum <gu...@python.org> wrote:

> That seems pretty clear — presumably it follows the lead of frozenset and
> tuple.
>
> On Tue, Jul 21, 2020 at 21:45 Todd <toddr...@gmail.com> wrote:
>
>> What, exactly, is frozen?  My understanding is that one problem with
>> frozen dicts in the past is deciding exactly what is mutable and what is
>> immutable.  Can you change what object a key maps to so long as the set of
>> keys stay the same?  Can you modify the contents of mutable object that is
>> a value?
>>
>> On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <marco.sulla.pyt...@gmail.com>
>> wrote:
>>
>>> Let me first say that the code and discussion, as per title, is also
>>> about possible performance improvements to the dict base type.
>>>
>>> TL;DR I implemented a frozendict using CPython 3.9 code. It seems that
>>> an immutable dict *could* be faster than dict in some cases. Furthermore,
>>> some optimization could be applied to dict too.
>>>
>>> Long explaining:
>>>
>>> Since now I have some time, I decided to experiment a little with the
>>> creation of an immutable dict in Python.
>>>
>>> Unluckily, I started this experiment many months ago, so the CPython
>>> code I used is old. Maybe some or all of my considerations are outdated.
>>>
>>> Initially, I wrote a quick and dirty implementation:
>>>
>>> https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334
>>>
>>> The code was very simple, and the performance was identical to dict. So
>>> in theory, adding a frozendict to CPython is not hard. But there's more.
>>>
>>> After the first implementation, I started to try to improve the
>>> performance of frozendict. The result of the improvements are here:
>>>
>>>
>>> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt
>>>
>>> For benchmarks, I used simply timeit, with autorange and repeat and, as
>>> suggested in the module documentation, I got the minimum of the results.
>>> Here is the code:
>>>
>>>
>>> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
>>>
>>> I have not tested with an optimized build, since optimization data is
>>> collected using the unit tests, and I didn't write tests for frozendict in
>>> the official CPython test framework.
>>>
>>> The tests and benchmarks were done using CPython 3.9a0. CPU and other pc
>>> resources were not restricted using pyperf or similar tools, to see the
>>> "real" speed. CPython was compiled using gcc and g++.
>>>
>>> In benchmarks, I compared methods and operators using dict and
>>> frozendict. The benchmark uses dicts with all integers and dicts with all
>>> strings. Furthermore, I tested dicts with size 8 (the most optimized size
>>> in CPython) and 1000 elements (maybe too much, but I wanted to see how they
>>> perform with a high number of elements).
>>>
>>> Every benchmark has a line in the output. The Name describes the
>>> benchmarked method, operator or code snippet. Size is the size of the dict,
>>> 8 or 1000. Keys are the keys type, str or int. Type is the dictionary type,
>>> dict or frozendict.
>>>
>>> In Name, the "o" represents the object itself (dict or frozendict). "d",
>>> in benchmark with dict, is "o"; in benchmarks with frozendict is an
>>> equivalent instance of type dict.
>>>
>>> Some consideration:
>>>
>>> 1. frozendict is very fast, as expected, at copying. But it's also
>>> faster at creation, using a (not frozen) dict, kwargs or a sequence2.
>>> Speedups range from 20% to 45%.
>>> 2. frozendict is also a bit faster when you iterate over it, especially
>>> over values, where is ~15% faster
>>> 3. hash seems really fast, but this is because it's cached the first
>>> time hash() is invoked
>>> 4. where frozendict is slower is when you unpickle it and with fromkeys
>>> and repr. This is because I wrote a very naif implementation of these
>>> methods, without optimizing them. The other methods have a comparable speed.
>>>
>>> Here is the code:
>>> https://github.com/Marco-Sulla/cpython
>>>
>>> Here is the diff between the CPython code and my commits:
>>> https://github.com/python/cpython/compare/master...Marco-Sulla:master
>>>
>>> About code
>>>
>>> I coded the implementation doing a simple copy/paste of the existing
>>> dict functions, modifying their code and renaming them. This way I'm sure
>>> dict continues to work as before, and I can compare the speed gain.
>>>
>>> Some of the optimization I adopted can be implemented in `dict` too. For
>>> example, instead of creating an empty dict and resizing it, I create it
>>> with the "maximum" size and I fill it. It seems to work, even if I did not
>>> explore the possibility that a mutable object can change while a frozendict
>>> creation is in progress.
>>>
>>> Some problems with optimizing dict and maintaining a frozendict:
>>>
>>> 1. duplication of code. To gain a significant boost, I had to copy and
>>> paste a lot of code. Functions can be remerged again, but maybe the speedup
>>> will be reduced.
>>> 2. split vs combined dicts. As I wrote, split dicts seem to be faster in
>>> reading than combined dicts. For example, iterating over values is faster
>>> with a split dict, as expected.
>>> But writing was not tested; furthermore, some of the optimizations can
>>> be adopted for dicts too, so the convenience of a split table can be
>>> lowered.
>>> dict continues to maintain both split and combined tables, so this could
>>> be not a problem. But the code could be less and more fast if only a table
>>> layout is supported
>>> 3. the CPython code I used is old, so some of the improvements I adopted
>>> could be already implemented
>>>
>>> About frozendict
>>>
>>> Apart the considerations done in the [PEP 416](
>>> https://www.python.org/dev/peps/pep-0416/), that was rejected since
>>> there was little gain from its implementation, I think that frozendict can
>>> be useful as a substitute of MappingProxyType, that is really slow.
>>> MappingProxyType is not much used, but it's present in critical parts of
>>> CPython code, for example in _sre. We have to see if a mapping proxy type
>>> *can* be substituted with an immutable map in some critical part of CPython
>>> code.
>>>
>>> Furthermore, frozendicts could be used for implementing "immutable"
>>> classes and modules, and can be used as a faster dict if its content does
>>> not change.
>>> _______________________________________________
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to python-ideas-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/GAJHZEBOU5BGYJAUNQP5V66KSS6HHSQD/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> --
> --Guido (mobile)
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/YPQRFJIHHQQAVYN5AZ53HONUJJRKEV5A/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KG737YMR3OC7KMYFAUWXG2P6MQ4HY62E/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

Reply via email to