Hi all,
I think we need some more data before going any further reimplementing
dicts.
What I would like to know is, across a set of Python programs (ideally a
representative set), what the proportion of dicts in memory at any one
time are:
a) instance dicts
b) other namespace dicts (classes and modules)
c) data dicts with all string keys
d) other data dicts
e) keyword argument dicts (I'm guessing this is vanishingly small)
I would expect that (a) far exceeds (b) and depending on the application
also considerably exceeds (c), but I would like some real data.
From that we can compute the (approximate) memory costs of the
competing designs.
As an aside, if anyone is really keen to save memory, then removing the
cycle GC header is the thing to do.
That uses 24 bytes per object and *half* of all live objects have it.
And don't forget that any Python object is really two objects, the
object and its dict, so that is 48 extra bytes every time you create a
new object.
Cheers,
Mark.
On 22/06/16 10:23, INADA Naoki wrote:
As my last email, compact ordered dict can't preserve
insertion order of key sharing dict (PEP 412).
I'm thinking about deprecating key shared dict for now.
Instead, my new idea is introducing more compact dict
specialized for namespace.
If BDFL (or BDFL delegate) likes this idea, I'll take another
one week to implement this.
Background
----------------
* Most keys of namespace dict are string.
* Calculating hash of string is cheap (one memory access, thanks for cache).
* And most keys are interned already.
Design
----------
Instead of normal PyDictKeyEntry, use PyInternedKeyEntry like this.
typedef struct {
// no me_hash
PyObject *me_key, *me_value;
} PyInternedKeyEntry;
insertdict() interns key if it's unicode, otherwise it converts dict to
normal compact ordered dict.
lookdict_interned() compares only pointer (doesn't call unicode_eq())
when searching key is interned.
And add new internal API to create interned key only dict.
PyDictObject* _PyDict_NewForNamespace();
Memory usage
--------------------
on amd64 arch.
key-sharing dict:
* 96 bytes for ~3 items
* 128 bytes for 4~5 items.
compact dict:
* 224 bytes for ~5 items.
(232 bytes when keep supporting key-shared dict)
interned key only dict:
* 184 bytes for ~5 items
Note
------
Interned key only dict is still larger than key-shared dict.
But it can be used for more purpose. It can be used for interning string
for example. It can be used to kwargs dict when all keys are interned already.
If we provide _PyDict_NewForNamespace to extension modules,
json decoder can have option to use this, too.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com