Jim Jewett wrote:
Greg Ewing wrote:
Mark Shannon wrote:

I have a new dict implementation which allows sharing of keys between
objects of the same class.

We already have the __slots__ mechanism for memory savings.
Have you done any comparisons with that?

You can't make Python programmers use slots, neither can you
automatically change existing programs.

The automatic change is exactly what a dictionary upgrade provides.

I haven't read your patch in detail yet, but it sounds like you're
replacing the array of keys + array of values with just an array of
values, and getting the numerical index from a single per-class array
of keys.

Each dictionary has key/hash/values as before, but instead of on array,
they are broken into two: a key/hash array and a value array.
The key/hash arrays can be shared amongst dicts,
this happens for well behaved classes and completely empty dicts,
other wise each dict gets two arrays.


That would normally be sensible (so thanks!), but it isn't a drop-in
replacement.  If you have a "Data" class intended to take arbitrary

It is a drop in replacement. It conforms to the current API.

per-instance attributes, it just forces them all to keep resizing up,
even though individual instances would be small with the current dict.
There is a cut-off point, at the moment it's quite unsophisticated about how it does this, but it could easily be improved.
Suggestions are welcome.


How is this more extreme than replacing a pure dict with some
auto-calculated slots and an "other_attrs" dict that would normally
remain empty?

Its less extreme, but equally effective.


[It may be harder to implement, because of the difficulty of
calculating the slots in advance ... but I don't see it as any worse,
once implemented.]
Its a trade of between ease of implementation as effectiveness.
I think the shared key/hash array approach gets most the advantages of
a full map implementation (like PyPy or V8) with much less hassle.


Of course, maybe your shared dict just points to sequential array
positions (rather than matching the key position) ... in which case,
it may well beat slots, though the the "Data" class would still be a
problem.

It won't beat slots, mainly due to the extra space required to minimise collisions, but it is a lot more compact than the present approach.

For a well behaved class with lots of instances, each with 3 or 4 attributes (ie the minimum size dict) its cuts the space used by the per-instance dict from 136 bytes (32bit machine) to 64 bytes plus the shared key/hash array. Slots would only require 12 or 16 bytes.

(When verifying these numbers I found a bug in the resizing,
which I have just fixed)

The next enhancement would be to store the naked value array directly into an instance, trimming the space cost down to just 32 bytes, but this would cause compatibility issues as the (internal) API would need to change.

Cheers,
Mark.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to