Martin v. Löwis wrote:
- it would be useful to have a specialized representation for
  all-keys-are-strings. In that case, me_hash could be dropped
  from the representation. You would get savings compared to
  the status quo even in the non-shared case.
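For concreteness, here is a toy model of such an all-strings table (StrOnlyDict is an invented name; this is a sketch of the idea, not CPython's implementation): drop the stored hash entirely and recompute hash(key) while probing, which is cheap for str because strings cache their hash after the first computation.

class StrOnlyDict:
    # Toy all-string-keys table with no per-entry me_hash.
    # Resizing and deletion are omitted for brevity.
    def __init__(self, size=8):
        self._keys = [None] * size     # str or None
        self._values = [None] * size   # parallel slots; no stored hash

    def _probe(self, key):
        # Linear probing for simplicity; CPython perturbs the probe sequence.
        i = hash(key) % len(self._keys)
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % len(self._keys)
        return i

    def __setitem__(self, key, value):
        # A real version would switch to a general-purpose table on the
        # first non-string key; here we just insist on str.
        assert type(key) is str
        i = self._probe(key)
        self._keys[i] = key
        self._values[i] = value

    def __getitem__(self, key):
        i = self._probe(key)
        if self._keys[i] is None:
            raise KeyError(key)
        return self._values[i]

d = StrOnlyDict()
d["spam"] = 1
print(d["spam"])   # 1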
It might be tricky switching key tables, and I don't think it would save
much memory: keys that are widely shared take up very little memory anyway,
and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
objects (see script below). Of these, 563 (so nearly all of them) had
only strings as keys. Among those, I found 286 different key sets,
where 231 key sets occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save
as many pointers (actually more, because there will be more key slots
than keys).
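Back-of-the-envelope, assuming a 64-bit build (8 bytes per dropped me_hash) and CPython's 2/3 maximum fill, i.e. at least 1.5 slots per key:

keys = 13282
slots = keys * 3 // 2             # table kept at most 2/3 full
print(slots * 8 // 1024, "KiB")   # ~155 KiB of me_hash fields dropped

So roughly 150 KiB for a bare interpreter; how that scales to real programs is the open question.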

The question is how much memory needs to be saved to be worth the added complexity: 10 KB, no; 100 MB, yes.
So data from "real" benchmarks would be useful.

Also, I'm assuming that it would be tricky to implement correctly due to implicit assumptions in the rest of the code.
If I'm wrong and it's easy to implement, then please do.


I'm not sure why you think the string dicts with unshared keys would be
short-lived.
Not all, but most. Dicts with unshared keys would most likely be
for keyword parameters. Explicit dicts tend to be few in number.
(By "few" I mean up to 1k, not 100k or 1M.)

Module dicts are very likely to have unshared keys; they number in the 10s or 100s, but they do tend to be large.

But even if they were, what matters is the steady-state
number of dictionaries - if for every short-lived dictionary that
gets released another one is created, any memory savings from reducing
the dict size would still materialize.
But only a few KB?


- I wonder whether the shared keys could be computed at compile
  time, considering all attribute names that get assigned for
  self. The compiler could list those in the code object, and
  class creation could iterate over all methods (taking base
  classes into account).

It probably wouldn't be that hard to make a guess at compile time as to
what the shared keys would be, but it doesn't really matter.
The generation of intermediate shared keys will only happen once per
class, so the overhead would be negligible.
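For concreteness, here is a rough sketch of such a guess, done at runtime with the ast module (guess_shared_keys is my invention, not part of the patch; a real compile-time version would work on code objects, walk base classes, and would still miss anything dynamic like setattr):

import ast
import inspect
import textwrap

def guess_shared_keys(cls):
    # Collect every name assigned via "self.<name> = ..." in the
    # class's own methods. Purely illustrative.
    names = []
    for obj in vars(cls).values():
        if not inspect.isfunction(obj):
            continue
        tree = ast.parse(textwrap.dedent(inspect.getsource(obj)))
        for node in ast.walk(tree):
            if not isinstance(node, ast.Assign):
                continue
            for target in node.targets:
                if (isinstance(target, ast.Attribute)
                        and isinstance(target.value, ast.Name)
                        and target.value.id == 'self'
                        and target.attr not in names):
                    names.append(target.attr)
    return names

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(guess_shared_keys(Point))   # ['x', 'y']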

I'm not so much concerned about overhead as about the correctness/
effectiveness of the heuristics. For a class with dynamic attributes,
you may well come up with a very large key set. With source analysis,
you wouldn't attempt to grow the key set beyond what is likely to be
shared.
I agree some sort of heuristic is required to limit excessive growth
and prevent pathological behaviour.
The current implementation just has a cut-off at a certain size;
it could definitely be improved.
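As a toy illustration of that kind of cut-off (the limit of 16 is invented for illustration; it is not the actual constant in the patch):

SHARED_KEYS_MAX = 16

def try_add_shared_key(shared_keys, name):
    # Grow the class's shared key list until it hits the cap; past
    # that, the caller gives the instance its own unshared key table.
    if name in shared_keys:
        return True               # already shared, nothing to do
    if len(shared_keys) >= SHARED_KEYS_MAX:
        return False              # refuse to grow: go unshared
    shared_keys.append(name)
    return True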

As I said, I'll update the code soon and then, well, what's the phrase...
Oh yes, "patches welcome" ;)

Thanks for the feedback.

Cheers,
Mark.


Regards,
Martin

import sys

# sys.getobjects exists only in CPython builds compiled with
# Py_TRACE_REFS (in 3.3, any --with-pydebug build); it returns
# all live objects of the given type.
d = sys.getobjects(0, dict)
print(len(d), "dicts")

# Keep only the non-empty dicts whose keys are all strings.
d2 = []
for o in d:
    keys = o.keys()
    if not keys:
        continue
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")

# Count how often each distinct key set occurs.
freq = {}
for keys in d2:
    freq[keys] = freq.get(keys, 0) + 1
print(len(freq), "different key sets")

freq = sorted(freq.items(), key=lambda t: t[1])
print(len([o for o in freq if o[1] == 1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])   # the ten most common key sets



