Quoting Victor Stinner <victor.stin...@gmail.com>:

Slowly, I'm trying to see if it would be possible to reduce the memory
footprint of Python using the tracemalloc module.
[...]
Should I open a separate issue for each idea to track them in the bug
tracker, or a global issue?

There is a third alternative which I would recommend: do not open tracker
issues at all - unless you can also offer a patch. The things you find
are not bugs per se, not even "issues". It is fine and laudable that
you look into this, but other people may have other priorities (like
reimplementing the hash function of string objects).

So if you remember that there is a potential for optimization, that
may be enough for the moment. Or share it on python-dev (as you do
below); people may be intrigued to look into this further, or ignore
it. It's easy to ignore a posting to python-dev, but more difficult to
ignore an issue on the tracker (*something* should be done about it,
e.g. close with no action).

First, I noticed that linecache can allocate more than 2 MB. What do
you think of adding a registry of "clear cache" functions? For
example, re.purge() and linecache.clearcache(). gc.collect() clears
free lists. I don't know if gc.collect() should be related to this new
registry (clear all caches) or not.
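
For illustration only, a minimal sketch of what such a registry might look
like. The names register_cache_clearer and clear_all_caches are hypothetical;
nothing like this exists in the standard library. Only re.purge() and
linecache.clearcache() are real functions.

```python
import linecache
import re

# Hypothetical registry; not part of the standard library.
_cache_clearers = []


def register_cache_clearer(func):
    """Record a zero-argument callable that empties some cache."""
    _cache_clearers.append(func)
    return func  # returning func lets this double as a decorator


def clear_all_caches():
    """Call every registered function; return how many were called."""
    for func in _cache_clearers:
        func()
    return len(_cache_clearers)


# The two existing "clear cache" functions named in the message.
register_cache_clearer(re.purge)
register_cache_clearer(linecache.clearcache)

if __name__ == "__main__":
    print("caches cleared:", clear_all_caches())
```

The sketch leaves open what would call clear_all_caches() and when, which
is the hard part of the proposal.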

I'm -1 on this idea. There are some "canonical" events that could trigger
clearance of caches, namely
- out-of-memory situations
- OS signals indicating memory pressure
While these sound interesting in theory, they fail in practice. For
example, they are very difficult to test.

The dictionary of interned Unicode strings can be large: up to 1.5 MB
(with more than 30,000 strings). Just the dictionary, excluding the size
of the strings. Is the size normal or not? Using tracemalloc, this
dictionary is usually the largest memory block.
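
As a rough sketch of how measurements like these can be taken with
tracemalloc: the helper names top_allocations and linecache_usage are made
up for this example, and the numbers will differ between builds and
workloads. Allocations made in C without a Python frame may be reported
under an unknown location.

```python
import linecache
import tracemalloc


def top_allocations(snapshot, limit=5):
    """Return the largest allocation sites as (location, size_in_bytes)."""
    stats = snapshot.statistics("lineno")  # sorted from biggest to smallest
    return [(str(stat.traceback), stat.size) for stat in stats[:limit]]


def linecache_usage(snapshot):
    """Return the bytes attributed to allocations made in linecache.py."""
    only_linecache = snapshot.filter_traces(
        [tracemalloc.Filter(True, linecache.__file__)]
    )
    return sum(stat.size for stat in only_linecache.statistics("filename"))


tracemalloc.start()
linecache.getlines(tracemalloc.__file__)  # put something into the line cache
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

if __name__ == "__main__":
    for location, size in top_allocations(snapshot):
        print(f"{size:>10} B  {location}")
    print("linecache:", linecache_usage(snapshot), "B")
```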

I'd check the contents of the dictionary. How many strings are in there;
how many of these are identifiers; how many have more than one outside
reference; how many are immortal?

If there are a lot of strings that are not identifiers, some code possibly
abuses interning, and should use its own dictionary instead. For the
refcount-1 mortal identifiers, I'd trace back where they are stored,
and check if many of them originate from the same module.
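
A sketch of the first part of that check, under a loud assumption: CPython
does not expose the interned dictionary to Python code, so the strings
argument here stands for a dump obtained some other way (for instance from
a debugger or a patched build). The function summarize_strings is
hypothetical. Reference counts and immortality need C-level inspection and
are not covered.

```python
def summarize_strings(strings):
    """Count how many of the given strings are valid identifiers.

    `strings` is assumed to be a dump of the interned strings obtained
    outside of this code; Python offers no public API to list them.
    """
    summary = {"total": 0, "identifiers": 0, "non_identifiers": 0}
    for s in strings:
        summary["total"] += 1
        if s.isidentifier():
            summary["identifiers"] += 1
        else:
            summary["non_identifiers"] += 1
    return summary


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    sample = ["foo", "bar_baz", "hello world", "42"]
    print(summarize_strings(sample))
```

A high non_identifiers count would point toward the interning abuse
described above.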

Regards,
Martin


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev