Quoting Victor Stinner <victor.stin...@gmail.com>:
Slowly, I'm trying to see if it would be possible to reduce the memory footprint of Python using the tracemalloc module.
[...]
Should I open a separate issue in the bug tracker for each idea, or a single global issue?
There is a third alternative, which I would recommend: not opening tracker issues at all, unless you can also offer a patch. The things you find are not bugs per se, not even "issues". It is fine and commendable that you are looking into this, but other people may have other priorities (like reimplementing the hash function of string objects). So if you remember that there is potential for optimization, that may be enough for now. Or share it on python-dev (as you do below); people may be intrigued enough to look into it further, or they may ignore it. It's easy to ignore a posting to python-dev, but much harder to ignore an issue on the tracker (*something* has to be done about it, even if that is just closing it with no action).
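The investigation Victor describes starts from tracemalloc snapshots. Here is a minimal sketch of that kind of measurement, listing the source lines that currently hold the most traced memory. `top_allocations` is a hypothetical helper name, and the `linecache` warm-up exists only to produce some allocations worth reporting.

```python
import linecache
import tracemalloc


def top_allocations(limit=5):
    """Return the `limit` source lines holding the most traced memory right now."""
    snapshot = tracemalloc.take_snapshot()
    # Hide allocations attributed to the import machinery and to tracemalloc itself.
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, tracemalloc.__file__),
    ))
    # statistics() groups traces by source line and sorts them, biggest first.
    return snapshot.statistics("lineno")[:limit]


# Tracing only sees allocations made after start(), so start it early.
tracemalloc.start()

# Fill linecache with the lines of a source file to give it something to report.
for lineno in range(1, 200):
    linecache.getline(tracemalloc.__file__, lineno)

for stat in top_allocations():
    print(stat)
current, peak = tracemalloc.get_traced_memory()
print(f"traced memory: current={current} bytes, peak={peak} bytes")
```

To see what a single operation added, one can take a snapshot before and after it and compare them with `Snapshot.compare_to()`.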
First, I noticed that linecache can allocate more than 2 MB. What do you think of adding a registry of "clear cache" functions? For example, re.purge() and linecache.clearcache(). gc.collect() clears free lists. I don't know whether gc.collect() should be tied to this new registry (clear all caches) or not.
I'm -1 on this idea. There are some "canonical" events that could trigger clearing the caches, namely:
- out-of-memory situations
- OS signals indicating memory pressure

While these sound interesting in theory, they fail in practice. For example, they are very difficult to test.
The dictionary of interned Unicode strings can be large: up to 1.5 MB (with more than 30,000 strings), counting just the dictionary and excluding the size of the strings themselves. Is that size normal or not? According to tracemalloc, this dictionary is usually the largest memory block.
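Pure Python code cannot read CPython's interned-string dictionary directly, so this sketch assumes the candidate strings have already been obtained somehow (that step is left open). It illustrates two ideas from the reply: counting how many strings are valid identifiers, and having code that deduplicates many non-identifier strings keep its own dictionary instead of using `sys.intern()`. `summarize_strings` and `InternTable` are hypothetical names.

```python
from collections import Counter


def summarize_strings(strings):
    """Count identifier-like strings versus other strings (hypothetical helper)."""
    counts = Counter()
    for s in strings:
        counts["identifier" if s.isidentifier() else "other"] += 1
    return counts


class InternTable:
    """A private interning table, as an alternative to the global sys.intern().

    Equal strings are mapped to a single stored object, but the table belongs to
    the caller, so it can be cleared or dropped when it is no longer needed.
    """

    def __init__(self):
        self._table = {}

    def intern(self, s):
        # setdefault() returns the object already stored for an equal key, if any.
        return self._table.setdefault(s, s)

    def __len__(self):
        return len(self._table)

    def clear(self):
        self._table.clear()


table = InternTable()
a = table.intern("".join(["spam", " eggs"]))
b = table.intern("".join(["spam", " eggs"]))
print(a is b, len(table))
print(summarize_strings(["spam", "spam eggs", "_x1", "1abc"]))
```

Martin's other questions (outside reference counts, immortal status) depend on CPython internals and are not covered by this sketch.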
I'd check the contents of the dictionary. How many strings are in there? How many of them are identifiers? How many have more than one outside reference? How many are immortal? If there are a lot of strings that are not identifiers, some code possibly abuses interning and should use its own dictionary instead. For the mortal identifiers with a refcount of 1, I'd trace back where they are stored and check whether many of them originate from the same module.

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev