On Fri, 20 Jan 2017 19:49:01 +0900
INADA Naoki <songofaca...@gmail.com> wrote:
>
> Report is here
> https://gist.github.com/methane/ce723adb9a4d32d32dc7525b738d3c31
"this script counts static memory usage. It doesn’t care about dynamic memory usage of processing real request" You may be trying to optimize something which is only a very small fraction of your actual memory footprint. That said, the marshal module could certainly try to intern some tuples and other immutable structures. > * Most large strings are docstring. Is it worth enough that option > for trim docstrings, without disabling asserts? Perhaps docstrings may be compressed and then lazily decompressed when accessed for the first time. lz4 and zstd are good modern candidates for that. zstd also has a dictionary mode that helps for small data (*). See https://facebook.github.io/zstd/ (*) Even a 200-bytes docstring can be compressed this way: >>> data = os.times.__doc__.encode() >>> len(data) 211 >>> len(lz4.compress(data)) 200 >>> c = zstd.ZstdCompressor() >>> len(c.compress(data)) 156 >>> c = zstd.ZstdCompressor(dict_data=dict_data) >>> len(c.compress(data)) 104 `dict_data` here is some 16KB dictionary I've trained on some Python docstrings. That 16KB dictionary could be computed while building Python (or hand-generated from time to time, since it's unlikely to change a lot) and put in a static array somewhere: >>> samples = [(mod.__doc__ or '').encode() for mod in sys.modules.values()] >>> sum(map(len, samples)) 258113 >>> dict_data = zstd.train_dictionary(16384, samples) >>> len(dict_data.as_bytes()) 16384 Of course, compression is much more efficient on larger docstrings: >>> import numpy as np >>> data = np.__doc__.encode() >>> len(data) 3140 >>> len(lz4.compress(data)) 2271 >>> c = zstd.ZstdCompressor() >>> len(c.compress(data)) 1539 >>> c = zstd.ZstdCompressor(dict_data=dict_data) >>> len(c.compress(data)) 1348 >>> import pdb >>> data = pdb.__doc__.encode() >>> len(data) 12018 >>> len(lz4.compress(data)) 6592 >>> c = zstd.ZstdCompressor() >>> len(c.compress(data)) 4502 >>> c = zstd.ZstdCompressor(dict_data=dict_data) >>> len(c.compress(data)) 4128 A similar strategy may be used for annotations and other rarely-accessed metadata. Another possibility, but probably much more costly in terms of initial development and maintenance, is to put the docstrings (+ annotations, etc.) in a separate file that's lazily read. I think optimizing the footprint for everyone is much better than adding command-line options to disable some specific metadata. Regards Antoine. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com