Kristján Valur Jónsson added the comment:

Basically, reuse of strings (and preservation of their interned status) fell by the wayside somewhere in the 3.x transition. In 2.x, strings have been reused, and interned strings re-interned, since protocol version 1. This patch adds that feature back and uses the same mechanism to reuse not only strings but any other multiply-referenced object.
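To make the effect of reusing multiply-referenced objects concrete, here is a minimal sketch. It assumes a CPython release (3.4 or later) in which marshal format version 3 is available, and it uses only the standard `marshal.dumps(value, version)` and `marshal.loads` calls. The names `shared`, `data`, `size_v2`, `size_v3`, `loaded_v2` and `loaded_v3` are illustrative and do not come from the patch.

```python
import marshal

# Assumption: marshal format version 3 is available (CPython 3.4+).
assert marshal.version >= 3

# One tuple object referenced 100 times from the same list.
shared = ("spam", "eggs", 12345678901234567890)
data = [shared] * 100

# Version 2 writes the tuple out in full at every occurrence.
blob_v2 = marshal.dumps(data, 2)
# Version 3 writes it once and emits a back-reference afterwards.
blob_v3 = marshal.dumps(data, 3)

size_v2 = len(blob_v2)
size_v3 = len(blob_v3)
print("version 2:", size_v2, "bytes")
print("version 3:", size_v3, "bytes")

# Both formats round-trip to an equal value...
loaded_v2 = marshal.loads(blob_v2)
loaded_v3 = marshal.loads(blob_v3)

# ...but only version 3 restores the sharing: every list slot
# points at the very same tuple object again.
shared_after_v2 = all(item is loaded_v2[0] for item in loaded_v2)
shared_after_v3 = all(item is loaded_v3[0] for item in loaded_v3)
print("sharing preserved by version 2:", shared_after_v2)
print("sharing preserved by version 3:", shared_after_v3)
```

The list here is deliberately artificial. Real code objects share far less data, so the savings are smaller than in this sketch.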
It is not desirable to simply intern every string read from marshaled data. Python interns only selected strings during compilation, and we want to keep it that way. Also, 2.x reuses not only interned strings but other strings as well. Generalizing string reuse to other objects is trivial and a logical step forward. It allows code objects to be optimized by identifying common data and instancing it, and then to be saved and reloaded with that instancing intact. But even without such code-object optimization, the changes are significant: the size of the marshaled code object of lib/test/test_marshal drops from 24093 bytes in version 2 to 17841 bytes in version 3, without any additional massaging of the module code object.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16475>
_______________________________________