2014-01-28 "Martin v. Löwis" <mar...@v.loewis.de>:
> Debugging reveals that it is actually the many integer objects which
> trigger the sharing code. So a much simplified example of Victor's
> benchmarking code can use
>
>   data = [0]*10000000
>
> The difference between version 2 and version 3 here is that v2 marshals
> a lot of "0" integers, whereas version 3 marshals a single one, and then
> a lot of references to this integer.
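For anyone who wants to reproduce the comparison, here is a minimal sketch of the simplified benchmark. It is not the script from the issue: `bench_dumps` is a hypothetical helper name, the list is ten times smaller than in the quoted example to keep the run short, and the timings depend on the machine and on the Python build, so no numbers are shown.

```python
import marshal
import time


def bench_dumps(data, version, repeat=3):
    """Return (best_seconds, size_in_bytes) for marshal.dumps(data, version).

    Hypothetical helper for illustration; the best of `repeat` runs is kept.
    """
    best = float("inf")
    payload = b""
    for _ in range(repeat):
        start = time.perf_counter()
        payload = marshal.dumps(data, version)
        best = min(best, time.perf_counter() - start)
    return best, len(payload)


# Assumption: 1,000,000 elements instead of the 10,000,000 quoted above.
data = [0] * 1000000

# Compare the pre-sharing format (2) with the formats that support references.
results = {version: bench_dumps(data, version) for version in (2, 3, 4)}
for version, (seconds, size) in results.items():
    print("version %d: dumps %.3f s, %d bytes" % (version, seconds, size))
```

Comparing the printed sizes across versions shows whether the references save any space for this kind of data.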
Since the output size looks to be the same, it may be worth special-casing
small integers, or even integers and floats in general. Handling references
to these numbers probably costs more CPU time, whereas the gain in file size
is probably minor.

I wrote a short patch:
http://bugs.python.org/issue20416

"dumps v3 is 60% faster, loads v3 is also 14% *faster*."

"dumps v4 is 66% faster, loads v4 is 16% faster."

"file size (on version 3 and 4) is unchanged with my patch."

"So with the patch, the Python 3.4 default version (4) is *faster* (dump
20% faster, load 16% faster) and produces *smaller files* (10% smaller)."

It looks like a win-win patch :-) The drawback is that files storing many
duplicated huge numbers will not be smaller with marshal version >= 3.

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev