[issue15596] pickle: Faster serialization of Unicode strings
Roundup Robot added the comment:

New changeset 09a84091ae96 by Antoine Pitrou in branch 'default':
Issue #15596: Faster pickling of unicode strings.
http://hg.python.org/cpython/rev/09a84091ae96

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15596
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15596] pickle: Faster serialization of Unicode strings
Antoine Pitrou added the comment:

I've applied the review comments and committed the patch. Thank you!

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

Hi Antoine, I prefer your patch. Great job!

2013/4/7 Antoine Pitrou rep...@bugs.python.org:
> Antoine Pitrou added the comment:
>
> I've applied the review comments and committed the patch. Thank you!
>
> --
> resolution: -> fixed
> stage: patch review -> committed/rejected
> status: open -> closed
[issue15596] pickle: Faster serialization of Unicode strings
Antoine Pitrou added the comment:

Since protocol 0 is essentially dead in Python 3, I would like to propose something simpler and safer: only optimize the binary protocols. If no one beats me to it, I'll adapt Victor's patch for that.
[issue15596] pickle: Faster serialization of Unicode strings
Changes by Antoine Pitrou pit...@free.fr:

--
stage: -> patch review
[issue15596] pickle: Faster serialization of Unicode strings
Antoine Pitrou added the comment:

Ping?
[issue15596] pickle: Faster serialization of Unicode strings
Serhiy Storchaka added the comment:

Well, I'll take care of this. I have my own patch for optimizing raw_unicode_escape(), but microbenchmarks don't show any speedup. Maybe your approach will be better.
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

serhiy: I'm not really motivated to finish the work on this issue (especially ... it would probably be a good idea to benchmark non-ASCII strings as well.). Would you like to work on this?

--
nosy: +serhiy.storchaka
[issue15596] pickle: Faster serialization of Unicode strings
Changes by Jesús Cea Avión j...@jcea.es:

--
nosy: +jcea
[issue15596] pickle: Faster serialization of Unicode strings
Antoine Pitrou added the comment:

Looks interesting. Can you post benchmark numbers? (You can use the pickle tests from http://hg.python.org/benchmarks )
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

Here is a benchmark comparing Python 3.3 without and with my patch:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.530622 -> 0.332841: 1.59x faster
Avg: 0.539450 -> 0.336833: 1.60x faster
Significant (t=232.04)
Stddev: 0.00552 -> 0.00276: 2.0032x smaller
Timeline: b'http://tinyurl.com/dyu3vap'

The following not significant results are hidden, use -v to show them:
pickle_dict, pickle_list, slowpickle.
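For readers unfamiliar with perf.py reports: the "x faster" figure appears to be simply the ratio of the old timing to the new one. The sketch below re-derives it from the fastpickle numbers quoted in this message; the helper name `speedup` is made up for illustration and is not part of perf.py.

```python
def speedup(old_seconds, new_seconds):
    """Return how many times faster the new timing is (old / new)."""
    return old_seconds / new_seconds

# Timings copied from the fastpickle report in this message.
min_ratio = speedup(0.530622, 0.332841)
avg_ratio = speedup(0.539450, 0.336833)

print(f"Min: {min_ratio:.2f}x faster")
print(f"Avg: {avg_ratio:.2f}x faster")
```

A ratio below 1.0 would correspond to what the report calls "slower".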
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

For your information, results of benchmark comparing Python 3.2 to 3.3:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python
Running fastpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.455842 -> 0.542103: 1.19x slower
Avg: 0.462334 -> 0.547271: 1.18x slower
Significant (t=-101.15)
Stddev: 0.00362 -> 0.00471: 1.3028x larger
Timeline: b'http://tinyurl.com/btr644x'

### pickle_dict ###
Min: 0.360125 -> 0.345850: 1.04x faster
Avg: 0.364019 -> 0.348431: 1.04x faster
Significant (t=30.84)
Stddev: 0.00308 -> 0.00181: 1.6973x smaller
Timeline: b'http://tinyurl.com/cd3ashu'

### pickle_list ###
Min: 0.803941 -> 0.584800: 1.37x faster
Avg: 0.85 -> 0.589200: 1.38x faster
Significant (t=455.00)
Stddev: 0.00261 -> 0.00225: 1.1612x smaller
Timeline: b'http://tinyurl.com/8u4m2wf'

### slowpickle ###
Min: 0.409008 -> 0.461257: 1.13x slower
Avg: 0.413668 -> 0.466201: 1.13x slower
Significant (t=-115.31)
Stddev: 0.00236 -> 0.00219: 1.0772x smaller
Timeline: b'http://tinyurl.com/czrg5kf'
[issue15596] pickle: Faster serialization of Unicode strings
Alexandre Vassalotti added the comment:

Amazing! Though it would probably be a good idea to benchmark non-ASCII strings as well.
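One way to follow up on this suggestion is a small timeit microbenchmark that pickles long strings of different kinds (pure ASCII, Latin-1, BMP, and non-BMP characters) under protocol 0 and a binary protocol. This is only a rough sketch and not the bm_pickle.py benchmark used elsewhere in this thread; the helper `time_dumps`, the sample strings, and the string length are all choices made for illustration.

```python
import pickle
import timeit

LENGTH = 100_000  # characters per sample string (arbitrary choice)

# One sample per storage kind a Python 3.3+ string can use.
SAMPLES = {
    "ascii": "a" * LENGTH,
    "latin1": "\xe9" * LENGTH,
    "bmp": "\u20ac" * LENGTH,
    "non-bmp": "\U0001F600" * LENGTH,
}

def time_dumps(text, protocol, number=20):
    """Best-of-3 time in seconds for `number` calls to pickle.dumps()."""
    timer = timeit.Timer(lambda: pickle.dumps(text, protocol))
    return min(timer.repeat(repeat=3, number=number))

results = {}
for name, text in SAMPLES.items():
    for protocol in (0, 2):
        results[name, protocol] = time_dumps(text, protocol)
        print(f"{name:8s} protocol {protocol}: {results[name, protocol]:.4f} s")
```

Running the same script under two interpreter builds and comparing the printed timings gives a quick impression of how a patch behaves on non-ASCII input; absolute numbers depend heavily on the machine.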
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

Last one: Python 3.2 vs patched Python 3.3.

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.470211 -> 0.322453: 1.46x faster
Avg: 0.475718 -> 0.328496: 1.45x faster
Significant (t=205.65)
Stddev: 0.00317 -> 0.00395: 1.2456x larger
Timeline: b'http://tinyurl.com/9qpphzp'

### pickle_dict ###
Min: 0.353965 -> 0.347959: 1.02x faster
Avg: 0.358980 -> 0.350596: 1.02x faster
Significant (t=10.44)
Stddev: 0.00545 -> 0.00160: 3.3956x smaller
Timeline: b'http://tinyurl.com/9pfeqf9'

### pickle_list ###
Min: 0.838222 -> 0.593497: 1.41x faster
Avg: 0.844636 -> 0.599491: 1.41x faster
Significant (t=296.53)
Stddev: 0.00520 -> 0.00267: 1.9521x smaller
Timeline: b'http://tinyurl.com/9rynvnv'

### slowpickle ###
Min: 0.408205 -> 0.458309: 1.12x slower
Avg: 0.413738 -> 0.463916: 1.12x slower
Significant (t=-53.85)
Stddev: 0.00263 -> 0.00604: 2.3019x larger
Timeline: b'http://tinyurl.com/coffkbg'
[issue15596] pickle: Faster serialization of Unicode strings
New submission from STINNER Victor:

Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings. The attached patch optimizes the serialization thanks to new properties of Unicode strings (PEP 393):

* text (protocol 0): avoid any temporary buffer if the string is an ASCII or latin1 string without any \\ or \n character; otherwise use a small buffer of 64 KB (instead of two buffers)
* binary (protocol 1, 2): avoid any temporary buffer if the string is an ASCII string or if the string is already available encoded as UTF-8

The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it uses a first buffer to write the escaped string, and then a new temporary buffer to store the result with the right size (instead of just calling _PyBytes_Resize).

--
components: Library (Lib)
files: pickle_unicode.patch
keywords: patch
messages: 167730
nosy: alexandre.vassalotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: pickle: Faster serialization of Unicode strings
type: performance
versions: Python 3.4

Added file: http://bugs.python.org/file26730/pickle_unicode.patch
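To make the two code paths in this submission more concrete, the sketch below inspects what pickle actually emits for a string. It illustrates the output format as observed from Python, not the patch's C internals: a binary protocol stores the UTF-8 bytes verbatim behind a length prefix (so an already-available UTF-8 representation can be copied as is), while protocol 0 has to escape the text. The sample strings are arbitrary.

```python
import pickle
import struct

text = "caf\xe9 \u20ac"          # one Latin-1 and one BMP character
encoded = text.encode("utf-8")

# Binary protocol: BINUNICODE opcode b"X", 4-byte little-endian length,
# then the raw UTF-8 bytes.
binary = pickle.dumps(text, protocol=2)
expected_payload = b"X" + struct.pack("<I", len(encoded)) + encoded
print("UTF-8 stored verbatim:", expected_payload in binary)

# Text protocol: UNICODE opcode b"V", then a raw-unicode-escape style
# encoding terminated by a newline.
textual = pickle.dumps(text, protocol=0)
print("protocol 0 output:", textual)

# Newlines in the string must be escaped, since b"\n" ends the record.
escaped = pickle.dumps("a\nb", protocol=0)
print("newline escaped:", b"\\u000a" in escaped)

# Both encodings round-trip to the original string.
print(pickle.loads(binary) == text, pickle.loads(textual) == text)
```

The standard pickletools.dis() function can be used to view the same opcodes in a more readable form.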
[issue15596] pickle: Faster serialization of Unicode strings
STINNER Victor added the comment:

Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our bigmem buildbot.

http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%203.x/builds/165/steps/test/logs/stdio

==
ERROR: test_huge_str_32b (test.test_pickle.InMemoryPickleTests)
--
Traceback (most recent call last):
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper
    return f(self, maxsize)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b
    pickled = self.dumps(data, protocol=proto)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps
    return pickle.dumps(arg, protocol)
MemoryError
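The MemoryError above comes from the extra memory needed while a very large string is being pickled. A rough way to observe that overhead on a small scale is to measure peak allocation during pickle.dumps() with the standard tracemalloc module (available from Python 3.4). This is only a sketch: it is not how the bigmem tests measure memory, and the helper name `peak_bytes_during_dumps` and the 1,000,000-character input are assumptions made for illustration.

```python
import pickle
import tracemalloc

def peak_bytes_during_dumps(text, protocol):
    """Peak bytes allocated by Python while pickling `text` once."""
    tracemalloc.start()
    try:
        pickle.dumps(text, protocol)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

text = "a" * 1_000_000  # created before tracing, so not counted in the peak

peaks = {protocol: peak_bytes_during_dumps(text, protocol) for protocol in (0, 2)}
for protocol, peak in peaks.items():
    overhead = peak / len(text)
    print(f"protocol {protocol}: peak {peak} bytes ({overhead:.2f}x input length)")
```

A peak well above the size of the pickled output suggests that temporary buffers are being allocated along the way, which is the cost the patch aims to reduce.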