[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: Here is a patch which restores optimization for frame headers. Unfortunately it breaks test_optional_frames. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Larry Hastings added the comment: Isn't it a little late to be changing the pickle protocol, now that we've hit feature-freeze? If you want to check something like this in you're going to have to make a good case for it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file32840/pickle_frame_headers.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: This doesn't change the pickle protocol. This is just an implementation detail. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Optimizing the output of the pickler class should be fine during the feature freeze as long the semantics of the current opcodes stay unchanged. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Well, Larry may expand, but I think we don't commit performance optimizations during the feature freeze either. (feature is taken in the same sense as in no new features in the bugfix branches) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Larry Hastings added the comment: I'll make you a deal. As long as the protocol remains 100% backwards and forwards compatible (3.4.0b1 can read anything written by trunk, and trunk can read anything written by 3.4.0b1), you can make optimizations until beta 2. After that you have to stop... or get permission again. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: I have opened separate issue19780 for this. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Roundup Robot added the comment: New changeset 992ef855b3ed by Antoine Pitrou in branch 'default': Issue #17810: Implement PEP 3154, pickle protocol 4. http://hg.python.org/cpython/rev/992ef855b3ed -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: I've now committed Alexandre's latest work (including the FRAME and MEMOIZE opcodes). -- resolution: - fixed stage: patch review - committed/rejected ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Roundup Robot added the comment: New changeset d719975f4d25 by Christian Heimes in branch 'default': Issue #17810: Add NULL check to save_frozenset http://hg.python.org/cpython/rev/d719975f4d25 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Roundup Robot added the comment: New changeset c54becd69805 by Christian Heimes in branch 'default': Issue #17810: return -1 on error http://hg.python.org/cpython/rev/c54becd69805 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Roundup Robot added the comment: New changeset a02adfb3260a by Christian Heimes in branch 'default': Issue #17810: Add two missing error checks to save_global http://hg.python.org/cpython/rev/a02adfb3260a -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Roundup Robot added the comment: New changeset 3e16c8c34e69 by Christian Heimes in branch 'default': Issue #17810: Fixed NULL check in _PyObject_GetItemsIter() http://hg.python.org/cpython/rev/3e16c8c34e69 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: I've finalized the framing implementation in de9bda43d552. There will be more improvements to come until 3.4 final. However, feature-wise we are done. Thank you everyone for the help! -- status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Tim Peters added the comment: [Alexandre Vassalotti] I've finalized the framing implementation in de9bda43d552. There will be more improvements to come until 3.4 final. However, feature-wise we are done. Thank you everyone for the help! Woo hoo! Thank YOU for the hard work - I know how much fun this is ;-) -- nosy: +tim.peters ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +larry priority: high - release blocker ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: I propose to include frame size in previous frame. This will twice decrease the number of file reads. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Martin v. Löwis added the comment: Attached is a patch that takes a different approach to framing, putting it into an optional framing layer by means of a buffered reader/writer. The framing structure is the same as in PEP 3154; a separate PYFRAMES magic is prepended to guard against protocol inconsistencies and to allow for automatic detection of framing. -- nosy: +loewis Added file: http://bugs.python.org/file32709/framing.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: I have been looking again at Stefan's previous proposal of making memoization implicit in the new pickle protocol. While I liked the smaller pickles it produced, I didn't the invasiveness of the implementation, which requires a change for almost every opcode processed by the Unpickler. This led me to, what I think is, a reasonable compromise between what we have right now and Stefan's proposal. That is we can make the argument of the PUT opcodes implicit, without making the whole opcode implicit. I've implemented this by introducing a new opcode MEMOIZE, which stores the top of the pickle stack using the size of the memo as the index. Using the memo size as the index avoids us some extra bookkeeping variables and handles nicely situations where Pickler.memo.clear() or Unpickler.memo.clear() are used. Size-wise, this brings some good improvements for pickles containing a lot of dicts and lists. # Before $ ./python.exe -c import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4))) 5251 # After with new MEMOIZE opcode ./python.exe -c import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4))) 2015 Time-wise, the change is mostly neutral. It makes pickling dicts and lists slightly faster because it simplifies the code for memo_put() in _pickle. Report on Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 i386 Total CPU cores: 4 ### pickle4_dict ### Min: 0.714912 - 0.667203: 1.07x faster Avg: 0.741616 - 0.685567: 1.08x faster Significant (t=16.25) Stddev: 0.02033 - 0.01346: 1.5102x smaller Timeline: http://goo.gl/iHqCfB ### pickle4_list ### Min: 0.414151 - 0.398913: 1.04x faster Avg: 0.432094 - 0.409058: 1.06x faster Significant (t=11.83) Stddev: 0.01049 - 0.00893: 1.1749x smaller Timeline: http://goo.gl/wfQzgL Anyhow, I have committed this improvement in my pep-3154 branch (http://hg.python.org/features/pep-3154-alexandre/rev/8a2861aaef82) for now, though I will happily revert it if people oppose to the change. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: Added file: http://bugs.python.org/file32639/f87b455af573.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: Added file: http://bugs.python.org/file32640/8434af450da0.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: Removed file: http://bugs.python.org/file32639/f87b455af573.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Hi folks, I consider my implementation of PEP-3154 mostly feature complete at this point. I still have a few things left to do. For example, I need to update the documentation about the new protocol. However, these can mostly be done along the review process. Plus, I definitely prefer getting feedback sooner. :-) Please review at: http://bugs.python.org/review/17810/ Thanks! -- stage: needs patch - patch review ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: I am still working on it. I am implemented support for nested globals last week (http://hg.python.org/features/pep-3154-alexandre/rev/c8991b32a47e). At this point, the only big piece remaining is the support for method descriptors. There are other minor things left but we can worry about those later. Nick, thanks for the pointer! I didn't know about PEP 451. I will look how we can use it in pickle. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Alexandre, Stefan, is any of you working on this? If not, could you please expose what the status of the patch is, whose work is the most advanced (Alexandre's or Stefan's) and what should be the plan to move this forward? Thanks! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Nick Coghlan added the comment: Potentially relevant to this: we hope to have PEP 451 done for 3.4, which adds a __spec__ attribute to module objects, and will also tweak runpy to ensure -m registers __main__ under it's real name as well. If pickle uses __spec__.name in preference to __name__ when __spec__ is defined, then objects defined in __main__ modules run via -m should start being pickled correctly. -- nosy: +ncoghlan ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Stefan, could you address my review comments soon? The improved support for globals is the only big piece missing from the implementation of PEP, which I would like to get done and submitted by the end of the month. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Stefan Mihaila added the comment: On 6/3/2013 9:33 PM, Alexandre Vassalotti wrote: Alexandre Vassalotti added the comment: Stefan, could you address my review comments soon? The improved support for globals is the only big piece missing from the implementation of PEP, which I would like to get done and submitted by the end of the month. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ Yes, I apologize for the delay again. Today is my last exam this semester, so I'll do my best to get it done as soon as possible (hopefully this weekend). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Stefan, I took a quick look at your patch. There is a couple things that stands out. First, I think the implementation of BINGLOBAL and BINGLOBAL_BIG should be moved to another patch. Adding a binary version opcode for GLOBAL is a separate feature and it should be reviewed independently. Personally, I prefer the STACK_GLOBAL opcode I proposed as it much simpler to implement, but I am biased. Next, the patch's formatting should be fixed to conform to PEP 7 and PEP 8. Make sure the formatting is consistent with the surrounding code. In particular, comments should be full sentences that explains why we need this code. Avoid adding comments that merely say what the code does, unless the code is complex. In addition, please replace the uses of PyUnicode_InternFromString with the _Py_IDENTIFIER as needed. The latter allow the static strings to be garbage collected when the module is deleted, which is friendlier to embedded interpreters. It is also lead to cleaner code. Finally, the class method check hack looks like a bug to me. There are multiple solutions here. For example, we could fix class methods to be cached so they always have the same ID once they are created. Or, we could remove the 'is' check completely if it is unnecessary. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Stefan, I took a quick look at your patch. There is a couple things that stands out. It would be nice if you could reconcile each other's work. Especially so I don't re-implement framing on top of something else :-) Adding a binary version opcode for GLOBAL is a separate feature and it should be reviewed independently. Well, it's part of the PEP. Personally, I prefer the STACK_GLOBAL opcode I proposed as it much simpler to implement, but I am biased. I agree it sounds simpler. I hadn't thought about it when first writing the PEP. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Thanks Stefan for the patch. It's very much appreciated. I will try to review it soon. Of the features you proposed, the twos I would like to take a look again is implicit memoization and the BAIL_OUT opcode. For the implicit memoization feature, we will need to have some performance results in hand to justify the major changes it needs. If you can you work out a quick patch, I can run it through the benchmarks suite for pickle and measure the impact. Hopefully, we will see a good improvement though we can't be sure until we measure. And as for the BAIL_OUT opcode, it would be interesting to revisit its use now that we support binary framing. It could be helpful to add it to prevent the Unpickler from hanging if the other end forgot to close the stream. I am still not totally convinced. However if you make a good case for it, I would support to see it included. -- Added file: http://bugs.python.org/file30229/pickle4+methods.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: Removed file: http://bugs.python.org/file30229/pickle4+methods.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila mstefa...@gmail.com: -- nosy: +mstefanro ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila mstefa...@gmail.com: Added file: http://bugs.python.org/file30211/780722877a3e.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila mstefa...@gmail.com: Removed file: http://bugs.python.org/file30211/780722877a3e.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Stefan Mihaila added the comment: On 5/10/2013 11:46 PM, Stefan Mihaila wrote: Changes by Stefan Mihaila mstefa...@gmail.com: -- nosy: +mstefanro ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ Hello. I've worked on implementing PEP3154 as part of GSoC2012. My work is available in a repo at [1]. The blog I've used to report my work is at [2] and contains some useful information. Here is a list of features that were implemented as part of GSoC: * Pickling of very large bytes and strings * Better pickling of small string and bytes (+ tests) * Native pickling of sets and frozensets (+ tests) * Self-referential sets and frozensets (+ tests) * Implicit memoization (BINPUT is implicit for certain opcodes) - The argument against this was that pickletools.optimize would not be able to prevent memoization of objects that are not referred later. For such situations, a special flag at beginning could be added, which indicates whether implicit BINPUT is enabled. This flag could be added as one of the higher-order bits of the protocol version. For instance: PROTO \x04 + BINUNICODE .. and PROTO \x84 + BINUNICODE .. + BINPUT 1 would be equivalent. Then pickletools.optimize could choose whether it wants implicit BINPUT or not. Sure, this would complicate matters and it's not for me to decide whether it's worth it. In my midterm report at [3] there are some examples of what a pickled string looks in v4 without implicit memoization, and some size comparisons to v3. * Pickling of nested globals, methods etc. (+ tests) * Pickling calls to __new__ with keyword args (+ tests) * A BAIL_OUT opcode was always outputted when pickling failed, so that the Pickler and Unpickler can be both run at once on different ends of a stream. The Pickler could guarantee to always send a correct pickle on the stream. The Unpickler would never end up hanging when Pickling failed mid-work. - At the time, Alexandre suggested this would probably not be a great idea because it should be the responsibility of the protocol used to assure some consistency. However, this does not appear to be a trivial task to achieve. The size of the pickle is not known in advance, and waiting for the Pickler to complete before sending the data via stream is not as efficient, because the Unpickler would not be able to run at the same time. write and read methods of the stream would have to be wrapped and some escape sequence used. This would increase the size of the pickled string for some sort of worst-case of the escape sequence, probably. My thought was that it would be beneficial for the average user to have the guarantee that the Pickler always outputs a correct pickle to a stream, even if it raises an exception. * Other minor changes that I can't really remember. Although I'm sure Alexandre had his good reasons to start the work from scratch, it would be a shame to waste all this work. The features mentioned above are working and although the implementation may not be ideal (I don't have the cpython experience of a regular dev), I'm sure useful bits can be extracted from it. Alexandre suggested that I extract bits and post patches, so I have attached, for now, support for pickling methods and nested globals (+tests). I'm willing to do so for some or the rest of the features, should this be requested and should I have the necessary time to do so. [1] https://bitbucket.org/mstefanro/pickle4/ [2] https://pypickle4.wordpress.com/ [3] https://gist.github.com/mstefanro/3086647 -- Added file: http://bugs.python.org/file30213/methods.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___diff -r 780722877a3e Lib/pickle.py --- a/Lib/pickle.py Wed May 01 13:16:11 2013 -0700 +++ b/Lib/pickle.py Sat May 11 03:06:28 2013 +0300 @@ -23,7 +23,7 @@ -from types import FunctionType, BuiltinFunctionType +from types import FunctionType, BuiltinFunctionType, MethodType, ModuleType from copyreg import dispatch_table from copyreg import _extension_registry, _inverted_registry, _extension_cache from itertools import islice @@ -34,10 +34,44 @@ import io import codecs import _compat_pickle +import builtins +from inspect import ismodule, isclass __all__ = [PickleError, PicklingError, UnpicklingError, Pickler, Unpickler, dump, dumps, load, loads] +# Issue 15397: Unbinding of methods +# Adds the possibility to unbind methods as well as a few definitions missing +# from the types module. + +_MethodDescriptorType = type(list.append) +_WrapperDescriptorType = type(list.__add__) +_MethodWrapperType = type([].__add__) + +def _unbind(f): +Unbinds
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Antoine, can you share the code for your benchmarks which show performance improvements when framing is enabled? I am seeing the same 10-15% slowdown even when pickling stuff to pure Python objects: The performance improvement is when unpickling, not when pickling. Pickling always buffers data, so framing doesn't bring anything on this side of the fence. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Here are some numbers: # Without the patch $ ./python -m timeit -s import pickle, io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d) b.seek(0); pickle.load(b) 1 loops, best of 3: 180 usec per loop $ ./python -m timeit -s import pickle, _pyio as io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d) b.seek(0); pickle.load(b) 100 loops, best of 3: 4.52 msec per loop # With the patch $ ./python -m timeit -s import pickle, io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d) b.seek(0); pickle.load(b) 1 loops, best of 3: 42.8 usec per loop $ ./python -m timeit -s import pickle, _pyio as io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d) b.seek(0); pickle.load(b) 1 loops, best of 3: 47.3 usec per loop -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: I am currently fleshing out an improved implementation for the reduce protocol version 4. One thing I am curious about is whether we should keep the special cases we currently have there for dict and list subclasses. I recall Raymond expressed disagreement in #msg83098 about this behavior. I agree that having __setitem__ called before __init__ make it harder for dict and list subclasses to support pickling. To take advantage of the special case, subclasses need to do their required initialization in the __new__ method. On the other hand, it does decrease the memory requirements for unpickling such subclasses---i.e., we can build the object in-place instead of building an intermediary list or dict. Reading PEP 307 confirms indeed that was the original intention. One possible solution, other than removing the special case completely, is to make sure we initialize the object (using the BUILD opcode) before we call __setitem__ or append on it. This would be a simple change that would solve the initialization issue. However, I would still feel uneasy about the default object.__reduce__ behavior depending on the object's subtype. I think it could be worthwhile to investigate a generic API for pickling collections in-place. For example, a such API would helpful for pickling set subclasses in-place. __items__() or Return an iterator of the items in the collection. Would be __getitems__() equivalent to iter(dict.items()) on dicts and iter(list) on lists. __additems__(items) Add a batch of items to the collection. By default, it would be defined as: for item in items: self.__additem__(item) However, subclasses would be free to provide a more efficient implementation of the method. Would be equivalent to dict.update on dicts and list.extend on lists. __additem__(item)Add a single item to the collection. Would be equivalent to dict[item[0]] = item[1] on dicts and list.append on lists. The collections module's ABCs could then provide default implementations of this API, which would give its users efficient in-place pickling automatically. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: I think it could be worthwhile to investigate a generic API for pickling collections in-place. For example, a such API would helpful for pickling set subclasses in-place. Is the use case important enough? Otherwise, this is more __special_method__ complication that we'll have to maintain for pickle's only use. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Those methods wouldn't be much more a maintenance burden than the special cases already present in the implementation of __reduce__. These methods would only need to be provided by classes that wishes to support efficient in-place pickling provided by protocol 4. As such, this approach better as it would rely on duck typing rather than concrete type checks, which IMHO do not belong in the default object implementation. Plus, having this generic API would allow pickle to share the same pickling and unpickling code for lists, dicts, sets and other mutable collections. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Here is an updated framing patch which fixes the issue reported by Alexandre. There are also a couple added tests. -- Added file: http://bugs.python.org/file30118/framing3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: The framing patch seems to have a significant negative effect on performance. Report on Linux avassalotti 3.2.5-gg1130 #1 SMP Mon Feb 4 02:25:47 PST 2013 x86_64 x86_64 Total CPU cores: 12 ### fastpickle ### Min: 0.447194 - 0.505841: 1.13x slower Avg: 0.455517 - 0.509537: 1.12x slower Significant (t=-22.05) Stddev: 0.01438 - 0.00967: 1.4875x smaller ### fastunpickle ### Min: 0.583922 - 0.638744: 1.09x slower Avg: 0.589183 - 0.649506: 1.10x slower Significant (t=-21.77) Stddev: 0.00939 - 0.01720: 1.8324x larger Would it be possible to mitigate the regression? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: The framing patch seems to have a significant negative effect on performance. I wouldn't call it significant. Any speedup or slowdown less than 50% is unlikely to be noticeable in real-world applications. Mitigating the regression is probably a matter of tweaking the read/write fast paths (optimizing for the common case where a frame is ongoing and the buffer is neither full nor empty). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Antoine, can you share the code for your benchmarks which show performance improvements when framing is enabled? I am seeing the same 10-15% slowdown even when pickling stuff to pure Python objects: ### Without the patch ./python -m timeit -r 50 -s import pickle, _pyio; f = _pyio.BytesIO(); x = list(range(1000)) pickle.dump(x, f, protocol=4) 1 loops, best of 50: 28.5 usec per loop ### With the patch ./python -m timeit -r 50 -s import pickle, _pyio; f = _pyio.BytesIO(); x = list(range(1000)) pickle.dump(x, f, protocol=4) 1 loops, best of 50: 32.9 usec per loop -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: -- dependencies: +Refactor reduce protocol implementation ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: The latest framing patch looks pretty nice overall. One concern is we need to make sure the C implementation call _Pickler_OpcodeBoundary often enough to keep the frames around the sizes. For example, batch_save_list and batch_save_dict can currently create a frame much larger than expected. Interestingly enough, I found pickle, with patch applied, crashes when handling such frames: 13:44:43 pep-3154 $ ./python -c import pickle, io; pickle.dump(list(range(10**5)), io.BytesIO(), 4) Debug memory block at address p=0x1e96b10: API 'o' 52 bytes originally requested The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected. The 8 pad bytes at tail=0x1e96b44 are not all FORBIDDENBYTE (0xfb): at tail+0: 0x00 *** OUCH at tail+1: 0x00 *** OUCH at tail+2: 0x00 *** OUCH at tail+3: 0x00 *** OUCH at tail+4: 0x4d *** OUCH at tail+5: 0x75 *** OUCH at tail+6: 0x5b *** OUCH at tail+7: 0xfb The block was made by call #237465 to debug malloc/realloc. Data at p: 00 00 00 00 00 00 00 00 ... ff ff ff ff 00 00 00 00 Fatal Python error: bad trailing pad byte Current thread 0x7f5dea491700: File string, line 1 in module Aborted (core dumped) Also, I think we should try to make pickletools.dis display the frame boundaries to help with debugging. This could be implemented by adding an option to pickletools.genops which could be helpful for testing the framing implementation as well. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: One concern is we need to make sure the C implementation call _Pickler_OpcodeBoundary often enough to keep the frames around the sizes. For example, batch_save_list and batch_save_dict can currently create a frame much larger than expected. I don't understand how that can happen. batch_list() and batch_dict() both call save() for each item, and save() calls _Pickler_OpcodeBoundary() at the end. Have I missed something? Interestingly enough, I found pickle, with patch applied, crashes when handling such frames: Interesting, I'll take a look when I have some time. Also, I think we should try to make pickletools.dis display the frame boundaries to help with debugging. This could be implemented by adding an option to pickletools.genops which could be helpful for testing the framing implementation as well. Agreed. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: I don't understand how that can happen. batch_list() and batch_dict() both call save() for each item, and save() calls _Pickler_OpcodeBoundary() at the end. Have I missed something? Ah, you're right. I was thinking in terms of my fast dispatch patch in issue #17787. Sorry for the confusion! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Here is an updated framing patch which supports pickletools.optimize(). -- Added file: http://bugs.python.org/file30094/framing2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Here is a framing patch on top of Alexandre's work. There is one thing that framing breaks: pickletools.optimize(). I think it would be non-trivial to fix it. Perhaps the PREFETCH opcode is a better idea for this. Alexandre, I don't understand why you removed STACK_GLOBAL. GLOBAL is a PITA that we should not use in protocol 4 anymore, so we need either STACK_GLOBAL or some kind of BINGLOBAL. -- Added file: http://bugs.python.org/file30068/framing.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: What is wrong with GLOBAL? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: What is wrong with GLOBAL? It uses the lame text mode that scans for newlines, and is generally annoying to optimize. This is like C strings vs. Pascal strings. http://www.python.org/dev/peps/pep-3154/#binary-encoding-for-all-opcodes -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: With framing it isn't annoying. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Alexandre Vassalotti added the comment: Antoine, I removed STACK_GLOBAL when I found performance issues with the implementation. The changeset that added it had some unrelated changes that made it harder to debug than necessary. I am planning to re-add it when I worked out the kinks. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: With framing it isn't annoying. Slightly less, but you still have to wrap readline() calls in the unpickler. I have started experimenting with PREFETCH, but making the opcode optional is a bit annoying in the C pickler, which means it's simpler to always emit it, which means it's not very different from framing in the end :-) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: And here is an implementation of PREFETCH over Alexandre's work. As you can see the code complexity compared to framing is mostly a wash, but I think fixing pickletools.optimize() will be easier with PREFETCH (still needs confirmation, of course :-)). -- Added file: http://bugs.python.org/file30072/prefetch.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: I were thinking about framing before looking at your last changes to PEP 3154 and I have two alternative propositions. 1. Pack picked items in blocks of some predefined (or specified at the start with the BLOCKSIZE opcode) size. Only some large data (long strings, large integers) can cross the boundary between blocks. In all other cases the block should be padded with the NOP opcode. 2. A similar to your proposition, but frames should be declared with a special PREFETCH opcode (with 2- or 4-bytes argument). Large data pickled outside frames (this prevents doublecopying). Opcode and size of large data object can (should?) be included in the previous frame. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: 1. Pack picked items in blocks of some predefined (or specified at the start with the BLOCKSIZE opcode) size. Only some large data (long strings, large integers) can cross the boundary between blocks. In all other cases the block should be padded with the NOP opcode. Padding makes it both less efficient and more annoying to handle, IMO. My framing proof-of-concept ends up quite simple in terms of code complexity. For example, the C version only adds 125 lines of code in 3 additional functions. 2. A similar to your proposition, but frames should be declared with a special PREFETCH opcode (with 2- or 4-bytes argument). Large data pickled outside frames (this prevents doublecopying). No doublecopying is necessary (not in the C version, that is). That said, this is an interesting idea. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: Padding makes it both less efficient and more annoying to handle, IMO. Agree. But there is other application for NOPs. UTF-8 decoder (and some other decoders) works more fast (up to 4x) when input is aligned. By adding several NOPs before BINUNICODE so that start of encoded data is 4- or 8-bytes aligned relatively to start of frame, we can significan speedup unpickling long ASCII strings. I propose to add new NOP opcode and to use it to align some align-sensitive data. My framing proof-of-concept ends up quite simple in terms of code complexity. For example, the C version only adds 125 lines of code in 3 additional functions. I just looked in the code and saw that the unpickler already has a ready infrastructure for prefetching. Now your words have not appear to be so incredible. ;) It should work. No doublecopying is necessary (not in the C version, that is). Agree, there is no doublecopying (except for large bytes objects). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Charles-François Natali added the comment: I would like to see Proto4 include an option for compression (zlib,bz2) or somesuch and become self-decompressing upon unpickling. I don't see what this would bring over explicit compression: - depending on the use case, you may want to use different compression algorithms, e.g. for disk you may want higher compression ratio like bzip2/lzma, but for wire you'd prefer something fast like snappy - supporting multiple compression algorithms and levels would complicate the API - this would probably complicate the code, since you'd have to support optional compression, and have a way to indicate which format is used - that's really mixing two entirely different concepts (serialization vs compression) -- nosy: +neologix ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: I don't see what this would bring over explicit compression: - depending on the use case, you may want to use different compression algorithms, e.g. for disk you may want higher compression ratio like bzip2/lzma, but for wire you'd prefer something fast like snappy - supporting multiple compression algorithms and levels would complicate the API - this would probably complicate the code, since you'd have to support optional compression, and have a way to indicate which format is used - that's really mixing two entirely different concepts (serialization vs compression) I agree with Charles-François. A feature that may be actually nice to have in the pickle protocol would be some framing, to help with streaming unpickling (right now unpickling a stream can read almost one byte at a time, IIRC). However, that would also make the protocol and the pickler significantly more complex. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com: -- nosy: +Arfrever ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: A proof of concept hack to enable framing on pickle showed a massive performance increase on streaming unpickling (up to 5x faster with a C file object such as io.BytesIO, up to 150x faster with a pure Python file object such as _pyio.BytesIO). There is a slight slowdown on non-streaming operation, but that could probably be optimized. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: (note: I've updated PEP 3154 with framing and GLOBAL_STACK) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: A feature that may be actually nice to have in the pickle protocol would be some framing, to help with streaming unpickling (right now unpickling a stream can read almost one byte at a time, IIRC). However, that would also make the protocol and the pickler significantly more complex. What if just use io.BufferedReader? if not isinstance(file, io.BufferedReader): file = io.BufferedReader(file) (at start of _Unpickler.__init__) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: What if just use io.BufferedReader? if not isinstance(file, io.BufferedReader): file = io.BufferedReader(file) (at start of _Unpickler.__init__) Two problems: 1. semantically, it is wrong; the BufferedReader will read bytes beyond the pickle end, so the underlying stream will be desynchronized 2. performance-wise, it doesn't solve the issue either: read() method calls are costly, even on an optimized C object -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Raymond Hettinger added the comment: I would like to see Proto4 include an option for compression (zlib,bz2) or somesuch and become self-decompressing upon unpickling. The primary use cases for pickling involve writing to disk or transmitting across a wire -- both use cases benefit from compression (with reduced read/write times). -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
New submission from Alexandre Vassalotti: I have restarted the work on PEP 3154. Stefan Mihaila had begun an implementation as part of the Google Summer of Code 2012. Unfortunately, he hit multiple roadblocks which prevented him to finish his work by the end of the summer. He previously shown interest in completing his implementation. However he got constrained by time and never resumed his work. So I am taking over the implementation of the PEP. I have decided to go forward with a brand new code, using Stefan's work only as a guide. At the moment, I have completed about half of the PEP---missing only support for calling __new__ with keyword arguments and the use of new qualified name for referring objects. Design-wise, there is still a few things that we should discuss. For example, I think Stefan's idea, which is not specified in the PEP, to eliminate PUT opcodes is interesting. His proposal was to emit an implicit PUT opcode after each object pickled and make the Pickler and Unpickler classes agree on the scheme. A drawback of this implicit scheme is we cannot be selective about which object we save in the memo during unpickling. That means, for example, we won't be able to make pickletools.optimize work with protocol 4 to reduce the memory footprint of the unpickling process. This scheme also alters the meaning of all previously defined opcodes because of the implicit PUTs, which is sort of okay because we are changing protocol. Alternatively, we could use an explicit scheme by defining new fat opcodes, for the built-in types we care about, which includes memoization. This scheme would a bit more flexible however it would also be slightly more involved implementation-wise. In any case, I will run benchmarks to see if either schemes are worthwhile. -- assignee: alexandre.vassalotti components: Library (Lib) hgrepos: 184 messages: 187496 nosy: alexandre.vassalotti, pitrou priority: high severity: normal stage: needs patch status: open title: Implement PEP 3154 (pickle protocol 4) type: enhancement versions: Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Alexandre Vassalotti alexan...@peadrop.com: -- dependencies: +Unbinding of methods ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Andrew Svetlov andrew.svet...@gmail.com: -- nosy: +asvetlov ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Antoine Pitrou pit...@free.fr: -- keywords: +patch Added file: http://bugs.python.org/file29966/9f1be171da08.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Thank you for reviving this :) A couple of questions: - why ADDITEM in addition to ADDITEMS? I don't think single-element sets are an important use case (as opposed to, say, single-element tuples) - what is the purpose of STACK_GLOBAL? I would say memoization of common names but you pass memoize=False For example, I think Stefan's idea, which is not specified in the PEP, to eliminate PUT opcodes is interesting. His proposal was to emit an implicit PUT opcode after each object pickled and make the Pickler and Unpickler classes agree on the scheme. Are the savings worth it? I've tried pickletools.optimize() on two objects: - a typical data dict (http.client.responses). The pickle length decreases from 1155 to 1063 (8% shrink); unpickling is faster by 4%. - a Logger object (logging.getLogger(foobar). The pickle length decreases from 427 to 389 (9% shrink); unpickling is faster by 2%. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Serhiy Storchaka storch...@gmail.com: -- nosy: +serhiy.storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: Link to the previous attempt: issue15642. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Serhiy Storchaka added the comment: Memoization consumes memory during pickling. For now every memoized object requires memory for: dict's entity; an id() integer object; a 2-element tuple; a pickle's index (an integer object). It's about 80 bytes on 32-bit platform (and twice as this on 64-bit). For data which contains a lot of floats it can be cumbersome. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17810] Implement PEP 3154 (pickle protocol 4)
Antoine Pitrou added the comment: Memoization consumes memory during pickling. For now every memoized object requires memory for: dict's entity; an id() integer object; a 2-element tuple; a pickle's index (an integer object). It's about 80 bytes on 32-bit platform (and twice as this on 64-bit). As far as I understand, Alexandre doesn't propose to suppress memoization, only to make it implicit. Therefore the memory overhead would be the same (but the pickle would have less opcodes). For data which contains a lot of floats it can be cumbersome. Apparently, floats don't get memoized: pickletools.dis(pickle.dumps([1.0, 2.0])) 0: \x80 PROTO 3 2: ]EMPTY_LIST 3: qBINPUT 0 5: (MARK 6: GBINFLOAT 1.0 15: GBINFLOAT 2.0 24: eAPPENDS(MARK at 5) 25: .STOP -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17810 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com