[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch which restores the optimization for frame headers. Unfortunately, it 
breaks test_optional_frames.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17810
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Larry Hastings

Larry Hastings added the comment:

Isn't it a little late to be changing the pickle protocol, now that we've hit 
feature-freeze?  If you want to check something like this in you're going to 
have to make a good case for it.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file32840/pickle_frame_headers.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This doesn't change the pickle protocol. This is just an implementation detail.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Optimizing the output of the pickler class should be fine during the feature 
freeze as long as the semantics of the current opcodes stay unchanged.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Well, Larry may expand, but I think we don't commit performance optimizations 
during the feature freeze either.
("feature" is taken in the same sense as in "no new features in the bugfix 
branches")

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Larry Hastings

Larry Hastings added the comment:

I'll make you a deal.  As long as the protocol remains 100% backwards and 
forwards compatible (3.4.0b1 can read anything written by trunk, and trunk can 
read anything written by 3.4.0b1), you can make optimizations until beta 2.  
After that you have to stop... or get permission again.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I have opened a separate issue, issue19780, for this.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 992ef855b3ed by Antoine Pitrou in branch 'default':
Issue #17810: Implement PEP 3154, pickle protocol 4.
http://hg.python.org/cpython/rev/992ef855b3ed

--
nosy: +python-dev




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've now committed Alexandre's latest work (including the FRAME and MEMOIZE 
opcodes).

--
resolution:  -> fixed
stage: patch review -> committed/rejected




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset d719975f4d25 by Christian Heimes in branch 'default':
Issue #17810: Add NULL check to save_frozenset
http://hg.python.org/cpython/rev/d719975f4d25

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c54becd69805 by Christian Heimes in branch 'default':
Issue #17810: return -1 on error
http://hg.python.org/cpython/rev/c54becd69805

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset a02adfb3260a by Christian Heimes in branch 'default':
Issue #17810: Add two missing error checks to save_global
http://hg.python.org/cpython/rev/a02adfb3260a

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 3e16c8c34e69 by Christian Heimes in branch 'default':
Issue #17810: Fixed NULL check in _PyObject_GetItemsIter()
http://hg.python.org/cpython/rev/3e16c8c34e69

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

I've finalized the framing implementation in de9bda43d552.

There will be more improvements to come until 3.4 final. However, feature-wise 
we are done. Thank you everyone for the help!

--
status: open -> closed




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-23 Thread Tim Peters

Tim Peters added the comment:

[Alexandre Vassalotti]
> I've finalized the framing implementation in de9bda43d552.
>
> There will be more improvements to come until 3.4 final. However, feature-wise
> we are done. Thank you everyone for the help!

Woo hoo!  Thank YOU for the hard work - I know how much fun this is ;-)

--
nosy: +tim.peters




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-20 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +larry
priority: high -> release blocker




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I propose to include the frame size in the previous frame. This will halve the 
number of file reads.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-19 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Attached is a patch that takes a different approach to framing, putting it into 
an optional framing layer by means of a buffered reader/writer.

The framing structure is the same as in PEP 3154; a separate PYFRAMES magic is 
prepended to guard against protocol inconsistencies and to allow for automatic 
detection of framing.

--
nosy: +loewis
Added file: http://bugs.python.org/file32709/framing.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-18 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

I have been looking again at Stefan's previous proposal of making memoization 
implicit in the new pickle protocol. While I liked the smaller pickles it 
produced, I didn't like the invasiveness of the implementation, which requires a 
change for almost every opcode processed by the Unpickler. This led me to what 
I think is a reasonable compromise between what we have right now and Stefan's 
proposal. That is, we can make the argument of the PUT opcodes implicit, without 
making the whole opcode implicit.

I've implemented this by introducing a new opcode MEMOIZE, which stores the top 
of the pickle stack using the size of the memo as the index. Using the memo 
size as the index saves us some extra bookkeeping variables and nicely handles 
situations where Pickler.memo.clear() or Unpickler.memo.clear() are used.

Size-wise, this brings some good improvements for pickles containing a lot of 
dicts and lists.

# Before
$ ./python.exe -c "import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4)))"
5251

# After with new MEMOIZE opcode
$ ./python.exe -c "import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4)))"
2015
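
A minimal sketch of how the effect described above can be inspected with the standard pickletools module. It compares protocol 3, whose BINPUT and LONG_BINPUT opcodes carry an explicit memo index, with protocol 4 and its argument-less MEMOIZE opcode. The helper name opcode_names is made up for this illustration, and the byte counts on a released interpreter may differ from the 5251 and 2015 measured on the development branch.

```python
import pickle
import pickletools

data = [[] for _ in range(1000)]


def opcode_names(payload):
    """Return the names of the opcodes found in a pickle, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]


v3 = pickle.dumps(data, 3)
v4 = pickle.dumps(data, 4)

ops_v3 = opcode_names(v3)
ops_v4 = opcode_names(v4)

# Protocol 3 writes the memo index after every PUT-style opcode.
explicit_puts = ops_v3.count("BINPUT") + ops_v3.count("LONG_BINPUT")
# Protocol 4 uses MEMOIZE, which takes the current memo size as the index.
implicit_puts = ops_v4.count("MEMOIZE")

print("protocol 3:", len(v3), "bytes,", explicit_puts, "explicit puts")
print("protocol 4:", len(v4), "bytes,", implicit_puts, "MEMOIZE opcodes")
```

The same objects are memoized in both cases; only the encoding of the memo index changes, which is where the size reduction comes from.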

Time-wise, the change is mostly neutral. It makes pickling dicts and lists 
slightly faster because it simplifies the code for memo_put() in _pickle.

Report on Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; 
root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 i386
Total CPU cores: 4

### pickle4_dict ###
Min: 0.714912 -> 0.667203: 1.07x faster
Avg: 0.741616 -> 0.685567: 1.08x faster
Significant (t=16.25)
Stddev: 0.02033 -> 0.01346: 1.5102x smaller
Timeline: http://goo.gl/iHqCfB

### pickle4_list ###
Min: 0.414151 -> 0.398913: 1.04x faster
Avg: 0.432094 -> 0.409058: 1.06x faster
Significant (t=11.83)
Stddev: 0.01049 -> 0.00893: 1.1749x smaller
Timeline: http://goo.gl/wfQzgL

Anyhow, I have committed this improvement in my pep-3154 branch 
(http://hg.python.org/features/pep-3154-alexandre/rev/8a2861aaef82) for now, 
though I will happily revert it if people oppose the change.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-15 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


Added file: http://bugs.python.org/file32639/f87b455af573.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-15 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


Added file: http://bugs.python.org/file32640/8434af450da0.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-15 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


Removed file: http://bugs.python.org/file32639/f87b455af573.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-11-15 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Hi folks,

I consider my implementation of PEP-3154 mostly feature complete at this point. 
I still have a few things left to do. For example, I need to update the 
documentation about the new protocol. However, these can mostly be done along 
the review process. Plus, I definitely prefer getting feedback sooner. :-) 

Please review at:

http://bugs.python.org/review/17810/

Thanks!

--
stage: needs patch -> patch review




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-08-18 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

I am still working on it. I implemented support for nested globals last week 
(http://hg.python.org/features/pep-3154-alexandre/rev/c8991b32a47e). At this 
point, the only big piece remaining is the support for method descriptors. 
There are other minor things left but we can worry about those later.

Nick, thanks for the pointer! I didn't know about PEP 451. I will look at how we 
can use it in pickle.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-08-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Alexandre, Stefan, are either of you working on this?
If not, could you please explain what the status of the patch is, whose work is 
the most advanced (Alexandre's or Stefan's) and what the plan should be to move 
this forward?

Thanks!

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-08-17 Thread Nick Coghlan

Nick Coghlan added the comment:

Potentially relevant to this: we hope to have PEP 451 done for 3.4, which adds 
a __spec__ attribute to module objects, and will also tweak runpy to ensure -m 
registers __main__ under its real name as well.

If pickle uses __spec__.name in preference to __name__ when __spec__ is 
defined, then objects defined in __main__ modules run via -m should start being 
pickled correctly.
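
As a rough illustration of the lookup order suggested here, the sketch below prefers __spec__.name and falls back to __name__. The helper pickle_module_name and the module name "pkg.tool" are hypothetical and exist only for this example; nothing here claims that pickle itself performs this lookup.

```python
import importlib.machinery
import types


def pickle_module_name(module):
    """Hypothetical helper: prefer __spec__.name, fall back to __name__."""
    spec = getattr(module, "__spec__", None)
    if spec is not None:
        return spec.name
    return module.__name__


# Simulate a module run via "python -m pkg.tool": __name__ is "__main__",
# while __spec__ records the importable name.
script = types.ModuleType("__main__")
script.__spec__ = importlib.machinery.ModuleSpec("pkg.tool", loader=None)

# A bare ModuleType has __spec__ set to None, so __name__ is used.
plain = types.ModuleType("plain")

print(pickle_module_name(script))
print(pickle_module_name(plain))
```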

--
nosy: +ncoghlan




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-06-03 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Stefan, could you address my review comments soon? The improved support for 
globals is the only big piece missing from the implementation of the PEP, which I 
would like to get done and submitted by the end of the month.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-06-03 Thread Stefan Mihaila

Stefan Mihaila added the comment:

On 6/3/2013 9:33 PM, Alexandre Vassalotti wrote:
> Alexandre Vassalotti added the comment:
>
> Stefan, could you address my review comments soon? The improved support for
> globals is the only big piece missing from the implementation of the PEP, which I
> would like to get done and submitted by the end of the month.

Yes, I apologize for the delay again. Today is my last exam this semester, so
I'll do my best to get it done as soon as possible (hopefully this weekend).

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-12 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Stefan, I took a quick look at your patch. There are a couple of things that stand 
out.

First, I think the implementation of BINGLOBAL and BINGLOBAL_BIG should be 
moved to another patch. Adding a binary version opcode for GLOBAL is a separate 
feature and it should be reviewed independently. Personally, I prefer the 
STACK_GLOBAL opcode I proposed as it is much simpler to implement, but I am biased.

Next, the patch's formatting should be fixed to conform to PEP 7 and PEP 8. 
Make sure the formatting is consistent with the surrounding code. In 
particular, comments should be full sentences that explain why we need this 
code. Avoid adding comments that merely say what the code does, unless the code 
is complex.

In addition, please replace the uses of PyUnicode_InternFromString with 
_Py_IDENTIFIER as needed. The latter allows the static strings to be garbage 
collected when the module is deleted, which is friendlier to embedded 
interpreters. It also leads to cleaner code.

Finally, the class method check hack looks like a bug to me. There are multiple 
solutions here. For example, we could fix class methods to be cached so they 
always have the same ID once they are created. Or, we could remove the 'is' 
check completely if it is unnecessary.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Stefan, I took a quick look at your patch. There are a couple of things
> that stand out.

It would be nice if you could reconcile each other's work. Especially so
I don't re-implement framing on top of something else :-)

> Adding a binary version opcode for GLOBAL is a separate feature and it
> should be reviewed independently.

Well, it's part of the PEP.

> Personally, I prefer the STACK_GLOBAL opcode I proposed as it is much
> simpler to implement, but I am biased.

I agree it sounds simpler. I hadn't thought about it when first writing
the PEP.
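
For readers who want to see the two encodings side by side, here is a small sketch using the standard pickletools module. It pickles the builtin len with protocols 3 and 4 and lists the opcodes; opcode_names is a helper written for this example only. On interpreters that ship protocol 4, the text-based GLOBAL opcode should be replaced by STACK_GLOBAL, which takes the module and qualified name from the stack.

```python
import pickle
import pickletools


def opcode_names(payload):
    """Return the names of the opcodes found in a pickle, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]


# Protocol 3: module and name are embedded in the GLOBAL opcode as text.
ops_v3 = opcode_names(pickle.dumps(len, 3))
# Protocol 4: two strings are pushed, then STACK_GLOBAL resolves them.
ops_v4 = opcode_names(pickle.dumps(len, 4))

print(ops_v3)
print(ops_v4)
```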

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-11 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Thanks Stefan for the patch. It's very much appreciated. I will try to review 
it soon.

Of the features you proposed, the two I would like to take another look at are 
implicit memoization and the BAIL_OUT opcode. For the implicit memoization 
feature, we will need to have some performance results in hand to justify the 
major changes it needs. If you can work out a quick patch, I can run it 
through the benchmark suite for pickle and measure the impact. Hopefully, we 
will see a good improvement, though we can't be sure until we measure.

And as for the BAIL_OUT opcode, it would be interesting to revisit its use now 
that we support binary framing. It could be helpful to add it to prevent the 
Unpickler from hanging if the other end forgot to close the stream. I am still 
not totally convinced. However, if you make a good case for it, I would support 
including it.

--
Added file: http://bugs.python.org/file30229/pickle4+methods.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-11 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


Removed file: http://bugs.python.org/file30229/pickle4+methods.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-10 Thread Stefan Mihaila

Changes by Stefan Mihaila mstefa...@gmail.com:


--
nosy: +mstefanro




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-10 Thread Stefan Mihaila

Changes by Stefan Mihaila mstefa...@gmail.com:


Added file: http://bugs.python.org/file30211/780722877a3e.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-10 Thread Stefan Mihaila

Changes by Stefan Mihaila mstefa...@gmail.com:


Removed file: http://bugs.python.org/file30211/780722877a3e.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-10 Thread Stefan Mihaila

Stefan Mihaila added the comment:

On 5/10/2013 11:46 PM, Stefan Mihaila wrote:
> Changes by Stefan Mihaila mstefa...@gmail.com:
>
> --
> nosy: +mstefanro

Hello. I've worked on implementing PEP 3154 as part of GSoC 2012.
My work is available in a repo at [1].
The blog I've used to report my work is at [2] and contains some useful information.

Here is a list of features that were implemented as part of GSoC:

* Pickling of very large bytes and strings
* Better pickling of small string and bytes (+ tests)
* Native pickling of sets and frozensets (+ tests)
* Self-referential sets and frozensets (+ tests)
* Implicit memoization (BINPUT is implicit for certain opcodes)
   - The argument against this was that pickletools.optimize would
     not be able to prevent memoization of objects that are not
     referred to later. For such situations, a special flag at the beginning
     could be added, which indicates whether implicit BINPUT is enabled.
     This flag could be added as one of the higher-order bits of the
     protocol version. For instance:
         PROTO \x04 + BINUNICODE ..
     and
         PROTO \x84 + BINUNICODE .. + BINPUT 1
     would be equivalent. Then pickletools.optimize could choose whether
     it wants implicit BINPUT or not. Sure, this would complicate matters
     and it's not for me to decide whether it's worth it.
     In my midterm report at [3] there are some examples of what a
     pickled string looks like in v4 without implicit memoization, and some
     size comparisons to v3.
* Pickling of nested globals, methods etc. (+ tests)
* Pickling calls to __new__ with keyword args (+ tests)
* A BAIL_OUT opcode was always output when pickling failed, so that
   the Pickler and Unpickler can both be run at once on different ends
   of a stream. The Pickler could guarantee to always send a
   correct pickle on the stream. The Unpickler would never end up hanging
   when pickling failed mid-work.
   - At the time, Alexandre suggested this would probably not be a great
     idea because it should be the responsibility of the protocol used
     to ensure some consistency. However, this does not appear to be
     a trivial task to achieve. The size of the pickle is not known in
     advance, and waiting for the Pickler to complete before sending
     the data via the stream is not as efficient, because the Unpickler
     would not be able to run at the same time.
     The write and read methods of the stream would have to be wrapped and
     some escape sequence used. This would probably increase the size of
     the pickled string in the worst case of the escape sequence.
     My thought was that it would be beneficial for the average user to
     have the guarantee that the Pickler always outputs a correct pickle
     to a stream, even if it raises an exception.
* Other minor changes that I can't really remember.
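
The flag idea from the implicit-memoization bullet can be sketched as a simple bit mask. This is only an illustration of the proposal as read from the PROTO \x04 / PROTO \x84 example, where the variant with the high bit set carries an explicit BINPUT. The names EXPLICIT_MEMO_FLAG and decode_proto_byte are invented here, and no released pickle protocol interprets the PROTO argument this way.

```python
# Hypothetical: high-order bit of the PROTO argument, as in the \x84 example.
EXPLICIT_MEMO_FLAG = 0x80


def decode_proto_byte(value):
    """Split a PROTO argument into (protocol version, implicit memoization?)."""
    version = value & 0x7F
    implicit_memoization = not (value & EXPLICIT_MEMO_FLAG)
    return version, implicit_memoization


print(decode_proto_byte(0x04))  # protocol 4, BINPUT implied
print(decode_proto_byte(0x84))  # protocol 4, BINPUT written explicitly
```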

Although I'm sure Alexandre had his good reasons to start the work from
scratch, it would be a shame to waste all this work. The features mentioned
above are working and although the implementation may not be ideal (I don't
have the cpython experience of a regular dev), I'm sure useful bits can be
extracted from it.
Alexandre suggested that I extract bits and post patches, so I have attached,
for now, support for pickling methods and nested globals (+tests).
I'm willing to do so for some or the rest of the features, should this be requested
and should I have the necessary time to do so.

[1] https://bitbucket.org/mstefanro/pickle4/
[2] https://pypickle4.wordpress.com/
[3] https://gist.github.com/mstefanro/3086647

--
Added file: http://bugs.python.org/file30213/methods.patch

diff -r 780722877a3e Lib/pickle.py
--- a/Lib/pickle.py Wed May 01 13:16:11 2013 -0700
+++ b/Lib/pickle.py Sat May 11 03:06:28 2013 +0300
@@ -23,7 +23,7 @@
 
 
 
-from types import FunctionType, BuiltinFunctionType
+from types import FunctionType, BuiltinFunctionType, MethodType, ModuleType
 from copyreg import dispatch_table
 from copyreg import _extension_registry, _inverted_registry, _extension_cache
 from itertools import islice
@@ -34,10 +34,44 @@
 import io
 import codecs
 import _compat_pickle
+import builtins
+from inspect import ismodule, isclass
 
 __all__ = [PickleError, PicklingError, UnpicklingError, Pickler,
Unpickler, dump, dumps, load, loads]
 
+# Issue 15397: Unbinding of methods
+# Adds the possibility to unbind methods as well as a few definitions missing
+# from the types module.
+
+_MethodDescriptorType = type(list.append)
+_WrapperDescriptorType = type(list.__add__)
+_MethodWrapperType = type([].__add__)
+
+def _unbind(f):
+Unbinds 

[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Antoine, can you share the code for your benchmarks which show
> performance improvements when framing is enabled? I am seeing the same
> 10-15% slowdown even when pickling stuff to pure Python objects:

The performance improvement is when unpickling, not when pickling.
Pickling always buffers data, so framing doesn't bring anything on this
side of the fence.
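
One way to see why framing helps on the unpickling side is to count how often the unpickler calls into the file object. The sketch below is not the benchmark used in this thread; CountingReader and count_reads are names made up for the illustration, and the wrapper deliberately offers only read() and readline() so that it behaves like an unbuffered stream.

```python
import io
import pickle


class CountingReader:
    """Minimal file-like object that counts read() and readline() calls."""

    def __init__(self, payload):
        self._raw = io.BytesIO(payload)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return self._raw.read(size)

    def readline(self):
        self.calls += 1
        return self._raw.readline()


def count_reads(protocol):
    """Unpickle list(range(1000)) and return the number of file reads."""
    payload = pickle.dumps(list(range(1000)), protocol)
    reader = CountingReader(payload)
    assert pickle.load(reader) == list(range(1000))
    return reader.calls


reads_v3 = count_reads(3)  # unframed: roughly one read per opcode or argument
reads_v4 = count_reads(4)  # framed: a whole frame is fetched in one read
print(reads_v3, reads_v4)
```

With an unframed pickle every opcode and argument turns into its own small read; with frames the unpickler can fetch a whole frame at once and serve later reads from memory.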

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here are some numbers:

# Without the patch

$ ./python -m timeit -s "import pickle, io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d)" "b.seek(0); pickle.load(b)"
1 loops, best of 3: 180 usec per loop

$ ./python -m timeit -s "import pickle, _pyio as io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d)" "b.seek(0); pickle.load(b)"
100 loops, best of 3: 4.52 msec per loop

# With the patch

$ ./python -m timeit -s "import pickle, io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d)" "b.seek(0); pickle.load(b)"
1 loops, best of 3: 42.8 usec per loop

$ ./python -m timeit -s "import pickle, _pyio as io; d=pickle.dumps(list(range(1000)), 4); b=io.BytesIO(d)" "b.seek(0); pickle.load(b)"
1 loops, best of 3: 47.3 usec per loop

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

I am currently fleshing out an improved implementation for the reduce protocol 
version 4. One thing I am curious about is whether we should keep the special 
cases we currently have there for dict and list subclasses.

I recall Raymond expressed disagreement in #msg83098 about this behavior. I 
agree that having __setitem__ called before __init__ makes it harder for dict 
and list subclasses to support pickling. To take advantage of the special case, 
subclasses need to do their required initialization in the __new__ method.
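
To make the ordering problem concrete, here is a small simulation of the steps an unpickler performs for a dict subclass: the instance is created without __init__, the items are set, and only then is the instance state restored. The class Registry and its seen attribute are invented for this sketch; the three steps are written out by hand rather than going through pickle so that the order is visible.

```python
class Registry(dict):
    """Dict subclass whose __setitem__ relies on state created in __init__."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def __setitem__(self, key, value):
        # Guard against running before __init__ (or BUILD) has set up `seen`.
        seen = self.__dict__.get("seen")
        if seen is not None:
            seen.append(key)
        super().__setitem__(key, value)


# Step 1 (NEWOBJ): the instance is created, but __init__ is not called.
obj = Registry.__new__(Registry)
init_ran_before_items = "seen" in obj.__dict__

# Step 2 (SETITEMS): __setitem__ runs while `seen` does not exist yet.
obj["a"] = 1

# Step 3 (BUILD): the saved instance state is restored last.
obj.__dict__.update({"seen": ["a"]})

print(init_ran_before_items, dict(obj), obj.seen)
```

Without the guard in __setitem__, step 2 would raise AttributeError, which is the difficulty described above.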

On the other hand, it does decrease the memory requirements for unpickling such 
subclasses---i.e., we can build the object in-place instead of building an 
intermediary list or dict. Reading PEP 307 confirms that this was indeed the 
original intention.

One possible solution, other than removing the special case completely, is to 
make sure we initialize the object (using the BUILD opcode) before we call 
__setitem__ or append on it. This would be a simple change that would solve the 
initialization issue. However, I would still feel uneasy about the default 
object.__reduce__ behavior depending on the object's subtype.

I think it could be worthwhile to investigate a generic API for pickling 
collections in-place. For example, such an API would be helpful for pickling set 
subclasses in-place.

__items__() or       Return an iterator of the items in the collection. Would be
__getitems__()       equivalent to iter(dict.items()) on dicts and iter(list) on
                     lists.

__additems__(items)  Add a batch of items to the collection. By default, it would
                     be defined as:

                         for item in items:
                             self.__additem__(item)

                     However, subclasses would be free to provide a more efficient
                     implementation of the method. Would be equivalent to
                     dict.update on dicts and list.extend on lists.

__additem__(item)    Add a single item to the collection. Would be equivalent to
                     dict[item[0]] = item[1] on dicts and list.append on lists.

The collections module's ABCs could then provide default implementations of 
this API, which would give its users efficient in-place pickling automatically.
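
A rough sketch of what the proposed methods might look like on a set subclass. This is purely illustrative: the method names follow the proposal in this message (using the __getitems__ spelling), the classes InPlaceCollection and TaggedSet are invented, and pickle does not call any of these methods.

```python
class InPlaceCollection:
    """Hypothetical mixin sketching the proposed in-place API."""

    def __getitems__(self):
        raise NotImplementedError

    def __additem__(self, item):
        raise NotImplementedError

    def __additems__(self, items):
        # Default batch behaviour: add items one at a time.
        for item in items:
            self.__additem__(item)


class TaggedSet(InPlaceCollection, set):
    """Set subclass that opts in by providing the two primitive methods."""

    def __getitems__(self):
        return iter(self)

    def __additem__(self, item):
        self.add(item)


source = TaggedSet({1, 2, 3})

# Roughly what an unpickler could do: create the object empty, then refill
# it in place from the stream of items instead of building a temporary set.
target = TaggedSet.__new__(TaggedSet)
target.__additems__(source.__getitems__())

print(sorted(target))
```

A subclass that can add a batch more cheaply would override __additems__ directly, which is the hook the proposal leaves open.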

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I think it could be worthwhile to investigate a generic API for
> pickling collections in-place. For example, such an API would be helpful
> for pickling set subclasses in-place.

Is the use case important enough? Otherwise, this is more
__special_method__ complication that we'll have to maintain solely for
pickle's use.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Those methods wouldn't be much more of a maintenance burden than the special cases 
already present in the implementation of __reduce__. These methods would only 
need to be provided by classes that wish to support the efficient in-place 
pickling provided by protocol 4. As such, this approach is better as it would rely 
on duck typing rather than concrete type checks, which IMHO do not belong in 
the default object implementation.

Plus, having this generic API would allow pickle to share the same pickling and 
unpickling code for lists, dicts, sets and other mutable collections.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is an updated framing patch which fixes the issue reported by Alexandre. 
There are also a couple of added tests.

--
Added file: http://bugs.python.org/file30118/framing3.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

The framing patch seems to have a significant negative effect on performance.

Report on Linux avassalotti 3.2.5-gg1130 #1 SMP Mon Feb 4 02:25:47 PST 2013 
x86_64 x86_64
Total CPU cores: 12

### fastpickle ###
Min: 0.447194 -> 0.505841: 1.13x slower
Avg: 0.455517 -> 0.509537: 1.12x slower
Significant (t=-22.05)
Stddev: 0.01438 -> 0.00967: 1.4875x smaller

### fastunpickle ###
Min: 0.583922 -> 0.638744: 1.09x slower
Avg: 0.589183 -> 0.649506: 1.10x slower
Significant (t=-21.77)
Stddev: 0.00939 -> 0.01720: 1.8324x larger

Would it be possible to mitigate the regression?

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The framing patch seems to have a significant negative effect on
> performance.

I wouldn't call it significant. Any speedup or slowdown less than 50% is
unlikely to be noticeable in real-world applications.

Mitigating the regression is probably a matter of tweaking the
read/write fast paths (optimizing for the common case where a frame is
ongoing and the buffer is neither full nor empty).

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-03 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Antoine, can you share the code for your benchmarks which show performance 
improvements when framing is enabled? I am seeing the same 10-15% slowdown even 
when pickling to pure Python file objects:

### Without the patch
./python -m timeit -r 50 -s "import pickle, _pyio; f = _pyio.BytesIO(); x = list(range(1000))" "pickle.dump(x, f, protocol=4)"
1 loops, best of 50: 28.5 usec per loop

### With the patch
./python -m timeit -r 50 -s "import pickle, _pyio; f = _pyio.BytesIO(); x = list(range(1000))" "pickle.dump(x, f, protocol=4)"
1 loops, best of 50: 32.9 usec per loop
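
For anyone who prefers to script the comparison, the same measurement can be sketched with the timeit module directly. The setup and statement strings are the ones from the command lines above; the best_usec helper and its default loop counts are assumptions made for this sketch, and the timings will of course depend on the build being tested.

```python
import timeit

# Same setup and statement as the command-line invocations above.
SETUP = "import pickle, _pyio; f = _pyio.BytesIO(); x = list(range(1000))"
STMT = "pickle.dump(x, f, protocol=4)"


def best_usec(number=1000, repeat=5):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timings = timeit.repeat(STMT, setup=SETUP, number=number, repeat=repeat)
    return min(timings) / number * 1e6


print(f"best: {best_usec():.1f} usec per loop")
```

Running it once against a build without the patch and once with it gives two numbers that can be compared the same way as the output above.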

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-02 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


--
dependencies: +Refactor reduce protocol implementation




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-02 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

The latest framing patch looks pretty nice overall. One concern is we need to 
make sure the C implementation calls _Pickler_OpcodeBoundary often enough to 
keep the frames around the expected size. For example, batch_save_list and 
batch_save_dict can currently create a frame much larger than expected. 
Interestingly enough, I found that pickle, with the patch applied, crashes when 
handling such frames:

13:44:43 pep-3154 $ ./python -c "import pickle, io; pickle.dump(list(range(10**5)), io.BytesIO(), 4)"
Debug memory block at address p=0x1e96b10: API 'o'
52 bytes originally requested
The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
The 8 pad bytes at tail=0x1e96b44 are not all FORBIDDENBYTE (0xfb):
at tail+0: 0x00 *** OUCH
at tail+1: 0x00 *** OUCH
at tail+2: 0x00 *** OUCH
at tail+3: 0x00 *** OUCH
at tail+4: 0x4d *** OUCH
at tail+5: 0x75 *** OUCH
at tail+6: 0x5b *** OUCH
at tail+7: 0xfb
The block was made by call #237465 to debug malloc/realloc.
Data at p: 00 00 00 00 00 00 00 00 ... ff ff ff ff 00 00 00 00
Fatal Python error: bad trailing pad byte

Current thread 0x7f5dea491700:
  File "<string>", line 1 in <module>
Aborted (core dumped)

Also, I think we should try to make pickletools.dis display the frame 
boundaries to help with debugging. This could be implemented by adding an 
option to pickletools.genops which could be helpful for testing the framing 
implementation as well.
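
As a sketch of what such a debugging aid could look like, the snippet below lists frame boundaries with pickletools.genops. It assumes the FRAME opcode as it exists in released Python versions (3.4 and later), which may differ from the patch under review; the frame_boundaries helper is a hypothetical name.

```python
import pickle
import pickletools


def frame_boundaries(data):
    """Return (offset, payload_length) for every FRAME opcode in a pickle."""
    return [
        (pos, arg)
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name == "FRAME"
    ]


# The same object that triggered the crash reported above.
data = pickle.dumps(list(range(10**5)), protocol=4)
for pos, length in frame_boundaries(data):
    print(f"frame at byte {pos}: {length} payload bytes")
```

Comparing the reported payload lengths against the intended frame size is a quick way to spot a code path that forgets to close a frame.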

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-02 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> One concern is we need to make sure the C implementation calls
> _Pickler_OpcodeBoundary often enough to keep the frames around the
> expected size. For example, batch_save_list and batch_save_dict can
> currently create a frame much larger than expected.

I don't understand how that can happen. batch_list() and batch_dict()
both call save() for each item, and save() calls
_Pickler_OpcodeBoundary() at the end. Have I missed something?

> Interestingly enough, I found pickle, with patch applied, crashes when
> handling such frames:

Interesting, I'll take a look when I have some time.

> Also, I think we should try to make pickletools.dis display the frame
> boundaries to help with debugging. This could be implemented by adding
> an option to pickletools.genops which could be helpful for testing the
> framing implementation as well.

Agreed.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-02 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

> I don't understand how that can happen. batch_list() and batch_dict()
> both call save() for each item, and save() calls
> _Pickler_OpcodeBoundary() at the end. Have I missed something?

Ah, you're right. I was thinking in terms of my fast dispatch patch in issue 
#17787. Sorry for the confusion!

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-05-01 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is an updated framing patch which supports pickletools.optimize().

--
Added file: http://bugs.python.org/file30094/framing2.patch





[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a framing patch on top of Alexandre's work.

There is one thing that framing breaks: pickletools.optimize(). I think it 
would be non-trivial to fix it. Perhaps the PREFETCH opcode is a better idea 
for this.

Alexandre, I don't understand why you removed STACK_GLOBAL. GLOBAL is a PITA 
that we should not use in protocol 4 anymore, so we need either STACK_GLOBAL or 
some kind of BINGLOBAL.

--
Added file: http://bugs.python.org/file30068/framing.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What is wrong with GLOBAL?

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> What is wrong with GLOBAL?

It uses the lame text mode that scans for newlines, and is generally
annoying to optimize. This is like C strings vs. Pascal strings.
http://www.python.org/dev/peps/pep-3154/#binary-encoding-for-all-opcodes
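
The difference is easy to see by pickling a class under two protocols. This sketch shows the behaviour of released Python versions, which postdate this discussion: protocol 3 emits GLOBAL with newline-terminated text arguments, while protocol 4 pushes two length-prefixed strings and then emits STACK_GLOBAL. The opcode_names helper is a hypothetical name.

```python
import collections
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Return the opcode names used to pickle obj with the given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# Protocol 3: module and name are newline-terminated text after GLOBAL.
print(pickle.dumps(collections.OrderedDict, protocol=3))
print(opcode_names(collections.OrderedDict, 3))

# Protocol 4: module and name are length-prefixed strings on the stack.
print(opcode_names(collections.OrderedDict, 4))
```

With length prefixes the unpickler knows how many bytes to read up front, which is the "Pascal strings" side of the analogy.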

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

With framing it isn't annoying.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Antoine, I removed STACK_GLOBAL when I found performance issues with the 
implementation. The changeset that added it had some unrelated changes that 
made it harder to debug than necessary. I am planning to re-add it once I 
have worked out the kinks.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> With framing it isn't annoying.

Slightly less, but you still have to wrap readline() calls in the
unpickler.

I have started experimenting with PREFETCH, but making the opcode
optional is a bit annoying in the C pickler, which means it's simpler to
always emit it, which means it's not very different from framing in the
end :-)

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

And here is an implementation of PREFETCH over Alexandre's work.
As you can see the code complexity compared to framing is mostly a wash, but I 
think fixing pickletools.optimize() will be easier with PREFETCH (still needs 
confirmation, of course :-)).

--
Added file: http://bugs.python.org/file30072/prefetch.patch




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I was thinking about framing before looking at your last changes to PEP 3154 
and I have two alternative proposals.

1. Pack pickled items in blocks of some predefined size (or a size specified at 
the start with the BLOCKSIZE opcode). Only some large data (long strings, large 
integers) can cross the boundary between blocks. In all other cases the block 
should be padded with the NOP opcode.

2. Similar to your proposal, but frames should be declared with a special 
PREFETCH opcode (with a 2- or 4-byte argument). Large data would be pickled 
outside frames (this prevents double copying). The opcode and size of a large 
data object can (should?) be included in the previous frame.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> 1. Pack pickled items in blocks of some predefined (or specified at the
> start with the BLOCKSIZE opcode) size. Only some large data (long
> strings, large integers) can cross the boundary between blocks. In all
> other cases the block should be padded with the NOP opcode.

Padding makes it both less efficient and more annoying to handle, IMO.
My framing proof-of-concept ends up quite simple in terms of code
complexity. For example, the C version only adds 125 lines of code in 3
additional functions.

> 2. Similar to your proposal, but frames should be declared with a
> special PREFETCH opcode (with a 2- or 4-byte argument). Large data
> pickled outside frames (this prevents double copying).

No doublecopying is necessary (not in the C version, that is). That
said, this is an interesting idea.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Padding makes it both less efficient and more annoying to handle, IMO.

Agreed. But there is another application for NOPs. The UTF-8 decoder (and some 
other decoders) works faster (up to 4x) when the input is aligned. By adding 
several NOPs before BINUNICODE so that the start of the encoded data is 4- or 
8-byte aligned relative to the start of the frame, we can significantly speed 
up unpickling of long ASCII strings. I propose adding a new NOP opcode and 
using it to align alignment-sensitive data.

> My framing proof-of-concept ends up quite simple in terms of code
> complexity. For example, the C version only adds 125 lines of code in 3
> additional functions.

I just looked at the code and saw that the unpickler already has a ready 
infrastructure for prefetching. Now your claim no longer seems so 
incredible. ;) It should work.

> No doublecopying is necessary (not in the C version, that is).

Agreed, there is no double copying (except for large bytes objects).

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Charles-François Natali

Charles-François Natali added the comment:

> I would like to see Proto4 include an option for compression
> (zlib,bz2) or somesuch and become self-decompressing upon unpickling.

I don't see what this would bring over explicit compression:
- depending on the use case, you may want to use different compression 
algorithms, e.g. for disk you may want higher compression ratio like 
bzip2/lzma, but for wire you'd prefer something fast like snappy
- supporting multiple compression algorithms and levels would complicate the API
- this would probably complicate the code, since you'd have to support optional 
compression, and have a way to indicate which format is used
- that's really mixing two entirely different concepts (serialization vs 
compression)
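
For reference, explicit compression is only a thin layer around pickle. The sketch below uses hypothetical helper names (dumps_compressed, loads_compressed) and keeps the choice of algorithm with the caller, which is the flexibility the first point above argues for.

```python
import bz2
import pickle
import zlib


def dumps_compressed(obj, compress=zlib.compress):
    """Pickle obj, then compress the bytes with the caller's algorithm."""
    return compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def loads_compressed(blob, decompress=zlib.decompress):
    """Decompress the bytes, then unpickle them."""
    return pickle.loads(decompress(blob))


payload = {"numbers": list(range(1000)), "label": "example"}

fast = dumps_compressed(payload)                          # zlib by default
small = dumps_compressed(payload, compress=bz2.compress)  # bz2 on request

print(len(pickle.dumps(payload)), len(fast), len(small))
```

The reader has to know which algorithm was used, which is exactly the "way to indicate which format is used" that a built-in option would need to standardize.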

--
nosy: +neologix




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I don't see what this would bring over explicit compression:
> - depending on the use case, you may want to use different compression
> algorithms, e.g. for disk you may want higher compression ratio like
> bzip2/lzma, but for wire you'd prefer something fast like snappy
> - supporting multiple compression algorithms and levels would complicate the
> API
> - this would probably complicate the code, since you'd have to support
> optional compression, and have a way to indicate which format is used
> - that's really mixing two entirely different concepts (serialization vs
> compression)

I agree with Charles-François.
A feature that may be actually nice to have in the pickle protocol would
be some framing, to help with streaming unpickling (right now unpickling
a stream can read almost one byte at a time, IIRC).
However, that would also make the protocol and the pickler significantly
more complex.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

A proof of concept hack to enable framing on pickle showed a massive 
performance increase on streaming unpickling (up to 5x faster with a C file 
object such as io.BytesIO, up to 150x faster with a pure Python file object 
such as _pyio.BytesIO). There is a slight slowdown on non-streaming operation, 
but that could probably be optimized.
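
The reason streaming benefits so much can be observed by counting how often the unpickler calls into the file object. This sketch is not the proof of concept itself: it compares protocol 3 with the framed protocol 4 of released Python versions, and CountingReader and count_reads are hypothetical names for an unbuffered stand-in stream.

```python
import io
import pickle


class CountingReader:
    """Minimal unbuffered file-like object that counts read calls."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = 0

    def read(self, n=-1):
        self.calls += 1
        return self._buf.read(n)

    def readline(self):
        self.calls += 1
        return self._buf.readline()


def count_reads(obj, protocol):
    """Unpickle obj from a CountingReader and return the number of calls."""
    reader = CountingReader(pickle.dumps(obj, protocol=protocol))
    result = pickle.Unpickler(reader).load()
    if result != obj:
        raise ValueError("round trip changed the object")
    return reader.calls


sample = list(range(1000))
print("protocol 3:", count_reads(sample, 3), "read calls")
print("protocol 4:", count_reads(sample, 4), "read calls")
```

Without frames every opcode and argument costs a separate call; with frames the unpickler fetches a whole frame at once and decodes it from memory.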

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

(note: I've updated PEP 3154 with framing and GLOBAL_STACK)

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> A feature that may be actually nice to have in the pickle protocol would
> be some framing, to help with streaming unpickling (right now unpickling
> a stream can read almost one byte at a time, IIRC).
> However, that would also make the protocol and the pickler significantly
> more complex.

What if we just use io.BufferedReader?

    if not isinstance(file, io.BufferedReader):
        file = io.BufferedReader(file)

(at the start of _Unpickler.__init__)

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> What if we just use io.BufferedReader?
>
>     if not isinstance(file, io.BufferedReader):
>         file = io.BufferedReader(file)
>
> (at the start of _Unpickler.__init__)

Two problems:

1. semantically, it is wrong; the BufferedReader will read bytes beyond
the pickle end, so the underlying stream will be desynchronized

2. performance-wise, it doesn't solve the issue either: read() method
calls are costly, even on an optimized C object
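
The first problem can be reproduced with a small raw stream that records how far it has been read. RawSource is a hypothetical helper written for this sketch; the point is only that the buffer pulls bytes past the end of the first pickle out of the underlying stream.

```python
import io
import pickle


class RawSource(io.RawIOBase):
    """Hypothetical unbuffered stream that tracks its read position."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self.pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._data[self.pos:self.pos + len(b)]
        b[:len(chunk)] = chunk
        self.pos += len(chunk)
        return len(chunk)


first = pickle.dumps("first")
second = pickle.dumps("second")
raw = RawSource(first + second)

buffered = io.BufferedReader(raw)
obj = pickle.load(buffered)

# The raw stream has been consumed beyond the end of the first pickle,
# so anyone still reading from `raw` directly would miss the second one.
print(obj, raw.pos, len(first))
```

The second pickle is still reachable, but only through the buffered wrapper, which the caller of the unpickler never sees if the wrapping happens internally.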

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I would like to see Proto4 include an option for compression (zlib,bz2) or 
somesuch and become self-decompressing upon unpickling.  The primary use cases 
for pickling involve writing to disk or transmitting across a wire -- both use 
cases benefit from compression (with reduced read/write times).

--
nosy: +rhettinger




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Alexandre Vassalotti

New submission from Alexandre Vassalotti:

I have restarted the work on PEP 3154. Stefan Mihaila had begun an 
implementation as part of the Google Summer of Code 2012. Unfortunately, he hit 
multiple roadblocks which prevented him from finishing his work by the end of 
the summer. He had previously shown interest in completing his implementation. 
However, he got constrained by time and never resumed his work.

So I am taking over the implementation of the PEP. I have decided to go forward 
with brand new code, using Stefan's work only as a guide. At the moment, I 
have completed about half of the PEP---missing only support for calling __new__ 
with keyword arguments and the use of the new qualified names for referring to 
objects.

Design-wise, there are still a few things that we should discuss. For example, I 
think Stefan's idea, which is not specified in the PEP, to eliminate PUT 
opcodes is interesting. His proposal was to emit an implicit PUT opcode after 
each object pickled and make the Pickler and Unpickler classes agree on the 
scheme. A drawback of this implicit scheme is that we cannot be selective about 
which objects we save in the memo during unpickling. That means, for example, we 
won't be able to make pickletools.optimize work with protocol 4 to reduce the 
memory footprint of the unpickling process. This scheme also alters the meaning 
of all previously defined opcodes because of the implicit PUTs, which is sort 
of okay because we are changing protocol. Alternatively, we could use an 
explicit scheme by defining new fat opcodes, for the built-in types we care 
about, which include memoization. This scheme would be a bit more flexible; 
however, it would also be slightly more involved implementation-wise. In any 
case, I will run benchmarks to see if either scheme is worthwhile.

--
assignee: alexandre.vassalotti
components: Library (Lib)
hgrepos: 184
messages: 187496
nosy: alexandre.vassalotti, pitrou
priority: high
severity: normal
stage: needs patch
status: open
title: Implement PEP 3154 (pickle protocol 4)
type: enhancement
versions: Python 3.4




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


--
dependencies: +Unbinding of methods




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
nosy: +asvetlov




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
keywords: +patch
Added file: http://bugs.python.org/file29966/9f1be171da08.diff




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Thank you for reviving this :)
A couple of questions:
- why ADDITEM in addition to ADDITEMS? I don't think single-element sets are an 
important use case (as opposed to, say, single-element tuples)
- what is the purpose of STACK_GLOBAL? I would say memoization of common names 
but you pass memoize=False

> For example, I think Stefan's idea, which is not specified in the
> PEP, to eliminate PUT opcodes is interesting. His proposal was to
> emit an implicit PUT opcode after each object pickled and make the
> Pickler and Unpickler classes agree on the scheme.

Are the savings worth it?
I've tried pickletools.optimize() on two objects:

- a typical data dict (http.client.responses). The pickle length decreases from 
1155 to 1063 (8% shrink); unpickling is faster by 4%.

- a Logger object (logging.getLogger("foobar")). The pickle length decreases 
from 427 to 389 (9% shrink); unpickling is faster by 2%.
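
The size half of this measurement can be repeated with a few lines. The optimize_report helper is a hypothetical name, the choice of protocol 3 is an assumption, and the exact byte counts will differ between Python versions, so no figures are claimed here.

```python
import http.client
import pickle
import pickletools


def optimize_report(obj, protocol=3):
    """Return (original_size, optimized_size) for obj's pickle."""
    original = pickle.dumps(obj, protocol=protocol)
    optimized = pickletools.optimize(original)
    # Dropping unused PUT opcodes must not change the unpickled value.
    if pickle.loads(optimized) != obj:
        raise ValueError("optimized pickle does not round-trip")
    return len(original), len(optimized)


before, after = optimize_report(http.client.responses)
print(f"{before} -> {after} bytes ({100 * (before - after) / before:.0f}% shrink)")
```

pickletools.optimize only removes PUT opcodes whose memo slot is never read back, so the saving is an upper bound on what implicit memoization could remove from the stream.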

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Link to the previous attempt: issue15642.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Memoization consumes memory during pickling. For now every memoized object 
requires memory for:

a dict entry;
an id() integer object;
a 2-element tuple;
a pickle index (an integer object).

That's about 80 bytes on a 32-bit platform (and twice that on 64-bit). For data 
which contains a lot of floats this can be cumbersome.

--




[issue17810] Implement PEP 3154 (pickle protocol 4)

2013-04-21 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Memoization consumes memory during pickling. For now every memoized
> object requires memory for:
>
> a dict entry;
> an id() integer object;
> a 2-element tuple;
> a pickle index (an integer object).
>
> That's about 80 bytes on a 32-bit platform (and twice that on 64-bit).

As far as I understand, Alexandre doesn't propose to suppress
memoization, only to make it implicit. Therefore the memory overhead
would be the same (but the pickle would have fewer opcodes).

> For data which contains a lot of floats this can be cumbersome.

Apparently, floats don't get memoized:

>>> pickletools.dis(pickle.dumps([1.0, 2.0]))
    0: \x80 PROTO      3
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: G        BINFLOAT   1.0
   15: G        BINFLOAT   2.0
   24: e        APPENDS    (MARK at 5)
   25: .    STOP
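
The same check can be automated by counting the opcodes that store into the memo. The memo_store_count helper and the MEMO_STORES set are names invented for this sketch; protocol 3 is pinned to match the disassembly above.

```python
import pickle
import pickletools

# Opcodes that write an entry into the unpickler's memo.
MEMO_STORES = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}


def memo_store_count(obj, protocol=3):
    """Count how many memo entries the pickle of obj creates."""
    data = pickle.dumps(obj, protocol=protocol)
    return sum(
        1
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in MEMO_STORES
    )


print(memo_store_count([1.0, 2.0]))   # only the list itself is memoized
print(memo_store_count(["a", "b"]))   # strings are memoized as well
```

A list of floats therefore costs a single memo entry regardless of its length, whereas containers of strings or nested objects pay one entry per element.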

--
