[Python-Dev] Inconsistent nesting of scopes in exec(..., locals())

2010-04-23 Thread Joachim B Haga
There seems to be an inconsistency in the handling of local scopes in
exec. Consider the following code, which raises NameError if the '#' is
removed from the second-to-last line.


block = """
b = 'ok'
def f():
print(b)# raises NameError here
f()
"""
scope = locals()#.copy()
exec(block, globals(), scope)


The intermediate scope is searched for the variable name if the third
argument to exec() is locals(), but not if it is locals().copy().
Testing further, it looks like NameError is raised for any dict that
is not the same object as either globals() or locals().

This behaviour is quite unexpected, and I believe it qualifies as a
bug. Tested with Python 2.6.5 and 3.1.2.
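
For instance, passing any fresh dict as the third argument fails the same
way (a sketch of the failing variant, assuming b is not already defined at
module level):

exec(block, globals(), {})  # NameError: f() cannot see b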

-- 
Joachim B Haga



[Python-Dev] autoconf update to 2.65 and cleanup

2010-04-23 Thread Matthias Klose
configure is still generated by 2.61; would it be possible to update to 2.65? 
The cr_lf issue mentioned in [1] seems to be resolved; ac_cr is now defined as


  ac_cr=`echo X | tr X '\015'`

Proposing to

 - fix some quoting in help strings and code snippets (#8509)
 - update to autoconf 2.65 (#8510)
 - convert obsolete macros (AC_HELP_STRING, AC_TRY_*, AC_AIX, ...)
   one by one (tracking these in separate reports).

Could this be done for both trunk and the py3k branch, even though 2.7 is
already in beta?


  Matthias

[1] http://mail.python.org/pipermail/python-dev/2008-November/083781.html


Re: [Python-Dev] autoconf update to 2.65 and cleanup

2010-04-23 Thread Victor Stinner
On Friday, April 23, 2010 at 17:26:59, Matthias Klose wrote:
> configure is still generated by 2.61; would it be possible to update to
>  2.65?

Yes, everything is possible. Open a new issue and write a patch ;-)

> even if 2.7 already is in beta?

I'm not sure that it's a good idea to change the build process after the first 
beta. It would depend on the issue comments ;-)

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] code.python.org - random 403 errors

2010-04-23 Thread Victor Stinner
On Thursday, April 22, 2010 at 22:14:48, Sridhar Ratnakumar wrote:
> I am seeing random 403 errors when cloning the mercurial repositories of
>  Python.

I don't know if it is related, but I get errors from the bbreport tool:
--
$ python2.6 bbreport.py 3.x
Selected builders: 20 / 80 (branch: 3.x)

Traceback (most recent call last):
  File "bbreport.py", line 903, in <module>
    builders = main()
  File "bbreport.py", line 853, in main
    for xrb in proxy.getLastBuildsAllBuilders(limit):
  File "/usr/lib/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: 
--

Did the buildbot configuration/version change recently?

-- 
Victor Stinner
http://www.haypocalc.com/


[Python-Dev] Summary of Python tracker Issues

2010-04-23 Thread Python tracker

ACTIVITY SUMMARY (2010-04-16 - 2010-04-23)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2665 open (+58) / 17664 closed (+31) / 20329 total (+89)

Open issues with patches:  1084

Average duration of open issues: 725 days.
Median duration of open issues: 494 days.

Open Issues Breakdown
   open         2621 (+58)
   languishing     9 ( +0)
   pending        34 ( +0)

Issues Created Or Reopened (94)
___

Obsolete RFCs should be removed from doc of urllib.urlparse    2010-04-19
CLOSED http://bugs.python.org/issue5650    reopened  ezio.melotti
       patch, easy

[PATCH] Drop "Computer" from "Apple Computer" in plistlib      2010-04-20
       http://bugs.python.org/issue7852    reopened  haypo
       patch

configure: ignore AC_PROG_CC hardcoded CFLAGS                  2010-04-23
       http://bugs.python.org/issue8211    reopened  lemburg
       patch

subprocess: support undecodable current working directory on P 2010-04-19
CLOSED http://bugs.python.org/issue8393    reopened  amaury.forgeotdarc
       patch

ctypes.dlopen() doesn't support surrogates                     2010-04-19
CLOSED http://bugs.python.org/issue8394    reopened  amaury.forgeotdarc
       patch

tiger buildbot: unable to resolv hostname address              2010-04-16
CLOSED http://bugs.python.org/issue8421    created   haypo

tiger buildbot: test_abspath_issue3426 failure (test_genericpa 2010-04-16
CLOSED http://bugs.python.org/issue8422    created   haypo
       buildbot

tiger buildbot: test_pep277 failures                           2010-04-16
       http://bugs.python.org/issue8423    created   haypo
       buildbot

Test assumptions for test_itimer_virtual and test_itimer_prof  2010-04-16
       http://bugs.python.org/issue8424    created   haypo
       patch, buildbot

a -= b should be fast if a is a small set and b is a large set 2010-04-16
       http://bugs.python.org/issue8425    created   abacabadabacaba
       easy

multiprocessing.Queue fails to get() very large objects        2010-04-16
       http://bugs.python.org/issue8426    created   Ian.Davis

toplevel jumps to another location on the screen               2010-04-16
       http://bugs.python.org/issue8427    created   aparasch

buildbot: test_multiprocessing timeout (test_notify_all? test_ 2010-04-16
       http://bugs.python.org/issue8428    created   haypo
       buildbot

buildbot: test_subprocess timeout                              2010-04-16
       http://bugs.python.org/issue8429    created   haypo
       buildbot

test_site failure with non-ASCII directory                     2010-04-17
CLOSED http://bugs.python.org/issue8430    created   haypo
       patch, buildbot

buildbot: hung on ARM Debian                                   2010-04-17
       http://bugs.python.org/issue8431    created   haypo
       buildbot

build: test_send_signal of test_subprocess failure             2010-04-17
       http://bugs.python.org/issue8432    created   haypo
       buildbot

buildbot: test_curses failure, getmouse() returned ERR         2010-04-17
       http://bugs.python.org/issue8433    created   haypo
       buildbot

buildbot: test_gdb failure on sparc Ubuntu trunk               2010-04-17
CLOSED ht

Re: [Python-Dev] autoconf update to 2.65 and cleanup

2010-04-23 Thread Barry Warsaw
On Apr 23, 2010, at 05:44 PM, Victor Stinner wrote:

>I'm not sure that it's a good idea to change the build process after the
>first beta. It would depend on the issue comments ;-)

OTOH, this doesn't seem like a new feature, so I think it should be okay.
Doubly so if it fixes a bug. 

-Barry




Re: [Python-Dev] Inconsistent nesting of scopes in exec(..., locals())

2010-04-23 Thread Nick Coghlan
Joachim B Haga wrote:
> There seem to be an inconsistency in the handling of local scopes in
> exec. Consider the following code, which raises NameError if the '#' is
> removed from the second last line.
> 
> 
> block = """
> b = 'ok'
> def f():
>     print(b)  # raises NameError here
> f()
> """
> scope = locals()#.copy()
> exec(block, globals(), scope)
> 
> 
> The intermediate scope is searched for the variable name if the third
> argument to exec() is locals(), but not if it is locals().copy().
> Testing further, it looks like NameError is raised for any dict which 
> is not identically equal to either globals() or locals().

What actually matters is whether the first and second scopes are the
same dictionary.

If they're different, then the supplied local scope is treated as
equivalent to a class definition scope, and hence won't participate in
lexical scoping. If they're the same (which happens implicitly if the
second one is omitted) then they're treated as a module scope (and hence
written values are visible as globals inside any defined functions).

(Using 2.x syntax)

>>> outer_scope = dict()
>>> inner_scope = dict()
>>> block = """
... b = 'ok'
... def f():
...   print (b)
... f()
... """
>>> exec block in outer_scope
ok
>>> outer_scope.clear()
>>> exec block in outer_scope, outer_scope
ok
>>> outer_scope.clear()
>>> exec block in outer_scope, inner_scope
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 5, in <module>
  File "<string>", line 4, in f
NameError: global name 'b' is not defined
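
The same experiment in 3.x syntax behaves identically (a sketch; the error
message shown is the 3.1-era one):

>>> block = "b = 'ok'\ndef f():\n    print(b)\nf()\n"
>>> ns = {}
>>> exec(block, ns)        # one namespace: module-like scope
ok
>>> exec(block, {}, {})    # two distinct namespaces: class-like scope
Traceback (most recent call last):
  ...
NameError: global name 'b' is not defined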

Since changing this would break class definitions, that ain't going to
happen. Suggestions for how to explain the behaviour more clearly in the
exec() documentation probably wouldn't hurt though.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] code.python.org - random 403 errors

2010-04-23 Thread Sridhar Ratnakumar

On 2010-04-22, at 10:55 PM, Jeroen Ruigrok van der Werven wrote:

> -On [20100423 02:48], Sridhar Ratnakumar ([email protected]) wrote:
>>> Ok, I setup a cron job to maintain an internal mirror of the above 
>>> mentioned repositories in code.python.org. We'll do a "hg pull -u" 
>>> (equivalent to "svn up") every hour; no clones. Hopefully, that should 
>>> reduce the amount of requests from our side. Let me know if in future this 
>>> issue repeats.
> 
> Dirk Jan can probably correct me (or some other heavy Hg user) but for all I
> know you should indeed simply clone once and subsequently hg pull and from
> your local copy clone as you like. (At least that's also how
> http://wiki.services.openoffice.org/wiki/Mercurial/Getting_Started seems to
> aim at explaining.)

Since the "download Python source code" step is just part of the ActivePython
build script, and the Hudson build script deletes the "build/" directory of
the previous build, a clone was necessary. To fix this I ended up creating a
local "mirror", maintained by an hourly 'hg pull -u'. The Hudson build script
still does a clone, although from this local mirror URL.

(Incidentally, cloning from this mirror via the Apache index listing URL 
doesn't seem to work; gotta investigate why...)

-srid



[Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
We were having performance problems unpickling a large pickle file: we were
getting a 170 s running time (which was fine) but 1100 MB memory usage. Memory
usage ought to have been about 300 MB; the excess came from memory
fragmentation, due to many unnecessary "puts" in the pickle stream.

We made a pickletools.optimize-inspired tool that could run directly on a
pickle file, using pickletools.genops. This solved the unpickling problem
(84 s, 382 MB).

However, the tool itself was using too much memory and time (1100 s, 470 MB),
so I recoded it to scan through the pickle stream directly, without going
through pickletools.genops, giving (240 s, 130 MB).

Other people who deal with large pickle files probably have similar
problems, and since this comes up when dealing with large data, it is
precisely in this situation that you probably can't use pickletools.optimize
or pickletools.genops. It feels like functionality that ought to be added to
pickletools; is there some way I can contribute this?
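
For concreteness, a minimal sketch of the genops-based pass (the function
name and structure are illustrative, not the actual tool; like
pickletools.optimize() it still holds the whole stream in memory, which is
exactly the limitation described above; written against the 2.x str API):

import pickletools

def strip_unused_puts(data):
    # Drop PUT opcodes whose memo slot is never fetched by a GET.
    ops = list(pickletools.genops(data))
    used = set()
    for opcode, arg, pos in ops:
        if 'GET' in opcode.name:          # GET, BINGET, LONG_BINGET
            used.add(arg)
    out = []
    for i, (opcode, arg, pos) in enumerate(ops):
        end = ops[i + 1][2] if i + 1 < len(ops) else len(data)
        if 'PUT' in opcode.name and arg not in used:
            continue                      # skip the unused PUT
        out.append(data[pos:end])
    return ''.join(out)                   # b''.join for 3.x bytes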



Re: [Python-Dev] autoconf update to 2.65 and cleanup

2010-04-23 Thread Brett Cannon
On Fri, Apr 23, 2010 at 09:21, Barry Warsaw wrote:

> On Apr 23, 2010, at 05:44 PM, Victor Stinner wrote:
>
> >I'm not sure that it's a good idea to change the build process after the
> >first beta. It would depend on the issue comments ;-)
>
> OTOH, this doesn't seem like a new feature, so I think it should be okay.
> Doubly so if it fixes a bug. 
>
>
I'm with Barry; if this fixes something then it's worth a go.

-Brett




>  -Barry
>


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Brett Cannon
On Fri, Apr 23, 2010 at 11:11, Dan Gindikin wrote:

> We were having performance problems unpickling a large pickle file, we were
> getting 170s running time (which was fine), but 1100mb memory usage. Memory
> usage ought to have been about 300mb, this was happening because of memory
> fragmentation, due to many unnecessary "puts" in the pickle stream.
>
> We made a pickletools.optimize inspired tool that could run directly on a
> pickle file and used pickletools.genops. This solved the unpickling problem
> (84s, 382mb).
>
> However the tool itself was using too much memory and time (1100s, 470mb),
> so
> I recoded it to scan through the pickle stream directly, without going
> through
> pickletools.genops, giving (240s, 130mb).
>
> Other people that deal with large pickle files are probably having similar
> problems, and since this comes up when dealing with large data it is
> precisely
> in this situation that you probably can't use pickletools.optimize or
> pickletools.genops. It feels like functionality that ought to be added to
> pickletools, is there some way I can contribute this?
>

The best next step is to open an issue at bugs.python.org and upload the
patch. I can't make any guarantees on when someone will look at it or if it
will get accepted, but putting the code there is your best bet for
acceptance.

-Brett


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Alexandre Vassalotti
On Fri, Apr 23, 2010 at 2:11 PM, Dan Gindikin wrote:
> We were having performance problems unpickling a large pickle file, we were
> getting 170s running time (which was fine), but 1100mb memory usage. Memory
> usage ought to have been about 300mb, this was happening because of memory
> fragmentation, due to many unnecessary "puts" in the pickle stream.
>
> We made a pickletools.optimize inspired tool that could run directly on a
> pickle file and used pickletools.genops. This solved the unpickling problem
> (84s, 382mb).
>
> However the tool itself was using too much memory and time (1100s, 470mb), so
> I recoded it to scan through the pickle stream directly, without going through
> pickletools.genops, giving (240s, 130mb).
>

Collin Winter wrote a simple optimization pass for cPickle in Unladen
Swallow [1]. The code reads through the stream and removes all the
unnecessary PUTs in place.

[1]: 
http://code.google.com/p/unladen-swallow/source/browse/trunk/Modules/cPickle.c#735

> Other people that deal with large pickle files are probably having similar
> problems, and since this comes up when dealing with large data it is precisely
> in this situation that you probably can't use pickletools.optimize or
> pickletools.genops. It feels like functionality that ought to be added to
> pickletools, is there some way I can contribute this?
>

Just put your code on bugs.python.org and I will take a look.

-- Alexandre


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Alexandre Vassalotti
On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti wrote:
> Collin Winter wrote a simple optimization pass for cPickle in Unladen
> Swallow [1]. The code reads through the stream and remove all the
> unnecessary PUTs in-place.
>

I just noticed the code removes *all* PUT opcodes, regardless of whether
they are needed. So, this code can only be used if there's no GET in
the stream (which is unlikely for a large stream). I believe Collin
made this trade-off for performance reasons. However, it wouldn't be
hard to make the current code work like pickletools.optimize().

-- Alexandre


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
Alexandre Vassalotti writes:

> Just put your code on bugs.python.org and I will take a look.
> 

Thanks, I'll put it in there.






Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Collin Winter
On Fri, Apr 23, 2010 at 11:49 AM, Alexandre Vassalotti wrote:
> On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti wrote:
>> Collin Winter wrote a simple optimization pass for cPickle in Unladen
>> Swallow [1]. The code reads through the stream and remove all the
>> unnecessary PUTs in-place.
>>
>
> I just noticed the code removes *all* PUT opcodes, regardless if they
> are needed or not. So, this code can only be used if there's no GET in
> the stream (which is unlikely for a large stream). I believe Collin
> made this trade-off for performance reasons. However, it wouldn't be
> hard to make the current code to work like pickletools.optimize().

The optimization pass is only run if you don't use any GETs. The
optimization is also disabled if you're writing to a file-like object.
These tradeoffs were appropriate for the workload I was optimizing
against.

Collin Winter


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Collin Winter
On Fri, Apr 23, 2010 at 11:53 AM, Collin Winter wrote:
> On Fri, Apr 23, 2010 at 11:49 AM, Alexandre Vassalotti wrote:
>> On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti wrote:
>>> Collin Winter wrote a simple optimization pass for cPickle in Unladen
>>> Swallow [1]. The code reads through the stream and remove all the
>>> unnecessary PUTs in-place.
>>>
>>
>> I just noticed the code removes *all* PUT opcodes, regardless if they
>> are needed or not. So, this code can only be used if there's no GET in
>> the stream (which is unlikely for a large stream). I believe Collin
>> made this trade-off for performance reasons. However, it wouldn't be
>> hard to make the current code to work like pickletools.optimize().
>
> The optimization pass is only run if you don't use any GETs. The
> optimization is also disabled if you're writing to a file-like object.
> These tradeoffs were appropriate for the workload I was optimizing
> against.

I should add that adding the necessary bookkeeping to remove only
unused PUTs (instead of the current all-or-nothing scheme) should not
be hard. I'd watch out for a further performance/memory hit; the
pickling benchmarks in the benchmark suite should help assess this.
The current optimization penalizes pickling to speed up unpickling,
which made sense when optimizing pickles that would go into memcache
and be read out 13-15x more often than they were written.

Collin Winter


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Alexandre Vassalotti
On Fri, Apr 23, 2010 at 3:07 PM, Collin Winter wrote:
> I should add that, adding the necessary bookkeeping to remove only
> unused PUTs (instead of the current all-or-nothing scheme) should not
> be hard. I'd watch out for a further performance/memory hit; the
> pickling benchmarks in the benchmark suite should help assess this.

I was thinking about this too. A simple boolean table could be fast,
while keeping the space requirement down. This scheme would also play
nicely with caches.
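
Sketched in Python terms (illustrative only; the real bookkeeping would
live in cPickle's C code, and the number of memo slots is assumed to be
known, e.g. counted in a first pass):

import pickletools

def used_put_table(data, n_memo_slots):
    # A flat byte table instead of a set: compact and cache-friendly,
    # since memo keys are small sequential ints.
    used = bytearray(n_memo_slots)
    for opcode, arg, pos in pickletools.genops(data):
        if 'GET' in opcode.name:    # GET, BINGET, LONG_BINGET
            used[arg] = 1
    return used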

> The current optimization penalizes pickling to speed up unpickling,
> which made sense when optimizing pickles that would go into memcache
> and be read out 13-15x more often than they were written.

This is my current impression of how pickle is most often used. Are
you aware of a use case of pickle where you do more writes than reads?
I can't think of any.

-- Alexandre


Re: [Python-Dev] Inconsistent nesting of scopes in exec(..., locals())

2010-04-23 Thread Joachim B Haga
Nick Coghlan writes:

> Joachim B Haga wrote:
>> There seem to be an inconsistency in the handling of local scopes in
>> exec. [...]
>> 
>> The intermediate scope is searched for the variable name if the third
>> argument to exec() is locals(), but not if it is locals().copy().
>
> What actually matters is whether or not the first and second scope are
> the same dictionary or not.
>
> If they're different, then the supplied local scope is treated as
> equivalent to a class definition scope, and hence won't participate in
> lexical scoping. If they're the same (which happens implicitly if the
> second one is omitted) then they're treated as a module scope (and hence
> written values are visible as globals inside any defined functions).

Ok, thank you for the explanation.

> Since changing this would break class definitions, that ain't going to
> happen. Suggestions for how to explain the behaviour more clearly in the
> exec() documentation probably wouldn't hurt though.

I don't quite see how exec() affects the class definition syntax.
Anyhow, I definitely agree that this should be documented. I 
suggest the following (condensed from your explanation):

-If provided, /locals/ can be any mapping object.
+If provided, /locals/ can be any mapping object. It is treated as
+equivalent to a class definition scope, and hence does not participate
+in lexical scoping.

-- 
Joachim B Haga



Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
Collin Winter writes:
> I should add that, adding the necessary bookkeeping to remove only
> unused PUTs (instead of the current all-or-nothing scheme) should not
> be hard. I'd watch out for a further performance/memory hit; the
> pickling benchmarks in the benchmark suite should help assess this.
> The current optimization penalizes pickling to speed up unpickling,
> which made sense when optimizing pickles that would go into memcache
> and be read out 13-15x more often than they were written.

This wouldn't help our use case: your code needs the entire pickle
stream to be in memory, which in our case would be about 475 MB, on
top of the 300 MB+ data structures that generated the pickle stream.





Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Alexandre Vassalotti
On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote:
> This wouldn't help our use case, your code needs the entire pickle
> stream to be in memory, which in our case would be about 475mb, this
> is on top of the 300mb+ data structures that generated the pickle
> stream.
>

In that case, the best we could do is a two-pass algorithm to remove
the unused PUTs. That won't be efficient, but it will satisfy the
memory constraint. Another solution is to not generate the PUTs at all,
by setting the 'fast' attribute on Pickler. But that won't work if you
have a recursive structure, or have code that requires the identity of
objects to be preserved.

>>> import io, pickle, pickletools
>>> x=[1,2]
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, protocol=-1)
>>> p.dump([x,x])
>>> pickletools.dis(f.getvalue())
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: ]        EMPTY_LIST
    7: q        BINPUT     1
    9: (        MARK
   10: K            BININT1    1
   12: K            BININT1    2
   14: e            APPENDS    (MARK at 9)
   15: h        BINGET     1
   17: e        APPENDS    (MARK at 5)
   18: .    STOP
highest protocol among opcodes = 2
>>> [id(x) for x in pickle.loads(f.getvalue())]
[20966504, 20966504]

Now with the 'fast' mode enabled:

>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, protocol=-1)
>>> p.fast = True
>>> p.dump([x,x])
>>> pickletools.dis(f.getvalue())
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: (    MARK
    4: ]        EMPTY_LIST
    5: (        MARK
    6: K            BININT1    1
    8: K            BININT1    2
   10: e            APPENDS    (MARK at 5)
   11: ]        EMPTY_LIST
   12: (        MARK
   13: K            BININT1    1
   15: K            BININT1    2
   17: e            APPENDS    (MARK at 12)
   18: e        APPENDS    (MARK at 3)
   19: .    STOP
highest protocol among opcodes = 2
>>> [id(x) for x in pickle.loads(f.getvalue())]
[20966504, 21917992]

As you can observe, the pickle stream generated with the fast mode
might actually be bigger.

By the way, it is weird that the total memory usage of the data
structure is smaller than the size of its respective pickle stream.
What pickle protocol are you using?

-- Alexandre


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
Alexandre Vassalotti writes:

> 
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote:
> > This wouldn't help our use case, your code needs the entire pickle
> > stream to be in memory, which in our case would be about 475mb, this
> > is on top of the 300mb+ data structures that generated the pickle
> > stream.
> >
> 
> In that case, the best we could do is a two-pass algorithm to remove
> the unused PUTs. That won't be efficient, but it will satisfy the
> memory constraint.

That is what I'm doing for us right now.

> Another solution is to not generate the PUTs at all
> by setting the 'fast' attribute on Pickler. But that won't work if you
> have a recursive structure, or have code that requires that the
> identity of objects to be preserved.

We definitely have some cross links amongst the objects, so we need PUTs.

> By the way, it is weird that the total memory usage of the data
> structure is smaller than the size of its respective pickle stream.
> What pickle protocol are you using?

It's the highest protocol, but we have a bunch of extension types that
get expanded into Python tuples for pickling.






Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Collin Winter
On Fri, Apr 23, 2010 at 1:53 PM, Alexandre Vassalotti wrote:
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote:
>> This wouldn't help our use case, your code needs the entire pickle
>> stream to be in memory, which in our case would be about 475mb, this
>> is on top of the 300mb+ data structures that generated the pickle
>> stream.
>>
>
> In that case, the best we could do is a two-pass algorithm to remove
> the unused PUTs. That won't be efficient, but it will satisfy the
> memory constraint. Another solution is to not generate the PUTs at all
> by setting the 'fast' attribute on Pickler. But that won't work if you
> have a recursive structure, or have code that requires that the
> identity of objects to be preserved.

I don't think it's possible in general to remove any PUTs if the
pickle is being written to a file-like object. It is possible to reuse
a single Pickler to pickle multiple objects: this causes the Pickler's
memo dict to be shared between the objects being pickled. If you
pickle foo, bar, and baz, foo may not have any GETs, but bar and baz
may have GETs that reference data added to the memo by foo's PUT
operations. Because you can't know what will be written to the
file-like object later, you can't remove any of the PUT instructions
in this scenario.

This kind of thing is done in real-world code like cvs2svn (which I
broke when I was optimizing cPickle; don't break cvs2svn, it's not fun
to fix :). I added some basic tests for this support in cPython's
Lib/test/pickletester.py.
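
A minimal illustration of that sharing (a sketch; protocol and values are
arbitrary):

import io, pickle

f = io.BytesIO()
p = pickle.Pickler(f, protocol=2)
x = ['shared']
p.dump(x)        # PUTs x into the pickler's shared memo
p.dump([x, x])   # emits GETs that rely on the PUT from the first dump

Stripping the first dump's seemingly unused PUT would break the GETs in the
second.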

There might be room for app-specific optimizations that do this, but
I'm not sure it would work for a general-usage cPickle that needs to
stay compatible with the current system.

Collin Winter


Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Antoine Pitrou
Collin Winter writes:
> 
> I don't think it's possible in general to remove any PUTs if the
> pickle is being written to a file-like object.

Does cPickle bytecode have some kind of NOP instruction?
You could keep track of which PUTs weren't necessary and zero them out at the
end. It would be much cheaper than writing a whole other "optimized" stream.

(of course it only works on a seekable stream :-))

Regards

Antoine.




Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
Collin Winter writes:
> I don't think it's possible in general to remove any PUTs if the
> pickle is being written to a file-like object. It is possible to reuse
> a single Pickler to pickle multiple objects: this causes the Pickler's
> memo dict to be shared between the objects being pickled. If you
> pickle foo, bar, and baz, foo may not have any GETs, but bar and baz
> may have GETs that reference data added to the memo by foo's PUT
> operations. Because you can't know what will be written to the
> file-like object later, you can't remove any of the PUT instructions
> in this scenario.

Hmm, that is a good point. A possible solution would be for the
two-pass optimizer to scan through the entire file, going right
through '.' (STOP) opcodes. That would deal with the case you are
describing, but not if the user "maliciously" wrote some other
stuff into the file in between pickle dumps, all the while reusing
the same pickler.

I think a better solution would be to make sure that the '.' is
the last thing in the file and die otherwise. This would at least
ensure correctness and the detection of cases that this tool could
not handle.

> don't break cvs2svn, it's not fun
> to fix :). I added some basic tests for this support in cPython's
> Lib/test/pickletester.py.

Thanks for the warning :)

> There might be room for app-specific optimizations that do this, but
> I'm not sure it would work for a general-usage cPickle that needs to
> stay compatible with the current system.

That may well be true. Still, when trying to deal with large data
you really need something like this. Our situation was made worse because
we had extension types. As they were allocated, they got interspersed
with temporaries generated by the spurious PUTs, and that is what
really fragmented the memory. However, it's probably not a stretch to
assume that if you are dealing with large stuff through Python you are
going to have extension types in the mix.






Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Dan Gindikin
Antoine Pitrou writes:
> Does cPickle bytecode have some kind of NOP instruction?
> You could keep track of which PUTs weren't necessary and zero them out at the
> end. It would be much cheaper than writing a whole other "optimized" stream.

For a large file, I'm not sure it is much faster to edit it in place
than to rewrite it; also, since it's a large file, you are probably
going to have it compressed, in which case you are out of luck anyway.

> (of course it only works on a seekable stream :-))

Indeed... since I was dealing with zipped files, I had to pass in
a "seek0" func, like so:

   import os  # the lambda ignores its argument and re-opens the pipe
   file_in_seek0_func = lambda x: os.popen('zcat ' + file_in)





Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution

2010-04-23 Thread Antoine Pitrou
Dan Gindikin writes:
> 
> Antoine Pitrou writes:
> > Does cPickle bytecode have some kind of NOP instruction?
> > You could keep track of which PUTs weren't necessary and zero them out
> > at the end. It would be much cheaper than writing a whole other
> > "optimized" stream.
> 
> For a large file, I'm not sure it is much faster to edit it in place
> than to rewrite it, and also since it's a large file, you are going
> to probably have it compressed, in which case you are out of luck anyway.

It depends on whether you really care about disk usage. Disk space is
often cheap (much more so than RAM).
Also, I'm quite sure overwriting a couple of blocks in a file is cheaper than
rewriting it entirely. Of course, if you must overwrite every other block, it
might not be true anymore :-)

Regards

Antoine.

