[Python-Dev] Inconsistent nesting of scopes in exec(..., locals())
There seems to be an inconsistency in the handling of local scopes in
exec. Consider the following code, which raises NameError if the '#' is
removed from the second-to-last line.

block = """
b = 'ok'
def f():
    print(b)  # raises NameError here
f()
"""
scope = locals()  #.copy()
exec(block, globals(), scope)

The intermediate scope is searched for the variable name if the third
argument to exec() is locals(), but not if it is locals().copy().
Testing further, it looks like NameError is raised for any dict which is
not the very same object as either globals() or locals(). This behaviour
is quite unexpected, and I believe it qualifies as a bug.

Tested with Python 2.6.5 and 3.1.2.

-- 
Joachim B Haga
[Python-Dev] autoconf update to 2.65 and cleanup
configure is still generated by 2.61; would it be possible to update to
2.65? The cr_lf issue mentioned in [1] seems to be resolved; ac_cr is
now defined as

    ac_cr=`echo X | tr X '\015'`

Proposing to:

- fix some quoting in help strings and code snippets (#8509)
- update to autoconf 2.65 (#8510)
- convert obsolete macros (AC_HELP_STRING, AC_TRY_*, AC_AIX, ...) one by
  one (tracking these in separate reports).

Could this be done for both the trunk and py3k branches, even though 2.7
is already in beta?

  Matthias

[1] http://mail.python.org/pipermail/python-dev/2008-November/083781.html
Re: [Python-Dev] autoconf update to 2.65 and cleanup
On Friday 23 April 2010 17:26:59, Matthias Klose wrote:
> configure is still generated by 2.61; would it be possible to update to
> 2.65?

Yes, everything is possible. Open a new issue and write a patch ;-)

> even though 2.7 is already in beta?

I'm not sure that it's a good idea to change the build process after the
first beta. It would depend on the issue comments ;-)

-- 
Victor Stinner
http://www.haypocalc.com/
Re: [Python-Dev] code.python.org - random 403 errors
On Thursday 22 April 2010 22:14:48, Sridhar Ratnakumar wrote:
> I am seeing random 403 errors when cloning the mercurial repositories of
> Python.

I don't know if it is related, but I get errors from the bbreport tool:
--
$ python2.6 bbreport.py 3.x
Selected builders: 20 / 80 (branch: 3.x)
Traceback (most recent call last):
  File "bbreport.py", line 903, in <module>
    builders = main()
  File "bbreport.py", line 853, in main
    for xrb in proxy.getLastBuildsAllBuilders(limit):
  File "/usr/lib/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault:
--

Did the buildbot configuration/version change recently?

-- 
Victor Stinner
http://www.haypocalc.com/
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2010-04-16 - 2010-04-23)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.

2665 open (+58) / 17664 closed (+31) / 20329 total (+89)

Open issues with patches: 1084

Average duration of open issues: 725 days.
Median duration of open issues: 494 days.

Open Issues Breakdown
  open         2621 (+58)
  languishing     9 ( +0)
  pending        34 ( +0)

Issues Created Or Reopened (94)
_______________________________

Obsolete RFCs should be removed from doc of urllib.urlparse    2010-04-19
  CLOSED  http://bugs.python.org/issue5650   reopened  ezio.melotti  [patch, easy]

[PATCH] Drop "Computer" from "Apple Computer" in plistlib      2010-04-20
          http://bugs.python.org/issue7852   reopened  haypo  [patch]

configure: ignore AC_PROG_CC hardcoded CFLAGS                  2010-04-23
          http://bugs.python.org/issue8211   reopened  lemburg  [patch]

subprocess: support undecodable current working directory on P 2010-04-19
  CLOSED  http://bugs.python.org/issue8393   reopened  amaury.forgeotdarc  [patch]

ctypes.dlopen() doesn't support surrogates                     2010-04-19
  CLOSED  http://bugs.python.org/issue8394   reopened  amaury.forgeotdarc  [patch]

tiger buildbot: unable to resolv hostname address              2010-04-16
  CLOSED  http://bugs.python.org/issue8421   created   haypo

tiger buildbot: test_abspath_issue3426 failure (test_genericpa 2010-04-16
  CLOSED  http://bugs.python.org/issue8422   created   haypo  [buildbot]

tiger buildbot: test_pep277 failures                           2010-04-16
          http://bugs.python.org/issue8423   created   haypo  [buildbot]

Test assumptions for test_itimer_virtual and test_itimer_prof  2010-04-16
          http://bugs.python.org/issue8424   created   haypo  [patch, buildbot]

a -= b should be fast if a is a small set and b is a large set 2010-04-16
          http://bugs.python.org/issue8425   created   abacabadabacaba  [easy]

multiprocessing.Queue fails to get() very large objects        2010-04-16
          http://bugs.python.org/issue8426   created   Ian.Davis

toplevel jumps to another location on the screen               2010-04-16
          http://bugs.python.org/issue8427   created   aparasch

buildbot: test_multiprocessing timeout (test_notify_all? test_ 2010-04-16
          http://bugs.python.org/issue8428   created   haypo  [buildbot]

buildbot: test_subprocess timeout                              2010-04-16
          http://bugs.python.org/issue8429   created   haypo  [buildbot]

test_site failure with non-ASCII directory                     2010-04-17
  CLOSED  http://bugs.python.org/issue8430   created   haypo  [patch, buildbot]

buildbot: hung on ARM Debian                                   2010-04-17
          http://bugs.python.org/issue8431   created   haypo  [buildbot]

build: test_send_signal of test_subprocess failure             2010-04-17
          http://bugs.python.org/issue8432   created   haypo  [buildbot]

buildbot: test_curses failure, getmouse() returned ERR         2010-04-17
          http://bugs.python.org/issue8433   created   haypo  [buildbot]

buildbot: test_gdb failure on sparc Ubuntu trunk               2010-04-17
  CLOSED  ht
Re: [Python-Dev] autoconf update to 2.65 and cleanup
On Apr 23, 2010, at 05:44 PM, Victor Stinner wrote:
> I'm not sure that it's a good idea to change the build process after the
> first beta. It would depend on the issue comments ;-)

OTOH, this doesn't seem like a new feature, so I think it should be
okay. Doubly so if it fixes a bug.

-Barry
Re: [Python-Dev] Inconsistent nesting of scopes in exec(..., locals())
Joachim B Haga wrote:
> There seems to be an inconsistency in the handling of local scopes in
> exec. [...]
>
> The intermediate scope is searched for the variable name if the third
> argument to exec() is locals(), but not if it is locals().copy().
> Testing further, it looks like NameError is raised for any dict which is
> not the very same object as either globals() or locals().

What actually matters is whether the first and second scopes are the
same dictionary.

If they're different, then the supplied local scope is treated as
equivalent to a class definition scope, and hence won't participate in
lexical scoping. If they're the same (which happens implicitly if the
second one is omitted) then they're treated as a module scope (and hence
written values are visible as globals inside any defined functions).

(Using 2.x syntax)

>>> outer_scope = dict()
>>> inner_scope = dict()
>>> block = """
... b = 'ok'
... def f():
...     print (b)
... f()
... """
>>> exec block in outer_scope
ok
>>> outer_scope.clear()
>>> exec block in outer_scope, outer_scope
ok
>>> outer_scope.clear()
>>> exec block in outer_scope, inner_scope
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 5, in <module>
  File "<string>", line 4, in f
NameError: global name 'b' is not defined

Since changing this would break class definitions, that ain't going to
happen. Suggestions for how to explain the behaviour more clearly in the
exec() documentation probably wouldn't hurt though.

Cheers,
Nick.

-- 
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] code.python.org - random 403 errors
On 2010-04-22, at 10:55 PM, Jeroen Ruigrok van der Werven wrote:
> -On [20100423 02:48], Sridhar Ratnakumar ([email protected]) wrote:
>>> Ok, I setup a cron job to maintain an internal mirror of the above
>>> mentioned repositories in code.python.org. We'll do a "hg pull -u"
>>> (equivalent to "svn up") every hour; no clones. Hopefully, that should
>>> reduce the amount of requests from our side. Let me know if in future
>>> this issue repeats.
>
> Dirk Jan can probably correct me (or some other heavy Hg user) but for all
> I know you should indeed simply clone once and subsequently hg pull, and
> from your local copy clone as you like. (At least that's also how
> http://wiki.services.openoffice.org/wiki/Mercurial/Getting_Started seems
> to aim at explaining.)

Since the "download Python source code" step is just part of the
ActivePython build script, and the Hudson build script deletes the
"build/" directory of the previous build, a clone was necessary. To fix
this I ended up creating a mirror at a local site, maintained by an
hourly 'hg pull -u'. The Hudson build script still does a clone,
although from this local mirror URL.

(Incidentally, cloning from this mirror via the Apache index listing URL
doesn't seem to work; gotta investigate why...)

-srid
[Python-Dev] Unpickling memory usage problem, and a proposed solution
We were having performance problems unpickling a large pickle file: we
were getting 170 s running time (which was fine), but 1100 MB memory
usage. Memory usage ought to have been about 300 MB; the excess came
from memory fragmentation, caused by many unnecessary "put" opcodes in
the pickle stream.

We made a pickletools.optimize-inspired tool that could run directly on
a pickle file, using pickletools.genops. This solved the unpickling
problem (84 s, 382 MB). However, the tool itself was using too much
memory and time (1100 s, 470 MB), so I recoded it to scan through the
pickle stream directly, without going through pickletools.genops, giving
(240 s, 130 MB).

Other people who deal with large pickle files are probably having
similar problems, and since this comes up when dealing with large data,
it is precisely in this situation that you probably can't use
pickletools.optimize or pickletools.genops. It feels like functionality
that ought to be added to pickletools; is there some way I can
contribute this?
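[For concreteness, here is a minimal in-memory sketch of the kind of
PUT-stripping tool being described. The helper name and the
genops-based approach are illustrative assumptions, not the poster's
actual code, which streams instead of holding the whole pickle in
memory:]

import pickletools

def strip_unused_puts(data):
    # Hypothetical helper: drop PUT opcodes whose memo slot is never
    # read back by a GET. `data` is the raw bytes of a single pickle,
    # assumed small enough to fit in memory.
    ops = []
    used = set()
    for opcode, arg, pos in pickletools.genops(data):
        ops.append((opcode.name, arg, pos))
        if opcode.name in ('GET', 'BINGET', 'LONG_BINGET'):
            used.add(arg)            # this memo slot really is fetched
    out = []
    for i, (name, arg, pos) in enumerate(ops):
        end = ops[i + 1][2] if i + 1 < len(ops) else len(data)
        if name in ('PUT', 'BINPUT', 'LONG_BINPUT') and arg not in used:
            continue                 # skip the opcode and its argument
        out.append(data[pos:end])
    return b''.join(out)

[Surviving GETs keep their original memo indices, so nothing needs
renumbering: the unpickler's memo is a mapping keyed by those indices.]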
Re: [Python-Dev] autoconf update to 2.65 and cleanup
On Fri, Apr 23, 2010 at 09:21, Barry Warsaw wrote:
> On Apr 23, 2010, at 05:44 PM, Victor Stinner wrote:
>> I'm not sure that it's a good idea to change the build process after the
>> first beta. It would depend on the issue comments ;-)
>
> OTOH, this doesn't seem like a new feature, so I think it should be
> okay. Doubly so if it fixes a bug.

I'm with Barry; if this fixes something then it's worth a go.

-Brett
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 11:11, Dan Gindikin wrote:
> We were having performance problems unpickling a large pickle file: we
> were getting 170 s running time (which was fine), but 1100 MB memory
> usage. Memory usage ought to have been about 300 MB; the excess came
> from memory fragmentation, caused by many unnecessary "put" opcodes in
> the pickle stream.
>
> [...] It feels like functionality that ought to be added to pickletools;
> is there some way I can contribute this?

The best next step is to open an issue at bugs.python.org and upload the
patch. I can't make any guarantees on when someone will look at it or if
it will get accepted, but putting the code there is your best bet for
acceptance.

-Brett
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 2:11 PM, Dan Gindikin wrote:
> We were having performance problems unpickling a large pickle file: we
> were getting 170 s running time (which was fine), but 1100 MB memory
> usage. Memory usage ought to have been about 300 MB; the excess came
> from memory fragmentation, caused by many unnecessary "put" opcodes in
> the pickle stream.

Collin Winter wrote a simple optimization pass for cPickle in Unladen
Swallow [1]. The code reads through the stream and removes all the
unnecessary PUTs in-place.

[1]: http://code.google.com/p/unladen-swallow/source/browse/trunk/Modules/cPickle.c#735

> Other people who deal with large pickle files are probably having
> similar problems, and since this comes up when dealing with large data,
> it is precisely in this situation that you probably can't use
> pickletools.optimize or pickletools.genops. It feels like functionality
> that ought to be added to pickletools; is there some way I can
> contribute this?

Just put your code on bugs.python.org and I will take a look.

-- 
Alexandre
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti wrote:
> Collin Winter wrote a simple optimization pass for cPickle in Unladen
> Swallow [1]. The code reads through the stream and removes all the
> unnecessary PUTs in-place.

I just noticed the code removes *all* PUT opcodes, regardless of whether
they are needed. So, this code can only be used if there's no GET in the
stream (which is unlikely for a large stream). I believe Collin made
this trade-off for performance reasons. However, it wouldn't be hard to
make the current code work like pickletools.optimize().

-- 
Alexandre
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Alexandre Vassalotti peadrop.com> writes:
> Just put your code on bugs.python.org and I will take a look.

Thanks, I'll put it in there.
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 11:49 AM, Alexandre Vassalotti wrote:
> On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti wrote:
>> Collin Winter wrote a simple optimization pass for cPickle in Unladen
>> Swallow [1]. The code reads through the stream and removes all the
>> unnecessary PUTs in-place.
>
> I just noticed the code removes *all* PUT opcodes, regardless of whether
> they are needed. So, this code can only be used if there's no GET in the
> stream (which is unlikely for a large stream). I believe Collin made
> this trade-off for performance reasons. However, it wouldn't be hard to
> make the current code work like pickletools.optimize().

The optimization pass is only run if you don't use any GETs. The
optimization is also disabled if you're writing to a file-like object.
These tradeoffs were appropriate for the workload I was optimizing
against.

Collin Winter
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 11:53 AM, Collin Winter wrote:
> On Fri, Apr 23, 2010 at 11:49 AM, Alexandre Vassalotti wrote:
>> I just noticed the code removes *all* PUT opcodes, regardless of whether
>> they are needed. So, this code can only be used if there's no GET in the
>> stream (which is unlikely for a large stream). I believe Collin made
>> this trade-off for performance reasons. However, it wouldn't be hard to
>> make the current code work like pickletools.optimize().
>
> The optimization pass is only run if you don't use any GETs. The
> optimization is also disabled if you're writing to a file-like object.
> These tradeoffs were appropriate for the workload I was optimizing
> against.

I should add that adding the necessary bookkeeping to remove only unused
PUTs (instead of the current all-or-nothing scheme) should not be hard.
I'd watch out for a further performance/memory hit; the pickling
benchmarks in the benchmark suite should help assess this. The current
optimization penalizes pickling to speed up unpickling, which made sense
when optimizing pickles that would go into memcache and be read out
13-15x more often than they were written.

Collin Winter
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 3:07 PM, Collin Winter wrote:
> I should add that adding the necessary bookkeeping to remove only unused
> PUTs (instead of the current all-or-nothing scheme) should not be hard.
> I'd watch out for a further performance/memory hit; the pickling
> benchmarks in the benchmark suite should help assess this.

I was thinking about this too. A simple boolean table could be fast
while keeping the space requirement down. This scheme would be
cache-friendly as well.

> The current optimization penalizes pickling to speed up unpickling,
> which made sense when optimizing pickles that would go into memcache
> and be read out 13-15x more often than they were written.

This is my current impression of how pickle is most often used. Are you
aware of a use case of pickle where you do more writes than reads? I
can't think of any.

-- 
Alexandre
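[A rough sketch of the boolean-table idea, with assumed details: since
the pickler numbers memo slots consecutively from zero, a bytearray
gives a dense one-byte-per-slot table, which is friendlier to caches
than a set of boxed integers:]

import pickletools

def used_slots(data, n_puts):
    # n_puts = total number of PUT opcodes, known from a first pass
    used = bytearray(n_puts)         # dense 0/1 table instead of a set
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ('GET', 'BINGET', 'LONG_BINGET'):
            used[arg] = 1
    return used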
Re: [Python-Dev] Inconsistent nesting of scopes in exec(..., locals())
Nick Coghlan writes:
> Joachim B Haga wrote:
>> There seems to be an inconsistency in the handling of local scopes in
>> exec. [...]
>>
>> The intermediate scope is searched for the variable name if the third
>> argument to exec() is locals(), but not if it is locals().copy().
>
> What actually matters is whether the first and second scopes are the
> same dictionary.
>
> If they're different, then the supplied local scope is treated as
> equivalent to a class definition scope, and hence won't participate in
> lexical scoping. If they're the same (which happens implicitly if the
> second one is omitted) then they're treated as a module scope (and hence
> written values are visible as globals inside any defined functions).

Ok, thank you for the explanation.

> Since changing this would break class definitions, that ain't going to
> happen. Suggestions for how to explain the behaviour more clearly in the
> exec() documentation probably wouldn't hurt though.

I don't quite see how exec() affects the class definition syntax?
Anyhow, I definitely agree that this should be documented. I suggest the
following (condensed from your explanation):

-If provided, /locals/ can be any mapping object.
+If provided, /locals/ can be any mapping object. It is treated as
+equivalent to a class definition scope, and hence does not participate
+in lexical scoping.

-- 
Joachim B Haga
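[To make the class-scope analogy concrete — an editorial illustration,
not from the thread — here is the same failure reproduced with an
actual class body, which, like the separate-locals exec() case, does
not participate in lexical scoping:]

class C:
    b = 'ok'
    def f():
        print(b)   # NameError: 'b' lives in the class scope,
                   # which f() does not close over
    f()            # fails exactly like the exec() example above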
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Collin Winter google.com> writes:
> I should add that adding the necessary bookkeeping to remove only unused
> PUTs (instead of the current all-or-nothing scheme) should not be hard.
> I'd watch out for a further performance/memory hit; the pickling
> benchmarks in the benchmark suite should help assess this. The current
> optimization penalizes pickling to speed up unpickling, which made sense
> when optimizing pickles that would go into memcache and be read out
> 13-15x more often than they were written.

This wouldn't help our use case: your code needs the entire pickle
stream to be in memory, which in our case would be about 475 MB, on top
of the 300 MB+ data structures that generated the pickle stream.
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote: > This wouldn't help our use case, your code needs the entire pickle > stream to be in memory, which in our case would be about 475mb, this > is on top of the 300mb+ data structures that generated the pickle > stream. > In that case, the best we could do is a two-pass algorithm to remove the unused PUTs. That won't be efficient, but it will satisfy the memory constraint. Another solution is to not generate the PUTs at all by setting the 'fast' attribute on Pickler. But that won't work if you have a recursive structure, or have code that requires that the identity of objects to be preserved. >>> import io, pickle >>> x=[1,2] >>> f = io.BytesIO() >>> p = pickle.Pickler(f, protocol=-1) >>> p.dump([x,x]) >>> pickletools.dis(f.getvalue()) 0: \x80 PROTO 2 2: ]EMPTY_LIST 3: qBINPUT 0 5: (MARK 6: ]EMPTY_LIST 7: qBINPUT 1 9: (MARK 10: KBININT11 12: KBININT12 14: eAPPENDS(MARK at 9) 15: hBINGET 1 17: eAPPENDS(MARK at 5) 18: .STOP highest protocol among opcodes = 2 >>> [id(x) for x in pickle.loads(f.getvalue())] [20966504, 20966504] Now with the 'fast' mode enabled: >>> f = io.BytesIO() >>> p = pickle.Pickler(f, protocol=-1) >>> p.fast = True >>> p.dump([x,x]) >>> pickletools.dis(f.getvalue()) 0: \x80 PROTO 2 2: ]EMPTY_LIST 3: (MARK 4: ]EMPTY_LIST 5: (MARK 6: KBININT11 8: KBININT12 10: eAPPENDS(MARK at 5) 11: ]EMPTY_LIST 12: (MARK 13: KBININT11 15: KBININT12 17: eAPPENDS(MARK at 12) 18: eAPPENDS(MARK at 3) 19: .STOP highest protocol among opcodes = 2 >>> [id(x) for x in pickle.loads(f.getvalue())] [20966504, 21917992] As you can observe, the pickle stream generated with the fast mode might actually be bigger. By the way, it is weird that the total memory usage of the data structure is smaller than the size of its respective pickle stream. What pickle protocol are you using? -- Alexandre ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Alexandre Vassalotti peadrop.com> writes:
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote:
>> This wouldn't help our use case: your code needs the entire pickle
>> stream to be in memory, which in our case would be about 475 MB, on top
>> of the 300 MB+ data structures that generated the pickle stream.
>
> In that case, the best we could do is a two-pass algorithm to remove the
> unused PUTs. That won't be efficient, but it will satisfy the memory
> constraint.

That is what I'm doing for us right now.

> Another solution is to not generate the PUTs at all by setting the
> 'fast' attribute on Pickler. But that won't work if you have a recursive
> structure, or have code that requires the identity of objects to be
> preserved.

We definitely have some cross links amongst the objects, so we need
PUTs.

> By the way, it is weird that the total memory usage of the data
> structure is smaller than the size of its respective pickle stream.
> What pickle protocol are you using?

It's the highest protocol, but we have a bunch of extension types that
get expanded into python tuples for pickling.
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
On Fri, Apr 23, 2010 at 1:53 PM, Alexandre Vassalotti wrote:
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin wrote:
>> This wouldn't help our use case: your code needs the entire pickle
>> stream to be in memory, which in our case would be about 475 MB, on top
>> of the 300 MB+ data structures that generated the pickle stream.
>
> In that case, the best we could do is a two-pass algorithm to remove the
> unused PUTs. That won't be efficient, but it will satisfy the memory
> constraint. Another solution is to not generate the PUTs at all by
> setting the 'fast' attribute on Pickler. But that won't work if you have
> a recursive structure, or have code that requires the identity of
> objects to be preserved.

I don't think it's possible in general to remove any PUTs if the pickle
is being written to a file-like object. It is possible to reuse a single
Pickler to pickle multiple objects: this causes the Pickler's memo dict
to be shared between the objects being pickled. If you pickle foo, bar,
and baz, foo may not have any GETs, but bar and baz may have GETs that
reference data added to the memo by foo's PUT operations. Because you
can't know what will be written to the file-like object later, you can't
remove any of the PUT instructions in this scenario.

This kind of thing is done in real-world code like cvs2svn (which I
broke when I was optimizing cPickle; don't break cvs2svn, it's not fun
to fix :). I added some basic tests for this support in cPython's
Lib/test/pickletester.py.

There might be room for app-specific optimizations that do this, but I'm
not sure it would work for a general-usage cPickle that needs to stay
compatible with the current system.

Collin Winter
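[A small illustration of the shared-memo hazard just described, using
only the standard pickle API: reusing one Pickler means a later
pickle's GETs can point at PUTs emitted by an earlier dump(), so no
single pickle's PUTs can safely be stripped in isolation:]

import io
import pickle

shared = {'answer': 42}
buf = io.BytesIO()
p = pickle.Pickler(buf, protocol=2)
p.dump(shared)             # pickle #1 memoizes `shared` with a PUT
p.dump([shared, shared])   # pickle #2 emits GETs into pickle #1's memo
# Loading requires a single Unpickler reading both pickles in order,
# since its memo likewise persists across load() calls.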
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Collin Winter google.com> writes:
> I don't think it's possible in general to remove any PUTs if the pickle
> is being written to a file-like object.

Does cPickle bytecode have some kind of NOP instruction? You could keep
track of which PUTs weren't necessary and zero them out at the end. It
would be much cheaper than writing a whole other "optimized" stream.

(Of course it only works on a seekable stream :-))

Regards

Antoine.
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Collin Winter google.com> writes:
> I don't think it's possible in general to remove any PUTs if the pickle
> is being written to a file-like object. It is possible to reuse a single
> Pickler to pickle multiple objects: this causes the Pickler's memo dict
> to be shared between the objects being pickled. If you pickle foo, bar,
> and baz, foo may not have any GETs, but bar and baz may have GETs that
> reference data added to the memo by foo's PUT operations. Because you
> can't know what will be written to the file-like object later, you can't
> remove any of the PUT instructions in this scenario.

Hmm, that is a good point. A possible solution would be for the two-pass
optimizer to scan through the entire file, going right through '.'
(STOP) opcodes. That would deal with the case you are describing, but
not if the user "maliciously" wrote some other stuff into the file in
between pickle dumps, all the while reusing the same pickler. I think a
better solution would be to make sure that the '.' is the last thing in
the file and die otherwise. This would at least ensure correctness and
detection of cases that this thing could not handle.

> don't break cvs2svn, it's not fun to fix :). I added some basic tests
> for this support in cPython's Lib/test/pickletester.py.

Thanks for the warning :)

> There might be room for app-specific optimizations that do this, but I'm
> not sure it would work for a general-usage cPickle that needs to stay
> compatible with the current system.

That may well be true. Still, when trying to deal with large data you
really need something like this. Our situation was made worse because we
had extension types. As they were allocated, they got interspersed with
temporaries generated by the spurious PUTs, and that is what really
fragmented the memory. However, it's probably not a stretch to assume
that if you are dealing with large stuff through Python you are going to
have extension types in the mix.
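[One way to realize the "scan right through '.' opcodes" idea — an
assumed sketch, since pickletools.genops stops at the first STOP; it
presumes the stream contains nothing but back-to-back pickles:]

import pickletools

def genops_all(stream):
    # Yield opcodes for every pickle in a seekable binary file-like
    # object, continuing past each STOP ('.') until a clean EOF.
    while True:
        here = stream.tell()
        if not stream.read(1):
            break                    # end of file
        stream.seek(here)
        for op in pickletools.genops(stream):
            yield op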
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Antoine Pitrou pitrou.net> writes:
> Does cPickle bytecode have some kind of NOP instruction?
> You could keep track of which PUTs weren't necessary and zero them out at the
> end. It would be much cheaper than writing a whole other "optimized" stream.
For a large file, I'm not sure it is much faster to edit it in place
than to rewrite it. Also, since it's a large file, you are probably
going to have it compressed, in which case you are out of luck anyway.

> (Of course it only works on a seekable stream :-))

Indeed... since I was dealing with zipped files, I had to pass in
a "seek0" func, like so:

# "rewind" by reopening the decompression pipe (the argument is unused)
file_in_seek0_func = lambda x: os.popen('zcat ' + file_in)
Re: [Python-Dev] Unpickling memory usage problem, and a proposed solution
Dan Gindikin gmail.com> writes:
> Antoine Pitrou pitrou.net> writes:
>> Does cPickle bytecode have some kind of NOP instruction?
>> You could keep track of which PUTs weren't necessary and zero them out
>> at the end. It would be much cheaper than writing a whole other
>> "optimized" stream.
>
> For a large file, I'm not sure it is much faster to edit it in place
> than to rewrite it. Also, since it's a large file, you are probably
> going to have it compressed, in which case you are out of luck anyway.

That depends on whether you really care about disk occupation. Disk
space is often cheap (much more so than RAM). Also, I'm quite sure
overwriting a couple of blocks in a file is cheaper than rewriting it
entirely. Of course, if you must overwrite every other block, it might
not be true anymore :-)

Regards

Antoine.
