Re: cPickle asymptotic performance?
Eric Jonas [EMAIL PROTECTED] writes: Try gc.disable() before loading the pickle, and gc.enable() after. Is cPickle's behavior known to be O(n^2)? No, but the garbage collector's sometimes is. Wow, that totally fixed it -- we went from 1200 seconds to 60 seconds. 60 seconds is still a long time. How many objects are you deserializing? Is the time now approximately O(n)? I'm somewhat shocked -- is the occasionally-abysmal behavior of the GC documented anywhere? I don't think so. It's somewhat of an FAQ on the list, though. The question tends to arise when someone tries to construct a large list of non-trivial objects, which takes quadratic time because object allocation regularly triggers GC, which traverses the growing list. -- http://mail.python.org/mailman/listinfo/python-list
cPickle asymptotic performance?
Hello, I've done some benchmarking while attempting to serialize my (large) graph data structure with cPickle; I'm seeing superlinear performance (plotting it seems to suggest n^2 where n is the number of nodes of my graph), in the duration of the pickle.dump calls and I can't quite figure out why. The connectivity of the graph is such that the number of nodes is ~ number of edges, so I don't think this is a problem of edge count secretly growing O(n). Is cPickle's behavior known to be O(n^2)? Does anyone have any generic tips for speeding up cPickle? Thanks, ...Eric -- http://mail.python.org/mailman/listinfo/python-list
Re: cPickle asymptotic performance?
Eric Jonas [EMAIL PROTECTED] writes: I've done some benchmarking while attempting to serialize my (large) graph data structure with cPickle; I'm seeing superlinear performance (plotting it seems to suggest n^2 where n is the number of nodes of my graph), in the duration of the pickle.dump calls and I can't quite figure out why. Try gc.disable() before loading the pickle, and gc.enable() after. Is cPickle's behavior known to be O(n^2)? No, but the garbage collector's sometimes is. -- http://mail.python.org/mailman/listinfo/python-list
Re: cPickle asymptotic performance?
If you are getting to the point where your data is large enough to really care about the speed of cPickle, then maybe its time you moved past pickles for your storage format? 2.5 includes sqlite, so you could persist them in a nice, indexed table or something. Just a suggestion. On Jun 12, 2008, at 2:25 PM, Eric Jonas wrote: Hello, I've done some benchmarking while attempting to serialize my (large) graph data structure with cPickle; I'm seeing superlinear performance (plotting it seems to suggest n^2 where n is the number of nodes of my graph), in the duration of the pickle.dump calls and I can't quite figure out why. The connectivity of the graph is such that the number of nodes is ~ number of edges, so I don't think this is a problem of edge count secretly growing O(n). Is cPickle's behavior known to be O (n^2)? Does anyone have any generic tips for speeding up cPickle? Thanks, ...Eric -- http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
Re: cPickle asymptotic performance?
On Thu, 2008-06-12 at 20:57 +0200, Hrvoje Niksic wrote: Eric Jonas [EMAIL PROTECTED] writes: I've done some benchmarking while attempting to serialize my (large) graph data structure with cPickle; I'm seeing superlinear performance (plotting it seems to suggest n^2 where n is the number of nodes of my graph), in the duration of the pickle.dump calls and I can't quite figure out why. Try gc.disable() before loading the pickle, and gc.enable() after. Is cPickle's behavior known to be O(n^2)? No, but the garbage collector's sometimes is. Wow, that totally fixed it -- we went from 1200 seconds to 60 seconds. I'm somewhat shocked -- is the occasionally-abysmal behavior of the GC documented anywhere? ...Eric -- http://mail.python.org/mailman/listinfo/python-list