Giorgos Tzampanakis wrote:

> I have a program that saves lots (about 800k) of objects into a shelve
> database (I'm using sqlite3dbm for this since all the default Python dbm
> packages seem to be unreliable and effectively unusable, but that is
> another discussion).
>
> The process takes about 10-15 minutes. During that time I see memory usage
> steadily rising, sometimes resulting in a MemoryError. Now, there is a
> chance that my code is keeping unneeded references to the stored objects,
> but I have debugged it thoroughly and haven't found any.
>
> So I'm beginning to suspect that the pickle module might be keeping an
> internal cache of objects being pickled. Is this true?
Pickler/Unpickler objects use a memo cache to maintain object identity, but shelve in the standard library at least creates a new Pickler/Unpickler for each set/get operation, so the cache never outlives a single operation. I don't have sqlite3dbm, but you can try the following:

>>> import shelve
>>> class A: pass
...
>>> a = A()
>>> s = shelve.open("tmp.shelve")
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
False

If you are getting True, there must be a cache. One way to enable a cache yourself is writeback:

>>> s = shelve.open("tmp.shelve", writeback=True)
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
True

You didn't do that, I guess?
--
http://mail.python.org/mailman/listinfo/python-list