Bob Fnord wrote:
> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
>
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
>
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
>
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
>
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
>
> Any comments, suggestions?
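For what it's worth, here is a rough sketch of two ways to write such a dict out. It is written for Python 3 (the question uses Python 2's cPickle), and the function names and sample data are made up for illustration. The marshal version is the approach described above: marshal handles dicts, tuples, lists, strings and numbers, but its file format is not guaranteed to stay the same across Python versions, so it is only safe for files written and read by the same interpreter. The second version is a different technique, not something from this thread: it pickles one (key, value) pair per call, so each call uses a fresh pickler whose memo only covers that one entry rather than every object in the dict.

```python
import marshal
import os
import pickle
import tempfile


def save_marshal(d, path):
    """Write the whole dict in one marshal.dump call (format is Python-version specific)."""
    with open(path, "wb") as f:
        marshal.dump(d, f)


def load_marshal(path):
    """Read back a dict written by save_marshal."""
    with open(path, "rb") as f:
        return marshal.load(f)


def save_streamed(d, path):
    """Pickle one (key, value) pair per dump call into a single file.

    Each pickle.dump call creates its own pickler, so the memo it keeps
    only holds the objects of that one entry.
    """
    with open(path, "wb") as f:
        for item in d.items():
            pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_streamed(path):
    """Read pairs back until the end of the file and rebuild the dict."""
    result = {}
    with open(path, "rb") as f:
        while True:
            try:
                key, value = pickle.load(f)
            except EOFError:  # raised once no pickled data is left
                break
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical log-analysis data: tuple-of-strings keys, mixed-list values.
    data = {("host1", "GET"): ["200", 3, 0.5], ("host2", "POST"): ["404", 1, 1.25]}
    tmp = tempfile.mkdtemp()
    m_path = os.path.join(tmp, "data.marshal")
    p_path = os.path.join(tmp, "data.pickles")
    save_marshal(data, m_path)
    save_streamed(data, p_path)
    print(load_marshal(m_path) == data, load_streamed(p_path) == data)
```

A side benefit of the streamed file is that a reader can process entries one at a time instead of loading everything; the cost is one small pickle header per entry.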
Have you seen that one?
http://mail.python.org/pipermail/python-list/2008-July/1139855.html