Hi Nils,

On Thursday 25 November 2010 10:50:45 Nils Becker wrote:
> Hi pytables users,
>
> I am considering using pytables for storing a tree structure of
> python objects. I am not sure if pytables is the right tool for my
> application and wasn't able to find an answer in the archive, so I
> am trying it here. My simulation generates a large tree structure
> whose nodes are instances of a subclass of tinytree.Tree
> (http://pypi.python.org/pypi/tinytree/0.2). Currently, I use shelve
> and cPickle to store these data to disk as python object trees. The
> files are about 700 MB but may grow to several GB in the
> future.
> I am having memory issues when iterating over the trees for data
> analysis; typically the required memory is at least a factor of 3
> larger than the stored file size. Part of the reason is that I have
> to load the entire tree structure into memory at once during
> unpickling.
>
> The question is:
>
> 1. is it a good idea to try to replace this object tree structure by
> a deeply nested HDF5 file using pytables? I.e. tree nodes -> HDF5
> groups; tree root -> HDF5 file; node attributes -> Datasets or ->
> Group attributes? What I would want in the end is an on-disk version
> of the tree in which nodes are loaded into memory on-demand, and
> partial loading of a tree is possible.
> Ideally, as little as possible would change in the API exposed to the
> data generation part of the program...

I'd say yes: PyTables could replace your current pickle approach
because it supports these requirements. It loads nodes lazily and comes
with an integrated node cache that keeps only the frequently accessed
nodes in memory. In addition, PyTables Pro implements another node
cache at C level, in case you need extreme speed.

Regarding the data on each node, as a rule of thumb I'd use group
attributes if the data per node does not exceed 64 KB. For larger
sizes, it is best to use a separate leaf (dataset).

> 2. if yes, does this work better than using ZODB for out-of-the-box
> object persistence? Speed of data retrieval and moderate size of the
> stored file are important.

I don't know; I have not used ZODB for a long time. The best approach
would be to run a small benchmark and decide. If you do so, please come
back and show us your results. I'm interested.

Hope this helps,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
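
As an illustration of the layout discussed above (tree nodes -> groups, small per-node data -> group attributes, larger data -> a separate leaf), here is a minimal sketch. It assumes the PyTables 3.x function names (open_file, create_group, create_array; the 2.x releases current in 2010 spelled these openFile, createGroup, createArray) and a NumPy array as the per-node payload. The names ATTR_LIMIT, store_node and build_example are hypothetical helpers made up for this example, not part of PyTables.

```python
import os
import tempfile

import numpy as np
import tables  # assumes PyTables 3.x naming (open_file, create_group, ...)

# Rule of thumb from the message above: keep per-node data in attributes
# only while it stays under roughly 64 KB.
ATTR_LIMIT = 64 * 1024  # bytes


def store_node(h5, parent, name, meta, payload):
    """Store one tree node as an HDF5 group (hypothetical helper).

    Small metadata always goes into group attributes; the payload goes
    into an attribute if it is small, otherwise into a separate leaf.
    """
    group = h5.create_group(parent, name)
    for key, value in meta.items():
        setattr(group._v_attrs, key, value)
    if payload.nbytes < ATTR_LIMIT:
        group._v_attrs.payload = payload
    else:
        h5.create_array(group, "payload", payload)
    return group


def build_example(path):
    """Write a two-level tree: a small node with a large child."""
    with tables.open_file(path, "w") as h5:
        n0 = store_node(h5, "/", "n0", {"depth": 0}, np.arange(10.0))
        store_node(h5, n0, "n1", {"depth": 1}, np.zeros(100_000))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tree.h5")
    build_example(path)
    with tables.open_file(path, "r") as h5:
        # walk_groups visits the groups in preorder, starting at "/";
        # a leaf's data is read from disk only when it is indexed.
        for group in h5.walk_groups("/"):
            print(group._v_pathname)
        big = h5.get_node("/n0/n1/payload")
        print(big[:5])  # reads just the first five values
```

Reading a slice such as big[:5] pulls only that part of the leaf from disk, which is what allows partial loading of a large tree. How much of the existing tinytree-based API could stay unchanged on top of such a layout is not addressed by this sketch.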