Hi pytables users,
I am considering using pytables to store a tree structure of Python objects. I am not sure whether pytables is the right tool for my application, and I wasn't able to find an answer in the archive, so I am asking here.

My simulation generates a large tree structure whose nodes are instances of a subclass of tinytree.Tree (http://pypi.python.org/pypi/tinytree/0.2). Currently, I use shelve and cPickle to store this data on disk as Python object trees. The files are about 700 MB but may grow to several GB in the future. I run into memory issues when iterating over the trees for data analysis; typically the required memory is at least three times the stored file size. Part of the reason is that unpickling forces me to load the entire tree structure into memory at once.

My questions are:

1. Is it a good idea to replace this object tree structure with a deeply nested HDF5 file using pytables? I.e. tree nodes -> HDF5 groups; tree root -> HDF5 file; node attributes -> datasets or group attributes? What I would want in the end is an on-disk version of the tree in which nodes are loaded into memory on demand and partial loading of a tree is possible. Ideally, as little as possible would change in the API exposed to the data generation part of the program.

2. If yes, does this work better than using ZODB for out-of-the-box object persistence? Speed of data retrieval and a moderate size of the stored file are important.

Thanks a lot for any hints!

Nils

Details of the present implementation as a tinytree.Tree:

Each node has < 10 children, often only one; there is one root and no cycles. However, the tree may be deeply nested; thousands of generations are common. Each node carries data as class attributes: several floats, some booleans, and a variable-length list whose elements are numpy.ndarrays of floats. These attributes are not big, typically < 1000*sizeof(float) or so.
The whole tree typically consumes several hundred MB on disk but could grow to GB size in the future. The simulation is append-only, and the analysis is read-only.
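
To make question 1 a bit more concrete, here is a rough sketch of the access pattern I am after: one stored record per node, addressed by a path that plays the role of an HDF5 group path, so that only the nodes actually visited are read from disk. This is only an illustration under my own assumptions. It uses the standard-library shelve as a stand-in backend and is not pytables code, and the names LazyNode, append_child and walk are made up for this example. Node data is reduced to a plain dict here; the real nodes also carry numpy arrays.

```python
import os
import shelve
import tempfile


class LazyNode:
    """Proxy for one stored node; its record is read only when accessed."""

    def __init__(self, store, path):
        self._store = store  # open shelve (stand-in for an HDF5 file)
        self.path = path     # e.g. "root/0/1", analogous to a group path

    @property
    def attrs(self):
        # Unpickles just this node's record, not the whole tree.
        return self._store[self.path]["attrs"]

    def children(self):
        n = self._store[self.path]["n_children"]
        return [LazyNode(self._store, f"{self.path}/{i}") for i in range(n)]


def append_child(store, parent_path, attrs):
    """Append-only write: store a new child and bump the parent's counter."""
    parent = store[parent_path]
    child_path = f"{parent_path}/{parent['n_children']}"
    store[child_path] = {"attrs": attrs, "n_children": 0}
    parent["n_children"] += 1
    store[parent_path] = parent  # reassign so shelve persists the change
    return child_path


def walk(store, root_path="root"):
    """Depth-first, pre-order traversal with an explicit stack.

    No recursion is used, since the tree can be thousands of levels deep.
    """
    stack = [LazyNode(store, root_path)]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "tree")) as store:
            store["root"] = {"attrs": {"x": 1.0}, "n_children": 0}
            first = append_child(store, "root", {"x": 2.0})
            append_child(store, first, {"x": 3.0})
            for node in walk(store):
                print(node.path, node.attrs)
```

One caveat of this sketch: the path strings grow with depth, so for thousands of generations a flat integer node id with a stored parent/children index might be a better key than a nested path. I would be interested to hear whether the same concern applies to deeply nested HDF5 groups.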