Hi pytables users,

I am considering using pytables for storing a tree structure of python
objects. I am not sure if pytables is the right tool for my application
and wasn't able to find an answer in the archive, so I am asking here.
My simulation generates a large tree structure whose nodes are instances
of a subclass of tinytree.Tree
(http://pypi.python.org/pypi/tinytree/0.2). Currently, I use shelve and
cPickle to store these data to disk as python object trees. The files
are about 700 MB but may grow to several GB in the
future.
I am having memory issues when iterating over the trees for data
analysis; typically the required memory is at least a factor of 3 larger
than the stored file size. Part of the reason is that I have to load the
entire tree structure into memory at once during unpickling.

My questions are:

1. Is it a good idea to try to replace this object tree structure by a
deeply nested HDF5 file using pytables? I.e. tree nodes -> HDF5 groups;
tree root -> HDF5 file; node attributes -> datasets or group attributes?
What I would want in the end is an on-disk version of the tree in which
nodes are loaded into memory on-demand, and partial loading of a tree is
possible.
Ideally, as little as possible would change in the API exposed to the
data generation part of the program...

2. If yes, does this work better than using ZODB for out-of-the-box
object persistence? Speed of data retrieval and moderate size of the
stored file are important.
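
To make the mapping in question 1 concrete, here is a rough sketch of
what I have in mind. It is only an illustration under my own assumptions,
not working code from my project: the group names (`n0`, `c0`), the
attribute names (`energy`, `converged`), the dataset names (`data0`, ...)
and the file name are all made up, and I have not checked how well this
layout behaves for very deep trees.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "tree.h5")

# Writing (simulation side): one HDF5 group per tree node.
with tables.open_file(path, mode="w") as h5:
    node = h5.create_group(h5.root, "n0")
    node._v_attrs.energy = 1.5       # float -> group attribute
    node._v_attrs.converged = True   # boolean -> group attribute
    # Each ndarray of the variable-length list -> one dataset.
    for i, arr in enumerate([np.arange(3.0), np.arange(5.0)]):
        h5.create_array(node, "data%d" % i, arr)
    # A child node becomes a nested group.
    child = h5.create_group(node, "c0")
    child._v_attrs.energy = 2.5
    child._v_attrs.converged = False

# Reading (analysis side): ask only for the nodes that are needed.
with tables.open_file(path, mode="r") as h5:
    child = h5.get_node("/n0/c0")
    energy = float(child._v_attrs.energy)
    first = h5.get_node("/n0/data0").read()
    n_children = len(list(h5.iter_nodes("/n0", classname="Group")))

print(energy, first.sum(), n_children)
```

The alternative would be to store the floats and booleans as small
datasets instead of group attributes; I don't know which is preferable.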

Thanks a lot for any hints!

Nils


Details of the present implementation as a tinytree.Tree:

Each node has < 10 children, often only one; there is one root and no
cycles. However, the tree may be deeply nested; thousands of generations
are common.
Each node carries its data as attributes: several floats, some
booleans, and a list of variable length with numpy.ndarrays of floats as
elements.
These attributes are not big, typically < 1000*sizeof(float) or so. The
whole tree typically consumes several hundred MB on disk but could grow
to GB size in the future.
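
In code, a node looks roughly like the sketch below. The class is a
simplified, hypothetical stand-in for my tinytree.Tree subclass (it does
not use tinytree at all), and the attribute names are invented for this
example. The traversal uses an explicit stack because a recursive walk
over thousands of generations would hit Python's default recursion limit.

```python
import numpy as np


class Node:
    """Hypothetical stand-in for my tinytree.Tree subclass."""

    def __init__(self, energy, converged, samples):
        self.energy = energy        # one of several floats
        self.converged = converged  # one of a few booleans
        self.samples = samples      # variable-length list of float ndarrays
        self.children = []          # fewer than 10, often exactly one


def iter_nodes(root):
    """Yield all nodes depth-first, using a stack instead of recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Reverse so that children are visited in their stored order.
        stack.extend(reversed(node.children))


# Build a chain that is 5000 generations deep, one child per node.
root = Node(0.0, False, [np.zeros(4)])
tip = root
for generation in range(1, 5000):
    nxt = Node(float(generation), True, [np.zeros(4), np.ones(2)])
    tip.children.append(nxt)
    tip = nxt

count = sum(1 for _ in iter_nodes(root))
print(count)
```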

The simulation is append-only, and the analysis is read-only.



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users