Anand Patil (on 2008-05-19 at 17:48:24 +0100) said::

> I'd like to store a long sequence of python objects with pytables. The
> only things I know about the objects are:
>
> - Their memory footprint is dominated by a big numpy array, and
> - The attribute name of the big array for each object is the same;
>   it's x1.big_array, x2.big_array, etc.
>
> I would rather not require the array to be the same shape for each
> object.
>
> I think I'd want to make a group with a single ObjectAtom array and
> a whole bunch of arrays whose atoms correspond to big_array.dtype. To
> store an object, I'd destroy all its references to its big_array,
> pickle it in the ObjectAtom array, and store its big_array in one of
> the other arrays.
>
> My questions are:
> - Is this the best way to go?
> - What kind of performance penalty am I incurring by storing each of
>   the big_array attributes in its own pytables array, rather than making
>   them cells in a table? How can I mitigate it?
> - How can I make sure that all of an object's references to its
>   big_array get destroyed, so that the latter doesn't get pickled with
>   the object?
I find your approach quite reasonable.  You'd have some overhead when
creating each of the data arrays (for the node metadata), but it could be
offset by the space savings you'd get from using ``CArray`` or ``EArray``
nodes with compression.  Also, if you have more than 4096 nodes, you should
be careful not to place them all in the same group, to avoid performance
problems with the object tree.

Since PyTables doesn't alter the data you store, to avoid storing the array
along with the object you could define your own storer and loader functions
that replace the ``big_array`` attribute with some kind of reference to the
array node (i.e. its path), something like::

    import copy

    import tables

    def store_object(obj, vlarray):
        array = obj.big_array
        # compute_data_path() is your own helper that decides where the
        # array node goes (its remaining arguments are elided here).
        arrpath, arrname = compute_data_path(obj, vlarray, ...)
        st_obj = copy.copy(obj)  # shallow copy; obj keeps its array
        st_obj.big_array = (arrpath, arrname)  # reference, not data
        vlarray.append(st_obj)  # pickled by the ObjectAtom VLArray
        st_obj_pos = len(vlarray) - 1
        arr = vlarray._v_file.createCArray(
            arrpath, arrname,
            tables.Atom.from_dtype(array.dtype), array.shape )
        arr[:] = array  # write the actual data to its own node
        return st_obj_pos

    def load_object(st_obj_pos, vlarray):
        obj = vlarray[st_obj_pos]
        arrpath, arrname = obj.big_array
        arr = vlarray._v_file.getNode(arrpath, arrname)
        obj.big_array = arr[:]  # read the data back into memory
        return obj

Hope that helps,

::

    Ivan Vilata i Balaguer   @ Welcome to the European Banana Republic! @
           http://www.selidor.net/ @ http://www.nosoftwarepatents.com/ @
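The storer/loader idea can be tried without an HDF5 file at all.  Below is
a minimal, stdlib-only sketch under some assumptions: a list of pickled
bytes stands in for the ``ObjectAtom`` VLArray, a dict keyed by
``(path, name)`` stands in for the per-object array nodes, a ``bytearray``
stands in for the numpy array, and ``shard_path()`` is a hypothetical
replacement for ``compute_data_path()`` that spreads the nodes over
subgroups so that no single group gets anywhere near the 4096-node mark
mentioned above.

```python
import copy
import pickle
from types import SimpleNamespace

GROUP_WIDTH = 1000  # hypothetical: max array nodes per subgroup


def shard_path(pos):
    """Hypothetical stand-in for compute_data_path(): put array number
    ``pos`` in subgroup /arrays/gNNNN so no group grows too wide."""
    return "/arrays/g%04d" % (pos // GROUP_WIDTH), "array%d" % pos


def store_object(obj, records, store):
    """Pickle ``obj`` with ``big_array`` replaced by a (path, name) pair.

    ``records`` (a list of pickles) plays the ObjectAtom VLArray and
    ``store`` (a dict) plays the per-object CArray nodes.
    """
    pos = len(records)
    arrpath, arrname = shard_path(pos)
    st_obj = copy.copy(obj)                # shallow copy; obj is untouched
    st_obj.big_array = (arrpath, arrname)  # reference instead of payload
    records.append(pickle.dumps(st_obj))   # like vlarray.append(st_obj)
    store[(arrpath, arrname)] = bytes(obj.big_array)  # like arr[:] = array
    return pos


def load_object(pos, records, store):
    """Unpickle record ``pos`` and swap the payload back in."""
    obj = pickle.loads(records[pos])
    obj.big_array = bytearray(store[obj.big_array])  # like arr[:]
    return obj


records, store = [], {}
x1 = SimpleNamespace(label="x1", big_array=bytearray(1_000_000))
pos = store_object(x1, records, store)
print(len(records[pos]) < 1_000)   # True: no payload in the pickle
print(len(x1.big_array))           # 1000000: the original keeps its data
x1_back = load_object(pos, records, store)
print(x1_back.label, len(x1_back.big_array))  # x1 1000000
print(shard_path(4500))            # ('/arrays/g0004', 'array4500')
```

Comparing the pickled size with the payload size is a quick way to check
that the array did not sneak into the pickle.  Note that the shallow copy
only drops the reference held by ``big_array`` itself: if some other
attribute aliases the same array, it will still be pickled, so such
attributes need the same treatment.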
_______________________________________________ Pytables-users mailing list Pytables-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/pytables-users