Looking through the archives, there seem to be a number of suggestions on organizing data, but all of the ones I could find are for data sets more complicated than I need.
I just need to store the 3D coordinates (a ~500x3 NumPy array) of a number of instances of a system for each iteration. Basically, I have a variable number of 'molecules' per iteration, each made up of about 500 particles (this shape is fixed). Each iteration has 20-5000 such molecules. In the end I imagine I will need to store on the order of 10-200 million such 500x3 arrays, grouped by iteration and labeled by an integer identification number.

Since I'm planning on using PyTables in tandem with an SQLAlchemy-based SQLite database, I don't need to store any other information in the PyTables file besides the coordinates and the iteration and id labels. After writing the data, I would need easy access to a coordinate array using the iteration and molecule id, and the ability to say "give me all of the coordinates stored for iteration N."

Would it make sense to make a table with columns for iteration, id, and coordinates, or would it be better to use an EArray or VLArray (although the data isn't jagged, so the latter may not be a good choice)? My understanding from reading posts on the list is that it would be highly inefficient to save each 500x3 NumPy array as its own node.

Any suggestions on a more optimal way of organizing this sort of data set would be most appreciated.

Best wishes,
Josh
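P.S. Concretely, the table-based layout I'm considering would look something like the sketch below (the file name, column names, and dtypes are just placeholders I made up for illustration, and I'm writing only a handful of molecules to keep it small):

```python
import numpy as np
import tables

# One row per molecule: iteration label, integer id, and the fixed-shape
# 500x3 coordinate array stored as a multidimensional column.
class Molecule(tables.IsDescription):
    iteration = tables.Int32Col()
    mol_id = tables.Int32Col()
    coords = tables.Float32Col(shape=(500, 3))

with tables.open_file("molecules_demo.h5", mode="w") as h5:
    table = h5.create_table("/", "molecules", Molecule)
    row = table.row
    # Toy data: 3 iterations with 5 molecules each.
    for it in range(3):
        for mol in range(5):
            row["iteration"] = it
            row["mol_id"] = mol
            row["coords"] = np.random.rand(500, 3).astype(np.float32)
            row.append()
    table.flush()

    # "All coordinates stored for iteration 1" as an in-kernel query:
    hits = table.read_where("iteration == 1")
    coords = hits["coords"]  # structured-array field, shape (5, 500, 3)
```

The appeal of this layout is that a single `read_where` call on the scalar `iteration` column answers the per-iteration query, and `(iteration, mol_id)` conditions pick out one molecule, without creating millions of separate nodes.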