Looking through the archives, there seem to be a number of suggestions on 
organizing data, but all of the ones I could find are for data sets more 
complicated than what I need. 

I just need to store the 3D coordinates (a ~500x3 numpy array) of a number of 
instances of a system for each iteration. Basically I have a variable number of 
'molecules' per iteration, each made up of about 500 particles (this shape is 
fixed). Each iteration has 20-5000 such molecules. In the end I imagine I will 
need to store on the order of 10-200 million such 500x3 arrays, grouped by 
iteration and labeled by an integer identification number. Since I'm planning 
on using PyTables in tandem with an SQLAlchemy-based SQLite database, I don't 
need to store any other information in the PyTables file besides the 
coordinates and the iteration and id labels.
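To make the layout concrete, here is a minimal sketch of the single-table approach I have in mind, using a multidimensional column for the coordinates (the node and column names `molecules`, `iteration`, `mol_id`, and `coords` are just placeholders, and the row counts are toy values):

```python
import os
import tempfile

import numpy as np
import tables as tb


class Molecule(tb.IsDescription):
    """One row per molecule: its iteration, id, and fixed-shape coordinates."""
    iteration = tb.Int32Col()
    mol_id = tb.Int32Col()
    coords = tb.Float64Col(shape=(500, 3))  # particle count is fixed at ~500


path = os.path.join(tempfile.mkdtemp(), "molecules.h5")
with tb.open_file(path, mode="w") as h5:
    # expectedrows helps PyTables pick a sensible chunk size up front.
    table = h5.create_table("/", "molecules", Molecule, expectedrows=1_000_000)
    row = table.row
    for it in range(2):          # toy data: 2 iterations...
        for mid in range(3):     # ...of 3 molecules each
            row["iteration"] = it
            row["mol_id"] = mid
            row["coords"] = np.random.rand(500, 3)
            row.append()
    table.flush()
```

Reading back a single molecule, or every molecule in an iteration, would then be a `read_where` on the two scalar columns, e.g. `table.read_where("(iteration == 1) & (mol_id == 2)")["coords"]`.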

After writing the data I would need easy access to the coordinate array, 
looked up by iteration and molecule id, as well as the ability to say "give me 
all of the coordinates stored for iteration N." Would it make sense to make a 
table with columns for iteration, id, and coordinates, or would it be better 
to use an EArray or VLArray (although the data isn't jagged, so the latter may 
not be a good choice)? My understanding from reading posts on the list is that 
it would be highly inefficient to save each 500x3 numpy array as its own node.
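For comparison, the EArray alternative I'm picturing would be one enlargeable (N, 500, 3) array holding every molecule's coordinates, plus a small companion table mapping (iteration, id) to a row index (again, all node and column names here are illustrative, not a proposal from the list):

```python
import os
import tempfile

import numpy as np
import tables as tb

epath = os.path.join(tempfile.mkdtemp(), "coords.h5")
with tb.open_file(epath, mode="w") as h5:
    # One growing array: each append adds one molecule's 500x3 block.
    earr = h5.create_earray("/", "coords", atom=tb.Float64Atom(),
                            shape=(0, 500, 3), expectedrows=1_000_000)
    # A small lookup table mapping (iteration, mol_id) -> row in the EArray.
    index = h5.create_table("/", "index",
                            {"iteration": tb.Int32Col(),
                             "mol_id": tb.Int32Col(),
                             "row": tb.Int64Col()})
    r = index.row
    for it in range(2):          # toy data: 2 iterations...
        for mid in range(3):     # ...of 3 molecules each
            earr.append(np.random.rand(1, 500, 3))
            r["iteration"], r["mol_id"], r["row"] = it, mid, earr.nrows - 1
            r.append()
    index.flush()
```

A "give me iteration N" query would then read the matching row numbers from the index table and pull those slices out of the EArray, e.g. `np.stack([h5.root.coords[int(i)] for i in rows])`.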

Any suggestions on a more optimal way of organizing this sort of data set would 
be most appreciated.

Best wishes,
Josh





_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
