On Wednesday 02 June 2010 05:22:15, Joshua Adelman wrote:
> Looking through the archives there seem to be a number of suggestions on
> organizing data, but all of the ones I could find are for data sets that
> are more complicated than I need.
>
> I just need to store the 3D coordinates (~500x3 numpy array) of a number
> of instances of a system for each iteration. Basically I have a variable
> number of 'molecules' per iteration, made up of about 500 particles (this
> shape is fixed). Each iteration has 20-5000 such molecules. In the end I
> imagine that I will need to store on the order of 10-200 million such
> 500x3 arrays, grouped by iteration and labeled by an integer
> identification number. Since I'm planning on using pytables in tandem
> with a sqlalchemy-based sqlite database, I don't need to store any other
> information in the pytable besides the coordinates and the iteration and
> id labels.
>
> After writing the data I would need easy access to the coordinate array
> using the iteration and molecule id, and the ability to say I want all of
> the coordinates stored for iteration N. Would it make sense to make a
> table with columns for iteration, id and coordinates, or would it be
> better to use an EArray or VLArray (although the data isn't jagged, so
> the latter may not be a good choice)? My understanding from reading posts
> on the list is that it would be highly inefficient to save each 500x3
> numpy array as its own node.
>
> Any suggestions on a more optimal way of organizing this sort of data set
> would be most appreciated.
Well, I'd say that 500x3 would be around 12 KB per molecule (assuming that the 3D coordinates are 8-byte elements). This is not too much, and pytables will choose a chunksize that can hold a handful of rows (molecules) per chunk, which is fine. Then you can use the selection capabilities of tables to easily pick out the interesting iterations. So I'd go this route.

For the record, the problem arises when you have row sizes that exceed the chunksize of a dataset by far. Typically, pytables chunksizes are between 32 KB and 256 KB. When your row sizes exceed 10x these sizes, you may run into performance problems. But if not, you can safely use fixed-size row objects (like Table, EArray or CArray) in pytables.

Hope this helps,

--
Francesc Alted

------------------------------------------------------------------------------
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
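[Editor's note] The table layout described above (one row per molecule, with iteration, id and a fixed-shape coordinates column) can be sketched roughly as follows. This is a minimal, hypothetical example using the modern PyTables 3.x API (`open_file`/`create_table`; the 2010-era API spelled these `openFile`/`createTable`); the file name, column names and row counts are made up for illustration:

```python
import numpy as np
import tables as tb

# One row per molecule: iteration label, integer id, fixed 500x3 coordinates.
class Molecule(tb.IsDescription):
    iteration = tb.Int64Col(pos=0)
    mol_id = tb.Int64Col(pos=1)
    coords = tb.Float64Col(shape=(500, 3), pos=2)  # ~12 KB per row

with tb.open_file("molecules.h5", mode="w") as h5:
    table = h5.create_table("/", "molecules", Molecule)

    # Append a few iterations' worth of molecules (toy sizes).
    row = table.row
    for it in range(3):
        for mol_id in range(5):
            row["iteration"] = it
            row["mol_id"] = mol_id
            row["coords"] = np.random.rand(500, 3)
            row.append()
    table.flush()

    # PyTables picks the chunkshape automatically; several rows fit per chunk.
    print("chunkshape:", table.chunkshape)

    # "Give me all coordinates stored for iteration N" as an in-kernel query.
    coords = [r["coords"] for r in table.where("iteration == 1")]
```

Each `coords` element comes back as a (500, 3) numpy array, so grabbing one molecule by `(iteration, mol_id)` is just a more selective `where()` condition on the same table.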