On Friday 30 September 2011 15:58:25, Dharhas Pothina wrote:
> Hi,
>
> I'm trying to learn how to use PyTables to store some hydrographic
> survey data. Essentially I have the following structure:
>
> profile - fixed-size string
> time - float
> latitude - float
> longitude - float
> frequency - float
> trace - numpy array of floats (fixed size for a particular profile,
>         but changes between profiles)
> cur_depth - float
> pre_depth - float
>
> I want to aggregate this data from multiple profiles and store it in
> a single HDF5 file. Each profile has about 2-3K records and a fixed
> trace size. Later I want to be able to pull the data out in the
> following ways:
>
> 1) For a particular profile and frequency, get all trace data to form
>    a raster. For example, for profile = '082311', frequency = 200.0,
>    I would get back a 2D array of traces.
> 2) For a particular frequency, get latitude, longitude, cur_depth.
>
> I've followed the tutorial and done some googling of the mailing list,
> but I'm unsure how to represent the 'trace' numpy array within
> PyTables. Any pointers would be greatly appreciated.
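
For reference, the fixed-size fields listed above map naturally onto a PyTables table description. A minimal sketch follows (column names mirror the list above; the string size is an assumption, and the variable-length trace is deliberately left out because it needs separate handling, as the reply below explains):

    import tables

    class SurveyRecord(tables.IsDescription):
        """Fixed-size part of one survey record (illustrative sketch)."""
        profile   = tables.StringCol(16)   # fixed-size string; 16 bytes assumed
        time      = tables.Float64Col()
        latitude  = tables.Float64Col()
        longitude = tables.Float64Col()
        frequency = tables.Float64Col()
        cur_depth = tables.Float64Col()
        pre_depth = tables.Float64Col()
        # 'trace' is not a column here: its length varies between profiles,
        # so it is better kept in a separate VLArray (see the reply below).
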
2-3K records is really not too much, so I'd go for consolidating all your records in a single Table object. For the traces you could use a VLArray, which supports each entry having a different number of data elements. You can then use the row number to 'link' each Table record with the corresponding variable-length entry in the VLArray.

With this schema you can solve your problems as follows:

1) Collect the traces for the selected rows:

    my_traces = []
    for row in profile_table.where(
            "(profile == '082311') & (frequency == 200.0)"):
        my_traces.append(trace_array[row.nrow])

   In my_traces you will have the complete set of traces. In case you want to accumulate them in a single array instead (the traces within one profile all share the same length), just wrap the collected list:

    my_arr = np.array(my_traces)

   (np.fromiter() is another option, but it requires an explicit dtype and is meant for iterables of scalars, so np.array() over the collected list is simpler here.)

2) The same, but just adapt the conditions of the where iterator:

    profile_table.where("(frequency == X) & (lat == Y) & ...")

   However, in this case you should be careful: you cannot collect the traces into a regular NumPy array, because their length differs from profile to profile. Use a list in this case (or a NumPy array of objects).

Finally, if your table grows a lot, you can use the shiny new indexing option to accelerate the queries. Look into:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for more info on how to use this.

--
Francesc Alted
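
Putting the pieces of the reply together, here is a minimal end-to-end sketch using the PyTables 2.x API of the time (openFile, createTable, createVLArray, createIndex). File, node, and column names are illustrative, not from the thread, and only the columns exercised by the queries are declared:

    import numpy as np
    import tables

    # Only the columns used below; the full record layout is sketched
    # after the original question.  All names here are illustrative.
    description = {
        'profile':   tables.StringCol(16),
        'frequency': tables.Float64Col(),
        'latitude':  tables.Float64Col(),
        'longitude': tables.Float64Col(),
        'cur_depth': tables.Float64Col(),
    }

    h5 = tables.openFile("surveys.h5", mode="w")
    profile_table = h5.createTable("/", "profiles", description,
                                   "Survey records")
    trace_array = h5.createVLArray("/", "traces", tables.Float64Atom(),
                                   "One variable-length trace per table row")

    # Append a record together with its trace: the trace is stored at the
    # same row number as the table entry, which is the implicit 'link'
    # between the two nodes.
    row = profile_table.row
    row['profile'] = '082311'
    row['frequency'] = 200.0
    row['latitude'] = 30.0
    row['longitude'] = -95.0
    row['cur_depth'] = 1.5
    row.append()
    trace_array.append(np.zeros(512, dtype=np.float64))  # dummy trace
    profile_table.flush()

    # 1) All traces for one profile and frequency (same length within a
    #    profile, so they can be stacked into a 2D array).
    traces = np.array([trace_array[r.nrow] for r in
                       profile_table.where("(profile == '082311') & "
                                           "(frequency == 200.0)")])

    # 2) Positions and depths for one frequency.
    coords = [(r['latitude'], r['longitude'], r['cur_depth'])
              for r in profile_table.where("frequency == 200.0")]

    # Speed up queries on large tables by indexing the queried columns.
    profile_table.cols.profile.createIndex()
    profile_table.cols.frequency.createIndex()

    h5.close()
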