Awesome. That's exactly what I needed.
- d
>>> Francesc Alted <fal...@pytables.org> 9/30/2011 11:46 AM >>>
On Friday 30 September 2011 15:58:25 Dharhas Pothina wrote:
> Hi,
>
> I'm trying to learn how to use pytables to store some hydrographic
> survey data. Essentially I have the following structure:
>
> profile - fixed size string
> time - float
> latitude - float
> longitude - float
> frequency - float
> trace - numpy array of floats (fixed size for a particular profile
> but changes for different profiles)
> cur_depth - float
> pre_depth - float
>
> I want to aggregate this data from multiple profiles and store it in
> a single hdf5 file. Each profile has about 2-3K records and a fixed
> trace size. Later I want to be able to pull data out in the following
> ways.
>
> 1) for a particular profile and frequency get all trace data to form
> a raster. For example, for profile = '082311', frequency = 200.0, I
> would get back a 2D array of traces
> 2) for a particular frequency get latitude, longitude, cur_depth
>
> I've followed the tutorial and done some googling of the mailing list
> but I'm unsure how to represent the 'trace' numpy array within
> pytables. Any pointers would be greatly appreciated.
2-3K records per profile is really not much, so I'd consolidate all
your records in a single Table object. For the traces you could use a
VLArray, which supports entries with different numbers of data
elements. You can then use the row number to 'link' each Table record
with its variable-length entry in the VLArray.
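
To make the layout concrete, here is a minimal sketch of the Table plus VLArray pairing. It assumes the snake_case API of PyTables 3.x (open_file, create_table, create_vlarray); the file name, node names, column width and sample values are made up for illustration only.

```python
import os
import tempfile

import numpy as np
import tables


class Profile(tables.IsDescription):
    profile = tables.StringCol(6)   # fixed-size string, e.g. '082311' (width is an assumption)
    time = tables.Float64Col()
    latitude = tables.Float64Col()
    longitude = tables.Float64Col()
    frequency = tables.Float64Col()
    cur_depth = tables.Float64Col()
    pre_depth = tables.Float64Col()


# Hypothetical file location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "survey.h5")

with tables.open_file(path, mode="w") as h5:
    profile_table = h5.create_table("/", "profiles", Profile)
    trace_array = h5.create_vlarray("/", "traces", atom=tables.Float64Atom())

    # (profile name, frequency, trace length): invented sample records
    samples = [("082311", 200.0, 4), ("082311", 50.0, 4), ("082411", 200.0, 6)]
    row = profile_table.row
    for name, freq, n_samples in samples:
        row["profile"] = name
        row["frequency"] = freq
        row.append()
        # Append the trace in the same order, so table row i <-> VLArray entry i
        trace_array.append(np.full(n_samples, freq))
    profile_table.flush()

    n_rows, n_traces = profile_table.nrows, trace_array.nrows

    # Follow the 'link': row.nrow of a table record indexes the VLArray
    linked = [
        (r["profile"].decode(), len(trace_array[r.nrow]))
        for r in profile_table.where("frequency == 200.0")
    ]
```

The link only holds as long as records and traces are always appended together and in the same order.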
With this schema you can solve your problems with, for example:
1)

    my_traces = []
    for row in profile_table.where(
            "(profile == '082311') & (frequency == 200.0)"):
        my_traces.append(trace_array[row.nrow])

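my_traces will then hold the complete set of traces. If you want to
accumulate them in a 2D array (possible here because all traces of a
given profile have the same length), you can build the list with a
comprehension and stack it with np.vstack(). np.fromiter() is less
suitable here, since it needs a dtype argument and expects scalar
items. Something like:

    my_arr = np.vstack([trace_array[row.nrow] for row in
                        profile_table.where(...)])
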
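
As a rough, PyTables-free illustration of turning equal-length traces into a raster: the list below merely stands in for the entries read back from the VLArray, and its values are invented. Since np.fromiter() builds a 1-D array from scalar items and requires a dtype, the sketch stacks with np.vstack() and shows the flatten-then-reshape route for comparison.

```python
import numpy as np

# Stand-ins for traces read from the VLArray: three traces of equal length
my_traces = [np.arange(4, dtype=np.float64) + 10 * i for i in range(3)]

# Stack the 1-D traces row by row into a 2D raster of shape (3, 4)
raster = np.vstack(my_traces)

# np.fromiter wants scalars and an explicit dtype, so flatten first,
# then reshape to one row per trace
flat = np.fromiter((v for t in my_traces for v in t), dtype=np.float64)
raster_from_iter = flat.reshape(len(my_traces), -1)
```

Both routes only work when every trace has the same length, which is the case within a single profile.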
2)
The same, but just add more conditions to the where iterator:

    profile_table.where("(frequency == X) & (lat == Y) & ...")

However, in this case you cannot collect the traces in a single
regular NumPy array, because they differ in length from profile to
profile. Use a list in this case (or a NumPy array of objects).
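
For the ragged case, a small sketch of the two containers mentioned above, using invented traces of different lengths in place of real VLArray entries:

```python
import numpy as np

# Stand-ins for traces from different profiles (lengths differ)
traces = [np.zeros(4), np.ones(6), np.full(5, 2.0)]

# Option 1: a plain Python list is already a valid container
lengths_list = [len(t) for t in traces]

# Option 2: a 1-D NumPy array of objects, one trace per element.
# Preallocating avoids NumPy trying to build a 2D array when all
# traces happen to share the same length.
ragged = np.empty(len(traces), dtype=object)
for i, t in enumerate(traces):
    ragged[i] = t

lengths_obj = [len(t) for t in ragged]
```

An object array gives NumPy-style indexing over the traces, but no vectorised arithmetic across them.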
Finally, if your table grows a lot, you can use the shiny new indexing
option so as to accelerate the queries. Look into:
http://pytables.github.com/usersguide/optimization.html#indexed-searches
for more info on how to use this.
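
A minimal sketch of an indexed query, again assuming the PyTables 3.x API names (Column.create_index, Table.read_where). The table here is a cut-down version of the one discussed above, and the file name and data are invented:

```python
import os
import tempfile

import tables


class Profile(tables.IsDescription):
    frequency = tables.Float64Col()
    latitude = tables.Float64Col()
    longitude = tables.Float64Col()
    cur_depth = tables.Float64Col()


# Hypothetical file location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "indexed.h5")

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "profiles", Profile)
    row = table.row
    for i in range(1000):
        row["frequency"] = 200.0 if i % 2 == 0 else 50.0
        row["latitude"] = 30.0 + i * 1e-4
        row["longitude"] = -97.0
        row["cur_depth"] = float(i)
        row.append()
    table.flush()

    # Build an index on the column used in the query condition
    table.cols.frequency.create_index()
    indexed = table.cols.frequency.is_indexed

    # Same condition syntax as before; the index is used transparently
    hits = table.read_where("frequency == 200.0")
    n_hits = len(hits)
    shallowest = sorted(hits["cur_depth"].tolist())[:3]
    lat_lon_depth = hits[["latitude", "longitude", "cur_depth"]]
```

On a table this small the index will not make a measurable difference; it is only worth it once the table has grown well beyond a few thousand rows.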
--
Francesc Alted
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users