On Friday 30 September 2011 15:58:25, Dharhas Pothina wrote:
> Hi,
>
> I'm trying to learn how to use PyTables to store some hydrographic
> survey data. Essentially I have the following structure:
>
> profile - fixed-size string
> time - float
> latitude - float
> longitude - float
> frequency - float
> trace - numpy array of floats (fixed size for a particular profile,
>         but changes between profiles)
> cur_depth - float
> pre_depth - float
>
> I want to aggregate this data from multiple profiles and store it in
> a single HDF5 file. Each profile has about 2-3K records and a fixed
> trace size. Later I want to be able to pull the data out in the
> following ways:
>
> 1) For a particular profile and frequency, get all trace data to form
>    a raster. For example, for profile = '082311', frequency = 200.0,
>    I would get back a 2D array of traces.
> 2) For a particular frequency, get latitude, longitude, cur_depth.
>
> I've followed the tutorial and done some googling of the mailing list,
> but I'm unsure how to represent the 'trace' numpy array within
> PyTables. Any pointers would be greatly appreciated.
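
For reference, the fixed-size fields listed above map naturally onto a PyTables table description. A minimal sketch follows (column names mirror the list above; the string size is an assumption, and the variable-length trace is deliberately left out because it needs separate handling, as the reply below explains):

    import tables

    class SurveyRecord(tables.IsDescription):
        """Fixed-size part of one survey record (illustrative sketch)."""
        profile   = tables.StringCol(16)   # fixed-size string; 16 bytes assumed
        time      = tables.Float64Col()
        latitude  = tables.Float64Col()
        longitude = tables.Float64Col()
        frequency = tables.Float64Col()
        cur_depth = tables.Float64Col()
        pre_depth = tables.Float64Col()
        # 'trace' is not a column here: its length varies between profiles,
        # so it is better kept in a separate VLArray (see the reply below).
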
2-3K records is really not too much, so I'd go for consolidating all your records in a single Table object. For the traces you could use a VLArray, which supports each entry having a different number of data elements. You can then use the row number to 'link' each Table record with the corresponding variable-length entry in the VLArray.

With this schema you can solve your problems as follows:

1) Collect the traces for the selected rows:

    my_traces = []
    for row in profile_table.where(
            "(profile == '082311') & (frequency == 200.0)"):
        my_traces.append(trace_array[row.nrow])

   In my_traces you will have the complete set of traces. In case you want to accumulate them in a single array instead (the traces within one profile all share the same length), just wrap the collected list:

    my_arr = np.array(my_traces)

   (np.fromiter() is another option, but it requires an explicit dtype and is meant for iterables of scalars, so np.array() over the collected list is simpler here.)

2) The same, but just adapt the conditions of the where iterator:

    profile_table.where("(frequency == X) & (lat == Y) & ...")

   However, in this case you should be careful: you cannot collect the traces into a regular NumPy array, because their length differs from profile to profile. Use a list in this case (or a NumPy array of objects).

Finally, if your table grows a lot, you can use the shiny new indexing option to accelerate the queries. Look into:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for more info on how to use this.

--
Francesc Alted
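
Putting the pieces of the reply together, here is a minimal end-to-end sketch using the PyTables 2.x API of the time (openFile, createTable, createVLArray, createIndex). File, node, and column names are illustrative, not from the thread, and only the columns exercised by the queries are declared:

    import numpy as np
    import tables

    # Only the columns used below; the full record layout is sketched
    # after the original question.  All names here are illustrative.
    description = {
        'profile':   tables.StringCol(16),
        'frequency': tables.Float64Col(),
        'latitude':  tables.Float64Col(),
        'longitude': tables.Float64Col(),
        'cur_depth': tables.Float64Col(),
    }

    h5 = tables.openFile("surveys.h5", mode="w")
    profile_table = h5.createTable("/", "profiles", description,
                                   "Survey records")
    trace_array = h5.createVLArray("/", "traces", tables.Float64Atom(),
                                   "One variable-length trace per table row")

    # Append a record together with its trace: the trace is stored at the
    # same row number as the table entry, which is the implicit 'link'
    # between the two nodes.
    row = profile_table.row
    row['profile'] = '082311'
    row['frequency'] = 200.0
    row['latitude'] = 30.0
    row['longitude'] = -95.0
    row['cur_depth'] = 1.5
    row.append()
    trace_array.append(np.zeros(512, dtype=np.float64))  # dummy trace
    profile_table.flush()

    # 1) All traces for one profile and frequency (same length within a
    #    profile, so they can be stacked into a 2D array).
    traces = np.array([trace_array[r.nrow] for r in
                       profile_table.where("(profile == '082311') & "
                                           "(frequency == 200.0)")])

    # 2) Positions and depths for one frequency.
    coords = [(r['latitude'], r['longitude'], r['cur_depth'])
              for r in profile_table.where("frequency == 200.0")]

    # Speed up queries on large tables by indexing the queried columns.
    profile_table.cols.profile.createIndex()
    profile_table.cols.frequency.createIndex()

    h5.close()
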