Re: [Pytables-users] Advice for organizing data

Francesc Alted Tue, 07 Oct 2008 11:14:20 -0700

On Tuesday 07 October 2008, Mico Filós wrote:
> Thanks Francesc for the information.
>
> I am a little puzzled, though. Isn't VLArray intended to represent
> rows with a variable number of elements (ragged tables)?
> In my case, what is variable is the number of rows (records), not the
> number of columns; records have all fixed length.


That's right, but you asked about including sub-tables with a variable 
number of rows inside another table, and that is something PyTables 
doesn't support.

> In such case, is it possible to fill the entries of a table with
> (sub)tables? [Fig. 2 of Dragan's article]

It depends on the amount of data you have.  If I understand correctly, 
Dragan's approach is to create a *different* Table object for every set 
of parameters.  This is perfectly feasible if you have a small number of 
distinct parameter sets (I mean, counting all the *possible* 
combinations of parameters).  However, if you have a lot of different 
sets of parameters (say, more than 10000), then you are going to need to 
create a lot of tables to hold your observations, and this is not 
terribly efficient with PyTables/HDF5.

If you have lots of possible combinations of parameters, another 
possibility would be to set up a single table with the following 
schema:

A      B      C   ntrial   trajectory
-------------------------------------------
a1    b1    c1    1         traj_array1
a1    b1    c1    2         traj_array2
a2    b1    c1    3         traj_array3
a2    b1    c1    4         traj_array4

i.e. the values in columns A, B and C would identify the combination of 
parameters that each observation belongs to.  This would let you 
consolidate all your observations in a single table, and it is normally 
handier and more efficient to retrieve data from a single table than 
from several.  Moreover, you can minimize the overhead of the 
duplicated entries in the A, B and C columns by turning compression on.
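As a rough sketch of that single-table layout (not tested against your 
data): the code below uses the current PyTables 3.x names (open_file, 
create_table, read_where) rather than the camelCase ones available in 
2008.  The file name, the float parameter columns, the zlib level and 
the fixed trajectory length TRAJ_LEN are all assumptions made up for 
the illustration.

```python
import os
import tempfile

import numpy as np
import tables as tb

TRAJ_LEN = 100  # assumption: every trajectory has the same fixed length


class Observation(tb.IsDescription):
    # One row per trial; A, B, C repeat for every trial of a parameter set.
    A = tb.Float64Col(pos=0)
    B = tb.Float64Col(pos=1)
    C = tb.Float64Col(pos=2)
    ntrial = tb.Int32Col(pos=3)
    trajectory = tb.Float64Col(shape=(TRAJ_LEN,), pos=4)


# Hypothetical file location, only for this example.
path = os.path.join(tempfile.mkdtemp(), "observations.h5")

# Compression absorbs most of the cost of the repeated A, B, C values.
filters = tb.Filters(complevel=5, complib="zlib")

with tb.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "observations", Observation, filters=filters)

    row = table.row
    ntrial = 0
    for a in (0.1, 0.2):       # two parameter sets differing only in A
        for _ in range(2):     # two trials per parameter set
            ntrial += 1
            row["A"] = a
            row["B"] = 1.0
            row["C"] = 2.0
            row["ntrial"] = ntrial
            row["trajectory"] = np.random.rand(TRAJ_LEN)
            row.append()
    table.flush()

    # Retrieve every trial recorded for one parameter combination.
    selected = table.read_where(
        "(A == a_val) & (B == b_val) & (C == c_val)",
        condvars={"a_val": 0.2, "b_val": 1.0, "c_val": 2.0},
    )
    trials = selected["ntrial"].tolist()
```

Here `selected` is a NumPy structured array, so selected["trajectory"] 
gives all the trajectories of that parameter set as one 2-D array.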

Another alternative would be to create a couple of tables, one for 
parameters and the other for observations.  The one-to-many relation 
between them could be established through a single additional VLArray 
dataset.  So, for each row in the parameter table, there would be a row 
of values (with variable length) in the VLArray listing the rows of the 
observation table that belong to that parameter set.
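A minimal sketch of this two-table arrangement might look as follows.  
Again it uses PyTables 3.x names (create_vlarray, read_coordinates), 
and the node names, column types and TRAJ_LEN are assumptions for the 
sake of the example, not anything taken from your setup.

```python
import os
import tempfile

import numpy as np
import tables as tb

TRAJ_LEN = 100  # assumption: fixed trajectory length


class ParameterSet(tb.IsDescription):
    A = tb.Float64Col(pos=0)
    B = tb.Float64Col(pos=1)
    C = tb.Float64Col(pos=2)


class Observation(tb.IsDescription):
    ntrial = tb.Int32Col(pos=0)
    trajectory = tb.Float64Col(shape=(TRAJ_LEN,), pos=1)


# Hypothetical file location, only for this example.
path = os.path.join(tempfile.mkdtemp(), "linked.h5")

with tb.open_file(path, mode="w") as h5:
    params = h5.create_table("/", "parameters", ParameterSet)
    obs = h5.create_table("/", "observations", Observation)
    # Row i of this VLArray lists the observation rows of parameter row i.
    obs_rows = h5.create_vlarray("/", "obs_rows", atom=tb.Int64Atom())

    obs_row = obs.row
    next_row = 0  # index of the next row to be written in `obs`
    for a, n_trials in ((0.1, 2), (0.2, 3)):
        params.append([(a, 1.0, 2.0)])
        rows_for_set = []
        for _ in range(n_trials):
            obs_row["ntrial"] = next_row + 1
            obs_row["trajectory"] = np.random.rand(TRAJ_LEN)
            obs_row.append()
            rows_for_set.append(next_row)
            next_row += 1
        obs_rows.append(rows_for_set)
    params.flush()
    obs.flush()

    # Fetch all observations belonging to the second parameter set.
    coords = obs_rows[1]
    second_set = obs.read_coordinates(coords)
    trials = second_set["ntrial"].tolist()
```

The bookkeeping of row numbers (next_row, rows_for_set) is exactly the 
extra work that the single-table layout avoids.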

But, as you can see, the latter option is a bit more work than the first 
one, so my advice is to try the single table first and let 
PyTables/compression do the dirty work for you.

Hope things are clearer this time,

-- 
Francesc Alted
Freelance developer
Tel +34-964-282-249

