On Fri, Mar 18, 2011 at 1:20 AM, Dhananjaya <dhanush...@yahoo.com> wrote:

>
>
> Thank you, Francesc and Anthony.
>
>    Now I better understand the PyTables objective.
> But Anthony, can you be more specific? Where exactly can I see the gain in
> Python? We are trying to switch from C to Python with PyTables, because of
> Python's introspection capability and lower maintenance overhead (fewer
> lines of code). How else can I represent my data so that my I/O performance
> with HDF5 improves?
>

Basically, you will see I/O gains by keeping your data in fewer, but larger,
datasets.  For example, 10 arrays of 1 million elements will perform far
better than 1 million arrays of 10 elements.
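
If you want to check this on your own machine, here is a minimal sketch of
such a comparison.  It assumes PyTables 3.x (the snake_case names such as
open_file and create_array) and NumPy; the helper names, node names and sizes
are made up for illustration, and I am not quoting any timings since they
depend on your hardware and HDF5 build.

```python
import os
import tempfile
import time

import numpy as np
import tables


def write_many_small(path, n_arrays, length):
    # One HDF5 dataset per small array: n_arrays separate nodes under the root.
    with tables.open_file(path, mode="w") as h5:
        for i in range(n_arrays):
            h5.create_array("/", f"a{i}", np.zeros(length))


def write_few_large(path, n_arrays, length):
    # The same number of elements stored as a single 2-D dataset.
    with tables.open_file(path, mode="w") as h5:
        h5.create_array("/", "data", np.zeros((n_arrays, length)))


def read_total(path):
    # Read every Array node back and count the elements that were read.
    total = 0
    with tables.open_file(path, mode="r") as h5:
        for node in h5.walk_nodes("/", classname="Array"):
            total += node.read().size
    return total


def timed(func, *args):
    # Return (elapsed seconds, result) for a single call of func.
    start = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "many_small.h5")
        large = os.path.join(tmp, "few_large.h5")
        write_many_small(small, 1000, 10)
        write_few_large(large, 1000, 10)
        print("many small arrays: %.4f s" % timed(read_total, small)[0])
        print("one large array:   %.4f s" % timed(read_total, large)[0])
```

Both files hold the same 10,000 values; only the number of datasets differs,
so any difference in read time comes from the per-dataset overhead.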

So if possible, I would consolidate your datasets.  How you go about doing
this depends highly on the structure of the data that you have.  One trick I
sometimes use is that I have a single 'info' Table that contains metadata
about the rows of (multiple) other data sets.   For example, say that you
have weather data coming from multiple different locations, all sampled at
possibly different times.  You could have the following structure:

Station1 (Group)
  |- times (Array)
  |- temperature (Array)
  |- windspeed (Array)
  |- ...
Station2 (Group)
  |- times (Array)
  |- temperature (Array)
  |- windspeed (Array)
  |- ...
Station3 (Group)
  |- times (Array)
  |- temperature (Array)
  |- windspeed (Array)
  |- ...
...

But a better way to do it might be to have an info table, plus only one,
much larger array for each measurement type:

station_times (Table):
   |- station (StringCol)
   |- time (Time64Col)
temperatures (Array)
windspeed (Array)
...

where the indices of the station_times table match the indices of the data
arrays.
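
To make that layout a bit more tangible, here is a rough sketch of how it
could be written and queried.  It assumes PyTables 3.x naming (open_file,
create_table, get_where_list) and NumPy.  The 16-byte station name width, the
sample values and the build_and_query helper are hypothetical, chosen only for
the example.

```python
import os
import tempfile

import numpy as np
import tables


class StationTime(tables.IsDescription):
    station = tables.StringCol(16)  # station name; 16-byte width is an assumption
    time = tables.Time64Col()       # sample time, seconds since the epoch


def build_and_query(path, wanted=b"Station1"):
    # Made-up samples: row i of the info table describes temps[i].
    stations = [b"Station1", b"Station2", b"Station1"]
    times = [1300400000.0, 1300400060.0, 1300400120.0]
    temps = np.array([11.5, 9.0, 12.25])

    with tables.open_file(path, mode="w") as h5:
        info = h5.create_table("/", "station_times", StationTime)
        row = info.row
        for name, stamp in zip(stations, times):
            row["station"] = name
            row["time"] = stamp
            row.append()
        info.flush()
        # One large array per measurement type, aligned with the info table.
        h5.create_array("/", "temperatures", temps)

    with tables.open_file(path, mode="r") as h5:
        info = h5.root.station_times
        # Row numbers of the info table that belong to the wanted station.
        idx = info.get_where_list("station == name", condvars={"name": wanted})
        # The same row numbers select the matching temperature values.
        return h5.root.temperatures.read()[idx]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_and_query(os.path.join(tmp, "weather.h5")))
```

The query runs against the small info table only, and the resulting row
numbers are then used to pick values out of the big array.  For arrays too
large to read in one go you would slice the on-disk node instead of calling
read() on all of it.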

I hope this is concrete enough...
Be Well
Anthony

> Regards
> Dhananjaya
>
>
>
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>