On Wednesday 22 September 2010 20:57:06, Josh Ayers wrote:
> For your reference, the application is a large data acquisition
> system with around 10,000 total channels and many different data
> types. The data is currently stored in a National Instruments
> semi-proprietary file format, split over a few files with several
> thousand channels each. I'm looking into using HDF5 as an
> alternative, mostly for my own personal use. The NI file format is
> poorly documented and difficult to use without their expensive
> software.
>
> An important feature is accessing the columns by name, which is why
> it seems a table would work well. I don't think multi-dimensional
> columns would work, for that reason.
>
> You indicated in the Trac ticket that there was a workaround for the
> HDF5 limitation. Is there anything I need to do to utilize that
> workaround? I'll be manually filling in all the values for each row
> before appending it to the table, so I don't need to use any default
> values.
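As context for the name-based column access mentioned above, here is a minimal sketch of how that looks in PyTables. The file, table, and column names are invented for illustration; this is not code from the original thread.

```python
# A minimal sketch of name-based column access on a PyTables Table.
# File, table, and column names here are invented for illustration.
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), "acq.h5")

with tables.open_file(path, mode="w") as h5:
    # A dict description; `pos` fixes the column order.
    desc = {"pressure": tables.Float64Col(pos=0),
            "temperature": tables.Float64Col(pos=1)}
    table = h5.create_table("/", "channels", desc)

    # Fill every field of each row manually, then append -- no
    # default values are relied upon.
    row = table.row
    for p, t in [(1.0, 20.5), (2.0, 21.0)]:
        row["pressure"] = p
        row["temperature"] = t
        row.append()
    table.flush()

    # Columns are addressed by name, both for bulk reads and queries.
    pressures = table.read(field="pressure")
    hot = [r["pressure"] for r in table.where("temperature > 20.7")]
```

`table.read(field=...)` pulls a single column out as a NumPy array, and `table.where()` takes a condition string over the named columns.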
I don't think so. The fundamental problem here seems to be a limitation
on the HDF5 type size (64 KB). Perhaps you could report that to the
hdf-forum list so that the HDF crew may raise the priority of fixing
this limitation.

> Another option would be to split the data over several tables. Then
> I could either have a separate index table that lists which column
> is located in which table, or just have my code search each table
> until it finds the desired column. The downside to this approach is
> that I lose the ability to do tables.where() searches on multiple
> columns if they appear in different tables, but I don't think that's
> too much of a problem. If I were to do this, do you have a
> recommendation for the number of columns per table? By default,
> PyTables gives a warning if there are more than 512 columns. Does
> performance start to degrade above this number?

PyTables' Table objects are stored row-wise, so if the number of
columns per table grows too large, a lot of data has to be retrieved
from disk even when you are interested in the contents of only one
column. Hence it is definitely wise to keep the number of columns as
low as possible. 512 is a somewhat arbitrary figure, and the
degradation does not start there; rather, it is progressive (i.e. it
grows with the number of columns), unless you need *all* the column
data during queries.

Hope this helps,

--
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
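The split-tables approach discussed in the thread can be sketched as follows. This is an illustrative sketch, not code from either correspondent: the block sizes, channel names, and the plain-dict index are all assumptions, and in practice the index could itself be stored in the file as a small table.

```python
# A sketch of splitting many channels across several tables, with a
# plain dict as the column-name -> table index. All names and sizes
# here are invented for illustration.
import os
import tempfile

import tables

COLS_PER_TABLE = 256  # stay well below the 512-column warning

channel_names = ["ch%04d" % i for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "split.h5")
col_index = {}  # column name -> path of the table holding it

with tables.open_file(path, mode="w") as h5:
    for blk, start in enumerate(range(0, len(channel_names),
                                      COLS_PER_TABLE)):
        cols = channel_names[start:start + COLS_PER_TABLE]
        desc = {name: tables.Float64Col(pos=i)
                for i, name in enumerate(cols)}
        table = h5.create_table("/", "block%d" % blk, desc)
        for name in cols:
            col_index[name] = table._v_pathname
        # One sample row per table so there is something to read back.
        row = table.row
        for name in cols:
            row[name] = float(int(name[2:]))
        row.append()
        table.flush()

def read_column(h5file, name):
    """Look up which table holds `name`, then read just that column."""
    return h5file.get_node(col_index[name]).read(field=name)

with tables.open_file(path, mode="r") as h5:
    vals = read_column(h5, "ch0300")  # lives in /block1
```

Each table's row is only `COLS_PER_TABLE * 8` bytes here, comfortably under the 64 KB HDF5 type-size limit, and a single-column read touches only the one table that holds it.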