> > Here's a simpler code snippet to reproduce the error. It appears
> > there is a maximum number of columns in a table, and it depends on
> > the data type in an unusual way (at least to me). All floats have
> > one limit and all integers have another limit, regardless of the bit
> > size. I didn't test strings or booleans.
> >
> > 1092 for floats (both 32-bit and 64-bit)
> > 1260 for ints and unsigned ints (both 32-bit and 64-bit)
> >
> > I did some more searching of the email list archives, and I found
> > ticket #211 which describes a similar problem. It was caused by a
> > limitation in the HDF5 library. Could this be the same issue?
>
> Most probably yes. I'm getting this message in the HDF5 stack error:
>
> #014: H5Oalloc.c line 1135 in H5O_alloc(): object header message is too
> large
>
> So, yeah, it seems that HDF5 has not fixed that yet. Until they
> address it, could multidimensional columns be a solution for you?
>
> --
> Francesc Alted
>
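In case it's useful for the archives, the snippet I mentioned above boils
down to something like the following (the column names and exact count are
just illustrative, using the PyTables 2.x createTable API):

    import tables

    # One Float64 column per channel.  With the numbers above, 1092 float
    # columns work and 1093 fail with "object header message is too large".
    ncols = 1093
    desc = dict(('ch%04d' % i, tables.Float64Col(pos=i))
                for i in range(ncols))

    h5file = tables.openFile('many_columns.h5', mode='w')
    table = h5file.createTable('/', 'data', desc)   # HDF5 error raised here
    h5file.close()
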
For your reference, the application is a large data acquisition system with
around 10,000 total channels and many different data types. The data is
currently stored in a National Instruments semi-proprietary file format, and
it's split over a few files with several thousand channels each. I'm
looking into using HDF5 as an alternative, mostly for my own personal use.
The NI file format is poorly documented and difficult to use without their
expensive software.
An important feature is accessing the columns by name, which is why it seems
a table would work well. I don't think multi-dimensional columns would work
for that reason.
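For example, something along these lines is what I have in mind (the channel
names are made up; the real description would have one Col per channel):

    import tables

    class Channels(tables.IsDescription):
        time         = tables.Float64Col(pos=0)
        engine_rpm   = tables.Float32Col(pos=1)
        coolant_temp = tables.Float32Col(pos=2)
        # ... one column per channel ...

    h5file = tables.openFile('daq.h5', mode='w')
    table = h5file.createTable('/', 'channels', Channels)

    # Later, pull out a single channel by name:
    rpm = table.col('engine_rpm')           # whole column as a NumPy array
    rpm = table.read(field='engine_rpm')    # equivalent
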
You indicated in the Trac ticket that there was a workaround for the HDF5
limitation. Is there anything I need to do to use that workaround?
I'll be manually filling in all the values for each row before appending it
to the table, so I don't need to use any default values.
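In other words, the write loop would look roughly like this, continuing the
sketch above (data_source() is just a stand-in for whatever reads the
hardware):

    row = table.row
    for t, rpm, temp in data_source():    # hypothetical DAQ read loop
        row['time'] = t
        row['engine_rpm'] = rpm
        row['coolant_temp'] = temp
        row.append()                      # every field set explicitly
    table.flush()
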
Another option would be to split the data over several tables. Then I could
either have a separate index table that lists which column is located in
which table, or just have my code search each table until it finds the
desired column (a rough sketch of the lookup is below). The downside to
this approach is that I lose the ability to do Table.where() searches on
multiple columns if they appear in different tables, but I don't think
that's too much of a problem. If I were to do this, do you have a
recommendation for the number of columns per table? By
default PyTables gives a warning if there are more than 512 columns. Does
performance start to degrade above this number?
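For the index-table idea, the lookup could be as simple as this (a sketch,
assuming the channels have already been split into /block0, /block1, ...
tables of a few hundred columns each):

    import tables

    h5file = tables.openFile('daq_split.h5', mode='r')

    # Build a channel-name -> table-path index once by walking the file.
    index = {}
    for table in h5file.walkNodes('/', classname='Table'):
        for name in table.colnames:
            index[name] = table._v_pathname

    def read_channel(name):
        """Find the table that holds `name` and read that column."""
        table = h5file.getNode(index[name])
        return table.read(field=name)

    rpm = read_channel('engine_rpm')
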
Thanks again for your quick response,
Josh Ayers