Re: [Pytables-users] Data Format Suggestions

Glenn Sat, 03 May 2008 02:39:23 -0700

Francesc Alted <falted <at> pytables.org> writes:

> 
> On Friday 02 May 2008, Glenn wrote:
> > Hello,
> > I would like to use pytables to store the output from a spectrometer.
> > The spectra come in at a rapid rate. I am having trouble
> > understanding how to set up a data structure for the data. The two
> > options that seem reasonable are an EArray and a Table. The example
> > shown for an EArray leaves me wondering how to make an array of 
> > numpy 1D array rows that I can dynamically add to.
> 
> If all the data you want to save is homogeneous, using an EArray is ok.  
> See below an example of use:
> 
> import numpy, tables
> N = 10  # your 1D array length
> f = tables.openFile("test.h5", "w")
> e = f.createEArray(f.root, 'earray', tables.FloatAtom(), (0,N), 'test')
> for i in xrange(10):
>     e.append([numpy.random.rand(N)])
> f.close()
> 
> > With a Table, I 
> > tried setting up an IsDescription subclass but could not figure out
> > how to add a member to again represent a 1D array.
> 
> Generally speaking, a Table is best for saving heterogeneous datasets. 
> In addition, the I/O is buffered in PyTables space (and not only in 
> HDF5) and it is generally faster than using an EArray, so it may be 
> better suited to your case.
> 
> Representing a 1D column is as easy as passing a 'shape=(N,)'  argument 
> to your 1D columns.  Look at this example:
> 
> import numpy, tables
> N = 10  # your 1D array length
> class TTable(tables.IsDescription):
>     col1 = tables.Int32Col(pos=0)
>     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> f = tables.openFile("test.h5", "w")
> t = f.createTable(f.root, 'table', TTable, 'table test')
> for i in xrange(10):
>     t.append([[i, numpy.random.rand(N)]])
> t.flush()
> f.close()
> 
> Hope that helps,
>



Thank you for the help, I got it working with a Table now.
I have a couple of new questions:
My table has a column holding a 1000-element 1D numpy array. I would like to do
the following types of operations, treating this column as an N x 1000 2D array,
call it X:
mean(X,axis=0)

std(X[k].reshape((k, N/k)))

In the mean case, I could imagine doing something like:
m = zeros((1,1000))
for row in X:
  m = m + row
m = m/N
But it seems like this will be slow. I tried just numpy.mean(X) out of
curiosity, but it took forever and finally ran out of memory. I assume it was
forming a copy of the array in memory.

Thanks again for the help!
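
The row-by-row accumulation described above can also be done a block of rows at a time, which keeps memory use bounded while avoiding a Python-level loop over every row. The following is only a sketch under stated assumptions: `chunked_column_mean` and `chunk_rows` are hypothetical names, not part of PyTables, and the function assumes only that its argument supports `len()` and slice reads that return array-like blocks. A plain NumPy array stands in for the table column here.

```python
import numpy as np


def chunked_column_mean(column, chunk_rows=1024):
    """Mean over axis 0 of an (nrows, width) dataset, read chunk_rows rows at a time."""
    nrows = len(column)
    if nrows == 0:
        raise ValueError("cannot take the mean of an empty column")
    total = None
    for start in range(0, nrows, chunk_rows):
        # Slicing pulls only this block of rows into memory.
        block = np.asarray(column[start:start + chunk_rows], dtype=np.float64)
        block_sum = block.sum(axis=0)
        total = block_sum if total is None else total + block_sum
    return total / nrows


# Stand-in data (assumption): 5000 rows of 1000 samples each.
X = np.random.rand(5000, 1000)
m = chunked_column_mean(X, chunk_rows=512)
```

With a real table, one would presumably pass the column accessor (for example `t.cols.col2` for the table defined in the quoted example) instead of a NumPy array, on the assumption that it supports `len()` and slicing in the same way; that part is untested here. A larger `chunk_rows` trades memory for fewer reads.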


