Francesc Alted <falted <at> pytables.org> writes:

> > >
> > > Representing a 1D column is as easy as passing a 'shape=(N,)' 
> > > argument to your 1D columns.  Look at this example:
> > >
> > > import numpy
> > > import tables
> > >
> > > N = 10  # your 1D array length
> > > class TTable(tables.IsDescription):
> > >     col1 = tables.Int32Col(pos=0)
> > >     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> > > f = tables.openFile("test.h5", "w")
> > > t = f.createTable(f.root, 'table', TTable, 'table test')
> > > for i in xrange(10):
> > >     t.append([[i, numpy.random.rand(N)]])
> > > t.flush()
> > > f.close()
> > >
> > > Hope that helps,
> >
> > Thank you for the help; I got it working with a Table now.
> > I have a couple of new questions:
> > My table has a column holding a 1000-element 1D numpy array. I would
> > like to do the following kinds of operations, treating this
> > column as an N x 1000 2D array, call it X:
> > mean(X,axis=0)
> >
> > std(X[k].reshape((k, N/k)))
> >
> > In the mean case, I could imagine doing something like:
> > m = zeros((1,1000))
> > for row in X:
> >   m = m + row
> > m/N
> > But it seems like this will be slow. I tried just numpy.mean(X) out
> > of curiosity, but it took forever and finally ran out of memory. I
> > assume it was forming a copy of the array in memory.
> 
> Can you be a bit more explicit about how you are building X?  A
> self-contained code example, with timings, is always nice to have.
> 
> Cheers,
> 

I am building X just as you suggested:
Setting up:

    desc = {'AccNumber': tables.Int32Col(),
            'dataI': tables.Float32Col(shape=(512,)),
            'dataQ': tables.Float32Col(shape=(512,))}
    self.table = self.fileh.createTable(self.fileh.root,
                                        'SpectrometerTimeSeries', desc, '')
    self.fileh.setNodeAttr(self.fileh.root, 'StartTime', time.asctime())

Write data each iteration:
    self.table.row['AccNumber'] = data['lastAccNum']
    self.table.row['dataI'] = dataA
    self.table.row['dataQ'] = dataB
    self.table.row.append()

Periodically flush the data:
    if now - self.LastUpdateTime > self.UpdatePeriod:
        self.table.flush()

Writing the data is indeed very fast.
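
For reference, here is a minimal self-contained version of the same write
path; the file name, the flush period, and the random stand-in data are all
made up for illustration:

import time
import numpy
import tables

N_CHANNELS = 512
FLUSH_PERIOD = 5.0  # seconds between flushes; made-up value

fileh = tables.openFile("spectrometer.h5", "w")
desc = {'AccNumber': tables.Int32Col(),
        'dataI': tables.Float32Col(shape=(N_CHANNELS,)),
        'dataQ': tables.Float32Col(shape=(N_CHANNELS,))}
table = fileh.createTable(fileh.root, 'SpectrometerTimeSeries', desc, '')
fileh.setNodeAttr(fileh.root, 'StartTime', time.asctime())

last_flush = time.time()
for acc_num in xrange(1000):  # stand-in for the real acquisition loop
    table.row['AccNumber'] = acc_num
    table.row['dataI'] = numpy.random.rand(N_CHANNELS)  # cast to float32
    table.row['dataQ'] = numpy.random.rand(N_CHANNELS)
    table.row.append()
    now = time.time()
    if now - last_flush > FLUSH_PERIOD:
        table.flush()
        last_flush = now

table.flush()
fileh.close()

Appending row by row and flushing periodically keeps memory use flat while
the file grows.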

I just tried timing the following:

table = fh.root.SpectrometerTimeSeries

def test():
    tic = time.time()
    m = np.zeros(512)
    for x in table.iterrows():
        m += x['dataI']  # accumulate the running sum, one row at a time
    print time.time() - tic
    return m / table.nrows  # divide by the row count to get the mean

and it took 82 seconds for my table of 1.35 million rows (1.35e6 rows x 512
float32 values x 4 bytes is roughly 2.6 GiB of dataI alone), which works out
to ~33 MB per second; not too bad. I guess my dataset was just larger than I
had realized. I would still appreciate any comments on the above code, to
check that I am doing things correctly.
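
For comparison, here is a sketch that reads the column in larger chunks
rather than row by row; it assumes Table.read() accepts a field argument to
fetch a single column (as in PyTables 2.x), and the 10000-row block size is
arbitrary:

import time
import numpy as np

CHUNK = 10000  # arbitrary block size; tune to taste

def chunked_mean(table, field='dataI'):
    tic = time.time()
    total = np.zeros(512, dtype=np.float64)
    nrows = table.nrows
    for start in xrange(0, nrows, CHUNK):
        stop = min(start + CHUNK, nrows)
        # read() with field= returns a (stop - start, 512) array
        block = table.read(start, stop, field=field)
        total += block.sum(axis=0)
    print time.time() - tic
    return total / nrows

Pulling thousands of rows per call amortizes the per-row overhead of
iterrows(), so the loop should end up closer to I/O-bound.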

Thanks again,
Glenn



