Francesc Alted <falted <at> pytables.org> writes:
> > >
> > > Representing a 1D column is as easy as passing a 'shape=(N,)'
> > > argument to your 1D columns. Look at this example:
> > >
> > > import numpy
> > > import tables
> > >
> > > N = 10  # your 1D array length
> > > class TTable(tables.IsDescription):
> > >     col1 = tables.Int32Col(pos=0)
> > >     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> > > f = tables.openFile("test.h5", "w")
> > > t = f.createTable(f.root, 'table', TTable, 'table test')
> > > for i in xrange(10):
> > >     t.append([[i, numpy.random.rand(N)]])
> > > t.flush()
> > > f.close()
> > >
> > > Hope that helps,
> >
> > Thank you for the help, I got it working with a Table now.
> > I have a couple of new questions:
> > My table has a column holding a 1000-element 1D numpy array. I would
> > like to do the following kinds of operations, where I treat this
> > column as an N x 1000 2D array, call it X:
> > mean(X,axis=0)
> >
> > std(X[k].reshape((k, N/k)))
> >
> > In the mean case, I could imagine doing something like:
> > m = zeros((1, 1000))
> > for row in X:
> >     m = m + row
> > m / N
> > But it seems like this will be slow. I tried just numpy.mean(X) out
> > of curiosity, but it took forever and finally ran out of memory. I
> > assume it was forming a copy of the array in memory.
>
> Can you be more explicit about how you are building X? A self-contained
> code example, with timings, is always nice to have.
>
> Cheers,
>
I am building X just as you suggested.

Setting up:

    desc = {'AccNumber': tables.Int32Col(),
            'dataI': tables.Float32Col(shape=(512,)),
            'dataQ': tables.Float32Col(shape=(512,))}
    self.table = self.fileh.createTable(self.fileh.root,
                                        'SpectrometerTimeSeries', desc, '')
    self.fileh.setNodeAttr(self.fileh.root, 'StartTime', time.asctime())
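(One thing that may be worth trying here, sketch only: createTable()
accepts an expectedrows hint, which lets PyTables pick a better HDF5
chunk size for large tables. The 1350000 figure below is just an
assumption based on the row count mentioned further down.)

    # Sketch: expectedrows hints at the final table size so PyTables
    # can choose a sensible chunk shape; 1350000 is an assumed value.
    self.table = self.fileh.createTable(self.fileh.root,
                                        'SpectrometerTimeSeries', desc, '',
                                        expectedrows=1350000)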
Write data each iteration:

    self.table.row['AccNumber'] = data['lastAccNum']
    self.table.row['dataI'] = dataA
    self.table.row['dataQ'] = dataB
    self.table.row.append()
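(As an aside: table.row returns the same buffered Row instance on every
access, so fetching it once is equivalent and avoids repeated attribute
lookups in a tight loop. A minimal sketch of the same step:)

    # Sketch: table.row is a single buffered Row object, so it can be
    # bound to a local name and reused.
    row = self.table.row
    row['AccNumber'] = data['lastAccNum']
    row['dataI'] = dataA
    row['dataQ'] = dataB
    row.append()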
Periodically flush the data:

    if now - self.LastUpdateTime > self.UpdatePeriod:
        self.table.flush()
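(For completeness, the check presumably also resets the timestamp after
flushing; a sketch of the whole pattern, assuming now comes from
time.time() and LastUpdateTime/UpdatePeriod are your own bookkeeping:)

    import time

    now = time.time()
    # Flush buffered rows to disk at most once per UpdatePeriod seconds.
    if now - self.LastUpdateTime > self.UpdatePeriod:
        self.table.flush()
        self.LastUpdateTime = now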
Writing the data is indeed very fast.
I just tried timing the following:

    import time
    import numpy as np

    table = fh.root.SpectrometerTimeSeries

    def test():
        tic = time.time()
        m = np.zeros(512)
        for x in table.iterrows():
            m += x['dataI']
        print time.time() - tic
It took 82 seconds for my table of 1.35 million rows. Reading one
512-element float32 column, that is about 2.8 GB, or ~33 MB per second,
which is not too bad. I guess my dataset was just larger than I had
realized. I would still appreciate any comments on the above code, in
case I am not doing things correctly.
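(One comment on the read side, which also bears on the earlier
mean(X, axis=0) question: iterating row by row pays a per-row overhead,
while reading the column back in blocks with Table.read(field=...)
keeps memory bounded and usually runs faster. A sketch only; the
blocked_mean name and the 10000-row block size are my own choices:)

    import numpy as np

    def blocked_mean(table, field='dataI', blocksize=10000):
        # Out-of-core column mean: only `blocksize` rows are resident
        # in memory at any one time.
        total = np.zeros(512, dtype=np.float64)
        nrows = table.nrows
        for start in xrange(0, nrows, blocksize):
            stop = min(start + blocksize, nrows)
            # Table.read with field= returns just that column as a
            # (stop - start, 512) numpy array.
            block = table.read(start, stop, field=field)
            total += block.sum(axis=0)
        return total / nrows

    m = blocked_mean(fh.root.SpectrometerTimeSeries)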
Thanks again,
Glenn