Francesc Alted <falted <at> pytables.org> writes:

> > > Representing a 1D column is as easy as passing a 'shape=(N,)'
> > > argument to your 1D columns. Look at this example:
> > >
> > >     import numpy
> > >     import tables
> > >
> > >     N = 10  # your 1D array length
> > >
> > >     class TTable(tables.IsDescription):
> > >         col1 = tables.Int32Col(pos=0)
> > >         col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> > >
> > >     f = tables.openFile("test.h5", "w")
> > >     t = f.createTable(f.root, 'table', TTable, 'table test')
> > >     for i in xrange(10):
> > >         t.append([[i, numpy.random.rand(N)]])
> > >     t.flush()
> > >     f.close()
> > >
> > > Hope that helps,
> >
> > Thank you for the help; I got it working with a Table now.
> > I have a couple of new questions:
> >
> > My table has a column holding a 1000-element 1D numpy array. I would
> > like to do the following kinds of operations, where I treat this
> > column as an N x 1000 2D array, call it X:
> >
> >     mean(X, axis=0)
> >
> >     std(X[k].reshape((k, N/k)))
> >
> > In the mean case, I could imagine doing something like:
> >
> >     m = zeros((1, 1000))
> >     for row in X:
> >         m = m + row
> >     m / N
> >
> > But it seems like this will be slow. I tried just numpy.mean(X) out
> > of curiosity, but it took forever and finally ran out of memory. I
> > assume it was forming a copy of the array in memory.
>
> Can you be more explicit on how you are building X? A self-contained
> code example, with timings, is always nice to have.
>
> Cheers,
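(For concreteness, here is a minimal, self-contained sketch of the accumulate-and-divide loop described in the quoted message, reusing the file, table, and column names from Francesc's example above; the float64 accumulator and the use of iterrows() are my assumptions, not part of the original posts.)

    import numpy
    import tables

    # Sketch: column-wise mean of an array column, computed out of core.
    # Assumes the file/table from the example above: "test.h5" holding a
    # table 'table' whose 'col2' cells are N-element float64 arrays.
    N = 10
    f = tables.openFile("test.h5", "r")
    t = f.root.table

    total = numpy.zeros(N, dtype=numpy.float64)
    for row in t.iterrows():       # streams one row at a time from disk
        total += row['col2']
    mean = total / t.nrows         # the mean(X, axis=0) of the question
    print mean
    f.close()

Because iterrows() streams rows from disk, only one N-element cell needs to be in memory at a time, unlike numpy.mean() applied to a fully materialized X.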
I am building X just as you suggested.

Setting it up:

    desc = {'AccNumber': tables.Int32Col(),
            'dataI': tables.Float32Col(shape=(512,)),
            'dataQ': tables.Float32Col(shape=(512,))}
    self.table = self.fileh.createTable(self.fileh.root,
                                        'SpectrometerTimeSeries', desc, '')
    self.fileh.setNodeAttr(self.fileh.root, 'StartTime', time.asctime())

Writing data on each iteration:

    self.table.row['AccNumber'] = data['lastAccNum']
    self.table.row['dataI'] = dataA
    self.table.row['dataQ'] = dataB
    self.table.row.append()

Periodically flushing the data:

    if now - self.LastUpdateTime > self.UpdatePeriod:
        self.table.flush()

Writing the data is indeed very fast. I just tried timing the following:

    table = fh.root.SpectrometerTimeSeries

    def test():
        tic = time.time()
        m = np.zeros(512)
        for x in table.iterrows():   # accumulate the column sum row by row
            m += x['dataI']
        print time.time() - tic

and it took 82 seconds for my table with 1.35 million rows, which works
out to ~33 MB per second -- not too bad. I guess my dataset was just
larger than I had realized.

I would still appreciate any comments on the above code and on whether I
am doing things correctly.

Thanks again,
Glenn
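(As a possible follow-up to the timing above: reading the column in blocks with Table.read(), rather than row by row with iterrows(), lets numpy do each partial sum as one vectorized operation. The sketch below assumes the same 512-element 'dataI' column; the file name "spectrometer.h5" and the 10000-row chunk size are illustrative assumptions.)

    import time
    import numpy as np
    import tables

    # Sketch: block-wise mean of the 'dataI' column. The file name and
    # the 10000-row chunk size are assumptions for illustration.
    fh = tables.openFile("spectrometer.h5", "r")
    table = fh.root.SpectrometerTimeSeries

    tic = time.time()
    chunk = 10000
    total = np.zeros(512, dtype=np.float64)
    for start in xrange(0, table.nrows, chunk):
        stop = min(start + chunk, table.nrows)
        # read() returns rows [start, stop) of 'dataI' as a
        # (stop - start, 512) numpy array, summed in one vectorized call
        total += table.read(start, stop, field='dataI').sum(axis=0)
    mean = total / table.nrows
    print time.time() - tic
    fh.close()

Whether this beats the row iterator depends on the table's chunkshape and available memory; it is something to time against the loop above, not a guaranteed speedup.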