On Monday 05 May 2008, Glenn wrote:
> Francesc Alted <falted <at> pytables.org> writes:
> > > > Representing a 1D column is as easy as passing a 'shape=(N,)'
> > > > argument to your 1D columns. Look at this example:
> > > >
> > > > import numpy, tables
> > > >
> > > > N = 10   # your 1D array length
> > > > class TTable(tables.IsDescription):
> > > >     col1 = tables.Int32Col(pos=0)
> > > >     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> > > > f = tables.openFile("test.h5", "w")
> > > > t = f.createTable(f.root, 'table', TTable, 'table test')
> > > > for i in xrange(10):
> > > >     t.append([[i, numpy.random.rand(N)]])
> > > > t.flush()
> > > > f.close()
> > > >
> > > > Hope that helps,
> > >
> > > Thank you for the help; I got it working with a Table now.
> > > I have a couple of new questions.
> > > My table has a column holding a 1000-element 1D numpy array. I
> > > would like to do the following types of operations, where I treat
> > > this column as an N x 1000 2D array, call it X:
> > >
> > > mean(X, axis=0)
> > >
> > > std(X[k].reshape((k, N/k)))
> > >
> > > In the mean case, I could imagine doing something like:
> > >
> > > m = zeros((1, 1000))
> > > for row in X:
> > >     m = m + row
> > > m / N
> > >
> > > But it seems like this will be slow. I tried just numpy.mean(X)
> > > out of curiosity, but it took forever and finally ran out of
> > > memory. I assume it was forming a copy of the array in memory.
> >
> > Can you be a bit more explicit about how you are building X? A
> > self-contained code example, with timings, is always nice to have.
> >
> > Cheers,
>
> I am building X just as you suggested.
>
> Setting up:
>
> desc = {'AccNumber': tables.Int32Col(),
>         'dataI': tables.Float32Col(shape=(512,)),
>         'dataQ': tables.Float32Col(shape=(512,))}
> self.table = self.fileh.createTable(self.fileh.root,
>                                     'SpectrometerTimeSeries', desc, '')
> self.fileh.setNodeAttr(self.fileh.root, 'StartTime', time.asctime())
>
> Writing data each iteration:
>
> self.table.row['AccNumber'] = data['lastAccNum']
> self.table.row['dataI'] = dataA
> self.table.row['dataQ'] = dataB
> self.table.row.append()
>
> Periodically flushing the data:
>
> if now - self.LastUpdateTime > self.UpdatePeriod:
>     self.table.flush()
>
> Writing the data is indeed very fast.
>
> I just tried timing the following:
>
> table = fh.root.SpectrometerTimeSeries
>
> def test():
>     tic = time.time()
>     m = np.zeros(512)
>     for x in table.iterrows():
>         m += x['dataI']
>     print time.time() - tic
>
> and it took 82 seconds for my table with 1.35 million rows, so that
> works out to ~33 MB per second, which is not too bad. I guess my
> dataset was just larger than I had realized. I would still appreciate
> any comments on the above code, and whether I am doing things
> correctly.
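A chunked variant of the reduction above, as a sketch (not from the
original thread): it relies on Table.read()'s 'field' argument, which
returns a slice of a single column as a NumPy array, so each iteration
pulls a block of rows instead of one row at a time and avoids most of
the Python-level loop overhead. The file name 'test.h5' is made up.

import numpy as np
import tables

fh = tables.openFile('test.h5', 'r')   # hypothetical file name
table = fh.root.SpectrometerTimeSeries

def chunked_mean(table, field='dataI', chunk=10000):
    # Accumulate the column sum block by block to bound memory use.
    m = np.zeros(512, dtype=np.float64)
    nrows = table.nrows
    for start in xrange(0, nrows, chunk):
        block = table.read(start, min(start + chunk, nrows), field=field)
        m += block.sum(axis=0)
    return m / nrows

print chunked_mean(table)
fh.close()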
I see. First of all, since tables in PyTables are currently stored
row-wise, you have to read complete rows whenever you iterate over the
table. Each of your rows is about 4 KB (4 bytes for 'AccNumber' plus
two 512-element float32 arrays), so you are actually reading at
~4 KB * 1.35e6 rows / 82 s =~ 66 MB/s, which is pretty good for a
modern single hard disk.

If you want more speed, you have two options (sketched in the example
below):

- Use LZO to compress your data (see the 'filters=' parameter of
  createTable()). If your data is compressible enough, you should be
  able to double your I/O throughput, provided that you are using a
  relatively modern CPU. See chapter 5 of the User's Guide for more
  info about the speed-ups you can achieve with compression.

- Save the columns in EArrays. With this, you only have to read one
  column's data from disk, halving the time required.

Of course, you can combine both approaches for optimal results.

Regards,

--
Francesc Alted
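A minimal sketch of both options, using the same PyTables 2.x API as
the rest of the thread. The file name is made up, and the LZO line
assumes the optional LZO binding is installed ('zlib' ships with
PyTables and works as a drop-in fallback):

import numpy as np
import tables

# Same description Glenn used for the table.
desc = {'AccNumber': tables.Int32Col(),
        'dataI': tables.Float32Col(shape=(512,)),
        'dataQ': tables.Float32Col(shape=(512,))}

# Option 1: compress the table via the 'filters=' parameter.
filters = tables.Filters(complevel=1, complib='lzo')
fh = tables.openFile('compressed.h5', 'w')   # hypothetical file name
table = fh.createTable(fh.root, 'SpectrometerTimeSeries', desc, '',
                       filters=filters)

# Option 2: store each column as its own EArray, so a scan over
# 'dataI' never has to read the bytes of 'dataQ' from disk.
dataI = fh.createEArray(fh.root, 'dataI', tables.Float32Atom(),
                        shape=(0, 512), filters=filters)
dataI.append(np.random.rand(1000, 512).astype(np.float32))

# The column-wise mean now touches only this array:
m = np.zeros(512)
for start in xrange(0, dataI.nrows, 100):
    m += dataI.read(start, start + 100).sum(axis=0)
print m / dataI.nrows

fh.close()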