Hi Andrew,

On Mon, 06 Mar 2006 at 13:17 -0800, Andrew Straw wrote:
> I find myself thinking the following operation:
>
> frames = numarray.array([row['frame'] for row in data2d.where(
>     data2d.cols.camn == camn )])
>
> could probably be made much faster with something like:
>
> frames = data2d.where( data2d.cols.camn == camn )['frame']
>
> This doesn't work currently because the __getitem__() method of the Row
> class expects to be operating in the context of an iterator. It returns
> only a single row value.

Well, while I agree that this would be a nice trick to implement (in fact,
it would be relatively easy to do), I prefer not to do it, mainly for two
reasons:

1.- IMO, the idiom that you are proposing is not a standard procedure in
    the context of iterators. However, I might be wrong, and perhaps you
    can provide examples where this kind of idiom is used.

2.- Implementing this would require keeping all the indexes satisfying
    the condition in a cache in the iterator (just in case the user wants
    to apply the above trick). With very large tables, that cache could
    grow too large and, worse, keeping it in memory may well be completely
    unnecessary because the user is never going to use it.

So, my gut feeling is that, while your suggestion would be nice to have in
some contexts, it is not worth implementing for general use, especially
when you have other alternatives (see below), although they are admittedly
not as elegant as the one you are proposing.

> Is there another idiom to access all the
> selected rows with a similar speed?

Yes, you can use a combination of the Table.getWhereList and
Table.readCoordinates methods. For small tables, this gives better
times. For example, for a table with 1000 rows:

$ python2.4 -m timeit -s "import tables; t=tables.openFile('data.nobackup/test-1K.h5').root.table" "coords=t.getWhereList(t.cols.var2 == 998);frames = t.readCoordinates(coords, 'var3')"
100 loops, best of 3: 9.38 msec per loop

$ python2.4 -m timeit "import tables; import numarray; t=tables.openFile('data.nobackup/test-1K.h5').root.table" "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 25.5 msec per loop

i.e. almost a 3x speedup. However, for larger tables, the bottleneck is
usually the lookup routine.
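
To make the shape of the two-step idiom concrete, here is a minimal sketch
in plain Python. It does not use PyTables at all: a list of dicts stands in
for the table, and the helpers get_where_list, read_coordinates and where
are hypothetical stand-ins that only mirror the roles of
Table.getWhereList, Table.readCoordinates and Table.where. It shows that
the list of matching coordinates is built only when the caller explicitly
asks for it, while the iterator keeps no such list.

```python
# Toy stand-in for a table: a list of row dicts. This is NOT PyTables;
# the helpers below are hypothetical and only mirror the shape of
# Table.getWhereList, Table.readCoordinates and Table.where.

def get_where_list(rows, column, value):
    """Return the positions of all rows where rows[i][column] == value."""
    return [i for i, row in enumerate(rows) if row[column] == value]


def read_coordinates(rows, coords, field):
    """Return the values of one field at the given row positions."""
    return [rows[i][field] for i in coords]


def where(rows, column, value):
    """Lazy counterpart: yield matching rows one by one, keeping no index list."""
    for row in rows:
        if row[column] == value:
            yield row


# Made-up data, only for illustration.
rows = [{"var2": i % 1000, "var3": i * 2} for i in range(3000)]

# Two-step idiom: first the coordinates, then one field at those coordinates.
coords = get_where_list(rows, "var2", 998)
frames = read_coordinates(rows, coords, "var3")

# Iterator idiom: the same values, collected row by row.
frames_iter = [r["var3"] for r in where(rows, "var2", 998)]
```

Both idioms return the same values; the difference is that the coordinate
list exists only in the first one, where the caller requested it.
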
For a table with 1 million rows:

$ python2.4 -m timeit -s "import tables; t=tables.openFile('data.nobackup/test-1M.h5').root.table" "coords=t.getWhereList(t.cols.var2 == 998);frames = t.readCoordinates(coords, 'var3')"
10 loops, best of 3: 364 msec per loop

$ python2.4 -m timeit "import tables; import numarray; t=tables.openFile('data.nobackup/test-1M.h5').root.table" "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 380 msec per loop

i.e. the difference is small because most of the time is spent looking up
the values. Depending on your data, indexing can help improve the lookup
times:

$ python2.4 -m timeit -s "import tables; t=tables.openFile('data.nobackup/test-1M-idx.h5').root.table" "coords=t.getWhereList(t.cols.var2 == 998);frames = t.readCoordinates(coords, 'var3')"
100 loops, best of 3: 11.8 msec per loop

$ python2.4 -m timeit "import tables; import numarray; t=tables.openFile('data.nobackup/test-1M-idx.h5').root.table" "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 38.3 msec per loop

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users