Re: [Pytables-users] using all results after in-kernel selection

Francesc Altet Tue, 07 Mar 2006 03:01:08 -0800

Hi Andrew,

On Mon, 06 Mar 2006 at 13:17 -0800, Andrew Straw wrote:
> I find myself thinking that the following operation:
> 
> frames = numarray.array([row['frame'] for row in data2d.where( 
> data2d.cols.camn == camn )])
> 
> could probably be made much faster with something like:
> 
> frames = data2d.where( data2d.cols.camn == camn )['frame']
> 
> This doesn't work currently because the __getitem__() method of the Row 
> class expects to be operating in the context of an iterator. It returns 
> only a single row value.


Well, while I agree that this would be a nice trick to implement (in
fact, it would be relatively easy to do), I prefer not to do it,
mainly for two reasons:

1.- IMO, the idiom that you are proposing is not standard in the
context of iterators. However, I might be wrong; perhaps you can provide
examples where this kind of idiom is used.

2.- Implementing this would require keeping all the indexes that satisfy
the condition in a cache inside the iterator (just in case the user wants
to apply the above trick). For very large tables, this cache could grow
too large and, worse than that, it may well be completely unnecessary to
keep it in memory because the user is never going to make use of it.
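To make the second point concrete, here is a minimal sketch in plain
Python. It is only a toy model and does not use the PyTables API: the
table is a list of dicts, and `where_lazy` and `WhereCaching` are
hypothetical names invented for this illustration. The lazy version
remembers nothing, while the version that supports `result['frame']`
has to hold the index of every matching row.

```python
# Toy stand-in for a table: a list of dict rows (NOT the PyTables API).
rows = [{"camn": i % 4, "frame": i} for i in range(12)]


def where_lazy(table, camn):
    """Yield matching rows one at a time; no indexes are kept."""
    for row in table:
        if row["camn"] == camn:
            yield row


class WhereCaching:
    """Hypothetical selection object that also supports result['frame'].

    To answer that request it must keep the index of every matching
    row, whether or not the caller ever asks for a column.
    """

    def __init__(self, table, camn):
        self._table = table
        # This list grows with the number of matching rows.
        self.coords = [i for i, row in enumerate(table) if row["camn"] == camn]

    def __iter__(self):
        return (self._table[i] for i in self.coords)

    def __getitem__(self, field):
        return [self._table[i][field] for i in self.coords]


lazy_frames = [row["frame"] for row in where_lazy(rows, 1)]
cached = WhereCaching(rows, 1)
cached_frames = cached["frame"]
```

Both approaches give the same frames here; the difference is that
`cached.coords` stays alive for as long as the selection object does,
and its size is proportional to the number of hits.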

So, my gut feeling is that, while your suggestion would be a nice thing
to have in some contexts, it's not worth implementing for general use,
especially when there are other alternatives (see below), although
admittedly not as elegant as the one that you are proposing.

>  Is there another idiom to access all the 
> selected rows with a similar speed?

Yes, you can use a combination of the Table.getWhereList and
Table.readCoordinates methods. For small tables, this is noticeably
faster. For example, for a table with 1000 rows:

$ python2.4 -m timeit \
    -s "import tables; t=tables.openFile('data.nobackup/test-1K.h5').root.table" \
    "coords=t.getWhereList(t.cols.var2 == 998); frames = t.readCoordinates(coords, 'var3')"
100 loops, best of 3: 9.38 msec per loop

$ python2.4 -m timeit \
    "import tables; import numarray; t=tables.openFile('data.nobackup/test-1K.h5').root.table" \
    "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 25.5 msec per loop

i.e. almost a 3x speedup. However, for larger tables, the bottleneck is
usually the lookup routine. For a table with 1 million rows:

$ python2.4 -m timeit \
    -s "import tables; t=tables.openFile('data.nobackup/test-1M.h5').root.table" \
    "coords=t.getWhereList(t.cols.var2 == 998); frames = t.readCoordinates(coords, 'var3')"
10 loops, best of 3: 364 msec per loop

$ python2.4 -m timeit \
    "import tables; import numarray; t=tables.openFile('data.nobackup/test-1M.h5').root.table" \
    "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 380 msec per loop

i.e. the difference is small (about 4%) because the time is mostly spent
looking up the values. Depending on your data, indexing can help
improve the lookup times:

$ python2.4 -m timeit \
    -s "import tables; t=tables.openFile('data.nobackup/test-1M-idx.h5').root.table" \
    "coords=t.getWhereList(t.cols.var2 == 998); frames = t.readCoordinates(coords, 'var3')"
100 loops, best of 3: 11.8 msec per loop

$ python2.4 -m timeit \
    "import tables; import numarray; t=tables.openFile('data.nobackup/test-1M-idx.h5').root.table" \
    "frames = numarray.array([r['var3'] for r in t.where(t.cols.var2 == 998)])"
10 loops, best of 3: 38.3 msec per loop
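
The two-step idiom timed above (first collect the coordinates of the
matching rows, then read a single field at those coordinates) can be
sketched in plain Python as follows. This is only an illustration of the
pattern, not the PyTables implementation: the table is a dict of lists,
and the two helper functions are hypothetical stand-ins named after
Table.getWhereList and Table.readCoordinates.

```python
# Toy column store standing in for a table (NOT the PyTables API).
table = {
    "var2": [i % 1000 for i in range(5000)],
    "var3": [i * 0.5 for i in range(5000)],
}


def get_where_list(tbl, column, value):
    """Hypothetical analogue of Table.getWhereList: indexes of matching rows."""
    return [i for i, v in enumerate(tbl[column]) if v == value]


def read_coordinates(tbl, coords, field):
    """Hypothetical analogue of Table.readCoordinates: one field at given rows."""
    col = tbl[field]
    return [col[i] for i in coords]


# Step 1: find which rows satisfy the condition.
coords = get_where_list(table, "var2", 998)
# Step 2: read only the wanted field at those rows.
frames = read_coordinates(table, coords, "var3")
```

The point of splitting the work this way is that the caller decides
whether to keep the coordinate list around, instead of the iterator
having to cache it for everyone.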

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



