Re: [Pytables-users] using all results after in-kernel selection

Andrew Straw Tue, 07 Mar 2006 18:02:03 -0800

Francesc Altet wrote:

Hi Andrew,


On Mon, 06 Mar 2006 at 13:17 -0800, Andrew Straw wrote:

I find myself thinking that the following operation:
frames = numarray.array([row['frame'] for row in data2d.where(data2d.cols.camn == camn )])
could probably be made much faster with something like:

frames = data2d.where( data2d.cols.camn == camn )['frame']
This doesn't currently work because the __getitem__() method of the Row class expects to operate in the context of an iterator: it returns only a single value from the current row.
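To make that behaviour concrete, here is a toy, pure-Python model of a Row-like object that is its own iterator. It is a hypothetical simplification (a list of dicts stands in for the table, a lambda for the in-kernel condition), not the actual PyTables implementation; it only shows why row['frame'] yields one value per loop step and has nothing to return once the loop is over.

```python
class Row:
    """Toy stand-in for a PyTables Row (hypothetical): the same object is
    both the iterable returned by where() and the iterator over it."""

    def __init__(self, table, predicate):
        self._table = table          # list of dicts standing in for table rows
        self._predicate = predicate  # callable standing in for the condition
        self._pos = -1               # index of the current row; -1 = not started

    def __iter__(self):
        # Returns itself, so the selection can be traversed only once.
        return self

    def __next__(self):
        # Advance to the next row that satisfies the condition.
        self._pos += 1
        while self._pos < len(self._table):
            if self._predicate(self._table[self._pos]):
                return self
            self._pos += 1
        raise StopIteration

    def __getitem__(self, field):
        # Only meaningful inside the loop: one value from the *current* row.
        if not 0 <= self._pos < len(self._table):
            raise IndexError("no current row; __getitem__ only works while iterating")
        return self._table[self._pos][field]


table = [{"camn": 1, "frame": 10}, {"camn": 2, "frame": 11}, {"camn": 1, "frame": 12}]
frames = [row["frame"] for row in Row(table, lambda r: r["camn"] == 1)]
print(frames)  # [10, 12]
```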


Well, while I agree that this would be a nice trick to implement (in
fact, it would be relatively easy to do), I prefer not to do it,
mainly for two reasons:

1.- IMO, the idiom that you are proposing is not a standard procedure in
the context of iterators. However, I might be wrong, and you may be able
to provide examples where this kind of idiom is used.

Well, I guess the conceptual issue that confused me is that Row is acting as both an iterable and an iterator. Would it make sense to disentangle these two functionalities? My expectation after doing table.where( table.cols.a == something ) is that this would return an iterable. I should, of course, be able to iterate through it by creating an iterator object from it. I know there are other examples where the iterable is its own iterator (e.g. a file instance), so the PyTables behaviour isn't entirely unexpected.
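The iterable/iterator distinction can be sketched in a few lines of plain Python. The class name Selection is hypothetical (it plays the role of a "RowIterable"), and a list of dicts again stands in for the table; the io.StringIO part shows the file-object case mentioned above, where the object is its own iterator.

```python
import io


class Selection:
    """Hypothetical 'RowIterable': stores only the table and the condition."""

    def __init__(self, table, predicate):
        self._table = table
        self._predicate = predicate

    def __iter__(self):
        # Each call builds a new generator (the 'RowIterator'), so the same
        # selection can be traversed any number of times.
        return (row for row in self._table if self._predicate(row))


table = [{"camn": 1, "frame": 10}, {"camn": 2, "frame": 11}, {"camn": 1, "frame": 12}]
sel = Selection(table, lambda r: r["camn"] == 1)
print(iter(sel) is sel)              # False: iterable and iterator are separate
print([r["frame"] for r in sel])     # [10, 12]
print([r["frame"] for r in sel])     # [10, 12] again: a fresh pass

# A file object, by contrast, is its own iterator and is exhausted after one pass.
f = io.StringIO("a\nb\n")
print(iter(f) is f)                  # True
print(list(f), list(f))              # ['a\n', 'b\n'] []
```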

2.- Implementing this would require keeping all the indexes satisfying
the condition in a cache in the iterator (just in case the user wants to
apply the above trick). This means that, for very large tables, the
cache can grow too large; worse than that, it may well be completely
unnecessary to keep it in memory because the user is not going to make
use of it.
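A rough illustration of that cost, as a hypothetical pure-Python sketch (CachingWhere is not a PyTables class): an iterator that has to answer where(...)['frame'] after the fact must record every matching coordinate during the loop, even when the caller only needed a running total.

```python
class CachingWhere:
    """Hypothetical iterator that records every matching coordinate while it
    runs, which is what supporting where(...)['frame'] afterwards would need."""

    def __init__(self, table, predicate):
        self._table = table
        self._predicate = predicate
        self.coords = []  # one entry per match, kept whether or not it is used

    def __iter__(self):
        self.coords = []  # restart the cache when a new pass begins
        for i, row in enumerate(self._table):
            if self._predicate(row):
                self.coords.append(i)  # cached "just in case"
                yield row

    def __getitem__(self, field):
        # Answerable only from the cache filled by a previous full pass.
        return [self._table[i][field] for i in self.coords]


table = [{"camn": i % 2, "frame": i} for i in range(1000)]
w = CachingWhere(table, lambda r: r["camn"] == 0)
total = sum(r["frame"] for r in w)  # the caller only wanted a sum...
print(total, len(w.coords))         # 249500 500: ...yet 500 coordinates were cached
```

The cache grows linearly with the number of matches, which is the memory concern for very large tables; a plain generator would need no such list.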

Hmm, couldn't the results of .where() just be an iterable that spawns iterators that only do their lookups when necessary? The only time all results would be pulled into memory would be when someone does something like the above, i.e. when the __getitem__ method of the iterable is called. The spawned iterator could remain 99% the same as what Row currently is. Perhaps I don't understand enough of the guts of PyTables to know why this won't work, as you suggest is the case.
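A minimal sketch of that proposal, assuming the same toy list-of-dicts table (LazyWhere is a hypothetical name, not a PyTables class): the iterable keeps only the query, each loop gets a fresh lazy iterator, and only indexing by field name materialises a column.

```python
class LazyWhere:
    """Hypothetical version of the proposal: an iterable that stores only the
    query. Iteration is lazy, and indexing by field name is the only operation
    that pulls all matching values into memory."""

    def __init__(self, table, predicate):
        self._table = table
        self._predicate = predicate

    def __iter__(self):
        # A fresh, lazy pass on every call; nothing is cached between passes.
        for row in self._table:
            if self._predicate(row):
                yield row

    def __getitem__(self, field):
        # Materialise a single column of the selection, only when asked to.
        return [row[field] for row in self]


table = [{"camn": 1, "frame": 10}, {"camn": 2, "frame": 11}, {"camn": 1, "frame": 12}]
print(LazyWhere(table, lambda r: r["camn"] == 1)["frame"])  # [10, 12]
```

The trade-off in this sketch is that each __getitem__ call rescans the selection instead of reading a cache kept during every loop; in PyTables terms that would presumably mean re-running the in-kernel query.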

So, my gut feeling is that, while your suggestion would be a nice thing
to have in some contexts, it's not worth implementing for general use,
especially when you have other alternatives (see below), although
admittedly not as elegant as the one that you are proposing.

Is there another idiom to access all the selected rows with similar speed?


Yes, you can use a combination of the Table.getWhereList and Table.readCoordinates methods.
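The two-step pattern can be modelled in plain Python: first collect the coordinates of the rows that satisfy the condition, then read just one field at those coordinates. The helpers where_list and read_coordinates_field below are hypothetical stand-ins for what Table.getWhereList and Table.readCoordinates do (later PyTables releases call them get_where_list and read_coordinates); the real methods work on on-disk data, and their exact arguments (how the condition is passed, whether a single field can be requested) depend on the PyTables version.

```python
def where_list(table, predicate):
    """Hypothetical analogue of Table.getWhereList: coordinates of matching rows."""
    return [i for i, row in enumerate(table) if predicate(row)]


def read_coordinates_field(table, coords, field):
    """Hypothetical analogue of Table.readCoordinates, restricted to one field."""
    return [table[i][field] for i in coords]


table = [{"camn": 1, "frame": 10}, {"camn": 2, "frame": 11}, {"camn": 1, "frame": 12}]
coords = where_list(table, lambda r: r["camn"] == 1)
frames = read_coordinates_field(table, coords, "frame")
print(coords, frames)  # [0, 2] [10, 12]
```

Here the list of coordinates is explicit and owned by the caller, so the memory for it is spent only when the user actually asks for all the selected values.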

Ahh, yes, I should have known you wouldn't have forgotten something like this... I'll try using it. Thanks for the examples -- complete with timings -- very useful! This certainly relegates my desire for the above separation of Row into something like RowIterable and RowIterator to the realm of aesthetics.


Cheers!
Andrew


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
