Francesc Altet wrote:

Hi Andrew,

On Mon, 06 Mar 2006 at 13:17 -0800, Andrew Straw wrote:
I find myself thinking that the following operation:

frames = numarray.array([row['frame'] for row in data2d.where( data2d.cols.camn == camn )])

could probably be made much faster with something like:

frames = data2d.where( data2d.cols.camn == camn )['frame']

This doesn't work currently because the __getitem__() method of the Row class expects to be operating in the context of an iterator. It returns only a single row value.

Well, while I agree that this would be a nice trick to implement (in
fact, it would be relatively easy to do), I prefer not to do it,
mainly for two reasons:

1.- IMO, the idiom that you are proposing is not a standard one in the
context of iterators. However, I might be wrong, and perhaps you can
provide examples where this kind of idiom is used.

Well, I guess the conceptual issue that confused me is that Row acts as both an iterable and an iterator. Would it make sense to disentangle these two roles? My expectation after doing table.where( table.cols.a == something ) is that it would return an iterable, which I could then iterate over by creating an iterator object from it. I know there are other examples where the iterable is its own iterator (e.g. a file instance), so the PyTables behaviour isn't entirely unexpected.
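
The iterable/iterator distinction discussed here can be seen with plain Python objects. This is only an illustrative sketch using a built-in list, not PyTables code: a list is an iterable that spawns a fresh iterator on every iter() call, whereas an iterator (like a file instance, or Row as described above) returns itself and is consumed as it is used.

```python
# A list is an iterable: every iter() call spawns a new, independent iterator.
values = [1, 2, 3]
first = iter(values)
second = iter(values)
assert first is not second

# An iterator is its own iterator: iter() simply hands back the same object.
assert iter(first) is first

# Consuming the iterator is stateful; the underlying iterable is unaffected.
next(first)            # consumes the first element
rest = list(first)     # only the remaining elements are left
again = list(values)   # the list can still be traversed from the start
```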

2.- Implementing this would require keeping all the indexes satisfying
the condition in a cache in the iterator (just in case the user wants to
apply the above trick). With very large tables, this cache could grow
far too large and, worse than that, it may well be completely
unnecessary to keep it in memory because the user is never going to
make use of it.

Hmm, couldn't the result of .where() just be an iterable that spawns iterators that only do their lookups when necessary? The only time all the results would be pulled into memory would be when someone does something like the above, that is, when the __getitem__ method of the iterable is called. The spawned iterator could remain 99% the same as what Row currently is. Perhaps I don't understand enough of the guts of PyTables to know why this won't work, as you suggest is the case.
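
The separation proposed above might look roughly like the following pure-Python sketch. It is not PyTables code: RowIterable and RowIterator are hypothetical classes, the rows are plain dicts, and the selection condition is an ordinary Python callable rather than a PyTables column expression. It only illustrates that lookups can stay lazy during iteration and that results are gathered into memory only when __getitem__ is called.

```python
class RowIterator:
    """Hypothetical lazy iterator: tests one row at a time, keeps no cache."""

    def __init__(self, rows, predicate):
        self._rows = iter(rows)
        self._predicate = predicate

    def __iter__(self):
        return self

    def __next__(self):
        # Advance until the next matching row; nothing is stored.
        for row in self._rows:
            if self._predicate(row):
                return row
        raise StopIteration


class RowIterable:
    """Hypothetical result of a where()-style query."""

    def __init__(self, rows, predicate):
        self._rows = rows
        self._predicate = predicate

    def __iter__(self):
        # Each iteration gets a fresh iterator, so the selection is reusable.
        return RowIterator(self._rows, self._predicate)

    def __getitem__(self, field):
        # Only here are all matching values pulled into memory.
        return [row[field] for row in self]


# Made-up sample data standing in for a table.
rows = [
    {"camn": 1, "frame": 10},
    {"camn": 2, "frame": 11},
    {"camn": 1, "frame": 12},
]
selection = RowIterable(rows, lambda row: row["camn"] == 1)
frames = selection["frame"]
```

This sketch does not address the real cost inside PyTables, where rows live on disk and the iterator reuses internal buffers.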

So, my gut is telling me that, while your suggestion would be a nice
thing to have in some contexts, it's not worth implementing for general
use, especially when you have other alternatives (see later), although
admittedly not as elegant as the one that you are proposing.

Is there another idiom to access all the selected rows with a similar speed?

Yes, you can use a combination of the Table.getWhereList and
Table.readCoordinates methods.

Ahh, yes, I should have known you wouldn't have forgotten something like this... I'll try using it. Thanks for the examples -- complete with timings -- very useful! This certainly relegates my desire for the above separation of Row into something like RowIterable and RowIterator to the realm of aesthetics.
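
For reference, the combination of Table.getWhereList and Table.readCoordinates mentioned above might be wrapped as in the sketch below. The helper name read_field_where is hypothetical, and the field keyword of readCoordinates is assumed from the PyTables 1.x API of the time; the original examples and timings are not reproduced here. Later PyTables releases renamed these methods to get_where_list and read_coordinates.

```python
def read_field_where(table, condition, field):
    """Return the values of `field` for every row matching `condition`.

    `table` is assumed to be a PyTables 1.x Table; this helper is a
    hypothetical convenience wrapper, not part of PyTables itself.
    """
    # Coordinates (row indexes) of all rows satisfying the condition.
    coords = table.getWhereList(condition)
    # Read only the requested column for those rows, in a single call.
    return table.readCoordinates(coords, field=field)


# Intended usage with the names from this thread (needs an open table):
# frames = read_field_where(data2d, data2d.cols.camn == camn, 'frame')
```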

Cheers!
Andrew


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
