Hi Anthony,
On 06/06/2012 12:45 AM, Anthony Scopatz wrote:
I think something like
histogram(tables.Expr('col0 + col1**2', mytable.where('(col2 > 15) & (abs(col3) < 5)')).eval())
would be ideal, but since where() returns a row iterator, and not
something that I can extract Column objects from, I don't see any
way to make it work.
You are probably looking for the readWhere() method
<http://pytables.github.com/usersguide/libref.html#tables.Table.readWhere>, which
normally returns a NumPy structured array. The line you are looking
for is thus:
histogram(tables.Expr('col0 + col1**2', mytable.readWhere('(col2 > 15) & (abs(col3) < 5)')).eval())
This will likely be fast in both cases. I hope this helps.
Oddly, it doesn't work with tables.Expr, but it does work with
numexpr.evaluate. In the case I described before, with 7M rows, it does
reasonably well when selecting very few rows (its speed falls between
that of the other two solutions), but when selecting all rows it is
still about 2.75x slower than the technique of using tables.Expr for
both the histogram variable and the condition.
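For concreteness, a minimal sketch of the variant that works, assuming the selected rows have already been read into a NumPy structured array (which is what readWhere() returns). The helper name histogram_of_selection and the sample data are hypothetical, and the expression is evaluated with plain NumPy instead of numexpr so the sketch runs without PyTables or numexpr installed.

```python
import numpy as np


def histogram_of_selection(selected, bins=10):
    """Histogram of col0 + col1**2 over rows that a query already selected.

    `selected` is assumed to be a NumPy structured array, e.g. the result of
    Table.readWhere(). (Hypothetical helper, for illustration only.)
    """
    col0 = selected["col0"]
    col1 = selected["col1"]
    # numexpr.evaluate("col0 + col1**2", local_dict={"col0": col0, "col1": col1})
    # computes the same values; plain NumPy keeps this sketch dependency-free.
    values = col0 + col1**2
    return np.histogram(values, bins=bins)


# Made-up sample rows standing in for a table with four float columns.
data = np.zeros(5, dtype=[("col0", "f8"), ("col1", "f8"), ("col2", "f8"), ("col3", "f8")])
data["col0"] = [1, 2, 3, 4, 5]
data["col1"] = [1, 1, 2, 2, 3]
data["col2"] = [20, 10, 30, 40, 16]
data["col3"] = [0, 0, 10, 1, -4]

# A boolean mask plays the role of readWhere('(col2 > 15) & (abs(col3) < 5)').
selected = data[(data["col2"] > 15) & (np.abs(data["col3"]) < 5)]
counts, edges = histogram_of_selection(selected, bins=3)
```

With a real table, `selected` would instead come from mytable.readWhere('(col2 > 15) & (abs(col3) < 5)'), which materializes every column of every selected row before the histogram is computed.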
I think this is because .readWhere() first pulls all table rows
satisfying the where condition into memory, and it does so for every
column of each selected row, so for a table with many columns it has to
read many times more data into memory than the histogram actually needs.
I can use the field parameter, but it accepts only a single field, so I
would have to run the query once per variable used in the histogram
expression.
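For comparison, a minimal sketch of the one-query-per-variable approach using the field parameter. It assumes `table` behaves like a PyTables Table whose readWhere(condition, field=name) returns a 1-D array of that column's selected values; the helper names read_fields_where and histogram_from_fields are hypothetical. The condition is parenthesized because, in the expression syntax, `&` binds more tightly than the comparison operators.

```python
import numpy as np


def read_fields_where(table, condition, fields):
    """Run the same query once per field, reading only that column each time.

    Assumes table.readWhere(condition, field=name) returns a 1-D array
    holding just the selected values of `name`. (Hypothetical helper.)
    """
    return {name: table.readWhere(condition, field=name) for name in fields}


def histogram_from_fields(table, condition, bins=10):
    """Histogram of col0 + col1**2, reading only the columns the expression uses."""
    # Only col0 and col1 appear in the histogram expression, so only they are read.
    cols = read_fields_where(table, condition, ["col0", "col1"])
    values = cols["col0"] + cols["col1"] ** 2
    return np.histogram(values, bins=bins)
```

With a real table this would be called as histogram_from_fields(mytable, '(col2 > 15) & (abs(col3) < 5)'). The trade-off is exactly the one described above: each extra variable in the expression costs another run of the query.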
Using .readWhere() gives medium-fast performance in both cases, but it
still doesn't feel quite right, because it reads the data completely
into memory instead of allowing the computation to be performed
out-of-core. Perhaps it is not really feasible, but I think the ideal
would be a .where()-style query operator that returns Column objects or
a Table object, with a "view" imposed in either case.
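As a rough stopgap (not a PyTables feature), memory can be kept bounded by reading the table in fixed-size chunks and accumulating histogram counts against fixed bin edges, since counts for the same edges can simply be summed across chunks. This minimal sketch assumes `table` exposes `nrows` and `read(start, stop)` returning a structured array, as a PyTables Table does; the function name chunked_histogram and the default chunk size are hypothetical. It still reads every column of each chunk, but only one chunk is in memory at a time.

```python
import numpy as np


def chunked_histogram(table, bin_edges, chunk_size=100_000):
    """Accumulate a histogram of col0 + col1**2 over rows where
    (col2 > 15) & (abs(col3) < 5), reading `chunk_size` rows at a time.

    Assumes `table` has `nrows` and `read(start, stop)` returning a
    structured array. (Hypothetical helper, for illustration only.)
    """
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for start in range(0, table.nrows, chunk_size):
        stop = min(start + chunk_size, table.nrows)
        chunk = table.read(start, stop)
        # Apply the selection to this chunk only.
        mask = (chunk["col2"] > 15) & (np.abs(chunk["col3"]) < 5)
        values = chunk["col0"][mask] + chunk["col1"][mask] ** 2
        # With fixed edges, per-chunk counts add up to the full histogram.
        chunk_counts, _ = np.histogram(values, bins=bin_edges)
        counts += chunk_counts
    return counts
```

Because the bin edges are fixed in advance, the result matches a single np.histogram call over all selected values with the same edges; the cost is that the value range has to be known (or estimated) before the pass.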
Regards,
Jon
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users