Hi Anthony,

On 06/06/2012 12:45 AM, Anthony Scopatz wrote:

    I think something like
    histogram(tables.Expr('col0 + col1**2', mytable.where('col2 > 15 &
    abs(col3) < 5')).eval())
    would be ideal, but since where() returns a row iterator, and not
    something that I can extract Column objects from, I don't see any
    way to make it work.


You are probably looking for the readWhere() method <http://pytables.github.com/usersguide/libref.html#tables.Table.readWhere> which normally returns a numpy structured array. The line you are looking for is thus:

histogram(tables.Expr('col0 + col1**2', mytable.readWhere('col2 > 15 & abs(col3) < 5')).eval())

This will likely be fast in both cases.  I hope this helps.

Oddly, it doesn't work with tables.Expr, but it does work with numexpr.evaluate. In the case I described before, with 7M rows, it does fine when very few rows are selected: its speed falls between the other two solutions. When all rows are selected, though, it is still about 2.75x slower than the technique of using tables.Expr for both the histogram variable and the condition.
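To make the comparison concrete, here is a NumPy-only sketch of what the readWhere() line amounts to. It selects the matching rows into an in-memory structured array, evaluates 'col0 + col1**2' on them, and histograms the result. The structured array `table_data`, its random contents and the bin count are illustrative assumptions standing in for `mytable`. This is not PyTables code.

```python
import numpy as np

# Hypothetical stand-in for `mytable`: an in-memory structured array
# with the four columns used in this thread.
rng = np.random.default_rng(0)
n_rows = 10_000
table_data = np.zeros(n_rows, dtype=[("col0", "f8"), ("col1", "f8"),
                                     ("col2", "f8"), ("col3", "f8")])
for name in table_data.dtype.names:
    table_data[name] = rng.uniform(-20, 40, n_rows)

def read_where_then_histogram(data, bins=50):
    """Mimic histogram(expr(readWhere(cond))) entirely in memory."""
    # Condition 'col2 > 15 & abs(col3) < 5' written as a NumPy mask.
    mask = (data["col2"] > 15) & (np.abs(data["col3"]) < 5)
    # Like readWhere(), this copies *every* column of the selected rows.
    selected = data[mask]
    # Expression 'col0 + col1**2' evaluated on the selected rows.
    values = selected["col0"] + selected["col1"] ** 2
    return np.histogram(values, bins=bins)

counts, edges = read_where_then_histogram(table_data)
```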

I think this is because .readWhere() first pulls every table row that satisfies the where condition into memory, and it does so for all columns of those rows. For a table with many columns, it therefore has to read many times as much data into memory as the histogram needs. I can use the field parameter, but it accepts only a single field, so I would have to run the query once for each variable used in the histogram expression.
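One way around reading every column is to evaluate the condition once to get the matching row numbers, and then fetch only the fields the expression uses. PyTables has methods aimed at this two-step pattern: getWhereList() returns row coordinates and readCoordinates() reads them back. Whether your version's readCoordinates() accepts a field argument is worth checking. The sketch below only mimics the idea with NumPy on a hypothetical structured array. The function name, the extra `unused_*` columns and the sample data are all illustrative.

```python
import numpy as np

def histogram_selected_fields(data, fields, bins=50):
    """Select rows once, then gather only the columns the expression uses."""
    # Step 1: evaluate the condition once and keep only the row indices
    # (similar in spirit to a where-list of coordinates).
    coords = np.nonzero((data["col2"] > 15) & (np.abs(data["col3"]) < 5))[0]
    # Step 2: gather just the needed columns at those coordinates.
    cols = {name: data[name][coords] for name in fields}
    values = cols["col0"] + cols["col1"] ** 2
    return np.histogram(values, bins=bins)

# Hypothetical table with extra columns that the expression never touches.
dtype = [("col0", "f8"), ("col1", "f8"), ("col2", "f8"), ("col3", "f8"),
         ("unused_a", "f8"), ("unused_b", "f8")]
data = np.zeros(1000, dtype=dtype)
data["col0"] = np.arange(1000) % 7
data["col1"] = np.arange(1000) % 3
data["col2"] = np.arange(1000) % 40
data["col3"] = (np.arange(1000) % 13) - 6

counts, edges = histogram_selected_fields(data, ["col0", "col1"])
```

The `unused_*` columns are never copied, which is the saving over selecting whole records.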

Using .readWhere() gives medium-fast performance in both cases. Still, it doesn't feel quite right, because it reads the data completely into memory instead of letting the computation run out-of-core. Perhaps it is not feasible, but I think the ideal would be a .where-type query operator that returns Column objects or a Table object, with a "view" imposed in either case.

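Short of such an operator, a rough out-of-core approximation is to process the table in fixed-size slices. For each slice, evaluate the condition and the expression and add the slice's histogram counts to a running total, so only one slice is in memory at a time. This is a sketch under assumptions. `read_chunk` is a hypothetical callable standing in for a slice read, such as something like `lambda a, b: mytable.read(a, b)` in PyTables. The bin edges must also be fixed in advance, because counts computed with different edges cannot be added.

```python
import numpy as np

def chunked_histogram(read_chunk, n_rows, edges, chunk_size=100_000):
    """Accumulate a histogram slice by slice using fixed bin edges."""
    counts = np.zeros(len(edges) - 1, dtype=np.int64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        chunk = read_chunk(start, stop)  # only this slice is held in memory
        mask = (chunk["col2"] > 15) & (np.abs(chunk["col3"]) < 5)
        values = chunk["col0"][mask] + chunk["col1"][mask] ** 2
        # Values outside [edges[0], edges[-1]] are silently dropped.
        chunk_counts, _ = np.histogram(values, bins=edges)
        counts += chunk_counts
    return counts

# Hypothetical in-memory data standing in for an on-disk table.
rng = np.random.default_rng(1)
n = 250_000
data = np.zeros(n, dtype=[("col0", "f8"), ("col1", "f8"),
                          ("col2", "f8"), ("col3", "f8")])
for name in data.dtype.names:
    data[name] = rng.uniform(-20, 40, n)

edges = np.linspace(-20, 1640, 51)  # covers every possible col0 + col1**2
counts = chunked_histogram(lambda a, b: data[a:b], n, edges)
```

Because each value's bin depends only on the fixed edges, the summed counts match a single in-memory histogram over the same edges.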
Regards,
Jon
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users