That is a perfectly fine solution for me, as long as the arrays aren't copied in memory for the query.
Thank you! Thinking that your proposed solution uses iterables to avoid that, I tried

    boolcond = pt.Expr('(exp(a)<0.9)&(a*b>0.7)|(b*sin(a)<0.1)')
    indices = [i for i,v in boolcond if v]
    (...)
    TypeError: 'numpy.bool_' object is not iterable

I can, however, do

    boolarr = boolcond.eval()
    indices = np.nonzero(boolarr)

but then boolarr is materialized in memory. Did I miss something?

What is your advice on how to monitor memory usage? (I will need this until PyTables is second skin.)

It is very rewarding to see that these numexpr queries are 3-4 times faster than the same operations on arrays in memory. However, I didn't find a way to set the number of threads used. When evaluating the Blosc benchmarks I found that on my system, with two 6-core processors, 12 threads are best for writing and 6 for reading. Interesting...

Another question (maybe for a separate thread): is there any way to shrink the memory usage of booleans to 1 bit? It might well be that this optimizes the use of the memory bus (at some processing cost), but I am not aware of a numpy container for this.

-á.

On Wed, Mar 28, 2012 at 00:34, Francesc Alted <fal...@pytables.org> wrote:
> Another option that occurred to me recently is to save all your columns
> as unidimensional arrays (Array objects or, if you want compression,
> CArray or EArray objects), and then use them as components of a boolean
> expression using the class `tables.Expr`. For example, if a, b and c
> are unidimensional arrays of the same size, you can do:
>
>     bool_cond = tables.Expr('(2*a>0) & (cos(b) < .5) & (c**3 < 1)')
>     indices = [ind for ind, bool_val in bool_cond if bool_val]
>     results = your_dataset[indices]
>
> Does that make sense for your problem? Of course, this class uses
> numexpr behind the scenes, so it is perfectly equivalent to classical
> queries on tables, but without being restricted to using tables.
> Please see more details about the `tables.Expr` class in:
>
> http://pytables.github.com/usersguide/libref.html#the-expr-class-a-general-purpose-expression-evaluator
>
> Francesc
>
> On 3/26/12 12:43 PM, Alvaro Tejero Cantero wrote:
>> Would it be an option to have
>>
>> * raw data in one table
>> * all imaginable columns used for query conditions in another table
>>   (but how to grow it in columns without deleting & recreating?)
>>
>> and fetch indexes for the first based on .whereList(condition) of the second?
>>
>> Are there alternatives?
>>
>> -á.
>>
>> On Mon, Mar 26, 2012 at 18:29, Alvaro Tejero Cantero <alv...@minin.es> wrote:
>>> Hi there,
>>>
>>> I am following advice from Anthony and having a go at representing
>>> different sensors in my dataset as columns in a Table, or in several
>>> Tables. This is about in-kernel queries.
>>>
>>> The documentation of condvars in Table.where [1] says "condvars should
>>> consist of identifier-like strings pointing to Column (see The Column
>>> class) instances of this table, or to other values (which will be
>>> converted to arrays)".
>>>
>>> Conversion to arrays will likely exhaust the memory and be slow.
>>> Furthermore, when I tried a toy example (naively extrapolating the
>>> behaviour of indexing in numpy), I obtained
>>>
>>>     In [109]: valuesext = [x['V01'] for x in tet1.where("""(b>18) & (a<4)""",
>>>         condvars={'a': tet1.cols.V01, 'b': tet2.cols.V02})]
>>>
>>>     (... elided output)
>>>     ValueError: variable ``b`` refers to a column which is not part of
>>>     table ``/tetrode1``
>>>
>>> I am interested in the scenario where an in-kernel query on a table is
>>> based on columns *from other tables* that are aligned with the current
>>> table (same number of elements). These conditions may be sophisticated
>>> and mix in columns from the local table as well.
>>>
>>> One obvious solution would be to put all aligned columns in the same
>>> table.
>>> But adding columns to a table is cumbersome, and I cannot think
>>> beforehand of all the precomputed columns that I would like to use as
>>> query conditions.
>>>
>>> What do you recommend in this scenario?
>>>
>>> -á.
>>>
>>> [1] http://pytables.github.com/usersguide/libref.html?highlight=vlstring#tables.Table.where
>
> --
> Francesc Alted

------------------------------------------------------------------------------
This SF email is sponsored by:
Try Windows Azure free for 90 days. Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
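On the TypeError above: iterating a tables.Expr yields one boolean result per row, not (index, value) pairs, so wrapping the iterator in enumerate() should give the intended index/value stream without materializing the whole result array. A minimal sketch of that pattern, with a plain generator standing in for the Expr iterator (a hypothetical stand-in, since the real object streams chunk-wise from disk):

```python
# Stand-in for iterating a tables.Expr: yields one boolean per row,
# never building the full result array in memory.
def bool_stream():
    data = [0.1, 0.95, 0.3, 0.8, 0.05]   # hypothetical column values
    for v in data:
        yield v > 0.5                     # the boolean condition

# enumerate() supplies the row index that the failing
# "for i, v in boolcond" unpacking was hoping for.
indices = [i for i, v in enumerate(bool_stream()) if v]
print(indices)  # -> [1, 3]
```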
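On the thread-count question: numexpr (which tables.Expr uses behind the scenes, as noted above) exposes numexpr.set_num_threads(n) at runtime and, I believe, also reads the NUMEXPR_NUM_THREADS environment variable when it is first imported. A sketch of the environment-variable route, assuming that variable is honored by the installed version:

```python
import os

# Must be set before the first "import numexpr" (or "import tables"),
# since numexpr picks up its thread count at import time.
os.environ["NUMEXPR_NUM_THREADS"] = "6"   # the reading sweet spot above
```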
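On the 1-bit boolean question: numpy stores each boolean in a full byte, but np.packbits / np.unpackbits can round-trip a boolean array through a uint8 bitfield, giving the 8x smaller footprint at some packing/unpacking cost, much as the post anticipates. A sketch:

```python
import numpy as np

flags = np.array([True, False, True, True, False, False, True, False])
packed = np.packbits(flags)             # 8 bools (8 bytes) -> 1 uint8 byte
restored = np.unpackbits(packed).astype(bool)

assert packed.nbytes == flags.nbytes // 8
assert (restored == flags).all()
```

Note that unpackbits pads the output to a multiple of 8 bits, so for array lengths that are not a multiple of 8 the result must be sliced back to the original length.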