Hello Alvaro,

What are the timings using the normal where() method?
http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where
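
Something along these lines would give a side-by-side number. It is only a rough sketch under a few assumptions: it uses the PyTables 3.x spellings (open_file, create_table, get_where_list), whereas the release used in this thread spells them in camelCase (getWhereList and so on); it builds a small synthetic table rather than your real file; and build_table / time_queries are hypothetical helper names made up for illustration.

```python
import os
import tempfile
import time

import numpy as np
import tables


def build_table(path, n_rows):
    """Write a single-column int16 table 'wtab02' (column 'val') of synthetic data."""
    rng = np.random.default_rng(0)
    data = np.zeros(n_rows, dtype=[("val", "<i2")])
    data["val"] = rng.integers(0, 400, size=n_rows, dtype=np.int16)
    with tables.open_file(path, mode="w") as h5:
        # expectedrows lets PyTables pick a chunk size suited to the final table size.
        h5.create_table("/", "wtab02", obj=data, expectedrows=n_rows)
    return data["val"]


def time_queries(path, condition="val > 1"):
    """Time get_where_list() against iterating where() for the same condition."""
    with tables.open_file(path, mode="r") as h5:
        tab = h5.root.wtab02

        # One call that returns all matching row numbers as an array.
        t0 = time.perf_counter()
        coords_list = tab.get_where_list(condition)
        t_list = time.perf_counter() - t0

        # The where() iterator yields Row objects; row.nrow is the row number.
        t0 = time.perf_counter()
        coords_iter = np.fromiter(
            (row.nrow for row in tab.where(condition)), dtype=np.int64
        )
        t_iter = time.perf_counter() - t0

    return coords_list, coords_iter, t_list, t_iter


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")  # hypothetical file name
    build_table(demo_path, 1_000_000)
    _, _, t_list, t_iter = time_queries(demo_path)
    print(f"get_where_list: {t_list:.3f} s, where() iteration: {t_iter:.3f} s")
```

The synthetic table is far smaller than your 312 000 000 rows, so the absolute numbers will not be comparable; the point is only to see how the two query paths relate on the same data.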

Be Well
Anthony

On Wed, Apr 18, 2012 at 12:33 PM, Alvaro Tejero Cantero <alv...@minin.es> wrote:

> A single array with 312 000 000 int16 values.
>
> Two (uncompressed) ways to store it:
>
> * Array
>
> >>> wa02[:10]
> array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16)
>
> * Table wtab02 (single column, named 'val')
> >>> wtab02[:10]
> array([(306,), (345,), (353,), (335,), (345,), (345,), (356,), (341,),
>       (338,), (357,)],
>      dtype=[('val', '<i2')])
>
> Read times are 120 ms and 220 ms, respectively.
>
> >>> timeit big=np.nonzero(wa02[:]>1)
> 1 loops, best of 3: 1.66 s per loop
>
> >>> timeit bigtab=wtab02.getWhereList('val>1')
> 1 loops, best of 3: 119 s per loop
>
> with a Complete Sorted Index on val and blosc9 compression:
> 1 loops, best of 3: 149 s per loop
>
> Setting expectedrows=312 000 000 (so that chunklen goes from 32K to
> 132K):
> 1 loops, best of 3: 119 s per loop
>
> (I wanted to compare getting a boolean mask, but it seems that Tables
> don't have a .wheretrue like the carrays in Francesc's carray package (?).
> For reference, computing just the mask takes 344 ms.)
>
> ---
>
> Question: is the difference in speed due to in-core vs. out-of-core?
>
> If so, and if the largest unit of data fits in memory (even considering
> loading a few columns to operate among them), is the corollary
> 'stay in memory at all costs'?
>
> With this exercise, I was trying to find out what is the best
> structure to hold raw data (just one col in this case), and whether
> indexing could help in queries.
>
> -á.
>
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>