>>   What is your advice on how to monitor the use of
>> memory? (I need this until PyTables becomes second nature).
>
> top?

So far I have used it only in a very rudimentary way and found the man
page quite intimidating. Would you care to share your tips for this
particular scenario? (e.g. how do you keep top focused on the IPython
process?)
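
In the meantime, this is roughly what I do to get a ballpark reading
from inside the IPython session itself (a minimal sketch; it assumes
Linux, since it just reads /proc/<pid>/status):

    import os

    def rss_mb():
        """Resident set size of the current process, in MiB (Linux only)."""
        with open('/proc/%d/status' % os.getpid()) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024.0  # reported in kB

    # e.g. call rss_mb() before and after opening a file or evaluating
    # an expression, and compare.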

>> It is very rewarding to see that these numexpr evaluations are 3-4
>> times faster than the same operations on in-memory arrays. However,
>> I didn't find a way to set the number of threads used.
>
> Well, you can use the `MAX_THREADS` variable in 'parameters.py', but
> this does not offer separate controls for numexpr and blosc.  Feel free
> to open a ticket asking for improving this functionality.

OK, I opened the following tickets (since I have to build the
application first and only then revisit the infrastructural issues, I
cannot do more about them right now):

* implementation of references:
  https://github.com/PyTables/PyTables/issues/140
* estimation of dataset (group?) size:
  https://github.com/PyTables/PyTables/issues/141
* an interface to set MAX_THREADS for numexpr independently of blosc's:
  https://github.com/PyTables/PyTables/issues/142
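
In the meantime I am setting the global knob before opening any file.
A minimal sketch, assuming the parameter is exposed as
tables.parameters.MAX_THREADS (the parameters.py value mentioned above)
and that it affects both numexpr and blosc; the file name is just a
placeholder:

    import tables

    # Global setting for now: affects both numexpr and blosc threads.
    tables.parameters.MAX_THREADS = 6

    fileh = tables.openFile('data.h5', mode='r')   # PyTables 2.x API
    # ... run the out-of-core expressions ...
    fileh.close()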

>> When evaluating the blosc benchmarks I found that in my system with
>> two 6-core processors, using 12 threads is best for writing and 6 for
>> reading. Interesting...
>
> Yes, it is :)

Are you interested in my .out bench output file for the
SyntheticBenchmarks page?

>> Another question (maybe for a separate thread): is there any way to
>> shrink memory usage of booleans to 1 bit? It might well be that this
>> optimizes the use of the memory bus (at some processing cost). But I
>> am not aware of a numpy container for this.
>
> Maybe a compressed array?  That would lead to using less than 1 bit per
> element in many situations.  If you are interested in this, look into:
>
> https://github.com/FrancescAlted/carray

OK, I did some playing around with this:

* a bool array of 10**8 elements, with True in two separate slices of
length 10**6 each, compresses by a factor of ~350. Using .wheretrue() to
obtain the indices is 2 to 3 times faster than np.nonzero() on a normal
numpy array. The resulting size is 248 KB, still far from just storing
the 4 or 6 integer indices that define the slices (I am experimenting
with an approach for scientific databases where this is a concern). A
sketch of this experiment follows after this list.

* a sample of my normal electrophysiological data (15M int16 data
points) compresses by a factor of about 1.7-1.8.

* how blosc chooses the chunklen is black magic to me, but it seems to
be quite spot-on (e.g. it changed from 1 for a 64x15M array to 64*1024
when putting only one row into a carray).

* a quick way to know how well your data will compress in PyTables, if
you will be using blosc, is to test it in the REPL with carray. I guess
for the other compressors we still need (for the moment) to check the
filesystem-reported file sizes; see the second sketch below.
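
For reference, the bool-array experiment from the first bullet goes
roughly like this (a sketch against the carray package linked above;
the slice positions and the exact numbers are specific to my data and
machine):

    import numpy as np
    import carray as ca

    n = 10**8
    a = np.zeros(n, dtype=np.bool_)
    a[10**6:2*10**6] = True              # two True slices of length 10**6 each
    a[6*10**7:6*10**7 + 10**6] = True

    c = ca.carray(a)
    print(c)                             # repr shows nbytes, cbytes and the ratio
    print(c.nbytes / float(c.cbytes))    # compression factor
    print(c.chunklen)                    # the chunk length carray/blosc picked

    idx_c = np.fromiter(c.wheretrue(), dtype=np.int64)  # indices of True values
    idx_np = np.nonzero(a)[0]                           # the plain numpy way
    assert (idx_c == idx_np).all()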
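
And the filesystem-size check from the last bullet is something like
this (a sketch using the PyTables 2.x API; the compressor, compression
level and file name are just placeholders, and the random data is only
a stand-in for real data):

    import os
    import numpy as np
    import tables

    data = np.random.randint(-1000, 1000, 15*10**6).astype(np.int16)

    filters = tables.Filters(complevel=5, complib='zlib')  # or 'lzo', 'bzip2'
    fileh = tables.openFile('compr_test.h5', mode='w')
    carr = fileh.createCArray(fileh.root, 'data', tables.Int16Atom(),
                              shape=data.shape, filters=filters)
    carr[:] = data
    fileh.close()

    # filesystem-reported compression factor
    print(data.nbytes / float(os.path.getsize('compr_test.h5')))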

Best,

á.


> --
> Francesc Alted
