>> What is your advice on how to monitor the use of
>> memory? (I need this until PyTables is second skin).
>
> top?
I had so far used it only in a very rudimentary way and found the man page
quite intimidating. Would you care to share your tips for this particular
scenario? (e.g. how do you keep the ipython process 'focused'?)

>> It is very rewarding to see that these numexpr's are 3-4 times faster
>> than the same with arrays in memory. However, I didn't find a way to
>> set the number of threads used
>
> Well, you can use the `MAX_THREADS` variable in 'parameters.py', but
> this does not offer separate controls for numexpr and blosc. Feel free
> to open a ticket asking for improving this functionality.

Ok, I opened the following tickets (since I have to build the application
first and then revisit the infrastructural issues, I cannot do more about
them now):

* one for the implementation of references:
  https://github.com/PyTables/PyTables/issues/140
* one for the estimation of dataset (group?) size:
  https://github.com/PyTables/PyTables/issues/141
* one for an interface function to set MAX_THREADS for numexpr
  independently of blosc's (the current single-knob usage is sketched in
  the PS below):
  https://github.com/PyTables/PyTables/issues/142

>> When evaluating the blosc benchmarks I found that in my system with
>> two 6-core processors, using 12 threads is best for writing and 6 for
>> reading. Interesting...
>
> Yes, it is :)

Are you interested in my .out bench output file for the
SyntheticBenchmarks page?

>> Another question (maybe for a separate thread): is there any way to
>> shrink memory usage of booleans to 1 bit? It might well be that this
>> optimizes the use of the memory bus (at some processing cost). But I
>> am not aware of a numpy container for this.
>
> Maybe a compressed array? That would lead to using less than 1 bit per
> element in many situations. If you are interested in this, look into:
>
> https://github.com/FrancescAlted/carray

Ok, I did some playing around with this:

* a bool array of 10**8 elements with True in two separate slices of
  length 10**6 each compresses by a factor of ~350. Using .wheretrue()
  to obtain the indices is 2 to 3 times faster than np.nonzero() on the
  equivalent plain numpy array. The resulting file size is 248 kB, still
  far from storing just the 4 or 6 integer indices that define the
  slices (I am experimenting with an approach for scientific databases
  where this is a concern). A reconstruction of this session is in the
  PS.
* a sample of my normal electrophysiological data (15M int16 data
  points) compresses by about 1.7-1.8.
* how blosc chooses the chunklen is black magic to me, but it seems to
  be quite spot-on (e.g. it changed from 1 for a 64x15M array to 64*1024
  when CArraying only one row).
* a quick way to know how well your data will compress in PyTables, if
  you will be using blosc, is to test in the REPL with CArray. I guess
  for the other compressors we still have to resort (for the moment) to
  checking filesystem-reported sizes; see the last sketch in the PS.

Best,
á.

> --
> Francesc Alted
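
PS: On the memory-monitoring question at the top, a minimal way to keep
top focused on the interpreter alone is to grab the session's PID and
pass it to top's -p flag; a small sketch of that idea (nothing
PyTables-specific):

    import os

    # PID of the running (I)Python session; pass it to `top -p <pid>`
    # (or `watch -n1 ps -o rss= -p <pid>`) to watch only this process.
    print(os.getpid())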
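
For ticket #142, a minimal sketch of how I understand the single knob
that exists today (assuming a 2.x PyTables whose parameters.py exposes
MAX_THREADS and that it is read when the file is opened; 'data.h5' is
just a placeholder name):

    import tables

    # Assumption: one global MAX_THREADS setting that affects both
    # numexpr and blosc (no separate controls yet, hence the ticket).
    # Set it before opening the file so the new value is picked up.
    tables.parameters.MAX_THREADS = 6

    h5 = tables.openFile('data.h5', mode='r')
    # ... in-kernel queries / tables.Expr evaluations run here ...
    h5.close()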
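
A rough reconstruction of the boolean-array experiment from the first
bullet (the slice positions are arbitrary placeholders, and the exact
ratio and timings will of course vary):

    import numpy as np
    import carray as ca  # https://github.com/FrancescAlted/carray

    # 10**8 booleans, True only in two separate slices of 10**6 each.
    a = np.zeros(10**8, dtype=np.bool_)
    a[10**6:2 * 10**6] = True
    a[6 * 10**7:6 * 10**7 + 10**6] = True

    b = ca.carray(a)                   # compress with default settings
    print(b)                           # summary with nbytes and cbytes
    print(float(b.nbytes) / b.cbytes)  # compression ratio
    print(b.chunklen)                  # chunk length chosen by carray

    # Indices of the True elements: carray's iterator vs. plain numpy.
    idx_c = np.fromiter(b.wheretrue(), dtype=np.int64)
    idx_np = np.nonzero(a)[0]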
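
And for the last bullet, a sketch of the fallback for non-blosc
compressors: write the data to a throwaway CArray and compare against
the filesystem-reported file size (the file name, compressor choice and
random test data are placeholders; the 2.x camelCase API is assumed):

    import os
    import numpy as np
    import tables

    # Stand-in for a 15M-point int16 recording.
    data = np.random.randint(-1000, 1000, size=15 * 10**6).astype(np.int16)

    filters = tables.Filters(complevel=5, complib='zlib')
    h5 = tables.openFile('compression_test.h5', mode='w')
    arr = h5.createCArray(h5.root, 'sample', tables.Int16Atom(),
                          data.shape, filters=filters)
    arr[:] = data
    h5.close()

    # Rough compression ratio from the size the filesystem reports.
    print(data.nbytes / float(os.path.getsize('compression_test.h5')))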