On Thursday 02 December 2010 18:19:50 david.bri...@ubs.com wrote:
> The pytables side, once I figured out how to use it, was simple. I
> need to write a memory cache on top because I need fast temporary
> tables that have the same interface (retrieval) but aren't backed by
> disk. That's the slow bit.

Hmm, in my opinion it is better to trust the OS filesystem cache.  
If your temporary tables are accessed enough times (2 or 3, typically), 
they will eventually get loaded into memory (OS cache).  You may want to 
use compression on these tables so that the OS can load them more easily 
(and using Blosc will be of great help so as not to lose speed).
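
As a rough illustration of the compression advice above, here is a minimal sketch of a Blosc-compressed table. It assumes the PyTables 3.x method names (open_file, create_table, read_where); releases from the time of this message spelled them differently. The file name, table name, column layout and compression level are hypothetical and chosen only for the example.

```python
import os
import tempfile

import numpy as np
import tables  # assumes PyTables 3.x naming

# Hypothetical two-column layout, for illustration only.
dtype = np.dtype([("f1", np.int32), ("f2", np.float64)])
rows = np.zeros(100_000, dtype=dtype)
rows["f1"] = np.arange(len(rows)) % 1000
rows["f2"] = np.arange(len(rows)) * 0.5

# Blosc at a moderate level; the level is an arbitrary choice here.
filters = tables.Filters(complevel=5, complib="blosc")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "temp_tables.h5")
    with tables.open_file(path, mode="w") as h5:
        # The table is compressed chunk by chunk with the given filters.
        table = h5.create_table("/", "tmp", description=dtype, filters=filters)
        table.append(rows)
        table.flush()
        # Non-indexed in-kernel query; returns a NumPy structured array.
        hits = table.read_where("(f1 > 300) & ((f2 > 3) & (f2 < 1e4))")

# The same selection in plain NumPy, used only as a cross-check.
expected = rows[(rows["f1"] > 300) & ((rows["f2"] > 3) & (rows["f2"] < 1e4))]
```

Whether repeated reads are then served from memory is up to the OS cache, not to this code.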

For more speed, you may want to consider using the carray package:

https://github.com/FrancescAlted/carray

It contains a special object, called a ctable, that is basically a NumPy 
structured array, but with important differences:

- The data is stored column-wise (not row-wise)
- The data can be compressed
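
The two differences listed above can be sketched with plain NumPy. This is not the carray implementation: a dict of contiguous arrays stands in for column-wise storage, and zlib from the standard library stands in for Blosc. The column names and sizes are hypothetical.

```python
import zlib

import numpy as np

n = 100_000
dtype = np.dtype([("f1", np.int32), ("f2", np.float64)])

# Row-wise: a NumPy structured array stores whole records back to back.
row_wise = np.zeros(n, dtype=dtype)
row_wise["f1"] = np.arange(n) % 1000
row_wise["f2"] = np.arange(n) * 0.5

# Column-wise stand-in: one contiguous array per column.
column_wise = {name: np.ascontiguousarray(row_wise[name]) for name in dtype.names}

# Reading one field of the row-wise layout steps over every full record,
# while the column-wise copy is walked element by element.
row_stride = row_wise["f1"].strides[0]
col_stride = column_wise["f1"].strides[0]

# A single homogeneous column can be compressed on its own
# (zlib here, purely as a stand-in for Blosc).
raw_bytes = column_wise["f1"].nbytes
packed_bytes = len(zlib.compress(column_wise["f1"].tobytes()))
```

A query that touches only f1 therefore moves fewer bytes through memory in the column-wise layout, and fewer still if the column is kept compressed.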

Column-wise, compressible storage greatly reduces memory usage (both 
bandwidth and consumption).  To prove this, I've just added a small 
benchmark (bench/ctable-query.py) that runs a query on a table with 
1 million rows and 100 columns.  Here are the results (run on a 2-core 
machine):

Querying '(f1>300) & ((f2>3) & (f2<1e4))' with 10^6 rows
Time for plain numpy --> 0.140
Time for numexpr --> 0.114
Time for ctable (uncompressed) --> 0.048 -- size (MB): 770
Time for ctable (compressed) --> 0.048  -- size (MB): 20
Time for PyTables (non-indexed) --> 0.618
PyTables Pro detected!  Indexing f2 column...
Time for PyTables Pro (indexed) --> 0.113

Note how the ctable object cuts the query time by more than a factor 
of 2 compared with plain NumPy and numexpr, while reducing memory needs 
by a huge amount (how much depends on the dataset, of course).  It can 
even run queries faster than Pro!
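
For readers who want to see what the "plain numpy" case of such a query looks like, here is a minimal sketch. It is not the bench/ctable-query.py script: the table has only two columns, and the value ranges and random seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
n = 1_000_000

# Hypothetical data; the real benchmark table has 100 columns.
table = np.zeros(n, dtype=[("f1", np.int32), ("f2", np.float64)])
table["f1"] = rng.integers(0, 1000, size=n)
table["f2"] = rng.uniform(0, 2e4, size=n)

# Same boolean expression as the query string in the results above.
mask = (table["f1"] > 300) & ((table["f2"] > 3) & (table["f2"] < 1e4))

# Boolean indexing copies the matching rows into a new structured array.
selected = table[mask]
```

Each comparison here builds a full temporary boolean array, which is part of what numexpr and ctable avoid.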

I'm in the final stages of releasing carray 0.3 (I still need to finish 
parts of the documentation), and I plan to announce it next week or so.  
carray will be a nice complement to PyTables (not only for implementing 
fast temporary tables, but also to serve as an intermediate buffer for 
improved I/O), and I expect to integrate it into PyTables 3.

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
