On Thursday 02 December 2010 18:19:50 david.bri...@ubs.com wrote:
> The pytables side, once I figured how to use it was simple. I'm
> needing to write a memory cache on top because I need fast temporary
> tables that have the same interface (retrieval) but aren't backed by
> disk. That's the slow bit.

Hmm, in my opinion it is better to rely on the OS filesystem cache. If your temporary tables are accessed enough times (2 or 3, typically), they will eventually get loaded into memory (the OS cache). You may want to use compression on these tables so that the OS can load them more easily (and using Blosc will be of great help so as not to lose speed).

For more speed, you may want to consider using the carray package:

https://github.com/FrancescAlted/carray

It contains a special object, called a ctable, that is basically a NumPy structured array, but with important differences:

- The data is stored column-wise (not row-wise)
- The data can be compressed

This greatly reduces memory usage (both bandwidth and consumption). To prove this, I've just added a small benchmark (bench/ctable-query.py) that runs a query on a table with 1 million rows and 100 columns. Here are the results (run on a 2-core machine):

Querying '(f1>300) & ((f2>3) & (f2<1e4))' with 10^6 rows
Time for plain numpy            --> 0.140
Time for numexpr                --> 0.114
Time for ctable (uncompressed)  --> 0.048 -- size (MB): 770
Time for ctable (compressed)    --> 0.048 -- size (MB): 20
Time for PyTables (non-indexed) --> 0.618
PyTables Pro detected! Indexing f2 column...
Time for PyTables Pro (indexed) --> 0.113

Note how the ctable object cuts the query time by more than a factor of 2 (relative to both plain NumPy and numexpr), while compression reduces the memory needs from 770 MB to 20 MB here (the ratio depends on the dataset, of course). It can even do queries faster than indexed PyTables Pro!

I'm in the final stages of releasing carray 0.3 (I still need to finish parts of the documentation), and I plan to announce it next week or so. carray will be a nice complement to PyTables (not only for implementing fast temporary tables, but also to serve as an intermediate buffer for improved I/O), and I expect to integrate it into PyTables 3.

--
Francesc Alted
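
To make the row-wise versus column-wise distinction concrete, here is a minimal sketch in plain NumPy. It is not the bench/ctable-query.py script and it does not use carray. The table size, the extra f0 column, the random data, and the `query` helper are all assumptions made for illustration, and zlib stands in for Blosc only to show that a contiguous column compresses well.

```python
import zlib

import numpy as np

N = 100_000  # assumption: much smaller than the benchmark's 10^6 rows

# Row-wise layout: a NumPy structured array, fields interleaved in memory.
rng = np.random.default_rng(0)
rows = np.empty(N, dtype=[("f0", "i4"), ("f1", "i4"), ("f2", "f8")])
rows["f0"] = np.arange(N) // 1000          # long runs of equal values
rows["f1"] = rng.integers(0, 1000, size=N)
rows["f2"] = rng.uniform(0, 2e4, size=N)

# Column-wise layout: one contiguous array per field (a stand-in for
# the idea behind a ctable, not its implementation).
cols = {name: np.ascontiguousarray(rows[name]) for name in rows.dtype.names}


def query(table):
    """Boolean mask for '(f1>300) & ((f2>3) & (f2<1e4))'.

    `table` is anything that maps a field name to a 1-D array.
    """
    f1, f2 = table["f1"], table["f2"]
    return (f1 > 300) & ((f2 > 3) & (f2 < 1e4))


mask_rows = query(rows)   # reads f1/f2 with a 16-byte stride
mask_cols = query(cols)   # reads f1/f2 from contiguous memory
hits = rows[mask_rows]    # the selected records

# A contiguous column with repetitive values compresses well
# (zlib here is an assumption standing in for Blosc).
compressed = zlib.compress(cols["f0"].tobytes())
```

Both layouts return the same mask. The difference is that the row-wise query has to step over every other field to reach f1 and f2, whereas the column-wise one touches only the bytes it needs, and each column can be compressed on its own.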
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users