2011/1/4, Ben Elliston <b...@air.net.au>:
> Hi Francesc,
>
> On Tue, Jan 04, 2011 at 01:11:03PM +0100, Francesc Alted wrote:
>
>> Well, yes and no ;-) In principle they are only compressed on-disk,
>> but if you access a CArray enough, and it is small enough, then
>> chances are that it would actually exist in the OS filesystem cache
>> memory in compressed state. But this is kind of fake in-memory
>> compression. For a true compressed array in-memory, see this other
>> project of mine:
>> https://github.com/FrancescAlted/carray
>
> Thanks. That's very helpful.
>
> Something else I forgot to ask regarding the implementation is: to
> what extent does PyTables employ threads? I have two arrays with
> shape (107352, 679, 839) and often need to perform operations over the
> entire array.

PyTables uses multithreading in two places. The first is whenever you
use the Numexpr library. Numexpr is mainly used in the tables.Expr
computing kernel and during table queries with conditions (in addition,
if Numexpr is linked with Intel's MKL, it can make use of SSE
instructions too). The second is the Blosc filter for compressing your
datasets; however, multithreading only comes into play there if the
chunks of a dataset are larger than 128 KB (for smaller chunks, the
overhead of multithreading outweighs the gain). Multithreaded Blosc
generally helps in getting better I/O speed, while tables.Expr
accelerates pure computations.

> I have an 8-way machine and have not yet attempted to access these
> arrays using multiple threads or processes (the operations are almost
> always highly parallel). If PyTables is already using threads to
> accelerate internal operation, there would not be much point trying to
> do this myself.

Well, it depends on what you want to do. If you cannot express your
computations in terms of tables.Expr expressions, you may want to
implement your own multithreading.

Cheers,

--
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
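
As a follow-up to the last point about implementing your own
multithreading: below is a minimal sketch of one way to do it, by
splitting the first axis of a large array into slabs and handing each
slab to a worker thread. It uses only the Python standard library.
The names slab_bounds, parallel_map_slabs, read_slab and work are
hypothetical, chosen for this illustration, and are not part of
PyTables. The lock around the read reflects the assumption that reads
through a single open file handle should be serialized. Threads only
give a real speed-up when the per-slab work releases the GIL (large
NumPy operations typically do); for pure-Python work, separate
processes would be the usual alternative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def slab_bounds(n_rows, slab_rows):
    """Yield (start, stop) pairs covering range(n_rows) in slabs."""
    for start in range(0, n_rows, slab_rows):
        yield start, min(start + slab_rows, n_rows)


def parallel_map_slabs(read_slab, work, n_rows, slab_rows, n_workers=8):
    """Apply work() to each slab; results come back in slab order.

    read_slab(start, stop) and work(data) are supplied by the caller
    (hypothetical hooks for this sketch).
    """
    io_lock = threading.Lock()

    def task(bounds):
        start, stop = bounds
        # Assumption: reads are serialized; only the computation
        # on the slab runs concurrently.
        with io_lock:
            data = read_slab(start, stop)
        return work(data)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the input slabs.
        return list(pool.map(task, slab_bounds(n_rows, slab_rows)))


# Stand-in data: a small list of rows instead of an on-disk array.
rows = [[i, i + 1, i + 2] for i in range(10)]

partials = parallel_map_slabs(
    read_slab=lambda start, stop: rows[start:stop],
    work=lambda slab: sum(sum(row) for row in slab),
    n_rows=len(rows),
    slab_rows=4,
)
total = sum(partials)
```

With an on-disk array, read_slab would presumably slice the array
along its first axis (for example node[start:stop], where node is an
assumed name for an open CArray), and slab_rows would be picked so
that n_workers slabs fit comfortably in memory.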