2011/1/4, Ben Elliston <b...@air.net.au>:
> Hi Francesc,
>
> On Tue, Jan 04, 2011 at 01:11:03PM +0100, Francesc Alted wrote:
>
>> Well, yes and no ;-)  In principle they are only compressed on-disk,
>> but if you access a CArray enough, and it is small enough, then
>> chances are that it would actually exist in the OS filesystem cache
>> memory in compressed state.  But this is kind of fake in-memory
>> compression.  For a true compressed array in-memory, see this other
>> project of mine:
>> https://github.com/FrancescAlted/carray
>
> Thanks.  That's very helpful.
>
> Something else I forgot to ask regarding the implementation is: to
> what extent does PyTables employ threads?  I have two arrays with
> shape (107352, 679, 839) and often need to perform operations over the
> entire array.

PyTables uses multithreading in two places.  The first is whenever you
make use of the Numexpr library, which drives the tables.Expr
computing kernel and table queries with conditions (in addition, if
Numexpr is linked with Intel's MKL, it can make use of SSE
instructions too).  The second is the Blosc filter for compressing
your datasets; however, Blosc only goes multithreaded when the chunks
of a dataset are larger than 128 KB (for smaller chunks, the overhead
of multithreading outweighs the gain).  Multithreaded Blosc generally
helps in getting better I/O speed, while tables.Expr accelerates pure
computations.
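
To make the two places above concrete, here is a minimal sketch that
stores two small CArrays with the Blosc filter and evaluates an
expression over them with tables.Expr.  It assumes a recent PyTables
with the snake_case API (open_file, create_carray); the file name, the
array names, the tiny shape and the thread counts are all made up for
illustration, and a real dataset would want a chunkshape tuned to it.

```python
import os
import tempfile

import numexpr
import numpy as np
import tables

# Thread counts are illustrative assumptions, not recommended values.
numexpr.set_num_threads(4)        # threads used by the tables.Expr kernel
tables.set_blosc_max_threads(4)   # threads used by the Blosc filter


def scaled_sum(path, shape=(8, 64, 64)):
    """Evaluate 2*a + b out of core and return the result as a NumPy array."""
    filters = tables.Filters(complevel=5, complib="blosc")
    atom = tables.Float64Atom()
    with tables.open_file(path, mode="w") as h5:
        a = h5.create_carray(h5.root, "a", atom=atom, shape=shape, filters=filters)
        b = h5.create_carray(h5.root, "b", atom=atom, shape=shape, filters=filters)
        out = h5.create_carray(h5.root, "out", atom=atom, shape=shape, filters=filters)
        a[:] = np.ones(shape)
        b[:] = np.full(shape, 3.0)

        # Operands are passed explicitly instead of being looked up in the
        # caller's namespace.
        expr = tables.Expr("2 * a + b", uservars={"a": a, "b": b})
        expr.set_output(out)      # write the result to disk, block by block
        expr.eval()
        return out[:]


with tempfile.TemporaryDirectory() as tmp:
    result = scaled_sum(os.path.join(tmp, "demo.h5"))
```

With chunks as small as the ones in this toy file, Blosc would
presumably stay single-threaded, per the 128 KB threshold mentioned
above.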

> I have an 8-way machine and have not yet attempted to access these
> arrays using multiple threads or processes (the operations are almost
> always highly parallel).  If PyTables is already using threads to
> accelerate internal operation, there would not be much point trying to
> do this myself.

Well, it depends on what you want to do.  If you cannot express your
computations in terms of tables.Expr expressions, you may want to
implement your own multithreading.
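
As a rough sketch of what rolling your own could look like, the code
below splits an array into slabs along its first axis and reduces the
slabs in a thread pool.  Everything in it is an assumption made for
illustration: the helper names, the sum-of-squares operation, the slab
size and the small in-memory NumPy array standing in for a PyTables
array.  Whether slab reads from an HDF5 file can safely be issued from
several threads depends on how PyTables and HDF5 were built, so that
part is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def slab_ranges(n_rows, slab_rows):
    """Yield (start, stop) pairs that cover range(n_rows) slab by slab."""
    for start in range(0, n_rows, slab_rows):
        yield start, min(start + slab_rows, n_rows)


def slab_sum_of_squares(data, start, stop):
    """Reduce one slab; a stand-in for whatever per-slab work is needed."""
    block = np.asarray(data[start:stop], dtype=np.float64)
    return float(np.sum(block * block))


def parallel_sum_of_squares(data, slab_rows=4, workers=8):
    """Farm the slabs out to a thread pool and combine the partial results."""
    ranges = list(slab_ranges(data.shape[0], slab_rows))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: slab_sum_of_squares(data, *r), ranges)
        return sum(partials)


# Tiny stand-in for an array of shape (107352, 679, 839).
demo = np.arange(10 * 3 * 2, dtype=np.float64).reshape(10, 3, 2)
total = parallel_sum_of_squares(demo)
```

Threads only pay off here to the extent that the per-slab work
releases the GIL, as large NumPy operations generally do; for work
that is mostly pure Python, separate processes would be the more
likely fit.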

Cheers,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
