Hi Ben,

2011/1/4, Ben Elliston <b...@air.net.au>:
> On Mon, Jan 03, 2011 at 05:48:50PM +0100, Francesc Alted wrote:
>
>> Array objects are non-chunked. In order to use compression, you need
>> to use a CArray:
>> http://www.pytables.org/docs/manual/ch04.html#CArrayClassDescr
>
> Here's my chance to ask a question that I've had for a week or two:
> how are compressed arrays actually implemented? I gather that the
> array contents are compressed (using the chosen compressor) into the
> HDF5 data file, but are CArrays actually compressed in memory?
Well, yes and no ;-) In principle they are only compressed on disk, but
if you access a CArray often enough, and it is small enough, then chances
are that it will actually live in the OS filesystem cache in compressed
form. But this is a kind of fake in-memory compression. For a true
compressed in-memory array, see this other project of mine:

https://github.com/FrancescAlted/carray

> My data set is larger than physical memory, but has a lot of repeated
> values that lead to ~90% compression. Thus, it should be possible to
> keep the whole array compressed in memory and decompress chunks of the
> array as necessary. Is this what PyTables does?

You can try doing this with PyTables, yes. As I said, PyTables keeps
data compressed on disk in chunks. Whenever you read one or more chunks
of the disk-based array, they are decompressed automatically and you
receive the data in decompressed form (the reverse happens when
writing). Furthermore, you can perform different operations on
compressed on-disk arrays by using the tables.Expr class:

http://www.pytables.org/moin/ComputingKernel

You may also want to use the carray package mentioned above, but it is
still in early beta (for example, multidimensional arrays are not
supported yet, just two-dimensional tables).

HTH,

-- 
Francesc Alted
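
To make the chunked scheme described above concrete (compress the data in chunks, decompress only the chunk being read), here is a minimal sketch in plain Python. It is not how PyTables or carray are implemented: it assumes zlib from the standard library as a stand-in compressor, and the class name ChunkedCompressedArray and its parameters chunk_len and level are hypothetical names chosen for illustration only.

```python
import zlib
from array import array


class ChunkedCompressedArray:
    """Hypothetical helper: holds float64 values as zlib-compressed chunks.

    Illustration only; zlib stands in for whatever compressor a real
    library would use, and nothing here mirrors a real library's API.
    """

    def __init__(self, values, chunk_len=4096, level=5):
        self.chunk_len = chunk_len
        self.length = 0
        self.chunks = []  # one compressed bytes object per chunk
        buf = array("d")
        for v in values:
            buf.append(v)
            if len(buf) == chunk_len:
                self._flush(buf, level)
                buf = array("d")
        if buf:  # last, possibly shorter, chunk
            self._flush(buf, level)

    def _flush(self, buf, level):
        # Compress one full buffer and keep only the compressed bytes.
        self.chunks.append(zlib.compress(buf.tobytes(), level))
        self.length += len(buf)

    def __len__(self):
        return self.length

    def read_chunk(self, i):
        # Decompress a single chunk; all other chunks stay compressed.
        out = array("d")
        out.frombytes(zlib.decompress(self.chunks[i]))
        return out

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        chunk_no, offset = divmod(idx, self.chunk_len)
        return self.read_chunk(chunk_no)[offset]

    def compressed_nbytes(self):
        # Total size of the compressed chunks held in memory.
        return sum(len(c) for c in self.chunks)


# Highly repetitive data, loosely in the spirit of the data set in the question.
data = ChunkedCompressedArray(float(i % 4) for i in range(100_000))
raw_nbytes = len(data) * 8  # 8 bytes per float64 value
```

Comparing data.compressed_nbytes() with raw_nbytes shows how much memory the compressed form saves for a given data set. Reading a single element decompresses its whole chunk, so chunk_len trades random-access cost against compression ratio.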