On Monday 27 September 2010 20:52:29, Koert Kuipers wrote:
> Unfortunately I cannot do that since it is company data. I wrote a
> simple script that queries a file twice:
[clip]
> Now I created the same file with lzo, blosc, and zlib compression,
> each with 2 chunkshapes (large means chunkshape = (3971,), while
> small means chunkshape = (248,))
>
> I ran the script for each file twice (to detect any operating system
> file buffering). Results:
[clip]
> So small chunkshape is generally better, and blosc is the slowest on
> the very first query, and the fastest after that. This must be an
> operating system issue?

Yes, you are seeing the operating system disk cache in action.

> The blosc files are much larger, perhaps
> that plays a role?

Yes and no. Blosc is generally (much) faster than other compressors, but the price to pay is that it compresses less.

> The files for lzo and zlib are 180 - 240Mb, while
> the files for blosc are 1.1Gb

You probably used the lowest compression level for Blosc (1), but you can probably get much better results by using level 5 (intermediate) or 9 (maximum). Generally speaking, and due to the fast nature of Blosc, you can see Blosc-powered PyTables operations achieving performance that is very close to not using a compressor, especially when using large chunk sizes and multi-processor machines (Blosc is multi-threaded).

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
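
To make the compression-level advice above concrete, here is a minimal sketch that writes the same array with Blosc at levels 1, 5 and 9 and compares the resulting file sizes. It is an illustration under assumptions, not the script from the thread: the data is synthetic, the helper `write_compressed` and the node name `values` are hypothetical, and the calls use the PyTables 3.x names (`open_file`, `create_carray`), which were spelled `openFile` and `createCArray` in the 2.x series current when this message was written. The chunkshape is the "large" one quoted in the thread.

```python
import os
import tempfile

import numpy as np
import tables


def write_compressed(path, data, complevel, chunkshape, complib="blosc"):
    """Write `data` as a chunked, compressed CArray; return the file size in bytes.

    Hypothetical helper for illustration only.
    """
    filters = tables.Filters(complevel=complevel, complib=complib)
    with tables.open_file(path, mode="w") as h5:
        h5.create_carray(h5.root, "values", obj=data,
                         filters=filters, chunkshape=chunkshape)
    return os.path.getsize(path)


# Synthetic, highly compressible data (an assumption; real data will behave differently).
data = np.arange(1_000_000, dtype=np.int64) // 100

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for level in (1, 5, 9):
        path = os.path.join(tmp, f"blosc_level{level}.h5")
        sizes[level] = write_compressed(path, data, level, chunkshape=(3971,))

for level, size in sizes.items():
    print(f"blosc level {level}: {size / 1e6:.2f} MB on disk "
          f"({data.nbytes / 1e6:.2f} MB uncompressed)")
```

Swapping `complib` for `"zlib"` or `"lzo"` (if LZO support is compiled in) and changing `chunkshape` to `(248,)` would give a size comparison along the lines discussed in the thread; timings would additionally need a cold disk cache for the first read to be meaningful.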