On Monday 27 September 2010 20:52:29, Koert Kuipers wrote:
> Unfortunately I cannot do that since it is company data. I wrote a
> simple script that queries a file twice:
[clip]

> Now I created the same file with lzo, blosc, and zlib compression,
> each with 2 chunkshapes (large means chunkshape = (3971,), while
> small means chunkshape = (248,))
> 
> I ran the script for each file twice (to detect any operating system
> file buffering). Results:
[clip]

> So the small chunkshape is generally better, and blosc is the slowest
> on the very first query but the fastest after that. Could this be an
> operating system issue?

Yes, you are seeing the operating system disk cache in action.
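
If you want to rule the cache out, you can flush it between runs.  A 
minimal, Linux-only sketch (it needs root; the procfs path is standard 
Linux and has nothing to do with PyTables itself):

  import os

  # flush dirty pages first, then drop the page cache, dentries and inodes
  os.system('sync')
  with open('/proc/sys/vm/drop_caches', 'w') as f:
      f.write('3\n')

Running that between the two queries should make both of them show 
cold-cache timings.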

> The blosc files are much larger, perhaps
> that plays a role?

Yes and no.  Blosc is generally (much) faster than other compressors, 
but the price to pay is that it compresses less.

> The files for lzo and zlib are 180-240 MB, while
> the files for blosc are 1.1 GB

You probably used the lowest compression level for Blosc (1); you 
should get much better compression ratios with level 5 (intermediate) 
or 9 (maximum).  The level is selected through the Filters class, as in 
the sketch below.
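
For example (just a sketch; the file name, table layout and chunkshape 
are placeholders for your own):

  import tables

  # complevel=1 is the fastest, complevel=9 compresses the most
  filters = tables.Filters(complevel=5, complib='blosc')
  h5 = tables.openFile('data.h5', mode='w')      # open_file() in newer PyTables
  table = h5.createTable('/', 'mytable',         # create_table() in newer PyTables
                         {'x': tables.Float64Col()},
                         filters=filters, chunkshape=(248,))
  # ... append your rows here ...
  h5.close()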

Generally speaking, and due to the speed of Blosc, Blosc-powered 
PyTables operations can achieve performance very close to that of using 
no compressor at all, especially with large chunk sizes on 
multi-processor machines (Blosc is multi-threaded).
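
As an illustration (again a sketch with made-up names and sizes), a 
CArray created with a deliberately large chunkshape, so that each Blosc 
(de)compression call works on a bigger block:

  import numpy as np
  import tables

  data = np.random.rand(1000 * 1000)
  h5 = tables.openFile('big_chunks.h5', 'w')
  arr = h5.createCArray('/', 'data', tables.Float64Atom(), data.shape,
                        filters=tables.Filters(complevel=5, complib='blosc'),
                        chunkshape=(64 * 1024,))  # large chunks favour Blosc's threads
  arr[:] = data
  h5.close()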

-- 
Francesc Alted
