Re: [Pytables-users] places to look to optimize queries on externally-generated files

Francesc Alted Thu, 25 Mar 2010 09:27:03 -0700

Hi Demolishor,

On Thursday 25 March 2010 16:09:08 Craig the Demolishor wrote:
> Hi folks,
>   I've recently written a C++ program which writes out some data in HDF5
> format, conforming to the file description in Appendix F of the PyTables
> docs. When I open this file using PyTables I can do everything I would
> expect to do with a PyTables-generated file, like queries and iterate over
> rows and all that. My problem is that when I create a file with identical
> data from within PyTables, and then I run queries on *that* file, they run
> twice as fast. Both files have no filters on, no compression or anything
> like that. I also let the PyTables-generated file determine my chunksize
> when I create my C++-generated HDF5 file, so when I look at the
>  "chunkshape" attribute of both tables, they are the same. Is there another
>  parameter or something I am missing that could cause this slowdown?


Well, if you are really cloning the PyTables metainformation, I cannot see why
queries would be slower with your approach.  I'd suggest running the h5diff
utility on both files to compare them.  Perhaps that will shed some light.
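
As a complement to h5diff, the comparison can also be done from within
PyTables itself.  What follows is only a rough sketch, not a definitive
recipe: it assumes the PyTables 3.x names (open_file, get_node), and the
helper names diff_attrs, table_metadata and compare_tables are made up for
illustration.  It gathers the properties PyTables reports for one table in
each file (chunkshape, byteorder, dtype, filters, plus the HDF5 attributes)
and lists the ones that differ.

```python
# Sketch only: collect the metadata PyTables sees for one table in each file
# and report what differs.  File names and the node path are supplied by the
# caller; nothing here is specific to any particular file.

MISSING = "<missing>"


def diff_attrs(a, b):
    """Return {name: (value_in_a, value_in_b)} for entries that differ."""
    diffs = {}
    for name in sorted(set(a) | set(b)):
        va = a.get(name, MISSING)
        vb = b.get(name, MISSING)
        if va != vb:
            diffs[name] = (va, vb)
    return diffs


def table_metadata(path, where):
    """Read leaf properties and HDF5 attributes of the node at `where`."""
    import tables  # assumption: PyTables 3.x naming (open_file, get_node)

    with tables.open_file(path, mode="r") as h5:
        node = h5.get_node(where)
        # repr() keeps values comparable even when they are NumPy objects.
        meta = {
            "chunkshape": repr(node.chunkshape),
            "byteorder": repr(node.byteorder),
            "dtype": repr(node.dtype),
            "filters": repr(node.filters),
        }
        # System and user attributes attached to the node.
        for name in node._v_attrs._f_list("all"):
            meta["attr:" + name] = repr(node._v_attrs[name])
    return meta


def compare_tables(path_a, path_b, where):
    """Diff the metadata of the same node path in two files."""
    return diff_attrs(table_metadata(path_a, where),
                      table_metadata(path_b, where))
```

Something like compare_tables("from_cpp.h5", "from_pytables.h5", "/mytable")
(hypothetical file and node names) would return a dict of the differing
entries.  An empty result only means these particular fields match; it does
not rule out differences in the lower-level HDF5 layout, which is where
h5diff or h5dump may still help.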

> I would like to provide an example but I only see the effect on large files
> (100MB, 100K rows)...I will try and spend some time to see if I can
> reproduce it with a more email-friendly filesize.

Cheers,

-- 
Francesc Alted
