[Pytables-users] places to look to optimize queries on externally-generated files

Craig the Demolishor Thu, 25 Mar 2010 08:09:54 -0700

Hi folks,
  I've recently written a C++ program which writes out some data in HDF5
format, conforming to the file description in Appendix F of the PyTables
docs. When I open this file using PyTables I can do everything I would
expect to do with a PyTables-generated file, such as running queries and
iterating over rows. My problem is that when I create a file with identical
data from within PyTables and then run the same queries on *that* file, they
run twice as fast. Neither file uses filters or compression. I also took the
chunk size for my C++-generated HDF5 file from the PyTables-generated one,
so the "chunkshape" attribute of both tables is the same. Is there another
parameter that I am missing that could cause this slowdown?
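
One way to look for such a parameter is to dump the storage-level properties
of both tables and diff them. The sketch below is only an illustration under
stated assumptions, not a diagnosis: it uses the PyTables 3.x names
(open_file, get_node), the helper names describe_table and diff_properties
are made up for this example, and the file names and node path in the usage
comment are hypothetical. The properties it collects (row size, byte order,
dtype layout, filters, attribute names) are simply candidates that can
differ even when "chunkshape" matches.

```python
def describe_table(path, node_path):
    """Collect storage-level properties of one table (requires PyTables 3.x)."""
    # Imported lazily so diff_properties can be used without PyTables installed.
    import tables

    with tables.open_file(path, mode="r") as h5:
        table = h5.get_node(node_path)
        return {
            "nrows": table.nrows,
            "rowsize": table.rowsize,          # bytes per row, including padding
            "chunkshape": table.chunkshape,
            "byteorder": table.byteorder,
            "dtype": str(table.dtype),         # field types, order and layout
            "filters": str(table.filters),
            "attrs": sorted(table.attrs._v_attrnames),
        }


def diff_properties(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            diff[key] = (left, right)
    return diff


# Usage (hypothetical file names and node path):
#   cpp = describe_table("from_cpp.h5", "/data")
#   pyt = describe_table("from_pytables.h5", "/data")
#   for key, (left, right) in diff_properties(cpp, pyt).items():
#       print(key, "C++:", left, "PyTables:", right)
```

An empty result would suggest the difference lies somewhere these properties
do not capture, for example in how the file itself was created.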


I would like to provide an example, but I only see the effect on large files
(100 MB, 100K rows). I will try to spend some time seeing whether I can
reproduce it with a more email-friendly file size.

Many thanks in advance.

Craig


