Francesc,
Here's my setup:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 1.3
HDF5 version: 1.6.5
numarray version: 1.5.1
Zlib version: 1.2.1
BZIP2 version: 1.0.2 (30-Dec-2001)
Python version: 2.4.3 (#1, Apr 21 2006, 14:31:08)
[GCC 3.3.3 (SuSE Linux)]
Platform: linux2-x86_64
Byte-ordering: little
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
I recently switched from 'h5import' to PyTables to convert the output from
large finite element models into HDF5 format. I like the PyTables approach
because it gives me more control than the shell scripts I cobbled together
to drive 'h5import'.
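For reference, the writer side boils down to something like this (a
from-memory sketch, not the real script; the group names, shapes, and
filter settings are taken from the ptdump output further down, and the
append loop is a placeholder):

    import numarray
    import tables

    # Sketch of how the 'new' file gets written; shapes and filter
    # settings match the ptdump output below.
    fileh = tables.openFile('xxx_lev_1_1.h5', mode='w')
    results = fileh.createGroup(fileh.root, 'results')
    oef1 = fileh.createGroup(results, 'oef1')

    # shuffle=1 is the setting I suspect; the old h5import file
    # was written without it.
    filters = tables.Filters(complevel=6, complib='zlib', shuffle=1)
    atom = tables.Float32Atom(shape=(0, 17759, 3), flavor='numarray')
    quad4 = fileh.createEArray(oef1, 'quad4', atom, title='',
                               filters=filters, expectedrows=1022)

    # Placeholder for the real FE-results extraction loop.
    for i in range(1022):
        row = numarray.zeros((1, 17759, 3), type=numarray.Float32)
        quad4.append(row)

    fileh.close()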
However, the most recent file takes much longer to search. Here are the
results of a simple test I ran against the old and new databases:
'New':
$ python test_finder.py
Found 3 results for your search
CQUAD4 1121910
fh.find('1121910') took 2.37 sec
Found 3 results for your search
fh.find('1121910', gpf=True) took 9.44 sec
'Old':
$ python test_finder.py
Found 3 results for your search
CQUAD4 1121910
fh.find('1121910') took 0.664 sec
Found 3 results for your search
fh.find('1121910', gpf=True) took 0.638 sec
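(For context, test_finder.py does little more than time calls like the
following; 'fh' is my own finder wrapper, so the find() signature here is
mine, not PyTables':)

    import time

    def timed_find(fh, eid, **kwargs):
        # 'fh' is my finder wrapper; find() scans the element arrays.
        t0 = time.time()
        hits = fh.find(eid, **kwargs)
        print "Found %d results for your search" % len(hits)
        print "fh.find(%r) took %.3g sec" % (eid, time.time() - t0)
        return hits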
The only difference I could detect between the two files is that the
PyTables version was written with the 'shuffle' filter. Here is some
ptdump output for the relevant nodes:
'New':
$ ptdump -v xxx_lev_1_1.h5:/results/oef1/quad4
/results/oef1/quad4 (EArray(1022L, 17759L, 3L), shuffle, zlib(6)) ''
atom = Atom(dtype='Float32', shape=(0, 17759L, 3L), flavor='numarray')
nrows = 1022
extdim = 0
flavor = 'numarray'
byteorder = 'little'
'Old':
$ ptdump -v xxx_lev_0.h5:/results/oef1/quad4
/cluster/stress/methods/local/lib/python2.4/site-packages/tables/File.py:227:
UserWarning: file ``xxx_lev_0.h5`` exists and it is an HDF5 file, but it
does not have a PyTables format; I will try to do my best to guess what's
there using HDF5 metadata
METADATA_CACHE_SIZE, nodeCacheSize)
/results/oef1/quad4 (EArray(1018L, 17402L, 3L), zlib(6)) ''
atom = Atom(dtype='Float32', shape=(0, 17402L, 3L), flavor='numarray')
nrows = 1018
extdim = 0
flavor = 'numarray'
byteorder = 'big'
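To double-check the filter pipeline and byte order programmatically, the
Leaf attributes should show the same thing (per the PyTables 1.x manual;
opening the old non-PyTables file will raise the same UserWarning but
still work):

    import tables

    for fname in ('xxx_lev_1_1.h5', 'xxx_lev_0.h5'):
        fileh = tables.openFile(fname, mode='r')
        node = fileh.getNode('/results/oef1/quad4')
        # Leaf exposes the HDF5 filter pipeline and byte order.
        print fname, node.filters, node.byteorder
        fileh.close()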
My client code was completely unchanged for this testing; only the
databases were created by two different methods. I have yet to do more
testing with smaller files (these are ~2.2 GB). I read the section on
shuffling in the manual, where it suggests that shuffle should actually
improve throughput, but this is the only difference I could detect. It is
not a trivial matter to produce these large files, so I need to get it
right. I know it's not much to go on, but any suggestions are appreciated.
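One experiment I intend to try before regenerating anything: copy the new
file with shuffle turned off and re-run the finder against the copy. If
copyFile accepts a filters override the way the manual describes,
something like this should do it:

    import tables

    # Re-filter the 'new' file with shuffle disabled, leaving
    # complevel/complib alone, then point test_finder.py at the copy.
    src = tables.openFile('xxx_lev_1_1.h5', mode='r')
    noshuffle = tables.Filters(complevel=6, complib='zlib', shuffle=0)
    src.copyFile('xxx_noshuffle.h5', filters=noshuffle, overwrite=True)
    src.close()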
Elias Collas
Stress Methods
Gulfstream Aerospace Corp