>
> Aha, so you are doing a binary search in an 'index' first; then it is
> almost certain that most of the time is spent performing the look-up in
> this rank-1 array. As you are doing a binary search, and the minimum
> unit of I/O in HDF5 is precisely the chunksize, having small
> chunksizes will favor performance. Looking at your finding
> times, my guess is that your 'index' array is on-disk, and the sparse
> access (i.e. the binary search) to it is your bottleneck.
While I think you are generally correct, the search times are somewhat
deceptive, as there is a lot going on besides just finding an offset. I
basically have to initialize finite-element (FE) data objects from the
results of the PyTables searches. In any case, if I understand you
correctly, would making all my index arrays uncompressed Arrays rather
than CArrays be optimal from a performance point of view? If not, is
there a way to determine the optimal chunkshape? FE models use
unstructured grids that are not so trivial to model in HDF5. The
solution I use is to store different element types as separate datasets
within the same group, like so:
/model/geom2/eid/type1 {17400}
/model/geom2/eid/type2 {61/512}
/model/geom2/eid/type3 {1567}
etc.
Associated data arrays have different shapes depending on element
topology. Another thing that slows things down is that the
/model/geom2/eid group has to be walked so that each leaf can be
binary-searched. Maybe not optimal, but it is clean and easy to
understand.
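To make the layout concrete, here is a stripped-down sketch of what I
mean (the file name, element ids and the 1024 chunk length below are
made-up placeholders, not my real data, and I'm assuming the PyTables
2.0 createCArray/iterNodes API):

    import numpy
    import tables

    # Layout sketch only: file name, element ids and the 1024 chunk
    # length are placeholders, not the real model data.
    h5file = tables.openFile('model.h5', mode='w')
    model = h5file.createGroup('/', 'model')
    geom2 = h5file.createGroup(model, 'geom2')
    eid_group = h5file.createGroup(geom2, 'eid')

    # One sorted element-id index per element type, stored as a CArray
    # with an explicit (small) chunkshape, since each probe of a binary
    # search costs at least one chunk of I/O.
    for name, eids in [('type1', numpy.arange(17400)),
                       ('type3', numpy.arange(1567))]:
        ca = h5file.createCArray(eid_group, name, tables.Int64Atom(),
                                 shape=eids.shape, chunkshape=(1024,))
        ca[:] = eids

    def find_in_leaf(leaf, eid):
        """Binary-search one sorted on-disk index for an element id."""
        lo, hi = 0, leaf.nrows
        while lo < hi:
            mid = (lo + hi) // 2
            if leaf[mid] < eid:   # one element read = one chunk of I/O
                lo = mid + 1
            else:
                hi = mid
        if lo < leaf.nrows and leaf[lo] == eid:
            return lo
        return None

    def find_element(h5file, eid):
        """Walk the eid group, searching each leaf until the id is found."""
        for leaf in h5file.iterNodes('/model/geom2/eid', classname='Leaf'):
            pos = find_in_leaf(leaf, eid)
            if pos is not None:
                return leaf.name, pos   # element type and row offset
        return None

    print(find_element(h5file, 1500))   # -> ('type1', 1500) in this toy case
    h5file.close()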
> Unfortunately, you did not send the chunksizes for the rank-1 index
> array, but most probably the chunksize for the 'old' files is rather
> small compared with the 'new' arrays.
Yes. You certainly know your business. When I originally set up 'h5import'
to do my conversions, I just used a CHUNKED-DIMENSION-SIZES parameter of
100, not knowing any better. PyTables 1.3 chunked the entire array. When I
ran ptrepack --upgrade-flavors, the chunksize went down to 1024 and the
performance was again reasonable.
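For the record, the chunk size is easy to check from Python after the
repack (assuming the Leaf.chunkshape attribute of PyTables 2.0; the
path is just my example):

    import tables

    # Inspect the chunk size after repacking with something like:
    #   ptrepack --upgrade-flavors old.h5:/ new.h5:/
    f = tables.openFile('new.h5', mode='r')
    leaf = f.getNode('/model/geom2/eid/type1')
    print(leaf.chunkshape)   # e.g. (1024,) after the repack
    f.close()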
I'm pressing ahead with upgrading to 2.0. I'm seeing significant
improvements, which indicates that this is the right move. Fortunately, I
have only the one database with its flavor set to 'numarray'; otherwise
the upgrade would definitely cause me problems, since a lot of my client
scripts use numarray (or even Numeric) to manipulate arrays pulled from
the HDF5 files.
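In case it helps anyone doing the same migration, checking (and, if I
read the 2.0 docs correctly, switching) the flavor of an existing leaf
looks roughly like this; again the path is just my example:

    import tables

    f = tables.openFile('model.h5', mode='a')
    leaf = f.getNode('/model/geom2/eid/type1')
    print(leaf.flavor)       # 'numarray' before conversion, 'numpy' after
    # leaf.flavor = 'numpy'  # switch what reads return, without repacking
    f.close()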
Bravo for amazing software and astonishing support!
Elias Collas
Stress Methods Group
Gulfstream Aerospace Corp