On Tuesday 28 August 2007, [EMAIL PROTECTED] wrote:
> > Aha, so you are doing a binary search in an 'index' first; then it
> > is almost certain that most of the time is spent performing the
> > look-up in this rank-1 array. As you are doing a binary search, and
> > the minimum amount of I/O in HDF5 is precisely one chunk, having
> > small chunksizes will favor performance. Looking at your search
> > times, my guess is that your 'index' array is on disk, and the
> > sparse access (i.e. the binary search) to it is your bottleneck.
>
> While I think you are generally correct, the search times are
> somewhat deceptive, as there is a lot going on besides just finding
> an offset. I basically have to initialize finite-element (FE) data
> objects from the results of the PyTables searches. In any case, if I
> understand you correctly, making all my index arrays uncompressed
> Arrays rather than CArrays would be optimal from a performance
> point of view?
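The I/O amplification described above can be sketched with a toy stand-in for a chunked dataset (the `ChunkedArray` class, chunksize, and data are illustrative, not PyTables API): every element probed by the binary search drags in a whole chunk, so an on-disk search touches roughly log2(N) chunks.

```python
import numpy as np

CHUNKSIZE = 1024  # hypothetical HDF5 chunksize, in elements

class ChunkedArray:
    """Toy stand-in for an on-disk chunked dataset: every element
    access 'reads' the whole chunk containing it, as HDF5 does."""
    def __init__(self, data, chunksize=CHUNKSIZE):
        self.data = data
        self.chunksize = chunksize
        self.chunks_read = set()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Chunk-granular I/O: record which chunk this probe touches.
        self.chunks_read.add(i // self.chunksize)
        return self.data[i]

def binary_search(arr, target):
    """Leftmost binary search, probing one element per iteration."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

index = ChunkedArray(np.arange(0, 2_000_000, 2))  # 1e6 sorted eids
pos = binary_search(index, 1_234_568)
# One lookup probes ~log2(1e6) ~= 20 scattered chunks, i.e. ~20
# separate chunk reads -- which is why sparse access to an on-disk
# index is the bottleneck, and why smaller chunks cost less per probe.
print(pos, len(index.chunks_read))
```

Loading the whole array into memory once (`index.data`) turns every later lookup into a pure in-memory search with no chunk I/O at all.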
I don't know, but if you are reading the index arrays a lot, then
avoiding compression completely would be a good move. Array objects
are indeed the simplest, and hence the fastest, objects to read.

> If not, then is there a way to determine the optimal chunkshape?
> FE uses unstructured grids that are not so trivial to
> model in HDF5. The solution I use is to store different elements as
> separate datasets within the same group, like so:
>
> /model/geom2/eid/type1 {17400}
> /model/geom2/eid/type2 {61/512}
> /model/geom2/eid/type3 {1567}
> etc.
>
> Associated data arrays have different shapes depending on element
> topology. Another thing that slows things down is that the
> /model/geom2/eid group has to be walked to binary-search each leaf.
> Maybe not optimal, but it is clean and easy to understand.

Determining the optimal chunksize depends largely on your problem. As
I said before, Array objects are good candidates to try (they are
'contiguous' datasets, in HDF5 parlance, instead of 'chunked' ones, so
you don't have to worry about setting chunksizes), and if they are
small enough and you read them frequently enough, they will be kept in
the HDF5 internal cache. This should boost your searches considerably,
I guess. Another possibility is to load your index arrays into
NumPy/numarray arrays and do the lookups completely on in-memory
objects. This should certainly be the fastest approach.

> > Unfortunately, you are not sending the chunksizes for the rank-1
> > index array, but most probably the chunksize for the 'old' files
> > must be rather small compared with the 'new' arrays.
>
> Yes. You certainly know your business. When I originally set up
> 'h5import' to do my conversions, I just used a
> CHUNKED-DIMENSION-SIZES parameter of 100, not knowing any better.
> PyTables 1.3 chunked the entire array. When I did a ptrepack
> --upgrade-flavors, the chunks went down to 1024 and the performance
> was again reasonable.
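The in-memory approach suggested above can be sketched in plain NumPy (the `eids` values and the `find_offset` helper are illustrative; in real code the array would be read once from the file, e.g. with something like `h5file.root.model.geom2.eid.type1[:]`):

```python
import numpy as np

# Hypothetical sorted element-ID index, read once into memory; all
# subsequent lookups are then pure in-memory binary searches.
eids = np.array([3, 8, 15, 21, 42, 99])

def find_offset(sorted_eids, target):
    """Return the row offset of `target` in a sorted rank-1 array,
    or -1 if it is not present."""
    pos = int(np.searchsorted(sorted_eids, target))
    if pos < len(sorted_eids) and sorted_eids[pos] == target:
        return pos
    return -1

print(find_offset(eids, 42))  # -> 4
print(find_offset(eids, 50))  # -> -1
```

Walking the /model/geom2/eid group then amounts to calling `find_offset` once per leaf's cached index, with no disk I/O per probe.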
Aha, that explains the performance pattern you were reporting.

> I'm pressing ahead with upgrading to 2.0. I'm seeing significant
> improvements, indicating that this is the right move. Fortunately, I
> only have the one database with the flavors set to 'numarray', which
> would definitely cause me problems, since a lot of my client scripts
> use numarray (or even Numeric) to manipulate arrays pulled from the
> HDF5 files.

You should know that PyTables 2.0 does support numarray/Numeric right
out of the box. The only thing to bear in mind is that, although NumPy
is used internally, by using the appropriate flavors you can continue
obtaining numarray/Numeric objects out of PyTables 2.0. So, if you
want to continue using numarray (which I do not recommend, at least
for the long run), just don't pass the '--upgrade-flavors' flag to
'ptrepack'.

> Bravo for amazing software and astonishing support!

Thanks! :)

--
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
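For reference, the two ptrepack invocations being contrasted look like this (file names are hypothetical; `source.h5:/ dest.h5:/` is the usual source/destination node syntax):

```shell
# Repack without touching flavors: stored 'numarray'/'numeric'
# flavors are kept, so client scripts keep getting numarray objects.
ptrepack model_old.h5:/ model_new.h5:/

# With the flag, flavors are rewritten to 'numpy' during the copy:
ptrepack --upgrade-flavors model_old.h5:/ model_new.h5:/
```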