Hello again,

This topic is of great interest to me as I have been attempting to tune 
the chunkshape parameter manually.

After our last exchange, I took your suggestions and made all my index 
searches in-memory to get max speed. What I found was initially very 
surprising, but on reflection started to make sense: I actually had a 
greater bottleneck due to how I organized my data vs. how it was being 
used. To wit, I had a multidimensional array with a shape like this:

{1020, 4, 15678, 3}

but I was reading it -- with PyTables -- like so:

>>> data = earrayObject[:,:,offset,:]

With small arrays like {20, 4, 15678, 3} it is not so noticeable, but with 
the combination of large arrays and the default chunkshape, a lot of time 
was being spent slicing the array. 
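
A rough sketch of why that slice is expensive, using nothing but arithmetic. 
It assumes that every chunk a selection intersects must be read in full, and 
it uses made-up chunkshapes (these are NOT the actual PyTables defaults, 
which are not stated here). The helper names are hypothetical as well.

```python
from math import ceil, prod

def chunks_touched(shape, chunkshape, selection):
    """Count the chunks a selection intersects.

    `selection` has one entry per axis: an int (a single index)
    or None (the whole axis, like ':').
    """
    counts = []
    for dim, chunk, sel in zip(shape, chunkshape, selection):
        counts.append(1 if sel is not None else ceil(dim / chunk))
    return prod(counts)

def read_amplification(shape, chunkshape, selection):
    """Elements fetched from disk divided by elements actually wanted."""
    wanted = prod(dim for dim, sel in zip(shape, selection) if sel is None)
    fetched = chunks_touched(shape, chunkshape, selection) * prod(chunkshape)
    return fetched / wanted

shape = (1020, 4, 15678, 3)             # original layout from the message
selection = (None, None, 0, None)       # earrayObject[:, :, offset, :]

# Hypothetical chunkshapes, differing only along the indexed axis.
wide = read_amplification(shape, (32, 4, 256, 3), selection)
thin = read_amplification(shape, (32, 4, 1, 3), selection)

print(f"chunk extent 256 along the indexed axis: {wide:.1f}x")
print(f"chunk extent   1 along the indexed axis: {thin:.1f}x")
```

Under these assumptions the overhead grows roughly in proportion to the chunk 
extent along the axis being indexed, which is consistent with the slowdown 
described above. The HDF5 chunk cache and compression are not modelled.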

With the switch to PyTables (from h5import), I was able to easily reorganize 
the data to be more efficient for how I was reading it, i.e.,

>>> earrayObject.shape
(15678L, 4L, 1020L, 3L)
>>> data = earrayObject[offset,:,:,:]

It seems to me then, that chunkshape could be selected to also give 
optimal, or near-optimal performance. My problem now is that as I make the 
chunks smaller, I get better read performance (which is the goal), but 
write performance (not done very often) has slowed way down. I suppose 
this makes sense, as smaller chunks imply more trips to the disk when 
writing the entire array.

So are there any guidelines to balance reading vs writing performance with 
chunkshape? Right now I'm just trying 'sensible' chunkshapes and seeing 
what the result is. Currently, I'm leaning toward something like (32, 4, 
256, 3). The truth is, only one row is ever read at a time, but the write 
time for (1, 4, 512, 3) is just too long. Is there an obvious flaw in my 
approach that I cannot see?
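
The trade-off between the two candidate chunkshapes can be put into numbers 
with a small sketch. It only counts chunks and bytes; it says nothing about 
actual timings, the chunk cache, or compression. The 8-byte element size is 
an assumption (the dtype is not given above), and `summarize` is a 
hypothetical helper.

```python
from math import ceil, prod

SHAPE = (15678, 4, 1020, 3)   # reorganized array from the message
ITEMSIZE = 8                  # ASSUMPTION: 8-byte elements (e.g. float64)

def summarize(chunkshape):
    """Chunk counts and sizes for reading one row: earrayObject[offset, :, :, :]."""
    per_axis = [ceil(dim / chunk) for dim, chunk in zip(SHAPE, chunkshape)]
    total_chunks = prod(per_axis)          # chunks written for the whole array
    chunks_per_row = prod(per_axis[1:])    # a single index along axis 0
    chunk_bytes = prod(chunkshape) * ITEMSIZE
    return {
        "total_chunks": total_chunks,
        "chunks_per_row_read": chunks_per_row,
        "chunk_bytes": chunk_bytes,
        "bytes_per_row_read": chunks_per_row * chunk_bytes,
    }

for candidate in [(1, 4, 512, 3), (32, 4, 256, 3)]:
    print(candidate, summarize(candidate))
```

The first candidate stores the array in many small chunks and fetches almost 
exactly one row per read; the second stores far fewer, larger chunks but 
pulls in 32 rows' worth of data for each row requested.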

Also, should I avoid ptrepack, or is there a switch that will preserve my 
carefully chosen chunkshapes? I have the same situation as Gabriel in that 
I don't know what the final number of rows in my EArray will be (it's 
now the third dimension that is the extensible axis), so I just take the 
default, expectedrows=1000.

With gratitude,

Elias Collas
Stress Methods Group
Gulfstream Aerospace Corp.


