My thoughts are:

- try it without any compression. Assuming 32-bit floats, each monthly 5760 x 2880 slab is only about 65 MB. Uncompressed data may perform well, and at the least it gives you a baseline to work from; it will also help if you end up investigating I/O tuning.
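
That size estimate is easy to check directly; this is just the arithmetic behind the ~65 MB figure:

```python
import numpy as np

# Uncompressed size of one monthly 5760 x 2880 float32 slab
# (4 bytes per value).
nbytes = 5760 * 2880 * np.dtype(np.float32).itemsize
print(nbytes)                     # 66355200 bytes
print(round(nbytes / 2**20, 1))   # ~63.3 MiB, i.e. roughly 65 MB
```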

- I have found with CArray that the automatic chunkshape works fairly well. Experiment with that chunkshape, and with some chunkshapes you think are more appropriate (perhaps temporal rather than spatial in your case).

On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <> wrote:

On 03.06.2013 14:43, Andreas Hilboll wrote:
> Hi,
> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> (the last dimension represents time, and once per month there'll be one
> more 5760x2880 array to add to the end).
> Now, extracting a time series at one index location is slow; e.g., for
> four indices, it takes several seconds:
> In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> Wall time: 7.17 s
> I have the feeling that this performance could be improved, but I'm not
> sure about how to properly use the `chunkshape` parameter in my case.
> Any help is greatly appreciated :)
> Cheers, Andreas.

PS: If I could get significant performance gains by not using an EArray
and instead re-creating the whole database each month, then this would
also be an option.

-- Andreas.

Pytables-users mailing list
