On 04.06.2013 05:35, Tim Burgess wrote:
> My thoughts are:
>
> - try it without any compression. Assuming 32 bit floats, your monthly
>   5760 x 2880 is only about 65 MB. Uncompressed data may perform well, and
>   at the least it will give you a baseline to work from - and will help if
>   you are investigating IO tuning.
>
> - I have found with CArray that the auto chunksize works fairly well.
>   Experiment with that chunksize and with some chunksizes that you think
>   are more appropriate (maybe temporal rather than spatial in your case).
>
> On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hilboll.de> wrote:
>
>> On 03.06.2013 14:43, Andreas Hilboll wrote:
>> > Hi,
>> >
>> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
>> > (the last dimension represents time, and once per month there'll be one
>> > more 5760 x 2880 array to add to the end).
>> >
>> > Now, extracting timeseries at one index location is slow; e.g., for four
>> > indices, it takes several seconds:
>> >
>> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>> >
>> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
>> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
>> > Wall time: 7.17 s
>> >
>> > I have the feeling that this performance could be improved, but I'm not
>> > sure about how to properly use the `chunkshape` parameter in my case.
>> >
>> > Any help is greatly appreciated :)
>> >
>> > Cheers, Andreas.
>>
>> PS: If I could get significant performance gains by not using an EArray
>> and therefore re-creating the whole database each month, then this would
>> also be an option.
>>
>> -- Andreas.
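For reference, a minimal sketch of the kind of setup discussed above, assuming a recent PyTables (3.x API names); the file name, node name, and the Blosc filter are illustrative choices, and chunkshape=(32, 32, 256) is just one candidate elongated along the time axis to favour per-pixel timeseries reads:

    import numpy as np
    import tables

    # Illustrative setup: only the array dimensions and the idea of an
    # explicit chunkshape come from the thread; everything else is assumed.
    h5 = tables.open_file("gridded.h5", mode="w")
    arr = h5.create_earray(
        h5.root, "data",
        atom=tables.Float32Atom(),
        shape=(5760, 2880, 0),             # last (time) axis is extendable
        chunkshape=(32, 32, 256),          # chunks elongated along time
        filters=tables.Filters(complevel=1, complib="blosc"),
    )

    # Append one monthly 5760 x 2880 slice (note the trailing time axis).
    arr.append(np.zeros((5760, 2880, 1), dtype=np.float32))

    # Extract timeseries at a few (i, j) grid points, as in the example above.
    idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
    AA = np.vstack([arr[i, j] for i, j in zip(*idx)])

    h5.close()

Elongating the chunks along time means a single-pixel timeseries touches only a few chunks instead of one chunk per time step, which is why such a chunkshape helps the read pattern described in the original post.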
Thanks a lot, Anthony and Tim!

I was able to reduce the readout time considerably by using chunkshape=(32, 32, 256)
for my 5760 x 2880 x 150 array. Reading times are now about as fast as I expected.
The downside is that building up the database now takes a lot of time, because I
receive the data in chunks of 5760 x 2880 x 1, so I guess that writing the data to
disk like this causes a load of IO operations ...

My new question: Is there a way to create a file in-memory? If so, I could build up
my database in memory and then, once it's done, just copy the arrays to an on-disk
file. Is that possible? If so, how?

Thanks a lot for your help!

-- Andreas.
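Regarding the in-memory question: assuming PyTables >= 3.0, the HDF5 CORE driver keeps the whole file in RAM, so the database can be assembled in memory and written to disk only once at the end. A rough sketch with placeholder file and node names (note that 5760 x 2880 x 150 float32 values already need roughly 10 GB of RAM):

    import numpy as np
    import tables

    # Build the file entirely in memory with the HDF5 CORE driver;
    # driver_core_backing_store=0 means nothing is written to disk while open.
    h5 = tables.open_file("inmem.h5", mode="w",
                          driver="H5FD_CORE",
                          driver_core_backing_store=0)
    arr = h5.create_earray(h5.root, "data",
                           atom=tables.Float32Atom(),
                           shape=(5760, 2880, 0),
                           chunkshape=(32, 32, 256))

    # Append the monthly 5760 x 2880 x 1 slices as they arrive.
    arr.append(np.zeros((5760, 2880, 1), dtype=np.float32))

    # Once the database is complete, write a copy of the whole file to disk.
    h5.copy_file("on_disk.h5", overwrite=True)
    h5.close()

Alternatively, opening the on-disk file name directly with driver="H5FD_CORE" and leaving driver_core_backing_store at its default of 1 keeps everything in memory while the file is open and flushes the image to disk on close, which avoids the explicit copy_file step.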