On 04.06.2013 05:35, Tim Burgess wrote:
> My thoughts are:
>
> - try it without any compression. Assuming 32 bit floats, your monthly
>   5760 x 2880 is only about 65 MB. Uncompressed data may perform well, and
>   at the least it will give you a baseline to work from - and will help if
>   you are investigating IO tuning.
>
> - I have found with CArray that the auto chunksize works fairly well.
>   Experiment with that chunksize and with some chunksizes that you think
>   are more appropriate (maybe temporal rather than spatial in your case).
>
> On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hilboll.de> wrote:
>
>> On 03.06.2013 14:43, Andreas Hilboll wrote:
>> > Hi,
>> >
>> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
>> > (the last dimension represents time, and once per month there'll be one
>> > more 5760 x 2880 array to add to the end).
>> >
>> > Now, extracting timeseries at one index location is slow; e.g., for four
>> > indices, it takes several seconds:
>> >
>> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>> >
>> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
>> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
>> > Wall time: 7.17 s
>> >
>> > I have the feeling that this performance could be improved, but I'm not
>> > sure about how to properly use the `chunkshape` parameter in my case.
>> >
>> > Any help is greatly appreciated :)
>> >
>> > Cheers, Andreas.
>>
>> PS: If I could get significant performance gains by not using an EArray
>> and therefore re-creating the whole database each month, then this would
>> also be an option.
>>
>> -- Andreas.
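For reference, a minimal sketch of the kind of setup discussed above, assuming a recent PyTables (3.x API names); the file name, node name, and the Blosc filter are illustrative choices, and chunkshape=(32, 32, 256) is just one candidate elongated along the time axis to favour per-pixel timeseries reads:

    import numpy as np
    import tables

    # Illustrative setup: only the array dimensions and the idea of an
    # explicit chunkshape come from the thread; everything else is assumed.
    h5 = tables.open_file("gridded.h5", mode="w")
    arr = h5.create_earray(
        h5.root, "data",
        atom=tables.Float32Atom(),
        shape=(5760, 2880, 0),             # last (time) axis is extendable
        chunkshape=(32, 32, 256),          # chunks elongated along time
        filters=tables.Filters(complevel=1, complib="blosc"),
    )

    # Append one monthly 5760 x 2880 slice (note the trailing time axis).
    arr.append(np.zeros((5760, 2880, 1), dtype=np.float32))

    # Extract timeseries at a few (i, j) grid points, as in the example above.
    idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
    AA = np.vstack([arr[i, j] for i, j in zip(*idx)])

    h5.close()

Elongating the chunks along time means a single-pixel timeseries touches only a few chunks instead of one chunk per time step, which is why such a chunkshape helps the read pattern described in the original post.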
Thanks a lot, Anthony and Tim!

I was able to reduce the readout time considerably by using chunkshape=(32, 32, 256)
for my 5760 x 2880 x 150 array. Reading times are now about as fast as I expected.
The downside is that building up the database now takes a lot of time, because I
receive the data in chunks of 5760 x 2880 x 1, so I guess that writing the data to
disk like this causes a load of IO operations ...

My new question: Is there a way to create a file in-memory? If so, I could build up
my database in memory and then, once it's done, just copy the arrays to an on-disk
file. Is that possible? If so, how?

Thanks a lot for your help!

-- Andreas.
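Regarding the in-memory question: assuming PyTables >= 3.0, the HDF5 CORE driver keeps the whole file in RAM, so the database can be assembled in memory and written to disk only once at the end. A rough sketch with placeholder file and node names (note that 5760 x 2880 x 150 float32 values already need roughly 10 GB of RAM):

    import numpy as np
    import tables

    # Build the file entirely in memory with the HDF5 CORE driver;
    # driver_core_backing_store=0 means nothing is written to disk while open.
    h5 = tables.open_file("inmem.h5", mode="w",
                          driver="H5FD_CORE",
                          driver_core_backing_store=0)
    arr = h5.create_earray(h5.root, "data",
                           atom=tables.Float32Atom(),
                           shape=(5760, 2880, 0),
                           chunkshape=(32, 32, 256))

    # Append the monthly 5760 x 2880 x 1 slices as they arrive.
    arr.append(np.zeros((5760, 2880, 1), dtype=np.float32))

    # Once the database is complete, write a copy of the whole file to disk.
    h5.copy_file("on_disk.h5", overwrite=True)
    h5.close()

Alternatively, opening the on-disk file name directly with driver="H5FD_CORE" and leaving driver_core_backing_store at its default of 1 keeps everything in memory while the file is open and flushes the image to disk on close, which avoids the explicit copy_file step.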