Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Antonio Valentino
Hi list, On 05/06/2013 00:38, Anthony Scopatz wrote: On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan serefari...@gmail.com wrote: I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well. So any experience/feedback about creating files

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Seref Arikan
You would be surprised to see how convenient HDF5 can be for small-scale data :) There are cases where one may need binary serialization of a few thousand items but still need metadata, indexing and other nice features provided by HDF5/PyTables. On Wed, Jun 5, 2013 at 2:29 AM, Tim

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 10:31, Andreas Hilboll wrote: On 05.06.2013 03:29, Tim Burgess wrote: I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an example based on what I was doing. I looked over the docs and it does mention that there is an option to throw away the 'file'

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Francesc Alted
On 6/5/13 11:45 AM, Andreas Hilboll wrote: On 05.06.2013 10:31, Andreas Hilboll wrote: On 05.06.2013 03:29, Tim Burgess wrote: I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an example based on what I was doing. I looked over the docs and it does mention that there

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Antonio and Tim! These are great. I think that one of these should definitely make it into the examples/ dir. Be Well Anthony On Wed, Jun 5, 2013 at 8:10 AM, Francesc Alted fal...@gmail.com wrote: On 6/5/13 11:45 AM, Andreas Hilboll wrote: On 05.06.2013 10:31, Andreas Hilboll wrote:

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Tim Burgess
On Jun 06, 2013, at 04:19 AM, Anthony Scopatz scop...@gmail.com wrote: Thanks Antonio and Tim! These are great. I think that one of these should definitely make it into the examples/ dir. Be Well Anthony. OK. I have put up a pull request with the code

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Andreas Hilboll
On 04.06.2013 05:35, Tim Burgess wrote: My thoughts are: - try it without any compression. Assuming 32 bit floats, your monthly 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at the least it will give you a baseline to work from - and will help if you are

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Seref Arikan
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well. So any experience/feedback about creating files in memory would be much appreciated. Best regards Seref On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll li...@hilboll.de wrote: On

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Tim Burgess
I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an example based on what I was doing. I looked over the docs and it does mention that there is an option to throw away the 'file' rather than write it to disk. Not sure how to do that and can't actually think of a use case
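Tim's question of how to "throw away the file" can be sketched as follows, assuming PyTables 3.0 or later: `tables.open_file` passes HDF5 driver settings as keyword arguments, and `driver="H5FD_CORE"` together with `driver_core_backing_store=0` keeps the whole file in memory and discards it on close. The file name and the `values` array are hypothetical. `get_file_image()` (also PyTables 3.0+) returns the in-memory file as bytes, in case you decide to keep it after all.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical name; with the backing store disabled it is never created.
path = os.path.join(tempfile.mkdtemp(), "scratch.h5")

# H5FD_CORE holds the whole HDF5 file in memory; backing_store=0 tells
# HDF5 to discard that image on close instead of writing it to `path`.
h5 = tables.open_file(path, "w", driver="H5FD_CORE",
                      driver_core_backing_store=0)
h5.create_array(h5.root, "values", np.arange(10, dtype=np.float32))

# Optional: serialize the in-memory file to bytes before closing.
image = h5.get_file_image()
h5.close()

print(len(image), os.path.exists(path))
```

To read such an image back, the same driver should accept `driver_core_image=image` in `open_file`, again with the backing store disabled.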

[Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Andreas Hilboll
Hi, I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray (the last dimension represents time, and once per month there'll be one more 5760x2880 array to add to the end). Now, extracting timeseries at one index location is slow; e.g., for four indices, it takes several seconds:
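Why chunk layout matters for these time-series reads can be estimated with a little arithmetic. HDF5 reads (and, for a compressed array, decompresses) whole chunks, so reading one pixel's series costs the number of chunks it spans times the chunk size. The sketch assumes 4-byte values and two hypothetical chunkshapes, neither claimed to be what PyTables actually picked for this EArray: one flat along time, (64, 64, 1), and one deep along time, (16, 16, 150).

```python
import math


def chunks_for_series(shape, chunkshape):
    """Chunks spanned by data[i, j, :] for one (i, j) location."""
    # One pixel sits in exactly one chunk along axes 0 and 1, so only
    # the time axis (axis 2) contributes more than one chunk.
    return math.ceil(shape[2] / chunkshape[2])


def bytes_touched(shape, chunkshape, itemsize=4):
    """Bytes HDF5 must read (and decompress) to return that one series."""
    per_chunk = math.prod(chunkshape) * itemsize
    return chunks_for_series(shape, chunkshape) * per_chunk


shape = (5760, 2880, 150)  # from the post
useful = shape[2] * 4      # bytes actually wanted (float32 assumed)
for cs in [(64, 64, 1), (16, 16, 150)]:  # hypothetical chunkshapes
    print(cs, chunks_for_series(shape, cs), "chunks,",
          bytes_touched(shape, cs), "bytes read for", useful, "wanted")
```

With the flat layout, one 600-byte series costs about 2.5 MB of chunk I/O (150 chunks of 16 KiB each); the deep layout needs a single 150 KiB chunk. Writing a monthly slice shows the opposite trade-off, so the right chunkshape depends on which access pattern dominates.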

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Andreas Hilboll
On 03.06.2013 14:43, Andreas Hilboll wrote: Hi, I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray (the last dimension represents time, and once per month there'll be one more 5760x2880 array to add to the end). Now, extracting timeseries at one index location is slow;

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Hi Andreas, First off, nothing should be this bad, but what is the data type of the array? Also, are you selecting chunksize manually or letting PyTables figure it out? Here are some things that you can try: 1. Query with fancy indexing, once. That is, rather than using a list
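The first suggestion, one fancy-indexed query instead of a loop over locations, looks like this on an in-memory NumPy array. The array size and the (row, col) locations are hypothetical stand-ins. This sketch only shows the NumPy semantics; it does not check which fancy-selection forms a PyTables EArray accepts directly.

```python
import numpy as np

# Small stand-in for the (5760, 2880, 150) float32 array in the thread.
rng = np.random.default_rng(0)
data = rng.random((60, 30, 15), dtype=np.float32)

# Hypothetical (row, col) locations whose time series we want.
rows = np.array([50, 6, 8, 9])
cols = np.array([10, 20, 5, 1])

# Loop: one selection per location, stacked afterwards.
looped = np.vstack([data[i, j] for i, j in zip(rows, cols)])

# Fancy indexing, once: the paired index arrays pick all locations in a
# single query; the trailing time axis comes along whole -> shape (4, 15).
fancy = data[rows, cols]

print(fancy.shape, np.array_equal(looped, fancy))
```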

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Tim Burgess
My thoughts are: - try it without any compression. Assuming 32 bit floats, your monthly 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at the least it will give you a baseline to work from - and will help if you are investigating IO tuning. - I have found with CArray that the
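The size estimate is easy to check. Assuming float32 (4 bytes per value), as Tim does, and the "~150" time steps from the original post:

```python
# Shape of one monthly slice from the thread; float32 (4 bytes) assumed.
nx, ny, itemsize = 5760, 2880, 4
n_months = 150  # "~150" time steps, per the original post

slice_bytes = nx * ny * itemsize
total_bytes = slice_bytes * n_months

print(f"one slice: {slice_bytes:,} bytes = {slice_bytes / 1e6:.1f} MB "
      f"= {slice_bytes / 2**20:.2f} MiB")
print(f"all months: {total_bytes / 2**30:.2f} GiB uncompressed")
```

One slice is 66,355,200 bytes (about 66 MB, or 63.3 MiB), in line with "about 65MB"; the full array is roughly 9.3 GiB uncompressed.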

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Oops! I forgot to mention CArray! On Mon, Jun 3, 2013 at 10:35 PM, Tim Burgess timburg...@mac.com wrote: My thoughts are: - try it without any compression. Assuming 32 bit floats, your monthly 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at the least it will give
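Since CArray came up: a minimal sketch, assuming PyTables 3.0 or later, of creating an uncompressed CArray with an explicit `chunkshape` chosen so that each pixel's whole time series lives in one chunk. The shape, chunkshape and file name are hypothetical and scaled down; whether such a chunkshape suits the real 5760 x 2880 x ~150 data is untested here. The in-memory `H5FD_CORE` driver is used only so the sketch leaves no file behind.

```python
import os
import tempfile

import numpy as np
import tables

# Scaled-down, hypothetical shape (the thread's data is 5760 x 2880 x ~150).
shape = (576, 288, 15)
# Hypothetical chunkshape: 16 x 16 pixels, each holding its full time series.
chunkshape = (16, 16, 15)

# In-memory file so the sketch leaves nothing on disk.
path = os.path.join(tempfile.mkdtemp(), "carray_demo.h5")
h5 = tables.open_file(path, "w", driver="H5FD_CORE",
                      driver_core_backing_store=0)

# No `filters` argument, so the CArray is stored uncompressed.
data = h5.create_carray(h5.root, "data", atom=tables.Float32Atom(),
                        shape=shape, chunkshape=chunkshape)
data[:, :, 0] = np.ones(shape[:2], dtype=np.float32)  # fill one time step

series = data[100, 200, :]  # one pixel's time series, from a single chunk
cs = data.chunkshape        # read it while the file is still open
h5.close()

print(cs, series)
```

The trade-off is on the write side: with every chunk spanning all time steps of its tile, writing one time step touches every chunk, so appending a new month gets more expensive as time-series reads get cheaper. `create_earray` takes the same `chunkshape` argument if the array has to stay extendable.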

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Tim Burgess
and for the record... yes, it should be much faster than 4 seconds:

    import time
    import numpy as np

    foo = np.empty([5760, 2880, 150], dtype=np.float32)
    idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
    t0 = time.time()
    bar = np.vstack([foo[i, j] for i, j in zip(*idx)])
    t1 = time.time()
    print(t1 - t0)

0.000144004821777

On Jun 03, 2013,