Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Tim! You are the best. Hopefully I will get to this later tonight. Be Well Anthony On Wed, Jun 5, 2013 at 9:20 PM, Tim Burgess wrote: > > > On Jun 06, 2013, at 04:19 AM, Anthony Scopatz wrote: > > Thanks Antonio and Tim! > > These are great. I think that one of these should definitel

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Tim Burgess
On Jun 06, 2013, at 04:19 AM, Anthony Scopatz wrote:
> Thanks Antonio and Tim! These are great. I think that one of these should definitely make it into the examples/ dir. Be Well Anthony

OK. I have put up a pull request with the code added. https://github.com/PyTables/PyTables/pull/266

Cheers, Tim

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Antonio and Tim! These are great. I think that one of these should definitely make it into the examples/ dir. Be Well Anthony On Wed, Jun 5, 2013 at 8:10 AM, Francesc Alted wrote: > On 6/5/13 11:45 AM, Andreas Hilboll wrote: > > On 05.06.2013 10:31, Andreas Hilboll wrote: > >> On 05.06

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Francesc Alted
On 6/5/13 11:45 AM, Andreas Hilboll wrote: > On 05.06.2013 10:31, Andreas Hilboll wrote: >> On 05.06.2013 03:29, Tim Burgess wrote: >>> I was playing around with in-memory HDF5 prior to the 3.0 release. >>> Here's an example based on what I was doing. >>> I looked over the docs and it does mention

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 10:31, Andreas Hilboll wrote: > On 05.06.2013 03:29, Tim Burgess wrote: >> I was playing around with in-memory HDF5 prior to the 3.0 release. >> Here's an example based on what I was doing. >> I looked over the docs and it does mention that there is an option to >> throw away the 'fil

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 09:15, Seref Arikan wrote: > You would be surprised to see how convenient HDF5 can be in small scale > data :) There are cases where one may need to use binary serialization > of a few thousand items, but still needing metadata, indexing and other > nice features provided by HDF5/PyTab

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 03:29, Tim Burgess wrote: > I was playing around with in-memory HDF5 prior to the 3.0 release. > Here's an example based on what I was doing. > I looked over the docs and it does mention that there is an option to > throw away the 'file' rather than write it to disk. > Not sure how to

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Seref Arikan
You would be surprised to see how convenient HDF5 can be in small scale data :) There are cases where one may need to use binary serialization of a few thousand items, but still needing metadata, indexing and other nice features provided by HDF5/PyTables. On Wed, Jun 5, 2013 at 2:29 AM, Tim Burg
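Seref's use case (binary serialization of a few thousand items that still carries metadata) maps onto the in-memory file support that arrived with PyTables 3.0. A minimal sketch, assuming PyTables >= 3.0 built against HDF5 >= 1.8.9 (required for `File.get_file_image()`); the helper names `serialize`/`deserialize`, the node name `values` and the `units` attribute are hypothetical. The file never leaves RAM, the metadata travels as an HDF5 attribute, and the result is a plain `bytes` object holding a complete HDF5 file.

```python
import numpy as np
import tables


def serialize(values, units):
    """Hypothetical helper: pack an array plus metadata into HDF5 file-image bytes."""
    # H5FD_CORE keeps the file in memory; backing store 0 means it is never written to disk,
    # so the file name below is only a label.
    with tables.open_file("scratch.h5", "w", driver="H5FD_CORE",
                          driver_core_backing_store=0) as h5:
        arr = h5.create_array(h5.root, "values", values)
        arr.attrs.units = units          # metadata stored alongside the data
        h5.flush()
        return h5.get_file_image()       # raw bytes of the whole HDF5 file


def deserialize(image):
    """Hypothetical helper: reopen file-image bytes and return (array, units)."""
    with tables.open_file("image.h5", "r", driver="H5FD_CORE",
                          driver_core_image=image,
                          driver_core_backing_store=0) as h5:
        return h5.root.values.read(), h5.root.values.attrs.units


# Usage: round-trip a few thousand values and their units.
blob = serialize(np.arange(3000, dtype=np.float32), "mm")
values, units = deserialize(blob)
print(len(blob), values.shape, units)
```

Because the image is an ordinary HDF5 file, writing `blob` to disk with a normal `open(..., "wb")` should yield a file any HDF5 tool can read.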

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Antonio Valentino
Hi Tim, On 05/06/2013 03:29, Tim Burgess wrote: > I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an > example based on what I was doing. > I looked over the docs and it does mention that there is an option to throw > away > the 'file' rather than write it to disk.

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Antonio Valentino
Hi list, On 05/06/2013 00:38, Anthony Scopatz wrote: > On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan wrote: > >> I think I've seen this in the release notes of 3.0. This is actually >> something that I'm looking into as well. So any experience/feedback about >> creating files in memory would

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Tim Burgess
I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an example based on what I was doing. I looked over the docs and it does mention that there is an option to throw away the 'file' rather than write it to disk. Not sure how to do that and can't actually think of a use case wher
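Throwing the "file" away is what the core driver's backing-store switch controls. A minimal sketch, assuming PyTables >= 3.0: `driver="H5FD_CORE"` keeps the whole HDF5 file in RAM, and `driver_core_backing_store=0` tells HDF5 not to write it out when the file is closed. The temporary path and the node name `grid` are hypothetical; with the backing store disabled the path only labels the in-memory file. One plausible use is scratch storage for intermediate results that should never touch the disk.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical path: with no backing store it only names the in-memory file.
path = os.path.join(tempfile.mkdtemp(), "scratch.h5")

# H5FD_CORE holds the file in RAM; driver_core_backing_store=0 discards the
# contents on close instead of writing them to `path`.
h5 = tables.open_file(path, "w", driver="H5FD_CORE",
                      driver_core_backing_store=0)
grid = h5.create_carray(h5.root, "grid", atom=tables.Float32Atom(),
                        shape=(100, 50))
grid[:] = np.arange(5000, dtype=np.float32).reshape(100, 50)
first_column_sum = float(grid[:, 0].sum())   # use the node like any on-disk one
h5.close()                                   # the in-memory file is dropped here

print("written to disk:", os.path.exists(path))
```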

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Anthony Scopatz
On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan wrote: > I think I've seen this in the release notes of 3.0. This is actually > something that I'm looking into as well. So any experience/feedback about > creating files in memory would be much appreciated. > I think that you want to set parameters.

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Seref Arikan
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well. So any experience/feedback about creating files in memory would be much appreciated. Best regards Seref On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll wrote: > On 04.06.2013 05:35, T

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Andreas Hilboll
On 04.06.2013 05:35, Tim Burgess wrote: > My thoughts are: > > - try it without any compression. Assuming 32 bit floats, your monthly > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and > at the least it will give you a baseline to work from - and will help if > you are invest

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Tim Burgess
And for the record... yes, it should be much faster than 4 seconds.

>>> import numpy as np
>>> foo = np.empty([5760, 2880, 150], dtype=np.float32)
>>> idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>>> import time
>>> t0 = time.time(); bar = np.vstack([foo[i, j] for i, j in zip(*idx)]); t1 = time.time(); print(t1 - t0)
0.000144004821777

On Jun

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Oops! I forgot to mention CArray! On Mon, Jun 3, 2013 at 10:35 PM, Tim Burgess wrote: > My thoughts are: > > - try it without any compression. Assuming 32 bit floats, your monthly > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at > the least it will give you a baselin
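The same cube can also live in a CArray, and the two knobs raised in the thread (compression on or off, automatic or manual chunk size) are both arguments to `create_carray`. A minimal sketch at toy scale, assuming PyTables >= 3.0; the dimensions and the `(10, 10, NT)` chunk shape are illustrative assumptions, not a recommendation made in the thread. The idea being tested: when every read is a full time series at a fixed (x, y), a chunk that spans the whole time axis lets one series come from a single chunk.

```python
import os
import tempfile

import numpy as np
import tables

# Toy stand-ins for the 5760 x 2880 x ~150 cube discussed in the thread.
NX, NY, NT = 60, 30, 12
path = os.path.join(tempfile.mkdtemp(), "chunks.h5")
data = np.arange(NX * NY * NT, dtype=np.float32).reshape(NX, NY, NT)

with tables.open_file(path, "w") as h5:
    # No filters given, so the data is stored uncompressed.
    auto = h5.create_carray(h5.root, "auto", atom=tables.Float32Atom(),
                            shape=(NX, NY, NT))          # PyTables picks the chunkshape
    # Hypothetical time-series-oriented layout: 10 x 10 spatial tile, full time axis.
    ts = h5.create_carray(h5.root, "ts", atom=tables.Float32Atom(),
                          shape=(NX, NY, NT), chunkshape=(10, 10, NT))
    auto[:] = data
    ts[:] = data
    auto_chunks, manual_chunks = auto.chunkshape, ts.chunkshape
    series = ts[5, 7, :]     # one pixel's series, served from a single chunk

print("automatic chunkshape:", auto_chunks)
print("manual chunkshape:   ", manual_chunks)
```

Note that a CArray has a fixed shape, so the full time extent must be known when it is created; an EArray is the enlargeable alternative for monthly appends.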

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Tim Burgess
My thoughts are:

- try it without any compression. Assuming 32-bit floats, your monthly 5760 x 2880 is only about 65 MB. Uncompressed data may perform well and at the least it will give you a baseline to work from, and will help if you are investigating IO tuning.
- I have found with CArray that the
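Tim's size estimate is easy to check. A short worked calculation, assuming float32 values and taking Andreas's "~150" months as exactly 150:

```python
import numpy as np

nx, ny, months = 5760, 2880, 150                  # grid size and ~150 monthly slices
bytes_per_value = np.dtype(np.float32).itemsize   # 32-bit float = 4 bytes

monthly_bytes = nx * ny * bytes_per_value         # one 5760 x 2880 slice
total_bytes = monthly_bytes * months              # the whole (x, y, time) cube
series_bytes = months * bytes_per_value           # one pixel's time series

print(f"monthly slice: {monthly_bytes / 1e6:.1f} MB ({monthly_bytes / 2**20:.1f} MiB)")
print(f"full cube:     {total_bytes / 1e9:.2f} GB")
print(f"one series:    {series_bytes} bytes")
```

This prints 66.4 MB (63.3 MiB) per monthly slice, 9.95 GB for the full cube, and 600 bytes per time series. Since each requested series is so small, how those 600 bytes are spread across chunks is likely to matter much more than the total volume.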

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Hi Andreas, First off, nothing should be this bad, but what is the data type of the array? Also, are you selecting chunksize manually or letting PyTables figure it out? Here are some things that you can try: 1. Query with fancy indexing, once. That is, rather than using a list comprehensi
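The "query with fancy indexing, once" suggestion can be illustrated with an in-memory NumPy array: two paired index lists select all the requested (x, y) series in a single operation instead of one lookup per location. A minimal sketch; the cube and index values are hypothetical. PyTables' fancy indexing on disk-based arrays is more restricted than NumPy's, so check its documentation before relying on this exact two-list form there.

```python
import numpy as np

# Small stand-in for the (x, y, time) cube discussed in the thread.
cube = np.arange(8 * 6 * 5, dtype=np.float32).reshape(8, 6, 5)
rows = [7, 0, 3, 5]      # hypothetical x indices
cols = [1, 4, 2, 0]      # hypothetical y indices

# One lookup per location, stacked afterwards (the pattern being replaced).
looped = np.vstack([cube[i, j] for i, j in zip(rows, cols)])

# A single fancy-indexing query: paired index lists pick out all series at once.
fancy = cube[rows, cols]  # shape (4, 5): one row per (x, y) pair

print(fancy.shape, np.array_equal(looped, fancy))
```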

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Andreas Hilboll
On 03.06.2013 14:43, Andreas Hilboll wrote: > Hi, > > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray > (the last dimension represents time, and once per month there'll be one > more 5760x2880 array to add to the end). > > Now, extracting timeseries at one index location is

[Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Andreas Hilboll
Hi, I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray (the last dimension represents time, and once per month there'll be one more 5760x2880 array to add to the end). Now, extracting timeseries at one index location is slow; e.g., for four indices, it takes several seconds:
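The setup described here can be reproduced at toy scale for experimenting. A minimal sketch, assuming PyTables >= 3.0; the grid size, the zlib level-5 filter and the sample points are hypothetical (the message only says the EArray is compressed). A zero-length last dimension makes time the enlargeable axis, each month is appended as one slice, and the time series are then read one location at a time, which is the access pattern reported as slow.

```python
import os
import tempfile

import numpy as np
import tables

NX, NY = 40, 20                      # toy stand-ins for 5760 x 2880
path = os.path.join(tempfile.mkdtemp(), "monthly.h5")

with tables.open_file(path, "w") as h5:
    # The 0-length last dimension marks time as the enlargeable axis.
    earr = h5.create_earray(h5.root, "data", atom=tables.Float32Atom(),
                            shape=(NX, NY, 0),
                            filters=tables.Filters(complevel=5, complib="zlib"))
    for month in range(3):
        grid = np.full((NX, NY), month, dtype=np.float32)
        # append() needs the new slice to carry the time axis, hence np.newaxis.
        earr.append(grid[:, :, np.newaxis])
    n_months = earr.nrows            # length of the enlargeable (time) dimension
    # Time series at four (x, y) locations, one read per location.
    points = [(5, 3), (10, 19), (0, 0), (39, 7)]
    series = np.vstack([earr[i, j, :] for i, j in points])

print(n_months, series.shape)
```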