Re: [Pytables-users] Chunk selection for optimized data access

Andreas Hilboll Mon, 03 Jun 2013 05:47:29 -0700

On 03.06.2013 14:43, Andreas Hilboll wrote:
> Hi,
> 
> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> (the last dimension represents time, and once per month there'll be one
> more 5760x2880 array to add to the end).
> 
> Now, extracting the time series at a single index location is slow; e.g.,
> for four locations, it takes several seconds:
> 
>    In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> 
>    In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
>    CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
>    Wall time: 7.17 s
> 
> I have the feeling that this performance could be improved, but I'm not
> sure how to use the `chunkshape` parameter properly in my case.
> 
> Any help is greatly appreciated :)
> 
> Cheers, Andreas.


PS: If I could get significant performance gains by using a fixed-size
array instead of an EArray, and re-creating the whole database each month
when new data arrives, then this would also be an option.
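
To make the chunkshape question more concrete, here is a rough
back-of-the-envelope sketch in plain Python (no PyTables calls). It
estimates how many chunks, and how many uncompressed bytes, have to be
decoded to read the full time series at one (i, j) location. The two
candidate chunk shapes and the 4-byte item size are assumptions for
illustration only; I have not checked which chunkshape my file actually
uses.

```python
from math import ceil, prod


def chunks_for_timeseries(shape, chunkshape):
    """Number of chunks touched when reading arr[i, j, :] for one (i, j)."""
    n_time = shape[2]
    return ceil(n_time / chunkshape[2])


def bytes_decompressed(shape, chunkshape, itemsize=4):
    """Uncompressed bytes that must be decoded to serve one arr[i, j, :] read.

    itemsize=4 assumes a 32-bit dtype; this is a guess, not a known fact
    about the dataset.
    """
    return chunks_for_timeseries(shape, chunkshape) * prod(chunkshape) * itemsize


shape = (5760, 2880, 150)

# Hypothetical chunk shapes, chosen only to contrast the two extremes.
candidates = {
    "one time step per chunk, 128x128 tile": (128, 128, 1),
    "full time axis per chunk, 16x16 tile": (16, 16, 150),
}

for label, chunkshape in candidates.items():
    n_chunks = chunks_for_timeseries(shape, chunkshape)
    n_bytes = bytes_decompressed(shape, chunkshape)
    print(f"{label}: {n_chunks} chunks, {n_bytes} bytes per time series")
```

Under these assumptions, chunks that are long along the time axis would
need far fewer chunk reads per time series than chunks holding a single
time step, at the cost of slower access to a full 5760x2880 slice.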

-- Andreas.


