Re: [Pytables-users] Performance issues when writing a large number of arrays

David Fokkema Sat, 26 Sep 2009 09:50:32 -0700

Hi Abiel,

On Fri, 2009-09-25 at 23:07 -0400, Abiel Reinhart wrote:
> I am attempting to store a large number of moderately-sized
> variable-length numpy arrays in a PyTables database, where each array
> can be referred to by a string key. Looking through the mailing list
> archives, it seems that one possible solution to this problem is to
> simply create a large number of Array objects.


<snip>

Another solution is to create a single VLArray (variable-length array)
and append each array to it as a row, like this:
>>> import tables
>>> import numpy as np
>>> h5f = tables.openFile('test.h5', 'w')
>>> h5f.createVLArray('/', 'test', tables.Int32Atom())
/test (VLArray(0,)) ''
  atom = Int32Atom(shape=(), dflt=0)
  byteorder = 'little'
  nrows = 0
  flavor = 'numpy'
>>> for i in range(10000):
...    a1 = np.arange(np.random.randint(1000, 10000))
...    h5f.root.test.append(a1)

On my puny Eee PC (Atom, 1.6 GHz variable; Linux; Python 2.6) it runs in
roughly 7 seconds, even though the arrays are created anew on every pass
through the loop and vary in size. So it is already faster than your
test. You can reference a particular array with h5f.root.test[idx].
VLArray rows are addressed by integer position, but idx can of course be
textual as long as it converts to an integer: if idx is '1233', use
h5f.root.test[int(idx)].

At least, this was suggested by Francesc when I brought up my own
problem on this list.
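
If the keys are arbitrary strings rather than numbers in disguise, one
possible approach is to keep a small mapping from key to row number next
to the VLArray. What follows is only a rough sketch of that idea, not
something from Francesc's suggestion: the class name KeyedRows and its
attribute row_of are made up for illustration, and a plain Python list
stands in for h5f.root.test so the snippet runs without PyTables. It
assumes the store starts empty and is appended to only through the
wrapper.

```python
class KeyedRows:
    """Map string keys to row numbers of an append-only, integer-indexed store."""

    def __init__(self, store):
        # Assumption: `store` starts empty and is only appended to through
        # this wrapper, so rows are numbered 0, 1, 2, ... in insertion order.
        self.store = store
        self.row_of = {}

    def append(self, key, values):
        # Refuse duplicate keys so that each key names exactly one row.
        if key in self.row_of:
            raise KeyError("duplicate key: %r" % (key,))
        self.store.append(values)
        # The new row's number equals the count of rows stored before it.
        self.row_of[key] = len(self.row_of)

    def __getitem__(self, key):
        # Translate the string key to a row number, then index the store.
        return self.store[self.row_of[key]]


# A plain list stands in for the VLArray here (hypothetical example keys).
rows = KeyedRows([])
rows.append("gdp", [1, 2, 3])
rows.append("cpi", [4, 5])
print(rows["cpi"])
```

With PyTables, the freshly created VLArray would be passed in place of
the list, since it also offers append() and integer indexing. Note that
the mapping in this sketch lives only in memory; storing it in the file
alongside the data is left out here.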

Good luck,

David

