I have a PyTables 2.0.4 VLArray called "y", with about 6500 rows, each
holding about 8500 atoms of shape (36,). The following line takes about
20 minutes to run:
i = sum(len(yi) for yi in y)

Question 1: Can I somehow access the length of a VLArray row without having to 
read the entire row?
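
(The only workaround I can think of is to record the row lengths myself
at write time, in a small side array next to y -- an untested sketch,
where the "ylen" node name is made up:

import tables
f = tables.openFile('vlarraytest.h5', 'a')
ylen = f.createEArray('/ap/ph', 'ylen', tables.Int64Atom(), (0,))
# ...then, every time a row is appended to y:
#     y.append(row)
#     ylen.append([len(row)])
# once ylen is filled, the total atom count needs no data reads at all:
i = sum(ylen[:])
f.close()

But that doesn't help with the file I already have.)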

Question 2: Further on, I only need to work with the last 20% or so of 
each row. Is there an efficient way to slice a row without having to 
load all of it from disk? Something like:

for i in range(len(y)):
    yj = y[i][-2000:]  # ideally without reading the first ~6500 atoms
    ...
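
If there is no direct way, I suppose I could convert the data to a plain
EArray of shape (0, 36) plus an index of row boundaries, so that any
slice of any row can be read directly. A rough, untested sketch (the
'converted.h5', 'ydata' and 'yoffsets' names are made up):

import numpy, tables
fin = tables.openFile('vlarraytest.h5', 'r')
fout = tables.openFile('converted.h5', 'w')
y = fin.root.ap.ph.y
data = fout.createEArray(fout.root, 'ydata', tables.Float64Atom(), (0, 36))
offsets = [0]
for row in y:                        # one-time full read of the old file
    data.append(row)
    offsets.append(offsets[-1] + len(row))
fout.createArray(fout.root, 'yoffsets', numpy.array(offsets))
# afterwards, the last 2000 atoms of row i are just:
#     end = fout.root.yoffsets[i + 1]
#     yj = fout.root.ydata[end - 2000:end]

But I'd rather avoid rewriting 17 GB if VLArray can do this itself.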

Thanks in advance for any tips.

Regards,
Jon Olav


Background:

If y were a NumPy array in memory, the summing would be fast, because 
each array object remembers its shape. For the VLArray in the HDF5 file, 
I realize now that all the data has to be read just to compute the total 
number of atoms. That's roughly 6500 * 8500 * 36 * 8 bytes = 16 GB 
(which works out to about 13 MB/s over the 20 minutes).
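
As a sanity check on those numbers, using the actual figures from the
session below (and assuming all rows are about as long as the first
one):

    6561 rows * 8977 atoms * 36 floats * 8 bytes = 16962651936 bytes

which is within a few percent of the 17377780785-byte file.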

From the timing below (and from watching "top" for ages), I see that the 
sum(len(yi) for yi in y) spent almost all its time _waiting_ for disk 
access (process status 'D' = uninterruptible sleep, which the support 
staff tell me means waiting for disk).

In [17]: time i = sum(len(yi) for yi in y)
CPU times: user 39.93 s, sys: 16.24 s, total: 56.16 s
Wall time: 1192.63

In [18]: y
Out[18]:
/ap/ph/y (VLArray(6561L,)) 'State vector'
  atom = Float64Atom(shape=(36L,), dflt=0.0)
  byteorder = 'little'
  nrows = 6561
  flavor = 'numpy'

In [20]: len(y[0])
Out[20]: 8977

In [23]: ls -l vlarraytest.h5
-rw-r--r-- 1 jonvi users 17377780785 ...


