I have a PyTables 2.0.4 VLArray, called "y", with about 6500 rows of about 8500 atoms of shape (36,). The following line takes about 20 minutes to run:

    i = sum(len(yi) for yi in y)

Question 1: Can I somehow access the length of a VLArray row without having to read the entire row?

Question 2: Further on I only need to work with the last 20% or so of each row. Is there an efficient way to slice from a row without having to load it all from disk?

    for i in range(len(y)):
        yj = y[i][-2000:]  # not having to read y[i][:6500]
        ...

Thanks in advance for any tips.

Regards,
Jon Olav

Background: If y were a NumPy array in memory, the summing would be fast, because each array object remembers its shape. For the VLArray in the HDF5 file, I realize now that I need to read all the data to compute the total number of atoms. That's 6500 * 8500 * 36 * 8 bytes, or about 16 GB (meaning about 13 MB/s for 20 minutes).

From the timing below (and watching "top" for ages), I see that the (len(yi) for yi in y) spent almost all its time _waiting_ for disk access (status 'D' = uninterruptible sleep, but the support staff tell me it means waiting for disk).

    In [17]: time i = sum(len(yi) for yi in y)
    CPU times: user 39.93 s, sys: 16.24 s, total: 56.16 s
    Wall time: 1192.63

    In [18]: y
    Out[18]:
    /ap/ph/y (VLArray(6561L,)) 'State vector'
      atom = Float64Atom(shape=(36L,), dflt=0.0)
      byteorder = 'little'
      nrows = 6561
      flavor = 'numpy'

    In [20]: len(y[0])
    Out[20]: 8977

    In [23]: ls -l vlarraytest.h5
    -rw-r--r-- 1 jonvi users 17377780785 ...
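
As a sanity check on the estimate in the Background note, here is a minimal sketch of the arithmetic. It assumes the rounded figures from the post (6500 rows, 8500 atoms per row, 36 float64 values per atom) and the wall time IPython reported. The variable names are only for illustration, and the real file has 6561 rows of varying length, so the result is approximate.

```python
# Rough check of the data volume and read rate quoted in the Background note.
# All inputs are the rounded figures from the post, not measured from the file.
n_rows = 6500
atoms_per_row = 8500
values_per_atom = 36      # atom shape is (36,)
bytes_per_value = 8       # Float64Atom

total_bytes = n_rows * atoms_per_row * values_per_atom * bytes_per_value
wall_seconds = 1192.63    # "Wall time" from the IPython session

total_gb = total_bytes / 1e9
rate_mb_per_s = total_bytes / 1e6 / wall_seconds

print("total data: %.1f GB" % total_gb)
print("read rate:  %.1f MB/s" % rate_mb_per_s)
```

The total comes out just under 16 GB, which is in line with the 17377780785-byte file shown by ls, and the rate is a little over 13 MB/s.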