2011/11/10 Thibault North <tno...@fedoraproject.org>:
[clip]
> Thanks for your detailed reply.
>
> I am now using an EArray, as you suggested, with:
>   myearray = h5file.createEArray(h5file.root, 'steps', atom, (L, K, 0), "desc")
> (The number of iterations N is unknown but usually small; L and K are known.)
>
> If I understand correctly, I need to append an initially empty (L, K) matrix
> for each iteration:
>   myearray.append(np.zeros((L, K)).reshape(L, K, 1))
>
> and then fill it by iterating on the K dimension:
>   for line in mymatrix:
>       myearray[line, :, iteration] = mydatacol
>
> This actually works, but:
> - the append() requires loading an array of size (L, K) into memory
> - I get a warning (with L=2e4+1 and K=1e3):
>   /usr/lib64/python2.7/site-packages/tables/leaf.py:416: PerformanceWarning: The
>   Leaf ``/steps`` is exceeding the maximum recommended rowsize (104857600
>   bytes);
>   be ready to see PyTables asking for *lots* of memory and possibly slow
>   I/O. You may want to reduce the rowsize by trimming the value of
>   dimensions that are orthogonal (and preferably close) to the *main*
>   dimension of this leave. Alternatively, in case you have specified a
>   very small/large chunksize, you may want to increase/decrease it.
>   PerformanceWarning)
> Is that just a matter of choosing the right position for the extendable
> dimension?

Uh, for such large L and K (about 20e6 elements, ~160 MB if atom = double),
your rowsize clearly exceeds the PyTables recommendation. As I see it, you
have two options here:

1) Split your data into several CArrays, one per iteration (that is, one for
   each entry in range(N)), and define the shape of every array as (L, K).
   You can use a group to tie them together. This will allow you to
   write/read your data to/from disk efficiently.

2) If, for convenience, you prefer to stick with a monolithic EArray, my
   advice is to increase the internal I/O buffers by tweaking the
   IO_BUFFER_SIZE and BUFFER_TIMES parameters. See:

   http://pytables.readthedocs.org/en/latest/usersguide/parameter_files.html#parameters-for-the-i-o-buffer-in-leaf-objects

> I quickly checked the docs, but couldn't find a way to "fill [...] parts of it
> [the array] easily via indexing". Append() seems to be required before...

Yes. Append is the recommended way to enlarge an EArray. Then you can use
slice assignment if you want to modify parts of the array, e.g.:

  steps[:,:,n] = np.arange(L*K).reshape(L,K)

-- 
Francesc Alted
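
For reference, the row size behind the warning can be checked by hand. The snippet below is only a rough sketch: it assumes the atom is an 8-byte double (as supposed above) and that the row size is the element size times every dimension except the extendable one. The helper rowsize_bytes is made up for this illustration and is not a PyTables function.

```python
# Back-of-the-envelope row size for an EArray of shape (L, K, 0).
# Assumption: 8-byte elements (float64); rowsize_bytes is a hypothetical
# helper written for this note, not part of PyTables.
L = int(2e4) + 1          # 20001, from the quoted message
K = int(1e3)              # 1000, from the quoted message
ITEMSIZE = 8              # bytes per element (assumed double)
MAX_ROWSIZE = 104857600   # limit quoted in the PerformanceWarning


def rowsize_bytes(shape, maindim, itemsize):
    """Element size times every dimension except the extendable one."""
    size = itemsize
    for axis, dim in enumerate(shape):
        if axis != maindim:
            size *= dim
    return size


# Extendable dimension last, as in the quoted createEArray call.
last = rowsize_bytes((L, K, 0), maindim=2, itemsize=ITEMSIZE)
# Extendable dimension first: under this assumption the product is unchanged.
first = rowsize_bytes((0, L, K), maindim=0, itemsize=ITEMSIZE)

print(last, last > MAX_ROWSIZE)
print(first == last)
```

Under these assumptions, moving the extendable dimension does not shrink the row, since the same L*K elements make up each one. Only trimming L or K, or splitting the data per iteration, would.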