On Fri, Nov 11, 2011 at 4:25 AM, Francesc Alted <fal...@pytables.org> wrote:
> 2011/11/10 Thibault North <tno...@fedoraproject.org>:
> [clip]
>> Thanks for your detailed reply.
>>
>> I am now using an EArray, as you suggested, with:
>>
>>     myearray = h5file.createEArray(h5file.root, 'steps', atom, (L, K, 0), "desc")
>>
>> (The number of iterations N is unknown but usually small; L and K are known.)
>>
>> If I understand correctly, I need to append an initially empty (L, K) matrix
>> for each iteration:
>>
>>     myearray.append(np.zeros((L, K)).reshape(L, K, 1))
>>
>> and then fill it by iterating on the K dimension:
>>
>>     for line in mymatrix:
>>         myearray[line, :, iteration] = mydatacol
>>
>> This actually works, but:
>> - the append() requires loading an array of size (L, K) into memory
>> - I get a warning (with L=2e4+1 and K=1e3):
>>
>>     /usr/lib64/python2.7/site-packages/tables/leaf.py:416: PerformanceWarning: The
>>     Leaf ``/steps`` is exceeding the maximum recommended rowsize (104857600 bytes);
>>     be ready to see PyTables asking for *lots* of memory and possibly slow
>>     I/O.  You may want to reduce the rowsize by trimming the value of
>>     dimensions that are orthogonal (and preferably close) to the *main*
>>     dimension of this leave.  Alternatively, in case you have specified a
>>     very small/large chunksize, you may want to increase/decrease it.
>>     PerformanceWarning)
>>
>> Is that just a matter of choosing the right position for the extendable
>> dimension?
>
> Uh, for such a large L and K (20e6 elements ~ 180 MB if atom = double)
> your rowsize clearly exceeds the recommendation of PyTables.  As I see
> it, you have two options here:
>
> 1) Split your data into several CArrays, one per each range(N) entry
> (you can use a group to tie them together), and define the shape of
> every array as (L, K).  This will allow you to efficiently write/read
> your data to/from disk.
>
> 2) If, for convenience reasons, you prefer to stick with a monolithic
> EArray, my advice is to increase the internal buffers for I/O by
> tweaking the params IO_BUFFER_SIZE and BUFFER_TIMES.  See:
> http://pytables.readthedocs.org/en/latest/usersguide/parameter_files.html#parameters-for-the-i-o-buffer-in-leaf-objects
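[Editorial note: the warning's 104857600-byte limit is easy to check against Thibault's numbers. With the extendable axis last, one "row" of the EArray is a full (L, K) slab; a back-of-the-envelope sketch in plain NumPy (no HDF5 file needed; the complex128 atom comes from Thibault's reply below):]

```python
import numpy as np

L, K = 20001, 1000            # Thibault's sizes: L = 2e4 + 1, K = 1e3
itemsize = np.dtype(np.complex128).itemsize  # 16 bytes per element

rowsize = L * K * itemsize    # bytes in one (L, K) slab of the EArray
limit = 104857600             # PyTables' recommended maximum (100 MiB)

print(rowsize)                # 320016000 bytes, i.e. ~305 MiB
print(rowsize > limit)        # True -> hence the PerformanceWarning
```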
Alright.  My atom is a Complex128, so indeed the rowsize increases quickly.
Your first suggestion is certainly the most robust one, and I will look at
these groups.

>> I quickly checked the docs, but couldn't find a way to "fill [...] parts of it
>> [the array] easily via indexing".  Append() seems to be required before...
>
> Yes.  Append is the recommended way to enlarge an EArray.  Then you
> can use a slice assignment in case you want to modify parts of the
> array, i.e.:
>
>     steps[:, :, n] = np.arange(L*K).reshape(L, K)

Yes, this is very handy indeed.

Thanks for your time,
Thibault

> [...]
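[Editorial note: a minimal sketch of Francesc's option 1 — one (L, K) CArray per iteration, tied together under a group. Toy sizes and the made-up names `steps.h5`/`step%03d` are for illustration only, and the snake_case calls (`open_file`, `create_carray`, ...) are the modern spellings of the 2011-era camelCase API used in the thread:]

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

L, K, N = 5, 4, 3  # toy sizes; the real ones were L = 20001, K = 1000

fname = os.path.join(tempfile.mkdtemp(), "steps.h5")

with tables.open_file(fname, mode="w") as h5:
    # One group ties the per-iteration arrays together.
    grp = h5.create_group(h5.root, "steps", "one CArray per iteration")
    atom = tables.ComplexAtom(itemsize=16)  # complex128, as in the thread
    for n in range(N):
        ca = h5.create_carray(grp, "step%03d" % n, atom, (L, K))
        # Each (L, K) slab is written whole; the rowsize stays small,
        # so no PerformanceWarning is triggered.
        ca[:, :] = (n + 1) * np.arange(L * K).reshape(L, K)

with tables.open_file(fname, mode="r") as h5:
    step1 = h5.root.steps.step001[:]

print(step1.shape)  # (5, 4)
print(step1[1, 2])  # (12+0j): element 6 of the arange, times (n + 1) = 2
```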