2011/11/10 Thibault North <tno...@fedoraproject.org>:
[clip]
> Thanks for your detailed reply.
>
> I am now using an EArray, as you suggested, with:
> myearray = h5file.createEArray(h5file.root, 'steps', atom, (L, K, 0), "desc")
> (The number of iterations N is unknown but usually small; L and K are known)
>
> If I understand correctly, I need to append an initially empty (L,K) matrix
> for each iteration:
> myearray.append(np.zeros((L, K)).reshape(L, K, 1))
>
> and then fill it by iterating on the K dimension:
> for line in mymatrix:
>  myearray[line, :, iteration] = mydatacol
>
> This actually works, but:
> - append() requires loading an array of size (L,K) into memory
> - I get a warning (with L=2e4+1 and K=1e3)
> /usr/lib64/python2.7/site-packages/tables/leaf.py:416: PerformanceWarning: The
> Leaf ``/steps`` is exceeding the maximum recommended rowsize (104857600
> bytes);
> be ready to see PyTables asking for *lots* of memory and possibly slow
> I/O.  You may want to reduce the rowsize by trimming the value of
> dimensions that are orthogonal (and preferably close) to the *main*
> dimension of this leave.  Alternatively, in case you have specified a
> very small/large chunksize, you may want to increase/decrease it.
>  PerformanceWarning)
> Is that just a matter of choosing the right position for the extendable
> dimension?

Uh, for such a large L and K (about 20e6 elements, i.e. ~160 MB if
atom = double), your rowsize clearly exceeds the PyTables
recommendation.  As I see it, you have two options here:

1) Split your data into several CArrays, one per iteration (i.e. one
for each entry in range(N)), using a group to tie them together, and
define the shape of every array as (L, K).  This will let you write/read
your data to/from disk efficiently.

2) If, for convenience, you prefer to stick with a monolithic EArray,
my advice is to increase the internal I/O buffers by tweaking the
IO_BUFFER_SIZE and BUFFER_TIMES parameters.  See:

http://pytables.readthedocs.org/en/latest/usersguide/parameter_files.html#parameters-for-the-i-o-buffer-in-leaf-objects
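
Both options can be sketched roughly as below. This is only an illustration under some assumptions: it uses the snake_case method names of PyTables 3.x (the thread itself uses the older createEArray spelling), small made-up sizes, throwaway temporary files, and hypothetical node names such as "step_0000". It also assumes that open_file accepts the parameter names as lowercase keyword arguments, and that the warning fires once the rowsize exceeds BUFFER_TIMES * IO_BUFFER_SIZE; please check both against the parameter documentation for your version.

```python
import os
import tempfile

import numpy as np
import tables

# Small illustrative sizes; the thread uses L = 2e4 + 1 and K = 1e3.
L, K, N = 200, 10, 3
tmpdir = tempfile.mkdtemp()

# Option 1: one (L, K) CArray per iteration, tied together by a group.
path1 = os.path.join(tmpdir, "carrays.h5")
with tables.open_file(path1, mode="w") as h5file:
    group = h5file.create_group(h5file.root, "steps", "one CArray per iteration")
    for n in range(N):
        step = h5file.create_carray(group, "step_%04d" % n,
                                    atom=tables.Float64Atom(), shape=(L, K))
        for k in range(K):
            # Fill column by column, so only L values are built at a time.
            step[:, k] = np.full(L, float(n + k))
    names = sorted(node._v_name for node in h5file.list_nodes(group))

# Option 2: a single EArray, with larger I/O buffers for this file only.
# Bytes in one (L, K) slab at the sizes from the thread, with a float64 atom.
full_rowsize = 20001 * 1000 * 8
path2 = os.path.join(tmpdir, "earray.h5")
with tables.open_file(path2, mode="w",
                      io_buffer_size=2 * 1024 * 1024,  # illustrative value
                      buffer_times=100) as h5file:
    # Assumed rule: warning when rowsize > BUFFER_TIMES * IO_BUFFER_SIZE.
    max_rowsize = (h5file.params["BUFFER_TIMES"]
                   * h5file.params["IO_BUFFER_SIZE"])
    steps = h5file.create_earray(h5file.root, "steps",
                                 atom=tables.Float64Atom(), shape=(L, K, 0))
    earray_shape = steps.shape

print(names)
print(full_rowsize, max_rowsize)
```

At the sizes from the question one (L, K) slab of doubles is 20001 * 1000 * 8 = 160008000 bytes, which is above the 104857600-byte limit quoted in the warning, so the buffer values for option 2 have to be chosen with that product in mind.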

> I quickly checked the docs, but couldn't find a way to "fill [...] parts of it
> [the array] easily via indexing". Append() seems to be required before...

Yes. Append is the recommended way to enlarge an EArray.  Then you
can use a slice assignment to modify parts of the array, e.g.:

steps[:,:,n] = np.arange(L*K).reshape(L,K)
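
Put together, the append-then-assign pattern might look like the sketch below. The sizes are small and made up, the file is a throwaway temporary one, and the method names are the snake_case ones from PyTables 3.x rather than the createEArray spelling used earlier in this thread.

```python
import os
import tempfile

import numpy as np
import tables

L, K = 20, 5  # small illustrative sizes
path = os.path.join(tempfile.mkdtemp(), "steps.h5")

with tables.open_file(path, mode="w") as h5file:
    # The last dimension (length 0) is the extendable one.
    steps = h5file.create_earray(h5file.root, "steps",
                                 atom=tables.Float64Atom(),
                                 shape=(L, K, 0), title="desc")

    # Enlarge the array by one empty (L, K) slab along the last dimension.
    steps.append(np.zeros((L, K, 1)))
    n = steps.nrows - 1  # index of the slab that was just appended

    # Overwrite that slab with a slice assignment.
    steps[:, :, n] = np.arange(L * K).reshape(L, K)

    shape = steps.shape
    corner = float(steps[L - 1, K - 1, n])

print(shape, corner)
```

The same slice assignment also accepts narrower slices (for example a single column, steps[:, k, n]), so the slab can be filled piece by piece after the append.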

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
