On Fri, Nov 11, 2011 at 4:25 AM, Francesc Alted <fal...@pytables.org> wrote:
> 2011/11/10 Thibault North <tno...@fedoraproject.org>:
> [clip]
>> Thanks for your detailed reply.
>>
>> I am now using an EArray, as you suggested, with:
>> myearray = h5file.createEArray(h5file.root, 'steps', atom, (L, K, 0), "desc")
>> (The number of iterations N is unknown but usually small; L and K are known.)
>>
>> If I understand correctly, I need to append an initially empty (L, K) matrix
>> for each iteration:
>> myearray.append(np.zeros((L, K)).reshape(L, K, 1))
>>
>> and then fill it by iterating over the first (L) dimension:
>> for line in range(L):
>>     myearray[line, :, iteration] = mydatacol
>>
>> This actually works, but:
>> - append() requires loading an array of size (L, K) in memory
>> - I get a warning (with L = 2e4+1 and K = 1e3):
>> /usr/lib64/python2.7/site-packages/tables/leaf.py:416: PerformanceWarning: The
>> Leaf ``/steps`` is exceeding the maximum recommended rowsize (104857600 bytes);
>> be ready to see PyTables asking for *lots* of memory and possibly slow
>> I/O.  You may want to reduce the rowsize by trimming the value of
>> dimensions that are orthogonal (and preferably close) to the *main*
>> dimension of this leave.  Alternatively, in case you have specified a
>> very small/large chunksize, you may want to increase/decrease it.
>>  PerformanceWarning)
>> Is that just a matter of choosing the right position for the extendable
>> dimension?
>
> Uh, for such a large L and K (20e6 elements ~ 160 MB if atom = double),
> your rowsize clearly exceeds the recommendation of PyTables.  As I see
> it, you have two options here:
>
> 1) Split your data into several CArrays, one per range(N) entry
> (you can use a group to tie them together), and define the shape of
> each array as (L, K).  This will let you write/read your data
> to/from disk efficiently.
>
> 2) If, for convenience, you prefer to stick with a monolithic
> EArray, my advice is to increase the internal I/O buffers by
> tweaking the IO_BUFFER_SIZE and BUFFER_TIMES parameters.  See:
>  http://pytables.readthedocs.org/en/latest/usersguide/parameter_files.html#parameters-for-the-i-o-buffer-in-leaf-objects

Alright. My atom is a Complex128, so indeed the rowsize increases quickly.
Your first suggestion is certainly the most robust one, and I will
look into these groups.

>> I quickly checked the docs, but couldn't find a way to "fill [...] parts of it
>> [the array] easily via indexing". Append() seems to be required beforehand...
>
> Yes, append() is the recommended way to enlarge an EArray.  Then you
> can use a slice assignment in case you want to modify parts of the
> array, e.g.:
>
> steps[:, :, n] = np.arange(L*K).reshape(L, K)
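[Editor's note: the same slice assignment, demonstrated on an in-memory NumPy stand-in for the EArray; the small dimensions are hypothetical.]

```python
import numpy as np

L, K, N = 4, 3, 2
steps = np.zeros((L, K, N))            # stand-in for the on-disk EArray
n = 1
steps[:, :, n] = np.arange(L * K).reshape(L, K)

assert steps[1, 2, 1] == 5.0           # row 1, col 2 of arange(12).reshape(4, 3)
assert (steps[:, :, 0] == 0).all()     # the other slab is untouched
```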

Yes, this is very handy indeed.
Thanks for your time.
Thibault

>[...]

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
