Hi Pauli,

This is the answer from Quincey Koziol, one of the core developers of the HDF5 library, about the memory problem when updating many chunks at the same time. Can you try the latest version of the HDF5 1.8 series? It seems this problem would be much alleviated there. Remember to add the "--with-default-api-version=v16" flag when configuring the HDF5 library in order to be able to link with PyTables later on.
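For reference, a typical build sequence with that flag might look like this (the install prefix is just a placeholder, adjust to taste):

```shell
# From the unpacked HDF5 1.8.x source directory:
./configure --with-default-api-version=v16 --prefix=/usr/local/hdf5
make
make install
```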
Cheers,

---------- Forwarded message ----------
Subject: Re: Writing to a dataset with 'wrong' chunksize
Date: Tuesday 27 November 2007
From: Quincey Koziol <[EMAIL PROTECTED]>
To: Francesc Altet <[EMAIL PROTECTED]>

Hi Francesc,

On Nov 23, 2007, at 2:06 PM, Francesc Altet wrote:
> Hi,
>
> Some time ago, a PyTables user complained that the following simple
> operation was hogging gigantic amounts of memory:
>
> import tables, numpy
> N = 600
> f = tables.openFile('foo.h5', 'w')
> f.createCArray(f.root, 'huge_array',
>                tables.Float64Atom(),
>                shape=(2, 2, N, N, 50, 50))
> for i in xrange(50):
>     for j in xrange(50):
>         f.root.huge_array[:,:,:,:,j,i] = \
>             numpy.array([[1,0],[0,1]])[:,:,None,None]
>
> and I think that the problem could be on the HDF5 side.
>
> The point is that, for the 6-dimensional 'huge_array' dataset,
> PyTables computed an 'optimal' chunkshape of (1, 1, 1, 6, 50, 50).
> Then, the user wanted to update the array starting from the trailing
> dimensions (instead of using the leading ones, which is the
> recommended practice for C-ordered arrays). This results in PyTables
> asking HDF5 to do the update using the traditional procedure:
>
> /* Create a simple memory data space */
> if ( (mem_space_id = H5Screate_simple( rank, count, NULL )) < 0 )
>   return -3;
>
> /* Get the file data space */
> if ( (space_id = H5Dget_space( dataset_id )) < 0 )
>   return -4;
>
> /* Define a hyperslab in the dataset */
> if ( rank != 0 && H5Sselect_hyperslab( space_id, H5S_SELECT_SET,
>                                        start, step, count, NULL ) < 0 )
>   return -5;
>
> if ( H5Dwrite( dataset_id, type_id, mem_space_id, space_id,
>                H5P_DEFAULT, data ) < 0 )
>   return -6;
>
> While I understand that this approach is suboptimal (2*2*600*100 =
> 240,000 chunks have to be 'updated' for each update operation in the
> loop), I don't completely understand why the user reports that the
> script is consuming so much memory (the script crashes, but perhaps
> it is asking for several GB).
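To put numbers on "suboptimal", here is my own back-of-the-envelope sketch. The `chunks_touched` helper is hypothetical (not part of PyTables or HDF5); it just counts how many chunks a hyperslab intersects, which is what drives both the I/O cost and the worst-case memory if every affected chunk were held in core at once:

```python
# Hypothetical helper: count the chunks intersected by the
# hyperslab [start, stop) of a chunked dataset.
def chunks_touched(shape, chunkshape, start, stop):
    n = 1
    for s0, s1, c in zip(start, stop, chunkshape):
        # index of last chunk minus index of first chunk, plus one
        n *= (s1 - 1) // c - s0 // c + 1
    return n

shape      = (2, 2, 600, 600, 50, 50)
chunkshape = (1, 1, 1, 6, 50, 50)   # PyTables' computed chunkshape

# Trailing-dimension update, huge_array[:, :, :, :, j, i],
# for any single (j, i) pair:
trailing = chunks_touched(shape, chunkshape,
                          start=(0, 0, 0, 0, 0, 0),
                          stop=(2, 2, 600, 600, 1, 1))
print(trailing)   # 240000 chunks per assignment

# Leading-dimension update, huge_array[i, j, k, :, :, :]:
leading = chunks_touched(shape, chunkshape,
                         start=(0, 0, 0, 0, 0, 0),
                         stop=(1, 1, 1, 600, 50, 50))
print(leading)    # 100 chunks per assignment

# Worst case, if all affected chunks were loaded at once:
chunk_bytes = 1 * 1 * 1 * 6 * 50 * 50 * 8   # float64
print(trailing * chunk_bytes / 2**30)       # ~26.8 GiB
```

So each trailing-dimension assignment intersects 240,000 chunks (and each chunk gets revisited 2500 times over the full loop), while a leading-dimension traversal touches only 100 chunks per assignment, each written exactly once. That is why updating along the leading dimensions of a C-ordered array is the recommended practice.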
> My guess is that perhaps HDF5 is trying to load all the affected
> chunks in memory before trying to update them, but I thought it best
> to report this here just in case this is a bug, or, if not, in case
> the huge demand for memory can be somewhat alleviated.

Is this with the 1.6.x library code? If so, it would be worthwhile
checking with the 1.8.0 code, which is designed to do all the I/O on
each chunk at once and then proceed to the next chunk. However, it
does build information about the selection on each chunk to update,
and if the I/O operation will update 240,000 chunks, that could be a
large amount of memory...

	Quincey

> In case you need more information, you may find it by following the
> details of the discussion in this thread:
>
> http://www.mail-archive.com/pytables-users@lists.sourceforge.net/
> msg00722.html
>
> Thanks!
>
> --
> >0,0<   Francesc Altet     http://www.carabos.com/
> V V    Cárabos Coop. V.   Enjoy Data
>  "-"

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to [EMAIL PROTECTED]
To unsubscribe, send a message to [EMAIL PROTECTED]

--
>0,0<   Francesc Altet     http://www.carabos.com/
V V    Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users