Hi James,

On Monday 04 February 2008, James Philbin wrote:
> Hi,
>
> I'm planning on using PyTables for storing data on large image
> datasets (1M+ images) and, while playing around with some code, I came
> across a warning saying I had exceeded the maximum number of children
> (4096). My aim is to eventually have one child per image (i.e. millions
> of children), so I'm wondering where this limitation comes from. HDF5
> itself seems to have no such limit.
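For concreteness, here is a minimal sketch of the setup James describes:
one tiny array per image inside a single group, pushed just past the
recommended group width so that the warning he mentions shows up. The
file name and array contents are placeholders, and it assumes the
current open_file/create_array spellings (the 2008-era API used
openFile/createArray) plus the tables.parameters.MAX_GROUP_WIDTH
tunable:

    # Sketch only: provoke the "too many children" PerformanceWarning by
    # creating one small array per image in a single group.  Names and
    # sizes are placeholders, not code from the thread.
    import numpy as np
    import tables

    limit = tables.parameters.MAX_GROUP_WIDTH  # the recommended width (4096 back then)

    with tables.open_file("many_children.h5", mode="w") as h5:
        for i in range(limit + 1):
            # One child per "image"; once the group grows past `limit`,
            # PyTables warns but keeps writing, since HDF5 itself has no
            # hard cap on the number of children per group.
            h5.create_array(h5.root, "img%06d" % i, np.zeros(4, dtype="uint8"))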
The maximum number of children is more a recommendation than a
limitation. The recommendation is based on my own experience: when this
limit is exceeded, HDF5 starts to use large amounts of memory and access
to metadata becomes very slow. This has recently been discussed on the
HDF5 list, and The HDF Group says it is aware of the problem and will
come up with a solution shortly. As soon as they fix it, we will remove
the warning from PyTables.

I'm attaching the mail from Elena Pourmal of The HDF Group, in which she
acknowledges the problem in the HDF5 metadata cache subsystem and
expects a fix very soon (at least in the HDF5 1.8 branch).

Cheers,

----------  Forwarded Message  ----------

Subject: Re: Opening datasets expensive?
Date: Thursday 17 January 2008
From: Elena Pourmal <[EMAIL PROTECTED]>
To: Jim Robinson <[EMAIL PROTECTED]>, [EMAIL PROTECTED]

Jim,

It is a known performance problem related to the behavior of the HDF5
metadata cache. We have a fix and will be testing it in the next few
days. Would you like to get a tarball when it is available and see if
the fix addresses the problem? The fix will be in the 1.8 branch.

Elena

At 11:31 PM -0500 1/16/08, Jim Robinson wrote:
>Hi, I am using HDF5 as the backend for a genomics visualizer.
>The data is organized by experiment, chromosome, and resolution
>scale. A typical file might have 300 or so experiments, 24
>chromosomes, and 8 resolution scales. My current design uses a
>dataset for each experiment, chromosome, and resolution scale, or
>57,600 datasets in all.
>
>First question: is that too many datasets? I could combine the
>experiment and chromosome dimensions, with a corresponding reduction
>in the number of datasets and an increase in each dataset's size. It
>would complicate the application code but is doable.
>
>The application is a visualization tool and needs to access small
>portions of each dataset very quickly. It is organized similarly to
>Google Maps: as the user zooms and pans, small slices of datasets
>are accessed and rendered. The number of datasets accessed at one
>time is equal to the number of experiments. It works fine with small
>numbers of experiments (< 20), but panning and zooming is noticeably
>sluggish with 300. I did some profiling and discovered that about
>70% of the time is spent just opening the datasets. Is this to be
>expected? Is it good practice to have a few large datasets rather
>than many smaller ones?
>
>Oh, I'm using the Java JNI wrapper (H5). I am not using the object
>API, just the JNI wrapper functions.
>
>Thanks for any tips.
>
>Jim Robinson
>Broad Institute

--
------------------------------------------------------------
Elena Pourmal                    The HDF Group
1901 So. First St., Suite C-2    Champaign, IL 61820
[EMAIL PROTECTED]
(217) 333-0238 (office)          (217) 333-9049 (fax)
------------------------------------------------------------

-------------------------------------------------------
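As an aside, the "fewer, larger datasets" layout Jim considers can be
sketched in PyTables (not in his Java/JNI setup; the file name, array
sizes and chunk shape below are made up for illustration): fold the
experiment axis into one chunked array per chromosome/resolution pair,
so a pan or zoom reads a narrow window from a handful of open datasets
instead of opening hundreds of small ones.

    # Illustration only, in PyTables rather than the Java wrapper Jim uses.
    # One 2-D chunked array per (chromosome, resolution) pair holds all
    # experiments: 24 * 8 = 192 datasets instead of 300 * 24 * 8 = 57,600.
    import tables

    n_experiments, n_bins = 300, 10_000      # made-up sizes for the sketch

    with tables.open_file("genome.h5", mode="w") as h5:
        for chrom in range(1, 25):
            grp = h5.create_group(h5.root, "chr%d" % chrom)
            for scale in range(8):
                h5.create_carray(grp, "scale%d" % scale,
                                 atom=tables.Float32Atom(),
                                 shape=(n_experiments, n_bins),
                                 chunkshape=(n_experiments, 256))

    with tables.open_file("genome.h5", mode="r") as h5:
        # A viewport read: every experiment, a narrow window of bins,
        # served from a single dataset per chromosome/scale.
        window = h5.root.chr1.scale0[:, 5000:6000]
        print(window.shape)                  # (300, 1000)

Chunking along the bin axis keeps each viewport read down to a few
contiguous chunks, which matches the zoom-and-pan access pattern Jim
describes.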
--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
 "-"