Hi James,

On Monday 04 February 2008, James Philbin wrote:
> Hi,
>
> I'm planning on using pytables for storing data on large image
> datasets (1M+) and while playing around with some code, came across a
> warning saying I had exceeded the maximum number of children (4096).
> My aim is to eventually have one child per image (ie millions of
> children), so i'm wondering where this limitation comes from? HDF5
> itself seems to have no such limits.

The maximum number of children is more a recommendation than a hard 
limitation.  The recommendation is based on my experience, which is 
that when this limit is exceeded, HDF5 starts to use large amounts of 
memory and access to metadata becomes very slow.  This has been 
discussed lately on the HDF5 list, and The HDF Group says they are 
aware of the problem and will come up with a solution shortly.  As 
soon as they fix it, we will remove the warning from PyTables.

I'm attaching the mail from Elena Pourmal, from The HDF Group, in 
which she acknowledges the problem in the metadata cache subsystem of 
HDF5 and says she expects a fix very soon (at least in the HDF5 1.8 
branch).
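Until then, a workaround that sidesteps the limit entirely is to store all the images as rows of a single extendable array instead of one child node per image.  Here is a minimal sketch (assuming a recent PyTables API; the 32x32 8-bit image size and the file name are just made up for illustration):

```python
import numpy as np
import tables

# One EArray holds every image as a row, so the group has a single
# child node no matter how many images are appended.
with tables.open_file("images.h5", mode="w") as f:
    images = f.create_earray(
        f.root, "images",
        atom=tables.UInt8Atom(),
        shape=(0, 32, 32),        # 0 marks the extendable dimension; one image per row
        expectedrows=1_000_000,   # hint that lets PyTables pick a good chunk size
    )
    for _ in range(100):          # append a few dummy images
        images.append(np.zeros((1, 32, 32), dtype="uint8"))
    print(images.nrows)           # 100
```

Reading image i back is then just `f.root.images[i]`, and chunking keeps such partial reads fast.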

Cheers,


----------  Forwarded Message  ----------

Subject: Re: Opening datasets expensive?
Date: Thursday 17 January 2008
From: Elena Pourmal <[EMAIL PROTECTED]>
To: Jim Robinson <[EMAIL PROTECTED]>, [EMAIL PROTECTED]

Jim,

It is a known performance problem related to the behavior of the HDF5 
metadata cache. We have a fix  and will be testing it in the next few 
days.

Would you like to get a tar ball when it is available and see if the 
fix addresses the problem? The fix will be in the 1.8 branch.

Elena

At 11:31 PM -0500 1/16/08, Jim Robinson wrote:
>Hi,  I am using HDF5 as the backend for a genomics visualizer. 
>The data is organized by experiment, chromosome, and resolution 
>scale.  A typical file might have 300 or so experiments, 24 
>chromosomes, and 8 resolution scales.  My current design uses a 
>dataset for each experiment, chromosome, and resolution scale, 
>or 57,600 datasets in all (300 x 24 x 8).  
>
>First question: is that too many datasets?   I could combine the 
>experiment and chromosome dimensions, with a corresponding reduction 
>in the number of datasets and an increase in each dataset's size.  It 
>would complicate the application code but is doable.
>
>The application is a visualization tool and needs to access small 
>portions of each dataset very quickly.  It is organized similarly to 
>Google Maps: as the user zooms and pans, small slices of datasets 
>are accessed and rendered.  The number of datasets accessed at one 
>time is equal to the number of experiments.  It is working fine with 
>small numbers of experiments (< 20), but panning and zooming is 
>noticeably sluggish with 300.   I did some profiling and discovered 
>that about 70% of the time is spent just opening the datasets.    Is 
>this to be expected?      Is it good practice to have a few large 
>datasets rather than many smaller ones?
>Oh, I'm using the Java JNI wrapper (H5).  I am not using the object 
>API, just the JNI wrapper functions.
>
>Thanks for any tips.
>
>Jim Robinson
>Broad Institute
>
>
>----------------------------------------------------------------------
>This mailing list is for HDF software users discussion.
>To subscribe to this list, send a message to 
[EMAIL PROTECTED]
>To unsubscribe, send a message to [EMAIL PROTECTED]


-- 

------------------------------------------------------------
Elena Pourmal
The HDF Group
1901 So First ST.
Suite C-2
Champaign, IL 61820

[EMAIL PROTECTED]
(217)333-0238 (office)
(217)333-9049 (fax)
------------------------------------------------------------



-------------------------------------------------------

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
