Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

Francesc Alted Wed, 18 Jul 2012 07:40:07 -0700

On 7/18/12 4:11 PM, Ümit Seren wrote:
> Actually I had 30.000 groups in a parent group.
> Each of the 30.000 groups had maybe 3 datasets.
> So to be honest I never had 30.000 datasets in a single group.
> I guess you will probably have to disable the LRU cache in that case right?


Okay.  So I'd say that having 30.000 entries in a single group (no
matter whether they are groups or datasets) is bad practice for
performance in general, but maybe there is a difference between groups
and datasets (i.e. it affects datasets more than groups)?  Just curious:
didn't PyTables complain when you created 30.000 groups in the same group?

Regarding the LRU cache: no, I don't think that is the problem, but
rather how HDF5 implements its 'inodes' (or whatever they call them).
This is a big issue in general (inodes in filesystems have similar
problems too), and it is what hurts performance in this case.
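
As a rough illustration of one way to keep any single group from getting
that wide, the sketch below spreads the entries over intermediate
"bucket" groups. It is plain Python and only computes the paths; the
helper name bucketed_path, the "/data" root, the bucket size of 1000 and
the naming scheme are all assumptions made for this example, not anything
PyTables or HDF5 prescribes.

```python
# Hypothetical helper (not part of PyTables): map a flat item index to a
# two-level path so that no group ends up with too many direct children.

def bucketed_path(index, bucket_size=1000, root="/data"):
    """Return an HDF5-style path that places item `index` inside a bucket group."""
    if index < 0:
        raise ValueError("index must be non-negative")
    bucket = index // bucket_size  # which intermediate group the item falls into
    return f"{root}/bucket_{bucket:04d}/item_{index:06d}"


# 30000 items end up in 30 bucket groups of 1000 children each,
# instead of 30000 children under one parent.
paths = [bucketed_path(i) for i in range(30000)]
buckets = {p.rsplit("/", 1)[0] for p in paths}
```

Each resulting path could then be passed to whatever call creates the
group, asking it to create the missing parent groups along the way.
Whether this actually helps for a given file would have to be measured.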

-- 
Francesc Alted


