Francesc Altet wrote:
> A Dilluns 09 Abril 2007 15:57, Michael Hoffman escrigué:
>> As a followup to my previous message, I have realized that I am supposed
>> to tune the lustre filesystem for large files. Hopefully that will solve
>> my performance problems.
> 
> Maybe. A good crosscheck would be to copy the file to a local filesystem and 
> test the performance. If you still see high latency, please explain what 
> hierarchy you have imposed on your data and I'll try to give you more 
> feedback.

Well, I tried that and it was still really slow. So I tried balancing 
the tree by creating groups named _00 through _ff from the first octet 
of the MD5 digest of each dataset name. This afforded a considerable 
speedup in opening, even on a remote filesystem:

$ time python -c 'import tables; tables.openFile("original.h5")'
Closing remaining opened files...  original.h5... done.

real    2m25.643s
user    0m1.271s
sys     0m1.379s

$ time python -c 'import tables; tables.openFile("balanced.h5")'
Closing remaining opened files...  balanced.h5... done.

real    0m2.186s
user    0m0.158s
sys     0m0.106s

So perhaps sticking to <4096 nodes per group (or here, <256) is still a 
good idea. I'm thankful that I don't need to move to multiple files, 
which would have been a real pain. It would be nice if this sort of 
thing were done automatically, but that would probably be best handled 
upstream in HDF5.
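For reference, the hashing scheme above can be sketched roughly as follows. This is a minimal illustration, not my exact code; `balanced_group` is a made-up helper name, and the commented PyTables calls assume the usual `openFile`/`createGroup` API:

```python
import hashlib

def balanced_group(dataset_name):
    """Map a dataset name to a group _00.._ff using the first
    octet (two hex digits) of its MD5 digest."""
    octet = hashlib.md5(dataset_name.encode("utf-8")).hexdigest()[:2]
    return "_" + octet

# So a dataset would live under, e.g., /_ac/foo instead of /foo:
#
#   h5file = tables.openFile("balanced.h5", "a")
#   group = h5file.createGroup("/", balanced_group("foo"))
#   h5file.createArray(group, "foo", data)

print(balanced_group("foo"))  # -> _ac
```

With 256 top-level groups, each group stays well under the node counts that made opening the original file so slow.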
-- 
Michael Hoffman


_______________________________________________
Pytables-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/pytables-users
