Francesc Altet wrote:
> On Thursday 12 April 2007 18:18, Michael Hoffman wrote:
>> Francesc Altet wrote:
>>> On Monday 09 April 2007 15:57, Michael Hoffman wrote:
>>>> As a followup to my previous message, I have realized that I am supposed
>>>> to tune the lustre filesystem for large files. Hopefully that will solve
>>>> my performance problems.
>>> Maybe. A good cross-check would be to copy the file to a local filesystem
>>> and test the performance. If you still see high latency, please describe
>>> the hierarchy you have given your data and I'll try to give you
>>> more feedback.
>> Well, I tried that and it was still really slow. So I tried balancing
>> the tree by creating groups named _00 through _ff and placing each dataset
>> in the group given by the first octet of the MD5 digest of its name. This
>> afforded a considerable speedup in opening, even on a remote filesystem:
>>
>> $ time python -c 'import tables; tables.openFile("original.h5")'
>> Closing remaining opened files...  original.h5... done.
>>
>> real    2m25.643s
>> user    0m1.271s
>> sys     0m1.379s
>>
>> $ time python -c 'import tables; tables.openFile("balanced.h5")'
>> Closing remaining opened files...  balanced.h5... done.
>>
>> real    0m2.186s
>> user    0m0.158s
>> sys     0m0.106s
>>
>> So perhaps keeping fewer than 4096 nodes per group (or, here, fewer than
>> 256) is still a good idea. I'm thankful that I don't need to move to
>> multiple files, which would have been a real pain. It would be nice if
>> this sort of thing were done automatically, but that would probably be
>> best handled upstream in HDF5.
> 
> I see. So, in the end, the PerformanceWarning that was issued some time ago 
> when too many nodes were put in a single group was not a bad idea...
> 
> In any case, could you elaborate on the tree structure of 'original.h5' 
> and how you changed it for 'balanced.h5'? I'd like to figure out what's 
> going on there, to see whether it is worth bringing the PerformanceWarning 
> back.
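A check like the one being discussed comes down to counting the direct children of each group and warning when a count exceeds some limit. The following is a minimal, stdlib-only illustration, not PyTables code: `check_fanout` and `FanOutWarning` are hypothetical names, and the 4096 limit is simply the rule of thumb quoted above, not a documented HDF5 or PyTables constant.

```python
import warnings


class FanOutWarning(UserWarning):
    """Hypothetical stand-in for a PyTables-style performance warning."""


# Assumed threshold, taken from the "<4096 nodes per group" rule of thumb;
# not a documented HDF5 or PyTables limit.
MAX_CHILDREN = 4096


def check_fanout(children_per_group, limit=MAX_CHILDREN):
    """Warn about every group whose number of direct children exceeds `limit`.

    `children_per_group` maps a group path to its number of direct children.
    Returns the sorted list of offending group paths.
    """
    offenders = sorted(
        path for path, count in children_per_group.items() if count > limit
    )
    for path in offenders:
        warnings.warn(
            f"group {path!r} has {children_per_group[path]} children "
            f"(more than {limit}); opening the file may be slow",
            FanOutWarning,
            stacklevel=2,
        )
    return offenders


if __name__ == "__main__":
    # Layout of original.h5 as described in this thread: all arrays under root.
    print(check_fanout({"/": 22714}))  # prints ['/'] (the warning goes to stderr)
```

Counting only direct children per group keeps the check cheap, since it never has to walk the leaves themselves.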

This is using PyTables 1.4 and numarray, so I am not yet sure how it 
will apply to PyTables 2.0 and numpy.

original.h5:
   root:
     22,714 (2000, 12) arrays of Float64

balanced.h5:
   root:
     256 groups with approximately the same number of children:
       total 22,714 (2000, 12) arrays of Float64
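The balancing scheme used for balanced.h5 can be sketched as a pure function from dataset name to group name. This is a minimal stdlib sketch, assuming names are hashed as UTF-8 bytes and each group is written as `_` plus two lowercase hex digits; `bucket_group`, `bucket_path` and the `dataset{i}` names in the demo are hypothetical, since the thread does not give the actual naming.

```python
import hashlib
from collections import Counter


def bucket_group(name):
    """Return the group ('_00' .. '_ff') for a dataset name.

    The group is the first octet of the MD5 digest of the name (assumed to be
    hashed as UTF-8), formatted as two lowercase hex digits, so names spread
    roughly evenly over 256 groups.
    """
    first_octet = hashlib.md5(name.encode("utf-8")).digest()[0]
    return f"_{first_octet:02x}"


def bucket_path(name):
    """Full node path in the balanced layout, e.g. '/_3a/<name>'."""
    return f"/{bucket_group(name)}/{name}"


if __name__ == "__main__":
    # Hypothetical dataset names; the thread does not say how they are named.
    names = [f"dataset{i}" for i in range(22714)]
    counts = Counter(bucket_group(n) for n in names)
    # Number of groups used, and the smallest and largest group sizes.
    print(len(counts), min(counts.values()), max(counts.values()))
```

Because the group is recomputed from the name itself, reading a dataset back needs no separate index. MD5 is used here only to spread names evenly, not for any security property; with 22,714 names over 256 groups the average is about 89 children per group.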
-- 
Michael Hoffman


_______________________________________________
Pytables-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/pytables-users
