Francesc Altet wrote:
> On Thursday 12 April 2007 18:18, Michael Hoffman wrote:
>> Francesc Altet wrote:
>>> On Monday 09 April 2007 15:57, Michael Hoffman wrote:
>>>> As a followup to my previous message, I have realized that I am supposed
>>>> to tune the lustre filesystem for large files. Hopefully that will solve
>>>> my performance problems.
>>> Maybe. A good crosscheck would be to copy the file to a local filesystem
>>> and test the performance. If you still see high latency, please explain
>>> which hierarchy have you endowed to your data and I'll try to provide you
>>> more feedback.
>> Well, I tried that and it was still really slow. So I tried balancing
>> the tree by creating groups named _00 through _ff, from the first octet
>> of the MD5 digest of the dataset name. This afforded a considerable
>> speedup in opening even on a remote filesystem:
>>
>> $ time python -c 'import tables; tables.openFile("original.h5")'
>> Closing remaining opened files... original.h5... done.
>>
>> real 2m25.643s
>> user 0m1.271s
>> sys 0m1.379s
>>
>> $ time python -c 'import tables; tables.openFile("balanced.h5")'
>> Closing remaining opened files... balanced.h5... done.
>>
>> real 0m2.186s
>> user 0m0.158s
>> sys 0m0.106s
>>
>> So perhaps sticking to <4096 nodes per group (or here, <256) is still a
>> good idea. I'm thankful that I don't need to move to multiple files
>> which would have been a real pain. It would be nice if this sort of
>> thing were done automatically but that would probably be best handled
>> upstream in HDF5.
>
> I see. So, in the end the PerformanceWarning that was issued some time ago
> when too many nodes were put in a single group was not a bad idea...
>
> In any case, could you develop further which is your tree structure
> in 'original.h5' and how you changed it for 'balanced.h5'? I'd like to
> figure out what's going on there so as to see whether it is worth to setup
> the PerformanceWarning back.
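
For reference, the balancing scheme described in the quoted message could look roughly like the sketch below. It is a minimal illustration in Python 3, not the script actually used: `bucket_for`, `balanced_path` and the dataset names are hypothetical, and it only computes group names and node paths with the standard library, without making any PyTables calls.

```python
import hashlib
from collections import Counter


def bucket_for(dataset_name):
    """Return a group name from "_00" to "_ff" for a dataset name.

    The name is taken from the first octet of the MD5 digest of the
    dataset name. The function name itself is hypothetical.
    """
    first_octet = hashlib.md5(dataset_name.encode("utf-8")).digest()[0]
    return "_%02x" % first_octet


def balanced_path(dataset_name):
    """Build the node path of a dataset inside its bucket group."""
    return "/%s/%s" % (bucket_for(dataset_name), dataset_name)


# Made-up dataset names, used only to show how the buckets fill up.
names = ["dataset%05d" % i for i in range(22714)]
counts = Counter(bucket_for(name) for name in names)
print(len(counts), "groups; largest holds", max(counts.values()), "datasets")
```

Each dataset would then be created under the group returned by `bucket_for` instead of directly under the root, which spreads the children over at most 256 groups.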

All of this was done with PyTables 1.4 and numarray, so I am not yet sure how it will apply to PyTables 2.0 and numpy.

original.h5:
  root: 22,714 arrays of shape (2000, 12), Float64

balanced.h5:
  root: 256 groups with approximately the same number of children:
    total 22,714 arrays of shape (2000, 12), Float64
-- 
Michael Hoffman

_______________________________________________
Pytables-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/pytables-users