PyTables has been a great help to my research, and I was wondering 
whether I could make my use of it somewhat more efficient.

For a particular project, I produce about 22000 tables of 2000 rows 
each. These are initially produced by a distributed computing farm into 
2500 files, but I concatenate them so that (a) I won't have so many 
files lying around, which our system administrators hate, and (b) I can 
randomly access tables by name easily.
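
For concreteness, the concatenation step looks roughly like this (just 
a minimal sketch, assuming each farm output file keeps its tables 
directly under the root group and that table names are unique across 
the whole set; the file names here are made up):

import glob
import tables

# Minimal sketch of the concatenation step. Assumes (hypothetically)
# that every farm output file stores its tables directly under the
# root group and that table names are unique across the file set.
with tables.open_file("combined.h5", mode="w") as dest:
    for path in sorted(glob.glob("farm_output/*.h5")):
        with tables.open_file(path, mode="r") as src:
            for table in src.walk_nodes("/", classname="Table"):
                # copy_node can copy a leaf into a group that lives in
                # another open file; here everything lands under the
                # destination root group.
                src.copy_node(table, newparent=dest.root)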

Of course, having about 4 GiB and 22000 tables in one file slows things 
down a bit, especially since it is stored on a remote Lustre file 
system. One thing I thought of was to find some middle ground and 
concatenate the original file set into a small number of files, but not 
just one. Then I could make a separate file for an index to provide 
random access.
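
What I have in mind for the index is roughly a single small table 
mapping each table name to the member file that holds it, something 
like the sketch below (column names, sizes, and helper names are 
illustrative only, not anything PyTables provides):

import tables

# Sketch of the separate index file: one small table mapping each
# dataset name to the member file that holds it. Column names and
# sizes are illustrative only.
class IndexEntry(tables.IsDescription):
    table_name = tables.StringCol(64)    # dataset name
    file_name = tables.StringCol(128)    # member file containing it

def write_index(mapping, index_path="index.h5"):
    # mapping: {table_name: member_file}, both str, collected while
    # splitting the original file set
    with tables.open_file(index_path, mode="w") as h5:
        catalog = h5.create_table("/", "catalog", IndexEntry)
        row = catalog.row
        for name, fname in sorted(mapping.items()):
            row["table_name"] = name.encode()
            row["file_name"] = fname.encode()
            row.append()
        catalog.flush()

def load_index(index_path="index.h5"):
    # Read the whole catalog once; 22000 entries is tiny.
    with tables.open_file(index_path, mode="r") as h5:
        entries = h5.root.catalog.read()
    return {e["table_name"].decode(): e["file_name"].decode()
            for e in entries}

That way a lookup would open only the small index plus one member 
file, rather than the single 4 GiB file.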

Is this a good idea? Any suggestions as to a target number of datasets 
(I know 4096 was once suggested as the max) or data size per file? Are 
there any facilities within PyTables or elsewhere to make this easier?

Many thanks,
-- 
Michael Hoffman

