Francesc Altet wrote:
> On Saturday 07 April 2007 13:55, Michael Hoffman wrote:
>> PyTables has been a great help to my research. I was wondering if I
>> could make my use somewhat more efficient.
>>
>> For a particular project, I produce about 22000 tables of 2000 rows
>> each. These are initially produced by a distributed computing farm into
>> 2500 files, but I concatenate them so that (a) I won't have so many
>> files lying around, which our system administrators hate, and (b) I can
>> randomly access tables by name easily.
>>
>> Of course, having about 4 GiB and 22000 tables in one file slows things
>> down a bit, especially since it is stored on a remote Lustre file
>> system. One thing I thought of was to find some middle ground and
>> concatenate the original file set into a small number of files, but not
>> just one. Then I could make a separate file for an index to provide
>> random access.
>>
>> Is this a good idea? Any suggestions as to a target number of datasets
>> (I know 4096 was once suggested as the max) or data size per file? Are
>> there any facilities within PyTables or elsewhere to make this easier?
> 
> Well, it largely depends on your requirements. These days, when disks 
> offer huge capacities at reasonable prices, I'd advocate creating a 
> monolithic file containing all your data.

Thanks for the advice, Francesc. Disk space is not really the problem, and 
as far as our sysadmins are concerned, the fewer files the better. I am 
more worried about speed, especially the sometimes enormous lag when 
first opening the file. This may have more to do with the filesystem in 
use here, as later opens are much faster.
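
The middle-ground layout raised in the quoted message (a small number of
data files plus a separate index for random access by name) can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not a PyTables facility: the function names, the
"part_NNN.h5" file naming, the JSON index format, and the number of
tables per file are all hypothetical.

```python
import json


def build_index(table_names, tables_per_file):
    """Assign consecutive tables to numbered files.

    Returns a dict mapping each table name to the (hypothetical) file
    that would hold it after concatenation.
    """
    index = {}
    for i, name in enumerate(table_names):
        # Integer division groups consecutive tables into the same file.
        index[name] = "part_%03d.h5" % (i // tables_per_file)
    return index


def save_index(index, path):
    """Write the name-to-file mapping to a small JSON index file."""
    with open(path, "w") as f:
        json.dump(index, f)


def load_index(path):
    """Read the name-to-file mapping back from the JSON index file."""
    with open(path) as f:
        return json.load(f)


def file_for(index, table_name):
    """Look up which file holds a table; raises KeyError if unknown."""
    return index[table_name]
```

With such an index, a lookup by name only needs to open the one file
that holds the requested table. In current PyTables releases that step
would presumably be tables.open_file() on the name returned by
file_for(), followed by get_node() on the table name. Whether this
actually reduces the first-open lag on Lustre would have to be measured.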
-- 
Michael Hoffman


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
