Howdy: does anyone have any input?

On 2011-03-25 02:19, Philippe Ombredanne wrote:
> Hello Pytables world!
> I am a Python open source hacker and programmer.
>
> I need to store file metadata for several hundred terabytes /
> billions of files, and I am considering Pytables. Postgres is making
> the job too hard.
>
> The metadata themselves are for files and directories, and represent a
> few terabytes for now, up to the low tens of terabytes in the long run.
> They are inherently hierarchical, in the sense that directory-level
> metadata apply down to all child files/dirs unless overridden at a
> sub-level.
> The metadata are characterized by a reasonably high level of
> redundancy: several files share the same value for a column, and in
> some cases a couple of million files share the same value for a
> certain column/attribute.
> These highly duplicated columns need to be indexed for fast access
> (think of an IR-style inverted index, at least conceptually), and
> are the keys used for the look-ups/queries.
> The metadata themselves can be either single values or a list of
> values. Some nodes can have up to a few million values in a list of
> variable length.
>
> The metadata are otherwise mostly numbers of well-defined types with a
> pseudo-random distribution: the whole range of a numeric type is used
> (typically 64-bit to 512-bit numbers).
>
> The metadata are mostly static: they are written once in batches of
> several 100 MB and very rarely updated once written.
>
> The read load requires querying and possibly traversing the whole
> file-system-like metadata tree about 100 to 1000 times per day.
> The response time for such queries is not critical as long as it takes
> less than 24 hours. The load can be spread over several (10 to 100)
> hosts as needed, with data possibly replicated. The querying takes care
> of de-duplicating retrieved records.
>
> Is Pytables suitable for the job?
> Any tips? Examples of similar usage?
> Is the right approach to use the object tree to model the file system
> tree (aka filenode? http://www.pytables.org/docs/manual/ch06.html),
> though the file content is not meant to be stored in Pytables, only
> metadata?
>
> Any tool to help with replication/distribution across several hosts?
> I am not looking for complete answers right away, of course,
> but any tips will be warmly welcomed!
>
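To make the inheritance rule and the inverted-index lookup described above a bit more concrete, here is a minimal sketch in plain Python (standard library only, deliberately not Pytables). Everything in it is hypothetical and chosen just for illustration: the `Node` class, the `effective_metadata` and `build_inverted_index` helpers, and the `owner` / `license` attributes with their integer values. It keeps everything in memory, which the real data volume would obviously not allow; it only shows the semantics I am after.

```python
from collections import defaultdict


class Node:
    """A file or directory holding only the metadata set at its own level."""

    def __init__(self, name, own=None, parent=None):
        self.name = name
        self.own = dict(own or {})   # attributes set explicitly on this node
        self.parent = parent
        self.children = {}

    def add(self, name, **own):
        """Create a child node and return it."""
        child = Node(name, own, parent=self)
        self.children[name] = child
        return child

    def path(self):
        if self.parent is None:
            return self.name
        return self.parent.path().rstrip("/") + "/" + self.name


def effective_metadata(node):
    """Merge metadata from the root down; deeper levels override parents."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    merged = {}
    for level in reversed(chain):    # root first, the node itself last
        merged.update(level.own)
    return merged


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children.values():
        yield from walk(child)


def build_inverted_index(root):
    """Map each (attribute, value) pair to the set of paths carrying it."""
    index = defaultdict(set)
    for node in walk(root):
        for key, value in effective_metadata(node).items():
            index[(key, value)].add(node.path())
    return index


# Hypothetical tree: 'vendor' overrides the root license, and one file
# below it overrides that again.
root = Node("/", {"owner": 1, "license": 7})
src = root.add("src")
vendor = root.add("vendor", license=9)
src.add("main.py")
vendor.add("lib.c")
vendor.add("notes.txt", license=7)

index = build_inverted_index(root)
print(sorted(index[("license", 9)]))   # ['/vendor', '/vendor/lib.c']
```

The sketch materializes the inherited values for every node when building the index, which trades space for simple look-ups; storing only the overrides and resolving inheritance at query time would be the opposite trade-off.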
-- 
Cordially
Philippe

philippe ombredanne | 1 650 799 0949 | pombredanne at nexb.com
nexB - Open by Design (tm) - http://www.nexb.com
http://eclipse.org/atf - http://eclipse.org/soc - http://eclipse.org/vep
http://drools.org/ - http://easyeclipse.org - http://phpeclipse.com

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users