Howdy,
does anyone have any input on this?

On 2011-03-25 02:19, Philippe Ombredanne wrote:
> Hello PyTables world!
> I am a python open source hacker and programmer.
>
> I need to store file metadata for several hundred terabytes /
> billions of files, and I am considering PyTables. Postgres is making
> the job too hard.
>
> The metadata describe files and directories, and represent a few
> terabytes for now, growing to the low tens of terabytes in the long
> run.
> They are inherently hierarchical, in the sense that directory-level
> metadata apply down to all child files/dirs unless overridden at a
> sub-level.
> The metadata are characterized by a reasonably high level of
> redundancy: several files share the same value for a column, and in
> some cases a couple of million files share the same value for a
> certain column/attribute.
> These highly duplicated columns need to be indexed for fast access
> (think of an IR-style inverted index, at least conceptually), and
> are the keys used for the look-ups/queries.
> Each metadata item can be either a single value or a list of
> values. Some nodes can have up to a few million values in a
> variable-length list.
>
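To make the inheritance rule above concrete, here is a minimal sketch in plain Python, independent of PyTables. The `explicit` mapping, its keys, and the `effective_metadata` helper are hypothetical names used for illustration only; the real storage layout is still an open question.

```python
from pathlib import PurePosixPath

# Hypothetical sample: metadata set explicitly at a few paths only.
explicit = {
    "/": {"owner": 1, "license": 10},
    "/src": {"license": 20},
    "/src/vendor/lib.c": {"owner": 2},
}


def effective_metadata(path, explicit):
    """Merge metadata from the root down to `path`.

    Values set at a deeper level override values inherited from above.
    """
    p = PurePosixPath(path)
    # p.parents runs from the immediate parent up to the root,
    # so reverse it to walk from the root downward.
    chain = list(p.parents)[::-1] + [p]
    merged = {}
    for node in chain:
        merged.update(explicit.get(str(node), {}))
    return merged


print(effective_metadata("/src/vendor/lib.c", explicit))
```

With the sample data, the file keeps its own `owner`, takes `license` from `/src`, and anything else would fall back to the root.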
> The metadata are otherwise mostly numbers of well-defined types with
> a pseudo-random distribution: the whole range of a numeric type is
> used (typically 64-bit to 512-bit numbers).
>
> The metadata are mostly static: they are written once in batches of
> several hundred MB and are very rarely updated once written.
>
> The read load requires querying and possibly traversing the whole
> file-system-like metadata tree about 100 to 1,000 times per day.
> The response time for such queries is not critical as long as it
> stays under 24 hours. The load can be spread across several (10 to
> 100) hosts as needed, with data possibly replicated. The querying
> takes care of de-duplicating retrieved records.
>
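For the look-up side, this is roughly what I mean by an IR-style inverted index with de-duplication at query time. It is only a conceptual sketch in plain Python; `records`, `build_inverted_index`, and `lookup` are made-up names, and an on-disk index would of course look different.

```python
from collections import defaultdict

# Hypothetical sample: file id -> metadata row. A column may hold
# a single value or a list of values.
records = {
    1: {"checksum": 0xAB},
    2: {"checksum": 0xAB},
    3: {"checksum": [0xAB, 0xCD]},
    4: {"checksum": 0xEF},
}


def build_inverted_index(records, column):
    """Map each distinct value of `column` to the set of file ids having it."""
    index = defaultdict(set)
    for file_id, row in records.items():
        value = row.get(column)
        if value is None:
            continue
        values = value if isinstance(value, (list, tuple, set)) else [value]
        for v in values:
            index[v].add(file_id)
    return index


def lookup(index, values):
    """Return the de-duplicated, sorted file ids matching any of `values`."""
    matched = set()
    for v in values:
        matched |= index.get(v, set())
    return sorted(matched)


index = build_inverted_index(records, "checksum")
print(lookup(index, [0xAB, 0xCD]))
```

File 3 matches both queried values but is returned only once, which is the de-duplication mentioned above.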
> Is PyTables suitable for the job?
> Any tips? Any examples of similar usage?
> Is the right approach to use the object tree to model the file
> system tree (aka filenode? http://www.pytables.org/docs/manual/ch06.html ),
> even though the file content is not meant to be stored in PyTables,
> only the metadata?
>
> Are there any tools to help with replication/distribution across
> several hosts?
> I am not expecting complete answers right away, of course, but any
> tips will be warmly welcomed!
>
>

-- 
Cordially
Philippe

philippe ombredanne | 1 650 799 0949 | pombredanne at nexb.com
nexB - Open by Design (tm) - http://www.nexb.com
http://eclipse.org/atf - http://eclipse.org/soc - http://eclipse.org/vep
http://drools.org/ - http://easyeclipse.org - http://phpeclipse.com


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
