Re: [SLUG] document storage and serving

Jeff Waugh Sat, 01 Jun 2002 02:34:11 -0700

<quote who="[EMAIL PROTECTED]">

> 3) I suspect in the long run it is going to be easier and more efficient to
> store all the text in a RDBMS than as text files


I'm in the process of building a document management system like the one
you've described above, but I just wanted to give you an inconsequential, but
perhaps knowledge-encouraging, answer to this problem: It depends. :-)

File systems are pretty good at storing large chunks of stuff, and sifting
through it via a standardised hierarchy. That sounds good - files are large
chunks of stuff, and one would assume that you'd have some kind of
standardised hierarchy for your documents.

Problem: You will never have a single hierarchy. There are many different
hierarchies for a set of documents; consider sorting, categorisation,
multiple categorisation, etc. Perhaps, at a simple level, hard and soft
links will help you with this. At a semantic level, they can't.
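At that simple level, here is a rough sketch of the link approach in Python.
The document is written once into a "store" directory and soft-linked into one
directory per category. The directory layout and the link_into_categories name
are made up for illustration, and it assumes a POSIX filesystem that supports
symlinks.

```python
import os
import tempfile


def link_into_categories(root, doc_name, content, categories):
    """Store one document once and expose it under several category
    directories via symbolic links (hypothetical layout)."""
    store = os.path.join(root, "store")
    os.makedirs(store, exist_ok=True)
    real_path = os.path.join(store, doc_name)
    with open(real_path, "w", encoding="utf-8") as f:
        f.write(content)

    links = []
    for category in categories:
        cat_dir = os.path.join(root, "by-category", category)
        os.makedirs(cat_dir, exist_ok=True)
        link_path = os.path.join(cat_dir, doc_name)
        # Each category gets a soft link pointing back at the single real file.
        os.symlink(real_path, link_path)
        links.append(link_path)
    return real_path, links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        real, links = link_into_categories(
            root, "minutes.txt", "meeting minutes\n", ["meetings", "2002"])
        for link in links:
            print(link, "->", os.path.realpath(link))
```

Every additional way of slicing the documents means another tree of links to
create and keep in sync, and the links carry no meaning beyond where they sit.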

Databases are pretty good at storing indexed information, and lots of it.
That also sounds good - documents have all sorts of metadata that you'd want
to filter, search and sort by, and the documents themselves can be
accessible via different paths (hierarchies, search terms, however your
brain is working at the time).
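A minimal sketch of that idea, using Python's built-in sqlite3 module as a
stand-in for a real server such as PostgreSQL. A many-to-many table lets one
document belong to several categories, and each query is effectively another
path to the same documents. The table and column names, and the sample rows,
are hypothetical.

```python
import sqlite3


def build_catalogue(conn):
    """Create a hypothetical metadata schema: documents plus a many-to-many
    link to categories, so one document can sit in several hierarchies."""
    conn.executescript("""
        CREATE TABLE documents (
            id      INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            author  TEXT NOT NULL,
            created TEXT NOT NULL   -- ISO 8601 date, so it sorts as text
        );
        CREATE TABLE categories (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE document_categories (
            document_id INTEGER REFERENCES documents(id),
            category_id INTEGER REFERENCES categories(id),
            PRIMARY KEY (document_id, category_id)
        );
        CREATE INDEX idx_documents_author ON documents(author);
    """)


def documents_in_category(conn, category):
    """Filter by category and sort by date: one 'path' to the documents."""
    rows = conn.execute("""
        SELECT d.title
        FROM documents d
        JOIN document_categories dc ON dc.document_id = d.id
        JOIN categories c ON c.id = dc.category_id
        WHERE c.name = ?
        ORDER BY d.created
    """, (category,))
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_catalogue(conn)
    # Made-up sample rows for illustration.
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", [
        (1, "Meeting minutes", "jeff", "2002-05-30"),
        (2, "Install notes", "jeff", "2002-05-01"),
    ])
    conn.executemany("INSERT INTO categories VALUES (?, ?)",
                     [(1, "meetings"), (2, "howto")])
    conn.executemany("INSERT INTO document_categories VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    print(documents_in_category(conn, "meetings"))
    # prints ['Install notes', 'Meeting minutes']
```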

Problem: Databases are not all that good at storing large chunks of stuff,
even though they are great with tons of little stuff. A very large database
is also a pain to administer and maintain, and if you're storing entire files
you might find yourself up against hard OS limits pretty soon.

Solution: Use both, for what they're good at doing. We're storing the files
on a standard filesystem, and referring to them in the database. Instead of
putting all of the files in a single directory (most filesystems are not
optimised for this usage pattern), we've split them across a number of
directories based on each document's unique id.
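Roughly how that split might look, as a sketch only. The id is zero-padded and
chunked into nested directories so that no single directory grows too large,
and the database keeps just the relative path. The padding width, chunk size,
table layout and function names are assumptions for illustration, not
necessarily the scheme described above, and sqlite3 stands in for PostgreSQL.
With a PostgreSQL driver the SQL would be much the same, though the parameter
placeholder style may differ.

```python
import os
import sqlite3
import tempfile


def shard_path(doc_id, width=9, chunk=3):
    """Map a numeric id to a nested relative path. With the defaults,
    id 12345 -> '000/012/000012345': the leading chunks become directories
    and the full padded id is the filename (hypothetical scheme)."""
    if doc_id < 0:
        raise ValueError("document id must be non-negative")
    digits = str(doc_id).zfill(width)
    parts = [digits[i:i + chunk] for i in range(0, len(digits), chunk)]
    return os.path.join(*parts[:-1], digits)


def store_document(conn, root, doc_id, title, data):
    """Write the bytes to the sharded location and record only the
    relative path in the database."""
    rel_path = shard_path(doc_id)
    full_path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as f:
        f.write(data)
    conn.execute("INSERT INTO documents (id, title, path) VALUES (?, ?, ?)",
                 (doc_id, title, rel_path))
    conn.commit()
    return full_path


def load_document(conn, root, doc_id):
    """Look the path up in the database, then read the file from disk."""
    row = conn.execute("SELECT path FROM documents WHERE id = ?",
                       (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    with open(os.path.join(root, row[0]), "rb") as f:
        return f.read()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, path TEXT)")
    with tempfile.TemporaryDirectory() as root:
        store_document(conn, root, 12345, "Meeting minutes", b"minutes...")
        print(shard_path(12345))  # prints 000/012/000012345 on POSIX
        print(load_document(conn, root, 12345))
```

With three-digit chunks each leaf directory holds at most 1000 files, and the
database row is all you need to find a document again.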

[ For the record, we're using Python and PostgreSQL. Random sig action. ]

- Jeff

-- 
    "Python amazes me for its concision. The current prototype is all of    
       900 lines of code, yet it contains a lexer, parser (recursive        
       descent), core language interpreter, and parallelizing process       
                      spawner." - Raph Levien on Rebar                      
-- 
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
