RE: [jug-discussion] storing blobs on file system or in db

2005-03-29 Thread Richard Hightower
We did both. We stored the file in the DB. Then, when the file was retrieved
from the DB, we wrote it to the file system. The app checked for the file on
the file system first; if it was not found, the app fetched it from the DB and
wrote it to the file system for next time. This has the advantage of working
in a clustered environment and of keeping all data in the DB, for reasons
already mentioned in this thread. It also takes load off the DB (depending on
how often the documents are used). In addition, it allowed the files to be
served directly by Apache with a little Perl magic, but I digress. The first
version worked well and did not use the Perl magic. The Perl magic improved
performance even more by taking load off our app server and letting Perl
handle delivering files, but again I digress.
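A minimal sketch of the read-through cache described above, assuming a hypothetical `BlobStore` interface in front of the table that holds the authoritative copy. The class and method names are illustrative, not the code from the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical interface over the table that holds the authoritative copy.
interface BlobStore {
    Optional<byte[]> load(String id);
}

// Read-through file cache: the database remains the source of truth and the
// file system only holds copies that can be rebuilt at any time.
class FileCachedDocuments {
    private final Path cacheDir;
    private final BlobStore db;

    FileCachedDocuments(Path cacheDir, BlobStore db) {
        this.cacheDir = cacheDir;
        this.db = db;
    }

    // Assumes ids are application-generated keys (e.g. numeric primary keys),
    // so they are safe to use directly as file names.
    Optional<Path> get(String id) throws IOException {
        Path cached = cacheDir.resolve(id);
        if (Files.exists(cached)) {
            return Optional.of(cached); // hit: no database round trip
        }
        Optional<byte[]> bytes = db.load(id); // miss: fall back to the database
        if (bytes.isEmpty()) {
            return Optional.empty();
        }
        Files.createDirectories(cacheDir);
        // Write to a temporary file, then move it into place, so a concurrent
        // reader never sees a half-written file.
        Path tmp = Files.createTempFile(cacheDir, "blob", ".tmp");
        Files.write(tmp, bytes.get());
        Files.move(tmp, cached, StandardCopyOption.ATOMIC_MOVE);
        return Optional.of(cached);
    }
}
```

Because every node can rebuild its copy from the DB, losing the cache directory costs only performance, never data.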

Just random thoughts
 

-----Original Message-----
From: Drew Davidson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 16, 2005 4:46 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] storing blobs on file system or in db

Andrew Huntwork wrote:

 I'm writing this web app that allows users to upload documents, such 
 as Word docs, images, etc., and then to download those documents again 
 on request. The documents are not searched, interpreted, processed, 
 version controlled, or anything else: just upload and download. I 
 wonder if there's a general rule on whether one should stick such 
 things into a db or onto the file system.

 I currently favor sticking them in the db. Putting them on the fs 
 seems to interfere with clustering (different files would be on 
 different file systems). It's also another thing to back up and 
 generally maintain. On the other hand, putting them in the db puts 
 extra load on the db and the network. There are a bunch of other 
 issues too.

 Any ideas? Thanks for any help.


I'm all in favor of storing large documents, images, etc. in the filesystem
and storing metadata in the db. I've implemented web-based systems using
both a purely db approach and a combination of db and filesystem for storing
data. I've found that the db route is, as you say, easier to administer in
terms of backups and access across multiple application instances, and
easier to configure to get to the data. But the performance penalty can be
severe, especially in a heavily loaded application. I've done performance
analysis on the db-based application, and during peak loads up to 40% of my
application's runtime is spent serving up the BLOBs as images (I store image
data in the DB and access it through a special servlet that reads the BLOB
from the database along with image metadata like length and MIME type).
It is just silly to tie up a servlet engine doing work that Apache does more
efficiently.
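To make the cost concrete, here is a rough sketch of what such a BLOB-serving servlet does per request, written without the servlet API so it stands alone. `BlobRecord` and `BlobResponder` are hypothetical names; in a real servlet the headers would go through `HttpServletResponse.setContentType` and `setContentLengthLong`, and the body would be the response's output stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape of a row read from the database: the BLOB stream plus
// the metadata stored next to it.
final class BlobRecord {
    final String mimeType;
    final long length;
    final InputStream content; // e.g. obtained from java.sql.Blob#getBinaryStream()

    BlobRecord(String mimeType, long length, InputStream content) {
        this.mimeType = mimeType;
        this.length = length;
        this.content = content;
    }
}

final class BlobResponder {
    // Returns the response headers and copies the BLOB into the response body.
    static Map<String, String> respond(BlobRecord rec, OutputStream body) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", rec.mimeType);
        headers.put("Content-Length", Long.toString(rec.length));
        try (InputStream in = rec.content) {
            // Every byte passes through an app-server thread; this copy is the
            // work a web server like Apache can do more cheaply for static files.
            in.transferTo(body);
        }
        return headers;
    }
}
```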

My setup now is more complicated, but much more performant. By complicated
I mean that I have a Spring-configured manager for db-external assets that
coordinates use of the filesystem with the db. Also, backups now have to
include the virtual root of the filesystem where external resources are
configured (the Spring-configured manager has a property that is set to this
virtual root). The other complication is setting up the Apache server to
point to the resource directory. This is not so bad, because I already had
another servlet serving this content; that work has simply moved to Apache.
I'm not just uploading documents and serving them, however, so my setup is
probably more complicated than yours would be. My application has uploaded
images that are thumbnailed on demand to various sizes.
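A minimal sketch of a manager for db-external assets, assuming the virtual root is injected through a JavaBean setter (which Spring can populate from a bean `property` definition) and that the db row stores only metadata plus the relative path returned by `store`. `ExternalAssetManager` and its methods are hypothetical, not Drew's actual class. Apache would be configured separately to serve the same root directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical manager for assets kept outside the database.
class ExternalAssetManager {
    private Path rootDirectory;

    // JavaBean-style setter so a container such as Spring can inject the
    // virtual root (the same directory the web server is configured to serve).
    public void setRootDirectory(String rootDirectory) {
        this.rootDirectory = Paths.get(rootDirectory).toAbsolutePath().normalize();
    }

    // Resolves a relative path under the root, rejecting "../" or absolute
    // paths that would escape it.
    public Path resolve(String relativePath) {
        Path p = rootDirectory.resolve(relativePath).normalize();
        if (!p.startsWith(rootDirectory)) {
            throw new IllegalArgumentException("Path escapes asset root: " + relativePath);
        }
        return p;
    }

    // Copies uploaded content to the file system and returns the relative
    // path to record in the metadata row.
    public String store(String relativePath, InputStream content) throws IOException {
        Path target = resolve(relativePath);
        Files.createDirectories(target.getParent());
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        return rootDirectory.relativize(target).toString();
    }
}
```

Storing only the relative path in the db keeps rows valid if the root moves; only the injected property changes, and backups must then cover both the db and that root.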

Just my opinion, FWIW.

- Drew

-- 
Drew Davidson | OGNL Technology
Email:  [EMAIL PROTECTED]
Web:    http://www.ognl.org
Vox:    (520) 531-1966
Fax:    (520) 531-1965
Mobile: (520) 405-2967


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]








RE: [jug-discussion] storing blobs on file system or in db

2005-03-29 Thread Richard Hightower
We started using a shared file system so we could easily invalidate the file
cache, but this came later. 
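A small sketch of why a shared file system makes invalidation easy, assuming every cluster node points its file cache at the same shared directory (for example a network mount). With per-node local caches, each node would have to be told to drop its copy; with a shared directory, one delete is enough. `SharedFileCache` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file cache living in a directory shared by all cluster nodes.
final class SharedFileCache {
    private final Path sharedDir;

    SharedFileCache(Path sharedDir) {
        this.sharedDir = sharedDir;
    }

    Path pathFor(String id) {
        return sharedDir.resolve(id);
    }

    // Call after the document's row is updated or deleted in the database.
    // The next read on any node misses, falls through to the database, and
    // re-caches the new version. Returns false if nothing was cached.
    boolean invalidate(String id) throws IOException {
        return Files.deleteIfExists(pathFor(id));
    }
}
```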

-----Original Message-----
From: Randolph Kahle [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 16, 2005 3:28 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] storing blobs on file system or in db

Interesting question.

You could consider a shared file system.

I hesitate to recommend storing documents in a database. You don't
need the transactional capabilities (correct?), and an RDBMS is not really a
great blob storage device (yes, they can do it, but I don't reach for an
RDBMS to store things like this unless I really need to).

Randy

On Mar 16, 2005, at 3:21 PM, Andrew Huntwork wrote:




RE: [jug-discussion] storing blobs on file system or in db

2005-03-29 Thread Richard Hightower
I'd like to add this caveat. We arrived at this solution because somebody was
storing images in the db in the original design, and this was a major
bottleneck. It was a workaround so we did not have to change as much code
yet still got the performance we wanted. The workaround had some nice
advantages, mentioned in my earlier message, but it was arrived at, not chosen
per se. I am not sure that, starting from scratch, we would have stored any
files in the DB. Thus, my earlier message was not a suggestion; it was an idea.

-----Original Message-----
From: Richard Hightower [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 29, 2005 9:39 AM
To: 'jug-discussion@tucson-jug.org'
Subject: RE: [jug-discussion] storing blobs on file system or in db
