Large files in the DB didn't work well for us, either.

We ended up putting together a solution where we stored large files on a
file server and simply associated them with documents in the different
databases.  We used nginx up front to forward the normal CouchDB attachment
requests to our server handling the files, so that users could still use the
normal Couch endpoints.  It works well for us (our files generally run up
to ~10 GB), and if you'd like to have a look at how to do it, here's a link
to some docs:

http://nedm-tum.github.io/FileServer-Docker/
https://github.com/nEDM-TUM/FileServer-Docker
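
To give a rough idea of the routing step, here is a minimal Python sketch.
It is not the actual FileServer-Docker code: the storage root /srv/files and
the function name attachment_to_file_path are made up for illustration, and
it assumes attachment names without slashes.  It only shows how a request of
the form /{db}/{docid}/{attachment} could be mapped to a path on the file
server, with everything else left for CouchDB to handle.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote

# Hypothetical storage root on the file server (an assumption, not from the docs).
FILE_ROOT = PurePosixPath("/srv/files")


def attachment_to_file_path(
    url_path: str, root: PurePosixPath = FILE_ROOT
) -> Optional[PurePosixPath]:
    """Map /{db}/{docid}/{attachment} to {root}/{db}/{docid}/{attachment}.

    Returns None for anything that does not look like a plain attachment
    request, so the proxy can pass it through to CouchDB unchanged.
    """
    # Drop any query string, split on "/", and decode percent-escapes.
    segments = url_path.split("?", 1)[0].split("/")
    parts = [unquote(s) for s in segments if s]
    if len(parts) != 3:
        return None
    # Skip special endpoints (_design, _local, _all_docs, ...) and refuse
    # anything that could escape the storage root.
    for part in parts:
        if part.startswith("_") or part in (".", "..") or "/" in part:
            return None
    return root.joinpath(*parts)
```

With this, a request for /nedm/run42/data.bin would be served from
/srv/files/nedm/run42/data.bin, while /nedm/run42 or /nedm/_design/app would
go to CouchDB as usual.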

Cheers,
Mike

On Tue, Jun 21, 2016 at 11:10 PM, Kevin Coombes <[email protected]>
wrote:

> Have you thought about a two-part solution? You can use Couch for the
> front end to store the metadata (making it searchable in lots of
> interesting ways) with a separate data store behind it. Along with the
> metadata, each CouchDB document would hold a URI that points to the actual
> file somewhere else. You can even mix-and-match back-ends, including
> straight HTTP or FTP servers as well as subversion or git. (We started
> implementing this idea to store various kinds of
> genomics/genetics/transcriptomics data before I left M.D. Anderson a
> couple of years ago. We got far enough to know that it is at least somewhat
> more than just theoretically possible. It never got finished, however,
> since after I left there was no one to push hard for it.)
>   Kevin
>
>
> On 6/21/2016 4:47 PM, Brad Rhoads wrote:
>
>> I'll second that. It didn't work out well for us. It's probably OK for
>> small, plain text documents. But it didn't work too well with large media
>> files.
>>
>> ---------------------------
>> www.maf.org/rhoads
>> www.ontherhoads.org
>>
>> On Tue, Jun 21, 2016 at 2:29 PM, Alexander Harm <[email protected]> wrote:
>>
>>> Hello Etay,
>>>
>>> npm did that at one point, and they have a couple of articles on their blog
>>> that might be of interest to you:
>>>
>>> http://blog.npmjs.org/post/71267056460/fastly-manta-loggly-and-couchdb-attachments
>>> http://blog.npmjs.org/post/75707294465/new-npm-registry-architecture
>>>
>>> They experienced problems with storing a lot of attachments in CouchDB and
>>> moved to another solution. Also note this post by Nolan Lawson, point 4:
>>>
>>> https://pouchdb.com/2014/06/17/12-pro-tips-for-better-code-with-pouchdb.html
>>>
>>> I especially love the quote from Laurie Voss:
>>>
>>> "One of the big things that everybody who's spent a lot of time with
>>> databases knows is that you should never put your binaries in the database.
>>> It's a terrible idea. It always goes wrong. I have never met a database in
>>> 15 years of which it is not true, and it's definitely not true of CouchDB.
>>> You are taking this thing which is meant to sort and organize data, and
>>> you're giving it binary data, which it can neither sort nor organize. It
>>> can't do anything with that data, other than get really fat."
>>>
>>> My advice: DON’T.
>>>
>>> Regards, Alexander
>>>
>>> On 21. Jun. 2016, at 21:44, Etay Haun <[email protected]> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your answers to my last post. It was very helpful.
>>>>
>>>> We are developing a distributed file system solution and we would like to
>>>> base it on CouchDB.
>>>> We would like to use CouchDB to store the files as attachments (each
>>>> document will include the file and the file metadata).
>>>> We have a few data centers that store *different* file systems, although
>>>> some of the documents are replicated to other data centers.
>>>> We have a few questions regarding possible technical issues.
>>>> As mentioned, part of our possible solution involves using attachments to
>>>> store the actual files in CouchDB.
>>>> 1. We couldn't find any information regarding suggested attachment size.
>>>> 2. Is there an issue with storing large attachments? (up to 2GB per file -
>>>> although most files will be much smaller - few KB or MB)
>>>> 3. We need to replicate some documents between Couch instances, including
>>>> the attachments. Is this okay?
>>>> 4. Does CouchDB also store revisions of attachments?
>>>> 5. If so, how can we determine the required storage space for an instance,
>>>> assuming we know what the entire system's size will be?
>>>> Our biggest instance will include 20TB of attachments.
>>>> 6. Are there any possible issues with running the instances on Windows 2012
>>>> servers?
>>>> Thank you in advance.
>>>>
>>>
>>>
>
