You can also use a filesystem that does block-level deduplication on the fly (there's also dedicated hardware for that). Example filesystem: http://www.lessfs.com/wordpress/
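
To make "block-level deduplication" concrete, here is a rough Python sketch of the idea. It is only a toy in-memory model, not how lessfs or CouchDB actually work: the `DedupStore` class, the 4096-byte block size, the use of SHA-256 and the `db_a`/`db_b` names are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


class DedupStore:
    """Toy in-memory store that keeps each unique block once, keyed by its SHA-256 digest."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if an identical one is not already present.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its block references.
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_size(self):
        # Size as seen by the users of the store (every copy counted).
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        # Size actually occupied (each unique block counted once).
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
# Four distinct 4096-byte blocks, 16384 bytes in total.
attachment = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
store.write("db_a/doc1/photo", attachment)
store.write("db_b/doc1/photo", attachment)  # same attachment "replicated" to a second database

print(store.logical_size())   # 32768
print(store.physical_size())  # 16384
```

The second copy only adds a list of block references, so the attachment occupies space once even though both "databases" can read it back in full.
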
There are of course tradeoffs, for example speed vs. space savings.

On Fri, Oct 28, 2011 at 12:25 PM, Robert Newson <[email protected]> wrote:
> The approach would be to teach couchdb how to deduplicate
> byte-identical attachments (or chunks thereof) with a file. Sounds a
> bit tricky but not impossible.
>
> B.
>
> On 28 October 2011 12:22, Gregor Martynus <[email protected]> wrote:
>> Thanks for your responses!
>>
>> I'm not sure if there is any approach to go minimize the disadvantage of
>> replicated attachments eating up space and performance, if there is, please
>> let me know.
>>
>> My approach would be to setup a backend server that listens to new
>> attachments coming in, transferring these to an external store like S3 and
>> then replace the doc attachment in the DB with some kind of pointer to the
>> new location of the attachments.
>>
>> Not sure if that makes sense, I'm open for suggestions.
>>
>> And once more thanks for your help!
>>
>> On Fri, Oct 28, 2011 at 1:14 PM, CGS <[email protected]> wrote:
>>
>>> Hi Gregor,
>>>
>>> I might be wrong because I am no expert in that field. But from the
>>> documentation, one can deduce that all the attachments are inserted into the
>>> document and not pointing toward a physical file (quite logic if you
>>> consider the main purpose of CouchDB: web-oriented database). As replication
>>> mechanism is the same for local replication and replication over the network
>>> (just transferring the content of data from source file to the target file),
>>> my guess is that your attachment is copied in all the physical files for
>>> which a replication operation was applied.
>>>
>>> However, depending on your project requests, instead of attachment you can
>>> use a pointer which you can use it in shows (at the user's end). The
>>> limitations of such a method are imposed by the cross-domain limitations (if
>>> you use AJAX).
>>>
>>> I hope this answer will help you in designing your project and if somebody
>>> notice any mistake in my answer, please, correct me.
>>>
>>> Cheers,
>>> CGS
>>>
>>> On 10/28/2011 12:32 PM, Gregor Martynus wrote:
>>>
>>>> I wonder how couchDB stores document attachments internally. In particular,
>>>> I'd like to know if I replicate a document with attachments from one
>>>> database to another, will the attachments be stored twice internally or will
>>>> the couchDB be smart enough to understand that the attachment does already
>>>> exist and only needs to link to it?
>>>>
>>>> I hope my question is clear. In my case, each account has an own database
>>>> with its own documents. Now documents can be shared between accounts which
>>>> will be done using replication. But when attachments would get stored
>>>> multiple times although they are exactly the same I fear that it would use
>>>> up too much space and eventually slow down replications etc?
>>>>
>>>
>>
>

--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
