Gregor, your approach makes perfect sense, but it will take some work because:
1. the attachments are encoded in CouchDB;
2. you will need a document scanner.

I don't know your project, but I would serve the attachments through a separate channel managed by a web server and point to them from CouchDB documents (with at most one document per attachment to hold the attachment description, fetched with include_docs). In the end it's up to you, because you know your project's requirements best.

CGS



On 10/28/2011 01:25 PM, Robert Newson wrote:
The approach would be to teach couchdb how to deduplicate
byte-identical attachments (or chunks thereof) within a file. Sounds a
bit tricky but not impossible.
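
To make the deduplication idea concrete, here is a minimal sketch of a content-addressed store that keeps byte-identical data only once. It is not how CouchDB works internally; the class name DedupStore, the choice of SHA-256, and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, not a CouchDB API).

    Identical byte strings hash to the same digest, so they are
    stored once and only a reference count is increased.
    """

    def __init__(self):
        self.blobs = {}     # digest -> bytes
        self.refcount = {}  # digest -> number of references

    def put(self, data: bytes) -> str:
        """Store data and return the digest that identifies it."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Return the bytes stored under digest."""
        return self.blobs[digest]
```

A document would then carry only the digest returned by put(), and two documents with the same attachment would share one stored copy.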

B.

On 28 October 2011 12:22, Gregor Martynus<[email protected]>  wrote:
Thanks for your responses!

I'm not sure whether there is any way to minimize the disadvantage of
replicated attachments eating up space and performance. If there is, please
let me know.

My approach would be to set up a backend server that listens for new
attachments coming in, transfers them to an external store like S3, and
then replaces the doc attachment in the DB with some kind of pointer to the
new location of the attachment.
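
A rough sketch of the "replace the attachment with a pointer" step might look like the following. It assumes the document carries inline attachments in CouchDB's usual `_attachments` form (base64 text in a `data` field); the function name offload_attachments, the `attachment_pointers` field, the base URL, and the dict standing in for an external store such as S3 are all hypothetical.

```python
import base64
import copy


def offload_attachments(doc, store,
                        base_url="https://example.invalid/attachments"):
    """Return a copy of doc whose inline attachments are replaced by pointers.

    `store` is any dict-like object standing in for the external store
    (an assumption; a real setup would upload to S3 or similar).
    """
    new_doc = copy.deepcopy(doc)
    remaining = {}
    pointers = dict(new_doc.get("attachment_pointers", {}))

    for name, meta in new_doc.get("_attachments", {}).items():
        if "data" not in meta:
            # Stub without inline data: nothing to move, keep it as is.
            remaining[name] = meta
            continue
        key = f"{new_doc['_id']}/{name}"
        payload = base64.b64decode(meta["data"])
        store[key] = payload
        pointers[name] = {
            "url": f"{base_url}/{key}",
            "content_type": meta.get("content_type"),
            "length": len(payload),
        }

    if remaining:
        new_doc["_attachments"] = remaining
    else:
        new_doc.pop("_attachments", None)
    if pointers:
        new_doc["attachment_pointers"] = pointers
    return new_doc
```

The returned document would then be written back to the database, so that only the small pointer gets replicated between accounts rather than the attachment body.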

Not sure if that makes sense, I'm open to suggestions.

And once more thanks for your help!

On Fri, Oct 28, 2011 at 1:14 PM, CGS<[email protected]>  wrote:

Hi Gregor,

I might be wrong because I am no expert in that field. But from the
documentation, one can deduce that all attachments are inserted into the
document rather than pointing to a physical file (quite logical if you
consider the main purpose of CouchDB: a web-oriented database). As the
replication mechanism is the same for local replication and replication over
the network (it just transfers the data from the source file to the target
file), my guess is that your attachment is copied into every physical file
to which the document has been replicated.

However, depending on your project requirements, instead of an attachment
you can store a pointer and use it in shows (at the user's end). Such a
method is limited by cross-domain restrictions (if you use AJAX).

I hope this answer helps you in designing your project, and if somebody
notices any mistake in my answer, please correct me.

Cheers,
CGS




On 10/28/2011 12:32 PM, Gregor Martynus wrote:

I wonder how CouchDB stores document attachments internally. In particular,
I'd like to know: if I replicate a document with attachments from one
database to another, will the attachments be stored twice internally, or
will CouchDB be smart enough to understand that the attachment already
exists and only needs to be linked to?

I hope my question is clear. In my case, each account has its own database
with its own documents. Documents can be shared between accounts, which
will be done using replication. But if attachments get stored multiple
times although they are exactly the same, I fear that it would use up too
much space and eventually slow down replications etc.


