BigCouch doesn't provide attachment digests?

Jens Alfke Thu, 05 Apr 2012 10:42:14 -0700

Documents stored in Cloudant databases aren't including MD5 digests of 
attachment contents in the _attachments metadata. Here's an example:


    "_attachments": {
        "photo-15357DCF-9566-4DFD-9120-8A9164EE5873": {
            "follows": true,
            "length": 79608,
            "content_type": "image/jpeg",
            "revpos": 2
        }
    },

Other servers do include the digest, so I assume this is a difference between
BigCouch and CouchDB. Is this intentional? It's causing problems replicating
databases from Cloudant to TouchDB, and the workarounds I can think of in
TouchDB are either fairly ugly (basically, writing a custom JSON parser…) or
involve performance regressions.

Here's more detail on my problem:
* For efficiency, the replicator in TouchDB (like CouchDB 1.2) fetches 
documents in MIME multipart format, so that attachments are easily streamable 
to disk and aren't base64-encoded.
* This requires correlating the MIME bodies with the metadata objects in the 
_attachments object.
* CouchDB (and BigCouch) unfortunately don't add any headers to the MIME bodies 
to identify what they are. I've already filed a bug report against this.
* TouchDB's replicator works around this by computing an MD5 digest of each 
MIME body and then correlating those with the "digest" properties of the 
attachment metadata objects.
* …which fails with Cloudant/BigCouch because that "digest" property is missing.
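The digest-matching workaround described above can be sketched as follows. This is a minimal illustration in Python, not TouchDB's actual (Objective-C) code. It assumes CouchDB's "digest" format of "md5-" followed by the base64-encoded MD5 of the attachment bytes. The helper names `couch_digest` and `match_parts_by_digest` are hypothetical. The sketch fails exactly where the BigCouch metadata omits "digest":

```python
import base64
import hashlib


def couch_digest(body: bytes) -> str:
    """Format an MD5 digest like CouchDB's attachment "digest" property."""
    return "md5-" + base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def match_parts_by_digest(attachments: dict, bodies: list) -> dict:
    """Map attachment names to MIME bodies by comparing MD5 digests.

    Hypothetical helper. Raises ValueError when an attachment has no
    "digest" property (the Cloudant/BigCouch case) or when no MIME body
    matches that digest.
    """
    # Index every MIME body by the digest string CouchDB would report for it.
    by_digest = {couch_digest(body): body for body in bodies}
    matched = {}
    for name, meta in attachments.items():
        digest = meta.get("digest")
        if digest is None:
            raise ValueError(f"attachment {name!r} has no digest; cannot correlate")
        if digest not in by_digest:
            raise ValueError(f"no MIME body matches the digest of {name!r}")
        matched[name] = by_digest[digest]
    return matched


if __name__ == "__main__":
    bodies = [b"hello", b"world"]
    attachments = {
        "b.txt": {"follows": True, "length": 5, "digest": couch_digest(b"world")},
        "a.txt": {"follows": True, "length": 5, "digest": couch_digest(b"hello")},
    }
    print(match_parts_by_digest(attachments, bodies))
    # prints {'b.txt': b'world', 'a.txt': b'hello'}
```

Matching by content means the key order of `_attachments` doesn't matter. The trade-off is that every body has to be hashed, and the approach only works if the server supplies the digest.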

The reason CouchDB itself doesn't have trouble correlating the attachments is 
that it knows the MIME bodies are written in the same order as the attachments 
appear in the _attachments object. However, key order is not significant in 
JSON objects, and in most implementations the parser stores the object contents 
in a hash table (like a Ruby Hash object or a Cocoa NSDictionary), which means 
the ordering of the keys is lost. The only way for me to determine the true 
order of the attachment keys would be to write my own specialized JSON parser 
that could identify the keys and put the names into an ordered structure like 
an array.
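For comparison, here is the order-based correlation CouchDB relies on, sketched with a parser that exposes key order. Python's standard `json` module accepts an `object_pairs_hook`, which receives each JSON object as a list of (key, value) pairs in document order. That makes the order recoverable without a custom parser. This is an illustration only: it does not help with a parser that hands back an unordered hash table, such as an NSDictionary. It also assumes that only attachments marked `"follows": true` have a MIME body, and that those bodies appear in the same order as their keys. The function names are hypothetical.

```python
import json


def following_attachment_names(doc_json: str) -> list:
    """Names of attachments marked "follows", in the order they appear in the JSON text."""
    # Returning the raw pairs list keeps every JSON object (nested ones too)
    # as an ordered list of (key, value) tuples instead of a dict.
    doc = json.loads(doc_json, object_pairs_hook=lambda pairs: pairs)
    for key, value in doc:
        if key == "_attachments":
            # Assumption: only attachments with "follows": true have a MIME body.
            return [name for name, meta in value if ("follows", True) in meta]
    return []


def match_parts_by_order(doc_json: str, bodies: list) -> dict:
    """Pair MIME bodies with attachment names purely by position (hypothetical helper)."""
    names = following_attachment_names(doc_json)
    if len(names) != len(bodies):
        raise ValueError("attachment count does not match MIME body count")
    return dict(zip(names, bodies))
```

The positional approach avoids hashing entirely, but it is only as reliable as the parser's preservation of key order. That is the property JSON itself doesn't promise.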

—Jens
