Documents stored in Cloudant databases don't include MD5 digests of
attachment contents in the _attachments metadata. Here's an example:
    "_attachments": {
        "photo-15357DCF-9566-4DFD-9120-8A9164EE5873": {
            "follows": true,
            "length": 79608,
            "content_type": "image/jpeg",
            "revpos": 2
        }
    },
Other servers do include the digest, so I assume this is a difference between
BigCouch and CouchDB. Is this intentional? It's causing problems replicating
databases from Cloudant to TouchDB, and the workarounds I can think of for this
in TouchDB are either fairly ugly (basically involving writing a custom JSON
parser…) or involve performance regressions.
Here's more detail on my problem:
* For efficiency, the replicator in TouchDB (like CouchDB 1.2) fetches
documents in MIME multipart format, so that attachments are easily streamable
to disk and aren't base64-encoded.
* This requires correlating the MIME bodies with the metadata objects in the
_attachments object.
* CouchDB (and BigCouch) unfortunately don't add any headers to the MIME bodies
to identify what they are. I've already filed a bug report against this.
* TouchDB's replicator works around this by computing an MD5 digest of each
MIME body and then correlating those with the "digest" properties of the
attachment metadata objects.
* …which fails with Cloudant/BigCouch because that "digest" property is missing.
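The digest matching described in the list above can be sketched roughly as
follows. This is a minimal illustration in Python, not TouchDB's actual code.
The function names are hypothetical, and the sketch assumes the "digest"
property has the form "md5-" followed by the base64-encoded MD5 hash.

```python
import base64
import hashlib


def couch_style_digest(body: bytes) -> str:
    """Return an MD5 digest string in the assumed "md5-<base64>" form."""
    raw = hashlib.md5(body).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def match_bodies_to_attachments(attachments: dict, bodies: list) -> dict:
    """Map each attachment name to the MIME body whose digest matches.

    `attachments` is the parsed _attachments object; `bodies` is the list of
    raw MIME part bodies, in whatever order they arrived.
    """
    by_digest = {couch_style_digest(body): body for body in bodies}
    matched = {}
    for name, meta in attachments.items():
        digest = meta.get("digest")
        if digest is None:
            # This is the failure case: without a digest there is nothing
            # to correlate the MIME body against.
            raise KeyError(f"attachment {name!r} has no digest")
        matched[name] = by_digest[digest]
    return matched
```

With the metadata shown at the top of this message, the lookup raises on the
missing "digest" key, since nothing else in the metadata identifies which body
belongs to which attachment.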
The reason CouchDB itself doesn't have trouble correlating the attachments is
that it knows the MIME bodies are written in the same order as the attachments
appear in the _attachments object. However, key order is not significant in
JSON objects, and in most implementations the parser stores the object contents
in a hash table (like a Ruby Hash object or a Cocoa NSDictionary), which means
the ordering of the keys is lost. The only way for me to determine the true
order of the attachment keys would be to write my own specialized JSON parser
that could identify the keys and put the names into an ordered structure like
an array.
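For comparison, here is a sketch of what order-preserving extraction looks like
in an environment whose JSON parser exposes key order. It uses Python's
standard json module with object_pairs_hook, which is not available to
TouchDB; the function name is hypothetical, and the filter on "follows" assumes
that only attachments marked that way have a MIME body.

```python
import json
from collections import OrderedDict


def attachment_names_in_order(doc_json: str) -> list:
    """Return attachment names in the order they appear in the JSON text."""
    # object_pairs_hook receives each object's (key, value) pairs in document
    # order, so building an OrderedDict keeps that order explicitly.
    doc = json.loads(doc_json, object_pairs_hook=OrderedDict)
    attachments = doc.get("_attachments", {})
    # Assumption: only entries with "follows": true have a MIME body.
    return [name for name, meta in attachments.items() if meta.get("follows")]
```

The resulting list could then be paired positionally with the MIME bodies, on
the assumption stated above that the server writes them in the same order.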
—Jens