BigCouch returns compressed attachments without indicating they're compressed

Jens Alfke Thu, 17 May 2012 15:47:48 -0700

I’m having (or rather, TouchDB is having*) problems receiving documents with 
attachments from Cloudant; I assume this is a difference between BigCouch and 
CouchDB. I believe it's a bug in the server.


The issue is that the server is returning compressed attachment bodies without 
indicating that they’re compressed. TouchDB barfs because the length of the 
received data doesn’t match the “length” property in the _attachments entry, 
and there is no "encoded_length" property giving the encoded length, let alone 
an "encoding" property that indicates that the data's been compressed (and by 
what algorithm).
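
To make the failing check concrete, here is a minimal sketch in Python. It is not TouchDB's actual code; the function name `check_attachment` and its return strings are made up for illustration. It assumes only the "length", "encoded_length" and "encoding" properties mentioned above.

```python
def check_attachment(meta, body):
    """Compare a received attachment body with its _attachments entry.

    `meta` is the parsed _attachments entry (a dict), `body` the bytes
    actually received. Names and return values are hypothetical.
    """
    encoding = meta.get("encoding")
    if encoding is not None:
        # The server declared an encoding, so "encoded_length" should
        # describe the bytes on the wire.
        if meta.get("encoded_length") == len(body):
            return "encoded:" + encoding
        return "mismatch"
    # No encoding declared: the body should be the raw attachment.
    if meta.get("length") == len(body):
        return "raw"
    return "mismatch"
```

With an entry that has "length":5313 and no encoding properties, a 2136-byte body comes back as "mismatch", which is the situation described above.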

For example, take this document 
<https://snej.cloudant.com/attachment-test/readme> which has a 5313-byte HTML 
attachment.

A plain GET returns:

> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","length":5313,"stub":true}}}

If I ask for the attachment inline I get:

> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","data":"PGgxIGlkPSJ0b3…{{{lots
>  of Base64 data}}}..."}}}

where the base64 data decodes to 2136 bytes, and is not HTML but GZIPped HTML.
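
For anyone who wants to repeat that observation, a rough sketch in Python: decode the inline "data" value and look for the two-byte gzip header. The helper name `inspect_inline_data` is hypothetical; the only fact it relies on is that gzip streams start with the bytes 0x1f 0x8b.

```python
import base64

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def inspect_inline_data(b64_data):
    """Return (decoded_length, looks_gzipped) for an inline "data" value."""
    raw = base64.b64decode(b64_data)
    return len(raw), raw.startswith(GZIP_MAGIC)
```

A decoded length that differs from the entry's "length", together with a gzip header, is what suggests the body was compressed.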

Asking for the document with attachments in MIME multipart format results in:

> --fbd433e586402848d98875903ea97f67
> content-type: application/json
> 
> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","length":5313,"follows":true}}}
> --fbd433e586402848d98875903ea97f67
> {{{2136 bytes of GZIP data}}}
> --fbd433e586402848d98875903ea97f67--

Same thing: the data is GZIPped, but there is no metadata to indicate the fact.

I believe this is a bug in BigCouch. It results in an ambiguity as to whether 
the content is encoded or not (and if so, what encoding is being used). In the 
worst case you could have an attachment whose GZIPped encoding is exactly the 
same length as the raw data, in which case there would be no way to tell 
whether it was encoded or not since the lengths would match either way.
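
Until the metadata is fixed, a client could try to guess. The following Python sketch is one possible heuristic, not something TouchDB or BigCouch is known to do: if the body starts with a gzip header and decompresses to exactly "length" bytes, treat it as GZIPped; otherwise fall back to the plain length comparison. The name `guess_encoding` and its return values are assumptions made for this example.

```python
import gzip
import zlib

def guess_encoding(meta, body):
    """Guess whether `body` is the raw attachment or a gzip encoding of it.

    Heuristic only: returns "gzip", "raw", or "unknown".
    """
    declared = meta["length"]
    if body[:2] == b"\x1f\x8b":  # gzip header
        try:
            if len(gzip.decompress(body)) == declared:
                return "gzip"
        except (OSError, EOFError, zlib.error):
            pass  # not a valid gzip stream after all
    if len(body) == declared:
        return "raw"
    return "unknown"
```

This is still only a guess. It has to inspect the content rather than trust the metadata, and it costs a trial decompression of every suspicious body, so it is no substitute for the server reporting the encoding.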

—Jens

* https://github.com/couchbaselabs/TouchDB-iOS/issues/80
