Re: _bulk_get protocol extension

Dave Cottlehuber Fri, 24 Jan 2014 13:52:35 -0800

Hey Jens,

That looks interesting indeed. Worth posting a JIRA ticket with the
link so it doesn't get lost in email.


A+
Dave

On 24 January 2014 16:20, Jens Alfke <[email protected]> wrote:
> (I'm excited about this list! There have been some topics I've wanted to 
> bring up that are too implementation-oriented for the user@ list, but I 
> haven't been brave enough to dive into the dev@ list because I don't know 
> Erlang or the internals of CouchDB. I also really appreciate folks sharing 
> the viewpoint that CouchDB is an ecosystem and an open replication protocol, 
> not just a particular database implementation.)
>
> Anyway. One topic I'd like to bring up is that, in my non-scientific
> observations, the major performance bottleneck in pull replication is that
> each revision has to be fetched with its own GET request. I've seen very
> poor performance when pulling lots of small documents from a distant
> server, roughly an order of magnitude below the throughput of transferring
> a single huge document.
>
> (Yes, it's possible to get multiple revisions at once by POSTing to 
> _all_docs. Unfortunately this has limitations that make it unsuitable for 
> replication; see my explanation at the page linked below.)
>
> A few months ago I experimentally implemented a new "_bulk_get" REST call in 
> Couchbase's replicators (Couchbase Lite and the Sync Gateway), which 
> significantly improves performance by allowing the puller to request any 
> number of revisions in a single HTTP request. Again, no scientific tests or 
> hard numbers, but it was enough to convince me it's worthwhile. I've 
> documented it here:
>         https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the 
> standard API. The only unusual thing is that the response can contain nested 
> MIME multipart bodies: the response format is multipart, with every requested 
> revision in a part, but revisions containing attachments are themselves sent 
> as multipart. (This shouldn't be an issue for any decent multipart parser, 
> since nested multipart is pretty common in emails, but I think it's the first 
> time it's happened in the CouchDB API.)
>
> I'd be happy if this were implemented in CouchDB and made an official part of 
> the API. Hopefully the spec I wrote is detailed enough to make that 
> straightforward. (I don't have the Erlang skills to do it myself, though.)
>
> —Jens
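
To make the nested-multipart point concrete, here is a minimal Python sketch (standard library only) of what a puller might do with such a call. It assumes a request body shaped like `{"docs": [{"id": ..., "rev": ...}]}` and a `multipart/mixed` response in which a revision with attachments arrives as a nested `multipart/related` part whose `application/json` sub-part is the document. Those shapes, the function names `build_bulk_get_body` and `parse_bulk_get_response`, and the sample response are illustrative assumptions, not taken from the wiki spec. The HTTP request itself is left out. Python's `email` parser handles the nesting without any special code.

```python
import json
from email import message_from_bytes
from email.policy import default


def build_bulk_get_body(revisions):
    """Encode (doc_id, rev_id) pairs as an ASSUMED _bulk_get request body.

    The {"docs": [{"id": ..., "rev": ...}]} shape is an assumption; check the
    Bulk-GET wiki page for the exact fields and query parameters.
    """
    return json.dumps({"docs": [{"id": doc_id, "rev": rev_id}
                                for doc_id, rev_id in revisions]})


def parse_bulk_get_response(content_type, body):
    """Split a multipart/mixed response into (document, attachments) pairs.

    A plain part is one revision's JSON body. A part that is itself multipart
    (a revision with attachments) is descended into: its application/json
    sub-part is the document, and every other sub-part is an attachment keyed
    by its filename.
    """
    # The email parser needs the Content-Type header (with its boundary) to
    # split the body, so prepend it as a one-line MIME header block.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw, policy=default)

    results = []
    for part in message.iter_parts():
        if part.is_multipart():
            # Nested multipart: one revision plus its attachments.
            doc, attachments = None, {}
            for sub in part.iter_parts():
                if sub.get_content_type() == "application/json":
                    doc = json.loads(sub.get_payload(decode=True))
                else:
                    attachments[sub.get_filename()] = sub.get_payload(decode=True)
            results.append((doc, attachments))
        else:
            # Plain part: one revision with no attachments.
            results.append((json.loads(part.get_payload(decode=True)), {}))
    return results


# Hypothetical response: one revision without attachments, then one revision
# whose JSON body and attachment are wrapped in a nested multipart/related part.
SAMPLE_CONTENT_TYPE = 'multipart/mixed; boundary="outer"'
SAMPLE_BODY = (
    b'--outer\r\n'
    b'Content-Type: application/json\r\n'
    b'\r\n'
    b'{"_id": "doc1", "_rev": "1-aaa"}\r\n'
    b'--outer\r\n'
    b'Content-Type: multipart/related; boundary="inner"\r\n'
    b'\r\n'
    b'--inner\r\n'
    b'Content-Type: application/json\r\n'
    b'\r\n'
    b'{"_id": "doc2", "_rev": "2-bbb",'
    b' "_attachments": {"a.txt": {"follows": true}}}\r\n'
    b'--inner\r\n'
    b'Content-Type: text/plain\r\n'
    b'Content-Disposition: attachment; filename="a.txt"\r\n'
    b'\r\n'
    b'hello\r\n'
    b'--inner--\r\n'
    b'--outer--\r\n'
)

if __name__ == "__main__":
    print(build_bulk_get_body([("doc1", "1-aaa"), ("doc2", "2-bbb")]))
    for doc, attachments in parse_bulk_get_response(SAMPLE_CONTENT_TYPE, SAMPLE_BODY):
        print(doc["_id"], sorted(attachments))
```

Running it prints the encoded request body, then `doc1 []` and `doc2 ['a.txt']`: one request yields both revisions, and the attachment comes out of the nested part.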
