In the HTTP WG more than a decade ago, issues like this came up under the name
'boxcarring'. But with the introduction of pipelining, the performance benefits
of boxcarring for idempotent requests went away.
In a replication the source should be able to fire off GET requests down the
pipeline non-stop and the remote server should be able to return them just as
quickly. So have you identified why you are seeing bad performance?
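
To make the pipelining point concrete, here is a minimal sketch of what "firing GET requests down the pipeline" could look like: all requests are written to one connection before any response is read, and the responses are then split apart in request order. The function names, the host, and the document paths are hypothetical, the parser only understands Content-Length framing (no chunked encoding), and the sketch does not check whether a given server or proxy actually honours pipelining.

```python
import socket


def build_pipelined_gets(host, paths):
    """Serialize several GET requests back to back for a single connection."""
    requests = []
    for path in paths:
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Accept: application/json\r\n"
            "\r\n"
        )
    return "".join(requests).encode("ascii")


def split_responses(buffer):
    """Split back-to-back responses that use Content-Length framing.

    Returns (status_line, body) tuples in request order. Incomplete
    trailing data is ignored; chunked encoding is not handled here.
    """
    responses = []
    while buffer:
        head, sep, rest = buffer.partition(b"\r\n\r\n")
        if not sep:
            break  # header block not complete yet
        lines = head.split(b"\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(rest) < length:
            break  # body not complete yet
        responses.append((lines[0].decode("ascii"), rest[:length]))
        buffer = rest[length:]
    return responses


def fetch_pipelined(host, port, paths, timeout=10.0):
    """Send every request first, then read until all responses arrived."""
    payload = build_pipelined_gets(host, paths)
    buffer = b""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)  # no waiting between requests
        while len(split_responses(buffer)) < len(paths):
            chunk = conn.recv(65536)
            if not chunk:
                break  # server closed the connection
            buffer += chunk
    return split_responses(buffer)
```

Comparing the wall-clock time of `fetch_pipelined` against one request per round trip on the same link would be one way to see whether latency, rather than the lack of a bulk call, is what hurts.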
Thanks,
Yaron
> -----Original Message-----
> From: Jens Alfke [mailto:[email protected]]
> Sent: Friday, January 24, 2014 7:21 AM
> To: [email protected]
> Subject: _bulk_get protocol extension
>
> (I'm excited about this list! There have been some topics I've wanted to bring
> up that are too implementation-oriented for the user@ list, but I haven't
> been brave enough to dive into the dev@ list because I don't know Erlang or
> the internals of CouchDB. I also really appreciate folks sharing the viewpoint
> that CouchDB is an ecosystem and an open replication protocol, not just a
> particular database implementation.)
>
> Anyway. One topic I'd like to bring up is that, in my non-scientific
> observations, the major performance bottleneck in pull replications is the
> fact that revisions have to be transferred using individual GET requests. I've
> seen very poor performance when pulling lots of small documents from a
> distant server, like an order of magnitude below the throughput of sending a
> single huge document.
>
> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs.
> Unfortunately this has limitations that make it unsuitable for replication;
> see my explanation at the page linked below.)
>
> A few months ago I experimentally implemented a new "_bulk_get" REST call
> in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which
> significantly improves performance by allowing the puller to request any
> number of revisions in a single HTTP request. Again, no scientific tests or
> hard numbers, but it was enough to convince me it's worthwhile. I've
> documented it here:
> https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the
> standard API. The only unusual thing is that the response can contain nested
> MIME multipart bodies: the response format is multipart, with every
> requested revision in a part, but revisions containing attachments are
> themselves sent as multipart. (This shouldn't be an issue for any decent
> multipart parser, since nested multipart is pretty common in emails, but I
> think it's the first time it's happened in the CouchDB API.)
>
> I'd be happy if this were implemented in CouchDB and made an official part
> of the API. Hopefully the spec I wrote is detailed enough to make that
> straightforward. (I don't have the Erlang skills to do it myself, though.)
>
> -Jens
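
The nested multipart response described in the quoted message can be handled by an ordinary MIME parser. The following is a minimal sketch using Python's standard `email` package. The sample payload, the boundary names, the part content types, and the `parse_revisions` helper are all made up for illustration; the actual wire format is whatever the linked Bulk-GET page specifies, which this sketch does not attempt to reproduce.

```python
import json
from email import message_from_bytes

# Invented sample: an outer multipart body with one plain JSON revision
# and one revision that is itself multipart (document plus attachment).
SAMPLE = (
    b'Content-Type: multipart/mixed; boundary="outer"\r\n'
    b"\r\n"
    b"--outer\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"_id": "doc1", "_rev": "1-aaa"}\r\n'
    b"--outer\r\n"
    b'Content-Type: multipart/related; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"_id": "doc2", "_rev": "2-bbb"}\r\n'
    b"--inner\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"attachment bytes\r\n"
    b"--inner--\r\n"
    b"--outer--\r\n"
)


def parse_revisions(raw):
    """Return a list of (document, attachments) pairs from a nested body.

    Assumes the first part of a nested multipart is the JSON document
    and every following part is an attachment body.
    """
    message = message_from_bytes(raw)
    revisions = []
    for part in message.get_payload():
        if part.is_multipart():
            doc_part, *attachment_parts = part.get_payload()
            doc = json.loads(doc_part.get_payload(decode=True))
            attachments = [p.get_payload(decode=True) for p in attachment_parts]
        else:
            doc = json.loads(part.get_payload(decode=True))
            attachments = []
        revisions.append((doc, attachments))
    return revisions


for doc, attachments in parse_revisions(SAMPLE):
    print(doc["_id"], doc["_rev"], len(attachments))
```

The parser needs no special handling for the nesting beyond the `is_multipart()` check, which matches the quoted message's expectation that nested multipart is routine for a decent parser.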