Hey Jens, that looks interesting indeed. It would be worth filing a JIRA ticket with the link, so it doesn't get lost in email.
A+
Dave

On 24 January 2014 16:20, Jens Alfke <[email protected]> wrote:
> (I'm excited about this list! There have been some topics I've wanted to bring up that are too implementation-oriented for the user@ list, but I haven't been brave enough to dive into the dev@ list because I don't know Erlang or the internals of CouchDB. I also really appreciate folks sharing the viewpoint that CouchDB is an ecosystem and an open replication protocol, not just a particular database implementation.)
>
> Anyway. One topic I'd like to bring up is that, in my non-scientific observations, the major performance bottleneck in pull replications is the fact that revisions have to be transferred using individual GET requests. I've seen very poor performance when pulling lots of small documents from a distant server, like an order of magnitude below the throughput of sending a single huge document.
>
> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs. Unfortunately this has limitations that make it unsuitable for replication; see my explanation at the page linked below.)
>
> A few months ago I experimentally implemented a new "_bulk_get" REST call in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which significantly improves performance by allowing the puller to request any number of revisions in a single HTTP request. Again, no scientific tests or hard numbers, but it was enough to convince me it's worthwhile. I've documented it here:
> https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the standard API. The only unusual thing is that the response can contain nested MIME multipart bodies: the response format is multipart, with every requested revision in a part, but revisions containing attachments are themselves sent as multipart. (This shouldn't be an issue for any decent multipart parser, since nested multipart is pretty common in emails, but I think it's the first time it's happened in the CouchDB API.)
>
> I'd be happy if this were implemented in CouchDB and made an official part of the API. Hopefully the spec I wrote is detailed enough to make that straightforward. (I don't have the Erlang skills to do it myself, though.)
>
> —Jens
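
For illustration, here is a rough Python sketch of what a puller using such a call might do: build one request body listing the wanted revisions, then walk a multipart response in which some parts are themselves multipart. This is only a sketch under stated assumptions. The `{"docs": [{"id": ..., "rev": ...}]}` body shape, the helper names `build_bulk_get_body` and `parse_bulk_response`, and the assumption that the JSON document is the first sub-part of a nested multipart are all illustrative, so check the linked spec for the authoritative format. It uses the standard library's email parser, since nested multipart is the same structure used in emails.

```python
import json
from email import message_from_bytes


def build_bulk_get_body(revs):
    """Serialize (doc_id, rev_id) pairs into one JSON request body.

    Assumed shape: {"docs": [{"id": ..., "rev": ...}]}.
    """
    return json.dumps({"docs": [{"id": d, "rev": r} for d, r in revs]})


def parse_bulk_response(content_type, body):
    """Return a list of (document, attachments) tuples.

    content_type is the response's Content-Type header value (with its
    boundary parameter); body is the raw response body as bytes.
    """
    # Prepend the header so the email parser sees a complete MIME message.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = message_from_bytes(raw)
    if not msg.is_multipart():
        raise ValueError("expected a multipart response")

    results = []
    for part in msg.get_payload():
        if part.is_multipart():
            # Nested multipart: assume the first sub-part is the JSON
            # document and the remaining sub-parts are its attachments.
            subparts = part.get_payload()
            doc = json.loads(subparts[0].get_payload(decode=True))
            attachments = [p.get_payload(decode=True) for p in subparts[1:]]
        else:
            # Plain part: a revision without attachments.
            doc = json.loads(part.get_payload(decode=True))
            attachments = []
        results.append((doc, attachments))
    return results
```

The request body would presumably be POSTed to the database's `_bulk_get` resource in a single HTTP request, replacing the one GET per revision described above.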
