On Jul 11, 2013, at 9:25 AM, Bill Foshay <[email protected]> wrote:

> Ignoring filtering, is there any idea roughly how many persistent 
> replications can be running before it starts to hurt performance. I know 
> this is a vague question, highly dependent on the system resources of the 
> machine hosting the database, the number of updates being made, etc. I'm 
> just trying to get a rough idea if possible. Are we talking like on the 
> order of 100 replications, 1000s, etc? 

Client pushes aren’t expensive. They don’t consume resources on the server 
except for the occasional POSTs to _revs_diff and _bulk_docs when the client 
has new revisions to upload.
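To make that concrete, the push handshake is just two small JSON POSTs. Here's a minimal sketch of the request bodies involved (the endpoints and the new_edits flag are real CouchDB API; the document ID, revision ID, and contents are made up for illustration):

```python
import json

def revs_diff_request(doc_revs):
    """Body for POST /db/_revs_diff: a map of doc ID -> list of revision
    IDs the client has. The server replies with the ones it's missing."""
    return json.dumps(doc_revs)

def bulk_docs_request(docs):
    """Body for POST /db/_bulk_docs with new_edits=false, which tells the
    server to keep the client-assigned revision IDs instead of generating
    new ones (this is what replication does)."""
    return json.dumps({"docs": docs, "new_edits": False})

# A push replication with one changed document (hypothetical doc/rev):
diff = revs_diff_request({"doc-1": ["2-abc123"]})
upload = bulk_docs_request([{"_id": "doc-1", "_rev": "2-abc123", "n": 1}])
```

So a mostly-idle pusher costs the server nothing between changes; it only shows up as these occasional short requests.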

The overhead of client pulls is
* Open TCP sockets (for _changes feeds) — this is the same scaling problem that 
large-scale Comet, IMAP, XMPP, etc. servers have. There are hardware and kernel 
issues to consider if you need to handle tens/hundreds of thousands of open 
TCP connections per host.
* User-space server state for all those connections — fortunately Erlang is 
kind of the poster child of scalability here. I don’t know what extra overhead 
CouchDB adds.
* Writing to all those sockets whenever a revision is added — I don’t know how 
bad this gets. It’s on the order of one packet of payload per active listener. 
In response, there will probably be a GET request sent from each listener to 
retrieve the new revision. In extreme cases the GETs could result in the same 
thundering-herd problem that was seen with RSS (where updating the feed 
produces a zillion simultaneous hits to the new article.)

I don’t know much about the innards of CouchDB (or Erlang) so I can’t get more 
specific about these…

—Jens

Reply via email to