On Jul 11, 2013, at 9:25 AM, Bill Foshay <[email protected]> wrote:
> Ignoring filtering, is there any idea roughly how many persistent
> replications can be running before it starts to hurt performance? I know
> this is a vague question, highly dependent on the system resources of the
> machine hosting the database, the number of updates being made, etc. I'm
> just trying to get a rough idea if possible. Are we talking on the
> order of 100 replications, 1000s, etc.?

Client pushes aren’t expensive. They don’t consume resources on the server except for the occasional POSTs to _revs_diff and _bulk_docs when the client has new revisions to upload.

The overhead of client pulls is:

* Open TCP sockets (for _changes feeds) — this is the same scaling problem that large-scale Comet, IMAP, XMPP, etc. servers have. There are hardware and kernel issues to consider if you need to handle tens or hundreds of thousands of open TCP connections per host.

* User-space server state for all those connections — fortunately Erlang is kind of the poster child of scalability here. I don’t know what extra overhead CouchDB adds.

* Writing to all those sockets whenever a revision is added — I don’t know how bad this gets. It’s on the order of one packet of payload per active listener. In response, there will probably be a GET request sent from each listener to retrieve the new revision. In extreme cases the GETs could result in the same thundering-herd problem that was seen with RSS (where updating the feed produces a zillion simultaneous hits to the new article).

I don’t know much about the innards of CouchDB (or Erlang) so I can’t get more specific about these…

—Jens
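
[Editor's note: for concreteness, here is a rough sketch of the per-listener work Jens describes in the last bullet. Each pull client holds a continuous _changes feed open; when a revision is added, the server writes one JSON line to every listener, and each listener typically answers with a GET for the new revision. The parsing helper and URLs below are illustrative assumptions, not code from this thread.]

```python
import json

# A continuous _changes feed emits one JSON object per line, e.g.:
#   {"seq": 42, "id": "doc1", "changes": [{"rev": "2-abc"}]}
# Blank lines are heartbeats that keep the TCP socket alive.

def parse_change(line: str):
    """Parse one _changes line into (doc_id, [revs]); None for heartbeats."""
    line = line.strip()
    if not line:
        return None
    change = json.loads(line)
    return change["id"], [c["rev"] for c in change.get("changes", [])]

def follow_up_url(base: str, doc_id: str, rev: str) -> str:
    """Build the GET each listener sends back to fetch the new revision.
    This is the request that multiplies into a thundering herd when
    thousands of listeners react to the same update at once."""
    return f"{base}/{doc_id}?rev={rev}"

# One revision added = one packet of payload per listener, then one GET
# like this from each listener:
parsed = parse_change('{"seq": 42, "id": "doc1", "changes": [{"rev": "2-abc"}]}')
doc_id, revs = parsed
print(follow_up_url("http://localhost:5984/db", doc_id, revs[0]))
# -> http://localhost:5984/db/doc1?rev=2-abc
```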
