On Wed, Apr 25, 2012 at 7:53 PM, Chris Stockton wrote:
> Hello,
Hi Chris

>
> I was very excited when I was reading the replicator changes in the
> release notes[1], specifically because I saw "Number of worker
> processes" and thought that maybe it is now pooled. Although I am very
> glad to see the improvements to the replicator and very much appreciate
> the work that has been done on it, I am a bit confused after reading the
> more detailed parameters for the new replicator[2]. It seems that the
> configuration options and worker processes are for a specific database,
> with some fairly high defaults, such as 20 "http_connections". From
> what I gather reading this, it is per database; or is it per server?

It's per replication (what you call "per database", if I understood
correctly). It's specifically mentioned at
http://wiki.apache.org/couchdb/Replication#New_features_introduced_in_CouchDB_1.2.0

>
> I have sent emails to this list in the past about how I would love to
> see a server-wide replicator, something that creates a configurable
> pool of connections for server relationships. We scale with many
> databases instead of having one giant database. The problem we have
> faced is that when we reached only 2K databases, some configuration
> tweaks had to be made to allow replication to run from our Master ->
> Failover -> Backup machines. As we got up to 5000, we were forced to
> take our Backup machine out of the picture, because it put a
> requirement of around 10K TCP connections on our Failover machine. It
> was simply too much strain, even for very large enterprise database
> servers.
>
> So my question here is: does the new replicator pool an entire server,
> solving our growth problem with MANY databases, or does it simply add
> more strain with more workers (from 5000 TCP connections to 100k)?
> If it does indeed add workers instead of reducing them, and I were to
> lower the defaults to 1 connection per database, is the new replicator
> designed in such a way that it will still offer performance at least
> comparable to the 1.1 replicator, or could I incur a penalty because
> the new architecture is designed and expected to have a modest pool
> size?

A big difference is that each replication has its own set of dedicated
connections (for better error isolation and performance). Keep in mind,
however, that if you're doing pull-style replications, there's always
one connection (per replication) fully dedicated to the remote _changes
feed. This is true for both replicators.

I think you'll be able to solve your problem by having only N
non-continuous replications active at any time, and doing your own
round-robin scheduling manually in the meantime.

Some time ago I started work to make the pooling more configurable,
namely to allow choosing between a dedicated per-replication pool and a
pool of connections shared among multiple replications, among other
features. It's not finished, however, and only the following is online:
https://github.com/fdmanana/couchdb/tree/lhttpc

regards

>
> Kind Regards,
>
> -Chris
>
> [1] http://www.apache.org/dist/couchdb/notes/1.2.0/apache-couchdb-1.2.0.html
> [2] http://wiki.apache.org/couchdb/Replication
>

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
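The workaround suggested in the reply (keep only N non-continuous replications active at a time and schedule them round-robin yourself) can be sketched in Python. This is not code from the thread: `make_pull_replicator` and `run_round_robin` are hypothetical helpers, the host URLs are placeholders, and the sketch assumes CouchDB 1.x behaviour where a POST to `/_replicate` without `"continuous": true` returns only once that replication has finished. With the 20 `http_connections` default quoted above, capping concurrency at N = 50 keeps the replicator's connections on the order of 50 × 20 = 1,000 (plus one dedicated `_changes` connection per active pull replication), rather than 5000 × 20 = 100,000 if every database replicated at once.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def make_pull_replicator(source_server: str, target_server: str) -> Callable[[str], dict]:
    """Hypothetical helper: build a function that runs ONE non-continuous
    pull replication of database `db` from `source_server` into the
    same-named database on `target_server`.

    Assumes CouchDB 1.x semantics: POST /_replicate without
    "continuous": true returns only after the replication finishes.
    """
    def replicate(db: str) -> dict:
        body = json.dumps({
            "source": f"{source_server}/{db}",  # remote database (pull)
            "target": db,                        # local database name
        }).encode("utf-8")
        req = urllib.request.Request(
            f"{target_server}/_replicate",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return replicate


def run_round_robin(dbs: List[str], max_active: int,
                    replicate: Callable[[str], dict],
                    passes: int = 1) -> Dict[str, List[dict]]:
    """Hypothetical scheduler: replicate every database in `dbs`, `passes`
    times, never running more than `max_active` replications at once."""
    results: Dict[str, List[dict]] = {db: [] for db in dbs}
    # The pool size is the cap N: at most max_active replicate() calls
    # (and hence replications and their connection pools) run concurrently.
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        for _ in range(passes):
            # Tasks are queued in list order, so each pass walks the
            # databases round-robin; consuming every result before the next
            # pass means a database is never replicated twice concurrently.
            for db, res in zip(dbs, pool.map(replicate, dbs)):
                results[db].append(res)
    return results
```

For example, `run_round_robin(names, 50, make_pull_replicator("http://master:5984", "http://failover:5984"))` would pull every database in `names` with at most 50 replications in flight. In this sketch an exception from a single replication aborts the pass; a production scheduler would likely catch it per database and retry later.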
