Hello,

On Fri, Jun 11, 2010 at 4:13 PM, Mark Anderson <[email protected]> wrote:
> I tried replying to the list, but I got bounced;
>
>
> ---------- Forwarded message ----------
> From: Mark Anderson <[email protected]>
> Date: Fri, Jun 11, 2010 at 3:59 PM
> Subject: Re: Replication scalability on many databases
> To: [email protected]
>
>
> I'd certainly like to see a whole server replication feature, but I've
> been playing with large numbers of continuous replications; you may be
> able to implement your original plan of using continuous replications.
>
> Here's some configuration tips from my experiments. I got 5000 running
> with these fixes on an EC2 large before I got bored.
>
> There are a few general scaling limits to consider:
> * File descriptors (sockets and data files)
> * Erlang Ports, Tables, Processes
> * Mochiweb connections
>
> * Insure max open files is large enough...
>   ulimit defaults to 1024 in many cases.
>   ulimit -n 1000000 fixes; probably needs to be in init.d/couchdb
>
> * Increase erlang max ports and tables
>   export ERL_MAX_PORTS=100000
>   export ERL_MAX_ETS_TABLES=10000
>   These probably need to be incorporated into init.d/couchdb
>
> * Insure that the number of dbs can scale
>   In local.ini
>   [couchdb]
>   max_dbs_open = 10000
>
> * Insure the number of active web connections can scale; default is 2048.
>   I backported a patch from trunk that adds a configuration variable for
>   this. It's in [email protected]:manderson26/couchdb
>   In local.ini you can then add.
>   [mochiweb]
>   max_connections = 16384
>
> * Increase the number of available erlang 'processes'
>   As the load increases the number of active processes can exceed the
>   32768 limit that is the erlang default. This can be fixed by adding
>   start_arguments="+P 131072"
>   to the /usr/local/bin/couchdb script.
>

Thank you, Mark, for the response; this is good information. Although I did do some tweaking, I missed some of your suggestions, such as the Erlang processes, ports, and tables.

With the numbers raised high enough, I think we might be able to meet our replication needs, as it appears you have. There is one final problem with replication that you may be able to answer for me: turning it off.

Part of our application has a unique automatic failover built in. Our replication strategy includes an analysis pass over a service status table and switches replication direction based on the state of a CouchDB service. For example, with Master -> Failover -> Backup: if the service status table labels Master as "down", the replicator stops making the replication pulls on Failover until Master becomes healthy. Once Master is healthy, it replicates the changes made since Master went down, then goes back to the Master -> Failover -> Backup replication strategy.

So, during this specific sequence of replication strategies, the Failover machine never goes down and is never restarted. The only way for the replication processes to stop, as far as I know, is to restart CouchDB. If we had a way to easily stop a continuous replication, your configuration tweaks would become more viable for us.

However, while that would then work and fit our model, something tells me that it still isn't as efficient as a server-wide replication. A server-wide replication, perhaps with the addition of a JavaScript filter function for database names, would be fantastic and would likely give us the performance we want.

Thank you again, Mark, for the response.

-Chris
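
P.S. To make the direction switch described above a bit more concrete, here is a rough sketch in Python of how the replication plan could be derived from the service status. This is not our actual code: the function name `plan_replication`, the `catch_up_pending` flag, and the tuple layout are hypothetical, and I am assuming that the Failover -> Backup pull keeps running while Master is down and that the catch-up after recovery flows from Failover to Master.

```python
# Hypothetical sketch of the direction-switching logic; names are illustrative.
# Each pair is (source, target): the target pulls changes from the source.

def plan_replication(master_up, catch_up_pending):
    """Return the list of (source, target) pulls that should be active."""
    if not master_up:
        # Master is labelled "down": Failover stops pulling from Master.
        # Assumption: Failover -> Backup keeps running in the meantime.
        return [("Failover", "Backup")]
    if catch_up_pending:
        # Master is healthy again but missed changes while it was down.
        # Assumption: those changes are replicated from Failover to Master.
        return [("Failover", "Master"), ("Failover", "Backup")]
    # Normal operation: Master -> Failover -> Backup.
    return [("Master", "Failover"), ("Failover", "Backup")]


if __name__ == "__main__":
    print(plan_replication(master_up=True, catch_up_pending=False))
    print(plan_replication(master_up=False, catch_up_pending=False))
    print(plan_replication(master_up=True, catch_up_pending=True))
```

Moving from the first plan to the second means a running Master -> Failover continuous replication has to be stopped on a Failover machine that stays up the whole time, which is exactly the step I don't know how to do without restarting CouchDB.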
