My largest database, ~600 GB, was awful to compact. Because much of it
seldom changes, I sharded it by account, yielding about 500 databases of
various sizes. With a compaction daemon that only compacts a database
a database when it grows, compaction is no longer a problem. However, I
appear to be suffering now when it comes to replication.
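To show what I mean by that daemon, here's a simplified sketch (not the
actual code; the host, the hourly interval, and the in-memory size
tracking are placeholders):

    # Sketch: compact a database only when its file has grown.
    import json, time, urllib.request

    COUCH = "http://localhost:5984"  # placeholder

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    last_size = {}  # db name -> disk_size when we last compacted it

    while True:
        for db in get_json(COUCH + "/_all_dbs"):
            size = get_json("%s/%s" % (COUCH, db))["disk_size"]
            # Compact only if the file is bigger than it was last time.
            if size > last_size.get(db, 0):
                req = urllib.request.Request(
                    "%s/%s/_compact" % (COUCH, db), data=b"",
                    headers={"Content-Type": "application/json"})
                urllib.request.urlopen(req).close()
                last_size[db] = size
        time.sleep(3600)  # sweep once an hour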
Five hundred continuous "pull" replications have the destination server
crying for mercy. Its four CPUs are continuously busy (load average ~4)
and requests to the destination database occasionally time out.
The replication script starts a "pull" replication for each database,
one at a time. The replication requests start out taking about 0.3
seconds per database, but towards the end of the list each request is
taking many seconds.
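To show the shape of what the script does, here's a simplified sketch
(not the actual script; the host names are placeholders and the real
database list comes from elsewhere):

    # Sketch: start one continuous "pull" replication per database,
    # one at a time, timing each /_replicate request.
    import json, time, urllib.request

    SOURCE = "http://source-host:5984"  # placeholder
    TARGET = "http://localhost:5984"    # placeholder

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    def post_json(url, body):
        req = urllib.request.Request(
            url, data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"})
        started = time.time()
        urllib.request.urlopen(req).close()
        return time.time() - started

    for db in get_json(SOURCE + "/_all_dbs"):
        elapsed = post_json(TARGET + "/_replicate", {
            "source": "%s/%s" % (SOURCE, db),
            "target": db,
            "continuous": True,
        })
        print("%s: started in %.1f s" % (db, elapsed))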
Shortly after the replication starts, before it has gotten past more than
a few dozen databases, there is a brief flood of stack traces (or whatever
Erlang calls them) in the destination couch log. I think there are fewer
lines of error info than there are atoms in the sun, but only just. Is
there a guide that can help me know which lines of that log you need to
see?
The source server is not suffering: its load average is < 1 and it
serves requests quickly.
Due to the number of databases, I've added "ulimit -n 32768" to the
startup script.
We're running version 1.2.0ac052866-git on Linux 2.6.32. This version
has the new replicator.
* Are we "doing it all wrong?"
* Can I expect the storm to abate once all of the replications are
caught up?
* How can I tell which replications are "caught up"? I see that a GET
to /_active_tasks tells me that some replication tasks are "Starting"
and others have, e.g., "Processed source seq 17", but I don't know
whether that is enough to tell what's caught up and what's not. Do I
have to query the source database somehow to find out what source
sequence is available? (A sketch of the comparison I have in mind
follows these questions.)
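For what it's worth, here is the kind of check I have in mind, purely as
a sketch: the host names, the _active_tasks field names ("type", "task",
"status"), and the way I match tasks to databases are all guesses based
on what I see in the output.

    # Sketch: compare each source database's update_seq with the
    # "Processed source seq N" reported for its replication task.
    import json, re, urllib.request

    SOURCE = "http://source-host:5984"  # placeholder
    TARGET = "http://localhost:5984"    # placeholder

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    tasks = get_json(TARGET + "/_active_tasks")

    for db in get_json(TARGET + "/_all_dbs"):
        if db.startswith("_"):
            continue  # skip system databases
        # Highest sequence currently available on the source side.
        available = get_json("%s/%s" % (SOURCE, db))["update_seq"]
        # Find the replication task that mentions this database; matching
        # by substring is a guess about the "task" field's format.
        status = next((str(t.get("status", "")) for t in tasks
                       if str(t.get("type", "")).lower() == "replication"
                       and db in str(t.get("task", ""))), "")
        m = re.search(r"Processed source seq (\d+)", status)
        processed = int(m.group(1)) if m else 0
        state = "caught up?" if processed >= available else "behind"
        print("%-40s available=%-8s processed=%-8s %s"
              % (db, available, processed, state))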
Best Regards,
Wayne Conrad