Replications stopping unexpectedly

Daniel Gonzalez Fri, 27 Apr 2012 03:04:29 -0700

Hello,

I will describe my problem in a general way. If more details are needed, I
will try to gather them from my production environments.
We have several CouchDB instances, with a bunch of databases. Some of these
databases are connected via replication.
Some of the replications run through an ssh tunnel, others over a direct
internet connection. The latency between CouchDB instances ranges from a
few milliseconds up to several hundred milliseconds.


My problem is that it is very common for the replications to stop. It could
be due to connectivity being lost (sometimes the ssh tunnels fail and must be
recreated), but this is not the only reason.

And worse: the replications are not restarted automatically. They stay in
the error state. The problem is so frequent that I have a replication monitor
process that runs every 5 minutes, looks for replications in error, and
deletes and recreates their replication documents. This is the only method I
have found to reliably restart the replications.
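
For concreteness, a monitor of this kind might look roughly like the sketch
below. It is not the actual production script. It assumes the replications
are defined as documents in the _replicator database, that CouchDB marks a
failed one with "_replication_state": "error", and that the server can be
reached without authentication. The function names and the base URL passed
in are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def errored(docs):
    """Return the replication docs that CouchDB has marked as being in error."""
    return [d for d in docs if d.get("_replication_state") == "error"]


def fresh_copy(doc):
    """Copy a replication doc without _rev and the server-set _replication_* fields."""
    return {k: v for k, v in doc.items()
            if k != "_rev" and not k.startswith("_replication_")}


def _request(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def restart_errored(base_url):
    """Delete and recreate every replication doc that is in the error state."""
    rows = _request("GET", base_url + "/_replicator/_all_docs?include_docs=true")["rows"]
    for doc in errored([row["doc"] for row in rows]):
        doc_url = base_url + "/_replicator/" + urllib.parse.quote(doc["_id"], safe="")
        # Delete the failed document, then write it back without its old state.
        _request("DELETE", doc_url + "?rev=" + urllib.parse.quote(doc["_rev"], safe=""))
        _request("PUT", doc_url, fresh_copy(doc))
```

Calling restart_errored("http://localhost:5984") from cron every 5 minutes
would approximate the behaviour described above (the URL is a placeholder).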

Is anybody else experiencing similar problems? Do you have any suggestions
on how to make replications more robust against connectivity issues?
Are there other methods to restart failed replications, apart from
redefining them?

Thanks,
Daniel Gonzalez
