The current max_dbs_open value is set at 600.
The server is running 112 continuous replications with the following topology:
              +--> F001
S(*) ---> T --|    ...
              +--> F111
(*) S is on a different host
On the first data change at the source database, the following issue was logged
and the replication between S and T died:
{checkpoint_commit_failure,<<"Target database out of sync. Try to increase
max_dbs_open at the target's server.">>}
One of the filtered replications between T and Fn died as well 2 seconds later
with the same checkpoint_commit_failure issue. I suspect that it was the one
that let the new document through its filter, but cannot verify.
Upon restart of the replication between S and T, it ran to completion, but
several of the filtered replications died with the same issue from above. I
suspect that all filtered replications that let the new documents through their
filters were affected, but cannot verify.
After starting the failed filtered replications once more, everything runs to
completion.
Another change triggered the following error, yet the replication kept running
and the filtered replication showed no sign of a problem:
{checkpoint_commit_failure,<<"Error updating the source checkpoint document:
conflict">>}
...
[Mon, 16 Jul 2012 23:34:10 GMT] [info] [<0.27578.249>] recording a checkpoint
for `S` -> `T` at source update_seq 169029
...
[Mon, 16 Jul 2012 23:34:17 GMT] [info] [<0.28279.247>] recording a checkpoint
for `T` -> `http://Fx` at source update_seq 52930
...
Subsequent changes at the source do not trigger any other errors in the log
files.
Is this last issue related to the previous ones or just coincidental?
Is there a formula that allows me to project the value I need to choose
for max_dbs_open?
Why does a value of 600 appear to be too low?
I also see a lot of 'GET /llfs/ 200' in the logs, probably originating from the
112 replications. They appear to poll every 5 seconds.
Is there a parameter to reduce the interval? I've looked and couldn't find it,
but might have missed it.
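
A minimal sketch for checking the observed polling interval from the log,
assuming the request lines carry the same bracketed timestamp prefix as the
checkpoint lines quoted above. The function name `poll_intervals` and the
sample lines are hypothetical and only illustrate the idea:

```python
import re
from email.utils import parsedate_to_datetime

# Leading "[Mon, 16 Jul 2012 23:34:10 GMT]" timestamp, as seen in the log excerpts.
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")


def poll_intervals(lines, needle="GET /llfs/ 200"):
    """Return the gaps in seconds between consecutive log lines containing needle."""
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(parsedate_to_datetime(match.group(1)))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical sample lines; the real access-log layout may differ.
sample = [
    "[Mon, 16 Jul 2012 23:34:10 GMT] [info] [<0.100.0>] GET /llfs/ 200",
    "[Mon, 16 Jul 2012 23:34:12 GMT] [info] [<0.101.0>] recording a checkpoint",
    "[Mon, 16 Jul 2012 23:34:15 GMT] [info] [<0.102.0>] GET /llfs/ 200",
]
print(poll_intervals(sample))
```

With 112 replications logging into the same file, the gaps across all matching
lines will be much shorter than the interval of any single replication, so the
lines would first have to be separated per replication for this to be
meaningful.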
One other thing I noticed: if you start 2 continuous replications, one with
'create_target': true and the other without the parameter, the replications
are treated as different and not recognized as 'already running'. In my
opinion, since 'create_target' is a no-op when the database already exists,
they should be recognized as 'already running'. What happens when 2
identical replications are running?
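
To make the case concrete, here is a rough sketch of the two request bodies
being compared. The helper `replication_request` and the database names
"http://S" and "T" are placeholders, not the real setup; the only point is
that the two bodies differ in nothing but the 'create_target' field:

```python
import json


def replication_request(source, target, create_target=None):
    """Build a continuous replication request body (placeholder helper)."""
    body = {"source": source, "target": target, "continuous": True}
    if create_target is not None:
        body["create_target"] = create_target
    return body


# Placeholder names standing in for the real source and target databases.
with_flag = replication_request("http://S", "T", create_target=True)
without_flag = replication_request("http://S", "T")

# Collect every field whose value differs between the two bodies.
differing = {
    key
    for key in set(with_flag) | set(without_flag)
    if with_flag.get(key) != without_flag.get(key)
}

print(json.dumps(with_flag, sort_keys=True))
print(json.dumps(without_flag, sort_keys=True))
print(differing)
```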
-- Andreas