Hello,
We're looking at using CouchDB's replication to let us easily run
multi-master replicating databases across multiple facilities (e.g. Los
Angeles, Albuquerque, and Bristol, England). It looks like it'll be the
perfect tool for the job.
I have some questions about the current implementation and about work
that I've read is coming in forthcoming releases.
1) What's the most robust automatic replication mechanism? While
continuous replication looks nice, I see there are some tickets open
against it and that it has issues with four nodes. Would a more robust,
if slower and heavier, approach be an update_notification process that
manually POSTs to _replicate, along the lines of the sketch below?
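For concreteness, here's a minimal sketch of what I mean. It assumes the
update_notification protocol where CouchDB writes one JSON line per
database update (e.g. {"type":"updated","db":"db1"}) to the script's
stdin; the ini entry name, script path, and target host are placeholders:

[update_notification]
push_rep = /usr/local/bin/push-replicate.sh

#!/bin/sh
# push-replicate.sh: for each update notification CouchDB sends on our
# stdin, trigger a one-shot push replication to a remote node.
TARGET_HOST=http://bristol.example.com:5984   # placeholder remote node
while read line; do
  # Crude extraction of the "db" field from the JSON notification.
  db=`echo "$line" | sed 's/.*"db":"\([^"]*\)".*/\1/'`
  curl -s -X POST http://localhost:5984/_replicate \
    -d "{\"source\": \"$db\", \"target\": \"$TARGET_HOST/$db\"}" \
    >/dev/null
done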
2) With the persistent continuous replication feature, is there a way to
stop a continuous replication without restarting CouchDB? Will there be
a way to manage the list of replicant databases once the persistent
continuous replication feature is complete?
3) How does continuous replication deal with network outages, say if a
link goes down between the Los Angeles and Bristol data centers? Does
CouchDB handle a hanging TCP connection OK?
4) It would be nice for CouchDB to maintain a list of replicant
databases that it automatically pushes changes to, so that the list
lives in CouchDB itself rather than in an external script. Is there any
work on a feature like this? In the meantime it could be done with an
external update_notification script, along the lines of the sketch below.
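Here's a rough sketch of that interim approach, again assuming the
update_notification protocol described above; the replication_config
database, the targets document, and its "targets" field are invented
for illustration:

#!/bin/sh
# On each update notification, fetch the list of replication targets
# from a document stored in CouchDB itself, then fan out one-shot
# replications to every target.
SERVER=http://localhost:5984
while read line; do
  db=`echo "$line" | sed 's/.*"db":"\([^"]*\)".*/\1/'`
  # Targets doc looks like {"targets": ["http://la:5984/db1", ...]}.
  targets=`curl -s $SERVER/replication_config/targets 2>/dev/null |
    python2.6 -c 'import cjson, sys
for t in cjson.decode(sys.stdin.read()).get("targets", []): print t'`
  for t in $targets; do
    curl -s -X POST $SERVER/_replicate \
      -d "{\"source\": \"$db\", \"target\": \"$t\"}" >/dev/null
  done
done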
5) I wrote the following Bourne shell script, and after running it for
an hour, CouchDB consumes 100% of a CPU. The CPU usage persists even
after stopping the shell script and compacting both databases. What
would explain this behavior?
#!/bin/sh
# Create two databases, replicate continuously in both directions, then
# loop: create a doc in db1, wait for it to appear in db2, delete it
# from db2, and wait for the deletion to replicate back to db1.
HOST1=http://localhost:5984
HOST2=http://localhost:5984
DB1=$HOST1/db1
DB2=$HOST2/db2

curl -X PUT $DB1
curl -X PUT $DB2

curl -X POST $HOST1/_replicate \
  -d '{"source": "db1", "target": "db2", "continuous": true}'
curl -X POST $HOST2/_replicate \
  -d '{"source": "db2", "target": "db1", "continuous": true}'

while true; do
  seconds=`date +%s`
  echo Working on $DB1/$seconds
  # Create the document and capture its revision from the response.
  rev=`curl -X PUT $DB1/$seconds -d "{\"name\": \"$seconds\"}" 2>/dev/null |
    python2.6 -c 'import cjson, sys; print cjson.decode(sys.stdin.readline()).get("rev")'`
  while curl $DB2/$seconds 2>/dev/null | grep error; do
    echo "  Does not exist yet at $DB2/$seconds."
    sleep 1
  done
  echo "  It exists now at $DB2/$seconds."
  curl -X DELETE "$DB2/$seconds?rev=$rev" >/dev/null 2>&1
  while curl $DB1/$seconds 2>/dev/null | grep _rev; do
    echo "  It has not been deleted yet at $DB1/$seconds"
    sleep 1
  done
  echo "  It has been deleted at $DB1/$seconds."
  echo
done
6) The other thing I noticed is that after compacting the databases,
the following messages appear frequently in the log, when they didn't
appear before the compaction:
[info] [<0.185.0>] recording a checkpoint at source update_seq 6748
[info] [<0.185.0>] A server has restarted sinced replication start. Not
recording the new sequence number to ensure the replication is redone and
documents reexamined.
And then later I got this:
[info] [<0.4831.9>] 172.29.113.186 - - 'GET' /db1/1251156832 404
[info] [<0.18214.10>] 172.29.113.186 - - 'PUT' /db1/1251156833 201
[error] [<0.186.0>] changes_loop died with reason {system_limit,
[{erlang,spawn_opt,
[proc_lib,init_p,
[<0.187.0>,[],gen,init_it,
[gen_server,<0.187.0>,<0.187.0>,
couch_event_sup,
{couch_db_update,
{couch_db_update_notifier,
#Ref<0.0.6.249926>},
#Fun<couch_rep_changes_feed.4.19189490>},
[]]],
[link]]},
{proc_lib,start_link,5},
{couch_rep_changes_feed,
send_local_changes_forever,3}]}
Thanks and nice work on the project.
Regards,
Blair