Hello,
We're looking at using CouchDB's replication to let us easily run
multi-master replicating databases across multiple facilities (e.g. Los
Angeles, Albuquerque, and Bristol, England). It looks like it'll be the
perfect tool for the job.
I have some questions about the current implementation and about work
that I've read is coming in forthcoming releases.
1) What's the most robust automatic replication mechanism? While
continuous replication looks nice, I see there are some tickets open
against it and that it has issues with four nodes. Would a more robust,
if slower and heavier, approach be an update_notification process that
manually POSTs to _replicate, along the lines of the sketch below?
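For concreteness, here's a minimal sketch of what I mean. It assumes the
update_notification protocol where CouchDB writes one JSON line per
database update (e.g. {"type":"updated","db":"db1"}) to the script's
stdin; the ini entry name, script path, and target host are placeholders:

[update_notification]
push_rep = /usr/local/bin/push-replicate.sh

#!/bin/sh
# push-replicate.sh: for each update notification CouchDB sends on our
# stdin, trigger a one-shot push replication to a remote node.
TARGET_HOST=http://bristol.example.com:5984   # placeholder remote node
while read line; do
  # Crude extraction of the "db" field from the JSON notification.
  db=`echo "$line" | sed 's/.*"db":"\([^"]*\)".*/\1/'`
  curl -s -X POST http://localhost:5984/_replicate \
    -d "{\"source\": \"$db\", \"target\": \"$TARGET_HOST/$db\"}" \
    >/dev/null
done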
2) With the persistent continuous replication feature, is there a way to
stop a continuous replication without restarting CouchDB? Will there be
a way to manage the list of replicant databases once the persistent
continuous replication feature is complete?
3) How does continuous replication deal with network outages, say if a
link goes down between the Los Angeles and Bristol data centers? Does
CouchDB handle a hanging TCP connection OK?
4) It would be nice for CouchDB to maintain a list of replicant
databases that it automatically pushes changes to, so that the list
lives in CouchDB itself rather than in an external script. Is there any
work on a feature like this? In the meantime it could be done with an
external update_notification script, along the lines of the sketch below.
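Here's a rough sketch of that interim approach, again assuming the
update_notification protocol described above; the replication_config
database, the targets document, and its "targets" field are invented
for illustration:

#!/bin/sh
# On each update notification, fetch the list of replication targets
# from a document stored in CouchDB itself, then fan out one-shot
# replications to every target.
SERVER=http://localhost:5984
while read line; do
  db=`echo "$line" | sed 's/.*"db":"\([^"]*\)".*/\1/'`
  # Targets doc looks like {"targets": ["http://la:5984/db1", ...]}.
  targets=`curl -s $SERVER/replication_config/targets 2>/dev/null |
    python2.6 -c 'import cjson, sys
for t in cjson.decode(sys.stdin.read()).get("targets", []): print t'`
  for t in $targets; do
    curl -s -X POST $SERVER/_replicate \
      -d "{\"source\": \"$db\", \"target\": \"$t\"}" >/dev/null
  done
done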
5) I wrote the following Bourne shell script, and after running it for
an hour, CouchDB consumes 100% of a CPU. The CPU usage persists even
after stopping the shell script and compacting both databases. What
would explain this behavior?
#!/bin/sh
# Create two databases, replicate continuously in both directions, then
# loop: create a doc in db1, wait for it to appear in db2, delete it
# from db2, and wait for the deletion to replicate back to db1.
HOST1=http://localhost:5984
HOST2=http://localhost:5984
DB1=$HOST1/db1
DB2=$HOST2/db2

curl -X PUT $DB1
curl -X PUT $DB2

curl -X POST $HOST1/_replicate \
  -d '{"source": "db1", "target": "db2", "continuous": true}'
curl -X POST $HOST2/_replicate \
  -d '{"source": "db2", "target": "db1", "continuous": true}'

while true; do
  seconds=`date +%s`
  echo Working on $DB1/$seconds
  # Create the document and capture its revision from the response.
  rev=`curl -X PUT $DB1/$seconds -d "{\"name\": \"$seconds\"}" 2>/dev/null |
    python2.6 -c 'import cjson, sys; print cjson.decode(sys.stdin.readline()).get("rev")'`
  while curl $DB2/$seconds 2>/dev/null | grep error; do
    echo "  Does not exist yet at $DB2/$seconds."
    sleep 1
  done
  echo "  It exists now at $DB2/$seconds."
  curl -X DELETE "$DB2/$seconds?rev=$rev" >/dev/null 2>&1
  while curl $DB1/$seconds 2>/dev/null | grep _rev; do
    echo "  It has not been deleted yet at $DB1/$seconds"
    sleep 1
  done
  echo "  It has been deleted at $DB1/$seconds."
  echo
done
6) The other thing I noticed is that after compacting the databases,
the following messages appear frequently in the log, when they didn't
appear before the compaction:
[info] [<0.185.0>] recording a checkpoint at source update_seq 6748
[info] [<0.185.0>] A server has restarted sinced replication start. Not
recording the new sequence number to ensure the replication is redone and
documents reexamined.
And then later I got this:
[info] [<0.4831.9>] 172.29.113.186 - - 'GET' /db1/1251156832 404
[info] [<0.18214.10>] 172.29.113.186 - - 'PUT' /db1/1251156833 201
[error] [<0.186.0>] changes_loop died with reason {system_limit,
[{erlang,spawn_opt,
[proc_lib,init_p,
[<0.187.0>,[],gen,init_it,
[gen_server,<0.187.0>,<0.187.0>,
couch_event_sup,
{couch_db_update,
{couch_db_update_notifier,
#Ref<0.0.6.249926>},
#Fun<couch_rep_changes_feed.4.19189490>},
[]]],
[link]]},
{proc_lib,start_link,5},
{couch_rep_changes_feed,
send_local_changes_forever,3}]}
Thanks and nice work on the project.
Regards,
Blair