On 11/19/2014 10:00 PM, Matthieu Rakotojaona wrote:
> Excerpts from Jeldrik's message of 2014-11-19 16:38:53 +0100:
>> Hi there,
>>
>> I already asked this question on #couchdb but I'm not really satisfied
>> with the answers I got. Just because there are some open questions left
>> with no answer in IRC. I thought it could be a good idea to open the
>> question for a wider group. I will paste both my original question and
>> the answers I got in #couchdb.
>>
>> Many thanks for your help,
>> Jeldrik
>>
>> ==
>>
>> This was the question (I just added some information):
>>
>> We are moving a couchdb to new hardware but we have a pull replication
>> (couch_backup.example.com) which we want to keep. Our planned steps are
>> like these:
>> 1. rsync db files from couch_live.example.com to couch_new.example.com
>> 2. compact dbs on couch_new (this is necessary because on couch_live
>> compression was turned off and is wished to be turned on now)
>> # Meanwhile the couch_live is still live and data is pushed to it from
>> clients and pulled by the couch_backup replication
>> 3. start pull replication on couch_new with source couch_live and target
>> couch_new for all dbs
>> 4. if all dbs are nearly in sync have a short downtime until the data is
>> fully in sync then turn over to couch_new
>> 5. shutdown couch_live and the replication to couch_backup
>> 6. new data is coming in to couch_new
>> 7. start pull replication on couch_backup with source couch_new
>>
>> Now the question is how to keep the couch_backup replication? If I got
>> it right the replication depends on two values. The first one is the uri
>> to the source. So could a switch from couch_live.example.com/db1 to
>> couch_new.example.com/db1 break the replication? The second one is or
>> more precisely are the seq no. At the moment when we turn off the
>> couch_live all three couch_live, couch_backup and couch_new will have
>> the same data. So from the point of view of the data we have
>> consistency.
>> But maybe the seq nos. differ. Of course the couch_new will
>> immediately receive new data. So how can I convince the couch_backup to
>> start replication from that one point of data consistency?
>>
>> ==
>>
>> And these were responses and my following questions on IRC to it:
>>
>> 15:09 <mar-ia> jeldrik: couch_backup will continue from the last data it
>> has. You should not need to worry about it. If I have understood
>> everything correctly :)
>> 15:37 <jeldrik> mar-ia: thx. but how sure are you about that? the
>> problem is that couch_backup is on a remote site. and it happened to
>> them when we had a similar system move.
>> 15:44 <mar-ia> jeldrik: Every node knows the last change it has. So when
>> it starts a replication it asks for all the changes made after that
>> point. It does not get the complete history, only the latest version (as
>> always).
>> 15:49 <jeldrik> but if i got it right it does that with the checkpoints
>> aka seq no., doesn't it? and we had situations where the seq. no of a
>> replication differed from the source. so couldn't it happen that the new
>> system has a lower seq no. but new data and because of that after the
>> change the backup couch asks like for "everything after 'higher seq no'"
>> and then gets nothing
>> 15:50 <jeldrik> what would break the consistency of the backup
> Hi Jeldrik,
>
> Sequence numbers are per-db helpers for replication and incremental
> views, they are opaque data for users. The best way to see them is like
> ETags: they mean something only for the database that holds them, so you
> can query that database with "since=", but external components (that
> includes you as a user) don't have to worry about their meaning.
> Actually, if I refer to this thread [0], BigCouch uses strings and there
> are discussions about switching to strings. Don't take my word for it
> though, I'm not in the inner circles :)
>
> This is why there is no transferring of sequence numbers.
> Even if you
> remained on the same couchdb instance but had two databases with the
> same data, the sequence numbers could differ. Incidentally,
> replications, which are relying on sequence numbers, aren't supposed to
> be transferred from database to database. You have to set up a new
> replication every time you change a database name/server.
>
> But it's okay, here is what would happen in the worst case for you:
>
> - you set up a new replication from couch_new to couch_backup as soon as
>   couch_new is running, even though it's not receiving user data
>   directly
>
> - replicator runs, checks replication history between couch_new and
>   couch_backup, sees there is none, starts from scratch
>
> - replicator gets all changes in couch_new from the beginning, sees if
>   they exist in couch_backup. It checks _every_ doc so it might take
>   some time
>
> - since the data already exists on the destination, replicator won't
>   transfer any data
>
> - at the end, replicator will save a checkpoint for this replication
>   stating "there's been a replication between those 2 databases, up to
>   source id xxx and target id yyy" (note: this checkpoint is saved in
>   two parts, one on each end. But as a user you don't care). Now the
>   next time replicator runs, it will not start from scratch.
>
> In your situation, I'd set up a replication couch_new => couch_backup
> as soon as couch_new is up. You'd have a 3-way replication:
>
> - couch_live => couch_new
> - couch_live => couch_backup
> - couch_new => couch_backup
>
> which is totally fine and how CouchDB's replication protocol was
> intended to work. This way, the moment you turn couch_live down, the
> backup replication will already be up and running and all the data is
> where it should be. Don't forget to remove the couch_live =>
> couch_backup replication.
>
> I hope this answers your questions!
>
> [0] http://thread.gmane.org/gmane.comp.db.couchdb.devel/11724
>

Hi Matthieu,

thanks a lot. That makes things clearer to me!

Best wishes,
Jeldrik
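
For reference, the 3-way replication suggested in this thread could be
written down as replication documents, roughly as in the sketch below.
The field names ("source", "target", "continuous") are those used by
documents in CouchDB's _replicator database. Everything else is an
assumption for illustration: the helper name replication_doc, the "_id"
naming scheme, port 5984 and the database name db1 are not from the
thread's actual setup. The sketch only builds the documents and does not
talk to any server.

```python
import json


def replication_doc(source_host, target_host, db, continuous=True):
    """Build one replication document (hypothetical helper, not a CouchDB API)."""
    return {
        "_id": f"{source_host}-to-{target_host}-{db}",  # illustrative naming scheme
        "source": f"http://{source_host}:5984/{db}",    # port 5984 is assumed
        "target": f"http://{target_host}:5984/{db}",
        "continuous": continuous,
    }


LIVE = "couch_live.example.com"
NEW = "couch_new.example.com"
BACKUP = "couch_backup.example.com"

# The three replications that run side by side during the migration window.
migration_docs = [
    replication_doc(LIVE, NEW, "db1"),
    replication_doc(LIVE, BACKUP, "db1"),
    replication_doc(NEW, BACKUP, "db1"),
]

# Once couch_live is shut down, every replication that reads from it is
# removed, which leaves only couch_new => couch_backup.
after_cutover = [d for d in migration_docs if LIVE not in d["source"]]

if __name__ == "__main__":
    print(json.dumps(after_cutover, indent=2))
```

Each document would be stored in the _replicator database of whichever
node is meant to run that replication; for a pull replication that is
the target side.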
