The algorithm at a high level goes like this: Get some changes Check them for revisions that are missing from the target Pull/Push the missing revisions Repeat
These functions map to the public API in an obvious way: /_changes /_missing_revs /_all_docs or /_bulk_docs For any production-ready replicator you'll probably want to checkpoint your progress. CouchDB stores checkpoints in documents prefixed with "_local/". Docs named this way don't replicate, don't show up in views, etc. Good for internal metadata stuff like this. Stable checkpointing requires that, up to a certain sequence, all updates must be flushed to disk on both sides. Currently this is accomplished with a separate POST to /_ensure_full_commit. Couch also honors a header on document update requests called X-Couch-Full-Commit. Almost all the replicator code is contained in couch_rep* or couch_replicator*. The latter is the new replicator code by Filipe, which may have some dependency on the old code (I'm not sure). That should be enough to get you started. -Randall On Tue, Feb 15, 2011 at 18:51, Aaron Boxer <[email protected]> wrote: > Thanks, guys! I guess I need to dig into the actual code. > > I would like to implement a similar algorithm in C, for another project > I am working on. > > > > On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson <[email protected]> > wrote: >> It's worth mentioning that, like git, the hash also includes the >> previous contents (and, hence, is dependent on all previous updates), >> >> Only identical sequences of updates will yield the same _rev. >> >> B. >> >> On 15 February 2011 22:37, Randall Leeds <[email protected]> wrote: >>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <[email protected]> wrote: >>>> Interesting. Thanks! >>>> >>>> How do version ids get generated? How do the different nodes >>>> avoid version id collision; i.e. two nodes updating a document with the >>>> same version id? >>> >>> The revision id contains both a monotonically increasing number >>> revision number and a hash of the document contents. The hash breaks >>> ties (storing the conflict, not resolving it, but deterministically >>> choosing a privileged version to report as the "newest"). >>> >>> In this manner, should two nodes perform the same update the revision >>> is said to exist in both places already and replication will note this >>> and not copy the document again. >>> >>> -Randall >>> >> >
