On 05/04/2011 18:36, Zdravko Gligic wrote:
Hi Folks,

Are there any large implementations of CouchDB peer-to-peer
replications or even smaller open source samples?  Actually, the piece
that I am mostly interested in is at the application/design end of how
to go about implementing the "traffic cop" for a use case where
everyone is eventually synchronized with everyone else.

Given a large number of peers that one could replicate to/from, is
there anything within CouchDB that can be "posted centrally" to know
how up to date anyone is, so that badly out-of-date peers are
replicated to/from the more up-to-date ones instead of to/from each
other?  What else should I ask, if I knew better ;?)

Hi,

I've been thinking about this and had an idea. IIRC you mention in one post that
you may have 100,000 users, each with a couch on their kit.

The idea is this (and it's not peer to peer :) ).

You publish a set of couches, each of which will accept HTTP (port 80)
replications from your users. You have enough of these scattered about
so your users can find one nearby that is responsive. It is up to the
user to initiate these replications, and they are free to use any
published node they like.

By using port 80, and triggering replication from the user's end, you
will avoid most problems with routers and port forwarding. (I think -
needs confirmation.)
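Since the user's couch initiates the push, the client side only has to POST a replication document to its own local couch. A minimal sketch of building that body, using CouchDB's standard `_replicate` API (the node and database names here are placeholders, not anything from the original post):

```python
import json

def replication_request(local_db, published_node, remote_db):
    """Build the JSON body a user's couch would accept via
    POST /_replicate to push local changes out to a published node.
    Using port 80 explicitly, as suggested above."""
    return json.dumps({
        "source": local_db,
        "target": "http://%s:80/%s" % (published_node, remote_db),
    })

# The client would POST this to its own couch, e.g.
# http://localhost:5984/_replicate
body = replication_request("userdata", "pub1.example.com", "userdata")
print(body)
```

A second request with source and target swapped would pull the other direction; since both are outbound HTTP from the user's machine, no inbound port needs to be open.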

You also have a central (private) set of nodes, perhaps 4. The
published nodes replicate with these central machines in a round-robin
fashion, so that each published node replicates with the center 4
times an hour, but with a different machine each time. Each uses the
same sequence, but the times are staggered. If the 4 central nodes are
A, B, C and D, then A may see published node X at 3 minutes past the
hour, B will see X at 15 + 3 = 18 minutes past, C will be replicated
with by X at 33 minutes past, and D will receive the call at 48
minutes past. Published node Y might call in 22 minutes after X, and
it would be 22 minutes later for every central node.
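The schedule above can be sketched as a small function: each published node gets a fixed offset, and the stagger between central nodes is the cycle length divided by the number of centrals (15 minutes for 4 centrals on an hourly cycle). This is just my reading of the scheme, not anything CouchDB provides:

```python
def central_schedule(node_offset_min, centrals=("A", "B", "C", "D"),
                     cycle_min=60):
    """Minutes past the hour at which a published node (identified by
    its fixed offset) replicates with each central node, round robin."""
    stagger = cycle_min // len(centrals)  # 15 minutes for 4 centrals
    return {c: (node_offset_min + i * stagger) % cycle_min
            for i, c in enumerate(centrals)}

# Published node X has offset 3: A at 3, B at 18, C at 33, D at 48.
print(central_schedule(3))
# Node Y calls in 22 minutes after X (offset 25), so it is 22 minutes
# later at every central node, wrapping around the hour.
print(central_schedule(25))
```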

These central machines replicate continuously with each other, in a ring.

In normal use, a message dropped at any published node will get to a
central node within 15 minutes (average 8), and from there to all the
central nodes in a few moments. From there, it will propagate out to
all other published nodes within 15 minutes, average 8 minutes. So
updates cross the network in an average of about 16 minutes, and a
maximum of just over half an hour.
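The back-of-envelope arithmetic behind those figures: a message arriving at a uniformly random moment waits on average half the 15-minute interval on each of the two legs (published to center, then center out to the other published nodes), and at worst the full interval on each. A sketch, assuming the ring hop between centrals is negligible:

```python
def latency_minutes(leg_interval=15.0, legs=2, ring_hop=0.0):
    """Average and worst-case end-to-end delay when a message waits
    for the next replication cycle on each leg. Uniform arrival means
    a mean wait of half the interval per leg; the worst case is the
    full interval per leg."""
    avg = legs * leg_interval / 2 + ring_hop
    worst = legs * leg_interval + ring_hop
    return avg, worst

print(latency_minutes())  # (15.0, 30.0); the post rounds each 7.5 up to 8
```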

If a published node goes down, the users will switch to another until you bring it up
again. Shortly after restarting, it will catch up by replication.

If a single central node goes down, replication still happens, but
messages that can't be dropped off at or picked up from the lost node
will be delayed by an extra 15 minutes (possibly twice). When it
returns to life, it will catch up.

You can have more or fewer than 4 central nodes. So long as there is a
central machine running, replication will happen - even if the central
group is split and cannot replicate within itself.

Although I have specified an hourly cycle, you could use any time
scale you like, or you could vary it by database.

Note - There are no direct users of published or central nodes, so those nodes do not
have to spend time building indexes.

Regards

Ian



