Miles,

Oh. Gotchya. But the fun part is the protocol implementation ;)

Paul Davis

On Mon, Oct 26, 2009 at 2:21 PM, Miles Fidelman <[email protected]> wrote:
> Paul,
>
> I'm actually suggesting using NNTP infrastructure, or something like it, to
> propagate updates - rather than trying to reinvent the protocol and
> supporting infrastructure in CouchDB.
>
> Something like:
>
> change -> local CouchDB instance -> <continuous replication (output)> ->
> local NNTP daemon (specific newsgroup) -> "the net"
>
> "the net" -> local NNTP daemon (specific newsgroup) -> <continuous
> replication (input)> -> local CouchDB instance
>
> All messages eventually get to all Couch instances - but the order and delay
> can vary considerably.
>
> Miles
>
>
> Paul Davis wrote:
>>
>> Miles,
>>
>> This sounds like what I was trying to propose. More concretely:
>>
>>> I keep coming back to NNTP (USENET) as a model for many-to-many
>>> messaging:
>>>
>>> - push a message into a newsgroup on any NNTP node subscribing to that
>>> newsgroup
>>
>> A message group is persisted in a DB. In the future, a single db with
>> filtered replication could work, but a db per group will probably be
>> easier. Not all nodes have all db's.
>>
>>> - nodes exchange "I-have"/"You-have" on a regular basis
>>
>> Replication of the "network status db" that holds what nodes have what
>> update_seq/host pairs. The pairing is important. The rate of
>> replication obviously affects delays and whatnot.
>>
>>> - messages propagate to all subscribing nodes by essentially a flooding or
>>> epidemic routing mechanism
>>
>> This is the fun part. Given your local copy of a group, how do you
>> pick a peer to pull from? Or push to, if you're feeling proactive? Do
>> we sync in both directions, etc. etc. Depending on the behavior this
>> could manifest in multiple ways. If you have a single authority per
>> 'subscription source' then you'd want to keep
>> "authority/update_seq/located_at" triples. This way you know that if
>> the largest update_seq seen is N, then you can replicate from any node
>> that has that sequence.
>>
>>> - pretty quickly, a message propagates to all nodes subscribing to the
>>> newsgroup
>>
>> Message delivery via replication. Judging the quickly part would
>> require a more formal definition of quickly and then building the
>> thing. Depending on your definition of quickly there are definitely
>> different types of design decisions to be made.
>>
>>> - lack of connectivity simply delays message propagation
>>
>> Or reroutes through an alternate node. If we know that four nodes have
>> a copy of the db, we can replicate from any that are alive.
>> Replication is incremental: a link can disappear in the middle of a
>> replication and it'll resume from the previous checkpoint.
>>
>>> - the whole system scales massively, and is very robust in the face of
>>> connectivity outages, node failures, etc. (messages can flow across
>>> multiple routes)
>>
>> It scales on my brain debugger assuming the propagation algorithm
>> isn't extremely naive. But as I said, I haven't built it yet so who
>> knows.
>>
>>> In some sense, what I'm thinking of would look a lot like:
>>>
>>> - a group of CouchDB nodes all subscribe to a newsgroup
>>
>> I'm confused by the term subscription here. Generally I'd assume that
>> means that a foreign host knows that the local node is interested in
>> something. For fully distributed awesome, I think it'd be better
>> phrased as "nodes can declare what they want" and the algorithm will
>> attempt to maintain their local state somewhere close to the global
>> state. Kind of like the difference between newspaper delivery and
>> buying a newspaper at any one of the many shops in town.
>>
>>> - each node publishes changes as messages to that newsgroup
>>
>> $ curl -X PUT -d '{"message": "First post!!1!1"}' http://127.0.0.1:5984/newsgroup/memesrus
>>
>> Part of the replication routing could include a proactive step in
>> pushing messages to nodes it knows care about a message.
>>
>>> - NNTP takes care of getting messages everywhere, eventually
>>
>> The 'network state in a db' means that regardless of who's interested
>> in what, if we make sure that the first step in node communication is
>> an 'I know about these endpoints in these states' then we'll push
>> information to the people that care. Eventually.
>>
>>> - each node looks for incoming messages and applies them as changes
>>
>> Replication FTW!
>>
>>> - use a shared key to secure things (note: some implementations of NNTP
>>> already support secure messaging)
>>
>> This is slightly harder. Read-only means you'd need something in front
>> of couch to prevent readers. But OAuth or distributing SSL certs or
>> similar wouldn't be out of the question.
>>
>>> A similar approach could be taken using:
>>> - a distributed hash table as a message queue (that's what spread and
>>> splines seem to do)
>>
>> I haven't looked too hard at these, but "DHT as message queue" seems
>> contradictory to the idea that most DHTs (all?) that I know of don't
>> allow range queries.
>>
>>> - the DIS or HLA protocols (used for distributed simulation - keeping
>>> multiple copies of a "world" synchronized)
>>
>> I couldn't get past the first wiki page Google gave me, so, I dunno.
>>
>> HTH,
>> Paul Davis
>>
>
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is. .... Yogi Berra
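
For readers following the thread, the "authority/update_seq/located_at" triples and the "replicate from any that are alive" rule quoted above could be sketched roughly as below. This is only an illustration under stated assumptions, not CouchDB's replicator and not anything proposed verbatim in the thread: `Sighting`, `pick_peers`, the `alive` set, and the host names are all hypothetical, and the triples are assumed to have already been read out of the "network status db".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One hypothetical row of the 'network status db'."""
    authority: str    # node that originates updates for a group
    update_seq: int   # highest sequence of that authority held at located_at
    located_at: str   # node that holds a copy at that sequence


def pick_peers(sightings, authority, local_seq, alive):
    """Return the reachable peers holding the newest known update_seq.

    Returns an empty list when no reachable peer is ahead of local_seq.
    """
    # Only consider rows for this authority on peers we believe are up.
    relevant = [
        s for s in sightings
        if s.authority == authority and s.located_at in alive
    ]
    if not relevant:
        return []

    # "If the largest update_seq seen is N, replicate from any node
    # that has that sequence."
    newest = max(s.update_seq for s in relevant)
    if newest <= local_seq:
        return []  # nothing reachable is ahead of the local copy

    return sorted({s.located_at for s in relevant if s.update_seq == newest})


if __name__ == "__main__":
    status = [
        Sighting("a.example", 42, "a.example"),
        Sighting("a.example", 42, "b.example"),
        Sighting("a.example", 40, "c.example"),
    ]
    # a.example is unreachable, so the pull is rerouted through b.example.
    print(pick_peers(status, "a.example", 40, alive={"b.example", "c.example"}))
```

Any peer in the returned list would be a valid replication source; choosing among them (random, nearest, least loaded) is the open design question the thread calls "the fun part".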
