On Fri, Sep 02, 2005 at 09:59:21PM +0100, Matthew Toseland wrote:
> Okay, what about ID collisions? We want to prevent collisions with
> nodes which are already dependent on us, right? Does it matter? We can
> collide on one ID, but not on the others... but it doesn't matter that
> much. Collision detection isn't vital, because we are moving in the
> right direction - towards the key. Or is it? We could well loop back on
> ourselves... we would run out of HTL, but it wouldn't be very
> profitable. There is a similar situation with request coalescing. I
> decided this wasn't a big deal a while ago, I'm not sure I remember
> why... The obvious solution is not to coalesce subscribe requests. We
> can achieve this while still using max HTL as above. The problem is two
> such requests could be running at the same time. The answer to that is
> to do the determination of which nodes need to be told that we are no
> longer subscribed through them atomically at the end of the request.

The problem with *that* is that different requests have been routed to
us through different chains... Some of which will be rather problematic
now. So our routing could vary quite dramatically from moment to
moment...
> 
> Ideally we would coalesce them though, especially as they're all at the
> same HTL. We may need to make coalescing (of requests and inserts as
> well as subscribe requests) slightly less transparent... we
> can hit our tail if we're not careful... we could send a Coalesced
> message back, indicating we were coalesced with a given ID, and to
> regard that ID as equal to the original ID, so to reject it if it loops
> back... this could then be sent to the other nodes on our request
> chain...
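(An illustrative sketch of the Coalesced-message bookkeeping above; all names here are made up, not actual Freenet messages. Once a request is coalesced with an earlier one, their IDs form one equivalence class, and a loop arriving under either ID is rejected.)

```python
# Sketch: track coalesced request IDs as one equivalence class so that
# a request looping back under either ID is rejected.

class RequestTable:
    def __init__(self):
        self.equiv = {}  # request ID -> canonical (coalesced-with) ID

    def start(self, rid):
        self.equiv[rid] = rid  # a fresh request is its own canonical ID

    def coalesce(self, rid, older_rid):
        # rid was coalesced with older_rid; a Coalesced message carrying
        # older_rid would be sent back up rid's chain.
        self.equiv[rid] = self.equiv.get(older_rid, older_rid)

    def should_reject(self, incoming_rid):
        # Reject if the incoming ID belongs to a class we already run.
        return incoming_rid in self.equiv

table = RequestTable()
table.start("req-A")
table.coalesce("req-B", "req-A")     # req-B arrives, coalesced with req-A
assert table.should_reject("req-B")  # loop under either ID is rejected
assert table.should_reject("req-A")
assert not table.should_reject("req-C")
```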
> 
> On Fri, Sep 02, 2005 at 09:09:53PM +0100, Matthew Toseland wrote:
> > Actually, the mechanism proposed on routing through a node we're already
> > connected indirectly to WON'T work, because there's no clear definition
> > of dependency. There is no clear definition of "the node from which I
> > expect to receive packets". Thus if a node has to reroute, it will end
> > up sending a SubscribeRestarted to the entire network.
> > 
> > Now, can we make it into a tree?
> > 
> > The data probably won't be inserted at the root node. We could route it
> > up to the root node though.
> > 
> > The other problem is backtracking. The simple solution to this is to
> > make the RNF loops we get dependent on us, and dependent on our final
> > node. This is roughly analogous to how we deal with requests. Hmmm...
> > 
> > On Fri, Sep 02, 2005 at 09:00:42PM +0100, Matthew Toseland wrote:
> > > Here's another attempt at a publish/subscribe architecture, after some
> > > discussion with Ian, this should be simpler, persist even when the
> > > source node goes down, and make a start on resource limiting:
> > > 
> > > TOPOLOGY
> > > --------
> > > 
> > > Firstly, all the below will probably work much better with the new
> > > backtracking mechanism.
> > > 
> > > There is no fixed path from the data source to the subscribe tree.
> > > Packets are routed individually, and we make no attempt to track how
> > > they were routed last time, or to keep nodes on the right path.
> > > 
> > > There is a single subscribe tree. Actually, it's a subscribe graph. The
> > > reason it is a graph is backtracking. If I route a request with HTL 20
> > > to a clump of 10 nodes, it will return with an RNF with HTL 10. I will
> > > then route it elsewhere. Thus I will have two successful routes - the
> > > data will have to be sent to both of them if I get a packet. Now, the
> > > subscribe graph is the result of very many subscriptions. Each is routed
> > > towards the key, and together they make a graph, which is somewhat
> > > treeish, but is not a tree. It therefore has no fully defined root. But
> > > data will certainly spread very quickly through it.
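(A toy sketch of why backtracking makes this a graph rather than a tree; not Freenet code, and the clump names are invented. Each dead-ended clump returns an RNF with the HTL it did not consume, the request is rerouted, and every clump entered remains a successful route that data must later be sent down.)

```python
def route_subscription(htl, clumps):
    """Route a subscription with the given HTL through successive clumps
    of nodes. A clump that dead-ends returns an RNF with the HTL it did
    not consume; we then reroute the remainder elsewhere. Every clump we
    entered counts as a successful route, so the result is a graph."""
    routes = []
    for name, size, dead_end in clumps:
        if htl <= 0:
            break
        routes.append(name)
        htl -= min(size, htl)   # hops consumed inside this clump
        if not dead_end:
            break               # reached the key's neighbourhood; done
        # dead end: RNF comes back with the remaining HTL, reroute
    return routes, htl

# The example from the text: HTL 20 into a clump of 10 that dead-ends,
# the RNF returns with HTL 10, and the rest is routed into a second clump.
routes, left = route_subscription(20, [("clumpA", 10, True),
                                       ("clumpB", 10, False)])
assert routes == ["clumpA", "clumpB"]  # data must be sent to both
assert left == 0
```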
> > > 
> > > Then, when we route a publish packet, it will also be routed towards the
> > > tree. In all likelihood it will eventually meet a node which is on the
> > > graph. The data will then be broadcast across the graph. Each node
> > > verifies it, checks that it hasn't already got it, and relays it to all
> > > directly connected nodes. Because it is a graph and not a strict tree,
> > > it is possible that the same packet would be sent to the same node twice
> > > by two different nodes. This will waste some bandwidth. However, it
> > > probably won't waste a LOT of bandwidth, because mostly the node will be
> > > told directly before it is told indirectly.
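(The broadcast-with-duplicate-suppression step above is essentially a flood; a minimal sketch, with an invented four-node graph. Each node forwards a packet at most once, so duplicates cost some bandwidth on the wire but are dropped on arrival.)

```python
import collections

def broadcast(graph, start):
    """Flood a publish packet across the subscribe graph: each node
    checks whether it has already got the packet, and if not relays it
    to all directly connected nodes."""
    seen = set()
    sends = 0
    queue = collections.deque([start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue            # already got it: duplicate dropped
        seen.add(node)
        for neighbour in graph[node]:
            sends += 1          # every edge may carry the packet once
            queue.append(neighbour)
    return seen, sends

# A small graph with a cycle: A-B, B-C, C-A, plus C-D.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
nodes, sends = broadcast(graph, "A")
assert nodes == {"A", "B", "C", "D"}   # everyone gets the data
assert sends > len(nodes) - 1          # some bandwidth wasted on duplicates
```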
> > > 
> > > So we have a persistent subscribe graph, and we have transient
> > > PublishDatas - equivalent to inserts. Actually this looks a lot like
> > > passive requests when you think about it. The difference being that we
> > > know the keys are connected, so you don't have to constantly send out
> > > requests to subscribe to the next key in the sequence. And a
> > > completely different caching policy.
> > > 
> > > Each node will have a maximum of 32 streams. If we run out of streams,
> > > we reject new SubscribeRequests. So it is important that we not lose
> > > track of the streams. When a node is restarted, it should tell its
> > > neighbours which streams it is subscribed to, and to disconnect from any
> > > others. This will require a special message, BulkSubscribeStreams, or
> > > something...
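(A sketch of how the 32-stream cap and the restart message might be enforced; the class and the BulkSubscribeStreams handling are assumptions, not an actual implementation. A full table rejects new SubscribeRequests; on a neighbour's restart, its bulk list replaces whatever we thought it was subscribed to.)

```python
MAX_STREAMS = 32  # the arbitrary per-node limit from the text

class StreamTable:
    def __init__(self):
        self.subscribers = {}  # stream key -> set of neighbour IDs

    def subscribe(self, key, neighbour):
        if key not in self.subscribers and len(self.subscribers) >= MAX_STREAMS:
            return False       # table full: reject the SubscribeRequest
        self.subscribers.setdefault(key, set()).add(neighbour)
        return True

    def bulk_subscribe(self, neighbour, keys):
        # Hypothetical BulkSubscribeStreams: a restarted neighbour tells
        # us which streams it holds; disconnect it from any others.
        for key, subs in self.subscribers.items():
            if neighbour in subs and key not in keys:
                subs.discard(neighbour)
        for key in keys:
            self.subscribe(key, neighbour)

t = StreamTable()
for i in range(32):
    assert t.subscribe("stream%d" % i, "n1")
assert not t.subscribe("stream32", "n1")   # 33rd stream rejected
t.bulk_subscribe("n1", {"stream0"})
assert t.subscribers["stream1"] == set()   # n1 dropped from the rest
```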
> > > 
> > > If a node no longer has any subscribers for a given stream, it drops the
> > > stream. This might cause a cascade effect. But once one node has
> > > subscribed to a stream, there will always be at least a chain of nodes,
> > > moving outward from the start, to which the data can be sent.
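(A sketch of the cascade, for a single stream on an invented chain; not Freenet code. A node whose last subscriber goes away drops the stream and unsubscribes from its own upstream, which may then do the same, until some node still has another subscriber.)

```python
def unsubscribe(nodes, leaf):
    """nodes: node -> {"upstream": node or None, "subs": set of
    downstream subscribers}, all for one stream. The leaf has just lost
    its last local subscriber; propagate drops up the chain."""
    dropped = []
    node, child = leaf, None
    while node is not None:
        info = nodes[node]
        if child is not None:
            info["subs"].discard(child)   # the child unsubscribed
        if info["subs"]:
            break                         # still a subscriber: stop here
        dropped.append(node)              # last subscriber gone: drop
        child, node = node, info["upstream"]
    return dropped

# Chain A <- B <- C; A also serves another subscriber D.
nodes = {
    "C": {"upstream": "B", "subs": set()},        # last client just left
    "B": {"upstream": "A", "subs": {"C"}},
    "A": {"upstream": None, "subs": {"B", "D"}},
}
assert unsubscribe(nodes, "C") == ["C", "B"]  # cascade stops at A
assert nodes["A"]["subs"] == {"D"}            # A keeps the stream
```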
> > > 
> > > These arbitrary limits suggest a DoS may be possible; we will have to
> > > review this part of the architecture when we have some more experience.
> > > LRU is an option, but would mean that streams which have infrequent
> > > packets die, regardless of demand. That would be bad.
> > > 
> > > Now, here's the hard part: Error handling.
> > > Error handling on publish is easy. The packets just get routed
> > > differently. They end up in the same place, and they still rendezvous
> > > with the subscribe graph.
> > > 
> > > BUT: Error handling on subscribe is, as always, complicated.
> > > 
> > > I adapt the previous proposal:
> > > 
> > > Subscriptions will normally be routed similarly to requests, and will
> > > terminate on a collision, i.e. they will terminate successfully when
> > > reach a node which has already got the stream. They can RNF and will be
> > > rerouted in that case. Or they can run out of HTL, in which case they
> > > are regarded as successful, although it is possible that they have run
> > > into a network dead-end.
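(The three termination conditions can be sketched as a routing loop; an illustrative toy, with invented node names and outcome labels. Success on collision, success when HTL runs out, and RNF when we run out of candidate nodes.)

```python
def route(htl, peers, have_stream):
    """peers: ordered candidate next hops; have_stream: nodes already on
    the subscribe graph. Returns (outcome, path taken)."""
    path = []
    for peer in peers:
        if htl == 0:
            # Out of hops: treated as success, though we may in fact be
            # in a network dead-end.
            return "success-htl-exhausted", path
        path.append(peer)
        if peer in have_stream:
            return "success-collision", path   # joined the graph
        htl -= 1
    # No candidates left: RNF; the caller reroutes elsewhere.
    return "rnf", path

outcome, path = route(5, ["n1", "n2", "n3"], have_stream={"n3"})
assert outcome == "success-collision" and path == ["n1", "n2", "n3"]
outcome, _ = route(1, ["n1", "n2"], have_stream=set())
assert outcome == "success-htl-exhausted"
outcome, _ = route(9, ["n1"], have_stream=set())
assert outcome == "rnf"
```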
> > > 
> > > Once we are subscribed, we can have various errors:
> > > - We disconnect from a node we were subscribed through.
> > > We will have to reroute.
> > > - The locations of our neighbour nodes change so much that the route
> > >   would have changed.
> > > We will have to reroute.
> > > - A node downstream reroutes for one of the above reasons.
> > > We will have to reroute.
> > > 
> > > Further complication: If an upstream has to reroute, it may have to
> > > reroute through us. Therefore somehow it must tell us that it is about
> > > to do so, so that we pass the request on rather than accepting it.
> > > 
> > > Simplest option first:
> > > We disconnect from a node we were subscribed through.
> > > 
> > > If this was the last or only node we were subscribed through (meaning we
> > > successfully sent a subscribe request through), it is relatively simple:
> > > We check what HTL we had left at that point, and rerun the request from
> > > that point.
> > > 
> > > If the node which has broken was NOT the last node, things are more
> > > complex. We know what HTL we had left at that point. We will probably
> > > have to reroute from that point. Assuming the locations have not
> > > changed:
> > > - If the next node returned an RNF, we can use that as the next block -
> > >   just starting at a different HTL; taking the same number of hops. Then
> > >   we can route again, to a further node; this is merely adding to the
> > >   path so is not a problem.
> > > - If the next node returned success due to running out of hops (this
> > >   needs to be a distinct mode), we will definitely need to rerequest
> > >   through it.
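(One way to sketch "rerun the request from that point": remember the HTL remaining at each hop, and on a disconnect reroute from the recorded HTL rather than from scratch. The helpers and names here are my own construction, not a mechanism the node actually implements.)

```python
def build_chain(htl, hops):
    """Record (node, htl_remaining_when_reached) for each hop."""
    chain = []
    for node in hops:
        chain.append((node, htl))
        htl -= 1
    return chain

def reroute_after_loss(chain, lost, replacement_hops):
    """Drop the lost node and everything after it, then reroute the
    replacement hops starting from the HTL we had at that point."""
    idx = next(i for i, (node, _) in enumerate(chain) if node == lost)
    htl_at_loss = chain[idx][1]
    return chain[:idx] + build_chain(htl_at_loss, replacement_hops)

chain = build_chain(20, ["a", "b", "c"])   # a@20, b@19, c@18
chain = reroute_after_loss(chain, "b", ["d", "e"])
assert chain == [("a", 20), ("d", 19), ("e", 18)]
```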
> > > 
> > > If a node downstream loses a connection, it will have to reroute. It may
> > > then be that its RNF is no longer valid, and it has to give us one with
> > > a different HTL. We can then act on this...
> > > 
> > > If the locations HAVE changed... things are even worse. If we discover
> > > that we should have routed to B then C, not C then B, and they both
> > > RNFed, this is not a problem. But if the latter succeeded due to running
> > > out of hops, this is a problem.
> > > 
> > > What about HTLs? If we get a subscription with HTL 20, when we had
> > > routed at HTL 17...? If it is less than our HTL, no problem, but if it
> > > is more:
> > > - If each node on the chain RNFed, we don't need to do anything, because
> > >   the HTL was not the limiting factor; the size of the network was.
> > > - Otherwise we may need to do some rerouting...
> > > 
> > > 
> > > Also, if a node upstream has to reroute, it may end up rerouting through
> > > us, and we would accept it because we have a stream already... the
> > > mechanism described in the previous mail is adequate for dealing with
> > > this though.
> > > 
> > > 
> > > Possible general principles:
> > > - Live requests: When we get the first subscribe request, we create a
> > >   thread to route it. We then route it. The request succeeds, or at
> > >   least, we run out of nodes. We then wait for:
> > >   a) A connection to a node to go down
> > >   b) A connection to a (possibly new) node to come up
> > >   c) Any connected node we routed through to change location
> > >   d) Any node we routed through to return a message such as a new RNF,
> > >   etc.
> > >   If this happens, we may have to reroute.
> > > 
> > > This is exactly the same challenge we faced with the other side of the
> > > equation (the data publish blocks) in the first version of the draft.
> > > The solution was to route them individually and keep nodes updated as to
> > > whether they were involved if we routed through them on one packet but
> > > not on the next. We can't do this exactly for subscriptions however.
> > > What is the answer?
> > > -- 
> > > Matthew J Toseland - toad at amphibian.dyndns.org
> > > Freenet Project Official Codemonkey - http://freenetproject.org/
> > > ICTHUS - Nothing is impossible. Our Boss says so.
> > 
> > 
> > 
> > > _______________________________________________
> > > Tech mailing list
> > > Tech at freenetproject.org
> > > http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/tech
-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.