[Tech] Publish/subscribe, 3rd attempt - confusion over error handling

Matthew Toseland Fri, 2 Sep 2005 21:09:53 +0100

Actually, the mechanism proposed on routing through a node we're already
connected indirectly to WON'T work, because there's no clear definition
of dependancy. There is no clear definition of "the node from which I
expect to receive packets". Thus if a node has to reroute, it will end
up sending a SubscribeRestarted to the entire network.


Now, can we make it into a tree?

The data probably won't be inserted at the root node. We could route it
up to the root node though.

The other problem is backtracking. The simple solution to this is to
make the RNF loops we get dependant on us, and dependant on our final
node. This is roughly analogous to how we deal with requests. Hmmm...

On Fri, Sep 02, 2005 at 09:00:42PM +0100, Matthew Toseland wrote:
> Here's another attempt at a publish/subscribe architecture, after some
> discussion with Ian, this should be simpler, persist even when the
> source node goes down, and make a start on resource limiting:
> 
> TOPOLOGY
> --------
> 
> Firstly, all the below will probably work much better with the new
> backtracking mechanism.
> 
> There is no fixed path from the data source to the subscribe tree.
> Packets are routed individually, and we make no attempt to track how
> they were routed last time, or to keep nodes on the right path.
> 
> There is a single subscribe tree. Actually, it's a subscribe graph. The
> reason it is a graph is backtracking. If I route a request with HTL 20
> to a clump of 10 nodes, it will return with an RNF with HTL 10. I will
> then route it elsewhere. Thus I will have two successful routes - the
> data will have to be sent to both of them if I get a packet. Now, the
> subscribe graph is the result of very many subscriptions. Each is routed
> towards the key, and together they make a graph, which is somewhat
> treeish, but is not a tree. It therefore has no fully defined root. But
> data will certainly spread very quickly through it.
> 
> Then, when we route a publish packet, it will also be routed towards the
> tree. In all likelihood it will eventually meet a node which is on the
> graph. The data will then be broadcast across the graph. Each node
> verifies it, checks that it hasn't already got it, and relays it to all
> directly connected nodes. Because it is a graph and not a strict tree,
> it is possible that the same packet would be sent to the same node twice
> by two different nodes. This will waste some bandwidth. However, it
> probably won't waste a LOT of bandwidth, because mostly the node will be
> told directly before it is told indirectly.
> 
> So we have a persistent subscribe graph, and we have transient
> PublishData's - equivalent to inserts. Actually this looks a lot like
> passive requests when you think about it. The difference being that we
> know the keys are connected, so you don't have to constantly send out
> requests to subscribe to the next key in the sequence. And completely
> different caching policy.
> 
> Each node will have a maximum of 32 streams. If we run out of streams,
> we reject new SubscribeRequests. So it is important that we not lose
> track of the streams. When a node is restarted, it should tell its
> neighbours which streams it is subscribed to, and to disconnect from any
> others. This will require a special message, BulkSubscribeStreams, or
> something...
> 
> If a node no longer has any subscribers for a given stream, it drops the
> stream. This might cause a cascade effect. But once one node has
> subscribed to a stream, there will always be at least a chain of nodes,
> moving outward from the start, to which the data can be sent.
> 
> These arbitrary limits suggest a DoS may be possible; we will have to
> review this part of the architecture when we have some more experience.
> LRU is an option, but would mean that streams which have infrequent
> packets die, regardless of demand. That would be bad.
> 
> Now, here's the hard part: Error handling.
> Error handling on publish is easy. The packets just get routed
> differently. They end up in the same place, and they still rendezvous
> with the subscribe graph.
> 
> BUT: Error handling on subscribe is, as always, complicated.
> 
> I adapt the previous proposal:
> 
> Subscriptions will normally be routed similarly to requests, and will
> terminate on a collision i.e. they will terminate successfully when they
> reach a node which has already got the stream. They can RNF and will be
> rerouted in that case. Or they can run out of HTL, in which case they
> are regarded as successful, although it is possible that they have run
> into a network dead-end.
> 
> Once we are subscribed, we can have various errors:
> - We disconnect from a node we were subscribed through.
> We will have to reroute.
> - The locations of our neighbour nodes change so much that the route
>   would have changed.
> We will have to reroute.
> - A node downstream reroutes for one of the above reasons.
> We will have to reroute.
> 
> Further complication: If an upstream has to reroute, it may have to
> reroute through us. Therefore somehow it must tell us that it is about
> to do so so that we pass the request on rather than accepting it.
> 
> Simplest option first:
> We disconnect from a node we were subscribed through.
> 
> If this was the last or only node we were subscribed through (meaning we
> successfully sent a subscribe request through), it is relatively simple:
> We check what HTL we had left at that point, and rerun the request from
> that point.
> 
> If the node which has broken was NOT the last node, things are more
> complex. We know what HTL we had left at that point. We will probably
> have to reroute from that point. Assuming the locations have not
> changed:
> - If the next node returned an RNF, we can use that as the next block -
>   just starting at a different HTL; taking the same number of hops. Then
>   we can route again, to a further node; this is merely adding to the
>   path so is not a problem.
> - If the next node returned success due to running out of hops, (this
>   needs to be a distinct mode), we will definitely need to rerequest
>   through it.
> 
> If a node downstream loses a connection, it will have to reroute. It may
> then be that its RNF is no longer valid, and it has to give us one with
> a different HTL. We can then act on this...
> 
> If the locations HAVE changed... things are even worse. If we discover
> that we should have routed to B then C, not C then B, and they both
> RNFed, this is not a problem. But if the latter succeeded due to running
> out of hops, this is a problem.
> 
> What about HTLs? If we get a subscription with HTL 20, when we had
> routed at HTL 17...? If it is less than our HTL, no problem, but if it
> is more:
> - If each node on the chain RNFed, we don't need to do anything, because
>   the HTL was not the limiting factor; the size of the network was.
> - Otherwise we may need to do some rerouting...
> 
> 
> Also, if a node upstream has to reroute, it may end up rerouting through
> us, and we would accept it because we have a stream already... the
> mechanism described in the previous mail is adequate for dealing with
> this though.
> 
> 
> Possible general principles:
> - Live requests: When we get the first subscribe request, we create a
>   thread to route it. We then route it. The request succeeds, or at
>   least, we run out of nodes. We then wait for:
>   a) A connection to a node to go down
>   b) A connection to a (possibly new) node to come up
>   c) Any connected node we routed through to change location
>   d) Any node we routed through to return a message such as a new RNF,
>   etc.
>   If this happens, we may have to reroute.
> 
> This is exactly the same challenge we faced with the other side of the
> equasion (the data publish blocks) in the first version of the draft.
> The solution was to route them individually and keep nodes updated as to
> whether they were involved if we routed through them on one packet but
> not on the next. We can't do this exactly for subscriptions however.
> What is the answer?
> -- 
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.



> _______________________________________________
> Tech mailing list
> Tech at freenetproject.org
> http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/tech

-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: 
<https://emu.freenetproject.org/pipermail/tech/attachments/20050902/c3234930/attachment.pgp>

[Tech] Publish/subscribe, 3rd attempt - confusion over error handling

Reply via email to