Dear all, Lilian has found a case in which Babel 0.9 would delay reconvergence by 30 seconds. I've now fixed this case, and reworked the route selection algorithm somewhat; convergence speed is much improved.
Since I'd like to release 0.10 pretty soon, I'm very much interested in any data about the behaviour of the current Darcs tree that you may have. Since I'm in a chatty mood, I'll give a few details about the issue. (By the way, Lilian, my analysis of the problem on IRC was completely wrong.) A (proactive) routing protocol has three tasks: - neighbour sensing: measuring link quality between neighbours; - route discovery: discovering the set of available routes; - route selection: choosing a particular route to every destination. In many routing protocols, the three tasks are mixed up. For example, both RIP and B.A.T.M.A.N. mix up neighbour sensing and route discovery, which is why in RIP you cannot tune the hello interval separately from the updte interval. Babel was designed as a modular protocol: the three stages of the protocol are, to the extent possible, independent. Of particular interest is the independence of route selection and route discovery: the protocol remains correct if nodes do not always choose the route with smallest metric, but choose ``worse'' routes under some circumstances (as is usually the case with distance-vector protocols, but not with link-state protocols). This makes it possible to implement route selection policies that choose routes according to criteria richer than just link quality. Now, when I started working on Babel, I had just experimented with OLSR, and found that it has serious problems with route stability: under some circumstances, it flaps between two parallel routes to a given destination. Route flapping is ungood: it can cause packet reordering, and it usually causes jitter (for example due to an extra ARP exchange) and sometimes packet loss. Because of that, Babel's route selection policy tries to promote route stability. The main stability mechanism is a simple form of hysteresis: we don't switch routes unless the new route is significantly better than the old route. (The exact amount of hysteresis used depends on whether we also switch router-id, and is something that needs more experimentation.) The second route stability mechanism is much simpler: a route is regarded with great suspicion unless it has been around for 30 seconds. Unfortunately, this mechanism turned out to be buggy, and to interact badly with some other features of the protocol (notably the feasibility condition), which caused perfectly good routes to be disregarded for 30 seconds in Lilian's network topology. After some thought, I've simply removed the 30 second condition; the only route stability mechanism remaining is hysteresis, which appears to do The Right Thing in practice. In case you want to check the code, all of this happens in route.c, in the functions find_best_route (which finds the candidate route for a given destination) and consider_route (which decides whether to switch to a new, potentially ``better'' route). While I was at it, I reworked some of the constants in the protocol, notably by making it send multi-hop requests and triggered updates more aggressively; this makes the protocol somewhat more chatty, but very slightly so. All together, you should see much faster convergence with 0.10 than with 0.9. Juliusz _______________________________________________ Babel-users mailing list Babel-users@lists.alioth.debian.org http://lists.alioth.debian.org/mailman/listinfo/babel-users