On Mar 31, 2013, at 6:26 AM, Andrew Ferguson wrote:

> I'm curious about the background of the openflow.discovery component. my
> main question boils down to: is a timeout-based approach actually necessary
> in a pure OpenFlow network? was LLDP implemented so non-OpenFlow switches
> could discover POX-controlled OF switches, or was this based on something you
> all experienced with a pure OF network?
>
> as I currently understand things, one could simply use the PortStatus
> messages as proactive notification that a link has disappeared. (this
> assumes that OF switches properly deliver such messages -- if some don't,
> then indeed, a timeout approach is required, so I'm wondering if anyone has
> seen such behavior.) in other words, while I believe PortStatus messages are
> sufficient, are they also necessary? (if they are, then we could ditch the
> periodic re-sending of LLDP packets, right?)
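The event-driven alternative the question describes, where a link is dropped only when a switch reports that one of its ports went down, can be sketched roughly as follows. This is a hypothetical illustration, not POX code: `Link`, `PortStatus`, `port_is_down`, and `handle_port_status` are made-up names, and the constants are assumed to mirror the OpenFlow 1.0 values for port-status reasons, port config, and port state.

```python
from dataclasses import dataclass
from typing import NamedTuple, Set

# Assumed to mirror OpenFlow 1.0 values (ofp_port_reason, ofp_port_config,
# ofp_port_state); this is an illustration, not POX's own API.
OFPPR_ADD, OFPPR_DELETE, OFPPR_MODIFY = 0, 1, 2
OFPPC_PORT_DOWN = 1 << 0
OFPPS_LINK_DOWN = 1 << 0


class Link(NamedTuple):
    """Hypothetical unidirectional link (dpid1, port1) -> (dpid2, port2)."""
    dpid1: int
    port1: int
    dpid2: int
    port2: int


@dataclass
class PortStatus:
    """Hypothetical stand-in for a decoded OFPT_PORT_STATUS message."""
    dpid: int
    port_no: int
    reason: int
    config: int = 0
    state: int = 0


def port_is_down(msg: PortStatus) -> bool:
    # Treat the port as gone if it was deleted, administratively
    # disabled, or the switch reports its link as down.
    return (msg.reason == OFPPR_DELETE
            or bool(msg.config & OFPPC_PORT_DOWN)
            or bool(msg.state & OFPPS_LINK_DOWN))


def handle_port_status(links: Set[Link], msg: PortStatus) -> Set[Link]:
    """Remove (in place) and return every link with an end on the reported port.

    Purely event-driven: if the switch never sends this message, the link stays.
    """
    if not port_is_down(msg):
        return set()
    end = (msg.dpid, msg.port_no)
    dead = {link for link in links
            if (link.dpid1, link.port1) == end or (link.dpid2, link.port2) == end}
    links.difference_update(dead)
    return dead


# Usage: switch 1 reports that the link on its port 1 went down.
links = {Link(1, 1, 2, 1), Link(2, 1, 1, 1), Link(2, 2, 3, 1)}
removed = handle_port_status(
    links, PortStatus(dpid=1, port_no=1, reason=OFPPR_MODIFY, state=OFPPS_LINK_DOWN))
print(sorted(removed))  # [Link(dpid1=1, port1=1, dpid2=2, port2=1), Link(dpid1=2, port1=1, dpid2=1, port2=1)]
print(links)            # {Link(dpid1=2, port1=2, dpid2=3, port2=1)}
```

Nothing here ever *discovers* a link, and a link whose failure never produces a port-down message (for example, on a tunnel port that may never report one) stays in the table indefinitely.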
This design was inspired by NOX. I don't think discovery by non-OF switches was a major factor. I think the major reasons are:

1) Port status messages alone don't actually let you discover the topology, which is a major point of this module (they tell you that a port exists or has gone down, not what is on the other end of it). The discovery module really does both topology discovery and link failure detection; in the mental model behind the module, they're kind of the same thing, but other approaches also make sense. For example, I think the existing "I will continually figure out what people have plugged together" approach may make sense for enterprise scenarios, but for something like a datacenter it's probably more reasonable to either load the topology from a file or do topology discovery *once* at startup, assume it stays the same forever, and from then on only check for link failure/recovery. I just haven't gotten around to making clean versions of alternate approaches to include in POX.

2) This approach works in networks that aren't pure OpenFlow. Indeed, the choice of a non-standard Ethernet address for the LLDP messages is specifically to allow the controller to see through non-OpenFlow switches.

3) The relevant port state is "link down", which we can assume is probably tied to LIT/link pulse/whatever, but there are ways a link can fail that aren't caught by these (though admittedly there are many more such ways in mixed networks). A more recent but related concern: for a virtual port that represents a tunnel (as is common in the important use case of an OpenFlow overlay network, a different sort of mixed network), do you get link downs at all? I don't actually know, but I expect the answer is "not always" at best.

> relatedly, I noticed that while POX's openflow.discovery proactively deletes
> links in response to a switch's ConnectionDown, the PortStatus events are
> only used to remove ports from the list of ports out of which to send LLDP
> messages on each cycle. what was the reasoning behind this design?

No real rationale there.
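The probe-and-timeout model from point 1, where the same periodic probes both discover links and detect their failure, might be sketched like this. It's a minimal sketch under stated assumptions rather than the actual openflow.discovery implementation: `ProbeTimeoutLinks`, the `send_probe` callback, and the 10-second timeout are hypothetical, and building/parsing the LLDP frames and scheduling the send cycle on a timer are left out.

```python
import time
from typing import Callable, Dict, List, NamedTuple, Tuple


class Link(NamedTuple):
    """Hypothetical unidirectional link (dpid1, port1) -> (dpid2, port2)."""
    dpid1: int
    port1: int
    dpid2: int
    port2: int


class ProbeTimeoutLinks:
    """Hypothetical link table: a link exists only while probes keep crossing it."""

    def __init__(self, send_probe: Callable[[int, int], None],
                 timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_probe = send_probe   # assumed to emit an LLDP-style probe out of (dpid, port)
        self.timeout = timeout         # seconds without a probe before a link is dropped
        self.clock = clock
        self.last_seen: Dict[Link, float] = {}

    def send_cycle(self, ports: List[Tuple[int, int]]) -> None:
        """One discovery cycle: probe out of every enabled (dpid, port)."""
        for dpid, port in ports:
            self.send_probe(dpid, port)

    def probe_received(self, src_dpid: int, src_port: int,
                       dst_dpid: int, dst_port: int) -> bool:
        """Record a probe sent from src that arrived at dst; True if the link is new."""
        link = Link(src_dpid, src_port, dst_dpid, dst_port)
        is_new = link not in self.last_seen
        self.last_seen[link] = self.clock()
        return is_new

    def expire(self) -> List[Link]:
        """Drop links whose probes stopped arriving, whatever the cause."""
        now = self.clock()
        dead = [link for link, t in self.last_seen.items() if now - t > self.timeout]
        for link in dead:
            del self.last_seen[link]
        return dead


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
sent: List[Tuple[int, int]] = []
table = ProbeTimeoutLinks(lambda d, p: sent.append((d, p)), timeout=10.0,
                          clock=lambda: now[0])
table.send_cycle([(1, 1), (2, 1)])
table.probe_received(1, 1, 2, 1)   # discovered 1 -> 2
table.probe_received(2, 1, 1, 1)   # discovered 2 -> 1
now[0] = 5.0
table.probe_received(1, 1, 2, 1)   # 1 -> 2 refreshed; 2 -> 1 probes stop arriving
now[0] = 12.0
expired = table.expire()
print(expired)  # [Link(dpid1=2, port1=1, dpid2=1, port2=1)]
```

Because a link ages out whenever its probes stop arriving, this also catches the failures from point 3 that never show up as a link-down, at the cost of noticing them only after the timeout has elapsed.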
The POX design followed the broad strokes of the NOX design (diverging a bit), and I expect NOX didn't do it, so neither did POX. I actually noticed this myself when I did some work on discovery relatively recently but didn't do anything about it (the refactoring moved the port status handlers to another class, which made it obvious that this wasn't being done). I think it'd be a reasonable addition, but in my opinion it's a fairly minor optimization: under some conditions you wouldn't have to wait some fraction of the discovery cycle time to notice that a link had been removed or gone down. Of course, if the switch-provided link state were assumed to be sufficient, this would be a different story, but that motivates an entirely different design of the discovery component anyway.

> anyway, this is the discovery approach I currently use in my own controller,
> and I'm wondering if I'll need the timeout-based approach in some scenario I
> haven't encountered yet.

In general, I think you need to probe and time out for reliable detection. But as with so many things, there are probably multiple designs that make sense for different use cases and environments.

--
Murphy
