Curtis - I think we are converging.
Some context from my side... I am fully aware that this draft is about Homenet environments, but there is a suspicion in the back of my mind that once the duplicate-id resolution mechanism is defined and deployed, folks may want to use it in other environments, e.g. TRILL has targeted auto-config as a goal (just an example). You may remember that a few years ago a proposal to automatically resolve system-id conflicts was discussed in the IS-IS WG. The proposal had a lot of flaws and we shot it down - but it does suggest that some folks may want to use such a mechanism in other types of deployments someday. So I would like to define things such that the mechanism is robust enough to be used elsewhere. And since what I am proposing is quite simple I don't think it unduly burdens the Homenet environments.

As regards preserving router-id across reboots - sure - that is a good idea also. And what I am proposing is supportive of that because it guarantees that so long as an existing router's LSAs are in the LSDB (even if it is currently undergoing maintenance) any new router that comes up (or even another old router that reboots and is not so well behaved as to remember the router-id it previously used) will not take the router-id of any router seen in the LSDB (reachable or not). This is better than the existing logic which leaves the decision to chance.

More inline.

> -----Original Message-----
> From: Curtis Villamizar [mailto:[email protected]]
> Sent: Sunday, February 09, 2014 12:39 PM
> To: Les Ginsberg (ginsberg)
> Cc: [email protected]; Acee Lindem; OSPF List
> Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
>
>
> Les,
>
> Perhaps you should read the abstract of the document you are commenting about:
>
>    OSPFv3 is a candidate for deployments in environments where auto-configuration is a requirement.
>    One such environment is the IPv6 home network where users expect to simply plug in a router and have it automatically use OSPFv3 for intra-domain routing. This document describes the necessary mechanisms for OSPFv3 to be self-configuring.
>
> Home network!
>
> Or the introduction:
>
>    OSPFv3 [OSPFV3] is a candidate for deployments in environments where auto-configuration is a requirement.
>
>    [...]
>
>    1.2. Acknowledgments
>
>    This specification was inspired by the work presented in the Homenet working group meeting in October 2011 in Philadelphia, Pennsylvania.
>
> The Homenet WG works on what? Home networks!
>
> So please keep that in mind when commenting.
>
> Unless a provider were to be so stupid or lazy as to use this on an SP network then most of the comments from both of us don't apply, *except* the few comments below about "in a home network".
>
> Perhaps the draft should add text explicitly stating that the last router-id used successfully should be used on a reboot rather than a new random number. I notice that only the Router-Hardware-Fingerprint TLV is persistent across reboots. This is insufficient if we want to minimize disruption.
>
> The only case then (if router-id is remembered across reboots) would be a new router.

Also a router which fails to remember its old router-id across a reboot.

> In that case your uptime rule would help. So perhaps two things could be recommended:
>
>   1. In section 4, include a "SHOULD remember the most recent successfully used router-id across reboots and reuse that". Reword the rest so if that information is not available, then pick a random number.

Fine with me.

>
>   2. a. In section 6, mention the uptime rule. Modify the Router Uptime TLV as suggested.
>
>      b. Alternately add a flag to the Router-Hardware-Fingerprint TLV that indicates that since last reboot this router-id has been used and achieved a "full state".
>         A router just rebooting would not have ever reached the full state before noticing a conflict as long as the conflict check is run before considering itself in the full state.

Yes - this is what I had in mind. Also note we need a flag in hellos as well - for which I had proposed using an option bit (or LLS if folks don't want to consume an options bit).

But what is your definition of "full state"? It cannot be just having reached "full state" with a single neighbor, as it is possible the first neighbor that comes up might also be in the process of coming up itself and does not yet have the full LSDB. What I had in mind was a short but sufficient time such that, if we had been up for that long, we could be comfortable that our existence was known network-wide. I had mentioned 20 minutes - but that was quite a conservative number - I think we could safely be more aggressive (5 minutes??). Once that period has passed we set the flag and leave it set. And if we are smart enough to reuse the same router-id following reboot, then when we get our own Fingerprint from our old incarnation we will see that the flag is set and can therefore set it immediately following reboot without waiting for 5 minutes.

The time interval only defines the period during which, if we are unlucky enough to have two new routers come up and happen to pick the same router-id, we defer to the fingerprint/link local address tie breaker, i.e. both routers are considered "new" and so neither one has staked a claim yet.

If you and I are now in consensus (as I think we are), it is time for the authors of the draft to weigh in and, if they agree, update the draft with the specifics.

   Les

>
>         Note: A second flag bit indicating that this router-id had been used successfully in a past reboot might also help but would only matter among two routers both rebooting and neither having reached the full state.
>
> I think #1 above is sufficient and does more to prevent surprises. I think #2 above helps only in the new router case but #2a requires adding a TLV and so isn't worth it IMHO. Case #2b accomplishes the same thing with only a flag. I would not object to #2b above if #1 above is also added.
>
> See inline anyway.
>
> In message <[email protected]>
> "Les Ginsberg (ginsberg)" writes:
> >
> > Curtis -
> >
> > > -----Original Message-----
> > > From: Curtis Villamizar [mailto:[email protected]]
> > > Sent: Saturday, February 08, 2014 7:30 AM
> > > To: Les Ginsberg (ginsberg)
> > > Cc: [email protected]; Acee Lindem; OSPF List
> > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > >
> > >
> > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C621A3@xmb-aln-x02.cisco.com>
> > > "Les Ginsberg (ginsberg)" writes:
> > >
> > > > Curtis -
> > > >
> > > > Your reply below is talking about things which I think do not directly bear on the value add of what I have proposed.
> > > >
> > > > You mention various ways to ensure that a given device assigns the same router-id each time it starts up and ways to ensure it picks the same sequence of second/third... choices in the event it has to change its router-id. All good suggestions, but what I am talking about is what we do in the event a conflict occurs despite our best efforts to avoid it. With the current draft content preference is based solely on a fixed identifier (fingerprint) without regard to which choice would minimize disruption to the network. When preference is given to the "old router" to retain its existing router-id this shortcoming is addressed.
> > >
> > > In the lifetime of a router it only gets added once.
> > > In the lifetime of a router we would hope it only reboots zero times but experience so far has been that reboots over a router's lifetime tend to be > 0 and in some cases >> 0.
> > >
> > > So you are optimizing for a 1 in 4 billion occurrence that can happen only once in the lifetime of a router.
> >
> > The entire duplicate router-id resolution logic is addressing the improbable case. My proposal adds - literally - one line of code to the logic used to decide whether "I" should change my router-id or whether "you" should change your router-id.
> >
> > >
> > > We also need to look at the consequences of this very improbable occurrence. Today's routers accomplish IGP convergence in large networks in subsecond times, in some cases << 1 second.
> > >
> > > Note that if flooding is completed (both withdraw old and install new) in less than the SPF delay which is commonly implemented (some delay after receiving the first flooded IGP change), then there is no impact on routing.
> >
> > Your analysis does not apply to this scenario. The router which changes its router-id is effectively doing a cold start. All adjacencies will go down. All LSAs originated by this router become invalid. All routes will be removed from the forwarding plane. If you are running BGP all the BGP nexthops will be gone on the router which is changing its identity. Restoration of the adjacencies and reacquisition of the LSDB will take multiple seconds. The best you can hope for is several seconds of disruption - it could easily be much longer.
> >
> > The new node which has usurped the old node's identity will have to purge/replace all of the LSAs generated by the old node. While normal operation of the update process will ensure that this happens in a reliable way the amount of flooding network-wide required to bring up a new node has now been roughly doubled i.e.
> > the old node must reissue all of its LSAs using a new identity and the new node must purge/replace the old node's LSAs with its own versions. This will result in multiple SPFs on all nodes in the network and likely cause loops/blackholes during the transition since some of the SPFs will be run on versions of the LSDB which are inaccurate (part old node's old LSAs and part new node's LSAs). Suggesting that this could be handled in the same way/time as we typically handle a single link failure isn't credible.
>
> All routers are supposed to keep a fixed router-id across reboots. If interfaces are changed when down, the last used router-id should be on flash. If flash is removed and replaced (rather than a new image installed), then with the same set of interfaces, the same decision should be made. We are down to a very special case where both flash and interfaces are removed and replaced yielding no history and a different set of MACs to pick from.
>
> > > > Your statement that what I propose is only relevant when two routers go down does not match the scenarios I envision. If I want to add a new device to my network or if I need to replace an existing device in my network I am only affecting one device - but as I am introducing a device with a new fingerprint it is possible that it will introduce a conflict with an existing router-id.
> > >
> > > In provider networks routers are generally added during maintenance windows so should anything unexpected happen, impact is minimized.
> > >
> > > In home nets, the home user isn't going to notice the convergence time if there is any. A 10 msec SPF delay is likely to be plenty.
> >
> > As I stated above, disruption will be orders of magnitude longer than 10 ms.
>
> In a home net? With perhaps a half dozen routers and a default route? Someone has a very bad OSPF implementation.
> :-) Or did you miss the "In home nets" at the front of the paragraph.
>
> For example, in a 10 node network with average degree 4, perhaps 40 links in 10 router LSAs exist. A few RTT (less than 1 msec for a homenet) for each neighbor adjacency (which happen in parallel) and ten packets from 4 sources is needed to reach the full state followed by one SPF to be fully up and running. Other routers get one additional router LSA plus four new links in existing router LSAs and have to run an SPF. Even on a software based homenet router using an ARM, 10 msec is likely to be enough time and if it is "orders of magnitude" longer, something is wrong with one of the implementations. This would be a more complicated than usual home net or even soho, more likely a small business.
>
> > > > In a subsequent reply you liked the idea of the new device delaying advertising reachability until it has determined that its router-id choice is not in conflict. The old/new router paradigm supports this strategy by assuring that the old router will not consider changing its router-id until enough time has elapsed for the new router to transition to being an old router.
> > >
> > > If it wins the coin toss, the router would advertise at least one LSA to indicate its existence and could hold back on any additional advertisements until the other router has withdrawn routes.
> > >
> > > > This suggestion does not alter the fact that if the old node changes its router-id the network has to respond to three events:
> >
> > 1) Loss of the old node
> > 2) Introduction of the old-node with a new identity
> > 3) Introduction of the new node with the identity of the old-node
>
> Again, the old node should remember the last router-id used and try to reuse it.
>
> > If however we ensure that the old-node does not change its identity then the network only has to respond to a single event - the introduction of the new-node.
>
> Yes and if it were up and won the resolution last time, it will have saved that router-id and will reuse it. If it came up previously and lost the resolution, then it will remember the router-id it used, whether second or third pick, and use that.
>
> > > > Finally, what I propose is extremely simple to implement. I think it isn't much of an exaggeration to say that any one of us could have implemented the enhancement in the time it has taken to discuss its merits. So we aren't overengineering for a case which is admittedly very unlikely to occur - we are adding a modest extension to make our solution less disruptive.
> > >
> > > Yes but it is *bad* for the more common case where routers go down occasionally.
> >
> > You are going to have to clarify exactly what "bad side effects" you see for what I propose - because I don't see any - whereas I do see benefits as described above.
>
> If router-id is not remembered between reboots, then there is the one in 4 billion chance times the number of routers (less than 10 for a home net today, but maybe more in the future).
>
> If router-id is remembered between reboots, then no matter how long a router has been down, if nothing else in the network changed, there is zero chance of having a collision.
>
> With either method, if router-id is remembered between reboots, then there is zero chance of collision.
>
> IMO should this ever be used on a managed network (including a home net / soho / small business net that happens to be managed) then having routers come back from a reboot with the same router-ids would be a big plus. For example, after a power outage NMS discovery would not have to be repeated.
>
> > Les
> > >
> > > >
> > > > Les
> > >
> > > Curtis
> > >
> > >
> > > > > -----Original Message-----
> > > > > From: Curtis Villamizar [mailto:[email protected]]
> > > > > Sent: Friday, February 07, 2014 9:22 AM
> > > > > To: Les Ginsberg (ginsberg)
> > > > > Cc: Acee Lindem; Curtis Villamizar; OSPF List
> > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > > > >
> > > > >
> > > > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C619A9@xmb-aln-x02.cisco.com>
> > > > > "Les Ginsberg (ginsberg)" writes:
> > > > > >
> > > > > > So, I am one person who raised this concern to Acee - but the proposal outlined by Acee is not what I had in mind. There is no need to use "uptime" or to invent some unusual exchange of LSAs prior to Exchange state.
> > > > > >
> > > > > > Also, in regards to Curtis's comment - it is not DOS attacks that I am trying to mitigate here. As he says if an attacker is in your network and able to originate credible packets no strategy is safe.
> > > > > >
> > > > > > The motivating use case is to minimize disruption of a stable network when a new router is added or an existing router is replaced/rebooted. In other words non-disruptive handling of the common maintenance/upgrade scenarios.
> > > > > >
> > > > > > What I have in mind is this:
> > > > > >
> > > > > > 1) A router needs a way to advertise that it has been up and running for a minimum length of time - for the sake of discussion let's say 20 minutes. Routers then fall into two categories:
> > > > > >
> > > > > >    o Old routers (up >= minimum time)
> > > > > >    o New routers (up < minimum time)
> > > > > >
> > > > > > 2) When a duplicate router-id is detected, the first tie breaker is between old routers and new routers.
> > > > > >    The old router gets to keep its router-id and the new router picks a new router-id. If both routers are "new" or both routers are "old" then we revert to the existing tie breakers defined in the document (link local address for directly connected routers and fingerprint info for non-neighbors).
> > > > > >
> > > > > > 3) Advertisement of the "old/new" state requires a single bit - but it has to be available both in hellos and the new AC-LSA. Adding it to the AC-LSA is easy to do. For hellos, there are two possibilities:
> > > > > >
> > > > > >    o Use one of the Options Bits
> > > > > >    o Use LLS
> > > > > >
> > > > > > Be interested in how folks feel about this.
> > > > > >
> > > > > >    Les
> > > > >
> > > > >
> > > > > Les,
> > > > >
> > > > > Excluding DoS attack, we are talking about a one in 4 billion case (for any two routers, so with 400 routers, still well under one in 1M) where two routers hash a MAC address or pick a one time random number from out of nowhere and end up with the same number.
> > > > >
> > > > > If that does happen (and one in 1M is certainly possible), then it would be nice if the routers always ended up with the same router-id. This could be accomplished by some fixed method such as hashing a constant with the first choice of router-id or using the router-id as a seed for the random number generator (which will pick the same sequence of random numbers each time). If this is done, then a conflict would always produce the same set of next picks. The set of routers in a given network would always end up with the same router-ids once they all came up and if only one went down at a time then it would always end up with the same router-id when it came up.
> > > > >
> > > > > Zero conf was mainly intended for unmanaged networks (motivated by work in the homenet WG). In these small unmanaged networks it doesn't matter which router gets what router-id as long as they end up unique and convergence is in a reasonable time relative to keeping eyeballs happy. It could be applied to enterprise or providers but in either case having the routers end up with the same router-ids would make for easier management.
> > > > >
> > > > > For your scenario to matter at all with current rules, both routers in the conflict would have to go down. If only the one that is preferred goes down, the other is not going to change its router-id as a result so when it comes up it gets its first pick with no conflict. If the one that was not preferred goes down, it comes up, sees a conflict and takes its second pick (loses the conflict every time). It is only if they both go down and the one that normally loses the conflict comes up first that there is a change in router-id. That too can be solved with a rule that you always come up with the last router-id used.
> > > > >
> > > > > Curtis
> > > > >
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: OSPF [mailto:[email protected]] On Behalf Of Acee Lindem
> > > > > > > Sent: Thursday, February 06, 2014 5:12 PM
> > > > > > > To: Curtis Villamizar
> > > > > > > Cc: OSPF List
> > > > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > > > > > >
> > > > > > > Hi Curtis,
> > > > > > > I agree and believe the significance of this use case where a new router is inserted into an auto-configured domain has been greatly exaggerated.
> > > > > > > Thanks,
> > > > > > > Acee
> > > > > > > On Feb 5, 2014, at 3:58 PM, Curtis Villamizar <[email protected]> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > In message <cf17dd4e.2696b%[email protected]>
> > > > > > > > Acee Lindem writes:
> > > > > > > >
> > > > > > > >> The OSPFv3 autoconfiguration draft was cloned and presented in the ISIS WG (http://www.ietf.org/id/draft-liu-isis-auto-conf-00.txt). In the ISIS WG, there was a concern that the resolution of a duplicate system ID did not include the amount of time the router was operational when determining which router would need to choose a new router ID. With additional complexity, we could incorporate router uptime into the resolution process. One way to do this would be to:
> > > > > > > >>
> > > > > > > >>     1. Add a Router Uptime TLV to the OSPFv3 AC-LSA. It would include the uptime in seconds.
> > > > > > > >>
> > > > > > > >>     2. Use the Router Uptime TLV as the primary determinant in deciding which router must choose a new OSPFv3 Router ID. Router uptimes less than 3600 (MaxAge) seconds apart are considered equal.
> > > > > > > >>
> > > > > > > >>     3. When an OSPFv3 Hello is received with a different link-local source address but the same router-id, unicast the OSPFv3 AC-LSA to the neighbor so that OSPFv3 duplicate router resolution can proceed as in the case where it is received through the normal flooding process. This is somewhat of a hack as we'd also need to accept OSPF Link State Updates from a neighbor that is not in Exchange State or greater.
> > > > > > > >>
> > > > > > > >> An alternative to #3 would be to use Link-Local Signaling (LLS) for signaling the contents of the OSPFv3 AC-LSA. However, you'd only want to send the Router-Uptime and Router Hardware Fingerprint when a duplicate Router-ID is detected. This requires implementing the resolution two ways but may be preferable since it doesn't require violating the flooding rules.
> > > > > > > >>
> > > > > > > >> In any case, I'd like to get other opinions as to whether this problem is worth solving.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Acee
> > > > > > > >
> > > > > > > >
> > > > > > > > Acee,
> > > > > > > >
> > > > > > > > If the basis for router-id on boot up results in a fixed value, and if a duplicate will occur on a given network, then which of two duplicate routers gets that value may change after one of them reboots. If uptime is not considered, it will never change as long as one router stays up at any given time.
> > > > > > > >
> > > > > > > > We are talking about a very low probability event (a duplicate) except if this is a DoS attack and then either using or not using uptime won't matter since the attacker will claim an impossibly long uptime.
> > > > > > > >
> > > > > > > > Curtis

_______________________________________________
OSPF mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ospf
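
For reference, the old/new tie breaker discussed in this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not text from the draft: the names `RouterClaim`, `established_flag` and `must_pick_new_router_id` are hypothetical, the 5 minute threshold is just one of the values floated above (20 minutes was the other), and the direction of the fingerprint fallback (smaller value yields) is assumed for the example. Directly connected routers would compare link local addresses instead of fingerprints, which the sketch does not model.

```python
from dataclasses import dataclass

# Assumption: 5 minutes, one of the two values floated in the thread.
ESTABLISHED_AFTER_SECONDS = 5 * 60


@dataclass(frozen=True)
class RouterClaim:
    """Hypothetical view of one router's claim on a router-id."""
    router_id: int
    fingerprint: bytes   # contents of the Router-Hardware-Fingerprint TLV
    established: bool    # the proposed "old" bit (False means "new")


def established_flag(uptime_seconds: float, old_flag_seen_in_lsdb: bool = False) -> bool:
    """Value of the "old" bit a router would advertise.

    The bit is set once the router has been up long enough, or at once
    after a reboot if the router reuses its router-id and finds the bit
    already set in its own previous advertisement.
    """
    return old_flag_seen_in_lsdb or uptime_seconds >= ESTABLISHED_AFTER_SECONDS


def must_pick_new_router_id(mine: RouterClaim, theirs: RouterClaim) -> bool:
    """Return True if the local router should give up its router-id."""
    if mine.router_id != theirs.router_id:
        return False                      # no conflict at all
    if mine.established != theirs.established:
        return not mine.established       # the "new" router yields to the "old" one
    # Both "old" or both "new": fall back to the existing tie breaker.
    # Assumption: the smaller fingerprint yields; equal-length fingerprints
    # then compare as big-endian numbers.
    return mine.fingerprint < theirs.fingerprint
```

With this shape, a router that has been up for an hour keeps its router-id against a newcomer regardless of fingerprints, and two newcomers that collide inside the initial interval are resolved by the fingerprint comparison alone.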
