Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt

Les Ginsberg (ginsberg) Sat, 08 Feb 2014 10:24:32 -0800

Curtis -

> -----Original Message-----
> From: Curtis Villamizar [mailto:[email protected]]
> Sent: Saturday, February 08, 2014 7:30 AM
> To: Les Ginsberg (ginsberg)
> Cc: [email protected]; Acee Lindem; OSPF List
> Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-
> autoconfig-05.txt
> 
> 
> In message <[email protected]>
> "Les Ginsberg (ginsberg)" writes:
> 
> > Curtis -
> >
> > Your reply below is talking about things which I think do not directly
> > bear on the value add of what I have proposed.
> >
> > You mention various ways to ensure that a given device assigns the
> > same router-id each time it starts up and ways to ensure it picks the
> > same sequence of second/third... choices in the event it has to change
> > its router-id. All good suggestions, but what I am talking about is
> > what we do in the event a conflict occurs despite our best efforts to
> > avoid it. With the current draft content preference is based solely on
> > a fixed identifier (fingerprint) without regard to which choice would
> > minimize disruption to the network. When preference is given to the
> > "old router" to retain its existing router-id this shortcoming is
> > addressed.
> 
> In the lifetime of a router it only gets added once.  In the lifetime
> of a router we would hope it only reboots zero times but experience so
> far has been that reboots over a router's lifetime tend to be > 0 and
> in some cases >> 0.
> 
> So you are optimizing for a 1 in 4 billion occurrence that can happen
> only once in the lifetime of a router.


The entire duplicate router-id resolution logic is addressing the improbable 
case. My proposal adds - literally - one line of code to the logic used to 
decide whether "I" should change my router-id or whether "you" should change 
your router-id.
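
A minimal sketch of where that one line would sit, in Python and purely for 
illustration. The names Router, is_old, i_must_change and MIN_OLD_UPTIME_S are 
hypothetical, the 20 minute threshold is only the value floated earlier in this 
thread for the sake of discussion, and the direction of the fallback comparison 
(the smaller fingerprint yields) is an assumption, not a quote from the draft.

```python
from dataclasses import dataclass

# Hypothetical threshold: the "20 minutes" used for the sake of discussion.
MIN_OLD_UPTIME_S = 20 * 60


@dataclass(frozen=True)
class Router:
    fingerprint: bytes  # stand-in for the router hardware fingerprint
    uptime_s: int       # seconds since this router started

    @property
    def is_old(self) -> bool:
        # In the proposal this would be a single advertised bit.
        return self.uptime_s >= MIN_OLD_UPTIME_S


def i_must_change(me: Router, you: Router) -> bool:
    """Return True if 'me' should pick a new router-id after a conflict."""
    # The added check: an old router keeps its router-id against a new one.
    if me.is_old != you.is_old:
        return not me.is_old
    # Both old or both new: fall back to the existing tie-breaker
    # (assumed here: the numerically smaller fingerprint yields).
    return me.fingerprint < you.fingerprint
```

With an old router and a newly added one, i_must_change() is true only for the 
new router regardless of fingerprints; when both are in the same category the 
outcome is whatever the existing tie-breaker already gives.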

> 
> We also need to look at the consequences of this very improbable
> occurrence.  Today's routers accomplish IGP convergence in large
> networks in subsecond times, in some cases << 1 second.
> 
> Note that if flooding is completed (both withdraw old and install new)
> in less than the SPF delay which is commonly implemented (some delay
> after receiving the first flooded IGP change), then there is no impact
> on routing.

Your analysis does not apply to this scenario. The router which changes its 
router-id is effectively doing a cold start. All adjacencies will go down. All 
LSAs originated by this router become invalid. All routes will be removed from 
the forwarding plane. If you are running BGP all the BGP nexthops will be gone 
on the router which is changing its identity. Restoration of the adjacencies 
and reacquisition of the LSDB will take multiple seconds. The best you can hope 
for is several seconds of disruption - it could easily be much longer.

The new node which has usurped the old node's identity will have to 
purge/replace all of the LSAs generated by the old node. While normal operation 
of the update process will ensure that this happens in a reliable way, the 
amount of network-wide flooding required to bring up a new node has now been 
roughly doubled, i.e. the old node must reissue all of its LSAs using a new 
identity and the new node must purge/replace the old node's LSAs with its own 
versions. This will result in multiple SPFs on all nodes in the network and 
will likely cause loops/blackholes during the transition, since some of the 
SPFs will be run on versions of the LSDB which are inaccurate (part old node's 
old LSAs and part new node's LSAs). Suggesting that this could be handled in 
the same way/time as we typically handle a single link failure isn't credible.

> 
> > Your statement that what I propose is only relevant when two routers
> > go down does not match the scenarios I envision. If I want to add a
> > new device to my network or if I need to replace an existing device in
> > my network I am only affecting one device - but as I am introducing a
> > device with a new fingerprint it is possible that it will introduce a
> > conflict with an existing router-id.
> 
> In provider networks routers are generally added during maintenance
> windows so should anything unexpected happen, impact is minimized.
> 
> In home nets, the home user isn't going to notice the convergence time
> if there is any.  A 10 msec SPF delay is likely to be plenty.

As I stated above, disruption will be orders of magnitude longer than 10 ms. 

> 
> > In a subsequent reply you liked the idea of the new device delaying
> > advertising reachability until it is has determined that its router-id
> > choice is not in conflict. The old/new router paradigm supports this
> > strategy by assuring that the old router will not consider changing
> > its router-id until enough time has elapsed for the new router to
> > transition to being an old router.
> 
> If it wins the coin toss, the router would advertise at least one LSA
> to indicate its existence and could hold back on any additional
> advertisements until the other router has withdrawn routes.
> 

This suggestion does not alter the fact that if the old node changes its 
router-id the network has to respond to three events:

1) Loss of the old node
2) Introduction of the old node with a new identity
3) Introduction of the new node with the identity of the old node

If however we ensure that the old node does not change its identity then the 
network only has to respond to a single event - the introduction of the 
new node.

> > Finally, what I propose is extremely simple to implement. I think it
> > isn't much of an exaggeration to say that any one of us could have
> > implemented the enhancement in the time it has taken to discuss its
> > merits. So we aren't overengineering for a case which is admittedly
> > very unlikely to occur - we are adding a modest extension to make our
> > solution less disruptive.
> 
> Yes but it is *bad* for the more common case where routers go down
> occasionally.

You are going to have to clarify exactly what "bad side effects" you see for 
what I propose - because I don't see any - whereas I do see benefits as 
described above.


   Les


> 
> >    Les
> 
> Curtis
> 
> 
> > > -----Original Message-----
> > > From: Curtis Villamizar [mailto:[email protected]]
> > > Sent: Friday, February 07, 2014 9:22 AM
> > > To: Les Ginsberg (ginsberg)
> > > Cc: Acee Lindem; Curtis Villamizar; OSPF List
> > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-
> > > autoconfig-05.txt
> > >
> > >
> > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C619A9@xmb-aln-x02.cisco.com>
> > > "Les Ginsberg (ginsberg)" writes:
> > > >
> > > > So, I am one person who raised this concern to Acee - but the proposal
> > > > outlined by Acee is not what I had in mind. There is no need to use
> > > > "uptime" or to invent some unusual exchange of LSAs prior to Exchange
> > > > state.
> > > >
> > > > Also, in regards to Curtis's comment - it is not DOS attacks that I am
> > > > trying to mitigate here. As he says if an attacker is in your network
> > > > and able to originate credible packets no strategy is safe.
> > > >
> > > > The motivating use case is to minimize disruption of a stable network
> > > > when a new router is added or an existing router is
> > > > replaced/rebooted. In other words non-disruptive handling of the
> > > > common maintenance/upgrade scenarios.
> > > >
> > > > What I have in mind is this:
> > > >
> > > > 1) A router needs a way to advertise that it has been up and running
> > > >    for a minimum length of time - for the sake of discussion let's say
> > > >    20 minutes. Routers then fall into two categories:
> > > >
> > > >   o Old routers (up >= minimum time)
> > > >   o New routers (up < minimum time)
> > > >
> > > > 2) When a duplicate router-id is detected, the first tie breaker is
> > > >    between old routers and new routers. The old router gets to keep
> > > >    its router-id and the new router picks a new router-id.  If both
> > > >    routers are "new" or both routers are "old" then we revert to the
> > > >    existing tie breakers defined in the document (link local address
> > > >    for directly connected routers and fingerprint info for
> > > >    non-neighbors).
> > > >
> > > > 3) Advertisement of the "old/new" state requires a single bit - but it
> > > >    has to be available both in hellos and the new AC-LSA. Adding it to
> > > >    the AC-LSA is easy to do. For hellos, there are two possibilities:
> > > >
> > > >    o Use one of the Options Bits
> > > >    o Use LLS
> > > >
> > > > Be interested in how folks feel about this.
> > > >
> > > >    Les
> > >
> > >
> > > Les,
> > >
> > > Excluding DoS attack, we are talking about a one in 4 billion case
> > > (for any two routers, so with 400 routers, still well under one in 1M)
> > > where two routers hash a MAC address or pick a one time random number
> > > from out of nowhere and end up with the same number.
> > >
> > > If that does happen (and one in 1M is certainly possible), then it
> > > would be nice if the routers always ended up with the same router-id.
> > > This could be accomplished by some fixed method such as hashing a
> > > constant with the first choice of router-id or using the router-id as
> > > a seed for the random number generator (which will pick the same
> > > sequence of random numbers each time).  If this is done, then a
> > > conflict would always produce the same set of next picks.  The set of
> > > routers in a given network would always end up with the same
> > > router-ids once they all came up and if only one went down at a time
> > > then it would always end up with the same router-id when it came up.
> > >
> > > Zero conf was mainly intended for unmanaged networks (motivated by
> > > work in the homenet WG).  In these small unmanaged networks it doesn't
> > > matter which router gets what router-id as long as they end up unique
> > > and convergence is in a reasonable time relative to keeping eyeballs
> > > happy.  It could be applied to enterprise or providers but in either
> > > case having the routers end up with the same router-ids would make for
> > > easier management.
> > >
> > > For your scenario to matter at all with current rules, both routers in
> > > the conflict would have to go down.  If only the one that is preferred
> > > goes down, the other is not going to change its router-id as a result
> > > so when it comes up it gets its first pick with no conflict.  If the
> > > one that was not preferred goes down, it comes up, sees a conflict and
> > > takes its second pick (loses the conflict every time).  It is only if
> > > they both go down and the one that normally loses the conflict comes
> > > up first that there is a change in router-id.  That too can be solved
> > > with a rule that you always come up with the last router-id used.
> > >
> > > Curtis
> > >
> > >
> > > > > -----Original Message-----
> > > > > From: OSPF [mailto:[email protected]] On Behalf Of Acee Lindem
> > > > > Sent: Thursday, February 06, 2014 5:12 PM
> > > > > To: Curtis Villamizar
> > > > > Cc: OSPF List
> > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-
> > > > > autoconfig-05.txt
> > > > >
> > > > > Hi Curtis,
> > > > > I agree and believe the significance of this use case where a new
> > > > > router is inserted into an auto-configured domain has been greatly
> > > > > exaggerated.
> > > > > Thanks,
> > > > > Acee
> > > > > On Feb 5, 2014, at 3:58 PM, Curtis Villamizar <[email protected]> wrote:
> > > > >
> > > > > >
> > > > > > In message <cf17dd4e.2696b%[email protected]>
> > > > > > Acee Lindem writes:
> > > > > >
> > > > > >> The OSPFv3 autoconfiguration draft was cloned and presented in the
> > > > > >> ISIS WG (http://www.ietf.org/id/draft-liu-isis-auto-conf-00.txt). In
> > > > > >> the ISIS WG, there was a concern that the resolution of a duplicate
> > > > > >> system ID did not include the amount of time the router was
> > > > > >> operational when determining which router would need to choose a new
> > > > > >> router ID. With additional complexity, we could incorporate router
> > > > > >> uptime into the resolution process. One way to do this would be to:
> > > > > >>
> > > > > >>     1. Add a Router Uptime TLV to the OSPFv3 AC-LSA. It would include
> > > > > >>        the uptime in seconds.
> > > > > >>
> > > > > >>     2. Use the Router Uptime TLV as the primary determinant in
> > > > > >>        deciding which router must choose a new OSPFv3 Router
> > > > > >>        ID. Router uptimes less than 3600 (MaxAge) seconds apart are
> > > > > >>        considered equal.
> > > > > >>
> > > > > >>     3. When an OSPFv3 Hello is received with a different link-local
> > > > > >>        source address but the same router-id, unicast the OSPFv3
> > > > > >>        AC-LSA to the neighbor so that OSPFv3 duplicate router
> > > > > >>        resolution can proceed as in the case where it is received
> > > > > >>        through the normal flooding process. This is somewhat of a
> > > > > >>        hack as we'd also need to accept OSPF Link State Updates
> > > > > >>        from a neighbor that is not in Exchange State or greater.
> > > > > >>
> > > > > >> An alternative to #3 would be to use Link-Local Signaling (LLS) for
> > > > > >> signaling the contents of the OSPFv3 AC-LSA. However, you'd only want
> > > > > >> to send the Router-Uptime and Router Hardware Fingerprint when a
> > > > > >> duplicate Router-ID is detected. This requires implementing the
> > > > > >> resolution two ways but may be preferable since it doesn't require
> > > > > >> violating the flooding rules.
> > > > > >>
> > > > > >> In any case, I'd like to get other opinions as to whether this problem
> > > > > >> is worth solving.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Acee
> > > > > >
> > > > > >
> > > > > > Acee,
> > > > > >
> > > > > > If the basis for router-id on boot up results in a fixed value, and if
> > > > > > a duplicate will occur on a given network, then which of two duplicate
> > > > > > routers gets that value may change after one of them reboots.  If
> > > > > > uptime is not considered, it will never change as long as one router
> > > > > > stays up at any given time.
> > > > > >
> > > > > > We are talking about a very low probability event (a duplicate) except
> > > > > > if this is a DoS attack and then either using or not using uptime
> > > > > > won't matter since the attacker will claim an impossibly long uptime.
> > > > > >
> > > > > > Curtis
> > > > >
> > > > > _______________________________________________
> > > > > OSPF mailing list
> > > > > [email protected]
> > > > > https://www.ietf.org/mailman/listinfo/ospf

_______________________________________________
OSPF mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ospf
