Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt

Acee Lindem Mon, 10 Feb 2014 14:18:08 -0800

Hi Curtis, 

See inline.


On 2/9/14 12:38 PM, "Curtis Villamizar" <[email protected]> wrote:

>
>Les,
>
>Perhaps you should read the abstract of the document you are
>commenting about:
>
>   OSPFv3 is a candidate for deployments in environments where auto-
>   configuration is a requirement.  One such environment is the IPv6
>   home network where users expect to simply plug in a router and have
>   it automatically use OSPFv3 for intra-domain routing.  This
>   document describes the necessary mechanisms for OSPFv3 to be
>   self-configuring.
>
>Home network!
>
>Or the introduction:
>
>   OSPFv3 [OSPFV3] is a candidate for deployments in environments
>   where auto-configuration is a requirement.
>
>   [...]
>
> 1.2. Acknowledgments
>
>      This specification was inspired by the work presented in the
>      Homenet working group meeting in October 2011 in Philadelphia,
>      Pennsylvania.
>
>The Homenet WG works on what?  Home networks!
>
>So please keep that in mind when commenting.
>
>Unless a provider were to be so stupid or lazy to use this on a SP
>network then most of the comments from both of us don't apply,
>*except* the few comments below about "in a home network".
>
>Perhaps the draft should add text explicitly stating that the last
>router-id used successfully should be used on a reboot rather than a
>new random number.  I notice that only the Router-Hardware-Fingerprint
>TLV is persistent across reboots.  This is insufficient if we want to
>minimize disruption.
>
>The only case then (if router-id is remembered across reboots) would
>be a new router.  In that case your uptime rule would help.  So
>perhaps two things could be recommended:
>
>  1.  In section 4, include a "SHOULD remember the most recent
>      successfully used router-id across reboots and reuse that".
>      Reword the rest so if that information is not available, then
>      pick a random number.

I will do this. 
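
A minimal sketch of item 1 above, for illustration only: reuse the
last successfully used router-id if one was saved, otherwise fall
back to a random pick.  The draft does not specify a file format,
path, or function names; load_or_pick_router_id, remember_router_id,
the JSON layout, and the exclusion of 0 are all assumptions made for
this example.

```python
import json
import os
import secrets


def load_or_pick_router_id(state_path: str) -> int:
    """Reuse the last successfully used router-id, else pick a random one."""
    try:
        with open(state_path) as f:
            saved = json.load(f)["router_id"]
        if isinstance(saved, int) and 0 < saved < 2 ** 32:
            return saved
    except (OSError, ValueError, KeyError, TypeError):
        pass  # no usable history: fall through to a random pick
    # Assumption: 0 (0.0.0.0) is excluded, so draw from 1..2^32-1.
    return secrets.randbelow(2 ** 32 - 1) + 1


def remember_router_id(state_path: str, router_id: int) -> None:
    """Persist router_id once it has been used without a conflict."""
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"router_id": router_id}, f)
    # Rename over the old file so a power cut cannot leave a torn file.
    os.replace(tmp, state_path)
```

A router would call remember_router_id only after the duplicate
check has passed, so that a router-id that lost a conflict is never
the one restored on the next boot.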



>
>  2.  a.  In section 6, mention the uptime rule.  Modify the Router
>          Uptime TLV as suggested.
>
>      b.  Alternately add a flag to the Router-Hardware-Fingerprint
>         TLV that indicates that since last reboot this router-id has
>         been used and achieved a "full state".  A router just
>         rebooting would not have ever reached the full state before
>         noticing a conflict as long as the conflict check is run
>         before considering itself in the full state.
>
>          Note: A second flag bit indicating that this router-id had
>         been used successfully in a past reboot might also help but
>         would only matter among two routers both rebooting and
>         neither having reached the full state.
>
>I think #1 above is sufficient and does more to prevent surprises.

I agree and appreciate your arguments in previous messages in this thread.


> I
>think #2 above helps only in the new router case but #2a requires
>adding a TLV and so isn't worth it IMHO.  Case #2b accomplished the
>same thing with only a flag.  I would not object to #2b above if #1
>above is also added.

I agree that this would be a better mechanism and would only represent a
single modification to the hardware fingerprint TLV. However, I really
don't think even this is necessary.
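
For readers following the #2b discussion, here is a rough sketch of
how such a flag could sit in front of the existing fingerprint
comparison.  Nothing here comes from the draft text: the Claim type,
the reached_full field name, and the choice that the smaller
fingerprint yields in the fallback are assumptions made for this
example, and fingerprints are assumed to be of equal length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """What one router advertises about a contested router-id (hypothetical)."""
    fingerprint: bytes   # Router-Hardware-Fingerprint TLV value
    reached_full: bool   # proposed flag: reached full state with this id since reboot


def must_renumber(local: Claim, remote: Claim) -> bool:
    """Return True if the local router should pick a new router-id."""
    if local.reached_full != remote.reached_full:
        # Exactly one side is established, so the newcomer yields.
        return not local.reached_full
    # Both established or both new: fall back to the fingerprint.
    # Assumption: the smaller fingerprint yields; equal values yield nobody.
    return local.fingerprint < remote.fingerprint
```

With this ordering a freshly booted router always gives way to one
that has already reached the full state, and the fingerprint only
decides the outcome when the flag does not.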

Thanks,
Acee 



>
>See inline anyway.
>
>In message 
><[email protected]>
>"Les Ginsberg (ginsberg)" writes:
>> 
>> Curtis -
>>  
>> > -----Original Message-----
>> > From: Curtis Villamizar [mailto:[email protected]]
>> > Sent: Saturday, February 08, 2014 7:30 AM
>> > To: Les Ginsberg (ginsberg)
>> > Cc: [email protected]; Acee Lindem; OSPF List
>> > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-
>> > autoconfig-05.txt
>> > 
>> > 
>> > In message 
>><[email protected]>
>> > "Les Ginsberg (ginsberg)" writes:
>> > 
>> > > Curtis -
>> > >
>> > > Your reply below is talking about things which I think do not
>>directly
>> > > bear on the value add of what I have proposed.
>> > >
>> > > You mention various ways to insure that a given device assigns the
>> > > same router-id each time it starts up and ways to insure it picks
>>the
>> > > same sequence of second/third... choices in the event it has to
>>change
>> > > its router-id. All good suggestions, but what I am talking about is
>> > > what we do in the event a conflict occurs despite our best efforts
>>to
>> > > avoid it. With the current draft content preference is based solely
>>on
>> > > a fixed identifier (fingerprint) without regard to which choice
>>would
>> > > minimize disruption to the network. When preference is given to the
>> > > "old router" to retain its existing router-id this shortcoming is
>> > > addressed.
>> > 
>> > In the lifetime of a router it only gets added once.  In the lifetime
>> > of a router we would hope it only reboots zero times but experience so
>> > far has been that reboots over a router's lifetime tend to be > 0 and
>> > in some cases >> 0.
>> > 
>> > So you are optimizing for a 1 in 4 billion occurrence that can happen
>> > only once in the lifetime of a router.
>>  
>> The entire duplicate router-id resolution logic is addressing the
>>  improbable case. My proposal adds - literally - one line of code to
>>  the logic used to decide whether "I" should change my router-id or
>>  whether "you" should change your router-id.
>>  
>> > 
>> > We also need to look at the consequences of this very improbable
>> > occurrence.  Today's routers accomplish IGP convergence in large
>> > networks in subsecond times, in some cases << 1 second.
>> > 
>> > Note that if flooding is completed (both withdraw old and install new)
>> > in less than the SPF delay which is commonly implemented (some delay
>> > after receiving the first flooded IGP change), then there is no impact
>> > on routing.
>>  
>> Your analysis does not apply to this scenario. The router which
>>  changes its router-id is effectively doing a cold start. All
>>  adjacencies will go down. All LSAs originated by this router become
>>  invalid. All routes will be removed from the forwarding plane. If
>>  you are running BGP all the BGP nexthops will be gone on the router
>>  which is changing its identity. Restoration of the adjacencies and
>>  reacquisition of the LSDB will take multiple seconds. The best you
>>  can hope for is several seconds of disruption - it could easily be
>>  much longer.
>>  
>> For the new node which has usurped the old node's identity it will
>>  have to purge/replace all of the LSAs generated by the old
>>  node. While normal operation of the update process will insure that
>>  this happens in a reliable way the amount of flooding network-wide
>>  required to bring up a new node has now been roughly doubled
>>  i.e. the old node must reissue all of its LSAs using a new identity
>>  and the new node must purge/replace the old node's LSAs with its
>>  own versions. This will result in multiple SPFs on all nodes in the
>>  network and likely cause loops/blackholes during the transition
>>  since some of the SPFs will be run on versions of the LSDB which
>>  are inaccurate (part old node's old LSAs and part new node's
>>  LSAs). Suggesting that this could be handled in the same way/time
>>  as we typically handle a single link failure isn't credible.
>
>All routers are supposed to keep a fixed router-id across reboots.  If
>interfaces are changed when down, the last used router-id should be on
>flash.  If flash is removed and replaced (rather than a new image
>installed), then with the same set of interfaces, the same decision
>should be made.  We are down to a very special case where both flash
>and interfaces are removed and replaced yielding no history and a
>different set of MACs to pick from.
>
>> > > Your statement that what I propose is only relevant when two routers
>> > > go down does not match the scenarios I envision. If I want to add a
>> > > new device to my network or if I need to replace an existing device
>>in
>> > > my network I am only affecting one device - but as I am introducing
>>a
>> > > device with a new fingerprint it is possible that it will introduce
>>a
>> > > conflict with an existing router-id.
>> > 
>> > In provider networks routers are generally added during maintenance
>> > windows so should anything unexpected happen, impact is minimized.
>> > 
>> > In home nets, the home user isn't going to notice the convergence time
>> > if there is any.  A 10 msec SPF delay is likely to be plenty.
>>  
>> As I stated above, disruption will be orders of magnitude longer than
>>10 ms. 
>
>In a home net?  With perhaps a half dozen routers and a default route?
>Someone has a very bad OSPF implementation.  :-)  Or did you miss the
>"In home nets" at the front of the paragraph.
>
>For example, in a 10 node network with average degree 4, perhaps 40
>links in 10 router LSA exist.  A few RTT (less than 1 msec for a
>homenet) for each neighbor adjacency (which happen in parallel) and
>ten packets from 4 sources is needed to reach the full state followed
>by one SPF to be fully up and running.  Other routers get one
>additional router LSA plus four new links in existing router LSA and
>have to run an SPF.  Even on a software based homenet router using an
>ARM, 10 msec is likely to be enough time and if it is "orders of
>magnitude" longer, something is wrong with one of the implementations.
>This would be a more complicated than usual home net or even soho,
>more likely a small business.
>
>> > > In a subsequent reply you liked the idea of the new device delaying
>> > > advertising reachability until it has determined that its
>>router-id
>> > > choice is not in conflict. The old/new router paradigm supports this
>> > > strategy by assuring that the old router will not consider changing
>> > > its router-id until enough time has elapsed for the new router to
>> > > transition to being an old router.
>> > 
>> > If it wins the coin toss, the router would advertise at least one LSA
>> > to indicate its existence and could hold back on any additional
>> > advertisements until the other router has withdrawn routes.
>> > 
>>  
>> This suggestion does not alter the fact that if the old node changes
>> its router-id the network has to respond to three events:
>>  
>> 1)Loss of the old node
>> 2)Introduction of the old-node with a new identity
>> 3)Introduction of the new node with the identity of the old-node
>
>Again, the old node should remember the last router-id used and try to
>reuse it.
>
>> If however we insure that the old-node does not change its identity
>>  then the network only has to respond to a single event - the
>>  introduction of the new-node.
>
>Yes and if it were up and won the resolution last time, it will have
>saved that router-id and will reuse it.  If it came up previously and
>lost the resolution, then it will remember the router-id it used,
>whether second or third pick, and use that.
>
>> > > Finally, what I propose is extremely simple to implement. I think it
>> > > isn't much of an exaggeration to say that any one of us could have
>> > > implemented the enhancement in the time it has taken to discuss its
>> > > merits. So we aren't overengineering for a case which is admittedly
>> > > very unlikely to occur - we are adding a modest extension to make
>>our
>> > > solution less disruptive.
>> > 
>> > Yes but it is *bad* for the more common case where routers go down
>> > occasionally.
>>  
>> You are going to have to clarify exactly what "bad side effects" you
>> see for what I propose - because I don't see any - whereas I do
>> see benefits as described above.
>
>If router-id is not remembered between reboots, then there is the one
>in 4 billion times the number of routers (less than 10 for a home net
>today, but maybe more in the future).
>
>If router-id is remembered between reboots, then no matter how long a
>router has been down, if nothing else in the network changed, there is
>zero chance of having a collision.
>
>With either method, if router-id is remembered between reboots, then
>there is zero chance of collision.
>
>IMO should this ever be used on a managed network (including a home
>net / soho / small business net that happens to be managed) then
>having routers come back from a reboot with the same router-ids would
>be a big plus.  For example, after a power outage NMS discovery would
>not have to be repeated.
>
>>    Les
>>  
>>  
>> > 
>> > >    Les
>> > 
>> > Curtis
>> > 
>> > 
>> > > > -----Original Message-----
>> > > > From: Curtis Villamizar [mailto:[email protected]]
>> > > > Sent: Friday, February 07, 2014 9:22 AM
>> > > > To: Les Ginsberg (ginsberg)
>> > > > Cc: Acee Lindem; Curtis Villamizar; OSPF List
>> > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration -
>>draft-ietf-ospf-ospfv3-
>> > > > autoconfig-05.txt
>> > > >
>> > > >
>> > > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C619A9@xmb-aln-
>> > x02.cisco.com>
>> > > > "Les Ginsberg (ginsberg)" writes:
>> > > > >
>> > > > > So, I am one person who raised this concern to Acee - but the
>>proposal
>> > > > > outlined by Acee is not what I had in mind. There is no need to
>>use
>> > > > > "uptime" or to invent some unusual exchange of LSAs prior to
>>Exchange
>> > > > > state.
>> > > > >
>> > > > > Also, in regards to Curtis's comment - it is not DOS attacks
>>that I am
>> > > > > trying to mitigate here. As he says if an attacker is in your
>>network
>> > > > > and able to originate credible packets no strategy is safe.
>> > > > >
>> > > > > The motivating use case is to minimize disruption of a stable
>>network
>> > > > > when a new router is added or an existing router is
>> > > > > replaced/rebooted. In other words non-disruptive handling of the
>> > > > > common maintenance/upgrade scenarios.
>> > > > >
>> > > > > What I have in mind is this:
>> > > > >
>> > > > > 1) A router needs a way to advertise that it has been up and
>>running
>> > > > >    for a minimum length of time - for the sake of discussion
>>let's say
>> > > > >    20 minutes. Routers then fall into two categories:
>> > > > >
>> > > > >   o Old routers (up >= minimum time)
>> > > > >   o New routers (up < minimum time)
>> > > > >
>> > > > > 2) When a duplicate router-id is detected, the first tie
>>breaker is
>> > > > >    between old routers and new routers. The old router gets to
>>keep
>> > > > >    its router-id and the new router picks a new router-id.  If
>>both
>> > > > >    routers are "new" or both routers are "old" then we revert
>>to the
>> > > > >    existing tie breakers defined in the document (link local
>>address
>> > > > >    for directly connected routers and fingerprint info for
>> > > > >    non-neighbors).
>> > > > >
>> > > > > 3) Advertisement of the "old/new" state requires a single bit -
>>but it
>> > > > >    has to be available both in hellos and the new AC-LSA.
>>Adding it to
>> > > > >    the AC-LSA is easy to do. For hellos, there are two
>>possibilities:
>> > > > >
>> > > > >    o Use one of the Options Bits
>> > > > >    o Use LLS
>> > > > >
>> > > > > Be interested in how folks feel about this.
>> > > > >
>> > > > >    Les
>> > > >
>> > > >
>> > > > Les,
>> > > >
>> > > > Excluding DoS attack, we are talking about a one in 4 billion case
>> > > > (for any two routers, so with 400 routers, still well under one
>>in 1M)
>> > > > where two routers hash a MAC address or pick a one time random
>>number
>> > > > from out of nowhere and end up with the same number.
>> > > >
>> > > > If that does happen (and one in 1M is certainly possible), then it
>> > > > would be nice if the routers always ended up with the same
>>router-id.
>> > > > This could be accomplished by some fixed method such as hashing a
>> > > > constant with the first choice of router-id or using the
>>router-id as
>> > > > a seed for the random number generator (which will pick the same
>> > > > sequence of random numbers each time).  If this is done, then a
>> > > > conflict would always produce the same set of next picks.  The
>>set of
>> > > > routers in a given network would always end up with the same
>> > > > router-ids once they all came up and if only one went down at a
>>time
>> > > > then it would always end up with the same router-id when it came
>>up.
>> > > >
>> > > > Zero conf was mainly intended for unmanaged networks (motivated by
>> > > > work in the homenet WG).  In these small unmanaged networks it
>>doesn't
>> > > > matter which router gets what router-id as long as they end up
>>unique
>> > > > and convergence is in a reasonable time relative to keeping
>>eyeballs
>> > > > happy.  It could be applied to enterprise or providers but in
>>either
>> > > > case having the routers end up with the same router-ids would
>>make for
>> > > > easier management.
>> > > >
>> > > > For your scenario to matter at all with current rules, both
>>routers in
>> > > > the conflict would have to go down.  If only the one that is
>>preferred
>> > > > goes down, the other is not going to change its router-id as a
>>result
>> > > > so when it comes up it gets its first pick with no conflict.  If
>>the
>> > > > one that was not preferred goes down, it comes up, sees a
>>conflict and
>> > > > takes its second pick (loses the conflict every time).  It is
>>only if
>> > > > they both go down and the one that normally loses the conflict
>>comes
>> > > > up first that there is a change in router-id.  That too can be
>>solved
>> > > > with a rule that you always come up with the last router-id used.
>> > > >
>> > > > Curtis
>> > > >
>> > > >
>> > > > > > -----Original Message-----
>> > > > > > From: OSPF [mailto:[email protected]] On Behalf Of Acee
>>Lindem
>> > > > > > Sent: Thursday, February 06, 2014 5:12 PM
>> > > > > > To: Curtis Villamizar
>> > > > > > Cc: OSPF List
>> > > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration -
>>draft-ietf-ospf-
>> > ospfv3-
>> > > > > > autoconfig-05.txt
>> > > > > >
>> > > > > > Hi Curtis,
>> > > > > > I agree and believe the significance of this use case where a
>>new
>> > router
>> > > > is
>> > > > > > inserted into an auto-configured domain has been greatly
>>exaggerated.
>> > > > > > Thanks,
>> > > > > > Acee
>> > > > > > On Feb 5, 2014, at 3:58 PM, Curtis Villamizar
>><[email protected]>
>> > > > wrote:
>> > > > > >
>> > > > > > >
>> > > > > > > In message <cf17dd4e.2696b%[email protected]>
>> > > > > > > Acee Lindem writes:
>> > > > > > >
>> > > > > > >> The OSPFv3 autoconfiguration draft was cloned and
>>presented in the
>> > > > > > >> ISIS WG
>>(http://www.ietf.org/id/draft-liu-isis-auto-conf-00.txt).
>> > In
>> > > > > > >> the ISIS WG, there was a concern that the resolution of a
>> > duplicate
>> > > > > > >> system ID did not include the amount of time the router was
>> > > > > > >> operational when determining which router would need to
>>choose a
>> > new
>> > > > > > >> router ID. With additional complexity, we could
>>incorporate router
>> > > > > > >> uptime into the resolution process. One way to do this
>>would be
>> > to:
>> > > > > > >>
>> > > > > > >>     1. Add a Router Uptime TLV to the OSPFv3 AC-LSA. It
>>would
>> > include
>> > > > > > >>        the uptime in seconds.
>> > > > > > >>
>> > > > > > >>     2. Use the Router Uptime TLV as the primary
>>determinant in
>> > > > > > >>        deciding which router must choose a new OSPFv3
>>Router
>> > > > > > >>        ID. Router uptimes less than 3600 (MaxAge) seconds
>>apart
>> > are
>> > > > > > >>        considered equal.
>> > > > > > >>
>> > > > > > >>     3. When an OSPFv3 Hello is received with a different
>>link-
>> > local
>> > > > > > >>         source address but the same router-id, unicast the
>> > OSPFv3
>> > > > > > >>         AC-LSA to the neighbor so that OSPFv3 duplicate router
>> > > > > > >>         resolution can proceed as in the case where it is
>>received
>> > > > > > >>         through the normal flooding process. This is somewhat
>>of a
>> > > > > > >>         hack as we'd also need to accept OSPF Link State
>> > Updates
>> > > > > > >>         from a neighbor that is not in Exchange State or
>>greater.
>> > > > > > >>
>> > > > > > >> An alternative to #3 would be to use Link-Local Signaling
>>(LLS)
>> > for
>> > > > > > >> signaling the contents of the OSPFv3 AC-LSA. However,
>>you'd only
>> > want
>> > > > > > >> to send the Router-Uptime and Router Hardware Fingerprint
>>when a
>> > > > > > >> duplicate Router-ID is detected. This requires
>>implementing the
>> > > > > > >> resolution two ways but may be preferable since it doesn't
>>require
>> > > > > > >> violating the flooding rules.
>> > > > > > >>
>> > > > > > >> In any case, I'd like to get other opinions as to whether
>>this
>> > problem
>> > > > > > >> is worth solving.
>> > > > > > >>
>> > > > > > >> Thanks,
>> > > > > > >> Acee
>> > > > > > >
>> > > > > > >
>> > > > > > > Acee,
>> > > > > > >
>> > > > > > > If the basis for router-id on boot up results in a fixed
>>value, and
>> > if
>> > > > > > > a duplicate will occur on a given network, then which of two
>> > duplicate
>> > > > > > > routers gets that value may change after one of them
>>reboots.  If
>> > > > > > > uptime is not considered, it will never change as long as
>>one
>> > router
>> > > > > > > stays up at any given time.
>> > > > > > >
>> > > > > > > We are talking about a very low probability event (a
>>duplicate)
>> > except
>> > > > > > > if this is a DoS attack and then either using or not using
>>uptime
>> > > > > > > won't matter since the attacker will claim an impossibly
>>long
>> > uptime.
>> > > > > > >
>> > > > > > > Curtis

_______________________________________________
OSPF mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ospf
