RE: rfd

2018-12-27 Thread adamv0025
> Randy Bush
> Sent: Tuesday, December 18, 2018 5:40 PM
> 
> do you have rfd on?  with what parms?
> 
> randy

If I remember correctly, the industry has gone back and forth on this several
times now.
First it was deemed good, then some studies came out showing the penalty is
worse than the crime. A couple of years later another study came out suggesting
that if correct parameters are used it should be alright, but I guess by
that time no one cared any more, having already switched it on and off and
on again...

With regard to the comments made here on how many unstable routes it takes
before the whole system or significant parts of it collapse, I could easily
reverse that argument and ask how many badly configured rfd deployments it
takes before the whole system shuts/dampens itself down... (positive vs
negative feedback loop). I guess the ideal solution is somewhere in between.

Personally I think rfd is just the aspirin, i.e. not treating the cause but
merely helping with the headaches.
And I suspect that Interface State Dampening would address 80% of the
route-flaps out there (it works exactly like rfd but treats the cause).
The remainder would be true protocol flaps, caused either by misconfiguration
of the max prefix limit (sessions should stay down), by BGP error handling
(which again can be solved by enhanced BGP error handling), or by genuine bugs.
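
A minimal sketch of the penalty model that rfd and interface dampening share,
as I understand it: each flap adds a fixed penalty, the penalty decays
exponentially with a half-life, the route (or interface) is suppressed above
one threshold and reused once the penalty decays below another. All names and
numeric values here are illustrative assumptions, not anyone's recommended
parameters.

```python
from dataclasses import dataclass

# Illustrative values only (assumptions for this sketch).
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0
HALF_LIFE = 15 * 60.0  # seconds


@dataclass
class DampState:
    penalty: float = 0.0
    last_update: float = 0.0  # time of last update, in seconds
    suppressed: bool = False


def decay(state: DampState, now: float) -> None:
    """Decay the penalty exponentially and release the route if it is low enough."""
    elapsed = now - state.last_update
    state.penalty *= 0.5 ** (elapsed / HALF_LIFE)
    state.last_update = now
    if state.suppressed and state.penalty < REUSE_LIMIT:
        state.suppressed = False


def flap(state: DampState, now: float) -> None:
    """Record one flap: decay first, then add the per-flap penalty."""
    decay(state, now)
    state.penalty += PENALTY_PER_FLAP
    if state.penalty >= SUPPRESS_LIMIT:
        state.suppressed = True
```

With these assumed values, three flaps a minute apart push the penalty over
the suppress limit, and it takes two half-lives without further flaps before
the route is reused.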

adam  



Re: rfd

2018-12-18 Thread Michael Still
In general I agree with the idea here but I would also be interested in the
possibility of running the local route policy engine against routes that
are locally detected to meet a damping condition (user configurable of
course). This would potentially yield the ability to change local_pref as
well as other attributes that may be useful such as MED/metric (which can
be passed to a neighboring AS) and/or communities.
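
A rough sketch of that idea, assuming a hypothetical import hook that hands a
route to a user-supplied policy once its flap penalty crosses a configurable
threshold. The Route type, the policy function, the attribute values and the
community 64500:666 (documentation ASN) are all made up for illustration; no
vendor's policy language is implied.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int = 100
    med: int = 0
    communities: FrozenSet[str] = frozenset()


Policy = Callable[[Route], Route]


def depref_and_tag(route: Route) -> Route:
    """Hypothetical operator policy: lower local_pref, raise MED, tag the route."""
    return replace(
        route,
        local_pref=50,
        med=route.med + 100,
        communities=route.communities | {"64500:666"},
    )


def import_route(route: Route, penalty: float, damp_threshold: float,
                 damped_policy: Policy) -> Route:
    """Run the damped-route policy only when the damping condition is met."""
    if penalty >= damp_threshold:
        return damped_policy(route)
    return route
```

The route is never dropped here; the policy only rewrites attributes, so what
happens next is left to normal best-path selection and export policy.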

On Tue, Dec 18, 2018 at 4:55 PM Job Snijders  wrote:

> Dear Steve,
>
> No worries, I have not forgotten the transitive properties of the
> LOCAL_PREF BGP Path Attribute! :-) You are right that any LOCAL_PREF
> modifications (and the attribute itself), are local to the Autonomous
> System in which they were set, but the effects of such settings can
> percolate further into the routing system.
>
> A great example is the "BGP Graceful Shutdown" mechanism (science
> partially documented in https://tools.ietf.org/html/rfc6198, actual
> specification here https://tools.ietf.org/html/rfc8326). What is
> interesting is that by considering a path (any path, could be flapping)
> my network will propagate alternative paths to my neighboring networks,
> or possibly even *withdraw* my announcement in favor of alternative
> (stable?) paths via competitors.
>
> By attaching a lower LOCAL_PREF value to a given path for a period of
> time as a 'penalty' for flapping, I suspect the visibility of that
> flapping will be greatly reduced. This of course doesn't hold true when
> the only origin of the path is flapping, but in many flapping cases I
> triaged it was clear that only one out of many links was the root of the
> flapping.
>
> I'm not sure I share your concerns about scale, it appears that so far
> we seem to be doing just fine without "route flap dampening, penalty
> type: suppress". No customers ask for it, in fact many are relieved we
> don't use it. None of our peering partners ask for it either. When we
> see oscillating paths we reach out to the offending party and ask them
> to fix it, or take unilateral action within a specific time frame.
>
> Kind regards,
>
> Job
>


-- 
[stillwa...@gmail.com ~]$ cat .signature
cat: .signature: No such file or directory
[stillwa...@gmail.com ~]$


RE: rfd

2018-12-18 Thread Naslund, Steve
I will grant you that no customer ever asked for route dampening.  I also 
realize that RFD is much less important now than in the past.  I come from the 
ARPANET/DDN ages of the Internet and can tell you that RFD was absolutely 
critical in the days of very underpowered routers and very unstable data 
links.  I remember when it was quite hard to maintain a 64k link to some 
locations at all.  There might be less of a need for such a simple RFD but it 
did serve its purpose.  In fact, my main argument on this whole topic is that 
RFD is not relevant enough to waste a lot of effort on a globally accepted 
mechanism.  It is just not the low hanging fruit of routing performance 
improvements.  I see two major improvements to global routing...congestion 
avoidance (which goes a little bit with bandwidth awareness but not exactly) 
and multipath load balancing (which kind of requires a congestion avoidance 
awareness).  Both of these are going to be extremely difficult issues on a 
global scale of adoption but that's what is needed.

Steven Naslund
Chicago IL

>Dear Steve,
>
>No worries, I have not forgotten the transitive properties of the LOCAL_PREF 
>BGP Path Attribute! :-) You are right that any LOCAL_PREF modifications (and 
>the attribute itself), are local to the Autonomous System in which they were 
>set, but the effects of such settings can percolate further into the routing 
>system.
>
>A great example is the "BGP Graceful Shutdown" mechanism (science partially 
>documented in https://tools.ietf.org/html/rfc6198, actual specification here 
>https://tools.ietf.org/html/rfc8326). What is interesting is that by 
>considering a path (any path, could be flapping) my network will propagate 
>alternative paths to my neighboring networks, or possibly even *withdraw* my 
>announcement in favor of alternative
>(stable?) paths via competitors.
>
>By attaching a lower LOCAL_PREF value to a given path for a period of time as 
>a 'penalty' for flapping, I suspect the visibility of that flapping will be 
>greatly reduced. This of course doesn't hold true when the only origin of the 
>path is flapping, but in many flapping cases I triaged it was clear that only 
>one out of many links was the root of the flapping.
>
>I'm not sure I share your concerns about scale, it appears that so far we seem 
>to be doing just fine without "route flap dampening, penalty
>type: suppress". No customers ask for it, in fact many are relieved we don't 
>use it. None of our peering partners ask for it either. When we see 
>oscillating paths we reach out to the offending party and ask them to fix it, 
>or take unilateral action within a specific time frame.
>
>Kind regards,
>
>Job


RE: rfd

2018-12-18 Thread Naslund, Steve
I think you will find that very hard to evaluate since the value of RFD will be 
different in different network regions.  For example, it is probably good 
practice to run RFD toward a customer on an unstable access link.  It might not 
be a good idea to run it on a major backbone link that could possibly flap a 
large number of times in a very short period due to something like a 
maintenance activity.  Also, areas that are largely on fiber 
infrastructure will see RFD in a much different light than areas on largely 
wireless infrastructure that might be subject to momentary interference or 
interruptions.  I think it is safest to say that RFD needs to be evaluated 
and tuned for what you want it to do.  Penalties are never a pleasant thing but 
they prevent lawlessness.  That is exactly what RFD does.  You are the cop that 
decides how to enforce the laws.

In fact in my experience people could also get much better network performance 
overall by properly tuning BGP timers but very few actually do it.  I bet you 
could improve Internet stability way more by doing that.

Steven Naslund
Chicago IL

>What would really be of interest to me would be for those that run RFD to 
>measure its impact on their network (positive or otherwise) so we have 
>something scientific to base the discussion on.
>
>The theory (and practice of old) tells us that RFD is either very good, or 
>very bad. There are probably more folk that have turned it off than run it, or 
>vice versa. Ultimately, if we can get the state of RFD's performance in 2018 
>on an axis, our words will likely carry more weight.
>Mark.


Re: rfd

2018-12-18 Thread Job Snijders
Dear Steve,

No worries, I have not forgotten the transitive properties of the
LOCAL_PREF BGP Path Attribute! :-) You are right that any LOCAL_PREF
modifications (and the attribute itself), are local to the Autonomous
System in which they were set, but the effects of such settings can
percolate further into the routing system.

A great example is the "BGP Graceful Shutdown" mechanism (science
partially documented in https://tools.ietf.org/html/rfc6198, actual
specification here https://tools.ietf.org/html/rfc8326). What is
interesting is that by considering a path (any path, could be flapping)
my network will propagate alternative paths to my neighboring networks,
or possibly even *withdraw* my announcement in favor of alternative
(stable?) paths via competitors.

By attaching a lower LOCAL_PREF value to a given path for a period of
time as a 'penalty' for flapping, I suspect the visibility of that
flapping will be greatly reduced. This of course doesn't hold true when
the only origin of the path is flapping, but in many flapping cases I
triaged it was clear that only one out of many links was the root of the
flapping.
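
To illustrate the effect described above, here is a deliberately simplified
sketch of best-path selection with a LOCAL_PREF penalty on flapping paths. It
compares only LOCAL_PREF and AS path length (the real BGP decision process has
more steps), and the Path type and the FLAP_LOCAL_PREF value are assumptions
made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed penalty value: flapping paths are capped at this LOCAL_PREF.
FLAP_LOCAL_PREF = 10


@dataclass
class Path:
    neighbor: str
    local_pref: int
    as_path_len: int
    flapping: bool = False


def effective_local_pref(path: Path) -> int:
    """LOCAL_PREF after applying the flap penalty, if any."""
    if path.flapping:
        return min(path.local_pref, FLAP_LOCAL_PREF)
    return path.local_pref


def best_path(paths: List[Path]) -> Optional[Path]:
    """Pick the highest effective LOCAL_PREF, then the shortest AS path."""
    if not paths:
        return None
    return max(paths, key=lambda p: (effective_local_pref(p), -p.as_path_len))
```

A flapping customer path that would normally win on LOCAL_PREF loses to a
stable alternative, but it is still selected (and announced) when it is the
only path, so the penalty does not by itself cause an outage.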

I'm not sure I share your concerns about scale, it appears that so far
we seem to be doing just fine without "route flap dampening, penalty
type: suppress". No customers ask for it, in fact many are relieved we
don't use it. None of our peering partners ask for it either. When we
see oscillating paths we reach out to the offending party and ask them
to fix it, or take unilateral action within a specific time frame.

Kind regards,

Job


RE: rfd

2018-12-18 Thread Naslund, Steve
It is an interesting article but confirms a few things to me.  

1.  Only a very small percentage of routes flap, yet they cause an 
inordinate amount of BGP processing.  Would it be more effective to implement 
this route damping mechanism worldwide or try to eliminate the source of the 
instability?

2.  The paper does not suggest how you would implement this on a global basis 
and what the "some people have it and some don't" scenario looks like.  The guy 
in the middle pays the price for your unstable customer.

3.  This affects such a small subset of global routes that we should be 
spending our time on changes to BGP with more global impact (like path 
bandwidth awareness).

Steven Naslund
Chicago IL

>Mainly because propagating a flapping route across the entire Internet is 
>damaging...
>
>
>https://www.researchgate.net/publication/220850232_Route_Flap_Damping_Made_Usable
>
>scott


Re: rfd

2018-12-18 Thread Mark Tinka
What would really be of interest to me would be for those that run RFD
to measure its impact on their network (positive or otherwise) so we
have something scientific to base the discussion on.

The theory (and practice of old) tells us that RFD is either very good,
or very bad. There are probably more folk that have turned it off than
run it, or vice versa. Ultimately, if we can get the state of RFD's
performance in 2018 on an axis, our words will likely carry more weight.

Mark.

On 18/Dec/18 23:24, Naslund, Steve wrote:
>
> Remember always that the local pref is just that, YOUR local
> preference.  Sending that flapping route upstream does not give your
> peer the option to ignore it.  In any case, the downside is that you
> have to process that route and then choose whether or not to use it. 
> It’s like saying “now that you have processed this unstable route and
> burned your CPU cycles, I am now giving you the option not to install
> it into your table”.  Remember also that we are only talking about
> default behavior here.  You always have the option to override it by
> changing timers or penalties, or shutting down RFD altogether.  We are
> only talking about day-to-day operation here.
>
> Also, keep in mind that when we are talking about alternative stable
> paths we are only talking about what your network sees, not the entire
> Internet.  If you as a service provider are experiencing major issues,
> you may see a route to me as stable or unstable but making global
> routing decisions based on that is not sound.  What might be best for
> your customer or your business might not be best for the Internet
> community as a whole.   It is a matter of scale, how many service
> providers can allow how many unstable routes before the entire network
> becomes regionally or globally unstable.  It’s important to remember
> that flapping routes leave a certain amount of data in flight with no
> destination which is detrimental to overall performance.  As we move
> into a V6 world we are again worried about the size of the global
> routing tables and pushing routing performance.  Instability of routes
> is dangerous to systems running near the limits.  Propagating a known
> unstable route would be a major shift in routing policy.  Today, you
> either say you can reach something or you don’t say anything.  Using
> the suggested alternative adds the option of “I might be able to reach
> this but not reliably” which then brings about metrics of “how
> reliably?” and that is a huge shift in how global routing works.  We
> have been struggling with a backbone routing protocol that does not
> really do a good job of understanding bandwidth and multiple paths so
> I would suggest that adding “maybe” routes is not a good idea.
>
> At least using RFD you can explain to your customer why they are not
> reachable rather than explaining how you made a manual decision to
> dump them for the “good of the Internet”.  There is also a business
> penalty to the service provider that exposes instability to the network. 
> People don’t want to peer or send traffic through unstable network
> regions.
>
> Steve
>
> >Hi Steve,
> >
> >Lowering the LP would achieve the outcome you desire, provided there
> >are (stable) alternative paths.
> >
> >What you advocate results in absolute outages in what may already be
> >precarious situations (natural disasters?) - what Saku Ytti suggests
> >seems like a less painful alternative with desirable properties.
> >
> >Kind regards,
> >
> >Job
>



RE: rfd

2018-12-18 Thread Scott Weeks



--- snasl...@medline.com wrote:
From: "Naslund, Steve" 

Mainly because propagating a flapping route across 
the entire Internet is damaging...


https://www.researchgate.net/publication/220850232_Route_Flap_Damping_Made_Usable

scott


RE: rfd

2018-12-18 Thread Naslund, Steve
Remember always that the local pref is just that, YOUR local preference.  
Sending that flapping route upstream does not give your peer the option to 
ignore it.  In any case, the downside is that you have to process that route 
and then choose whether or not to use it.  It’s like saying “now that you have 
processed this unstable route and burned your CPU cycles, I am now giving you 
the option not to install it into your table”.  Remember also that we are only 
talking about default behavior here.  You always have the option to override it 
by changing timers or penalties, or shutting down RFD altogether.  We are only 
talking about day-to-day operation here.

Also, keep in mind that when we are talking about alternative stable paths we 
are only talking about what your network sees, not the entire Internet.  If you 
as a service provider are experiencing major issues, you may see a route to me 
as stable or unstable but making global routing decisions based on that is not 
sound.  What might be best for your customer or your business might not be best 
for the Internet community as a whole.   It is a matter of scale, how many 
service providers can allow how many unstable routes before the entire network 
becomes regionally or globally unstable.  It’s important to remember that 
flapping routes leave a certain amount of data in flight with no destination 
which is detrimental to overall performance.  As we move into a V6 world we are 
again worried about the size of the global routing tables and pushing routing 
performance.  Instability of routes is dangerous to systems running near the 
limits.  Propagating a known unstable route would be a major shift in routing 
policy.  Today, you either say you can reach something or you don’t say 
anything.  Using the suggested alternative adds the option of “I might be able 
to reach this but not reliably” which then brings about metrics of “how 
reliably?” and that is a huge shift in how global routing works.  We have been 
struggling with a backbone routing protocol that does not really do a good job 
of understanding bandwidth and multiple paths so I would suggest that adding 
“maybe” routes is not a good idea.

At least using RFD you can explain to your customer why they are not reachable 
rather than explaining how you made a manual decision to dump them for the 
“good of the Internet”.  There is also a business penalty to the service 
provider that exposes instability to the network.  People don’t want to peer or 
send traffic through unstable network regions.

Steve


>Hi Steve,
>
>Lowering the LP would achieve the outcome you desire, provided there are 
>(stable) alternative paths.
>
>What you advocate results in absolute outages in what may already be 
>precarious situations (natural disasters?) - what Saku Ytti suggests seems like a 
>less painful alternative with desirable properties.
>
>Kind regards,
>
>Job



Re: rfd

2018-12-18 Thread Job Snijders
Hi Steve,

Lowering the LP would achieve the outcome you desire, provided there are
(stable) alternative paths.

What you advocate results in absolute outages in what may already be
precarious situations (natural disasters?) - what Saku Ytti suggests seems like a
less painful alternative with desirable properties.

Kind regards,

Job

On Tue, Dec 18, 2018 at 21:56 Naslund, Steve  wrote:

> Mainly because propagating a flapping route across the entire Internet is
> damaging to performance of things other than your own equipment and that of your
> customer.  It is just "bad manners" to propagate a flapping route to your
> peers and it helps maintain a minimum level of stability that is required
> to keep you "on the Internet".  Imagine a table where 1000s of providers
> are each sending 100s of unstable routes and that those unstable routes
> might be redistributing into various IGPs that may not respond very
> gracefully to rapid table changes (like most distance vector IGPs).  Also
> think of this scenario, your link to your customer might be flapping but
> that same customer might have other carriers advertising the same address
> space over a stable link.  In that case you would be doing a disservice by
> not withdrawing that route and having a local-pref does not help since you
> don't necessarily have visibility to all of your customer's other carrier
> networks.
>
> You do have the ability to clear the RFD timers for a route if you need to
> manually intervene for example when you know for a fact that you fixed the
> problem.  That means that if no one is watching or intervening the network
> will "do the safe thing".
>
> Steven Naslund
> Chicago IL
>
> >
> >I always wondered why does it have to be so binary.
> >
> >I don't want to decide for my customers if partial visibility is better
> >than busy CPU, but I do appreciate stability. Why can't we have local-pref
> >penalty for flapping route. If it's only option, keep offering it, if there
> >are other, more stable options, offer those.
> >
> >--
> >  ++ytti
>


RE: rfd

2018-12-18 Thread Naslund, Steve
Mainly because propagating a flapping route across the entire Internet is 
damaging to performance of things other than your own equipment and that of your 
customer.  It is just "bad manners" to propagate a flapping route to your peers 
and it helps maintain a minimum level of stability that is required to keep you 
"on the Internet".  Imagine a table where 1000s of providers are each sending 
100s of unstable routes and that those unstable routes might be redistributing 
into various IGPs that may not respond very gracefully to rapid table changes 
(like most distance vector IGPs).  Also think of this scenario, your link to 
your customer might be flapping but that same customer might have other 
carriers advertising the same address space over a stable link.  In that case 
you would be doing a disservice by not withdrawing that route and having a 
local-pref does not help since you don't necessarily have visibility to all of 
your customer's other carrier networks.

You do have the ability to clear the RFD timers for a route if you need to 
manually intervene for example when you know for a fact that you fixed the 
problem.  That means that if no one is watching or intervening the network will 
"do the safe thing".

Steven Naslund
Chicago IL

>
>I always wondered why does it have to be so binary.
>
>I don't want to decide for my customers if partial visibility is better than 
>busy CPU, but I do appreciate stability. Why can't we have local-pref penalty 
>for flapping route. If it's only option, keep offering it, if there are other, 
>more stable options, offer those.
>
>--
>  ++ytti


Re: rfd

2018-12-18 Thread Jared Mauch



> On Dec 18, 2018, at 1:45 PM, Mark Tinka  wrote:
> 
> 
> 
> On 18/Dec/18 19:40, Randy Bush wrote:
> 
>> do you have rfd on?  with what parms?
> 
> We don't do it (SEACOM, AS37100).


Similarly 20940 does not use it.  I find it hard to see a case where we would 
turn it on.

- jared

Re: rfd

2018-12-18 Thread Saku Ytti
I always wondered why it has to be so binary.

I don't want to decide for my customers if partial visibility is
better than busy CPU, but I do appreciate stability. Why can't we have
a local-pref penalty for a flapping route? If it's the only option, keep
offering it; if there are other, more stable options, offer those.

-- 
  ++ytti


Re: rfd

2018-12-18 Thread Mark Tinka



On 18/Dec/18 19:40, Randy Bush wrote:

> do you have rfd on?  with what parms?

We don't do it (SEACOM, AS37100).

Mark.



Re: rfd

2018-12-18 Thread Andrew Latham
Route Flap Damping via https://tools.ietf.org/html/rfc2439 for everyone.

On Tue, Dec 18, 2018 at 11:42 AM Randy Bush  wrote:

> do you have rfd on?  with what parms?
>
> randy
>


-- 
- Andrew "lathama" Latham -


Re: rfd

2018-12-18 Thread Job Snijders
On Tue, Dec 18, 2018 at 6:40 PM Randy Bush  wrote:
>
> do you have rfd on?  with what parms?

I assume rfd in this context means "Route Flap Dampening".

NTT / AS 2914 does *not* have Route Flap Dampening configured, as is
documented here
https://us.ntt.net/support/policy/routing.cfm#routedampening

Kind regards,

Job