Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Richard A Steenbergen
On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
> Are there any transit providers out there that accept using the BFD (or
> any other similar) mechanism for eBGP peerings?
> If no, how do you solve the issue with the physical interface state when
> LANPHY connections are used?
> Anyone messing with the BGP timers? If yes, what about multiple LAN
> connections with a single BGP peering?

Well first off LAN PHY has a perfectly useful link state. That's pretty 
much the ONLY thing it has in the way of native OAM, but it does have 
that, and that's normally good enough to bring down your EBGP session 
quickly. Personally I find the risk of false positives when speaking to 
other people's random bad BGP implementations to be too great if you go 
much below 30 sec hold timers (and sadly, even 30 secs is too low for 
some people). We (nLayer) are still waiting for our first customer to 
request BFD, we'd be happy to offer it (with reasonable timer values of 
course). :)
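If someone did ask for lower timers, it's a one-liner on most gear. A
Junos-style sketch (group name and address are invented for
illustration, and 30 secs is about the floor I'd trust):

    set protocols bgp group transit neighbor 192.0.2.1 hold-time 30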

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Tassos Chatzithomaoglou


Richard A Steenbergen wrote on 16/03/2011 19:03:
> On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
>> Are there any transit providers out there that accept using the BFD (or
>> any other similar) mechanism for eBGP peerings?
>> If no, how do you solve the issue with the physical interface state when
>> LANPHY connections are used?
>> Anyone messing with the BGP timers? If yes, what about multiple LAN
>> connections with a single BGP peering?
>
> Well first off LAN PHY has a perfectly useful link state. That's pretty
> much the ONLY thing it has in the way of native OAM, but it does have
> that, and that's normally good enough to bring down your EBGP session
> quickly. Personally I find the risk of false positives when speaking to
> other people's random bad BGP implementations to be too great if you go
> much below 30 sec hold timers (and sadly, even 30 secs is too low for
> some people). We (nLayer) are still waiting for our first customer to
> request BFD, we'd be happy to offer it (with reasonable timer values of
> course). :)


Link state is good for the local connection. If there are multiple 
intermediate optical points (not managed by either party), or a LAN 
switch (IX environment), you won't get any link notification for 
failures beyond what is connected locally to your interface, unless 
there is a mechanism to signal that to you.


--
Tassos




RE: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Jensen Tyler
We are going to turn up BFD with Level3 this Saturday. They require that you 
run a Juniper (per our SE). It sounds like it is fairly new, as there was no 
paperwork to request the service; we had to put it in the notes.

We have many switches between us and Level3, so we don't get an interface 
down event to drop the session in the event of a failure.

-Original Message-
From: Tassos Chatzithomaoglou [mailto:ach...@forthnet.gr] 
Sent: Wednesday, March 16, 2011 1:26 PM
To: nanog@nanog.org
Subject: Re: bfd-like mechanism for LANPHY connections between providers


Richard A Steenbergen wrote on 16/03/2011 19:03:
> On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
>> Are there any transit providers out there that accept using the BFD (or
>> any other similar) mechanism for eBGP peerings?
>> If no, how do you solve the issue with the physical interface state when
>> LANPHY connections are used?
>> Anyone messing with the BGP timers? If yes, what about multiple LAN
>> connections with a single BGP peering?
>
> Well first off LAN PHY has a perfectly useful link state. That's pretty
> much the ONLY thing it has in the way of native OAM, but it does have
> that, and that's normally good enough to bring down your EBGP session
> quickly. Personally I find the risk of false positives when speaking to
> other people's random bad BGP implementations to be too great if you go
> much below 30 sec hold timers (and sadly, even 30 secs is too low for
> some people). We (nLayer) are still waiting for our first customer to
> request BFD, we'd be happy to offer it (with reasonable timer values of
> course). :)



Link state is good for the local connection. If there are multiple 
intermediate optical points (not managed by either party), or a LAN 
switch (IX environment), you won't get any link notification for 
failures beyond what is connected locally to your interface, unless 
there is a mechanism to signal that to you.

--
Tassos




Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Jeff Wheeler
On Wed, Mar 16, 2011 at 2:33 PM, Jensen Tyler jty...@fiberutilities.com wrote:
> We have many switches between us and Level3, so we don't get an interface
> down event to drop the session in the event of a failure.

This is often my topology as well.  I am satisfied with BGP's
mechanism and default timers, and have been for many years.  The
reason for this is quite simple: failures are relatively rare, my
convergence time to a good state is largely bounded by CPU, and I do
not consider a slightly improved convergence time to be worth an
atypical configuration.  Case in point: Richard says that none of his
customers have requested such configuration to date, and you indicate
that Level3 will provision BFD only if you use a certain vendor and
this is handled outside of their normal provisioning process.

For an IXP LAN interface and associated BGP neighbors, I see much more
advantage.  I imagine this will become common practice for IXP peering
sessions long before it is typical to use BFD on
customer/transit-provider BGP sessions.

-- 
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts



Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Richard A Steenbergen
On Wed, Mar 16, 2011 at 02:55:14PM -0400, Jeff Wheeler wrote:
 
> This is often my topology as well.  I am satisfied with BGP's
> mechanism and default timers, and have been for many years.  The
> reason for this is quite simple: failures are relatively rare, my
> convergence time to a good state is largely bounded by CPU, and I do
> not consider a slightly improved convergence time to be worth an
> atypical configuration.  Case in point: Richard says that none of his
> customers have requested such configuration to date, and you indicate
> that Level3 will provision BFD only if you use a certain vendor and
> this is handled outside of their normal provisioning process.

There are still a LOT of platforms where BFD doesn't work reliably 
(without false positives), doesn't work as advertised, doesn't work 
under every configuration (e.g. on SVIs), or doesn't scale very well 
(i.e. it would fall over if you had more than a few neighbors 
configured). The list of caveats is huge, the list of vendors which 
support it well is small, and there should be giant YMMV stickers 
everywhere. But Juniper (M/T/MX series at any rate) is definitely one of 
the better options, though not without its flaws: the inability to 
configure it at the group level and selectively disable it per-peer, and 
the lack of support at the group level where any IPv6 neighbor is 
configured, come to mind.
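In practice that means configuring it per-neighbor. A Junos-style
sketch (group name, address, and timer values are invented
illustrations; 300 ms x 3 works out to ~900 ms of detection time):

    set protocols bgp group peers neighbor 192.0.2.1 bfd-liveness-detection minimum-interval 300
    set protocols bgp group peers neighbor 192.0.2.1 bfd-liveness-detection multiplier 3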

Running BFD with a transit provider is USUALLY the least interesting use 
case, since you're typically connected either directly, or via a metro 
transport service which is capable of passing link state. One possible 
exception to this is when you need to bundle multiple links together, 
but link-agg isn't a good solution, and you need to limit the number of 
EBGP paths to reduce load on the routers. The typical solution for this 
is loopback peering, but this removes link state as your mechanism for 
tearing down BGP during a failure, which is where BFD starts to make 
sense.
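A rough Junos-style sketch of that arrangement (all names and addresses
invented for illustration): static routes to the remote loopback over
each parallel link, one EBGP session between loopbacks, and BFD to get
failure detection back. Whether BFD itself runs cleanly multihop is,
per the caveats above, its own YMMV:

    set routing-options static route 198.51.100.1/32 next-hop 10.0.0.2
    set routing-options static route 198.51.100.1/32 next-hop 10.0.1.2
    set protocols bgp group transit neighbor 198.51.100.1 local-address 198.51.100.2
    set protocols bgp group transit neighbor 198.51.100.1 multihop ttl 2
    set protocols bgp group transit neighbor 198.51.100.1 bfd-liveness-detection minimum-interval 300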

For IX's, where you have an active L2 switch in the middle and no link 
state, BFD makes the most sense. Unfortunately it's the area where we've 
seen the least traction among peers, with "zomg why are you sending me 
these udp packets" complaints outnumbering people interested in 
configuring BFD 10:1.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



RE: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Jensen Tyler
Correct me if I am wrong, but to detect a failure, by default BGP would wait 
out the hold timer, then declare the peer dead and converge.

So you would be looking at 90 seconds (Juniper default?) + CPU-bound 
convergence time to recover? Am I thinking about this right?

-Original Message-
From: Jeff Wheeler [mailto:j...@inconcepts.biz] 
Sent: Wednesday, March 16, 2011 1:55 PM
To: nanog@nanog.org
Subject: Re: bfd-like mechanism for LANPHY connections between providers

On Wed, Mar 16, 2011 at 2:33 PM, Jensen Tyler jty...@fiberutilities.com wrote:
> We have many switches between us and Level3, so we don't get an interface
> down event to drop the session in the event of a failure.

This is often my topology as well.  I am satisfied with BGP's
mechanism and default timers, and have been for many years.  The
reason for this is quite simple: failures are relatively rare, my
convergence time to a good state is largely bounded by CPU, and I do
not consider a slightly improved convergence time to be worth an
atypical configuration.  Case in point: Richard says that none of his
customers have requested such configuration to date, and you indicate
that Level3 will provision BFD only if you use a certain vendor and
this is handled outside of their normal provisioning process.

For an IXP LAN interface and associated BGP neighbors, I see much more
advantage.  I imagine this will become common practice for IXP peering
sessions long before it is typical to use BFD on
customer/transit-provider BGP sessions.

-- 
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts




Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Jeff Wheeler
On Wed, Mar 16, 2011 at 4:42 PM, Jensen Tyler jty...@fiberutilities.com wrote:
> Correct me if I am wrong, but to detect a failure, by default BGP would
> wait out the hold timer, then declare the peer dead and converge.
>
> So you would be looking at 90 seconds (Juniper default?) + CPU-bound
> convergence time to recover? Am I thinking about this right?

This is correct.  Note that 90 seconds isn't just a Juniper default.
This suggested value appeared in RFC 1267 §5.4 (BGP-3) all the way
back in 1991.
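To put rough numbers on the defaults: keepalives go out every 30
seconds (one third of the hold time), and the hold timer restarts on
each keepalive received. After a silent failure, the session drops 90
seconds after the last keepalive that got through, i.e. somewhere
between 60 and 90 seconds after the failure itself, and only then does
convergence begin.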

In my view, configuring BFD for eBGP sessions is risking decreased
MTBF (false positives mean more frequent session failures) for rare
reductions in MTTR.

This is a risk / reward decision that IMO is still leaning towards
lots of risk for little reward.  I'll change my mind about this
when BFD works on most boxes and is part of the standard provisioning
procedure for more networks.  It has already been pointed out that
this is not true today.

If your eBGP sessions are failing so frequently that you are very
concerned about this 90 seconds, I suggest you won't reduce your
operational headaches or customer grief by configuring BFD.  This is
probably an indication that you need to:
1) straighten out the problems with your switching network or transport vendor
2) get better transit
3) depeer some peers who can't maintain a stable connection to you; or
4) sacrifice something to the backhoe deity

Again, in the case of an IXP interface, I believe BFD has much more
potential benefit.

-- 
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts



Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Sudeep Khuraijam
> Correct me if I am wrong, but to detect a failure, by default BGP would
> wait out the hold timer, then declare the peer dead and converge.

Hence the case for BFD.

There's a difference of several orders of magnitude between BFD keepalive 
intervals (in ms) and BGP keepalives (in seconds), with generally 
configurable multipliers vs. the hold timer.
With real-time media and ever-faster last miles, the BGP hold timer may 
find itself inadequate, if not inappropriate, in some cases.
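For example (typical values, not a recommendation): a 300 ms BFD
interval with a multiplier of 3 declares the neighbor down after ~900 ms
of silence, versus up to 90 seconds for the default BGP hold timer:
two orders of magnitude in detection time.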

For a provider to require a vendor instead of RFC compliance is sinful.

Sudeep
On Mar 16, 2011, at 1:42 PM, Jensen Tyler wrote:

Correct me if I am wrong, but to detect a failure, by default BGP would wait 
out the hold timer, then declare the peer dead and converge.

So you would be looking at 90 seconds (Juniper default?) + CPU-bound 
convergence time to recover? Am I thinking about this right?

-Original Message-
From: Jeff Wheeler [mailto:j...@inconcepts.biz]
Sent: Wednesday, March 16, 2011 1:55 PM
To: nanog@nanog.org
Subject: Re: bfd-like mechanism for LANPHY connections between providers

On Wed, Mar 16, 2011 at 2:33 PM, Jensen Tyler jty...@fiberutilities.com wrote:
> We have many switches between us and Level3, so we don't get an interface
> down event to drop the session in the event of a failure.

This is often my topology as well.  I am satisfied with BGP's
mechanism and default timers, and have been for many years.  The
reason for this is quite simple: failures are relatively rare, my
convergence time to a good state is largely bounded by CPU, and I do
not consider a slightly improved convergence time to be worth an
atypical configuration.  Case in point: Richard says that none of his
customers have requested such configuration to date, and you indicate
that Level3 will provision BFD only if you use a certain vendor and
this is handled outside of their normal provisioning process.

For an IXP LAN interface and associated BGP neighbors, I see much more
advantage.  I imagine this will become common practice for IXP peering
sessions long before it is typical to use BFD on
customer/transit-provider BGP sessions.

--
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts




Sudeep Khuraijam | I speak for no one but I

Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Jeff Wheeler
On Wed, Mar 16, 2011 at 8:00 PM, Sudeep Khuraijam
skhurai...@liveops.com wrote:
> There's a difference of several orders of magnitude between BFD keepalive
> intervals (in ms) and BGP keepalives (in seconds), with generally
> configurable multipliers vs. the hold timer.
> With real-time media and ever-faster last miles, the BGP hold timer may
> find itself inadequate, if not inappropriate, in some cases.

For eBGP peerings, your router must re-converge to a good state in < 9
seconds to see an order of magnitude improvement in time-to-repair.
This is typically not the case for transit/customer sessions.
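Rough arithmetic behind that figure (assuming ~1 second BFD detection
and the default 90 second hold time): total time-to-repair is detection
plus convergence, so a 10x improvement requires 1 + C <= (90 + C) / 10,
which works out to C <= ~8.9 seconds of convergence.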

To make a risk/reward choice that is actually based in reality, you
need to understand your total time to re-converge to a good state, and
how much of that is BGP hold-time.  You should then consider whether
changing BGP timers (with its own set of disadvantages) is more or
less practical than using BFD.

Let's put it another way: if CPU/FIB convergence time were not a
significant issue, do you think vendors would be working to optimize
this process, that we would have concepts like MPLS FRR and PIC, and
that each new router product line upgrade comes with a yet-faster CPU?
Of course not.  Vendors would just have said, "hey, let's get
together on a lower hold time for BGP."

As I stated, I'll change my opinion of BFD when implementations
improve.  I understand the risk/reward situation.  You don't seem to
get this, and as a result, your overly-simplistic view is that "BGP
takes seconds and BFD takes milliseconds."

> For a provider to require a vendor instead of RFC compliance is sinful.

Many sins are more practical than the alternatives.

-- 
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts



Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Sudeep Khuraijam

On Mar 16, 2011, at 6:05 PM, Jeff Wheeler wrote:

>> There's a difference of several orders of magnitude between BFD keepalive
>> intervals (in ms) and BGP keepalives (in seconds), with generally
>> configurable multipliers vs. the hold timer.
>> With real-time media and ever-faster last miles, the BGP hold timer may
>> find itself inadequate, if not inappropriate, in some cases.
>
> For eBGP peerings, your router must re-converge to a good state in < 9
> seconds to see an order of magnitude improvement in time-to-repair.
> This is typically not the case for transit/customer sessions.



Not so, if your goal is peer deactivation and failover. Also, you miss 
the point. Once the event is detected, the rest of the process starts. 
I am talking about event detection. One may want a hold timer longer 
than 30 seconds but the peer deactivated instantly on link failure. If 
that's the design goal AND link state is not passed through, then 
BFD-driven BGP deactivation is a good choice.
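A Junos-style sketch of that split (group name, address, and timers are
invented for illustration): keep a conservative hold timer so slow BGP
speakers don't flap the session, and let BFD do the fast deactivation:

    set protocols bgp group peers neighbor 192.0.2.1 hold-time 90
    set protocols bgp group peers neighbor 192.0.2.1 bfd-liveness-detection minimum-interval 300
    set protocols bgp group peers neighbor 192.0.2.1 bfd-liveness-detection multiplier 3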

> To make a risk/reward choice that is actually based in reality, you
> need to understand your total time to re-converge to a good state, and
> how much of that is BGP hold-time.  You should then consider whether
> changing BGP timers (with its own set of disadvantages) is more or
> less practical than using BFD.



Yes, I see that, and I mentioned "in some cases", not all or most cases.


> Let's put it another way: if CPU/FIB convergence time were not a
> significant issue, do you think vendors would be working to optimize

This is orthogonal to my point. The table-size taxes, best-path 
algorithms, and the speed with which you can rewrite the FIB in the 
ASICs are constant in both cases. But that's post-event.
> this process, that we would have concepts like MPLS FRR and PIC, and

Those are out of scope in the context of this thread and have completely 
different roles.

> that each new router product line upgrade comes with a yet-faster CPU?


For things they can sell more licenses for, such as 3DES, keying 
algorithms, virtual instances, and other things on top of BGP; stuff 
that allows service providers to charge a lot more money while running 
on common infrastructure, such as MPLS FRR and a zillion other things 
like stateful redundancy, higher housekeeping needs, in-service 
upgrades, and anything else with a list price. And it's cheaper than 
the old CPU.

> Of course not.  Vendors would just have said, "hey, let's get
> together on a lower hold time for BGP."


Because it would be horrible code design. Link detection is a common 
service. Besides, BGP process threads can run longer than the minimum 
intervals for the link. Vendors would have to write checkpoints within 
the BGP code to come up and service a link state machine. And wait, 
it's a user-configurable checkpoint! So came BFD: write a simple state 
machine and make it available to all protocols.


> As I stated, I'll change my opinion of BFD when implementations
> improve.  I understand the risk/reward situation.  You don't seem to
> get this, and as a result, your overly-simplistic view is that "BGP
> takes seconds and BFD takes milliseconds."

I have no doubt that you understand your risk/reward, but you don't 
for every other environment.

For event detection leading to a state change leading to peer 
deactivation, my "overly-simplistic view" is the fact (not as you put 
it, but as it was written, unedited). How you want to act in response 
is dependent on design.

> is that "BGP takes seconds and BFD takes milliseconds."

That's what you read, not what I wrote. I was comparing the speed of 
event detection.

Now, like I said, for speed of deactivation the BGP hold timer "may 
find itself inadequate, if not inappropriate, in some cases" in this 
same context. But as I mentioned, we don't know the pain we are trying 
to solve for, or the requirements that drove this thread in the first 
place. So I simply put out the facts and a business driver.


BFD is no different than deactivating a peer based on link failure. 
Your view is that there is no case for it. My point is: it arrived 
yesterday; it's just a damn hard thing to monetize upstream in transit.


>> For a provider to require a vendor instead of RFC compliance is sinful.
>
> Many sins are more practical than the alternatives.

Few, maybe.


--
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts