Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
> Are there any transit providers out there that accept using the BFD (or
> any other similar) mechanism for eBGP peerings? If not, how do you solve
> the issue with the physical interface state when LANPHY connections are
> used? Anyone messing with the BGP timers? If yes, what about multiple
> LAN connections with a single BGP peering?

Well, first off, LAN PHY has a perfectly useful link state. That's pretty much the ONLY thing it has in the way of native OAM, but it does have that, and that's normally good enough to bring down your eBGP session quickly.

Personally, I find the risk of false positives when speaking to other people's random bad BGP implementations to be too great if you go much below 30-second hold timers (and sadly, even 30 seconds is too low for some people). We (nLayer) are still waiting for our first customer to request BFD; we'd be happy to offer it (with reasonable timer values, of course). :)

--
Richard A Steenbergen <r...@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
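An aside on RAS's 30-second figure: per RFC 4271, each side advertises a hold time in its OPEN message and the session uses the smaller of the two, so one conservative side is enough to pull the timer down. A minimal sketch of that negotiation (my own illustration, not from the thread; the hold/3 keepalive is a common implementation default, not a mandate):

```python
# Sketch of RFC 4271 hold-time negotiation: the effective hold time is the
# minimum of what the two peers advertise in their OPEN messages. Values of
# 1 or 2 seconds are forbidden; 0 disables keepalives entirely.

def negotiate_hold_time(local_proposed: int, peer_proposed: int) -> tuple[int, int]:
    """Return (effective_hold_time, keepalive_interval) in seconds."""
    hold = min(local_proposed, peer_proposed)
    if 0 < hold < 3:
        raise ValueError("RFC 4271 forbids hold times of 1 or 2 seconds")
    keepalive = hold // 3  # common default: keepalives at one third of hold
    return hold, keepalive

# A router configured with a conservative 30 s talking to a 90 s-default peer:
print(negotiate_hold_time(30, 90))  # -> (30, 10)
```

So running a sub-90-second hold timer only requires one side to propose it, which is why "other people's random bad BGP implementations" become your problem.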
Re: bfd-like mechanism for LANPHY connections between providers
Richard A Steenbergen wrote on 16/03/2011 19:03:
> Well, first off, LAN PHY has a perfectly useful link state. That's
> pretty much the ONLY thing it has in the way of native OAM, but it does
> have that, and that's normally good enough to bring down your eBGP
> session quickly.

Link state is good for the local connection. If there are multiple intermediate optical points (not managed by either party), or a LAN switch (IX environment), you won't get any link notification for anything not connected locally to your interface, unless there is a mechanism to signal that to you.

--
Tassos
RE: bfd-like mechanism for LANPHY connections between providers
We are going to turn up BFD with Level3 this Saturday. They require that you run a Juniper (per our SE). It sounds like it is fairly new, as there was no paperwork to request the service; we had to put it in the notes.

We have many switches between us and Level3, so we don't get an interface-down event to drop the session in the event of a failure.

-----Original Message-----
From: Tassos Chatzithomaoglou [mailto:ach...@forthnet.gr]
Sent: Wednesday, March 16, 2011 1:26 PM
To: nanog@nanog.org
Subject: Re: bfd-like mechanism for LANPHY connections between providers

> Link state is good for the local connection. If there are multiple
> intermediate optical points (not managed by either party), or a LAN
> switch (IX environment), you won't get any link notification for
> anything not connected locally to your interface, unless there is a
> mechanism to signal that to you.
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 2:33 PM, Jensen Tyler <jty...@fiberutilities.com> wrote:
> We have many switches between us and Level3, so we don't get an
> interface-down event to drop the session in the event of a failure.

This is often my topology as well. I am satisfied with BGP's mechanism and default timers, and have been for many years. The reason for this is quite simple: failures are relatively rare, my convergence time to a good state is largely bounded by CPU, and I do not consider a slightly improved convergence time to be worth an atypical configuration. Case in point: Richard says that none of his customers have requested such a configuration to date, and you indicate that Level3 will provision BFD only if you use a certain vendor, handled outside of their normal provisioning process.

For an IXP LAN interface and its associated BGP neighbors, I see much more advantage. I imagine this will become common practice for IXP peering sessions long before it is typical to use BFD on customer/transit-provider BGP sessions.

--
Jeff S Wheeler <j...@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 02:55:14PM -0400, Jeff Wheeler wrote:
> This is often my topology as well. I am satisfied with BGP's mechanism
> and default timers, and have been for many years.

There are still a LOT of platforms where BFD doesn't work reliably (without false positives), doesn't work as advertised, doesn't work under every configuration (e.g. on SVIs), or doesn't scale very well (i.e. it would fall over if you had more than a few neighbors configured). The list of caveats is huge, the list of vendors which support it well is small, and there should be giant YMMV stickers everywhere. But Juniper (M/T/MX series, at any rate) is definitely one of the better options (though not without its flaws; the inability to configure at the group level and selectively disable per peer, and the lack of support at the group level when any IPv6 neighbor is configured, come to mind).

Running BFD with a transit provider is USUALLY the least interesting use case, since you're typically connected either directly or via a metro transport service which is capable of passing link state. One possible exception is when you need to bundle multiple links together, but link-agg isn't a good solution and you need to limit the number of eBGP paths to reduce load on the routers. The typical solution for this is loopback peering, but that kills your link-state mechanism for tearing down BGP during a failure, which is where BFD starts to make sense.
For IXs, where you have an active L2 switch in the middle and no link state, BFD makes the most sense. Unfortunately, it's the area where we've seen the least traction among peers, with "zomg why are you sending me these udp packets" complaints outnumbering people interested in configuring BFD 10:1.

--
Richard A Steenbergen <r...@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
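Those mystery UDP packets have a fixed, easily recognized shape. Single-hop BFD rides UDP to destination port 3784 (RFC 5881), and every control packet starts with the 24-byte mandatory section defined in RFC 5880 §4.1. A decoder sketch (my own illustration; the sample packet values are invented):

```python
import struct

# Parse the 24-byte mandatory section of a BFD control packet (RFC 5880):
# version/diag, state/flags, detect multiplier, length, two discriminators,
# and the advertised TX/RX intervals in microseconds.

STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}

def parse_bfd(payload: bytes) -> dict:
    (vers_diag, state_flags, detect_mult, length,
     my_disc, your_disc, min_tx, min_rx, min_echo_rx) = struct.unpack(
        "!BBBBIIIII", payload[:24])
    return {
        "version": vers_diag >> 5,          # top 3 bits
        "diag": vers_diag & 0x1F,           # bottom 5 bits
        "state": STATES[state_flags >> 6],  # top 2 bits of byte 1
        "detect_mult": detect_mult,
        "my_disc": my_disc,
        "your_disc": your_disc,
        "desired_min_tx_us": min_tx,
        "required_min_rx_us": min_rx,
    }

# A hypothetical peer in Up state, multiplier 3, advertising 300 ms timers:
pkt = struct.pack("!BBBBIIIII", 0x20, 0xC0, 3, 24, 1, 2, 300_000, 300_000, 0)
info = parse_bfd(pkt)
print(info["state"], info["detect_mult"])  # -> Up 3
```

A peer who filters or rate-limits unexpected UDP will break the session, which is exactly the operational friction RAS describes.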
RE: bfd-like mechanism for LANPHY connections between providers
Correct me if I am wrong, but to detect a failure, by default BGP would wait out the hold timer, then declare a peer dead and converge. So you would be looking at 90 seconds (the Juniper default?) + CPU-bound convergence time to recover? Am I thinking about this right?

-----Original Message-----
From: Jeff Wheeler [mailto:j...@inconcepts.biz]
Sent: Wednesday, March 16, 2011 1:55 PM
To: nanog@nanog.org
Subject: Re: bfd-like mechanism for LANPHY connections between providers

> This is often my topology as well. I am satisfied with BGP's mechanism
> and default timers, and have been for many years. For an IXP LAN
> interface and its associated BGP neighbors, I see much more advantage.
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 4:42 PM, Jensen Tyler <jty...@fiberutilities.com> wrote:
> Correct me if I am wrong, but to detect a failure, by default BGP would
> wait out the hold timer, then declare a peer dead and converge. So you
> would be looking at 90 seconds (the Juniper default?) + CPU-bound
> convergence time to recover? Am I thinking about this right?

This is correct.

In my view, configuring BFD for eBGP sessions is risking decreased MTBF for rare reductions in MTTR. This is a risk/reward decision that IMO still leans towards lots of risk for little reward. I'll change my mind about this when BFD works on most boxes and is part of the standard provisioning procedure for more networks; it has already been pointed out that this is not true today.

If your eBGP sessions are failing so frequently that you are very concerned about this 90 seconds, I suggest you won't reduce your operational headaches or customer grief by configuring BFD. It is probably an indication that you need to: 1) straighten out the problems with your switching network or transport vendor; 2) get better transit; 3) de-peer some peers who can't maintain a stable connection to you; or 4) sacrifice something to the backhoe deity.

Again, in the case of an IXP interface, I believe BFD has much more potential benefit.

--
Jeff S Wheeler <j...@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
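Jensen's back-of-the-envelope can be made explicit: total time-to-repair is detection time plus convergence time, and only the first term changes between the hold timer and BFD. A sketch with invented numbers (the 30-second convergence figure is purely hypothetical):

```python
# Rough time-to-repair model for the 90 s hold timer vs. BFD. Detection via
# the hold timer takes up to the full hold time; BFD detects in roughly
# detect_mult * interval. Either way, the CPU/FIB convergence cost is paid
# on top. All values in milliseconds to keep the arithmetic exact.

def time_to_repair_ms(detection_ms: int, convergence_ms: int) -> int:
    return detection_ms + convergence_ms

HOLD_MS = 90_000       # default BGP hold time
BFD_MS = 3 * 300       # multiplier 3, 300 ms interval -> 900 ms detection
CONVERGE_MS = 30_000   # hypothetical full-table convergence time

print(time_to_repair_ms(HOLD_MS, CONVERGE_MS))  # -> 120000 (2 minutes)
print(time_to_repair_ms(BFD_MS, CONVERGE_MS))   # -> 30900 (~31 seconds)
```

With a large fixed convergence term, shaving the detection term gives roughly a 4x improvement here, not the 100x the raw timer comparison suggests, which is Jeff's point.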
Re: bfd-like mechanism for LANPHY connections between providers
> Correct me if I am wrong, but to detect a failure, by default BGP would
> wait out the hold timer, then declare a peer dead and converge.

Hence the case for BFD. There's a difference of several orders of magnitude between BFD keepalive intervals (in milliseconds) and BGP's (in seconds), with generally configurable multipliers versus the hold timer. With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.

For a provider to require a vendor instead of RFC compliance is sinful.

Sudeep

Sudeep Khuraijam | I speak for no one but I
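The "orders of magnitude" claim maps directly onto the RFC 5880 detection-time formula: in asynchronous mode, the local detection time is the peer's detect multiplier times the negotiated interval, which is the slower of what the peer wants to transmit and what we are willing to receive. A sketch (my own, with illustrative timer values):

```python
# RFC 5880 asynchronous-mode detection time: the remote system's Detect Mult
# multiplied by the negotiated interval, i.e. the larger of our Required Min
# RX Interval and the peer's Desired Min TX Interval (all in microseconds).

def bfd_detection_time_us(remote_detect_mult: int,
                          local_required_min_rx_us: int,
                          remote_desired_min_tx_us: int) -> int:
    interval = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * interval

# 3 x 300 ms timers: the peer is declared down after 900 ms...
print(bfd_detection_time_us(3, 300_000, 300_000))  # -> 900000

# ...versus a 90 s BGP hold time: two orders of magnitude slower.
print(90_000_000 // bfd_detection_time_us(3, 300_000, 300_000))  # -> 100
```

Note the asymmetry with BGP: BFD negotiates toward the *slower* of the two advertised intervals, whereas BGP's hold time negotiates toward the *smaller* of the two, so a conservative BFD peer can only loosen timers, never tighten them.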
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 8:00 PM, Sudeep Khuraijam <skhurai...@liveops.com> wrote:
> There's a difference of several orders of magnitude between BFD
> keepalive intervals (in milliseconds) and BGP's (in seconds), with
> generally configurable multipliers versus the hold timer. With real-time
> media and ever-faster last miles, the BGP hold timer may find itself
> inadequate, if not inappropriate, in some cases.

For eBGP peerings, your router must re-converge to a good state within 9 seconds to see an order of magnitude improvement in time-to-repair. This is typically not the case for transit/customer sessions. To make a risk/reward choice that is actually based in reality, you need to understand your total time to re-converge to a good state, and how much of that is BGP hold time. You should then consider whether changing BGP timers (with its own set of disadvantages) is more or less practical than using BFD.

Let's put it another way: if CPU/FIB convergence time were not a significant issue, do you think vendors would be working to optimize this process, that we would have concepts like MPLS FRR and PIC, and that each new router product line upgrade comes with a yet-faster CPU? Of course not. Vendors would just have said, "hey, let's get together on a lower hold time for BGP."

As I stated, I'll change my opinion of BFD when implementations improve. I understand the risk/reward situation. You don't seem to get this, and as a result, your overly simplistic view is that BGP takes seconds and BFD takes milliseconds.

> For a provider to require a vendor instead of RFC compliance is sinful.

Many sins are more practical than the alternatives.

--
Jeff S Wheeler <j...@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
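Jeff's "9 seconds" is easy to reproduce. With BFD, time-to-repair is roughly just the convergence time C; with the hold timer it is hold + C. A 10x improvement requires (hold + C) / C >= 10, i.e. C <= hold / 9; approximating the pre-BFD repair time as just the 90 s hold gives C <= 9 s, Jeff's ballpark. A sketch of that bound (my own arithmetic, not text from the thread):

```python
# Largest convergence time C for which replacing hold-timer detection with
# (near-instant) BFD detection still yields at least `speedup`x improvement:
#   (hold + C) / C >= speedup  =>  C <= hold / (speedup - 1)

def max_convergence_for_speedup(hold_s: float, speedup: float) -> float:
    """Bound on convergence time (seconds) for a given TTR speedup factor."""
    return hold_s / (speedup - 1)

# 90 s hold timer, 10x target: convergence must finish within ~10 seconds.
print(max_convergence_for_speedup(90, 10))  # -> 10.0
```

A full-table transit session in 2011 could easily take far longer than that to re-converge, which is why the order-of-magnitude win mostly evaporates outside the IXP case.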
Re: bfd-like mechanism for LANPHY connections between providers
On Mar 16, 2011, at 6:05 PM, Jeff Wheeler wrote:
> For eBGP peerings, your router must re-converge to a good state within 9
> seconds to see an order of magnitude improvement in time-to-repair. This
> is typically not the case for transit/customer sessions.

Not so, if your goal is peer deactivation and failover. Also, you miss the point: once the event is detected, the rest of the process starts. I am talking about event detection. One may want longer than a 30-second hold timer but the peer state deactivated instantly on link failure. If that's the design goal AND link state is not passed through, then BFD-triggered BGP deactivation is a good choice.

> To make a risk/reward choice that is actually based in reality, you need
> to understand your total time to re-converge to a good state, and how
> much of that is BGP hold time. You should then consider whether changing
> BGP timers (with its own set of disadvantages) is more or less practical
> than using BFD.

Yes, I see that, and I said "in some cases," not all or most cases.

> Let's put it another way: if CPU/FIB convergence time were not a
> significant issue, do you think vendors would be working to optimize

This is orthogonal to my point. The table-size taxes, the best-path algorithms, and the speed with which you can rewrite the FIB in the ASICs are constant in both cases. But that's post-event.

> this process, that we would have concepts like MPLS FRR and PIC, and

Those are out of scope in the context of this thread and have completely different roles.

> that each new router product line upgrade comes with a yet-faster CPU?

For things they can sell more licenses for: 3DES, keying algorithms, virtual instances, other things on BGP, stuff that allows service providers to charge a lot more money while running on common infrastructure such as MPLS FRR, and a zillion other things like stateful redundancy, higher housekeeping needs, in-service upgrades, and anything else with a list price. And it's cheaper than the old CPU.

> Of course not. Vendors would just have said, "hey, let's get together on
> a lower hold time for BGP."

Because it would be horrible code design. Link detection is a common service. Besides, BGP process threads can run longer than minimum link intervals; vendors would have to write checkpoints within the BGP code to come up and service the link state machine. And wait, it's a user-configurable checkpoint! So came BFD: write one simple state machine and make it available to all protocols.

> As I stated, I'll change my opinion of BFD when implementations improve.
> I understand the risk/reward situation. You don't seem to get this, and
> as a result, your overly simplistic view is that BGP takes seconds and
> BFD takes milliseconds.

I have no doubt that you understand your own risk/reward, but you don't for every other environment. For event detection leading to a state change leading to peer deactivation, my "overly simplistic view" is the fact (not as you put it, but as it was written, unedited). How you want to act in response is dependent on design.

That's what you read, not what I wrote. I was comparing the speed of event detection. Now, like I said, for speed of deactivation the BGP hold timer "may find itself inadequate, if not inappropriate, in some cases," in this same context. But as I mentioned, we don't know the pain being solved for, or the requirements that drove this thread in the first place. So I simply put the facts and a business driver. BFD is no different than deactivating a peer based on link failure. Your view is that there is no case for it. My point is: it arrived yesterday; it's just a damn hard thing to monetize upstream in transit.

>> For a provider to require a vendor instead of RFC compliance is sinful.
>
> Many sins are more practical than the alternatives.

Few, maybe.

Sudeep Khuraijam | I speak for no one but I