Re: link monitoring
On Fri, 30 Apr 2021 at 00:35, Eric Kuhnke wrote:

> The Junipers on both sides should have discrete SNMP OIDs that respond
> with a FEC stress value, or FEC error value. See blue highlighted part
> here about FEC. Depending on what version of JunOS you're running the
> MIB for it may or may not exist.

This feature will be introduced by ER-079886 at some future date. You may be thinking of OTN FEC, which is available via a MIB but unrelated to this topic.

I did plan to open a feature request with other vendors too, but I've been lazy. The capability is broadly missing; as a community we are doing very little to address problems before they become symptomatic, and we are under-capitalising on the information we already have from DDM and RS-FEC.

Only slightly on-topic: people who interact with optical vendors might want to ask about propagating RS-FEC correctable errors. RS-FEC is of course point-to-point, so in your active optical system it terminates on the first hop. But technically nothing stops the far end of the optical link from inducing an RS-FEC correctable error when it saw an error itself. Perhaps it could even be standardised how to discriminate an organic near-hop RS-FEC correctable error from an induced one.

We have a precedent of sorts for this: some cut-through switches can discriminate a near-hop FCS error from other kinds of FCS error, because the sender of course only learns about the FCS error after it has already started sending the frame, but in that case it can append a symbol to let the receiver know the error is not near-end. This allows the receiver to keep two FCS counters.

--
++ytti
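To make the two-counter idea above concrete, here is a toy model in Python. It is purely illustrative, not any vendor's implementation (real switches do this in the MAC, and the stomp pattern here is a made-up placeholder): a cut-through sender that detects a bad FCS mid-frame deliberately corrupts the FCS in a recognisable way, so the receiver can tell local-hop damage apart from errors inherited from upstream.

import zlib

STOMP_XOR = 0xFFFFFFFF  # hypothetical well-known stomp pattern

def stomp(fcs: int) -> int:
    """Corrupt the FCS in a recognisable way (sender side)."""
    return fcs ^ STOMP_XOR

class Receiver:
    def __init__(self):
        self.local_fcs_errors = 0    # frame damaged on this hop
        self.stomped_fcs_errors = 0  # frame already bad when sent

    def rx(self, payload: bytes, fcs: int) -> bool:
        good = zlib.crc32(payload)
        if fcs == good:
            return True               # clean frame
        if fcs == stomp(good):        # sender marked it as known-bad
            self.stomped_fcs_errors += 1
        else:
            self.local_fcs_errors += 1
        return False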
Re: link monitoring
Y.1731 or TWAMP if available on those devices.

On Fri, 30 Apr 2021 at 17:57, Colton Conor wrote:

> What NMS is everyone using to graph and alert on this data?
> [...]
Re: link monitoring
What NMS is everyone using to graph and alert on this data?

On Fri, Apr 30, 2021 at 7:49 AM Alain Hebert wrote:

> Yes the JNP DOM MIB is what you are looking for.
>
> It also has traps for the warning and alarm thresholds you can use,
> which are driven by the optic's own parameters.
> [...]
Re: link monitoring
Yes, the JNP DOM MIB is what you are looking for.

It also has traps for the warning and alarm thresholds you can use, which are driven by the optic's own parameters. (CLI: show interfaces diagnostics optics)

TLDR:

Realtime: traps;
Monitoring: DOM MIB.

PS: I suggest you join the [ juniper-...@puck.nether.net ] mailing list.

-
Alain Hebert                aheb...@pubnix.net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770 Beaconsfield, Quebec H9W 6G7
Tel: 514-990-5911   http://www.pubnix.net   Fax: 514-990-9443

On 4/29/21 5:32 PM, Eric Kuhnke wrote:

> The Junipers on both sides should have discrete SNMP OIDs that respond
> with a FEC stress value, or FEC error value.
> [...]
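For anyone wiring this up by hand rather than through an NMS, a minimal polling sketch with pysnmp might look like the following. It assumes the JUNIPER-DOM-MIB (linked elsewhere in this thread) is compiled and available to pysnmp; the host, community string, and ifIndex are placeholders, and the alarm threshold should come from the optic's own DOM thresholds rather than the example value hard-coded here.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

HOST, COMMUNITY, IFINDEX = 'router.example.net', 'public', 531
RX_LOW_ALARM_DBM = -14.0  # example only: use the optic's own threshold

def rx_power_dbm() -> float:
    error, status, index, varbinds = next(getCmd(
        SnmpEngine(), CommunityData(COMMUNITY),
        UdpTransportTarget((HOST, 161)), ContextData(),
        ObjectType(ObjectIdentity('JUNIPER-DOM-MIB',
                                  'jnxDomCurrentRxLaserPower', IFINDEX))))
    if error or status:
        raise RuntimeError(str(error or status))
    # DOM power here is in hundredths of a dBm (check the MIB description).
    return int(varbinds[0][1]) / 100.0

power = rx_power_dbm()
if power < RX_LOW_ALARM_DBM:
    print(f'ALARM: rx power {power:.2f} dBm below threshold')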
RE: link monitoring
We use LibreNMS and smokeping to monitor latency and dropped packets on all our links, and set up alerts if they go over a certain threshold. We are working on a script to automatically reroute traffic based on the alerts, routing around the bad link to give us time to fix it.

Thanks

Travis

From: NANOG On Behalf Of Baldur Norddahl
Sent: Thursday, April 29, 2021 3:39 PM
To: nanog@nanog.org
Subject: link monitoring

> Hello
>
> We had a 100G link that started to misbehave and caused the customers
> to notice bad packet loss.
> [...]
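As a sketch of what such an alert-driven reroute could look like: the fragment below uses junos-eznc (PyEZ), which is a real library, but the device name, credentials, interface, and the choice of draining the link by raising its IS-IS metric are all assumptions for illustration, not the poster's actual script.

from jnpr.junos import Device
from jnpr.junos.utils.config import Config

HIGH_METRIC = 63000  # high IS-IS metric so traffic prefers any other path

def drain_link(router: str, ifname: str) -> None:
    """Cost out a misbehaving link so traffic shifts to a healthy path."""
    with Device(host=router, user='automation', passwd='...') as dev:
        with Config(dev, mode='exclusive') as cu:
            cu.load(f'set protocols isis interface {ifname} '
                    f'level 2 metric {HIGH_METRIC}', format='set')
            cu.commit(comment=f'auto-drain {ifname}: link alert')

# e.g. called from the NMS alert hook:
# drain_link('mx204-1.example.net', 'et-0/0/1.0')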
Re: link monitoring
If I may add one thing I forgot, this post reminded me. The link in the original question was probably a short-distance 100G CWDM4 link. When monitoring a longer-distance coherent 100G link (QPSK, 16QAM, whatever), be absolutely sure to poll all of its SNMP OIDs, the same as if it were a point-to-point microwave link. Depending on exactly what linecard and optic it is, it may behave somewhat like a faded or misaligned radio link under conditions related to degradation of the fiber or the lasers.

In particular I'm thinking of coherent 100G linecards that can switch on the fly between 'low FEC' and 'high FEC' payload-vs-FEC percentages (much as an ACM-capable 18 or 23 GHz band radio would), which should absolutely trigger an alarm. And also the data for FEC decode stress percentage level, etc.

On Thu, Apr 29, 2021 at 2:37 PM Lady Benjamin Cannon of Glencoe, ASCE <l...@6by7.net> wrote:

> We monitor light levels and FEC values on all links and have thresholds
> for early-warning and pre-failure analysis.
> [...]
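A FEC-mode flip like that is a discrete state change rather than a threshold crossing, so graph-only monitoring can miss it. A small watcher sketch follows; poll_fec_mode() is a hypothetical stand-in for an SNMP get against whatever vendor-specific OID your coherent linecard exposes for its current mode.

import time

def poll_fec_mode(host: str, port: str) -> str:
    raise NotImplementedError('wrap your vendor-specific FEC-mode OID here')

def watch(host: str, port: str, interval: int = 60) -> None:
    """Alarm on any change of the coherent FEC operating mode."""
    last = poll_fec_mode(host, port)
    while True:
        time.sleep(interval)
        mode = poll_fec_mode(host, port)
        if mode != last:
            # Treat like an ACM shift on a radio: the link still runs,
            # but its margin just changed. Page someone.
            print(f'ALARM: {host} {port} FEC mode {last} -> {mode}')
            last = mode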
Re: link monitoring
We monitor light levels and FEC values on all links and have thresholds for early-warning and pre-failure analysis.

Short answer: yes, we see links lose packets before completely failing, and for dozens of reasons that's still a good thing, but you need to monitor every part of a resilient network.

Ms. Lady Benjamin PD Cannon of Glencoe, ASCE
6x7 Networks & 6x7 Telecom, LLC
CEO
l...@6by7.net
"The only fully end-to-end encrypted global telecommunications company in the world."

FCC License KJ6FJJ

Sent from my iPhone via RFC1149.

> On Apr 29, 2021, at 2:32 PM, Eric Kuhnke wrote:
>
> The Junipers on both sides should have discrete SNMP OIDs that respond
> with a FEC stress value, or FEC error value.
> [...]
Re: link monitoring
The Junipers on both sides should have discrete SNMP OIDs that respond with a FEC stress value, or FEC error value. See the blue highlighted part here about FEC. Depending on what version of JunOS you're running, the MIB for it may or may not exist.

https://kb.juniper.net/InfoCenter/index?page=content=KB36074=MX2008=LIST

In other equipment it's sometimes found in a sub-tree of SNMP adjacent to the optical DOM values. Once you can acquire and poll that value, set it up as a custom thing to graph and alert upon certain threshold values in your choice of NMS.

Additionally, signs of a failing optic may show up in some of the optical DOM MIB items you can poll:
https://mibs.observium.org/mib/JUNIPER-DOM-MIB/

It helps if you have some non-misbehaving similar linecards and optics which can be polled during custom graph/OID configuration, to establish a baseline 'no problem' value, which if exceeded will trigger whatever threshold value you set in your monitoring system.

On Thu, Apr 29, 2021 at 1:40 PM Baldur Norddahl wrote:

> Hello
>
> We had a 100G link that started to misbehave and caused the customers
> to notice bad packet loss.
> [...]
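The baselining step can be as simple as the sketch below: poll the same counter on known-good ports of the same linecard/optic type, then flag the suspect port when it sits far outside that population. get_counter() is a hypothetical stand-in for whatever SNMP poll you configured above, the port names are placeholders, and the three-sigma rule is just one reasonable starting point.

from statistics import mean, stdev

def get_counter(port: str) -> float:
    raise NotImplementedError('poll your FEC/DOM OID here')

HEALTHY_PORTS = ['et-0/0/0', 'et-0/0/2', 'et-0/0/3']  # placeholders

def is_suspect(port: str, sigmas: float = 3.0) -> bool:
    """Compare one port against a baseline built from healthy peers."""
    samples = [get_counter(p) for p in HEALTHY_PORTS]
    baseline, spread = mean(samples), stdev(samples)
    # Alert when the port sits well outside the healthy population.
    return get_counter(port) > baseline + sigmas * max(spread, 1e-9)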
Re: link monitoring
I'll sell you my Solar Winds license - cheap!

Pete Rohrman
Stage2 Support
212 497 8000, Opt. 2

On 4/29/21 4:39 PM, Baldur Norddahl wrote:

> Hello
>
> We had a 100G link that started to misbehave and caused the customers
> to notice bad packet loss.
> [...]
link monitoring
Hello

We had a 100G link that started to misbehave and caused the customers to notice bad packet loss. The optical values are just fine, but we had packet loss and latency. The interface shows FEC errors on one end and carrier transitions on the other end. But otherwise the link would stay up, and our monitoring system completely failed to warn about the failure. We had to find the bad link by traceroute (mtr) and observe where the packet loss started.

The link was between a Juniper MX204 and a Juniper ACX5448. Link length 2 meters, using 2 km single mode SFP modules.

What is the best practice to monitor links to avoid this scenario? What options do we have to do link monitoring? I am investigating BFD, but I am unsure if that would have helped the situation.

Thanks,

Baldur
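For reference, enabling BFD on such a link is a small amount of configuration; a sketch pushing it with junos-eznc (PyEZ) is below. The interface name, timers, and credentials are illustrative, and whether BFD would have caught this particular failure is genuinely uncertain: BFD only declares the session down after 'multiplier' consecutive lost packets, so low-rate random loss can slip under it.

from jnpr.junos import Device
from jnpr.junos.utils.config import Config

# 300 ms x 3 = ~900 ms detection; placeholder values, tune to taste.
BFD_SET = ('set protocols isis interface et-0/0/1.0 '
           'bfd-liveness-detection minimum-interval 300 multiplier 3')

with Device(host='mx204-1.example.net', user='automation',
            passwd='...') as dev:
    with Config(dev, mode='exclusive') as cu:
        cu.load(BFD_SET, format='set')
        cu.commit(comment='enable BFD on core link')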
Re: link monitoring and BFD in SDN networks
Gents,

We need to separate the context of fast reroute via a control-plane topology map from local link protection with OAM at the MAC/PHY sub-layer, and the time frames at which each is relevant. There are efforts going on at the media level, but there are also current solutions that are media- and encapsulation-independent, and these need to be juxtaposed with the SDN paradigm.

Going back to the original question that Glen posed, it is more a question of implementation complexity. The more state machines that are pushed down to the nodes in an SDN network, away from the control plane, the more cost and barriers to entry for OEM products, interop issues, etc.

Now looking squarely at BFD: the popular application is bootstrapping BFD link state to the routing topology and a peer pathway, which may traverse multiple nodes/switches/media and encapsulations. BFD is a next-hop communication failure detection mechanism which may itself rely (bootstrap) on the routing topology to find alternate paths; it is therefore a larger-time-frame event than PHY/MAC sub-layer protection, and it is media/encapsulation independent. And the fact that such a state change has a high probability of triggering a topology/network-wide event (if not, there is less need to run BFD) makes it a controller-centric state which the controller needs to bootstrap its routing services on. Link-layer OAM, on the other hand, may be a mechanism that protects the BFD event from triggering. Further, BFD enables faster end-to-end connectivity/reachability failure detection than hold-down timers allow on hardware that does not support OAM features.

Finally, the scale at which BFD is used is far less than the number of links. I.e., if you have a 10K-port network, you are likely using BFD on a few tens of ports maybe (for datacenters), and the timescale is typically in the 100s of ms, which any control-plane software module can handle at large scale; it should be run just like any hello protocol for routing services. Link-layer state machines on the nodes, on the other hand, operate in the sub-1ms timeframe. It is an overhead, but an insignificantly small tax.

Cheers,
Sudeep Khuraijam

On 1/21/15, 3:14 PM, Nitin Sharma nitin...@gmail.com wrote:

> On Wed, Jan 21, 2015 at 12:22 PM, Ronald van der Pol
> ronald.vander...@rvdp.org wrote:
> [...]
Re: link monitoring and BFD in SDN networks
On Mon, Jan 19, 2015 at 22:55:04 +0000, Dave Bell wrote:

> http://www.rvdp.org/presentations/SC11-SRS-8021ag.pdf

The 802.1ag code used is open source and available on:
https://svn.surfnet.nl/trac/dot1ag-utils/

> Of course if you want fast failover, you need to send packets very
> rapidly. Every 250ms is not unreasonable. This is going to cause the
> control plane to get very chatty. Typically on high end routers,
> processes such as BFD are actually run on line cards as opposed to on
> the routing engine. When a failure is detected this reports up into
> the control plane to trigger a reconvergence event. I see no reason
> why this couldn't occur using SDN.

Exactly. This is something you want to do in hardware, especially if you want to do fast reroute with the OpenFlow group table. The problem is that many 1U OpenFlow switches do not support 802.1ag. We made the prototype mentioned above to show and investigate the benefits of OAM. The closed Open Networking Foundation is supposed to be working on this, but I don't know the status because their mailing lists are closed.

In SDN/OpenFlow I think a couple of things are needed:
- configure 802.1ag on the interfaces (via of-config?)
- configure OpenFlow paths (e.g. primary and backup) and also create forwarding entries for 802.1ag datagrams along those paths
- configure fast reroute with the group table (of-config?)

By doing this, detection and failover are handled in hardware.

rvdp
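To illustrate the group-table part, here is a minimal sketch using the Ryu controller framework (Python). It installs an OFPGT_FF (fast-failover) group: the switch forwards out of the first bucket whose watch_port is live, so failover from primary to backup happens in hardware without a round trip to the controller. Port numbers and the group id are placeholders.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

PRIMARY_PORT, BACKUP_PORT, GROUP_ID = 1, 2, 1

class FastFailover(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def features(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        # Fast-failover group: first live watch_port wins.
        buckets = [
            parser.OFPBucket(watch_port=PRIMARY_PORT,
                             actions=[parser.OFPActionOutput(PRIMARY_PORT)]),
            parser.OFPBucket(watch_port=BACKUP_PORT,
                             actions=[parser.OFPActionOutput(BACKUP_PORT)]),
        ]
        dp.send_msg(parser.OFPGroupMod(dp, ofp.OFPGC_ADD,
                                       ofp.OFPGT_FF, GROUP_ID, buckets))
        # Steer traffic from a placeholder ingress port into the group.
        match = parser.OFPMatch(in_port=3)
        inst = [parser.OFPInstructionActions(
            ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionGroup(GROUP_ID)])]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))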
Re: link monitoring and BFD in SDN networks
On Wed, Jan 21, 2015 at 12:22 PM, Ronald van der Pol <ronald.vander...@rvdp.org> wrote:

> Exactly. This is something you want to do in hardware, especially if
> you want to do fast reroute with the OpenFlow group table.
> [...]
> - configure fast reroute with the group table (of-config?)
>
> By doing this, detection and failover are handled in hardware.

Fast reroute (in the form of fast failover) is supported in the OF spec (1.3+), using Group Tables.

Data-plane reachability could be performed in SDN/OpenFlow networks using BFD / Ethernet CFM (802.1ag) / Y.1731, preferably on silicon if there is support (which I believe every silicon vendor should work on). It would not be ideal if these OAM frames were forwarded to a central controller. Today I think it is done in some form of software layer (OVS, SDKs) that resides on these OF switches.
Re: link monitoring and BFD in SDN networks
BFD etc. aim to prove there is end-to-end connectivity between two points, not just that all links are up along the path. All ports could be up but end-to-end connectivity broken, for example by a misconfigured VLAN across an L2 network. Sending some kind of packet across the network is pretty much the only way to guarantee reachability.

The OpenFlow protocol in particular has a way to instruct a switch to send a frame out of an interface. By default, OpenFlow switches will forward all frames they have received and don't know what to do with back to the controller. This means someone could write an OAM protocol that works via OpenFlow. A quick google for 'OpenFlow OAM' brought me this link from someone who has done just that:
http://www.rvdp.org/presentations/SC11-SRS-8021ag.pdf

Of course, if you want fast failover, you need to send packets very rapidly; every 250 ms is not unreasonable. This is going to cause the control plane to get very chatty. Typically on high-end routers, processes such as BFD are actually run on line cards as opposed to on the routing engine. When a failure is detected, this is reported up into the control plane to trigger a reconvergence event. I see no reason why this couldn't occur using SDN.

Regards,
Dave

On 19 January 2015 at 22:01, Glen Kent glen.k...@gmail.com wrote:

> Hi,
>
> Routers connected back to back often rely on BFD for link failures.
> [...]
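A sketch of the packet-out/packet-in plumbing described above, again with Ryu (Python): the controller injects a heartbeat frame out of a switch port and counts it back in via packet-in on the far side. The EtherType and addresses are placeholders (0x88b5 is the local-experimental EtherType); real OAM would use 802.1ag CCM frames, and as noted above, controller-based probing only scales for slow timers.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet
from ryu.ofproto import ofproto_v1_3

PROBE_ETHERTYPE = 0x88b5  # local experimental EtherType, placeholder
PROBE_PORT = 1            # placeholder egress port

class Heartbeat(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def send_probe(self, dp):
        """Inject one heartbeat frame via OpenFlow packet-out."""
        ofp, parser = dp.ofproto, dp.ofproto_parser
        pkt = packet.Packet()
        pkt.add_protocol(ethernet.ethernet(
            dst='01:80:c2:00:00:30',  # CFM-style group address
            src='02:00:00:00:00:01', ethertype=PROBE_ETHERTYPE))
        pkt.serialize()
        dp.send_msg(parser.OFPPacketOut(
            datapath=dp, buffer_id=ofp.OFP_NO_BUFFER,
            in_port=ofp.OFPP_CONTROLLER,
            actions=[parser.OFPActionOutput(PROBE_PORT)],
            data=pkt.data))

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in(self, ev):
        eth = packet.Packet(ev.msg.data).get_protocol(ethernet.ethernet)
        if eth and eth.ethertype == PROBE_ETHERTYPE:
            # Probe survived the path; reset this link's dead timer here.
            self.logger.info('heartbeat seen on dpid %s',
                             ev.msg.datapath.id)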
link monitoring and BFD in SDN networks
Hi,

Routers connected back to back often rely on BFD for link failure detection. It's certainly possible that there is a switch between the two routers, and hence a link-down event on one side is not visible to the other side. So you run some sort of OAM protocol on the two routers so that they can detect link flaps/failures.

How will this happen in SDN networks where there is no control plane on the routers? Will the routers be sending the state of all their links to a central controller, who will then detect that a link has gone down? This just doesn't sound good. I am presuming that some sort of control plane will always be required. Any pointers here?

Is there any other reason, other than link events, for which we would need a control plane on the routers in SDN?

Thanks,
Glen