Re: bgpd patch, WAS: bgpd causing black-holes with bgp-only setup
On 11/12/07, Claudio Jeker [EMAIL PROTECTED] wrote:
On Tue, Nov 06, 2007 at 06:26:47PM +0100, Tony Sarendal wrote:
Re: bgpd patch, WAS: bgpd causing black-holes with bgp-only setup
On Tue, Nov 06, 2007 at 06:26:47PM +0100, Tony Sarendal wrote:

I looked a bit closer at this problem, and the RFC mentions that paths with loops need to be inserted into the RIB and will be ignored in phase 2 of the decision process. So this diff does just that: it does not remove any prefix if there is a loop, but instead ignores such paths during the route decision process. This seems to work for me, but I'm currently unable to do larger tests.
-- 
:wq Claudio

Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.228
diff -u -p -r1.228 rde.c
--- rde.c	16 Sep 2007 15:20:50 -0000	1.228
+++ rde.c	6 Nov 2007 18:27:42 -0000
@@ -920,10 +920,8 @@
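Claudio's approach, where a looped path is still stored but is never a candidate in Phase 2 (RFC 4271, section 9.1.2), can be illustrated with the minimal C sketch below. It is not bgpd's decision code: `struct cand` and `select_best()` are hypothetical, and only two tie-breakers (higher LOCAL_PREF, then shorter AS_PATH) are modeled, whereas the real decision process has many more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified RIB entry for one prefix. */
struct cand {
	uint32_t lpref;		/* LOCAL_PREF */
	size_t	 aspath_len;	/* number of ASes in AS_PATH */
	bool	 looped;	/* local AS was found in AS_PATH */
};

/*
 * Return the index of the preferred candidate, or -1 if none is
 * eligible.  Looped entries stay in the table but are skipped here,
 * in the spirit of RFC 4271 section 9.1.2.
 */
static int
select_best(const struct cand *c, size_t n)
{
	int best = -1;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (c[i].looped)
			continue;	/* stored, but never selected */
		if (best == -1 || c[i].lpref > c[best].lpref ||
		    (c[i].lpref == c[best].lpref &&
		    c[i].aspath_len < c[best].aspath_len))
			best = (int)i;
	}
	return best;
}
```

In Claudio's diff as described, the looped UPDATE is still stored and so replaces the peer's earlier route for the prefix; the sketch only models the second half, keeping such an entry from ever being selected even when its LOCAL_PREF is higher.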
Re: bgpd causing black-holes with bgp-only setup
On Mon, 05 Nov 2007 07:08:56 +0700, Claudio Jeker [EMAIL PROTECTED] wrote:

Hi,

I'm currently setting up a redundant BGP router. In your presentation (maybe around 2004-2006) you discouraged using carp for fail-over/load balancing, since it would lose the session. Since I'm using 4.2-current, I wonder: is using a carp interface doable now, without losing the session, etc.?

Thanks,
Insan

-- 
Insan Praja SW
Re: bgpd causing black-holes with bgp-only setup
* Insan Praja SW [EMAIL PROTECTED] [2007-11-09 16:37]:
On Mon, 05 Nov 2007 07:08:56 +0700, Claudio Jeker [EMAIL PROTECTED] wrote:

Using carp interfaces for failover is perfectly fine; you just have to understand what it does and what it doesn't do. Sessions get lost and re-established, of course.

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: bgpd patch, WAS: bgpd causing black-holes with bgp-only setup
diff -u version.

/Tony

Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.228
diff -u -r1.228 rde.c
--- rde.c	16 Sep 2007 15:20:50 -0000	1.228
+++ rde.c	6 Nov 2007 10:38:23 -0000
@@ -919,12 +919,6 @@
 	/* shift to NLRI information */
 	p += 2 + attrpath_len;
 
-	/* aspath needs to be loop free nota bene this is not a hard error */
-	if (peer->conf.ebgp && !aspath_loopfree(asp->aspath, conf->as)) {
-		error = 0;
-		goto done;
-	}
-
 	/* parse nlri prefix */
 	while (nlri_len > 0) {
 		if ((pos = rde_update_get_prefix(p, nlri_len, &prefix,
@@ -954,9 +948,17 @@
 		peer->prefix_rcvd_update++;
 
 		/* add original path to the Adj-RIB-In */
-		if (peer->conf.softreconfig_in)
-			path_update(peer, asp, &prefix, prefixlen, F_ORIGINAL);
-
+		if (peer->conf.softreconfig_in) {
+			/* handle an update with loop as a withdraw */
+			if (peer->conf.ebgp && !aspath_loopfree(asp->aspath,
+			    conf->as))
+				prefix_remove(peer, &prefix, prefixlen,
+				    F_ORIGINAL);
+			else
+				path_update(peer, asp, &prefix, prefixlen,
+				    F_ORIGINAL);
+
+		}
 		/* input filter */
 		if (rde_filter(&fasp, rules_l, peer, asp, &prefix, prefixlen,
 		    peer, DIR_IN) == ACTION_DENY) {
@@ -977,10 +979,18 @@
 		if (fasp == NULL)
 			fasp = asp;
 
-		rde_update_log("update", peer, &fasp->nexthop->exit_nexthop,
-		    &prefix, prefixlen);
-		path_update(peer, fasp, &prefix, prefixlen, F_LOCAL);
-
+		rde_update_log("update", peer,
+		    &fasp->nexthop->exit_nexthop,&prefix,
+		    prefixlen);
+		/* handle an update with loop as a withdraw */
+		if (peer->conf.ebgp && !aspath_loopfree(asp->aspath,
+		    conf->as))
+			prefix_remove(peer, &prefix, prefixlen,
+			    F_LOCAL);
+		else
+			path_update(peer, fasp, &prefix, prefixlen,
+			    F_LOCAL);
+
 		/* free modified aspath */
 		if (fasp != asp)
 			path_put(fasp);
@@ -1047,10 +1057,16 @@
 			peer->prefix_rcvd_update++;
 
 			/* add original path to the Adj-RIB-In */
-			if (peer->conf.softreconfig_in)
-				path_update(peer, asp, &prefix,
-				    prefixlen, F_ORIGINAL);
-
+			if (peer->conf.softreconfig_in) {
+				/* handle an update with loop as a withdraw */
+				if (peer->conf.ebgp &&
+				    !aspath_loopfree(asp->aspath,conf->as))
+					prefix_remove(peer, &prefix,
+					    prefixlen,F_ORIGINAL);
+				else
+					path_update(peer, asp, &prefix,
+					    prefixlen, F_ORIGINAL);
+			}
 			/* input filter */
 			if (rde_filter(&fasp, rules_l, peer, asp, &prefix,
 			    prefixlen, peer, DIR_IN) ==
@@ -1075,9 +1091,15 @@
 
 			rde_update_log("update", peer,
 			    &asp->nexthop->exit_nexthop,
-			    &prefix, prefixlen);
-			path_update(peer, fasp, &prefix, prefixlen,
-			    F_LOCAL);
+			    &prefix, prefixlen);
+			/* handle an update with loop as a withdraw */
+			if (peer->conf.ebgp &&
+			    !aspath_loopfree(asp->aspath,conf->as))
+				prefix_remove(peer, &prefix,
+				    prefixlen,F_LOCAL);
+			else
+				path_update(peer, fasp, &prefix,
+				    prefixlen,F_LOCAL);
 
 			/* free modified aspath */
 			if (fasp != asp)
Re: bgpd patch, WAS: bgpd causing black-holes with bgp-only setup
New version. Less duplication, and a nice feature as a bonus. With "softreconfig in" enabled, the looped prefixes are accepted into the Adj-RIB-In. This means that I can tell if my neighbor AS is using a path via myself. Either I'm tired or that is cool.

router-02# bgpctl show rib 192.168.0.0
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
*>    192.168.0.0/16       192.168.100.5      100     0 65100 i
*     192.168.0.0/16       172.17.1.1         100     0 65200 65100 i
*     192.168.0.0/16       172.17.1.5         100     0 65200 65200 65200 65200 65100 i
router-02#

I now kill the peering that 65200 has to 65100, removing their direct path to 192.168.0.0/16.

router-02# bgpctl show rib 192.168.0.0
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
*>    192.168.0.0/16       192.168.100.5      100     0 65100 i
router-02#

Sweet, the looping issue is gone. Here is the bonus:

router-02# bgpctl show rib neigh 172.17.1.5 in | grep 65300
*     172.17.0.2/32        172.17.1.5         100     0 65200 65300 i
*     192.168.0.0/16       172.17.1.5         100     0 65200 65300 65100 i
*     192.168.100.4/30     172.17.1.5         100     0 65200 65300 i
router-02#

I can now see the paths for which the peer uses my network. Note that this depends a bit on the remote implementation; I think this works against a Cisco router.

/Tony

Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.228
diff -u -r1.228 rde.c
--- rde.c	16 Sep 2007 15:20:50 -0000	1.228
+++ rde.c	6 Nov 2007 17:08:50 -0000
@@ -919,12 +919,6 @@
 	/* shift to NLRI information */
 	p += 2 + attrpath_len;
 
-	/* aspath needs to be loop free nota bene this is not a hard error */
-	if (peer->conf.ebgp && !aspath_loopfree(asp->aspath, conf->as)) {
-		error = 0;
-		goto done;
-	}
-
 	/* parse nlri prefix */
 	while (nlri_len > 0) {
 		if ((pos = rde_update_get_prefix(p, nlri_len, &prefix,
@@ -977,10 +971,18 @@
 		if (fasp == NULL)
 			fasp = asp;
 
-		rde_update_log("update", peer, &fasp->nexthop->exit_nexthop,
-		    &prefix, prefixlen);
-		path_update(peer, fasp, &prefix, prefixlen, F_LOCAL);
-
+		rde_update_log("update", peer,
+		    &fasp->nexthop->exit_nexthop,&prefix,
+		    prefixlen);
+		/* handle an update with loop as a withdraw */
+		if (peer->conf.ebgp && !aspath_loopfree(asp->aspath,
+		    conf->as))
+			prefix_remove(peer, &prefix, prefixlen,
+			    F_LOCAL);
+		else
+			path_update(peer, fasp, &prefix, prefixlen,
+			    F_LOCAL);
+
 		/* free modified aspath */
 		if (fasp != asp)
 			path_put(fasp);
@@ -1075,9 +1077,15 @@
 
 			rde_update_log("update", peer,
 			    &asp->nexthop->exit_nexthop,
-			    &prefix, prefixlen);
-			path_update(peer, fasp, &prefix, prefixlen,
-			    F_LOCAL);
+			    &prefix, prefixlen);
+			/* handle an update with loop as a withdraw */
+			if (peer->conf.ebgp &&
+			    !aspath_loopfree(asp->aspath,conf->as))
+				prefix_remove(peer, &prefix,
+				    prefixlen,F_LOCAL);
+			else
+				path_update(peer, fasp, &prefix,
+				    prefixlen,F_LOCAL);
 
 			/* free modified aspath */
 			if (fasp != asp)
-- 
---
Tony Sarendal - [EMAIL PROTECTED]
IP/Unix
       -= The scorpion replied, "I couldn't help it, it's my nature" =-
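The bonus works because, with softreconfig in, the Adj-RIB-In keeps each path exactly as received, loop included. The `grep 65300` above is a plain text match, so it would also hit 65300 appearing in any other column; the minimal C sketch below does the equivalent check on the AS path only, listing prefixes a neighbor reaches through the local AS (apparently 65300 in this test). `struct adj_in`, `via_myself()` and the flat AS-path array are hypothetical and do not reflect how bgpd stores paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAXPATH 8

/* Hypothetical Adj-RIB-In entry, kept unfiltered with softreconfig in. */
struct adj_in {
	const char *prefix;
	size_t	    len;		/* number of ASes in path[] */
	uint32_t    path[MAXPATH];
};

/*
 * Print every prefix whose AS path contains myas, i.e. a prefix the
 * neighbor currently reaches through us, and return how many matched.
 */
static size_t
via_myself(const struct adj_in *rib, size_t n, uint32_t myas)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		assert(rib[i].len <= MAXPATH);
		for (size_t j = 0; j < rib[i].len; j++) {
			if (rib[i].path[j] == myas) {
				printf("%s\n", rib[i].prefix);
				hits++;
				break;	/* count each prefix once */
			}
		}
	}
	return hits;
}
```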
bgpd patch, WAS: bgpd causing black-holes with bgp-only setup
I have not yet checked how other implementations handle the situation where an UPDATE with an AS-path loop hides the fact that the neighbor just lost a path, but I made a quick patch if anyone feels like testing. The black-hole condition no longer appears when I test. Be gentle: I only browsed through the code while on the underground to and from work.

/Tony

Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.228
diff -r1.228 rde.c
922,927d921
< 	/* aspath needs to be loop free nota bene this is not a hard error */
< 	if (peer->conf.ebgp && !aspath_loopfree(asp->aspath, conf->as)) {
< 		error = 0;
< 		goto done;
< 	}
<
957,959c951,961
< 		if (peer->conf.softreconfig_in)
< 			path_update(peer, asp, &prefix, prefixlen, F_ORIGINAL);
<
---
> 		if (peer->conf.softreconfig_in) {
> 			/* handle an update with loop as a withdraw */
> 			if (peer->conf.ebgp && !aspath_loopfree(asp->aspath,
> 			    conf->as))
> 				prefix_remove(peer, &prefix, prefixlen,
> 				    F_ORIGINAL);
> 			else
> 				path_update(peer, asp, &prefix, prefixlen,
> 				    F_ORIGINAL);
>
> 		}
980,983c982,993
< 		rde_update_log("update", peer, &fasp->nexthop->exit_nexthop,
< 		    &prefix, prefixlen);
< 		path_update(peer, fasp, &prefix, prefixlen, F_LOCAL);
<
---
> 		rde_update_log("update", peer,
> 		    &fasp->nexthop->exit_nexthop,&prefix,
> 		    prefixlen);
> 		/* handle an update with loop as a withdraw */
> 		if (peer->conf.ebgp && !aspath_loopfree(asp->aspath,
> 		    conf->as))
> 			prefix_remove(peer, &prefix, prefixlen,
> 			    F_LOCAL);
> 		else
> 			path_update(peer, fasp, &prefix, prefixlen,
> 			    F_LOCAL);
>
1050,1053c1060,1069
< 			if (peer->conf.softreconfig_in)
< 				path_update(peer, asp, &prefix,
< 				    prefixlen, F_ORIGINAL);
<
---
> 			if (peer->conf.softreconfig_in) {
> 				/* handle an update with loop as a withdraw */
> 				if (peer->conf.ebgp &&
> 				    !aspath_loopfree(asp->aspath,conf->as))
> 					prefix_remove(peer, &prefix,
> 					    prefixlen,F_ORIGINAL);
> 				else
> 					path_update(peer, asp, &prefix,
> 					    prefixlen, F_ORIGINAL);
> 			}
1078,1080c1094,1102
< 			    &prefix, prefixlen);
< 			path_update(peer, fasp, &prefix, prefixlen,
< 			    F_LOCAL);
---
> 			    &prefix, prefixlen);
> 			/* handle an update with loop as a withdraw */
> 			if (peer->conf.ebgp &&
> 			    !aspath_loopfree(asp->aspath,conf->as))
> 				prefix_remove(peer, &prefix,
> 				    prefixlen,F_LOCAL);
> 			else
> 				path_update(peer, fasp, &prefix,
> 				    prefixlen,F_LOCAL);
bgpd causing black-holes with bgp-only setup
bgpd does not re-route correctly when I shut down a transit in a BGP-only design, causing black-holes for some prefixes.

router-01 and router-02 are in the same AS and peer with the same transit provider. router-01 and router-02 have two iBGP peerings: a primary and a standby path. router-01 sets local-pref 60 on all transit prefixes, router-02 sets local-pref 50.

When I take down the transit on router-01 I see this on router-02:

router-02# bgpctl show rib | head -n 10
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.1          60 11100 65100 i
*     26.0.128.0/17        192.168.100.5       50 10100 65100 i
I*    26.0.144.0/22        172.17.1.1          60 11100 65100 i
*     26.0.144.0/22        192.168.100.5       50 10100 65100 i
I*    26.1.77.0/24         172.17.1.1          60 11100 65100 i
*     26.1.77.0/24         192.168.100.5       50 10100 65100 i
router-02#

Note the prefixes with local-pref 60 pointing at router-01. router-01 does not have its transit peering up, and thus has no prefixes with local-pref 60 itself:

router-01# bgpctl show rib | head -n 10
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.6          50 21100 65100 i
I*    26.0.144.0/22        172.17.1.6          50 21100 65100 i
I*    26.1.77.0/24         172.17.1.6          50 21100 65100 i
I*    26.2.172.0/22        172.17.1.6          50 21100 65100 i
I*    26.3.241.0/24        172.17.1.6          50 21100 65100 i
I*    26.6.126.0/24        172.17.1.6          50 21100 65100 i
router-01# bgpctl show rib 26.0.128.0/17 all
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.6          50 21100 65100 i
I*    26.0.144.0/22        172.17.1.6          50 21100 65100 i
router-01#

I also saw this when I tested bgpd around a year ago, so it isn't a new bug. This is with 4.2-RELEASE, no patches. This info is from a lab I set up to replicate a live environment.
/Tony

router-01# cat /etc/bgpd.conf
# $OpenBSD: bgpd.conf,v 1.8 2007/03/29 13:37:35 claudio Exp $
# sample bgpd configuration file
# see bgpd.conf(5)

#macros
loopback="172.17.0.1"

# global configuration
AS 65200
router-id $loopback
network $loopback/32 set {localpref 120, med 10}
network 172.17.0.0/16 set {localpref 120, med 10}
network connected set {localpref 120, med 10}
network static set {localpref 120, med 10}

group TRANSIT {
	remote-as 65100
	announce all
	set nexthop self
	set med 10100
	set localpref 60
	neighbor 192.168.100.1 {
		descr TRANSIT
	}
}

group IBGP {
	remote-as 65200
	route-reflector
	set nexthop self
	set med +1000
	neighbor 172.17.1.2 {
		local-address 172.17.1.1
		descr "router-02 primary"
	}
	neighbor 172.17.1.6 {
		local-address 172.17.1.5
		descr "router-02 standby"
		set med +1
	}
}

# filter
deny from any
deny to any
allow quick to group IBGP
allow quick from group IBGP
allow quick to group TRANSIT prefix 172.17.0.0/16
allow quick from group TRANSIT
router-01# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
	groups: lo
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
ne3: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 52:54:00:12:02:01
	description: transit
	media: Ethernet 10baseT full-duplex
	inet6 fe80::5054:ff:fe12:201%ne3 prefixlen 64 scopeid 0x1
	inet 192.168.100.2 netmask 0xfffc broadcast 192.168.100.3
ne4: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 52:54:00:12:02:02
	description: router-01 primary path
	media: Ethernet 10baseT full-duplex
	inet6 fe80::5054:ff:fe12:202%ne4 prefixlen 64 scopeid 0x2
	inet 172.17.1.1 netmask 0xfffc broadcast 172.17.1.3
ne5: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 52:54:00:12:02:03
	description: route-02 standby path
	media: Ethernet 10baseT full-duplex
	inet6 fe80::5054:ff:fe12:203%ne5 prefixlen 64 scopeid 0x3
	inet 172.17.1.5 netmask 0xfffc broadcast 172.17.1.7
enc0: flags=0<> mtu 1536
lo1: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
	description: ROUTING LOOPBACK
	groups: lo
	inet 172.17.0.1 netmask 0x
router-01#

router-02# cat /etc/bgpd.conf
# $OpenBSD: bgpd.conf,v 1.8 2007/03/29 13:37:35 claudio Exp $
# sample bgpd configuration
Re: bgpd causing black-holes with bgp-only setup
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:
Re: bgpd causing black-holes with bgp-only setup
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:
Re: bgpd causing black-holes with bgp-only setup
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:

bgpd does not re-route correctly when I shut down a transit in a bgp-only design, causing black-holes for some prefixes.

router-01 and router-02 are in the same AS and peer with the same transit provider. router-01 and router-02 have two iBGP peerings, a primary and a standby path. router-01 sets local-pref 60 on all transit prefixes, router-02 sets local-pref 50.

When I take down the transit on router-01 I see this on router-02:

router-02# bgpctl show rib | head -n 10
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.1          60 11100 65100 i
*     26.0.128.0/17        192.168.100.5       50 10100 65100 i
I*    26.0.144.0/22        172.17.1.1          60 11100 65100 i
*     26.0.144.0/22        192.168.100.5       50 10100 65100 i
I*    26.1.77.0/24         172.17.1.1          60 11100 65100 i
*     26.1.77.0/24         192.168.100.5       50 10100 65100 i
router-02#

router-02 still holds prefixes with local-pref 60 pointing at router-01. router-01 does not have its transit peering up, and thus itself has no prefixes with local-pref 60.

router-01# bgpctl show rib | head -n 10
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.6          50 21100 65100 i
I*    26.0.144.0/22        172.17.1.6          50 21100 65100 i
I*    26.1.77.0/24         172.17.1.6          50 21100 65100 i
I*    26.2.172.0/22        172.17.1.6          50 21100 65100 i
I*    26.3.241.0/24        172.17.1.6          50 21100 65100 i
I*    26.6.126.0/24        172.17.1.6          50 21100 65100 i
router-01# bgpctl show rib 26.0.128.0/17 all
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination          gateway          lpref   med aspath origin
I*    26.0.128.0/17        172.17.1.6          50 21100 65100 i
I*    26.0.144.0/22        172.17.1.6          50 21100 65100 i
router-01#

I saw this before when I tested bgpd around a year ago, so it isn't a new bug. This is with 4.2-RELEASE, no patches. This info is from a lab I set up to replicate a live environment.

/Tony

router-01# cat /etc/bgpd.conf
#	$OpenBSD: bgpd.conf,v 1.8 2007/03/29 13:37:35 claudio Exp $
# sample bgpd configuration file
# see bgpd.conf(5)

#macros
loopback=172.17.0.1

# global configuration
AS 65200
router-id $loopback
network $loopback/32 set {localpref 120, med 10}
network 172.17.0.0/16 set {localpref 120, med 10}
network connected set {localpref 120, med 10}
network static set {localpref 120, med 10}

group TRANSIT {
	remote-as 65100
	announce all
	set nexthop self
	set med 10100
	set localpref 60
	neighbor 192.168.100.1 {
		descr TRANSIT
	}
}

group IBGP {
	remote-as 65200
	route-reflector
	set nexthop self
	set med +1000
	neighbor 172.17.1.2 {
		local-address 172.17.1.1
		descr "router-02 primary"
	}
	neighbor 172.17.1.6 {
		local-address 172.17.1.5
		descr "router-02 standby"
		set med +1
	}
}

# filter
deny from any
deny to any
allow quick to group IBGP
allow quick from group IBGP
allow quick to group TRANSIT prefix 172.17.0.0/16
allow quick from group TRANSIT
router-01# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
	groups: lo
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
ne3: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 52:54:00:12:02:01
	description: transit
	media: Ethernet 10baseT full-duplex
	inet6 fe80::5054:ff:fe12:201%ne3 prefixlen 64 scopeid 0x1
	inet 192.168.100.2 netmask 0xfffffffc broadcast 192.168.100.3
ne4: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 52:54:00:12:02:02
	description: router-01 primary path
	media: Ethernet 10baseT full-duplex
	inet6 fe80::5054:ff:fe12:202%ne4 prefixlen 64 scopeid 0x2
	inet 172.17.1.1 netmask 0xfffffffc broadcast 172.17.1.3
ne5: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
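Why a stale path causes the black-hole: among valid routes to the same prefix, BGP prefers the highest LOCAL_PREF before any later tie-breakers (AS-path length, MED, ...) are considered. So as long as router-02 still holds the local-pref 60 path via router-01, it never selects its own local-pref 50 transit path. The minimal sketch below models only that one step; `struct route` and `best_route()` are hypothetical names, not bgpd's data structures or its actual decision code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified route record; bgpd's real structures differ. */
struct route {
	const char	*prefix;
	const char	*nexthop;
	unsigned int	 lpref;
	int		 valid;		/* 0 once the route has been withdrawn */
};

/*
 * Return the valid route with the highest LOCAL_PREF, or NULL if no
 * entry is valid. Only this decision step is modelled: on a tie the
 * earlier entry is kept instead of running the later tie-breakers.
 */
static const struct route *
best_route(const struct route *r, size_t n)
{
	const struct route *best = NULL;
	size_t i;

	assert(n == 0 || r != NULL);
	for (i = 0; i < n; i++) {
		if (!r[i].valid)
			continue;
		if (best == NULL || r[i].lpref > best->lpref)
			best = &r[i];
	}
	return best;
}
```

Fed router-02's two entries for 26.0.128.0/17 from the bgpctl output (172.17.1.1 with lpref 60, 192.168.100.5 with lpref 50), it picks the 172.17.1.1 path. Only after that entry's `valid` flag is cleared, which is what a withdraw would do, does the 192.168.100.5 transit path win.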
Re: bgpd causing black-holes with bgp-only setup
On Sun, Nov 04, 2007 at 11:30:20PM +, Tony Sarendal wrote:
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:

Thanks for all the info. I will have a look at this as well. Currently I think it is possible that route-reflector is not bug free in cases where you have route-reflector rings or other very complex setups. I have only tested the easy setups so far.

Why you get routing loops and black-holes in your 3 AS setups is not clear (at least for me), but I guess it may be an issue with a failed update. I have the feeling that when we get an update with a routing loop in it we should actually issue a withdraw for the prefix carried in it, so the following code in rde.c looks suspicious:

	/* aspath needs to be loop free nota bene this is not a hard error */
	if (peer->conf.ebgp && !aspath_loopfree(asp->aspath, conf->as)) {
		error = 0;
		goto done;
	}

I'm mostly offline in the next days, so maybe you will beat me to finding a fix for this.

-- 
:wq Claudio
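The quoted check depends on aspath_loopfree() spotting the local AS anywhere in the received AS_PATH. The following is a simplified, hypothetical stand-in (`aspath_loopfree_sketch()` is not bgpd's function) that walks an RFC 4271 wire-format AS_PATH with 2-octet AS numbers, where each segment is a type octet, an AS count octet, then that many AS numbers in network byte order. Treating a truncated segment as "not loop free" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for aspath_loopfree(): return 1 if `myas` does
 * not occur in the wire-format AS_PATH `seg` of `len` octets, 0 if it
 * does. AS_SET and AS_SEQUENCE segments are scanned alike.
 */
static int
aspath_loopfree_sketch(const uint8_t *seg, size_t len, uint16_t myas)
{
	assert(seg != NULL || len == 0);
	while (len > 0) {
		size_t cnt, seglen, i;

		if (len < 2)
			return 0;		/* truncated segment header */
		cnt = seg[1];
		seglen = 2 + 2 * cnt;
		if (len < seglen)
			return 0;		/* truncated AS list */
		for (i = 0; i < cnt; i++) {
			uint16_t as = (uint16_t)((seg[2 + 2 * i] << 8) |
			    seg[3 + 2 * i]);
			if (as == myas)
				return 0;	/* own AS seen: loop */
		}
		seg += seglen;
		len -= seglen;
	}
	return 1;
}
```

For an AS_SEQUENCE of `65200 65100`, the check passes for AS 65300 but fails for 65200 or 65100. In the quoted rde.c code a failed check just jumps to `done`, so the prefixes in that UPDATE are neither updated nor withdrawn, and whatever path the peer sent earlier for them stays in the RIB.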
Re: bgpd causing black-holes with bgp-only setup
On 11/5/07, Claudio Jeker [EMAIL PROTECTED] wrote:
On Sun, Nov 04, 2007 at 11:30:20PM +, Tony Sarendal wrote:
On 11/4/07, Tony Sarendal [EMAIL PROTECTED] wrote:

Why you get routing loops and black-holes in your 3 AS setups is not clear (at least for me), but I guess it may be an issue with a failed update. [...]

RFC4271: "Changing the attribute(s) of a route is accomplished by advertising a replacement route. The replacement route carries new (changed) attributes and has the same address prefix as the original route."

That is the reason. In my tests, when AS65200 loses direct connectivity to AS65100 it sees AS65300 as a viable path. It sends a WITHDRAW of the AS65100 prefix to AS65300 via the primary peering. On the standby peering no WITHDRAW is sent; instead AS65200 sends an UPDATE with its new path. Since this update has AS65300 in the AS-PATH, AS65300 discards the update and misses the fact that AS65200 no longer has connectivity to AS65100.

Handling an incoming UPDATE with a loop as a WITHDRAW, be it as-path, cluster-list or originator-id, sounds pretty good to me right now. I'll sleep on it and see how it feels tomorrow. As I said, I don't see anything here that violates the RFCs, but I have never seen this before either. I will try to find the time to check out how IOS and IOS XR handle this.
No point in re-inventing the wheel if they happen to have a round one. /Tony
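Tony's proposal follows from the RFC 4271 rule he quotes: an UPDATE for a prefix implicitly replaces the route previously received from that peer, so discarding a looped UPDATE outright leaves the old route in place. The sketch below contrasts the two behaviours on a toy per-peer Adj-RIB-In. Every type and function in it (`struct adj_rib_in`, `rib_update()`, `LOOP_DROP`, `LOOP_WITHDRAW`) is hypothetical, and the three loop flags only stand in for the AS-path, CLUSTER_LIST and ORIGINATOR_ID checks he lists rather than computing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RIB_MAX	8

/* Hypothetical per-peer Adj-RIB-In entry; not bgpd's data structures. */
struct rib_entry {
	const char	*prefix;
	const char	*aspath;	/* kept as text for illustration only */
	int		 used;
};

struct adj_rib_in {
	struct rib_entry e[RIB_MAX];
};

/* Results of the three loop checks; computing them is out of scope. */
struct update_info {
	int	aspath_loop;		/* own AS found in AS_PATH */
	int	cluster_loop;		/* own CLUSTER_ID found in CLUSTER_LIST */
	int	originator_loop;	/* ORIGINATOR_ID equals own router-id */
};

enum loop_policy {
	LOOP_DROP,	/* ignore the UPDATE, like the "goto done" in rde.c */
	LOOP_WITHDRAW	/* treat the looped UPDATE as a withdraw */
};

static struct rib_entry *
rib_find(struct adj_rib_in *rib, const char *prefix)
{
	size_t i;

	for (i = 0; i < RIB_MAX; i++)
		if (rib->e[i].used && strcmp(rib->e[i].prefix, prefix) == 0)
			return &rib->e[i];
	return NULL;
}

static void
rib_update(struct adj_rib_in *rib, const char *prefix, const char *aspath,
    const struct update_info *ui, enum loop_policy policy)
{
	struct rib_entry *re;
	size_t i;
	int loop;

	assert(rib != NULL && prefix != NULL && aspath != NULL && ui != NULL);
	re = rib_find(rib, prefix);
	loop = ui->aspath_loop || ui->cluster_loop || ui->originator_loop;

	if (loop) {
		if (policy == LOOP_WITHDRAW && re != NULL)
			re->used = 0;	/* treat as withdraw: drop old route */
		return;			/* LOOP_DROP: any earlier route stays */
	}

	if (re == NULL) {
		for (i = 0; i < RIB_MAX && re == NULL; i++)
			if (!rib->e[i].used)
				re = &rib->e[i];
		if (re == NULL)
			return;		/* toy table is full */
	}
	/* replacement route: same prefix, new attributes */
	re->prefix = prefix;
	re->aspath = aspath;
	re->used = 1;
}
```

Replaying the failure from AS65300's end of the standby session: once `65200 65100` is stored for 192.168.0.0/16 and the looped `65200 65300 65100` UPDATE arrives, `LOOP_DROP` keeps the stale `65200 65100` entry, while `LOOP_WITHDRAW` removes it so the prefix is no longer believed reachable via AS65200.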