There is a somewhat critical bug in bgpd which got hit by local friends
a few weeks ago. The problem is that on session with the graceful restart
capability stale routes are not properly flushed. This can lead to bad
FIB entries and black holes. This happens when a router does not reconnect
before the timeout fires.
As a quick fix disabling the graceful reload capability with
"announce restart no" will skip the probelmatic code and no stale routes
will remain. The session must be cleared first before the change becomes
active.
The proper fix is attached. in short this removes a check for the
CAPA_GR_FORWARD flag in the timeout and IMSG_SESSION_RESTARTED handler.
CAPA_GR_RESTARTING is indicating that bgpd is currently doing a graceful
restart for this neighbor and it is set for any session that must issue
a flush of stale routes (either after a successful restart or because of
hitting a timeout or change of configuration). So whenever the SE issues
an IMSG_SESSION_FLUSH or IMSG_SESSION_RESTARTED message the
CAPA_GR_RESTARTING flag needs to be cleared.
Please test and/or verify that my logic is now correct.
--
:wq Claudio
PS: I'm away for a bit more then a week so I will not be able to answer
any questions anytime soon.
Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.326
diff -u -p -r1.326 rde.c
--- rde.c 13 Nov 2013 20:41:01 -0000 1.326
+++ rde.c 1 Jan 2014 02:30:04 -0000
@@ -3282,6 +3282,7 @@ peer_stale(u_int32_t id, u_int8_t aid)
return;
}
+ /* flush the now even staler routes out */
if (peer->staletime[aid])
peer_flush(peer, aid);
peer->staletime[aid] = now = time(NULL);
@@ -3317,7 +3318,14 @@ peer_recv_eor(struct rde_peer *peer, u_i
{
peer->prefix_rcvd_eor++;
- /* First notify SE to remove possible race with the timeout. */
+ /*
+ * First notify SE to avert a possible race with the restart timeout.
+ * If the timeout fires before this imsg is processed by the SE it will
+ * result in the same operation since the timeout issues a FLUSH which
+ * does the same as the RESTARTED action (flushing stale routes).
+ * The logic in the SE is so that only one of FLUSH or RESTARTED will
+ * be sent back to the RDE and so peer_flush is only called once.
+ */
if (imsg_compose(ibuf_se, IMSG_SESSION_RESTARTED, peer->conf.id,
0, -1, &aid, sizeof(aid)) == -1)
fatal("imsg_compose error");
Index: session.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.333
diff -u -p -r1.333 session.c
--- session.c 13 Nov 2013 20:41:01 -0000 1.333
+++ session.c 1 Jan 2014 02:36:31 -0000
@@ -1729,12 +1729,11 @@ session_graceful_stop(struct peer *p)
for (i = 0; i < AID_MAX; i++) {
/*
- * Only flush if the peer is restarting and the peer indicated
- * it hold the forwarding state. In all other cases the
- * session was already flushed when the session came up.
+ * Only flush if the peer is restarting and the timeout fired.
+ * In all other cases the session was already flushed when the
+ * session went down or when the new open message was parsed.
*/
- if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING &&
- p->capa.neg.grestart.flags[i] & CAPA_GR_FORWARD) {
+ if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
log_peer_warnx(&p->conf, "graceful restart of %s, "
"time-out, flushing", aid2str(i));
if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
@@ -2520,12 +2519,16 @@ capa_neg_calc(struct peer *p)
*/
for (i = 0; i < AID_MAX; i++) {
+ int8_t negflags;
+
/* disable GR if the AFI/SAFI is not present */
if (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
p->capa.neg.mp[i] == 0)
p->capa.peer.grestart.flags[i] = 0; /* disable */
/* look at current GR state and decide what to do */
- if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
+ negflags = p->capa.neg.grestart.flags[i];
+ p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
+ if (negflags & CAPA_GR_RESTARTING) {
if (!(p->capa.peer.grestart.flags[i] &
CAPA_GR_FORWARD)) {
if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
@@ -2533,12 +2536,10 @@ capa_neg_calc(struct peer *p)
return (-1);
log_peer_warnx(&p->conf, "graceful restart of "
"%s, not restarted, flushing", aid2str(i));
- }
- p->capa.neg.grestart.flags[i] =
- p->capa.peer.grestart.flags[i] | CAPA_GR_RESTARTING;
- } else
- p->capa.neg.grestart.flags[i] =
- p->capa.peer.grestart.flags[i];
+ } else
+ p->capa.neg.grestart.flags[i] |=
+ CAPA_GR_RESTARTING;
+ }
}
p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;
p->capa.neg.grestart.restart = p->capa.peer.grestart.restart;
@@ -2911,9 +2912,7 @@ session_dispatch_imsg(struct imsgbuf *ib
if (aid >= AID_MAX)
fatalx("IMSG_SESSION_RESTARTED: bad AID");
if (p->capa.neg.grestart.flags[aid] &
- CAPA_GR_RESTARTING &&
- p->capa.neg.grestart.flags[aid] &
- CAPA_GR_FORWARD) {
+ CAPA_GR_RESTARTING) {
log_peer_warnx(&p->conf,
"graceful restart of %s finished",
aid2str(aid));