I'm running this on one router without seeing any problems yet,
however it does not have any graceful-restart peers so it's not exactly
a great test.
Has anyone else tried this at all yet?
On 2014/01/02 00:03, Claudio Jeker wrote:
> There is a somewhat critical bug in bgpd which got hit by local friends
> a few weeks ago. The problem is that on session with the graceful restart
> capability stale routes are not properly flushed. This can lead to bad
> FIB entries and black holes. This happens when a router does not reconnect
> before the timeout fires.
>
> As a quick fix disabling the graceful reload capability with
> "announce restart no" will skip the probelmatic code and no stale routes
> will remain. The session must be cleared first before the change becomes
> active.
>
> The proper fix is attached. in short this removes a check for the
> CAPA_GR_FORWARD flag in the timeout and IMSG_SESSION_RESTARTED handler.
> CAPA_GR_RESTARTING is indicating that bgpd is currently doing a graceful
> restart for this neighbor and it is set for any session that must issue
> a flush of stale routes (either after a successful restart or because of
> hitting a timeout or change of configuration). So whenever the SE issues
> an IMSG_SESSION_FLUSH or IMSG_SESSION_RESTARTED message the
> CAPA_GR_RESTARTING flag needs to be cleared.
>
> Please test and/or verify that my logic is now correct.
> --
> :wq Claudio
>
> PS: I'm away for a bit more then a week so I will not be able to answer
> any questions anytime soon.
>
> Index: rde.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.326
> diff -u -p -r1.326 rde.c
> --- rde.c 13 Nov 2013 20:41:01 -0000 1.326
> +++ rde.c 1 Jan 2014 02:30:04 -0000
> @@ -3282,6 +3282,7 @@ peer_stale(u_int32_t id, u_int8_t aid)
> return;
> }
>
> + /* flush the now even staler routes out */
> if (peer->staletime[aid])
> peer_flush(peer, aid);
> peer->staletime[aid] = now = time(NULL);
> @@ -3317,7 +3318,14 @@ peer_recv_eor(struct rde_peer *peer, u_i
> {
> peer->prefix_rcvd_eor++;
>
> - /* First notify SE to remove possible race with the timeout. */
> + /*
> + * First notify SE to avert a possible race with the restart timeout.
> + * If the timeout fires before this imsg is processed by the SE it will
> + * result in the same operation since the timeout issues a FLUSH which
> + * does the same as the RESTARTED action (flushing stale routes).
> + * The logic in the SE is so that only one of FLUSH or RESTARTED will
> + * be sent back to the RDE and so peer_flush is only called once.
> + */
> if (imsg_compose(ibuf_se, IMSG_SESSION_RESTARTED, peer->conf.id,
> 0, -1, &aid, sizeof(aid)) == -1)
> fatal("imsg_compose error");
> Index: session.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
> retrieving revision 1.333
> diff -u -p -r1.333 session.c
> --- session.c 13 Nov 2013 20:41:01 -0000 1.333
> +++ session.c 1 Jan 2014 02:36:31 -0000
> @@ -1729,12 +1729,11 @@ session_graceful_stop(struct peer *p)
>
> for (i = 0; i < AID_MAX; i++) {
> /*
> - * Only flush if the peer is restarting and the peer indicated
> - * it hold the forwarding state. In all other cases the
> - * session was already flushed when the session came up.
> + * Only flush if the peer is restarting and the timeout fired.
> + * In all other cases the session was already flushed when the
> + * session went down or when the new open message was parsed.
> */
> - if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING &&
> - p->capa.neg.grestart.flags[i] & CAPA_GR_FORWARD) {
> + if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
> log_peer_warnx(&p->conf, "graceful restart of %s, "
> "time-out, flushing", aid2str(i));
> if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
> @@ -2520,12 +2519,16 @@ capa_neg_calc(struct peer *p)
> */
>
> for (i = 0; i < AID_MAX; i++) {
> + int8_t negflags;
> +
> /* disable GR if the AFI/SAFI is not present */
> if (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
> p->capa.neg.mp[i] == 0)
> p->capa.peer.grestart.flags[i] = 0; /* disable */
> /* look at current GR state and decide what to do */
> - if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
> + negflags = p->capa.neg.grestart.flags[i];
> + p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
> + if (negflags & CAPA_GR_RESTARTING) {
> if (!(p->capa.peer.grestart.flags[i] &
> CAPA_GR_FORWARD)) {
> if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
> @@ -2533,12 +2536,10 @@ capa_neg_calc(struct peer *p)
> return (-1);
> log_peer_warnx(&p->conf, "graceful restart of "
> "%s, not restarted, flushing", aid2str(i));
> - }
> - p->capa.neg.grestart.flags[i] =
> - p->capa.peer.grestart.flags[i] | CAPA_GR_RESTARTING;
> - } else
> - p->capa.neg.grestart.flags[i] =
> - p->capa.peer.grestart.flags[i];
> + } else
> + p->capa.neg.grestart.flags[i] |=
> + CAPA_GR_RESTARTING;
> + }
> }
> p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;
> p->capa.neg.grestart.restart = p->capa.peer.grestart.restart;
> @@ -2911,9 +2912,7 @@ session_dispatch_imsg(struct imsgbuf *ib
> if (aid >= AID_MAX)
> fatalx("IMSG_SESSION_RESTARTED: bad AID");
> if (p->capa.neg.grestart.flags[aid] &
> - CAPA_GR_RESTARTING &&
> - p->capa.neg.grestart.flags[aid] &
> - CAPA_GR_FORWARD) {
> + CAPA_GR_RESTARTING) {
> log_peer_warnx(&p->conf,
> "graceful restart of %s finished",
> aid2str(aid));
>