fix possible nexthop list corruption in bgpd

Claudio Jeker Fri, 24 Jan 2020 17:25:14 -0800

Adam Thompson reported a bgpd crash he sees in 6.6. Using
kern.nosuidcoredump=3 he was able to get me a back trace of the crash.
The RDE chokes on the TAILQ_REMOVE in nexthop_runner() which indicates
that the nexthop_runners list gets corrupted.
After staring at the code for a while I realized that it is possible that
a nexthop is put on the runner list even though there are no objects to
process. In this case if nexthop_unref() is called before the
nexthop_runner() had a chance to run and remove that nexthop the queue
will be corrupt because of a use-after-free of that element.
Fix is simple, check before enqueuing a nexthop on the nexthop_runners
queue that next_prefix is not NULL.


The 2nd hunk just adds a debug log in the case where a prefix removal
actually completes the nexthop run.

OK?
-- 
:wq Claudio

Index: rde_rib.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_rib.c,v
retrieving revision 1.214
diff -u -p -r1.214 rde_rib.c
--- rde_rib.c   10 Jan 2020 14:52:57 -0000      1.214
+++ rde_rib.c   23 Jan 2020 03:59:09 -0000
@@ -1800,8 +1800,11 @@ nexthop_update(struct kroute_nexthop *ms
        nh->nexthop_netlen = msg->netlen;
 
        nh->next_prefix = LIST_FIRST(&nh->prefix_h);
-       TAILQ_INSERT_HEAD(&nexthop_runners, nh, runner_l);
-       log_debug("nexthop %s update starting", log_addr(&nh->exit_nexthop));
+       if (nh->next_prefix != NULL) {
+               TAILQ_INSERT_HEAD(&nexthop_runners, nh, runner_l);
+               log_debug("nexthop %s update starting",
+                   log_addr(&nh->exit_nexthop));
+       }
 }
 
 void
@@ -1860,8 +1863,11 @@ nexthop_unlink(struct prefix *p)
        if (p == p->nexthop->next_prefix) {
                p->nexthop->next_prefix = LIST_NEXT(p, entry.list.nexthop);
                /* remove nexthop from list if no prefixes left to update */
-               if (p->nexthop->next_prefix == NULL)
+               if (p->nexthop->next_prefix == NULL) {
                        TAILQ_REMOVE(&nexthop_runners, p->nexthop, runner_l);
+                       log_debug("nexthop %s update finished",
+                           log_addr(&p->nexthop->exit_nexthop));
+               }
        }
 
        p->flags &= ~PREFIX_NEXTHOP_LINKED;

fix possible nexthop list corruption in bgpd

Reply via email to