Re: bgpd better nexthop handling

Claudio Jeker Mon, 15 Aug 2022 06:20:45 -0700

On Mon, Aug 15, 2022 at 03:00:36PM +0200, Theo Buehler wrote:
> On Mon, Aug 15, 2022 at 02:26:32PM +0200, Claudio Jeker wrote:
> > On Fri, Aug 12, 2022 at 03:30:16PM +0200, Claudio Jeker wrote:
> > > On Fri, Aug 12, 2022 at 03:06:14PM +0200, Theo Buehler wrote:
> > > > On Fri, Aug 12, 2022 at 12:43:09PM +0200, Claudio Jeker wrote:
> > > > > There is currently a race in bgpd when multiple nexthop become 
> > > > > invalid at
> > > > > the same time. The problem is that the decision process may select an
> > > > > alternative path that also has a no longer valid nexthop but the 
> > > > > process
> > > > > that does all the adjustments did not reach that prefix yet.
> > > > > The main issue here is that the RDE uses the true_nexthop to 
> > > > > communicate
> > > > > this change but the true_nexthop is 0 or :: in this case.
> > > > > 
> > > > > This diff solves the issue by no longer using true_nexthop in the 
> > > > > kroute
> > > > > message but instead have the kroute code do the lookup instead. The 
> > > > > state
> > > > > in kroute is always up to date so the system knows if the nexthop is 
> > > > > valid
> > > > > or not and either issues a change or remove depending on that.
> > > > > 
> > > > > With this the rde no longer uses the true_nexthop (it is only there 
> > > > > to be
> > > > > reported to bgpctl). It only cares about exit_nexthop, validity and 
> > > > > the
> > > > > connected network info. All of those should not cause any problems 
> > > > > during
> > > > > nexthop flips.
> > > > 
> > > > This reads fine. If you fix the grammar of the comment, then it's ok, 
> > > > e.g.:
> > > > 
> > > > > +      * Ignore the nexthop for VPN routes. The gateway is a forced
> > > > 
> > > > s/a //
> > > > 
> > > > > +      * to an mpe(4) interface route using an MPLS label.
> > > 
> > > Something is still missing in this diff. Since changing a route does not
> > > update the kroutes. Guess the kroute code needs to do this proper now, I
> > > thought I could still use the old slow path for this but it looks like the
> > > update is surpressed. 
> > > 
> > > Will look more into this and come up with a new version (with the grammar
> > > fix from above).
> > 
> > Here we go. The bug I was chasing was actually introduced a few weeks ago.
> > The problem is that when you RTM_CHANGE a route knexthop_track() fails to
> > notice the change (okr and kr remain the same since the route did not
> > change) and so no update is issued.
> > Fix this by introducing knexthop_update() that just forces an update if
> > the kroute covering this knexthop is modified. The rest of the diff is
> > unmodified.
> > 
> 
> ok tb
> 
> I have one small question:
> 
> > +/*
> > + * Called on route change.
> > + */
> > +void
> > +knexthop_update(struct ktable *kt, struct kroute_full *kf)
> > +{
> > +   struct knexthop *kn;
> > +
> > +   RB_FOREACH(kn, knexthop_tree, KT2KNT(kt))
> > +           if (prefix_compare(&kf->prefix, &kn->nexthop,
> > +               kf->prefixlen) == 0)
> > +                   knexthop_send_update(kn);
> 
> Can there actually be multiple knexthops with the same prefix or could we
> return early here?


No we can't. For example in the validate via default route case most
nexthops will track the default route. So we need to walk the full list.

The problem with this diff is that it may call knexthop_send_update() too
often. Again in the default route case all nexthops will match in
prefix_compare() so all are sent to the RDE even if they are backed by
some more specific route. In the RDE the nexthop list is walked but that's
about it. Results in a fair but of imsg chatter but that's about it.
Not ideal but better update too often than not often enough.

 
> > +}
> > +
> >  void
> >  knexthop_send_update(struct knexthop *kn)
> >  {
> 

-- 
:wq Claudio

Re: bgpd better nexthop handling

Reply via email to