On Tue, Nov 24, 2015 at 11:27 AM, Christian Franke < [email protected]> wrote:
> On 11/24/2015 05:11 AM, Vivek Venkatraman wrote: > > On Thu, Nov 19, 2015 at 12:42 PM, Christian Franke > > <[email protected] <mailto:[email protected]>> > wrote: > > > > Thank you, we also saw the existing method of periodic scan for nexthop > > reachability (sending a message to zebra and waiting for response) as a > > significant impediment towards improved scaling/performance of BGP. > > Indeed. As long as the issue of sending everything to Zebra again every > minute, it will be a significant improvement. :) > > > These two issues you pointed out have been addressed in a subsequent > > patch that Donald is including into his 'take X' branch. > > If you have a patch that introduces a bug that you know about and fixed > later in another commit, please squash them together into a single > commit, otherwise review becomes even more painful than it already is. > > > > +int > > > +zebra_evaluate_rnh_table (vrf_id_t vrfid, int family) > > > +{ > > > + struct route_table *ptable; > > > + struct route_table *ntable; > > > + struct route_node *prn; > > > + struct route_node *nrn; > > > + struct rnh *rnh; > > > + struct zserv *client; > > > + struct listnode *node; > > > + struct rib *rib; > > > + > > > + ntable = lookup_rnh_table(vrfid, family); > > > + if (!ntable) > > > + { > > > + zlog_debug("evaluate_rnh_table: rnh table not found\n"); > > > + return -1; > > > + } > > > + > > > + ptable = zebra_vrf_table(family2afi(family), SAFI_UNICAST, > vrfid); > > > + if (!ptable) > > > + { > > > + zlog_debug("evaluate_rnh_table: prefix table not found\n"); > > > + return -1; > > > + } > > > + > > > + for (nrn = route_top (ntable); nrn; nrn = route_next (nrn)) > > > + { > > > + if (!nrn->info) > > > + continue; > > > + > > > + prn = route_node_match(ptable, &nrn->p); > > > + if (!prn) > > > + rib = NULL; > > > + else > > > + { > > > + RNODE_FOREACH_RIB(prn, rib) > > > + { > > > + if (CHECK_FLAG (rib->status, RIB_ENTRY_REMOVED)) > > > + continue; > > > + if (CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELECTED)) > > > + break; > > > + } > > > + } > > > > Shouldn't this have similar semantics like the resolution in > > nexthop_active_ipv4/_ipv6 in zebra_rib.c? E.g. not matching against > BGP > > routes and progressing upwards to less specific routes if the result > of > > the longest prefix match has all its ribs with RIB_ENTRY_REMOVED > and/or > > w/o ZEBRA_FLAG_SELECTED? Not matching against BGP here is obviously > > problematic because it is what is usually needed for iBGP, but not > > necessarily what is needed e.g. for PIM, I think. > > > > Looking at e.g. this (admittedly somewhat synthetic) case: > > > > # show ip route > > B> 10.10.0.0/16 <http://10.10.0.0/16> [200/0] via 100.100.253.3 > > (recursive), 00:00:33 > > * via 100.100.254.2, veth0, 00:00:33 > > * via 100.100.252.2, veth2, 00:00:33 > > O>* 100.100.253.0/24 <http://100.100.253.0/24> [110/20] via > > 100.100.254.2, veth0, 00:01:52 > > * via 100.100.252.2, veth2, 00:01:52 > > S 100.100.253.3/32 <http://100.100.253.3/32> [1/0] via 192.168.1.1 > > inactive > > > > It works absolutely fine with 0.99.24.1 while bgpd won't send the > > 10.10.0.0/16 <http://10.10.0.0/16> route down to zebra with this > > patch applied, because zebra > > claims that the nexthop is not usable. > > > > > > On the first point, there are scenarios where a BGP nexthop can be > > resolved over a BGP route. One example would be with BGP > > labeled-unicast. draft-ietf-rtgwg-bgp-routing-large-dc mentions a > > use-case too. > > > > On the second point, it is a somewhat artificial scenario as you say. It > > is possible that the right change is to walk up the tree as done in > > zebra_rib.c. We'll get back on this. > > Okay cool. You may want to look into the nexthop_active_ipv4/ipv6 > methods whichever way you go because they really need to match the > semantics of the nexthop tracking. E.g. when the nexthop tracking comes > up with a nexthop that is resolved via BGP, it will currently not result > in a working FIB route, because bgpd will send the route to zebra, but > nexthop_active_ipv4/ipv6 won't resolve over the BGP route. > On this point, I believe there isn't any real need to restrict BGP nexthops to be resolved only over non-BGP routes. Our patch (on Donald's 'take X' branch) has also removed this check from nexthop_active_ipv4/ipv6. I think my colleague Daniel also had more to add on this aspect. > > > diff --git a/zebra/zserv.c b/zebra/zserv.c > > > > > > +/* Nexthop register */ > > > +static int > > > +zserv_nexthop_register (struct zserv *client, int sock, u_short > > length, vrf_id_t vrf_id) > > > +{ > > > + struct rnh *rnh; > > > + struct stream *s; > > > + struct prefix p; > > > + u_short l = 0; > > > + > > > + if (IS_ZEBRA_DEBUG_NHT) > > > + zlog_debug("nexthop_register msg from client %s: length=%d\n", > > > + zebra_route_string(client->proto), length); > > > + > > > + s = client->ibuf; > > > + > > > + while (l < length) > > > + { > > > + p.family = stream_getw(s); > > > + p.prefixlen = stream_getc(s); > > > + l += 3; > > > + stream_get(&p.u.prefix, s, PSIZE(p.prefixlen)); > > > + l += PSIZE(p.prefixlen); > > > + rnh = zebra_add_rnh(&p, 0); > > > + zebra_add_rnh_client(rnh, client, vrf_id); > > > > What are the semantics here with respect to the vrf_id? It seems that > > the rnh_table is only maintained for vrf 0, at least 0 is hardcoded > in > > most places. > > > > Also, it appears to me that a client that sends a nexthop_register > with > > e.g. vrf_id == 1 will receive one nexthop_update with vrf_id == 1 > > triggered from the zerba_add_rnh_client(..., vrf_id). This update > does > > possibly not list any resolving nexthops since it is sent before any > > call to zebra_evaluate_rnh_table. From there on, any update will be > sent > > with vrf_id == 0 since the updates are originated by the calls to > > zebra_evaluate_rnh_table which are always done with vrf_id 0. > > > > My expectation as a user of this API would be that if I send a > > nexthop_register with vrf_id == 1, the lookup/tracking would be done > in > > the rib of vrf 1. > > > > Yes, with the support for VRFs in BGP that we're working on, the nexthop > > tracking will be per VRF. Of course, Quagga currently does not have VRF > > support other than in Zebra, but that's what we're adding in BGP right > > now, hence some of the other mails on that topic. > > This functionality is part of Zebra, so I would expect it to have the > same level of VRF awareness as other current Zebra code, which it > doesn't seem to have? > Yes, the original NHT implementation submitted by Cumulus was before Zebra VRF support was incorporated. NHT will be made VRF-aware as part of patches we will post extending the VRF support to BGP and to support name-based registration. Right now, since BGP does not have VRF support, it would only register for the default VRF and the notification for this would be fine with the current NHT implementation. > > -Christian >
_______________________________________________ Quagga-dev mailing list [email protected] https://lists.quagga.net/mailman/listinfo/quagga-dev
