On 11/24/2015 05:11 AM, Vivek Venkatraman wrote:
> On Thu, Nov 19, 2015 at 12:42 PM, Christian Franke
> <[email protected] <mailto:[email protected]>> wrote:
>
> Thank you, we also saw the existing method of periodic scan for nexthop
> reachability (sending a message to zebra and waiting for response) as a
> significant impediment towards improved scaling/performance of BGP.
Indeed. As long as the issue of sending everything to Zebra again every
minute is addressed, it will be a significant improvement. :)
> These two issues you pointed out have been addressed in a subsequent
> patch that Donald is including into his 'take X' branch.
If you have a patch that introduces a bug that you know about and fixed
later in another commit, please squash them together into a single
commit, otherwise review becomes even more painful than it already is.
> > +int
> > +zebra_evaluate_rnh_table (vrf_id_t vrfid, int family)
> > +{
> > + struct route_table *ptable;
> > + struct route_table *ntable;
> > + struct route_node *prn;
> > + struct route_node *nrn;
> > + struct rnh *rnh;
> > + struct zserv *client;
> > + struct listnode *node;
> > + struct rib *rib;
> > +
> > + ntable = lookup_rnh_table(vrfid, family);
> > + if (!ntable)
> > + {
> > + zlog_debug("evaluate_rnh_table: rnh table not found\n");
> > + return -1;
> > + }
> > +
> > + ptable = zebra_vrf_table(family2afi(family), SAFI_UNICAST, vrfid);
> > + if (!ptable)
> > + {
> > + zlog_debug("evaluate_rnh_table: prefix table not found\n");
> > + return -1;
> > + }
> > +
> > + for (nrn = route_top (ntable); nrn; nrn = route_next (nrn))
> > + {
> > + if (!nrn->info)
> > + continue;
> > +
> > + prn = route_node_match(ptable, &nrn->p);
> > + if (!prn)
> > + rib = NULL;
> > + else
> > + {
> > + RNODE_FOREACH_RIB(prn, rib)
> > + {
> > + if (CHECK_FLAG (rib->status, RIB_ENTRY_REMOVED))
> > + continue;
> > + if (CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELECTED))
> > + break;
> > + }
> > + }
>
> Shouldn't this have semantics similar to the resolution in
> nexthop_active_ipv4/_ipv6 in zebra_rib.c? E.g. not matching against BGP
> routes, and walking up to less specific routes if all ribs of the result
> of the longest prefix match are RIB_ENTRY_REMOVED and/or lack
> ZEBRA_FLAG_SELECTED? Whether to match against BGP here is a subtle
> point: excluding BGP is what is usually needed for iBGP, but not
> necessarily what is needed e.g. for PIM, I think.
>
> Looking at e.g. this (admittedly somewhat synthetic) case:
>
> # show ip route
> B>  10.10.0.0/16 [200/0] via 100.100.253.3 (recursive), 00:00:33
>   *                      via 100.100.254.2, veth0, 00:00:33
>   *                      via 100.100.252.2, veth2, 00:00:33
> O>* 100.100.253.0/24 [110/20] via 100.100.254.2, veth0, 00:01:52
>   *                           via 100.100.252.2, veth2, 00:01:52
> S   100.100.253.3/32 [1/0] via 192.168.1.1 inactive
>
> It works absolutely fine with 0.99.24.1, while with this patch applied
> bgpd won't send the 10.10.0.0/16 route down to zebra, because zebra
> claims that the nexthop is not usable.
>
>
> On the first point, there are scenarios where a BGP nexthop can be
> resolved over a BGP route. One example would be with BGP
> labeled-unicast. draft-ietf-rtgwg-bgp-routing-large-dc mentions a
> use-case too.
>
> On the second point, it is a somewhat artificial scenario as you say. It
> is possible that the right change is to walk up the tree as done in
> zebra_rib.c. We'll get back on this.
Okay, cool. Whichever way you go, you may want to look into the
nexthop_active_ipv4/ipv6 functions, because they really need to match
the semantics of the nexthop tracking. E.g. when the nexthop tracking
comes up with a nexthop that is resolved via BGP, it will currently not
result in a working FIB route: bgpd will send the route to zebra, but
nexthop_active_ipv4/ipv6 won't resolve over the BGP route.
> > diff --git a/zebra/zserv.c b/zebra/zserv.c
> >
> > +/* Nexthop register */
> > +static int
> > +zserv_nexthop_register (struct zserv *client, int sock, u_short length, vrf_id_t vrf_id)
> > +{
> > + struct rnh *rnh;
> > + struct stream *s;
> > + struct prefix p;
> > + u_short l = 0;
> > +
> > + if (IS_ZEBRA_DEBUG_NHT)
> > + zlog_debug("nexthop_register msg from client %s: length=%d\n",
> > + zebra_route_string(client->proto), length);
> > +
> > + s = client->ibuf;
> > +
> > + while (l < length)
> > + {
> > + p.family = stream_getw(s);
> > + p.prefixlen = stream_getc(s);
> > + l += 3;
> > + stream_get(&p.u.prefix, s, PSIZE(p.prefixlen));
> > + l += PSIZE(p.prefixlen);
> > + rnh = zebra_add_rnh(&p, 0);
> > + zebra_add_rnh_client(rnh, client, vrf_id);
>
> What are the semantics here with respect to the vrf_id? It seems that
> the rnh_table is only maintained for vrf 0, at least 0 is hardcoded in
> most places.
>
> Also, it appears to me that a client that sends a nexthop_register with
> e.g. vrf_id == 1 will receive one nexthop_update with vrf_id == 1,
> triggered from the zebra_add_rnh_client(..., vrf_id). This update
> possibly does not list any resolving nexthops, since it is sent before
> any call to zebra_evaluate_rnh_table. From there on, any update will be
> sent with vrf_id == 0, since the updates are originated by the calls to
> zebra_evaluate_rnh_table, which are always done with vrf_id 0.
>
> My expectation as a user of this API would be that if I send a
> nexthop_register with vrf_id == 1, the lookup/tracking would be done in
> the rib of vrf 1.
>
> Yes, with the support for VRFs in BGP that we're working on, the nexthop
> tracking will be per VRF. Of course, Quagga currently does not have VRF
> support other than in Zebra, but that's what we're adding in BGP right
> now, hence some of the other mails on that topic.
This functionality is part of Zebra, so I would expect it to have the
same level of VRF awareness as other current Zebra code, which it
doesn't seem to have?
-Christian
_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev