On 11/24/2015 05:11 AM, Vivek Venkatraman wrote:
> On Thu, Nov 19, 2015 at 12:42 PM, Christian Franke
> <[email protected] <mailto:[email protected]>> wrote:
>
> Thank you, we also saw the existing method of periodic scan for nexthop
> reachability (sending a message to zebra and waiting for response) as a
> significant impediment towards improved scaling/performance of BGP.

Indeed. As long as the issue of sending everything to Zebra again every
minute, it will be a significant improvement. :)

> These two issues you pointed out have been addressed in a subsequent
> patch that Donald is including into his 'take X' branch.

If you have a patch that introduces a bug that you know about and fixed
later in another commit, please squash them together into a single
commit, otherwise review becomes even more painful than it already is.

>     > +int
>     > +zebra_evaluate_rnh_table (vrf_id_t vrfid, int family)
>     > +{
>     > +  struct route_table *ptable;
>     > +  struct route_table *ntable;
>     > +  struct route_node *prn;
>     > +  struct route_node *nrn;
>     > +  struct rnh *rnh;
>     > +  struct zserv *client;
>     > +  struct listnode *node;
>     > +  struct rib *rib;
>     > +
>     > +  ntable = lookup_rnh_table(vrfid, family);
>     > +  if (!ntable)
>     > +    {
>     > +      zlog_debug("evaluate_rnh_table: rnh table not found\n");
>     > +      return -1;
>     > +    }
>     > +
>     > +  ptable = zebra_vrf_table(family2afi(family), SAFI_UNICAST, vrfid);
>     > +  if (!ptable)
>     > +    {
>     > +      zlog_debug("evaluate_rnh_table: prefix table not found\n");
>     > +      return -1;
>     > +    }
>     > +
>     > +  for (nrn = route_top (ntable); nrn; nrn = route_next (nrn))
>     > +    {
>     > +      if (!nrn->info)
>     > +       continue;
>     > +
>     > +      prn = route_node_match(ptable, &nrn->p);
>     > +      if (!prn)
>     > +     rib = NULL;
>     > +      else
>     > +     {
>     > +       RNODE_FOREACH_RIB(prn, rib)
>     > +         {
>     > +           if (CHECK_FLAG (rib->status, RIB_ENTRY_REMOVED))
>     > +             continue;
>     > +           if (CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELECTED))
>     > +             break;
>     > +         }
>     > +     }
> 
>     Shouldn't this have similar semantics like the resolution in
>     nexthop_active_ipv4/_ipv6 in zebra_rib.c? E.g. not matching against BGP
>     routes and progressing upwards to less specific routes if the result of
>     the longest prefix match has all its ribs with RIB_ENTRY_REMOVED and/or
>     w/o ZEBRA_FLAG_SELECTED? Not matching against BGP here is obviously
>     problematic because it is what is usually needed for iBGP, but not
>     necessarily what is needed e.g. for PIM, I think.
> 
>     Looking at e.g. this (admittedly somewhat synthetic) case:
> 
>     # show ip route
>     B>  10.10.0.0/16 <http://10.10.0.0/16> [200/0] via 100.100.253.3
>     (recursive), 00:00:33
>       *                        via 100.100.254.2, veth0, 00:00:33
>       *                        via 100.100.252.2, veth2, 00:00:33
>     O>* 100.100.253.0/24 <http://100.100.253.0/24> [110/20] via
>     100.100.254.2, veth0, 00:01:52
>       *                           via 100.100.252.2, veth2, 00:01:52
>     S   100.100.253.3/32 <http://100.100.253.3/32> [1/0] via 192.168.1.1
>     inactive
> 
>     It works absolutely fine with 0.99.24.1 while bgpd won't send the
>     10.10.0.0/16 <http://10.10.0.0/16> route down to zebra with this
>     patch applied, because zebra
>     claims that the nexthop is not usable.
> 
> 
> On the first point, there are scenarios where a BGP nexthop can be
> resolved over a BGP route. One example would be with BGP
> labeled-unicast. draft-ietf-rtgwg-bgp-routing-large-dc mentions a
> use-case too.
> 
> On the second point, it is a somewhat artificial scenario as you say. It
> is possible that the right change is to walk up the tree as done in
> zebra_rib.c. We'll get back on this.

Okay cool. You may want to look into the nexthop_active_ipv4/ipv6
methods whichever way you go because they really need to match the
semantics of the nexthop tracking. E.g. when the nexthop tracking comes
up with a nexthop that is resolved via BGP, it will currently not result
in a working FIB route, because bgpd will send the route to zebra, but
nexthop_active_ipv4/ipv6 won't resolve over the BGP route.

>     > diff --git a/zebra/zserv.c b/zebra/zserv.c
>     >
>     > +/* Nexthop register */
>     > +static int
>     > +zserv_nexthop_register (struct zserv *client, int sock, u_short
>     length, vrf_id_t vrf_id)
>     > +{
>     > +  struct rnh *rnh;
>     > +  struct stream *s;
>     > +  struct prefix p;
>     > +  u_short l = 0;
>     > +
>     > +  if (IS_ZEBRA_DEBUG_NHT)
>     > +    zlog_debug("nexthop_register msg from client %s: length=%d\n",
>     > +            zebra_route_string(client->proto), length);
>     > +
>     > +  s = client->ibuf;
>     > +
>     > +  while (l < length)
>     > +    {
>     > +      p.family = stream_getw(s);
>     > +      p.prefixlen = stream_getc(s);
>     > +      l += 3;
>     > +      stream_get(&p.u.prefix, s, PSIZE(p.prefixlen));
>     > +      l += PSIZE(p.prefixlen);
>     > +      rnh = zebra_add_rnh(&p, 0);
>     > +      zebra_add_rnh_client(rnh, client, vrf_id);
> 
>     What are the semantics here with respect to the vrf_id? It seems that
>     the rnh_table is only maintained for vrf 0, at least 0 is hardcoded in
>     most places.
> 
>     Also, it appears to me that a client that sends a nexthop_register with
>     e.g. vrf_id == 1 will receive one nexthop_update with vrf_id == 1
>     triggered from the zerba_add_rnh_client(..., vrf_id). This update does
>     possibly not list any resolving nexthops since it is sent before any
>     call to zebra_evaluate_rnh_table. From there on, any update will be sent
>     with vrf_id == 0 since the updates are originated by the calls to
>     zebra_evaluate_rnh_table which are always done with vrf_id 0.
> 
>     My expectation as a user of this API would be that if I send a
>     nexthop_register with vrf_id == 1, the lookup/tracking would be done in
>     the rib of vrf 1.
> 
> Yes, with the support for VRFs in BGP that we're working on, the nexthop
> tracking will be per VRF. Of course, Quagga currently does not have VRF
> support other than in Zebra, but that's what we're adding in BGP right
> now, hence some of the other mails on that topic.

This functionality is part of Zebra, so I would expect it to have the
same level of VRF awareness as other current Zebra code, which it
doesn't seem to have?

-Christian

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev

Reply via email to