Martin -

You are correct.  On my flight home last night (4am is the best time to get
home!), I found the zebra multipath patch that I forgot to apply.
Unfortunately for me, the patch depends on code that was removed because
no one was using it on mainline.  So I was working on figuring out whether
I could write code around it.

Updates in the next few days and thanks for testing.

donald

On Sat, Nov 7, 2015 at 7:49 AM, Martin Winter <[email protected]> wrote:

> Tested this, but fails to work for me.
>
> I’ve tested it with and without the link-local patch (Patch 2/4 in your
> series) on top of current Quagga master.
>
> BGP does the multipath, but Zebra still only uses one path (the 2nd).
> It looks to me like zebra replaces the 1st route instead of adding the
> nexthop.
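To make the difference concrete, here is a minimal sketch in C of "replace" versus "add nexthop" handling for a repeated route add. Everything in it (struct toy_route, toy_route_replace, toy_route_add_nexthop, MAX_NEXTHOPS) is hypothetical and made up for illustration. It is not zebra's RIB code and says nothing definite about where the real problem lies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NEXTHOPS 8   /* arbitrary limit for the sketch */

/* Hypothetical, simplified route entry; NOT Quagga's struct rib. */
struct toy_route {
    char prefix[64];
    char nexthops[MAX_NEXTHOPS][64];
    int nexthop_num;
};

/* Replace semantics: each new add discards the nexthops already stored,
 * so only the most recent nexthop survives. */
static void toy_route_replace(struct toy_route *r, const char *nh)
{
    assert(r != NULL && nh != NULL);
    r->nexthop_num = 0;
    snprintf(r->nexthops[0], sizeof r->nexthops[0], "%s", nh);
    r->nexthop_num = 1;
}

/* Multipath semantics: append the nexthop unless it is already present
 * or the table is full.  Returns 1 if the nexthop was added, 0 otherwise. */
static int toy_route_add_nexthop(struct toy_route *r, const char *nh)
{
    assert(r != NULL && nh != NULL);
    for (int i = 0; i < r->nexthop_num; i++)
        if (strcmp(r->nexthops[i], nh) == 0)
            return 0;
    if (r->nexthop_num >= MAX_NEXTHOPS)
        return 0;
    snprintf(r->nexthops[r->nexthop_num], sizeof r->nexthops[0], "%s", nh);
    r->nexthop_num++;
    return 1;
}
```

Calling toy_route_replace twice with the two link-local nexthops from the logs leaves one nexthop (the second), which matches what "sh ipv6 route" shows. Calling toy_route_add_nexthop twice leaves both, which is what ECMP would need.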
>
> If you want to try what I did, check out branch test/ipv6_ecmp from my
> bgptool at https://git-us.netdef.org/projects/NETDEF/repos/bgptool
>
> See README_ipv6_ecmp.md for details on how I test:
>
> https://git-us.netdef.org/projects/NETDEF/repos/bgptool/browse/README_ipv6_ecmp.md?at=refs%2Fheads%2Ftest%2Fipv6_ecmp
>
>
> Below is what I see.
>
> Regards,
>    Martin Winter
>
>
>
> dut# sh bgp
>> BGP table version is 0, local router ID is 192.168.1.101
>> Status codes: s suppressed, d damped, h history, * valid, > best, =
>> multipath,
>>             i internal, r RIB-failure, S Stale, R Removed
>> Origin codes: i - IGP, e - EGP, ? - incomplete
>>
>>  Network          Next Hop            Metric LocPrf Weight Path
>> *=i3ffe:5:0:a::/64  fc00:192:168:2::1
>>                                          100   1000      0 65001 65100 i
>> *>i                 fc00:192:168:1::1
>>                                          100   1000      0 65001 65100 i
>>
>> Total number of prefixes 1
>> dut# sh bgp 3ffe:5:0:a::/64
>> BGP routing table entry for 3ffe:5:0:a::/64
>> Paths: (2 available, best #2, table Default-IP-Routing-Table)
>> Not advertised to any peer
>> 65001 65100
>>   fc00:192:168:2::1 from fc00:192:168:2::1 (192.168.2.1)
>>   (fe80::21c:42ff:fe82:2070)
>>     Origin IGP, metric 100, localpref 1000, valid, internal, multipath
>>     Last update: Sat Nov  7 03:45:09 2015
>>
>> 65001 65100
>>   fc00:192:168:1::1 from fc00:192:168:1::1 (192.168.1.1)
>>   (fe80::21c:42ff:feda:3815)
>>     Origin IGP, metric 100, localpref 1000, valid, internal, multipath,
>> best
>>     Last update: Sat Nov  7 03:45:08 2015
>>
>> dut# sh ipv6 route
>> Codes: K - kernel route, C - connected, S - static, R - RIPng,
>>      O - OSPFv6, I - IS-IS, B - BGP, A - Babel,
>>      > - selected route, * - FIB route
>>
>> C>* ::1/128 is directly connected, lo
>> B>* 3ffe:5:0:a::/64 [200/100] via fe80::21c:42ff:fe82:2070, eth2, 00:00:41
>> C>* fc00:192:168:1::/64 is directly connected, eth1
>> C>* fc00:192:168:2::/64 is directly connected, eth2
>> C * fe80::/64 is directly connected, eth1
>> C * fe80::/64 is directly connected, eth2
>>
>
> Looking at the bgpd log, I see:
>
> 2015/11/07 03:45:08 BGP: fc00:192:168:1::1 [FSM] Timer (routeadv timer
> expire)
> 2015/11/07 03:45:08 BGP: fc00:192:168:1::1 rcvd UPDATE w/ attr: , origin
> i, mp_nexthop fc00:192:168:1::1(fe80::21c:42ff:feda:3815), localpref 1000,
> metric 100, path 65001 65100
> 2015/11/07 03:45:08 BGP: fc00:192:168:1::1 rcvd 3ffe:5:0:a::/64
> 2015/11/07 03:45:08 BGP: Zebra send: IPv6 route add 3ffe:5:0:a::/64
> nexthop fe80::21c:42ff:feda:3815 metric 100
> 2015/11/07 03:45:09 BGP: fc00:192:168:2::1 [FSM] Timer (routeadv timer
> expire)
> 2015/11/07 03:45:09 BGP: fc00:192:168:2::1 rcvd UPDATE w/ attr: , origin
> i, mp_nexthop fc00:192:168:2::1(fe80::21c:42ff:fe82:2070), localpref 1000,
> metric 100, path 65001 65100
> 2015/11/07 03:45:09 BGP: fc00:192:168:2::1 rcvd 3ffe:5:0:a::/64
> 2015/11/07 03:45:09 BGP: 3ffe:5:0:a::/64 add mpath nexthop 0.0.0.0 peer
> (null)
> 2015/11/07 03:45:09 BGP: Zebra send: IPv6 route add 3ffe:5:0:a::/64
> nexthop fe80::21c:42ff:fe82:2070 metric 100
> 2015/11/07 03:45:13 BGP: fc00:192:168:1::1 [FSM] Timer (routeadv timer
> expire)
>
> and in zebra log:
>
> 2015/11/07 03:45:08 ZEBRA: rib_delnode: 3ffe:5:0:a::/64 vrf 0: rn
> 0xce1330, rib 0xcd5580, removing
> 2015/11/07 03:45:08 ZEBRA: rib_process: 3ffe:5:0:a::/64 vrf 0: Removing
> existing route, fib 0xcd5580
> 2015/11/07 03:45:08 ZEBRA: rib_process: 3ffe:5:0:a::/64 vrf 0: Adding
> route, select 0xce1f50
> 2015/11/07 03:45:08 ZEBRA: rib_process: 3ffe:5:0:a::/64 vrf 0: Deleting
> fib 0xcd5580, rn 0xce1330
> 2015/11/07 03:45:08 ZEBRA: rib_unlink: 3ffe:5:0:a::/64 vrf 0: rn 0xce1330,
> rib 0xcd5580
> 2015/11/07 03:45:09 ZEBRA: zebra message comes from socket [15]
> 2015/11/07 03:45:09 ZEBRA: zebra message received [ZEBRA_IPV6_ROUTE_ADD]
> 63 in VRF 0
> 2015/11/07 03:45:09 ZEBRA: rib_link: 3ffe:5:0:a::/64 vrf 0: rn 0xce1330,
> rib 0xce13f0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: called rib_addnode (0xce1330,
> 0xce13f0) on new RIB entry
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: dumping RIB entry 0xce13f0 for
> 3ffe:5:0:a::/64 vrf 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: refcnt == 0, uptime ==
> 1446896709, type == 9, table == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: metric == 100, distance == 200,
> flags == 9, status == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: nexthop_num == 1,
> nexthop_active_num == 0, nexthop_fib_num == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: NH fe80::21c:42ff:fe82:2070 with
> flags
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: dump complete
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: calling rib_delnode (0xce1330,
> 0xce1f50) on existing RIB entry
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: dumping RIB entry 0xce1f50 for
> 3ffe:5:0:a::/64 vrf 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: refcnt == 0, uptime ==
> 1446896708, type == 9, table == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: metric == 100, distance == 200,
> flags == 25, status == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: nexthop_num == 1,
> nexthop_active_num == 1, nexthop_fib_num == 0
> 2015/11/07 03:45:09 ZEBRA: rib_add_ipv6: NH fe80::21c:42ff:feda:3815 with
> flags ACTIVE FIB
>
>
>
>
> On 6 Nov 2015, at 8:57, Donald Sharp wrote:
>
> From: Ayan Banerjee <[email protected]>
>>
>> Signed-off-by: Ayan Banerjee <[email protected]>
>> Signed-off-by: Dinesh G Dutt <[email protected]>
>> Reviewed-by: Scott Feldman <[email protected]>
>> ---
>> bgpd/bgp_main.c  |    2 +
>> bgpd/bgp_vty.c   |    6 +++
>> bgpd/bgp_zebra.c |  130
>> +++++++++++++++++++++++++++++++++++++++++++++---------
>> bgpd/bgp_zebra.h |    2 +
>> 4 files changed, 120 insertions(+), 20 deletions(-)
>>
>> diff --git a/bgpd/bgp_main.c b/bgpd/bgp_main.c
>> index 7c2988c..13e0dea 100644
>> --- a/bgpd/bgp_main.c
>> +++ b/bgpd/bgp_main.c
>> @@ -300,6 +300,8 @@ bgp_exit (int status)
>>   zclient_free (zlookup);
>> if (bgp_nexthop_buf)
>>   stream_free (bgp_nexthop_buf);
>> +  if (bgp_ifindices_buf)
>> +    stream_free (bgp_ifindices_buf);
>>
>> /* reverse bgp_master_init */
>> if (master)
>> diff --git a/bgpd/bgp_vty.c b/bgpd/bgp_vty.c
>> index 4fd255f..3f2c49a 100644
>> --- a/bgpd/bgp_vty.c
>> +++ b/bgpd/bgp_vty.c
>> @@ -9179,12 +9179,18 @@ bgp_vty_init (void)
>> install_element (BGP_IPV4_NODE, &bgp_maxpaths_cmd);
>> install_element (BGP_IPV4_NODE, &no_bgp_maxpaths_cmd);
>> install_element (BGP_IPV4_NODE, &no_bgp_maxpaths_arg_cmd);
>> +  install_element (BGP_IPV6_NODE, &bgp_maxpaths_cmd);
>> +  install_element (BGP_IPV6_NODE, &no_bgp_maxpaths_cmd);
>> +  install_element (BGP_IPV6_NODE, &no_bgp_maxpaths_arg_cmd);
>> install_element (BGP_NODE, &bgp_maxpaths_ibgp_cmd);
>> install_element (BGP_NODE, &no_bgp_maxpaths_ibgp_cmd);
>> install_element (BGP_NODE, &no_bgp_maxpaths_ibgp_arg_cmd);
>> install_element (BGP_IPV4_NODE, &bgp_maxpaths_ibgp_cmd);
>> install_element (BGP_IPV4_NODE, &no_bgp_maxpaths_ibgp_cmd);
>> install_element (BGP_IPV4_NODE, &no_bgp_maxpaths_ibgp_arg_cmd);
>> +  install_element (BGP_IPV6_NODE, &bgp_maxpaths_ibgp_cmd);
>> +  install_element (BGP_IPV6_NODE, &no_bgp_maxpaths_ibgp_cmd);
>> +  install_element (BGP_IPV6_NODE, &no_bgp_maxpaths_ibgp_arg_cmd);
>>
>> /* "timers bgp" commands. */
>> install_element (BGP_NODE, &bgp_timers_cmd);
>> diff --git a/bgpd/bgp_zebra.c b/bgpd/bgp_zebra.c
>> index 2616351..5e25da9 100644
>> --- a/bgpd/bgp_zebra.c
>> +++ b/bgpd/bgp_zebra.c
>> @@ -45,6 +45,7 @@ struct in_addr router_id_zebra;
>>
>> /* Growable buffer for nexthops sent to zebra */
>> struct stream *bgp_nexthop_buf = NULL;
>> +struct stream *bgp_ifindices_buf = NULL;
>>
>> /* Router-id update message from zebra. */
>> static int
>> @@ -674,6 +675,7 @@ bgp_zebra_announce (struct prefix *p, struct bgp_info
>> *info, struct bgp *bgp, sa
>> struct peer *peer;
>> struct bgp_info *mpinfo;
>> size_t oldsize, newsize;
>> +  u_int32_t nhcount;
>>
>> if (zclient->sock < 0)
>>   return;
>> @@ -694,26 +696,27 @@ bgp_zebra_announce (struct prefix *p, struct
>> bgp_info *info, struct bgp *bgp, sa
>>     || CHECK_FLAG (peer->flags, PEER_FLAG_DISABLE_CONNECTED_CHECK))
>>   SET_FLAG (flags, ZEBRA_FLAG_INTERNAL);
>>
>> -  /* resize nexthop buffer size if necessary */
>> -  if ((oldsize = stream_get_size (bgp_nexthop_buf)) <
>> -      (sizeof (struct in_addr *) * (bgp_info_mpath_count (info) + 1)))
>> -    {
>> -      newsize = (sizeof (struct in_addr *) * (bgp_info_mpath_count
>> (info) + 1));
>> -      newsize = stream_resize (bgp_nexthop_buf, newsize);
>> -      if (newsize == oldsize)
>> -       {
>> -         zlog_err ("can't resize nexthop buffer");
>> -         return;
>> -       }
>> -    }
>> -
>> -  stream_reset (bgp_nexthop_buf);
>> +  nhcount = 1 + bgp_info_mpath_count (info);
>>
>> if (p->family == AF_INET)
>>   {
>>     struct zapi_ipv4 api;
>>     struct in_addr *nexthop;
>>
>> +      /* resize nexthop buffer size if necessary */
>> +      if ((oldsize = stream_get_size (bgp_nexthop_buf)) <
>> +          (sizeof (struct in_addr *) * nhcount))
>> +        {
>> +          newsize = (sizeof (struct in_addr *) * nhcount);
>> +          newsize = stream_resize (bgp_nexthop_buf, newsize);
>> +          if (newsize == oldsize)
>> +            {
>> +                 zlog_err ("can't resize nexthop buffer");
>> +                 return;
>> +            }
>> +        }
>> +      stream_reset (bgp_nexthop_buf);
>> +
>>     api.vrf_id = VRF_DEFAULT;
>>     api.flags = flags;
>>     nexthop = &info->attr->nexthop;
>> @@ -729,7 +732,7 @@ bgp_zebra_announce (struct prefix *p, struct bgp_info
>> *info, struct bgp *bgp, sa
>>     api.message = 0;
>>     api.safi = safi;
>>     SET_FLAG (api.message, ZAPI_MESSAGE_NEXTHOP);
>> -      api.nexthop_num = 1 + bgp_info_mpath_count (info);
>> +      api.nexthop_num = nhcount;
>>     api.nexthop = (struct in_addr **)STREAM_DATA (bgp_nexthop_buf);
>>     api.ifindex_num = 0;
>>     SET_FLAG (api.message, ZAPI_MESSAGE_METRIC);
>> @@ -763,16 +766,46 @@ bgp_zebra_announce (struct prefix *p, struct
>> bgp_info *info, struct bgp *bgp, sa
>>                      (struct prefix_ipv4 *) p, &api);
>>   }
>> #ifdef HAVE_IPV6
>> +
>> /* We have to think about a IPv6 link-local address curse. */
>> if (p->family == AF_INET6)
>>   {
>>     unsigned int ifindex;
>>     struct in6_addr *nexthop;
>>     struct zapi_ipv6 api;
>> +      int valid_nh_count = 0;
>> +
>> +      /* resize nexthop buffer size if necessary */
>> +      if ((oldsize = stream_get_size (bgp_nexthop_buf)) <
>> +          (sizeof (struct in6_addr *) * nhcount))
>> +        {
>> +          newsize = (sizeof (struct in6_addr *) * nhcount);
>> +          newsize = stream_resize (bgp_nexthop_buf, newsize);
>> +          if (newsize == oldsize)
>> +            {
>> +              zlog_err ("can't resize nexthop buffer");
>> +              return;
>> +            }
>> +        }
>> +      stream_reset (bgp_nexthop_buf);
>> +
>> +      /* resize ifindices buffer size if necessary */
>> +      if ((oldsize = stream_get_size (bgp_ifindices_buf)) <
>> +          (sizeof (unsigned int) * nhcount))
>> +        {
>> +          newsize = (sizeof (unsigned int) * nhcount);
>> +          newsize = stream_resize (bgp_ifindices_buf, newsize);
>> +          if (newsize == oldsize)
>> +            {
>> +              zlog_err ("can't resize ifindices buffer");
>> +              return;
>> +            }
>> +        }
>> +      stream_reset (bgp_ifindices_buf);
>>
>>     ifindex = 0;
>>     nexthop = NULL;
>> -
>> +
>>     assert (info->attr->extra);
>>
>>     /* Only global address nexthop exists. */
>> @@ -803,6 +836,62 @@ bgp_zebra_announce (struct prefix *p, struct
>> bgp_info *info, struct bgp *bgp, sa
>>           else if (info->peer->nexthop.ifp)
>>             ifindex = info->peer->nexthop.ifp->ifindex;
>>         }
>> +      stream_put (bgp_nexthop_buf, &nexthop, sizeof (struct in6_addr *));
>> +      stream_put (bgp_ifindices_buf, &ifindex, sizeof (unsigned int));
>> +      valid_nh_count++;
>> +
>> +      for (mpinfo = bgp_info_mpath_first (info); mpinfo;
>> +           mpinfo = bgp_info_mpath_next (mpinfo))
>> +       {
>> +          /* Only global address nexthop exists. */
>> +          if (mpinfo->attr->extra->mp_nexthop_len == 16)
>> +            {
>> +              nexthop = &mpinfo->attr->extra->mp_nexthop_global;
>> +            }
>> +          /* If both global and link-local address present. */
>> +                 if (mpinfo->attr->extra->mp_nexthop_len == 32)
>> +            {
>> +              /* Workaround for Cisco's nexthop bug.  */
>> +              if (IN6_IS_ADDR_UNSPECIFIED
>> (&mpinfo->attr->extra->mp_nexthop_global)
>> +                  && mpinfo->peer->su_remote->sa.sa_family == AF_INET6)
>> +                {
>> +                   nexthop = &mpinfo->peer->su_remote->sin6.sin6_addr;
>> +                }
>> +              else
>> +                {
>> +                  nexthop = &mpinfo->attr->extra->mp_nexthop_local;
>> +               }
>> +
>> +              if (mpinfo->peer->nexthop.ifp)
>> +                {
>> +                  ifindex = mpinfo->peer->nexthop.ifp->ifindex;
>> +                }
>> +            }
>> +             if (nexthop == NULL)
>> +               {
>> +                 continue;
>> +               }
>> +
>> +          if (IN6_IS_ADDR_LINKLOCAL (nexthop) && ! ifindex)
>> +               {
>> +                 if (mpinfo->peer->ifname)
>> +                {
>> +                   ifindex = if_nametoindex (mpinfo->peer->ifname);
>> +               }
>> +                 else if (mpinfo->peer->nexthop.ifp)
>> +                       {
>> +                          ifindex = mpinfo->peer->nexthop.ifp->ifindex;
>> +                       }
>> +                }
>> +             if (ifindex == 0)
>> +               {
>> +                 continue;
>> +               }
>> +
>> +          stream_put (bgp_nexthop_buf, &nexthop, sizeof (struct in6_addr
>> *));
>> +          stream_put (bgp_ifindices_buf, &ifindex, sizeof (unsigned
>> int));
>> +          valid_nh_count++;
>> +       }
>>
>>     /* Make Zebra API structure. */
>>     api.vrf_id = VRF_DEFAULT;
>> @@ -811,11 +900,11 @@ bgp_zebra_announce (struct prefix *p, struct
>> bgp_info *info, struct bgp *bgp, sa
>>     api.message = 0;
>>     api.safi = safi;
>>     SET_FLAG (api.message, ZAPI_MESSAGE_NEXTHOP);
>> -      api.nexthop_num = 1;
>> -      api.nexthop = &nexthop;
>> +      api.nexthop_num = valid_nh_count;
>> +      api.nexthop = (struct in6_addr **)STREAM_DATA (bgp_nexthop_buf);
>>     SET_FLAG (api.message, ZAPI_MESSAGE_IFINDEX);
>> -      api.ifindex_num = 1;
>> -      api.ifindex = &ifindex;
>> +      api.ifindex_num = valid_nh_count;
>> +      api.ifindex = (unsigned int *)STREAM_DATA (bgp_ifindices_buf);
>>     SET_FLAG (api.message, ZAPI_MESSAGE_METRIC);
>>     api.metric = info->attr->med;
>>
>> @@ -1115,4 +1204,5 @@ bgp_zebra_init (void)
>> #endif /* HAVE_IPV6 */
>>
>> bgp_nexthop_buf = stream_new(BGP_NEXTHOP_BUF_SIZE);
>> +  bgp_ifindices_buf = stream_new(BGP_IFINDICES_BUF_SIZE);
>> }
>> diff --git a/bgpd/bgp_zebra.h b/bgpd/bgp_zebra.h
>> index 8099193..466758e 100644
>> --- a/bgpd/bgp_zebra.h
>> +++ b/bgpd/bgp_zebra.h
>> @@ -22,8 +22,10 @@ Boston, MA 02111-1307, USA.  */
>> #define _QUAGGA_BGP_ZEBRA_H
>>
>> #define BGP_NEXTHOP_BUF_SIZE (8 * sizeof (struct in_addr *))
>> +#define BGP_IFINDICES_BUF_SIZE (8 * sizeof (unsigned int))
>>
>> extern struct stream *bgp_nexthop_buf;
>> +extern struct stream *bgp_ifindices_buf;
>>
>> extern void bgp_zebra_init (void);
>> extern int bgp_if_update_all (void);
>> --
>> 1.7.10.4
>>
>>
>> _______________________________________________
>> Quagga-dev mailing list
>> [email protected]
>> https://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>
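For reference, the IPv6 part of the quoted patch fills two parallel buffers (nexthop pointers and ifindices) and counts only the entries it considers usable. A rough standalone sketch of that idea follows. The names struct toy_path and toy_collect_nexthops are hypothetical, and the selection rule is simplified to "prefer the link-local address when present, skip entries with no nexthop or no ifindex". It does not reproduce the Cisco workaround or the ifname lookup from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>

/* Hypothetical candidate path; NOT Quagga's struct bgp_info. */
struct toy_path {
    const struct in6_addr *global;     /* may be NULL */
    const struct in6_addr *linklocal;  /* may be NULL */
    unsigned int ifindex;              /* 0 if unknown */
};

/* Fill the parallel arrays nexthops[] and ifindices[] (each with room for
 * cap entries) from the candidate paths.  Returns the number of valid
 * entries written, which is the count to use for both arrays. */
static size_t toy_collect_nexthops(const struct toy_path *paths, size_t npaths,
                                   const struct in6_addr **nexthops,
                                   unsigned int *ifindices, size_t cap)
{
    size_t valid = 0;

    assert(paths != NULL || npaths == 0);
    for (size_t i = 0; i < npaths && valid < cap; i++) {
        /* State is local to each candidate, so nothing carries over
         * from the previous iteration. */
        const struct in6_addr *nh =
            paths[i].linklocal ? paths[i].linklocal : paths[i].global;
        unsigned int ifindex = paths[i].ifindex;

        if (nh == NULL)
            continue;   /* no usable nexthop */
        if (ifindex == 0)
            continue;   /* no interface to pair with the nexthop */

        nexthops[valid] = nh;
        ifindices[valid] = ifindex;
        valid++;
    }
    return valid;
}
```

One difference from the quoted loop, as far as I can tell from the diff: there nexthop and ifindex are declared once outside the loop and are not reset per iteration, so a candidate could inherit values from the previous one. The sketch keeps them per candidate instead.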
