Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Andy Gospodarek
On Wed, Jun 10, 2015 at 11:07:28PM -0700, Scott Feldman wrote:
 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
  Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
  reachable via an interface where carrier is off.  No action is taken,
  but additional flags are passed to userspace to indicate carrier status.
 
 Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
 and I'm wondering if this could be done without introducing a new flag
 and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
 on nh on dev link down, and clear on link up.  The sysctl knob would
 be something like nexthop_dead_on_linkdown, default off.  So
 basically expanding the ways RTNH_F_DEAD can be set.  That would
 simplify the patch set quite a bit and require no changes to iproute2.
 

You are absolutely correct that what you describe would be less churn to
userspace.  From a functionality standpoint that is close to what was
originally proposed, but Alex specifically did not like the behavioral
change to what having RTNH_F_DEAD set (at least that was what I
understood).

That was what made me make the move to add this additional flag that was
exported to userspace, so it was possible to differentiate the old dead
routes/nexthop functionality from those that were not going to be dead
due to link being down.  

At this point I think I prefer the additional data provided by the new
flag exported to userspace.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Scott Feldman
On Thu, Jun 11, 2015 at 4:23 AM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 On Wed, Jun 10, 2015 at 11:07:28PM -0700, Scott Feldman wrote:
 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
  Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
  reachable via an interface where carrier is off.  No action is taken,
  but additional flags are passed to userspace to indicate carrier status.

 Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
 and I'm wondering if this could be done without introducing a new flag
 and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
 on nh on dev link down, and clear on link up.  The sysctl knob would
 be something like nexthop_dead_on_linkdown, default off.  So
 basically expanding the ways RTNH_F_DEAD can be set.  That would
 simplify the patch set quite a bit and require no changes to iproute2.


 You are absolutely correct that what you describe would be less churn to
 userspace.  From a functionality standpoint that is close to what was
 originally proposed, but Alex specifically did not like the behavioral
 change to what having RTNH_F_DEAD set (at least that was what I
 understood).

 That was what made me make the move to add this additional flag that was
 exported to userspace, so it was possible to differentiate the old dead
 routes/nexthop functionality from those that were not going to be dead
 due to link being down.

Why does user space need to know _why_ a nh is dead?  User space
already knows the state (admin/link) of the nh dev.

I not seeing why user space needs to differentiate why nh is dead.
The kernel only needs to know if nh is dead to exclude nh from ecmp
selection.  Same for an offload device.

Can you explain how this new flag provides user space more information
than what's already available from RTM_NEWLINK notifications?

 At this point I think I prefer the additional data provided by the new
 flag exported to userspace.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Alexander Duyck



On 06/11/2015 04:23 AM, Andy Gospodarek wrote:

On Wed, Jun 10, 2015 at 11:07:28PM -0700, Scott Feldman wrote:

On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
go...@cumulusnetworks.com wrote:

Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.

Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
and I'm wondering if this could be done without introducing a new flag
and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
on nh on dev link down, and clear on link up.  The sysctl knob would
be something like nexthop_dead_on_linkdown, default off.  So
basically expanding the ways RTNH_F_DEAD can be set.  That would
simplify the patch set quite a bit and require no changes to iproute2.


You are absolutely correct that what you describe would be less churn to
userspace.  From a functionality standpoint that is close to what was
originally proposed, but Alex specifically did not like the behavioral
change to what having RTNH_F_DEAD set (at least that was what I
understood).

That was what made me make the move to add this additional flag that was
exported to userspace, so it was possible to differentiate the old dead
routes/nexthop functionality from those that were not going to be dead
due to link being down.
  this point I think I prefer the additional data provided by the new
flag exported to userspace.


I preferred the 2 flag solution as the original solution still required 
2 flags, it just only exposed 1 to user-space.  As a result it was much 
more error prone since it was fairly easy to get into a confused state 
about why the link was dead.


With the 2 flag solution it becomes much easier to sort out why the 
route is not functional and it is much easier to isolate for things like 
the sysctl which only disables the use of LINKDOWN and not DEAD.


- Alex
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Scott Feldman
On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
 reachable via an interface where carrier is off.  No action is taken,
 but additional flags are passed to userspace to indicate carrier status.

Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
and I'm wondering if this could be done without introducing a new flag
and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
on nh on dev link down, and clear on link up.  The sysctl knob would
be something like nexthop_dead_on_linkdown, default off.  So
basically expanding the ways RTNH_F_DEAD can be set.  That would
simplify the patch set quite a bit and require no changes to iproute2.

-scott
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Scott Feldman
On Thu, Jun 11, 2015 at 7:50 AM, Alexander Duyck
alexander.h.du...@redhat.com wrote:


 On 06/11/2015 04:23 AM, Andy Gospodarek wrote:

 On Wed, Jun 10, 2015 at 11:07:28PM -0700, Scott Feldman wrote:

 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:

 Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
 reachable via an interface where carrier is off.  No action is taken,
 but additional flags are passed to userspace to indicate carrier status.

 Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
 and I'm wondering if this could be done without introducing a new flag
 and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
 on nh on dev link down, and clear on link up.  The sysctl knob would
 be something like nexthop_dead_on_linkdown, default off.  So
 basically expanding the ways RTNH_F_DEAD can be set.  That would
 simplify the patch set quite a bit and require no changes to iproute2.

 You are absolutely correct that what you describe would be less churn to
 userspace.  From a functionality standpoint that is close to what was
 originally proposed, but Alex specifically did not like the behavioral
 change to what having RTNH_F_DEAD set (at least that was what I
 understood).

 That was what made me make the move to add this additional flag that was
 exported to userspace, so it was possible to differentiate the old dead
 routes/nexthop functionality from those that were not going to be dead
 due to link being down.
   this point I think I prefer the additional data provided by the new
 flag exported to userspace.


 I preferred the 2 flag solution as the original solution still required 2
 flags, it just only exposed 1 to user-space.  As a result it was much more
 error prone since it was fairly easy to get into a confused state about why
 the link was dead.

 With the 2 flag solution it becomes much easier to sort out why the route is
 not functional and it is much easier to isolate for things like the sysctl
 which only disables the use of LINKDOWN and not DEAD.

Ok, for the user troubleshooting, that make sense.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-11 Thread Dinesh Dutt
Yes, this is what I liked about the 2 flag solution too compared to
the original.

Dinesh

On Thu, Jun 11, 2015 at 7:50 AM, Alexander Duyck
alexander.h.du...@redhat.com wrote:


 On 06/11/2015 04:23 AM, Andy Gospodarek wrote:

 On Wed, Jun 10, 2015 at 11:07:28PM -0700, Scott Feldman wrote:

 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:

 Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
 reachable via an interface where carrier is off.  No action is taken,
 but additional flags are passed to userspace to indicate carrier status.

 Andy, it seems now RTNH_F_LINKDOWN and RTNH_F_DEAD are very similar
 and I'm wondering if this could be done without introducing a new flag
 and just use RTNH_F_DEAD.  The link change event would set RTNH_F_DEAD
 on nh on dev link down, and clear on link up.  The sysctl knob would
 be something like nexthop_dead_on_linkdown, default off.  So
 basically expanding the ways RTNH_F_DEAD can be set.  That would
 simplify the patch set quite a bit and require no changes to iproute2.

 You are absolutely correct that what you describe would be less churn to
 userspace.  From a functionality standpoint that is close to what was
 originally proposed, but Alex specifically did not like the behavioral
 change to what having RTNH_F_DEAD set (at least that was what I
 understood).

 That was what made me make the move to add this additional flag that was
 exported to userspace, so it was possible to differentiate the old dead
 routes/nexthop functionality from those that were not going to be dead
 due to link being down.
   this point I think I prefer the additional data provided by the new
 flag exported to userspace.


 I preferred the 2 flag solution as the original solution still required 2
 flags, it just only exposed 1 to user-space.  As a result it was much more
 error prone since it was fairly easy to get into a confused state about why
 the link was dead.

 With the 2 flag solution it becomes much easier to sort out why the route is
 not functional and it is much easier to isolate for things like the sysctl
 which only disables the use of LINKDOWN and not DEAD.

 - Alex
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-10 Thread Andy Gospodarek
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.

This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.

v2: Split out kernel functionality into 2 patches, this patch simply sets and
clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
 include/net/ip_fib.h   |  4 +--
 include/uapi/linux/rtnetlink.h |  3 +++
 net/ipv4/fib_frontend.c| 22 ++--
 net/ipv4/fib_semantics.c   | 59 --
 4 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..f73d27c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -305,9 +305,9 @@ void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
-int fib_sync_down_dev(struct net_device *dev, int force);
+int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
-int fib_sync_up(struct net_device *dev);
+int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 void fib_select_multipath(struct fib_result *res);
 
 /* Exported by fib_trie.c */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..8ab874a 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -338,6 +338,9 @@ struct rtnexthop {
 #define RTNH_F_PERVASIVE   2   /* Do recursive gateway lookup  */
 #define RTNH_F_ONLINK  4   /* Gateway is forced on link*/
 #define RTNH_F_OFFLOAD 8   /* offloaded route */
+#define RTNH_F_LINKDOWN16  /* carrier-down on nexthop */
+
+#define RTNH_F_COMPARE_MASK(RTNH_F_DEAD | RTNH_F_LINKDOWN) /* used as mask 
for route comparisons */
 
 /* Macros to handle hexthops */
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..872defb 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1063,9 +1063,9 @@ static void nl_fib_lookup_exit(struct net *net)
net-ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force)
+static void fib_disable_ip(struct net_device *dev, unsigned long event)
 {
-   if (fib_sync_down_dev(dev, force))
+   if (fib_sync_down_dev(dev, event))
fib_flush(dev_net(dev));
rt_cache_flush(dev_net(dev));
arp_ifdown(dev);
@@ -1081,7 +1081,7 @@ static int fib_inetaddr_event(struct notifier_block 
*this, unsigned long event,
case NETDEV_UP:
fib_add_ifaddr(ifa);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
atomic_inc(net-ipv4.dev_addr_genid);
rt_cache_flush(dev_net(dev));
@@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block 
*this, unsigned long event,
/* Last address was deleted from this interface.
 * Disable IP.
 */
-   fib_disable_ip(dev, 1);
+   fib_disable_ip(dev, event);
} else {
rt_cache_flush(dev_net(dev));
}
@@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, 
unsigned long event, vo
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct in_device *in_dev;
struct net *net = dev_net(dev);
+   unsigned flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, 2);
+   fib_disable_ip(dev, event);
rt_flush_dev(dev);
return NOTIFY_DONE;
}
@@ -1124,16 +1125,21 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
fib_add_ifaddr(ifa);
} endfor_ifa(in_dev);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
atomic_inc(net-ipv4.dev_addr_genid);
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, 0);
+   fib_disable_ip(dev, event);
break;
-   case NETDEV_CHANGEMTU:
case NETDEV_CHANGE:
+   flags = dev_get_flags(dev);
+   if (flags  (IFF_RUNNING|IFF_LOWER_UP))
+   fib_sync_up(dev, 

Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-10 Thread Scott Feldman
On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
go...@cumulusnetworks.com wrote:

 @@ -1129,7 +1142,15 @@ int fib_sync_down_dev(struct net_device *dev, int 
 force)
 dead++;
 else if (nexthop_nh-nh_dev == dev 
  nexthop_nh-nh_scope != scope) {
 -   nexthop_nh-nh_flags |= RTNH_F_DEAD;
 +   switch (event) {
 +   case NETDEV_DOWN:
 +   case NETDEV_UNREGISTER:
 +   nexthop_nh-nh_flags |= RTNH_F_DEAD;
 +   /* fall through */
 +   case NETDEV_CHANGE:
 +   nexthop_nh-nh_flags |= 
 RTNH_F_LINKDOWN;
 +   break;
 +   }
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
 spin_lock_bh(fib_multipath_lock);
 fi-fib_power -= nexthop_nh-nh_power;
 @@ -1139,14 +1160,22 @@ int fib_sync_down_dev(struct net_device *dev, int 
 force)
 dead++;
 }
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
 -   if (force  1  nexthop_nh-nh_dev == dev) {
 +   if (event == NETDEV_UNREGISTER  nexthop_nh-nh_dev 
 == dev) {
 dead = fi-fib_nhs;
 break;
 }
  #endif
 } endfor_nexthops(fi)
 if (dead == fi-fib_nhs) {
 -   fi-fib_flags |= RTNH_F_DEAD;
 +   switch (event) {
 +   case NETDEV_DOWN:
 +   case NETDEV_UNREGISTER:
 +   fi-fib_flags |= RTNH_F_DEAD;
 +   /* fall through */
 +   case NETDEV_CHANGE:
 +   fi-fib_flags |= RTNH_F_LINKDOWN;

RTNH_F_LINKDOWN is to mark linkdown nexthop devswhy is the route
fi being marked RTNH_F_LINKDOWN?

The RTNH_F_LINKDOWN comment says:

#define RTNH_F_LINKDOWN16  /* carrier-down on nexthop */

It's a per-nh flag, not per-route flag, correct?

Can you show an ECMP example with only a subset of the nexthops dev
linkdowned?  Show the ip route output after going thru some link
down/up events on some of the nexthops devs.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-10 Thread Andy Gospodarek
On Wed, Jun 10, 2015 at 07:53:59PM -0700, Scott Feldman wrote:
 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
 
  @@ -1129,7 +1142,15 @@ int fib_sync_down_dev(struct net_device *dev, int 
  force)
  dead++;
  else if (nexthop_nh-nh_dev == dev 
   nexthop_nh-nh_scope != scope) {
  -   nexthop_nh-nh_flags |= RTNH_F_DEAD;
  +   switch (event) {
  +   case NETDEV_DOWN:
  +   case NETDEV_UNREGISTER:
  +   nexthop_nh-nh_flags |= RTNH_F_DEAD;
  +   /* fall through */
  +   case NETDEV_CHANGE:
  +   nexthop_nh-nh_flags |= 
  RTNH_F_LINKDOWN;
  +   break;
  +   }
   #ifdef CONFIG_IP_ROUTE_MULTIPATH
  spin_lock_bh(fib_multipath_lock);
  fi-fib_power -= nexthop_nh-nh_power;
  @@ -1139,14 +1160,22 @@ int fib_sync_down_dev(struct net_device *dev, int 
  force)
  dead++;
  }
   #ifdef CONFIG_IP_ROUTE_MULTIPATH
  -   if (force  1  nexthop_nh-nh_dev == dev) {
  +   if (event == NETDEV_UNREGISTER  
  nexthop_nh-nh_dev == dev) {
  dead = fi-fib_nhs;
  break;
  }
   #endif
  } endfor_nexthops(fi)
  if (dead == fi-fib_nhs) {
  -   fi-fib_flags |= RTNH_F_DEAD;
  +   switch (event) {
  +   case NETDEV_DOWN:
  +   case NETDEV_UNREGISTER:
  +   fi-fib_flags |= RTNH_F_DEAD;
  +   /* fall through */
  +   case NETDEV_CHANGE:
  +   fi-fib_flags |= RTNH_F_LINKDOWN;
 
 RTNH_F_LINKDOWN is to mark linkdown nexthop devswhy is the route
 fi being marked RTNH_F_LINKDOWN?
 
 The RTNH_F_LINKDOWN comment says:
 
 #define RTNH_F_LINKDOWN16  /* carrier-down on nexthop */

This is done with the dead flag already.  I'm actually following the
precedent already set there.

 It's a per-nh flag, not per-route flag, correct?
 
 Can you show an ECMP example with only a subset of the nexthops dev
 linkdowned?  Show the ip route output after going thru some link
 down/up events on some of the nexthops devs.

Sure!  This is exactly what I've been using for testing.

# ip route show
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 70.0.0.2 dev p7p1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10
100.0.0.0/24
nexthop via 70.0.0.2  dev p7p1 weight 1
nexthop via 80.0.0.2  dev p8p1 weight 1
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# # take p8p1 link down
# ip route show 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown 
90.0.0.0/24 via 70.0.0.2 dev p7p1 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10 dead linkdown 
100.0.0.0/24 
nexthop via 70.0.0.2  dev p7p1 weight 1
nexthop via 80.0.0.2  dev p8p1 weight 1 dead linkdown
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# ip route get 100.0.0.2 
100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# ip route get 100.0.0.2 
100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# # take p8p1 link up
# ip route show
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 70.0.0.2 dev p7p1 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10 
100.0.0.0/24 
nexthop via 70.0.0.2  dev p7p1 weight 1
nexthop via 80.0.0.2  dev p8p1 weight 1
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# ip route show 
100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# ip route get 100.0.0.2 
100.0.0.2 via 80.0.0.2 dev p8p1  src 80.0.0.1 
cache 
# ip route get 100.0.0.2 
100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# ip route get 100.0.0.2 
100.0.0.2 via 80.0.0.2 dev p8p1  src 80.0.0.1 
cache 
# # you can see the round robin happening
# # take all ports p8p1 and p7p1 down
# ip route show
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 dead linkdown 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10 dead linkdown
100.0.0.0/24
nexthop via 70.0.0.2  dev p7p1 weight 1 dead linkdown

Re: [PATCH net-next 1/3 v3] net: track link-status of ipv4 nexthops

2015-06-10 Thread Scott Feldman
On Wed, Jun 10, 2015 at 8:28 PM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 On Wed, Jun 10, 2015 at 07:53:59PM -0700, Scott Feldman wrote:
 On Wed, Jun 10, 2015 at 7:37 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:

  @@ -1129,7 +1142,15 @@ int fib_sync_down_dev(struct net_device *dev, int 
  force)
  dead++;
  else if (nexthop_nh-nh_dev == dev 
   nexthop_nh-nh_scope != scope) {
  -   nexthop_nh-nh_flags |= RTNH_F_DEAD;
  +   switch (event) {
  +   case NETDEV_DOWN:
  +   case NETDEV_UNREGISTER:
  +   nexthop_nh-nh_flags |= 
  RTNH_F_DEAD;
  +   /* fall through */
  +   case NETDEV_CHANGE:
  +   nexthop_nh-nh_flags |= 
  RTNH_F_LINKDOWN;
  +   break;
  +   }
   #ifdef CONFIG_IP_ROUTE_MULTIPATH
  spin_lock_bh(fib_multipath_lock);
  fi-fib_power -= nexthop_nh-nh_power;
  @@ -1139,14 +1160,22 @@ int fib_sync_down_dev(struct net_device *dev, int 
  force)
  dead++;
  }
   #ifdef CONFIG_IP_ROUTE_MULTIPATH
  -   if (force  1  nexthop_nh-nh_dev == dev) {
  +   if (event == NETDEV_UNREGISTER  
  nexthop_nh-nh_dev == dev) {
  dead = fi-fib_nhs;
  break;
  }
   #endif
  } endfor_nexthops(fi)
  if (dead == fi-fib_nhs) {
  -   fi-fib_flags |= RTNH_F_DEAD;
  +   switch (event) {
  +   case NETDEV_DOWN:
  +   case NETDEV_UNREGISTER:
  +   fi-fib_flags |= RTNH_F_DEAD;
  +   /* fall through */
  +   case NETDEV_CHANGE:
  +   fi-fib_flags |= RTNH_F_LINKDOWN;

 RTNH_F_LINKDOWN is to mark linkdown nexthop devswhy is the route
 fi being marked RTNH_F_LINKDOWN?

 The RTNH_F_LINKDOWN comment says:

 #define RTNH_F_LINKDOWN16  /* carrier-down on nexthop */

 This is done with the dead flag already.  I'm actually following the
 precedent already set there.

 It's a per-nh flag, not per-route flag, correct?

 Can you show an ECMP example with only a subset of the nexthops dev
 linkdowned?  Show the ip route output after going thru some link
 down/up events on some of the nexthops devs.

 Sure!  This is exactly what I've been using for testing.

 # ip route show
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
 90.0.0.0/24 via 70.0.0.2 dev p7p1
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10
 100.0.0.0/24
 nexthop via 70.0.0.2  dev p7p1 weight 1
 nexthop via 80.0.0.2  dev p8p1 weight 1
 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
 # # take p8p1 link down
 # ip route show
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
 90.0.0.0/24 via 70.0.0.2 dev p7p1
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10 dead linkdown
 100.0.0.0/24
 nexthop via 70.0.0.2  dev p7p1 weight 1
 nexthop via 80.0.0.2  dev p8p1 weight 1 dead linkdown
 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
 # ip route get 100.0.0.2
 100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1
 cache
 # ip route get 100.0.0.2
 100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1
 cache
 # # take p8p1 link up
 # ip route show
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
 90.0.0.0/24 via 70.0.0.2 dev p7p1
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 10
 100.0.0.0/24
 nexthop via 70.0.0.2  dev p7p1 weight 1
 nexthop via 80.0.0.2  dev p8p1 weight 1
 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
 # ip route show
 100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1
 cache
 # ip route get 100.0.0.2
 100.0.0.2 via 80.0.0.2 dev p8p1  src 80.0.0.1
 cache
 # ip route get 100.0.0.2
 100.0.0.2 via 70.0.0.2 dev p7p1  src 70.0.0.1
 cache
 # ip route get 100.0.0.2
 100.0.0.2 via 80.0.0.2 dev p8p1  src 80.0.0.1
 cache
 # # you can see the round robin happening
 # # take all ports p8p1 and p7p1 down
 # ip route show
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 dead linkdown
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
 90.0.0.0/24 via 70.0.0.2 dev p7p1 dead linkdown
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric