On Thu, Apr 27, 2017 at 02:47:03PM -0700, William Tu wrote:
> On Thu, Apr 27, 2017 at 11:17 AM, Ben Pfaff <b...@ovn.org> wrote:
> > On Thu, Apr 27, 2017 at 10:17:20AM -0700, William Tu wrote:
> >> On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <b...@ovn.org> wrote:
> >> > On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
> >> >> Before the patch, when users create bridge named "default", although
> >> >> ovs-vsctl fails but vswitchd in the background will keep retrying it,
> >> >> causing the systemd-udev to reach 100% cpu utilization.  The reason is
> >> >> due to frequent calls into kernel's register_netdevice function,
> >> >> which will invoke several kernel elements who has registered on the
> >> >> netdevice notifier chain.  One of the notifier, the inetdev_event 
> >> >> rejects
> >> >> this devname and register_netdevice fails.  The patch prohibits creating
> >> >> "default" bridge name.
> >> >>
> >> >> VMWare-BZ: #1842388
> >> >> Signed-off-by: William Tu <u9012...@gmail.com>
> >> >
> >> > It all seems very arbitrary. Do we understand why this is an invalid
> >> > device name?
> >>
> >> Yes, when kernel is configured with CONFIG_SYSCTL, creating a new
> >> netdev creates a dir in /proc/sys/net/ipv4/conf/<device name>
> >>
> >> The <device name> "default" and "all" is pre-existed when SYSCTL
> >> starts (which means we should also prohibit "all") for default
> >> property of the system's netdev. So sysctl prevents creating dev->name
> >> is "default" or "all". A call stack is below if interested:
> >> sysctl_dev_name_is_allowed
> >>    devinet_sysctl_register
> >>      inetdev_event
> >>        notifier_call_chain
> >>          raw_notifier_call_chain
> >>            call_netdevice_notifiers_info
> >>              register_netdevice
> >
> > Do we get the same behavior (100% CPU) if creating a bridge fails for
> > other reasons?  For example, it might fail because a network device
> > already exists with the given name, or because the name is too long for
> > a network device name.  If we do get 100% CPU from such a failure, then
> > we should fix the root of the problem instead of blacklisting particular
> > names.
> 
> There are two scenarios:
> 1) if the bridge name exists, ex: eth0
> then "ovs-vsctl add-br eth0" fails due to EEXIST
> 2) if the bridge name does not exists, but is "default" or "all"
> then "ovs-vsctl add-br default" fails due to EINVAL
> 
> From OVS's point of view it's the same, ovs-vsctl fails creating the
> bridge, and keeps retry. From the whole system's point of view, (2)
> has impact on other Linux subsystems, due to kernel netdev notifier
> chain mechanism informing other subsystems when trying to add a new
> device, while (1) fails pretty early in register_netdevice() and has
> no impact.

OK, if I understand properly here, the problem is that asking the Linux
kernel datapath to add a bridge named "default" does two things.  First,
it fails.  Second, it triggers a notifier that causes OVS to try again.
The second part is what makes this different from other failures and
what makes OVS use 100% CPU in this case but not other cases.  Is that
right?

If I'm right, then how about something like the following instead?  It
pushes this down to Linux-specific code, which seems better because this
is a Linux-specific issue, and it thoroughly documents the problem.  It
compiles but I have not tested it.

Thanks,

Ben.

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 9ff1333f8e85..79e827303d07 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
+ * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, 
Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -773,10 +773,28 @@ netdev_linux_alloc(void)
     return &netdev->up;
 }
 
-static void
-netdev_linux_common_construct(struct netdev_linux *netdev)
-{
+static int
+netdev_linux_common_construct(struct netdev *netdev_)
+{
+    /* Prevent any attempt to create (or open) a network device named "default"
+     * or "all".  These device names are effectively reserved on Linux because
+     * /proc/sys/net/ipv4/conf/ always contains directories by these names.  By
+     * itself this wouldn't call for any special treatment, but in practice if
+     * a program tries to create devices with these names, it causes the kernel
+     * to fire a "new device" notification event even though creation failed,
+     * and in turn that causes OVS to wake up and try to create them again,
+     * which ends up as a 100% CPU loop. */
+    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+    const char *name = netdev_->name;
+    if (!strcmp(name, "default") || !strcmp(name, "all")) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        VLOG_WARN_RL(&rl, "%s: Linux forbids network device with this name",
+                     name);
+        return EINVAL;
+    }
+
     ovs_mutex_init(&netdev->mutex);
+    return 0;
 }
 
 /* Creates system and internal devices. */
@@ -784,9 +802,10 @@ static int
 netdev_linux_construct(struct netdev *netdev_)
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
-    int error;
-
-    netdev_linux_common_construct(netdev);
+    int error = netdev_linux_common_construct(netdev_);
+    if (error) {
+        return error;
+    }
 
     error = get_flags(&netdev->up, &netdev->ifi_flags);
     if (error == ENODEV) {
@@ -817,9 +836,11 @@ netdev_linux_construct_tap(struct netdev *netdev_)
     static const char tap_dev[] = "/dev/net/tun";
     const char *name = netdev_->name;
     struct ifreq ifr;
-    int error;
 
-    netdev_linux_common_construct(netdev);
+    int error = netdev_linux_common_construct(netdev_);
+    if (error) {
+        return error;
+    }
 
     /* Open tap device. */
     netdev->tap_fd = open(tap_dev, O_RDWR);
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to