On Thu, Apr 27, 2017 at 02:47:03PM -0700, William Tu wrote: > On Thu, Apr 27, 2017 at 11:17 AM, Ben Pfaff <b...@ovn.org> wrote: > > On Thu, Apr 27, 2017 at 10:17:20AM -0700, William Tu wrote: > >> On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <b...@ovn.org> wrote: > >> > On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote: > >> >> Before the patch, when users create bridge named "default", although > >> >> ovs-vsctl fails but vswitchd in the background will keep retrying it, > >> >> causing the systemd-udev to reach 100% cpu utilization. The reason is > >> >> due to frequent calls into kernel's register_netdevice function, > >> >> which will invoke several kernel elements who has registered on the > >> >> netdevice notifier chain. One of the notifier, the inetdev_event > >> >> rejects > >> >> this devname and register_netdevice fails. The patch prohibits creating > >> >> "default" bridge name. > >> >> > >> >> VMWare-BZ: #1842388 > >> >> Signed-off-by: William Tu <u9012...@gmail.com> > >> > > >> > It all seems very arbitrary. Do we understand why this is an invalid > >> > device name? > >> > >> Yes, when kernel is configured with CONFIG_SYSCTL, creating a new > >> netdev creates a dir in /proc/sys/net/ipv4/conf/<device name> > >> > >> The <device name> "default" and "all" is pre-existed when SYSCTL > >> starts (which means we should also prohibit "all") for default > >> property of the system's netdev. So sysctl prevents creating dev->name > >> is "default" or "all". A call stack is below if interested: > >> sysctl_dev_name_is_allowed > >> devinet_sysctl_register > >> inetdev_event > >> notifier_call_chain > >> raw_notifier_call_chain > >> call_netdevice_notifiers_info > >> register_netdevice > > > > Do we get the same behavior (100% CPU) if creating a bridge fails for > > other reasons? For example, it might fail because a network device > > already exists with the given name, or because the name is too long for > > a network device name. If we do get 100% CPU from such a failure, then > > we should fix the root of the problem instead of blacklisting particular > > names. > > There are two scenarios: > 1) if the bridge name exists, ex: eth0 > then "ovs-vsctl add-br eth0" fails due to EEXIST > 2) if the bridge name does not exists, but is "default" or "all" > then "ovs-vsctl add-br default" fails due to EINVAL > > From OVS's point of view it's the same, ovs-vsctl fails creating the > bridge, and keeps retry. From the whole system's point of view, (2) > has impact on other Linux subsystems, due to kernel netdev notifier > chain mechanism informing other subsystems when trying to add a new > device, while (1) fails pretty early in register_netdevice() and has > no impact.
OK, if I understand properly here, the problem is that asking the Linux kernel datapath to add a bridge named "default" does two things. First, it fails. Second, it triggers a notifier that causes OVS to try again. The second part is what makes this different from other failures and what makes OVS use 100% CPU in this case but not other cases. Is that right? If I'm right, then how about something like the following instead? It pushes this down to Linux-specific code, which seems better because this is a Linux-specific issue, and it thoroughly documents the problem. It compiles but I have not tested it. Thanks, Ben. diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 9ff1333f8e85..79e827303d07 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc. + * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -773,10 +773,28 @@ netdev_linux_alloc(void) return &netdev->up; } -static void -netdev_linux_common_construct(struct netdev_linux *netdev) -{ +static int +netdev_linux_common_construct(struct netdev *netdev_) +{ + /* Prevent any attempt to create (or open) a network device named "default" + * or "all". These device names are effectively reserved on Linux because + * /proc/sys/net/ipv4/conf/ always contains directories by these names. By + * itself this wouldn't call for any special treatment, but in practice if + * a program tries to create devices with these names, it causes the kernel + * to fire a "new device" notification event even though creation failed, + * and in turn that causes OVS to wake up and try to create them again, + * which ends up as a 100% CPU loop. */ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + const char *name = netdev_->name; + if (!strcmp(name, "default") || !strcmp(name, "all")) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); + VLOG_WARN_RL(&rl, "%s: Linux forbids network device with this name", + name); + return EINVAL; + } + ovs_mutex_init(&netdev->mutex); + return 0; } /* Creates system and internal devices. */ @@ -784,9 +802,10 @@ static int netdev_linux_construct(struct netdev *netdev_) { struct netdev_linux *netdev = netdev_linux_cast(netdev_); - int error; - - netdev_linux_common_construct(netdev); + int error = netdev_linux_common_construct(netdev_); + if (error) { + return error; + } error = get_flags(&netdev->up, &netdev->ifi_flags); if (error == ENODEV) { @@ -817,9 +836,11 @@ netdev_linux_construct_tap(struct netdev *netdev_) static const char tap_dev[] = "/dev/net/tun"; const char *name = netdev_->name; struct ifreq ifr; - int error; - netdev_linux_common_construct(netdev); + int error = netdev_linux_common_construct(netdev_); + if (error) { + return error; + } /* Open tap device. */ netdev->tap_fd = open(tap_dev, O_RDWR); _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev