Re: [OpenWrt-Devel] [PATCH] netifd: Improve handling of device rename

2020-06-06 Thread Hans Dedecker
On Wed, Mar 11, 2020 at 2:13 PM Kristian Evensen
 wrote:
>
> After an interface has been renamed on a "fast" device (for example
> x86_64), the interface is sometimes not handled correctly by netifd.
> Looking in the logs, I see the following messages when renaming fails:
>
> Wed Mar 11 08:52:44 2020 kern.info kernel: [68383.522038] igb :03:00.0 
> nlw_1: renamed from eth2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: __device_add_user(710): Add 
> user for device 'nlw_1', refcount=2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(413): Claim 
> Network device nlw_1, new active count: 2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(432): claim 
> Network device nlw_1 failed: -1
>
> Instrumenting netifd further reveals that there is a race between the hotplug
> "@move" event and ioctl(SIOCGIFINDEX). When the above error happens, the
> ioctl-call fails with ENODEV. Looking closer at the kernel code, it seems the
> hotplug-event is triggered before the renaming is completed. The easiest way 
> to
> trigger the race, is if an interface name with the old name is not handled by
> netifd and an interface with the new name is. If only the old name is handled,
> or both names, I was not able to provoke the race.
>
> When the renaming is complete, a NEWLINK-message is generated. This patch
> modifies the logic surrounding renaming, so that we wait for the
> NEWLINK-message before marking an interface as present. The changes made are:
>
> * We only handle move-events for interfaces we know, and we return after
> device has been set as not present.
> * When we receive a NEWLINK message for an interface managed by netifd,
> we call device_set_present. device_set_present is guarded by the same
> checks as the add hotplug-event.
>
> After these changes, renaming works properly on both "fast" and "slow"
> devices. Removing a device is also handled correctly.
>
> Signed-off-by: Kristian Evensen 
> ---
>  system-linux.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/system-linux.c b/system-linux.c
> index d533be8..aff67d6 100644
> --- a/system-linux.c
> +++ b/system-linux.c
> @@ -590,6 +590,11 @@ static int cb_rtnl_event(struct nl_msg *msg, void *arg)
> if (!system_get_dev_sysctl("/sys/class/net/%s/carrier", dev->ifname, 
> buf, sizeof(buf)))
> link_state = strtoul(buf, NULL, 0);
>
> +   if (dev->type == _device_type &&
> +   !system_if_force_external(dev->ifname) &&
> +   !dev->present)
> +   device_set_present(dev, true);
> +
> device_set_link(dev, link_state ? true : false);
>
>  out:
> @@ -652,13 +657,15 @@ handle_hotplug_msg(char *data, int size)
>  move:
> dev = device_find(interface_old);
> if (!dev)
> -   goto found;
> +   return;
>
> if (dev->type != _device_type)
> goto found;
>
> device_set_present(dev, false);
>
> +   return;
> +
>  found:
> dev = device_find(interface);
> if (!dev)
> --
> 2.20.1
Patch applied with minor modification
(https://git.openwrt.org/?p=project/netifd.git;a=commitdiff;h=51e9fb8151e8f2c16ac1400bf4d64147ee7e8f5a)

Thx
Hans
>
>
> ___
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: [OpenWrt-Devel] [PATCH] netifd: Improve handling of device rename

2020-05-16 Thread Kristian Evensen
Hi,

On Wed, Mar 11, 2020 at 2:13 PM Kristian Evensen
 wrote:
>
> After an interface has been renamed on a "fast" device (for example
> x86_64), the interface is sometimes not handled correctly by netifd.
> Looking in the logs, I see the following messages when renaming fails:
>
> Wed Mar 11 08:52:44 2020 kern.info kernel: [68383.522038] igb :03:00.0 
> nlw_1: renamed from eth2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: __device_add_user(710): Add 
> user for device 'nlw_1', refcount=2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(413): Claim 
> Network device nlw_1, new active count: 2
> Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(432): claim 
> Network device nlw_1 failed: -1
>
> Instrumenting netifd further reveals that there is a race between the hotplug
> "@move" event and ioctl(SIOCGIFINDEX). When the above error happens, the
> ioctl-call fails with ENODEV. Looking closer at the kernel code, it seems the
> hotplug-event is triggered before the renaming is completed. The easiest way 
> to
> trigger the race, is if an interface name with the old name is not handled by
> netifd and an interface with the new name is. If only the old name is handled,
> or both names, I was not able to provoke the race.
>
> When the renaming is complete, a NEWLINK-message is generated. This patch
> modifies the logic surrounding renaming, so that we wait for the
> NEWLINK-message before marking an interface as present. The changes made are:
>
> * We only handle move-events for interfaces we know, and we return after
> device has been set as not present.
> * When we receive a NEWLINK message for an interface managed by netifd,
> we call device_set_present. device_set_present is guarded by the same
> checks as the add hotplug-event.
>
> After these changes, renaming works properly on both "fast" and "slow"
> devices. Removing a device is also handled correctly.
>
> Signed-off-by: Kristian Evensen 

I was wondering if anyone has had time to look at this patch and have
any opinions? I've been running the change in production since the
change was submitted, and all my renaming issues have been resolved
(and no new ones have appeared :)).

BR,
Kristian

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


[OpenWrt-Devel] [PATCH] netifd: Improve handling of device rename

2020-03-11 Thread Kristian Evensen
After an interface has been renamed on a "fast" device (for example
x86_64), the interface is sometimes not handled correctly by netifd.
Looking in the logs, I see the following messages when renaming fails:

Wed Mar 11 08:52:44 2020 kern.info kernel: [68383.522038] igb :03:00.0 
nlw_1: renamed from eth2
Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: __device_add_user(710): Add 
user for device 'nlw_1', refcount=2
Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(413): Claim 
Network device nlw_1, new active count: 2
Wed Mar 11 08:52:44 2020 daemon.err netifd[2739]: device_claim(432): claim 
Network device nlw_1 failed: -1

Instrumenting netifd further reveals that there is a race between the hotplug
"@move" event and ioctl(SIOCGIFINDEX). When the above error happens, the
ioctl-call fails with ENODEV. Looking closer at the kernel code, it seems the
hotplug-event is triggered before the renaming is completed. The easiest way to
trigger the race, is if an interface name with the old name is not handled by
netifd and an interface with the new name is. If only the old name is handled,
or both names, I was not able to provoke the race.

When the renaming is complete, a NEWLINK-message is generated. This patch
modifies the logic surrounding renaming, so that we wait for the
NEWLINK-message before marking an interface as present. The changes made are:

* We only handle move-events for interfaces we know, and we return after
device has been set as not present.
* When we receive a NEWLINK message for an interface managed by netifd,
we call device_set_present. device_set_present is guarded by the same
checks as the add hotplug-event.

After these changes, renaming works properly on both "fast" and "slow"
devices. Removing a device is also handled correctly.

Signed-off-by: Kristian Evensen 
---
 system-linux.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/system-linux.c b/system-linux.c
index d533be8..aff67d6 100644
--- a/system-linux.c
+++ b/system-linux.c
@@ -590,6 +590,11 @@ static int cb_rtnl_event(struct nl_msg *msg, void *arg)
if (!system_get_dev_sysctl("/sys/class/net/%s/carrier", dev->ifname, 
buf, sizeof(buf)))
link_state = strtoul(buf, NULL, 0);
 
+   if (dev->type == _device_type &&
+   !system_if_force_external(dev->ifname) &&
+   !dev->present)
+   device_set_present(dev, true);
+
device_set_link(dev, link_state ? true : false);
 
 out:
@@ -652,13 +657,15 @@ handle_hotplug_msg(char *data, int size)
 move:
dev = device_find(interface_old);
if (!dev)
-   goto found;
+   return;
 
if (dev->type != _device_type)
goto found;
 
device_set_present(dev, false);
 
+   return;
+
 found:
dev = device_find(interface);
if (!dev)
-- 
2.20.1


___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel