Re: WDS stopped working in 21.02, looking for bug in netifd, how to patch?
There is a race condition between hostapd and netifd. Now that the bug is found, I could try to write a patch. But I do not know what the correct behaviour should be. Should netifd not add wlan0.sta1 to the bridge at all? If so, what is the best way to implement it? Or should hostapd be patched not to treat it as an error if wlan0.sta1 is already added to the bridge? - D.H. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: WDS stopped working in 21.02, looking for bug in netifd, BUG FOUND!
Hi everyone,

I think I finally located the problem! There is a race condition between hostapd and netifd.

In hostapd, src/drivers/driver_nl80211.c, look at the function i802_set_wds_sta. There are calls to

1) nl80211_create_iface and
2) linux_br_add_if.

Now call 1) seems to trigger netifd into calling

3) bridge_hotplug_add from bridge.c in netifd.

Now the problem is whether 2) or 3) is called first.

If 2) is called first, it succeeds, and hostapd continues with WDS mode as required -> GOOD CASE, connection works.

If 3) is called first, then 2) fails, which leads i802_set_wds_sta to return immediately without doing what it should do -> BAD CASE, connection does not work.

- D.H.
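One way to make the ordering of 2) and 3) irrelevant is to make the "add to bridge" step idempotent: if the interface is already a port of the intended bridge, report success instead of an error. The following is a minimal shell sketch of that idea only; it is not hostapd or netifd code. The helper names current_master and ensure_bridge_port and the SYSFS_NET variable are hypothetical, and the sketch assumes that sysfs exposes a "master" symlink for bridge ports and that the installed ip supports "ip link set dev ... master ...".

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not part of hostapd or netifd).
# SYSFS_NET is an assumption: it lets the functions be pointed at a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the bridge (master device) that interface $1 belongs to.
# Prints nothing if the interface has no master.
current_master() {
	link="$SYSFS_NET/$1/master"
	if [ -L "$link" ]; then
		basename "$(readlink "$link")"
	fi
}

# Add interface $2 to bridge $1, treating "already a port of $1" as success.
ensure_bridge_port() {
	br="$1"
	ifname="$2"
	master="$(current_master "$ifname")"
	if [ "$master" = "$br" ]; then
		# Someone else (for example netifd) was faster: nothing left to do.
		return 0
	fi
	if [ -n "$master" ]; then
		# Port of a different bridge: this is a real conflict.
		echo "$ifname is already a port of $master, not $br" >&2
		return 1
	fi
	ip link set dev "$ifname" master "$br"
}
```

With this behaviour, "ensure_bridge_port br-lan wlan0.sta1" returns 0 in both orderings, so the caller can carry on with the rest of the WDS setup. Whether hostapd or netifd is the right place for such a check is exactly the open question in this thread.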
Re: WDS stopped working in 21.02, looking for bug in netifd
On Thu, Sep 23, 2021 at 03:17:15PM +0200, Daniel Haid wrote:
> Is there any way to dump a detailed state of the wlan driver in the kernel?
> Or the state of netifd? Should I enable some debug options?

At least you can try to debug with two terminals, running:

iw event
ip monitor

bye, Bastian
Re: WDS stopped working in 21.02, looking for bug in netifd
Another update:

If I issue the following commands:

1. /etc/init.d/network restart
2. ip addr
3. ip addr
4. ip addr

Then, in a "bad case", if the timing is right, 2. shows that the interface wlan0.sta1 is DOWN, 3. shows that it is UP and 4. shows that it is DOWN again. Then it stays DOWN.

This might explain why yesterday I noticed so many "good" cases, because I did not try to ping the client, I only looked at the output of "ip addr". But it can show UP for a short time, even if the bug has triggered.

(By the way, even in a good case it first shows DOWN for a short time, and then UP, but then it stays UP.)

Is there any way to dump a detailed state of the wlan driver in the kernel? Or the state of netifd? Should I enable some debug options?

- D.H.
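The short DOWN -> UP -> DOWN sequence described above is easy to miss when "ip addr" is run by hand. A minimal sketch of a poller that prints the state only when it changes, assuming that /sys/class/net/<interface>/operstate carries the same state that "ip addr" prints. The function names and the SYSFS_NET variable are hypothetical, and fractional sleep intervals such as 0.1 only work if the installed sleep supports them.

```shell
#!/bin/sh
# Hypothetical helpers for catching short-lived state changes.

# Read one state per line on stdin and print a line only when the state
# differs from the previous sample, prefixed with the sample number.
log_state_changes() {
	n=0
	prev=
	while read -r state; do
		n=$((n + 1))
		if [ "$n" -eq 1 ] || [ "$state" != "$prev" ]; then
			echo "sample $n: $state"
			prev="$state"
		fi
	done
}

# Read the operstate of interface $1 a total of $2 times, sleeping $3
# seconds between reads. Prints "absent" while the interface does not exist.
# SYSFS_NET is an assumption that allows pointing at a test directory.
sample_operstate() {
	i=0
	while [ "$i" -lt "$2" ]; do
		cat "${SYSFS_NET:-/sys/class/net}/$1/operstate" 2>/dev/null || echo absent
		i=$((i + 1))
		sleep "$3"
	done
}
```

For example, starting "sample_operstate wlan0.sta1 120 1 | log_state_changes" in a second terminal just before "/etc/init.d/network restart" would list each transition once, together with the sample at which it was seen.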
Re: WDS stopped working in 21.02, looking for bug in netifd
> can you please add this function ontop of /lib/netifd/wireless/mac80211.sh

Unfortunately, /tmp/foo is identical after good and bad boot, see below.

There are three ways to trigger the bug (randomly, yesterday I thought the chance was about 50%, but today it felt much lower, about 5-10%):

1. Reboot
2. /etc/init.d/network restart
3. Turn off the client, wait for wlan0.sta1 to disappear on the AP, then restart the client.

Note that if 3. triggers the bug, the file /tmp/foo does not change at all, so whatever causes the bug seems not to affect /tmp/foo.

Previously I thought the chance for the bug was higher when 802.11r was enabled, but today the chance felt as low as when I tried without 802.11r, so maybe this is totally independent of 802.11r after all.

- D.H.

--
rc:0 | iw dev wlan0 del
rc:0 | iw phy0 info
rc:0 | iw reg get
rc:0 | iw reg set DE
rc:0 | iw phy phy0 set antenna 0x 0x
rc:0 | iw phy phy0 set antenna_gain 0
rc:0 | iw phy phy0 set distance 0
rc:0 | iw phy phy0 set txpower auto
rc:0 | iw phy phy0 info
rc:0 | iw phy phy0 interface add wlan0 type __ap
Re: WDS stopped working in 21.02, looking for bug in netifd
On Wed, Sep 22, 2021 at 06:12:13PM +0200, Daniel Haid wrote:
> Another update:

Can you please add this function on top of /lib/netifd/wireless/mac80211.sh:

#!/bin/sh

iw() {
	local rc
	# run the real iw and remember its exit code
	command iw "$@"; rc=$?
	echo "rc:$rc | iw $*" >>/tmp/foo
	# on failure, run it once more to capture the error message
	test $rc -eq 0 || command iw "$@" 2>>/tmp/foo
	return $rc
}

After booting, the file '/tmp/foo' looks like:

###
rc:0 | iw dev wlan1 del
rc:0 | iw dev wlan0 del
rc:0 | iw phy1 info
rc:0 | iw phy0 info
rc:0 | iw reg get
rc:0 | iw reg set US
rc:0 | iw reg get
rc:0 | iw reg set US
rc:0 | iw phy phy1 set antenna 0x 0x
rc:0 | iw phy phy1 set antenna_gain 0
rc:0 | iw phy phy1 set distance 0
rc:0 | iw phy phy1 set txpower auto
rc:0 | iw phy phy1 info
rc:134 | iw phy phy0 set antenna 0x 0x
command failed: Not supported (-122)
rc:0 | iw phy phy0 set antenna_gain 0
rc:0 | iw phy phy1 info
rc:0 | iw phy phy0 set distance 100
rc:0 | iw phy phy0 set txpower fixed 2300
rc:0 | iw phy phy0 info
rc:0 | iw phy phy0 interface add wlan0 type adhoc
rc:0 | iw phy phy1 interface add wlan1 type __ap
rc:0 | iw phy phy0 interface add wlan0-1 type __ap
rc:0 | iw dev wlan0 ibss join ffintern.2GHz 2432 HT20 fixed-freq 02:ca:ff:ee:ba:be beacon-interval 250 basic-rates 6,12,24 mcast-rate 6
###

Please send a good and a bad case, or ignore this if the error is deeper.

bye, Bastian
Re: WDS stopped working in 21.02, looking for bug in netifd
Another update:

I put some logging code into the function interface_add_link. On every reboot the function interface_add_link is sometimes called for the device wlan0.sta1 and sometimes not.

What I have seen is the following: When it is not called, the connection works. When it is called, the bug takes effect and the connection does not work, or sometimes it still works. So there seem to be two layers of randomness.

What I have not seen: interface_add_link is NOT called for wlan0.sta1, but the bug still DOES take effect. Of course, without infinitely many tries I cannot rule it out completely.

I have also been able to trigger the bug with 802.11r disabled by turning off the client until the interface wlan0.sta1 disappears on the AP and then turning the client on again. But this is also not deterministic.

- D.H.
Re: WDS stopped working in 21.02, looking for bug in netifd
Small update: Preventing the call to mdev->hotplug_ops->add (and replacing it with return 0) inside the function interface_add_link whenever it is called from interface_handle_link and the string name contains the substring ".sta" seems to "fix" the bug. What kind of hotplug_ops are called for such interfaces? - D.H.
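The name test used by this workaround can be reproduced outside netifd to see which interfaces on a running AP it would affect. A minimal shell sketch, assuming only what is stated above (the name contains the substring ".sta"); the helper names is_wds_sta_ifname and list_wds_sta are hypothetical and the optional directory argument exists only to make the listing testable.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the substring test described above.

# Succeed if interface name $1 contains the substring ".sta".
is_wds_sta_ifname() {
	case "$1" in
		*.sta*) return 0 ;;
		*) return 1 ;;
	esac
}

# Print every interface below directory $1 (default: /sys/class/net)
# whose name passes the test above.
list_wds_sta() {
	for path in "${1:-/sys/class/net}"/*; do
		name="${path##*/}"
		if is_wds_sta_ifname "$name"; then
			echo "$name"
		fi
	done
}
```

A plain substring match would also accept unrelated names that happen to contain ".sta", so a real patch would probably want a stricter criterion than the name alone.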
Re: WDS stopped working in 21.02, looking for bug in netifd
By the way, maybe I should add that both devices are GL.iNet GL-AR150. Also, the configs are only minimally different from the defaults. The only option that could be a bit unusual is having 802.11r enabled. And indeed, after disabling 802.11r, the bug occurs much less often. In fact, without 802.11r, I have not seen the bug for several restarts. I thought I saw the bug once even with 802.11r disabled, but I am not so sure anymore. I now did another dozen restarts (with 802.11r disabled), and the connection always works. Maybe the one time I thought I saw the bug I just wrote "ping" too quickly. I enabled 802.11r again, and the bug appeared again after only one restart. It feels like it is not a coincidence. - D.H.
Re: WDS stopped working in 21.02, looking for bug in netifd
> Can you please send me the config that you're using? I'd like to try to
> reproduce it myself.

Find attached the config dumps of the AP and the client. They have been created with 21.02, but after flashing the snapshot on the AP I restored exactly this config (and the bug was still there).

- D.H.

backup-wds-ap.tar.gz
Description: application/gzip

backup-wds-client.tar.gz
Description: application/gzip
Re: WDS stopped working in 21.02, looking for bug in netifd
On 2021-09-20 22:56, Daniel Haid wrote:
> Felix, I took the last openwrt snapshot and compiled netifd from master
> with your patch applied and installed it.
>
> Result:
> After boot wlan0.sta1 was DOWN.
> After "/etc/init.d/network restart" it was UP and the connection worked!
> After another "/etc/init.d/network restart" it was DOWN again.
> After reboot it was UP again and working.
> After "/etc/init.d/network restart" it was DOWN again.

Can you please send me the config that you're using? I'd like to try to reproduce it myself.

Thanks,

- Felix
Re: WDS stopped working in 21.02, looking for bug in netifd
Felix, I took the last openwrt snapshot and compiled netifd from master with your patch applied and installed it.

Result:
After boot wlan0.sta1 was DOWN.
After "/etc/init.d/network restart" it was UP and the connection worked!
After another "/etc/init.d/network restart" it was DOWN again.
After reboot it was UP again and working.
After "/etc/init.d/network restart" it was DOWN again.

- D.H.
Re: WDS stopped working in 21.02, looking for bug in netifd
> Please test if applying this change to netifd fixes the issue.

I am currently building the toolchain for the current snapshot, so I can test on the current snapshot. So far I have only been able to test the patch on 21.02. Since the patch does not apply cleanly, I tried two versions of the patch.

Version 1: Apply everything except the line with dev->bpdu_filter = <...>. With this version, it seems that now the interface is *always* in state DOWN after booting and it never works, even if manually set to UP.

Version 2: Apply only the last two hunks of the patch. With this version, I still get random UP/DOWN behaviour at boot, but now "/etc/init.d/network restart" seems to sometimes give me a working state with the interface UP and the connection working, which I am quite sure I never had before. But since the whole behaviour has a random element, of course I cannot be 100% sure.

I will try with the snapshot as soon as possible.

D.H.
Re: WDS stopped working in 21.02, looking for bug in netifd
On 2021-09-20 16:46, Daniel Haid wrote:
> I have continued investigating.
>
> After all, it seems that the interface being down is just a symptom.
>
> I summarize my current findings:
>
> With the 21.02 netifd version, there seems to be a bug concerning WDS.
> The bug has the following effect:
>
> I have openwrt 21.02 running on one system running as WDS AP and another
> one running as WDS Client. The WDS Client is running and its
> configuration never changed.
>
> After booting the WDS AP, there are two possibilities for in what state
> the system can be, I call them NON-WORKING and WORKING. The probability
> seems to be about 50% to be in one or the other state after booting.
>
> To find out in which state I am after booting, I look for the interface
> wlan0.sta1. If it is UP, then we are in state WORKING. If it is DOWN,
> then we are in state NON-WORKING.
>
> Using ping, in state WORKING, the AP can reach the Client. In state
> NON-WORKING, the AP cannot reach the Client.
>
> In state WORKING, the interface wlan0.sta1 can be set to DOWN and UP
> again, and the AP can then again ping the Client, but only after waiting
> about 4 minutes for the Client to reconnect to the AP (in my last mail,
> I wrote that it did not work, but I just did not wait long enough).
>
> In state NON-WORKING, I can set the interface wlan0.sta1 to UP, but this
> will not help. The ip command will show that the interface is UP, but
> the AP can not ping the Client, no matter how long I wait after setting
> the state to UP.
>
> If I turn off the Client, wait for the interface wlan0.sta1 to be
> removed on the AP, and then restart the Client, then the interface
> wlan0.sta1 will be created again, in state DOWN. Everything is again as
> in the state NON-WORKING.
>
> To reliably reach the state NON-WORKING, run "/etc/init.d/network restart".
>
> Changing the function wireless_interface_handle_link such that it does
> not call interface_handle_link when it is called from
> wireless_device_hotplug_event fixes the bug.
>
> But I do not understand what is happening.
>
> I am not subscribed to the list; please send Cc to me.

Please test if applying this change to netifd fixes the issue.

Thanks,

- Felix

---
--- a/wireless.c
+++ b/wireless.c
@@ -328,14 +328,14 @@ static void wireless_interface_handle_link(struct wireless_interface *vif, const
 	if (!ifname)
 		ifname = vif->ifname;
 
-	if (up) {
+	if (up && ifname != vif->ifname) {
 		struct device *dev = device_get(ifname, 2);
 		if (dev) {
 			dev->wireless_isolate = vif->isolate;
 			dev->wireless_proxyarp = vif->proxyarp;
 			dev->wireless = true;
 			dev->wireless_ap = vif->ap_mode;
-			dev->bpdu_filter = dev->wireless_ap && ifname == vif->ifname;
+			dev->bpdu_filter = dev->wireless_ap;
 		}
 	}
 
@@ -793,6 +793,13 @@ wireless_interface_init_config(struct wireless_interface *vif)
 	if ((cur = tb[VIF_ATTR_NETWORK]))
 		vif->network = cur;
 
+	cur = tb[VIF_ATTR_MODE];
+	if (cur)
+		vif->ap_mode = !strcmp(blobmsg_get_string(cur), "ap");
+
+	if (!vif->ap_mode)
+		return;
+
 	cur = tb[VIF_ATTR_ISOLATE];
 	if (cur)
 		vif->isolate = blobmsg_get_bool(cur);
@@ -801,9 +808,6 @@ wireless_interface_init_config(struct wireless_interface *vif)
 	if (cur)
 		vif->proxyarp = blobmsg_get_bool(cur);
 
-	cur = tb[VIF_ATTR_MODE];
-	if (cur)
-		vif->ap_mode = !strcmp(blobmsg_get_string(cur), "ap");
 }
 
 /* vlist update call for wireless interface list */
Re: WDS stopped working in 21.02, looking for bug in netifd
I have continued investigating.

After all, it seems that the interface being down is just a symptom.

I summarize my current findings:

With the 21.02 netifd version, there seems to be a bug concerning WDS. The bug has the following effect:

I have openwrt 21.02 running on one system running as WDS AP and another one running as WDS Client. The WDS Client is running and its configuration never changed.

After booting the WDS AP, the system can end up in one of two states, which I call NON-WORKING and WORKING. The probability seems to be about 50% to be in one or the other state after booting.

To find out in which state I am after booting, I look for the interface wlan0.sta1. If it is UP, then we are in state WORKING. If it is DOWN, then we are in state NON-WORKING.

Using ping, in state WORKING, the AP can reach the Client. In state NON-WORKING, the AP cannot reach the Client.

In state WORKING, the interface wlan0.sta1 can be set to DOWN and UP again, and the AP can then again ping the Client, but only after waiting about 4 minutes for the Client to reconnect to the AP (in my last mail, I wrote that it did not work, but I just did not wait long enough).

In state NON-WORKING, I can set the interface wlan0.sta1 to UP, but this will not help. The ip command will show that the interface is UP, but the AP cannot ping the Client, no matter how long I wait after setting the state to UP.

If I turn off the Client, wait for the interface wlan0.sta1 to be removed on the AP, and then restart the Client, then the interface wlan0.sta1 will be created again, in state DOWN. Everything is again as in the state NON-WORKING.

To reliably reach the state NON-WORKING, run "/etc/init.d/network restart".

Changing the function wireless_interface_handle_link such that it does not call interface_handle_link when it is called from wireless_device_hotplug_event fixes the bug.

But I do not understand what is happening.

I am not subscribed to the list; please send Cc to me.

D.H.
Re: WDS stopped working in 21.02, looking for bug in netifd
I have investigated a bit more.

Even without the "fix", after each reboot there seems to be about a 50% chance of WDS working. To reliably reproduce the bug, it is necessary to do /etc/init.d/network restart with the WDS client connected.

Now what I noticed is that using the netifd version with the bug, after the network restart the interface wlan0.sta1 is DOWN (and the WDS client not reachable), while using the version with the "fix" the interface wlan0.sta1 is UP (and the WDS client is reachable).

However, in the version with the bug, if I manually set the interface to UP with "ip link set up dev wlan0.sta1", it now says that the interface is UP, but the WDS client is still not reachable.

On the other hand, if I use a version with the "fix", then initially the WDS client is reachable. But if I now issue

ip link set down dev wlan0.sta1
ip link set up dev wlan0.sta1

then the WDS client is not reachable anymore!

Who creates these interfaces "wlan0.sta1"? Maybe we are not allowed to set them to DOWN after they have been created. Or after setting them to UP there is some further initialization required which netifd does not do, but the original process that creates these interfaces does?

I will investigate further. I am also grateful for any suggestions.

I am not subscribed to the list; please send Cc to me.

D.H.
WDS stopped working in 21.02, looking for bug in netifd
Hello,

I just installed 21.02 on my devices and WDS stopped working. This seems to be the following bug:

https://bugs.openwrt.org/index.php?do=details_id=3961

I have ath79 and x86 devices, but all with ath9k wireless. WDS stopped working on all of them.

Since the bug reporter already found out what commit introduced the bug, I have built netifd simply with the call to wireless_device_hotplug_event commented out in the function device_hotplug_event in device.c (and no other changes). With that one line commented out, WDS works again.

Since my whole setup depends on it, I would very much like to find out how to fix this. Could someone explain why wireless_device_hotplug_event was added and what its purpose is?

I am not subscribed to the list; please send Cc to me.

Best, D.H.