Re: WDS stopped working in 21.02, looking for bug in netifd, how to patch?

2021-09-27 Thread Daniel Haid

There is a race condition between hostapd and netifd.


Now that the bug is found, I could try to write a patch. But I do not 
know what the correct behaviour should be.


Should netifd not add wlan0.sta1 to the bridge at all? If so, what is 
the best way to implement it?


Or should hostapd be patched not to treat it as an error if wlan0.sta1 
is already added to the bridge?


-
D.H.

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: WDS stopped working in 21.02, looking for bug in netifd, BUG FOUND!

2021-09-23 Thread Daniel Haid

Hi everyone, I think I finally located the problem!

There is a race condition between hostapd and netifd.

In hostapd, src/drivers/driver_nl80211.c, look at the function 
i802_set_wds_sta. There are calls to


1) nl80211_create_iface and
2) linux_br_add_if.

Now call 1) seems to trigger netifd into calling

3) bridge_hotplug_add from bridge.c in netifd.

Now the problem is whether 2) or 3) is called first.

If 2) is called first, it succeeds, and hostapd continues with WDS mode 
as required -> GOOD CASE, connection works.


If 3) is called first, then 2) fails which leads i802_set_wds_sta to 
return immediately without doing what it should do -> BAD CASE, 
connection does not work.
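
To see on a given attempt whether wlan0.sta1 has already ended up in a bridge, something like the following could be run on the AP. This is only a sketch: the function name bridge_of and its optional second argument are made up here, and it assumes the kernel exposes bridge membership as the "master" symlink under /sys/class/net. It does not tell whether hostapd or netifd added the port.

```shell
# bridge_of IFNAME [SYSROOT]
# Print the bridge that IFNAME is enslaved to; return 1 if it is in no bridge.
# SYSROOT defaults to /sys/class/net and is only a parameter so that the
# function can be tried against a fake directory tree (an assumption of
# this sketch, not something hostapd or netifd provide).
bridge_of() {
	link="${2:-/sys/class/net}/$1/master"
	[ -L "$link" ] || return 1
	basename "$(readlink "$link")"
}
```

For example, bridge_of wlan0.sta1 || echo "wlan0.sta1 is not in a bridge".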


-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-23 Thread Bastian Bittorf
On Thu, Sep 23, 2021 at 03:17:15PM +0200, Daniel Haid wrote:
> Is there any way to dump a detailed state of the wlan driver in the kernel?
> Or the state of netifd? Should I enable some debug options?

At least you can try to debug with two terminals, running:

iw event
ip monitor
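
If one log file is more convenient than two terminals, both outputs could be tagged and appended to the same file. A rough sketch: the helper name tag and the file /tmp/events.log are made up for illustration, and it assumes the installed iw accepts "-t" (print timestamps) for "iw event" and that the installed ip supports "monitor link".

```shell
# tag PREFIX: copy stdin to stdout, prefixing every line with PREFIX.
tag() {
	while IFS= read -r line; do
		printf '%s %s\n' "$1" "$line"
	done
}

# Intended use on the AP (shown as comments, not executed here):
#   iw event -t 2>&1 | tag iw >>/tmp/events.log &
#   ip monitor link 2>&1 | tag ip >>/tmp/events.log &
```

If either tool buffers its output when writing to a pipe, the order of lines in the log may not match the real order of events.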

bye, Bastian


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-23 Thread Daniel Haid

Another update:

If I issue the following commands:

1. /etc/init.d/network restart
2. ip addr
3. ip addr
4. ip addr

Then, in a "bad case", if the timing is right, 2. shows that the 
interface wlan0.sta1 is DOWN, 3. shows that it is UP and 4. shows that 
it is DOWN again. Then it stays DOWN.


This might explain why yesterday I noticed so many "good" cases, because 
I did not try to ping the client, I only looked at the output of "ip 
addr". But it can show UP for a short time, even if the bug has triggered.


(By the way, even in a good case it first shows DOWN for a short time, 
and then UP, but then it stays UP.)
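
Instead of typing "ip addr" several times, the state could be sampled in a loop. A minimal sketch, assuming that the state of interest is the one the kernel reports in /sys/class/net/IFNAME/operstate; the function name watch_state and its arguments are made up for illustration.

```shell
# watch_state IFNAME COUNT [INTERVAL] [SYSROOT]
# Print the operstate of IFNAME COUNT times, INTERVAL seconds apart
# (default 1). Prints "absent" if the interface does not exist at that
# moment. SYSROOT defaults to /sys/class/net and is only a parameter so
# that the function can be tried against a fake directory tree.
watch_state() {
	file="${4:-/sys/class/net}/$1/operstate"
	n=0
	while [ "$n" -lt "$2" ]; do
		if [ -r "$file" ]; then cat "$file"; else echo absent; fi
		n=$((n + 1))
		# Do not sleep after the last sample.
		[ "$n" -lt "$2" ] && sleep "${3:-1}"
	done
	return 0
}
```

For example, "/etc/init.d/network restart; watch_state wlan0.sta1 30" would print one sample per second for 30 seconds after the restart.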


Is there any way to dump a detailed state of the wlan driver in the 
kernel? Or the state of netifd? Should I enable some debug options?


-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-22 Thread Daniel Haid

> can you please add this function on top of
> /lib/netifd/wireless/mac80211.sh


Unfortunately, /tmp/foo is identical after good and bad boot, see below.

There are three ways to trigger the bug (randomly, yesterday I thought 
the chance was about 50%, but today it felt much lower, about 5-10%):


1. Reboot
2. /etc/init.d/network restart
3. Turn off the client, wait for wlan0.sta1 to disappear on the AP, then 
restart the client.


Note that if 3. triggers the bug, the file /tmp/foo does not change at 
all, so whatever causes the bug seems not to affect /tmp/foo.


Previously I thought the chance for the bug was higher when 802.11r was 
enabled, but today the chance felt as low as when I tried without 
802.11r, so maybe this is totally independent of 802.11r after all.


---
D.H.

--
rc:0 | iw dev wlan0 del
rc:0 | iw phy0 info
rc:0 | iw reg get
rc:0 | iw reg set DE
rc:0 | iw phy phy0 set antenna 0x 0x
rc:0 | iw phy phy0 set antenna_gain 0
rc:0 | iw phy phy0 set distance 0
rc:0 | iw phy phy0 set txpower auto
rc:0 | iw phy phy0 info
rc:0 | iw phy phy0 interface add wlan0 type __ap


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-22 Thread Bastian Bittorf
On Wed, Sep 22, 2021 at 06:12:13PM +0200, Daniel Haid wrote:
> Another update:

can you please add this function on top of
/lib/netifd/wireless/mac80211.sh

#!/bin/sh
iw() { local rc; command iw "$@"; rc=$?; echo "rc:$rc | iw $*" >>/tmp/foo; test $rc -eq 0 || command iw "$@" 2>>/tmp/foo; return $rc; }

After booting the file '/tmp/foo' looks like:
###
rc:0 | iw dev wlan1 del
rc:0 | iw dev wlan0 del
rc:0 | iw phy1 info
rc:0 | iw phy0 info
rc:0 | iw reg get
rc:0 | iw reg set US
rc:0 | iw reg get
rc:0 | iw reg set US
rc:0 | iw phy phy1 set antenna 0x 0x
rc:0 | iw phy phy1 set antenna_gain 0
rc:0 | iw phy phy1 set distance 0
rc:0 | iw phy phy1 set txpower auto
rc:0 | iw phy phy1 info
rc:134 | iw phy phy0 set antenna 0x 0x
command failed: Not supported (-122)
rc:0 | iw phy phy0 set antenna_gain 0
rc:0 | iw phy phy1 info
rc:0 | iw phy phy0 set distance 100
rc:0 | iw phy phy0 set txpower fixed 2300
rc:0 | iw phy phy0 info
rc:0 | iw phy phy0 interface add wlan0 type adhoc
rc:0 | iw phy phy1 interface add wlan1 type __ap
rc:0 | iw phy phy0 interface add wlan0-1 type __ap
rc:0 | iw dev wlan0 ibss join ffintern.2GHz 2432 HT20 fixed-freq 02:ca:ff:ee:ba:be beacon-interval 250 basic-rates 6,12,24 mcast-rate 6
###

Please send a good and a bad case, or ignore if the error is deeper.

bye, Bastian


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-22 Thread Daniel Haid

Another update:

I put some logging code into the function interface_add_link. Across 
reboots, the function is sometimes called for the device wlan0.sta1 and 
sometimes not.


What I have seen is the following:

When it is not called, the connection works.

When it is called, either the bug takes effect and the connection does 
not work, or sometimes the connection still works.


So there seem to be two layers of randomness.

What I have not seen: interface_add_link is NOT called for wlan0.sta1, 
but the bug still DOES take effect. Of course, without infinitely many 
tries I cannot rule it out completely.


I have also been able to trigger the bug with 802.11r disabled by 
turning off the client until the interface wlan0.sta1 disappears on the 
AP and then turning the client on again. But this is also not deterministic.


-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-21 Thread Daniel Haid

Small update:

Preventing the call to mdev->hotplug_ops->add (and replacing it with 
return 0) inside the function interface_add_link whenever it is called 
from interface_handle_link and the string name contains the substring 
".sta" seems to "fix" the bug.


What kind of hotplug_ops are called for such interfaces?

-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-21 Thread Daniel Haid

By the way, maybe I should add that both devices are GL.iNet GL-AR150.

Also, the configs are only minimally different from the defaults. The 
only option that could be a bit unusual is having 802.11r enabled.


And indeed, after disabling 802.11r, the bug occurs much less often. In 
fact, without 802.11r, I have not seen the bug for several restarts. I 
thought I saw the bug once even with 802.11r disabled, but I am not so 
sure anymore. I now did another dozen restarts (with 802.11r disabled), 
and the connection always works. Maybe the one time I thought I saw the 
bug I just wrote "ping" too quickly.


I enabled 802.11r again, and the bug appeared again after only one 
restart. It feels like it is not a coincidence.


-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-21 Thread Daniel Haid

> Can you please send me the config that you're using? I'd like to try to
> reproduce it myself.


Find attached the config dumps of the AP and the client.

They have been created with 21.02, but after flashing the snapshot on 
the AP I restored exactly this config (and the bug was still there).


-
D.H.



backup-wds-ap.tar.gz
Description: application/gzip


backup-wds-client.tar.gz
Description: application/gzip


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-21 Thread Felix Fietkau
On 2021-09-20 22:56, Daniel Haid wrote:
> Felix, I took the last openwrt snapshot and compiled netifd from master 
> with your patch applied and installed it.
> 
> Result:
> After boot wlan0.sta1 was DOWN.
> After "/etc/init.d/network restart" it was UP and the connection worked!
> After another "/etc/init.d/network restart" it was DOWN again.
> After reboot it was UP again and working.
> After "/etc/init.d/network restart" it was DOWN again.
Can you please send me the config that you're using? I'd like to try to
reproduce it myself.

Thanks,

- Felix


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-20 Thread Daniel Haid
Felix, I took the last openwrt snapshot and compiled netifd from master 
with your patch applied and installed it.


Result:
After boot wlan0.sta1 was DOWN.
After "/etc/init.d/network restart" it was UP and the connection worked!
After another "/etc/init.d/network restart" it was DOWN again.
After reboot it was UP again and working.
After "/etc/init.d/network restart" it was DOWN again.

-
D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-20 Thread Daniel Haid

> Please test if applying this change to netifd fixes the issue.


I am currently building the toolchain for the current snapshot, so I can 
test on the current snapshot.


So far I have only been able to test the patch on 21.02. Since the patch 
does not apply cleanly I tried two versions of the patch.


Version 1:

Apply everything except the line with dev->bpdu_filter = <...>.

With this version, it seems that now the interface is *always* in state 
DOWN after booting and it never works even if manually set to UP.


Version 2:

Apply only the last two hunks of the patch.

With this version, I still get random UP/DOWN behaviour at boot, but now 
"/etc/init.d/network restart" seems to sometimes give me a working state 
with the interface UP and the connection working, which I am quite sure 
I never had before.


But since the whole behaviour has a random element of course I can not 
be 100% sure.


I will try with the snapshot as soon as possible.

D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-20 Thread Felix Fietkau
On 2021-09-20 16:46, Daniel Haid wrote:
> I have continued investigating.
> 
> After all, it seems that the interface being down is just a symptom.
> 
> I summarize my current findings:
> 
> With the 21.02 netifd version, there seems to be a bug concerning WDS. 
> The bug has the following effect:
> 
> I have openwrt 21.02 running on one system running as WDS AP and another 
> one running as WDS Client. The WDS Client is running and its 
> configuration never changed.
> 
> After booting the WDS AP, the system ends up in one of two states, which 
> I call NON-WORKING and WORKING. After booting, the probability of each 
> state seems to be about 50%.
> 
> To find out in which state I am after booting, I look for the interface 
> wlan0.sta1. If it is UP, then we are in state WORKING. If it is DOWN, 
> then we are in state NON-WORKING.
> 
> Using ping, in state WORKING, the AP can reach the Client. In state 
> NON-WORKING, the AP cannot reach the Client.
> 
> In state WORKING, the interface wlan0.sta1 can be set to DOWN and UP 
> again, and the AP can then again ping the Client, but only after waiting 
> about 4 minutes for the Client to reconnect to the AP (in my last mail, 
> I wrote that it did not work, but I just did not wait long enough).
> 
> In state NON-WORKING, I can set the interface wlan0.sta1 to UP, but this 
> will not help. The ip command will show that the interface is UP, but 
> the AP can not ping the Client, no matter how long I wait after setting 
> the state to UP.
> 
> If I turn off the Client, wait for the interface wlan0.sta1 to be 
> removed on the AP, and then restart the Client, then the interface 
> wlan0.sta1 will be created again, in state DOWN. Everything is again as 
> in the state NON-WORKING.
> 
> To reliably reach the state NON-WORKING, run "/etc/init.d/network restart".
> 
> Changing the function wireless_interface_handle_link such that it does 
> not call interface_handle_link when it is called from 
> wireless_device_hotplug_event fixes the bug.
> 
> But I do not understand what is happening.
> 
> I am not subscribed to the list; please send Cc to me.
Please test if applying this change to netifd fixes the issue.

Thanks,

- Felix

---
--- a/wireless.c
+++ b/wireless.c
@@ -328,14 +328,14 @@ static void wireless_interface_handle_link(struct wireless_interface *vif, const
 	if (!ifname)
 		ifname = vif->ifname;
 
-	if (up) {
+	if (up && ifname != vif->ifname) {
 		struct device *dev = device_get(ifname, 2);
 		if (dev) {
 			dev->wireless_isolate = vif->isolate;
 			dev->wireless_proxyarp = vif->proxyarp;
 			dev->wireless = true;
 			dev->wireless_ap = vif->ap_mode;
-			dev->bpdu_filter = dev->wireless_ap && ifname == vif->ifname;
+			dev->bpdu_filter = dev->wireless_ap;
 		}
 	}
 
@@ -793,6 +793,13 @@ wireless_interface_init_config(struct wireless_interface *vif)
 	if ((cur = tb[VIF_ATTR_NETWORK]))
 		vif->network = cur;
 
+	cur = tb[VIF_ATTR_MODE];
+	if (cur)
+		vif->ap_mode = !strcmp(blobmsg_get_string(cur), "ap");
+
+	if (!vif->ap_mode)
+		return;
+
 	cur = tb[VIF_ATTR_ISOLATE];
 	if (cur)
 		vif->isolate = blobmsg_get_bool(cur);
@@ -801,9 +808,6 @@ wireless_interface_init_config(struct wireless_interface *vif)
 	if (cur)
 		vif->proxyarp = blobmsg_get_bool(cur);
 
-	cur = tb[VIF_ATTR_MODE];
-	if (cur)
-		vif->ap_mode = !strcmp(blobmsg_get_string(cur), "ap");
 }
 
 /* vlist update call for wireless interface list */


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-20 Thread Daniel Haid

I have continued investigating.

After all, it seems that the interface being down is just a symptom.

I summarize my current findings:

With the 21.02 netifd version, there seems to be a bug concerning WDS. 
The bug has the following effect:


I have openwrt 21.02 running on one system running as WDS AP and another 
one running as WDS Client. The WDS Client is running and its 
configuration never changed.


After booting the WDS AP, the system ends up in one of two states, which 
I call NON-WORKING and WORKING. After booting, the probability of each 
state seems to be about 50%.


To find out in which state I am after booting, I look for the interface 
wlan0.sta1. If it is UP, then we are in state WORKING. If it is DOWN, 
then we are in state NON-WORKING.


Using ping, in state WORKING, the AP can reach the Client. In state 
NON-WORKING, the AP cannot reach the Client.


In state WORKING, the interface wlan0.sta1 can be set to DOWN and UP 
again, and the AP can then again ping the Client, but only after waiting 
about 4 minutes for the Client to reconnect to the AP (in my last mail, 
I wrote that it did not work, but I just did not wait long enough).


In state NON-WORKING, I can set the interface wlan0.sta1 to UP, but this 
will not help. The ip command will show that the interface is UP, but 
the AP can not ping the Client, no matter how long I wait after setting 
the state to UP.


If I turn off the Client, wait for the interface wlan0.sta1 to be 
removed on the AP, and then restart the Client, then the interface 
wlan0.sta1 will be created again, in state DOWN. Everything is again as 
in the state NON-WORKING.


To reliably reach the state NON-WORKING, run "/etc/init.d/network restart".

Changing the function wireless_interface_handle_link such that it does 
not call interface_handle_link when it is called from 
wireless_device_hotplug_event fixes the bug.


But I do not understand what is happening.

I am not subscribed to the list; please send Cc to me.

D.H.


Re: WDS stopped working in 21.02, looking for bug in netifd

2021-09-20 Thread Daniel Haid

I have investigated a bit more.

Even without the "fix", after each reboot there seems to be about a 
50% chance of WDS working.


To reliably reproduce the bug, it is necessary to do
/etc/init.d/network restart
with the WDS client connected.

Now what I noticed is that using the netifd version with the bug, after 
the network restart the interface wlan0.sta1 is DOWN (and the WDS client 
not reachable) while using the version with the "fix" the interface 
wlan0.sta1 is UP (and the WDS client is reachable).


However, in the version with the bug, if I manually set the interface to 
UP with "ip link set up dev wlan0.sta1", it now says that the interface 
is UP, but the WDS client is still not reachable.


On the other hand, if I use a version with the "fix", then initially the 
WDS client is reachable. But if I now issue


ip link set down dev wlan0.sta1
ip link set up dev wlan0.sta1

then the WDS client is not reachable anymore!

Who creates these interfaces "wlan0.sta1"? Maybe we are not allowed to 
set them to DOWN after they have been created. Or after setting them to 
UP there is some further initialization required which netifd does not 
do, but the original process that creates these interfaces does?


I will investigate further. I am also grateful for any suggestions.

I am not subscribed to the list; please send Cc to me.

D.H.


WDS stopped working in 21.02, looking for bug in netifd

2021-09-19 Thread Daniel Haid

Hello,

I just installed 21.02 on my devices and WDS stopped working. This seems 
to be the following bug:


https://bugs.openwrt.org/index.php?do=details&task_id=3961

I have ath79 and x86 devices, but all with ath9k wireless. WDS stopped 
working on all of them.


Since the bug reporter already found out what commit introduced the bug, 
I have built netifd simply with the call to 
wireless_device_hotplug_event commented out in the function 
device_hotplug_event in device.c (and no other changes).


With that one line commented out, WDS works again.

Since my whole setup depends on it, I would very much like to find out 
how to fix this.


Could someone explain why wireless_device_hotplug_event was added and 
what its purpose is?


I am not subscribed to the list; please send Cc to me.

Best,

D.H.
