#22156: Race condition causes netifd to fail when bringing up network interface
----------------------------------------+--------------------------------
Reporter: alan.christopher.jenkins@… | Owner: developers
Type: defect | Status: new
Priority: normal | Milestone:
Component: base system | Version: Chaos Calmer 15.05
Keywords: |
----------------------------------------+--------------------------------
I finally figured out what caused the 5GHz network on my box to fail to
start! I've shown to my satisfaction that it is a race condition, which
must be a bug in the base system. Next challenge: work out which race
could cause this error: "Could not set interface wlan1-1 flags (UP):
Device or resource busy".
Maybe netifd isn't serializing something it should be?
This is a Netgear WNDR3800. I have two 5GHz networks (SSIDs) configured
on the same radio.
It's happening on 15.05. I've now tried upgrading to 15.05.1 for the
netifd update, but it didn't help.
I think I've included all the important details here. The configuration
is described in more detail at
http://superuser.com/questions/1060845/openwrt-on-wndr3800-5ghz-wifi-shows-as-disabled/1060846#1060846
Error log:
Sun Apr 3 15:02:19 2016 user.notice SQM: Starting simple.qos
Sun Apr 3 15:02:19 2016 user.notice SQM: ifb associated with interface pppoe-wan:
Sun Apr 3 15:02:19 2016 user.notice SQM: Currently no ifb is associated with pppoe-wan, this is normal during starting of the sqm system.
Sun Apr 3 15:02:19 2016 daemon.notice netifd: radio1 (9031): wlan1: ACS-COMPLETED freq=5320 channel=64
Sun Apr 3 15:02:19 2016 daemon.notice netifd: radio1 (9031): Using interface wlan1 with hwaddr 74:44:01:86:42:d6 and ssid "VOYAGER2091-90-jenkins"
Sun Apr 3 15:02:20 2016 user.notice SQM: Squashing differentiated services code points (DSCP) from ingress.
Sun Apr 3 15:02:21 2016 kern.info kernel: [ 199.510000] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): Could not set interface wlan1-1 flags (UP): Device or resource busy
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): Failed to add BSS (BSSID=76:44:01:86:42:d6)
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): Interface initialization failed
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): wlan1: interface state ACS->DISABLED
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): wlan1: AP-DISABLED
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): ACS: Possibly channel configuration is invalid, please report this along with your config file.
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): ACS: Failed to start
Sun Apr 3 15:02:21 2016 daemon.notice netifd: radio1 (9031): wlan1: AP-DISABLED
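The telling line in a log like this one can be picked out mechanically. The following is a minimal sketch, assuming the log arrives on standard input with one entry per line (for example piped from `logread`). The function name `busy_ifaces` is made up for illustration and is not part of netifd or hostapd.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every interface that failed to
# come up with "Device or resource busy". Reads syslog text on stdin.
busy_ifaces() {
    sed -n 's/.*Could not set interface \([^ ]*\) flags (UP): Device or resource busy.*/\1/p'
}

# Example use on the router (assumes logread is available):
#   logread | busy_ifaces
```

An empty result would suggest that the radio came up without hitting this particular error.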
Running `/etc/init.d/network restart` on the router generates the same
error.
Running `ifdown wifi_a_guest` and then `ifup wifi_a_guest` seems to fix
everything until the next reboot.
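That manual workaround could be wrapped in a small helper. This is only a sketch: `bounce_iface` is a hypothetical name, the one-second pause is an arbitrary assumption, and it merely automates the two commands above rather than fixing the underlying race.

```shell
#!/bin/sh
# Hypothetical helper: take a logical interface down, pause, bring it back up.
# Relies on the stock ifdown/ifup commands being available.
bounce_iface() {
    iface="$1"
    ifdown "$iface" || return 1
    sleep 1   # arbitrary pause (assumption), not a tuned value
    ifup "$iface"
}

# Example: bounce_iface wifi_a_guest
```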
Disabling sqm-scripts (an improved version of qos-scripts) resolved the
5GHz problem permanently. Of course, I would like to find a way to have sqm
working :). I know that sqm has a fairly slow script (it can take several
seconds) that runs when its network interface is brought up.
Since sqm is not configured to touch the wireless interfaces, I was
inclined to blame a race condition in OpenWrt's homegrown netifd. Indeed,
I was able to reproduce the failure even after replacing the
implementation of sqm with a busy loop of about 3 seconds. It's not just
the delay: using `sleep 3` instead didn't reproduce the failure.
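A CPU-bound stand-in of this kind could look roughly like the sketch below. It is an illustration, not the exact script used for the test: the function name `busy_wait` is made up, and the sketch assumes it is installed as an interface hotplug script, where `ACTION` is set to `ifup` when an interface comes up.

```shell
#!/bin/sh
# Hypothetical stand-in for a slow hotplug script: spin for N seconds
# instead of sleeping, so the CPU stays busy for the whole interval.
busy_wait() {
    deadline=$(( $(date +%s) + $1 ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        :   # no-op; the loop itself burns CPU by polling the clock
    done
}

# ACTION is provided by the hotplug system; only spin on interface-up events.
if [ "${ACTION:-}" = "ifup" ]; then
    busy_wait 3
fi
```

Swapping `busy_wait 3` for `sleep 3` gives the idle-delay variant, which makes it easy to compare the two cases.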
It also turned out that `network restart` behaves slightly differently
from boot: on restart, the failure only seems to occur when I have both
sqm _and_ miniupnpd enabled.
--
Ticket URL: <https://dev.openwrt.org/ticket/22156>
OpenWrt <http://openwrt.org>
Opensource Wireless Router Technology