Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, 2016-01-28 at 19:54 -0600, Larry Finger wrote:
>
> I have been running an RTL8821AE since kernel 3.18 without hitting
> this problem using a TRENDnet AC1750 dual-band AP. The UniFi may be
> doing something that the driver is not expecting.

Are you quite sure you're actually using VHT though? Perhaps the AP somehow turned it off. It seems unlikely that you could successfully use it in any way, given that RATE_INFO_FLAGS_VHT_MCS doesn't show up in the driver or rate scaling at all.

johannes
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:
>
> Upon further inspection, my log has the line "rtl8821ae :02:00.0
> wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
> first.
>

Likely TKIP; enable only WPA2 (CCMP) on the AP.

johannes
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On 01/29/2016 10:15 AM, Johannes Berg wrote:
> On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:
>> Upon further inspection, my log has the line "rtl8821ae :02:00.0
>> wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
>> first.
>
> Likely TKIP; enable only WPA2 (CCMP) on the AP.

I found and fixed it. The AP was using only 20 MHz channels, but now it is configured for 20/40/80. That reproduces the warning for me, along with the lack of throughput, even though it associates and authenticates.

Larry
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
Linus,

Attached is a trial patch that fixes the problem on my system. As I told Johannes earlier, my AP was not configured to use VHT, thus I did not see the problem.

The test patch that Johannes sent earlier was close. The section needed to add VHT rates is:

--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 		    ((wireless_mode == WIRELESS_MODE_N_5G) ||
 		     (wireless_mode == WIRELESS_MODE_N_24G)))
 			rate->flags |= IEEE80211_TX_RC_MCS;
+		if (sta && sta->vht_cap.vht_supported &&
+		    (wireless_mode == WIRELESS_MODE_AC_5G))
+			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 	}
 }

Larry

From bd34ac0c3caa9ff982194256b0e96772a17e719d Mon Sep 17 00:00:00 2001
From: Larry Finger
Date: Fri, 29 Jan 2016 11:29:10 -0600
Subject: [PATCH] rtlwifi: Fix warning from ieee80211_get_tx_rates() when using 5G
To: kv...@codeaurora.org
Cc: linux-wirel...@vger.kernel.org, de...@driverdev.osuosl.org

When using a 5G-capable device with VHT rates enabled, the following warning results:

WARNING: CPU: 3 PID: 2253 at net/mac80211/rate.c:625 ieee80211_get_tx_rates+0x22e/0x620 [mac80211]()
Modules linked in: rtl8821ae btcoexist rtl_pci rtlwifi fuse drbg ansi_cprng ctr ccm bnep bluetooth af_packet nfs fscache vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) arc4 snd_hda_codec_generic x86_pkg_temp_thermal rtsx_pci_sdmmc mmc_core rtsx_pci_ms kvm_intel memstick iwlmvm kvm mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core irqbypass snd_pcm iwlwifi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul glue_helper ablk_helper cryptd snd cfg80211 pcspkr serio_raw e1000e rtsx_pci lpc_ich ptp xhci_pci mfd_core pps_core xhci_hcd soundcore toshiba_acpi thermal sparse_keymap wmi toshiba_bluetooth rfkill acpi_cpufreq battery ac processor dm_mod i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm sr_mod cdrom video button sg autofs4 [last unloaded: rtlwifi]
CPU: 3 PID: 2253 Comm: Timer Tainted: G        W  O    4.5.0-rc1-wl+ #79
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
 a05c4be6 8802262036d8 813d7912 880226203710
 8106bcb6 8800c6831300 8800c6831330 8800c683133c
 880065923638 880226203720
Call Trace:
 [] dump_stack+0x4b/0x79
 [] warn_slowpath_common+0x86/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] ieee80211_get_tx_rates+0x22e/0x620 [mac80211]
 [] ? rtl_is_special_data+0x32/0x240 [rtlwifi]
 [] ? rate_control_get_rate+0xce/0x150 [mac80211]
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? __local_bh_enable_ip+0x65/0xd0
--- traceback terminated here ---

The problem is that IEEE80211_TX_RC_VHT_MCS is not set in the rate flags.

Reported-by: Linus Torvalds
Cc: Johannes Berg
Signed-off-by: Larry Finger
Cc: Stable
---
 drivers/net/wireless/realtek/rtlwifi/rc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rc.c b/drivers/net/wireless/realtek/rtlwifi/rc.c
index 74c14ce..e7eae63 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 		    ((wireless_mode == WIRELESS_MODE_N_5G) ||
 		     (wireless_mode == WIRELESS_MODE_N_24G)))
 			rate->flags |= IEEE80211_TX_RC_MCS;
+		if (sta && sta->vht_cap.vht_supported &&
+		    (wireless_mode == WIRELESS_MODE_AC_5G))
+			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 	}
 }
-- 
2.1.4
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger wrote:
>
> The test patch that Johannes sent earlier was close. The section needed to
> add VHT rates is:

Hmm. This looks pretty much exactly like what I already tried (I had fixed Johannes' patch to use "vht_cap" already, since it didn't compile otherwise). So the only difference is that it only checks WIRELESS_MODE_AC_5G.

But it worked for me this time. I have no idea why. Maybe Johannes' patch actually always worked for me, but I just had a transient problem that made me think it didn't. I think I only booted it once, and went "oh, ok, no network, that didn't work".

Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Fri, Jan 29, 2016 at 11:42 AM, Larry Finger wrote:
>
> Thanks for testing.
>
> Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits.
> Johannes' patch was indeed correct.

I just retested with this incremental (and whitespace-damaged) patch:

@@ -139,7 +139,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 (wireless_mode == WIRELESS_MODE_N_24G)))
 rate->flags |= IEEE80211_TX_RC_MCS;
 if (sta && sta->vht_cap.vht_supported &&
- (wireless_mode == WIRELESS_MODE_AC_5G))
+ ((wireless_mode == WIRELESS_MODE_AC_5G) ||
+(wireless_mode == WIRELESS_MODE_AC_24G) ||
+(wireless_mode == WIRELESS_MODE_AC_ONLY)))
 rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 }
 }

which brings it in line with Johannes' patch, and it does indeed still work.

I think marking it for stable is also the right thing to do - the driver clearly doesn't work well in a wide-channel AC environment otherwise, and I assume it's going to be more and more common..

Linus
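For readers who want to check the flag combinations by hand, here is a minimal userspace sketch of the logic the incremental patch arrives at. It is not the rtlwifi code: the mode values, `struct peer_caps` and `rate_flags()` are hypothetical stand-ins (the real driver checks `sta->ht_cap.ht_supported` and `sta->vht_cap.vht_supported`), and the flag bits are local copies of mac80211's `IEEE80211_TX_RC_*` values as assumed for kernels of this era.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode values modelled on rtlwifi's enum wireless_mode
 * (one bit per mode); the exact numbers are an assumption. */
enum wireless_mode {
	WIRELESS_MODE_N_24G   = 0x10,
	WIRELESS_MODE_N_5G    = 0x20,
	WIRELESS_MODE_AC_5G   = 0x40,
	WIRELESS_MODE_AC_24G  = 0x80,
	WIRELESS_MODE_AC_ONLY = 0x100,
};

/* Local copies of the mac80211 rate-control flag bits used here. */
#define TX_RC_MCS      (1u << 3)  /* IEEE80211_TX_RC_MCS */
#define TX_RC_SHORT_GI (1u << 7)  /* IEEE80211_TX_RC_SHORT_GI */
#define TX_RC_VHT_MCS  (1u << 8)  /* IEEE80211_TX_RC_VHT_MCS */

static_assert(TX_RC_VHT_MCS == 0x100, "VHT_MCS is the 0x100 in '1a0'");

/* Hypothetical stand-in for the peer capabilities the driver checks. */
struct peer_caps {
	bool ht_supported;
	bool vht_supported;
};

/* Mirrors the tail of _rtl_rc_rate_set_series() after the fix:
 * HT modes get TX_RC_MCS, any of the three AC modes gets TX_RC_VHT_MCS. */
static unsigned int rate_flags(const struct peer_caps *sta,
			       enum wireless_mode mode, bool sgi)
{
	unsigned int flags = 0;

	if (sgi)
		flags |= TX_RC_SHORT_GI;
	if (sta && sta->ht_supported &&
	    (mode == WIRELESS_MODE_N_5G || mode == WIRELESS_MODE_N_24G))
		flags |= TX_RC_MCS;
	if (sta && sta->vht_supported &&
	    (mode == WIRELESS_MODE_AC_5G || mode == WIRELESS_MODE_AC_24G ||
	     mode == WIRELESS_MODE_AC_ONLY))
		flags |= TX_RC_VHT_MCS;
	return flags;
}
```

With a VHT-capable peer in `WIRELESS_MODE_AC_5G` and short GI, this yields `TX_RC_SHORT_GI | TX_RC_VHT_MCS` (0x180). A peer without VHT in an AC mode would still get only `TX_RC_SHORT_GI`; whether the driver can ever end up in that state depends on how it chooses `wireless_mode`, which this sketch does not model.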
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On 01/29/2016 12:39 PM, Linus Torvalds wrote:
> On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger wrote:
>> The test patch that Johannes sent earlier was close. The section needed to
>> add VHT rates is:
>
> Hmm. This looks pretty much exactly like what I already tried (I had
> fixed Johannes' patch to use "vht_cap" already, since it didn't compile
> otherwise). So the only difference is that it only checks
> WIRELESS_MODE_AC_5G.
>
> But it worked for me this time. I have no idea why. Maybe Johannes'
> patch actually always worked for me, but I just had a transient
> problem that made me think it didn't. I think I only booted it once,
> and went "oh, ok, no network, that didn't work".

Thanks for testing.

Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits. Johannes' patch was indeed correct.

With my AP setup changed, I get about 100 Mb/s RX using netperf. TX is still bad.

@Johannes: OK if I make you the author of the final version of the patch?

Larry
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 4:13 AM, Johannes Berg wrote:
> On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:
>
>> .. except now I upgraded the nearest access point, and now wireless
>> on that machine no longer works.
>
> Can you describe the upgrade a bit more, just for background?

I used to have the basic original UniFi UAP. I've replaced them with the newer "AC Lite" version:

   https://www.ubnt.com/unifi/unifi-ap-ac-lite/

so it's a fairly big jump from a 2.4GHz-only network to a dual-band one.

The old 2.4GHz-only AP's showed the problem with minstrel-ht incorrectly starting off at the highest rate (on a totally different machine). So the Unifi AP's have shown problems in the kernel wireless before, but so far it's always been the fault of the kernel wireless, not the AP.

> Could you print out the entire table there when the warning happens?

This is the best I can come up with: printing out the index, and the rate and bitrate tables:

  rates[i].idx (9) >= sband->n_bitrates (8)
  Rates:
    0: idx 9 count 1 flags a0
    1: idx 8 count 1 flags a0
    2: idx 7 count 2 flags a0
    3: idx 6 count 3 flags a0
  Bitrates:
    0: flags 0002 bitrate 60 (hw: 0004 )
    1: flags bitrate 90 (hw: 0005 )
    2: flags 0002 bitrate 120 (hw: 0006 )
    3: flags bitrate 180 (hw: 0007 )
    4: flags 0002 bitrate 240 (hw: 0008 )
    5: flags bitrate 360 (hw: 0009 )
    6: flags bitrate 480 (hw: 000a )
    7: flags bitrate 540 (hw: 000b )

So it's the very first rate that has index 9, but the bitrate table only goes from 0-7.

So I suspect that once the first index has been marked invalid, it now will never even look at the later indices, so it has no transmit rates at all. Or something.

That bitrate table does seem to match:

   static struct ieee80211_rate rtl_ratetable_5g[] = {

in drivers/net/wireless/realtek/rtlwifi/base.c

Does this give you any ideas?

Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 1:44 PM, Linus Torvalds wrote:
>
> I will try Johannes' suggestion on that machine to see if it makes a
> difference

Well, it "makes a difference" in the sense that the warning goes away.

But it doesn't make things work. In fact, it might be making things worse.

Because with that patch, the wireless still authenticates and associates, but then it doesn't even get an IP address, so now even dhcp doesn't work.

Of course, I was surprised that it worked last time, and I'm not 100% sure it did work consistently. I'll re-test without the patch, just to make sure, but it doesn't really seem to improve on anything.

Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, 2016-01-28 at 14:04 -0800, Linus Torvalds wrote:
>
> Well, it "makes a difference" in the sense that the warning goes
> away.
> But it doesn't make things work. In fact, it might be making things
> worse.

Heh, ok.

> Because with that patch, the wireless still authenticates and
> associates, but then it doesn't even get an IP address, so now even
> dhcp doesn't work.
> Of course, I was surprised that it worked last
> time, and I'm not 100% sure it did work consistently. I'll re-test
> without the patch, just to make sure, but it doesn't really seem to
> improve on anything.
>

It makes some sense; here's some speculation:

VHT rates are MCS 0-9. If the rate scaling decides to use only VHT MCSes with a VHT-capable peer, then it stands to reason it might still start at 0, but forget to set the VHT_MCS flag, so it would really use rate index 0 from the legacy table, which is 6 Mbps. Then, it would see that "working" (since it's not the right thing) and scale up until it hits MCS 8 or 9, which is no longer a valid rate (the legacy indices only go from 0-7).

Since the suggested changes make it worse, we can assume that this is not the only place where VHT is simply completely broken, and fixing VHT here will instead uncover a bug elsewhere, that was previously not happening because we never got to real VHT rates.

Your best workaround may just be to ignore VHT for now - clearly it's broken so using "just" HT (which is likely not that much of a penalty anyway since you're apparently not using 80 MHz) will be much better.

Go into

   _rtl_init_hw_vht_capab()

and just remove or stub out the entire contents of that (or you could just remove the "vht_supported=true" if you feel like it.)

That should get it to HT only, which is likely tested and working better.

johannes
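Johannes's speculation can be checked with a little arithmetic. mac80211 stores a VHT rate in the same `idx` field as a legacy rate, packed as `((nss - 1) << 4) | mcs` (this is what `ieee80211_rate_set_vht()` does), so a single-stream VHT MCS n has idx n. The userspace sketch below, an illustration rather than kernel code, looks such an index up in the 8-entry 5 GHz legacy table from the thread (bitrates in units of 100 kbit/s), which is what happens if `IEEE80211_TX_RC_VHT_MCS` is missing. `vht_idx()` and `idx_as_legacy()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as mac80211's ieee80211_rate_set_vht():
 * idx = ((nss - 1) << 4) | mcs. */
static int8_t vht_idx(unsigned int mcs, unsigned int nss)
{
	assert(mcs <= 9 && nss >= 1 && nss <= 8);
	return (int8_t)(((nss - 1) << 4) | mcs);
}

/* The legacy 5 GHz bitrate table printed in the thread,
 * in units of 100 kbit/s (60 == 6.0 Mbit/s). */
static const int legacy_5g_100kbps[] = { 60, 90, 120, 180, 240, 360, 480, 540 };
#define N_LEGACY ((int)(sizeof(legacy_5g_100kbps) / sizeof(legacy_5g_100kbps[0])))

/* Interpret idx as a legacy table index, as mac80211 does when neither
 * MCS flag is set. Returns the bitrate in 100 kbit/s, or -1 when the
 * index is out of range (the "RC is busted" case). */
static int idx_as_legacy(int idx)
{
	if (idx < 0 || idx >= N_LEGACY)
		return -1;
	return legacy_5g_100kbps[idx];
}
```

Under these assumptions, VHT MCS 0 lands on 6 Mbit/s and MCS 7 on 54 Mbit/s, while MCS 8 and 9 run off the end of the table, which lines up with the idx 9 and idx 8 entries in Linus's printout.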
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg wrote:
>
> Your best workaround may just be to ignore VHT for now - clearly it's
> broken so using "just" HT (which is likely not that much of a penalty
> anyway since you're apparently not using 80 MHz) will be much better.
>
> Go into
>
>    _rtl_init_hw_vht_capab()
>
> and just remove or stub out the entire contents of that (or you could
> just remove the "vht_supported=true" if you feel like it.)
>
> That should get it to HT only, which is likely tested and working
> better.

Bingo. That indeed gets me working wireless. It's not super-fast, but I don't think it ever has been..

If somebody has a suggested patch to actually *fix* VHT on this chipset, that would obviously be better. And maybe it works on some other chipsets, but not on mine. I'll happily test patches now that the merge window is over and I have some time again (and I can also make my AP do 80MHz channels if that matters, although as Johannes noted it's not enabled by default).

For the realtek driver people, here is what lspci says:

  02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
  	Subsystem: AzureWave Device 2161
  	Kernel driver in use: rtl8821ae

  (Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)

Thanks,

    Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
Adding the RTL people to the cc, and leaving the whole thing quoted at the bottom..

I will try Johannes' suggestion on that machine to see if it makes a difference, but somebody who knows the rtlwifi rate control code should take a double- or triple-look at this. Please?

Some googling shows that this is not a new issue. Or at least I seem to find reports that look very much like this from over a year ago.

    Linus

On Thu, Jan 28, 2016 at 12:40 PM, Johannes Berg wrote:
> On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
>>
>> I used to have the basic original UniFi UAP. I've replaced them with
>> the newer "AC Lite" version:
>>
>>    https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>>
>> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
>> one.
>>
>> The old 2.4GHz-only AP's showed the problem with minstrel-ht
>> incorrectly starting off at the highest rate (on a totally different
>> machine). So the Unifi AP's have shown problems in the kernel
>> wireless before, but so far it's always been the fault of the kernel
>> wireless, not the AP.
>
> Yeah; I wasn't trying to blame it on this change, I was just trying to
> understand the change in the environment. Seems likely that it's simply
> the switch to 5 GHz, which is strange, I'd have thought that even that
> rtlwifi driver would've been tested with that :)
>
>> > Could you print out the entire table there when the warning
>> > happens?
>>
>> This is the best I can come up with: printing out the index, and the
>> rate and bitrate tables:
>>
>>   rates[i].idx (9) >= sband->n_bitrates (8)
>>   Rates:
>>     0: idx 9 count 1 flags a0
>>     1: idx 8 count 1 flags a0
>>     2: idx 7 count 2 flags a0
>>     3: idx 6 count 3 flags a0
>
> Yeah, perfect. See, this is already evidently not making any sense:
>
> flags a0 is
>   IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI
>
> both of those options *require* IEEE80211_TX_RC_MCS or
> IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
> 1a0.
>
>>   Bitrates:
>>     0: flags 0002 bitrate 60 (hw: 0004 )
>>     1: flags bitrate 90 (hw: 0005 )
>>     2: flags 0002 bitrate 120 (hw: 0006 )
>>     3: flags bitrate 180 (hw: 0007 )
>>     4: flags 0002 bitrate 240 (hw: 0008 )
>>     5: flags bitrate 360 (hw: 0009 )
>>     6: flags bitrate 480 (hw: 000a )
>>     7: flags bitrate 540 (hw: 000b )
>>
>> So it's the very first rate that has index 9, but the bitrate table
>> only goes from 0-7.
>>
>> So I suspect that once the first index has been marked invalid, it
>> now will never even look at the later indices, so it has no transmit
>> rates at all. Or something.
>
> Indeed.
>
>> That bitrate table does seem to match:
>>
>>    static struct ieee80211_rate rtl_ratetable_5g[] = {
>>
>> in drivers/net/wireless/realtek/rtlwifi/base.c
>>
>
> Yeah, it would, but it's irrelevant since the rate table isn't actually
> used with MCS rates.
>
> I'm not familiar with this code at all, but looking at it suggests that
> perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
> VHT (802.11ac) - that's more plausible too, not testing with VHT seems
> like something that could have happened for this driver.
>
> And as I figured, the code in _rtl_rc_rate_set_series() is obviously
> not handling VHT correctly: it has
>
>		if (sgi_20 || sgi_40 || sgi_80)
>			rate->flags |= IEEE80211_TX_RC_SHORT_GI;
>		if (sta && sta->ht_cap.ht_supported &&
>		    ((wireless_mode == WIRELESS_MODE_N_5G) ||
>		     (wireless_mode == WIRELESS_MODE_N_24G)))
>			rate->flags |= IEEE80211_TX_RC_MCS;
>
> but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
> something like
>
>		if (sta && sta->ht_cap.vht_supported &&
>		    (wireless_mode == WIRELESS_MODE_AC_5G ||
>		     wireless_mode == WIRELESS_MODE_AC_24G ||
>		     wireless_mode == WIRELESS_MODE_AC_ONLY))
>			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
>
> just after/before the above block.
>
> But I'm not familiar with this code at all, so that may not really be
> the right fix or even work.
>
> johannes
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
>
> I used to have the basic original UniFi UAP. I've replaced them with
> the newer "AC Lite" version:
>
>    https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>
> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
> one.
>
> The old 2.4GHz-only AP's showed the problem with minstrel-ht
> incorrectly starting off at the highest rate (on a totally different
> machine). So the Unifi AP's have shown problems in the kernel
> wireless before, but so far it's always been the fault of the kernel
> wireless, not the AP.

Yeah; I wasn't trying to blame it on this change, I was just trying to understand the change in the environment. Seems likely that it's simply the switch to 5 GHz, which is strange, I'd have thought that even that rtlwifi driver would've been tested with that :)

> > Could you print out the entire table there when the warning
> > happens?
>
> This is the best I can come up with: printing out the index, and the
> rate and bitrate tables:
>
>   rates[i].idx (9) >= sband->n_bitrates (8)
>   Rates:
>     0: idx 9 count 1 flags a0
>     1: idx 8 count 1 flags a0
>     2: idx 7 count 2 flags a0
>     3: idx 6 count 3 flags a0

Yeah, perfect. See, this is already evidently not making any sense:

flags a0 is
  IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI

both of those options *require* IEEE80211_TX_RC_MCS or IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or 1a0.

>   Bitrates:
>     0: flags 0002 bitrate 60 (hw: 0004 )
>     1: flags bitrate 90 (hw: 0005 )
>     2: flags 0002 bitrate 120 (hw: 0006 )
>     3: flags bitrate 180 (hw: 0007 )
>     4: flags 0002 bitrate 240 (hw: 0008 )
>     5: flags bitrate 360 (hw: 0009 )
>     6: flags bitrate 480 (hw: 000a )
>     7: flags bitrate 540 (hw: 000b )
>
> So it's the very first rate that has index 9, but the bitrate table
> only goes from 0-7.
>
> So I suspect that once the first index has been marked invalid, it
> now will never even look at the later indices, so it has no transmit
> rates at all. Or something.

Indeed.

> That bitrate table does seem to match:
>
>    static struct ieee80211_rate rtl_ratetable_5g[] = {
>
> in drivers/net/wireless/realtek/rtlwifi/base.c
>

Yeah, it would, but it's irrelevant since the rate table isn't actually used with MCS rates.

I'm not familiar with this code at all, but looking at it suggests that perhaps the switch to 5 GHz wasn't at fault, but instead the switch to VHT (802.11ac) - that's more plausible too, not testing with VHT seems like something that could have happened for this driver.

And as I figured, the code in _rtl_rc_rate_set_series() is obviously not handling VHT correctly: it has

		if (sgi_20 || sgi_40 || sgi_80)
			rate->flags |= IEEE80211_TX_RC_SHORT_GI;
		if (sta && sta->ht_cap.ht_supported &&
		    ((wireless_mode == WIRELESS_MODE_N_5G) ||
		     (wireless_mode == WIRELESS_MODE_N_24G)))
			rate->flags |= IEEE80211_TX_RC_MCS;

but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be something like

		if (sta && sta->ht_cap.vht_supported &&
		    (wireless_mode == WIRELESS_MODE_AC_5G ||
		     wireless_mode == WIRELESS_MODE_AC_24G ||
		     wireless_mode == WIRELESS_MODE_AC_ONLY))
			rate->flags |= IEEE80211_TX_RC_VHT_MCS;

just after/before the above block.

But I'm not familiar with this code at all, so that may not really be the right fix or even work.

johannes
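Johannes's reading of `flags a0` can be double-checked by decoding the bits. The sketch below copies the relevant bit positions from mac80211's `enum mac80211_rate_control_flags` as they were defined around this time (worth verifying against your own tree); `flags_is_mcs()` and `flags_consistent()` are hypothetical helpers written for illustration, and the consistency rule encodes only the two flags Johannes names.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of mac80211 rate-control flag bits (include/net/mac80211.h). */
#define TX_RC_MCS          (1u << 3)  /* IEEE80211_TX_RC_MCS */
#define TX_RC_40_MHZ_WIDTH (1u << 5)  /* IEEE80211_TX_RC_40_MHZ_WIDTH */
#define TX_RC_SHORT_GI     (1u << 7)  /* IEEE80211_TX_RC_SHORT_GI */
#define TX_RC_VHT_MCS      (1u << 8)  /* IEEE80211_TX_RC_VHT_MCS */

/* The two flags that, per Johannes, only make sense on MCS rates. */
#define TX_RC_NEEDS_MCS (TX_RC_40_MHZ_WIDTH | TX_RC_SHORT_GI)

static_assert(TX_RC_NEEDS_MCS == 0xa0,
	      "40 MHz + short GI should decode to the 'flags a0' seen in the thread");

/* True if the flags mark an HT or VHT (MCS) rate. */
static bool flags_is_mcs(unsigned int flags)
{
	return (flags & (TX_RC_MCS | TX_RC_VHT_MCS)) != 0;
}

/* True unless width/GI bits appear without either MCS flag. */
static bool flags_consistent(unsigned int flags)
{
	if (flags & TX_RC_NEEDS_MCS)
		return flags_is_mcs(flags);
	return true;
}
```

With these definitions 0xa0 is reported as inconsistent, while 0xa8 (adding `IEEE80211_TX_RC_MCS`) and 0x1a0 (adding `IEEE80211_TX_RC_VHT_MCS`) pass, which are the two values Johannes expected to see.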
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:

> .. except now I upgraded the nearest access point, and now wireless
> on that machine no longer works.

Can you describe the upgrade a bit more, just for background?

> Or rather, it actually *does* work in the sense that it
> authenticates, it associates, and it actually gets a DHCP lease etc.
> So the darn thing has an IP address and everything, but then nothing
> else seems to go through after that. Very odd. My guess is that the
> auth/assoc/dhcp thing happens at low rates, then it starts trying to
> up the rates, and things go to hell.

That's usually the case, yes. Auth/assoc/etc. management frames use low rates anyway, and the first few data frames usually also do until it scales up.

The code involved is drivers/net/wireless/realtek/rtlwifi/rc.c

> I do note that that rate_fixup_ratelist() function is a bit odd wrt
> those rate indexes: it has code to make sure that there are no valid
> rates following an invalid one:
>
>                /*
>                 * make sure there's no valid rate following
>                 * an invalid one, just in case drivers don't
>                 * take the API seriously to stop at -1.
>                 */
>                if (inval) {
>                        rates[i].idx = -1;
>                        continue;
>                }
>                if (rates[i].idx < 0) {
>                        inval = true;
>                        continue;
>                }
>
> but then that "RC is busted" case that generates a warning will add
> one of those invalid rates in the middle anyway. So I get the feeling
> that if that warning ever triggers, it will basically be screwing up
> that whole rate table. I dunno.

This should be OK, it's more of a sanity check. The driver is supposed to stop transmission attempts at the first -1, it seems, but the rate control algorithm shouldn't generate useless attempts that will never really get used, since that indicates a bug in the rate scaling.

> Is there anything sane I can do to help debug this case?

Could you print out the entire table there when the warning happens? Or at least, it'd help to figure out at which index the invalid rate actually happens. It seems that if that perhaps happens on the very first index, the driver might get completely confused and perhaps not even send the frame, which would lead to symptoms like the one you describe.

It seems plausible that there's a path somewhere in the rate scaling code that forgets to set IEEE80211_TX_RC_MCS or so.

johannes
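To see why a bad first index can leave the driver with nothing to send, here is a simplified userspace model of the index handling in `rate_fixup_ratelist()` for legacy (non-MCS) rates. It leaves out the RTS/CTS setup, the MCS branches and the warning itself; `struct tx_rate`, `fixup_ratelist()` and `usable_rates()` are hypothetical names for this sketch, not the mac80211 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* One retry-chain entry; only the legacy index matters here. */
struct tx_rate {
	int idx; /* index into the band's bitrate table, -1 = end of chain */
};

/* Simplified model of the legacy-rate index fixups in mac80211's
 * rate_fixup_ratelist(). */
static void fixup_ratelist(struct tx_rate *rates, int max_rates,
			   int n_bitrates)
{
	bool inval = false;

	assert(n_bitrates > 0);
	for (int i = 0; i < max_rates; i++) {
		/* no valid rate may follow an invalid one */
		if (inval) {
			rates[i].idx = -1;
			continue;
		}
		if (rates[i].idx < 0) {
			inval = true;
			continue;
		}
		/* "RC is busted": the kernel warns once and drops this entry,
		 * but does not set inval, so later entries are kept */
		if (rates[i].idx >= n_bitrates)
			rates[i].idx = -1;
	}
}

/* How many entries a driver that stops at the first -1 would try. */
static int usable_rates(const struct tx_rate *rates, int max_rates)
{
	int n = 0;

	while (n < max_rates && rates[n].idx >= 0)
		n++;
	return n;
}
```

Fed the indices Linus later printed (9, 8, 7, 6 against an 8-entry table), this model invalidates the first two entries but leaves 7 and 6 in place behind them, because the "RC is busted" branch does not set `inval`. A driver that honours the stop-at-first-`-1` rule therefore tries zero rates, which fits the "associated, but nothing goes through" symptom.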
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 5:54 PM, Larry Finger wrote:
>
> I have been running an RTL8821AE since kernel 3.18 without hitting this
> problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the driver is not expecting.

I've had issues with unifi ap's before, but to be honest, I've had issues with lots of hotel and airport wifi too. I don't think the Unifi APs are outside of the normal spectrum..

> Attached is a minimal patch that comments out the "vht_cap->vht_supported =
> true;" statement for both RTL8821AE and RTL8812AE in
> _rtl_init_hw_vht_capab(). Does that allow your system to work?

That works too, yes.

> The patch
> also logs some information regarding the channelplan and the country code.
> Please let me know the values for those.

  rtlwifi: channelplan 127
  rtlwifi: country code 13

> I apparently missed a previous complaint about this issue. If you still have
> the reference, please send it to me.

So googling for similar issues, I found

  https://bugzilla.redhat.com/show_bug.cgi?id=1168467
  https://bugzilla.redhat.com/show_bug.cgi?id=1293136

where that second one in particular looks very like my issue ("Association succeeds, and ARP/DHCP work, but no IP frames can be transmitted"). In both cases you have to go into the dmesg attachment to see that it's rtlwifi.

And there's an ubuntuforum thread

  http://ubuntuforums.org/showthread.php?t=2226009=2

where, if you follow it, it's an rtl chip on a PCI card, and it has very similar "connected but no internet" behavior, along with the "net/mac80211/rate.c:526" warning (different line numbers, different kernel version, but it smells similar).

Or this one:

  http://forums.debian.net/viewtopic.php?f=5=111781

which is also rtl-wifi, and also has the "associated, connected, got an IP, but no data, not even a ping" behavior. It also has the warning, but it looks different in other ways (2.4GHz only and actually says it's not doing HT/VHT).

So I don't know. The warning in net/mac80211/rate.c does seem to be associated with the realtek driver.

Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On 01/28/2016 05:01 PM, Linus Torvalds wrote:
> On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg wrote:
>> Your best workaround may just be to ignore VHT for now - clearly it's
>> broken so using "just" HT (which is likely not that much of a penalty
>> anyway since you're apparently not using 80 MHz) will be much better.
>>
>> Go into
>>
>>    _rtl_init_hw_vht_capab()
>>
>> and just remove or stub out the entire contents of that (or you could
>> just remove the "vht_supported=true" if you feel like it.)
>>
>> That should get it to HT only, which is likely tested and working
>> better.
>
> Bingo. That indeed gets me working wireless. It's not super-fast, but
> I don't think it ever has been..
>
> If somebody has a suggested patch to actually *fix* VHT on this
> chipset, that would obviously be better. And maybe it works on some
> other chipsets, but not on mine. I'll happily test patches now that
> the merge window is over and I have some time again (and I can also
> make my AP do 80MHz channels if that matters, although as Johannes
> noted it's not enabled by default).
>
> For the realtek driver people, here is what lspci says:
>
>   02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
>   	Subsystem: AzureWave Device 2161
>   	Kernel driver in use: rtl8821ae
>
>   (Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)
>
> Thanks,
>
>     Linus

Linus,

I have been running an RTL8821AE since kernel 3.18 without hitting this problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing something that the driver is not expecting.

There have also been some problems with the regdom in some models of these chips that I also fail to see. It appears that some vendors are not coding the EEPROM correctly. That should not affect your system.

Attached is a minimal patch that comments out the "vht_cap->vht_supported = true;" statement for both RTL8821AE and RTL8812AE in _rtl_init_hw_vht_capab(). Does that allow your system to work? The patch also logs some information regarding the channelplan and the country code. Please let me know the values for those.

I apparently missed a previous complaint about this issue. If you still have the reference, please send it to me.

Larry

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index 0517a4f..2464d41 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -248,7 +248,7 @@ static void _rtl_init_hw_vht_capab(struct ieee80211_hw *hw,
 	if (rtlhal->hw_type == HARDWARE_TYPE_RTL8812AE) {
 		u16 mcs_map;
 
-		vht_cap->vht_supported = true;
+		/* vht_cap->vht_supported = true; */
 		vht_cap->cap =
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 |
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 |
@@ -282,7 +282,7 @@ static void _rtl_init_hw_vht_capab(struct ieee80211_hw *hw,
 	} else if (rtlhal->hw_type == HARDWARE_TYPE_RTL8821AE) {
 		u16 mcs_map;
 
-		vht_cap->vht_supported = true;
+		/* vht_cap->vht_supported = true; */
 		vht_cap->cap =
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 |
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 |
diff --git a/drivers/net/wireless/realtek/rtlwifi/regd.c b/drivers/net/wireless/realtek/rtlwifi/regd.c
index 5be3411..38f464e 100644
--- a/drivers/net/wireless/realtek/rtlwifi/regd.c
+++ b/drivers/net/wireless/realtek/rtlwifi/regd.c
@@ -340,6 +340,7 @@ static int _rtl_reg_notifier_apply(struct wiphy *wiphy,
 static const struct ieee80211_regdomain *_rtl_regdomain_select(
 						struct rtl_regulatory *reg)
 {
+	pr_info(" country code %d\n", reg->country_code);
 	switch (reg->country_code) {
 	case COUNTRY_CODE_FCC:
 		return _regdom_no_midband;
@@ -400,6 +401,7 @@ static struct country_code_to_enum_rd *_rtl_regd_find_country(u16 countrycode)
 
 static u8 channel_plan_to_country_code(u8 channelplan)
 {
+	pr_info(" channelplan %d\n", channelplan);
 	switch (channelplan) {
 	case 0x20:
 	case 0x21:
WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
Hmm. So my daughter has a little Gigabyte Brix that has rtl8821ae wireless in it. Yeah, nasty, I know, but it has actually worked reasonably well.

.. except now I upgraded the nearest access point, and now wireless on that machine no longer works.

Or rather, it actually *does* work in the sense that it authenticates, it associates, and it actually gets a DHCP lease etc. So the darn thing has an IP address and everything, but then nothing else seems to go through after that. Very odd. My guess is that the auth/assoc/dhcp thing happens at low rates, then it starts trying to up the rates, and things go to hell. But clearly several packets have gotten through. And then absolutely nothing.

Everything else is happy with the new AP, so this is not a problem with the wireless network itself.

I'm appending the warning that gets printed, which may or may not be relevant. This is with a clean and up-to-date Fedora 23 install, so that line 513 is the

   512                 /* RC is busted */
   513                 if (WARN_ON_ONCE(rates[i].idx >= sband->n_bitrates)) {
   514                         rates[i].idx = -1;
   515                         continue;
   516                 }

thing, which still exists in the same form in current kernels (except in current -git it's line 625).

I do note that that rate_fixup_ratelist() function is a bit odd wrt those rate indexes: it has code to make sure that there are no valid rates following an invalid one:

                /*
                 * make sure there's no valid rate following
                 * an invalid one, just in case drivers don't
                 * take the API seriously to stop at -1.
                 */
                if (inval) {
                        rates[i].idx = -1;
                        continue;
                }
                if (rates[i].idx < 0) {
                        inval = true;
                        continue;
                }

but then that "RC is busted" case that generates a warning will add one of those invalid rates in the middle anyway. So I get the feeling that if that warning ever triggers, it will basically be screwing up that whole rate table. I dunno.

Is there anything sane I can do to help debug this case?

                 Linus

--- snip snip, relevant (?) wireless warning ---
IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
r8169 :03:00.0 enp3s0: link down
IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky
device virbr0-nic entered promiscuous mode
virbr0: port 1(virbr0-nic) entered listening state
virbr0: port 1(virbr0-nic) entered listening state
virbr0: port 1(virbr0-nic) entered disabled state
wlp2s0: authenticate with 46:d9:e7:92:bf:29
wlp2s0: send auth to 46:d9:e7:92:bf:29 (try 1/3)
wlp2s0: authenticated
wlp2s0: associate with 46:d9:e7:92:bf:29 (try 1/3)
wlp2s0: associate with 46:d9:e7:92:bf:29 (try 2/3)
wlp2s0: RX AssocResp from 46:d9:e7:92:bf:29 (capab=0x411 status=0 aid=1)
wlp2s0: associated
IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[ cut here ]
WARNING: CPU: 2 PID: 0 at net/mac80211/rate.c:513 ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]()
Modules linked in: ccm cmac xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle bnep arc4 rtl8821ae vfat fat btcoexist rtl_pci rtlwifi mac80211 x86_pkg_temp_thermal coretemp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm_intel snd_soc_rt5640 kvm snd_soc_rl6231 snd_hda_intel snd_soc_core iTCO_wdt snd_hda_codec snd_compress btusb snd_pcm_dmaengine snd_hda_core iTCO_vendor_support cfg80211 ac97_bus btrtl snd_hwdep crct10dif_pclmul btbcm snd_seq crc32_pclmul btintel crc32c_intel bluetooth snd_seq_device joydev snd_pcm mei_me mei shpchp dw_dmac tpm_tis lpc_ich i2c_i801 snd_timer rfkill snd tpm soundcore snd_soc_sst_acpi dw_dmac_core i2c_designware_platform i2c_designware_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc hid_logitech_hidpp hid_logitech_dj i915 i2c_algo_bit drm_kms_helper 8021q garp drm stp llc mrp r8169 sdhci_acpi mii sdhci mmc_core video i2c_hid
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.2.8-300.fc23.x86_64 #1
Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F2 12/11/2013
 aad0aff724c0ea01 88021ea83648 817738ca