Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Johannes Berg
On Thu, 2016-01-28 at 19:54 -0600, Larry Finger wrote:
> 
> I have been running an RTL8821AE since kernel 3.18 without hitting
> this problem 
> using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the 
> driver is not expecting.

Are you quite sure you're actually using VHT though, perhaps the AP
somehow turned it off? It seems unlikely that you could successfully
use it in any way given that RATE_INFO_FLAGS_VHT_MCS doesn't show up in
the driver or rate scaling at all.


johannes


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Johannes Berg
On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:
> 
> Upon further inspection, my log has the line "rtl8821ae 0000:02:00.0
> wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
> first.
> 
Likely TKIP; enable only WPA2 (CCMP) on the AP.

johannes
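A minimal userspace sketch of the rule behind that log line, assuming (as mac80211 did around v4.4) that it keys off the pairwise ciphers offered at association time. The cipher values are the standard 00-0F-AC suite selectors; wep_tkip_disables_ht_vht() is a hypothetical name, not a kernel function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard IEEE 802.11 pairwise cipher suite selectors (OUI 00-0F-AC),
 * the same values cfg80211 uses for WLAN_CIPHER_SUITE_*. */
#define CIPHER_SUITE_WEP40  0x000FAC01u
#define CIPHER_SUITE_TKIP   0x000FAC02u
#define CIPHER_SUITE_CCMP   0x000FAC04u
#define CIPHER_SUITE_WEP104 0x000FAC05u

/* Hypothetical helper: returns true when any offered pairwise cipher is
 * WEP or TKIP, i.e. when HT and VHT would be switched off and the
 * station would be limited to legacy rates. */
static bool wep_tkip_disables_ht_vht(const uint32_t *pairwise, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		switch (pairwise[i]) {
		case CIPHER_SUITE_WEP40:
		case CIPHER_SUITE_WEP104:
		case CIPHER_SUITE_TKIP:
			return true;
		default:
			break;
		}
	}
	return false;
}
```

Under this model, a WPA2-only (CCMP) AP keeps HT/VHT enabled, while a mixed configuration that still offers TKIP does not, which is why restricting the AP to CCMP is the suggested fix.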


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Larry Finger

On 01/29/2016 10:15 AM, Johannes Berg wrote:
> On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:
>>
>> Upon further inspection, my log has the line "rtl8821ae 0000:02:00.0
>> wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
>> first.
>>
> Likely TKIP; enable only WPA2 (CCMP) on the AP.

I found and fixed it. The AP was using only 20 MHz channels, but is now
configured for 20/40/80 MHz. That reproduces the warning for me, along with
the lack of throughput even though it associates and authenticates.


Larry




Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Larry Finger

Linus,

Attached is a trial patch that fixes the problem on my system. As I told 
Johannes earlier, my AP was not configured to use VHT, thus I did not see the 
problem.


The test patch that Johannes sent earlier was close. The section needed to add 
VHT rates is:


--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 		((wireless_mode == WIRELESS_MODE_N_5G) ||
 		 (wireless_mode == WIRELESS_MODE_N_24G)))
 			rate->flags |= IEEE80211_TX_RC_MCS;
+		if (sta && sta->vht_cap.vht_supported &&
+		(wireless_mode == WIRELESS_MODE_AC_5G))
+			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 	}
 }

Larry

From bd34ac0c3caa9ff982194256b0e96772a17e719d Mon Sep 17 00:00:00 2001
From: Larry Finger 
Date: Fri, 29 Jan 2016 11:29:10 -0600
Subject: [PATCH] rtlwifi: Fix warning from ieee80211_get_tx_rates() when using
 5G
To: kv...@codeaurora.org
Cc: linux-wirel...@vger.kernel.org,
de...@driverdev.osuosl.org

When using a 5G-capable device with VHT rates enabled, the following
warning results:

WARNING: CPU: 3 PID: 2253 at net/mac80211/rate.c:625 ieee80211_get_tx_rates+0x22e/0x620 [mac80211]()
Modules linked in: rtl8821ae btcoexist rtl_pci rtlwifi fuse drbg ansi_cprng ctr ccm bnep bluetooth
af_packet nfs fscache vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) arc4 snd_hda_codec_generic
x86_pkg_temp_thermal rtsx_pci_sdmmc mmc_core rtsx_pci_ms kvm_intel memstick iwlmvm kvm mac80211
snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core irqbypass snd_pcm iwlwifi crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul
glue_helper ablk_helper cryptd snd cfg80211 pcspkr serio_raw e1000e rtsx_pci lpc_ich ptp xhci_pci
mfd_core pps_core xhci_hcd soundcore toshiba_acpi thermal sparse_keymap wmi toshiba_bluetooth rfkill
acpi_cpufreq battery ac processor dm_mod i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops drm sr_mod cdrom video button sg autofs4 [last unloaded: rtlwifi]
CPU: 3 PID: 2253 Comm: Timer Tainted: G        W  O    4.5.0-rc1-wl+ #79
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  a05c4be6 8802262036d8 813d7912 
  880226203710 8106bcb6 8800c6831300 8800c6831330
   8800c683133c 880065923638 880226203720
Call Trace:
[] dump_stack+0x4b/0x79
  [] warn_slowpath_common+0x86/0xc0
  [] warn_slowpath_null+0x1a/0x20
  [] ieee80211_get_tx_rates+0x22e/0x620 [mac80211]
  [] ? rtl_is_special_data+0x32/0x240 [rtlwifi]
  [] ? rate_control_get_rate+0xce/0x150 [mac80211]
  [] ? trace_hardirqs_on+0xd/0x10
  [] ? __local_bh_enable_ip+0x65/0xd0
--- traceback terminated here ---

The problem is that IEEE80211_TX_RC_VHT_MCS is not set in the rate flags.

Reported-by: Linus Torvalds 
Cc: Johannes Berg 
Signed-off-by: Larry Finger 
Cc: Stable 
---
 drivers/net/wireless/realtek/rtlwifi/rc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rc.c b/drivers/net/wireless/realtek/rtlwifi/rc.c
index 74c14ce..e7eae63 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 		((wireless_mode == WIRELESS_MODE_N_5G) ||
 		 (wireless_mode == WIRELESS_MODE_N_24G)))
 			rate->flags |= IEEE80211_TX_RC_MCS;
+		if (sta && sta->vht_cap.vht_supported &&
+		(wireless_mode == WIRELESS_MODE_AC_5G))
+			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 	}
 }
 
-- 
2.1.4



Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Linus Torvalds
On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger  wrote:
>
> The test patch that Johannes sent earlier was close. The section needed to
> add VHT rates is:

Hmm. This looks pretty much exactly like what I already tried (I had
fixed Johannes' patch to use "vht_cap" already, since it didn't
compile otherwise).

So the only difference is that it only checks WIRELESS_MODE_AC_5G.

But it worked for me this time. I have no idea why.

Maybe Johannes' patch actually always worked for me, but I just had a
transient problem that made me think it didn't. I think I only booted
it once, and went "oh, ok, no network, that didn't work".

  Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Linus Torvalds
On Fri, Jan 29, 2016 at 11:42 AM, Larry Finger
 wrote:
>
> Thanks for testing.
>
> Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits.
> Johannes' patch was indeed correct.

I just retested with this incremental (and whitespace-damaged) patch:

  @@ -139,7 +139,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
   (wireless_mode == WIRELESS_MODE_N_24G)))
  rate->flags |= IEEE80211_TX_RC_MCS;
  if (sta && sta->vht_cap.vht_supported &&
  -   (wireless_mode == WIRELESS_MODE_AC_5G))
  +   ((wireless_mode == WIRELESS_MODE_AC_5G) ||
  +(wireless_mode == WIRELESS_MODE_AC_24G) ||
  +(wireless_mode == WIRELESS_MODE_AC_ONLY)))
  rate->flags |= IEEE80211_TX_RC_VHT_MCS;
  }
   }

which brings it in line with Johannes' patch, and it does indeed still work.
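For reference, the flag selection Linus retested can be modeled in userspace. This is a sketch only: the MODE_* values are illustrative stand-ins for the driver's WIRELESS_MODE_* constants (not their real values), only the short-GI/MCS/VHT_MCS bits are modeled, and series_flags() is a hypothetical name for the logic inside _rtl_rc_rate_set_series():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits mirroring IEEE80211_TX_RC_MCS, _SHORT_GI and _VHT_MCS. */
#define TX_RC_MCS      (1u << 3)
#define TX_RC_SHORT_GI (1u << 7)
#define TX_RC_VHT_MCS  (1u << 8)

/* Illustrative stand-ins for WIRELESS_MODE_*; only distinctness matters. */
enum mode { MODE_N_24G, MODE_N_5G, MODE_AC_5G, MODE_AC_24G, MODE_AC_ONLY };

/* Minimal stand-in for the peer's ht_cap/vht_cap "supported" bits. */
struct peer_caps {
	bool ht_supported;
	bool vht_supported;
};

/* Model of the fixed logic: HT modes get MCS, any AC mode gets VHT_MCS. */
static uint32_t series_flags(const struct peer_caps *sta, enum mode m, bool sgi)
{
	uint32_t flags = 0;

	if (sgi)
		flags |= TX_RC_SHORT_GI;
	if (sta && sta->ht_supported &&
	    (m == MODE_N_5G || m == MODE_N_24G))
		flags |= TX_RC_MCS;
	if (sta && sta->vht_supported &&
	    (m == MODE_AC_5G || m == MODE_AC_24G || m == MODE_AC_ONLY))
		flags |= TX_RC_VHT_MCS;
	return flags;
}
```

Dropping the last `if` reproduces the pre-fix behavior: in an AC mode neither MCS bit gets set, which is how entries carrying only the width and short-GI bits (flags a0) ended up in the rate table.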

I think marking it for stable is also the right thing to do - the
driver clearly doesn't work well in a wide-channel AC environment
otherwise, and I assume it's going to be more and more common..

Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Larry Finger

On 01/29/2016 12:39 PM, Linus Torvalds wrote:
> On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger  wrote:
>>
>> The test patch that Johannes sent earlier was close. The section needed to
>> add VHT rates is:
>
> Hmm. This looks pretty much exactly like what I already tried (I had
> fixed Johannes' patch to use "vht_cap" already, since it didn't
> compile otherwise).
>
> So the only difference is that it only checks WIRELESS_MODE_AC_5G.
>
> But it worked for me this time. I have no idea why.
>
> Maybe Johannes' patch actually always worked for me, but I just had a
> transient problem that made me think it didn't. I think I only booted
> it once, and went "oh, ok, no network, that didn't work".

Thanks for testing.

Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits. 
Johannes' patch was indeed correct.


With my AP setup changed, I get about 100 Mb/s RX using netperf. TX is still 
bad.

@Johannes: OK if I make you the author of the final version of the patch?

Larry





Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 4:13 AM, Johannes Berg
 wrote:
> On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:
>
>> .. except now I upgraded the nearest access point, and now wireless
>> on that machine no longer works.
>
> Can you describe the upgrade a bit more, just for background?

I used to have the basic original UniFi UAP. I've replaced them with
the newer "AC Lite" version:

https://www.ubnt.com/unifi/unifi-ap-ac-lite/

so it's a fairly big jump from a 2.4GHz-only network to a dual-band one.

The old 2.4GHz-only AP's showed the problem with minstrel-ht
incorrectly starting off at the highest rate (on a totally different
machine). So the Unifi AP's have shown problems in the kernel wireless
before, but so far it's always been the fault of the kernel wireless,
not the AP.

> Could you print out the entire table there when the warning happens?

This is the best I can come up with: printing out the index, and the
rate and bitrate tables:

  rates[i].idx (9) >= sband->n_bitrates (8)
  Rates:
  0: idx 9 count 1 flags a0
  1: idx 8 count 1 flags a0
  2: idx 7 count 2 flags a0
  3: idx 6 count 3 flags a0
  Bitrates:
  0: flags 0002 bitrate 60 (hw: 0004 )
  1: flags  bitrate 90 (hw: 0005 )
  2: flags 0002 bitrate 120 (hw: 0006 )
  3: flags  bitrate 180 (hw: 0007 )
  4: flags 0002 bitrate 240 (hw: 0008 )
  5: flags  bitrate 360 (hw: 0009 )
  6: flags  bitrate 480 (hw: 000a )
  7: flags  bitrate 540 (hw: 000b )

So it's the very first rate that has index 9, but the bitrate table
only goes from 0-7.

So I suspect that once the first index has been marked invalid, it now
will never even look at the later indices, so it has no transmit rates
at all.  Or something.

That bitrate table does seem to match:

   static struct ieee80211_rate rtl_ratetable_5g[] = {

in drivers/net/wireless/realtek/rtlwifi/base.c

Does this give you any ideas?

  Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 1:44 PM, Linus Torvalds
 wrote:
>
> I will try Johannes' suggestion on that machine to see if it makes a
> difference

Well, it "makes a difference" in the sense that the warning goes away.
But it doesn't make things work. In fact, it might be making things
worse.

Because with that patch, the wireless still authenticates and
associates, but then it doesn't even get an IP address, so now even
dhcp doesn't work. Of course, I was surprised that it worked last
time, and I'm not 100% sure it did work consistently. I'll re-test
without the patch, just to make sure, but it doesn't really seem to
improve on anything.

Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Johannes Berg
On Thu, 2016-01-28 at 14:04 -0800, Linus Torvalds wrote:
> 
> Well, it "makes a difference" in the sense that the warning goes
> away.
> But it doesn't make things work. In fact, it might be making things
> worse.

Heh, ok.

> Because with that patch, the wireless still authenticates and
> associates, but then it doesn't even get an IP address, so now even
> dhcp doesn't work. Of course, I was surprised that it worked last
> time, and I'm not 100% sure it did work consistently. I'll re-test
> without the patch, just to make sure, but it doesn't really seem to
> improve on anything.
> 

It makes some sense, here's some speculation:

VHT rates are MCS 0-9. If the rate scaling decides to use only VHT
MCSes with a VHT-capable peer, then it stands to reason it might still
start at 0, but forget to set the VHT_MCS flag, so it would really use
rate index 0 from the table, which is 6 Mbps. Then, it would see that
"working" (since it's not the right thing) and scale up until it hits
MCS 8 or 9, which is no longer a valid rate (those are only 0-7).
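This speculation can be checked against the table Linus dumped. A minimal sketch, assuming only that mac80211 bitrates are stored in units of 100 kbit/s (so 60 means 6 Mbit/s); the helper name is hypothetical:

```c
#include <stddef.h>

/* 5 GHz legacy bitrates from the dump, in 100 kbit/s units:
 * 6, 9, 12, 18, 24, 36, 48, 54 Mbit/s. */
static const int legacy_5g_bitrate[] = { 60, 90, 120, 180, 240, 360, 480, 540 };
#define N_LEGACY_5G (sizeof(legacy_5g_bitrate) / sizeof(legacy_5g_bitrate[0]))

/* What a VHT MCS number (0-9) turns into when TX_RC_VHT_MCS is missing and
 * it is read as a legacy table index: the legacy bitrate it would pick,
 * or -1 once it runs off the end of the table ("RC is busted"). */
static int misread_vht_mcs_as_legacy(int vht_mcs)
{
	if (vht_mcs < 0 || (size_t)vht_mcs >= N_LEGACY_5G)
		return -1;
	return legacy_5g_bitrate[vht_mcs];
}
```

MCS 0 through 7 land on a real (but wrong, legacy) rate, which is why scaling appears to work at first; MCS 8 and 9 fall off the end, matching the idx 9 / idx 8 entries in the warning dump.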

Since the suggested changes make it worse, we can assume that this is
not the only place where VHT is simply completely broken, and fixing
VHT here will instead uncover a bug elsewhere, that was previously not
happening because we never got to real VHT rates.

Your best workaround may just be to ignore VHT for now - clearly it's
broken so using "just" HT (which is likely not that much of a penalty
anyway since you're apparently not using 80 MHz) will be much better.

Go into

_rtl_init_hw_vht_capab()

and just remove or stub out the entire contents of that (or you could
just remove the "vht_supported=true" if you feel like it.)

That should get it to HT only, which is likely tested and working
better.

johannes


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg
 wrote:
>
> Your best workaround may just be to ignore VHT for now - clearly it's
> broken so using "just" HT (which is likely not that much of a penalty
> anyway since you're apparently not using 80 MHz) will be much better.
>
> Go into
>
> _rtl_init_hw_vht_capab()
>
> and just remove or stub out the entire contents of that (or you could
> just remove the "vht_supported=true" if you feel like it.)
>
> That should get it to HT only, which is likely tested and working
> better.

Bingo. That indeed gets me working wireless. It's not super-fast, but
I don't think it ever has been..

If somebody has a suggested patch to actually *fix* VHT on this
chipset, that would obviously be better. And maybe it works on some
other chipsets, but not on mine. I'll happily test patches now that
the merge window is over and I have some time again (and I can also
make my AP do 80MHz channels if that matters, although as Johannes
noted it's not enabled by default).

For the realtek driver people, here is what lspci says:

02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE
802.11ac PCIe Wireless Network Adapter
Subsystem: AzureWave Device 2161
Kernel driver in use: rtl8821ae

(Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)

Thanks,

  Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
Adding the RTL people to the cc, and leaving the whole thing quoted at
the bottom..

I will try Johannes' suggestion on that machine to see if it makes a
difference, but somebody who knows the rtlwifi rate control code
should take a double- or triple-look at this.

Please? Some googling shows that this is not a new issue. Or at least
I seem to find reports that look very much like this from over a year
ago.

 Linus

On Thu, Jan 28, 2016 at 12:40 PM, Johannes Berg
 wrote:
> On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
>>
>> I used to have the basic original UniFi UAP. I've replaced them with
>> the newer "AC Lite" version:
>>
>> https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>>
>> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
>> one.
>>
>> The old 2.4GHz-only AP's showed the problem with minstrel-ht
>> incorrectly starting off at the highest rate (on a totally different
>> machine). So the Unifi AP's have shown problems in the kernel
>> wireless before, but so far it's always been the fault of the kernel
>> wireless, not the AP.
>
> Yeah; I wasn't trying to blame it on this change, I was just trying to
> understand the change in the environment. Seems likely that it's simply
> the switch to 5 GHz, which is strange, I'd have thought that even that
> rtlwifi driver would've been tested with that :)
>
>> > Could you print out the entire table there when the warning
>> > happens?
>>
>> This is the best I can come up with: printing out the index, and the
>> rate and bitrate tables:
>>
>>   rates[i].idx (9) >= sband->n_bitrates (8)
>>   Rates:
>>   0: idx 9 count 1 flags a0
>>   1: idx 8 count 1 flags a0
>>   2: idx 7 count 2 flags a0
>>   3: idx 6 count 3 flags a0
>
> Yeah, perfect. See, this is already evidently not making any sense:
>
> flags a0 is
> IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI
>
> both of those options *require* IEEE80211_TX_RC_MCS or
> IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
> 1a0.
>
>>   Bitrates:
>>   0: flags 0002 bitrate 60 (hw: 0004 )
>>   1: flags  bitrate 90 (hw: 0005 )
>>   2: flags 0002 bitrate 120 (hw: 0006 )
>>   3: flags  bitrate 180 (hw: 0007 )
>>   4: flags 0002 bitrate 240 (hw: 0008 )
>>   5: flags  bitrate 360 (hw: 0009 )
>>   6: flags  bitrate 480 (hw: 000a )
>>   7: flags  bitrate 540 (hw: 000b )
>>
>> So it's the very first rate that has index 9, but the bitrate table
>> only goes from 0-7.
>>
>> So I suspect that once the first index has been marked invalid, it
>> now will never even look at the later indices, so it has no transmit
>> rates at all.  Or something.
>
> Indeed.
>
>> That bitrate table does seem to match:
>>
>>static struct ieee80211_rate rtl_ratetable_5g[] = {
>>
>> in drivers/net/wireless/realtek/rtlwifi/base.c
>>
>
> Yeah, it would, but it's irrelevant since the rate table isn't actually
> used with MCS rates.
>
> I'm not familiar with this code at all, but looking at it suggests that
> perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
> VHT (802.11ac) - that's more plausible too, not testing with VHT seems
> like something that could have happened for this driver.
>
> And as I figured, the code in _rtl_rc_rate_set_series() is obviously
> not handling VHT correctly: it has
>
> if (sgi_20 || sgi_40 || sgi_80)
>         rate->flags |= IEEE80211_TX_RC_SHORT_GI;
> if (sta && sta->ht_cap.ht_supported &&
>     ((wireless_mode == WIRELESS_MODE_N_5G) ||
>      (wireless_mode == WIRELESS_MODE_N_24G)))
>         rate->flags |= IEEE80211_TX_RC_MCS;
>
> but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
> something like
>
> if (sta && sta->ht_cap.vht_supported &&
>     (wireless_mode == WIRELESS_MODE_AC_5G ||
>      wireless_mode == WIRELESS_MODE_AC_24G ||
>      wireless_mode == WIRELESS_MODE_AC_ONLY))
>         rate->flags |= IEEE80211_TX_RC_VHT_MCS;
>
> just after/before the above block.
>
> But I'm not familiar with this code at all, so that may not really be
> the right fix or even work.
>
> johannes


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Johannes Berg
On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
> 
> I used to have the basic original UniFi UAP. I've replaced them with
> the newer "AC Lite" version:
> 
> https://www.ubnt.com/unifi/unifi-ap-ac-lite/
> 
> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
> one.
> 
> The old 2.4GHz-only AP's showed the problem with minstrel-ht
> incorrectly starting off at the highest rate (on a totally different
> machine). So the Unifi AP's have shown problems in the kernel
> wireless before, but so far it's always been the fault of the kernel
> wireless, not the AP.

Yeah; I wasn't trying to blame it on this change, I was just trying to
understand the change in the environment. Seems likely that it's simply
the switch to 5 GHz, which is strange, I'd have thought that even that
rtlwifi driver would've been tested with that :)

> > Could you print out the entire table there when the warning
> > happens?
> 
> This is the best I can come up with: printing out the index, and the
> rate and bitrate tables:
> 
>   rates[i].idx (9) >= sband->n_bitrates (8)
>   Rates:
>   0: idx 9 count 1 flags a0
>   1: idx 8 count 1 flags a0
>   2: idx 7 count 2 flags a0
>   3: idx 6 count 3 flags a0

Yeah, perfect. See, this is already evidently not making any sense:

flags a0 is
IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI

both of those options *require* IEEE80211_TX_RC_MCS or
IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
1a0.
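The bit arithmetic can be checked mechanically. This sketch mirrors four IEEE80211_TX_RC_* bits from include/net/mac80211.h, with values consistent with the a0/a8/1a0 arithmetic above; rc_flags_consistent() is a hypothetical checker, not a mac80211 API:

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_RC_MCS          (1u << 3)  /* 0x008 */
#define TX_RC_40_MHZ_WIDTH (1u << 5)  /* 0x020 */
#define TX_RC_SHORT_GI     (1u << 7)  /* 0x080 */
#define TX_RC_VHT_MCS      (1u << 8)  /* 0x100 */

/* 40 MHz width and short GI only make sense on an HT (MCS) or VHT
 * (VHT_MCS) rate, never on an entry of the legacy bitrate table. */
static bool rc_flags_consistent(uint32_t flags)
{
	bool needs_mcs = flags & (TX_RC_40_MHZ_WIDTH | TX_RC_SHORT_GI);
	bool has_mcs = flags & (TX_RC_MCS | TX_RC_VHT_MCS);

	return !needs_mcs || has_mcs;
}
```

With these definitions, 0xa0 is rejected while 0xa8 (HT) and 0x1a0 (VHT) pass.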

>   Bitrates:
>   0: flags 0002 bitrate 60 (hw: 0004 )
>   1: flags  bitrate 90 (hw: 0005 )
>   2: flags 0002 bitrate 120 (hw: 0006 )
>   3: flags  bitrate 180 (hw: 0007 )
>   4: flags 0002 bitrate 240 (hw: 0008 )
>   5: flags  bitrate 360 (hw: 0009 )
>   6: flags  bitrate 480 (hw: 000a )
>   7: flags  bitrate 540 (hw: 000b )
> 
> So it's the very first rate that has index 9, but the bitrate table
> only goes from 0-7.
> 
> So I suspect that once the first index has been marked invalid, it
> now will never even look at the later indices, so it has no transmit
> rates at all.  Or something.

Indeed.

> That bitrate table does seem to match:
> 
>    static struct ieee80211_rate rtl_ratetable_5g[] = {
> 
> in drivers/net/wireless/realtek/rtlwifi/base.c
> 

Yeah, it would, but it's irrelevant since the rate table isn't actually
used with MCS rates.

I'm not familiar with this code at all, but looking at it suggests that
perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
VHT (802.11ac) - that's more plausible too, not testing with VHT seems
like something that could have happened for this driver.

And as I figured, the code in _rtl_rc_rate_set_series() is obviously
not handling VHT correctly: it has

        if (sgi_20 || sgi_40 || sgi_80)
                rate->flags |= IEEE80211_TX_RC_SHORT_GI;
        if (sta && sta->ht_cap.ht_supported &&
            ((wireless_mode == WIRELESS_MODE_N_5G) ||
             (wireless_mode == WIRELESS_MODE_N_24G)))
                rate->flags |= IEEE80211_TX_RC_MCS;

but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
something like

        if (sta && sta->ht_cap.vht_supported &&
            (wireless_mode == WIRELESS_MODE_AC_5G ||
             wireless_mode == WIRELESS_MODE_AC_24G ||
             wireless_mode == WIRELESS_MODE_AC_ONLY))
                rate->flags |= IEEE80211_TX_RC_VHT_MCS;

just after/before the above block.

But I'm not familiar with this code at all, so that may not really be
the right fix or even work.

johannes


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Johannes Berg
On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:

> .. except now I upgraded the nearest access point, and now wireless
> on that machine no longer works.

Can you describe the upgrade a bit more, just for background?

> Or rather, it actually *does* work in the sense that it
> authenticates, it associates, and it actually gets a DHCP lease etc.
> So the darn thing has an IP address and everything, but then nothing
> else seems to go through after that. Very odd. My guess is that the
> auth/assoc/dhcp thing happens at low rates, then it starts trying to
> up the rates, and things go to hell.

That's usually the case, yes. Auth/assoc/etc. management frames use low
rates anyway, and the first few data frames usually also do until it
scales up.

The code involved is drivers/net/wireless/realtek/rtlwifi/rc.c

> I do note that that rate_fixup_ratelist() function is a bit odd wrt
> those rate indexes: it has code to make sure that there are no valid
> rates following an invalid one:
> 
>         /*
>          * make sure there's no valid rate following
>          * an invalid one, just in case drivers don't
>          * take the API seriously to stop at -1.
>          */
>         if (inval) {
>                 rates[i].idx = -1;
>                 continue;
>         }
>         if (rates[i].idx < 0) {
>                 inval = true;
>                 continue;
>         }
> 
> but then that "RC is busted" case that generates a warning will add
> one of those invalid rates in the middle anyway. So I get the feeling
> that if that warning ever triggers, it will basically be screwing up
> that whole rate table. I dunno.

This should be OK, it's more of a sanity check. The driver is supposed
to stop transmission attempts at the first -1 it seems, but the rate
control algorithm shouldn't generate useless attempts that will never
really get used, since that indicates a bug in the rate scaling.

> Is there anything sane I can do to help debug this case?

Could you print out the entire table there when the warning happens? Or
at least, it'd help to figure out at which index the invalid actually
happens. It seems that if that perhaps happens on the very first index,
the driver might get completely confused and perhaps not even send the
frame, which would lead to symptoms like the one you describe.

It seems plausible that there's a path somewhere in the rate scaling
code that forgets to set IEEE80211_TX_RC_MCS or so.

johannes


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 5:54 PM, Larry Finger  wrote:
>
> I have been running an RTL8821AE since kernel 3.18 without hitting this
> problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the driver is not expecting.

I've had issues with unifi ap's before, but to be honest, I've had
issues with lots of hotel and airport wifi too. I don't think the
Unifi APs are outside of the normal spectrum..

> Attached is a minimal patch that comments out the "vht_cap->vht_supported =
> true;" statement for both RTL8821AE and RTL8812AE in
> _rtl_init_hw_vht_capab(). Does that allow your system to work?

That works too, yes.

> The patch
> also logs some information regarding the channelplan and the country code.
> Please let me know the values for those.

  rtlwifi:  channelplan 127
  rtlwifi:  country code 13

> I apparently missed a previous complaint about this issue. If you still have
> the reference, please send it to me.

So googling for similar issues, I found

  https://bugzilla.redhat.com/show_bug.cgi?id=1168467
  https://bugzilla.redhat.com/show_bug.cgi?id=1293136

where that second one in particular looks very like my issue
("Association succeeds, and ARP/DHCP work, but no IP frames can be
transmitted").

(In both cases you have to go into the dmesg attachment to see that it's
rtlwifi.)

And there's an ubuntuforum thread

  http://ubuntuforums.org/showthread.php?t=2226009=2

where if you follow the thing, it's an rtl chip on a PCI card, and it
has very similar "connected but no internet" behavior, along with the
"net/mac80211/rate.c:526" warning (different line numbers, different
kernel version, but it smells similar).

Or this one:

  http://forums.debian.net/viewtopic.php?f=5=111781

which is also rtl-wifi, and also has the "associated, connected, got
an IP, but no data, not even a ping" behavior. It also has the
warning, but it looks different in other ways (2.4GHz only and
actually says it's not doing HT/VHT).

So I don't know. The warning in net/mac80211/rate.c does seem to be
associated with the realtek driver.

 Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Larry Finger

On 01/28/2016 05:01 PM, Linus Torvalds wrote:
> On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg
>  wrote:
>>
>> Your best workaround may just be to ignore VHT for now - clearly it's
>> broken so using "just" HT (which is likely not that much of a penalty
>> anyway since you're apparently not using 80 MHz) will be much better.
>>
>> Go into
>>
>> _rtl_init_hw_vht_capab()
>>
>> and just remove or stub out the entire contents of that (or you could
>> just remove the "vht_supported=true" if you feel like it.)
>>
>> That should get it to HT only, which is likely tested and working
>> better.
>
> Bingo. That indeed gets me working wireless. It's not super-fast, but
> I don't think it ever has been..
>
> If somebody has a suggested patch to actually *fix* VHT on this
> chipset, that would obviously be better. And maybe it works on some
> other chipsets, but not on mine. I'll happily test patches now that
> the merge window is over and I have some time again (and I can also
> make my AP do 80MHz channels if that matters, although as Johannes
> noted it's not enabled by default).
>
> For the realtek driver people, here is what lspci says:
>
> 02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE
> 802.11ac PCIe Wireless Network Adapter
>  Subsystem: AzureWave Device 2161
>  Kernel driver in use: rtl8821ae
>
> (Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)
>
> Thanks,

Linus,

I have been running an RTL8821AE since kernel 3.18 without hitting this problem 
using a TRENDnet AC1750 dual-band AP. The UniFi may be doing something that the 
driver is not expecting.


There have also been some problems with the regdom in some models of these chips 
that I also fail to see. It appears that some vendors are not coding the EEPROM 
correctly. That should not affect your system.


Attached is a minimal patch that comments out the "vht_cap->vht_supported = 
true;" statement for both RTL8821AE and RTL8812AE in _rtl_init_hw_vht_capab(). 
Does that allow your system to work? The patch also logs some information 
regarding the channelplan and the country code. Please let me know the values 
for those.


I apparently missed a previous complaint about this issue. If you still have the 
reference, please send it to me.


Larry


diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index 0517a4f..2464d41 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -248,7 +248,7 @@ static void _rtl_init_hw_vht_capab(struct ieee80211_hw *hw,
 	if (rtlhal->hw_type == HARDWARE_TYPE_RTL8812AE) {
 		u16 mcs_map;
 
-		vht_cap->vht_supported = true;
+		/* vht_cap->vht_supported = true; */
 		vht_cap->cap =
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 |
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 |
@@ -282,7 +282,7 @@ static void _rtl_init_hw_vht_capab(struct ieee80211_hw *hw,
 	} else if (rtlhal->hw_type == HARDWARE_TYPE_RTL8821AE) {
 		u16 mcs_map;
 
-		vht_cap->vht_supported = true;
+		/* vht_cap->vht_supported = true; */
 		vht_cap->cap =
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 |
 			IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 |
diff --git a/drivers/net/wireless/realtek/rtlwifi/regd.c b/drivers/net/wireless/realtek/rtlwifi/regd.c
index 5be3411..38f464e 100644
--- a/drivers/net/wireless/realtek/rtlwifi/regd.c
+++ b/drivers/net/wireless/realtek/rtlwifi/regd.c
@@ -340,6 +340,7 @@ static int _rtl_reg_notifier_apply(struct wiphy *wiphy,
 static const struct ieee80211_regdomain *_rtl_regdomain_select(
 		struct rtl_regulatory *reg)
 {
+	pr_info(" country code %d\n", reg->country_code);
 	switch (reg->country_code) {
 	case COUNTRY_CODE_FCC:
 		return _regdom_no_midband;
@@ -400,6 +401,7 @@ static struct country_code_to_enum_rd *_rtl_regd_find_country(u16 countrycode)
 
 static u8 channel_plan_to_country_code(u8 channelplan)
 {
+	pr_info(" channelplan %d\n", channelplan);
 	switch (channelplan) {
 	case 0x20:
 	case 0x21:



WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-27 Thread Linus Torvalds
Hmm. So my daughter has a little Gigabyte Brix that has rtl8821ae
wireless in it. Yeah, nasty, I know, but it has actually worked
reasonably well.

.. except now I upgraded the nearest access point, and now wireless on
that machine no longer works.

Or rather, it actually *does* work in the sense that it authenticates,
it associates, and it actually gets a DHCP lease etc. So the darn
thing has an IP address and everything, but then nothing else seems to
go through after that. Very odd. My guess is that the auth/assoc/dhcp
thing happens at low rates, then it starts trying to up the rates, and
things go to hell.

But clearly several packets have gotten through.  And then absolutely
nothing. Everything else is happy with the new AP, so this is not a
problem with the wireless network itself.

I'm appending the warning that gets printed, which may or may not be relevant.

This is with a clean and up-to-date Fedora 23 install, so that line 513 is the

   512          /* RC is busted */
   513          if (WARN_ON_ONCE(rates[i].idx >= sband->n_bitrates)) {
   514                  rates[i].idx = -1;
   515                  continue;
   516          }

thing, which still exists in the same form in current kernels (except
in current -git it's line 625).

I do note that that rate_fixup_ratelist() function is a bit odd wrt
those rate indexes: it has code to make sure that there are no valid
rates following an invalid one:

        /*
         * make sure there's no valid rate following
         * an invalid one, just in case drivers don't
         * take the API seriously to stop at -1.
         */
        if (inval) {
                rates[i].idx = -1;
                continue;
        }
        if (rates[i].idx < 0) {
                inval = true;
                continue;
        }

but then that "RC is busted" case that generates a warning will add
one of those invalid rates in the middle anyway. So I get the feeling
that if that warning ever triggers, it will basically be screwing up
that whole rate table. I dunno.
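To make that concern concrete, here is a simplified userspace model of the two rate_fixup_ratelist() checks quoted above, assuming (as I read the v4.x function) that entries flagged MCS or VHT_MCS skip the legacy bounds check. It is not the kernel's code: RTS/CTS handling and the VHT index encoding are left out, and fixup_ratelist()/usable_rates() are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_RC_MCS     (1u << 3)
#define TX_RC_VHT_MCS (1u << 8)
#define MAX_RATES 4

struct tx_rate {
	int idx;        /* rate index, or -1 meaning "stop here" */
	uint32_t flags; /* IEEE80211_TX_RC_* style bits */
};

static void fixup_ratelist(struct tx_rate *rates, int n_bitrates)
{
	bool inval = false;

	for (int i = 0; i < MAX_RATES; i++) {
		if (inval) {                 /* nothing valid after a -1 */
			rates[i].idx = -1;
			continue;
		}
		if (rates[i].idx < 0) {
			inval = true;
			continue;
		}
		/* HT/VHT entries index MCS, not the legacy bitrate table */
		if (rates[i].flags & (TX_RC_MCS | TX_RC_VHT_MCS))
			continue;
		/* "RC is busted": invalidated, but inval is NOT set */
		if (rates[i].idx >= n_bitrates) {
			rates[i].idx = -1;
			continue;
		}
	}
}

/* A driver honoring the API stops at the first -1: count what it uses. */
static int usable_rates(const struct tx_rate *rates)
{
	int n = 0;

	while (n < MAX_RATES && rates[n].idx >= 0)
		n++;
	return n;
}
```

Fed the table from the warning (idx 9, 8, 7, 6, all flags a0, 8 legacy bitrates), the model leaves [-1, -1, 7, 6]: valid entries survive after the invalid ones, but a driver that stops at the first -1 has nothing to transmit with. With flags 0x1a0 (VHT_MCS set) the table passes through untouched.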

Is there anything sane I can do to help debug this case?

 Linus

--- snip snip, relevant (?) wireless warning ---

IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
  r8169 0000:03:00.0 enp3s0: link down
  IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  tun: Universal TUN/TAP device driver, 1.6
  tun: (C) 1999-2004 Max Krasnyansky 
  device virbr0-nic entered promiscuous mode
  virbr0: port 1(virbr0-nic) entered listening state
  virbr0: port 1(virbr0-nic) entered listening state
  virbr0: port 1(virbr0-nic) entered disabled state
  wlp2s0: authenticate with 46:d9:e7:92:bf:29
  wlp2s0: send auth to 46:d9:e7:92:bf:29 (try 1/3)
  wlp2s0: authenticated
  wlp2s0: associate with 46:d9:e7:92:bf:29 (try 1/3)
  wlp2s0: associate with 46:d9:e7:92:bf:29 (try 2/3)
  wlp2s0: RX AssocResp from 46:d9:e7:92:bf:29 (capab=0x411 status=0 aid=1)
  wlp2s0: associated
  IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
  [ cut here ]
  WARNING: CPU: 2 PID: 0 at net/mac80211/rate.c:513 ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]()
  Modules linked in: ccm cmac xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables
ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables
iptable_raw iptable_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle bnep
arc4 rtl8821ae vfat fat btcoexist rtl_pci rtlwifi mac80211
x86_pkg_temp_thermal coretemp snd_hda_codec_realtek snd_hda_codec_hdmi
snd_hda_codec_generic kvm_intel snd_soc_rt5640 kvm snd_soc_rl6231
snd_hda_intel snd_soc_core iTCO_wdt snd_hda_codec snd_compress btusb
snd_pcm_dmaengine snd_hda_core
   iTCO_vendor_support cfg80211 ac97_bus btrtl snd_hwdep
crct10dif_pclmul btbcm snd_seq crc32_pclmul btintel crc32c_intel
bluetooth snd_seq_device joydev snd_pcm mei_me mei shpchp dw_dmac
tpm_tis lpc_ich i2c_i801 snd_timer rfkill snd tpm soundcore
snd_soc_sst_acpi dw_dmac_core i2c_designware_platform
i2c_designware_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc
hid_logitech_hidpp hid_logitech_dj i915 i2c_algo_bit drm_kms_helper
8021q garp drm stp llc mrp r8169 sdhci_acpi mii sdhci mmc_core video
i2c_hid
  CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.2.8-300.fc23.x86_64 #1
  Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F2 12/11/2013
    aad0aff724c0ea01 88021ea83648 817738ca