Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-18 Thread Sven Eckelmann
On Wednesday, 18 September 2019 15:07:11 CEST Sven Eckelmann wrote:
[...]
> I have now prepared a test patch [1] to get the data every 10 seconds. This 
> was a compromise between having useful information over time and the 
> overflowing problem. 

...over time without too many "bss channel survey timed out" errors and...

Kind regards,
Sven

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-18 Thread Sven Eckelmann
On Wednesday, 18 September 2019 14:58:46 CEST Ben Greear wrote:
[...]
> > So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for
> > the wave-1 cards) is still broken and we need a QCA firmware engineer to
> > fix it. Or to work around it by polling every couple of seconds and
> > manually do the cleanup of the values from the firmware.
> 
> Have you tried probing very fast, like every 100ms, to see if returned values
> look sane?  I seem to recall that there was some firmware issue with this, 
> like
> it only updates internal counters every second or so.
> 
> Polling slow would have the same off-by-a-second's-worth-of-data, but you 
> would not
> easily notice it at slower polling intervals.

Yes, I've polled at ~100ms intervals at some point. And it looked like I get 
most of the time only 0 values (for everything - including noisefloor) from 
the firmware when I do this. And the actual values are only send every second 
or so (I didn't actual make precise calculations here).

I have now prepared a test patch [1] to get the data every 10 seconds. This 
was a compromise between having useful information over time and the 
overflowing problem. While it is not the perfect solution (QCA *cough*), it is 
at least more bearable for me.

Kind regards,
Sven

[1] https://patchwork.kernel.org/patch/11150289/

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-18 Thread Ben Greear




On 09/18/2019 01:46 AM, Sven Eckelmann wrote:

On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote:
[...]

So whatever the firmware does when it gets a
WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR -  it is not a CLEAR after read. And they
also don't simply wrap around but there all values have to get some kind of
"fix" like the active time one shown in ath10k_hw_fill_survey_time.
Just that the actual "fixes" for them are unknown. To me it looks like
firmware ATH10K_HW_CC_WRAP_SHIFTED_ALL have busy and rx interlinked with
the overflow of total. But the tx and rx_bss are actually cleared.

Other than that, the counters are wrapping every ~14-30 seconds. So we
also need also some worker for ath10k which every couple of seconds
requests new values for all the channel from the firmware. Which already
sounds problematic because I get
"ath10k_pci :00:00.0: bss channelsurvey timed out" all the time
when requesting surveys manually.


I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as
expected on them. So the change I posted earlier (with a minor fix for
ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can
be seen in
https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706
(after the reboot at 10:15 UTC+2)

So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for
the wave-1 cards) is still broken and we need a QCA firmware engineer to
fix it. Or to work around it by polling every couple of seconds and
manually do the cleanup of the values from the firmware.


Have you tried probing very fast, like every 100ms, to see if returned values
look sane?  I seem to recall that there was some firmware issue with this, like
it only updates internal counters every second or so.

Polling slow would have the same off-by-a-second's-worth-of-data, but you would 
not
easily notice it at slower polling intervals.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-18 Thread Sven Eckelmann
On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote:
[...]
> So whatever the firmware does when it gets a 
> WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR -  it is not a CLEAR after read. And they 
> also don't simply wrap around but there all values have to get some kind of 
> "fix" like the active time one shown in ath10k_hw_fill_survey_time.
> Just that the actual "fixes" for them are unknown. To me it looks like
> firmware ATH10K_HW_CC_WRAP_SHIFTED_ALL have busy and rx interlinked with
> the overflow of total. But the tx and rx_bss are actually cleared.
> 
> Other than that, the counters are wrapping every ~14-30 seconds. So we
> also need also some worker for ath10k which every couple of seconds
> requests new values for all the channel from the firmware. Which already
> sounds problematic because I get
> "ath10k_pci :00:00.0: bss channelsurvey timed out" all the time
> when requesting surveys manually.

I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as 
expected on them. So the change I posted earlier (with a minor fix for 
ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can 
be seen in 
https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706
 
(after the reboot at 10:15 UTC+2)

So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for
the wave-1 cards) is still broken and we need a QCA firmware engineer to
fix it. Or to work around it by polling every couple of seconds and 
manually do the cleanup of the values from the firmware.

Kind regards,
Sven

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-17 Thread Ben Greear

On 9/17/19 10:27 AM, Sven Eckelmann wrote:

On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote:

I will sent next version of patch with updated commit log.


Can you please point me to the second version?

Btw. I've just checked the minimal changes in ath10k to get this working. It
seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset
of ar->survey[idx].

But right now the total time looks (especially) wrong to me. At least it is
rather unlikely that I can have around 30 second active time delta in
roughly 1 real world second.  Maybe a bug with the READ_CLEAR handling in
firmware 10.2.4-1.0-00043 or maybe all firmware version? More logs about
that at the end.

@Ben: Was this also what you've experience in the past with the 10.2 firmware
bss_chan_info counter bugs or am I just misusing the functionality of the
firmware?


Last I recall, the upstream code had several bugs.  Maybe some QCA
firmware person can let you know if they fixed the upstream firmware.

If you want to test against ath10k-ct driver/firmware, and if you still see 
bogus values, then
I can debug and fix it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-17 Thread Sven Eckelmann
On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote:
> I will sent next version of patch with updated commit log.

Can you please point me to the second version?

Btw. I've just checked the minimal changes in ath10k to get this working. It 
seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset 
of ar->survey[idx].

But right now the total time looks (especially) wrong to me. At least it is 
rather unlikely that I can have around 30 second active time delta in
roughly 1 real world second.  Maybe a bug with the READ_CLEAR handling in
firmware 10.2.4-1.0-00043 or maybe all firmware version? More logs about
that at the end.

@Ben: Was this also what you've experience in the past with the 10.2 firmware
bss_chan_info counter bugs or am I just misusing the functionality of the
firmware?


And yes, I am aware that the minimal change in ath10k is not enough - but it 
was good enough for a simple test. I think ath10k_wmi_event_pdev_bss_chan_info 
+ ath10k_hw_fill_survey_time also have to be modified to accumulate .time, 
.time_busy, .time_rx, and .time_tx.  But it makes me wonder whether there is 
now any benefit from the cfg80211 patch (for ath10k). You can basically get 
the same functionality from

--- a/drivers/net/wireless/ath/ath10k/hw.c
+++ b/drivers/net/wireless/ath/ath10k/hw.c
@@ -216,8 +217,9 @@ void ath10k_hw_fill_survey_time(struct ath10k *ar, struct 
survey_info *survey,
cc -= cc_prev - cc_fix;
rcc -= rcc_prev - rcc_fix;
 
-   survey->time = CCNT_TO_MSEC(ar, cc);
-   survey->time_busy = CCNT_TO_MSEC(ar, rcc);
+   survey->time += CCNT_TO_MSEC(ar, cc);
+   if (survey->filled & SURVEY_INFO_TIME_BUSY)
+   survey->time_busy += CCNT_TO_MSEC(ar, rcc);
 }
 
 const struct ath10k_hw_ops qca988x_ops = {
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -4905,6 +4905,13 @@ static int ath10k_wmi_event_pdev_bss_chan_info(struct 
ath10k *ar,
   "wmi event pdev bss chan info:\n freq: %d noise: %d cycle: 
busy %llu total %llu tx %llu rx %llu rx_bss %llu\n",
   freq, noise_floor, busy, total, tx, rx, rx_bss);
 
+   /* everything zero means invalid data -> drop it to avoid ruining the
+* noisefloor
+*/
+   if (noise_floor == 0 && busy == 0 && total == 0 && tx == 0 && rx == 0 &&
+   rx_bss == 0)
+   return -EPROTO;
+
spin_lock_bh(>data_lock);
idx = freq_to_idx(ar, freq);
if (idx >= ARRAY_SIZE(ar->survey)) {
@@ -4916,10 +4923,10 @@ static int ath10k_wmi_event_pdev_bss_chan_info(struct 
ath10k *ar,
survey = >survey[idx];
 
survey->noise = noise_floor;
-   survey->time  = div_u64(total, cc_freq_hz);
-   survey->time_busy = div_u64(busy, cc_freq_hz);
-   survey->time_rx   = div_u64(rx_bss, cc_freq_hz);
-   survey->time_tx   = div_u64(tx, cc_freq_hz);
+   survey->time  += div_u64(total, cc_freq_hz);
+   survey->time_busy += div_u64(busy, cc_freq_hz);
+   survey->time_rx   += div_u64(rx_bss, cc_freq_hz);
+   survey->time_tx   += div_u64(tx, cc_freq_hz);
survey->filled   |= (SURVEY_INFO_NOISE_DBM |
 SURVEY_INFO_TIME |
 SURVEY_INFO_TIME_BUSY |


Both version get to the same (incorrect) timing results :)

Here is the time data I get reported from the firmware (entries
with 0 in each field were removed)

[  162.875008] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 83895506 total 3801904834 tx 267520 rx 71507709 rx_bss 0
[  163.917612] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 85745287 total 3889908918 tx 214016 rx 73140278 rx_bss 0
[  164.960733] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 87612741 total 3977912729 tx 221408 rx 74750020 rx_bss 22880
[  166.004348] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 89489393 total 4065916182 tx 214016 rx 76383244 rx_bss 0
[  167.048330] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 91292949 total 4153919205 tx 214016 rx 77942266 rx_bss 0
[  168.091828] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 93118897 total 4241923018 tx 187264 rx 79551036 rx_bss 0
[  169.134349] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 47834553 total 2182443012 tx 160512 rx 40879686 rx_bss 0
[  170.177511] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 49751910 total 2270447314 tx 167904 rx 42539598 rx_bss 22880
[  171.221156] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 51639827 total 2358451435 tx 160512 rx 44184146 rx_bss 0
[  172.262426] wmi event pdev bss chan info:  freq: 5180 noise: -102 cycle: 
busy 53420475 total 2446455772 tx 133760 rx 45746866 rx_bss 0
[  173.309067] wmi event pdev bss chan info:  freq: 5180 noise: -103