Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On Wednesday, 18 September 2019 15:07:11 CEST Sven Eckelmann wrote: [...] > I have now prepared a test patch [1] to get the data every 10 seconds. This > was a compromise between having useful information over time and the > overflowing problem. ...over time without too many "bss channel survey timed out" errors and... Kind regards, Sven signature.asc Description: This is a digitally signed message part.
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On Wednesday, 18 September 2019 14:58:46 CEST Ben Greear wrote: [...] > > So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for > > the wave-1 cards) is still broken and we need a QCA firmware engineer to > > fix it. Or to work around it by polling every couple of seconds and > > manually do the cleanup of the values from the firmware. > > Have you tried probing very fast, like every 100ms, to see if returned values > look sane? I seem to recall that there was some firmware issue with this, > like > it only updates internal counters every second or so. > > Polling slow would have the same off-by-a-second's-worth-of-data, but you > would not > easily notice it at slower polling intervals. Yes, I've polled at ~100ms intervals at some point. And it looked like I get most of the time only 0 values (for everything - including noisefloor) from the firmware when I do this. And the actual values are only send every second or so (I didn't actual make precise calculations here). I have now prepared a test patch [1] to get the data every 10 seconds. This was a compromise between having useful information over time and the overflowing problem. While it is not the perfect solution (QCA *cough*), it is at least more bearable for me. Kind regards, Sven [1] https://patchwork.kernel.org/patch/11150289/ signature.asc Description: This is a digitally signed message part.
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On 09/18/2019 01:46 AM, Sven Eckelmann wrote: On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote: [...] So whatever the firmware does when it gets a WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR - it is not a CLEAR after read. And they also don't simply wrap around but there all values have to get some kind of "fix" like the active time one shown in ath10k_hw_fill_survey_time. Just that the actual "fixes" for them are unknown. To me it looks like firmware ATH10K_HW_CC_WRAP_SHIFTED_ALL have busy and rx interlinked with the overflow of total. But the tx and rx_bss are actually cleared. Other than that, the counters are wrapping every ~14-30 seconds. So we also need also some worker for ath10k which every couple of seconds requests new values for all the channel from the firmware. Which already sounds problematic because I get "ath10k_pci :00:00.0: bss channelsurvey timed out" all the time when requesting surveys manually. I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as expected on them. So the change I posted earlier (with a minor fix for ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can be seen in https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706 (after the reboot at 10:15 UTC+2) So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for the wave-1 cards) is still broken and we need a QCA firmware engineer to fix it. Or to work around it by polling every couple of seconds and manually do the cleanup of the values from the firmware. Have you tried probing very fast, like every 100ms, to see if returned values look sane? I seem to recall that there was some firmware issue with this, like it only updates internal counters every second or so. Polling slow would have the same off-by-a-second's-worth-of-data, but you would not easily notice it at slower polling intervals. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote: [...] > So whatever the firmware does when it gets a > WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR - it is not a CLEAR after read. And they > also don't simply wrap around but there all values have to get some kind of > "fix" like the active time one shown in ath10k_hw_fill_survey_time. > Just that the actual "fixes" for them are unknown. To me it looks like > firmware ATH10K_HW_CC_WRAP_SHIFTED_ALL have busy and rx interlinked with > the overflow of total. But the tx and rx_bss are actually cleared. > > Other than that, the counters are wrapping every ~14-30 seconds. So we > also need also some worker for ath10k which every couple of seconds > requests new values for all the channel from the firmware. Which already > sounds problematic because I get > "ath10k_pci :00:00.0: bss channelsurvey timed out" all the time > when requesting surveys manually. I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as expected on them. So the change I posted earlier (with a minor fix for ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can be seen in https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706 (after the reboot at 10:15 UTC+2) So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for the wave-1 cards) is still broken and we need a QCA firmware engineer to fix it. Or to work around it by polling every couple of seconds and manually do the cleanup of the values from the firmware. Kind regards, Sven signature.asc Description: This is a digitally signed message part.
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On 9/17/19 10:27 AM, Sven Eckelmann wrote: On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote: I will sent next version of patch with updated commit log. Can you please point me to the second version? Btw. I've just checked the minimal changes in ath10k to get this working. It seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset of ar->survey[idx]. But right now the total time looks (especially) wrong to me. At least it is rather unlikely that I can have around 30 second active time delta in roughly 1 real world second. Maybe a bug with the READ_CLEAR handling in firmware 10.2.4-1.0-00043 or maybe all firmware version? More logs about that at the end. @Ben: Was this also what you've experience in the past with the 10.2 firmware bss_chan_info counter bugs or am I just misusing the functionality of the firmware? Last I recall, the upstream code had several bugs. Maybe some QCA firmware person can let you know if they fixed the upstream firmware. If you want to test against ath10k-ct driver/firmware, and if you still see bogus values, then I can debug and fix it. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote: > I will sent next version of patch with updated commit log. Can you please point me to the second version? Btw. I've just checked the minimal changes in ath10k to get this working. It seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset of ar->survey[idx]. But right now the total time looks (especially) wrong to me. At least it is rather unlikely that I can have around 30 second active time delta in roughly 1 real world second. Maybe a bug with the READ_CLEAR handling in firmware 10.2.4-1.0-00043 or maybe all firmware version? More logs about that at the end. @Ben: Was this also what you've experience in the past with the 10.2 firmware bss_chan_info counter bugs or am I just misusing the functionality of the firmware? And yes, I am aware that the minimal change in ath10k is not enough - but it was good enough for a simple test. I think ath10k_wmi_event_pdev_bss_chan_info + ath10k_hw_fill_survey_time also have to be modified to accumulate .time, .time_busy, .time_rx, and .time_tx. But it makes me wonder whether there is now any benefit from the cfg80211 patch (for ath10k). You can basically get the same functionality from --- a/drivers/net/wireless/ath/ath10k/hw.c +++ b/drivers/net/wireless/ath/ath10k/hw.c @@ -216,8 +217,9 @@ void ath10k_hw_fill_survey_time(struct ath10k *ar, struct survey_info *survey, cc -= cc_prev - cc_fix; rcc -= rcc_prev - rcc_fix; - survey->time = CCNT_TO_MSEC(ar, cc); - survey->time_busy = CCNT_TO_MSEC(ar, rcc); + survey->time += CCNT_TO_MSEC(ar, cc); + if (survey->filled & SURVEY_INFO_TIME_BUSY) + survey->time_busy += CCNT_TO_MSEC(ar, rcc); } const struct ath10k_hw_ops qca988x_ops = { --- a/drivers/net/wireless/ath/ath10k/wmi.c +++ b/drivers/net/wireless/ath/ath10k/wmi.c @@ -4905,6 +4905,13 @@ static int ath10k_wmi_event_pdev_bss_chan_info(struct ath10k *ar, "wmi event pdev bss chan info:\n freq: %d noise: %d cycle: busy %llu total %llu tx %llu rx %llu rx_bss %llu\n", freq, noise_floor, busy, total, tx, rx, rx_bss); + /* everything zero means invalid data -> drop it to avoid ruining the +* noisefloor +*/ + if (noise_floor == 0 && busy == 0 && total == 0 && tx == 0 && rx == 0 && + rx_bss == 0) + return -EPROTO; + spin_lock_bh(>data_lock); idx = freq_to_idx(ar, freq); if (idx >= ARRAY_SIZE(ar->survey)) { @@ -4916,10 +4923,10 @@ static int ath10k_wmi_event_pdev_bss_chan_info(struct ath10k *ar, survey = >survey[idx]; survey->noise = noise_floor; - survey->time = div_u64(total, cc_freq_hz); - survey->time_busy = div_u64(busy, cc_freq_hz); - survey->time_rx = div_u64(rx_bss, cc_freq_hz); - survey->time_tx = div_u64(tx, cc_freq_hz); + survey->time += div_u64(total, cc_freq_hz); + survey->time_busy += div_u64(busy, cc_freq_hz); + survey->time_rx += div_u64(rx_bss, cc_freq_hz); + survey->time_tx += div_u64(tx, cc_freq_hz); survey->filled |= (SURVEY_INFO_NOISE_DBM | SURVEY_INFO_TIME | SURVEY_INFO_TIME_BUSY | Both version get to the same (incorrect) timing results :) Here is the time data I get reported from the firmware (entries with 0 in each field were removed) [ 162.875008] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 83895506 total 3801904834 tx 267520 rx 71507709 rx_bss 0 [ 163.917612] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 85745287 total 3889908918 tx 214016 rx 73140278 rx_bss 0 [ 164.960733] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 87612741 total 3977912729 tx 221408 rx 74750020 rx_bss 22880 [ 166.004348] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 89489393 total 4065916182 tx 214016 rx 76383244 rx_bss 0 [ 167.048330] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 91292949 total 4153919205 tx 214016 rx 77942266 rx_bss 0 [ 168.091828] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 93118897 total 4241923018 tx 187264 rx 79551036 rx_bss 0 [ 169.134349] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 47834553 total 2182443012 tx 160512 rx 40879686 rx_bss 0 [ 170.177511] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 49751910 total 2270447314 tx 167904 rx 42539598 rx_bss 22880 [ 171.221156] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 51639827 total 2358451435 tx 160512 rx 44184146 rx_bss 0 [ 172.262426] wmi event pdev bss chan info: freq: 5180 noise: -102 cycle: busy 53420475 total 2446455772 tx 133760 rx 45746866 rx_bss 0 [ 173.309067] wmi event pdev bss chan info: freq: 5180 noise: -103