Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-22 Thread Toke Høiland-Jørgensen
Michal Kazior  writes:

> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.

Good point. Fixed that: the newest git version of traffic-gen supports a
-t parameter which will be set as the TOS byte on outgoing traffic
(literal; no smart diffserv handling, so you can override the ECN bits
as well).

Added support for a burst-tos test parameter in the Flent burst test
configs which will use this new parameter if set.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: add per peer tx stats support for 10.2.4

2017-11-30 Thread Toke Høiland-Jørgensen



>> +struct ath10k_10_2_peer_tx_stats {
>> +u8 ratecode[PEER_STATS_FOR_NO_OF_PPDUS];
>> +u8 success_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>> +__le16 success_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>> +u8 retry_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>> +__le16 retry_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>> +u8 failed_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>> +__le16 failed_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>> +u8 flags[PEER_STATS_FOR_NO_OF_PPDUS];
>> +__le32 tx_duration;
>> +u8 tx_ppdu_cnt;
>> +u8 peer_id;
>> +} __packed;
>
>Toke, hopefully the tx_duration value here helps with ATF
>implementation
>using QCA988X.

Awesome! What's the semantics of this field? Just total 
duration spent serving that station in the reporting interval?
Does it include retry attempts?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: add per peer tx stats support for 10.2.4

2017-12-02 Thread Toke Høiland-Jørgensen
Johannes Berg <johan...@sipsolutions.net> writes:

> On Fri, 2017-12-01 at 16:54 +0100, Toke Høiland-Jørgensen wrote:
>
>> But since we'll have to do that for ath9k anyway it
>> becomes more a question of whether it's something that should be
>> supported in mac80211 or if it should be implemented in each driver.
>> But since that logic will have to be written from scratch anyway, is
>> there any reason to not just do it in mac80211 from the beginning?
>
> I was thinking in the driver so that there's no overhead for the other
> drivers that don't want it that way.
>
> Dunno. It probably doesn't really matter.

I'll prototype something, then we can revisit this based on actual code :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] ath10k: Re-enable TXQs for all devices

2017-11-12 Thread Toke Høiland-Jørgensen
Felix Fietkau <n...@nbd.name> writes:

> On 2017-11-10 01:48, Toke Høiland-Jørgensen wrote:
>> Commit 4ca1807815aa6801aaced7fdefa9edacc2521767 disables the use of the
>> mac80211 TXQs for some devices because of a theoretical throughput
>> regression. The original regression report[1] was related to fq_codel
>> qdisc drop performance, which was fixed in
>> 9d18562a227874289fda8ca5d117d8f503f1dcca. Since then, we have not seen
>> the TXQ-related regression, so it should be safe to re-enable TXQs.
> That commit is unrelated to the fq/codel implementations in mac80211,
> since mac80211 with txq does not use qdisc.

Yup. Which is why it's odd that the initial bug report referred to this;
that would indicate that the regression had nothing to do with TXQs in
the first place, and we could have merged this ages ago?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [Make-wifi-fast] [PATCH] ath10k: Re-enable TXQs for all devices

2017-11-09 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> Agree.. Even I am not fan of that patch in ath10k. IIRC Michal pushed
> this change as temp one and planned to revert it once it is fixed in
> stack or in driver.
>
> I thought Eric change in fq_codel is meant only for fatty udp flows.
> Forgive my ignorance.

It is, mostly. There's a separate possible issue with TCP performance
that we're looking into, but that is unrelated to TXQs.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v2] ath10k: Re-enable TXQs for all devices

2017-11-09 Thread Toke Høiland-Jørgensen
Commit 4ca1807815aa6801aaced7fdefa9edacc2521767 disables the use of the
mac80211 TXQs for some devices because of a theoretical throughput
regression. The original regression report[1] was related to fq_codel
qdisc drop performance, which was fixed in
9d18562a227874289fda8ca5d117d8f503f1dcca. Since then, we have not seen
the TXQ-related regression, so it should be safe to re-enable TXQs.

[1] http://lists.infradead.org/pipermail/ath10k/2016-April/007266.html

Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
This has been in LEDE trunk for a couple of months now with good
results.

Changes since v1:
- Update commit message to refer to original report and the fix for it.
- Add ath10k list to cc

 drivers/net/wireless/ath/ath10k/mac.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 0a947eef348d..ca596ecd2d64 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -8329,15 +8329,6 @@ int ath10k_mac_register(struct ath10k *ar)
ath10k_warn(ar, "failed to initialise DFS pattern 
detector\n");
}
 
-   /* Current wake_tx_queue implementation imposes a significant
-* performance penalty in some setups. The tx scheduling code needs
-* more work anyway so disable the wake_tx_queue unless firmware
-* supports the pull-push mechanism.
-*/
-   if (!test_bit(ATH10K_FW_FEATURE_PEER_FLOW_CONTROL,
- ar->running_fw->fw_file.fw_features))
-   ar->ops->wake_tx_queue = NULL;
-
ret = ath10k_mac_init_rd(ar);
if (ret) {
ath10k_err(ar, "failed to derive regdom: %d\n", ret);
-- 
2.15.0


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: add per peer tx stats support for 10.2.4

2017-12-01 Thread Toke Høiland-Jørgensen
ako...@codeaurora.org writes:

> On 2017-11-30 22:08, Kalle Valo wrote:
>> Toke Høiland-Jørgensen <t...@toke.dk> writes:
>> 
>>>>> +struct ath10k_10_2_peer_tx_stats {
>>>>> + u8 ratecode[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + u8 success_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + __le16 success_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + u8 retry_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + __le16 retry_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + u8 failed_pkts[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + __le16 failed_bytes[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + u8 flags[PEER_STATS_FOR_NO_OF_PPDUS];
>>>>> + __le32 tx_duration;
>>>>> + u8 tx_ppdu_cnt;
>>>>> + u8 peer_id;
>>>>> +} __packed;
>>>> 
>>>> Toke, hopefully the tx_duration value here helps with ATF
>>>> implementation
>>>> using QCA988X.
>>> 
>>> Awesome! What's the semantics of this field? Just total
>>> duration spent serving that station in the reporting interval?
>>> Does it include retry attempts?
>> 
>> I have no clue :) I just noticed this while I was reviewing the patch
>> internally and immediately recalled our discussions at Seoul. I can try
>> to find out, but that will take a long time as I have way too much 
>> stuff
>> pending at the moment. Hopefully someone more knowledgeable 
>> (Anilkumar?)
>> can chime in and help.
>
> tx_duration is aggregate time duration of 4 PPDU sent to STA.
> FW sends these values for retry packets also.

Great, that sounds like just what we need. Thanks for the pointer :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: add per peer tx stats support for 10.2.4

2017-12-01 Thread Toke Høiland-Jørgensen
Johannes Berg <johan...@sipsolutions.net> writes:

> On Fri, 2017-12-01 at 16:29 +0100, Toke Høiland-Jørgensen wrote:
>> 
>> Yeah, figures. Well, maybe we'll just have to support an asynchronous
>> callback into mac80211 to register airtime usage. Will probably have to
>> add some asynchronicity anyway to avoid doing too much work in the fast
>> path.
>
> I really think we should have that anyway. In fact, perhaps we should
> just enforce it that way? Then ath9k would have to do a bit more work,
> but it's also the slower of the devices :-)
>
> But this plays in with the whole multi-queue RX we discussed before.

Yeah, that's what I meant by 'add some asynchronicity anyway' :)

I guess that once we have an asynchronous scheduler task of some kind,
supporting the callback is easy, but pulling the data out of each packet
needs more work. But since we'll have to do that for ath9k anyway it
becomes more a question of whether it's something that should be
supported in mac80211 or if it should be implemented in each driver.
But since that logic will have to be written from scratch anyway, is
there any reason to not just do it in mac80211 from the beginning?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: add per peer tx stats support for 10.2.4

2017-12-01 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

>>> tx_duration is aggregate time duration of 4 PPDU sent to STA.
>>> FW sends these values for retry packets also.
>>
>> Great, that sounds like just what we need.
>
> Except mapping to the iee80211_tx_status() might be difficult. I'm not
> sure how HTT_T2H_MSG_TYPE_PKTLOG events are sent in relation to
> HTT_T2H_MSG_TYPE_TX_COMPL_IND.

Yeah, figures. Well, maybe we'll just have to support an asynchronous
callback into mac80211 to register airtime usage. Will probably have to
add some asynchronicity anyway to avoid doing too much work in the fast
path.

Will keep that in mind when I get around to revisiting those patches :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: ath10k wake_tx_queue issues

2018-05-15 Thread Toke Høiland-Jørgensen
[ Adding Felix ]


Niklas Cassel  writes:

> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf
>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>  ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
> num_pen: 60
>  ksoftirqd/2-21[002] .ns374.711761: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 60 queue: 0
>  ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
> num_pen: 61
>  ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
> num_pen: 61
>  ksoftirqd/2-21[002] .ns374.711803: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 61 queue: 0
>  ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns374.711845: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 62 queue: 0
>  ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
> num_pen: 64
>  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
> reason:0
>  ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
> reason:0
>  ksoftirqd/2-21[002] .ns374.711948: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 63 queue: 0
>  ksoftirqd/2-21[002] .ns474.711952: ath10k_htt_tx_inc_pending: 
> num_pen: 64
>  ksoftirqd/2-21[002] dns574.711956: stop_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711959: stop_queue: phy0 queue:1, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711962: stop_queue: phy0 queue:2, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711964: stop_queue: phy0 queue:3, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711967: stop_queue: phy0 queue:4, 
> reason:0
>  

Re: ath10k wake_tx_queue issues

2018-05-16 Thread Toke Høiland-Jørgensen
Niklas Cassel  writes:
> [ .. snip .. ]
>> > Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
>> > ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => 
>> > ieee80211_xmit_fast
>> > => ieee80211_queue_skb => drv_wake_tx_queue.
>> >
>> > But I was expecting the call to ieee80211_wake_queue to somehow trigger a 
>> > call
>> > to ath10k_mac_op_wake_tx_queue, since there is still data in the send 
>> > buffer/
>> > in the ieee80211_txq that needs to be sent, to allow more data to be 
>> > written to
>> > the socket. But obviously the callback never comes.
>> > Or how else is this supposed to work?
>> 
>> The driver should reschedule itself before/after calling
>> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
>> does the right thing either, I'm not too familiar with that part of the
>> code. There's no direct call to reschedule that I can see, but there may
>> be another reason why this is not needed for ath9k. I'm sure Felix
>> knows?
>
> Hello Toke
>
> Unfortunately, it doesn't look like mt76 uses any ieee80211_* function
> to reschedule.

It doesn't need to; it just reschedules itself.

Basically, the wake_tx_queue() callback is just a way for mac80211 to
notify the driver that new packets are available and that it should
start its scheduling function. But in this case it is the driver that is
restarting the queues, so it already knows that. And so it can just call
its internal scheduling function. This is what mt76 does in
mt76_dma_tx_cleanup() with the call to mt76_txq_schedule() before
calling ieee80211_wake_queue().

I think that what ath10k should be doing is calling
ath10k_mac_tx_push_pending() when it restarts the queues.

> I just came across a ieee80211_schedule_txq() function in e937b8da5a59
> ("mac80211: Add TXQ scheduling API"). However, this commit was
> reverted. Any plans on resubmitting this?

Yeah, I have a revised version lying around waiting for Felix to review
it. But that wouldn't help this bug; it's just an API change, it doesn't
change behaviour...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH] ath10k: Add independent prio sta queue

2018-01-08 Thread Toke Høiland-Jørgensen
> Using what I understand now, I using aid to select which txq will be
> priority and sent out first. But when I testing this code with iperf,
> it is not working as I think. 2 station both sending -b 400M -t 0 packet,
> (maximum about 250M)I  exepct the station I setup in param will take all the
> bandwidth, but the result is 120M/120M, where did I get wrong?

IIRC, ath10k has two places it feeds packets to the firmware. You
probably need to change the other one as well...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-07-26 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Upstream kernel has an interface to help adjust sk_pacing_shift to help
> improve TCP UL throughput.
> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> VHT80 TCP UL throughput testing result shows 6 is the optimal.
> Overwrite the sk_pacing_shift to 6 in ath10k driver.

When I tested this, a pacing shift of 8 was quite close to optimal as
well for ath10k. Why are you getting different results?

> Tested with QCA6174 PCI with firmware
> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
> It's not a regression with new firmware releases.
>
> There have 2 test result of different settings:
>
> ARM CPU based device with QCA6174A PCI with different
> sk_pacing_shift:
>
>  sk_pacing_shift  throughput(Mbps) CPU utilization
>  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>  8   288~90% idle, Focus on CPU1: ~35%idle
>  9  ~200~92% idle, Focus on CPU1: ~50%idle

Your tests do not include latency values; please try running a test that
also measures latency. The tcp_nup test in Flent (https://flent.org)
will do that, for instance. Also, is this a single TCP flow?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-07-27 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> On 2018-07-26 19:45, Toke Høiland-Jørgensen wrote:
>> Wen Gong  writes:
>> 
>>> Upstream kernel has an interface to help adjust sk_pacing_shift to 
>>> help
>>> improve TCP UL throughput.
>>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
>> 
>> When I tested this, a pacing shift of 8 was quite close to optimal as
>> well for ath10k. Why are you getting different results?
>
> the default value is still 8 in the patch:
> https://patchwork.kernel.org/patch/10545361/
>
> In my test, pacing shift 6 is better than 8.
> The test is for ath10k/11AC WiFi chips.
> Test result is show in the commit logs before.
>> 
>>> Tested with QCA6174 PCI with firmware
>>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 
>>> PCI.
>>> It's not a regression with new firmware releases.
>>> 
>>> There have 2 test result of different settings:
>>> 
>>> ARM CPU based device with QCA6174A PCI with different
>>> sk_pacing_shift:
>>> 
>>>  sk_pacing_shift  throughput(Mbps) CPU utilization
>>>  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>>>  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>>>  8   288~90% idle, Focus on CPU1: ~35%idle
>>>  9  ~200~92% idle, Focus on CPU1: ~50%idle
>> 
>> Your tests do not include latency values; please try running a test 
>> that
>> also measures latency. The tcp_nup test in Flent (https://flent.org)
>> will do that, for instance. Also, is this a single TCP flow?
>> 
>
> It is not a single TCP flow, it is 500Mbps with 5 flows.
>
> below is result show in commit log before:
> 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
> set to 6:
>
>tcp_limit_output_bytesthroughput(Mbps)
>   default(262144)+1 Stream 336
>   default(262144)+2 Streams558
>   default(262144)+3 Streams584
>   default(262144)+4 Streams602
>   default(262144)+5 Streams598
>   changed(2621440)+1 Stream598
>   changed(2621440)+2 Streams   601

This is useless without latency numbers. The whole point of
sk_pacing_shift is to control the tradeoff between latency and
throughput. You're only showing the throughput, so it's impossible to
judge if setting the pacing shift to 6 is right (and from your results I
suspect the sweet spot is actually 7).

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-17 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Monday, August 13, 2018 7:18 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org;
>> johan...@sipsolutions.net
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
>> chips
>> 
>> Wen Gong  writes:
>> 
>> > Hi Toke,
>> >
>> > I have tested with your method for shift 6/8/10/7 all twice time in
>> > open air environment.
>> 
>> Great, thanks!
>> 
>> > Shift 6:
>> > wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
>> > "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
>> 
>> Just to be sure: You are running this on your laptop? And that laptop has the
>> ath10k and the modified kernel you are testing? Is 192.168.1.7 the AP or
>> another device?
> Hi Toke,
> Yes, I run on my laptop(Dell LATITUDE E5440 + QCA6174 PCI wireless card) in 
> open air environment,
> Kernel is 4.18.0-rc2-wt-ath+ #2 SMP which is built myself.
> The mac80211.ko/ath10k is built with the 2 patches.
> 192.168.1.7 is another device(PC installed with 4.4.0-116-generic 
> #140~14.04.1-Ubuntu)
>> 
>> I'd like to see what happens when the link is fully saturated my multiple
>> streams as well. Could you repeat the tests with upload_streams=5, please?
>> And if you could share the .flent.gz files (upload them somewhere, or email
>> me off-list), that would be useful as well :)
>
> Test result with upload_streams=5:

Sorry for the delay in looking at this, I've been terribly busy this
week; will get back to it in more detail next week.

One question for now: In the results you quote below the total
throughput is way lower than the up to 600 mbps you quoted in the patch
description. How did you run your original tests to achieve that high a
throughput?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-13 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Hi Toke,
>
> I have tested with your method for shift 6/8/10/7 all twice time in
> open air environment.

Great, thanks!

> Shift 6:
> wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
> "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1

Just to be sure: You are running this on your laptop? And that laptop
has the ath10k and the modified kernel you are testing? Is 192.168.1.7
the AP or another device?

I'd like to see what happens when the link is fully saturated my
multiple streams as well. Could you repeat the tests with
upload_streams=5, please? And if you could share the .flent.gz files
(upload them somewhere, or email me off-list), that would be useful as
well :)

Thanks,

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-08 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Upstream kernel has an interface to help adjust sk_pacing_shift to help
> improve TCP UL throughput.
> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> VHT80 TCP UL throughput testing result shows 6 is the optimal.
> Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
>
> Tested with QCA6174 PCI with firmware
> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
> It's not a regression with new firmware releases.
>
> There have 2 test result of different settings:
>
> ARM CPU based device with QCA6174A PCI with different
> sk_pacing_shift:
>
>  sk_pacing_shift  throughput(Mbps) CPU utilization
>  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>  8   288~90% idle, Focus on CPU1: ~35%idle
>  9  ~200~92% idle, Focus on CPU1: ~50%idle
>
> 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
> set to 6:
>
>   tcp_limit_output_bytesthroughput(Mbps)
>  default(262144)+1 Stream 336
>  default(262144)+2 Streams558
>  default(262144)+3 Streams584
>  default(262144)+4 Streams602
>  default(262144)+5 Streams598
>  changed(2621440)+1 Stream598
>  changed(2621440)+2 Streams   601

You still haven't provided any latency numbers for these tests, which
makes it impossible to verify that setting sk_pacing_shift to 6 is the
right tradeoff.

As I said before, from your numbers I suspect the right setting is
actually 7, which would be 10-20ms less latency under load; way more
important than ~50 Mbps...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-10 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Wednesday, August 8, 2018 6:44 PM
>> To: Wen Gong ; ath10k@lists.infradead.org;
>> johan...@sipsolutions.net
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
>> chips
>> 
>> Wen Gong  writes:
>> 
>> > Upstream kernel has an interface to help adjust sk_pacing_shift to
>> > help improve TCP UL throughput.
>> > The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>> > WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>> > VHT80 TCP UL throughput testing result shows 6 is the optimal.
>> > Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
>> >
>> > Tested with QCA6174 PCI with firmware
>> > WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377
>> PCI.
>> > It's not a regression with new firmware releases.
>> >
>> > There have 2 test result of different settings:
>> >
>> > ARM CPU based device with QCA6174A PCI with different
>> > sk_pacing_shift:
>> >
>> >  sk_pacing_shift  throughput(Mbps) CPU utilization
>> >  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>> >  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>> >  8   288~90% idle, Focus on CPU1: ~35%idle
>> >  9  ~200~92% idle, Focus on CPU1: ~50%idle
>> >
>> > 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with
>> > sk_packing_shift set to 6:
>> >
>> >   tcp_limit_output_bytesthroughput(Mbps)
>> >  default(262144)+1 Stream 336
>> >  default(262144)+2 Streams558
>> >  default(262144)+3 Streams584
>> >  default(262144)+4 Streams602
>> >  default(262144)+5 Streams598
>> >  changed(2621440)+1 Stream598
>> >  changed(2621440)+2 Streams   601
>> 
>> You still haven't provided any latency numbers for these tests, which makes
>> it impossible to verify that setting sk_pacing_shift to 6 is the right 
>> tradeoff.
>> 
>> As I said before, from your numbers I suspect the right setting is actually 
>> 7,
>> which would be 10-20ms less latency under load; way more important than
>> ~50 Mbps...
>> 
> Hi Toke,
> Could you give the command line for the latency test?
> https://flent.org/intro.html#quick-start
> I used the command but test failed:
> flent tcp_download -p 1 -l 60 -H 192.168.1.5 -t text-to-be-included-in-plot 
> -o file1.png
> error loading plotter: unable to find plot configuration "1"

Try something like:


flent -H 192.168.1.5 -t "sk_pacing_shift 7" tcp_nup --test-parameter 
upload_streams=1


This will note the value of sk_pacing_shift you are testing in the data
file (so change that as appropriate), and you can vary the number of TCP
streams by changing the upload_streams parameter.

Note that in this case I'm assuming you are running Flent on the device
with the kernel you are trying to test, so you want a TCP transfer going
*from* the device. If not, change "tcp_nup" to "tcp_ndown" and
"upload_streams" to "download_streams". Upload is netperf TCP_STREAM
test, and download is TCP_MAERTS.

When running the above command you'll get a summary output on the
terminal that you can paste on the list; and also a data file to plot
things form. For instance, you can do something like 'flent -p ping_cdf
*.flent.gz' to get a CDF plot of all your test results afterwards. You
are also very welcome to send me the .flent.gz data files and I'll take
a look to make sure everything looks reasonable :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput

2018-08-10 Thread Toke Høiland-Jørgensen
Arend van Spriel  writes:

> On 8/8/2018 9:00 PM, Peter Oh wrote:
>>
>>
>> On 08/08/2018 03:40 AM, Wen Gong wrote:
>>> Add a field for ath10k to adjust the sk_pacing_shift, mac80211 set
>>> the default value to 8, and ath10k will change it to 6. Then mac80211
>>> will use the changed value 6 as sk_pacing_shift since 6 is the best
>>> value for tx throughput by test result.
>> I don't think you can convince people with the numbers unless you
>> provide latency along with the numbers and also measurement result on
>> different chipsets as Michal addressed (QCA4019, QCA9984, etc.) From
>> users view point, I also agree on Toke that we cannot scarify latency
>> for the small throughput improvement.
>
> Yeah. The wireless industry (admittedly that is me too :-p ) has been 
> focused on just throughput long enough.

Tell me about it ;)

> All the preaching about bufferbloat from Dave and others is (just)
> starting to sink in here and there.

Yeah, I've noticed; this is good!

> Now as for the value of the sk_pacing_shift I think we agree it
> depends on the specific device so in that sense the api makes sense,
> but I think there are a lot of variables so I was wondering if we
> could introduce a sysctl parameter for it. Does that make sense?

I'm not sure a sysctl parameter would make sense; for one thing, it
would be global for the host, while different network interfaces will
probably need different values. And for another, I don't think it's
something a user can reasonably be expected to set correctly, and I
think it *is* actually possible to pick a value that works well at the
driver level.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput

2018-08-20 Thread Toke Høiland-Jørgensen
Arend van Spriel  writes:

> + Eric
>
> On 8/10/2018 9:52 PM, Ben Greear wrote:
>> On 08/10/2018 12:28 PM, Arend van Spriel wrote:
>>> On 8/10/2018 3:20 PM, Toke Høiland-Jørgensen wrote:
>>>> Arend van Spriel  writes:
>>>>
>>>>> On 8/8/2018 9:00 PM, Peter Oh wrote:
>>>>>>
>>>>>>
>>>>>> On 08/08/2018 03:40 AM, Wen Gong wrote:
>>>>>>> Add a field for ath10k to adjust the sk_pacing_shift, mac80211 set
>>>>>>> the default value to 8, and ath10k will change it to 6. Then mac80211
>>>>>>> will use the changed value 6 as sk_pacing_shift since 6 is the best
>>>>>>> value for tx throughput by test result.
>>>>>> I don't think you can convince people with the numbers unless you
>>>>>> provide latency along with the numbers and also measurement result on
>>>>>> different chipsets as Michal addressed (QCA4019, QCA9984, etc.) From
>>>>>> users view point, I also agree on Toke that we cannot scarify latency
>>>>>> for the small throughput improvement.
>>>>>
>>>>> Yeah. The wireless industry (admittedly that is me too :-p ) has been
>>>>> focused on just throughput long enough.
>>>>
>>>> Tell me about it ;)
>>>>
>>>>> All the preaching about bufferbloat from Dave and others is (just)
>>>>> starting to sink in here and there.
>>>>
>>>> Yeah, I've noticed; this is good!
>>>>
>>>>> Now as for the value of the sk_pacing_shift I think we agree it
>>>>> depends on the specific device so in that sense the api makes sense,
>>>>> but I think there are a lot of variables so I was wondering if we
>>>>> could introduce a sysctl parameter for it. Does that make sense?
>>>>
>>>> I'm not sure a sysctl parameter would make sense; for one thing, it
>>>> would be global for the host, while different network interfaces will
>>>> probably need different values. And for another, I don't think it's
>>>> something a user can reasonably be expected to set correctly, and I
>>>> think it *is* actually possible to pick a value that works well at the
>>>> driver level.
>>>
>>> I not sure either. Do you think a user could come up with something
>>> like this (found here [1]):
>>>
>>> sysctl -w net.core.rmem_max=8388608
>>> sysctl -w net.core.wmem_max=8388608
>>> sysctl -w net.core.rmem_default=65536
>>> sysctl -w net.core.wmem_default=65536
>>> sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
>>> sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
>>> sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
>>> sysctl -w net.ipv4.route.flush=1
>>>
>>> Now the page listing this config claims this is for use "on Linux 2.4+
>>> for high-bandwidth applications". Beats me if it still is correct in
>>> 4.17.
>>>
>>> Anyway, sysctl is nice for parameterizing code that is built-in the
>>> kernel so you don't need to rebuild it. mac80211 tends to be a module
>>> in most distros so
>>> maybe sysctl is not a good fit. So lets agree on that.
>>>
>>> Picking a value at driver level may be possible, but a driver tends to
>>> support a number of different devices. So how do you see the picking
>>> work. Some static
>>> table with entries for the different devices?
>>
>> Some users are not going to care about latency, and for others, latency may
>> be absolutely important and they don't care about bandwidth.
>>
>> So, it should be tunable.  sysctl can support per network-device settings,
>> right?  Or, probably could use ethtool API to set a per-netdev value as
>> well.
>> That might be nice for other network devices as well, not just wifi.
>
> I was under the impression that the parameters are all global, but your 
> statement made me look. I came across some references here [2] so I 
> checked the kernel sources under net/ and found net/ipv4/devinet.c [3]. 
> So that confirms it supports per-netdev settings.

Yeah, I think that *if* this is to be made configurable, a per-netdev
sysctl would be the way to go, with the driver being able to set the
default.

However, the reason I think it may not be worth it to expose this as a
setting is that it is very much a case of diminishing returns. Once the
buffer size is large enough that full aggregates can be built,
increasing it further just adds latency with very little effect on
throughput. Which means that fiddling with the parameter is not going to
have a lot of effect, so it is not very useful to expose, which makes it
not worth the added configuration complexity...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: Setting tx retry count in ath10k

2018-07-18 Thread Toke Høiland-Jørgensen
Jean-Pierre TOSONI  writes:

> @Toke: As you can see in the patch, the value 30 was the fixed value
> defined in ath9k, I kept it for compatibility only (and that's why I
> wanted to make it configurable :-)

Yup, I'm aware that this is the default from ath9k; doesn't make it any
less wrong ;)

> On another hand, Minstrel in mac80211 seems to vary retries according
> to what you say, i.e. Minstrel tries to stay below a certain amount of
> time, but this only applies to short/long retries.

Right, haven't had time to poke into what exactly Minstrel does; have
just observed that it sometimes runs really long. But for most cases
it's not too bad...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: Setting tx retry count in ath10k

2018-07-18 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 07/18/2018 08:50 AM, Jean-Pierre TOSONI wrote:
>> Hi,
>>
>> We made retries configurable in our mac80211+ath9k system, and we ended with 
>> 3 counts:
>> 1) short retry count, defaults to 4
>> 2) long retrys count, defauts to 7
>> 3) software retry count, defaults to 30
>> This last one is used separately for each frame in an aggregated frame, 
>> since they can be separately acknowledged.
>
> Did you have to change code for #3, and if so, can you share the patch?
>
> I wonder also if retries should be different for different types of
> data. For instance, if someone is using UDP, maybe they don't care so
> much about lost packets and would prefer a lower retry count. Or,
> maybe IP type-of-service could be taken into account and retry frames
> different amounts based on ToS?

For general internet traffic, a retry count of 30 is way too high; that
is up to 120 ms of HOL blocking latency. Better to just drop the packet
at that point.

Ideally, the retry count should be dynamically calculated in units of
time (which would depend on the rate and aggregate size), and also take
queueing time into account. I've been meaning to experiment with this
for minstrel and ath9k, but haven't gotten around to it...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: Setting tx retry count in ath10k

2018-07-19 Thread Toke Høiland-Jørgensen
Benjamin Beichler  writes:

>>> For general internet traffic, a retry count of 30 is way too high; that
>>> is up to 120 ms of HOL blocking latency. Better to just drop the packet
>>> at that point.
>>>
>>> Ideally, the retry count should be dynamically calculated in units of
>>> time (which would depend on the rate and aggregate size), and also take
>>> queueing time into account. I've been meaning to experiment with this
>>> for minstrel and ath9k, but haven't gotten around to it...
> We have running current research on this topic but focused on the
> effects in 802.11s mesh networks. With multiple(forwarding) wireless
> links, the retry limit have a bigger impact as only in managed mode
> setups, since every forwarding step is doing repeated transmissions.
> But I also totally agree, that the retry count needs to be considered
> in the bufferbloat/airtime queuing stuff, which Toke proposed.

Ah, cool. Looking forward to seeing your results. And yeah, it's
probably even worse in meshes...

> Nonetheless, since this ambiguities are known, wouldn't it be nice to
> clean up all this patches to enable at least ath9k and ath10k to do
> the right thing, or do anybody can tell why they weren't included the
> first time ?

No objection from me, certainly! :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-06 Thread Toke Høiland-Jørgensen
Grant Grundler  writes:

>> And, well, Grant's data is from a single test in a noisy
>> environment where the time series graph shows that throughput is all over
>> the place for the duration of the test; so it's hard to draw solid
>> conclusions from (for instance, for the 5-stream test, the average
>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>> used in this test, so I can't go verify it myself; so the only thing I
>> can do is grumble about it here... :)
>
> It's a fair complaint and I agree with it. My counter argument is the
> opposite is true too: most ideal benchmarks don't measure what most
> users see. While the data wgong provided are way more noisy than I
> like, my overall "confidence" in the "conclusion" I offered is still
> positive.

Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 0/6] Move TXQ scheduling and airtime fairness into mac80211

2018-10-21 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> Toke,
>
> It is been a while since mac80211 TXQ discussion started. Here is the 
> consolidated
> series of mac80211, ath9k and ath10k changes to move TXQ scheduling and 
> airtime
> fairness into mac80211. The major changes w.r.t 5th RFC version are in 
> may_transmit()
> API. Whenever the driver checks deficit for given TXQ, the list will be
> reordered so that driver/firmware RR quickly becomes in sync with mac80211
> list. Also airtime tasklet approach cumbersome and causing regression
> in multi client performance.

Cool, thanks. I'll look at this in more detail later, but in the
meantime, do you have any test results to share? Specifically, how did
you measure that the fairness part actually works? :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-10-26 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> From: Toke Høiland-Jørgensen 
>
> This adds airtime accounting and scheduling to the mac80211 TXQ
> scheduler. A new callback, ieee80211_sta_register_airtime(), is added
> that drivers can call to report airtime usage for stations.
>
> When airtime information is present, mac80211 will schedule TXQs
> (through ieee80211_next_txq()) in a way that enforces airtime fairness
> between active stations. This scheduling works the same way as the ath9k
> in-driver airtime fairness scheduling. If no airtime usage is reported
> by the driver, the scheduler will default to round-robin scheduling.
>
> For drivers that don't control TXQ scheduling in software, a new API
> function, ieee80211_txq_may_transmit(), is added which the driver can use
> to check if the TXQ is eligible for transmission, or should be throttled to
> enforce fairness. Calls to this function must also be enclosed in
> ieee80211_txq_schedule_{start,end}() calls to ensure proper locking.
>
> The API ieee80211_txq_may_transmit() also ensures that TXQ list will be
> aligned aginst driver's own round-robin scheduler list. i.e it rotates
> the TXQ list till it makes the requested node becomes the first entry
> in TXQ list. Thus both the TXQ list and driver's list are in sync.
>
> Signed-off-by: Toke Høiland-Jørgensen 
> Signed-off-by: Rajkumar Manoharan 
> ---
>  include/net/mac80211.h | 58 ++
>  net/mac80211/cfg.c |  3 ++
>  net/mac80211/debugfs.c |  3 ++
>  net/mac80211/debugfs_sta.c | 50 --
>  net/mac80211/ieee80211_i.h |  2 ++
>  net/mac80211/main.c|  4 +++
>  net/mac80211/sta_info.c| 45 +--
>  net/mac80211/sta_info.h| 13 +++
>  net/mac80211/status.c  |  6 
>  net/mac80211/tx.c  | 90 
> +++---
>  10 files changed, 264 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 2f5c0fbd453c..0ced3adb09ac 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -2334,6 +2334,8 @@ enum ieee80211_hw_flags {
>   * @tx_sk_pacing_shift: Pacing shift to set on TCP sockets when frames from
>   *   them are encountered. The default should typically not be changed,
>   *   unless the driver has good reasons for needing more buffers.
> + *
> + * @airtime_weight: Default airtime weight preferred by driver.
>   */
>  struct ieee80211_hw {
>   struct ieee80211_conf conf;
> @@ -2370,6 +2372,7 @@ struct ieee80211_hw {
>   const struct ieee80211_cipher_scheme *cipher_schemes;
>   u8 max_nan_de_entries;
>   u8 tx_sk_pacing_shift;
> + u32 airtime_weight;
>  };

This doesn't make sense. Airtime weights can be set by userspace, so
even if a driver sets another default it is not guaranteed to be
honoured. So what's the point?

> +bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
> + struct ieee80211_txq *txq)
> +{
> + struct ieee80211_local *local = hw_to_local(hw);
> + struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
> + struct sta_info *sta;
> + u8 ac = txq->ac;
> +
> + lockdep_assert_held(>active_txq_lock[ac]);
> +
> + if (!txqi->txq.sta)
> + goto out;
> +
> + if (list_empty(>schedule_order))
> + goto out;
> +
> + list_for_each_entry_safe(iter, tmp, >active_txqs[ac],
> +  schedule_order) {
> + if (iter == txqi)
> + break;
> +
> + if (!iter->txq.sta) {
> + list_move_tail(>schedule_order,
> +>active_txqs[ac]);
> + continue;
> + }
> + sta = container_of(iter->txq.sta, struct sta_info, sta);
> + if (sta->airtime[ac].deficit < 0)
> + sta->airtime[ac].deficit += sta->airtime_weight;
> + list_move_tail(>schedule_order, >active_txqs[ac]);
> + }

So since we're rotating the queue on every call to the function, I'm
wondering if this actually succeeds in throttling the slow station's
airtime usage enough to achieve fairness? So I'll ask again: Did you
test the fairness performance, and how?


Also, taking a step back:

With this, we're doing lots of work to make sure that the hardware's
round-robin scheduling queue lines up with mac80211; so if we do this
anyway, why can't we just control the order directly from mac80211
(which is what we do with the other next_txq() API)?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-10-28 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-10-26 07:16, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> From: Toke Høiland-Jørgensen 
> [...]
>>> u8 max_nan_de_entries;
>>> u8 tx_sk_pacing_shift;
>>> +   u32 airtime_weight;
>>>  };
>> 
>> This doesn't make sense. Airtime weights can be set by userspace, so
>> even if a driver sets another default it is not guaranteed to be
>> honoured. So what's the point?
>> 
> The reason for driver specific default is to avoid performance impact
> in ath10k when the user is using vanilla ath10k with default airtime.
> As I mentioned earlier, mac80211 default (256us) is too low for 11ac
> devices especially with driver is bursting aggregation.
>
> Yes. I do understand the user can change airtime at anytime but It
> must be noted that different airtime weight will result in different
> throughput. IMHO the defaults should not impact current benchmark.
> Otherwise it will be alarmed as regression later. isn't it?

My point is that if the user has to know the implementation-specific
limitations of each driver before setting a weight, then it's not a
particularly friendly API. I think we should be able to do better than
that...

>> So since we're rotating the queue on every call to the function, I'm
>> wondering if this actually succeeds in throttling the slow station's
>> airtime usage enough to achieve fairness? So I'll ask again: Did you
>> test the fairness performance, and how?
>> 
> We are collecting data of mixed clients (11ac, 11n and legacy) keeping
> them at same distance and different distance. So that lower rate
> clients will interfere higher MCS rate station. Also configuring
> different airtime weight for each stations so that throttling low rate
> clients more should help higher performing(better MCS) clients.
>
> By allowing different airtime for each stations, the user can control
> guest network over primary network. Also It helped to improve
> performance of preferred station and algo. to control station is given
> to cloud or user application.
>
> As of now, We are testing with 4 11ac clients of same distance. And
> collecting the performance data in multiple iteration. Below are
> current data of station's performance (Mbps) against airtime weight.
>
> airtime   station1(%airtime)  station2 station3   station4 
> (Mbps)
>
> No ATF  182   168  166169
>
> 4ms 170 (100%)164 (100%)   185  (100%)175 (100%)
>
> 4ms 277 (70%) 115 (10%)103 (10%)  105 (10%)
>
> 4ms 223 (40%) 214 (40%)109 (10%)   94 (10%)
>
> 4ms 337 (90%) 182 (8%)  23 (1%)30 (1%)

So this looks like it's doing *something*, but not like it's succeeding
in achieving the set percentages. Did you check if the actual airtime
values (in debugfs) corresponds to the configured weights?

>
>  STA1(11ac)  STA2 (11n)  STA3(11a)
>  ==  ==  =
>
> No ATF   225 166 3.5
>
> ATF (4ms)234 151 3.5

This also shows like ATF has no effect?

>> Also, taking a step back:
>> 
>> With this, we're doing lots of work to make sure that the hardware's
>> round-robin scheduling queue lines up with mac80211; so if we do this
>> anyway, why can't we just control the order directly from mac80211
>> (which is what we do with the other next_txq() API)?
>> 
> The only way to enforce mac80211 ordering in ath10k is to disable pull
> mode in firmware and always operates in push mode similar to ath9k.

And I assume that is not too likely to happen, or? What is the benefit
of pull mode at the firmware level?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-02 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-10-28 15:01, Rajkumar Manoharan wrote:
>> On 2018-10-28 08:48, Toke Høiland-Jørgensen wrote:
>>> Rajkumar Manoharan  writes:
>>> 
>>>> 
>>>> 4ms 223 (40%) 214 (40%)109 (10%)   94 (10%)
>>>> 
>>>> 4ms 337 (90%) 182 (8%)  23 (1%)30 (1%)
>>> 
>>> So this looks like it's doing *something*, but not like it's 
>>> succeeding
>>> in achieving the set percentages. Did you check if the actual airtime
>>> values (in debugfs) corresponds to the configured weights?
>>> 
>> No. Will check that.
>> 
> Toke,
>
>  From above results, different airtime for each station is reflecting on
> output performance. Unfortunately I don't see such tput difference, when
> the tx mode is fixed in push-only. Even low weight station is giving 
> same
> performance. Are you also seeing the same behavior in your setup? Could
> you please share your results?

Sorry, I've been travelling this week; I'll be back in the office next
week and can run some tests then. I may also have an idea for a
different algorithm that will work better in pull mode, but need to see
if it works at all first :)

How do I force ath10k into push mode?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-08 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-11-07 06:53, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> Meanwhile we did some more experiments with both modes. The experiment
>>> was done in open environment and fixed rate and UDP traffic ran for 60
>>> seconds.
>>> 
>>> Seems like push mode not honoring the configured weight. Always the
>>> airtime was almost same whereas in pull-mode airtime is changing based
>>> on configured weight. Hence I would like to know your results.
>> 
>> Right, so I verified that the current version of the patch set still
>> works with ath9k. However, the ath10k card I have doesn't seem to
>> support peer stats, so I can't test ath10k.
>> 
>> $ lspci | grep Qualcomm
>> 03:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac
>> Wireless Network Adapter
>> 
>> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/chip_id
>> 0x043202ff
>> 
>> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/wmi_services  | grep PEER
>> WMI_SERVICE_PEER_CACHING -
>> WMI_SERVICE_PEER_STATS   -
>> 
>
> Oops... Yeah 988x firmware (10.2.4) does not have peer stats support.
>
>> 
>> Is there a way to force-enable airtime support, is this a hardware 
>> issue?
>> 
> Unfortunately not. There is one more pending change that handles
> airtime report from HTT tx-compl. Again it depends firmware support.
> These experiments are taken with this f/w interface. Will post the
> change.

Right, thought so. Too bad. Guess you are doing all the ath10k testing,
then ;)

>>> sta1sta2sta3sta4
>>> pull-mode   8s(205us)   18s(3.2ms)  8s(205us)   14s(410us)
>>> 12s(256us)  12s(256us)  13s(256us)  12s(256us)
>>> 14s(4ms)13s(4ms)14s(4ms)13s(4ms)
>>> 
>>> push-mode   15s(205us)  12s(3.2ms)  16s(205us)  12s(410us)
>>> 15s(256us)  12s(256us)  16s(256us)  12s(256us)
>>> 14s(4ms)13s(4ms)16s(4ms)12s(4ms)
>> 
>> Right, so the pull-mode results are encouraging! *Something* is
>> happening, at least, even though the aggregate airtime doesn't quite
>> match the ratios of the configured weights.
>> 
>> Are you running the UDP generator on the AP itself, or on a separate
>> device, BTW? If it's on the AP, the userspace socket can get throttled,
>> which will interfere with results, so it's better to have it on a
>> separate device (connected via ethernet).
>> 
> Traffic b/w wired client (via ethernet) and wireless clients.

Cool.

>> As for push-mode, could this be because of bad buffer management? I.e.,
>> because there's a lag between the time airtime is registered, and the
>> time that airtime usage is reported, the driver just pushes a whole
>> bunch of packets to the firmware when it gets the chance, which 
>> prevents
>> the scheduler from throttling properly. This could also explain the
>> better, but not quite perfect, results in pull mode, assuming that pull
>> mode results in better firmware buffer management which reduces, but
>> doesn't quite remove, the lag.
>> 
> Hmm... I agree that lag in reporting airtime may cause more data push
> to hw. Right now both tx and tx-compl are serialized by
> active_txq_lock. So once lock acquired by tx path, it will download
> all frames. i.e it is even true for ath9k driver. Hence I am wondering
> how it is working only with ath9k.

If enough packets are dequeued at once that the TXQ empties and is not
put back on the scheduling loop, the next_txq() loop is just going to
loop through the remaining TXQs and immediately increase their quantum.
In ath9k, there's a maximum of two outstanding aggregates below the TXQ
level, but I think you mentioned that ath10k can queue thousands in
firmware?

Your patch removes the 'max 16 packets at a time' check before the call
to ath10k_mac_tx_push_txq(); try adding that back and see if it helps?

>> If this is indeed the reason, the queue limit patches should hopefully
>> be a solution. So guess we need to get those working as well :)
>> 
> I would prefer to baseline the basic infra into upstream first and do
> enhancement on top of that.

Sure, I'm fine with merging this first and building on top.

> I request you to revisit maintaining per driver default. Otherwise
> there would be performance impact with 256us. :(

Hmm, how about we make it a driver-specified multiplier instead (which
could be 4, 8 or 16 for ath10k)? That way it would still work even if
userspace changes the weights. It would affect the minimum possible
granularity, but that is probably acceptable; and we would be free to
change the mechanism later, without affecting the userspace API.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Toke Høiland-Jørgensen
Louie Lu  writes:

> Hi Rajkumar, Toke,
>
> I found the series (v3,4/6) remove the debugfs remove reset station's
> airtime method, and didn't added at here.
>
> Not sure how to help this kind of situation, do I need a separate
> patch to fix this, or posting the patch here is fine?

This is fine; we can fold it into the next version. Thanks :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Toke Høiland-Jørgensen
Felix Fietkau  writes:

> On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:
>>> This part doesn't really make much sense to me, but maybe I'm
>>> misunderstanding how the code works.
>>> Let's assume we have a driver like ath9k or mt76, which tries to keep a
>>> number of aggregates in the hardware queue, and the hardware queue is
>>> currently empty.
>>> If the current txq entry is kept at the head of the schedule list,
>>> wouldn't the code just pull from that one over and over again, until
>>> enough packets are transmitted by the hardware and their tx status
>>> processed?
>>> It seems to me that while fairness is still preserved in the long run,
>>> this could lead to rather bursty scheduling, which may not be
>>> particularly latency friendly.
>> 
>> Yes, it'll be a bit more bursty when the hardware queue is completely
>> empty. However, when a TX completion comes back, that will adjust the
>> deficit of that sta and cause it to be rotated on the next dequeue. This
>> obviously relies on the fact that the lower-level hardware queue is
>> sufficiently shallow to not add a lot of latency. But we want that to be
>> the case anyway. In practice, it works quite well for ath9k, but not so
>> well for ath10k because it has a large buffer in firmware.
>> 
>> If we requeue the TXQ at the end of the list, a station that is taking
>> up too much airtime will fail to be throttled properly, so the
>> queue-at-head is kinda needed to ensure fairness...
> Thanks for the explanation, that makes sense to me. I have an idea on
> how to mitigate the burstiness within the driver. I'll write it down in
> pseudocode, please let me know if you think that'll work.

I don't think it will, unfortunately. For example, consider the case
where there are two stations queued; one with a large negative deficit
(say, -10ms), and one with a positive deficit.

In this case, we really need to throttle the station with a negative
deficit. But if the driver loops and caches txqs, we'll get something
like the following:

- First driver loop iteration: returns TXQ with positive deficit.
- Second driver loop iteration: Only the negative-deficit TXQ is in the
  mac80211 list, so it will loop until that TXQ's deficit turns positive
  and return it.

Because of this, the negative-deficit station won't be throttled, and we
won't get fairness.

How many frames will mt76 queue up below the driver point? I.e., how
much burstiness are you expecting this will introduce on that driver?

Taking a step back, it's clear that it would be good to be able to
dequeue packets to multiple STAs at once (we need that for MU-MIMO on
ath10k as well). However, I don't think we can do that with the
round-robin fairness scheduler; so we are going to need a different
algorithm. I *think* it may be possible to do this with a virtual-time
scheduler, but I haven't sat down and worked out the details yet...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-07 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-11-02 03:30, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> On 2018-10-28 15:01, Rajkumar Manoharan wrote:
>>>> On 2018-10-28 08:48, Toke Høiland-Jørgensen wrote:
>>>>> Rajkumar Manoharan  writes:
>>>>> 
>>>>>> 
>>>>>> 4ms 223 (40%) 214 (40%)109 (10%)   94 (10%)
>>>>>> 
>>>>>> 4ms 337 (90%) 182 (8%)  23 (1%)30 (1%)
>>>>> 
>>>>> So this looks like it's doing *something*, but not like it's
>>>>> succeeding
>>>>> in achieving the set percentages. Did you check if the actual 
>>>>> airtime
>>>>> values (in debugfs) corresponds to the configured weights?
>>>>> 
>>>> No. Will check that.
>>>> 
>>> Toke,
>>> 
>>>  From above results, different airtime for each station is reflecting 
>>> on
>>> output performance. Unfortunately I don't see such tput difference, 
>>> when
>>> the tx mode is fixed in push-only. Even low weight station is giving
>>> same
>>> performance. Are you also seeing the same behavior in your setup? 
>>> Could
>>> you please share your results?
>> 
>> Sorry, I've been travelling this week; I'll be back in the office next
>> week and can run some tests then. I may also have an idea for a
>> different algorithm that will work better in pull mode, but need to see
>> if it works at all first :)
>> 
> Wow... :)
>
> Meanwhile we did some more experiments with both modes. The experiment
> was done in open environment and fixed rate and UDP traffic ran for 60
> seconds.
>
> Seems like push mode not honoring the configured weight. Always the
> airtime was almost same whereas in pull-mode airtime is changing based
> on configured weight. Hence I would like to know your results.

Right, so I verified that the current version of the patch set still
works with ath9k. However, the ath10k card I have doesn't seem to
support peer stats, so I can't test ath10k. 

$ lspci | grep Qualcomm
03:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless 
Network Adapter

$ cat /sys/kernel/debug/ieee80211/phy1/ath10k/chip_id 
0x043202ff

$ cat /sys/kernel/debug/ieee80211/phy1/ath10k/wmi_services  | grep PEER
WMI_SERVICE_PEER_CACHING -
WMI_SERVICE_PEER_STATS   -


Is there a way to force-enable airtime support, is this a hardware issue?

>   sta1sta2sta3sta4
> pull-mode 8s(205us)   18s(3.2ms)  8s(205us)   14s(410us)
>   12s(256us)  12s(256us)  13s(256us)  12s(256us)
>   14s(4ms)13s(4ms)14s(4ms)13s(4ms)
>
> push-mode 15s(205us)  12s(3.2ms)  16s(205us)  12s(410us)
>   15s(256us)  12s(256us)  16s(256us)  12s(256us)
>   14s(4ms)13s(4ms)16s(4ms)12s(4ms)

Right, so the pull-mode results are encouraging! *Something* is
happening, at least, even though the aggregate airtime doesn't quite
match the ratios of the configured weights.

Are you running the UDP generator on the AP itself, or on a separate
device, BTW? If it's on the AP, the userspace socket can get throttled,
which will interfere with results, so it's better to have it on a
separate device (connected via ethernet).

As for push-mode, could this be because of bad buffer management? I.e.,
because there's a lag between the time airtime is registered, and the
time that airtime usage is reported, the driver just pushes a whole
bunch of packets to the firmware when it gets the chance, which prevents
the scheduler from throttling properly. This could also explain the
better, but not quite perfect, results in pull mode, assuming that pull
mode results in better firmware buffer management which reduces, but
doesn't quite remove, the lag.

If this is indeed the reason, the queue limit patches should hopefully
be a solution. So guess we need to get those working as well :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/6] mac80211: Add TXQ scheduling API

2018-11-09 Thread Toke Høiland-Jørgensen



On 9 November 2018 13:00:04 CET, Johannes Berg  
wrote:
>On Sat, 2018-10-20 at 04:05 -0700, Rajkumar Manoharan wrote:
>> 
>> + * @txq: pointer obtained from station or virtual interface, or from
>> + *   ieee80211_next_txq()
>
>nit: please just indent by a single tab for continuation, trying to
>line
>it all up doesn't really make it more readable

Not sure I agree about the "not more readable" part, but OK.

>I see you're still discussing all this, so I'm going to assume you'll
>resend this series eventually :-)

Yup. Think we're getting close to something that might be mergeable. Will send 
another version then :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-14 Thread Toke Høiland-Jørgensen
Felix Fietkau  writes:

> On 2018-11-12 23:51, Rajkumar Manoharan wrote:
>> From: Toke Høiland-Jørgensen 
>> 
>> This adds airtime accounting and scheduling to the mac80211 TXQ
>> scheduler. A new callback, ieee80211_sta_register_airtime(), is added
>> that drivers can call to report airtime usage for stations.
>> 
>> When airtime information is present, mac80211 will schedule TXQs
>> (through ieee80211_next_txq()) in a way that enforces airtime fairness
>> between active stations. This scheduling works the same way as the ath9k
>> in-driver airtime fairness scheduling. If no airtime usage is reported
>> by the driver, the scheduler will default to round-robin scheduling.
>> 
>> For drivers that don't control TXQ scheduling in software, a new API
>> function, ieee80211_txq_may_transmit(), is added which the driver can use
>> to check if the TXQ is eligible for transmission, or should be throttled to
>> enforce fairness. Calls to this function must also be enclosed in
>> ieee80211_txq_schedule_{start,end}() calls to ensure proper locking.
>> 
>> The API ieee80211_txq_may_transmit() also ensures that TXQ list will be
>> aligned aginst driver's own round-robin scheduler list. i.e it rotates
>> the TXQ list till it makes the requested node becomes the first entry
>> in TXQ list. Thus both the TXQ list and driver's list are in sync.
>> 
>> Co-Developed-by: Rajkumar Manoharan 
>> Signed-off-by: Toke Høiland-Jørgensen 
>> Signed-off-by: Rajkumar Manoharan 
>> ---
>>  include/net/mac80211.h | 59 ++
>>  net/mac80211/cfg.c |  3 ++
>>  net/mac80211/debugfs.c |  3 ++
>>  net/mac80211/debugfs_sta.c | 50 --
>>  net/mac80211/ieee80211_i.h |  2 ++
>>  net/mac80211/main.c|  4 +++
>>  net/mac80211/sta_info.c| 44 +--
>>  net/mac80211/sta_info.h| 13 +++
>>  net/mac80211/status.c  |  6 
>>  net/mac80211/tx.c  | 90 
>> +++---
>>  10 files changed, 264 insertions(+), 10 deletions(-)
>> 
>> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
>> index aa4afbf0abaf..a1f1256448f5 100644
>> --- a/net/mac80211/status.c
>> +++ b/net/mac80211/status.c
>> @@ -818,6 +818,12 @@ static void __ieee80211_tx_status(struct ieee80211_hw 
>> *hw,
>>  ieee80211_sta_tx_notify(sta->sdata, (void *) skb->data,
>>  acked, info->status.tx_time);
>>  
>> +if (info->status.tx_time &&
>> +wiphy_ext_feature_isset(local->hw.wiphy,
>> +
>> NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
>> +ieee80211_sta_register_airtime(>sta, tid,
>> +   info->status.tx_time, 0);
>> +
>>  if (ieee80211_hw_check(>hw, REPORTS_TX_ACK_STATUS)) {
>>  if (info->flags & IEEE80211_TX_STAT_ACK) {
>>  if (sta->status_stats.lost_packets)
> I think the same is needed in ieee80211_tx_status_ext.

Right, good point.

>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 305965283506..3f417e80e041 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -3660,12 +3680,74 @@ void ieee80211_return_txq(struct ieee80211_hw *hw,
>>  lockdep_assert_held(>active_txq_lock[txq->ac]);
>>  
>>  if (list_empty(>schedule_order) &&
>> -(!skb_queue_empty(>frags) || txqi->tin.backlog_packets))
>> -list_add_tail(>schedule_order,
>> -  >active_txqs[txq->ac]);
>> +(!skb_queue_empty(>frags) || txqi->tin.backlog_packets)) {
>> +/* If airtime accounting is active, always enqueue STAs at the
>> + * head of the list to ensure that they only get moved to the
>> + * back by the airtime DRR scheduler once they have a negative
>> + * deficit. A station that already has a negative deficit will
>> + * get immediately moved to the back of the list on the next
>> + * call to ieee80211_next_txq().
>> + */
>> +if (txqi->txq.sta &&
>> +wiphy_ext_feature_isset(local->hw.wiphy,
>> +
>> NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
>> +list_add(>schedule_order,
>> +   

Re: [RFC v2 1/2] ath10k: migrate to mac80211 txq scheduling

2018-09-29 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> ath10k maintains common txqs list for all stations. This txq
> management can be removed by migrating to mac80211 txq APIs
> and let mac80211 handle txqs reordering based on reported airtime.
> By doing this, txq fairness maintained in ath10k i.e processing
> N frames per txq is removed. By adapting to mac80211 APIs,
> ath10k will support mac80211 based airtime fairness algorithm.
>
> Signed-off-by: Toke Høiland-Jørgensen 
> Signed-off-by: Rajkumar Manoharan 
> ---
>  drivers/net/wireless/ath/ath10k/core.c   |  2 -
>  drivers/net/wireless/ath/ath10k/core.h   |  3 --
>  drivers/net/wireless/ath/ath10k/htc.h|  1 -
>  drivers/net/wireless/ath/ath10k/htt_rx.c |  8 +++
>  drivers/net/wireless/ath/ath10k/mac.c| 92 
> +---
>  5 files changed, 45 insertions(+), 61 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c 
> b/drivers/net/wireless/ath/ath10k/core.c
> index cf3c47b8cb2d..0684f87abc9b 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -3068,9 +3068,7 @@ struct ath10k *ath10k_core_create(size_t priv_size, 
> struct device *dev,
>  
>   mutex_init(>conf_mutex);
>   spin_lock_init(>data_lock);
> - spin_lock_init(>txqs_lock);
>  
> - INIT_LIST_HEAD(>txqs);
>   INIT_LIST_HEAD(>peers);
>   init_waitqueue_head(>peer_mapping_wq);
>   init_waitqueue_head(>htt.empty_tx_wq);
> diff --git a/drivers/net/wireless/ath/ath10k/core.h 
> b/drivers/net/wireless/ath/ath10k/core.h
> index f6e5c29f74e7..d3e20aaf8023 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -1054,10 +1054,7 @@ struct ath10k {
>  
>   /* protects shared structure data */
>   spinlock_t data_lock;
> - /* protects: ar->txqs, artxq->list */
> - spinlock_t txqs_lock;
>  
> - struct list_head txqs;
>   struct list_head arvifs;
>   struct list_head peers;
>   struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
> diff --git a/drivers/net/wireless/ath/ath10k/htc.h 
> b/drivers/net/wireless/ath/ath10k/htc.h
> index 51fda6c23f69..cb30add7dd33 100644
> --- a/drivers/net/wireless/ath/ath10k/htc.h
> +++ b/drivers/net/wireless/ath/ath10k/htc.h
> @@ -51,7 +51,6 @@
>   */
>  
>  #define HTC_HOST_MAX_MSG_PER_RX_BUNDLE8
> -#define HTC_HOST_MAX_MSG_PER_TX_BUNDLE16
>  
>  enum ath10k_htc_tx_flags {
>   ATH10K_HTC_FLAG_NEED_CREDIT_UPDATE = 0x01,
> diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
> b/drivers/net/wireless/ath/ath10k/htt_rx.c
> index f2405258a6d3..f2aaa2f7a022 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
> @@ -2379,6 +2379,7 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k 
> *ar, struct sk_buff *skb)
>   u8 tid;
>   int ret;
>   int i;
> + bool may_tx;
>  
>   ath10k_dbg(ar, ATH10K_DBG_HTT, "htt rx tx fetch ind\n");
>  
> @@ -2451,8 +2452,13 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k 
> *ar, struct sk_buff *skb)
>   num_msdus = 0;
>   num_bytes = 0;
>  
> + ieee80211_txq_schedule_start(hw, txq->ac);
> + may_tx = ieee80211_txq_may_transmit(hw, txq);
>   while (num_msdus < max_num_msdus &&
>  num_bytes < max_num_bytes) {
> + if (!may_tx)
> + break;
> +
>   ret = ath10k_mac_tx_push_txq(hw, txq);
>   if (ret < 0)
>   break;
> @@ -2460,6 +2466,8 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k 
> *ar, struct sk_buff *skb)
>   num_msdus++;
>   num_bytes += ret;
>   }
> + ieee80211_return_txq(hw, txq);
> + ieee80211_txq_schedule_end(hw, txq->ac);
>  
>   record->num_msdus = cpu_to_le16(num_msdus);
>   record->num_bytes = cpu_to_le32(num_bytes);
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index 97548f96a2f7..3b441179a36c 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -3874,7 +3874,6 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
> *txq)
>  
>  static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq 
> *txq)
>  {
> - struct ath10k_txq *artxq;
>   struct ath10k_skb_cb *cb;
>   struct sk_buff *msdu;
>   int msdu_id;
> @@ -388

Re: [RFC v2 1/2] ath10k: migrate to mac80211 txq scheduling

2018-10-02 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-09-29 03:06, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> -   ath10k_htt_tx_txq_update(hw, f_txq);
>>> +   if (ret == -EBUSY) {
>>> +   ieee80211_txq_schedule_start(hw, ac);
>>> +   ieee80211_return_txq(hw, txq);
>>> +   ieee80211_txq_schedule_end(hw, ac);
>>> +   }
>> 
>> And ieee80211_return_txq() should be called regardless of the return
>> code (it'll do it's own checking and do nothing if the queue is empty).
>> 
> Yeah.. Seems like calling return_txq unconditionally is messing with
> may_transmit sequence. i.e pull mode.

Messing with, how?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC v2] ath10k: report tx rate using ieee80211_tx_status()

2018-09-03 Thread Toke Høiland-Jørgensen
Anilkumar Kolli  writes:

> On 2018-08-31 19:52, Toke Høiland-Jørgensen wrote:
>> Kalle Valo  writes:
>> 
>>> Anilkumar Kolli  writes:
>>> 
>>>> Mesh path metric needs txrate information from ieee80211_tx_status()
>>>> call but in ath10k there is no mechanism to report tx rate 
>>>> information
>>>> via ieee80211_tx_status(), the rate is only accessible via
>>>> sta_statiscs() op.
>>>> 
>>>> Per peer stats has tx rate info available, this patch stores per peer
>>>> last tx rate and updates the tx rate info structures in tx 
>>>> completition.
>>>> The rate updated in ieee80211_tx_status() is not exactly for the last
>>>> transmitted frame instead the rate is from one of the previous 
>>>> frames.
>>>> 
>>>> Per peer txrate information is updated through per peer statistics
>>>> and is available for QCA9888/QCA9984/QCA4019/QCA998X only
>>>> 
>>>> Tested on QCA9984 with firmware-5.bin_10.4-3.5.3-00053
>>>> Tested on QCA998X with firmware-5.bin_10.2.4-1.0-00036
>>>> 
>>>> Signed-off-by: Anilkumar Kolli 
>>> 
>>> This is a patch from last March, full patch here:
>>> 
>>> https://patchwork.kernel.org/patch/10267693/
>>> 
>>>> @@ -119,6 +122,18 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>>>>info->flags &= ~IEEE80211_TX_STAT_ACK;
>>>>}
>>>> 
>>>> +  if (sta) {
>>>> +  spin_lock_bh(>data_lock);
>>>> +  arsta = (struct ath10k_sta *)sta->drv_priv;
>>>> +  info->status.rates[0].idx =
>>>> +  arsta->tx_info.status.rates[0].idx;
>>>> +  info->status.rates[0].count =
>>>> +  arsta->tx_info.status.rates[0].count;
>>>> +  info->status.rates[0].flags =
>>> <> +arsta->tx_info.status.rates[0].flags;
>>>> +  spin_unlock_bh(>data_lock);
>>>> +  }
>>>> +
>>>>ieee80211_tx_status(htt->ar->hw, msdu);
>>>>/* we do not own the msdu anymore */
>>> 
>>> "is not exactly" is IMHO an understatement. What it means that with 
>>> this
>>> patch ath10k reports the rate information from another frame instead 
>>> of
>>> the current skb, because the firmware provides the rate information 
>>> "out
>>> of band". A simple example to clarify:
>>> 
>>>Let's say ath10k transmits frames A, B and C. Then ath10k calls
>>>ieee80211_tx_status() for frame C the rate information could be for
>>>frame A, or it could be the other around for frame A status the 
>>> rate
>>>information is from frame C.
>>> 
>>> In other words, there's no guarantee from what frame the rate
>>> information is from.
>>> 
>>> Too me this feels like a bad idea but I'm not familiar enough with
>>> mac80211 to really comment on this. What kind of implications does 
>>> this
>>> have for Mesh or ATF, for example? Adding Johannes and Toke to hear
>>> about their opinion about this.
>> 
>> I was under the impression that the values gathered (at least for
>> airtime) would be cumulative values? If it's just the airtime of a
>> single random frame, which is what I understand from your example, it's
>> not going to be terribly useful to provide ATF at least...
>> 
>> -Toke
>
> The design:
> Whenever radio transmits packet, firmware will record numbers of bytes 
> sent, MSDU’s sent, TX duration, AMPDU information, ACK fail, BA fail, 
> Rate at which packet is sent. This is recorded for 4 frames sent on that 
> peer. Once 4 frames are sent for that peer, TX packet event is sent to 
> ath10k driver with the recorded values. These rate values are updated to 
> mac80211 using ieee80211_tx_status().

So the values reported are the sums for all four packets? But the latest
value for rate information?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-31 Thread Toke Høiland-Jørgensen
Peter Oh  writes:

> Hi Toke,
>
>
> On 08/17/2018 04:32 AM, Toke Høiland-Jørgensen wrote:
>>
>>>>> Shift 6:
>>>>> wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
>>>>> "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
>>>>
> Does flent have options to set TOS or QoS queue for TCP or UDP test?
> If I add "--test-paramete burst_tos=0xe0", will it use corresponding 
> mac80211 QoS queue?

Yes, for some tests:

- The RRUL test will run four TCP flows in each direction, one marked for
  each 802.11 QoS queue.

- For the rtt_fair* tests you can set them with a test parameter:
--test-parameter markings=BE,CS1 to set flows 1 and 2 to BE and CS1
respectively. You can use the rtt_fair_var* tests to get a variable
number of flows (the same host can be specified multiple times by having
different hostnames resolve to the same IP).

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC v2] ath10k: report tx rate using ieee80211_tx_status()

2018-08-31 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Anilkumar Kolli  writes:
>
>> Mesh path metric needs txrate information from ieee80211_tx_status()
>> call but in ath10k there is no mechanism to report tx rate information
>> via ieee80211_tx_status(), the rate is only accessible via
>> sta_statiscs() op.
>>
>> Per peer stats has tx rate info available, this patch stores per peer
>> last tx rate and updates the tx rate info structures in tx completition.
>> The rate updated in ieee80211_tx_status() is not exactly for the last
>> transmitted frame instead the rate is from one of the previous frames.
>>
>> Per peer txrate information is updated through per peer statistics
>> and is available for QCA9888/QCA9984/QCA4019/QCA998X only
>>
>> Tested on QCA9984 with firmware-5.bin_10.4-3.5.3-00053
>> Tested on QCA998X with firmware-5.bin_10.2.4-1.0-00036
>>
>> Signed-off-by: Anilkumar Kolli 
>
> This is a patch from last March, full patch here:
>
> https://patchwork.kernel.org/patch/10267693/
>
>> @@ -119,6 +122,18 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>>  info->flags &= ~IEEE80211_TX_STAT_ACK;
>>  }
>>  
>> +if (sta) {
>> +spin_lock_bh(>data_lock);
>> +arsta = (struct ath10k_sta *)sta->drv_priv;
>> +info->status.rates[0].idx =
>> +arsta->tx_info.status.rates[0].idx;
>> +info->status.rates[0].count =
>> +arsta->tx_info.status.rates[0].count;
>> +info->status.rates[0].flags =
> <> +  arsta->tx_info.status.rates[0].flags;
>> +spin_unlock_bh(>data_lock);
>> +}
>> +
>>  ieee80211_tx_status(htt->ar->hw, msdu);
>>  /* we do not own the msdu anymore */
>
> "is not exactly" is IMHO an understatement. What it means that with this
> patch ath10k reports the rate information from another frame instead of
> the current skb, because the firmware provides the rate information "out
> of band". A simple example to clarify:
>
>Let's say ath10k transmits frames A, B and C. Then ath10k calls
>ieee80211_tx_status() for frame C the rate information could be for
>frame A, or it could be the other around for frame A status the rate
>information is from frame C.
>
> In other words, there's no guarantee from what frame the rate
> information is from.
>
> Too me this feels like a bad idea but I'm not familiar enough with
> mac80211 to really comment on this. What kind of implications does this
> have for Mesh or ATF, for example? Adding Johannes and Toke to hear
> about their opinion about this.

I was under the impression that the values gathered (at least for
airtime) would be cumulative values? If it's just the airtime of a
single random frame, which is what I understand from your example, it's
not going to be terribly useful to provide ATF at least...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Thu, 2018-08-30 at 16:32 -0700, Grant Grundler wrote:
>> On Sun, Aug 12, 2018 at 10:37 PM Wen Gong  wrote:
>> ...
>> > I have tested with your method for shift 6/8/10/7 all twice time in open 
>> > air environment.
>> 
>> I've attached the graphed data which Wen provided me and I made one
>> "cleanup": the median and average "ICMP" are computed only using
>> values where flent has an "antagonist" load running.  The first 28 and
>> last 26 seconds of the test are "idle" - it makes no sense to include
>> the latency of those. This drops about 1/7th of the data points.
>> 
>> My observations:
>> o the delta between average and median is evidence this is not
>> particularly reliable data (but expected result for "open air" wifi
>> testing in a corp environment).
>> o sk_pacing_shift=8 and/or =7 appears to have suffered from
>> significant open air interference - I've attached a graph of the raw
>> data.
>> o I'm comfortable asserting throughput is higher and latency is lower
>> for sk_pacing_shift=6 (vs 7) at the cost of additional CPU
>> utilization.
>> 
>> My questions:
>> 1) Is the current conversation just quibbling about 6 vs 7 for the
>> ath10k default?
>
> 6 vs. 8, I think? But I didn't follow the full discussion.
>
> I also think that we shouldn't necessarily restrict this to "for the
> ath10k". Is there any reason to think that this would be different for
> different chips?

No, I also think it should be possible to select a value that will work
for all drivers. There's a strong diminishing returns effect here after
a certain point, and I believe we should strongly push back against just
adding more buffering to chase (single-flow) throughput numbers; and I'm
not convinced that having different values for different drivers is
worth the complexity.

As far as I'm concerned, just showing an increase in maximum throughput
under ideal conditions (as this initial patch submission did) is not
sufficient argument for adding more buffering. At a minimum, the
evaluation should also include:

- Impact on latency and throughput for competing flows in the same test.
- Impact on latency for the TCP flow itself.
- A full evaluation at low rates and in scenarios with more than one
  client.

As far as the actual value, I *think* it may be that the default shift
should be 7 (8 ms) rather than 8 (4 ms) as it currently is. Going back
and looking at my data from when I submitted the original patch, it
looks like the point of diminishing returns is somewhere between those
two with ath9k (where I did most of my testing), and it seems reasonable
that it could be slightly higher (i.e., more buffering) for ath10k.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Mon, 2018-09-03 at 13:11 +0200, Toke Høiland-Jørgensen wrote:
>
>> > 6 vs. 8, I think? But I didn't follow the full discussion.
>
> Err, I just realized that I was completely wrong - the default, of
> course, is 10. So smaller values mean more buffering.
>
> Most of my argumentation was thus utter garbage :-)

Well, I got the gist, even if the sign bit was wrong ;)

>> > I also think that we shouldn't necessarily restrict this to "for the
>> > ath10k". Is there any reason to think that this would be different for
>> > different chips?
>> 
>> No, I also think it should be possible to select a value that will work
>> for all drivers. There's a strong diminishing returns effect here after
>> a certain point, and I believe we should strongly push back against just
>> adding more buffering to chase (single-flow) throughput numbers; and I'm
>> not convinced that having different values for different drivers is
>> worth the complexity.
>
> I think I can see some point in it if the driver requires some
> buffering for some specific reason? But you'd have to be able to state
> that reason, I guess, I could imagine having a firmware limitation to
> need to build two aggregates, or something around MU-MIMO, etc.

Right, I'm not ruling out that there can be legitimate reasons to add
extra buffering; but a lot of times it's just used to paper over other
issues, so a good explanation is definitely needed...

>> As far as the actual value, I *think* it may be that the default shift
>> should be 7 (8 ms) rather than 8 (4 ms) as it currently is. Going back
>> and looking at my data from when I submitted the original patch, it
>> looks like the point of diminishing returns is somewhere between those
>> two with ath9k (where I did most of my testing), and it seems reasonable
>> that it could be slightly higher (i.e., more buffering) for ath10k.
>
> Grant's data shows a significant difference between 6 and 7 for both
> latency and throughput:
>
>  * median tpt
>- ~241 vs ~201 (both 1 and 5 streams)
>  * median latency
>- 7.5 vs 6 (1 stream)
>- 17.3 vs. 16.6 (5 streams)
>
> A 20% throughput improvement at <= 1.5ms latency cost seems like a
> pretty reasonable trade-off?

Yeah, on it's face. What I'm bothered about is that it is the exact
opposite results that I got from my ath10k tests (there, throughput
*dropped* and latency doubled when going to from 4 to 16 ms of
buffering). And, well, Grant's data is from a single test in a noisy
environment where the timeseries graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFCv3 1/2] mac80211: implement ieee80211_tx_rate_update to update rate

2018-09-21 Thread Toke Høiland-Jørgensen
Anilkumar Kolli  writes:

> On 2018-09-20 21:40, Toke Høiland-Jørgensen wrote:
>> Anilkumar Kolli  writes:
>> 
>>> Current mac80211 has provision to update tx status through
>>> ieee80211_tx_status() and ieee80211_tx_status_ext(). But
>>> drivers like ath10k updates the tx status from the skb except
>>> txrate, txrate will be updated from a different path, peer stats.
>>> 
>>> Using ieee80211_tx_status_ext() in two different paths
>>>   - (one for the stats, one for the tx rate) will duplicate the stats.
>>> 
>>> To avoid this stats duplication, ieee80211_tx_rate_update() is 
>>> implemented.
>>> 
>>> Signed-off-by: Anilkumar Kolli 
>>> ---
>>> V3:
>>>   - Added new API in mac80211 to update tx rate(Johannes)
>>> 
>>>  include/net/mac80211.h |   14 ++
>>>  net/mac80211/status.c  |   24 
>>>  2 files changed, 38 insertions(+)
>>> 
>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>> index 8c26d2d36cbe..042186a21770 100644
>>> --- a/include/net/mac80211.h
>>> +++ b/include/net/mac80211.h
>>> @@ -4331,6 +4331,20 @@ void 
>>> ieee80211_sta_set_expected_throughput(struct ieee80211_sta *pubsta,
>>>u32 thr);
>>> 
>>>  /**
>>> + * ieee80211_tx_rate_update - transmit rate update callback
>>> + *
>>> + * This function can be used in drivers that does not have provision
>>> + * in updating the tx rate in data path.
>>> + *
>>> + * @hw: the hardware the frame was transmitted by
>>> + * @status: tx status information
>>> + * @pubsta: the station to update the tx rate for.
>>> + */
>>> +void ieee80211_tx_rate_update(struct ieee80211_hw *hw,
>>> + struct ieee80211_sta *pubsta,
>>> + struct ieee80211_tx_info *info);
>>> +
>>> +/**
>>>   * ieee80211_tx_status - transmit status callback
>>>   *
>>>   * Call this function for all transmitted frames after they have been
>>> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
>>> index 9a6d7208bf4f..232297ddaa02 100644
>>> --- a/net/mac80211/status.c
>>> +++ b/net/mac80211/status.c
>>> @@ -988,6 +988,30 @@ void ieee80211_tx_status_ext(struct ieee80211_hw 
>>> *hw,
>>>  }
>>>  EXPORT_SYMBOL(ieee80211_tx_status_ext);
>>> 
>>> +void ieee80211_tx_rate_update(struct ieee80211_hw *hw,
>>> + struct ieee80211_sta *pubsta,
>>> + struct ieee80211_tx_info *info)
>>> +{
>>> +   struct ieee80211_local *local = hw_to_local(hw);
>>> +   struct ieee80211_supported_band *sband;
>>> +   struct sta_info *sta;
>>> +   struct ieee80211_tx_status status;
>>> +
>>> +   sband = hw->wiphy->bands[info->band];
>>> +
>>> +   if (pubsta) {
>> 
>> The function does nothing if pubsta is NULL; wouldn't it make more 
>> sense
>> to just mandate that it isn't NULL rather than have this check?
>> 
>
> ATH10K driver always calls this function with a valid STA [not NULL].

Yeah, so why the check for non-NULL? :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFCv3 1/2] mac80211: implement ieee80211_tx_rate_update to update rate

2018-09-20 Thread Toke Høiland-Jørgensen
Anilkumar Kolli  writes:

> Current mac80211 has provision to update tx status through
> ieee80211_tx_status() and ieee80211_tx_status_ext(). But
> drivers like ath10k updates the tx status from the skb except
> txrate, txrate will be updated from a different path, peer stats.
>
> Using ieee80211_tx_status_ext() in two different paths
>   - (one for the stats, one for the tx rate) will duplicate the stats.
>
> To avoid this stats duplication, ieee80211_tx_rate_update() is implemented.
>
> Signed-off-by: Anilkumar Kolli 
> ---
> V3:
>   - Added new API in mac80211 to update tx rate(Johannes)
>
>  include/net/mac80211.h |   14 ++
>  net/mac80211/status.c  |   24 
>  2 files changed, 38 insertions(+)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 8c26d2d36cbe..042186a21770 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -4331,6 +4331,20 @@ void ieee80211_sta_set_expected_throughput(struct 
> ieee80211_sta *pubsta,
>  u32 thr);
>  
>  /**
> + * ieee80211_tx_rate_update - transmit rate update callback
> + *
> + * This function can be used in drivers that does not have provision
> + * in updating the tx rate in data path.
> + *
> + * @hw: the hardware the frame was transmitted by
> + * @status: tx status information
> + * @pubsta: the station to update the tx rate for.
> + */
> +void ieee80211_tx_rate_update(struct ieee80211_hw *hw,
> +   struct ieee80211_sta *pubsta,
> +   struct ieee80211_tx_info *info);
> +
> +/**
>   * ieee80211_tx_status - transmit status callback
>   *
>   * Call this function for all transmitted frames after they have been
> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
> index 9a6d7208bf4f..232297ddaa02 100644
> --- a/net/mac80211/status.c
> +++ b/net/mac80211/status.c
> @@ -988,6 +988,30 @@ void ieee80211_tx_status_ext(struct ieee80211_hw *hw,
>  }
>  EXPORT_SYMBOL(ieee80211_tx_status_ext);
>  
> +void ieee80211_tx_rate_update(struct ieee80211_hw *hw,
> +   struct ieee80211_sta *pubsta,
> +   struct ieee80211_tx_info *info)
> +{
> + struct ieee80211_local *local = hw_to_local(hw);
> + struct ieee80211_supported_band *sband;
> + struct sta_info *sta;
> + struct ieee80211_tx_status status;
> +
> + sband = hw->wiphy->bands[info->band];
> +
> + if (pubsta) {

The function does nothing if pubsta is NULL; wouldn't it make more sense
to just mandate that it isn't NULL rather than have this check?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 2/2] ath10k: reporting estimated tx airtime for fairness

2018-09-28 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> Transmit airtime will be estimated from last tx rate used.
> Firmware report tx rate by peer stats. Airtime is computed
> on tx path and the same will be reported to mac80211 upon
> tx completion.
>
> Signed-off-by: Kan Yan 
> Signed-off-by: Rajkumar Manoharan 
> ---
>  drivers/net/wireless/ath/ath10k/core.h   |  2 ++
>  drivers/net/wireless/ath/ath10k/htt_rx.c |  1 +
>  drivers/net/wireless/ath/ath10k/mac.c| 58 
> ++--
>  drivers/net/wireless/ath/ath10k/txrx.c   |  4 +++
>  4 files changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.h 
> b/drivers/net/wireless/ath/ath10k/core.h
> index d3e20aaf8023..4bfc370bf659 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -123,6 +123,7 @@ struct ath10k_skb_cb {
>   u8 flags;
>   u8 eid;
>   u16 msdu_id;
> + u16 airtime_est;
>   struct ieee80211_vif *vif;
>   struct ieee80211_txq *txq;
>  } __packed;
> @@ -493,6 +494,7 @@ struct ath10k_sta {
>   u32 smps;
>   u16 peer_id;
>   struct rate_info txrate;
> + u32 last_tx_bitrate;
>  
>   struct work_struct update_wk;
>   u64 rx_duration;
> diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
> b/drivers/net/wireless/ath/ath10k/htt_rx.c
> index f2aaa2f7a022..f1de9fc09d9f 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
> @@ -2809,6 +2809,7 @@ static inline int ath10k_get_legacy_rate_idx(struct 
> ath10k *ar, u8 rate)
>  
>   arsta->txrate.nss = txrate.nss;
>   arsta->txrate.bw = ath10k_bw_to_mac80211_bw(txrate.bw);
> + arsta->last_tx_bitrate = cfg80211_calculate_bitrate(>txrate);
>  
>   if (ath10k_debug_is_extd_tx_stats_enabled(ar))
>   ath10k_accumulate_per_peer_tx_stats(ar, arsta, peer_stats,
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index d4648b22ad64..a694dae6f024 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -3528,7 +3528,7 @@ static void ath10k_tx_h_add_p2p_noa_ie(struct ath10k 
> *ar,
>  static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
>   struct ieee80211_vif *vif,
>   struct ieee80211_txq *txq,
> - struct sk_buff *skb)
> + struct sk_buff *skb, u16 airtime)
>  {
>   struct ieee80211_hdr *hdr = (void *)skb->data;
>   struct ath10k_skb_cb *cb = ATH10K_SKB_CB(skb);
> @@ -3545,6 +3545,7 @@ static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
>  
>   cb->vif = vif;
>   cb->txq = txq;
> + cb->airtime_est = airtime;
>  }
>  
>  bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar)
> @@ -3932,6 +3933,50 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw 
> *hw,
>   return false;
>  }
>  
> +/* Return estimated airtime in microsecond, which is calculated using last
> + * reported TX rate. This is just a rough estimation because host driver has 
> no
> + * knowledge of the actual transmit rate, retries or aggregation. If actual
> + * airtime can be reported by firmware, then delta between estimated and 
> actual
> + * airtime can be adjusted from deficit.
> + */
> +#define IEEE80211_ATF_OVERHEAD   100 /* IFS + some slot time 
> */
> +#define IEEE80211_ATF_OVERHEAD_IFS   16  /* IFS only */
> +static u16 ath10k_mac_update_airtime(struct ath10k *ar,
> +  struct ieee80211_txq *txq,
> +  struct sk_buff *skb)
> +{
> + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
> + struct ath10k_sta *arsta;
> + u32 pktlen;
> + u16 airtime = 0;
> +
> + if (!txq || !txq->sta || !ieee80211_is_data(hdr->frame_control))
> + return airtime;
> +
> + spin_lock_bh(>data_lock);
> + arsta = (struct ath10k_sta *)txq->sta->drv_priv;
> +
> + pktlen = skb->len + 38; /* Assume MAC header 30, SNAP 8 for most case */
> + if (arsta->last_tx_bitrate) {
> + /* airtime in us, last_tx_bitrate in 100kbps */
> + airtime = (pktlen * 8 * (1000 / 100))
> + / arsta->last_tx_bitrate;
> + /* overhead for media access time and IFS */
> + airtime += IEEE80211_ATF_OVERHEAD_IFS;
> + } else {
> + /* This is mostly for throttle excessive BC/MC frames, and the
> +  * airtime/rate doesn't need be exact. Airtime of BC/MC frames
> +  * in 2G get some discount, which helps prevent very low rate
> +  * frames from being blocked for too long.
> +  */
> + airtime = (pktlen * 8 * (1000 / 100)) / 60; /* 6M */
> + airtime += IEEE80211_ATF_OVERHEAD;
> + }
> + spin_unlock_bh(>data_lock);
> +
> + return 

Re: [PATCH v5 4/6] ath9k: Switch to mac80211 TXQ scheduling and airtime APIs

2019-01-21 Thread Toke Høiland-Jørgensen
Just discovered this while working on my follow-up:

>  void ath_tx_queue_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
>  {
> - struct ath_vif *avp = (struct ath_vif *) tid->an->vif->drv_priv;
> - struct ath_chanctx *ctx = avp->chanctx;
> - struct ath_acq *acq;
> + struct ieee80211_txq *queue =
> + container_of((void *)tid, struct ieee80211_txq, drv_priv);
>  
> - if (!ctx || !list_empty(>list))
> - return;
> -
> - acq = >acq[TID_TO_WME_AC(tid->tidno)];
> - spin_lock_bh(>lock);
> - __ath_tx_queue_tid(sc, tid);
> - spin_unlock_bh(>lock);
> + ieee80211_return_txq(sc->hw, queue);
>  }

After we evolved the API, this is now wrong, as ieee80211_return_txq()
should only be called while holding the right lock. I'll post a fixed
version tomorrow.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v4 6/6] ath10k: reporting estimated tx airtime for fairness

2019-01-21 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Rajkumar Manoharan  writes:
>
>> From: Kan Yan 
>>
>> Transmit airtime will be estimated from last tx rate used.
>> Firmware report tx rate by peer stats. Airtime is computed
>> on tx path and the same will be reported to mac80211 upon
>> tx completion.
>
> FYI, it seems that this last patch in the series no longer applies
> cleanly on top of mac80211-next. May need a refresh.
>
> The mac80211 parts were merged into mac80211-next (thanks, Johannes!); I
> guess the driver updates are waiting for drivers-next to catch up to
> that?
>
> Patches 4 and 5 in the series are fine...

This was meant to be a reply to v5 of the patch set (which is where I
had the issue); apologies for picking the wrong one to reply to...

-Toke
>
> -Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v6 0/4] Switch ath9k and ath10k to mac80211 airtime framework

2019-01-22 Thread Toke Høiland-Jørgensen
This is an updated resend of the driver part of the previous patch set
that moves airtime fairness scheduling into mac80211 and enables it for
ath10k as well.

This version is just a refresh of the driver code, along with a small
fix for the issue I noticed yesterday where ath9k was calling
ieee80211_return_txq() without proper logging.


Kan Yan (1):
  ath10k: reporting estimated tx airtime for fairness

Toke Høiland-Jørgensen (3):
  mac80211: Expose ieee80211_schedule_txq() function
  ath9k: Switch to mac80211 TXQ scheduling and airtime APIs
  ath10k: migrate to mac80211 txq scheduling

 drivers/net/wireless/ath/ath10k/core.c |   2 -
 drivers/net/wireless/ath/ath10k/core.h |   8 +-
 drivers/net/wireless/ath/ath10k/htc.h  |   1 -
 drivers/net/wireless/ath/ath10k/htt_rx.c   |   9 +
 drivers/net/wireless/ath/ath10k/mac.c  | 155 -
 drivers/net/wireless/ath/ath10k/txrx.c |   4 +
 drivers/net/wireless/ath/ath9k/ath9k.h |  14 --
 drivers/net/wireless/ath/ath9k/debug.c |   3 -
 drivers/net/wireless/ath/ath9k/debug.h |   8 -
 drivers/net/wireless/ath/ath9k/debug_sta.c |  70 --
 drivers/net/wireless/ath/ath9k/init.c  |   3 +-
 drivers/net/wireless/ath/ath9k/recv.c  |   9 +-
 drivers/net/wireless/ath/ath9k/xmit.c  | 244 ++---
 include/net/mac80211.h |  13 ++
 net/mac80211/driver-ops.h  |   4 +-
 net/mac80211/tx.c  |  13 ++
 16 files changed, 217 insertions(+), 343 deletions(-)

-- 
2.20.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v6 4/4] ath10k: reporting estimated tx airtime for fairness

2019-01-22 Thread Toke Høiland-Jørgensen
From: Kan Yan 

Transmit airtime will be estimated from last tx rate used.
Firmware report tx rate by peer stats. Airtime is computed
on tx path and the same will be reported to mac80211 upon
tx completion.

This change is based on Kan's orginal commit in Chromium tree
("CHROMIUM: ath10k: Implementing airtime fairness based TX scheduler")
ref: https://chromium-review.googlesource.com/588190

Tested on QCA4019 with firmware version 10.4-3.2.1.1-00015
Tested on QCA9984 with firmware version 10.4-3.9.0.1-5

Signed-off-by: Kan Yan 
[rmano...@codeaurora.org: ported only the airtime computation]
Signed-off-by: Rajkumar Manoharan 
[t...@redhat.com: Rebase to mac80211-next, add test note]
Signed-off-by: Toke Høiland-Jørgensen 
---

Rajkumar / Kan: Please check that I didn't mess anything up while
rebasing this onto the current mac80211-next branch!

 drivers/net/wireless/ath/ath10k/core.h   |  2 +
 drivers/net/wireless/ath/ath10k/htt_rx.c |  1 +
 drivers/net/wireless/ath/ath10k/mac.c| 57 ++--
 drivers/net/wireless/ath/ath10k/txrx.c   |  4 ++
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h 
b/drivers/net/wireless/ath/ath10k/core.h
index 28cecf7375f8..4528a6e47b8d 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -126,6 +126,7 @@ struct ath10k_skb_cb {
u8 flags;
u8 eid;
u16 msdu_id;
+   u16 airtime_est;
struct ieee80211_vif *vif;
struct ieee80211_txq *txq;
 } __packed;
@@ -498,6 +499,7 @@ struct ath10k_sta {
u16 peer_id;
struct rate_info txrate;
struct ieee80211_tx_info tx_info;
+   u32 last_tx_bitrate;
 
struct work_struct update_wk;
u64 rx_duration;
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index dc9895f2eb9d..6a93b57900ae 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -3078,6 +3078,7 @@ ath10k_update_per_peer_tx_stats(struct ath10k *ar,
 
arsta->txrate.nss = txrate.nss;
arsta->txrate.bw = ath10k_bw_to_mac80211_bw(txrate.bw);
+   arsta->last_tx_bitrate = cfg80211_calculate_bitrate(>txrate);
if (sgi)
arsta->txrate.flags |= RATE_INFO_FLAGS_SHORT_GI;
 
diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index cf7fdf72e990..b13dcb4533cb 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3544,7 +3544,7 @@ static void ath10k_tx_h_add_p2p_noa_ie(struct ath10k *ar,
 static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
struct ieee80211_vif *vif,
struct ieee80211_txq *txq,
-   struct sk_buff *skb)
+   struct sk_buff *skb, u16 airtime)
 {
struct ieee80211_hdr *hdr = (void *)skb->data;
struct ath10k_skb_cb *cb = ATH10K_SKB_CB(skb);
@@ -3561,6 +3561,7 @@ static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
 
cb->vif = vif;
cb->txq = txq;
+   cb->airtime_est = airtime;
 }
 
 bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar)
@@ -3948,6 +3949,49 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw 
*hw,
return false;
 }
 
+/* Return estimated airtime in microsecond, which is calculated using last
+ * reported TX rate. This is just a rough estimation because host driver has no
+ * knowledge of the actual transmit rate, retries or aggregation. If actual
+ * airtime can be reported by firmware, then delta between estimated and actual
+ * airtime can be adjusted from deficit.
+ */
+#define IEEE80211_ATF_OVERHEAD 100 /* IFS + some slot time */
+#define IEEE80211_ATF_OVERHEAD_IFS 16  /* IFS only */
+static u16 ath10k_mac_update_airtime(struct ath10k *ar,
+struct ieee80211_txq *txq,
+struct sk_buff *skb)
+{
+   struct ath10k_sta *arsta;
+   u32 pktlen;
+   u16 airtime = 0;
+
+   if (!txq || !txq->sta)
+   return airtime;
+
+   spin_lock_bh(>data_lock);
+   arsta = (struct ath10k_sta *)txq->sta->drv_priv;
+
+   pktlen = skb->len + 38; /* Assume MAC header 30, SNAP 8 for most case */
+   if (arsta->last_tx_bitrate) {
+   /* airtime in us, last_tx_bitrate in 100kbps */
+   airtime = (pktlen * 8 * (1000 / 100))
+   / arsta->last_tx_bitrate;
+   /* overhead for media access time and IFS */
+   airtime += IEEE80211_ATF_OVERHEAD_IFS;
+   } else {
+   /* This is mostly for throttle excessive BC/MC frames, and the
+* airtime/rate doesn't need be exact. Airtime of BC/MC frames
+* 

[PATCH v6 2/4] ath9k: Switch to mac80211 TXQ scheduling and airtime APIs

2019-01-22 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

This moves the ath9k driver to use the mac80211 TXQ scheduling and
airtime accounting APIs, removing the corresponding state tracking
inside the driver.

Signed-off-by: Toke Høiland-Jørgensen 
[rmano...@codeaurora.org: fixed checkpatch error and warnings]
Signed-off-by: Rajkumar Manoharan 
---
v6:
  - Use the ieee80211_schedule_txq() function introduced in the previous patch
instead of calling ieee80211_return_txq() with no locking.  

 drivers/net/wireless/ath/ath9k/ath9k.h |  14 --
 drivers/net/wireless/ath/ath9k/debug.c |   3 -
 drivers/net/wireless/ath/ath9k/debug.h |   8 -
 drivers/net/wireless/ath/ath9k/debug_sta.c |  70 --
 drivers/net/wireless/ath/ath9k/init.c  |   3 +-
 drivers/net/wireless/ath/ath9k/recv.c  |   9 +-
 drivers/net/wireless/ath/ath9k/xmit.c  | 244 ++---
 7 files changed, 75 insertions(+), 276 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h 
b/drivers/net/wireless/ath/ath9k/ath9k.h
index 0fca44e91a71..a412b352182c 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -112,8 +112,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct 
ath_descdma *dd,
 #define ATH_TXFIFO_DEPTH   8
 #define ATH_TX_ERROR   0x01
 
-#define ATH_AIRTIME_QUANTUM300 /* usec */
-
 /* Stop tx traffic 1ms before the GO goes away */
 #define ATH_P2P_PS_STOP_TIME   1000
 
@@ -246,10 +244,8 @@ struct ath_atx_tid {
s8 bar_index;
bool active;
bool clear_ps_filter;
-   bool has_queued;
 };
 
-void __ath_tx_queue_tid(struct ath_softc *sc, struct ath_atx_tid *tid);
 void ath_tx_queue_tid(struct ath_softc *sc, struct ath_atx_tid *tid);
 
 struct ath_node {
@@ -263,12 +259,9 @@ struct ath_node {
 
bool sleeping;
bool no_ps_filter;
-   s64 airtime_deficit[IEEE80211_NUM_ACS];
-   u32 airtime_rx_start;
 
 #ifdef CONFIG_ATH9K_STATION_STATISTICS
struct ath_rx_rate_stats rx_rate_stats;
-   struct ath_airtime_stats airtime_stats;
 #endif
u8 key_idx[4];
 
@@ -986,11 +979,6 @@ void ath_ant_comb_scan(struct ath_softc *sc, struct 
ath_rx_status *rs);
 
 #define ATH9K_NUM_CHANCTX  2 /* supports 2 operating channels */
 
-#define AIRTIME_USE_TX BIT(0)
-#define AIRTIME_USE_RX BIT(1)
-#define AIRTIME_USE_NEW_QUEUES BIT(2)
-#define AIRTIME_ACTIVE(flags) (!!(flags & (AIRTIME_USE_TX|AIRTIME_USE_RX)))
-
 struct ath_softc {
struct ieee80211_hw *hw;
struct device *dev;
@@ -1034,8 +1022,6 @@ struct ath_softc {
short nbcnvifs;
unsigned long ps_usecount;
 
-   u16 airtime_flags; /* AIRTIME_* */
-
struct ath_rx rx;
struct ath_tx tx;
struct ath_beacon beacon;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c 
b/drivers/net/wireless/ath/ath9k/debug.c
index 4399e9ad058f..0dfea5d6e949 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -1443,9 +1443,6 @@ int ath9k_init_debug(struct ath_hw *ah)
 #endif
debugfs_create_file("tpc", 0600, sc->debug.debugfs_phy, sc, _tpc);
 
-   debugfs_create_u16("airtime_flags", 0600,
-  sc->debug.debugfs_phy, >airtime_flags);
-
debugfs_create_file("nf_override", 0600,
sc->debug.debugfs_phy, sc, _nf_override);
 
diff --git a/drivers/net/wireless/ath/ath9k/debug.h 
b/drivers/net/wireless/ath/ath9k/debug.h
index 79607db14387..33826aa13687 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -319,20 +319,12 @@ ath9k_debug_sync_cause(struct ath_softc *sc, u32 
sync_cause)
 void ath_debug_rate_stats(struct ath_softc *sc,
  struct ath_rx_status *rs,
  struct sk_buff *skb);
-void ath_debug_airtime(struct ath_softc *sc,
-  struct ath_node *an,
-  u32 rx, u32 tx);
 #else
 static inline void ath_debug_rate_stats(struct ath_softc *sc,
struct ath_rx_status *rs,
struct sk_buff *skb)
 {
 }
-static inline void ath_debug_airtime(struct ath_softc *sc,
- struct ath_node *an,
- u32 rx, u32 tx)
-{
-}
 #endif /* CONFIG_ATH9K_STATION_STATISTICS */
 
 #endif /* DEBUG_H */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c 
b/drivers/net/wireless/ath/ath9k/debug_sta.c
index e8fcd3e1c470..d95cabddce33 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -242,75 +242,6 @@ static const struct file_operations fops_node_recv = {
.llseek = default_llseek,
 };
 
-void ath_debug_airtime(struct ath_softc *sc,
-   struct ath_node *an,
-   u32 rx,
-   u32 tx)
-{
-   struct ath_airtime_stats *astats 

[PATCH v6 3/4] ath10k: migrate to mac80211 txq scheduling

2019-01-22 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

ath10k maintains common txqs list for all stations. This txq
management can be removed by migrating to mac80211 txq APIs
and let mac80211 handle txqs reordering based on reported airtime.
By doing this, txq fairness maintained in ath10k i.e processing
N frames per txq is removed. By adapting to mac80211 APIs,
ath10k will support mac80211 based airtime fairness algorithm.

Tested on QCA4019 with firmware version 10.4-3.2.1.1-00015
Tested on QCA9984 with firmware version 10.4-3.9.0.1-5

Tested-by: Venkateswara Naralasetty 
Co-developed-by: Rajkumar Manoharan 
Signed-off-by: Rajkumar Manoharan 
Signed-off-by: Toke Høiland-Jørgensen 
---
 drivers/net/wireless/ath/ath10k/core.c   |  2 -
 drivers/net/wireless/ath/ath10k/core.h   |  6 +-
 drivers/net/wireless/ath/ath10k/htc.h|  1 -
 drivers/net/wireless/ath/ath10k/htt_rx.c |  8 ++
 drivers/net/wireless/ath/ath10k/mac.c| 98 +++-
 5 files changed, 54 insertions(+), 61 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c 
b/drivers/net/wireless/ath/ath10k/core.c
index 399b501f3c3c..c7327900316e 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -3098,9 +3098,7 @@ struct ath10k *ath10k_core_create(size_t priv_size, 
struct device *dev,
 
mutex_init(>conf_mutex);
spin_lock_init(>data_lock);
-   spin_lock_init(>txqs_lock);
 
-   INIT_LIST_HEAD(>txqs);
INIT_LIST_HEAD(>peers);
init_waitqueue_head(>peer_mapping_wq);
init_waitqueue_head(>htt.empty_tx_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h 
b/drivers/net/wireless/ath/ath10k/core.h
index 46e9c8c97a4d..28cecf7375f8 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -90,6 +90,9 @@
 /* The magic used by QCA spec */
 #define ATH10K_SMBIOS_BDF_EXT_MAGIC "BDF_"
 
+/* Default Airtime weight multipler (Tuned for multiclient performance) */
+#define ATH10K_AIRTIME_WEIGHT_MULTIPLIER  4
+
 struct ath10k;
 
 static inline const char *ath10k_bus_str(enum ath10k_bus bus)
@@ -1068,10 +1071,7 @@ struct ath10k {
 
/* protects shared structure data */
spinlock_t data_lock;
-   /* protects: ar->txqs, artxq->list */
-   spinlock_t txqs_lock;
 
-   struct list_head txqs;
struct list_head arvifs;
struct list_head peers;
struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
diff --git a/drivers/net/wireless/ath/ath10k/htc.h 
b/drivers/net/wireless/ath/ath10k/htc.h
index 51fda6c23f69..cb30add7dd33 100644
--- a/drivers/net/wireless/ath/ath10k/htc.h
+++ b/drivers/net/wireless/ath/ath10k/htc.h
@@ -51,7 +51,6 @@ struct ath10k;
  */
 
 #define HTC_HOST_MAX_MSG_PER_RX_BUNDLE8
-#define HTC_HOST_MAX_MSG_PER_TX_BUNDLE16
 
 enum ath10k_htc_tx_flags {
ATH10K_HTC_FLAG_NEED_CREDIT_UPDATE = 0x01,
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index f42bac204ef8..dc9895f2eb9d 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2596,6 +2596,7 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k *ar, 
struct sk_buff *skb)
u8 tid;
int ret;
int i;
+   bool may_tx;
 
ath10k_dbg(ar, ATH10K_DBG_HTT, "htt rx tx fetch ind\n");
 
@@ -2668,8 +2669,13 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k 
*ar, struct sk_buff *skb)
num_msdus = 0;
num_bytes = 0;
 
+   ieee80211_txq_schedule_start(hw, txq->ac);
+   may_tx = ieee80211_txq_may_transmit(hw, txq);
while (num_msdus < max_num_msdus &&
   num_bytes < max_num_bytes) {
+   if (!may_tx)
+   break;
+
ret = ath10k_mac_tx_push_txq(hw, txq);
if (ret < 0)
break;
@@ -2677,6 +2683,8 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k *ar, 
struct sk_buff *skb)
num_msdus++;
num_bytes += ret;
}
+   ieee80211_return_txq(hw, txq);
+   ieee80211_txq_schedule_end(hw, txq->ac);
 
record->num_msdus = cpu_to_le16(num_msdus);
record->num_bytes = cpu_to_le32(num_bytes);
diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 49758490eaba..cf7fdf72e990 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3890,7 +3890,6 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq)
 
 static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
 {
-   struct ath10k_txq *artxq;
struct ath10k_skb_cb *cb;
struct sk_buff *msdu;
int msdu_id;
@@ -38

Re: [PATCH v7 1/4] mac80211: Expose ieee80211_schedule_txq() function

2019-01-22 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Since we reworked ieee80211_return_txq() so it assumes that the caller
> takes care of logging, we need another function that can be called without
> holding any locks. Introduce ieee80211_schedule_txq() which serves this
> purpose.

Hmm, messed up the version number on this one, sorry about that. It was
supposed to be v6 along with the rest...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-12-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Thu, 2018-11-15 at 09:10 -0800, Toke Høiland-Jørgensen wrote:
>> Louie Lu  writes:
>> 
>> > Hi Rajkumar, Toke,
>> > 
>> > I found the series (v3,4/6) remove the debugfs remove reset station's
>> > airtime method, and didn't added at here.
>> > 
>> > Not sure how to help this kind of situation, do I need a separate
>> > patch to fix this, or posting the patch here is fine?
>> 
>> This is fine; we can fold it into the next version. Thanks :)
>
> Just FYI - I'm going to assume, given this comment and the long
> discussion, that there will be a next version :)

Yes. Got caught up in moving before I managed to get another version
out. Will get around to it eventually, promise! :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-12-04 Thread Toke Høiland-Jørgensen
Felix Fietkau  writes:

>> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
>> index aa4afbf0abaf..a1f1256448f5 100644
>> --- a/net/mac80211/status.c
>> +++ b/net/mac80211/status.c
>> @@ -818,6 +818,12 @@ static void __ieee80211_tx_status(struct ieee80211_hw 
>> *hw,
>>  ieee80211_sta_tx_notify(sta->sdata, (void *) skb->data,
>>  acked, info->status.tx_time);
>>  
>> +if (info->status.tx_time &&
>> +wiphy_ext_feature_isset(local->hw.wiphy,
>> +
>> NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
>> +ieee80211_sta_register_airtime(>sta, tid,
>> +   info->status.tx_time, 0);
>> +
>>  if (ieee80211_hw_check(>hw, REPORTS_TX_ACK_STATUS)) {
>>  if (info->flags & IEEE80211_TX_STAT_ACK) {
>>  if (sta->status_stats.lost_packets)
> I think the same is needed in ieee80211_tx_status_ext.

So finally circled back to this. In ieee80211_tx_status_ext() we don't
have an skb, so we don't know which TID the packet was sent to; what
airtime information would the driver actually provide in this case? Is
it an aggregate of all ACs, or?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-19 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

> Toke Høiland-Jørgensen  writes:
>
>> Felix Fietkau  writes:
>>
>>> On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:
>>>>> This part doesn't really make much sense to me, but maybe I'm
>>>>> misunderstanding how the code works.
>>>>> Let's assume we have a driver like ath9k or mt76, which tries to keep a
>>>>> number of aggregates in the hardware queue, and the hardware queue is
>>>>> currently empty.
>>>>> If the current txq entry is kept at the head of the schedule list,
>>>>> wouldn't the code just pull from that one over and over again, until
>>>>> enough packets are transmitted by the hardware and their tx status
>>>>> processed?
>>>>> It seems to me that while fairness is still preserved in the long run,
>>>>> this could lead to rather bursty scheduling, which may not be
>>>>> particularly latency friendly.
>>>> 
>>>> Yes, it'll be a bit more bursty when the hardware queue is completely
>>>> empty. However, when a TX completion comes back, that will adjust the
>>>> deficit of that sta and cause it to be rotated on the next dequeue. This
>>>> obviously relies on the fact that the lower-level hardware queue is
>>>> sufficiently shallow to not add a lot of latency. But we want that to be
>>>> the case anyway. In practice, it works quite well for ath9k, but not so
>>>> well for ath10k because it has a large buffer in firmware.
>>>> 
>>>> If we requeue the TXQ at the end of the list, a station that is taking
>>>> up too much airtime will fail to be throttled properly, so the
>>>> queue-at-head is kinda needed to ensure fairness...
>>> Thanks for the explanation, that makes sense to me. I have an idea on
>>> how to mitigate the burstiness within the driver. I'll write it down in
>>> pseudocode, please let me know if you think that'll work.
>>
>> I don't think it will, unfortunately. For example, consider the case
>> where there are two stations queued; one with a large negative deficit
>> (say, -10ms), and one with a positive deficit.
>
> Perhaps a flag for one way or the other?
>
> if(driver->has_absurd_hardware_queue_depth) doitthisway(); else
> doitabetterway();

Well, there's going to be a BQL-like queue limit (but for airtime) on
top, which drivers can opt-in to if the hardware has too much queueing.

>> In this case, we really need to throttle the station with a negative
>> deficit. But if the driver loops and caches txqs, we'll get something
>> like the following:
>>
>> - First driver loop iteration: returns TXQ with positive deficit.
>> - Second driver loop iteration: Only the negative-deficit TXQ is in the
>>   mac80211 list, so it will loop until that TXQ's deficit turns positive
>>   and return it.
>>
>> Because of this, the negative-deficit station won't be throttled, and we
>> won't get fairness.
>>
>> How many frames will mt76 queue up below the driver point? I.e., how
>> much burstiness are you expecting this will introduce on that driver?
>>
>> Taking a step back, it's clear that it would be good to be able to
>> dequeue packets to multiple STAs at once (we need that for MU-MIMO on
>> ath10k as well). However, I don't think we can do that with the
>> round-robin fairness scheduler; so we are going to need a different
>> algorithm. I *think* it may be possible to do this with a virtual-time
>> scheduler, but I haven't sat down and worked out the details yet...
>
> The answer to which did not fit on the margins of your thesis. :)
>
> I too have been trying to come up with a better means of gang
> scheduling... for about 2 years now. In terms of bitmaps it looks a bit
> like QFQ, but honestly...

It's not the gang scheduling we need, deciding which devices to send to
at once is generally done in firmware anyway. We just need to be able to
dequeue packets for more than one station when possible. I don't think
we need the fancy bitmap stuff from QFQ since we don't have that many
stations to schedule at once; so we can probably live with O(log(n)) in
the number of active stations.

> Is there going to be some point where whatever we have here is
> significantly better than what we had? Or not significantly worse? Or
> handwavy enough to fix the rest once enlightenment arrives?
>
> The perfect is the enemy of the good.

Well, what we have now works for ath9k, works reasonably well for ath10k
in pull mode, not so well for ath10k in push mode, and then there's
Felix' comments in this thread...

> I'd rather like the intel folk to be weighing in on this stuff, too,
> trying to get an API right requires use cases.

Johannes has already reviewed a previous version, and I do believe he
said he'd review it again once we have converged on something :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-19 Thread Toke Høiland-Jørgensen
Hi Felix

Thinking a bit more about this, I think that rather than having the
driver work around the API as in your example...

> do {
>   struct ieee80211_txq *pending_txq[4];
>   int n_pending_txq = 0;
>   int i;
>
>   if (hwq->pending < 4)
>   break;p
>
>   nframes = 0;
>
>   ieee80211_txq_schedule_start(hw, ac)
>   do {
>   bool requeue = false;
>
>   struct ieee80211_txq *txq;
>
>   txq = ieee80211_next_txq(hw, ac);
>   if (!txq)
>   break;
>
>   nframes += schedule_txq(txq, );
>   if (requeue)
>   pending_txq[n_pending_txq++] = txq;
>
>   } while (n_pending_txq < ARRAY_SIZE(pending_txq));
>
>   for (i = n_pending_txq; i > 0; i--)
>   ieee80211_return_txq(hw, pending_txq[i - 1]);
>
>   ieee80211_txq_schedule_end(hw, ac)
> } while (nframes);

... really what we want is that the driver can just do this:

ieee80211_txq_schedule_start(hw, ac);
while ((txq = ieee80211_next_txq(hw, ac)) {
schedule_txq(txq, );
return_txq(txq);
}
ieee80211_txq_schedule_end(hw, ac);

and expect so get through all eligible TXQs. Note that there will be
cases where there is only a single eligible TXQ (such as the example I
gave in the other email); in which case the current version is fine. But
there is (probably) also going to be cases where more than one TXQ is
eligible at the same time, which we cannot handle with the current RR
scheduler.

However, I think that assuming we can get the scheduler to guarantee
that it will return all eligible TXQs between each pair of calls to
schedule_start()/schedule_end(), we should be fine with the current API.
Do you agree with this?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v5 0/6] Move TXQ scheduling and airtime fairness into mac80211

2018-12-19 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> All,
>
> Sorry for the long delay. Here is the consolidated series of mac80211,
> ath9k and ath10k changes for moving TXQ scheduling and airtime fairness
> into mac80211 and support airtime fairness.

Thanks for taking care of the respin!

Johannes, I think it's safe for you to review this version. I have
started working on another approach for the scheduler algorithm itself,
but it is fine to do as an incremental patch on top of this as it
shouldn't impact the API. Having a separate patch also makes testing
easier anyway. I'll see if I can't get it to a working state over the
holidays...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: remove iteration in wake_tx_queue

2019-04-01 Thread Toke Høiland-Jørgensen
Erik Stromdahl  writes:

> Iterating the TX queue and thereby dequeuing all available packets in the
> queue could result in performance penalties on some SMP systems.
>
> The reason for this is most likely that the per-ac lock (active_txq_lock)
> in mac80211 will be held by the CPU iterating the current queue.
>
> This will lock up other CPUs trying to push new messages on the TX
> queue.
>
> Instead of iterating the queue we fetch just one packet at the time,
> resulting in minimal starvation of the other CPUs.

Did you test this with Felix' patches reducing the time the lock is held
in mac80211?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear  writes:
>> 
>>> On 2/21/19 8:10 AM, Kalle Valo wrote:
>>>> Toke Høiland-Jørgensen  writes:
>>>>
>>>>> Grant Grundler  writes:
>>>>>
>>>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  
>>>>>> wrote:
>>>>>>>
>>>>>>> Grant Grundler  writes:
>>>>>>>
>>>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>>>> environment where the time series graph shows that throughput is all 
>>>>>>>>> over
>>>>>>>>> the place for the duration of the test; so it's hard to draw solid
>>>>>>>>> conclusions from (for instance, for the 5-stream test, the average
>>>>>>>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 
>>>>>>>>> 7
>>>>>>>>> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>>>>>>>>> used in this test, so I can't go verify it myself; so the only thing I
>>>>>>>>> can do is grumble about it here... :)
>>>>>>>>
>>>>>>>> It's a fair complaint and I agree with it. My counter argument is the
>>>>>>>> opposite is true too: most ideal benchmarks don't measure what most
>>>>>>>> users see. While the data wgong provided are way more noisy than I
>>>>>>>> like, my overall "confidence" in the "conclusion" I offered is still
>>>>>>>> positive.
>>>>>>>
>>>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>>>> evaluation to base a 4x increase in buffer size on...
>>>>>>
>>>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>>>
>>>>>> Toke, what else would you like to see evaluated?
>>>>>>
>>>>>> I generally want to see three things measured when "benchmarking"
>>>>>> technologies: throughput, latency, cpu utilization
>>>>>> We've covered those three I think "reasonably".
>>>>>
>>>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>>>> patch), I think I had two main concerns:
>>>>>
>>>>> 1. What happens in a degraded signal situation, where the throughput is
>>>>>  limited by the signal conditions, or by contention with other 
>>>>> devices.
>>>>>  Both of these happen regularly, and I worry that latency will be
>>>>>  badly affected under those conditions.
>>>>>
>>>>> 2. What happens with old hardware that has worse buffer management in
>>>>>  the driver->firmware path (especially drivers without push/pull mode
>>>>>  support)? For these, the lower-level queueing structure is less
>>>>>  effective at controlling queueing latency.
>>>>
>>>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>>>> PCI devices, which IIRC do not even support push/pull mode. All the
>>>> rest, including QCA988X and QCA9984 are unaffected.
>>>
>>> Just as a note, at least kernels such as 4.14.whatever perform poorly when
>>> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
>>> really usable for stuff like serving video to lots of clients.
>>>
>>> Tweaking TCP (I do it a bit differently, but either way) can significantly
>>> improve performance.
>> 
>> Differently how? Did you have to do more than fiddle with the pacing_shift?
>
> This one, or a slightly tweaked version that applies to different kernels:
>
> https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5

Right; but the current mac80211 default (pacing shift 8) corresponds to
setting your sysctl to 4...

>>> Recently I helped a user that could get barely 70 stations streaming
>>> at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
>>> and we got 110 working with a tweaked TCP stack. These were /n
>>> stations too.
>>>
>>> I think it is lame that it _still_ requires out of tree patches to
>>> make TCP work well on ath10k...even if you want to default to current
>>> behaviour, you should allow users to tweak it to work with their use
>>> case.
>> 
>> Well if TCP is broken to the point of being unusable I do think we
>> should fix it; but I think "just provide a configuration knob" should be
>> the last resort...
>
> So, it has been broken for years, and waiting for a perfect solution
> has not gotten the problem fixed.

Well, the current default should at least be closer to something that
works well.

I do think I may have erred on the wrong side of the optimum when I
submitted the original patch to set the default to 8; that should
probably have been 7 (i.e., 8 ms; the optimum in the evaluation we did
was around 6 ms, which is sadly not a power of two). Maybe changing that
default is actually better than having to redo the testing for all the
different devices as we're discussing in the context of this patch.
Maybe I should just send a patch to do that...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] mac80211: Change default tx_sk_pacing_shift to 7

2019-02-21 Thread Toke Høiland-Jørgensen
When we did the original tests for the optimal value of sk_pacing_shift, we
came up with 6 ms of buffering as the default. Sadly, 6 is not a power of
two, so when picking the shift value I erred on the size of less buffering
and picked 4 ms instead of 8. This was probably wrong; those 2 ms of extra
buffering makes a larger difference than I thought.

So, change the default pacing shift to 7, which corresponds to 8 ms of
buffering. The point of diminishing returns really kicks in after 8 ms, and
so having this as a default should cut down on the need for extensive
per-device testing and overrides needed in the drivers.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 5055aeba5c5a..800e67615e2a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -617,13 +617,13 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
 * We need a bit of data queued to build aggregates properly, so
 * instruct the TCP stack to allow more than a single ms of data
 * to be queued in the stack. The value is a bit-shift of 1
-* second, so 8 is ~4ms of queued data. Only affects local TCP
+* second, so 7 is ~8ms of queued data. Only affects local TCP
 * sockets.
 * This is the default, anyhow - drivers may need to override it
 * for local reasons (longer buffers, longer completion time, or
 * similar).
 */
-   local->hw.tx_sk_pacing_shift = 8;
+   local->hw.tx_sk_pacing_shift = 7;
 
/* set up some defaults */
local->hw.queues = 1;
-- 
2.20.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 8:10 AM, Kalle Valo wrote:
>> Toke Høiland-Jørgensen  writes:
>> 
>>> Grant Grundler  writes:
>>>
>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>>>>>
>>>>> Grant Grundler  writes:
>>>>>
>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>> environment where the time series graph shows that throughput is all 
>>>>>>> over
>>>>>>> the place for the duration of the test; so it's hard to draw solid
>>>>>>> conclusions from (for instance, for the 5-stream test, the average
>>>>>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>>>>>>> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>>>>>>> used in this test, so I can't go verify it myself; so the only thing I
>>>>>>> can do is grumble about it here... :)
>>>>>>
>>>>>> It's a fair complaint and I agree with it. My counter argument is the
>>>>>> opposite is true too: most ideal benchmarks don't measure what most
>>>>>> users see. While the data wgong provided are way more noisy than I
>>>>>> like, my overall "confidence" in the "conclusion" I offered is still
>>>>>> positive.
>>>>>
>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>> evaluation to base a 4x increase in buffer size on...
>>>>
>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>
>>>> Toke, what else would you like to see evaluated?
>>>>
>>>> I generally want to see three things measured when "benchmarking"
>>>> technologies: throughput, latency, cpu utilization
>>>> We've covered those three I think "reasonably".
>>>
>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>> patch), I think I had two main concerns:
>>>
>>> 1. What happens in a degraded signal situation, where the throughput is
>>> limited by the signal conditions, or by contention with other devices.
>>> Both of these happen regularly, and I worry that latency will be
>>> badly affected under those conditions.
>>>
>>> 2. What happens with old hardware that has worse buffer management in
>>> the driver->firmware path (especially drivers without push/pull mode
>>> support)? For these, the lower-level queueing structure is less
>>> effective at controlling queueing latency.
>> 
>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>> PCI devices, which IIRC do not even support push/pull mode. All the
>> rest, including QCA988X and QCA9984 are unaffected.
>
> Just as a note, at least kernels such as 4.14.whatever perform poorly when
> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
> really usable for stuff like serving video to lots of clients.
>
> Tweaking TCP (I do it a bit differently, but either way) can significantly
> improve performance.

Differently how? Did you have to do more than fiddle with the pacing_shift?

> Recently I helped a user that could get barely 70 stations streaming
> at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
> and we got 110 working with a tweaked TCP stack. These were /n
> stations too.
>
> I think it is lame that it _still_ requires out of tree patches to
> make TCP work well on ath10k...even if you want to default to current
> behaviour, you should allow users to tweak it to work with their use
> case.

Well if TCP is broken to the point of being unusable I do think we
should fix it; but I think "just provide a configuration knob" should be
the last resort...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Grant Grundler  writes:

> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>>
>> Grant Grundler  writes:
>>
>> >> And, well, Grant's data is from a single test in a noisy
>> >> environment where the time series graph shows that throughput is all over
>> >> the place for the duration of the test; so it's hard to draw solid
>> >> conclusions from (for instance, for the 5-stream test, the average
>> >> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>> >> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>> >> used in this test, so I can't go verify it myself; so the only thing I
>> >> can do is grumble about it here... :)
>> >
>> > It's a fair complaint and I agree with it. My counter argument is the
>> > opposite is true too: most ideal benchmarks don't measure what most
>> > users see. While the data wgong provided are way more noisy than I
>> > like, my overall "confidence" in the "conclusion" I offered is still
>> > positive.
>>
>> Right. I guess I would just prefer a slightly more comprehensive
>> evaluation to base a 4x increase in buffer size on...
>
> Kalle, is this why you didn't accept this patch? Other reasons?
>
> Toke, what else would you like to see evaluated?
>
> I generally want to see three things measured when "benchmarking"
> technologies: throughput, latency, cpu utilization
> We've covered those three I think "reasonably".

Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
   limited by the signal conditions, or by contention with other devices.
   Both of these happen regularly, and I worry that latency will be
   badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
   the driver->firmware path (especially drivers without push/pull mode
   support)? For these, the lower-level queueing structure is less
   effective at controlling queueing latency.

Getting the queue size limit patches from ChromeOS ported would
alleviate point 2. I do believe Kan said he'd look into that once the
airtime patches were merged. So Kan, any progress on that front? :)

> What does a "4x increase in memory" mean here? Wen, how much more
> memory does this cause ath10k to use?

I didn't say "memory", I said "buffer size"... :)
I.e., it's the latency impact of the increased buffering I'm worried
about (see above), not the system memory usage.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] mac80211: Change default tx_sk_pacing_shift to 7

2019-02-22 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> Toke Høiland-Jørgensen wrote:
>
>> When we did the original tests for the optimal value of sk_pacing_shift, we
>> came up with 6 ms of buffering as the default. Sadly, 6 is not a power of
>> two, so when picking the shift value I erred on the size of less buffering
>> and picked 4 ms instead of 8. This was probably wrong; those 2 ms of extra
>> buffering makes a larger difference than I thought.
>> 
>> So, change the default pacing shift to 7, which corresponds to 8 ms of
>> buffering. The point of diminishing returns really kicks in after 8 ms, and
>> so having this as a default should cut down on the need for extensive
>> per-device testing and overrides needed in the drivers.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen 
>
> Patch applied to wireless.git, thanks.
>
> a41f56b9a17a (HEAD -> mac80211) mac80211: Change default
> tx_sk_pacing_shift to 7

Cool, thanks! What's the easiest way to backport this? I figure it's
easier to just update sk_pacing_shift_update() in tx.c for 4.19 (which
predates the addition of the driver override hook); shall I just send a
separate patch to stable for that? Or do we need to backport the driver
override hook as well?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] mac80211: Change default tx_sk_pacing_shift to 7

2019-02-22 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-02-22 at 14:06 +0100, Toke Høiland-Jørgensen wrote:
>> Johannes Berg  writes:
>> 
>> > Toke Høiland-Jørgensen wrote:
>> > 
>> > > When we did the original tests for the optimal value of sk_pacing_shift, 
>> > > we
>> > > came up with 6 ms of buffering as the default. Sadly, 6 is not a power of
>> > > two, so when picking the shift value I erred on the size of less 
>> > > buffering
>> > > and picked 4 ms instead of 8. This was probably wrong; those 2 ms of 
>> > > extra
>> > > buffering makes a larger difference than I thought.
>> > > 
>> > > So, change the default pacing shift to 7, which corresponds to 8 ms of
>> > > buffering. The point of diminishing returns really kicks in after 8 ms, 
>> > > and
>> > > so having this as a default should cut down on the need for extensive
>> > > per-device testing and overrides needed in the drivers.
>> > > 
>> > > Signed-off-by: Toke Høiland-Jørgensen 
>> > 
>> > Patch applied to wireless.git, thanks.
>> > 
>> > a41f56b9a17a (HEAD -> mac80211) mac80211: Change default
>> > tx_sk_pacing_shift to 7
>
> This mess came from Kalle's tool btw, so I can't really use it yet :-)
>
>> Cool, thanks! What's the easiest way to backport this? I figure it's
>> easier to just update sk_pacing_shift_update() in tx.c for 4.19 (which
>> predates the addition of the driver override hook); shall I just send a
>> separate patch to stable for that? Or do we need to backport the driver
>> override hook as well?
>
> Just update the value, no need to backport the hook.

And send it directly to stable, or does it need to go through you?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 9:15 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear  writes:
>> 
>>> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear  writes:
>>>>
>>>>> On 2/21/19 8:10 AM, Kalle Valo wrote:
>>>>>> Toke Høiland-Jørgensen  writes:
>>>>>>
>>>>>>> Grant Grundler  writes:
>>>>>>>
>>>>>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Grant Grundler  writes:
>>>>>>>>>
>>>>>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>>>>>> environment where the time series graph shows that throughput is 
>>>>>>>>>>> all over
>>>>>>>>>>> the place for the duration of the test; so it's hard to draw solid
>>>>>>>>>>> conclusions from (for instance, for the 5-stream test, the average
>>>>>>>>>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and 
>>>>>>>>>>> for 7
>>>>>>>>>>> it's 326 and 371 Mbps) . Unfortunately I don't have the same 
>>>>>>>>>>> hardware
>>>>>>>>>>> used in this test, so I can't go verify it myself; so the only 
>>>>>>>>>>> thing I
>>>>>>>>>>> can do is grumble about it here... :)
>>>>>>>>>>
>>>>>>>>>> It's a fair complaint and I agree with it. My counter argument is the
>>>>>>>>>> opposite is true too: most ideal benchmarks don't measure what most
>>>>>>>>>> users see. While the data wgong provided are way more noisy than I
>>>>>>>>>> like, my overall "confidence" in the "conclusion" I offered is still
>>>>>>>>>> positive.
>>>>>>>>>
>>>>>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>>>>>> evaluation to base a 4x increase in buffer size on...
>>>>>>>>
>>>>>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>>>>>
>>>>>>>> Toke, what else would you like to see evaluated?
>>>>>>>>
>>>>>>>> I generally want to see three things measured when "benchmarking"
>>>>>>>> technologies: throughput, latency, cpu utilization
>>>>>>>> We've covered those three I think "reasonably".
>>>>>>>
>>>>>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>>>>>> patch), I think I had two main concerns:
>>>>>>>
>>>>>>> 1. What happens in a degraded signal situation, where the throughput is
>>>>>>>   limited by the signal conditions, or by contention with other 
>>>>>>> devices.
>>>>>>>   Both of these happen regularly, and I worry that latency will be
>>>>>>>   badly affected under those conditions.
>>>>>>>
>>>>>>> 2. What happens with old hardware that has worse buffer management in
>>>>>>>   the driver->firmware path (especially drivers without push/pull 
>>>>>>> mode
>>>>>>>   support)? For these, the lower-level queueing structure is less
>>>>>>>   effective at controlling queueing latency.
>>>>>>
>>>>>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>>>>>> PCI devices, which IIRC do not even support push/pull mode. All the
>>>>>> rest, including QCA988X and QCA9984 are unaffected.
>>>>>
>>>>> Just as a note, at least kernels such as 4.14.whatever perform poorly when
>>>>> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
>>>>> really usable for stuff like serving video to lots of clients.
>>>>>
>>>>> Tweaking TCP (I do it a bit differently, but either way) can significantly
>>>>> improve performance.
>>>>
>>>> Differentl

Re: [PATCH] mac80211: Change default tx_sk_pacing_shift to 7

2019-02-23 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-02-22 at 14:40 +0100, Toke Høiland-Jørgensen wrote:
>> 
>> And send it directly to stable, or does it need to go through you?
>
> I added a Cc: stable tag. So all you really need to do is wait for the
> emails telling you it failed to apply, and handle accordingly :)

Cool, that I can do; thanks! :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 4/4] ath10k: reporting estimated tx airtime for fairness

2019-01-25 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> Hi Toke,
>
> This patch looks good to me. Great job on getting the airtime fairness
> scheduling framework done!

Great, thanks for checking! Have another patch to test on top of this
series, then we should look into getting the airtime queue limit patch
into mac80211 as well :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v7 1/4] mac80211: Expose ieee80211_schedule_txq() function

2019-01-25 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

>> +void ieee80211_schedule_txq(struct ieee80211_hw *hw,
>> +struct ieee80211_txq *txq)
>> +__acquires(txq_lock) __releases(txq_lock)
>> +{
>> +struct ieee80211_local *local = hw_to_local(hw);
>> +struct txq_info *txqi = to_txq_info(txq);
>> +
>> +spin_lock_bh(>active_txq_lock[txq->ac]);
>> +ieee80211_return_txq(hw, txq);
>> +spin_unlock_bh(>active_txq_lock[ac]);
>> 
> Maybe v6 had txq->ac here instead of just ac which doesn't compile ;-)
>
> I fixed it up, but I hope you tested a compiling version :P

Ah, right, thanks for fixing it! I think this was an artifact of moving
things around while rebasing for submission (I have another patch on top
of this series that I need to test first).

So yeah, I definitely have a version in my tree somewhere that actually
compiles ;)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: remove iteration in wake_tx_queue

2019-04-16 Thread Toke Høiland-Jørgensen
Erik Stromdahl  writes:

> On 4/1/19 1:05 PM, Toke Høiland-Jørgensen wrote:
>> Erik Stromdahl  writes:
>> 
>>> Iterating the TX queue and thereby dequeuing all available packets in the
>>> queue could result in performance penalties on some SMP systems.
>>>
>>> The reason for this is most likely that the per-ac lock (active_txq_lock)
>>> in mac80211 will be held by the CPU iterating the current queue.
>>>
>>> This will lock up other CPUs trying to push new messages on the TX
>>> queue.
>>>
>>> Instead of iterating the queue we fetch just one packet at the time,
>>> resulting in minimal starvation of the other CPUs.
>> 
>> Did you test this with Felix' patches reducing the time the lock is held
>> in mac80211?
>> 
>> -Toke
>> 
> Hi Toke,
>
> I am not aware of these patches. Can you please point them out for me?

They've already been merged. Commits dcec1d9bc8a7 and 7ef769459f14 in
mac80211-next :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] ath10k: fix different tx duration output

2019-04-17 Thread Toke Høiland-Jørgensen
Lei Wang  writes:

> TX duration output of tx_stats in debugfs and station dump had big
> difference because they got tx duration value from different statistic
> data. We should use the same statistic data.

So are you sure you picked the most accurate one of the two? :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] ath10k: fix different tx duration output

2019-05-07 Thread Toke Høiland-Jørgensen
le...@codeaurora.org writes:

> On 2019-04-18 16:07, Toke Høiland-Jørgensen wrote:
>> le...@codeaurora.org writes:
>> 
>>> On 2019-04-17 17:26, Toke Høiland-Jørgensen wrote:
>>>> Lei Wang  writes:
>>>> 
>>>>> TX duration output of tx_stats in debugfs and station dump had big
>>>>> difference because they got tx duration value from different 
>>>>> statistic
>>>>> data. We should use the same statistic data.
>>>> 
>>>> So are you sure you picked the most accurate one of the two? :)
>>>> 
>>>> -Toke
>>> 
>>> Hi Toke,
>>> 
>>> Yes.
>>> Now for ath10k, there are two ways to get tx duration output.
>>> One is got from tx_stats in debugfs reported by firmware. It is a 
>>> total
>>> value including all the frames which created by host and firmware sent
>>> to the peer.
>>> And the second is calculated from
>>> ath10k_htt_rx_tx_compl_ind()-->ieee80211_sta_register_airtime(), here
>>> the tx duration just includes the data frames sent from host to the
>>> peer.
>> 
>> So the difference is that the former includes control frames as well? 
>> Is
>> that the only difference? And what exactly is a "big difference" (from
>> the commit message)?
>> 
> Yes,it adds the duration time of receiving ACK frames.
>  From my test,TX from AP to station with iperf UDP test in 
> 10s,tx_stats->tx_duration:5496623us,
> and another value is 3934327us.

Hmm, that's quite a big difference. Is this really only ACKs, or is it
also a question of whether retries are accounted? If so, it may actually
be that what we should do is change which value is passed to
ieee80211_sta_register_airtime()?

>>> So the first value is preferable for station dump.
>> 
>> Hmm, I'm not sure if I agree with this. I specifically added the
>> tx_duration to the station dump to be able to get the values used by 
>> the
>> airtime scheduler. This breaks with this patch.
>> 
>> -Toke
>  From our internal discussing, we will revert this change.

Cool, but see above :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] mac80211: Fix kernel panic due to use of txq after free

2019-04-16 Thread Toke Høiland-Jørgensen
Bhagavathi Perumal S  writes:

> The txq of vif is added to active_txqs list for ATF TXQ scheduling
> in the function ieee80211_queue_skb(), but it was not properly removed
> before freeing the txq object. It was causing use after free of the txq
> objects from the active_txqs list, result was kernel panic
> due to invalid memory access.
>
> Fix kernel invalid memory access by properly removing txq object
> from active_txqs list before free the object.
>
> Signed-off-by: Bhagavathi Perumal S 

Nice catch, thanks!

Acked-by: Toke Høiland-Jørgensen 

This should probably have a fixes tag:

Fixes: 1866760096bf ("mac80211: Add TXQ scheduling API")

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-22 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Wednesday, August 21, 2019 6:10 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] RE: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> 
>> Wen Gong  writes:
>> 
>> >> -Original Message-
>> >> From: ath10k  On Behalf Of Toke
>> >> Høiland-Jørgensen
>> >> Sent: Tuesday, August 20, 2019 8:24 PM
>> >> To: Wen Gong ; ath10k@lists.infradead.org
>> >> Cc: linux-wirel...@vger.kernel.org
>> >> Subject: [EXT] Re: [PATCH 4/7] ath10k: disable TX complete indication of
>> htt
>> > When this patch applied, firmware will not indicate tx complete for tx
>> > Data, it only indicate HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND,
>> > This htt msg will tell how many data tx complete without status(status
>> maybe success/fail).
>> 
>> Ah, so this is basically a counter of how much data is currently queued
>> in the firmware?
> Yes.
>> 
>> >> And could you explain what the credits thing is for, please? :)
>> > For high latency bus chip, all the tx data's content(include ip/udp/tcp
>> header
>> > and payload) will be transfer to firmware's memory via bus.
>> > And firmware has limited memory for tx data, the tx data's content must
>> > Saved in firmware memory before it tx complete, if ath10k transfer tx
>> > data more than the limit, firmware will occur error. The credit is used
>> > to avoid ath10k exceed the limit.
>> 
>> What's a typical limit in the firmware?
> It is 96 data packet in my test. It can changed in firmware code.

Right, I see. Is this counter available in all ath10k firmware, or is it
SDIO only?

I'm asking because this could also be a way of implementing something
like Byte Queue Limits (though I'm not sure how effective it would be).

-Toke


Re: [PATCH 2/7] ath10k: change max RX bundle size from 8 to 32 for sdio

2019-08-20 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> The max bundle size support by firmware is 32, change it from 8 to 32
> will help performance. This results in significant performance
> improvement on RX path.

What happens when the hardware doesn't have enough data to fill a
bundle? Does it send a smaller one, or does it wait until it can fill
it?

-Toke


Re: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-20 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Tx complete message from firmware cost bus bandwidth of sdio, and bus
> bandwidth is the bollteneck of throughput, it will effect the bandwidth
> occupancy of data packet of TX and RX.
>
> This patch disable TX complete indication from firmware for htt data
> packet, it results in significant performance improvement on TX path.

Wait, how does that work? Am I understanding it correctly that this
replaces a per-packet TX completion with a periodic one sent out of
band?

And could you explain what the credits thing is for, please? :)

-Toke


RE: [PATCH 2/7] ath10k: change max RX bundle size from 8 to 32 for sdio

2019-08-21 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Tuesday, August 20, 2019 8:23 PM
>> To: Wen Gong ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] Re: [PATCH 2/7] ath10k: change max RX bundle size from 8 to
>> 32 for sdio
>> 
>> Wen Gong  writes:
>> 
>> > The max bundle size support by firmware is 32, change it from 8 to 32
>> > will help performance. This results in significant performance
>> > improvement on RX path.
>> 
>> What happens when the hardware doesn't have enough data to fill a
>> bundle? Does it send a smaller one, or does it wait until it can fill
>> it?
>> 
>
> The bundle is filled by firmware. 
> It will not wait until it can fill it.
> For example, if you do ping per second, it will have only 1 ping packet
> In the bundle.

Right, cool.

-Toke


RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-21 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Tuesday, August 20, 2019 8:24 PM
>> To: Wen Gong ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] Re: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> 
>> Wen Gong  writes:
>> 
>> > Tx complete message from firmware cost bus bandwidth of sdio, and bus
>> > bandwidth is the bollteneck of throughput, it will effect the bandwidth
>> > occupancy of data packet of TX and RX.
>> >
>> > This patch disable TX complete indication from firmware for htt data
>> > packet, it results in significant performance improvement on TX path.
>> 
>> Wait, how does that work? Am I understanding it correctly that this
>> replaces a per-packet TX completion with a periodic one sent out of
>> band?
> When this patch applied, firmware will not indicate tx complete for tx
> Data, it only indicate HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND,
> This htt msg will tell how many data tx complete without status(status maybe 
> success/fail).

Ah, so this is basically a counter of how much data is currently queued
in the firmware?

>> And could you explain what the credits thing is for, please? :)
> For high latency bus chip, all the tx data's content(include ip/udp/tcp header
> and payload) will be transfer to firmware's memory via bus.
> And firmware has limited memory for tx data, the tx data's content must
> Saved in firmware memory before it tx complete, if ath10k transfer tx
> data more than the limit, firmware will occur error. The credit is used
> to avoid ath10k exceed the limit.

What's a typical limit in the firmware?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-23 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Thursday, August 22, 2019 8:12 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] RE: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> >> What's a typical limit in the firmware?
>> > It is 96 data packet in my test. It can changed in firmware code.
>> 
>> Right, I see. Is this counter available in all ath10k firmware, or is it
>> SDIO only?
>> 
> It is only for SDIO.
> For PCIE, it does not have credit limit, firmware memory only need to save
> The tx descriptor, not need for palyload.

OK, too bad. Thanks for the explanations :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
> removed from rbtree immediately in the ieee80211_return_txq(), the
> loop will break soon in the ieee80211_next_txq() due to schedule_pos
> not leading to the second txq in the rbtree. Thus, defering the
> removal right before the end of this schedule round.
>
> Co-developed-by: Yibo Zhao 
> Signed-off-by: Yibo Zhao 
> Signed-off-by: Toke Høiland-Jørgensen 

I didn't write this patch, so please don't use my sign-off. I'll add
ack or review tags as appropriate in reply; but a few comments first:

> ---
>  include/net/mac80211.h | 16 ++--
>  net/mac80211/ieee80211_i.h |  3 +++
>  net/mac80211/main.c|  6 +
>  net/mac80211/tx.c  | 63 
> +++---
>  4 files changed, 83 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index ac2ed8e..ba5a345 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>  
>  #define IEEE80211_MAX_TX_RETRY   31
>  
> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
> +
>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate,
> u8 mcs, u8 nss)
>  {
> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct 
> ieee80211_hw *hw,
>   * @ac: AC number to return packets from.
>   *
>   * Should only be called between calls to ieee80211_txq_schedule_start()
> - * and ieee80211_txq_schedule_end().
> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will be added
> + * to a remove list and get removed later.
>   * Returns the next txq if successful, %NULL if no queue is eligible. If a 
> txq
>   * is returned, it should be returned with ieee80211_return_txq() after the
>   * driver has finished scheduling it.
> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
> *hw, u8 ac)
>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>   * @ac: AC number to acquire locks for
>   *
> - * Release locks previously acquired by ieee80211_txq_schedule_end().
> + * Release locks previously acquired by ieee80211_txq_schedule_end(). Check
> + * and remove the empty txq from rb-tree.
>   */
>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>   __releases(txq_lock);
> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw *hw, 
> struct ieee80211_txq *txq)
>   __acquires(txq_lock) __releases(txq_lock);
>  
>  /**
> + * ieee80211_txqs_check - Check txqs waiting for removal
> + *
> + * @tmr: pointer as obtained from local
> + *
> + */
> +void ieee80211_txqs_check(struct timer_list *tmr);
> +
> +/**
>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to transmit
>   *
>   * This function is used to check whether given txq is allowed to transmit by
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index a4556f9..49aa143e 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -847,6 +847,7 @@ struct txq_info {
>   struct codel_stats cstats;
>   struct sk_buff_head frags;
>   struct rb_node schedule_order;
> + struct list_head candidate;
>   unsigned long flags;
>  
>   /* keep last! */
> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>   u64 airtime_v_t[IEEE80211_NUM_ACS];
>   u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>  
> + struct list_head remove_list[IEEE80211_NUM_ACS];
> + struct timer_list remove_timer;
>   u16 airtime_flags;
>  
>   const struct ieee80211_ops *ops;
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index e9ffa8e..78fe24a 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -667,10 +667,15 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
> priv_data_len,
>  
>   for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>   local->active_txqs[i] = RB_ROOT_CACHED;
> + INIT_LIST_HEAD(>remove_list[i]);
>   spin_lock_init(>active_txq_lock[i]);
>   }
>   local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
>  
> + timer_setup(>remove_timer, ieee80211_txqs_check, 0);
> + mod_timer(>remove_timer,
> +   jiffies + 
> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
> +
>   INIT_LIST_HEAD(>chanctx_list);
>   mutex_init(>chanctx_mtx);
>  
> @@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
>   tasklet_kill(>tx_pending_tasklet);
>   tasklet_kill(>tasklet);

Re: [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> From: Toke Høiland-Jørgensen 
>
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
>
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
>
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.
>
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
>
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
>
> Signed-off-by: Toke Høiland-Jørgensen 

I'll note that this patch still has the two issues that Felix pointed
out when I posted the RFC version. Namely:

- The use of divisions in the fast path. I guess I need to go write some
  reciprocal-calculation code, since that is also an issue with the AQL
  patches I linked to before.

- The fact that we don't count the airtime usage of multicast traffic,
  which with this series means that the vif TXQ will get priority over
  the others. I think we agreed to fix this by just adding an airtime
  v_t to the vif as well and use that for scheduling the TXQ. Does
  ath10k report airtime usage for multicast as well, or only for
  stations?


-Toke



Re: [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Global airtime weight sum is updated only when txq is added/removed
> from rbtree. If upper layer configures sta weight during high load,
> airtime weight sum will not be updated since txq is most likely on the
> tree. It could a little late for upper layer to reconfigure sta weight
> when txq is already in the rbtree. And thus, incorrect airtime weight sum
> will lead to incorrect global virtual time calculation as well as global
> airtime weight sum overflow of airtime weight sum during txq removed.
>
> Hence, need to update airtime weight sum upon receiving event for
> configuring sta weight once sta's txq is on the rbtree.
>
> Besides, if airtime weight sum of ACs and sta weight is synced under the
> same per AC lock protection, there can be a very short window causing
> incorrct airtime weight sum calculation as below:
>
> active_txq_lock_VO  .
> VO weight sum is syncd.
> sta airtime weight sum is synced  .
> active_txq_unlock_VO  .
> . .
> active_txq_lock_VI.
> VI weight sum is syncd.
> sta airtime weight sumactive_txq_lock_BE
> active_txq_unlock_VIRemove txq and thus sum
> .   is calculated with synced
> .   sta airtime weight
> . active_txq_unlock_BE
>
> So introduce a per ac synced station airtime weight synced with per
> AC synced weight sum together. And the per-AC station airtime weight
> is used to calculate weight sum.
>
> Signed-off-by: Yibo Zhao 
> ---
>  net/mac80211/cfg.c  | 27 +--
>  net/mac80211/sta_info.c |  6 --
>  net/mac80211/sta_info.h |  3 +++
>  net/mac80211/tx.c   |  4 ++--
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index d65aa01..4b420bb 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   int ret = 0;
>   struct ieee80211_supported_band *sband;
>   struct ieee80211_sub_if_data *sdata = sta->sdata;
> - u32 mask, set;
> + u32 mask, set, tid, ac;
> + struct txq_info *txqi;
>  
>   sband = ieee80211_get_sband(sdata);
>   if (!sband)
> @@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   if (ieee80211_vif_is_mesh(>vif))
>   sta_apply_mesh_params(local, sta, params);
>  
> - if (params->airtime_weight)
> + if (params->airtime_weight &&
> + params->airtime_weight != sta->airtime_weight) {
> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
> + spin_lock_bh(>active_txq_lock[ac]);
> + for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
> + if (!sta->sta.txq[tid] ||
> + ac != ieee80211_ac_from_tid(tid))
> + continue;
> +
> + sta->airtime_weight_synced[ac] =
> + params->airtime_weight;
> +
> + txqi = to_txq_info(sta->sta.txq[tid]);
> + if (RB_EMPTY_NODE(>schedule_order))
> + continue;
> +
> + local->airtime_weight_sum[ac] = 
> local->airtime_weight_sum[ac] +
> + 
> params->airtime_weight -
> + 
> sta->airtime_weight;
> + }
> + spin_unlock_bh(>active_txq_lock[ac]);
> + }
>   sta->airtime_weight = params->airtime_weight;

With this, airtime_weight is basically only used to return to and from
userspace, right? I.e., after the above loop has run, it will match the
contents of airtime_weight_synced; so why not just turn airtime_weight
into  a per-ac array? You could just use airtime_weight[0] as the value
to return to userspace...

-Toke



Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-16 23:27, Johannes Berg wrote:
>> Without really looking at the code -
>> 
>>> If station is ineligible for transmission in 
>>> ieee80211_txq_may_transmit(),
>>> no packet will be delivered to FW. During the tests in push-pull mode 
>>> with
>>> many clients, after several seconds, not a single station is an 
>>> eligible
>>> candidate for transmission since global time is smaller than all the
>>> station's virtual airtime. As a consequence, the Tx has been blocked 
>>> and
>>> throughput is quite low.
>> 
>> You should rewrite this to be, erm, a bit more understandable in
>> mac80211 context. I assume you're speaking (mostly?) about ath10k, but 
>> I
>> have very little context there. "push pull mode"? "firmware"? These
>> things are not something mac80211 knows about.
> Hi Johannes,
>
> Thanks for your kindly reminder. Will rewrite the commit log.
>
>> 
>>> Co-developed-by: Yibo Zhao 
>> 
>> That also seems wrong, should be Toke I guess, unless you intended for 
>> a
>> From: Toke to be present?
> Do you mean it should be something like:
>
> Co-developed-by: Toke Høiland-Jørgensen 
> Signed-off-by: Yibo Zhao 
> Signed-off-by: Toke Høiland-Jørgensen 
>
> Am I understanding right?

I think the right thing here, as with the previous patch, is to just
drop my sign-off; you're writing this patch, and I'll add ack/reviews as
appropriate. And in that case, well, no need to have co-developed-by
yourself when your name is on the patch as author :)

-Toke



Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-18 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 05:12, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-16 23:27, Johannes Berg wrote:
>>>> Without really looking at the code -
>>>> 
>>>>> If station is ineligible for transmission in
>>>>> ieee80211_txq_may_transmit(),
>>>>> no packet will be delivered to FW. During the tests in push-pull 
>>>>> mode
>>>>> with
>>>>> many clients, after several seconds, not a single station is an
>>>>> eligible
>>>>> candidate for transmission since global time is smaller than all the
>>>>> station's virtual airtime. As a consequence, the Tx has been blocked
>>>>> and
>>>>> throughput is quite low.
>>>> 
>>>> You should rewrite this to be, erm, a bit more understandable in
>>>> mac80211 context. I assume you're speaking (mostly?) about ath10k, 
>>>> but
>>>> I
>>>> have very little context there. "push pull mode"? "firmware"? These
>>>> things are not something mac80211 knows about.
>>> Hi Johannes,
>>> 
>>> Thanks for your kindly reminder. Will rewrite the commit log.
>>> 
>>>> 
>>>>> Co-developed-by: Yibo Zhao 
>>>> 
>>>> That also seems wrong, should be Toke I guess, unless you intended 
>>>> for
>>>> a
>>>> From: Toke to be present?
>>> Do you mean it should be something like:
>>> 
>>> Co-developed-by: Toke Høiland-Jørgensen 
>>> Signed-off-by: Yibo Zhao 
>>> Signed-off-by: Toke Høiland-Jørgensen 
>>> 
>>> Am I understanding right?
>> 
>> I think the right thing here, as with the previous patch, is to just
>> drop my sign-off; you're writing this patch, and I'll add ack/reviews 
>> as
>> appropriate. And in that case, well, no need to have co-developed-by
>> yourself when your name is on the patch as author :)
>> 
>> -Toke
> Sorry, I think I have missed checking your reply, please ignore the 
> wrong signed-off in PATCH-V2.

While you're re-spinning, could you please add a changelog for the
changes you make? Makes it easier to keep track :)

You can add a cover-letter with a full changelog instead of having a
separate changelog for each patch; that's what I usually do...

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-18 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>> loop will break soon in the ieee80211_next_txq() due to schedule_pos
>>> not leading to the second txq in the rbtree. Thus, defering the
>>> removal right before the end of this schedule round.
>>> 
>>> Co-developed-by: Yibo Zhao 
>>> Signed-off-by: Yibo Zhao 
>>> Signed-off-by: Toke Høiland-Jørgensen 
>> 
>> I didn't write this patch, so please don't use my sign-off. I'll add
>> ack or review tags as appropriate in reply; but a few comments first:
>> 
>>> ---
>>>  include/net/mac80211.h | 16 ++--
>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>  net/mac80211/main.c|  6 +
>>>  net/mac80211/tx.c  | 63 
>>> +++---
>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>> index ac2ed8e..ba5a345 100644
>>> --- a/include/net/mac80211.h
>>> +++ b/include/net/mac80211.h
>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>> 
>>>  #define IEEE80211_MAX_TX_RETRY 31
>>> 
>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>> +
>>>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate 
>>> *rate,
>>>   u8 mcs, u8 nss)
>>>  {
>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct 
>>> ieee80211_hw *hw,
>>>   * @ac: AC number to return packets from.
>>>   *
>>>   * Should only be called between calls to 
>>> ieee80211_txq_schedule_start()
>>> - * and ieee80211_txq_schedule_end().
>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will be 
>>> added
>>> + * to a remove list and get removed later.
>>>   * Returns the next txq if successful, %NULL if no queue is eligible. 
>>> If a txq
>>>   * is returned, it should be returned with ieee80211_return_txq() 
>>> after the
>>>   * driver has finished scheduling it.
>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct 
>>> ieee80211_hw *hw, u8 ac)
>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>   * @ac: AC number to acquire locks for
>>>   *
>>> - * Release locks previously acquired by ieee80211_txq_schedule_end().
>>> + * Release locks previously acquired by ieee80211_txq_schedule_end(). 
>>> Check
>>> + * and remove the empty txq from rb-tree.
>>>   */
>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>> __releases(txq_lock);
>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw 
>>> *hw, struct ieee80211_txq *txq)
>>> __acquires(txq_lock) __releases(txq_lock);
>>> 
>>>  /**
>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>> + *
>>> + * @tmr: pointer as obtained from local
>>> + *
>>> + */
>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>> +
>>> +/**
>>>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to 
>>> transmit
>>>   *
>>>   * This function is used to check whether given txq is allowed to 
>>> transmit by
>>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>>> index a4556f9..49aa143e 100644
>>> --- a/net/mac80211/ieee80211_i.h
>>> +++ b/net/mac80211/ieee80211_i.h
>>> @@ -847,6 +847,7 @@ struct txq_info {
>>> struct codel_stats cstats;
>>> struct sk_buff_head frags;
>>> struct rb_node schedule_order;
>>> +   struct list_head candidate;
>>> unsigned long flags;
>>> 
>>> /* keep last! */
>>> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>>> u64 airtime_v_t[IEEE80211_NUM_ACS];
>>> u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>>> 
>>> +   struct list_head remove_list[IEEE80211_NUM_ACS];
>>> +   struct timer_list remove_timer;
>>> u16 airtime_flags;
>>> 
>>> const struct ieee80211_ops *ops;
>>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
>>> index e9ffa8e..78fe24a 100644
>>> ---

Re: [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> From: Toke Høiland-Jørgensen 
>
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:

Thank you for keeping at this! I'll take a look at the series in detail
tomorrow.

While you're testing things related to this, I've also prototyped a port
of the "airtime queue limit" feature from chromeos into mainline. It's
currently in my tree here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-01

If you have time to test it at some point, that would be awesome. I'm
planning to submit it as an RFC, but it needs a bit more work first.
Also, it's completely untested, but it does compile :)

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree
>>>>>>>>> gets
>>>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(), 
>>>>>>>>> the
>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>> schedule_pos
>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>> 
>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>> 
>>>>>>>> I didn't write this patch, so please don't use my sign-off. I'll
>>>>>>>> add
>>>>>>>> ack or review tags as appropriate in reply; but a few comments
>>>>>>>> first:
>>>>>>>> 
>>>>>>>>> ---
>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>> +++---
>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>> 
>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>>>>>> 
>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>> +
>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>> ieee80211_tx_rate
>>>>>>>>> *rate,
>>>>>>>>> u8 mcs, u8 nss)
>>>>>>>>>  {
>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff 
>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>   *
>>>>>>>>>   * Should only be called between calls to
>>>>>>>>> ieee80211_txq_schedule_start()
>>>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it 
>>>>>>>>> will
>>>>>>>>> be
>>>>>>>>> added
>>>>>>>>> + * to a remove list and get removed later.
>>>>>>>>>   * Returns the next txq if successful, %NULL if no queue is
>>>>>>>>> eligible.
>>>>>>>>> If a txq
>>>>>>>>>   * is returned, it should be returned with 
>>>>>>>>> ieee80211_return_txq()
>>>>>>>>> after the
>>>>>>>>>   * driver has finished scheduling it.
>>>>>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>>>>   * @ac: AC number to acquire locks for
>>>>>>>>>   *
>>>>>>>>>

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the 
>>>>>>>>>>> rbtree
>>>>>>>>>>> gets
>>>>>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(),
>>>>>>>>>>> the
>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>> schedule_pos
>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering 
>>>>>>>>>>> the
>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>> 
>>>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>>>> 
>>>>>>>>>> I didn't write this patch, so please don't use my sign-off. 
>>>>>>>>>> I'll
>>>>>>>>>> add
>>>>>>>>>> ack or review tags as appropriate in reply; but a few comments
>>>>>>>>>> first:
>>>>>>>>>> 
>>>>>>>>>>> ---
>>>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>>>> +++---
>>>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>> 
>>>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY 31
>>>>>>>>>>> 
>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>> +
>>>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>>>> ieee80211_tx_rate
>>>>>>>>>>> *rate,
>>>>>>>>>>>   u8 mcs, u8 nss)
>>>>>>>>>>>  {
>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff
>>>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>>>   *
>>>>>>>>>>>   * Should only be called between calls to
>>>>>>>>>>> ieee80211_txq_schedule_start()
>>>>>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it
>>>>>>>>>>> will
>>>>>>>>>>> be
>>>>>>>>>>> added
>>>>>>>>>>> + * to a remove list and get removed later.
>>>>>>>>>>>   * Returns the next txq if successful, %NULL if no queu

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the
>>>>>>>>>>>>> rbtree
>>>>>>>>>>>>> gets
>>>>>>>>>>>>> removed from rbtree immediately in the 
>>>>>>>>>>>>> ieee80211_return_txq(),
>>>>>>>>>>>>> the
>>>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>>>> schedule_pos
>>>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering
>>>>>>>>>>>>> the
>>>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>>>>>> 
>>>>>>>>>>>> I didn't write this patch, so please don't use my sign-off.
>>>>>>>>>>>> I'll
>>>>>>>>>>>> add
>>>>>>>>>>>> ack or review tags as appropriate in reply; but a few 
>>>>>>>>>>>> comments
>>>>>>>>>>>> first:
>>>>>>>>>>>> 
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>>>>>> +++---
>>>>>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>>>> +
>>>>>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>>>>>> ieee80211_tx_rate
>>>>>>>>>>>>> *rate,
>>>>>>>>>>>>> u8 mcs, u8 nss)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff
>>>>>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>>>>>   *
>

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-19 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
>>>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>>>> loop will break soon in the ieee80211_next_txq() due to schedule_pos
>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>> removal right before the end of this schedule round.
>>>>> 
>>>>> Co-developed-by: Yibo Zhao 
>>>>> Signed-off-by: Yibo Zhao 
>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>> 
>>>> I didn't write this patch, so please don't use my sign-off. I'll add
>>>> ack or review tags as appropriate in reply; but a few comments first:
>>>> 
>>>>> ---
>>>>>  include/net/mac80211.h | 16 ++--
>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>  net/mac80211/main.c|  6 +
>>>>>  net/mac80211/tx.c  | 63
>>>>> +++---
>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>> 
>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>> index ac2ed8e..ba5a345 100644
>>>>> --- a/include/net/mac80211.h
>>>>> +++ b/include/net/mac80211.h
>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>> 
>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>> 
>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>> +
>>>>>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate
>>>>> *rate,
>>>>> u8 mcs, u8 nss)
>>>>>  {
>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
>>>>> ieee80211_hw *hw,
>>>>>   * @ac: AC number to return packets from.
>>>>>   *
>>>>>   * Should only be called between calls to
>>>>> ieee80211_txq_schedule_start()
>>>>> - * and ieee80211_txq_schedule_end().
>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will 
>>>>> be
>>>>> added
>>>>> + * to a remove list and get removed later.
>>>>>   * Returns the next txq if successful, %NULL if no queue is 
>>>>> eligible.
>>>>> If a txq
>>>>>   * is returned, it should be returned with ieee80211_return_txq()
>>>>> after the
>>>>>   * driver has finished scheduling it.
>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>> ieee80211_hw *hw, u8 ac)
>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>   * @ac: AC number to acquire locks for
>>>>>   *
>>>>> - * Release locks previously acquired by 
>>>>> ieee80211_txq_schedule_end().
>>>>> + * Release locks previously acquired by 
>>>>> ieee80211_txq_schedule_end().
>>>>> Check
>>>>> + * and remove the empty txq from rb-tree.
>>>>>   */
>>>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>>>>   __releases(txq_lock);
>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct 
>>>>> ieee80211_hw
>>>>> *hw, struct ieee80211_txq *txq)
>>>>>   __acquires(txq_lock) __releases(txq_lock);
>>>>> 
>>>>>  /**
>>>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>>>> + *
>>>>> + * @tmr: pointer as obtained from local
>>>>> + *
>>>>> + */
>>>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>>>> +
>>>>> +/**
>>>>>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to
>>>>> transmit
>>>>>   *
>>>>>   * This function is used to check whether given txq is allowed to
>>>>> transmit by
>>>>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>>>>> index a4556f9..49aa143e 100644
>>>>> --- a/net/mac80211/ieee80211_i.h
>>>>> +++ b/net/mac

Re: [PATCH V3 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-10-01 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-10-01 18:19, Johannes Berg wrote:
>> On Mon, 2019-09-23 at 15:19 +0800, Yibo Zhao wrote:
>>> This series fix some issues when enabling virtual time-based airtime 
>>> scheduler on ath10k.
>>> 
>> Given the lengthy discussion here and also over in the related thread
>> about AQL, I'm assuming you're going to repost this eventually.
>
> Yes, will repost once Toke have updated "mac80211: Switch to a virtual 
> time-based airtime scheduler" with his new ideas.

Which in turn is waiting for the AQL stuff... :)

-Toke



Re: [PATCH] mac80211: fix txq null pointer dereference

2019-09-27 Thread Toke Høiland-Jørgensen
Miaoqing Pan  writes:

> If the interface type is P2P_DEVICE or NAN, read the file of
> '/sys/kernel/debug/ieee80211/phyx/netdev:wlanx/aqm' will get a
> NULL pointer dereference. As for those interface type, the
> pointer sdata->vif.txq is NULL.

Heh. Oops!

> Unable to handle kernel NULL pointer dereference at virtual address 0011
> CPU: 1 PID: 30936 Comm: cat Not tainted 4.14.104 #1
> task: ffc0337e4880 task.stack: ff800cd2
> PC is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> LR is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> pc : [] lr : [] pstate: 6145
> sp : ff800cd23bb0
> x29: ff800cd23c00 x28: ffc0337e4880
> x27: ff8008a04000 x26: 0003
> x25: 018e x24: ff80090f9d30
> x23: ffc034b62000 x22: 
> x21: ffc0335008e0 x20: ff800cd23c10
> x19: 00c8 x18: 
> x17:  x16: ff80081ef454
> x15:  x14: f45c1d27
> x13: ffab6710 x12: 0003
> x11: 0006 x10: 
> x9 : 0001 x8 : ffc0337e4880
> x7 : 0003 x6 : f4498000
> x5 :  x4 : ff8000b7
> x3 : ff800cd23e80 x2 : 00c8
> x1 : ff800cd23c10 x0 : ffc0335008e0
> Process cat (pid: 30936, stack limit = 0xff800cd2)
> [Call trace:
> Exception stack(0xff800cd23a70 to 0xff800cd23bb0)
> 3a60:   ffc0335008e0 ff800cd23c10
> 3a80: 00c8 ff800cd23e80 ff8000b7 
> 3aa0: f4498000 0003 ffc0337e4880 0001
> 3ac0:  0006 0003 ffab6710
> 3ae0: f45c1d27  ff80081ef454 
> 3b00:  00c8 ff800cd23c10 ffc0335008e0
> 3b20:  ffc034b62000 ff80090f9d30 018e
> 3b40: 0003 ff8008a04000 ffc0337e4880 ff800cd23c00
> 3b60: ff8000b7cd00 ff800cd23bb0 ff8000b7cd00 6145
> 3b80: ff800cd23ba0 ff80088e16e4 007f ff800cd23c40
> 3ba0: ff800cd23c00 ff8000b7cd00
> [] ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> [] ieee80211_if_read+0x60/0xbc [mac80211]
> [] ieee80211_if_read_aqm+0x28/0x30 [mac80211]
> [] full_proxy_read+0x2c/0x48
> [] __vfs_read+0x2c/0xd4
> [] vfs_read+0x8c/0x108
> [] SyS_read+0x40/0x7c
>
> Tested HW: QCA9984
> Tested FW: 10.4-3.10-00047
>
> Signed-off-by: Miaoqing Pan 

Acked-by: Toke Høiland-Jørgensen 



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Toke Høiland-Jørgensen  writes:
>
>> Yibo Zhao  writes:
>>
>>> On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>
> Guys, PLEASE please consider us poor maintainers drowning in email and
> edit your quotes :) This style of discussion makes patchwork unusable:
>
> https://patchwork.kernel.org/patch/11147019/

Heh, oops, didn't realise you were following the discussion from
patchwork; sorry, will be sure to cut things in the future.

The quote marks do make a very nice (reverse) christmas tree, though ;)

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-24 15:26, Toke Høiland-Jørgensen wrote:
>>>> Hmm, yeah, I guess we could end up with a loop like that as well.
>>>> Keeping the schedule_round would be a way to fix it, but I'm not sure
>>>> we
>>>> should just skip that station; maybe we should just end the round
>>>> instead?
>>> I am not sure. I believe, in some cases, the rest of the nodes which
>>> could be most of the nodes in the tree will not have the chance to be
>>> scheduled in this round.
>> 
>> My guess would be that it doesn't really matter, because in most cases
>> each schedule round will only actually end up queueing packets from one
>> or two stations; as the driver will pull multiple packets from that one
>> station which will often fill up the firmware queues (especially once 
>> we
>> start throttling that with the AQL stuff).
>> 
>> So I guess we can just skip TXQs that we've already seen this 
>> scheduling
>> round, and let the v_t compare determine transmit eligibility :)
>
> I am a little confused. So do you mean it is fine for you to skip the 
> TXQs we met in this round before and continue the loop until the end or 
> vt comparison failure?

Yeah. In most cases it won't make any difference; but it'll make sure we
visit all eligible TXQs in all cases, so we might as well do that if
we're tracking the scheduling round anyway.

-Toke



Re: [PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

>> I can see why we need the second part (basically, this happens because 
>> I
>> forgot to add a check for "no eligible stations" in may_transmit(), 
>> like
>> the one in next_txq()). And rounding up the division result doesn't
>> hurt, I guess. But why does it help to change the grace period if we're
>> doing all the other stuff?
> In multi-clients case, it is possible a TXQ sometimes gets drained due 
> to FW has deep queue and few packets in TXQ at that time. So the TXQ is 
> removed from the rbtree after dequeuing. When it is about to added back 
> very soon after the removal, the g_vt might have gone a little far away 
> from sta vt where sync is needed. With this sync, the station is forced 
> to catch up with the g_vt, however, its chance for transmission has been 
> reduced. I think 500us is quite a short period in multi-clients case.

That's a good point, actually: Having the grace period be too small will
allow stations that leave and re-enter the queue to "skip ahead" and use
more than its share. However, I think it's a separate issue from what
this patch is about; so how about I just increase the grace period in
the next version of the base patch?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Toke Høiland-Jørgensen  writes:
>
>> Kalle Valo  writes:
>>
>>> Toke Høiland-Jørgensen  writes:
>>>
>>>> Yibo Zhao  writes:
>>>>
>>>>> On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>
>>> Guys, PLEASE please consider us poor maintainers drowning in email and
>>> edit your quotes :) This style of discussion makes patchwork unusable:
>>>
>>> https://patchwork.kernel.org/patch/11147019/
>>
>> Heh, oops, didn't realise you were following the discussion from
>> patchwork; sorry, will be sure to cut things in the future.
>
> To be honest, I'm not sure how much Johannes uses patchwork. But I
> check everything from patchwork 95% of the time and try to keep my
> email boxes clean.

Noted. I'll try to be nice to patchwork, then :)

>> The quote marks do make a very nice (reverse) christmas tree, though ;)
>
> It did! I had to include that to my rant :)

:D

-Toke



Re: [PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-23 19:00, Toke Høiland-Jørgensen wrote:
>
>>> -   if (params->airtime_weight)
>>> -   sta->airtime_weight = params->airtime_weight;
>>> +   if (params->airtime_weight &&
>>> +   params->airtime_weight != sta->airtime_weight) {
>> 
>> This check doesn't work I think? You're not using the array-based
>> sta->airtime_weight[], and there are locking issues by just checking
>> like this; so maybe just keep the if() on params->airtime_weight, and 
>> do
>> the checking inside the loop while holding the lock?
>
> It should be array-based sta->airtime_weight[] and I am missing that 
> part during the porting. But you are right about we should check it 
> inside the loop with the lock.
>
>> 
>> Or could we just turn the weights into atomics to avoid the locking
>> entirely?
>
> Actually, we still need the active txq locking to make sure the txq is 
> on the rbtree. Otherwise, no need to change airtime weight sum.

True. Just moving the check inside the locking will be the right thing
to do, then.

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
>> Hmm, yeah, I guess we could end up with a loop like that as well.
>> Keeping the schedule_round would be a way to fix it, but I'm not sure 
>> we
>> should just skip that station; maybe we should just end the round
>> instead?
> I am not sure. I believe, in some cases, the rest of the nodes which 
> could be most of the nodes in the tree will not have the chance to be 
> scheduled in this round.

My guess would be that it doesn't really matter, because in most cases
each schedule round will only actually end up queueing packets from one
or two stations; as the driver will pull multiple packets from that one
station which will often fill up the firmware queues (especially once we
start throttling that with the AQL stuff).

So I guess we can just skip TXQs that we've already seen this scheduling
round, and let the v_t compare determine transmit eligibility :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-23 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Global airtime weight sum is updated only when txq is added/removed
> from rbtree. If upper layer configures sta weight during high load,
> airtime weight sum will not be updated since txq is most likely on the
> tree. It could a little late for upper layer to reconfigure sta weight
> when txq is already in the rbtree. And thus, incorrect airtime weight sum
> will lead to incorrect global virtual time calculation as well as overflow
> of airtime weight sum during txq removed.
>
> Hence, need to update airtime weight sum upon receiving event for
> configuring sta weight once sta's txq is on the rbtree.
>
> Besides, if airtime weight sum of ACs and sta weight is synced under the
> same per AC lock protection, there can be a very short window causing
> incorrct airtime weight sum calculation as below:
>
> active_txq_lock_VO  .
> VO weight sum is syncd.
> sta airtime weight sum is synced  .
> active_txq_unlock_VO  .
> . .
> active_txq_lock_VI.
> VI weight sum is syncd.
> sta airtime weight sumactive_txq_lock_BE
> active_txq_unlock_VIRemove txq and thus sum
> .   is calculated with synced
> .   sta airtime weight
> . active_txq_unlock_BE
>
> So introduce a per ac synced station airtime weight synced with per
> AC synced weight sum together. And the per-AC station airtime weight
> is used to calculate weight sum.
>
> Signed-off-by: Yibo Zhao 
> ---
>  net/mac80211/cfg.c | 29 ++---
>  net/mac80211/debugfs_sta.c |  2 +-
>  net/mac80211/sta_info.c|  9 -
>  net/mac80211/sta_info.h|  5 +++--
>  net/mac80211/tx.c  |  4 ++--
>  5 files changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index d65aa01..c8a0683 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   int ret = 0;
>   struct ieee80211_supported_band *sband;
>   struct ieee80211_sub_if_data *sdata = sta->sdata;
> - u32 mask, set;
> + u32 mask, set, tid, ac, pre_weight;
> + struct txq_info *txqi;
>  
>   sband = ieee80211_get_sband(sdata);
>   if (!sband)
> @@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   if (ieee80211_vif_is_mesh(>vif))
>   sta_apply_mesh_params(local, sta, params);
>  
> - if (params->airtime_weight)
> - sta->airtime_weight = params->airtime_weight;
> + if (params->airtime_weight &&
> + params->airtime_weight != sta->airtime_weight) {

This check doesn't work I think? You're not using the array-based
sta->airtime_weight[], and there are locking issues by just checking
like this; so maybe just keep the if() on params->airtime_weight, and do
the checking inside the loop while holding the lock?

Or could we just turn the weights into atomics to avoid the locking
entirely?

> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
> + spin_lock_bh(>active_txq_lock[ac]);
> + for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
> + if (!sta->sta.txq[tid] ||
> + ac != ieee80211_ac_from_tid(tid))
> + continue;
> +
> + pre_weight = sta->airtime_weight[ac];
> + sta->airtime_weight[ac] =
> + params->airtime_weight;
> +
> + txqi = to_txq_info(sta->sta.txq[tid]);
> + if (RB_EMPTY_NODE(>schedule_order))
> + continue;
> +
> + local->airtime_weight_sum[ac] = 
> local->airtime_weight_sum[ac] +
> + 
> params->airtime_weight -
> + pre_weight;
> + }
> + spin_unlock_bh(>active_txq_lock[ac]);
> + }
> + }
>  
>   /* set the STA state after all sta info from usermode has been set */
>   if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
> diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
> index 80028da..43a7e6a 100644
> --- a/net/mac80211/debugfs_sta.c
> +++ b/net/mac80211/debugfs_sta.c
> @@ -223,7 +223,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
> __user *userbuf,
>   "Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
>   rx_airtime,
>   tx_airtime,
> - sta->airtime_weight,
> + 

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-20 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree 
>>>>>>> gets
>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>>>>>> loop will break soon in the ieee80211_next_txq() due to 
>>>>>>> schedule_pos
>>>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>>>> removal right before the end of this schedule round.
>>>>>>> 
>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>> 
>>>>>> I didn't write this patch, so please don't use my sign-off. I'll 
>>>>>> add
>>>>>> ack or review tags as appropriate in reply; but a few comments 
>>>>>> first:
>>>>>> 
>>>>>>> ---
>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>> +++---
>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>> --- a/include/net/mac80211.h
>>>>>>> +++ b/include/net/mac80211.h
>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>> 
>>>>>>>  #define IEEE80211_MAX_TX_RETRY 31
>>>>>>> 
>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>> +
>>>>>>>  static inline void ieee80211_rate_set_vht(struct 
>>>>>>> ieee80211_tx_rate
>>>>>>> *rate,
>>>>>>>   u8 mcs, u8 nss)
>>>>>>>  {
>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
>>>>>>> ieee80211_hw *hw,
>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>   *
>>>>>>>   * Should only be called between calls to
>>>>>>> ieee80211_txq_schedule_start()
>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will
>>>>>>> be
>>>>>>> added
>>>>>>> + * to a remove list and get removed later.
>>>>>>>   * Returns the next txq if successful, %NULL if no queue is
>>>>>>> eligible.
>>>>>>> If a txq
>>>>>>>   * is returned, it should be returned with ieee80211_return_txq()
>>>>>>> after the
>>>>>>>   * driver has finished scheduling it.
>>>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>>   * @ac: AC number to acquire locks for
>>>>>>>   *
>>>>>>> - * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> + * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> Check
>>>>>>> + * and remove the empty txq from rb-tree.
>>>>>>>   */
>>>>>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>>>>>> __releases(txq_lock);
>>>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw, struct ieee80211_txq *txq)
>>>>>>> __acquires(txq_lock) __releases(txq_lock);
>>>>>&

[PATCH v7] mac80211: Import airtime calculation code from mt76

2019-11-11 Thread Toke Høiland-Jørgensen
Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
be usable in mac80211. This involves:

- Switching to mac80211 data structures.
- Adding support for 160 MHz channels and HE mode.
- Moving the symbol and duration calculations around a bit to avoid
  rounding with the higher rates and longer symbol times used for HE rates.

The per-rate TX rate calculation is also split out to its own function so
it can be used directly for the AQL calculations later.

Signed-off-by: Toke Høiland-Jørgensen 
---
I realised this patch was fairly self-contained, so I thought I would
just go ahead and send it on its own instead of waiting for Kan to update
his patch.

Someone *please* check my math oh the HE mode stuff; I've done my best
to get this right based on public sources, but I have no hardware to
test it on.


 include/net/mac80211.h |  29 ++
 net/mac80211/Makefile  |   3 +-
 net/mac80211/airtime.c | 597 +
 net/mac80211/ieee80211_i.h |   4 +
 4 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c643a19dce96..6fc26a051ba0 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status:  ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info:  ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..0c2a0bb94727
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,597 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of kilo-symbols (symbols * 1024) for a packet with (bps) bits per
+ * symbol. We use k-symbols to avoid rounding in the _TIME macros below.
+ */
+#define MCS_N_KSYMS(bps) DIV_ROUND_UP(MCS_NBITS << 10, (bps))
+
+/* Transmission time (in 1024 * usec) for a packet containing (ksyms) * 1024
+ * symbols.
+ */
+#define MCS_SYMBOL_TIME(sgi, ksyms)\
+   (sgi ?  \
+ ((ksyms) * 4 * 18) / 20 : /* 3.6 us per sym */\
+ ((ksyms) * 4) /* 4.0 us per sym */\
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   ((u32)MCS_SYMBOL_TIME(sgi, MCS_N_KSYMS((streams) * (bps
+
+#define MCS_DURATION_S(shift, streams, sgi, bps)   \
+   ((u16)((MCS_DURATION(streams, sgi, bps) >> shift)))
+
+/* These should match the values in enum nl80211_he_gi */
+#define HE_GI_08 0
+#define HE_GI_16 1
+#define HE_GI_32 2
+
+/* Transmission time (1024 usec) for a packet containing (ksyms) * k-symbols */
+#define HE_SYMBOL_TIME(gi, ksyms)  \
+   (gi == HE_GI_08 ?   \
+((ksyms) * 16 * 17) / 20 : /* 13.6 us p

Re: [PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-09 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> It is most likely just insufficient locking. active_txq_lock is per
> AC, can't protect local->aql_total_pending_airtime against racing
> conditions:
> void ieee80211_sta_update_pending_airtime(...)
> {
> spin_lock_bh(>active_txq_lock[ac]);
> ...
> local->aql_total_pending_airtime -= tx_airtime;
> ...
> spin_unlock_bh(>active_txq_lock[ac]);
> }

Ohh, right; didn't even realise those were not per-AC as well...

> After changing it to atomic_t, no more aql_total_pending_airtime
> underflow so far :). Using atomic operation should also help reduce
> CPU overhead due to lock contention, as
> ieee80211_sta_update_pending_airtime() is often called from the tx
> completion routine triggered by interrupts, often in a different core
> than where __ieee80211_schedule_txq() is running.
>
> I will post a new version a bit later if the test goes well.

Awesome! :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


  1   2   >