Re: [PATCH] mac80211: Fix wrong channel bandwidths reported for aggregates

2022-07-20 Thread Toke Høiland-Jørgensen
Linus Lüssing  writes:

> On 19/07/2022 17:03, Adrian Chadd wrote:
>> Hi!
>> 
>> It's not a hardware bug. Dating back to the original AR5416 11n chip,
>> most flags aren't valid for subframes in an aggregate. Only the final
>> frame has valid flags. This was explicitly covered internally way back
>> when.
>
> Ah, thanks for the clarification! I see it in the datasheet for the 
> QCA9531, too, now. And thanks for the confirmation, that what we are 
> doing so far is not correct for ath9k.
>
> Words 0+2 are valid for all RX descriptors, 0+2+11 valid for the last RX 
> descriptor of each packet and 0-11 for the last RX descriptor of an 
> aggregate or last RX descriptor of a stand-alone packet. Or in other 
> words, word 4, which contains the 20 vs. 40 MHz indicator, is invalid 
> for any aggregate sub-frame other than the last one. I can rename that 
> in the commit message.
>
>
> Another approach that also came to my mind was introducing more explicit 
> flags in cfg80211.h's "struct rate_info", like a RATE_INFO_BW_UNKNOWN in 
> "enum rate_info_bw" and/or RATE_INFO_FLAGS_UNKNOWN in "enum 
> rate_info_flags". And setting those flags in ath9k_cmn_process_rate().
>
> The current approach is smaller though, as it simply uses the already 
> existing flags. If anyone has any preferences, please let me know.

I have no objections to doing it in mac80211 like you're proposing here :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges

2021-03-26 Thread Toke Høiland-Jørgensen
Pali Rohár  writes:

> Atheros QCA9880 and QCA9890 chips do not behave after a bus reset and also
> after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
> The device will throw a Link Down error and config space is not accessible
> again. Retrain link can be called only when using GEN1 PCIe bridge or when
> PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
>
> This issue was reproduced with more Compex WLE900VX cards (QCA9880 based)
> on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
> pci-aardvark.c driver. Also this issue was reproduced with some "noname"
> card with QCA9890 WiFi chip on Armada 3720. All problematic cards with
> these QCA chips have PCI device id 0x003c.
>
> Tests showed that other WiFi cards based on AR93xx (PCI device id 0x0030)
> and AR9287 (PCI device id 0x002e) chips do not have these problems.
>
> To workaround this issue, this change introduces a new PCI quirk called
> PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 for PCI device id 0x003c.
>
> When this quirk is set then kernel disallows triggering PCI_EXP_LNKCTL_RL
> bit in config space of PCIe Bridge in case PCIe Bridge is capable of higher
> speed than 2.5 GT/s and higher speed is already allowed. When PCIe Bridge
> has accessible LNKCTL2 register then kernel tries to force target link
> speed via PCI_EXP_LNKCTL2_TLS* bits to 2.5 GT/s. After this change it is
> possible to trigger PCI_EXP_LNKCTL_RL bit without causing issues on
> problematic Atheros QCA98xx cards.
>
> Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
> so quirk check is added only into pcie/aspm.c file.
>
> Signed-off-by: Pali Rohár 
> Reported-by: Toke Høiland-Jørgensen 
> Tested-by: Marek Behún 
> Link: https://lore.kernel.org/linux-pci/87h7l8axqp@toke.dk/
> Cc: sta...@vger.kernel.org # c80851f6ce63a ("PCI: Add
> PCI_EXP_LNKCTL2_TLS* macros")

Thanks!

Tested-by: Toke Høiland-Jørgensen 


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: ath10k: Missing airtime scheduler parts

2020-12-01 Thread Toke Høiland-Jørgensen
Sven Eckelmann  writes:

> Hi,
>
> thanks for the reply. I will forward the information.
>
> On Tuesday, 1 December 2020 11:56:57 CET Toke Høiland-Jørgensen wrote:
>> I did recently rebase that patch to the current mac80211-next, actually.
>> Sent it off to Felix for some testing, but if you wan to give it a go I
>> can post it to the list as well 
>
> Thank you for the offer. But I doubt that I will find time for it
> before next year.

Alright, fair enough; I'll give Felix a chance to test it first, then... :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: ath10k: Missing airtime scheduler parts

2020-12-01 Thread Toke Høiland-Jørgensen
Sven Eckelmann  writes:

> Hi,
>
> I was asked what parts are currently missing for (a better) airtime fairness 
> with ath10k. I know that the AQL were merged and enabled for ath10k (and 
> mt76). But there was also the virtual time-based airtime scheduler [1] which 
> was proposed. I think the development on that one didn't continue since last 
> year.

I did recently rebase that patch to the current mac80211-next, actually.
Sent it off to Felix for some testing, but if you wan to give it a go I
can post it to the list as well :)

> Maybe someone else knows if there were some other parts worked on
> which I've am missed and which were not yet merged.

I don't think there have been any other patches being actively worked
on, but I seem to recall something about AQL (partly?) breaking airtime
fairness because the scheduler only keeps fairness between stations that
are actually scheduled, and AQL interrupting that...

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: CVE-2020-3702: Firmware updates for ath9k and ath10k chips

2020-08-12 Thread Toke Høiland-Jørgensen
Pali Rohár  writes:

> On Monday 10 August 2020 11:01:26 Pali Rohár wrote:
>> Hello!
>> 
>> ESET engineers on their blog published some information about new
>> security vulnerability CVE-2020-3702 in ath9k wifi cards:
>> https://www.welivesecurity.com/2020/08/06/beyond-kr00k-even-more-wifi-chips-vulnerable-eavesdropping/
>> 
>> According to Qualcomm security bulletin this CVE-2020-3702 affects also
>> some Qualcomm IPQ chips which are handled by ath10k driver:
>> https://www.qualcomm.com/company/product-security/bulletins/august-2020-security-bulletin#_cve-2020-3702
>> 
>> Kalle, could you or other people from Qualcomm provide updated and fixed
>> version of ath9k and ath10k firmwares in linux-firmware git repository?
>> 
>> According to Qualcomm security bulletin this issue has Critical security
>> rating, so I think fixed firmware files should be updated also in stable
>> releases of linux distributions.
>
> Hello!
>
> Qualcomm has already sent following statement to media:
>
> Qualcomm has already made mitigations available to OEMs in May 2020,
> and we encourage end users to update their devices as patches have
> become available from OEMs.
>
> And based on information from ESET blog post, Qualcomm's proprietary
> driver for these wifi cards is fixed since Qualcomm July release.
>
> Could somebody react and provide some details when fixes would be
> available for ath9k and ath10k Linux drivers? And what is current state
> of this issue for Linux?
>
> I'm looking at ath9k and ath10k git trees [1] [2] [3] and I do not see
> there any change which could be related to CVE-2020-3702.

How about these, from March:

a0761a301746 ("mac80211: drop data frames without key on encrypted links")
ce2e1ca70307 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() 
case")
b16798f5b907 ("mac80211: mark station unauthorized before key removal")

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/4] mac80211: fix low throughput in multi-clients situation

2020-01-08 Thread Toke Høiland-Jørgensen
Justin Capella  writes:

> Is there a risk of division by zero?

No, weights are always positive.

> I'm curious about IEEE80211_AIRTIME_QUANTUM in this multiple sta
> scenario, should weight maybe be the depth of the queue or something
> like that? Using real time delta I assume is not performant?

Eh?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: fix issue in loop scenario

2019-12-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Wed, 2019-12-18 at 18:12 +0800, yi...@codeaurora.org wrote:
>> 
>> Yes, this is a fix to the first patch. Actually, the rest of two patches 
>> are also serve the same. So, are you suggesting to merge them to the 
>> first patch?
>
> Yes.
>
>> Previouly, I had added Toke's signature in this patch but Toke advised 
>> to delete it. I feel a little bit confused about how to handle it. 
>> Appreciate any detail guide.
>
> I guess that you have to discuss that with Toke, how he wants to handle
> it ... If he gave you a patch with his signed-off-by, then you should
> probably keep it? There's also "Co-developed-by" tag to give some sort
> of non-author developer credits.

I'll do some squashing and send a new version; in which case I think it
makes sense to have both our s-o-b tags, and maybe a co-developed-by.

I'm hoping to get this done before the holidays (i.e., this week).
Already got everything rebased to current mac80211-next, just need to do
the squashing and fix up the other outstanding issues from last time.

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-12-17 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> I'm going to assume that Toke will review all of this and there will be
> changes, so you'd resend anyway ...

Yeah, this series doesn't even apply in its current form. I'll try to
fix that, and do a few other updates that are needed while I'm at it.
And to answer your question in the other email, yeah, this should
probably be squashed to a single patch...

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH V4 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-12-13 Thread Toke Høiland-Jørgensen
Justin Capella  writes:

> Would it make sense to consider skb->priority / QoS in the Access
> Classifier selection?

Erm, what? Not sure I understand what you're asking here...

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v9 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-11-18 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> > Given the fact that AQL is only tested in very limited platforms,
>> > should we set the default to disabled by removing this change in the
>> > next update?
>> >
>> > -   local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
>> > +
>> > +   local->airtime_flags = AIRTIME_USE_TX |
>> > +  AIRTIME_USE_RX |
>> > +  AIRTIME_USE_AQL;
>> > +   local->aql_threshold = IEEE80211_AQL_THRESHOLD;
>> > +   atomic_set(&local->aql_total_pending_airtime, 0);
>> Well, we have the whole -rc series to get more testing in if we merge it
>> as-is. It's up to the maintainers, of course, but I would be in favour
>> of merging as-is, then optionally backing out the default before the
>> final release if problems do turn up. But I would hope that the limits
>> are sufficiently conservative that it would not result in any problems :)
>
> Sounds good. The current default limits are reasonably conservative
> and are tunable via debugfs.
>
> I will give the v10 version of this patch serial a quick test and
> hopefully we can wrap it up soon.

Sounds good, thanks! Also, seems we got a 5.4-rc8 yesterday, so we have
another week before the merge window.

Johannes, any chance you'll get a chance to take a look at this sometime
this week? :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v10 1/4] mac80211: Add new sta_info getter by sta/vif addrs

2019-11-16 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

In ieee80211_tx_status() we don't have an sdata struct when looking up the
destination sta. Instead, we just do a lookup by the vif addr that is the
source of the packet being completed. Factor this out into a new sta_info
getter helper, since we need to use it for accounting AQL as well.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/sta_info.c |   20 
 net/mac80211/sta_info.h |3 +++
 net/mac80211/status.c   |   10 ++
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index bd11fef2139f..465d83b13582 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -210,6 +210,26 @@ struct sta_info *sta_info_get_bss(struct 
ieee80211_sub_if_data *sdata,
return NULL;
 }
 
+struct sta_info *sta_info_get_by_addrs(struct ieee80211_local *local,
+  const u8 *sta_addr, const u8 *vif_addr)
+{
+   struct rhlist_head *tmp;
+   struct sta_info *sta;
+
+   rcu_read_lock();
+   for_each_sta_info(local, sta_addr, sta, tmp) {
+   if (ether_addr_equal(vif_addr, sta->sdata->vif.addr)) {
+   rcu_read_unlock();
+   /* this is safe as the caller must already hold
+* another rcu read section or the mutex
+*/
+   return sta;
+   }
+   }
+   rcu_read_unlock();
+   return NULL;
+}
+
 struct sta_info *sta_info_get_by_idx(struct ieee80211_sub_if_data *sdata,
 int idx)
 {
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 369c2dddce52..80e76569144e 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -725,6 +725,9 @@ struct sta_info *sta_info_get(struct ieee80211_sub_if_data 
*sdata,
 struct sta_info *sta_info_get_bss(struct ieee80211_sub_if_data *sdata,
  const u8 *addr);
 
+struct sta_info *sta_info_get_by_addrs(struct ieee80211_local *local,
+  const u8 *sta_addr, const u8 *vif_addr);
+
 #define for_each_sta_info(local, _addr, _sta, _tmp)\
rhl_for_each_entry_rcu(_sta, _tmp,  \
   sta_info_hash_lookup(local, _addr), hash_node)
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..0e51def35b8a 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -1073,19 +1073,13 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, 
struct sk_buff *skb)
.skb = skb,
.info = IEEE80211_SKB_CB(skb),
};
-   struct rhlist_head *tmp;
struct sta_info *sta;
 
rcu_read_lock();
 
-   for_each_sta_info(local, hdr->addr1, sta, tmp) {
-   /* skip wrong virtual interface */
-   if (!ether_addr_equal(hdr->addr2, sta->sdata->vif.addr))
-   continue;
-
+   sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+   if (sta)
status.sta = &sta->sta;
-   break;
-   }
 
__ieee80211_tx_status(hw, &status);
rcu_read_unlock();


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v10 2/4] mac80211: Import airtime calculation code from mt76

2019-11-16 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
be usable in mac80211. This involves:

- Switching to mac80211 data structures.
- Adding support for 160 MHz channels and HE mode.
- Moving the symbol and duration calculations around a bit to avoid
  rounding with the higher rates and longer symbol times used for HE rates.

The per-rate TX rate calculation is also split out to its own function so
it can be used directly for the AQL calculations later.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   29 ++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  597 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c643a19dce96..6fc26a051ba0 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status: &struct ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..0c2a0bb94727
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,597 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of kilo-symbols (symbols * 1024) for a packet with (bps) bits per
+ * symbol. We use k-symbols to avoid rounding in the _TIME macros below.
+ */
+#define MCS_N_KSYMS(bps) DIV_ROUND_UP(MCS_NBITS << 10, (bps))
+
+/* Transmission time (in 1024 * usec) for a packet containing (ksyms) * 1024
+ * symbols.
+ */
+#define MCS_SYMBOL_TIME(sgi, ksyms)\
+   (sgi ?  \
+ ((ksyms) * 4 * 18) / 20 : /* 3.6 us per sym */\
+ ((ksyms) * 4) /* 4.0 us per sym */\
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   ((u32)MCS_SYMBOL_TIME(sgi, MCS_N_KSYMS((streams) * (bps
+
+#define MCS_DURATION_S(shift, streams, sgi, bps)   \
+   ((u16)((MCS_DURATION(streams, sgi, bps) >> shift)))
+
+/* These should match the values in enum nl80211_he_gi */
+#define HE_GI_08 0
+#define HE_GI_16 1
+#define HE_GI_32 2
+
+/* Transmission time (1024 usec) for a packet containing (ksyms) * k-symbols */
+#define HE_SYMBOL_TIME(gi, ksyms)  \
+   (gi == HE_GI_08 ?   \
+((ksyms) * 16 * 17) / 20 : /* 13.6 us per sym */   \
+(gi == HE_GI_16 ?  \
+ ((ksyms) * 16 * 18) / 20 :/* 14.4 us per sym */   \
+ ((ksyms) * 16)/* 16.0 us per sym */   \
+))
+
+/* Transmi

[PATCH v10 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-11-16 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing), but Kan has tested in on the QCA9984 platform.

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-10

[0] 
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7

Changelog:

v10:
  - Fix return value from ieee80211_info_set_tx_time_est()

v9:
  - Use atomic_sub_return() instead of separate atomic_sub() and atomic_read()
  - Add getter/setter for tx_time_est
  - Use get_sta_info_by_addrs() to find the station in
ieee80211_report_used_skb()
  - Integrate everything back into one series

v8:
  - Includes Toke's v7 version of "mac80211: Import airtime calculation code 
from mt76"
  - Don't clobber sta's customized queue limit when configuring the default via 
debugfs
  - Fix a racing condition when reset aql_tx_pending.

v7:
  - Fix aql_total_pending_airtime underflow due to insufficient locking.

v6:
  - Fix sta lookup in ieee80211_report_used_skb().
  - Move call to ieee80211_sta_update_pending_airtime() to a bit later in
__ieee80211_tx_status() 
v5:
  - Add missing export of ieee80211_calc_rx_airtime() and make
ieee80211_calc_tx_airtime_rate() static (kbuildbot).
  - Use skb_get_queue_mapping() to get the AC from the skb.
  - Take basic rate configuration for the BSS into account when calculating
multicast rate.
v4:
  - Fix calculation that clamps the maximum airtime to fit into 10 bits
  - Incorporate Rich Brown's nits for the commit message in Kan's patch
  - Add fewer local variables to ieee80211_tx_dequeue()
v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
      mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Add new sta_info getter by sta/vif addrs
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   57 
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  597 
 net/mac80211/debugfs.c |   85 ++
 net/mac80211/debugfs_sta.c |   43 ++-
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|   10 +
 net/mac80211/sta_info.c|   58 
 net/mac80211/sta_info.h|   11 +
 net/mac80211/status.c  |   36 ++-
 net/mac80211/tx.c  |   65 +
 12 files changed, 957 insertions(+), 23 deletions(-)
 create mode 100644 net/mac80211/airtime.c


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v10 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-16 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long packets stays in the queue, AKA
sojourn time. The sojourn time measured at the mac80211 layer doesn't
include queueing latency in the lower layer (firmware/hardware) and CoDel
expects lower layer to have a short queue. However, most 802.11ac chipsets
offload tasks such TX aggregation to firmware or hardware, thus have a deep
lower layer queue.

Without a mechanism to control the lower layer queue size, packets only
stay in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

The Byte Queue Limits (BQL) mechanism is commonly used to address the
similar issue with wired network interface. However, this method cannot be
applied directly to the wireless network interface. "Bytes" is not a
suitable measure of queue depth in the wireless network, as the data rate
can vary dramatically from station to station in the same network, from a
few Mbps to over Gbps.

This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.

Signed-off-by: Kan Yan 
[ Toke: Keep API to set pending airtime internal, fix nits in commit msg ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 ++
 net/mac80211/debugfs.c |   85 
 net/mac80211/debugfs_sta.c |   43 +-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|   10 +
 net/mac80211/sta_info.c|   38 
 net/mac80211/sta_info.h|8 
 net/mac80211/tx.c  |   47 +++-
 9 files changed, 240 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e309cc826b40..4f75dbdd406a 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2606,6 +2606,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 6fc26a051ba0..ba3f33cc41ea 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..399d4e8b8546 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,87 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE802

[PATCH v10 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-11-16 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the station
from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   16 
 net/mac80211/status.c  |   26 ++
 net/mac80211/tx.c  |   18 ++
 3 files changed, 60 insertions(+)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ba3f33cc41ea..aa145808e57a 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1060,6 +1060,22 @@ struct ieee80211_tx_info {
};
 };
 
+static inline u16
+ieee80211_info_set_tx_time_est(struct ieee80211_tx_info *info, u16 tx_time_est)
+{
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4us and clamp the maximum to 2**12-1
+*/
+   info->tx_time_est = min_t(u16, tx_time_est, 4095) >> 2;
+   return info->tx_time_est << 2;
+}
+
+static inline u16
+ieee80211_info_get_tx_time_est(struct ieee80211_tx_info *info)
+{
+   return info->tx_time_est << 2;
+}
+
 /**
  * struct ieee80211_tx_status - extended tx status info for rate control
  *
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 0e51def35b8a..39da82b35be9 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -670,12 +670,26 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
  struct sk_buff *skb, bool dropped)
 {
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+   u16 tx_time_est = ieee80211_info_get_tx_time_est(info);
struct ieee80211_hdr *hdr = (void *)skb->data;
bool acked = info->flags & IEEE80211_TX_STAT_ACK;
 
if (dropped)
acked = false;
 
+   if (tx_time_est) {
+   struct sta_info *sta;
+
+   rcu_read_lock();
+
+   sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+   ieee80211_sta_update_pending_airtime(local, sta,
+skb_get_queue_mapping(skb),
+tx_time_est,
+true);
+   rcu_read_unlock();
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -877,6 +891,7 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
struct ieee80211_bar *bar;
int shift = 0;
int tid = IEEE80211_NUM_TIDS;
+   u16 tx_time_est;
 
rates_idx = ieee80211_tx_get_rates(hw, info, &retry_count);
 
@@ -986,6 +1001,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
ieee80211_sta_register_airtime(&sta->sta, tid,
   info->status.tx_time, 0);
 
+   if ((tx_time_est = ieee80211_info_get_tx_time_est(info)) > 0) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
skb_get_queue_mapping(skb),
+tx_time_est,
+true);
+   ieee80211_info_set_tx_time_est(info, 0);
+   }
+
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
if (info->flags & IEEE80211_TX_STAT_ACK) {
if (sta->status_stats.lost_packets)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index aafc67b562eb..2fb6571453e7 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3551,6 +3551,9 @@ struct sk_buff 

Re: [PATCH v9 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-11-16 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> +static inline u16
>> +ieee80211_info_set_tx_time_est(struct ieee80211_tx_info *info, u16 
>> tx_time_est)
>> +{
>> +   /* We only have 10 bits in tx_time_est, so store airtime
>> +* in increments of 4us and clamp the maximum to 2**12-1
>> +*/
>> +   info->tx_time_est = min_t(u16, tx_time_est, 4095) >> 2;
>> +   return info->tx_time_est;
>> +}
>> +
>> +static inline u16
>> +ieee80211_info_get_tx_time_est(struct ieee80211_tx_info *info)
>> +{
>> +   return info->tx_time_est << 2;
>> +}
>> +
>
> set_tx_time_est() returns airtime in different units (4us) than
> get_tx_time_est(), this will cause the pending_airtime out of whack.

Huh, you're quite right; oops! I meant to shift that back before
returning. Will fix.

> Given the fact that AQL is only tested in very limited platforms,
> should we set the default to disabled by removing this change in the
> next update?
>
> -   local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
> +
> +   local->airtime_flags = AIRTIME_USE_TX |
> +  AIRTIME_USE_RX |
> +  AIRTIME_USE_AQL;
> +   local->aql_threshold = IEEE80211_AQL_THRESHOLD;
> +   atomic_set(&local->aql_total_pending_airtime, 0);

Well, we have the whole -rc series to get more testing in if we merge it
as-is. It's up to the maintainers, of course, but I would be in favour
of merging as-is, then optionally backing out the default before the
final release if problems do turn up. But I would hope that the limits
are sufficiently conservative that it would not result in any problems :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v9 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-15 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long packets stays in the queue, AKA
sojourn time. The sojourn time measured at the mac80211 layer doesn't
include queueing latency in the lower layer (firmware/hardware) and CoDel
expects lower layer to have a short queue. However, most 802.11ac chipsets
offload tasks such TX aggregation to firmware or hardware, thus have a deep
lower layer queue.

Without a mechanism to control the lower layer queue size, packets only
stay in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

The Byte Queue Limits (BQL) mechanism is commonly used to address the
similar issue with wired network interface. However, this method cannot be
applied directly to the wireless network interface. "Bytes" is not a
suitable measure of queue depth in the wireless network, as the data rate
can vary dramatically from station to station in the same network, from a
few Mbps to over Gbps.

This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.

Signed-off-by: Kan Yan 
[ Toke: Keep API to set pending airtime internal, fix nits in commit msg ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 ++
 net/mac80211/debugfs.c |   85 
 net/mac80211/debugfs_sta.c |   43 +-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|   10 +
 net/mac80211/sta_info.c|   38 
 net/mac80211/sta_info.h|8 
 net/mac80211/tx.c  |   47 +++-
 9 files changed, 240 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e309cc826b40..4f75dbdd406a 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2606,6 +2606,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 6fc26a051ba0..ba3f33cc41ea 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..399d4e8b8546 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,87 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE802

[PATCH v9 1/4] mac80211: Add new sta_info getter by sta/vif addrs

2019-11-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

In ieee80211_tx_status() we don't have an sdata struct when looking up the
destination sta. Instead, we just do a lookup by the vif addr that is the
source of the packet being completed. Factor this out into a new sta_info
getter helper, since we need to use it for accounting AQL as well.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/sta_info.c |   20 
 net/mac80211/sta_info.h |3 +++
 net/mac80211/status.c   |   10 ++
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index bd11fef2139f..465d83b13582 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -210,6 +210,26 @@ struct sta_info *sta_info_get_bss(struct 
ieee80211_sub_if_data *sdata,
return NULL;
 }
 
+struct sta_info *sta_info_get_by_addrs(struct ieee80211_local *local,
+  const u8 *sta_addr, const u8 *vif_addr)
+{
+   struct rhlist_head *tmp;
+   struct sta_info *sta;
+
+   rcu_read_lock();
+   for_each_sta_info(local, sta_addr, sta, tmp) {
+   if (ether_addr_equal(vif_addr, sta->sdata->vif.addr)) {
+   rcu_read_unlock();
+   /* this is safe as the caller must already hold
+* another rcu read section or the mutex
+*/
+   return sta;
+   }
+   }
+   rcu_read_unlock();
+   return NULL;
+}
+
 struct sta_info *sta_info_get_by_idx(struct ieee80211_sub_if_data *sdata,
 int idx)
 {
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 369c2dddce52..80e76569144e 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -725,6 +725,9 @@ struct sta_info *sta_info_get(struct ieee80211_sub_if_data 
*sdata,
 struct sta_info *sta_info_get_bss(struct ieee80211_sub_if_data *sdata,
  const u8 *addr);
 
+struct sta_info *sta_info_get_by_addrs(struct ieee80211_local *local,
+  const u8 *sta_addr, const u8 *vif_addr);
+
 #define for_each_sta_info(local, _addr, _sta, _tmp)\
rhl_for_each_entry_rcu(_sta, _tmp,  \
   sta_info_hash_lookup(local, _addr), hash_node)
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..0e51def35b8a 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -1073,19 +1073,13 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, 
struct sk_buff *skb)
.skb = skb,
.info = IEEE80211_SKB_CB(skb),
};
-   struct rhlist_head *tmp;
struct sta_info *sta;
 
rcu_read_lock();
 
-   for_each_sta_info(local, hdr->addr1, sta, tmp) {
-   /* skip wrong virtual interface */
-   if (!ether_addr_equal(hdr->addr2, sta->sdata->vif.addr))
-   continue;
-
+   sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+   if (sta)
status.sta = &sta->sta;
-   break;
-   }
 
__ieee80211_tx_status(hw, &status);
rcu_read_unlock();


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v9 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-11-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the station
from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   16 
 net/mac80211/status.c  |   26 ++
 net/mac80211/tx.c  |   18 ++
 3 files changed, 60 insertions(+)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ba3f33cc41ea..dcb4a1f19829 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1060,6 +1060,22 @@ struct ieee80211_tx_info {
};
 };
 
+static inline u16
+ieee80211_info_set_tx_time_est(struct ieee80211_tx_info *info, u16 tx_time_est)
+{
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4us and clamp the maximum to 2**12-1
+*/
+   info->tx_time_est = min_t(u16, tx_time_est, 4095) >> 2;
+   return info->tx_time_est;
+}
+
+static inline u16
+ieee80211_info_get_tx_time_est(struct ieee80211_tx_info *info)
+{
+   return info->tx_time_est << 2;
+}
+
 /**
  * struct ieee80211_tx_status - extended tx status info for rate control
  *
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 0e51def35b8a..39da82b35be9 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -670,12 +670,26 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
  struct sk_buff *skb, bool dropped)
 {
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+   u16 tx_time_est = ieee80211_info_get_tx_time_est(info);
struct ieee80211_hdr *hdr = (void *)skb->data;
bool acked = info->flags & IEEE80211_TX_STAT_ACK;
 
if (dropped)
acked = false;
 
+   if (tx_time_est) {
+   struct sta_info *sta;
+
+   rcu_read_lock();
+
+   sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+   ieee80211_sta_update_pending_airtime(local, sta,
+skb_get_queue_mapping(skb),
+tx_time_est,
+true);
+   rcu_read_unlock();
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -877,6 +891,7 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
struct ieee80211_bar *bar;
int shift = 0;
int tid = IEEE80211_NUM_TIDS;
+   u16 tx_time_est;
 
rates_idx = ieee80211_tx_get_rates(hw, info, &retry_count);
 
@@ -986,6 +1001,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
ieee80211_sta_register_airtime(&sta->sta, tid,
   info->status.tx_time, 0);
 
+   if ((tx_time_est = ieee80211_info_get_tx_time_est(info)) > 0) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
skb_get_queue_mapping(skb),
+tx_time_est,
+true);
+   ieee80211_info_set_tx_time_est(info, 0);
+   }
+
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
if (info->flags & IEEE80211_TX_STAT_ACK) {
if (sta->status_stats.lost_packets)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index aafc67b562eb..2fb6571453e7 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3551,6 +3551,9 @@ struct sk_buff *ieee80211_tx

[PATCH v9 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-11-15 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing), but Kan has tested in on the QCA9984 platform.

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-09

[0] 
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7

Changelog:

v9:
  - Use atomic_sub_return() instead of separate atomic_sub() and atomic_read()
  - Add getter/setter for tx_time_est
  - Use get_sta_info_by_addrs() to find the station in
ieee80211_report_used_skb()
  - Integrate everything back into one series

v8:
  - Includes Toke's v7 version of "mac80211: Import airtime calculation code 
from mt76"
  - Don't clobber sta's customized queue limit when configuring the default via 
debugfs
  - Fix a racing condition when reset aql_tx_pending.

v7:
  - Fix aql_total_pending_airtime underflow due to insufficient locking.

v6:
  - Fix sta lookup in ieee80211_report_used_skb().
  - Move call to ieee80211_sta_update_pending_airtime() to a bit later in
__ieee80211_tx_status() 
v5:
  - Add missing export of ieee80211_calc_rx_airtime() and make
ieee80211_calc_tx_airtime_rate() static (kbuildbot).
  - Use skb_get_queue_mapping() to get the AC from the skb.
  - Take basic rate configuration for the BSS into account when calculating
multicast rate.
v4:
  - Fix calculation that clamps the maximum airtime to fit into 10 bits
  - Incorporate Rich Brown's nits for the commit message in Kan's patch
  - Add fewer local variables to ieee80211_tx_dequeue()
v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
      mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Add new sta_info getter by sta/vif addrs
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   57 
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  597 
 net/mac80211/debugfs.c |   85 ++
 net/mac80211/debugfs_sta.c |   43 ++-
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|   10 +
 net/mac80211/sta_info.c|   58 
 net/mac80211/sta_info.h|   11 +
 net/mac80211/status.c  |   36 ++-
 net/mac80211/tx.c  |   65 +
 12 files changed, 957 insertions(+), 23 deletions(-)
 create mode 100644 net/mac80211/airtime.c


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v9 2/4] mac80211: Import airtime calculation code from mt76

2019-11-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
be usable in mac80211. This involves:

- Switching to mac80211 data structures.
- Adding support for 160 MHz channels and HE mode.
- Moving the symbol and duration calculations around a bit to avoid
  rounding with the higher rates and longer symbol times used for HE rates.

The per-rate TX rate calculation is also split out to its own function so
it can be used directly for the AQL calculations later.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   29 ++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  597 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c643a19dce96..6fc26a051ba0 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status: &struct ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..0c2a0bb94727
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,597 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of kilo-symbols (symbols * 1024) for a packet with (bps) bits per
+ * symbol. We use k-symbols to avoid rounding in the _TIME macros below.
+ */
+#define MCS_N_KSYMS(bps) DIV_ROUND_UP(MCS_NBITS << 10, (bps))
+
+/* Transmission time (in 1024 * usec) for a packet containing (ksyms) * 1024
+ * symbols.
+ */
+#define MCS_SYMBOL_TIME(sgi, ksyms)\
+   (sgi ?  \
+ ((ksyms) * 4 * 18) / 20 : /* 3.6 us per sym */\
+ ((ksyms) * 4) /* 4.0 us per sym */\
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   ((u32)MCS_SYMBOL_TIME(sgi, MCS_N_KSYMS((streams) * (bps
+
+#define MCS_DURATION_S(shift, streams, sgi, bps)   \
+   ((u16)((MCS_DURATION(streams, sgi, bps) >> shift)))
+
+/* These should match the values in enum nl80211_he_gi */
+#define HE_GI_08 0
+#define HE_GI_16 1
+#define HE_GI_32 2
+
+/* Transmission time (1024 usec) for a packet containing (ksyms) * k-symbols */
+#define HE_SYMBOL_TIME(gi, ksyms)  \
+   (gi == HE_GI_08 ?   \
+((ksyms) * 16 * 17) / 20 : /* 13.6 us per sym */   \
+(gi == HE_GI_16 ?  \
+ ((ksyms) * 16 * 18) / 20 :/* 14.4 us per sym */   \
+ ((ksyms) * 16)/* 16.0 us per sym */   \
+))
+
+/* Transmi

[PATCH v7] mac80211: Import airtime calculation code from mt76

2019-11-11 Thread Toke Høiland-Jørgensen
Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
be usable in mac80211. This involves:

- Switching to mac80211 data structures.
- Adding support for 160 MHz channels and HE mode.
- Moving the symbol and duration calculations around a bit to avoid
  rounding with the higher rates and longer symbol times used for HE rates.

The per-rate TX rate calculation is also split out to its own function so
it can be used directly for the AQL calculations later.

Signed-off-by: Toke Høiland-Jørgensen 
---
I realised this patch was fairly self-contained, so I thought I would
just go ahead and send it on its own instead of waiting for Kan to update
his patch.

Someone *please* check my math oh the HE mode stuff; I've done my best
to get this right based on public sources, but I have no hardware to
test it on.


 include/net/mac80211.h |  29 ++
 net/mac80211/Makefile  |   3 +-
 net/mac80211/airtime.c | 597 +
 net/mac80211/ieee80211_i.h |   4 +
 4 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c643a19dce96..6fc26a051ba0 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status: &struct ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..0c2a0bb94727
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,597 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of kilo-symbols (symbols * 1024) for a packet with (bps) bits per
+ * symbol. We use k-symbols to avoid rounding in the _TIME macros below.
+ */
+#define MCS_N_KSYMS(bps) DIV_ROUND_UP(MCS_NBITS << 10, (bps))
+
+/* Transmission time (in 1024 * usec) for a packet containing (ksyms) * 1024
+ * symbols.
+ */
+#define MCS_SYMBOL_TIME(sgi, ksyms)\
+   (sgi ?  \
+ ((ksyms) * 4 * 18) / 20 : /* 3.6 us per sym */\
+ ((ksyms) * 4) /* 4.0 us per sym */\
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   ((u32)MCS_SYMBOL_TIME(sgi, MCS_N_KSYMS((streams) * (bps
+
+#define MCS_DURATION_S(shift, streams, sgi, bps)   \
+   ((u16)((MCS_DURATION(streams, sgi, bps) >> shift)))
+
+/* These should match the values in enum nl80211_he_gi */
+#define HE_GI_08 0
+#define HE_GI_16 1
+#define HE_GI_32 2
+
+/* Transmission time (1024 usec) for a packet containing (ksyms) * k-symbols */
+#define HE_SYMBOL_TIME(gi, ksyms)  \
+   (gi == HE_GI_08 ?   \
+((ksyms) * 16 * 17) / 20 :

Re: [PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-09 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> It is most likely just insufficient locking. active_txq_lock is per
> AC, can't protect local->aql_total_pending_airtime against racing
> conditions:
> void ieee80211_sta_update_pending_airtime(...)
> {
> spin_lock_bh(&local->active_txq_lock[ac]);
> ...
> local->aql_total_pending_airtime -= tx_airtime;
> ...
> spin_unlock_bh(&local->active_txq_lock[ac]);
> }

Ohh, right; didn't even realise those were not per-AC as well...

> After changing it to atomic_t, no more aql_total_pending_airtime
> underflow so far :). Using atomic operation should also help reduce
> CPU overhead due to lock contention, as
> ieee80211_sta_update_pending_airtime() is often called from the tx
> completion routine triggered by interrupts, often in a different core
> than where __ieee80211_schedule_txq() is running.
>
> I will post a new version a bit later if the test goes well.

Awesome! :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-08 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-11-08 at 11:56 +0100, Toke Høiland-Jørgensen wrote:
>> Johannes Berg  writes:
>> 
>> > On Wed, 2019-10-23 at 11:59 +0200, Toke Høiland-Jørgensen wrote:
>> > >  
>> > > +void ieee80211_sta_update_pending_airtime(struct ieee80211_local *local,
>> > > +  struct sta_info *sta, u8 ac,
>> > > +  u16 tx_airtime, bool 
>> > > tx_completed)
>> > > +{
>> > > +spin_lock_bh(&local->active_txq_lock[ac]);
>> > > +if (tx_completed) {
>> > > +if (sta) {
>> > > +if (WARN_ONCE(sta->airtime[ac].aql_tx_pending < 
>> > > tx_airtime,
>> > > +  "TXQ pending airtime underflow: 
>> > > %u, %u",
>> > > +  sta->airtime[ac].aql_tx_pending, 
>> > > tx_airtime))
>> > 
>> > Maybe add the STA/AC to the message?
>> 
>> Can do. Any idea why we might be seeing underflows (as Kan reported)?
>
> No, I really have no idea. The shifting looked OK to me, though I didn't
> review it carefully enough to say I've really looked at all places ...

Right, bugger. I was thinking maybe there's a case where skbs can be
cloned (and retain the tx_time_est field) and then released twice? Or
maybe somewhere that steps on the skb->cb field in some other way?
Couldn't find anything obvious on a first perusal of the TX path code,
but maybe you could think of something?

Otherwise I guess we'll be forced to go and do some actual,
old-fashioned debugging ;)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-11-08 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Wed, 2019-10-23 at 11:59 +0200, Toke Høiland-Jørgensen wrote:
>> 
>> +if (info->tx_time_est) {
>> +struct sta_info *sta = NULL, *s;
>> +struct rhlist_head *tmp;
>> +
>> +rcu_read_lock();
>> +
>> +for_each_sta_info(local, hdr->addr1, s, tmp) {
>> +/* skip wrong virtual interface */
>> +if (!ether_addr_equal(hdr->addr2, s->sdata->vif.addr))
>> +continue;
>> +
>> +sta = s;
>> +break;
>> +}
>
> I guess that is better than looking up the sdata and then using
> sta_info_get(), but I think I'd like to see this wrapped into a function
> (even if it's an inline) in sta_info.{c,h}.

OK, can do.

>> +airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
>> + skb->len);
>> +if (airtime) {
>> +/* We only have 10 bits in tx_time_est, so store airtime
>> + * in increments of 4us and clamp the maximum to 2**12-1
>> + */
>> +airtime = min_t(u32, airtime, 4095) & ~3U;
>> +info->tx_time_est = airtime >> 2;
>> +ieee80211_sta_update_pending_airtime(local, tx.sta,
>> + txq->ac, airtime,
>> + false);
>
> I wonder if it'd be better to pass the shifted value to
> ieee80211_sta_update_pending_airtime() to avoid all the shifting in all
> places?
>
> You could even store the shifted value in "aql_tx_pending" and
> "aql_total_pending_airtime" etc., it's completely equivalent, and only
> shift it out for people looking at debugfs.

My reasoning for doing it this way was that we have another set of APIs
dealing with airtime which doesn't do any shifting; so keeping the APIs
in the same unit (straight airtime) seemed less confusing.

We could add (inline) setter and getter functions for the tx_time_est
field instead to avoid sprinkling shifts all over the place? :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-11-08 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Wed, 2019-10-23 at 11:59 +0200, Toke Høiland-Jørgensen wrote:
>> 
>>  
>> +void ieee80211_sta_update_pending_airtime(struct ieee80211_local *local,
>> +  struct sta_info *sta, u8 ac,
>> +  u16 tx_airtime, bool tx_completed)
>> +{
>> +spin_lock_bh(&local->active_txq_lock[ac]);
>> +if (tx_completed) {
>> +if (sta) {
>> +if (WARN_ONCE(sta->airtime[ac].aql_tx_pending < 
>> tx_airtime,
>> +  "TXQ pending airtime underflow: %u, %u",
>> +  sta->airtime[ac].aql_tx_pending, 
>> tx_airtime))
>
> Maybe add the STA/AC to the message?

Can do. Any idea why we might be seeing underflows (as Kan reported)?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 2/4] mac80211: Import airtime calculation code from mt76

2019-11-08 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Wed, 2019-10-23 at 11:59 +0200, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen 
>> 
>> Felix recently added code to calculate airtime of packets to the mt76
>> driver. Import this into mac80211 so we can use it for airtime queue limit
>> calculations later.
>> 
>> The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
>> use mac80211 data structures instead (which is fairly straight forward).
>> The per-rate TX rate calculation is split out to its own
>> function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
>> the AQL calculations added in a subsequent patch.
>
> Any way it could be exposed by mac80211 back to the drivers, perhaps, to
> share it?

Didn't I already export some of the functions? My intention was to do
that, certainly, and to patch mt76 to switch to using them once the
trees have converged...

>> The only thing that it was not possible to port directly was the bit that
>> read the internal driver flags of struct ieee80211_rate to determine
>> whether a rate is using CCK or OFDM encoding. Instead, just look at the
>> rate index, since at least mt76 and ath10k both seem to have the same
>> number of CCK rates (4) in their tables.
>
> This is highly questionable ...
>
>> +switch (status->encoding) {
>> +case RX_ENC_LEGACY:
>> +if (WARN_ON_ONCE(status->band > NL80211_BAND_5GHZ))
>> +return 0;
>> +
>> +sband = hw->wiphy->bands[status->band];
>> +if (!sband || status->rate_idx > sband->n_bitrates)
>> +return 0;
>> +
>> +rate = &sband->bitrates[status->rate_idx];
>> +cck = (status->rate_idx < CCK_NUM_RATES);

Heh, yeah this did feel like a hack to me as well ;)

> Why not
>
>   cck = rate->flags & IEEE80211_RATE_MANDATORY_B;
>
> I mean .. we know that IEEE80211_RATE_MANDATORY_B rates are exactly the
> CCK rates, and that's not really going to change?
>
> Alternatively, we could do
>
>   cck = sband->band == NL80211_BAND_2GHZ &&
> !(rate->flags & IEEE80211_RATE_ERP_G);
>
> or even
>
>   cck = rate->bitrate == 10 || rate->bitrate == 20 ||
> rate->bitrate == 55 || rate->bitrate == 110;

I am fine with either of those; I just wasn't sure what assumptions I
could actually make here. I guess I'll just pick one :)

>> +default:
>> +WARN_ON_ONCE(1);
>
> You can't do that in mac80211 either. That might be fine for mt76, but
> mac80211 already supports HE.

Good point, will fix.

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v6 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-11-07 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> Patchset v6 works for me with ath10k driver.  AQL does its job as
> expected and tests show very significant reduction in latency in
> congested environment. The txq stuck issue in patchset v4 got fixed.

Awesome! Thank you for testing!

> However, the device's total pending airtime count still underflows
> sometimes. Even though it doesn't cause apparent side effect, there is
> some issue with the pending airtime update and needs further
> debugging.

Huh, it *under*flows? That's... interesting. Cloned SKBs? Or maybe the
tx_time_est field is being set in some other place?

Could I get you to add a dump_stack() to the underflow test so we can
get an idea of where that happens?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-23 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>>
>> Aha! Turns out I was doing the sta lookup completely wrong in
>> ieee80211_report_used_skb(); so anything frames that were dropped and
>> went through there would not get its airtime subtracted correctly. Will
>> send a v6 with a fix :)
>
> Awesome, thanks! That looks very promising.  The symptom I see with
> previous patch is the interface's pending airtime count looks fine, but the
> station's airtime get wrong, either due to airtime is credited to the wrong
> station or wrong AC.

Right, if the interface count is fine that means it's not a missing call
to the subtraction. So hopefully it will be fixed with v6 :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-23 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long packets stays in the queue, AKA
sojourn time. The sojourn time measured at the mac80211 layer doesn't
include queueing latency in the lower layer (firmware/hardware) and CoDel
expects lower layer to have a short queue. However, most 802.11ac chipsets
offload tasks such TX aggregation to firmware or hardware, thus have a deep
lower layer queue.

Without a mechanism to control the lower layer queue size, packets only
stay in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

The Byte Queue Limits (BQL) mechanism is commonly used to address the
similar issue with wired network interface. However, this method cannot be
applied directly to the wireless network interface. "Bytes" is not a
suitable measure of queue depth in the wireless network, as the data rate
can vary dramatically from station to station in the same network, from a
few Mbps to over Gbps.

This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.

Signed-off-by: Kan Yan 
[ Toke: Keep API to set pending airtime internal, fix nits in commit msg ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 +++
 net/mac80211/debugfs.c |   78 
 net/mac80211/debugfs_sta.c |   43 +++-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 ++
 net/mac80211/sta_info.h|8 +
 net/mac80211/tx.c  |   46 --
 9 files changed, 225 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ff45c3e1abff..8d50c0a60dbd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2602,6 +2602,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 8a3e0544a026..2bc0f2538a36 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..d77ea0e51c1d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE802

[PATCH v6 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-23 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the station
from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/status.c |   33 +
 net/mac80211/tx.c |   21 +
 2 files changed, 54 insertions(+)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..ae15c8fd2421 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -676,6 +676,28 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
if (dropped)
acked = false;
 
+   if (info->tx_time_est) {
+   struct sta_info *sta = NULL, *s;
+   struct rhlist_head *tmp;
+
+   rcu_read_lock();
+
+   for_each_sta_info(local, hdr->addr1, s, tmp) {
+   /* skip wrong virtual interface */
+   if (!ether_addr_equal(hdr->addr2, s->sdata->vif.addr))
+   continue;
+
+   sta = s;
+   break;
+   }
+
+   ieee80211_sta_update_pending_airtime(local, sta,
+skb_get_queue_mapping(skb),
+info->tx_time_est << 2,
+true);
+   rcu_read_unlock();
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -986,6 +1008,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
ieee80211_sta_register_airtime(&sta->sta, tid,
   info->status.tx_time, 0);
 
+   if (info->tx_time_est) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
skb_get_queue_mapping(skb),
+info->tx_time_est 
<< 2,
+true);
+   info->tx_time_est = 0;
+   }
+
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
if (info->flags & IEEE80211_TX_STAT_ACK) {
if (sta->status_stats.lost_packets)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 12653d873b8c..1405304d8994 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3542,6 +3542,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
 
WARN_ON_ONCE(softirq_count() == 0);
 
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return NULL;
+
 begin:
spin_lock_bh(&fq->lock);
 
@@ -3652,6 +3655,24 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
}
 
IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+   if (local->airtime_flags & AIRTIME_USE_AQL) {
+   u32 airtime;
+
+   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+skb->len);
+   if (airtime) {
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4us and clamp the maximum to 2**12-1
+*/
+   airtime = min_t(u32, airtime, 4095) & ~3U;
+   info->tx_time_est = airtime >> 2;
+   ieee80211_sta_update_pending_airtime(local, tx.sta,
+txq->ac, airtime,
+

[PATCH v6 2/4] mac80211: Import airtime calculation code from mt76

2019-10-23 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations later.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
use mac80211 data structures instead (which is fairly straight forward).
The per-rate TX rate calculation is split out to its own
function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
the AQL calculations added in a subsequent patch.

The only thing that it was not possible to port directly was the bit that
read the internal driver flags of struct ieee80211_rate to determine
whether a rate is using CCK or OFDM encoding. Instead, just look at the
rate index, since at least mt76 and ath10k both seem to have the same
number of CCK rates (4) in their tables.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   29 +++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  390 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 425 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 4288ace72c2b..8a3e0544a026 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status: &struct ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..a521f3730381
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of symbols for a packet with (bps) bits per symbol */
+#define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps))
+
+/* Transmission time (1024 usec) for a packet containing (syms) * symbols */
+#define MCS_SYMBOL_TIME(sgi, syms) \
+   (sgi ?  \
+ ((syms) * 18 * 1024 + 4 * 1024) / 5 : /* syms * 3.6 us */ \
+ ((syms) * 1024) << 2  /* syms * 4 us */   \
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps)))
+
+#define BW_20  0
+#define BW_40  1
+#define BW_80  2
+
+/*
+ * Define group sort order: HT40 -> SGI -> #streams
+ */
+#define IEEE80211_MAX_STREAMS  4
+#define IEEE80211_HT_STREAM_GROUPS 4 /* BW(=2) * SGI(=2) */
+#define IEEE80211_VHT_STREAM_GROUPS6 /* BW(=3) * SGI(=2) */
+
+#define IEEE80211_HT_GROUPS_NB (IEEE80211_MAX_STREAMS *\
+IEEE80211_HT_STREAM_GROUPS)
+#define IEEE80211_VHT_GROUPS_NB(IEEE80211_MAX_STREAMS *\
+IEEE80211_VHT_STREAM_GROUPS)
+#define IEEE80211_GROUPS_NB(IEEE80211_HT_GROUPS_NB +   \
+IEEE80211_VHT_GROUPS_NB

[PATCH v6 1/4] mac80211: Shrink the size of ack_frame_id to make room for tx_time_est

2019-10-23 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

To get this, decrease the size of the ack_frame_id field to 6 bits, and
lower the size of the ID space accordingly. This leaves 10 bits for use for
tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
values so they become units of 4us.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |4 +++-
 net/mac80211/cfg.c |2 +-
 net/mac80211/tx.c  |2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d69081c38788..4288ace72c2b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -967,6 +967,7 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate 
*rate)
  * @band: the band to transmit on (use for checking for races)
  * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the 
AC
  * @ack_frame_id: internal frame ID for TX status, used internally
+ * @tx_time_est: TX time estimate in units of 4us, used internally
  * @control: union part for control data
  * @control.rates: TX rates array to try
  * @control.rts_cts_rate_idx: rate for RTS or CTS
@@ -1007,7 +1008,8 @@ struct ieee80211_tx_info {
 
u8 hw_queue;
 
-   u16 ack_frame_id;
+   u16 ack_frame_id:6;
+   u16 tx_time_est:10;
 
union {
struct {
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 70739e746c13..4fb7f1f12109 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3428,7 +3428,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local 
*local, struct sk_buff *skb,
 
spin_lock_irqsave(&local->ack_status_lock, spin_flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
 
if (id < 0) {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 938c10f7955b..a16c2f863702 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -2719,7 +2719,7 @@ static struct sk_buff *ieee80211_build_hdr(struct 
ieee80211_sub_if_data *sdata,
 
spin_lock_irqsave(&local->ack_status_lock, flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, flags);
 
if (id >= 0) {


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v6 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-10-23 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing). So I'm hoping someone with a proper testing setup can take the whole
thing for a spin... :)

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-06

Changelog:

v6:
  - Fix sta lookup in ieee80211_report_used_skb().
  - Move call to ieee80211_sta_update_pending_airtime() to a bit later in
__ieee80211_tx_status() 
v5:
  - Add missing export of ieee80211_calc_rx_airtime() and make
ieee80211_calc_tx_airtime_rate() static (kbuildbot).
  - Use skb_get_queue_mapping() to get the AC from the skb.
  - Take basic rate configuration for the BSS into account when calculating
multicast rate.
v4:
  - Fix calculation that clamps the maximum airtime to fit into 10 bits
  - Incorporate Rich Brown's nits for the commit message in Kan's patch
  - Add fewer local variables to ieee80211_tx_dequeue()
v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
  mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Shrink the size of ack_frame_id to make room for tx_time_est
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   45 +
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  390 
 net/mac80211/cfg.c |2 
 net/mac80211/debugfs.c |   78 +
 net/mac80211/debugfs_sta.c |   43 -
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 
 net/mac80211/sta_info.h|8 +
 net/mac80211/status.c  |   33 
 net/mac80211/tx.c  |   69 +++-
 13 files changed, 709 insertions(+), 18 deletions(-)
 create mode 100644 net/mac80211/airtime.c


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-23 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Kan Yan  writes:
>
>>> >> +   if (ieee80211_is_data_qos(hdr->frame_control)) {
>>> >> +   qc = ieee80211_get_qos_ctl(hdr);
>>> >> +   tid = qc[0] & 0xf;
>>> >> +   ac = ieee80211_ac_from_tid(tid);
>>> >> +   } else {
>>> >> +   ac = IEEE80211_AC_BE;
>>> >> +   }
>>> >
>>> > The tid/ac is incorrect either here or in __ieee80211_tx_status() when
>>> > tested with ath10k. The ac is set to AC_BE with test done using BK
>>> > class traffic, hence the pending airtime get updated for the wrong
>>> > txq.
>>>
>>> Huh, well that won't do, obviously :)
>>>
>>> Any idea why it might be wrong?
>>
>> somehow  ieee80211_is_data_qos() returns false. Besides,  qos_control
>> field doesn't seems to be set in ieee80211_build_hdr().
>>
>>> Hmm, I guess we could just get the ac using skb_get_queue_mapping().
>>> I'll send an update with this fixed for you to try :)
>> Thanks for the quick update. It is getting much better, but
>> unfortunately the pending airtime accounting sometimes is still not
>> correct and cause txq stuck occasionally.
>
> OK, so that has to mean that there are packets getting dropped somewhere
> without going through ieee80211_report_used_skb(). Assuming you're not
> hitting the underflow warnings, just seeing the counter not get back
> down to zero?
>
> Could you see if you can find out if ath10k does that anywhere? I'll go
> hunting in mac80211.
>
> Looking for calls to kfree_skb() or kfree_skb_list() should hopefully
> turn up something...

Aha! Turns out I was doing the sta lookup completely wrong in
ieee80211_report_used_skb(); so anything frames that were dropped and
went through there would not get its airtime subtracted correctly. Will
send a v6 with a fix :)

-Toke



Re: [PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-23 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> >> +   if (ieee80211_is_data_qos(hdr->frame_control)) {
>> >> +   qc = ieee80211_get_qos_ctl(hdr);
>> >> +   tid = qc[0] & 0xf;
>> >> +   ac = ieee80211_ac_from_tid(tid);
>> >> +   } else {
>> >> +   ac = IEEE80211_AC_BE;
>> >> +   }
>> >
>> > The tid/ac is incorrect either here or in __ieee80211_tx_status() when
>> > tested with ath10k. The ac is set to AC_BE with test done using BK
>> > class traffic, hence the pending airtime get updated for the wrong
>> > txq.
>>
>> Huh, well that won't do, obviously :)
>>
>> Any idea why it might be wrong?
>
> somehow  ieee80211_is_data_qos() returns false. Besides,  qos_control
> field doesn't seems to be set in ieee80211_build_hdr().
>
>> Hmm, I guess we could just get the ac using skb_get_queue_mapping().
>> I'll send an update with this fixed for you to try :)
> Thanks for the quick update. It is getting much better, but
> unfortunately the pending airtime accounting sometimes is still not
> correct and cause txq stuck occasionally.

OK, so that has to mean that there are packets getting dropped somewhere
without going through ieee80211_report_used_skb(). Assuming you're not
hitting the underflow warnings, just seeing the counter not get back
down to zero?

Could you see if you can find out if ath10k does that anywhere? I'll go
hunting in mac80211.

Looking for calls to kfree_skb() or kfree_skb_list() should hopefully
turn up something...

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v5 2/4] mac80211: Import airtime calculation code from mt76

2019-10-22 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations later.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
use mac80211 data structures instead (which is fairly straight forward).
The per-rate TX rate calculation is split out to its own
function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
the AQL calculations added in a subsequent patch.

The only thing that it was not possible to port directly was the bit that
read the internal driver flags of struct ieee80211_rate to determine
whether a rate is using CCK or OFDM encoding. Instead, just look at the
rate index, since at least mt76 and ath10k both seem to have the same
number of CCK rates (4) in their tables.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   29 +++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  390 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 425 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 4288ace72c2b..8a3e0544a026 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,33 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_rx_airtime - calculate estimated transmission airtime for RX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the RX status struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @status: &struct ieee80211_rx_status containing the transmission rate
+ *  information.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_rx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_rx_status *status,
+ int len);
+
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime for TX.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..a521f3730381
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of symbols for a packet with (bps) bits per symbol */
+#define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps))
+
+/* Transmission time (1024 usec) for a packet containing (syms) * symbols */
+#define MCS_SYMBOL_TIME(sgi, syms) \
+   (sgi ?  \
+ ((syms) * 18 * 1024 + 4 * 1024) / 5 : /* syms * 3.6 us */ \
+ ((syms) * 1024) << 2  /* syms * 4 us */   \
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps)))
+
+#define BW_20  0
+#define BW_40  1
+#define BW_80  2
+
+/*
+ * Define group sort order: HT40 -> SGI -> #streams
+ */
+#define IEEE80211_MAX_STREAMS  4
+#define IEEE80211_HT_STREAM_GROUPS 4 /* BW(=2) * SGI(=2) */
+#define IEEE80211_VHT_STREAM_GROUPS6 /* BW(=3) * SGI(=2) */
+
+#define IEEE80211_HT_GROUPS_NB (IEEE80211_MAX_STREAMS *\
+IEEE80211_HT_STREAM_GROUPS)
+#define IEEE80211_VHT_GROUPS_NB(IEEE80211_MAX_STREAMS *\
+IEEE80211_VHT_STREAM_GROUPS)
+#define IEEE80211_GROUPS_NB(IEEE80211_HT_GROUPS_NB +   \
+IEEE80211_VHT_GROUPS_NB

[PATCH v5 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-10-22 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing). So I'm hoping someone with a proper testing setup can take the whole
thing for a spin... :)

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-02

Changelog:

v5:
  - Add missing export of ieee80211_calc_rx_airtime() and make
ieee80211_calc_tx_airtime_rate() static (kbuildbot).
  - Use skb_get_queue_mapping() to get the AC from the skb.
  - Take basic rate configuration for the BSS into account when calculating
multicast rate.
v4:
  - Fix calculation that clamps the maximum airtime to fit into 10 bits
  - Incorporate Rich Brown's nits for the commit message in Kan's patch
  - Add fewer local variables to ieee80211_tx_dequeue()
v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
  mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Shrink the size of ack_frame_id to make room for tx_time_est
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   45 +
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  390 
 net/mac80211/cfg.c |2 
 net/mac80211/debugfs.c |   78 +
 net/mac80211/debugfs_sta.c |   43 -
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 
 net/mac80211/sta_info.h|8 +
 net/mac80211/status.c  |   28 +++
 net/mac80211/tx.c  |   69 +++-
 13 files changed, 704 insertions(+), 18 deletions(-)
 create mode 100644 net/mac80211/airtime.c



[PATCH v5 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-22 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long packets stays in the queue, AKA
sojourn time. The sojourn time measured at the mac80211 layer doesn't
include queueing latency in the lower layer (firmware/hardware) and CoDel
expects lower layer to have a short queue. However, most 802.11ac chipsets
offload tasks such TX aggregation to firmware or hardware, thus have a deep
lower layer queue.

Without a mechanism to control the lower layer queue size, packets only
stay in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

The Byte Queue Limits (BQL) mechanism is commonly used to address the
similar issue with wired network interface. However, this method cannot be
applied directly to the wireless network interface. "Bytes" is not a
suitable measure of queue depth in the wireless network, as the data rate
can vary dramatically from station to station in the same network, from a
few Mbps to over Gbps.

This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.

Signed-off-by: Kan Yan 
[ Toke: Keep API to set pending airtime internal, fix nits in commit msg ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 +++
 net/mac80211/debugfs.c |   78 
 net/mac80211/debugfs_sta.c |   43 +++-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 ++
 net/mac80211/sta_info.h|8 +
 net/mac80211/tx.c  |   46 --
 9 files changed, 225 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ff45c3e1abff..8d50c0a60dbd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2602,6 +2602,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 8a3e0544a026..2bc0f2538a36 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..d77ea0e51c1d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE802

[PATCH v5 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-22 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the station
from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/status.c |   28 
 net/mac80211/tx.c |   21 +
 2 files changed, 49 insertions(+)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..f125984d5beb 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -676,6 +676,23 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
if (dropped)
acked = false;
 
+   if (info->tx_time_est) {
+   struct ieee80211_sub_if_data *sdata;
+   struct sta_info *sta = NULL;
+
+   rcu_read_lock();
+
+   sdata = ieee80211_sdata_from_skb(local, skb);
+   if (sdata)
+   sta = sta_info_get_bss(sdata, skb_mac_header(skb));
+
+   ieee80211_sta_update_pending_airtime(local, sta,
+skb_get_queue_mapping(skb),
+info->tx_time_est << 2,
+true);
+   rcu_read_unlock();
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -930,6 +947,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
tid = qc[0] & 0xf;
}
 
+   if (info->tx_time_est) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
skb_get_queue_mapping(skb),
+info->tx_time_est 
<< 2,
+true);
+   info->tx_time_est = 0;
+   }
+
if (!acked && ieee80211_is_back_req(fc)) {
u16 control;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 12653d873b8c..1405304d8994 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3542,6 +3542,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
 
WARN_ON_ONCE(softirq_count() == 0);
 
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return NULL;
+
 begin:
spin_lock_bh(&fq->lock);
 
@@ -3652,6 +3655,24 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
}
 
IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+   if (local->airtime_flags & AIRTIME_USE_AQL) {
+   u32 airtime;
+
+   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+skb->len);
+   if (airtime) {
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4us and clamp the maximum to 2**12-1
+*/
+   airtime = min_t(u32, airtime, 4095) & ~3U;
+   info->tx_time_est = airtime >> 2;
+   ieee80211_sta_update_pending_airtime(local, tx.sta,
+txq->ac, airtime,
+false);
+   }
+   }
+
return skb;
 
 out:



[PATCH v5 1/4] mac80211: Shrink the size of ack_frame_id to make room for tx_time_est

2019-10-22 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

To get this, decrease the size of the ack_frame_id field to 6 bits, and
lower the size of the ID space accordingly. This leaves 10 bits for use for
tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
values so they become units of 4us.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |4 +++-
 net/mac80211/cfg.c |2 +-
 net/mac80211/tx.c  |2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d69081c38788..4288ace72c2b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -967,6 +967,7 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate 
*rate)
  * @band: the band to transmit on (use for checking for races)
  * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the 
AC
  * @ack_frame_id: internal frame ID for TX status, used internally
+ * @tx_time_est: TX time estimate in units of 4us, used internally
  * @control: union part for control data
  * @control.rates: TX rates array to try
  * @control.rts_cts_rate_idx: rate for RTS or CTS
@@ -1007,7 +1008,8 @@ struct ieee80211_tx_info {
 
u8 hw_queue;
 
-   u16 ack_frame_id;
+   u16 ack_frame_id:6;
+   u16 tx_time_est:10;
 
union {
struct {
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 70739e746c13..4fb7f1f12109 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3428,7 +3428,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local 
*local, struct sk_buff *skb,
 
spin_lock_irqsave(&local->ack_status_lock, spin_flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
 
if (id < 0) {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 938c10f7955b..a16c2f863702 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -2719,7 +2719,7 @@ static struct sk_buff *ieee80211_build_hdr(struct 
ieee80211_sub_if_data *sdata,
 
spin_lock_irqsave(&local->ack_status_lock, flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, flags);
 
if (id >= 0) {



Re: [PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-22 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> +   if (ieee80211_is_data_qos(hdr->frame_control)) {
>> +   qc = ieee80211_get_qos_ctl(hdr);
>> +   tid = qc[0] & 0xf;
>> +   ac = ieee80211_ac_from_tid(tid);
>> +   } else {
>> +   ac = IEEE80211_AC_BE;
>> +   }
>
> The tid/ac is incorrect either here or in __ieee80211_tx_status() when
> tested with ath10k. The ac is set to AC_BE with test done using BK
> class traffic,  hence the pending airtime get updated for the wrong
> txq.

Hmm, I guess we could just get the ac using skb_get_queue_mapping().
I'll send an update with this fixed for you to try :)

-Toke



Re: [PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-22 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> +   if (ieee80211_is_data_qos(hdr->frame_control)) {
>> +   qc = ieee80211_get_qos_ctl(hdr);
>> +   tid = qc[0] & 0xf;
>> +   ac = ieee80211_ac_from_tid(tid);
>> +   } else {
>> +   ac = IEEE80211_AC_BE;
>> +   }
>
> The tid/ac is incorrect either here or in __ieee80211_tx_status() when
> tested with ath10k. The ac is set to AC_BE with test done using BK
> class traffic, hence the pending airtime get updated for the wrong
> txq.

Huh, well that won't do, obviously :)

Any idea why it might be wrong?

> The rest of the patch seems to work as expected, after I did a quick
> hack to release the pending airtime from ath10k_txrx_tx_unref()
> instead, where the ac/tid can be directly retrieved from struck struct
> ieee80211_tx.

Awesome! Thanks for testing!

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v4 0/2] Implement Airtime-based Queue Limit (AQL)

2019-10-21 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> (please don't top post)
>
> Kan Yan  writes:
>
>> I believe Toke will integrate this with his version and move the
>> estimating pending airtime part to mac80211, so maybe in the next
>> version, ath10k change is no longer required.
>
> What do you mean? Are you saying that I can drop this patch:
>
> [v4,2/2] ath10k: Enable Airtime-based Queue Limit (AQL)
>
> https://patchwork.kernel.org/patch/11184783/

Yes, we're trying to do it all in mac80211. See
https://patchwork.kernel.org/project/linux-wireless/list/?series=190333

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v4 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-19 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the
station from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/status.c |   38 ++
 net/mac80211/tx.c |   21 +
 2 files changed, 59 insertions(+)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..905b4afd457d 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -676,6 +676,33 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
if (dropped)
acked = false;
 
+   if (info->tx_time_est) {
+   struct ieee80211_sub_if_data *sdata;
+   struct sta_info *sta = NULL;
+   u8 *qc, ac;
+   int tid;
+
+   rcu_read_lock();
+
+   sdata = ieee80211_sdata_from_skb(local, skb);
+   if (sdata)
+   sta = sta_info_get_bss(sdata, skb_mac_header(skb));
+
+   if (ieee80211_is_data_qos(hdr->frame_control)) {
+   qc = ieee80211_get_qos_ctl(hdr);
+   tid = qc[0] & 0xf;
+   ac = ieee80211_ac_from_tid(tid);
+   } else {
+   ac = IEEE80211_AC_BE;
+   }
+
+   ieee80211_sta_update_pending_airtime(local, sta, ac,
+info->tx_time_est << 2,
+true);
+   rcu_read_unlock();
+
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -930,6 +957,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
tid = qc[0] & 0xf;
}
 
+   if (info->tx_time_est) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
ieee80211_ac_from_tid(tid),
+info->tx_time_est 
<< 2,
+true);
+   info->tx_time_est = 0;
+   }
+
if (!acked && ieee80211_is_back_req(fc)) {
u16 control;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 12653d873b8c..b8ff56d1d661 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3542,6 +3542,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
 
WARN_ON_ONCE(softirq_count() == 0);
 
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return NULL;
+
 begin:
spin_lock_bh(&fq->lock);
 
@@ -3652,6 +3655,24 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
}
 
IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+   if (local->airtime_flags & AIRTIME_USE_AQL) {
+   u32 airtime;
+
+   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+skb->len);
+   if (airtime) {
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4 us and clamp the maximum to 
2**12-1
+*/
+   airtime = min_t(u32, airtime, 4095) & ~3U;
+   info->tx_time_est = airtime >> 2;
+   ieee80211_sta_update_pending_airtime(local, tx.sta,
+txq->ac, airtime,
+false);
+   }
+   }
+
return skb;
 
 out:



[PATCH v4 1/4] mac80211: Shrink the size of ack_frame_id to make room for tx_time_est

2019-10-19 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

To get this, decrease the size of the ack_frame_id field to 6 bits, and
lower the size of the ID space accordingly. This leaves 10 bits for use for
tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
values so they become units of 4us.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |4 +++-
 net/mac80211/cfg.c |2 +-
 net/mac80211/tx.c  |2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d69081c38788..4288ace72c2b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -967,6 +967,7 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate 
*rate)
  * @band: the band to transmit on (use for checking for races)
  * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the 
AC
  * @ack_frame_id: internal frame ID for TX status, used internally
+ * @tx_time_est: TX time estimate in units of 4us, used internally
  * @control: union part for control data
  * @control.rates: TX rates array to try
  * @control.rts_cts_rate_idx: rate for RTS or CTS
@@ -1007,7 +1008,8 @@ struct ieee80211_tx_info {
 
u8 hw_queue;
 
-   u16 ack_frame_id;
+   u16 ack_frame_id:6;
+   u16 tx_time_est:10;
 
union {
struct {
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 70739e746c13..4fb7f1f12109 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3428,7 +3428,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local 
*local, struct sk_buff *skb,
 
spin_lock_irqsave(&local->ack_status_lock, spin_flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
 
if (id < 0) {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 938c10f7955b..a16c2f863702 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -2719,7 +2719,7 @@ static struct sk_buff *ieee80211_build_hdr(struct 
ieee80211_sub_if_data *sdata,
 
spin_lock_irqsave(&local->ack_status_lock, flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, flags);
 
if (id >= 0) {



[PATCH v4 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-10-19 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing). So I'm hoping someone with a proper testing setup can take the whole
thing for a spin... :)

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-02

Changelog:

v4:
  - Fix calculation that clamps the maximum airtime to fit into 10 bits
  - Incorporate Rich Brown's nits for the commit message in Kan's patch
  - Add fewer local variables to ieee80211_tx_dequeue()
v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
  mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Shrink the size of ack_frame_id to make room for tx_time_est
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   30 +++-
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  377 
 net/mac80211/cfg.c |2 
 net/mac80211/debugfs.c |   78 +
 net/mac80211/debugfs_sta.c |   43 -
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 
 net/mac80211/sta_info.h|8 +
 net/mac80211/status.c  |   38 
 net/mac80211/tx.c  |   69 
 13 files changed, 686 insertions(+), 18 deletions(-)
 create mode 100644 net/mac80211/airtime.c



[PATCH v4 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-19 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long packets stays in the queue, AKA
sojourn time. The sojourn time measured at the mac80211 layer doesn't
include queueing latency in the lower layer (firmware/hardware) and CoDel
expects lower layer to have a short queue. However, most 802.11ac chipsets
offload tasks such TX aggregation to firmware or hardware, thus have a deep
lower layer queue.

Without a mechanism to control the lower layer queue size, packets only
stay in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

The Byte Queue Limits (BQL) mechanism is commonly used to address the
similar issue with wired network interface. However, this method cannot be
applied directly to the wireless network interface. "Bytes" is not a
suitable measure of queue depth in the wireless network, as the data rate
can vary dramatically from station to station in the same network, from a
few Mbps to over Gbps.

This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.

Signed-off-by: Kan Yan 
[ Toke: Keep API to set pending airtime internal, fix nits in commit msg ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 +++
 net/mac80211/debugfs.c |   78 
 net/mac80211/debugfs_sta.c |   43 +++-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 ++
 net/mac80211/sta_info.h|8 +
 net/mac80211/tx.c  |   46 --
 9 files changed, 225 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ff45c3e1abff..8d50c0a60dbd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2602,6 +2602,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index f058386e3fef..fdfa5805e1cf 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..d77ea0e51c1d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE802

[PATCH v4 2/4] mac80211: Import airtime calculation code from mt76

2019-10-19 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations later.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
use mac80211 data structures instead (which is fairly straight forward).
The per-rate TX rate calculation is split out to its own
function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
the AQL calculations added in a subsequent patch.

The only thing that it was not possible to port directly was the bit that
read the internal driver flags of struct ieee80211_rate to determine
whether a rate is using CCK or OFDM encoding. Instead, just look at the
rate index, since at least mt76 and ath10k both seem to have the same
number of CCK rates (4) in their tables.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   14 ++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  377 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 397 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 4288ace72c2b..f058386e3fef 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,18 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..c8d0cee61366
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of symbols for a packet with (bps) bits per symbol */
+#define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps))
+
+/* Transmission time (1024 usec) for a packet containing (syms) * symbols */
+#define MCS_SYMBOL_TIME(sgi, syms) \
+   (sgi ?  \
+ ((syms) * 18 * 1024 + 4 * 1024) / 5 : /* syms * 3.6 us */ \
+ ((syms) * 1024) << 2  /* syms * 4 us */   \
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps)))
+
+#define BW_20  0
+#define BW_40  1
+#define BW_80  2
+
+/*
+ * Define group sort order: HT40 -> SGI -> #streams
+ */
+#define IEEE80211_MAX_STREAMS  4
+#define IEEE80211_HT_STREAM_GROUPS 4 /* BW(=2) * SGI(=2) */
+#define IEEE80211_VHT_STREAM_GROUPS6 /* BW(=3) * SGI(=2) */
+
+#define IEEE80211_HT_GROUPS_NB (IEEE80211_MAX_STREAMS *\
+IEEE80211_HT_STREAM_GROUPS)
+#define IEEE80211_VHT_GROUPS_NB(IEEE80211_MAX_STREAMS *\
+IEEE80211_VHT_STREAM_GROUPS)
+#define IEEE80211_GROUPS_NB(IEEE80211_HT_GROUPS_NB +   \
+IEEE80211_VHT_GROUPS_NB)
+
+#define IEEE80211_HT_GROUP_0   0
+#define IEEE80211_VHT_GROUP_0  (IEEE80211_HT_GROUP_0 + IEEE80211_HT_GROUPS_NB)
+
+#define MCS_GROUP_RATES10
+#define CCK_NUM_RATES  4
+
+#define HT_GROUP_IDX(_streams, _sgi, _ht40)\
+   IEEE80211_HT_GROUP_0 +  \
+   IEEE80211_MAX_STREAMS * 2 * _ht40 + \
+   IEEE80211_MAX_STREAMS * _sgi +  \
+   _streams - 1
+
+#define _MAX(a, b) (((a)>(b))?(a):(b))
+
+#define GROUP_SHIFT(duration)  \
+   _MAX(0, 16 - __builtin_clz(duration))
+
+/* MCS 

Re: [PATCH v3 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-19 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

>> +   if (local->airtime_flags & AIRTIME_USE_AQL) {
>> +   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, 
>> txq->sta,
>> +skb->len);
>> +   if (airtime) {
>> +   /* We only have 10 bits in tx_time_est, so store 
>> airtime
>> +* in increments of 4 us and clamp that to 2**10.
>> +*/
>> +   info->tx_time_est = min_t(u32, airtime >> 2, 1 << 
>> 10);
>> +   ieee80211_sta_update_pending_airtime(local, tx.sta, 
>> ac,
>> +airtime, false);
>> +   }
>> +   }
>> +
>
> It should be:
>  ieee80211_sta_update_pending_airtime(local, tx.sta, 
> ac,
>
> info->tx_time_est << 2, false);
>
> The airtime rounded to 4us (info->tx_time_est << 2), instead of the
> original airtime should be used when registering the pending airtime,
> to keep it consistent with airtime subtracted when the skb is freed.

Yes, I realised that last night as well. The rounding is also off (max
is 2**10-1, not 2**10. Will send a v4 :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v3 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-10-18 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing). So I'm hoping someone with a proper testing setup can take the whole
thing for a spin... :)

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-02

Changelog:

v3:
  - Move the tx_time_est field so it's shared with ack_frame_id, and use units
of 4us for the value stored in it.
  - Move the addition of the Ethernet header size into 
ieee80211_calc_expected_tx_airtime()
v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
  mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Shrink the size of ack_frame_id to make room for tx_time_est
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   30 +++-
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  377 
 net/mac80211/cfg.c |2 
 net/mac80211/debugfs.c |   78 +
 net/mac80211/debugfs_sta.c |   43 -
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 
 net/mac80211/sta_info.h|8 +
 net/mac80211/status.c  |   38 
 net/mac80211/tx.c  |   67 +++-
 13 files changed, 684 insertions(+), 18 deletions(-)
 create mode 100644 net/mac80211/airtime.c



[PATCH v3 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-18 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the
station from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/status.c |   38 ++
 net/mac80211/tx.c |   19 +++
 2 files changed, 57 insertions(+)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..905b4afd457d 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -676,6 +676,33 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
if (dropped)
acked = false;
 
+   if (info->tx_time_est) {
+   struct ieee80211_sub_if_data *sdata;
+   struct sta_info *sta = NULL;
+   u8 *qc, ac;
+   int tid;
+
+   rcu_read_lock();
+
+   sdata = ieee80211_sdata_from_skb(local, skb);
+   if (sdata)
+   sta = sta_info_get_bss(sdata, skb_mac_header(skb));
+
+   if (ieee80211_is_data_qos(hdr->frame_control)) {
+   qc = ieee80211_get_qos_ctl(hdr);
+   tid = qc[0] & 0xf;
+   ac = ieee80211_ac_from_tid(tid);
+   } else {
+   ac = IEEE80211_AC_BE;
+   }
+
+   ieee80211_sta_update_pending_airtime(local, sta, ac,
+info->tx_time_est << 2,
+true);
+   rcu_read_unlock();
+
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -930,6 +957,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
tid = qc[0] & 0xf;
}
 
+   if (info->tx_time_est) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
ieee80211_ac_from_tid(tid),
+info->tx_time_est 
<< 2,
+true);
+   info->tx_time_est = 0;
+   }
+
if (!acked && ieee80211_is_back_req(fc)) {
u16 control;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 12653d873b8c..bd8284f04361 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3539,9 +3539,14 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
struct ieee80211_tx_data tx;
ieee80211_tx_result r;
struct ieee80211_vif *vif = txq->vif;
+   u8 ac = txq->ac;
+   u32 airtime;
 
WARN_ON_ONCE(softirq_count() == 0);
 
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return NULL;
+
 begin:
spin_lock_bh(&fq->lock);
 
@@ -3652,6 +3657,20 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
}
 
IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+   if (local->airtime_flags & AIRTIME_USE_AQL) {
+   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+skb->len);
+   if (airtime) {
+   /* We only have 10 bits in tx_time_est, so store airtime
+* in increments of 4 us and clamp that to 2**10.
+*/
+   info->tx_time_est = min_t(u32, airtime >> 2, 1 << 10);
+   ieee80211_sta_update_pending_airtime(local, tx.sta, ac,
+

[PATCH v3 1/4] mac80211: Shrink the size of ack_frame_id to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

To get this, decrease the size of the ack_frame_id field to 6 bits, and
lower the size of the ID space accordingly. This leaves 10 bits for use for
tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
values so they become units of 4us.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |4 +++-
 net/mac80211/cfg.c |2 +-
 net/mac80211/tx.c  |2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d69081c38788..4288ace72c2b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -967,6 +967,7 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate 
*rate)
  * @band: the band to transmit on (use for checking for races)
  * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the 
AC
  * @ack_frame_id: internal frame ID for TX status, used internally
+ * @tx_time_est: TX time estimate in units of 4us, used internally
  * @control: union part for control data
  * @control.rates: TX rates array to try
  * @control.rts_cts_rate_idx: rate for RTS or CTS
@@ -1007,7 +1008,8 @@ struct ieee80211_tx_info {
 
u8 hw_queue;
 
-   u16 ack_frame_id;
+   u16 ack_frame_id:6;
+   u16 tx_time_est:10;
 
union {
struct {
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 70739e746c13..4fb7f1f12109 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3428,7 +3428,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local 
*local, struct sk_buff *skb,
 
spin_lock_irqsave(&local->ack_status_lock, spin_flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
 
if (id < 0) {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 938c10f7955b..a16c2f863702 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -2719,7 +2719,7 @@ static struct sk_buff *ieee80211_build_hdr(struct 
ieee80211_sub_if_data *sdata,
 
spin_lock_irqsave(&local->ack_status_lock, flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
-  1, 0x1, GFP_ATOMIC);
+  1, 0x40, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, flags);
 
if (id >= 0) {



[PATCH v3 2/4] mac80211: Import airtime calculation code from mt76

2019-10-18 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations later.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
use mac80211 data structures instead (which is fairly straight forward).
The per-rate TX rate calculation is split out to its own
function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
the AQL calculations added in a subsequent patch.

The only thing that it was not possible to port directly was the bit that
read the internal driver flags of struct ieee80211_rate to determine
whether a rate is using CCK or OFDM encoding. Instead, just look at the
rate index, since at least mt76 and ath10k both seem to have the same
number of CCK rates (4) in their tables.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   14 ++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  377 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 397 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 4288ace72c2b..f058386e3fef 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6424,4 +6424,18 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..c8d0cee61366
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of symbols for a packet with (bps) bits per symbol */
+#define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps))
+
+/* Transmission time (1024 usec) for a packet containing (syms) * symbols */
+#define MCS_SYMBOL_TIME(sgi, syms) \
+   (sgi ?  \
+ ((syms) * 18 * 1024 + 4 * 1024) / 5 : /* syms * 3.6 us */ \
+ ((syms) * 1024) << 2  /* syms * 4 us */   \
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps)))
+
+#define BW_20  0
+#define BW_40  1
+#define BW_80  2
+
+/*
+ * Define group sort order: HT40 -> SGI -> #streams
+ */
+#define IEEE80211_MAX_STREAMS  4
+#define IEEE80211_HT_STREAM_GROUPS 4 /* BW(=2) * SGI(=2) */
+#define IEEE80211_VHT_STREAM_GROUPS6 /* BW(=3) * SGI(=2) */
+
+#define IEEE80211_HT_GROUPS_NB (IEEE80211_MAX_STREAMS *\
+IEEE80211_HT_STREAM_GROUPS)
+#define IEEE80211_VHT_GROUPS_NB(IEEE80211_MAX_STREAMS *\
+IEEE80211_VHT_STREAM_GROUPS)
+#define IEEE80211_GROUPS_NB(IEEE80211_HT_GROUPS_NB +   \
+IEEE80211_VHT_GROUPS_NB)
+
+#define IEEE80211_HT_GROUP_0   0
+#define IEEE80211_VHT_GROUP_0  (IEEE80211_HT_GROUP_0 + IEEE80211_HT_GROUPS_NB)
+
+#define MCS_GROUP_RATES10
+#define CCK_NUM_RATES  4
+
+#define HT_GROUP_IDX(_streams, _sgi, _ht40)\
+   IEEE80211_HT_GROUP_0 +  \
+   IEEE80211_MAX_STREAMS * 2 * _ht40 + \
+   IEEE80211_MAX_STREAMS * _sgi +  \
+   _streams - 1
+
+#define _MAX(a, b) (((a)>(b))?(a):(b))
+
+#define GROUP_SHIFT(duration)  \
+   _MAX(0, 16 - __builtin_clz(duration))
+
+/* MCS 

[PATCH v3 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-18 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel integrated in mac80211 layer operates effectively
to control excessive queueing latency, the CoDel algorithm requires an
accurate measure of how long the packets stays in the queue, aka sojourn
time. The sojourn time measured at mac80211 layer doesn't include queueing
latency in lower layer (firmware/hardware) and CoDel expects lower layer to
have a short queue. However, most 802.11ac chipsets offload tasks such TX
aggregation to firmware or hardware, thus have a deep lower layer queue.
Without a mechanism to control the lower layer queue size, packets only
stays in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

Byte Queue limits (BQL) is commonly used to address the similar issue with
wired network interface. However, this method cannot be applied directly
to the wireless network interface. Byte is not a suitable measure of queue
depth in the wireless network, as the data rate can vary dramatically from
station to station in the same network, from a few Mbps to over Gbps.

This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
works effectively with wireless drivers that utilized firmware/hardware
offloading. AQL only allows each txq to release just enough packets to the
lower layer to form 1-2 large aggregations to keep hardware fully utilized
and keep the rest of frames in mac80211 layer to be controlled by the CoDel
algorithm.

Signed-off-by: Kan Yan 
[ Toke: Get rid of the driver API to set pending airtime ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 +++
 net/mac80211/debugfs.c |   78 
 net/mac80211/debugfs_sta.c |   43 +++-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 ++
 net/mac80211/sta_info.h|8 +
 net/mac80211/tx.c  |   46 --
 9 files changed, 225 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ff45c3e1abff..8d50c0a60dbd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2602,6 +2602,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index f058386e3fef..fdfa5805e1cf 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5565,6 +5565,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..d77ea0e51c1d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE80211_AC_VO],
+   local->aql_txq_limit_hi

Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-10-18 at 16:01 +0200, Toke Høiland-Jørgensen wrote:
>> 
>> > We can also play with the units of the airtime, e.g. making that a
>> > multiple of 2 or 4 us? Seems unlikely to matter much?
>> 
>> Sure, that's a good point! Increments of 4us means we can fit 4ms is 10
>> bits, leaving plenty of space for ACK IDs (hopefully).
>
> If you do need more bits (e.g. to be on the safe side and have space for
> 8ms) you could also steal bits out of 'band' (we only need 3 I think)
> and 'hw_queue' (not sure what the limit really is, but there aren't many
> users, seems like only iwlwifi/dvm and hwsim care, and those certainly
> don't need >32 queues).
>
> Of course if you leave more bits for later that's good too ;-)

Yeah, let's leave that for later. After all, with the limits we
currently have configured, if a single packet takes up 4096 us, that
will trigger the per-station queue throttling in itself, so I think
we'll be fine (famous last words).

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-10-18 at 16:01 +0200, Toke Høiland-Jørgensen wrote:
>
>> Right. Well in that case, let's try it. As long as we fail in a
>> reasonable way, we can just see if we run into anything that breaks? I
>> guess in this case that means rejecting requests from userspace if we
>> run out of IDs rather than silently wrapping and returning wrong data :)
>
> We can't reject due to how this works, but if the idr_alloc() fails then
> we'll just not give a status back to userspace later.

OK, I guess someone will notice if that suddenly starts happening all
the time ;)

>> > > We could also split 5/11. That would support up to 32 ACK IDs, and we
>> > > can just truncate the airtime at 2048 us, which is not a big deal I'd
>> > > say.
>> > 
>> > We can also play with the units of the airtime, e.g. making that a
>> > multiple of 2 or 4 us? Seems unlikely to matter much?
>> 
>> Sure, that's a good point! Increments of 4us means we can fit 4ms is 10
>> bits, leaving plenty of space for ACK IDs (hopefully).
>> 
>> I'll rework the series to use that instead :)
>
> OK.
>
> There are two places that call idr_alloc() with a hardcoded limit of
> 0x1, you'll have to fix those to have the right limit according to
> the bits you leave for the ACK id.

Yup, found those. Will send a new version of the series in a bit.

-Toke



Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-10-18 at 15:31 +0200, Toke Høiland-Jørgensen wrote:
>
>> Well, let's try to do the actual math... A full-size (1538 bytes) packet
>> takes ~2050 microseconds to transmit at 6 Mbps. Adding in overhead, it's
>> certainly still less that 4096 us, so 12 bits is plenty.
>
> What about A-MSDUs? But I guess maximum continous transmissions are at
> most 4ms anyway, so a single packet should never be longer.

Ah yeah, those could be a bit bigger, but yeah, 4ms should at least be
enough.

>> That leaves
>> four bits for the ACK status ID if we just split the u16; if we only
>> ever have "a handful", that should be enough, no?
>
> It's how many are in flight at a time, 16 doesn't seem likely to happen,
> but I don't really know what applications are doing with it now.
> Probably only wpa_s for the EAPOL TX status.

Right. Well in that case, let's try it. As long as we fail in a
reasonable way, we can just see if we run into anything that breaks? I
guess in this case that means rejecting requests from userspace if we
run out of IDs rather than silently wrapping and returning wrong data :)

>> We could also split 5/11. That would support up to 32 ACK IDs, and we
>> can just truncate the airtime at 2048 us, which is not a big deal I'd
>> say.
>
> We can also play with the units of the airtime, e.g. making that a
> multiple of 2 or 4 us? Seems unlikely to matter much?

Sure, that's a good point! Increments of 4us means we can fit 4ms is 10
bits, leaving plenty of space for ACK IDs (hopefully).

I'll rework the series to use that instead :)

-Toke



Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Fri, 2019-10-18 at 12:15 +0200, Toke Høiland-Jørgensen wrote:
>
>> However, there's a nice juicy 'u16 ack_frame_id' at the start of
>> ieee80211_tx_info. Could we potentially use that? We could use the top
>> bit as a disambiguation flag; I think we're fine with 15 bits for the TX
>> time itself (a single packet won't exceed 8ms or TX time), so if we can
>> live with 15 bits of ACK frame ID space, that could be a way forward?
>
> I was going to say that should work as we only ever have a handful of
> ACK frame IDs, but ... you still need the airtime even for a frame that
> userspace wants to know the ACK status of, no?

Oh, right.

Well, let's try to do the actual math... A full-size (1538 bytes) packet
takes ~2050 microseconds to transmit at 6 Mbps. Adding in overhead, it's
certainly still less that 4096 us, so 12 bits is plenty. That leaves
four bits for the ACK status ID if we just split the u16; if we only
ever have "a handful", that should be enough, no?

We could also split 5/11. That would support up to 32 ACK IDs, and we
can just truncate the airtime at 2048 us, which is not a big deal I'd
say. We could even split 6/10 and only need to truncate the TX time at
rates below 13 Mbps... Or we could sacrifice a bit to the flag and only
truncate if the ACK status flag is set.

Think it mostly depends on what is the smallest ID space for ACK IDs we
can live with? :)

-Toke



Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> The "tx_time_est" field, shared by control and status, is not able to
> survive until the skb returns to the mac80211 layer in some
> architectures. The same space is defined as driver_data and some
> wireless drivers use it for other purposes, as the cb in the sk_buff
> is free to be used by any layer.
>
> In the case of ath10k, the tx_time_est get clobbered by
> struct ath10k_skb_cb {
> dma_addr_t paddr;
> u8 flags;
> u8 eid;
> u16 msdu_id;
> u16 airtime_est;
> struct ieee80211_vif *vif;
> struct ieee80211_txq *txq;
> } __packed;

Ah, bugger, of course the driver that actually needs this is using the
full driver_data space :P

> Do you think shrink driver_data by 2 bytes and use that space for
> tx_time_est to make it persistent across mac80211 and wireless driver
> layer an acceptable solution?

Hmm, the driver_data field is defined as an array of pointers, so we can
only shrink it in increments of sizeof(void *). I think it may be
feasible to shrink it (as in, I don't think any drivers are actually
using the full 40 bytes), but doing this in a way that will gain us a
2-byte space that is also usable in the case driver_data is *not* used
(i.e., it needs be able to align with a field in .control and .status as
well) would require some serious surgery of the whole ieee80211_tx_info...

However, there's a nice juicy 'u16 ack_frame_id' at the start of
ieee80211_tx_info. Could we potentially use that? We could use the top
bit as a disambiguation flag; I think we're fine with 15 bits for the TX
time itself (a single packet won't exceed 8ms or TX time), so if we can
live with 15 bits of ACK frame ID space, that could be a way forward?

Johannes, what do you think?

-Toke



Re: [Make-wifi-fast] [PATCH v2 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-17 Thread Toke Høiland-Jørgensen
Sebastian Moeller  writes:

>> On Oct 17, 2019, at 11:44, Toke Høiland-Jørgensen  wrote:
>> 
>> Kan Yan  writes:
>> 
>>> Hi Toke,
>>> 
>>> Thanks for getting this done! I will give it a try in the next few
>>> days.  A few comments:
>>> 
>>>> The estimated airtime for each skb is stored in the tx_info, so we can
>>>> subtract the same amount from the running total when the skb is freed or
>>>> recycled.
>>> 
>>> Looks like ath10k driver zero out the info->status before calling
>>> ieee80211_tx_status(...):
>>> int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>>> const struct htt_tx_done *tx_done)
>>> {
>>> ...
>>>info = IEEE80211_SKB_CB(msdu);
>>>memset(&info->status, 0, sizeof(info->status));
>>> ...
>>>ieee80211_tx_status(htt->ar->hw, msdu);
>>> }
>> 
>> Ah, bugger; I was afraid we'd run into this. A quick grep indicates that
>> it's only ath10k and iwl that do this, though, so it's probably
>> manageable to just fix this. I think the simplest solution is just to
>> restore the field after clearing, no?
>> 
>>> We need either restore the info->status.tx_time_est or calling
>>> ieee80211_sta_update_pending_airtime() in ath10k before tx_time_est
>>> get erased.
>>> 
>>>> +   if (local->airtime_flags & AIRTIME_USE_AQL) {
>>>> +   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, 
>>>> txq->sta,
>>>> +skb->len + 
>>>> 38);
>>> 
>>> I think it is better to put the "+  38" that takes care of the header
>>> overhead inside ieee80211_calc_expected_tx_airtime().
>> 
>> Hmm, no strong opinion about this; but yeah, since we have a dedicated
>> function for this use I guess there's no harm in adding it there :)
>> 
>
> Silly question, is this Overhead guaranteed to be 38 Bytes for all
> eternity? Otherwise a variable or a preprocessor constant might be
> more future proof?

Well, yeah, as long as we're sending Ethernet packets. Which is kinda
baked into the WiFi standard :)

-Toke



Re: [PATCH v2 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-17 Thread Toke Høiland-Jørgensen
Kan Yan  writes:

> Hi Toke,
>
> Thanks for getting this done! I will give it a try in the next few
> days.  A few comments:
>
>> The estimated airtime for each skb is stored in the tx_info, so we can
>> subtract the same amount from the running total when the skb is freed or
>> recycled.
>
> Looks like ath10k driver zero out the info->status before calling
> ieee80211_tx_status(...):
> int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>  const struct htt_tx_done *tx_done)
> {
>  ...
> info = IEEE80211_SKB_CB(msdu);
> memset(&info->status, 0, sizeof(info->status));
> ...
> ieee80211_tx_status(htt->ar->hw, msdu);
> }

Ah, bugger; I was afraid we'd run into this. A quick grep indicates that
it's only ath10k and iwl that do this, though, so it's probably
manageable to just fix this. I think the simplest solution is just to
restore the field after clearing, no?

> We need either restore the info->status.tx_time_est or calling
> ieee80211_sta_update_pending_airtime() in ath10k before tx_time_est
> get erased.
>
>> +   if (local->airtime_flags & AIRTIME_USE_AQL) {
>> +   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, 
>> txq->sta,
>> +skb->len + 38);
>
> I think it is better to put the "+  38" that takes care of the header
> overhead inside ieee80211_calc_expected_tx_airtime().

Hmm, no strong opinion about this; but yeah, since we have a dedicated
function for this use I guess there's no harm in adding it there :)

-Toke



[PATCH v2 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-15 Thread Toke Høiland-Jørgensen
From: Kan Yan 

In order for the Fq_CoDel integrated in mac80211 layer operates effectively
to control excessive queueing latency, the CoDel algorithm requires an
accurate measure of how long the packets stays in the queue, aka sojourn
time. The sojourn time measured at mac80211 layer doesn't include queueing
latency in lower layer (firmware/hardware) and CoDel expects lower layer to
have a short queue. However, most 802.11ac chipsets offload tasks such TX
aggregation to firmware or hardware, thus have a deep lower layer queue.
Without a mechanism to control the lower layer queue size, packets only
stays in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

Byte Queue limits (BQL) is commonly used to address the similar issue with
wired network interface. However, this method cannot be applied directly
to the wireless network interface. Byte is not a suitable measure of queue
depth in the wireless network, as the data rate can vary dramatically from
station to station in the same network, from a few Mbps to over Gbps.

This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
works effectively with wireless drivers that utilized firmware/hardware
offloading. AQL only allows each txq to release just enough packets to the
lower layer to form 1-2 large aggregations to keep hardware fully utilized
and keep the rest of frames in mac80211 layer to be controlled by the CoDel
algorithm.

Signed-off-by: Kan Yan 
[ Toke: Get rid of the driver API to set pending airtime ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/cfg80211.h |7 
 include/net/mac80211.h |   12 +++
 net/mac80211/debugfs.c |   78 
 net/mac80211/debugfs_sta.c |   43 +++-
 net/mac80211/ieee80211_i.h |4 ++
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 ++
 net/mac80211/sta_info.h|8 +
 net/mac80211/tx.c  |   46 --
 9 files changed, 225 insertions(+), 14 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ff45c3e1abff..8d50c0a60dbd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2602,6 +2602,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ device queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 7619eb74d612..b5727a20754c 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5575,6 +5575,18 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_txq_airtime_check - check if a txq can send frame to device
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the device. Otherwise return false.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 568b3b276931..d77ea0e51c1d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[400];
+   int len = 0;
+
+   len = scnprintf(buf, sizeof(buf),
+   "AC AQL limit low   AQL limit high\n"
+   "VO %u  %u\n"
+   "VI %u  %u\n"
+   "BE %u  %u\n"
+   "BK %u  %u\n",
+   local->aql_txq_limit_low[IEEE80211_AC_VO],
+   local->aql_txq_limit_hi

[PATCH v2 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

2019-10-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the
station from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/status.c |   38 ++
 net/mac80211/tx.c |   16 
 2 files changed, 54 insertions(+)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index ab8ba5835ca0..ce990a1e9043 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -676,6 +676,33 @@ static void ieee80211_report_used_skb(struct 
ieee80211_local *local,
if (dropped)
acked = false;
 
+   if (info->status.tx_time_est) {
+   struct ieee80211_sub_if_data *sdata;
+   struct sta_info *sta = NULL;
+   u8 *qc, ac;
+   int tid;
+
+   rcu_read_lock();
+
+   sdata = ieee80211_sdata_from_skb(local, skb);
+   if (sdata)
+   sta = sta_info_get_bss(sdata, skb_mac_header(skb));
+
+   if (ieee80211_is_data_qos(hdr->frame_control)) {
+   qc = ieee80211_get_qos_ctl(hdr);
+   tid = qc[0] & 0xf;
+   ac = ieee80211_ac_from_tid(tid);
+   } else {
+   ac = IEEE80211_AC_BE;
+   }
+
+   ieee80211_sta_update_pending_airtime(local, sta, ac,
+info->status.tx_time_est,
+true);
+   rcu_read_unlock();
+
+   }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;
 
@@ -930,6 +957,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
tid = qc[0] & 0xf;
}
 
+   if (info->status.tx_time_est) {
+   /* Do this here to avoid the expensive lookup of the sta
+* in ieee80211_report_used_skb().
+*/
+   ieee80211_sta_update_pending_airtime(local, sta,
+
ieee80211_ac_from_tid(tid),
+
info->status.tx_time_est,
+true);
+   info->status.tx_time_est = 0;
+   }
+
if (!acked && ieee80211_is_back_req(fc)) {
u16 control;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 405f622b3fe0..b6b47171b340 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3539,9 +3539,14 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
struct ieee80211_tx_data tx;
ieee80211_tx_result r;
struct ieee80211_vif *vif = txq->vif;
+   u8 ac = txq->ac;
+   u32 airtime;
 
WARN_ON_ONCE(softirq_count() == 0);
 
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return NULL;
+
 begin:
spin_lock_bh(&fq->lock);
 
@@ -3652,6 +3657,17 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
}
 
IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+   if (local->airtime_flags & AIRTIME_USE_AQL) {
+   airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+skb->len + 38);
+   if (airtime) {
+   info->control.tx_time_est = airtime;
+   ieee80211_sta_update_pending_airtime(local, tx.sta, ac,
+airtime, false);
+   }
+   }
+
return skb;
 
 out:



[PATCH v2 0/4] Add Airtime Queue Limits (AQL) to mac80211

2019-10-15 Thread Toke Høiland-Jørgensen
This series is a first attempt at porting the Airtime Queue Limits concept from
the out-of-tree ath10k implementation[0] to mac80211. This version takes Kan's
patch to do the throttling in mac80211, and replaces the driver API with the
mechanism from the previous version of my series, which instead calculated the
expected airtime at dequeue time inside mac80211, storing it in the SKB cb
field.

This series also imports Felix' airtime calculation code from mt76 into
mac80211, adjusting the API so it can be used from TX dequeue, by extracting the
latest TX rate from the tx_stats structure kept for each station.

As before, I've only compile tested this (lacking the proper hardware to do more
testing). So I'm hoping someone with a proper testing setup can take the whole
thing for a spin... :)

The series is also available in my git repo here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-02

Changelog:

v2:
  - Integrate Kan's approach to airtime throttling.
  - Hopefully fix the cb struct alignment on big-endian architectures.

---

Kan Yan (1):
  mac80211: Implement Airtime-based Queue Limit (AQL)

Toke Høiland-Jørgensen (3):
  mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est
  mac80211: Import airtime calculation code from mt76
  mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue


 include/net/cfg80211.h |7 +
 include/net/mac80211.h |   52 +-
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  375 
 net/mac80211/debugfs.c |   78 +
 net/mac80211/debugfs_sta.c |   43 -
 net/mac80211/ieee80211_i.h |8 +
 net/mac80211/main.c|9 +
 net/mac80211/sta_info.c|   32 
 net/mac80211/sta_info.h|8 +
 net/mac80211/status.c  |   38 
 net/mac80211/tx.c  |   62 +++
 12 files changed, 693 insertions(+), 22 deletions(-)
 create mode 100644 net/mac80211/airtime.c


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

Fortunately, we had a couple of bytes left in the 'status' field in the
ieee80211_tx_info; and since we only plan to calculate the airtime estimate
after the skb is dequeued from the FQ structure, on the control side we can
share the space with the codel enqueue time. And by rearranging the order
of elements it is possible to have the position of the new tx_time_est line
up between the control and status structs, so the value will survive from
when mac80211 hands the packet to the driver, and until the driver either
frees it, or hands it back through TX status.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d69081c38788..49f8ea0af5f8 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -975,20 +975,23 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate 
*rate)
  * @control.short_preamble: use short preamble (CCK only)
  * @control.skip_table: skip externally configured rate table
  * @control.jiffies: timestamp for expiry on powersave clients
+ * @control.enqueue_time: enqueue time (for iTXQs)
+ * @control.tx_time_est: estimated airtime usage (shared with @status)
+ * @control.reserved: unused field to ensure alignment of data structure
+ * @control.flags: control flags, see &enum mac80211_tx_control_flags
  * @control.vif: virtual interface (may be NULL)
  * @control.hw_key: key to encrypt with (may be NULL)
- * @control.flags: control flags, see &enum mac80211_tx_control_flags
- * @control.enqueue_time: enqueue time (for iTXQs)
  * @driver_rates: alias to @control.rates to reserve space
  * @pad: padding
  * @rate_driver_data: driver use area if driver needs @control.rates
  * @status: union part for status data
  * @status.rates: attempted rates
  * @status.ack_signal: ACK signal
+ * @status.tx_time_est: estimated airtime of skb (shared with @control)
+ * @status.tx_time: actual airtime consumed for transmission
  * @status.ampdu_ack_len: AMPDU ack length
  * @status.ampdu_len: AMPDU length
  * @status.antenna: (legacy, kept only for iwlegacy)
- * @status.tx_time: airtime consumed for transmission
  * @status.is_valid_ack_signal: ACK signal is valid
  * @status.status_driver_data: driver use area
  * @ack: union part for pure ACK data
@@ -1026,11 +1029,17 @@ struct ieee80211_tx_info {
/* only needed before rate control */
unsigned long jiffies;
};
+   union {
+   codel_time_t enqueue_time;
+   struct {
+   u16 tx_time_est; /* shared with status 
*/
+   u16 reserved; /* padding for alignment 
*/
+   };
+   };
+   u32 flags;
/* NB: vif can be NULL for injected frames */
struct ieee80211_vif *vif;
struct ieee80211_key_conf *hw_key;
-   u32 flags;
-   codel_time_t enqueue_time;
} control;
struct {
u64 cookie;
@@ -1038,12 +1047,13 @@ struct ieee80211_tx_info {
struct {
struct ieee80211_tx_rate rates[IEEE80211_TX_MAX_RATES];
s32 ack_signal;
+   u16 tx_time_est; /* shared with control */
+   u16 tx_time;
u8 ampdu_ack_len;
u8 ampdu_len;
u8 antenna;
-   u16 tx_time;
bool is_valid_ack_signal;
-   void *status_driver_data[19 / sizeof(void *)];
+   void *status_driver_data[16 / sizeof(void *)];
} status;
struct {
struct ieee80211_tx_rate driver_rates[
@@ -1126,6 +1136,8 @@ ieee80211_tx_info_clear_status(struct ieee80211_tx_info 
*info)
 offsetof(struct ieee80211_tx_info, control.rates));
BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
 offsetof(struct ieee80211_tx_info, driver_rates));
+   BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, control.tx_time_est) !=
+offsetof(struct ieee80211_tx_info, status.tx_time_est));
BUILD_BUG_

[PATCH v2 2/4] mac80211: Import airtime calculation code from mt76

2019-10-15 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Felix recently added code to calculate airtime of packets to the mt76
driver. Import this into mac80211 so we can use it for airtime queue limit
calculations later.

The airtime.c file is copied verbatim from the mt76 driver, and adjusted to
use mac80211 data structures instead (which is fairly straight forward).
The per-rate TX rate calculation is split out to its own
function (ieee80211_calc_tx_airtime_rate()) so it can be used directly for
the AQL calculations added in a subsequent patch.

The only thing that it was not possible to port directly was the bit that
read the internal driver flags of struct ieee80211_rate to determine
whether a rate is using CCK or OFDM encoding. Instead, just look at the
rate index, since at least mt76 and ath10k both seem to have the same
number of CCK rates (4) in their tables.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h |   14 ++
 net/mac80211/Makefile  |3 
 net/mac80211/airtime.c |  375 
 net/mac80211/ieee80211_i.h |4 
 4 files changed, 395 insertions(+), 1 deletion(-)
 create mode 100644 net/mac80211/airtime.c

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 49f8ea0af5f8..7619eb74d612 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -6434,4 +6434,18 @@ void ieee80211_nan_func_match(struct ieee80211_vif *vif,
  struct cfg80211_nan_match_params *match,
  gfp_t gfp);
 
+/**
+ * ieee80211_calc_tx_airtime - calculate estimated transmission airtime.
+ *
+ * This function calculates the estimated airtime usage of a frame based on the
+ * rate information in the TX info struct and the frame length.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @info: &struct ieee80211_tx_info of the frame.
+ * @len: frame length in bytes
+ */
+u32 ieee80211_calc_tx_airtime(struct ieee80211_hw *hw,
+ struct ieee80211_tx_info *info,
+ int len);
+
 #endif /* MAC80211_H */
diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 4f03ebe732fa..6cbb1286d6c0 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -32,7 +32,8 @@ mac80211-y := \
chan.o \
trace.o mlme.o \
tdls.o \
-   ocb.o
+   ocb.o \
+   airtime.o
 
 mac80211-$(CONFIG_MAC80211_LEDS) += led.o
 mac80211-$(CONFIG_MAC80211_DEBUGFS) += \
diff --git a/net/mac80211/airtime.c b/net/mac80211/airtime.c
new file mode 100644
index ..7a18d5405756
--- /dev/null
+++ b/net/mac80211/airtime.c
@@ -0,0 +1,375 @@
+// SPDX-License-Identifier: ISC
+/*
+ * Copyright (C) 2019 Felix Fietkau 
+ */
+
+#include 
+#include "ieee80211_i.h"
+#include "sta_info.h"
+
+#define AVG_PKT_SIZE   1024
+
+/* Number of bits for an average sized packet */
+#define MCS_NBITS (AVG_PKT_SIZE << 3)
+
+/* Number of symbols for a packet with (bps) bits per symbol */
+#define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps))
+
+/* Transmission time (1024 usec) for a packet containing (syms) * symbols */
+#define MCS_SYMBOL_TIME(sgi, syms) \
+   (sgi ?  \
+ ((syms) * 18 * 1024 + 4 * 1024) / 5 : /* syms * 3.6 us */ \
+ ((syms) * 1024) << 2  /* syms * 4 us */   \
+   )
+
+/* Transmit duration for the raw data part of an average sized packet */
+#define MCS_DURATION(streams, sgi, bps) \
+   MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps)))
+
+#define BW_20  0
+#define BW_40  1
+#define BW_80  2
+
+/*
+ * Define group sort order: HT40 -> SGI -> #streams
+ */
+#define IEEE80211_MAX_STREAMS  4
+#define IEEE80211_HT_STREAM_GROUPS 4 /* BW(=2) * SGI(=2) */
+#define IEEE80211_VHT_STREAM_GROUPS6 /* BW(=3) * SGI(=2) */
+
+#define IEEE80211_HT_GROUPS_NB (IEEE80211_MAX_STREAMS *\
+IEEE80211_HT_STREAM_GROUPS)
+#define IEEE80211_VHT_GROUPS_NB(IEEE80211_MAX_STREAMS *\
+IEEE80211_VHT_STREAM_GROUPS)
+#define IEEE80211_GROUPS_NB(IEEE80211_HT_GROUPS_NB +   \
+IEEE80211_VHT_GROUPS_NB)
+
+#define IEEE80211_HT_GROUP_0   0
+#define IEEE80211_VHT_GROUP_0  (IEEE80211_HT_GROUP_0 + IEEE80211_HT_GROUPS_NB)
+
+#define MCS_GROUP_RATES10
+#define CCK_NUM_RATES  4
+
+#define HT_GROUP_IDX(_streams, _sgi, _ht40)\
+   IEEE80211_HT_GROUP_0 +  \
+   IEEE80211_MAX_STREAMS * 2 * _ht40 + \
+   IEEE80211_MAX_STREAMS * _sgi +  \
+   _streams - 1
+
+#define _MAX(a, b) (((a)>(b))?(a):(b))
+
+#define GROUP_SHIFT(duration)  \
+   _MAX(0, 16 - __builtin_clz(duration))
+
+/* MCS 

Re: [PATCH v3 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-10 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> Hi,
>
> A couple of points...
>
> First, I'd like Toke to review & ack this if possible :-)

Sure, I'll look at it. I'm away the rest of this week, but should
hopefully get some more time next week. It may be that it will take the
form of another submission that integrates this with the previous patch
I sent that put more of the calculation into mac80211 itself...

-Toke



Re: [PATCH V3 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-10-01 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-10-01 18:19, Johannes Berg wrote:
>> On Mon, 2019-09-23 at 15:19 +0800, Yibo Zhao wrote:
>>> This series fix some issues when enabling virtual time-based airtime 
>>> scheduler on ath10k.
>>> 
>> Given the lengthy discussion here and also over in the related thread
>> about AQL, I'm assuming you're going to repost this eventually.
>
> Yes, will repost once Toke have updated "mac80211: Switch to a virtual 
> time-based airtime scheduler" with his new ideas.

Which in turn is waiting for the AQL stuff... :)

-Toke



Re: [PATCH] mac80211: fix txq null pointer dereference

2019-09-26 Thread Toke Høiland-Jørgensen
Miaoqing Pan  writes:

> If the interface type is P2P_DEVICE or NAN, read the file of
> '/sys/kernel/debug/ieee80211/phyx/netdev:wlanx/aqm' will get a
> NULL pointer dereference. As for those interface type, the
> pointer sdata->vif.txq is NULL.

Heh. Oops!

> Unable to handle kernel NULL pointer dereference at virtual address 0011
> CPU: 1 PID: 30936 Comm: cat Not tainted 4.14.104 #1
> task: ffc0337e4880 task.stack: ff800cd2
> PC is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> LR is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> pc : [] lr : [] pstate: 6145
> sp : ff800cd23bb0
> x29: ff800cd23c00 x28: ffc0337e4880
> x27: ff8008a04000 x26: 0003
> x25: 018e x24: ff80090f9d30
> x23: ffc034b62000 x22: 
> x21: ffc0335008e0 x20: ff800cd23c10
> x19: 00c8 x18: 
> x17:  x16: ff80081ef454
> x15:  x14: f45c1d27
> x13: ffab6710 x12: 0003
> x11: 0006 x10: 
> x9 : 0001 x8 : ffc0337e4880
> x7 : 0003 x6 : f4498000
> x5 :  x4 : ff8000b7
> x3 : ff800cd23e80 x2 : 00c8
> x1 : ff800cd23c10 x0 : ffc0335008e0
> Process cat (pid: 30936, stack limit = 0xff800cd2)
> [Call trace:
> Exception stack(0xff800cd23a70 to 0xff800cd23bb0)
> 3a60:   ffc0335008e0 ff800cd23c10
> 3a80: 00c8 ff800cd23e80 ff8000b7 
> 3aa0: f4498000 0003 ffc0337e4880 0001
> 3ac0:  0006 0003 ffab6710
> 3ae0: f45c1d27  ff80081ef454 
> 3b00:  00c8 ff800cd23c10 ffc0335008e0
> 3b20:  ffc034b62000 ff80090f9d30 018e
> 3b40: 0003 ff8008a04000 ffc0337e4880 ff800cd23c00
> 3b60: ff8000b7cd00 ff800cd23bb0 ff8000b7cd00 6145
> 3b80: ff800cd23ba0 ff80088e16e4 007f ff800cd23c40
> 3ba0: ff800cd23c00 ff8000b7cd00
> [] ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
> [] ieee80211_if_read+0x60/0xbc [mac80211]
> [] ieee80211_if_read_aqm+0x28/0x30 [mac80211]
> [] full_proxy_read+0x2c/0x48
> [] __vfs_read+0x2c/0xd4
> [] vfs_read+0x8c/0x108
> [] SyS_read+0x40/0x7c
>
> Tested HW: QCA9984
> Tested FW: 10.4-3.10-00047
>
> Signed-off-by: Miaoqing Pan 

Acked-by: Toke Høiland-Jørgensen 



Re: [PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

>> I can see why we need the second part (basically, this happens because 
>> I
>> forgot to add a check for "no eligible stations" in may_transmit(), 
>> like
>> the one in next_txq()). And rounding up the division result doesn't
>> hurt, I guess. But why does it help to change the grace period if we're
>> doing all the other stuff?
> In multi-clients case, it is possible a TXQ sometimes gets drained due 
> to FW has deep queue and few packets in TXQ at that time. So the TXQ is 
> removed from the rbtree after dequeuing. When it is about to added back 
> very soon after the removal, the g_vt might have gone a little far away 
> from sta vt where sync is needed. With this sync, the station is forced 
> to catch up with the g_vt, however, its chance for transmission has been 
> reduced. I think 500us is quite a short period in multi-clients case.

That's a good point, actually: Having the grace period be too small will
allow stations that leave and re-enter the queue to "skip ahead" and use
more than its share. However, I think it's a separate issue from what
this patch is about; so how about I just increase the grace period in
the next version of the base patch?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-24 15:26, Toke Høiland-Jørgensen wrote:
>>>> Hmm, yeah, I guess we could end up with a loop like that as well.
>>>> Keeping the schedule_round would be a way to fix it, but I'm not sure
>>>> we
>>>> should just skip that station; maybe we should just end the round
>>>> instead?
>>> I am not sure. I believe, in some cases, the rest of the nodes which
>>> could be most of the nodes in the tree will not have the chance to be
>>> scheduled in this round.
>> 
>> My guess would be that it doesn't really matter, because in most cases
>> each schedule round will only actually end up queueing packets from one
>> or two stations; as the driver will pull multiple packets from that one
>> station which will often fill up the firmware queues (especially once 
>> we
>> start throttling that with the AQL stuff).
>> 
>> So I guess we can just skip TXQs that we've already seen this 
>> scheduling
>> round, and let the v_t compare determine transmit eligibility :)
>
> I am a little confused. So do you mean it is fine for you to skip the 
> TXQs we met in this round before and continue the loop until the end or 
> vt comparison failure?

Yeah. In most cases it won't make any difference; but it'll make sure we
visit all eligible TXQs in all cases, so we might as well do that if
we're tracking the scheduling round anyway.

-Toke



Re: [PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-24 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-23 19:00, Toke Høiland-Jørgensen wrote:
>
>>> -   if (params->airtime_weight)
>>> -   sta->airtime_weight = params->airtime_weight;
>>> +   if (params->airtime_weight &&
>>> +   params->airtime_weight != sta->airtime_weight) {
>> 
>> This check doesn't work I think? You're not using the array-based
>> sta->airtime_weight[], and there are locking issues by just checking
>> like this; so maybe just keep the if() on params->airtime_weight, and 
>> do
>> the checking inside the loop while holding the lock?
>
> It should be array-based sta->airtime_weight[] and I am missing that 
> part during the porting. But you are right about we should check it 
> inside the loop with the lock.
>
>> 
>> Or could we just turn the weights into atomics to avoid the locking
>> entirely?
>
> Actually, we still need the active txq locking to make sure the txq is 
> on the rbtree. Otherwise, no need to change airtime weight sum.

True. Just moving the check inside the locking will be the right thing
to do, then.

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
>> Hmm, yeah, I guess we could end up with a loop like that as well.
>> Keeping the schedule_round would be a way to fix it, but I'm not sure 
>> we
>> should just skip that station; maybe we should just end the round
>> instead?
> I am not sure. I believe, in some cases, the rest of the nodes which 
> could be most of the nodes in the tree will not have the chance to be 
> scheduled in this round.

My guess would be that it doesn't really matter, because in most cases
each schedule round will only actually end up queueing packets from one
or two stations; as the driver will pull multiple packets from that one
station which will often fill up the firmware queues (especially once we
start throttling that with the AQL stuff).

So I guess we can just skip TXQs that we've already seen this scheduling
round, and let the v_t compare determine transmit eligibility :)

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Toke Høiland-Jørgensen  writes:
>
>> Kalle Valo  writes:
>>
>>> Toke Høiland-Jørgensen  writes:
>>>
>>>> Yibo Zhao  writes:
>>>>
>>>>> On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>
>>> Guys, PLEASE please consider us poor maintainers drowning in email and
>>> edit your quotes :) This style of discussion makes patchwork unusable:
>>>
>>> https://patchwork.kernel.org/patch/11147019/
>>
>> Heh, oops, didn't realise you were following the discussion from
>> patchwork; sorry, will be sure to cut things in the future.
>
> To be honest, I'm not sure how much Johannes uses patchwork. But I
> check everything from patchwork 95% of the time and try to keep my
> email boxes clean.

Noted. I'll try to be nice to patchwork, then :)

>> The quote marks do make a very nice (reverse) christmas tree, though ;)
>
> It did! I had to include that to my rant :)

:D

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Toke Høiland-Jørgensen  writes:
>
>> Yibo Zhao  writes:
>>
>>> On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>>>> Yibo Zhao  writes:
>
> Guys, PLEASE please consider us poor maintainers drowning in email and
> edit your quotes :) This style of discussion makes patchwork unusable:
>
> https://patchwork.kernel.org/patch/11147019/

Heh, oops, didn't realise you were following the discussion from
patchwork; sorry, will be sure to cut things in the future.

The quote marks do make a very nice (reverse) christmas tree, though ;)

-Toke



Re: [PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-23 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Global airtime weight sum is updated only when txq is added/removed
> from rbtree. If upper layer configures sta weight during high load,
> airtime weight sum will not be updated since txq is most likely on the
> tree. It could a little late for upper layer to reconfigure sta weight
> when txq is already in the rbtree. And thus, incorrect airtime weight sum
> will lead to incorrect global virtual time calculation as well as overflow
> of airtime weight sum during txq removed.
>
> Hence, need to update airtime weight sum upon receiving event for
> configuring sta weight once sta's txq is on the rbtree.
>
> Besides, if airtime weight sum of ACs and sta weight is synced under the
> same per AC lock protection, there can be a very short window causing
> incorrct airtime weight sum calculation as below:
>
> active_txq_lock_VO  .
> VO weight sum is syncd.
> sta airtime weight sum is synced  .
> active_txq_unlock_VO  .
> . .
> active_txq_lock_VI.
> VI weight sum is syncd.
> sta airtime weight sumactive_txq_lock_BE
> active_txq_unlock_VIRemove txq and thus sum
> .   is calculated with synced
> .   sta airtime weight
> . active_txq_unlock_BE
>
> So introduce a per ac synced station airtime weight synced with per
> AC synced weight sum together. And the per-AC station airtime weight
> is used to calculate weight sum.
>
> Signed-off-by: Yibo Zhao 
> ---
>  net/mac80211/cfg.c | 29 ++---
>  net/mac80211/debugfs_sta.c |  2 +-
>  net/mac80211/sta_info.c|  9 -
>  net/mac80211/sta_info.h|  5 +++--
>  net/mac80211/tx.c  |  4 ++--
>  5 files changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index d65aa01..c8a0683 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   int ret = 0;
>   struct ieee80211_supported_band *sband;
>   struct ieee80211_sub_if_data *sdata = sta->sdata;
> - u32 mask, set;
> + u32 mask, set, tid, ac, pre_weight;
> + struct txq_info *txqi;
>  
>   sband = ieee80211_get_sband(sdata);
>   if (!sband)
> @@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   if (ieee80211_vif_is_mesh(&sdata->vif))
>   sta_apply_mesh_params(local, sta, params);
>  
> - if (params->airtime_weight)
> - sta->airtime_weight = params->airtime_weight;
> + if (params->airtime_weight &&
> + params->airtime_weight != sta->airtime_weight) {

This check doesn't work I think? You're not using the array-based
sta->airtime_weight[], and there are locking issues by just checking
like this; so maybe just keep the if() on params->airtime_weight, and do
the checking inside the loop while holding the lock?

Or could we just turn the weights into atomics to avoid the locking
entirely?

> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
> + spin_lock_bh(&local->active_txq_lock[ac]);
> + for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
> + if (!sta->sta.txq[tid] ||
> + ac != ieee80211_ac_from_tid(tid))
> + continue;
> +
> + pre_weight = sta->airtime_weight[ac];
> + sta->airtime_weight[ac] =
> + params->airtime_weight;
> +
> + txqi = to_txq_info(sta->sta.txq[tid]);
> + if (RB_EMPTY_NODE(&txqi->schedule_order))
> + continue;
> +
> + local->airtime_weight_sum[ac] = 
> local->airtime_weight_sum[ac] +
> + 
> params->airtime_weight -
> + pre_weight;
> + }
> + spin_unlock_bh(&local->active_txq_lock[ac]);
> + }
> + }
>  
>   /* set the STA state after all sta info from usermode has been set */
>   if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
> diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
> index 80028da..43a7e6a 100644
> --- a/net/mac80211/debugfs_sta.c
> +++ b/net/mac80211/debugfs_sta.c
> @@ -223,7 +223,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
> __user *userbuf,
>   "Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
>   rx_airtime,
>   tx_airtime,
> - sta->airti

Re: [PATCH V3 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
> removed from rbtree immediately in the ieee80211_return_txq(), the
> loop will break soon in the ieee80211_next_txq() due to schedule_pos
> not leading to the second txq in the rbtree. Thus, defering the
> removal right before the end of this schedule round.

Didn't we agree that we could fix this by making __unschedule_txq()
aware of schedule_pos instead of this deferred removal mechanism?

-Toke


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-23 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Not long after the start of multi-clients test, not a single station is
> an eligible candidate for transmission since global virtual time(g_vt) is
> smaller than the virtual airtime(s_vt) of all the stations. As a result,
> the Tx has been blocked and throughput is quite low.
>
> This may mainly due to sync mechanism and accumulative deviation from the
> devision calculation of g_vt.
>
> For example:
> Suppose we have 50 clients in first round.
> Round 1:
> STA   weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
> 1 256 204812800   20482000N
> 2 256 20482048N
> . .   .   .   .
> . .   .   .   .
> . .   .   .   .
> 50256 20482048N
>
> After this round, all the stations are not valid for next transmission due to
> accumulative deviation.
>
> And if we add a new #51,
> STA   weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
> 1 256 204813056   20482020N
> 2 256 20482048N
> . .   .   .
> . .   .   .
> . .   .   .
> 50256 20482048N
> 51256 10242524N

That's better :)

> Sync is done by:
> max(g_vt of last round - grace period, s_vt)
> and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more than the 
> final
> g_vt of this round.
>
> After this round, no more station is valid for transmission.
>
> The real situation can be more complicate, above is one of the extremely case.
>
> To avoid this situation to occur, the new proposal is:
>
> - Increase the airtime grace period a little more to reduce the
>   unexpected sync
>
> - If global virtual time is less than the virtual airtime of any station,
>   sync it to the airtime of first station in the red-black tree
>
> - Round the division result

I can see why we need the second part (basically, this happens because I
forgot to add a check for "no eligible stations" in may_transmit(), like
the one in next_txq()). And rounding up the division result doesn't
hurt, I guess. But why does it help to change the grace period if we're
doing all the other stuff?

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the
>>>>>>>>>>>>>>> rbtree
>>>>>>>>>>>>>>> gets
>>>>>>>>>>>>>>> removed from rbtree immediately in the
>>>>>>>>>>>>>>> ieee80211_return_txq(),
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>>>>>> schedule_pos
>>>>>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, 
>>>>>>>>>>>>>>> defering
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I didn't write this patch, so please don't use my sign-off.
>>>>>>>>>>>>>> I'll
>>>>>>>>>>>>>> add
>>>>>>>>>>>>>> ack or review tags as appropriate in reply; but a few
>>>>>>>>>>>>>> comments
>>>>>>>>>>>>>> first:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>>>>>>>> +++---
>>>>>>>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> diff --git a/include/net/mac80211.h 
>>>>>>>>>>>>>>> b/include/net/mac80211.h
>>>>>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY 31
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>  static inline void ieee80211_rat

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>>>> 
>>>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the
>>>>>>>>>>>>> rbtree
>>>>>>>>>>>>> gets
>>>>>>>>>>>>> removed from rbtree immediately in the 
>>>>>>>>>>>>> ieee80211_return_txq(),
>>>>>>>>>>>>> the
>>>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>>>> schedule_pos
>>>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering
>>>>>>>>>>>>> the
>>>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>>>>>> 
>>>>>>>>>>>> I didn't write this patch, so please don't use my sign-off.
>>>>>>>>>>>> I'll
>>>>>>>>>>>> add
>>>>>>>>>>>> ack or review tags as appropriate in reply; but a few 
>>>>>>>>>>>> comments
>>>>>>>>>>>> first:
>>>>>>>>>>>> 
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>>>>>> +++---
>>>>>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>>>> +
>>>>>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>>>>>> ieee80211_tx_rate
>>>>>>>>>>>>> *rate,
>>>>>>>>>>>>> u8 mcs, u8 nss)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff
>>>>>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>>>&g

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao  writes:
>>>>>>>>>> 
>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the 
>>>>>>>>>>> rbtree
>>>>>>>>>>> gets
>>>>>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(),
>>>>>>>>>>> the
>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>> schedule_pos
>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering 
>>>>>>>>>>> the
>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>> 
>>>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>>>> 
>>>>>>>>>> I didn't write this patch, so please don't use my sign-off. 
>>>>>>>>>> I'll
>>>>>>>>>> add
>>>>>>>>>> ack or review tags as appropriate in reply; but a few comments
>>>>>>>>>> first:
>>>>>>>>>> 
>>>>>>>>>>> ---
>>>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>>>> +++---
>>>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>> 
>>>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY 31
>>>>>>>>>>> 
>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>> +
>>>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>>>> ieee80211_tx_rate
>>>>>>>>>>> *rate,
>>>>>>>>>>>   u8 mcs, u8 nss)
>>>>>>>>>>>  {
>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff
>>>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>>>   *
>>>>>>>>>>>   * Should only be called between calls to
>>>>>>>>>>> ieee80211_txq_schedule_start()
>>>>>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it
>>>>>>>>>>> will
>>>>>>>>>>> be
>>>>>>>>>>> added
>>>>>>>>>>> + * to a remove list and get removed later.
>>>>>>>>>>>   * Returns the next txq if successful, %

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao  writes:
>>>>>>>> 
>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree
>>>>>>>>> gets
>>>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(), 
>>>>>>>>> the
>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>> schedule_pos
>>>>>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>> 
>>>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>>>> 
>>>>>>>> I didn't write this patch, so please don't use my sign-off. I'll
>>>>>>>> add
>>>>>>>> ack or review tags as appropriate in reply; but a few comments
>>>>>>>> first:
>>>>>>>> 
>>>>>>>>> ---
>>>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>>>> +++---
>>>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>> 
>>>>>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>>>>>> 
>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>> +
>>>>>>>>>  static inline void ieee80211_rate_set_vht(struct
>>>>>>>>> ieee80211_tx_rate
>>>>>>>>> *rate,
>>>>>>>>> u8 mcs, u8 nss)
>>>>>>>>>  {
>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff 
>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>>>   *
>>>>>>>>>   * Should only be called between calls to
>>>>>>>>> ieee80211_txq_schedule_start()
>>>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it 
>>>>>>>>> will
>>>>>>>>> be
>>>>>>>>> added
>>>>>>>>> + * to a remove list and get removed later.
>>>>>>>>>   * Returns the next txq if successful, %NULL if no queue is
>>>>>>>>> eligible.
>>>>>>>>> If a txq
>>>>>>>>>   * is returned, it should be returned with 
>>>>>>>>> ieee80211_return_txq()
>>>>>>>>> after the
>>>>>>>>>   * driver has finished scheduling it.
>>>>>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>>>>   * @ac: AC number to acquire locks for
>>>>>>>>>   *
>>>>>&

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-20 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao  writes:
>>>>>> 
>>>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree 
>>>>>>> gets
>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>>>>>> loop will break soon in the ieee80211_next_txq() due to 
>>>>>>> schedule_pos
>>>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>>>> removal right before the end of this schedule round.
>>>>>>> 
>>>>>>> Co-developed-by: Yibo Zhao 
>>>>>>> Signed-off-by: Yibo Zhao 
>>>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>>>> 
>>>>>> I didn't write this patch, so please don't use my sign-off. I'll 
>>>>>> add
>>>>>> ack or review tags as appropriate in reply; but a few comments 
>>>>>> first:
>>>>>> 
>>>>>>> ---
>>>>>>>  include/net/mac80211.h | 16 ++--
>>>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>>>  net/mac80211/main.c|  6 +
>>>>>>>  net/mac80211/tx.c  | 63
>>>>>>> +++---
>>>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>> --- a/include/net/mac80211.h
>>>>>>> +++ b/include/net/mac80211.h
>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>> 
>>>>>>>  #define IEEE80211_MAX_TX_RETRY 31
>>>>>>> 
>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>> +
>>>>>>>  static inline void ieee80211_rate_set_vht(struct 
>>>>>>> ieee80211_tx_rate
>>>>>>> *rate,
>>>>>>>   u8 mcs, u8 nss)
>>>>>>>  {
>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
>>>>>>> ieee80211_hw *hw,
>>>>>>>   * @ac: AC number to return packets from.
>>>>>>>   *
>>>>>>>   * Should only be called between calls to
>>>>>>> ieee80211_txq_schedule_start()
>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will
>>>>>>> be
>>>>>>> added
>>>>>>> + * to a remove list and get removed later.
>>>>>>>   * Returns the next txq if successful, %NULL if no queue is
>>>>>>> eligible.
>>>>>>> If a txq
>>>>>>>   * is returned, it should be returned with ieee80211_return_txq()
>>>>>>> after the
>>>>>>>   * driver has finished scheduling it.
>>>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>>   * @ac: AC number to acquire locks for
>>>>>>>   *
>>>>>>> - * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> + * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> Check
>>>>>>> + * and remove the empty txq from rb-tree.
>>>>>>>   */
>>>>>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>>>>>> __releases(txq_lock);
>>>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw, struct ieee80211_txq *txq)
>>>>>>> __acquires(txq_lock) __releases(txq_lock);
>&g

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-19 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao  writes:
>>>> 
>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
>>>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>>>> loop will break soon in the ieee80211_next_txq() due to schedule_pos
>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>> removal right before the end of this schedule round.
>>>>> 
>>>>> Co-developed-by: Yibo Zhao 
>>>>> Signed-off-by: Yibo Zhao 
>>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>> 
>>>> I didn't write this patch, so please don't use my sign-off. I'll add
>>>> ack or review tags as appropriate in reply; but a few comments first:
>>>> 
>>>>> ---
>>>>>  include/net/mac80211.h | 16 ++--
>>>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>>>  net/mac80211/main.c|  6 +
>>>>>  net/mac80211/tx.c  | 63
>>>>> +++---
>>>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>>>> 
>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>> index ac2ed8e..ba5a345 100644
>>>>> --- a/include/net/mac80211.h
>>>>> +++ b/include/net/mac80211.h
>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>> 
>>>>>  #define IEEE80211_MAX_TX_RETRY   31
>>>>> 
>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>> +
>>>>>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate
>>>>> *rate,
>>>>> u8 mcs, u8 nss)
>>>>>  {
>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
>>>>> ieee80211_hw *hw,
>>>>>   * @ac: AC number to return packets from.
>>>>>   *
>>>>>   * Should only be called between calls to
>>>>> ieee80211_txq_schedule_start()
>>>>> - * and ieee80211_txq_schedule_end().
>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will 
>>>>> be
>>>>> added
>>>>> + * to a remove list and get removed later.
>>>>>   * Returns the next txq if successful, %NULL if no queue is 
>>>>> eligible.
>>>>> If a txq
>>>>>   * is returned, it should be returned with ieee80211_return_txq()
>>>>> after the
>>>>>   * driver has finished scheduling it.
>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>> ieee80211_hw *hw, u8 ac)
>>>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>   * @ac: AC number to acquire locks for
>>>>>   *
>>>>> - * Release locks previously acquired by 
>>>>> ieee80211_txq_schedule_end().
>>>>> + * Release locks previously acquired by 
>>>>> ieee80211_txq_schedule_end().
>>>>> Check
>>>>> + * and remove the empty txq from rb-tree.
>>>>>   */
>>>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>>>>   __releases(txq_lock);
>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct 
>>>>> ieee80211_hw
>>>>> *hw, struct ieee80211_txq *txq)
>>>>>   __acquires(txq_lock) __releases(txq_lock);
>>>>> 
>>>>>  /**
>>>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>>>> + *
>>>>> + * @tmr: pointer as obtained from local
>>>>> + *
>>>>> + */
>>>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>>>> +
>>>>> +/**
>>>>>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to
>>>>> transmit
>>>>>   *
>>>>>   * This function is used to check whether given txq is allowed to
>>>>> transmit by
>>>>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>>>>> index a4556f9..49aa143e 100644
>>>>> --- a/net/mac80211/ieee80211_i.h
>>>&

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-18 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>> loop will break soon in the ieee80211_next_txq() due to schedule_pos
>>> not leading to the second txq in the rbtree. Thus, defering the
>>> removal right before the end of this schedule round.
>>> 
>>> Co-developed-by: Yibo Zhao 
>>> Signed-off-by: Yibo Zhao 
>>> Signed-off-by: Toke Høiland-Jørgensen 
>> 
>> I didn't write this patch, so please don't use my sign-off. I'll add
>> ack or review tags as appropriate in reply; but a few comments first:
>> 
>>> ---
>>>  include/net/mac80211.h | 16 ++--
>>>  net/mac80211/ieee80211_i.h |  3 +++
>>>  net/mac80211/main.c|  6 +
>>>  net/mac80211/tx.c  | 63 
>>> +++---
>>>  4 files changed, 83 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>> index ac2ed8e..ba5a345 100644
>>> --- a/include/net/mac80211.h
>>> +++ b/include/net/mac80211.h
>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>> 
>>>  #define IEEE80211_MAX_TX_RETRY 31
>>> 
>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>> +
>>>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate 
>>> *rate,
>>>   u8 mcs, u8 nss)
>>>  {
>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct 
>>> ieee80211_hw *hw,
>>>   * @ac: AC number to return packets from.
>>>   *
>>>   * Should only be called between calls to 
>>> ieee80211_txq_schedule_start()
>>> - * and ieee80211_txq_schedule_end().
>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will be 
>>> added
>>> + * to a remove list and get removed later.
>>>   * Returns the next txq if successful, %NULL if no queue is eligible. 
>>> If a txq
>>>   * is returned, it should be returned with ieee80211_return_txq() 
>>> after the
>>>   * driver has finished scheduling it.
>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct 
>>> ieee80211_hw *hw, u8 ac)
>>>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>   * @ac: AC number to acquire locks for
>>>   *
>>> - * Release locks previously acquired by ieee80211_txq_schedule_end().
>>> + * Release locks previously acquired by ieee80211_txq_schedule_end(). 
>>> Check
>>> + * and remove the empty txq from rb-tree.
>>>   */
>>>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>> __releases(txq_lock);
>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw 
>>> *hw, struct ieee80211_txq *txq)
>>> __acquires(txq_lock) __releases(txq_lock);
>>> 
>>>  /**
>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>> + *
>>> + * @tmr: pointer as obtained from local
>>> + *
>>> + */
>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>> +
>>> +/**
>>>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to 
>>> transmit
>>>   *
>>>   * This function is used to check whether given txq is allowed to 
>>> transmit by
>>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>>> index a4556f9..49aa143e 100644
>>> --- a/net/mac80211/ieee80211_i.h
>>> +++ b/net/mac80211/ieee80211_i.h
>>> @@ -847,6 +847,7 @@ struct txq_info {
>>> struct codel_stats cstats;
>>> struct sk_buff_head frags;
>>> struct rb_node schedule_order;
>>> +   struct list_head candidate;
>>> unsigned long flags;
>>> 
>>> /* keep last! */
>>> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>>> u64 airtime_v_t[IEEE80211_NUM_ACS];
>>> u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>>> 
>>> +   struct list_head remove_list[IEEE80211_NUM_ACS];
>>> +   struct timer_list remove_timer;
>>> u16 airtime_flags;
>>> 
>>> const struct ieee80211_ops *ops;
>>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
>>> index e9ffa8e..78fe24a 100644
>>> --- 

Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-18 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-18 05:12, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao  writes:
>> 
>>> On 2019-09-16 23:27, Johannes Berg wrote:
>>>> Without really looking at the code -
>>>> 
>>>>> If station is ineligible for transmission in
>>>>> ieee80211_txq_may_transmit(),
>>>>> no packet will be delivered to FW. During the tests in push-pull 
>>>>> mode
>>>>> with
>>>>> many clients, after several seconds, not a single station is an
>>>>> eligible
>>>>> candidate for transmission since global time is smaller than all the
>>>>> station's virtual airtime. As a consequence, the Tx has been blocked
>>>>> and
>>>>> throughput is quite low.
>>>> 
>>>> You should rewrite this to be, erm, a bit more understandable in
>>>> mac80211 context. I assume you're speaking (mostly?) about ath10k, 
>>>> but
>>>> I
>>>> have very little context there. "push pull mode"? "firmware"? These
>>>> things are not something mac80211 knows about.
>>> Hi Johannes,
>>> 
>>> Thanks for your kindly reminder. Will rewrite the commit log.
>>> 
>>>> 
>>>>> Co-developed-by: Yibo Zhao 
>>>> 
>>>> That also seems wrong, should be Toke I guess, unless you intended 
>>>> for
>>>> a
>>>> From: Toke to be present?
>>> Do you mean it should be something like:
>>> 
>>> Co-developed-by: Toke Høiland-Jørgensen 
>>> Signed-off-by: Yibo Zhao 
>>> Signed-off-by: Toke Høiland-Jørgensen 
>>> 
>>> Am I understanding right?
>> 
>> I think the right thing here, as with the previous patch, is to just
>> drop my sign-off; you're writing this patch, and I'll add ack/reviews 
>> as
>> appropriate. And in that case, well, no need to have co-developed-by
>> yourself when your name is on the patch as author :)
>> 
>> -Toke
> Sorry, I think I have missed checking your reply, please ignore the 
> wrong signed-off in PATCH-V2.

While you're re-spinning, could you please add a changelog for the
changes you make? Makes it easier to keep track :)

You can add a cover-letter with a full changelog instead of having a
separate changelog for each patch; that's what I usually do...

-Toke



Re: [PATCH V2 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-18 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Not long after the start of multi-clients test, not a single station is
> an eligible candidate for transmission since global virtual time(g_vt) is
> smaller than the virtual airtime(s_vt) of all the stations. As a result,
> the Tx has been blocked and throughput is quite low.
>
> This may mainly due to sync mechanism and accumulative deviation from the
> devision calculation of g_vt.
>
> For example:
> Suppose we have 50 clients in first round.
> Round 1:
> STA   weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
> . .   .   .   .
> . .   .   .   .
> . .   .   .   .
>
> After this round, all the stations are not valid for next transmission due
> to accumulative deviation.
>
> And if we add a new #51,
> Round 2:
> STA   weight  Tx_time_round   wt_sum  s_vtg_vt  valid_for_next_Tx
> . .   .   .   .
> . .   .   .   .
> . .   .   .   .
>
> Sync is done by:
> max(g_vt of last round - grace period, s_vt)
> and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more than the
> final g_vt of this round.
>
> After this round, no more station is valid for transmission.

I'm not sure I understand this. Was there supposed to be numbers in
those tables above?

-Toke



Re: [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> From: Toke Høiland-Jørgensen 
>
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
>
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
>
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.
>
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
>
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
>
> Signed-off-by: Toke Høiland-Jørgensen 

I'll note that this patch still has the two issues that Felix pointed
out when I posted the RFC version. Namely:

- The use of divisions in the fast path. I guess I need to go write some
  reciprocal-calculation code, since that is also an issue with the AQL
  patches I linked to before.

- The fact that we don't count the airtime usage of multicast traffic,
  which with this series means that the vif TXQ will get priority over
  the others. I think we agreed to fix this by just adding an airtime
  v_t to the vif as well and use that for scheduling the TXQ. Does
  ath10k report airtime usage for multicast as well, or only for
  stations?


-Toke



Re: [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> Global airtime weight sum is updated only when txq is added/removed
> from rbtree. If upper layer configures sta weight during high load,
> airtime weight sum will not be updated since txq is most likely on the
> tree. It could a little late for upper layer to reconfigure sta weight
> when txq is already in the rbtree. And thus, incorrect airtime weight sum
> will lead to incorrect global virtual time calculation as well as global
> airtime weight sum overflow of airtime weight sum during txq removed.
>
> Hence, need to update airtime weight sum upon receiving event for
> configuring sta weight once sta's txq is on the rbtree.
>
> Besides, if airtime weight sum of ACs and sta weight is synced under the
> same per AC lock protection, there can be a very short window causing
> incorrct airtime weight sum calculation as below:
>
> active_txq_lock_VO  .
> VO weight sum is syncd.
> sta airtime weight sum is synced  .
> active_txq_unlock_VO  .
> . .
> active_txq_lock_VI.
> VI weight sum is syncd.
> sta airtime weight sumactive_txq_lock_BE
> active_txq_unlock_VIRemove txq and thus sum
> .   is calculated with synced
> .   sta airtime weight
> . active_txq_unlock_BE
>
> So introduce a per ac synced station airtime weight synced with per
> AC synced weight sum together. And the per-AC station airtime weight
> is used to calculate weight sum.
>
> Signed-off-by: Yibo Zhao 
> ---
>  net/mac80211/cfg.c  | 27 +--
>  net/mac80211/sta_info.c |  6 --
>  net/mac80211/sta_info.h |  3 +++
>  net/mac80211/tx.c   |  4 ++--
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index d65aa01..4b420bb 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   int ret = 0;
>   struct ieee80211_supported_band *sband;
>   struct ieee80211_sub_if_data *sdata = sta->sdata;
> - u32 mask, set;
> + u32 mask, set, tid, ac;
> + struct txq_info *txqi;
>  
>   sband = ieee80211_get_sband(sdata);
>   if (!sband)
> @@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
> *local,
>   if (ieee80211_vif_is_mesh(&sdata->vif))
>   sta_apply_mesh_params(local, sta, params);
>  
> - if (params->airtime_weight)
> + if (params->airtime_weight &&
> + params->airtime_weight != sta->airtime_weight) {
> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
> + spin_lock_bh(&local->active_txq_lock[ac]);
> + for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
> + if (!sta->sta.txq[tid] ||
> + ac != ieee80211_ac_from_tid(tid))
> + continue;
> +
> + sta->airtime_weight_synced[ac] =
> + params->airtime_weight;
> +
> + txqi = to_txq_info(sta->sta.txq[tid]);
> + if (RB_EMPTY_NODE(&txqi->schedule_order))
> + continue;
> +
> + local->airtime_weight_sum[ac] = 
> local->airtime_weight_sum[ac] +
> + 
> params->airtime_weight -
> + 
> sta->airtime_weight;
> + }
> + spin_unlock_bh(&local->active_txq_lock[ac]);
> + }
>   sta->airtime_weight = params->airtime_weight;

With this, airtime_weight is basically only used to return to and from
userspace, right? I.e., after the above loop has run, it will match the
contents of airtime_weight_synced; so why not just turn airtime_weight
into  a per-ac array? You could just use airtime_weight[0] as the value
to return to userspace...

-Toke



Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> On 2019-09-16 23:27, Johannes Berg wrote:
>> Without really looking at the code -
>> 
>>> If station is ineligible for transmission in 
>>> ieee80211_txq_may_transmit(),
>>> no packet will be delivered to FW. During the tests in push-pull mode 
>>> with
>>> many clients, after several seconds, not a single station is an 
>>> eligible
>>> candidate for transmission since global time is smaller than all the
>>> station's virtual airtime. As a consequence, the Tx has been blocked 
>>> and
>>> throughput is quite low.
>> 
>> You should rewrite this to be, erm, a bit more understandable in
>> mac80211 context. I assume you're speaking (mostly?) about ath10k, but 
>> I
>> have very little context there. "push pull mode"? "firmware"? These
>> things are not something mac80211 knows about.
> Hi Johannes,
>
> Thanks for your kindly reminder. Will rewrite the commit log.
>
>> 
>>> Co-developed-by: Yibo Zhao 
>> 
>> That also seems wrong, should be Toke I guess, unless you intended for 
>> a
>> From: Toke to be present?
> Do you mean it should be something like:
>
> Co-developed-by: Toke Høiland-Jørgensen 
> Signed-off-by: Yibo Zhao 
> Signed-off-by: Toke Høiland-Jørgensen 
>
> Am I understanding right?

I think the right thing here, as with the previous patch, is to just
drop my sign-off; you're writing this patch, and I'll add ack/reviews as
appropriate. And in that case, well, no need to have co-developed-by
yourself when your name is on the patch as author :)

-Toke



Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> In a loop txqs dequeue scenario, if the first txq in the rbtree gets
> removed from rbtree immediately in the ieee80211_return_txq(), the
> loop will break soon in the ieee80211_next_txq() due to schedule_pos
> not leading to the second txq in the rbtree. Thus, defering the
> removal right before the end of this schedule round.
>
> Co-developed-by: Yibo Zhao 
> Signed-off-by: Yibo Zhao 
> Signed-off-by: Toke Høiland-Jørgensen 

I didn't write this patch, so please don't use my sign-off. I'll add
ack or review tags as appropriate in reply; but a few comments first:

> ---
>  include/net/mac80211.h | 16 ++--
>  net/mac80211/ieee80211_i.h |  3 +++
>  net/mac80211/main.c|  6 +
>  net/mac80211/tx.c  | 63 
> +++---
>  4 files changed, 83 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index ac2ed8e..ba5a345 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>  
>  #define IEEE80211_MAX_TX_RETRY   31
>  
> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
> +
>  static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate,
> u8 mcs, u8 nss)
>  {
> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct 
> ieee80211_hw *hw,
>   * @ac: AC number to return packets from.
>   *
>   * Should only be called between calls to ieee80211_txq_schedule_start()
> - * and ieee80211_txq_schedule_end().
> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will be added
> + * to a remove list and get removed later.
>   * Returns the next txq if successful, %NULL if no queue is eligible. If a 
> txq
>   * is returned, it should be returned with ieee80211_return_txq() after the
>   * driver has finished scheduling it.
> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
> *hw, u8 ac)
>   * @hw: pointer as obtained from ieee80211_alloc_hw()
>   * @ac: AC number to acquire locks for
>   *
> - * Release locks previously acquired by ieee80211_txq_schedule_end().
> + * Release locks previously acquired by ieee80211_txq_schedule_end(). Check
> + * and remove the empty txq from rb-tree.
>   */
>  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>   __releases(txq_lock);
> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw *hw, 
> struct ieee80211_txq *txq)
>   __acquires(txq_lock) __releases(txq_lock);
>  
>  /**
> + * ieee80211_txqs_check - Check txqs waiting for removal
> + *
> + * @tmr: pointer as obtained from local
> + *
> + */
> +void ieee80211_txqs_check(struct timer_list *tmr);
> +
> +/**
>   * ieee80211_txq_may_transmit - check whether TXQ is allowed to transmit
>   *
>   * This function is used to check whether given txq is allowed to transmit by
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index a4556f9..49aa143e 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -847,6 +847,7 @@ struct txq_info {
>   struct codel_stats cstats;
>   struct sk_buff_head frags;
>   struct rb_node schedule_order;
> + struct list_head candidate;
>   unsigned long flags;
>  
>   /* keep last! */
> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>   u64 airtime_v_t[IEEE80211_NUM_ACS];
>   u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>  
> + struct list_head remove_list[IEEE80211_NUM_ACS];
> + struct timer_list remove_timer;
>   u16 airtime_flags;
>  
>   const struct ieee80211_ops *ops;
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index e9ffa8e..78fe24a 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -667,10 +667,15 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
> priv_data_len,
>  
>   for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>   local->active_txqs[i] = RB_ROOT_CACHED;
> + INIT_LIST_HEAD(&local->remove_list[i]);
>   spin_lock_init(&local->active_txq_lock[i]);
>   }
>   local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
>  
> + timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
> + mod_timer(&local->remove_timer,
> +   jiffies + 
> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
> +
>   INIT_LIST_HEAD(&local->chanctx_list);
>   mutex_init(&local->chanctx_mtx);
>  
> @@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
>  

Re: [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-17 Thread Toke Høiland-Jørgensen
Yibo Zhao  writes:

> From: Toke Høiland-Jørgensen 
>
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:

Thank you for keeping at this! I'll take a look at the series in detail
tomorrow.

While you're testing things related to this, I've also prototyped a port
of the "airtime queue limit" feature from chromeos into mainline. It's
currently in my tree here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=mac80211-aql-01

If you have time to test it at some point, that would be awesome. I'm
planning to submit it as an RFC, but it needs a bit more work first.
Also, it's completely untested, but it does compile :)

-Toke



RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-23 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Thursday, August 22, 2019 8:12 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] RE: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> >> What's a typical limit in the firmware?
>> > It is 96 data packet in my test. It can changed in firmware code.
>> 
>> Right, I see. Is this counter available in all ath10k firmware, or is it
>> SDIO only?
>> 
> It is only for SDIO.
> For PCIE, it does not have credit limit, firmware memory only need to save
> The tx descriptor, not need for palyload.

OK, too bad. Thanks for the explanations :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-22 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Wednesday, August 21, 2019 6:10 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] RE: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> 
>> Wen Gong  writes:
>> 
>> >> -Original Message-
>> >> From: ath10k  On Behalf Of Toke
>> >> Høiland-Jørgensen
>> >> Sent: Tuesday, August 20, 2019 8:24 PM
>> >> To: Wen Gong ; ath10k@lists.infradead.org
>> >> Cc: linux-wirel...@vger.kernel.org
>> >> Subject: [EXT] Re: [PATCH 4/7] ath10k: disable TX complete indication of
>> htt
>> > When this patch applied, firmware will not indicate tx complete for tx
>> > Data, it only indicate HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND,
>> > This htt msg will tell how many data tx complete without status(status
>> maybe success/fail).
>> 
>> Ah, so this is basically a counter of how much data is currently queued
>> in the firmware?
> Yes.
>> 
>> >> And could you explain what the credits thing is for, please? :)
>> > For high latency bus chip, all the tx data's content(include ip/udp/tcp
>> header
>> > and payload) will be transfer to firmware's memory via bus.
>> > And firmware has limited memory for tx data, the tx data's content must
>> > Saved in firmware memory before it tx complete, if ath10k transfer tx
>> > data more than the limit, firmware will occur error. The credit is used
>> > to avoid ath10k exceed the limit.
>> 
>> What's a typical limit in the firmware?
> It is 96 data packet in my test. It can changed in firmware code.

Right, I see. Is this counter available in all ath10k firmware, or is it
SDIO only?

I'm asking because this could also be a way of implementing something
like Byte Queue Limits (though I'm not sure how effective it would be).

-Toke


RE: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-21 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Tuesday, August 20, 2019 8:24 PM
>> To: Wen Gong ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] Re: [PATCH 4/7] ath10k: disable TX complete indication of htt
>> for sdio
>> 
>> Wen Gong  writes:
>> 
>> > Tx complete message from firmware cost bus bandwidth of sdio, and bus
>> > bandwidth is the bollteneck of throughput, it will effect the bandwidth
>> > occupancy of data packet of TX and RX.
>> >
>> > This patch disable TX complete indication from firmware for htt data
>> > packet, it results in significant performance improvement on TX path.
>> 
>> Wait, how does that work? Am I understanding it correctly that this
>> replaces a per-packet TX completion with a periodic one sent out of
>> band?
> When this patch applied, firmware will not indicate tx complete for tx
> Data, it only indicate HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND,
> This htt msg will tell how many data tx complete without status(status maybe 
> success/fail).

Ah, so this is basically a counter of how much data is currently queued
in the firmware?

>> And could you explain what the credits thing is for, please? :)
> For high latency bus chip, all the tx data's content(include ip/udp/tcp header
> and payload) will be transfer to firmware's memory via bus.
> And firmware has limited memory for tx data, the tx data's content must
> Saved in firmware memory before it tx complete, if ath10k transfer tx
> data more than the limit, firmware will occur error. The credit is used
> to avoid ath10k exceed the limit.

What's a typical limit in the firmware?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH 2/7] ath10k: change max RX bundle size from 8 to 32 for sdio

2019-08-21 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Tuesday, August 20, 2019 8:23 PM
>> To: Wen Gong ; ath10k@lists.infradead.org
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: [EXT] Re: [PATCH 2/7] ath10k: change max RX bundle size from 8 to
>> 32 for sdio
>> 
>> Wen Gong  writes:
>> 
>> > The max bundle size support by firmware is 32, change it from 8 to 32
>> > will help performance. This results in significant performance
>> > improvement on RX path.
>> 
>> What happens when the hardware doesn't have enough data to fill a
>> bundle? Does it send a smaller one, or does it wait until it can fill
>> it?
>> 
>
> The bundle is filled by firmware. 
> It will not wait until it can fill it.
> For example, if you do ping per second, it will have only 1 ping packet
> In the bundle.

Right, cool.

-Toke


Re: [PATCH 4/7] ath10k: disable TX complete indication of htt for sdio

2019-08-20 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Tx complete message from firmware cost bus bandwidth of sdio, and bus
> bandwidth is the bollteneck of throughput, it will effect the bandwidth
> occupancy of data packet of TX and RX.
>
> This patch disable TX complete indication from firmware for htt data
> packet, it results in significant performance improvement on TX path.

Wait, how does that work? Am I understanding it correctly that this
replaces a per-packet TX completion with a periodic one sent out of
band?

And could you explain what the credits thing is for, please? :)

-Toke


Re: [PATCH 2/7] ath10k: change max RX bundle size from 8 to 32 for sdio

2019-08-20 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> The max bundle size support by firmware is 32, change it from 8 to 32
> will help performance. This results in significant performance
> improvement on RX path.

What happens when the hardware doesn't have enough data to fill a
bundle? Does it send a smaller one, or does it wait until it can fill
it?

-Toke


Re: [PATCH v2] ath10k: fix different tx duration output

2019-05-07 Thread Toke Høiland-Jørgensen
le...@codeaurora.org writes:

> On 2019-04-18 16:07, Toke Høiland-Jørgensen wrote:
>> le...@codeaurora.org writes:
>> 
>>> On 2019-04-17 17:26, Toke Høiland-Jørgensen wrote:
>>>> Lei Wang  writes:
>>>> 
>>>>> TX duration output of tx_stats in debugfs and station dump had big
>>>>> difference because they got tx duration value from different 
>>>>> statistic
>>>>> data. We should use the same statistic data.
>>>> 
>>>> So are you sure you picked the most accurate one of the two? :)
>>>> 
>>>> -Toke
>>> 
>>> Hi Toke,
>>> 
>>> Yes.
>>> Now for ath10k, there are two ways to get tx duration output.
>>> One is got from tx_stats in debugfs reported by firmware. It is a 
>>> total
>>> value including all the frames which created by host and firmware sent
>>> to the peer.
>>> And the second is calculated from
>>> ath10k_htt_rx_tx_compl_ind()-->ieee80211_sta_register_airtime(), here
>>> the tx duration just includes the data frames sent from host to the
>>> peer.
>> 
>> So the difference is that the former includes control frames as well? 
>> Is
>> that the only difference? And what exactly is a "big difference" (from
>> the commit message)?
>> 
> Yes,it adds the duration time of receiving ACK frames.
>  From my test,TX from AP to station with iperf UDP test in 
> 10s,tx_stats->tx_duration:5496623us,
> and another value is 3934327us.

Hmm, that's quite a big difference. Is this really only ACKs, or is it
also a question of whether retries are accounted? If so, it may actually
be that what we should do is change which value is passed to
ieee80211_sta_register_airtime()?

>>> So the first value is preferable for station dump.
>> 
>> Hmm, I'm not sure if I agree with this. I specifically added the
>> tx_duration to the station dump to be able to get the values used by 
>> the
>> airtime scheduler. This breaks with this patch.
>> 
>> -Toke
>  From our internal discussing, we will revert this change.

Cool, but see above :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


  1   2   >