Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-09 Thread Kan Yan
> I'd like to use the new rate calculation code that Felix added to mt76.
> Is the arsta->txrate info in ath10k suitable to be passed up to mac80211
> and used in that, do you think? Because then that would probably be the
> easiest way to go about it...

Do you mean the mt76 patch that using the EWMA to smooth estimate data
rate? It would be great if you can move that to mac80211.
Yes, arsta->txrate info in ath10k is a good place to extract latest date rate.



On Mon, Oct 7, 2019 at 12:43 PM Johannes Berg  wrote:
>
> On Mon, 2019-10-07 at 21:40 +0200, Toke Høiland-Jørgensen wrote:
>
> > > > So if and when we start supporting true multi-band devices we'll have to
> > > > change these things anyway. So might as well keep everything together so
> > > > it all gets fixed :)
> > >
> > > I guess I'm OK with that, but I'm pretty sure this will come up sooner
> > > rather than later ...
> > >
> > > What else is there though?
> >
> > By "it all" I meant "all the airtime fairness stuff". Other than that, I
> > didn't have anything in particular in mind. I just kinda assumed there
> > would be lots of places that had an implicit assumption that all devices
> > on the same phy shares a channel...
>
> Not _that_ much - we do have the channel contexts after all. But except
> for hwsim (*cough cough* I was lazy) nothing actually implements real
> concurrent multi-channel yet, obviously, but uses a single radio with
> channel hopping...
>
> johannes
>


Re: [PATCH v2 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-09 Thread Kan Yan
Hi  Johannes,

Thanks for the review and will address all issues you pointed out in
the next version.

Hi Yibo,
>
> I assume here the only txq in the list that does not meet AQL check will
> not be dequeued. Right? Will it affect peak throughput once there is
> only one station.

Yes, the txq won't be picked for transmitting even if it is the only
active txq if the AQL check failed.  However, this won't affect peak
throughput. The reason why there are two queue limits is address this
kind of situation. The higher queue limit ensures the hardware get
enough frames.
>
> > @@ -3748,10 +3785,10 @@ bool ieee80211_txq_may_transmit(struct
> > ieee80211_hw *hw,
> >   struct sta_info *sta;
> >   u8 ac = txq->ac;
> >
> > - spin_lock_bh(&local->active_txq_lock[ac]);
> > -
> >   if (!txqi->txq.sta)
> > - goto out;
> > + return true;
>
> why return here? I think even a txq without sta info should get removed
> from list and added it back later in return_txq() if needed. No?
Yes, it should be removed from the active list. I will fix that.

Thanks,
Kan


On Wed, Oct 9, 2019 at 1:18 AM Yibo Zhao  wrote:
>
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index f13eb2f61ccf..dadb643a5498 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -3669,7 +3669,8 @@ struct ieee80211_txq *ieee80211_next_txq(struct
> > ieee80211_hw *hw, u8 ac)
> >  {
> >   struct ieee80211_local *local = hw_to_local(hw);
> >   struct ieee80211_txq *ret = NULL;
> > - struct txq_info *txqi = NULL;
> > + struct txq_info *txqi = NULL, *head = NULL;
> > + bool found_eligible_txq = false;
> >
> >   spin_lock_bh(&local->active_txq_lock[ac]);
> >
> > @@ -3680,20 +3681,32 @@ struct ieee80211_txq
> > *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
> >   if (!txqi)
> >   goto out;
> >
> > + if (txqi == head && !found_eligible_txq)
> > + goto out;
>
> I assume here the only txq in the list that does not meet AQL check will
> not be dequeued. Right? Will it affect peak throughput once there is
> only one station.
>
> How about dequeuing it anyway regardless AQL because it is the only one
> active now so it is fine to occupy the rest bandwidth. Otherwise, I am
> afraid next_txq() will return NULL in the test only one station is
> present.
>
> > @@ -3748,10 +3785,10 @@ bool ieee80211_txq_may_transmit(struct
> > ieee80211_hw *hw,
> >   struct sta_info *sta;
> >   u8 ac = txq->ac;
> >
> > - spin_lock_bh(&local->active_txq_lock[ac]);
> > -
> >   if (!txqi->txq.sta)
> > - goto out;
> > + return true;
>
> why return here? I think even a txq without sta info should get removed
> from list and added it back later in return_txq() if needed. No?
>
> > +
> > + spin_lock_bh(&local->active_txq_lock[ac]);
> >
> >   if (list_empty(&txqi->schedule_order))
> >   goto out;
>
>
> --
> Yibo


[PATCH v2 2/2] ath10k: Enable Airtime-based Queue Limit (AQL)

2019-10-06 Thread Kan Yan
Calculate the estimated airtime pending in the txqs and apply AQL to
prevent excessive amounts of packets being queued in the firmware queue.

Signed-off-by: Kan Yan 
---
 drivers/net/wireless/ath/ath10k/htt_rx.c |  1 +
 drivers/net/wireless/ath/ath10k/mac.c|  8 +---
 drivers/net/wireless/ath/ath10k/txrx.c   | 13 ++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 83a7fb68fd24..f2115b940964 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -3053,6 +3053,7 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k *ar, 
struct sk_buff *skb)
 
num_msdus++;
num_bytes += ret;
+   may_tx = ieee80211_txq_aql_check(hw, txq);
}
ieee80211_return_txq(hw, txq, false);
ieee80211_txq_schedule_end(hw, txq->ac);
diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 0606416dc971..c22b822bd8f1 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3983,6 +3983,9 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw 
*hw,
struct ath10k_txq *artxq = (void *)txq->drv_priv;
 
/* No need to get locks */
+   if (!ieee80211_txq_aql_check(hw, txq))
+   return false;
+
if (ar->htt.tx_q_state.mode == HTT_TX_MODE_SWITCH_PUSH)
return true;
 
@@ -4009,13 +4012,11 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar,
 {
struct ath10k_sta *arsta;
u32 pktlen;
-   u16 airtime = 0;
+   s32 airtime = 0;
 
if (!txq || !txq->sta)
return airtime;
 
-   if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
-   return airtime;
 
spin_lock_bh(&ar->data_lock);
arsta = (struct ath10k_sta *)txq->sta->drv_priv;
@@ -4038,6 +4039,7 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar,
}
spin_unlock_bh(&ar->data_lock);
 
+   ieee80211_sta_update_pending_airtime(txq->sta, txq->tid, airtime);
return airtime;
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c 
b/drivers/net/wireless/ath/ath10k/txrx.c
index 4102df016931..03eb636c85ed 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -84,9 +84,16 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
wake_up(&htt->empty_tx_wq);
spin_unlock_bh(&htt->tx_lock);
 
-   if (txq && txq->sta && skb_cb->airtime_est)
-   ieee80211_sta_register_airtime(txq->sta, txq->tid,
-  skb_cb->airtime_est, 0);
+   if (txq && txq->sta && skb_cb->airtime_est) {
+   s32 airtime = -skb_cb->airtime_est;
+
+   if (!test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
+   ieee80211_sta_register_airtime(txq->sta, txq->tid,
+  skb_cb->airtime_est, 0);
+
+   ieee80211_sta_update_pending_airtime(txq->sta, txq->tid,
+airtime);
+   }
 
if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL)
dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
-- 
2.23.0.581.g78d2f28ef7-goog



[PATCH v2 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-06 Thread Kan Yan
In order for the Fq_CoDel integrated in mac80211 layer operates effectively
to control excessive queueing latency, the CoDel algorithm requires an
accurate measure of how long the packets stays in the queue, aka sojourn
time. The sojourn time measured at mac80211 layer doesn't include queueing
latency in lower layer (firmware/hardware) and CoDel expects lower layer to
have a short queue. However, most 802.11ac chipsets offload tasks such TX
aggregation to firmware or hardware, thus have a deep lower layer queue.
Without a mechanism to control the lower layer queue size, packets only
stays in mac80211 layer transiently before being sent to firmware queue.
As a result, the sojourn time measured by CoDel in the mac80211 layer is
almost always lower than the CoDel latency target, hence CoDel does little
to control the latency, even when the lower layer queue causes excessive
latency.

Byte Queue limits (BQL) is commonly used to address the similar issue with
wired network interface. However, this method cannot be applied directly
to the wireless network interface. Byte is not a suitable measure of queue
depth in the wireless network, as the data rate can vary dramatically from
station to station in the same network, from a few Mbps to over Gbps.

This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
works effectively with wireless drivers that utilized firmware/hardware
offloading. AQL only allows each txq to release just enough packets to the
lower layer to form 1-2 large aggregations to keep hardware fully utilized
and keep the rest of frames in mac80211 layer to be controlled by the CoDel
algorithm.

Signed-off-by: Kan Yan 
---
 include/net/cfg80211.h |  7 
 include/net/mac80211.h | 29 ++
 net/mac80211/debugfs.c | 78 ++
 net/mac80211/debugfs_sta.c | 44 +++--
 net/mac80211/ieee80211_i.h |  4 ++
 net/mac80211/main.c|  8 +++-
 net/mac80211/sta_info.c| 20 ++
 net/mac80211/sta_info.h|  4 ++
 net/mac80211/tx.c  | 58 +++-
 9 files changed, 231 insertions(+), 21 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 26e2ad2c7027..05352eac82ec 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2499,6 +2499,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ firmware queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The per interface airtime threshold to switch to lower queue limit */
+#define IEEE80211_AQL_THRESHOLD24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d26da013f7c0..51ee28e72161 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5543,6 +5543,35 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_sta_update_pending_airtime - update txq's estimated airtime
+ *
+ * Update the estimated total airtime of frames queued in the lower layer 
queue.
+ *
+ * The airtime is estimated using frame length and the last reported data
+ * rate. The pending airtime for a txq is increased by the estimated
+ * airtime when the frame is relased to the lower layer, and decreased by the
+ * same amount at the tx completion event.
+ *
+ * @pubsta: the station
+ * @tid: the TID to register airtime for
+ * @tx_airtime: the estimated airtime (in usec)
+ */
+void ieee80211_sta_update_pending_airtime(struct ieee80211_sta *pubsta,
+   u8 tid, s32 tx_airtime);
+
+/**
+ * ieee80211_txq_aql_check - check if a txq can send more frames to firmware
+ *
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Return true if the AQL's airtime limit has not been reached and the txq can
+ * continue to send more packets to the lower layer. Otherwise return false.
+ */
+bool
+ieee80211_txq_aql_check(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2e7f75938c51..f7cbf248163a 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = {
.llseek = default_llseek,
 };
 
+static ssize_t aql_txq_limit_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+   struct ieee80211_loca

[PATCH v2 0/2] Implement Airtime-based Queue Limit (AQL)

2019-10-06 Thread Kan Yan
This patch series implements Airtime-based Queue Limit (AQL) in the mac80211 
and Ath10k driver. It is based on an earlier version from the ChromiumOS 
tree[0].

This version has been tested with QCA9884 platform with 4.14 kernel. Tests show 
AQL is able to reduce latency by an order of magnitude in a congested 
environment without negative impact on the throughput.

[0] 
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7

Kan Yan (2):
  mac80211: Implement Airtime-based Queue Limit (AQL)
  ath10k: Enable Airtime-based Queue Limit (AQL)

 drivers/net/wireless/ath/ath10k/htt_rx.c |  1 +
 drivers/net/wireless/ath/ath10k/mac.c|  8 ++-
 drivers/net/wireless/ath/ath10k/txrx.c   | 13 +++-
 include/net/cfg80211.h   |  7 +++
 include/net/mac80211.h   | 29 +
 net/mac80211/debugfs.c   | 78 
 net/mac80211/debugfs_sta.c   | 44 +
 net/mac80211/ieee80211_i.h   |  4 ++
 net/mac80211/main.c  |  8 ++-
 net/mac80211/sta_info.c  | 20 ++
 net/mac80211/sta_info.h  |  4 ++
 net/mac80211/tx.c| 58 +++---
 12 files changed, 247 insertions(+), 27 deletions(-)

-- 
Changes from v1:
 - Fix checkpatch error.
 - Keep iterate the list of active_txq until an eligible txq's deficit become 
non-negative in ieee80211_next_txq(), instead of break the loop after one 
iteration.
 - Enforce the AQL limit in ath10k's pulling mode in 
ath10k_htt_rx_tx_fetch_ind()


Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-04 Thread Kan Yan
Hi Toke,

> So I think it'll be better to pass up the last_rate and have mac80211
> calculate the airtime (like I did in my patch set). But we can keep the
> rest of your logic and just switch the calculation, I guess?
Sounds like a good plan to me. Moving the airtime calculation do makes
it easier to support multiple drivers. The only benefit of keep that
part in driver is we can avoid extending ieee80211_tx_info with a new
field "tx_time_est".

I will fix the "AIRTIME_USE_TX" part and prepare a new patch.

Thanks,
Kan

On Fri, Oct 4, 2019 at 7:08 PM Kan Yan  wrote:
>
> Hi Johannes,
>
> Thanks for help review this patch.  I will fix all style errors.
>
> > > +ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct 
> > > ieee80211_txq *txq);
> > Why is it necessary to call this, vs. just returning NULL when an SKB is
> > requested?
> This function is also called by ath10k, from ath10k_mac_tx_can_push(),
> which returns a boolean.
>
> > However, I'm not sure it *shouldn't* actually be per interface (i.e.
> > moving from
> > local->aql_total_pending_airtime
> > to
> >
> > sdata->aql_total_pending_airtime
> >
> > because you could have multiple channels etc. involved and then using a
> > single airtime limit across two interfaces that are actually on two
> > different channels (e.g. 2.4 and 5 GHz) doesn't make that much sense.
> >
> > Actually, it does make some sense as long as the NIC is actually
> > channel-hopping ... but that's in the process of changing now, there's
> > going to be hardware really soon (or perhaps already exists) that has
> > real dual-band capabilities...
>
> That's a good point.  I haven't thought about real simultaneous dual
> band chipset and such chipset do exists now. Is RSDB support coming to
> mac80211 soon? Just curious if it will be just virtual interfaces or
> something else. I chose "local" instead of "sdata" thinking about the
> case of several virtual interfaces (AP, STA, MESH) operates in the
> same channel, then the interface total could be a better choice.
>
> I am ok with moving the "aql_total_pending_airtime" into sdata, but
> afraid that's not the most optimal choice for the case of multiple
> virtual interfaces operates in the same channel.
> Maybe we could leave it in "local" for now. What do you think?
>
> Kan
>
> On Fri, Oct 4, 2019 at 8:44 AM Toke Høiland-Jørgensen  wrote:
> >
> > Kan Yan  writes:
> >
> > > In order for the Fq_CoDel integrated in mac80211 layer operates
> > > effectively to control excessive queueing latency, the CoDel algorithm
> > > requires an accurate measure of how long the packets stays in the
> > > queue, aka sojourn time. The sojourn time measured at mac80211 layer
> > > doesn't include queueing latency in lower layer (firmware/hardware)
> > > and CoDel expects lower layer to have a short queue. However, most
> > > 802.11ac chipsets offload tasks such TX aggregation to firmware or
> > > hardware, thus have a deep lower layer queue. Without a mechanism to
> > > control the lower layer queue size, packets only stays in mac80211
> > > layer transiently before being released to firmware due to the deep
> > > queue in the lower layer. In this case, the sojourn time measured by
> > > CoDel in the mac80211 layer is always lower than the CoDel latency
> > > target hence it does little to control the latency, even when the lower
> > > layer queue causes excessive latency.
> > >
> > > Byte Queue limits (BQL) is commonly used to address the similar issue
> > > with wired network interface. However, this method can not be applied
> > > directly to the wireless network interface. Byte is not a suitable
> > > measure of queue depth in the wireless network, as the data rate can
> > > vary dramatically from station to station in the same network, from a
> > > few Mbps to over 1 Gbps.
> > >
> > > This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
> > > works effectively with wireless drivers that utilized firmware/hardware
> > > offloading. It only allows each txq to release just enough packets to
> > > form 1-2 large aggregations to keep hardware fully utilized and keep the
> > > rest of frames in mac80211 layer to be controlled by CoDel.
> > >
> > > Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2
> > > Signed-off-by: Kan Yan 
> > > ---
> > >  include/net/cfg80211.h |  7 
> >

Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-04 Thread Kan Yan
Hi Johannes,

Thanks for help review this patch.  I will fix all style errors.

> > +ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
> > *txq);
> Why is it necessary to call this, vs. just returning NULL when an SKB is
> requested?
This function is also called by ath10k, from ath10k_mac_tx_can_push(),
which returns a boolean.

> However, I'm not sure it *shouldn't* actually be per interface (i.e.
> moving from
> local->aql_total_pending_airtime
> to
>
> sdata->aql_total_pending_airtime
>
> because you could have multiple channels etc. involved and then using a
> single airtime limit across two interfaces that are actually on two
> different channels (e.g. 2.4 and 5 GHz) doesn't make that much sense.
>
> Actually, it does make some sense as long as the NIC is actually
> channel-hopping ... but that's in the process of changing now, there's
> going to be hardware really soon (or perhaps already exists) that has
> real dual-band capabilities...

That's a good point.  I haven't thought about real simultaneous dual
band chipset and such chipset do exists now. Is RSDB support coming to
mac80211 soon? Just curious if it will be just virtual interfaces or
something else. I chose "local" instead of "sdata" thinking about the
case of several virtual interfaces (AP, STA, MESH) operates in the
same channel, then the interface total could be a better choice.

I am ok with moving the "aql_total_pending_airtime" into sdata, but
afraid that's not the most optimal choice for the case of multiple
virtual interfaces operates in the same channel.
Maybe we could leave it in "local" for now. What do you think?

Kan

On Fri, Oct 4, 2019 at 8:44 AM Toke Høiland-Jørgensen  wrote:
>
> Kan Yan  writes:
>
> > In order for the Fq_CoDel integrated in mac80211 layer operates
> > effectively to control excessive queueing latency, the CoDel algorithm
> > requires an accurate measure of how long the packets stays in the
> > queue, aka sojourn time. The sojourn time measured at mac80211 layer
> > doesn't include queueing latency in lower layer (firmware/hardware)
> > and CoDel expects lower layer to have a short queue. However, most
> > 802.11ac chipsets offload tasks such TX aggregation to firmware or
> > hardware, thus have a deep lower layer queue. Without a mechanism to
> > control the lower layer queue size, packets only stays in mac80211
> > layer transiently before being released to firmware due to the deep
> > queue in the lower layer. In this case, the sojourn time measured by
> > CoDel in the mac80211 layer is always lower than the CoDel latency
> > target hence it does little to control the latency, even when the lower
> > layer queue causes excessive latency.
> >
> > Byte Queue limits (BQL) is commonly used to address the similar issue
> > with wired network interface. However, this method can not be applied
> > directly to the wireless network interface. Byte is not a suitable
> > measure of queue depth in the wireless network, as the data rate can
> > vary dramatically from station to station in the same network, from a
> > few Mbps to over 1 Gbps.
> >
> > This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
> > works effectively with wireless drivers that utilized firmware/hardware
> > offloading. It only allows each txq to release just enough packets to
> > form 1-2 large aggregations to keep hardware fully utilized and keep the
> > rest of frames in mac80211 layer to be controlled by CoDel.
> >
> > Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2
> > Signed-off-by: Kan Yan 
> > ---
> >  include/net/cfg80211.h |  7 
> >  include/net/mac80211.h | 34 
> >  net/mac80211/debugfs.c | 79 ++
> >  net/mac80211/debugfs_sta.c | 43 +++--
> >  net/mac80211/ieee80211_i.h |  4 ++
> >  net/mac80211/main.c|  7 +++-
> >  net/mac80211/sta_info.c| 23 +++
> >  net/mac80211/sta_info.h|  4 ++
> >  net/mac80211/tx.c  | 60 -
> >  9 files changed, 239 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> > index 26e2ad2c7027..301c11be366a 100644
> > --- a/include/net/cfg80211.h
> > +++ b/include/net/cfg80211.h
> > @@ -2499,6 +2499,13 @@ enum wiphy_params_flags {
> >
> >  #define IEEE80211_DEFAULT_AIRTIME_WEIGHT 256
> >
> > +/* The per TXQ firmware queue limit in airtime */
> > +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L

[PATCH 2/2] ath10k: Enable Airtime-based Queue Limit (AQL)

2019-10-03 Thread Kan Yan
Calculate the estimated airtime pending in the txqs and apply AQL to
prevent excessive amounts of packets being queued in the firmware queue.

Change-Id: I4047ce0938f7b97440fb040b16b74f2e49aa3433
Signed-off-by: Kan Yan 
---
 drivers/net/wireless/ath/ath10k/mac.c  |  7 +--
 drivers/net/wireless/ath/ath10k/txrx.c | 12 +---
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 0606416dc971..37eb56c19f19 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3983,6 +3983,9 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw 
*hw,
struct ath10k_txq *artxq = (void *)txq->drv_priv;
 
/* No need to get locks */
+   if (!ieee80211_txq_airtime_check(hw, txq))
+   return false;
+
if (ar->htt.tx_q_state.mode == HTT_TX_MODE_SWITCH_PUSH)
return true;
 
@@ -4014,8 +4017,6 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar,
if (!txq || !txq->sta)
return airtime;
 
-   if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
-   return airtime;
 
spin_lock_bh(&ar->data_lock);
arsta = (struct ath10k_sta *)txq->sta->drv_priv;
@@ -4038,6 +4039,8 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar,
}
spin_unlock_bh(&ar->data_lock);
 
+   ieee80211_sta_register_pending_airtime(txq->sta, txq->tid, airtime,
+  false);
return airtime;
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c 
b/drivers/net/wireless/ath/ath10k/txrx.c
index 4102df016931..9266d3dd6adb 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -84,9 +84,15 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
wake_up(&htt->empty_tx_wq);
spin_unlock_bh(&htt->tx_lock);
 
-   if (txq && txq->sta && skb_cb->airtime_est)
-   ieee80211_sta_register_airtime(txq->sta, txq->tid,
-  skb_cb->airtime_est, 0);
+   if (txq && txq->sta && skb_cb->airtime_est) {
+   if (!test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
+   ieee80211_sta_register_airtime(txq->sta, txq->tid,
+  skb_cb->airtime_est, 0);
+
+   ieee80211_sta_register_pending_airtime(txq->sta, txq->tid,
+  skb_cb->airtime_est,
+  true);
+   }
 
if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL)
dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
-- 
2.23.0.581.g78d2f28ef7-goog



[PATCH 0/2] Implement Airtime-based Queue Limit (AQL)

2019-10-03 Thread Kan Yan
This patch series implements Airtime-based Queue Limit (AQL) in the mac80211 
and Ath10k driver. It is based on an earlier version from the ChromiumOS 
tree[0]. 

This version has been tested with QCA9884 platform with 4.14 kernel. Tests show 
AQL is able to reduce latency by an order of magnitude in a congested 
environment without negative impact on the throughput.

[0] 
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7

Kan Yan (2):
  mac80211: Implement Airtime-based Queue Limit (AQL)
  ath10k: Enable Airtime-based Queue Limit (AQL)

 drivers/net/wireless/ath/ath10k/mac.c  |  7 ++-
 drivers/net/wireless/ath/ath10k/txrx.c | 12 +++-
 include/net/cfg80211.h |  7 +++
 include/net/mac80211.h | 34 +++
 net/mac80211/debugfs.c | 79 ++
 net/mac80211/debugfs_sta.c | 43 ++
 net/mac80211/ieee80211_i.h |  4 ++
 net/mac80211/main.c|  7 ++-
 net/mac80211/sta_info.c| 23 
 net/mac80211/sta_info.h|  4 ++
 net/mac80211/tx.c  | 60 +++
 11 files changed, 253 insertions(+), 27 deletions(-)

-- 


[PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)

2019-10-03 Thread Kan Yan
In order for the Fq_CoDel integrated in mac80211 layer operates
effectively to control excessive queueing latency, the CoDel algorithm
requires an accurate measure of how long the packets stays in the
queue, aka sojourn time. The sojourn time measured at mac80211 layer
doesn't include queueing latency in lower layer (firmware/hardware)
and CoDel expects lower layer to have a short queue. However, most
802.11ac chipsets offload tasks such TX aggregation to firmware or
hardware, thus have a deep lower layer queue. Without a mechanism to
control the lower layer queue size, packets only stays in mac80211
layer transiently before being released to firmware due to the deep
queue in the lower layer. In this case, the sojourn time measured by
CoDel in the mac80211 layer is always lower than the CoDel latency
target hence it does little to control the latency, even when the lower
layer queue causes excessive latency.

Byte Queue limits (BQL) is commonly used to address the similar issue
with wired network interface. However, this method can not be applied
directly to the wireless network interface. Byte is not a suitable
measure of queue depth in the wireless network, as the data rate can
vary dramatically from station to station in the same network, from a
few Mbps to over 1 Gbps.

This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel
works effectively with wireless drivers that utilized firmware/hardware
offloading. It only allows each txq to release just enough packets to
form 1-2 large aggregations to keep hardware fully utilized and keep the
rest of frames in mac80211 layer to be controlled by CoDel.

Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2
Signed-off-by: Kan Yan 
---
 include/net/cfg80211.h |  7 
 include/net/mac80211.h | 34 
 net/mac80211/debugfs.c | 79 ++
 net/mac80211/debugfs_sta.c | 43 +++--
 net/mac80211/ieee80211_i.h |  4 ++
 net/mac80211/main.c|  7 +++-
 net/mac80211/sta_info.c| 23 +++
 net/mac80211/sta_info.h|  4 ++
 net/mac80211/tx.c  | 60 -
 9 files changed, 239 insertions(+), 22 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 26e2ad2c7027..301c11be366a 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2499,6 +2499,13 @@ enum wiphy_params_flags {
 
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT   256
 
+/* The per TXQ firmware queue limit in airtime */
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L  4000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H  8000
+
+/* The firmware's transmit queue size limit in airtime */
+#define IEEE80211_DEFAULT_AQL_INTERFACE_LIMIT  24000
+
 /**
  * struct cfg80211_pmksa - PMK Security Association
  *
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d26da013f7c0..4d62aba3e4b2 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -5543,6 +5543,40 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta 
*pubsta, int tid);
 void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
u32 tx_airtime, u32 rx_airtime);
 
+/**
+ * ieee80211_sta_register_pending_airtime - register the estimated airtime for
+ * the frames pending in lower layer (firmware/hardware) txq.
+ *
+ * Update the total pending airtime of the frames released to firmware. The
+ * pending airtime is used by AQL to control queue size in the lower layer.
+ *
+ * The airtime is estimated using frame length and the last reported data
+ * rate. The pending airtime for a txq is increased by the estimated
+ * airtime when the frame is relased to the lower layer, and decreased by the
+ * same amount at the tx completion event.
+ *
+ * @pubsta: the station
+ * @tid: the TID to register airtime for
+ * @tx_airtime: the estimated airtime (in usec)
+ * @tx_commpleted: true if called from the tx completion event.
+ */
+void ieee80211_sta_register_pending_airtime(struct ieee80211_sta *pubsta,
+   u8 tid, u32 tx_airtime,
+   bool tx_completed);
+
+/**
+ * ieee80211_txq_aritme_check - check if the airtime limit of AQL (Airtime 
based
+ * queue limit) has been reached.
+ * @hw: pointer obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ * Retrun true if the queue limit has not been reached and txq can continue to
+ * release packets to the lower layer.
+ * Return false if the queue limit has been reached and the txq should not
+ * release more frames to the lower layer.
+ */
+bool
+ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq 
*txq);
+
 /**
  * ieee80211_iter_keys - iterate keys programmed into the device
  * @hw: pointer obtained from ieee80211_alloc_hw()
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2e7f75938c51..a7da9a93b63e 100644

Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-30 Thread Kan Yan
> I guess the risk is lower when with a 24ms per-iface limit; but with
> enough stations I guess it could still happen, no? So we should probably
> handle this case...
Each txq (per sta, per tid) is allowed to release at least the lower
AQL limit amount of packet (default 4ms), which is not affected by
other station's PS behavior and 4ms should be sufficient for most use
cases.
The 24ms per-interface limit is an optimization to get good benchmark
score in peak performance test, which usually only involve 1-2
stations. The higher limit probably won't matter anymore when there
are many stations. I haven't noticed side effects due to PS behavior
in the ChromiumOS version. Still, it is better to be able to take
frames in PS queue in to account,

> Cool. Are you going to submit a ported version of your implementation?
> Then we can work from the two submissions and see if we can't converge
> on something...
Working on porting, should have something ready before the end of this week.


On Sun, Sep 29, 2019 at 12:18 PM Toke Høiland-Jørgensen  wrote:
>
> Kan Yan  writes:
>
> >> No, ath10k would continue to do what it was always doing. Drivers that
> >> can report after-the-fact airtime usage per-frame (like ath9k) will
> >> continue to do that. In both of those cases, the airtime estimate is
> >> only used to throttle the queue, not to schedule for fairness.
> > You are right, I didn't realize ath9k reports per frame airtime usage.
> >
> >> Yeah, I was wondering about that. Makes sense. Why 24ms, exactly?
> > The per interface 24 ms queue limit is an empirical number that works
> > well for both achieve low latency when there is a lot of stations and
> > get high throughput when there is only 1-2 stations.  We could make it
> > configurable.
>
> Right. "Found by trial and error" is a fine answer as far as I'm
> concerned :)
>
> But yeah, this should probably be configurable, like BQL is.
>
> >> BTW, I think Felix' concern about powersave was in relation to AQL: If
> >> we don't handle power save in that, we can end up in a situation where
> >>the budget for packets allowed to be queued in the firmware is taken up
> >> entirely by stations that are currently in powersave mode; which would
> >> throttle the device completely. So we should take that into account for
> >> AQL; for the fairness scheduler, stations in powersave are already
> >> unscheduled, so that should be fine.
> > I think the accounting for the airtime of frames in the power saving
> > queue could affect both the fairness scheduler and AQL.
> > For chipset with firmware offload, PS handling is mostly done by
> > firmware, so host driver's knowledge of PS state could be slightly
> > out-of-dated. The power save behavior also make it harder to the
> > airtime_weight correct for the fairness scheduler.
>
> Hmm, maybe. I'm not sure how significant this effect would be, but I
> guess we'll need to find out :)
>
>
> > Powersave mode's impact to AQL is much smaller. The lower per station
> > queue limit is not impacted by other stations PS behavior, since the
> > estimated future airtime is not weighted for other stations and a
> > station won't get blocked due to others stations in PS mode.
> > Station in PS mode do affects AQL's higher per interface limit, but in
> > an inconsequential way. The per interface AQL queue limit is quite
> > large (24 ms), hence airtime from packets in PS queue is unlikely to
> > have a significant impact on it. Still, it will be better if the
> > packet in power saving queue can be taken into account.
>
> I guess the risk is lower when with a 24ms per-iface limit; but with
> enough stations I guess it could still happen, no? So we should probably
> handle this case...
>
> >> > make it easier to schedule multiple stations, I think it has some merit
> >> > that makes it worth trying out. We should probably get the AQL stuff
> >> > done first, though, and try the virtual time scheduler on top of that.
> >> Agree that we should get the AQL stuff done first since I believe it
> >> will help to fix the issue mentioned above.
> > That sounds like a good plan. The virtual time scheduler is more
> > involved and will take more work to get it right. It make sense to get
> > AQL done first.
>
> Cool. Are you going to submit a ported version of your implementation?
> Then we can work from the two submissions and see if we can't converge
> on something...
>
> -Toke
>


Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-28 Thread Kan Yan
> No, ath10k would continue to do what it was always doing. Drivers that
> can report after-the-fact airtime usage per-frame (like ath9k) will
> continue to do that. In both of those cases, the airtime estimate is
> only used to throttle the queue, not to schedule for fairness.
You are right, I didn't realize ath9k reports per frame airtime usage.

> Yeah, I was wondering about that. Makes sense. Why 24ms, exactly?
The per interface 24 ms queue limit is an empirical number that works
well for both achieve low latency when there is a lot of stations and
get high throughput when there is only 1-2 stations.  We could make it
configurable.

> BTW, I think Felix' concern about powersave was in relation to AQL: If
> we don't handle power save in that, we can end up in a situation where
>the budget for packets allowed to be queued in the firmware is taken up
> entirely by stations that are currently in powersave mode; which would
> throttle the device completely. So we should take that into account for
> AQL; for the fairness scheduler, stations in powersave are already
> unscheduled, so that should be fine.
I think the accounting for the airtime of frames in the power saving
queue could affect both the fairness scheduler and AQL.
For chipset with firmware offload, PS handling is mostly done by
firmware, so host driver's knowledge of PS state could be slightly
out-of-dated. The power save behavior also make it harder to the
airtime_weight correct for the fairness scheduler.
Powersave mode's impact to AQL is much smaller. The lower per station
queue limit is not impacted by other stations PS behavior, since the
estimated future airtime is not weighted for other stations and a
station won't get blocked due to others stations in PS mode.
Station in PS mode do affects AQL's higher per interface limit, but in
an inconsequential way. The per interface AQL queue limit is quite
large (24 ms), hence airtime from packets in PS queue is unlikely to
have a significant impact on it. Still, it will be better if the
packet in power saving queue can be taken into account.

> > make it easier to schedule multiple stations, I think it has some merit
> > that makes it worth trying out. We should probably get the AQL stuff
> > done first, though, and try the virtual time scheduler on top of that.
> Agree that we should get the AQL stuff done first since I believe it
> will help to fix the issue mentioned above.
That sounds like a good plan. The virtual time scheduler is more
involved and will take more work to get it right. It make sense to get
AQL done first.


On Fri, Sep 27, 2019 at 2:20 AM Yibo Zhao  wrote:
>
> On 2019-09-27 14:12, Toke Høiland-Jørgensen wrote:
> > Kan Yan  writes:
> >
> >>> No, but on tx_completion we could do something like this:
> >>> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est;
> >>> ieee80211_report_airtime(sta, airtime);
> >>> That way, if the driver sets the tx_time field to something the
> >>> firmware
> >>> reports, we'll use that, and otherwise we'd fall back to the
> >>> estimate.
> >>> Of course, there would need to be a way for the driver to opt out of
> >>> this, for drivers that report out of band airtime like ath10k does :)
> >>
> >> I doubt that will work, unless firmware can do per frame airtime
> >> update in the skb. It is more likely that firmware provides out of
> >> band airtime update for a period of time, like an aggregation. That's
> >> the case for ath10k: https://patchwork.kernel.org/patch/10684689
> >
> > No, ath10k would continue to do what it was always doing. Drivers that
> > can report after-the-fact airtime usage per-frame (like ath9k) will
> > continue to do that. In both of those cases, the airtime estimate is
> > only used to throttle the queue, not to schedule for fairness.
> >
> > My point is just that for drivers that have *no* mechanism to report
> > airtime usage after-the-fact, we can add a flag that will allow the
> > values estimated by mac80211 to also be used for the fairness
> > scheduler...
> >
> >> Regarding handling frame for station enters power saving mode:
> >>>
> >>> >> Oh, right. Didn't know that could happen (I assumed those would be
> >>> >> flushed out or something). But if we're going to go with Kan's
> >>> >> suggestion of having per-station accounting as well as a global
> >>> >> accounting for the device, we could just subtract the station's
> >>> >> outstanding balance from the device total when it goes into powersave
> >>> >> mode, and 

Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-26 Thread Kan Yan
> No, but on tx_completion we could do something like this:
> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est;
> ieee80211_report_airtime(sta, airtime);
> That way, if the driver sets the tx_time field to something the firmware
> reports, we'll use that, and otherwise we'd fall back to the estimate.
> Of course, there would need to be a way for the driver to opt out of
> this, for drivers that report out of band airtime like ath10k does :)

I doubt that will work, unless firmware can do per frame airtime
update in the skb. It is more likely that firmware provides out of
band airtime update for a period of time, like an aggregation. That's
the case for ath10k: https://patchwork.kernel.org/patch/10684689

Regarding handling frame for station enters power saving mode:
>
> >> Oh, right. Didn't know that could happen (I assumed those would be
> >> flushed out or something). But if we're going to go with Kan's
> >> suggestion of having per-station accounting as well as a global
> >> accounting for the device, we could just subtract the station's
> >> outstanding balance from the device total when it goes into powersave
> >> mode, and add it back once it wakes up again. At least I think that
> >> would work, no?
> >Yes, I think that would work.
> Great! Will incorporate that, then.

I think that could work but things could be a bit more complicated.
Not sure I fully understand the usage of airtime_weight_sum in  [PATCH
V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler:

in ieee80211_schedule_txq():
   local->airtime_weight_sum[ac] += sta->airtime_weight;

in ieee80211_sta_register_airtime():
   weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
   local->airtime_v_t[ac] += airtime / weight_sum;
   sta->airtime[ac].v_t += airtime / sta->airtime_weight;

in  __ieee80211_unschedule_txq
 local->airtime_weight_sum[ac] -= sta->airtime_weight;

I assume the purpose of airtime_weight_sum is to count station's
virtual airtime proportional to the number of active stations for
fairness.  My concern is the per interface
local->airtime_weight_sum[ac] get updated when packets are released
from a txq to lower layer, but it doesn't mean the airtime will be
consumed (packets get transmitted) shortly, due to events like station
enter power save mode, so the weight_sum used in
ieee80211_sta_register_airtime() maybe inaccurate. For architects
using firmware/hardware offloading, firmware ultimately controls
packet scheduling and has quite a lot of autonomy. The host driver's
airtime_weight_sum may get out of sync with the number of active
stations actually scheduled by firmware even without power saving
events.

Is this a correct understanding? I kind of think the original method
of airtime accounting using deficit maybe more robust in this regard.


Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-25 Thread Kan Yan
> > Do you mean we will use airtime reported by FW to calculate
>
> > local->airtime_queued in case we have FW reporting airtime?
> No, the opposite; if the firmware can't report airtime, we can use the
> estimated values to feed into report_airtime() for the fairness
> calculation...
The local->airtime_queued is the 'future' airtime for the packet
pending the queue. It can't be replaced by the after the fact airtime
reported from firmware for the frames transmitted.


On Wed, Sep 25, 2019 at 5:27 PM Kan Yan  wrote:
>
> > Yes, please do! AFAICT, the main difference is that your version keeps
> > the airtime calculation itself in the driver, while mine passes up the
> > rate and lets mac80211 do the calculation of airtime. Other than that,
> > the differences are minor, no?
> > I'm not actually sure which approach is best; I suspect doing all the
> > accounting in mac80211 will help with integrating this into drivers that
> > use minstrel; we can just add a hook in that and be done with it.
> > Whereas if the driver has to do the accounting, we would need to add
> > that to each driver (mt76, iwl(?)).
>
> Yes, they are essentially doing the same thing. I kept the airtime
> estimation code in the ath10k just because it is already there. It is
> better to do that in mac80211, so it doesn't have to be duplicated for
> each driver and avoids the overhead of updating the estimated airtime
> from host driver to mac80211.
>
> > But of course, doing things in mac80211 depends on stuffing even more
> > stuff into the already overloaded cb field; and I'm not actually
> > entirely sure what I've done with that will actually work. WDYT?
> Either way a field in skb cb is needed to record the estimated
> airtime. The  'tx_time_est' shares the space with the codel
> 'enque_time' looks fine to me, as their lifetime doesn't overlap.
>
> There is another minor difference in the ChromiumOs version, which
> actually address the issue Yibo just asked:
> > Meanwhile, airtime_queued will also limit the situation that we only
> > have a station for transmission. Not sure if the peak throughput will be
> > impacted. I believe it may work fine with 11ac in chromiumos, how about
> > 11n and 11a?
>
> My version has two AQL limits, a smaller per station limit (4ms) and a
> larger per interface limit (24 ms). When the per interface limit has
> not been reached, stations are allowed to transmit up to 1/3 of the
> interface limits (8ms). This way it balance the needs to control
> latency when there are a lot of stations and to get good throughput
> benchmark numbers with a single client. In my test, I found increasing
> the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4
> ath10k chipset.
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734
>
> > Of course we'll also have to eventually integrate this with the other
> > series that Yibo recently re-posted (the virtual time scheduler). I
> > think that will be relatively straight forward, except I'm not sure your
> > atomic patches will work when we also have to update the rbtree. Any
> > thoughts on that series in general?
> I do like the virtual time scheduler patchset. It makes it easier to
> schedule an arbitrary tx queue and handles ath10k's firmware pulling
> mode better. I will give it a try.
>
> > Yup, makes sense. Looking at the version you linked to, though, it seems
> > you're calling ieee80211_sta_register_airtime() with the estimated value
> > as well? So are you double-accounting airtime, or are you adjusting for
> > the accurate values somewhere else I don't see in that series?
> It does not double count airtime, just both the airtime fairness
> scheduler and AQL use the estimate airtime. It is on an older tree and
> still doesn't have the patch that provides the fw airtime:
> https://patchwork.kernel.org/patch/10684689
>
>
> On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen  
> wrote:
> >
> > Yibo Zhao  writes:
> >
> > > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote:
> > >> Yibo Zhao  writes:
> > >>
> > >>> So if it is going to work together with virtual time based mechanism
> > >>> in
> > >>> the future, the Tx criteria will be met both of below conditions,
> > >>> 1. Lower than g_vt
> > >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT
> > >>
> > >>> Are we going to maintain two kinds of airtime that one is from
> > >>> estima

Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-25 Thread Kan Yan
> Yes, please do! AFAICT, the main difference is that your version keeps
> the airtime calculation itself in the driver, while mine passes up the
> rate and lets mac80211 do the calculation of airtime. Other than that,
> the differences are minor, no?
> I'm not actually sure which approach is best; I suspect doing all the
> accounting in mac80211 will help with integrating this into drivers that
> use minstrel; we can just add a hook in that and be done with it.
> Whereas if the driver has to do the accounting, we would need to add
> that to each driver (mt76, iwl(?)).

Yes, they are essentially doing the same thing. I kept the airtime
estimation code in the ath10k just because it is already there. It is
better to do that in mac80211, so it doesn't have to be duplicated for
each driver and avoids the overhead of updating the estimated airtime
from host driver to mac80211.

> But of course, doing things in mac80211 depends on stuffing even more
> stuff into the already overloaded cb field; and I'm not actually
> entirely sure what I've done with that will actually work. WDYT?
Either way a field in skb cb is needed to record the estimated
airtime. The  'tx_time_est' shares the space with the codel
'enque_time' looks fine to me, as their lifetime doesn't overlap.

There is another minor difference in the ChromiumOs version, which
actually address the issue Yibo just asked:
> Meanwhile, airtime_queued will also limit the situation that we only
> have a station for transmission. Not sure if the peak throughput will be
> impacted. I believe it may work fine with 11ac in chromiumos, how about
> 11n and 11a?

My version has two AQL limits, a smaller per station limit (4ms) and a
larger per interface limit (24 ms). When the per interface limit has
not been reached, stations are allowed to transmit up to 1/3 of the
interface limits (8ms). This way it balance the needs to control
latency when there are a lot of stations and to get good throughput
benchmark numbers with a single client. In my test, I found increasing
the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4
ath10k chipset.
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734

> Of course we'll also have to eventually integrate this with the other
> series that Yibo recently re-posted (the virtual time scheduler). I
> think that will be relatively straight forward, except I'm not sure your
> atomic patches will work when we also have to update the rbtree. Any
> thoughts on that series in general?
I do like the virtual time scheduler patchset. It makes it easier to
schedule an arbitrary tx queue and handles ath10k's firmware pulling
mode better. I will give it a try.

> Yup, makes sense. Looking at the version you linked to, though, it seems
> you're calling ieee80211_sta_register_airtime() with the estimated value
> as well? So are you double-accounting airtime, or are you adjusting for
> the accurate values somewhere else I don't see in that series?
It does not double count airtime, just both the airtime fairness
scheduler and AQL use the estimate airtime. It is on an older tree and
still doesn't have the patch that provides the fw airtime:
https://patchwork.kernel.org/patch/10684689


On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen  wrote:
>
> Yibo Zhao  writes:
>
> > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote:
> >> Yibo Zhao  writes:
> >>
> >>> So if it is going to work together with virtual time based mechanism
> >>> in
> >>> the future, the Tx criteria will be met both of below conditions,
> >>> 1. Lower than g_vt
> >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT
> >>
> >>> Are we going to maintain two kinds of airtime that one is from
> >>> estimation and the other is basically from FW reporting?
> >>
> >> Yes, that was my plan. For devices that don't have FW reporting of
> >> airtime, we can fall back to the estimation; but if we do have FW
> >> reporting that is most likely going to be more accurate, so better to
> >> use that for fairness...
> >
> > Do you mean we will use airtime reported by FW to calculate
> > local->airtime_queued in case we have FW reporting airtime?
>
> No, the opposite; if the firmware can't report airtime, we can use the
> estimated values to feed into report_airtime() for the fairness
> calculation...
>
> -Toke
>


Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue

2019-09-20 Thread Kan Yan
Hi Toke,

There is an updated version of AQL in the chromiumos tree implemented
in the mac80211 driver, instead of in the ath10k driver as the
original version:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703106/6

It is based on a more recent kernel (4.14) and integrated with the
airtime fairness tx scheduler in mac80211. This version has been
tested rather extensively.  I intended to use it as the basis for my
effort to bring AQL upstream, but get sidetracked by other things. I
can clean it up and send a patchset next week if you think that is the
right path. Sorry for the long delay and slack off on the upstream
effort.

There is some concern in this thread regarding the accuracy of the
estimated airtime using the last reported TX rate. It is indeed a
rather crude method and did not include retries in the calculation.
Besides, there are lags between firmware changing rate and host driver
get the rate update. The 16us IFS overhead is only correct for 5G and
it is actually 10us for 2.4 G. However, that hardly matters. The goal
of AQL is to prevent the firmware/hardware queue from getting bloated
or starved. There is a lot of headroom in the queue length limit (8-10
ms) to tolerate inaccuracy in the estimate airtime. AQL doesn't
control the fine grained TX packet scheduling. It is handled by the
airtime fairness scheduler and ultimately firmware.

There are two TX airtimes in the newer version (chromiumos 4.14
kernel): The estimated airtime for frames pending in the queue and the
airtime reported by the firmware for the frame transmitted, which
should be accurate as the firmware supposed to take retries and
aggregation into account. The airtime fairness scheduler that does the
TX packet scheduling should use the TX airtime reported by the
firmware. That's the reason why the original implementation in the
ChromiumOS tree tries to factor in aggregation size when estimate the
airtime overhead and the later version doesn't even bother with that.

Regards,
Kan


On Fri, Sep 20, 2019 at 6:32 AM Toke Høiland-Jørgensen  wrote:
>
> Lorenzo Bianconi  writes:
>
> >> Lorenzo Bianconi  writes:
> >>
> >> >> From: Toke Høiland-Jørgensen 
> >> >>
> >> >> Some devices have deep buffers in firmware and/or hardware which 
> >> >> prevents
> >> >> the FQ structure in mac80211 from effectively limiting bufferbloat on 
> >> >> the
> >> >> link. For Ethernet devices we have BQL to limit the lower-level queues, 
> >> >> but
> >> >> this cannot be applied to mac80211 because transmit rates can vary 
> >> >> wildly
> >> >> between packets depending on which station we are transmitting it to.
> >> >>
> >> >> To overcome this, we can use airtime-based queue limiting (AQL), where 
> >> >> we
> >> >> estimate the transmission time for each packet before dequeueing it, and
> >> >> use that to limit the amount of data in-flight to the hardware. This 
> >> >> idea
> >> >> was originally implemented as part of the out-of-tree airtime fairness
> >> >> patch to ath10k[0] in chromiumos.
> >> >>
> >> >> This patch ports that idea over to mac80211. The basic idea is simple
> >> >> enough: Whenever we dequeue a packet from the TXQs and send it to the
> >> >> driver, we estimate its airtime usage, based on the last recorded TX 
> >> >> rate
> >> >> of the station that packet is destined for. We keep a running per-AC 
> >> >> total
> >> >> of airtime queued for the whole device, and when that total climbs 
> >> >> above 8
> >> >> ms' worth of data (corresponding to two maximum-sized aggregates), we
> >> >> simply throttle the queues until it drops down again.
> >> >>
> >> >> The estimated airtime for each skb is stored in the tx_info, so we can
> >> >> subtract the same amount from the running total when the skb is freed or
> >> >> recycled. The throttling mechanism relies on this accounting to be
> >> >> accurate (i.e., that we are not freeing skbs without subtracting any
> >> >> airtime they were accounted for), so we put the subtraction into
> >> >> ieee80211_report_used_skb().
> >> >>
> >> >> This patch does *not* include any mechanism to wake a throttled TXQ 
> >> >> again,
> >> >> on the assumption that this will happen anyway as a side effect of 
> >> >> whatever
> >> >> freed the skb (most commonly a TX completion).
> >> >>
> >> >> The throttling mechanism only kicks in if the queued airtime total goes
> >> >> above the limit. Since mac80211 calculates the time based on the 
> >> >> reported
> >> >> last_tx_time from the driver, the whole throttling mechanism only kicks 
> >> >> in
> >> >> for drivers that actually report this value. With the exception of
> >> >> multicast, where we always calculate an estimated tx time on the 
> >> >> assumption
> >> >> that multicast is transmitted at the lowest (6 Mbps) rate.
> >> >>
> >> >> The throttling added in this patch is in addition to any throttling 
> >> >> already
> >> >> perfo

Re: [PATCH v6 4/4] ath10k: reporting estimated tx airtime for fairness

2019-01-24 Thread Kan Yan
Hi Toke,

This patch looks good to me. Great job on getting the airtime fairness
scheduling framework done!


On Tue, Jan 22, 2019 at 6:21 AM Toke Høiland-Jørgensen  wrote:
>
> From: Kan Yan 
>
> Transmit airtime will be estimated from last tx rate used.
> Firmware report tx rate by peer stats. Airtime is computed
> on tx path and the same will be reported to mac80211 upon
> tx completion.
>
> This change is based on Kan's orginal commit in Chromium tree
> ("CHROMIUM: ath10k: Implementing airtime fairness based TX scheduler")
> ref: https://chromium-review.googlesource.com/588190
>
> Tested on QCA4019 with firmware version 10.4-3.2.1.1-00015
> Tested on QCA9984 with firmware version 10.4-3.9.0.1-5
>
> Signed-off-by: Kan Yan 
> [rmano...@codeaurora.org: ported only the airtime computation]
> Signed-off-by: Rajkumar Manoharan 
> [t...@redhat.com: Rebase to mac80211-next, add test note]
> Signed-off-by: Toke Høiland-Jørgensen 
> ---
>
> Rajkumar / Kan: Please check that I didn't mess anything up while
> rebasing this onto the current mac80211-next branch!
>
>  drivers/net/wireless/ath/ath10k/core.h   |  2 +
>  drivers/net/wireless/ath/ath10k/htt_rx.c |  1 +
>  drivers/net/wireless/ath/ath10k/mac.c| 57 ++--
>  drivers/net/wireless/ath/ath10k/txrx.c   |  4 ++
>  4 files changed, 61 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.h 
> b/drivers/net/wireless/ath/ath10k/core.h
> index 28cecf7375f8..4528a6e47b8d 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -126,6 +126,7 @@ struct ath10k_skb_cb {
> u8 flags;
> u8 eid;
> u16 msdu_id;
> +   u16 airtime_est;
> struct ieee80211_vif *vif;
> struct ieee80211_txq *txq;
>  } __packed;
> @@ -498,6 +499,7 @@ struct ath10k_sta {
> u16 peer_id;
> struct rate_info txrate;
> struct ieee80211_tx_info tx_info;
> +   u32 last_tx_bitrate;
>
> struct work_struct update_wk;
> u64 rx_duration;
> diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
> b/drivers/net/wireless/ath/ath10k/htt_rx.c
> index dc9895f2eb9d..6a93b57900ae 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
> @@ -3078,6 +3078,7 @@ ath10k_update_per_peer_tx_stats(struct ath10k *ar,
>
> arsta->txrate.nss = txrate.nss;
> arsta->txrate.bw = ath10k_bw_to_mac80211_bw(txrate.bw);
> +   arsta->last_tx_bitrate = cfg80211_calculate_bitrate(&arsta->txrate);
> if (sgi)
> arsta->txrate.flags |= RATE_INFO_FLAGS_SHORT_GI;
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index cf7fdf72e990..b13dcb4533cb 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -3544,7 +3544,7 @@ static void ath10k_tx_h_add_p2p_noa_ie(struct ath10k 
> *ar,
>  static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
> struct ieee80211_vif *vif,
> struct ieee80211_txq *txq,
> -   struct sk_buff *skb)
> +   struct sk_buff *skb, u16 airtime)
>  {
> struct ieee80211_hdr *hdr = (void *)skb->data;
> struct ath10k_skb_cb *cb = ATH10K_SKB_CB(skb);
> @@ -3561,6 +3561,7 @@ static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar,
>
> cb->vif = vif;
> cb->txq = txq;
> +   cb->airtime_est = airtime;
>  }
>
>  bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar)
> @@ -3948,6 +3949,49 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw 
> *hw,
> return false;
>  }
>
> +/* Return estimated airtime in microsecond, which is calculated using last
> + * reported TX rate. This is just a rough estimation because host driver has 
> no
> + * knowledge of the actual transmit rate, retries or aggregation. If actual
> + * airtime can be reported by firmware, then delta between estimated and 
> actual
> + * airtime can be adjusted from deficit.
> + */
> +#define IEEE80211_ATF_OVERHEAD 100 /* IFS + some slot time */
> +#define IEEE80211_ATF_OVERHEAD_IFS 16  /* IFS only */
> +static u16 ath10k_mac_update_airtime(struct ath10k *ar,
> +struct ieee80211_txq *txq,
> +struct sk_buff *skb)
> +{
> +   struct ath10k_sta *arsta;
> +   u32 pktlen;
> +   u16 airtime = 0;
> +
> +   if (!txq || !txq->s

Re: [PATCH] ath10k: Add wrapper function to ath10k debug

2018-10-15 Thread Kan Yan
> Johannes had an interesting idea to use trace_ath10k_log_dbg_enabled().
> Could you investigate if that would work? That way we might get the
> performance improvement even when is enabled CONFIG_ATH10K_TRACING (but
> actual trace point is disabled, of course).

That's a good idea. This patch was originally made for Google Wifi's
3.18 kernel and It does use trace_ath10k_log_dbg_enabled():
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/391891
+#define ath10k_dbg(ar, mask, format, ...)  \
+   do {\
+   if (unlikely((ath10k_debug_mask & mask) ||  \
+ trace_ath10k_log_dbg_enabled())) {\
+   __ath10k_dbg(ar, mask, format, ##__VA_ARGS__);  \
+   }   \
+   } while (0)


Re: [RFC v2] mac80211: budget outstanding airtime for transmission

2018-09-21 Thread Kan Yan
Hi Rajkumar,

It is great to see your patches and thank you for helping bring this
to upstream.

> Why signed? This should never become negative unless something is wrong
> with the accounting somewhere?

It is signed because the "budget" is allowed to go negative.  In the
original version,  as long as the budget is > 0, at least one packet
is released even the budget is less than the airtime required for the
packet.  One major issue I had when doing airtime based queue limit is
how to prevent txq with very low data rate (e.g., 6 mbps) from getting
stalled for too long. My solution is to allow budget to go negative
for one packet, then reset budget to 0 when all frames from that txq
has been completed and the inflight packets count reaches to 0.

For the ieee80211_calc_airtime_budget(), although it is sufficient for
the purpose of enforcing queue limit, the airtime overhead estimation
in the original patch really crude and could use some improvement.
Instead of using the last reported tx PHY rate and a guesstimate of
overhead to calculate airtime,  one possible improvement is to
calculate the estimated "real" tx rate from a windowed moving average
of past tx_bytes/tx_airtime, if the firmware is capable of reporting
tx airtime for each completed PPDU.


Re: [PATCH RFC v3 0/4] Move TXQ scheduling into mac80211

2018-09-12 Thread Kan Yan
> I think this is a great approach, that we should definitely adopt in
> mac80211. However, I'm not sure the outstanding airtime limiting should
> be integrated into the fairness scheduling, since there are drivers such
> as ath9k that don't need it.
>
> Rather, I'd propose that we figure out the API for fairness scheduling
> first, and add the BQL-like limiter as a separate layer. They can be
> integrated in the sense that the limiter's estimate of airtime can be
> used for fairness scheduling as well in the case where the
> driver/firmware doesn't have a better source of airtime usage.
>
> Does that make sense?
Sure that make sense. Yes, the airtime based BQL like queue limiter is
not needed for drivers such as ath9k. We could leave the outstanding
airtime accounting and queue depth limit to another subject and get
airtime fairness scheduling API done first.


On Mon, Sep 10, 2018 at 4:26 AM, Johannes Berg
 wrote:
> On Mon, 2018-09-10 at 13:17 +0200, Toke Høiland-Jørgensen wrote:
>
>> > > - I did not add any locking around next_txq(); the driver is still 
>> > > supposed
>> > >   to maintain a lock that prevents two threads from trying to schedule 
>> > > the
>> > >   same AC at the same time. This is what drivers already do, so I 
>> > > figured it
>> > >   was easier to just keep it that way rather than do it in mac80211.
>> >
>> > I'll look at this in the code, but from a maintainer perspective I'm
>> > somewhat worried that this will lead to issues that are really the
>> > driver's fault, but surface in mac80211. I don't know how easy it
>> > would be to catch that.
>>
>> Yeah, I get what you mean. The alternative would be to have a
>> ieee80211_start_schedule(ac) and ieee80211_end_schedule(ac), which
>> basically just takes a lock.
>
> And I guess start would increment the schedule number, which is now
> dependent on first
>
>> Would mean we could get rid of the 'first'
>> parameter for next_txq(), so might not be such a bad idea;
>
> Right, that's what I meant.
>
>> and if the
>> driver has its own locking the extra locking in mac80211 would just be
>> an always-uncontested spinlock, which shouldn't be much overhead, right?
>
> It may still bounce around CPUs if you call this from other places, but
> I suspect that wouldn't be the biggest issue. There are a lot of
> calculations going on too...
>
> johannes


Re: [PATCH RFC v3 0/4] Move TXQ scheduling into mac80211

2018-09-09 Thread Kan Yan
Hi Toke,

It is great to see finally there will be a general airtime fairness
support in mac80211. I feel this patch set could still use some
improvements to make it works better with 11ac drivers like ath10k. I
did a version of airtime fairness in the ath10k driver that used in
Google Wifi and I'd like to share a few things I learned from this
experience.

Airtime fairness is a cool feature, but that's not the motivation for
me to implement that. The goal is to solve the bufferbloat problem at
wireless interface. Fq_CoDel has been proved to be very effective in
fixing bufferbloat for wired interface, and it has been integrated
with mac802.11 for more than 2 years. However, it still does not work
well with 11ac drivers like ath10k, which relies heavily on firmware
to do most of transmit scheduling, and hence has deep firmware queue.
FQ_CoDel expects a thin driver layer queue so it can get an accurate
measurement of sojourn time and use the sojourn time to control
latency.  However, the queue in ath10k firmware is so deep, that
packets only stay in mac80211 layer queue transiently. The sojourn
time is almost always below target, and hence CoDel doesn't do much to
control the latency.

The key to make Fq_Codel works effectively with 11ac wireless driver
is to limit the queue depth in lower layer. It looks to me this patch
set only enforce fairness among stations (airtime wise), but did not
limit how many packets can be released to firmware/hardware.  When a
txq run out of its airtime deficit, the txq is put to the back of the
list but will get a refill of airtime deficit in next round. It
doesn't keep counts of the total pending airtime released to hardware
or firmware. Packets will be released as long as wireless driver keeps
calling the dequeue function. This maybe not an issue for ath9k,
because the number of available tx descriptors is more limited.
However, for 11ac drivers like ath10k, it routinely queues thousands
packets when the bandwidth is oversubscribed in my test.

The version I did is to use pending airtime to restrict the firmware
queue depth. Another difference is it does not use airtime reported
from firmware, which is "past" or spent airtime, but use "future"
airtime, the estimated airtime calculated using last reported tx rate
for the inflight packets pending in the firmware queue. Host driver
keeps counts of total pending airtime for each TxQ (per station, per
AC). It tries to release up to 4-8 ms worth of frames for each TxQ.
The idea is to use airtime accounting to give firmware just enough
frames to keep it busy and form good aggregations, and keeps the rest
of frames in mac80211 layer queues so Fq_Codel can work on it to
control latency.

Here are some of the patches from the version I did in ath10k for
kernel in ChromeOs.
CHROMIUM: net: mac80211: Recording last tx bitrate.
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189

CHROMIUM: ath10k: Implementing airtime fairness based TX scheduler.
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13

CHROMIUM: ath10k: use frame count as defict for fq_codel.
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588192/13

I had some discussions with Rajkumar on how to adapt some methods from
my patch and make it fit with the new TXQ scheduling API proposed
here. The major issue is the lack of pending airtime accounting. It
could be done in individual wireless drivers, such as in ath10k, or
even in firmware, but implement it in mac80211 could be a better
choice than do it one driver at a time.

Thanks,
Kan

On Fri, Sep 7, 2018 at 3:22 PM, Toke Høiland-Jørgensen  wrote:
> This is an updated version of the patch series to move TXQ scheduling and
> airtime fairness scheduling into mac80211. I've only compile tested this
> version, but thought it was better to get the conversation moving instead
> of being blocked on me.
>
> This addresses most of the comments from the last round. Specifically:
>
> - It folds in Rajkumar's patches to keep per-AC TXQ lists, and to add an
>   API that ath10k can use to check if a TXQ may transmit according even
>   when not using get_next_txq(). I changed a few names and descriptions,
>   but otherwise it's mostly the same. After the discussions we had in the
>   last series, I *think* it will work this way, but I'm not entirely sure.
>
> - I got rid of any mention of seqno in next_txq() and schedule_txq() - and
>   removed the third parameter to schedule_txq() entirely, so drivers can no
>   longer signal that a TXQ should be allowed to re-appear in a scheduling
>   round. We can add that back if needed.
>
> - Added a helper function to schedule and wake TXQs in a single call, for
>   internal mac80211 use.
>
> - Changed the default station weight to 256 and got rid of the per-phy
>   quantum. This makes it possible to lower station weights without having
>   to change the weights of every other station.
>
> A few things that were 

[PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Kan Yan
Hi Wen Gong,

Do you also have throughput and latency data for different
sk_pacing_shift setting when the ath10k client is at some distance
away from AP? Just curious about its impact on latency with less than
ideal PHY rate.

Thanks,
Kan


Re: [PATCH v7] Add new mac80211 driver mwlwifi.

2015-12-14 Thread Kan Yan
David Lin  writes:

> +static inline struct sk_buff *mwl_tx_do_amsdu(struct mwl_priv *priv,
> +   int desc_num,
> ...
> + amsdu_pkts = (struct sk_buff_head *)
> + kmalloc(sizeof(*amsdu_pkts), GFP_KERNEL);
> + if (!amsdu_pkts) {

Should GFP_ATOMIC be used here instead of GFP_KERNEL? This function could be
called in interrupt context.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html