Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
> I'd like to use the new rate calculation code that Felix added to mt76. > Is the arsta->txrate info in ath10k suitable to be passed up to mac80211 > and used in that, do you think? Because then that would probably be the > easiest way to go about it... Do you mean the mt76 patch that using the EWMA to smooth estimate data rate? It would be great if you can move that to mac80211. Yes, arsta->txrate info in ath10k is a good place to extract latest date rate. On Mon, Oct 7, 2019 at 12:43 PM Johannes Berg wrote: > > On Mon, 2019-10-07 at 21:40 +0200, Toke Høiland-Jørgensen wrote: > > > > > So if and when we start supporting true multi-band devices we'll have to > > > > change these things anyway. So might as well keep everything together so > > > > it all gets fixed :) > > > > > > I guess I'm OK with that, but I'm pretty sure this will come up sooner > > > rather than later ... > > > > > > What else is there though? > > > > By "it all" I meant "all the airtime fairness stuff". Other than that, I > > didn't have anything in particular in mind. I just kinda assumed there > > would be lots of places that had an implicit assumption that all devices > > on the same phy shares a channel... > > Not _that_ much - we do have the channel contexts after all. But except > for hwsim (*cough cough* I was lazy) nothing actually implements real > concurrent multi-channel yet, obviously, but uses a single radio with > channel hopping... > > johannes >
Re: [PATCH v2 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
Hi Johannes, Thanks for the review and will address all issues you pointed out in the next version. Hi Yibo, > > I assume here the only txq in the list that does not meet AQL check will > not be dequeued. Right? Will it affect peak throughput once there is > only one station. Yes, the txq won't be picked for transmitting even if it is the only active txq if the AQL check failed. However, this won't affect peak throughput. The reason why there are two queue limits is address this kind of situation. The higher queue limit ensures the hardware get enough frames. > > > @@ -3748,10 +3785,10 @@ bool ieee80211_txq_may_transmit(struct > > ieee80211_hw *hw, > > struct sta_info *sta; > > u8 ac = txq->ac; > > > > - spin_lock_bh(&local->active_txq_lock[ac]); > > - > > if (!txqi->txq.sta) > > - goto out; > > + return true; > > why return here? I think even a txq without sta info should get removed > from list and added it back later in return_txq() if needed. No? Yes, it should be removed from the active list. I will fix that. Thanks, Kan On Wed, Oct 9, 2019 at 1:18 AM Yibo Zhao wrote: > > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > > index f13eb2f61ccf..dadb643a5498 100644 > > --- a/net/mac80211/tx.c > > +++ b/net/mac80211/tx.c > > @@ -3669,7 +3669,8 @@ struct ieee80211_txq *ieee80211_next_txq(struct > > ieee80211_hw *hw, u8 ac) > > { > > struct ieee80211_local *local = hw_to_local(hw); > > struct ieee80211_txq *ret = NULL; > > - struct txq_info *txqi = NULL; > > + struct txq_info *txqi = NULL, *head = NULL; > > + bool found_eligible_txq = false; > > > > spin_lock_bh(&local->active_txq_lock[ac]); > > > > @@ -3680,20 +3681,32 @@ struct ieee80211_txq > > *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac) > > if (!txqi) > > goto out; > > > > + if (txqi == head && !found_eligible_txq) > > + goto out; > > I assume here the only txq in the list that does not meet AQL check will > not be dequeued. Right? Will it affect peak throughput once there is > only one station. > > How about dequeuing it anyway regardless AQL because it is the only one > active now so it is fine to occupy the rest bandwidth. Otherwise, I am > afraid next_txq() will return NULL in the test only one station is > present. > > > @@ -3748,10 +3785,10 @@ bool ieee80211_txq_may_transmit(struct > > ieee80211_hw *hw, > > struct sta_info *sta; > > u8 ac = txq->ac; > > > > - spin_lock_bh(&local->active_txq_lock[ac]); > > - > > if (!txqi->txq.sta) > > - goto out; > > + return true; > > why return here? I think even a txq without sta info should get removed > from list and added it back later in return_txq() if needed. No? > > > + > > + spin_lock_bh(&local->active_txq_lock[ac]); > > > > if (list_empty(&txqi->schedule_order)) > > goto out; > > > -- > Yibo
[PATCH v2 2/2] ath10k: Enable Airtime-based Queue Limit (AQL)
Calculate the estimated airtime pending in the txqs and apply AQL to prevent excessive amounts of packets being queued in the firmware queue. Signed-off-by: Kan Yan --- drivers/net/wireless/ath/ath10k/htt_rx.c | 1 + drivers/net/wireless/ath/ath10k/mac.c| 8 +--- drivers/net/wireless/ath/ath10k/txrx.c | 13 ++--- 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c index 83a7fb68fd24..f2115b940964 100644 --- a/drivers/net/wireless/ath/ath10k/htt_rx.c +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c @@ -3053,6 +3053,7 @@ static void ath10k_htt_rx_tx_fetch_ind(struct ath10k *ar, struct sk_buff *skb) num_msdus++; num_bytes += ret; + may_tx = ieee80211_txq_aql_check(hw, txq); } ieee80211_return_txq(hw, txq, false); ieee80211_txq_schedule_end(hw, txq->ac); diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 0606416dc971..c22b822bd8f1 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -3983,6 +3983,9 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw *hw, struct ath10k_txq *artxq = (void *)txq->drv_priv; /* No need to get locks */ + if (!ieee80211_txq_aql_check(hw, txq)) + return false; + if (ar->htt.tx_q_state.mode == HTT_TX_MODE_SWITCH_PUSH) return true; @@ -4009,13 +4012,11 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar, { struct ath10k_sta *arsta; u32 pktlen; - u16 airtime = 0; + s32 airtime = 0; if (!txq || !txq->sta) return airtime; - if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map)) - return airtime; spin_lock_bh(&ar->data_lock); arsta = (struct ath10k_sta *)txq->sta->drv_priv; @@ -4038,6 +4039,7 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar, } spin_unlock_bh(&ar->data_lock); + ieee80211_sta_update_pending_airtime(txq->sta, txq->tid, airtime); return airtime; } diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c index 4102df016931..03eb636c85ed 100644 --- a/drivers/net/wireless/ath/ath10k/txrx.c +++ b/drivers/net/wireless/ath/ath10k/txrx.c @@ -84,9 +84,16 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt, wake_up(&htt->empty_tx_wq); spin_unlock_bh(&htt->tx_lock); - if (txq && txq->sta && skb_cb->airtime_est) - ieee80211_sta_register_airtime(txq->sta, txq->tid, - skb_cb->airtime_est, 0); + if (txq && txq->sta && skb_cb->airtime_est) { + s32 airtime = -skb_cb->airtime_est; + + if (!test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map)) + ieee80211_sta_register_airtime(txq->sta, txq->tid, + skb_cb->airtime_est, 0); + + ieee80211_sta_update_pending_airtime(txq->sta, txq->tid, +airtime); + } if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL) dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE); -- 2.23.0.581.g78d2f28ef7-goog
[PATCH v2 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
In order for the Fq_CoDel integrated in mac80211 layer operates effectively to control excessive queueing latency, the CoDel algorithm requires an accurate measure of how long the packets stays in the queue, aka sojourn time. The sojourn time measured at mac80211 layer doesn't include queueing latency in lower layer (firmware/hardware) and CoDel expects lower layer to have a short queue. However, most 802.11ac chipsets offload tasks such TX aggregation to firmware or hardware, thus have a deep lower layer queue. Without a mechanism to control the lower layer queue size, packets only stays in mac80211 layer transiently before being sent to firmware queue. As a result, the sojourn time measured by CoDel in the mac80211 layer is almost always lower than the CoDel latency target, hence CoDel does little to control the latency, even when the lower layer queue causes excessive latency. Byte Queue limits (BQL) is commonly used to address the similar issue with wired network interface. However, this method cannot be applied directly to the wireless network interface. Byte is not a suitable measure of queue depth in the wireless network, as the data rate can vary dramatically from station to station in the same network, from a few Mbps to over Gbps. This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel works effectively with wireless drivers that utilized firmware/hardware offloading. AQL only allows each txq to release just enough packets to the lower layer to form 1-2 large aggregations to keep hardware fully utilized and keep the rest of frames in mac80211 layer to be controlled by the CoDel algorithm. Signed-off-by: Kan Yan --- include/net/cfg80211.h | 7 include/net/mac80211.h | 29 ++ net/mac80211/debugfs.c | 78 ++ net/mac80211/debugfs_sta.c | 44 +++-- net/mac80211/ieee80211_i.h | 4 ++ net/mac80211/main.c| 8 +++- net/mac80211/sta_info.c| 20 ++ net/mac80211/sta_info.h| 4 ++ net/mac80211/tx.c | 58 +++- 9 files changed, 231 insertions(+), 21 deletions(-) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 26e2ad2c7027..05352eac82ec 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -2499,6 +2499,13 @@ enum wiphy_params_flags { #define IEEE80211_DEFAULT_AIRTIME_WEIGHT 256 +/* The per TXQ firmware queue limit in airtime */ +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L 4000 +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H 8000 + +/* The per interface airtime threshold to switch to lower queue limit */ +#define IEEE80211_AQL_THRESHOLD24000 + /** * struct cfg80211_pmksa - PMK Security Association * diff --git a/include/net/mac80211.h b/include/net/mac80211.h index d26da013f7c0..51ee28e72161 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -5543,6 +5543,35 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta *pubsta, int tid); void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid, u32 tx_airtime, u32 rx_airtime); +/** + * ieee80211_sta_update_pending_airtime - update txq's estimated airtime + * + * Update the estimated total airtime of frames queued in the lower layer queue. + * + * The airtime is estimated using frame length and the last reported data + * rate. The pending airtime for a txq is increased by the estimated + * airtime when the frame is relased to the lower layer, and decreased by the + * same amount at the tx completion event. + * + * @pubsta: the station + * @tid: the TID to register airtime for + * @tx_airtime: the estimated airtime (in usec) + */ +void ieee80211_sta_update_pending_airtime(struct ieee80211_sta *pubsta, + u8 tid, s32 tx_airtime); + +/** + * ieee80211_txq_aql_check - check if a txq can send more frames to firmware + * + * @hw: pointer obtained from ieee80211_alloc_hw() + * @txq: pointer obtained from station or virtual interface + * + * Return true if the AQL's airtime limit has not been reached and the txq can + * continue to send more packets to the lower layer. Otherwise return false. + */ +bool +ieee80211_txq_aql_check(struct ieee80211_hw *hw, struct ieee80211_txq *txq); + /** * ieee80211_iter_keys - iterate keys programmed into the device * @hw: pointer obtained from ieee80211_alloc_hw() diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c index 2e7f75938c51..f7cbf248163a 100644 --- a/net/mac80211/debugfs.c +++ b/net/mac80211/debugfs.c @@ -148,6 +148,80 @@ static const struct file_operations aqm_ops = { .llseek = default_llseek, }; +static ssize_t aql_txq_limit_read(struct file *file, + char __user *user_buf, + size_t count, + loff_t *ppos) +{ + struct ieee80211_loca
[PATCH v2 0/2] Implement Airtime-based Queue Limit (AQL)
This patch series implements Airtime-based Queue Limit (AQL) in the mac80211 and Ath10k driver. It is based on an earlier version from the ChromiumOS tree[0]. This version has been tested with QCA9884 platform with 4.14 kernel. Tests show AQL is able to reduce latency by an order of magnitude in a congested environment without negative impact on the throughput. [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7 Kan Yan (2): mac80211: Implement Airtime-based Queue Limit (AQL) ath10k: Enable Airtime-based Queue Limit (AQL) drivers/net/wireless/ath/ath10k/htt_rx.c | 1 + drivers/net/wireless/ath/ath10k/mac.c| 8 ++- drivers/net/wireless/ath/ath10k/txrx.c | 13 +++- include/net/cfg80211.h | 7 +++ include/net/mac80211.h | 29 + net/mac80211/debugfs.c | 78 net/mac80211/debugfs_sta.c | 44 + net/mac80211/ieee80211_i.h | 4 ++ net/mac80211/main.c | 8 ++- net/mac80211/sta_info.c | 20 ++ net/mac80211/sta_info.h | 4 ++ net/mac80211/tx.c| 58 +++--- 12 files changed, 247 insertions(+), 27 deletions(-) -- Changes from v1: - Fix checkpatch error. - Keep iterate the list of active_txq until an eligible txq's deficit become non-negative in ieee80211_next_txq(), instead of break the loop after one iteration. - Enforce the AQL limit in ath10k's pulling mode in ath10k_htt_rx_tx_fetch_ind()
Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
Hi Toke, > So I think it'll be better to pass up the last_rate and have mac80211 > calculate the airtime (like I did in my patch set). But we can keep the > rest of your logic and just switch the calculation, I guess? Sounds like a good plan to me. Moving the airtime calculation do makes it easier to support multiple drivers. The only benefit of keep that part in driver is we can avoid extending ieee80211_tx_info with a new field "tx_time_est". I will fix the "AIRTIME_USE_TX" part and prepare a new patch. Thanks, Kan On Fri, Oct 4, 2019 at 7:08 PM Kan Yan wrote: > > Hi Johannes, > > Thanks for help review this patch. I will fix all style errors. > > > > +ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct > > > ieee80211_txq *txq); > > Why is it necessary to call this, vs. just returning NULL when an SKB is > > requested? > This function is also called by ath10k, from ath10k_mac_tx_can_push(), > which returns a boolean. > > > However, I'm not sure it *shouldn't* actually be per interface (i.e. > > moving from > > local->aql_total_pending_airtime > > to > > > > sdata->aql_total_pending_airtime > > > > because you could have multiple channels etc. involved and then using a > > single airtime limit across two interfaces that are actually on two > > different channels (e.g. 2.4 and 5 GHz) doesn't make that much sense. > > > > Actually, it does make some sense as long as the NIC is actually > > channel-hopping ... but that's in the process of changing now, there's > > going to be hardware really soon (or perhaps already exists) that has > > real dual-band capabilities... > > That's a good point. I haven't thought about real simultaneous dual > band chipset and such chipset do exists now. Is RSDB support coming to > mac80211 soon? Just curious if it will be just virtual interfaces or > something else. I chose "local" instead of "sdata" thinking about the > case of several virtual interfaces (AP, STA, MESH) operates in the > same channel, then the interface total could be a better choice. > > I am ok with moving the "aql_total_pending_airtime" into sdata, but > afraid that's not the most optimal choice for the case of multiple > virtual interfaces operates in the same channel. > Maybe we could leave it in "local" for now. What do you think? > > Kan > > On Fri, Oct 4, 2019 at 8:44 AM Toke Høiland-Jørgensen wrote: > > > > Kan Yan writes: > > > > > In order for the Fq_CoDel integrated in mac80211 layer operates > > > effectively to control excessive queueing latency, the CoDel algorithm > > > requires an accurate measure of how long the packets stays in the > > > queue, aka sojourn time. The sojourn time measured at mac80211 layer > > > doesn't include queueing latency in lower layer (firmware/hardware) > > > and CoDel expects lower layer to have a short queue. However, most > > > 802.11ac chipsets offload tasks such TX aggregation to firmware or > > > hardware, thus have a deep lower layer queue. Without a mechanism to > > > control the lower layer queue size, packets only stays in mac80211 > > > layer transiently before being released to firmware due to the deep > > > queue in the lower layer. In this case, the sojourn time measured by > > > CoDel in the mac80211 layer is always lower than the CoDel latency > > > target hence it does little to control the latency, even when the lower > > > layer queue causes excessive latency. > > > > > > Byte Queue limits (BQL) is commonly used to address the similar issue > > > with wired network interface. However, this method can not be applied > > > directly to the wireless network interface. Byte is not a suitable > > > measure of queue depth in the wireless network, as the data rate can > > > vary dramatically from station to station in the same network, from a > > > few Mbps to over 1 Gbps. > > > > > > This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel > > > works effectively with wireless drivers that utilized firmware/hardware > > > offloading. It only allows each txq to release just enough packets to > > > form 1-2 large aggregations to keep hardware fully utilized and keep the > > > rest of frames in mac80211 layer to be controlled by CoDel. > > > > > > Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2 > > > Signed-off-by: Kan Yan > > > --- > > > include/net/cfg80211.h | 7 > >
Re: [PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
Hi Johannes, Thanks for help review this patch. I will fix all style errors. > > +ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq > > *txq); > Why is it necessary to call this, vs. just returning NULL when an SKB is > requested? This function is also called by ath10k, from ath10k_mac_tx_can_push(), which returns a boolean. > However, I'm not sure it *shouldn't* actually be per interface (i.e. > moving from > local->aql_total_pending_airtime > to > > sdata->aql_total_pending_airtime > > because you could have multiple channels etc. involved and then using a > single airtime limit across two interfaces that are actually on two > different channels (e.g. 2.4 and 5 GHz) doesn't make that much sense. > > Actually, it does make some sense as long as the NIC is actually > channel-hopping ... but that's in the process of changing now, there's > going to be hardware really soon (or perhaps already exists) that has > real dual-band capabilities... That's a good point. I haven't thought about real simultaneous dual band chipset and such chipset do exists now. Is RSDB support coming to mac80211 soon? Just curious if it will be just virtual interfaces or something else. I chose "local" instead of "sdata" thinking about the case of several virtual interfaces (AP, STA, MESH) operates in the same channel, then the interface total could be a better choice. I am ok with moving the "aql_total_pending_airtime" into sdata, but afraid that's not the most optimal choice for the case of multiple virtual interfaces operates in the same channel. Maybe we could leave it in "local" for now. What do you think? Kan On Fri, Oct 4, 2019 at 8:44 AM Toke Høiland-Jørgensen wrote: > > Kan Yan writes: > > > In order for the Fq_CoDel integrated in mac80211 layer operates > > effectively to control excessive queueing latency, the CoDel algorithm > > requires an accurate measure of how long the packets stays in the > > queue, aka sojourn time. The sojourn time measured at mac80211 layer > > doesn't include queueing latency in lower layer (firmware/hardware) > > and CoDel expects lower layer to have a short queue. However, most > > 802.11ac chipsets offload tasks such TX aggregation to firmware or > > hardware, thus have a deep lower layer queue. Without a mechanism to > > control the lower layer queue size, packets only stays in mac80211 > > layer transiently before being released to firmware due to the deep > > queue in the lower layer. In this case, the sojourn time measured by > > CoDel in the mac80211 layer is always lower than the CoDel latency > > target hence it does little to control the latency, even when the lower > > layer queue causes excessive latency. > > > > Byte Queue limits (BQL) is commonly used to address the similar issue > > with wired network interface. However, this method can not be applied > > directly to the wireless network interface. Byte is not a suitable > > measure of queue depth in the wireless network, as the data rate can > > vary dramatically from station to station in the same network, from a > > few Mbps to over 1 Gbps. > > > > This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel > > works effectively with wireless drivers that utilized firmware/hardware > > offloading. It only allows each txq to release just enough packets to > > form 1-2 large aggregations to keep hardware fully utilized and keep the > > rest of frames in mac80211 layer to be controlled by CoDel. > > > > Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2 > > Signed-off-by: Kan Yan > > --- > > include/net/cfg80211.h | 7 > > include/net/mac80211.h | 34 > > net/mac80211/debugfs.c | 79 ++ > > net/mac80211/debugfs_sta.c | 43 +++-- > > net/mac80211/ieee80211_i.h | 4 ++ > > net/mac80211/main.c| 7 +++- > > net/mac80211/sta_info.c| 23 +++ > > net/mac80211/sta_info.h| 4 ++ > > net/mac80211/tx.c | 60 - > > 9 files changed, 239 insertions(+), 22 deletions(-) > > > > diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h > > index 26e2ad2c7027..301c11be366a 100644 > > --- a/include/net/cfg80211.h > > +++ b/include/net/cfg80211.h > > @@ -2499,6 +2499,13 @@ enum wiphy_params_flags { > > > > #define IEEE80211_DEFAULT_AIRTIME_WEIGHT 256 > > > > +/* The per TXQ firmware queue limit in airtime */ > > +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L
[PATCH 2/2] ath10k: Enable Airtime-based Queue Limit (AQL)
Calculate the estimated airtime pending in the txqs and apply AQL to prevent excessive amounts of packets being queued in the firmware queue. Change-Id: I4047ce0938f7b97440fb040b16b74f2e49aa3433 Signed-off-by: Kan Yan --- drivers/net/wireless/ath/ath10k/mac.c | 7 +-- drivers/net/wireless/ath/ath10k/txrx.c | 12 +--- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 0606416dc971..37eb56c19f19 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -3983,6 +3983,9 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw *hw, struct ath10k_txq *artxq = (void *)txq->drv_priv; /* No need to get locks */ + if (!ieee80211_txq_airtime_check(hw, txq)) + return false; + if (ar->htt.tx_q_state.mode == HTT_TX_MODE_SWITCH_PUSH) return true; @@ -4014,8 +4017,6 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar, if (!txq || !txq->sta) return airtime; - if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map)) - return airtime; spin_lock_bh(&ar->data_lock); arsta = (struct ath10k_sta *)txq->sta->drv_priv; @@ -4038,6 +4039,8 @@ static u16 ath10k_mac_update_airtime(struct ath10k *ar, } spin_unlock_bh(&ar->data_lock); + ieee80211_sta_register_pending_airtime(txq->sta, txq->tid, airtime, + false); return airtime; } diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c index 4102df016931..9266d3dd6adb 100644 --- a/drivers/net/wireless/ath/ath10k/txrx.c +++ b/drivers/net/wireless/ath/ath10k/txrx.c @@ -84,9 +84,15 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt, wake_up(&htt->empty_tx_wq); spin_unlock_bh(&htt->tx_lock); - if (txq && txq->sta && skb_cb->airtime_est) - ieee80211_sta_register_airtime(txq->sta, txq->tid, - skb_cb->airtime_est, 0); + if (txq && txq->sta && skb_cb->airtime_est) { + if (!test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map)) + ieee80211_sta_register_airtime(txq->sta, txq->tid, + skb_cb->airtime_est, 0); + + ieee80211_sta_register_pending_airtime(txq->sta, txq->tid, + skb_cb->airtime_est, + true); + } if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL) dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE); -- 2.23.0.581.g78d2f28ef7-goog
[PATCH 0/2] Implement Airtime-based Queue Limit (AQL)
This patch series implements Airtime-based Queue Limit (AQL) in the mac80211 and Ath10k driver. It is based on an earlier version from the ChromiumOS tree[0]. This version has been tested with QCA9884 platform with 4.14 kernel. Tests show AQL is able to reduce latency by an order of magnitude in a congested environment without negative impact on the throughput. [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7 Kan Yan (2): mac80211: Implement Airtime-based Queue Limit (AQL) ath10k: Enable Airtime-based Queue Limit (AQL) drivers/net/wireless/ath/ath10k/mac.c | 7 ++- drivers/net/wireless/ath/ath10k/txrx.c | 12 +++- include/net/cfg80211.h | 7 +++ include/net/mac80211.h | 34 +++ net/mac80211/debugfs.c | 79 ++ net/mac80211/debugfs_sta.c | 43 ++ net/mac80211/ieee80211_i.h | 4 ++ net/mac80211/main.c| 7 ++- net/mac80211/sta_info.c| 23 net/mac80211/sta_info.h| 4 ++ net/mac80211/tx.c | 60 +++ 11 files changed, 253 insertions(+), 27 deletions(-) --
[PATCH 1/2] mac80211: Implement Airtime-based Queue Limit (AQL)
In order for the Fq_CoDel integrated in mac80211 layer operates effectively to control excessive queueing latency, the CoDel algorithm requires an accurate measure of how long the packets stays in the queue, aka sojourn time. The sojourn time measured at mac80211 layer doesn't include queueing latency in lower layer (firmware/hardware) and CoDel expects lower layer to have a short queue. However, most 802.11ac chipsets offload tasks such TX aggregation to firmware or hardware, thus have a deep lower layer queue. Without a mechanism to control the lower layer queue size, packets only stays in mac80211 layer transiently before being released to firmware due to the deep queue in the lower layer. In this case, the sojourn time measured by CoDel in the mac80211 layer is always lower than the CoDel latency target hence it does little to control the latency, even when the lower layer queue causes excessive latency. Byte Queue limits (BQL) is commonly used to address the similar issue with wired network interface. However, this method can not be applied directly to the wireless network interface. Byte is not a suitable measure of queue depth in the wireless network, as the data rate can vary dramatically from station to station in the same network, from a few Mbps to over 1 Gbps. This patch implemented an Airtime-based Queue Limit (AQL) to make CoDel works effectively with wireless drivers that utilized firmware/hardware offloading. It only allows each txq to release just enough packets to form 1-2 large aggregations to keep hardware fully utilized and keep the rest of frames in mac80211 layer to be controlled by CoDel. Change-Id: I1427db2c4c7fcb4162b087514dcb06d5613fa5d2 Signed-off-by: Kan Yan --- include/net/cfg80211.h | 7 include/net/mac80211.h | 34 net/mac80211/debugfs.c | 79 ++ net/mac80211/debugfs_sta.c | 43 +++-- net/mac80211/ieee80211_i.h | 4 ++ net/mac80211/main.c| 7 +++- net/mac80211/sta_info.c| 23 +++ net/mac80211/sta_info.h| 4 ++ net/mac80211/tx.c | 60 - 9 files changed, 239 insertions(+), 22 deletions(-) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 26e2ad2c7027..301c11be366a 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -2499,6 +2499,13 @@ enum wiphy_params_flags { #define IEEE80211_DEFAULT_AIRTIME_WEIGHT 256 +/* The per TXQ firmware queue limit in airtime */ +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L 4000 +#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H 8000 + +/* The firmware's transmit queue size limit in airtime */ +#define IEEE80211_DEFAULT_AQL_INTERFACE_LIMIT 24000 + /** * struct cfg80211_pmksa - PMK Security Association * diff --git a/include/net/mac80211.h b/include/net/mac80211.h index d26da013f7c0..4d62aba3e4b2 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -5543,6 +5543,40 @@ void ieee80211_send_eosp_nullfunc(struct ieee80211_sta *pubsta, int tid); void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid, u32 tx_airtime, u32 rx_airtime); +/** + * ieee80211_sta_register_pending_airtime - register the estimated airtime for + * the frames pending in lower layer (firmware/hardware) txq. + * + * Update the total pending airtime of the frames released to firmware. The + * pending airtime is used by AQL to control queue size in the lower layer. + * + * The airtime is estimated using frame length and the last reported data + * rate. The pending airtime for a txq is increased by the estimated + * airtime when the frame is relased to the lower layer, and decreased by the + * same amount at the tx completion event. + * + * @pubsta: the station + * @tid: the TID to register airtime for + * @tx_airtime: the estimated airtime (in usec) + * @tx_commpleted: true if called from the tx completion event. + */ +void ieee80211_sta_register_pending_airtime(struct ieee80211_sta *pubsta, + u8 tid, u32 tx_airtime, + bool tx_completed); + +/** + * ieee80211_txq_aritme_check - check if the airtime limit of AQL (Airtime based + * queue limit) has been reached. + * @hw: pointer obtained from ieee80211_alloc_hw() + * @txq: pointer obtained from station or virtual interface + * Retrun true if the queue limit has not been reached and txq can continue to + * release packets to the lower layer. + * Return false if the queue limit has been reached and the txq should not + * release more frames to the lower layer. + */ +bool +ieee80211_txq_airtime_check(struct ieee80211_hw *hw, struct ieee80211_txq *txq); + /** * ieee80211_iter_keys - iterate keys programmed into the device * @hw: pointer obtained from ieee80211_alloc_hw() diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c index 2e7f75938c51..a7da9a93b63e 100644
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
> I guess the risk is lower when with a 24ms per-iface limit; but with > enough stations I guess it could still happen, no? So we should probably > handle this case... Each txq (per sta, per tid) is allowed to release at least the lower AQL limit amount of packet (default 4ms), which is not affected by other station's PS behavior and 4ms should be sufficient for most use cases. The 24ms per-interface limit is an optimization to get good benchmark score in peak performance test, which usually only involve 1-2 stations. The higher limit probably won't matter anymore when there are many stations. I haven't noticed side effects due to PS behavior in the ChromiumOS version. Still, it is better to be able to take frames in PS queue in to account, > Cool. Are you going to submit a ported version of your implementation? > Then we can work from the two submissions and see if we can't converge > on something... Working on porting, should have something ready before the end of this week. On Sun, Sep 29, 2019 at 12:18 PM Toke Høiland-Jørgensen wrote: > > Kan Yan writes: > > >> No, ath10k would continue to do what it was always doing. Drivers that > >> can report after-the-fact airtime usage per-frame (like ath9k) will > >> continue to do that. In both of those cases, the airtime estimate is > >> only used to throttle the queue, not to schedule for fairness. > > You are right, I didn't realize ath9k reports per frame airtime usage. > > > >> Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? > > The per interface 24 ms queue limit is an empirical number that works > > well for both achieve low latency when there is a lot of stations and > > get high throughput when there is only 1-2 stations. We could make it > > configurable. > > Right. "Found by trial and error" is a fine answer as far as I'm > concerned :) > > But yeah, this should probably be configurable, like BQL is. > > >> BTW, I think Felix' concern about powersave was in relation to AQL: If > >> we don't handle power save in that, we can end up in a situation where > >>the budget for packets allowed to be queued in the firmware is taken up > >> entirely by stations that are currently in powersave mode; which would > >> throttle the device completely. So we should take that into account for > >> AQL; for the fairness scheduler, stations in powersave are already > >> unscheduled, so that should be fine. > > I think the accounting for the airtime of frames in the power saving > > queue could affect both the fairness scheduler and AQL. > > For chipset with firmware offload, PS handling is mostly done by > > firmware, so host driver's knowledge of PS state could be slightly > > out-of-dated. The power save behavior also make it harder to the > > airtime_weight correct for the fairness scheduler. > > Hmm, maybe. I'm not sure how significant this effect would be, but I > guess we'll need to find out :) > > > > Powersave mode's impact to AQL is much smaller. The lower per station > > queue limit is not impacted by other stations PS behavior, since the > > estimated future airtime is not weighted for other stations and a > > station won't get blocked due to others stations in PS mode. > > Station in PS mode do affects AQL's higher per interface limit, but in > > an inconsequential way. The per interface AQL queue limit is quite > > large (24 ms), hence airtime from packets in PS queue is unlikely to > > have a significant impact on it. Still, it will be better if the > > packet in power saving queue can be taken into account. > > I guess the risk is lower when with a 24ms per-iface limit; but with > enough stations I guess it could still happen, no? So we should probably > handle this case... > > >> > make it easier to schedule multiple stations, I think it has some merit > >> > that makes it worth trying out. We should probably get the AQL stuff > >> > done first, though, and try the virtual time scheduler on top of that. > >> Agree that we should get the AQL stuff done first since I believe it > >> will help to fix the issue mentioned above. > > That sounds like a good plan. The virtual time scheduler is more > > involved and will take more work to get it right. It make sense to get > > AQL done first. > > Cool. Are you going to submit a ported version of your implementation? > Then we can work from the two submissions and see if we can't converge > on something... > > -Toke >
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
> No, ath10k would continue to do what it was always doing. Drivers that > can report after-the-fact airtime usage per-frame (like ath9k) will > continue to do that. In both of those cases, the airtime estimate is > only used to throttle the queue, not to schedule for fairness. You are right, I didn't realize ath9k reports per frame airtime usage. > Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? The per interface 24 ms queue limit is an empirical number that works well for both achieve low latency when there is a lot of stations and get high throughput when there is only 1-2 stations. We could make it configurable. > BTW, I think Felix' concern about powersave was in relation to AQL: If > we don't handle power save in that, we can end up in a situation where >the budget for packets allowed to be queued in the firmware is taken up > entirely by stations that are currently in powersave mode; which would > throttle the device completely. So we should take that into account for > AQL; for the fairness scheduler, stations in powersave are already > unscheduled, so that should be fine. I think the accounting for the airtime of frames in the power saving queue could affect both the fairness scheduler and AQL. For chipset with firmware offload, PS handling is mostly done by firmware, so host driver's knowledge of PS state could be slightly out-of-dated. The power save behavior also make it harder to the airtime_weight correct for the fairness scheduler. Powersave mode's impact to AQL is much smaller. The lower per station queue limit is not impacted by other stations PS behavior, since the estimated future airtime is not weighted for other stations and a station won't get blocked due to others stations in PS mode. Station in PS mode do affects AQL's higher per interface limit, but in an inconsequential way. The per interface AQL queue limit is quite large (24 ms), hence airtime from packets in PS queue is unlikely to have a significant impact on it. Still, it will be better if the packet in power saving queue can be taken into account. > > make it easier to schedule multiple stations, I think it has some merit > > that makes it worth trying out. We should probably get the AQL stuff > > done first, though, and try the virtual time scheduler on top of that. > Agree that we should get the AQL stuff done first since I believe it > will help to fix the issue mentioned above. That sounds like a good plan. The virtual time scheduler is more involved and will take more work to get it right. It make sense to get AQL done first. On Fri, Sep 27, 2019 at 2:20 AM Yibo Zhao wrote: > > On 2019-09-27 14:12, Toke Høiland-Jørgensen wrote: > > Kan Yan writes: > > > >>> No, but on tx_completion we could do something like this: > >>> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; > >>> ieee80211_report_airtime(sta, airtime); > >>> That way, if the driver sets the tx_time field to something the > >>> firmware > >>> reports, we'll use that, and otherwise we'd fall back to the > >>> estimate. > >>> Of course, there would need to be a way for the driver to opt out of > >>> this, for drivers that report out of band airtime like ath10k does :) > >> > >> I doubt that will work, unless firmware can do per frame airtime > >> update in the skb. It is more likely that firmware provides out of > >> band airtime update for a period of time, like an aggregation. That's > >> the case for ath10k: https://patchwork.kernel.org/patch/10684689 > > > > No, ath10k would continue to do what it was always doing. Drivers that > > can report after-the-fact airtime usage per-frame (like ath9k) will > > continue to do that. In both of those cases, the airtime estimate is > > only used to throttle the queue, not to schedule for fairness. > > > > My point is just that for drivers that have *no* mechanism to report > > airtime usage after-the-fact, we can add a flag that will allow the > > values estimated by mac80211 to also be used for the fairness > > scheduler... > > > >> Regarding handling frame for station enters power saving mode: > >>> > >>> >> Oh, right. Didn't know that could happen (I assumed those would be > >>> >> flushed out or something). But if we're going to go with Kan's > >>> >> suggestion of having per-station accounting as well as a global > >>> >> accounting for the device, we could just subtract the station's > >>> >> outstanding balance from the device total when it goes into powersave > >>> >> mode, and
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
> No, but on tx_completion we could do something like this: > airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; > ieee80211_report_airtime(sta, airtime); > That way, if the driver sets the tx_time field to something the firmware > reports, we'll use that, and otherwise we'd fall back to the estimate. > Of course, there would need to be a way for the driver to opt out of > this, for drivers that report out of band airtime like ath10k does :) I doubt that will work, unless firmware can do per frame airtime update in the skb. It is more likely that firmware provides out of band airtime update for a period of time, like an aggregation. That's the case for ath10k: https://patchwork.kernel.org/patch/10684689 Regarding handling frame for station enters power saving mode: > > >> Oh, right. Didn't know that could happen (I assumed those would be > >> flushed out or something). But if we're going to go with Kan's > >> suggestion of having per-station accounting as well as a global > >> accounting for the device, we could just subtract the station's > >> outstanding balance from the device total when it goes into powersave > >> mode, and add it back once it wakes up again. At least I think that > >> would work, no? > >Yes, I think that would work. > Great! Will incorporate that, then. I think that could work but things could be a bit more complicated. Not sure I fully understand the usage of airtime_weight_sum in [PATCH V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler: in ieee80211_schedule_txq(): local->airtime_weight_sum[ac] += sta->airtime_weight; in ieee80211_sta_register_airtime(): weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight; local->airtime_v_t[ac] += airtime / weight_sum; sta->airtime[ac].v_t += airtime / sta->airtime_weight; in __ieee80211_unschedule_txq local->airtime_weight_sum[ac] -= sta->airtime_weight; I assume the purpose of airtime_weight_sum is to count station's virtual airtime proportional to the number of active stations for fairness. My concern is the per interface local->airtime_weight_sum[ac] get updated when packets are released from a txq to lower layer, but it doesn't mean the airtime will be consumed (packets get transmitted) shortly, due to events like station enter power save mode, so the weight_sum used in ieee80211_sta_register_airtime() maybe inaccurate. For architects using firmware/hardware offloading, firmware ultimately controls packet scheduling and has quite a lot of autonomy. The host driver's airtime_weight_sum may get out of sync with the number of active stations actually scheduled by firmware even without power saving events. Is this a correct understanding? I kind of think the original method of airtime accounting using deficit maybe more robust in this regard.
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
> > Do you mean we will use airtime reported by FW to calculate > > > local->airtime_queued in case we have FW reporting airtime? > No, the opposite; if the firmware can't report airtime, we can use the > estimated values to feed into report_airtime() for the fairness > calculation... The local->airtime_queued is the 'future' airtime for the packet pending the queue. It can't be replaced by the after the fact airtime reported from firmware for the frames transmitted. On Wed, Sep 25, 2019 at 5:27 PM Kan Yan wrote: > > > Yes, please do! AFAICT, the main difference is that your version keeps > > the airtime calculation itself in the driver, while mine passes up the > > rate and lets mac80211 do the calculation of airtime. Other than that, > > the differences are minor, no? > > I'm not actually sure which approach is best; I suspect doing all the > > accounting in mac80211 will help with integrating this into drivers that > > use minstrel; we can just add a hook in that and be done with it. > > Whereas if the driver has to do the accounting, we would need to add > > that to each driver (mt76, iwl(?)). > > Yes, they are essentially doing the same thing. I kept the airtime > estimation code in the ath10k just because it is already there. It is > better to do that in mac80211, so it doesn't have to be duplicated for > each driver and avoids the overhead of updating the estimated airtime > from host driver to mac80211. > > > But of course, doing things in mac80211 depends on stuffing even more > > stuff into the already overloaded cb field; and I'm not actually > > entirely sure what I've done with that will actually work. WDYT? > Either way a field in skb cb is needed to record the estimated > airtime. The 'tx_time_est' shares the space with the codel > 'enque_time' looks fine to me, as their lifetime doesn't overlap. > > There is another minor difference in the ChromiumOs version, which > actually address the issue Yibo just asked: > > Meanwhile, airtime_queued will also limit the situation that we only > > have a station for transmission. Not sure if the peak throughput will be > > impacted. I believe it may work fine with 11ac in chromiumos, how about > > 11n and 11a? > > My version has two AQL limits, a smaller per station limit (4ms) and a > larger per interface limit (24 ms). When the per interface limit has > not been reached, stations are allowed to transmit up to 1/3 of the > interface limits (8ms). This way it balance the needs to control > latency when there are a lot of stations and to get good throughput > benchmark numbers with a single client. In my test, I found increasing > the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4 > ath10k chipset. > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734 > > > Of course we'll also have to eventually integrate this with the other > > series that Yibo recently re-posted (the virtual time scheduler). I > > think that will be relatively straight forward, except I'm not sure your > > atomic patches will work when we also have to update the rbtree. Any > > thoughts on that series in general? > I do like the virtual time scheduler patchset. It makes it easier to > schedule an arbitrary tx queue and handles ath10k's firmware pulling > mode better. I will give it a try. > > > Yup, makes sense. Looking at the version you linked to, though, it seems > > you're calling ieee80211_sta_register_airtime() with the estimated value > > as well? So are you double-accounting airtime, or are you adjusting for > > the accurate values somewhere else I don't see in that series? > It does not double count airtime, just both the airtime fairness > scheduler and AQL use the estimate airtime. It is on an older tree and > still doesn't have the patch that provides the fw airtime: > https://patchwork.kernel.org/patch/10684689 > > > On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen > wrote: > > > > Yibo Zhao writes: > > > > > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: > > >> Yibo Zhao writes: > > >> > > >>> So if it is going to work together with virtual time based mechanism > > >>> in > > >>> the future, the Tx criteria will be met both of below conditions, > > >>> 1. Lower than g_vt > > >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > > >> > > >>> Are we going to maintain two kinds of airtime that one is from > > >>> estima
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
> Yes, please do! AFAICT, the main difference is that your version keeps > the airtime calculation itself in the driver, while mine passes up the > rate and lets mac80211 do the calculation of airtime. Other than that, > the differences are minor, no? > I'm not actually sure which approach is best; I suspect doing all the > accounting in mac80211 will help with integrating this into drivers that > use minstrel; we can just add a hook in that and be done with it. > Whereas if the driver has to do the accounting, we would need to add > that to each driver (mt76, iwl(?)). Yes, they are essentially doing the same thing. I kept the airtime estimation code in the ath10k just because it is already there. It is better to do that in mac80211, so it doesn't have to be duplicated for each driver and avoids the overhead of updating the estimated airtime from host driver to mac80211. > But of course, doing things in mac80211 depends on stuffing even more > stuff into the already overloaded cb field; and I'm not actually > entirely sure what I've done with that will actually work. WDYT? Either way a field in skb cb is needed to record the estimated airtime. The 'tx_time_est' shares the space with the codel 'enque_time' looks fine to me, as their lifetime doesn't overlap. There is another minor difference in the ChromiumOs version, which actually address the issue Yibo just asked: > Meanwhile, airtime_queued will also limit the situation that we only > have a station for transmission. Not sure if the peak throughput will be > impacted. I believe it may work fine with 11ac in chromiumos, how about > 11n and 11a? My version has two AQL limits, a smaller per station limit (4ms) and a larger per interface limit (24 ms). When the per interface limit has not been reached, stations are allowed to transmit up to 1/3 of the interface limits (8ms). This way it balance the needs to control latency when there are a lot of stations and to get good throughput benchmark numbers with a single client. In my test, I found increasing the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4 ath10k chipset. https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734 > Of course we'll also have to eventually integrate this with the other > series that Yibo recently re-posted (the virtual time scheduler). I > think that will be relatively straight forward, except I'm not sure your > atomic patches will work when we also have to update the rbtree. Any > thoughts on that series in general? I do like the virtual time scheduler patchset. It makes it easier to schedule an arbitrary tx queue and handles ath10k's firmware pulling mode better. I will give it a try. > Yup, makes sense. Looking at the version you linked to, though, it seems > you're calling ieee80211_sta_register_airtime() with the estimated value > as well? So are you double-accounting airtime, or are you adjusting for > the accurate values somewhere else I don't see in that series? It does not double count airtime, just both the airtime fairness scheduler and AQL use the estimate airtime. It is on an older tree and still doesn't have the patch that provides the fw airtime: https://patchwork.kernel.org/patch/10684689 On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen wrote: > > Yibo Zhao writes: > > > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: > >> Yibo Zhao writes: > >> > >>> So if it is going to work together with virtual time based mechanism > >>> in > >>> the future, the Tx criteria will be met both of below conditions, > >>> 1. Lower than g_vt > >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > >> > >>> Are we going to maintain two kinds of airtime that one is from > >>> estimation and the other is basically from FW reporting? > >> > >> Yes, that was my plan. For devices that don't have FW reporting of > >> airtime, we can fall back to the estimation; but if we do have FW > >> reporting that is most likely going to be more accurate, so better to > >> use that for fairness... > > > > Do you mean we will use airtime reported by FW to calculate > > local->airtime_queued in case we have FW reporting airtime? > > No, the opposite; if the firmware can't report airtime, we can use the > estimated values to feed into report_airtime() for the fairness > calculation... > > -Toke >
Re: [PATCH RFC/RFT 4/4] mac80211: Apply Airtime-based Queue Limit (AQL) on packet dequeue
Hi Toke, There is an updated version of AQL in the chromiumos tree implemented in the mac80211 driver, instead of in the ath10k driver as the original version: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703106/6 It is based on a more recent kernel (4.14) and integrated with the airtime fairness tx scheduler in mac80211. This version has been tested rather extensively. I intended to use it as the basis for my effort to bring AQL upstream, but get sidetracked by other things. I can clean it up and send a patchset next week if you think that is the right path. Sorry for the long delay and slack off on the upstream effort. There is some concern in this thread regarding the accuracy of the estimated airtime using the last reported TX rate. It is indeed a rather crude method and did not include retries in the calculation. Besides, there are lags between firmware changing rate and host driver get the rate update. The 16us IFS overhead is only correct for 5G and it is actually 10us for 2.4 G. However, that hardly matters. The goal of AQL is to prevent the firmware/hardware queue from getting bloated or starved. There is a lot of headroom in the queue length limit (8-10 ms) to tolerate inaccuracy in the estimate airtime. AQL doesn't control the fine grained TX packet scheduling. It is handled by the airtime fairness scheduler and ultimately firmware. There are two TX airtimes in the newer version (chromiumos 4.14 kernel): The estimated airtime for frames pending in the queue and the airtime reported by the firmware for the frame transmitted, which should be accurate as the firmware supposed to take retries and aggregation into account. The airtime fairness scheduler that does the TX packet scheduling should use the TX airtime reported by the firmware. That's the reason why the original implementation in the ChromiumOS tree tries to factor in aggregation size when estimate the airtime overhead and the later version doesn't even bother with that. Regards, Kan On Fri, Sep 20, 2019 at 6:32 AM Toke Høiland-Jørgensen wrote: > > Lorenzo Bianconi writes: > > >> Lorenzo Bianconi writes: > >> > >> >> From: Toke Høiland-Jørgensen > >> >> > >> >> Some devices have deep buffers in firmware and/or hardware which > >> >> prevents > >> >> the FQ structure in mac80211 from effectively limiting bufferbloat on > >> >> the > >> >> link. For Ethernet devices we have BQL to limit the lower-level queues, > >> >> but > >> >> this cannot be applied to mac80211 because transmit rates can vary > >> >> wildly > >> >> between packets depending on which station we are transmitting it to. > >> >> > >> >> To overcome this, we can use airtime-based queue limiting (AQL), where > >> >> we > >> >> estimate the transmission time for each packet before dequeueing it, and > >> >> use that to limit the amount of data in-flight to the hardware. This > >> >> idea > >> >> was originally implemented as part of the out-of-tree airtime fairness > >> >> patch to ath10k[0] in chromiumos. > >> >> > >> >> This patch ports that idea over to mac80211. The basic idea is simple > >> >> enough: Whenever we dequeue a packet from the TXQs and send it to the > >> >> driver, we estimate its airtime usage, based on the last recorded TX > >> >> rate > >> >> of the station that packet is destined for. We keep a running per-AC > >> >> total > >> >> of airtime queued for the whole device, and when that total climbs > >> >> above 8 > >> >> ms' worth of data (corresponding to two maximum-sized aggregates), we > >> >> simply throttle the queues until it drops down again. > >> >> > >> >> The estimated airtime for each skb is stored in the tx_info, so we can > >> >> subtract the same amount from the running total when the skb is freed or > >> >> recycled. The throttling mechanism relies on this accounting to be > >> >> accurate (i.e., that we are not freeing skbs without subtracting any > >> >> airtime they were accounted for), so we put the subtraction into > >> >> ieee80211_report_used_skb(). > >> >> > >> >> This patch does *not* include any mechanism to wake a throttled TXQ > >> >> again, > >> >> on the assumption that this will happen anyway as a side effect of > >> >> whatever > >> >> freed the skb (most commonly a TX completion). > >> >> > >> >> The throttling mechanism only kicks in if the queued airtime total goes > >> >> above the limit. Since mac80211 calculates the time based on the > >> >> reported > >> >> last_tx_time from the driver, the whole throttling mechanism only kicks > >> >> in > >> >> for drivers that actually report this value. With the exception of > >> >> multicast, where we always calculate an estimated tx time on the > >> >> assumption > >> >> that multicast is transmitted at the lowest (6 Mbps) rate. > >> >> > >> >> The throttling added in this patch is in addition to any throttling > >> >> already > >> >> perfo
Re: [PATCH v6 4/4] ath10k: reporting estimated tx airtime for fairness
Hi Toke, This patch looks good to me. Great job on getting the airtime fairness scheduling framework done! On Tue, Jan 22, 2019 at 6:21 AM Toke Høiland-Jørgensen wrote: > > From: Kan Yan > > Transmit airtime will be estimated from last tx rate used. > Firmware report tx rate by peer stats. Airtime is computed > on tx path and the same will be reported to mac80211 upon > tx completion. > > This change is based on Kan's orginal commit in Chromium tree > ("CHROMIUM: ath10k: Implementing airtime fairness based TX scheduler") > ref: https://chromium-review.googlesource.com/588190 > > Tested on QCA4019 with firmware version 10.4-3.2.1.1-00015 > Tested on QCA9984 with firmware version 10.4-3.9.0.1-5 > > Signed-off-by: Kan Yan > [rmano...@codeaurora.org: ported only the airtime computation] > Signed-off-by: Rajkumar Manoharan > [t...@redhat.com: Rebase to mac80211-next, add test note] > Signed-off-by: Toke Høiland-Jørgensen > --- > > Rajkumar / Kan: Please check that I didn't mess anything up while > rebasing this onto the current mac80211-next branch! > > drivers/net/wireless/ath/ath10k/core.h | 2 + > drivers/net/wireless/ath/ath10k/htt_rx.c | 1 + > drivers/net/wireless/ath/ath10k/mac.c| 57 ++-- > drivers/net/wireless/ath/ath10k/txrx.c | 4 ++ > 4 files changed, 61 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/core.h > b/drivers/net/wireless/ath/ath10k/core.h > index 28cecf7375f8..4528a6e47b8d 100644 > --- a/drivers/net/wireless/ath/ath10k/core.h > +++ b/drivers/net/wireless/ath/ath10k/core.h > @@ -126,6 +126,7 @@ struct ath10k_skb_cb { > u8 flags; > u8 eid; > u16 msdu_id; > + u16 airtime_est; > struct ieee80211_vif *vif; > struct ieee80211_txq *txq; > } __packed; > @@ -498,6 +499,7 @@ struct ath10k_sta { > u16 peer_id; > struct rate_info txrate; > struct ieee80211_tx_info tx_info; > + u32 last_tx_bitrate; > > struct work_struct update_wk; > u64 rx_duration; > diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c > b/drivers/net/wireless/ath/ath10k/htt_rx.c > index dc9895f2eb9d..6a93b57900ae 100644 > --- a/drivers/net/wireless/ath/ath10k/htt_rx.c > +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c > @@ -3078,6 +3078,7 @@ ath10k_update_per_peer_tx_stats(struct ath10k *ar, > > arsta->txrate.nss = txrate.nss; > arsta->txrate.bw = ath10k_bw_to_mac80211_bw(txrate.bw); > + arsta->last_tx_bitrate = cfg80211_calculate_bitrate(&arsta->txrate); > if (sgi) > arsta->txrate.flags |= RATE_INFO_FLAGS_SHORT_GI; > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c > b/drivers/net/wireless/ath/ath10k/mac.c > index cf7fdf72e990..b13dcb4533cb 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -3544,7 +3544,7 @@ static void ath10k_tx_h_add_p2p_noa_ie(struct ath10k > *ar, > static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar, > struct ieee80211_vif *vif, > struct ieee80211_txq *txq, > - struct sk_buff *skb) > + struct sk_buff *skb, u16 airtime) > { > struct ieee80211_hdr *hdr = (void *)skb->data; > struct ath10k_skb_cb *cb = ATH10K_SKB_CB(skb); > @@ -3561,6 +3561,7 @@ static void ath10k_mac_tx_h_fill_cb(struct ath10k *ar, > > cb->vif = vif; > cb->txq = txq; > + cb->airtime_est = airtime; > } > > bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar) > @@ -3948,6 +3949,49 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw > *hw, > return false; > } > > +/* Return estimated airtime in microsecond, which is calculated using last > + * reported TX rate. This is just a rough estimation because host driver has > no > + * knowledge of the actual transmit rate, retries or aggregation. If actual > + * airtime can be reported by firmware, then delta between estimated and > actual > + * airtime can be adjusted from deficit. > + */ > +#define IEEE80211_ATF_OVERHEAD 100 /* IFS + some slot time */ > +#define IEEE80211_ATF_OVERHEAD_IFS 16 /* IFS only */ > +static u16 ath10k_mac_update_airtime(struct ath10k *ar, > +struct ieee80211_txq *txq, > +struct sk_buff *skb) > +{ > + struct ath10k_sta *arsta; > + u32 pktlen; > + u16 airtime = 0; > + > + if (!txq || !txq->s
Re: [PATCH] ath10k: Add wrapper function to ath10k debug
> Johannes had an interesting idea to use trace_ath10k_log_dbg_enabled(). > Could you investigate if that would work? That way we might get the > performance improvement even when is enabled CONFIG_ATH10K_TRACING (but > actual trace point is disabled, of course). That's a good idea. This patch was originally made for Google Wifi's 3.18 kernel and It does use trace_ath10k_log_dbg_enabled(): https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/391891 +#define ath10k_dbg(ar, mask, format, ...) \ + do {\ + if (unlikely((ath10k_debug_mask & mask) || \ + trace_ath10k_log_dbg_enabled())) {\ + __ath10k_dbg(ar, mask, format, ##__VA_ARGS__); \ + } \ + } while (0)
Re: [RFC v2] mac80211: budget outstanding airtime for transmission
Hi Rajkumar, It is great to see your patches and thank you for helping bring this to upstream. > Why signed? This should never become negative unless something is wrong > with the accounting somewhere? It is signed because the "budget" is allowed to go negative. In the original version, as long as the budget is > 0, at least one packet is released even the budget is less than the airtime required for the packet. One major issue I had when doing airtime based queue limit is how to prevent txq with very low data rate (e.g., 6 mbps) from getting stalled for too long. My solution is to allow budget to go negative for one packet, then reset budget to 0 when all frames from that txq has been completed and the inflight packets count reaches to 0. For the ieee80211_calc_airtime_budget(), although it is sufficient for the purpose of enforcing queue limit, the airtime overhead estimation in the original patch really crude and could use some improvement. Instead of using the last reported tx PHY rate and a guesstimate of overhead to calculate airtime, one possible improvement is to calculate the estimated "real" tx rate from a windowed moving average of past tx_bytes/tx_airtime, if the firmware is capable of reporting tx airtime for each completed PPDU.
Re: [PATCH RFC v3 0/4] Move TXQ scheduling into mac80211
> I think this is a great approach, that we should definitely adopt in > mac80211. However, I'm not sure the outstanding airtime limiting should > be integrated into the fairness scheduling, since there are drivers such > as ath9k that don't need it. > > Rather, I'd propose that we figure out the API for fairness scheduling > first, and add the BQL-like limiter as a separate layer. They can be > integrated in the sense that the limiter's estimate of airtime can be > used for fairness scheduling as well in the case where the > driver/firmware doesn't have a better source of airtime usage. > > Does that make sense? Sure that make sense. Yes, the airtime based BQL like queue limiter is not needed for drivers such as ath9k. We could leave the outstanding airtime accounting and queue depth limit to another subject and get airtime fairness scheduling API done first. On Mon, Sep 10, 2018 at 4:26 AM, Johannes Berg wrote: > On Mon, 2018-09-10 at 13:17 +0200, Toke Høiland-Jørgensen wrote: > >> > > - I did not add any locking around next_txq(); the driver is still >> > > supposed >> > > to maintain a lock that prevents two threads from trying to schedule >> > > the >> > > same AC at the same time. This is what drivers already do, so I >> > > figured it >> > > was easier to just keep it that way rather than do it in mac80211. >> > >> > I'll look at this in the code, but from a maintainer perspective I'm >> > somewhat worried that this will lead to issues that are really the >> > driver's fault, but surface in mac80211. I don't know how easy it >> > would be to catch that. >> >> Yeah, I get what you mean. The alternative would be to have a >> ieee80211_start_schedule(ac) and ieee80211_end_schedule(ac), which >> basically just takes a lock. > > And I guess start would increment the schedule number, which is now > dependent on first > >> Would mean we could get rid of the 'first' >> parameter for next_txq(), so might not be such a bad idea; > > Right, that's what I meant. > >> and if the >> driver has its own locking the extra locking in mac80211 would just be >> an always-uncontested spinlock, which shouldn't be much overhead, right? > > It may still bounce around CPUs if you call this from other places, but > I suspect that wouldn't be the biggest issue. There are a lot of > calculations going on too... > > johannes
Re: [PATCH RFC v3 0/4] Move TXQ scheduling into mac80211
Hi Toke, It is great to see finally there will be a general airtime fairness support in mac80211. I feel this patch set could still use some improvements to make it works better with 11ac drivers like ath10k. I did a version of airtime fairness in the ath10k driver that used in Google Wifi and I'd like to share a few things I learned from this experience. Airtime fairness is a cool feature, but that's not the motivation for me to implement that. The goal is to solve the bufferbloat problem at wireless interface. Fq_CoDel has been proved to be very effective in fixing bufferbloat for wired interface, and it has been integrated with mac802.11 for more than 2 years. However, it still does not work well with 11ac drivers like ath10k, which relies heavily on firmware to do most of transmit scheduling, and hence has deep firmware queue. FQ_CoDel expects a thin driver layer queue so it can get an accurate measurement of sojourn time and use the sojourn time to control latency. However, the queue in ath10k firmware is so deep, that packets only stay in mac80211 layer queue transiently. The sojourn time is almost always below target, and hence CoDel doesn't do much to control the latency. The key to make Fq_Codel works effectively with 11ac wireless driver is to limit the queue depth in lower layer. It looks to me this patch set only enforce fairness among stations (airtime wise), but did not limit how many packets can be released to firmware/hardware. When a txq run out of its airtime deficit, the txq is put to the back of the list but will get a refill of airtime deficit in next round. It doesn't keep counts of the total pending airtime released to hardware or firmware. Packets will be released as long as wireless driver keeps calling the dequeue function. This maybe not an issue for ath9k, because the number of available tx descriptors is more limited. However, for 11ac drivers like ath10k, it routinely queues thousands packets when the bandwidth is oversubscribed in my test. The version I did is to use pending airtime to restrict the firmware queue depth. Another difference is it does not use airtime reported from firmware, which is "past" or spent airtime, but use "future" airtime, the estimated airtime calculated using last reported tx rate for the inflight packets pending in the firmware queue. Host driver keeps counts of total pending airtime for each TxQ (per station, per AC). It tries to release up to 4-8 ms worth of frames for each TxQ. The idea is to use airtime accounting to give firmware just enough frames to keep it busy and form good aggregations, and keeps the rest of frames in mac80211 layer queues so Fq_Codel can work on it to control latency. Here are some of the patches from the version I did in ath10k for kernel in ChromeOs. CHROMIUM: net: mac80211: Recording last tx bitrate. https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189 CHROMIUM: ath10k: Implementing airtime fairness based TX scheduler. https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13 CHROMIUM: ath10k: use frame count as defict for fq_codel. https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588192/13 I had some discussions with Rajkumar on how to adapt some methods from my patch and make it fit with the new TXQ scheduling API proposed here. The major issue is the lack of pending airtime accounting. It could be done in individual wireless drivers, such as in ath10k, or even in firmware, but implement it in mac80211 could be a better choice than do it one driver at a time. Thanks, Kan On Fri, Sep 7, 2018 at 3:22 PM, Toke Høiland-Jørgensen wrote: > This is an updated version of the patch series to move TXQ scheduling and > airtime fairness scheduling into mac80211. I've only compile tested this > version, but thought it was better to get the conversation moving instead > of being blocked on me. > > This addresses most of the comments from the last round. Specifically: > > - It folds in Rajkumar's patches to keep per-AC TXQ lists, and to add an > API that ath10k can use to check if a TXQ may transmit according even > when not using get_next_txq(). I changed a few names and descriptions, > but otherwise it's mostly the same. After the discussions we had in the > last series, I *think* it will work this way, but I'm not entirely sure. > > - I got rid of any mention of seqno in next_txq() and schedule_txq() - and > removed the third parameter to schedule_txq() entirely, so drivers can no > longer signal that a TXQ should be allowed to re-appear in a scheduling > round. We can add that back if needed. > > - Added a helper function to schedule and wake TXQs in a single call, for > internal mac80211 use. > > - Changed the default station weight to 256 and got rid of the per-phy > quantum. This makes it possible to lower station weights without having > to change the weights of every other station. > > A few things that were
[PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips
Hi Wen Gong, Do you also have throughput and latency data for different sk_pacing_shift setting when the ath10k client is at some distance away from AP? Just curious about its impact on latency with less than ideal PHY rate. Thanks, Kan
Re: [PATCH v7] Add new mac80211 driver mwlwifi.
David Lin writes: > +static inline struct sk_buff *mwl_tx_do_amsdu(struct mwl_priv *priv, > + int desc_num, > ... > + amsdu_pkts = (struct sk_buff_head *) > + kmalloc(sizeof(*amsdu_pkts), GFP_KERNEL); > + if (!amsdu_pkts) { Should GFP_ATOMIC be used here instead of GFP_KERNEL? This function could be called in interrupt context. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html