Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

2018-03-01 Thread Ryan Hsu
On 02/14/2018 12:23 AM, Jonathan Morton wrote:

>> On 14 Feb, 2018, at 10:18 am, Toke Høiland-Jørgensen  wrote:
>>
>> Why does the CPU usage go up >7?
> Just as a guess, it's generating extra packets which are then laboriously 
> discarded and retransmitted.
>
>  - Jonathan Morton

I think 8 might be good enough for 11n devices like ath9k, but 11ac devices
could aggregate a little more.

Yes, and CPU usage goes up beyond 6 or 7. That might be because it generates
extra packets while the physical bus caps the throughput, so we don't see much
throughput difference past that point, only increased CPU usage. (Or maybe my
setup is not optimal; I assumed we should be seeing around 550-600 Mbps for
TCP on 11ac.)

-- 
Ryan Hsu



Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

2018-02-14 Thread Jonathan Morton
> On 14 Feb, 2018, at 10:18 am, Toke Høiland-Jørgensen  wrote:
> 
> Why does the CPU usage go up >7?

Just as a guess, it's generating extra packets which are then laboriously 
discarded and retransmitted.

 - Jonathan Morton



Re: [PATCH] mac80211: Adjust TSQ pacing shift

2018-02-14 Thread Toke Høiland-Jørgensen


On 14 February 2018 01:43:25 CET, Ryan Hsu  wrote:
>On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:
>
>> Since we now have the convenient helper to do so, actually adjust the
>> TSQ pacing shift for packets going out over a WiFi interface. This
>> significantly improves throughput for locally-originated TCP
>> connections. The default pacing shift of 10 corresponds to ~1ms of
>> queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
>> 1-hop throughput for ath9k by a factor of 3, whereas increasing it more
>> has diminishing returns.
>>
>> Achieved throughput for different values of sk_pacing_shift (average of
>> 5 iterations of 10-sec netperf runs to a host on the other side of the
>> WiFi hop):
>>
>> sk_pacing_shift 10:  43.21 Mbps (pre-patch)
>> sk_pacing_shift  9:  78.17 Mbps
>> sk_pacing_shift  8: 123.94 Mbps
>> sk_pacing_shift  7: 128.31 Mbps
>>
>> Latency for competing flows increases from ~3 ms to ~10 ms with this
>> change. This is about the same magnitude of queueing latency induced by
>> flows that are not originated on the WiFi device itself (and so are not
>> limited by TSQ).
>>
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
>>  net/mac80211/tx.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 25904af38839..69722504e3e1 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
>>  if (!IS_ERR_OR_NULL(sta)) {
>>  struct ieee80211_fast_tx *fast_tx;
>>  
>> +/* We need a bit of data queued to build aggregates properly, so
>> + * instruct the TCP stack to allow more than a single ms of data
>> + * to be queued in the stack. The value is a bit-shift of 1
>> + * second, so 8 is ~4ms of queued data. Only affects local TCP
>> + * sockets.
>> + */
>> +sk_pacing_shift_update(skb->sk, 8);
>> +
>>  fast_tx = rcu_dereference(sta->fast_tx);
>>  
>>  if (fast_tx &&
>
>I know that shifts below 8 don't help much for ath9k, but in my testing on
>ath10k, 6 or 7 gives the best results.
>Since ath10k/11ac devices have higher bandwidth than ath9k/11n, can we
>consider using 6 or 7 to accommodate that?
>
>sk_pacing_shift   tx (Mbps)   CPU usage (%)
>       5            404          28.5
>       6            398          13.8
>       7            401           8
>       8            378           5
>       9            230           4.5
>      10             79.6         2

Why does the CPU usage go up >7? Also, what is the latency impact of each of 
those values?

-Toke


Re: [PATCH] mac80211: Adjust TSQ pacing shift

2018-02-13 Thread Ryan Hsu
On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:

> Since we now have the convenient helper to do so, actually adjust the
> TSQ pacing shift for packets going out over a WiFi interface. This
> significantly improves throughput for locally-originated TCP
> connections. The default pacing shift of 10 corresponds to ~1ms of
> queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
> 1-hop throughput for ath9k by a factor of 3, whereas increasing it more
> has diminishing returns.
>
> Achieved throughput for different values of sk_pacing_shift (average of
> 5 iterations of 10-sec netperf runs to a host on the other side of the
> WiFi hop):
>
> sk_pacing_shift 10:  43.21 Mbps (pre-patch)
> sk_pacing_shift  9:  78.17 Mbps
> sk_pacing_shift  8: 123.94 Mbps
> sk_pacing_shift  7: 128.31 Mbps
>
> Latency for competing flows increases from ~3 ms to ~10 ms with this
> change. This is about the same magnitude of queueing latency induced by
> flows that are not originated on the WiFi device itself (and so are not
> limited by TSQ).
>
> Signed-off-by: Toke Høiland-Jørgensen 
> ---
>  net/mac80211/tx.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 25904af38839..69722504e3e1 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
>   if (!IS_ERR_OR_NULL(sta)) {
>   struct ieee80211_fast_tx *fast_tx;
>  
> + /* We need a bit of data queued to build aggregates properly, so
> +  * instruct the TCP stack to allow more than a single ms of data
> +  * to be queued in the stack. The value is a bit-shift of 1
> +  * second, so 8 is ~4ms of queued data. Only affects local TCP
> +  * sockets.
> +  */
> + sk_pacing_shift_update(skb->sk, 8);
> +
>   fast_tx = rcu_dereference(sta->fast_tx);
>  
>   if (fast_tx &&

I know that shifts below 8 don't help much for ath9k, but in my testing on
ath10k, 6 or 7 gives the best results.
Since ath10k/11ac devices have higher bandwidth than ath9k/11n, can we
consider using 6 or 7 to accommodate that?

sk_pacing_shift   tx (Mbps)   CPU usage (%)
       5            404          28.5
       6            398          13.8
       7            401           8
       8            378           5
       9            230           4.5
      10             79.6         2

I have a quad core machine.

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 58
model name  : Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz

-- 
Ryan Hsu


Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

2018-02-02 Thread Arend van Spriel

On 2/2/2018 5:55 PM, dpr...@deepplum.com wrote:

> I'm curious about the "WiFi Aware" initiative by the WiFi Alliance.
>
> Does LEDE and/or Linux support this protocol? I know gSupplicant is potentially
> the way such things are supposed to work, at least according to its supporters.
>
> The general NAN (Neighbor Awareness Networking) concept makes a lot of sense at
> one level, but as an Internet guy, it troubles me that they decided to split
> from the Internet and go a balkanized direction. To me, the neighborhood is
> interesting only as part of a larger Internet.
>
> It also troubles me that WiFi Aware is a "certification program" rather than a
> real standard.


It troubles me that you are breaking into an email conversation with a
topic that in my opinion is totally unrelated. Although probably not
intended as such, it seems rude. Just start your own conversation.


Regards,
Arend


-Original Message-
From: "Toke Høiland-Jørgensen" <t...@toke.dk>
Sent: Friday, February 2, 2018 10:11am
To: make-wifi-f...@lists.bufferbloat.net, linux-wireless@vger.kernel.org
Cc: "Toke Høiland-Jørgensen" <t...@toke.dk>
Subject: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift






RE: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

2018-02-02 Thread dpr...@deepplum.com
I'm curious about the "WiFi Aware" initiative by the WiFi Alliance.

Does LEDE and/or Linux support this protocol? I know gSupplicant is potentially 
the way such things are supposed to work, at least according to its supporters.

The general NAN (Neighbor Awareness Networking) concept makes a lot of sense at 
one level, but as an Internet guy, it troubles me that they decided to split 
from the Internet and go a balkanized direction. To me, the neighborhood is 
interesting only as part of a larger Internet.

It also troubles me that WiFi Aware is a "certification program" rather than a 
real standard.

-Original Message-
From: "Toke Høiland-Jørgensen" <t...@toke.dk>
Sent: Friday, February 2, 2018 10:11am
To: make-wifi-f...@lists.bufferbloat.net, linux-wireless@vger.kernel.org
Cc: "Toke Høiland-Jørgensen" <t...@toke.dk>
Subject: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift


___
Make-wifi-fast mailing list
make-wifi-f...@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/make-wifi-fast



[PATCH] mac80211: Adjust TSQ pacing shift

2018-02-02 Thread Toke Høiland-Jørgensen
Since we now have the convenient helper to do so, actually adjust the
TSQ pacing shift for packets going out over a WiFi interface. This
significantly improves throughput for locally-originated TCP
connections. The default pacing shift of 10 corresponds to ~1ms of
queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
1-hop throughput for ath9k by a factor of 3, whereas increasing it more
has diminishing returns.

Achieved throughput for different values of sk_pacing_shift (average of
5 iterations of 10-sec netperf runs to a host on the other side of the
WiFi hop):

sk_pacing_shift 10:  43.21 Mbps (pre-patch)
sk_pacing_shift  9:  78.17 Mbps
sk_pacing_shift  8: 123.94 Mbps
sk_pacing_shift  7: 128.31 Mbps

Latency for competing flows increases from ~3 ms to ~10 ms with this
change. This is about the same magnitude of queueing latency induced by
flows that are not originated on the WiFi device itself (and so are not
limited by TSQ).

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/tx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 25904af38839..69722504e3e1 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	if (!IS_ERR_OR_NULL(sta)) {
 		struct ieee80211_fast_tx *fast_tx;
 
+		/* We need a bit of data queued to build aggregates properly, so
+		 * instruct the TCP stack to allow more than a single ms of data
+		 * to be queued in the stack. The value is a bit-shift of 1
+		 * second, so 8 is ~4ms of queued data. Only affects local TCP
+		 * sockets.
+		 */
+		sk_pacing_shift_update(skb->sk, 8);
+
 		fast_tx = rcu_dereference(sta->fast_tx);
 
 		if (fast_tx &&
-- 
2.16.0