Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Wed, Feb 21, 2018 at 02:37:48PM -0500, David Miller wrote:
> From: Eric Dumazet
> Date: Mon, 19 Feb 2018 11:56:46 -0800
>
> > Switching TCP to GSO mode, relying on core networking layers
> > to perform eventual adaptation for dumb devices, was overdue.
> >
> > 1) Most TCP development is done with TSO in mind.
> > 2) Fewer high-resolution timers need to be armed for TCP pacing.
> > 3) GSO can benefit from the xmit_more hint.
> > 4) Receiver GRO is more effective (as if TSO was used for real on the
> >    sender) -> fewer ACK packets and less overhead.
> > 5) Write queues have less overhead (one skb holds about 64KB of payload).
> > 6) SACK coalescing just works (no payload in skb->head).
> > 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> > 8) Removal of legacy code. Fewer maintenance hassles.
> >
> > Note that I have left the sendpage/zerocopy paths, but they can probably
> > benefit from the same strategy.
> >
> > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > BBR/fq_codel, which was the main reason I worked on this patch series.
>
> Series applied, thanks Eric.
>
> SCTP might want to do something similar, and if so we can get rid
> of sk_can_gso() too.

Cc'ing linux-sctp and adding it to the ToDo here, although it may be too soon for SCTP: GSO support was added just a few months ago, and considering that it is not nearly as widely used as TCP, I fear we may have issues that haven't shown up yet.

Marcelo
Re: [PATCH net-next 0/6] tcp: remove non GSO code
From: Eric Dumazet
Date: Mon, 19 Feb 2018 11:56:46 -0800

> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices, was overdue.
>
> 1) Most TCP development is done with TSO in mind.
> 2) Fewer high-resolution timers need to be armed for TCP pacing.
> 3) GSO can benefit from the xmit_more hint.
> 4) Receiver GRO is more effective (as if TSO was used for real on the
>    sender) -> fewer ACK packets and less overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of payload).
> 6) SACK coalescing just works (no payload in skb->head).
> 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> 8) Removal of legacy code. Fewer maintenance hassles.
>
> Note that I have left the sendpage/zerocopy paths, but they can probably
> benefit from the same strategy.
>
> Thanks to Oleksandr Natalenko for reporting a performance issue for
> BBR/fq_codel, which was the main reason I worked on this patch series.

Series applied, thanks Eric.

SCTP might want to do something similar, and if so we can get rid of sk_can_gso() too.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
Hi.

On Wednesday 21 February 2018 at 0:21:37 CET, Eric Dumazet wrote:
> My latest patch (fixing BBR underestimation of cwnd) was meant for the
> net tree, on a NIC where SG/TSO/GSO are disabled
> (i.e. when sk->sk_gso_max_segs is not set to 'infinite').
>
> It is packet-scheduler independent, really.
>
> Tested here with pfifo_fast, TSO/GSO off.

Well, before the patch, with BBR and sg off, it is ~450 Mbps for fq and ~115 Mbps for pfifo_fast. So, compared to what I see with the patch (850 and 200 respectively), it is definitely an improvement.

Thanks.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, 2018-02-20 at 21:45 +0100, Oleksandr Natalenko wrote:
> On Tuesday 20 February 2018 at 21:09:37 CET, Eric Dumazet wrote:
> > Also you can tune your NIC to accept only a few MSS per GSO/TSO packet:
> >
> >   ip link set dev eth0 gso_max_segs 2
> >
> > So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to
> > size its bursts, since burst sizes also impact GRO on the receiver.
>
> net-next + 7 patches (6 from the patchset + this one).

My latest patch (fixing BBR underestimation of cwnd) was meant for the net tree, on a NIC where SG/TSO/GSO are disabled (i.e. when sk->sk_gso_max_segs is not set to 'infinite').

It is packet-scheduler independent, really.

Tested here with pfifo_fast, TSO/GSO off.

Before patch:

# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
691   (ss -temoi shows cwnd is stuck around 6)
667
651
631
517

After patch:

# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
1733  (ss -temoi shows cwnd is around 386)
1778
1746
1781
1718

Thanks.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tuesday 20 February 2018 at 21:09:37 CET, Eric Dumazet wrote:
> Also you can tune your NIC to accept only a few MSS per GSO/TSO packet:
>
>   ip link set dev eth0 gso_max_segs 2
>
> So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to
> size its bursts, since burst sizes also impact GRO on the receiver.

net-next + 7 patches (6 from the patchset + this one).

Before playing with gso_max_segs:

BBR+fq            sg on: 4.39 Gbits/sec    sg off: 1.33 Gbits/sec
BBR+fq_codel      sg on: 4.02 Gbits/sec    sg off: 1.41 Gbits/sec
BBR+pfifo_fast    sg on: 3.66 Gbits/sec    sg off: 1.41 Gbits/sec
Reno+fq           sg on: 5.69 Gbits/sec    sg off: 1.53 Gbits/sec
Reno+fq_codel     sg on: 6.33 Gbits/sec    sg off: 1.50 Gbits/sec
Reno+pfifo_fast   sg on: 6.26 Gbits/sec    sg off: 1.48 Gbits/sec

After "ip link set dev eth1 gso_max_segs 2":

BBR+fq            sg on: 806 Mbits/sec     sg off: 886 Mbits/sec
BBR+fq_codel      sg on: 206 Mbits/sec     sg off: 207 Mbits/sec
BBR+pfifo_fast    sg on: 220 Mbits/sec     sg off: 200 Mbits/sec
Reno+fq           sg on: 2.16 Gbits/sec    sg off: 1.27 Gbits/sec
Reno+fq_codel     sg on: 2.45 Gbits/sec    sg off: 1.52 Gbits/sec
Reno+pfifo_fast   sg on: 2.31 Gbits/sec    sg off: 1.54 Gbits/sec

Oleksandr
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, 2018-02-20 at 11:56 -0800, Eric Dumazet wrote:
> On Tue, Feb 20, 2018 at 11:51 AM, Oleksandr Natalenko wrote:
> > On Tuesday 20 February 2018 at 20:39:49 CET, Eric Dumazet wrote:
> > > I am not trying to compare BBR and Reno on a lossless link.
> > >
> > > Reno is running as fast as possible and will win when bufferbloat is
> > > not an issue.
> > >
> > > If bufferbloat is not an issue, simply use Reno and be happy ;)
> > >
> > > My patch helps BBR only, I thought it was obvious ;)
> >
> > Umm, yes, and my point was rather something like "the speed on a
> > lossless link while using BBR with and without this patch is the same".
> > Sorry for the confusion. I guess the key word here is "lossless".
>
> That is with the other patches _not_ applied?
>
> Here the gain is quite big, since BBR can set up a slightly better
> cwnd, allowing proper GRO on the receiver.

Also you can tune your NIC to accept only a few MSS per GSO/TSO packet:

  ip link set dev eth0 gso_max_segs 2

So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to size its bursts, since burst sizes also impact GRO on the receiver.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tuesday 20 February 2018 at 20:56:24 CET, Eric Dumazet wrote:
> That is with the other patches _not_ applied?

Yes, the other patches are not applied. It is v4.15.4 + this patch only + BBR + fq_codel or pfifo_fast. Shall I re-test it on net-next with the whole patchset (because it does not apply cleanly to 4.15)?

Oleksandr
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, Feb 20, 2018 at 11:51 AM, Oleksandr Natalenko wrote:
> On Tuesday 20 February 2018 at 20:39:49 CET, Eric Dumazet wrote:
>> I am not trying to compare BBR and Reno on a lossless link.
>>
>> Reno is running as fast as possible and will win when bufferbloat is
>> not an issue.
>>
>> If bufferbloat is not an issue, simply use Reno and be happy ;)
>>
>> My patch helps BBR only, I thought it was obvious ;)
>
> Umm, yes, and my point was rather something like "the speed on a lossless
> link while using BBR with and without this patch is the same". Sorry for
> the confusion. I guess the key word here is "lossless".

That is with the other patches _not_ applied?

Here the gain is quite big, since BBR can set up a slightly better cwnd, allowing proper GRO on the receiver.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tuesday 20 February 2018 at 20:39:49 CET, Eric Dumazet wrote:
> I am not trying to compare BBR and Reno on a lossless link.
>
> Reno is running as fast as possible and will win when bufferbloat is
> not an issue.
>
> If bufferbloat is not an issue, simply use Reno and be happy ;)
>
> My patch helps BBR only, I thought it was obvious ;)

Umm, yes, and my point was rather something like "the speed on a lossless link while using BBR with and without this patch is the same". Sorry for the confusion. I guess the key word here is "lossless".

Oleksandr
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, Feb 20, 2018 at 11:35 AM, Oleksandr Natalenko wrote:
> Hi.
>
> On Tuesday 20 February 2018 at 19:57:42 CET, Eric Dumazet wrote:
>> Actually timer drifts are not horrible (at least on my lab hosts).
>>
>> But BBR has a pessimistic way to sense the burst size, as it is tied to
>> TSO/GSO being there.
>>
>> The following patch helps a lot.
>
> Not really, at least if applied to v4.15.4. Still getting 2 Gbps less
> between VMs if using BBR instead of Reno.
>
> Am I doing something wrong?

I am not trying to compare BBR and Reno on a lossless link.

Reno is running as fast as possible and will win when bufferbloat is not an issue.

If bufferbloat is not an issue, simply use Reno and be happy ;)

My patch helps BBR only, I thought it was obvious ;)
Re: [PATCH net-next 0/6] tcp: remove non GSO code
Hi.

On Tuesday 20 February 2018 at 19:57:42 CET, Eric Dumazet wrote:
> Actually timer drifts are not horrible (at least on my lab hosts).
>
> But BBR has a pessimistic way to sense the burst size, as it is tied to
> TSO/GSO being there.
>
> The following patch helps a lot.

Not really, at least if applied to v4.15.4. Still getting 2 Gbps less between VMs if using BBR instead of Reno.

Am I doing something wrong?

Oleksandr
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, 2018-02-20 at 07:39 -0800, Eric Dumazet wrote:
> On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote:
> > Hi.
> >
> > 19.02.2018 20:56, Eric Dumazet wrote:
> > > Switching TCP to GSO mode, relying on core networking layers
> > > to perform eventual adaptation for dumb devices, was overdue.
> > >
> > > 1) Most TCP development is done with TSO in mind.
> > > 2) Fewer high-resolution timers need to be armed for TCP pacing.
> > > 3) GSO can benefit from the xmit_more hint.
> > > 4) Receiver GRO is more effective (as if TSO was used for real on
> > >    the sender) -> fewer ACK packets and less overhead.
> > > 5) Write queues have less overhead (one skb holds about 64KB of
> > >    payload).
> > > 6) SACK coalescing just works (no payload in skb->head).
> > > 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> > > 8) Removal of legacy code. Fewer maintenance hassles.
> > >
> > > Note that I have left the sendpage/zerocopy paths, but they can
> > > probably benefit from the same strategy.
> > >
> > > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > > BBR/fq_codel, which was the main reason I worked on this patch series.
> >
> > Thanks for dealing with this so fast.
> >
> > Does this mean that the option to optimise internal TCP pacing is still
> > an open question?
>
> It is not an optimization that is needed, but rather taking into account
> that highres timers can have latencies of ~2 usec or more.
>
> When sending 64KB TSO packets, having an extra 2 usec after every ~54 usec
> (at 10Gbit) has no big impact, since TCP computes a slightly inflated
> pacing rate anyway.
>
> But when sending one MSS per packet every usec, this definitely can
> demonstrate a big slowdown.
>
> But the answer is yes, I will take a look at this timer drift.

Actually timer drifts are not horrible (at least on my lab hosts).

But BBR has a pessimistic way to sense the burst size, as it is tied to TSO/GSO being there.

The following patch helps a lot.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 */
 	segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
-	return min_t(u32, segs, sk->sk_gso_max_segs);
+	return segs;
 }
 EXPORT_SYMBOL(tcp_tso_autosize);
 
@@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-	return tso_segs ? :
-		tcp_tso_autosize(sk, mss_now,
-				 sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	if (!tso_segs)
+		tso_segs = tcp_tso_autosize(sk, mss_now,
+				sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote:
> Hi.
>
> 19.02.2018 20:56, Eric Dumazet wrote:
> > Switching TCP to GSO mode, relying on core networking layers
> > to perform eventual adaptation for dumb devices, was overdue.
> >
> > 1) Most TCP development is done with TSO in mind.
> > 2) Fewer high-resolution timers need to be armed for TCP pacing.
> > 3) GSO can benefit from the xmit_more hint.
> > 4) Receiver GRO is more effective (as if TSO was used for real on the
> >    sender) -> fewer ACK packets and less overhead.
> > 5) Write queues have less overhead (one skb holds about 64KB of payload).
> > 6) SACK coalescing just works (no payload in skb->head).
> > 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> > 8) Removal of legacy code. Fewer maintenance hassles.
> >
> > Note that I have left the sendpage/zerocopy paths, but they can probably
> > benefit from the same strategy.
> >
> > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > BBR/fq_codel, which was the main reason I worked on this patch series.
>
> Thanks for dealing with this so fast.
>
> Does this mean that the option to optimise internal TCP pacing is still
> an open question?

It is not an optimization that is needed, but rather taking into account that highres timers can have latencies of ~2 usec or more.

When sending 64KB TSO packets, having an extra 2 usec after every ~54 usec (at 10Gbit) has no big impact, since TCP computes a slightly inflated pacing rate anyway.

But when sending one MSS per packet every usec, this definitely can demonstrate a big slowdown.

But the answer is yes, I will take a look at this timer drift.
Re: [PATCH net-next 0/6] tcp: remove non GSO code
Hi.

19.02.2018 20:56, Eric Dumazet wrote:
> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices, was overdue.
>
> 1) Most TCP development is done with TSO in mind.
> 2) Fewer high-resolution timers need to be armed for TCP pacing.
> 3) GSO can benefit from the xmit_more hint.
> 4) Receiver GRO is more effective (as if TSO was used for real on the
>    sender) -> fewer ACK packets and less overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of payload).
> 6) SACK coalescing just works (no payload in skb->head).
> 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> 8) Removal of legacy code. Fewer maintenance hassles.
>
> Note that I have left the sendpage/zerocopy paths, but they can probably
> benefit from the same strategy.
>
> Thanks to Oleksandr Natalenko for reporting a performance issue for
> BBR/fq_codel, which was the main reason I worked on this patch series.

Thanks for dealing with this so fast.

Does this mean that the option to optimise internal TCP pacing is still an open question?

Oleksandr
Re: [PATCH net-next 0/6] tcp: remove non GSO code
On Mon, Feb 19, 2018 at 2:56 PM Eric Dumazet wrote:
> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices, was overdue.
>
> 1) Most TCP development is done with TSO in mind.
> 2) Fewer high-resolution timers need to be armed for TCP pacing.
> 3) GSO can benefit from the xmit_more hint.
> 4) Receiver GRO is more effective (as if TSO was used for real on the
>    sender) -> fewer ACK packets and less overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of payload).
> 6) SACK coalescing just works (no payload in skb->head).
> 7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
> 8) Removal of legacy code. Fewer maintenance hassles.
>
> Note that I have left the sendpage/zerocopy paths, but they can probably
> benefit from the same strategy.
>
> Thanks to Oleksandr Natalenko for reporting a performance issue for
> BBR/fq_codel, which was the main reason I worked on this patch series.
>
> Eric Dumazet (6):
>   tcp: switch to GSO being always on
>   tcp: remove sk_can_gso() use
>   tcp: remove sk_check_csum_caps()
>   tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL
>   tcp: remove dead code from tcp_set_skb_tso_segs()
>   tcp: remove dead code after CHECKSUM_PARTIAL adoption

Acked-by: Soheil Hassas Yeganeh

Very nice patch series! Thank you, Eric!

>  include/net/sock.h    | 10 +---
>  net/core/sock.c       |  2 +-
>  net/ipv4/tcp.c        | 57 ---
>  net/ipv4/tcp_input.c  |  3 ---
>  net/ipv4/tcp_ipv4.c   | 13 +++---
>  net/ipv4/tcp_output.c | 40 +-
>  6 files changed, 26 insertions(+), 99 deletions(-)
> --
> 2.16.1.291.g4437f3f132-goog
[PATCH net-next 0/6] tcp: remove non GSO code
Switching TCP to GSO mode, relying on core networking layers to perform eventual adaptation for dumb devices, was overdue.

1) Most TCP development is done with TSO in mind.
2) Fewer high-resolution timers need to be armed for TCP pacing.
3) GSO can benefit from the xmit_more hint.
4) Receiver GRO is more effective (as if TSO was used for real on the
   sender) -> fewer ACK packets and less overhead.
5) Write queues have less overhead (one skb holds about 64KB of payload).
6) SACK coalescing just works (no payload in skb->head).
7) The rtx rb-tree contains fewer packets, so SACK is cheaper.
8) Removal of legacy code. Fewer maintenance hassles.

Note that I have left the sendpage/zerocopy paths, but they can probably benefit from the same strategy.

Thanks to Oleksandr Natalenko for reporting a performance issue for BBR/fq_codel, which was the main reason I worked on this patch series.

Eric Dumazet (6):
  tcp: switch to GSO being always on
  tcp: remove sk_can_gso() use
  tcp: remove sk_check_csum_caps()
  tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL
  tcp: remove dead code from tcp_set_skb_tso_segs()
  tcp: remove dead code after CHECKSUM_PARTIAL adoption

 include/net/sock.h    | 10 +---
 net/core/sock.c       |  2 +-
 net/ipv4/tcp.c        | 57 ---
 net/ipv4/tcp_input.c  |  3 ---
 net/ipv4/tcp_ipv4.c   | 13 +++---
 net/ipv4/tcp_output.c | 40 +-
 6 files changed, 26 insertions(+), 99 deletions(-)

--
2.16.1.291.g4437f3f132-goog