Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Wed, Jan 18, 2017 at 1:16 PM, Eric Dumazetwrote: > > On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng wrote: > > On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet wrote: > >> > >> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev > >> wrote: > >> > Hi Eric, > >> > > >> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote: > >> > > >> > >> > Looks like max_window not correctly initialized for tfo sockets. > >> > On my test machine it has set to '5592320' in > >> > tcp_fastopen_create_child(). > >> > > >> > This diff fixes the issue, the question: is this the right place to do > >> > it? > >> > > >> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c > >> > index 4e777a3..33ed508 100644 > >> > --- a/net/ipv4/tcp_fastopen.c > >> > +++ b/net/ipv4/tcp_fastopen.c > >> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct > >> > sock *sk, > >> > */ > >> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window); > >> > > >> > + tp->max_window = tp->snd_wnd; > >> > + > >> > >> Excellent catch. Let me test our regression tests with this. > > Indeed nice catch. Thanks for the investigative work! > > > > We do have 2 failures, but tests might have depended on undocumented behavior > > (For googlers : > Ran 211 tests: 209 passing, 0 flaky 2 failing > Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee > ) Yes, exactly, those test failures look fine. Looks like the test was implicitly expecting the old buggy behavior (and it wasn't noticed because the 10*MSS output matches IW10, so it looked fine. But now with the fix, I believe we are seeing the correct new behavior because tcp_xmit_size_goal() is calling tcp_bound_to_half_wnd(), and now with max_window fixed, the outgoing TSO skb is now the correct 5*MSS (half of the rwin of 1). So the test results look good to me. Thanks, Alexey, for tracking this down! :-) neal
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Wed, Jan 18, 2017 at 1:27 PM, Yuchung Chengwrote: > Hi Eric: yeah I think the test was just due to the TSO chunking > difference between prod and upstream, which I was able to avoid with > this patch re-suited by Neal in b/34128974. In fact, this patch > enables me to run all recovery tests on upstream kernels for my RACK > patch set. > > Neal: can we polish and check that in? super useful. Yes, sounds good. Glad that was useful. I will work on polishing up and checking in the patch to make packetdrill agnostic to TSO segmentation by default (for upstream and Google). neal
Re: resend: tcp: performance issue with fastopen connections (mss > window)
Googler-only Hi Eric: yeah I think the test was just due to the TSO chunking difference between prod and upstream, which I was able to avoid with this patch re-suited by Neal in b/34128974. In fact, this patch enables me to run all recovery tests on upstream kernels for my RACK patch set. Neal: can we polish and check that in? super useful. On Wed, Jan 18, 2017 at 10:16 AM, Eric Dumazetwrote: > On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng wrote: >> On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet wrote: >>> >>> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev >>> wrote: >>> > Hi Eric, >>> > >>> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote: >>> > >>> >>> > Looks like max_window not correctly initialized for tfo sockets. >>> > On my test machine it has set to '5592320' in tcp_fastopen_create_child(). >>> > >>> > This diff fixes the issue, the question: is this the right place to do it? >>> > >>> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c >>> > index 4e777a3..33ed508 100644 >>> > --- a/net/ipv4/tcp_fastopen.c >>> > +++ b/net/ipv4/tcp_fastopen.c >>> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct >>> > sock *sk, >>> > */ >>> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window); >>> > >>> > + tp->max_window = tp->snd_wnd; >>> > + >>> >>> Excellent catch. Let me test our regression tests with this. >> Indeed nice catch. Thanks for the investigative work! >> > > We do have 2 failures, but tests might have depended on undocumented behavior > > (For googlers : > Ran 211 tests: 209 passing, 0 flaky 2 failing > Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee > ) > > Please Alexey submit an official patch, thanks a lot !
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Chengwrote: > On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet wrote: >> >> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev >> wrote: >> > Hi Eric, >> > >> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote: >> > >> >> > Looks like max_window not correctly initialized for tfo sockets. >> > On my test machine it has set to '5592320' in tcp_fastopen_create_child(). >> > >> > This diff fixes the issue, the question: is this the right place to do it? >> > >> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c >> > index 4e777a3..33ed508 100644 >> > --- a/net/ipv4/tcp_fastopen.c >> > +++ b/net/ipv4/tcp_fastopen.c >> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct >> > sock *sk, >> > */ >> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window); >> > >> > + tp->max_window = tp->snd_wnd; >> > + >> >> Excellent catch. Let me test our regression tests with this. > Indeed nice catch. Thanks for the investigative work! > We do have 2 failures, but tests might have depended on undocumented behavior (For googlers : Ran 211 tests: 209 passing, 0 flaky 2 failing Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee ) Please Alexey submit an official patch, thanks a lot !
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazetwrote: > > On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev > wrote: > > Hi Eric, > > > > On 01/13/2017 08:07 PM, Alexey Kodanev wrote: > > > > > Looks like max_window not correctly initialized for tfo sockets. > > On my test machine it has set to '5592320' in tcp_fastopen_create_child(). > > > > This diff fixes the issue, the question: is this the right place to do it? > > > > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c > > index 4e777a3..33ed508 100644 > > --- a/net/ipv4/tcp_fastopen.c > > +++ b/net/ipv4/tcp_fastopen.c > > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct > > sock *sk, > > */ > > tp->snd_wnd = ntohs(tcp_hdr(skb)->window); > > > > + tp->max_window = tp->snd_wnd; > > + > > Excellent catch. Let me test our regression tests with this. Indeed nice catch. Thanks for the investigative work! > > Thanks !
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanevwrote: > Hi Eric, > > On 01/13/2017 08:07 PM, Alexey Kodanev wrote: > > Looks like max_window not correctly initialized for tfo sockets. > On my test machine it has set to '5592320' in tcp_fastopen_create_child(). > > This diff fixes the issue, the question: is this the right place to do it? > > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c > index 4e777a3..33ed508 100644 > --- a/net/ipv4/tcp_fastopen.c > +++ b/net/ipv4/tcp_fastopen.c > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct > sock *sk, > */ > tp->snd_wnd = ntohs(tcp_hdr(skb)->window); > > + tp->max_window = tp->snd_wnd; > + Excellent catch. Let me test our regression tests with this. Thanks !
Re: resend: tcp: performance issue with fastopen connections (mss > window)
Hi Eric, On 01/13/2017 08:07 PM, Alexey Kodanev wrote: Hi Eric, On 13.01.2017 18:35, Eric Dumazet wrote: I would suggest to clamp MSS to half the initial window, but I guess this is impractical since window in SYN/SYNACK are not scaled. Looks like max_window not correctly initialized for tfo sockets. On my test machine it has set to '5592320' in tcp_fastopen_create_child(). This diff fixes the issue, the question: is this the right place to do it? diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 4e777a3..33ed508 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk, */ tp->snd_wnd = ntohs(tcp_hdr(skb)->window); + tp->max_window = tp->snd_wnd; + /* Activate the retrans timer so that SYNACK can be retransmitted. * The request socket is not added to the ehash * because it's been added to the accept queue directly. Thanks, Alexey
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Fri, 2017-01-13 at 12:32 -0500, Neal Cardwell wrote: > On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazetwrote: > > > > On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev > > wrote: > > > Hi Eric, > > > On 13.01.2017 18:35, Eric Dumazet wrote: > > > > >> Care to send a packetdrill test so that we have a clear picture of what > > >> is going on ? > > > > > > Is it capable of making two connections in the single test, one after > > > another? > > > > Absolutely. > > > > Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill > > template showing a typical fastopen test > > running on an upstream kernel ? > > > > Thanks ! > > Sure, here is an example packetdrill script, IIRC written by Yuchung, > which demonstrates TCP fast open and consecutive active connections: > > `sysctl -q net.ipv4.tcp_timestamps=0` > > // Cache warmup: send a Fast Open cookie request > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 >+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 >+0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS > (Operation is now in progress) >+0 > S 0:0(0) > +.010 < S. 123:123(0) ack 1 win 5840 1040,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> >+0 > . 1:1(0) ack 1 >+0 close(3) = 0 >+0 > F. 1:1(0) ack 1 > +.010 < F. 1:1(0) ack 2 win 92 >+0 > . 2:2(0) ack 2 > > // > // TEST1: Servers sends SYN-ACK with data and another two data packets > // >+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 >+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 >+0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 >+0 > S 0:1000(1000) abcd1234,nop,nop> > +.010 < S. 100:1001400(1400) ack 1001 win 5840 1040,nop,nop,sackOK,nop,wscale 6> >+0 < . 1401:2801(1400) ack 1001 win 257 >+0 < P. 2801:3001(200) ack 1001 win 257 > > neal Thanks Neal Also worth adding that packetdrill has the following option to tune the MTU on the tun device : --mtu=x
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazetwrote: > > On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev > wrote: > > Hi Eric, > > On 13.01.2017 18:35, Eric Dumazet wrote: > > >> Care to send a packetdrill test so that we have a clear picture of what > >> is going on ? > > > > Is it capable of making two connections in the single test, one after > > another? > > Absolutely. > > Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill > template showing a typical fastopen test > running on an upstream kernel ? > > Thanks ! Sure, here is an example packetdrill script, IIRC written by Yuchung, which demonstrates TCP fast open and consecutive active connections: `sysctl -q net.ipv4.tcp_timestamps=0` // Cache warmup: send a Fast Open cookie request 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS (Operation is now in progress) +0 > S 0:0(0) +.010 < S. 123:123(0) ack 1 win 5840 +0 > . 1:1(0) ack 1 +0 close(3) = 0 +0 > F. 1:1(0) ack 1 +.010 < F. 1:1(0) ack 2 win 92 +0 > . 2:2(0) ack 2 // // TEST1: Servers sends SYN-ACK with data and another two data packets // +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 +0 > S 0:1000(1000) +.010 < S. 100:1001400(1400) ack 1001 win 5840 +0 < . 1401:2801(1400) ack 1001 win 257 +0 < P. 2801:3001(200) ack 1001 win 257 neal
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanevwrote: > Hi Eric, > On 13.01.2017 18:35, Eric Dumazet wrote: >> Care to send a packetdrill test so that we have a clear picture of what >> is going on ? > > Is it capable of making two connections in the single test, one after > another? Absolutely. Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill template showing a typical fastopen test running on an upstream kernel ? Thanks !
Re: resend: tcp: performance issue with fastopen connections (mss > window)
Hi Eric, On 13.01.2017 18:35, Eric Dumazet wrote: > On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote: >> Hi, >> >> Got the issue when running LTP/netstress test on localhost with mss >> greater than the send window advertised by client (right after 3WHS). >> Here is the testscenario that can reproduce this: > Hi Alexey > > So this is a combination of Fastopen + small window + large MSS ? Yeah, this happens only in the beginning, after first ack from client. Later window gets lager than mss and it doesn't happen. > > I would rather not force burning tons of coal or other fossil fuel, > by making each tcp_sendmsg() done by billions of linux devices more > expensive, only to accommodate for some LTP test doing something not > sensible ;) > > Fact that you removed one condition in the BUG_ON() might hide another > issue later in the path. > > I would suggest to clamp MSS to half the initial window, but I guess > this is impractical since window in SYN/SYNACK are not scaled. > Care to send a packetdrill test so that we have a clear picture of what > is going on ? Is it capable of making two connections in the single test, one after another? Thanks, Alexey
Re: resend: tcp: performance issue with fastopen connections (mss > window)
On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote: > Hi, > > Got the issue when running LTP/netstress test on localhost with mss > greater than the send window advertised by client (right after 3WHS). > Here is the testscenario that can reproduce this: Hi Alexey So this is a combination of Fastopen + small window + large MSS ? I would rather not force burning tons of coal or other fossil fuel, by making each tcp_sendmsg() done by billions of linux devices more expensive, only to accommodate for some LTP test doing something not sensible ;) Fact that you removed one condition in the BUG_ON() might hide another issue later in the path. I would suggest to clamp MSS to half the initial window, but I guess this is impractical since window in SYN/SYNACK are not scaled. Care to send a packetdrill test so that we have a clear picture of what is going on ? Thanks.
resend: tcp: performance issue with fastopen connections (mss > window)
Hi, Got the issue when running LTP/netstress test on localhost with mss greater than the send window advertised by client (right after 3WHS). Here is the testscenario that can reproduce this: TCP client is sending 32 bytes request, TCP server replies with 65KB answer. net.ipv4.tcp_fastopen set to 3. Also notethat first TCP Fastopen connection processed without delay as tcp_send_mss()setshalf of the window size to the'size_goal' inside tcp_sendmsg(). Though on the 2nd and subsequent connections: < S seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie ac6246a51d5422fc] length 32 > S.seq 0:0ack 1win 43690 options [mss 65495wscale 7] length 0 <. ack 1 win 342 length 0 Inside tcp_sendmsg(), tcp_send_mss() returns 65483 in 'mss_now',as well as in 'size_goal'. This results the segment not queued for transmition until all data copied from userbuffer. Then, inside __tcp_push_pending_frames() it breaks on send window test,continue with the check probe timer, thus introducing 200ms delay here. Fragmentationoccurs in tcp_write_wakeup()... +0.2> P. seq 1:43777 ack 1 win 342 length 43776 < . ack 43777, win 1365 length 0 > P. seq 43777:65001 ack 1 win 342 optionslength 21224 ... Not sure what is the right fix for this, I guess we could limit size_goal to the current window or mss, what is currently less, e.g: diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4a04496..3d3bd97 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -860,7 +860,12 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now, size_goal = tp->gso_segs * mss_now; } - return max(size_goal, mss_now); + size_goal = max(size_goal, mss_now); + + if (tp->snd_wnd > TCP_MSS_DEFAULT) + return min(tp->snd_wnd, size_goal); + + return size_goal; } static int tcp_send_mss(struct sock *sk, int *size_goal, int flags) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1d5331a..0ac133f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2445,7 +2445,7 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now) { struct sk_buff *skb = tcp_send_head(sk); - BUG_ON(!skb || skb->len < mss_now); + BUG_ON(!skb); tcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation); } Any ideas? Best regards, Alexey