Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Neal Cardwell
On Wed, Jan 18, 2017 at 1:16 PM, Eric Dumazet  wrote:
>
> On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng  wrote:
> > On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet  wrote:
> >>
> >> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
> >>  wrote:
> >> > Hi Eric,
> >> >
> >> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
> >> >
> >>
> >> > Looks like max_window not correctly initialized for tfo sockets.
> >> > On my test machine it has set to '5592320' in 
> >> > tcp_fastopen_create_child().
> >> >
> >> > This diff fixes the issue, the question: is this the right place to do 
> >> > it?
> >> >
> >> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> >> > index 4e777a3..33ed508 100644
> >> > --- a/net/ipv4/tcp_fastopen.c
> >> > +++ b/net/ipv4/tcp_fastopen.c
> >> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> >> > sock *sk,
> >> >  */
> >> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
> >> >
> >> > +   tp->max_window = tp->snd_wnd;
> >> > +
> >>
> >> Excellent catch. Let me test our regression tests with this.
> > Indeed nice catch. Thanks for the investigative work!
> >
>
> We do have 2 failures, but tests might have depended on undocumented behavior
>
> (For googlers :
> Ran 211 tests: 209 passing, 0 flaky 2 failing
> Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
> )

Yes, exactly, those test failures look fine. Looks like the test was
implicitly expecting the old buggy behavior (and it wasn't noticed
because the 10*MSS output matches IW10, so it looked fine. But now
with the fix, I believe we are seeing the correct new behavior because
tcp_xmit_size_goal() is calling tcp_bound_to_half_wnd(), and now with
max_window fixed, the outgoing TSO skb is now the correct 5*MSS (half
of the rwin of 1). So the test results look good to me.

Thanks, Alexey, for tracking this down! :-)

neal


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Neal Cardwell
On Wed, Jan 18, 2017 at 1:27 PM, Yuchung Cheng  wrote:
> Hi Eric: yeah I think the test was just due to the TSO chunking
> difference between prod and upstream, which I was able to avoid with
> this patch re-suited by Neal in b/34128974. In fact, this patch
> enables me to run all recovery tests on upstream kernels for my RACK
> patch set.
>
> Neal: can we polish and check that in? super useful.

Yes, sounds good. Glad that was useful. I will work on polishing up
and checking in the patch to make packetdrill agnostic to TSO
segmentation by default (for upstream and Google).

neal


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Yuchung Cheng
Googler-only

Hi Eric: yeah I think the test was just due to the TSO chunking
difference between prod and upstream, which I was able to avoid with
this patch re-suited by Neal in b/34128974. In fact, this patch
enables me to run all recovery tests on upstream kernels for my RACK
patch set.

Neal: can we polish and check that in? super useful.


On Wed, Jan 18, 2017 at 10:16 AM, Eric Dumazet  wrote:
> On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng  wrote:
>> On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet  wrote:
>>>
>>> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
>>>  wrote:
>>> > Hi Eric,
>>> >
>>> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>>> >
>>>
>>> > Looks like max_window not correctly initialized for tfo sockets.
>>> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>>> >
>>> > This diff fixes the issue, the question: is this the right place to do it?
>>> >
>>> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
>>> > index 4e777a3..33ed508 100644
>>> > --- a/net/ipv4/tcp_fastopen.c
>>> > +++ b/net/ipv4/tcp_fastopen.c
>>> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
>>> > sock *sk,
>>> >  */
>>> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>>> >
>>> > +   tp->max_window = tp->snd_wnd;
>>> > +
>>>
>>> Excellent catch. Let me test our regression tests with this.
>> Indeed nice catch. Thanks for the investigative work!
>>
>
> We do have 2 failures, but tests might have depended on undocumented behavior
>
> (For googlers :
> Ran 211 tests: 209 passing, 0 flaky 2 failing
> Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
> )
>
> Please Alexey submit an official patch, thanks a lot !


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Eric Dumazet
On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng  wrote:
> On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet  wrote:
>>
>> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
>>  wrote:
>> > Hi Eric,
>> >
>> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>> >
>>
>> > Looks like max_window not correctly initialized for tfo sockets.
>> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>> >
>> > This diff fixes the issue, the question: is this the right place to do it?
>> >
>> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
>> > index 4e777a3..33ed508 100644
>> > --- a/net/ipv4/tcp_fastopen.c
>> > +++ b/net/ipv4/tcp_fastopen.c
>> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
>> > sock *sk,
>> >  */
>> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>> >
>> > +   tp->max_window = tp->snd_wnd;
>> > +
>>
>> Excellent catch. Let me test our regression tests with this.
> Indeed nice catch. Thanks for the investigative work!
>

We do have 2 failures, but tests might have depended on undocumented behavior

(For googlers :
Ran 211 tests: 209 passing, 0 flaky 2 failing
Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
)

Please Alexey submit an official patch, thanks a lot !


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Yuchung Cheng
On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet  wrote:
>
> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
>  wrote:
> > Hi Eric,
> >
> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
> >
>
> > Looks like max_window not correctly initialized for tfo sockets.
> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
> >
> > This diff fixes the issue, the question: is this the right place to do it?
> >
> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> > index 4e777a3..33ed508 100644
> > --- a/net/ipv4/tcp_fastopen.c
> > +++ b/net/ipv4/tcp_fastopen.c
> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> > sock *sk,
> >  */
> > tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
> >
> > +   tp->max_window = tp->snd_wnd;
> > +
>
> Excellent catch. Let me test our regression tests with this.
Indeed nice catch. Thanks for the investigative work!

>
> Thanks !


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Eric Dumazet
On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
 wrote:
> Hi Eric,
>
> On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>

> Looks like max_window not correctly initialized for tfo sockets.
> On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>
> This diff fixes the issue, the question: is this the right place to do it?
>
> diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> index 4e777a3..33ed508 100644
> --- a/net/ipv4/tcp_fastopen.c
> +++ b/net/ipv4/tcp_fastopen.c
> @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> sock *sk,
>  */
> tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>
> +   tp->max_window = tp->snd_wnd;
> +

Excellent catch. Let me test our regression tests with this.

Thanks !


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-18 Thread Alexey Kodanev

Hi Eric,

On 01/13/2017 08:07 PM, Alexey Kodanev wrote:


Hi Eric,
On 13.01.2017 18:35, Eric Dumazet wrote:


I would suggest to clamp MSS to half the initial window, but I guess
this is impractical since window in SYN/SYNACK are not scaled.


Looks like max_window not correctly initialized for tfo sockets.
On my test machine it has set to '5592320' in tcp_fastopen_create_child().

This diff fixes the issue, the question: is this the right place to do it?

diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4e777a3..33ed508 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct 
sock *sk,

 */
tp->snd_wnd = ntohs(tcp_hdr(skb)->window);

+   tp->max_window = tp->snd_wnd;
+
/* Activate the retrans timer so that SYNACK can be retransmitted.
 * The request socket is not added to the ehash
 * because it's been added to the accept queue directly.


Thanks,
Alexey



Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Eric Dumazet
On Fri, 2017-01-13 at 12:32 -0500, Neal Cardwell wrote:
> On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazet  wrote:
> >
> > On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev
> >  wrote:
> > > Hi Eric,
> > > On 13.01.2017 18:35, Eric Dumazet wrote:
> >
> > >> Care to send a packetdrill test so that we have a clear picture of what
> > >> is going on ?
> > >
> > > Is it capable of making two connections in the single test, one after
> > > another?
> >
> > Absolutely.
> >
> > Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill
> > template showing a typical fastopen test
> > running on an upstream kernel ?
> >
> > Thanks !
> 
> Sure, here is an example packetdrill script, IIRC written by Yuchung,
> which demonstrates TCP fast open and consecutive active connections:
> 
> `sysctl -q net.ipv4.tcp_timestamps=0`
> 
> // Cache warmup: send a Fast Open cookie request
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>+0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS
> (Operation is now in progress)
>+0 > S 0:0(0) 
> +.010 < S. 123:123(0) ack 1 win 5840  1040,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
>+0 > . 1:1(0) ack 1
>+0 close(3) = 0
>+0 > F. 1:1(0) ack 1
> +.010 < F. 1:1(0) ack 2 win 92
>+0 > .  2:2(0) ack 2
> 
> //
> // TEST1: Servers sends SYN-ACK with data and another two data packets
> //
>+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
>+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>+0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
>+0 > S 0:1000(1000)  abcd1234,nop,nop>
> +.010 < S. 100:1001400(1400) ack 1001 win 5840  1040,nop,nop,sackOK,nop,wscale 6>
>+0 < . 1401:2801(1400) ack 1001 win 257
>+0 < P. 2801:3001(200) ack 1001 win 257
> 
> neal

Thanks Neal

Also worth adding that packetdrill has the following option to tune the
MTU on the tun device :

--mtu=x





Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Neal Cardwell
On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazet  wrote:
>
> On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev
>  wrote:
> > Hi Eric,
> > On 13.01.2017 18:35, Eric Dumazet wrote:
>
> >> Care to send a packetdrill test so that we have a clear picture of what
> >> is going on ?
> >
> > Is it capable of making two connections in the single test, one after
> > another?
>
> Absolutely.
>
> Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill
> template showing a typical fastopen test
> running on an upstream kernel ?
>
> Thanks !

Sure, here is an example packetdrill script, IIRC written by Yuchung,
which demonstrates TCP fast open and consecutive active connections:

`sysctl -q net.ipv4.tcp_timestamps=0`

// Cache warmup: send a Fast Open cookie request
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS
(Operation is now in progress)
   +0 > S 0:0(0) 
+.010 < S. 123:123(0) ack 1 win 5840 
   +0 > . 1:1(0) ack 1
   +0 close(3) = 0
   +0 > F. 1:1(0) ack 1
+.010 < F. 1:1(0) ack 2 win 92
   +0 > .  2:2(0) ack 2

//
// TEST1: Servers sends SYN-ACK with data and another two data packets
//
   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
   +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
   +0 > S 0:1000(1000) 
+.010 < S. 100:1001400(1400) ack 1001 win 5840 
   +0 < . 1401:2801(1400) ack 1001 win 257
   +0 < P. 2801:3001(200) ack 1001 win 257

neal


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Eric Dumazet
On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev
 wrote:
> Hi Eric,
> On 13.01.2017 18:35, Eric Dumazet wrote:

>> Care to send a packetdrill test so that we have a clear picture of what
>> is going on ?
>
> Is it capable of making two connections in the single test, one after
> another?

Absolutely.

Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill
template showing a typical fastopen test
running on an upstream kernel ?

Thanks !


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Alexey Kodanev
Hi Eric,
On 13.01.2017 18:35, Eric Dumazet wrote:
> On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote:
>> Hi,
>>
>> Got the issue when running LTP/netstress test on localhost with mss
>> greater than the send window advertised by client (right after 3WHS).
>> Here is the testscenario that can reproduce this:
> Hi Alexey
>
> So this is a combination of Fastopen + small window + large MSS ?

Yeah, this happens only in the beginning, after first ack from client.
Later window gets
lager than mss and it doesn't happen.

>
> I would rather not force burning tons of coal or other fossil fuel,
> by making each tcp_sendmsg() done by billions of linux devices more
> expensive, only to accommodate for some LTP test doing something not
> sensible ;)
>
> Fact that you removed one condition in the BUG_ON() might hide another
> issue later in the path.
>
> I would suggest to clamp MSS to half the initial window, but I guess
> this is impractical since window in SYN/SYNACK are not scaled.
> Care to send a packetdrill test so that we have a clear picture of what
> is going on ?

Is it capable of making two connections in the single test, one after
another?

Thanks,
Alexey


Re: resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Eric Dumazet
On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote:
> Hi,
> 
> Got the issue when running LTP/netstress test on localhost with mss
> greater than the send window advertised by client (right after 3WHS).
> Here is the testscenario that can reproduce this:

Hi Alexey

So this is a combination of Fastopen + small window + large MSS ?

I would rather not force burning tons of coal or other fossil fuel,
by making each tcp_sendmsg() done by billions of linux devices more
expensive, only to accommodate for some LTP test doing something not
sensible ;)

Fact that you removed one condition in the BUG_ON() might hide another
issue later in the path.

I would suggest to clamp MSS to half the initial window, but I guess
this is impractical since window in SYN/SYNACK are not scaled.

Care to send a packetdrill test so that we have a clear picture of what
is going on ?

Thanks.




resend: tcp: performance issue with fastopen connections (mss > window)

2017-01-13 Thread Alexey Kodanev
Hi,

Got the issue when running LTP/netstress test on localhost with mss
greater than the send window advertised by client (right after 3WHS).
Here is the testscenario that can reproduce this:

TCP client is sending 32 bytes request, TCP server replies with 65KB answer.
net.ipv4.tcp_fastopen set to 3. Also notethat first TCP Fastopen connection
processed without delay as tcp_send_mss()setshalf of the window size
to the'size_goal' inside tcp_sendmsg().

Though on the 2nd and subsequent connections:

< S  seq 0:0 win 43690 options [mss 65495 wscale 7
 tfo cookie ac6246a51d5422fc] length 32
> S.seq 0:0ack 1win 43690 options [mss 65495wscale 7] length 0
<.  ack 1 win 342 length 0

Inside tcp_sendmsg(), tcp_send_mss() returns 65483 in 'mss_now',as well as
in 'size_goal'. This results the segment not queued for transmition
until all
data copied from userbuffer. Then, inside  __tcp_push_pending_frames() it
breaks on send window test,continue with the check probe timer, thus
introducing 200ms delay here.

Fragmentationoccurs in tcp_write_wakeup()...

+0.2> P. seq 1:43777 ack 1 win 342 length 43776
 <  . ack 43777, win 1365 length 0
 > P. seq 43777:65001 ack 1 win 342 optionslength 21224
 ...


Not sure what is the right fix for this, I guess we could limit size_goal
to the current window or mss, what is currently less, e.g:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4a04496..3d3bd97 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -860,7 +860,12 @@ static unsigned int tcp_xmit_size_goal(struct sock
*sk, u32 mss_now,
size_goal = tp->gso_segs * mss_now;
}

-   return max(size_goal, mss_now);
+   size_goal = max(size_goal, mss_now);
+
+   if (tp->snd_wnd > TCP_MSS_DEFAULT)
+   return min(tp->snd_wnd, size_goal);
+
+   return size_goal;
 }

 static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1d5331a..0ac133f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2445,7 +2445,7 @@ void tcp_push_one(struct sock *sk, unsigned int
mss_now)
 {
struct sk_buff *skb = tcp_send_head(sk);

-   BUG_ON(!skb || skb->len < mss_now);
+   BUG_ON(!skb);

tcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation);
 }


Any ideas?

Best regards,
Alexey