Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Wei Wang
> Note that I am not sure we correctly test that a second connect() can be
> done, and I am not sure kernel would check that the remote IP and
> destination port is the same.

> Ie what should happen for

> setsockopt(fd, SOL_TCP, TCP_FASTOPEN_CONNECT, , 4)
> connect(fd,  "1.2.3.4:80")
> connect(fd, "55.66.77.88:4000")

I wrote a simple code to test this scenario and the second connect()
returns EISCONN as well even though the destination IP is different.

On Wed, Jan 25, 2017 at 11:03 AM, David Miller  wrote:
> From: Wei Wang 
> Date: Wed, 25 Jan 2017 10:54:50 -0800
>
>>> Yes sorry David, for me it's OK. If Wei runs his whole series of tests
>>> on it again, it should be perfect.
>>
>> I just ran all the TFO related tests with Willy's patch on top of this
>> patch series.
>> And everything passes.
>
> Great, I'll apply everything, thanks.


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread David Miller
From: Wei Wang 
Date: Wed, 25 Jan 2017 10:54:50 -0800

>> Yes sorry David, for me it's OK. If Wei runs his whole series of tests
>> on it again, it should be perfect.
> 
> I just ran all the TFO related tests with Willy's patch on top of this
> patch series.
> And everything passes.

Great, I'll apply everything, thanks.


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Eric Dumazet
On Wed, 2017-01-25 at 10:54 -0800, Wei Wang wrote:
> > Yes sorry David, for me it's OK. If Wei runs his whole series of tests
> > on it again, it should be perfect.
> 
> I just ran all the TFO related tests with Willy's patch on top of this
> patch series.
> And everything passes.

Note that I am not sure we correctly test that a second connect() can be
done, and I am not sure kernel would check that the remote IP and
destination port is the same.

Ie what should happen for

setsockopt(fd, SOL_TCP, TCP_FASTOPEN_CONNECT, , 4)
connect(fd,  "1.2.3.4:80")
connect(fd, "55.66.77.88:4000")

This multiple connect() thing should have been forbidden in the first
place really.




Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Wei Wang
> Yes sorry David, for me it's OK. If Wei runs his whole series of tests
> on it again, it should be perfect.

I just ran all the TFO related tests with Willy's patch on top of this
patch series.
And everything passes.

On Wed, Jan 25, 2017 at 9:54 AM, Willy Tarreau  wrote:
> On Wed, Jan 25, 2017 at 12:22:05PM -0500, David Miller wrote:
>> From: Wei Wang 
>> Date: Wed, 25 Jan 2017 09:15:34 -0800
>>
>> > Looks like you sent a separate patch on top of this patch series to
>> > address double connect().  Then I think this patch series should be
>> > good to go.
>>
>> Indeed, Willy please give some kind of ACK.
>
> Yes sorry David, for me it's OK. If Wei runs his whole series of tests
> on it again, it should be perfect.
>
> thanks,
> Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Willy Tarreau
On Wed, Jan 25, 2017 at 12:22:05PM -0500, David Miller wrote:
> From: Wei Wang 
> Date: Wed, 25 Jan 2017 09:15:34 -0800
> 
> > Looks like you sent a separate patch on top of this patch series to
> > address double connect().  Then I think this patch series should be
> > good to go.
> 
> Indeed, Willy please give some kind of ACK.

Yes sorry David, for me it's OK. If Wei runs his whole series of tests
on it again, it should be perfect.

thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Willy Tarreau
Hi Wei,

On Wed, Jan 25, 2017 at 09:15:34AM -0800, Wei Wang wrote:
> Willy,
> 
> Looks like you sent a separate patch on top of this patch series to address
> double connect().

Yes, sorry, I wanted to reply to this thread after the git-send-email
and got caught immediately after :-)

So as suggested by Eric in order to make the review easier, it was done
on top of your series.

> Then I think this patch series should be good to go.
> I will get your patch tested with our TFO test cases.

I think so as well. Thanks for running the tests. On my side I could fix
the haproxy bug which triggered this, and could verify the the whole
series works fine both with and without the haproxy fix. So I think we're
good now.

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread David Miller
From: Wei Wang 
Date: Wed, 25 Jan 2017 09:15:34 -0800

> Looks like you sent a separate patch on top of this patch series to
> address double connect().  Then I think this patch series should be
> good to go.

Indeed, Willy please give some kind of ACK.

Thanks.


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Willy Tarreau
On Tue, Jan 24, 2017 at 10:51:25AM -0800, Eric Dumazet wrote:
> We do not return -1 / EINPROGRESS but 0
> 
> Do not call connect() twice, it is clearly not supposed to work.

Yes it is, it normally returns -1 / EISCONN on a regular socket :

  EISCONN
  The socket is already connected.

> Fact that it happened to work is still kept for applications not using
> new features (like TCP_FASTOPEN_CONNECT), we wont break this.

Sure but as we saw, deeply burried silent bugs having no effect in
existing applications can suddenly become problematic once TFO is
enabled, and the semantics difference between the two are minimal
enough to warrant being closed.

> I would prefer you submit _if_ needed a patch on top of Wei patch, which
> was carefully tested with our ~500 packetdrill tests.

I totally understand and rest assured that I have a great respect for
this amount of test, which is also why I find the feature really exciting.
I'll probably propose something involving an extra argument then, this
will be much easier to review in the perspective of the existing tests.

> TCP_FASTOPEN_CONNECT + connect() returning 0 is already a violation of
> past behavior, in the sense that no connection really happened yet.

I agree but semantically it could be considered that it means "connect()
already called successfully, feel free to proceed with send() whenever
you want" and that's why it's appealing ;-)

> An application exploiting this return value and consider the server is
> reachable would be mistaken.

100% agree, I even had a private discussion regarding this, mentionning
that I already added a test in haproxy to only enable it if there are
data scheduled for leaving. In my case it's easy because I already have
the same test to decide whether or not to disable TCP_QUICKACK to save
one packet by sending the payload with the first ACK. So in short it will
be :

 if (data) {
  if (disable_quick_ack)
   setsockopt(fd, SOL_TCP, TCP_QUICKACK, , sizeof());
  if (enable_fastopen)
   setsockopt(fd, SOL_TCP, TCP_FASTOPEN_CONNECT, , 
sizeof());
 }
 connect(fd, ...);

But I certainly understand that in some implementations it's could be
trickier. That just reminds me that I haven't tested it combined with
splicing. I'll have to try this.

> We do not support connect() + TCP_FASTOPEN_CONNECT + read(), because we
> do not want to add yet another conditional test in recvmsg() fast path
> for such feature, while existing sendmsg() can already be used to send a
> SYN with FastOpen option.

Yes, I think the mechanism is complex enough internally not to try to
make it even more complex :-)

> So people are not expected to blindly add TCP_FASTOPEN_CONNECT to every
> TCP socket they allocate/use. This would add too much bloat to the
> kernel.

I really think that the true benefit of TFO is for HTTP and SSL where
the client speaks first and already has something to say when the decision
to connect is made. It should be clear in implementors' minds that it
cannot be a default setting and that it doesn't make sense.

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Eric Dumazet
On Tue, 2017-01-24 at 19:34 +0100, Willy Tarreau wrote:
> Hi Eric,
> 
> On Tue, Jan 24, 2017 at 09:44:49AM -0800, Eric Dumazet wrote:
> > I believe there is a bug in this application.
> > 
> > It does not check connect() return value.
> 
> Yes in fact it does but I noticed the same thing, there's something causing
> the event not to be registered or something like this.
> 
> > When 0 is returned, it makes no sense to wait 200 ms :
> > 
> > > 06:29:24.048593 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, 
> > > {u32=9, u64=9}}) = 0
> > > 06:29:24.048651 epoll_wait(3, [], 200, 0) = 0
> > > 06:29:24.048699 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> > 
> > And it makes no sense to call connect() again :
> > 
> > > 06:29:24.048751 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> > > sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
> > > S (Operation now in progress)
> 
> I totally agree.
> 
> > man connect
> > 
> > 
> > Generally,  connection-based protocol sockets may successfully connect()
> > only once;
> > 
> > 
> > 
> > I would prefer we do not add yet another bit in tcp kernel sockets, to
> > work around some oddity in your program Willy.
> 
> I'm fine with chasing the bug on my side and fixing it, but there's a
> semantic trouble anyway with returning -EINPROGRESS :
>   - connect() = 0 indicates that the connection is established
>   - then a further connect() should return -EISCONN, and does so when
> not using TFO
> 
> man connect says this regarding EINPROGRESS :
> 
> The socket is nonblocking and the connection cannot be completed 
> immediately.
> It is possible to  select(2)  or  poll(2)  for completion  by  selecting  
> the
> socket  for  writing.  After  select(2) indicates writability, use 
> getsockopt(2)
> to read the SO_ERROR option at level SOL_SOCKET to determine whether 
> connect()
> completed successfully (SO_ERROR is  zero)  or  unsuccess-fully (SO_ERROR 
> is
> one of the usual error codes listed here, explaining the reason for the 
> failure).
> 
> Here we clearly have an incompatibility between this EINPROGRESS saying
> that we must poll, and poll returning POLLOUT suggesting that it's now
> OK.
> 
> I'm totally fine with not using an extra bit in a scarce area, but then
> we can either add an extra argument to __inet_stream_connect() to say
> "this is sendmsg" or just add an extra flag in the last argument.
> 
> But in general I don't feel comfortable with a semantics that doesn't
> completely match the current and documented one :-/
> 
> Thanks,
> Willy


We do not return -1 / EINPROGRESS but 0

Do not call connect() twice, it is clearly not supposed to work.

Fact that it happened to work is still kept for applications not using
new features (like TCP_FASTOPEN_CONNECT), we wont break this.

I would prefer you submit _if_ needed a patch on top of Wei patch, which
was carefully tested with our ~500 packetdrill tests.



TCP_FASTOPEN_CONNECT + connect() returning 0 is already a violation of
past behavior, in the sense that no connection really happened yet.

An application exploiting this return value and consider the server is
reachable would be mistaken.

We do not support connect() + TCP_FASTOPEN_CONNECT + read(), because we
do not want to add yet another conditional test in recvmsg() fast path
for such feature, while existing sendmsg() can already be used to send a
SYN with FastOpen option.

So people are not expected to blindly add TCP_FASTOPEN_CONNECT to every
TCP socket they allocate/use. This would add too much bloat to the
kernel.




Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Willy Tarreau
On Tue, Jan 24, 2017 at 09:42:07AM -0800, Eric Dumazet wrote:
> On Tue, 2017-01-24 at 09:26 -0800, Yuchung Cheng wrote:
> 
> > >
> > > Do you think there's a compelling reason for adding a new option or
> > > are you interested in a small patch to perform the change above ?
> > I like the proposal especially other stack also uses TCP_FASTOPEN
> > https://msdn.microsoft.com/en-us/library/windows/desktop/ms738596(v=vs.85).aspx
> 
> 
> Problem is that might break existing applications that were using
> TCP_FASTOPEN before a connect() (it was a NOP until now)
> 
> I prefer we use a separate new option to be 100% safe, not adding
> regressions.
> 
> Only new applications, tested, will use this new feature at their risk.

That's indeed a good point. I Yuchung's comment above made me wonder
about application's portability but very few OSes will use this and in
the end it might be that portable applications will just add :

   #define TCP_FASTOPEN_CONNECT TCP_FASTOPEN

For other OSes and use TCP_FASTOPEN_CONNECT only for the connect() case.

Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Willy Tarreau
Hi Eric,

On Tue, Jan 24, 2017 at 09:44:49AM -0800, Eric Dumazet wrote:
> I believe there is a bug in this application.
> 
> It does not check connect() return value.

Yes in fact it does but I noticed the same thing, there's something causing
the event not to be registered or something like this.

> When 0 is returned, it makes no sense to wait 200 ms :
> 
> > 06:29:24.048593 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, 
> > u64=9}}) = 0
> > 06:29:24.048651 epoll_wait(3, [], 200, 0) = 0
> > 06:29:24.048699 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> 
> And it makes no sense to call connect() again :
> 
> > 06:29:24.048751 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> > sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
> > S (Operation now in progress)

I totally agree.

> man connect
> 
> 
> Generally,  connection-based protocol sockets may successfully connect()
> only once;
> 
> 
> 
> I would prefer we do not add yet another bit in tcp kernel sockets, to
> work around some oddity in your program Willy.

I'm fine with chasing the bug on my side and fixing it, but there's a
semantic trouble anyway with returning -EINPROGRESS :
  - connect() = 0 indicates that the connection is established
  - then a further connect() should return -EISCONN, and does so when
not using TFO

man connect says this regarding EINPROGRESS :

The socket is nonblocking and the connection cannot be completed 
immediately.
It is possible to  select(2)  or  poll(2)  for completion  by  selecting  
the
socket  for  writing.  After  select(2) indicates writability, use 
getsockopt(2)
to read the SO_ERROR option at level SOL_SOCKET to determine whether 
connect()
completed successfully (SO_ERROR is  zero)  or  unsuccess-fully (SO_ERROR is
one of the usual error codes listed here, explaining the reason for the 
failure).

Here we clearly have an incompatibility between this EINPROGRESS saying
that we must poll, and poll returning POLLOUT suggesting that it's now
OK.

I'm totally fine with not using an extra bit in a scarce area, but then
we can either add an extra argument to __inet_stream_connect() to say
"this is sendmsg" or just add an extra flag in the last argument.

But in general I don't feel comfortable with a semantics that doesn't
completely match the current and documented one :-/

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Eric Dumazet
On Mon, 2017-01-23 at 22:16 +0100, Willy Tarreau wrote:
> Hi Wei,
> 
> first, thanks a lot for doing this, it's really awesome!
> 
> I'm testing it on 4.9 on haproxy and I met a corner case : when I
> perform a connect() to a server and I have nothing to send, upon
> POLLOUT notification since I have nothing to send I simply probe the
> connection using connect() again to see if it returns EISCONN or
> anything else. But here now I'm seeing EINPROGRESS loops.
> 
> To illustrate this, here's what I'm doing :
> 
> :8000  :8001
>   [ client ] ---> [ proxy ] ---> [ server ]
> 
> The proxy is configured to enable TFO to the server and the server
> supports TFO as well. The proxy and the server are in fact two proxy
> instances in haproxy running in the same process for convenience.
> 
> When I already have data to send here's what I'm seeing (so it works fine) :
> 
> 06:29:16.861190 accept4(7, {sa_family=AF_INET, sin_port=htons(33986), 
> sin_addr=inet_addr("192.168.0.176")}, [128->16], SOCK
> _NONBLOCK) = 9
> 06:29:16.861277 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 06:29:16.861342 accept4(7, 0x7ffd0d794430, [128], SOCK_NONBLOCK) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:16.861417 recvfrom(9, "BLAH\n", 7006, 0, NULL, NULL) = 5
> 06:29:16.861509 recvfrom(9, 0x2619329, 7001, 0, NULL, NULL) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:16.861657 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
> 06:29:16.861730 fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 06:29:16.861779 setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 06:29:16.861882 setsockopt(10, SOL_TCP, 0x1e /* TCP_??? */, [1], 4) = 0
> 06:29:16.861942 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> 06:29:16.862015 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, 
> u64=9}}) = 0
> 06:29:16.862072 epoll_wait(3, [], 200, 0) = 0
> 06:29:16.862126 sendto(10, "BLAH\n", 5, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 
> 5
> 06:29:16.862281 epoll_wait(3, [{EPOLLIN, {u32=8, u64=8}}], 200, 0) = 1
> 06:29:16.862334 recvfrom(10, 0x26173a4, 8030, 0, NULL, NULL) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:16.862385 accept4(8, {sa_family=AF_INET, sin_port=htons(46760), 
> sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NON
> BLOCK) = 11
> 06:29:16.862450 setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 06:29:16.862504 accept4(8, 0x7ffd0d794430, [128], SOCK_NONBLOCK) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:16.862564 recvfrom(11, "BLAH\n", 7006, 0, NULL, NULL) = 5
> 
> 
> When I don't have data, here's what I'm seeing :
> 
> 06:29:24.047801 accept4(7, {sa_family=AF_INET, sin_port=htons(33988), 
> sin_addr=inet_addr("192.168.0.176")}, [128->16], SOCK
> _NONBLOCK) = 9
> 06:29:24.047899 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 06:29:24.047966 accept4(7, 0x7ffdedb2c7f0, [128], SOCK_NONBLOCK) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:24.048043 recvfrom(9, 0xd31324, 7006, 0, NULL, NULL) = -1 EAGAIN 
> (Resource temporarily unavailable)
> 06:29:24.048281 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
> 06:29:24.048342 fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 06:29:24.048392 setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 06:29:24.048447 setsockopt(10, SOL_TCP, 0x1e /* TCP_??? */, [1], 4) = 0
> 06:29:24.048508 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0

I believe there is a bug in this application.

It does not check connect() return value.

When 0 is returned, it makes no sense to wait 200 ms :

> 06:29:24.048593 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, 
> u64=9}}) = 0
> 06:29:24.048651 epoll_wait(3, [], 200, 0) = 0
> 06:29:24.048699 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

And it makes no sense to call connect() again :

> 06:29:24.048751 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
> S (Operation now in progress)

man connect


Generally,  connection-based protocol sockets may successfully connect()
only once;



I would prefer we do not add yet another bit in tcp kernel sockets, to
work around some oddity in your program Willy.

> 06:29:24.048808 epoll_ctl(3, EPOLL_CTL_ADD, 10, {EPOLLOUT, {u32=10, u64=10}}) 
> = 0
> 06:29:24.048860 epoll_wait(3, [{EPOLLOUT, {u32=10, u64=10}}], 200, 1000) = 1
> 06:29:24.048912 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> 06:29:24.048963 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
> S (Operation now in progress)
> 06:29:24.049018 epoll_wait(3, [{EPOLLOUT, {u32=10, u64=10}}], 200, 1000) = 1
> 06:29:24.049072 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> 06:29:24.049122 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
> S (Operation now in progress)
> 
> 
> I theorically 

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Eric Dumazet
On Tue, 2017-01-24 at 09:26 -0800, Yuchung Cheng wrote:

> >
> > Do you think there's a compelling reason for adding a new option or
> > are you interested in a small patch to perform the change above ?
> I like the proposal especially other stack also uses TCP_FASTOPEN
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms738596(v=vs.85).aspx


Problem is that might break existing applications that were using
TCP_FASTOPEN before a connect() (it was a NOP until now)

I prefer we use a separate new option to be 100% safe, not adding
regressions.

Only new applications, tested, will use this new feature at their risk.





Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-24 Thread Yuchung Cheng
On Mon, Jan 23, 2017 at 11:30 PM, Willy Tarreau  wrote:
>
> On Mon, Jan 23, 2017 at 10:59:22AM -0800, Wei Wang wrote:
> > This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
> > alternative way to perform Fast Open on the active side (client).
>
> Wei, I think that nothing prevents from reusin the original TCP_FASTOPEN
> sockopt instead of adding a new one. The original one does this :
>
> case TCP_FASTOPEN:
> if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
> TCPF_LISTEN))) {
> tcp_fastopen_init_key_once(true);
>
> fastopen_queue_tune(sk, val);
> } else {
> err = -EINVAL;
> }
> break;
>
> and your new option does this :
>
> case TCP_FASTOPEN_CONNECT:
> if (val > 1 || val < 0) {
> err = -EINVAL;
> } else if (sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
> if (sk->sk_state == TCP_CLOSE)
> tp->fastopen_connect = val;
> else
> err = -EINVAL;
> } else {
> err = -EOPNOTSUPP;
> }
> break;
>
> Now if we compare :
>   - the value ranges are the same (0,1)
>   - tcp_fastopen_init_key_once() only performs an initialization once
>   - fastopen_queue_tune() only sets sk->max_qlen based on the backlog,
> this has no effect on an outgoing connection ;
>   - tp->fastopen_connect can be applied to a listening socket without
> side effect.
>
> Thus I think we can merge them this way :
>
> case TCP_FASTOPEN:
> if (val >= 0) {
> if ((sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) &&
> (sk->sk_state == TCP_CLOSE)
> tp->fastopen_connect = val;
>
> if ((1 << sk->sk_state) & (TCPF_CLOSE | 
> TCPF_LISTEN))) {
> tcp_fastopen_init_key_once(true);
> fastopen_queue_tune(sk, val);
> }
> } else {
> err = -EINVAL;
> }
> break;
>
> And for the userland, the API is even simpler because we can use the
> same TCP_FASTOPEN sockopt regardless of the socket direction. Also,
> I don't know if TCP_FASTOPEN is supported on simultaneous connect,
> but at least if it works it would be easier to understand this way.
It supports partially (i.e. send SYN data but not accept simul.
SYN-data crossing).
Here is the snippet of packetdrill test we use internally:

`sysctl net.ipv4.tcp_timestamps=0`

// Cache warmup: send a Fast Open cookie request
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
0.000 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS
(Operation is now in progress)
0.000 > S 0:0(0) 
0.010 < S. 123:123(0) ack 1 win 14600 
0.010 > . 1:1(0) ack 1
0.020 close(3) = 0
0.020 > F. 1:1(0) ack 1
0.030 < F. 1:1(0) ack 2 win 92
0.030 > .  2:2(0) ack 2


//
// Test: simulatenous fast open
//
+.010 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+.000 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+.000 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
+.000 > S 0:1000(1000) 
// Simul. SYN-data crossing: we don't support that yet so ack only remote ISN
+.005 < S 1234:1734(500) win 14600 
+.000 > S. 0:0(0) ack 1235 

// SYN data is never retried.
+.045 < S. 1234:1234(0) ack 1001 win 14600 
+.000 > . 1001:1001(0) ack 1
// The other end retries
+.100 < P. 1:501(500) ack 1000 win 257
+.000 > . 1001:1001(0) ack 501
+.000 read(4, ..., 4096) = 500
+.000 close(4) = 0
+.000 > F. 1001:1001(0) ack 501
+.050 < F. 501:501(0) ack 1002 win 257
+.000 > . 1002:1002(0) ack 502


>
> Do you think there's a compelling reason for adding a new option or
> are you interested in a small patch to perform the change above ?
I like the proposal especially other stack also uses TCP_FASTOPEN
https://msdn.microsoft.com/en-us/library/windows/desktop/ms738596(v=vs.85).aspx


>
> Regards,
> Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
On Mon, Jan 23, 2017 at 10:59:22AM -0800, Wei Wang wrote:
> This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
> alternative way to perform Fast Open on the active side (client).

Wei, I think that nothing prevents from reusin the original TCP_FASTOPEN
sockopt instead of adding a new one. The original one does this :

case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
TCPF_LISTEN))) {
tcp_fastopen_init_key_once(true);

fastopen_queue_tune(sk, val);
} else {
err = -EINVAL;
}
break;

and your new option does this :

case TCP_FASTOPEN_CONNECT:
if (val > 1 || val < 0) {
err = -EINVAL;
} else if (sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
if (sk->sk_state == TCP_CLOSE)
tp->fastopen_connect = val;
else
err = -EINVAL;
} else {
err = -EOPNOTSUPP;
}
break;

Now if we compare :
  - the value ranges are the same (0,1)
  - tcp_fastopen_init_key_once() only performs an initialization once
  - fastopen_queue_tune() only sets sk->max_qlen based on the backlog,
this has no effect on an outgoing connection ;
  - tp->fastopen_connect can be applied to a listening socket without
side effect.

Thus I think we can merge them this way :

case TCP_FASTOPEN:
if (val >= 0) {
if ((sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) &&
(sk->sk_state == TCP_CLOSE)
tp->fastopen_connect = val;

if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) {
tcp_fastopen_init_key_once(true);
fastopen_queue_tune(sk, val);
}
} else {
err = -EINVAL;
}
break;

And for the userland, the API is even simpler because we can use the
same TCP_FASTOPEN sockopt regardless of the socket direction. Also,
I don't know if TCP_FASTOPEN is supported on simultaneous connect,
but at least if it works it would be easier to understand this way.

Do you think there's a compelling reason for adding a new option or
are you interested in a small patch to perform the change above ?

Regards,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
On Mon, Jan 23, 2017 at 02:57:31PM -0800, Wei Wang wrote:
> Yes. That seems to be a valid fix to it.
> Let me try it with my existing test cases as well to see if it works for
> all scenarios I have.

Perfect. Note that since the state 2 is transient I initially thought
about abusing the flags passed to __inet_stream_connect() to say "hey
I'm sendmsg() and not connect()" but that would have been a ugly hack
while here we really have the 3 socket states represented eventhough
one changes twice around a call.

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
On Mon, Jan 23, 2017 at 11:01:21PM +0100, Willy Tarreau wrote:
> On Mon, Jan 23, 2017 at 10:37:32PM +0100, Willy Tarreau wrote:
> > On Mon, Jan 23, 2017 at 01:28:53PM -0800, Wei Wang wrote:
> > > Hi Willy,
> > > 
> > > True. If you call connect() multiple times on a socket which already has
> > > cookie without a write(), the second and onward connect() call will return
> > > EINPROGRESS.
> > > It is basically because the following code block in 
> > > __inet_stream_connect()
> > > can't distinguish if it is the first time connect() is called or not:
> > > 
> > > case SS_CONNECTING:
> > > if (inet_sk(sk)->defer_connect)  <- defer_connect will
> > > be 0 only after a write() is called
> > > err = -EINPROGRESS;
> > > else
> > > err = -EALREADY;
> > > /* Fall out of switch with err, set for this state */
> > > break;
> > 
> > Ah OK that totally makes sense, thanks for the explanation!
> > 
> > > I guess we can add some extra logic here to address this issue. So the
> > > second connect() and onwards will return EALREADY.
> 
> Thinking about it a bit more, I really think it would make more
> sense to return -EISCONN here if we want to match the semantics
> of a connect() returning zero on the first call. This way the
> caller knows it can write whenever it wants and can disable
> write polling until needed.
> 
> I'm currently rebuilding a kernel with this change to see if it
> behaves any better :
> 
> -err = -EINPROGRESS;
> +err = -EISCONN;

OK so obviously it didn't work since sendmsg() goes there as well.

But that made me realize that there really are 3 states, not 2 :

  - after connect() and before sendmsg() :
 defer_accept = 1, we want to lie to userland and pretend we're
 connected so that userland can call send(). A connect() must
 return either zero or -EISCONN.

  - during first sendmsg(), still connecting :
 the connection is in progress, EINPROGRESS must be returned to
 the first sendmsg().

  - after the first sendmsg() :
 defer_accept = 0 ; connect() must return -EALREADY. We want to
 return real socket states from now on.

Thus I modified defer_accept to take two bits to represent the extra
state we need to indicate the transition. Now sendmsg() upgrades
defer_accept from 1 to 2 before calling __inet_stream_connect(), which
then knows it must return EINPROGRESS to sendmsg().

This way we correctly cover all these situations. Even if we call
connect() again after the first connect() attempt it still matches
the first result :

accept4(7, {sa_family=AF_INET, sin_port=htons(36860), 
sin_addr=inet_addr("192.168.0.176")}, [128->16], SOCK_NONBLOCK) = 1
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(7, 0x7ffc2282fcb0, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(1, 0x7b53a4, 7006, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily 
unavailable)
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
setsockopt(2, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(2, SOL_TCP, 0x1e /* TCP_??? */, [1], 4) = 0
connect(2, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
epoll_wait(0, [], 200, 0)   = 0
connect(2, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EISCONN (Transport endpoint is 
already connected)
epoll_wait(0, [], 200, 0)   = 0
recvfrom(2, 0x7b53a4, 8030, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily 
unavailable)
epoll_ctl(0, EPOLL_CTL_ADD, 2, {EPOLLIN|EPOLLRDHUP, {u32=2, u64=2}}) = 0
epoll_wait(0, [], 200, 1000)= 0
epoll_wait(0, [], 200, 1000)= 0
epoll_wait(0, [], 200, 1000)= 0
epoll_wait(0, [{EPOLLIN, {u32=1, u64=1}}], 200, 1000) = 1
recvfrom(1, "GET / HTTP/1.1\r\n", 8030, 0, NULL, NULL) = 16
sendto(2, "GET / HTTP/1.1\r\n", 16, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 16
epoll_wait(0, [{EPOLLIN, {u32=1, u64=1}}], 200, 1000) = 1
recvfrom(1, "\r\n", 8030, 0, NULL, NULL) = 2
sendto(2, "\r\n", 2, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 2
epoll_wait(0, [{EPOLLIN|EPOLLRDHUP, {u32=2, u64=2}}], 200, 1000) = 1
recvfrom(2, "HTTP/1.1 302 Found\r\nCache-Contro"..., 8030, 0, NULL, NULL) = 98
sendto(1, "HTTP/1.1 302 Found\r\nCache-Contro"..., 98, 
MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 98
shutdown(1, SHUT_WR)= 0
epoll_ctl(0, EPOLL_CTL_DEL, 2, 0x6ff55c) = 0
epoll_wait(0, [{EPOLLIN|EPOLLHUP|EPOLLRDHUP, {u32=1, u64=1}}], 200, 1000) = 1
recvfrom(1, "", 8030, 0, NULL, NULL)= 0
close(1)= 0
shutdown(2, SHUT_WR)= 0
close(2)= 0

Here's what I changed on top of your patchset :

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
On Mon, Jan 23, 2017 at 10:37:32PM +0100, Willy Tarreau wrote:
> On Mon, Jan 23, 2017 at 01:28:53PM -0800, Wei Wang wrote:
> > Hi Willy,
> > 
> > True. If you call connect() multiple times on a socket which already has
> > cookie without a write(), the second and onward connect() call will return
> > EINPROGRESS.
> > It is basically because the following code block in __inet_stream_connect()
> > can't distinguish if it is the first time connect() is called or not:
> > 
> > case SS_CONNECTING:
> > if (inet_sk(sk)->defer_connect)  <- defer_connect will
> > be 0 only after a write() is called
> > err = -EINPROGRESS;
> > else
> > err = -EALREADY;
> > /* Fall out of switch with err, set for this state */
> > break;
> 
> Ah OK that totally makes sense, thanks for the explanation!
> 
> > I guess we can add some extra logic here to address this issue. So the
> > second connect() and onwards will return EALREADY.

Thinking about it a bit more, I really think it would make more
sense to return -EISCONN here if we want to match the semantics
of a connect() returning zero on the first call. This way the
caller knows it can write whenever it wants and can disable
write polling until needed.

I'm currently rebuilding a kernel with this change to see if it
behaves any better :

-err = -EINPROGRESS;
+err = -EISCONN;

I'll keep you updated.

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
On Mon, Jan 23, 2017 at 01:28:53PM -0800, Wei Wang wrote:
> Hi Willy,
> 
> True. If you call connect() multiple times on a socket which already has
> cookie without a write(), the second and onward connect() call will return
> EINPROGRESS.
> It is basically because the following code block in __inet_stream_connect()
> can't distinguish if it is the first time connect() is called or not:
> 
> case SS_CONNECTING:
> if (inet_sk(sk)->defer_connect)  <- defer_connect will
> be 0 only after a write() is called
> err = -EINPROGRESS;
> else
> err = -EALREADY;
> /* Fall out of switch with err, set for this state */
> break;

Ah OK that totally makes sense, thanks for the explanation!

> I guess we can add some extra logic here to address this issue. So the
> second connect() and onwards will return EALREADY.

If that's possible at little cost it would be nice, because your patch
makes it so easy to enable TFO on outgoing connections now that I
expect many people will blindly run the setsockopt() before connect().

Do not hesitate to ask me to run some tests. While 4 years ago it was
not easy, here it's very simple for me. By the way I'm seeing an ~10%
performance increase on haproxy by enabling this, it's really cool!

Thanks,
Willy


Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Willy Tarreau
Hi Wei,

first, thanks a lot for doing this, it's really awesome!

I'm testing it on 4.9 on haproxy and I met a corner case : when I
perform a connect() to a server and I have nothing to send, upon
POLLOUT notification since I have nothing to send I simply probe the
connection using connect() again to see if it returns EISCONN or
anything else. But here now I'm seeing EINPROGRESS loops.

To illustrate this, here's what I'm doing :

:8000  :8001
  [ client ] ---> [ proxy ] ---> [ server ]

The proxy is configured to enable TFO to the server and the server
supports TFO as well. The proxy and the server are in fact two proxy
instances in haproxy running in the same process for convenience.

When I already have data to send here's what I'm seeing (so it works fine) :

06:29:16.861190 accept4(7, {sa_family=AF_INET, sin_port=htons(33986), 
sin_addr=inet_addr("192.168.0.176")}, [128->16], SOCK
_NONBLOCK) = 9
06:29:16.861277 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
06:29:16.861342 accept4(7, 0x7ffd0d794430, [128], SOCK_NONBLOCK) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:16.861417 recvfrom(9, "BLAH\n", 7006, 0, NULL, NULL) = 5
06:29:16.861509 recvfrom(9, 0x2619329, 7001, 0, NULL, NULL) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:16.861657 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
06:29:16.861730 fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
06:29:16.861779 setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
06:29:16.861882 setsockopt(10, SOL_TCP, 0x1e /* TCP_??? */, [1], 4) = 0
06:29:16.861942 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
06:29:16.862015 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, 
u64=9}}) = 0
06:29:16.862072 epoll_wait(3, [], 200, 0) = 0
06:29:16.862126 sendto(10, "BLAH\n", 5, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 5
06:29:16.862281 epoll_wait(3, [{EPOLLIN, {u32=8, u64=8}}], 200, 0) = 1
06:29:16.862334 recvfrom(10, 0x26173a4, 8030, 0, NULL, NULL) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:16.862385 accept4(8, {sa_family=AF_INET, sin_port=htons(46760), 
sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NON
BLOCK) = 11
06:29:16.862450 setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
06:29:16.862504 accept4(8, 0x7ffd0d794430, [128], SOCK_NONBLOCK) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:16.862564 recvfrom(11, "BLAH\n", 7006, 0, NULL, NULL) = 5


When I don't have data, here's what I'm seeing :

06:29:24.047801 accept4(7, {sa_family=AF_INET, sin_port=htons(33988), 
sin_addr=inet_addr("192.168.0.176")}, [128->16], SOCK
_NONBLOCK) = 9
06:29:24.047899 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
06:29:24.047966 accept4(7, 0x7ffdedb2c7f0, [128], SOCK_NONBLOCK) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:24.048043 recvfrom(9, 0xd31324, 7006, 0, NULL, NULL) = -1 EAGAIN 
(Resource temporarily unavailable)
06:29:24.048281 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
06:29:24.048342 fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
06:29:24.048392 setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
06:29:24.048447 setsockopt(10, SOL_TCP, 0x1e /* TCP_??? */, [1], 4) = 0
06:29:24.048508 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
06:29:24.048593 epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, 
u64=9}}) = 0
06:29:24.048651 epoll_wait(3, [], 200, 0) = 0
06:29:24.048699 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
06:29:24.048751 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
S (Operation now in progress)
06:29:24.048808 epoll_ctl(3, EPOLL_CTL_ADD, 10, {EPOLLOUT, {u32=10, u64=10}}) = 0
06:29:24.048860 epoll_wait(3, [{EPOLLOUT, {u32=10, u64=10}}], 200, 1000) = 1
06:29:24.048912 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
06:29:24.048963 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
S (Operation now in progress)
06:29:24.049018 epoll_wait(3, [{EPOLLOUT, {u32=10, u64=10}}], 200, 1000) = 1
06:29:24.049072 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
06:29:24.049122 connect(10, {sa_family=AF_INET, sin_port=htons(8001), 
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRES
S (Operation now in progress)


I theorically understand why but I think we have something wrong here
and instead we should have -1 EISCONN (to pretend the connection is
established) or return EALREADY (to mention that a previous request was
already made and that we're waiting for the next step).

While I can instrument my connect() *not* to use TFO when connecting
without any pending data, I don't always know this (eg when I use
openssl and cross fingers so that it decides to quickly send something
on the next round).

I think it's easy to fall into this tricky corner case and am wondering
what can be done about it. Does the EINPROGRESS happen only because there
is no cookie yet ? If so, 

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Yuchung Cheng
On Mon, Jan 23, 2017 at 10:59 AM, Wei Wang  wrote:
> From: Wei Wang 
>
> This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
> alternative way to perform Fast Open on the active side (client). Prior
> to this patch, a client needs to replace the connect() call with
> sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
> to use Fast Open: these socket operations are often done in lower layer
> libraries used by many other applications. Changing these libraries
> and/or the socket call sequences are not trivial. A more convenient
> approach is to perform Fast Open by simply enabling a socket option when
> the socket is created w/o changing other socket calls sequence:
>   s = socket()
> create a new socket
>   setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
> newly introduced sockopt
> If set, new functionality described below will be used.
> Return ENOTSUPP if TFO is not supported or not enabled in the
> kernel.
>
>   connect()
> With cookie present, return 0 immediately.
> With no cookie, initiate 3WHS with TFO cookie-request option and
> return -1 with errno = EINPROGRESS.
>
>   write()/sendmsg()
> With cookie present, send out SYN with data and return the number of
> bytes buffered.
> With no cookie, and 3WHS not yet completed, return -1 with errno =
> EINPROGRESS.
> No MSG_FASTOPEN flag is needed.
>
>   read()
> Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
> write() is not called yet.
> Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
> established but no msg is received yet.
> Return number of bytes read if socket is established and there is
> msg received.
>
> The new API simplifies life for applications that always perform a write()
> immediately after a successful connect(). Such applications can now take
> advantage of Fast Open by merely making one new setsockopt() call at the time
> of creating the socket. Nothing else about the application's socket call
> sequence needs to change.
>
> Signed-off-by: Wei Wang 
> ---
Acked-by: Yuchung Cheng 

Thanks for making this happen.

>  include/linux/tcp.h  |  3 ++-
>  include/net/inet_sock.h  |  6 +-
>  include/net/tcp.h|  1 +
>  include/uapi/linux/tcp.h |  1 +
>  net/ipv4/af_inet.c   | 31 ---
>  net/ipv4/tcp.c   | 35 ++-
>  net/ipv4/tcp_fastopen.c  | 33 +
>  net/ipv4/tcp_ipv4.c  |  7 ++-
>  net/ipv6/tcp_ipv6.c  |  5 +
>  9 files changed, 111 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 5371b3d70cfe..f88f4649ba6f 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -222,7 +222,8 @@ struct tcp_sock {
> u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
> u8  chrono_type:2,  /* current chronograph type */
> rate_app_limited:1,  /* rate_{delivered,interval_us} limited? 
> */
> -   unused:5;
> +   fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
> +   unused:4;
> u8  nonagle : 4,/* Disable Nagle algorithm? */
> thin_lto: 1,/* Use linear timeouts for thin streams */
> unused1 : 1,
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index c9cff977a7fb..aa95053dfc78 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -206,7 +206,11 @@ struct inet_sock {
> transparent:1,
> mc_all:1,
> nodefrag:1;
> -   __u8bind_address_no_port:1;
> +   __u8bind_address_no_port:1,
> +   defer_connect:1; /* Indicates that 
> fastopen_connect is set
> + * and cookie exists so we 
> defer connect
> + * until first data frame is 
> written
> + */
> __u8rcv_tos;
> __u8convert_csum;
> int uc_index;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index de67541d7adf..6ec4ea652f3f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1495,6 +1495,7 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct 
> sk_buff *skb,
>  void tcp_fastopen_init_key_once(bool publish);
>  bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
>  struct tcp_fastopen_cookie *cookie);
> +bool tcp_fastopen_defer_connect(struct sock *sk, int *err);
>  #define TCP_FASTOPEN_KEY_LENGTH 16
>
>  /* Fastopen key context */
> diff --git a/include/uapi/linux/tcp.h 

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Eric Dumazet
On Mon, 2017-01-23 at 10:59 -0800, Wei Wang wrote:
> From: Wei Wang 
> 
> This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
> alternative way to perform Fast Open on the active side (client). Prior
> to this patch, a client needs to replace the connect() call with
> sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
> to use Fast Open: these socket operations are often done in lower layer
> libraries used by many other applications. Changing these libraries
> and/or the socket call sequences are not trivial. A more convenient
> approach is to perform Fast Open by simply enabling a socket option when
> the socket is created w/o changing other socket calls sequence:

> Signed-off-by: Wei Wang 
> ---

Thanks for this hard work Wei.

Acked-by: Eric Dumazet 




[PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-23 Thread Wei Wang
From: Wei Wang 

This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences are not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
  s = socket()
create a new socket
  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
newly introduced sockopt
If set, new functionality described below will be used.
Return ENOTSUPP if TFO is not supported or not enabled in the
kernel.

  connect()
With cookie present, return 0 immediately.
With no cookie, initiate 3WHS with TFO cookie-request option and
return -1 with errno = EINPROGRESS.

  write()/sendmsg()
With cookie present, send out SYN with data and return the number of
bytes buffered.
With no cookie, and 3WHS not yet completed, return -1 with errno =
EINPROGRESS.
No MSG_FASTOPEN flag is needed.

  read()
Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
write() is not called yet.
Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
established but no msg is received yet.
Return number of bytes read if socket is established and there is
msg received.

The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.

Signed-off-by: Wei Wang 
---
 include/linux/tcp.h  |  3 ++-
 include/net/inet_sock.h  |  6 +-
 include/net/tcp.h|  1 +
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/af_inet.c   | 31 ---
 net/ipv4/tcp.c   | 35 ++-
 net/ipv4/tcp_fastopen.c  | 33 +
 net/ipv4/tcp_ipv4.c  |  7 ++-
 net/ipv6/tcp_ipv6.c  |  5 +
 9 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 5371b3d70cfe..f88f4649ba6f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -222,7 +222,8 @@ struct tcp_sock {
u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
u8  chrono_type:2,  /* current chronograph type */
rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
-   unused:5;
+   fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
+   unused:4;
u8  nonagle : 4,/* Disable Nagle algorithm? */
thin_lto: 1,/* Use linear timeouts for thin streams */
unused1 : 1,
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index c9cff977a7fb..aa95053dfc78 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -206,7 +206,11 @@ struct inet_sock {
transparent:1,
mc_all:1,
nodefrag:1;
-   __u8bind_address_no_port:1;
+   __u8bind_address_no_port:1,
+   defer_connect:1; /* Indicates that 
fastopen_connect is set
+ * and cookie exists so we 
defer connect
+ * until first data frame is 
written
+ */
__u8rcv_tos;
__u8convert_csum;
int uc_index;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index de67541d7adf..6ec4ea652f3f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1495,6 +1495,7 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct 
sk_buff *skb,
 void tcp_fastopen_init_key_once(bool publish);
 bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
 struct tcp_fastopen_cookie *cookie);
+bool tcp_fastopen_defer_connect(struct sock *sk, int *err);
 #define TCP_FASTOPEN_KEY_LENGTH 16
 
 /* Fastopen key context */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index c53de2691cec..6ff35eb48d10 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -116,6 +116,7 @@ enum {
 #define TCP_SAVE_SYN   27  /* Record SYN headers for new 
connections */
 #define TCP_SAVED_SYN  28  /* Get SYN headers recorded for 
connection */
 #define