On 2/5/20 8:26 AM, Xue, Ying wrote:

Sorry, I have a different opinion about this behavior. In the below scenario, it’s a bit weird if connect() returns 0. Although the connection was ever successful, it had been broken when connect() returned. So, I think connect() should return an error code at that moment. Whenever user receives an error from connect(), the user will call close() to destroy the socket.

But the connect() was successful! So why shouldn't we return the value that confirms that? The fact that the connection has already been closed by the peer doesn't change that fact.

On the contrary, if connect() returns 0, we will allow user to fetch messages from socket receive queue, which sounds reasonable. But at that moment, when user tries to call write(), it will return an error, which is quite strange for user, and it’s hard to be understandable.

When the user then tries to do write() he will encounter the situation that the peer has closed the connection, which is also a completely normal behavior that may happens at any time.

If connect() returns 0, the biggest benefit is that we give user a chance to get data from receive queue, but I am seriously doubt that the data from server side is meaningless for client. So we don’t need to tell client so complex truth, instead, we can hide the complexity internally and simply tell client connect() is failed.

To me this is a completely normal scenario. The client does connect(), maybe sends one or more messages, the server accepts the connection, reads the messages, responds with one or more messages, and closes the connection.

The response to connect() that  the client sees is semantically correct. What we are doing is just (re-)separating two different events that occur so close in time that they mess up our FSM. The patch fixes this so that they are agian reported as separate events.

BR
///jon

Thanks,

Ying

*From:*Jon Maloy [mailto:jma...@redhat.com]
*Sent:* Thursday, January 23, 2020 10:10 PM
*To:* Tuong Lien Tong; Xue, Ying
*Cc:* tipc-discussion@lists.sourceforge.net
*Subject:* Re: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code

I think that is a better solution. The client will not be able to do any more writes, but that is a correct behavior when the peer has closed the connection.

Basically, you modify tipc_connect(), and return 0 if the socket is in state DISCONNECTING, right?

///jon

On 1/23/20 2:54 AM, Tuong Lien Tong wrote:

    An alternative is to allow the 'connect()' to return '0' in this particular 
case. The socket state is already 'DISCONNECTING', so any further write will get 
'-EPIPE' but the socket is still able to read the messages existing in its receive 
queue. This may seem a bit contrary to the previous patch but they are not 
mutually exclusive, the previous one would only deal with the case the socket 
state is 'DISCONNECTING' but the 'sk->err' != 0.

    How does this look correct from a functional standpoint?

    BR/Tuong

    -----Original Message-----

From: Jon Maloy<jma...@redhat.com> <mailto:jma...@redhat.com>
    Sent: Tuesday, January 21, 2020 7:11 PM

    To:ying....@windriver.com  <mailto:ying....@windriver.com>

    Cc:tipc-discussion@lists.sourceforge.net  
<mailto:tipc-discussion@lists.sourceforge.net>

    Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return 
code

    Hi Ying,

    Do you have any ideas regarding this? My best proposal is to att a time 
stamp to the new server socket at creation,

    and then delay a 'close' if the socket is < 5ms old.

    But I am not quite happy with this solution, so any better idea would be 
welcome.

    ///jon

            ----- Forwarded Message ----- From: Tuong Lien Tong

            <tuong.t.l...@dektech.com.au>  <mailto:tuong.t.l...@dektech.com.au>To: 'Ying 
Xue'<ying....@windriver.com>  <mailto:ying....@windriver.com>;

            "jma...@redhat.com"  <mailto:jma...@redhat.com>  <jma...@redhat.com>  
<mailto:jma...@redhat.com>;"ma...@donjonn.com"  <mailto:ma...@donjonn.com>

            <ma...@donjonn.com>  
<mailto:ma...@donjonn.com>Cc:"tipc-...@dektech.com.au"  
<mailto:tipc-...@dektech.com.au>

            <tipc-...@dektech.com.au>  <mailto:tipc-...@dektech.com.au>Sent: 
Monday, January 13, 2020, 04:57:58 AM

            GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code

        Hi Ying, Jon,

        The last change has exposed a different issue with our “lightweight”

        connection setup mechanism,

        Please see the scenario below:

                    server process                                 client 
process

               -----------------------

                                   ------------------------------

                          create()|                          |

                          listen()*tskL                      |

                            bind()|                          |create()

                       +- accept()*                      tskC*

                       |          |                          |connect() 
~~~~~~~~~~~~~+

        wait_for_rcv()|          |<---------[SYN]-----------|                   
    |

                       |          |                          |//tskC: CONNECTING

                       |          |                          |     
*wait_for_conn()

                       +--------->*tskS                      |

                                              |->add to wait queue

                                  |//tskS: ESTABLISHED       |                  
     |

                                  |----------[ACK]---------->|//tskC: 
ESTABLISHED

                                  |   ->wakeup(): SLEEPING -> run queue

                            send()|----------[DATA]--------->|\

                                                 ->wakeup(): already woken

                            send()|----------[DATA]--------->| |

                                                ->wakeup(): already woken

                              .   .            .             . |-> receive 
queue     .

                              .   .            .             . |                
     .

                            send()|----------[DATA]--------->|/

                                                 ->wakeup(): already woken

                           close()|----------[FIN]---------->|//tskC: 
DISCONNECTING  |

                                  |                          |                  
     |

                                  |                        
~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule():

                                  |                        run queue -> RUNNING

                                                                                   
  |->add

                                                                                
     |back

                                                                                
     |to

                                                                                
     |wait

                                                                                
     |queue

                                                                                
     |(state

                                                                                
     |!=

                                                                                
     |ESTABLISHED)

                                                                                
     .

                                                                                
     .

                                                                                
     |

                                                                                
     *timed

                                                                                
     out

        When the server side receives ‘SYN’ from client, the ‘accept()’ call 
ends up

        with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the

        server starts sending a number of messages before calling the socket

        ‘close()’ without waiting for any response from the client!

        In this scenario, it may happen on the other side that the client 
process

        waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in 
time.

        Although the ‘wakeup()’ call is still done when the ‘ACK’ from the 
server

        arrives, but it just moves the process into the run queue (i.e. 
runnable) to

        be scheduled to run later. The kernel (i.e. RX process) is not 
preemptable

        when it processes the messages from the server, including the ‘ACK ‘,

        ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. 
When

        the wait process is really run back, the client socket state is now

        ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until 
the

        timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to

        release the socket whereas there remains the messages from the server 
in the

        socket’s receive queue.

        Notes:

            - The issue is found by the tipcutils/ptts tool.

            - This is a timing issue, sometimes it happens sometimes not, 
depending on

            the small enough number of the data messages (e.g. < 100) from the 
server

            before it sends the ‘FIN’, as well as the network traffic, RX 
process,

            scheduler, etc. on the client side.

            - Without the previous change, the issue is _not detected_ since the

            ‘connect()’ call will return 0 when the wait process is woken & 
running

            back but the socket state is ‘DISCONNECTING’, there is no socket 
error

            (sk->sk_err == 0) in this case… However, this is the original issue 
and

            unexpected because if the client tries to send something after 
that, it

            will get ‘-EPIPE’ immediately.

        We need to find a solution to this problem,

            - It seems to me that in this mechanism the server is “too hasty” 
to set

            the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just

            received a ‘SYN’ from client. That allows it to go ahead with 
sending the

            messages & closing the socket in one step… Should we consider to 
include

            something like the TCP ‘3-way handshake’ here, so the server needs 
to get

            a ‘ACK’ from the client before the connection is fully established? 
I

            guess ‘no’ since it should be kept lightweight…?

            - An alternative is we can introduce a bit delay e.g. 2 or 5ms to 
the

            ‘accept()’ or ‘close()’ before completing it, which will give a 
chance

            for its client to be really established first? This solution looks 
simple

            and should have less impact to user space…

            - Or …?

        What is your opinion, please share?

        Many thanks!

        BR/Tuong

        -----Original Message-----

        From: Xue, Ying<ying....@windriver.com>  <mailto:ying....@windriver.com>

        Sent: Thursday, January 2, 2020 4:10 PM

        To: Tuong Lien<tuong.t.l...@dektech.com.au>  
<mailto:tuong.t.l...@dektech.com.au>;

        tipc-discussion@lists.sourceforge.net  
<mailto:tipc-discussion@lists.sourceforge.net>;jma...@redhat.com  
<mailto:jma...@redhat.com>;ma...@donjonn.com  <mailto:ma...@donjonn.com>

        Subject: RE: [net v2] tipc: fix wrong connect() return code

        Great!

        Acked-by: Ying Xue<ying....@windriver.com>  
<mailto:ying....@windriver.com>

        -----Original Message-----

        From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au]

        Sent: Monday, December 30, 2019 4:56 PM

        To:tipc-discussion@lists.sourceforge.net  
<mailto:tipc-discussion@lists.sourceforge.net>;jma...@redhat.com  
<mailto:jma...@redhat.com>;

        ma...@donjonn.com  <mailto:ma...@donjonn.com>; Xue, Ying

        Subject: [net v2] tipc: fix wrong connect() return code

        The current 'tipc_wait_for_connect()' function makes a loop and waits

        for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the

        connecting has done. However, when the condition is met, it always

        returns '0' even in the case the connecting was actually failed (e.g.

        refused because the server socket has closed...) and the socket state

        was set to 'TIPC_DISCONNECTING'.

        This results in a wrong return code for the 'connect()' call from user,

        making it believe that the connection is established and goes ahead

        with more actions e.g. building & sending a message but then finally

        gets an unexpected result (e.g. '-EPIPE').

        This commit fixes the issue by instead setting the wait condition to

        'tipc_sk_connected(sk)', so that the function will return '0' only when

        the connection is really established. Otherwise, either the socket

        error code if any or '-ETIMEDOUT'/'-EINTR' will be returned

        correspondingly.

        ---------

        v2: changed after discussing with Ying

        ---------

        Signed-off-by: Tuong Lien<tuong.t.l...@dektech.com.au>  
<mailto:tuong.t.l...@dektech.com.au>

        ---

          net/tipc/socket.c | 4 ++--

          1 file changed, 2 insertions(+), 2 deletions(-)

        diff --git a/net/tipc/socket.c b/net/tipc/socket.c

        index 6552f986774c..2f5679f84060 100644

        --- a/net/tipc/socket.c

        +++ b/net/tipc/socket.c

        @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket 
*sock,

        long *timeo_p)

                                                         return

                                                         
sock_intr_errno(*timeo_p);

                                         add_wait_queue(sk_sleep(sk), &wait);

        -                              done = sk_wait_event(sk, timeo_p,

        -

                                                                            
sk->sk_state

        != TIPC_CONNECTING, &wait);

        +                             done = sk_wait_event(sk, timeo_p,

        tipc_sk_connected(sk),

        +                                                                  
&wait);

                                         remove_wait_queue(sk_sleep(sk), &wait);

                         } while (!done);

                         return 0;

        --

        2.13.7

    _______________________________________________

    tipc-discussion mailing list

    tipc-discussion@lists.sourceforge.net  
<mailto:tipc-discussion@lists.sourceforge.net>

    https://lists.sourceforge.net/lists/listinfo/tipc-discussion

--


--

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to