> -----Original Message-----
> From: Jon Maloy [mailto:[email protected]]
> Sent: Wednesday, December 06, 2017 08:04
> To: Ying Xue <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [tipc-discussion] [net-next PATCH 1/2] tipc: prevent swork item
> from queuing workqueue after connection is closed
> 
> 
> 
> > -----Original Message-----
> > From: Ying Xue [mailto:[email protected]]
> > Sent: Wednesday, December 06, 2017 03:29
> > To: Jon Maloy <[email protected]>; [email protected]
> > Cc: [email protected]
> > Subject: Re: [net-next PATCH 1/2] tipc: prevent swork item from
> > queuing workqueue after connection is closed
> >
> >
> >
> > On 12/06/2017 03:31 AM, Jon Maloy wrote:
> > >> If you remove the connection item from the idr list *before* you
> > >> flush the queues and remove the socket, there is no risk that this
> > >> will happen.
> > >> If tipc_conn_sendmsg() obtains a pointer, we can be certain that
> > >> the queue has not been flushed yet, and that this will not happen
> > >> until
> > >> conn_put() is called.
> > >> And, you won't need to use the callback lock or touch the socket at
> > >> all here.
> > > It's even better: if you introduce the
> > > !kref_get_unless_zero(&con->kref) test, there is no way
> > > tipc_conn_sendmsg() will obtain a pointer while
> > > tipc_conn_kref_release() is being executed, i.e., after the queue is
> > > flushed.
> > > I suggest you make both these changes.
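> > > E.g., the lookup could then look something like this (an untested
> > > sketch, assuming it keeps running under s->idr_lock as today):
> > >
> > > static struct tipc_conn *tipc_conn_lookup(struct tipc_server *s,
> > >                                           int conid)
> > > {
> > >         struct tipc_conn *con;
> > >
> > >         spin_lock_bh(&s->idr_lock);
> > >         con = idr_find(&s->conn_idr, conid);
> > >         /* Refuse to hand out a pointer once the refcount has
> > >          * dropped to zero; tipc_conn_kref_release() may already
> > >          * be flushing the work queues at that point.
> > >          */
> > >         if (con && !kref_get_unless_zero(&con->kref))
> > >                 con = NULL;
> > >         spin_unlock_bh(&s->idr_lock);
> > >         return con;
> > > }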
> >
> >
> > Please let me use the call chains below to show what the race condition is:
> >
> >   CPU 1:                                       CPU 2:
> > ==========                                 ===============
> > T1: tipc_conn_sendmsg
> > T2:  con = tipc_conn_lookup()
> > T3:  test_bit(CF_CONNECTED, &con->flags)
> > T4:   =====> interrupted by another thread;
> > T5:                                        tipc_close_conn()
> > T6:                                        test_and_clear_bit(con->flag)
> > T7:                                        tipc_unregister_callbacks()
> > T8:                                        flush_work(&con->swork)
> > T9:                                        flush_work(&con->rwork)
> > T10: queue_work(s->send_wq, &con->swork)
> >
> >
> > The key point is that the whole process of identifying whether the
> > connection is closed or being closed (with test_bit(CF_CONNECTED,
> > &con->flags) or sk->sk_user_data) and then queuing the swork item
> > into the workqueue is not atomic.
> 
> That is true. But a test on the kref value is atomic, because if that
> value is zero nobody else has a pointer to the connection item. And
> vice versa, if it is non-zero the queues cannot have been flushed yet.
> I am not even sure that the CF_CONNECTED flag is needed anymore, but I
> need to think more about that.
> What I am suggesting is that we move most of the functionality that is
> currently in tipc_close_conn() to tipc_conn_kref_release(). Something
> like this:
> 
> static void tipc_close_conn(struct tipc_conn *con, bool flush)
> {
>         if (test_and_clear_bit(CF_CONNECTED, &con->flags))
>                 conn_put(con);
> }
> 
> static void tipc_conn_kref_release(struct kref *kref)
> {
>         struct tipc_conn *con = container_of(kref, struct tipc_conn, kref);
>         struct tipc_server *s = con->server;
>         struct sockaddr_tipc *saddr = s->saddr;
>         struct socket *sock = con->sock;
>         struct sock *sk;
> 
>         spin_lock_bh(&s->idr_lock);
>         idr_remove(&s->conn_idr, con->conid);
>         s->idr_in_use--;
>         spin_unlock_bh(&s->idr_lock);
> 
>         if (sock) {
>                 sk = sock->sk;
>                 if (test_bit(CF_SERVER, &con->flags)) {
>                         __module_get(sock->ops->owner);
>                         __module_get(sk->sk_prot_creator->owner);
>                 }
>                 flush_work(&con->swork);
>                 flush_work(&con->rwork);
>                 saddr->scope = -TIPC_NODE_SCOPE;
>                 kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
>                 if (con->conid)
>                         s->tipc_conn_release(con->conid, con->usr_data);
>                 sock_release(sock);
>                 con->sock = NULL;
>         }
>         tipc_clean_outqueues(con);
>         kfree(con);
> }

Of course, the flush_work() calls should be outside the if (sock) condition.
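
I.e., the skeleton would then look like this (again just an untested
sketch):

static void tipc_conn_kref_release(struct kref *kref)
{
        struct tipc_conn *con = container_of(kref, struct tipc_conn, kref);
        struct tipc_server *s = con->server;

        spin_lock_bh(&s->idr_lock);
        idr_remove(&s->conn_idr, con->conid);
        s->idr_in_use--;
        spin_unlock_bh(&s->idr_lock);

        /* Flush unconditionally; work items may still be queued even
         * if the socket is already gone.
         */
        flush_work(&con->swork);
        flush_work(&con->rwork);

        if (con->sock) {
                /* same socket teardown as above, minus the
                 * flush_work() calls
                 */
        }
        tipc_clean_outqueues(con);
        kfree(con);
}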

///jon

> 
> ///jon
> 
> > Even if all checks of the connection status are done correctly and
> > the connection is still connected before the swork item is queued, if
> > the thread running tipc_conn_sendmsg() is interrupted at T4 and
> > resumes at T10, the swork item might be queued into the workqueue
> > after the queue has already been flushed by tipc_close_conn() on CPU 2.
> >
> > That's why we need to take the sk_callback_lock read lock to
> > guarantee that the entire process of identifying the connection
> > status and queuing the swork item cannot be interrupted.
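> >
> > In code, the pattern we want is roughly this (a sketch; the exact
> > placement inside tipc_conn_sendmsg() may differ):
> >
> >         read_lock_bh(&sk->sk_callback_lock);
> >         /* sk_user_data is cleared under the write lock when the
> >          * connection is closed, so checking it and queuing the
> >          * work item here cannot be interleaved with
> >          * tipc_close_conn() on another CPU.
> >          */
> >         if (sk->sk_user_data)
> >                 queue_work(s->send_wq, &con->swork);
> >         read_unlock_bh(&sk->sk_callback_lock);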
> >
> > >
> > > ///jon
> > >