> -----Original Message-----
> From: Jon Maloy [mailto:[email protected]]
> Sent: Wednesday, December 06, 2017 08:04
> To: Ying Xue <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [tipc-discussion] [net-next PATCH 1/2] tipc: prevent swork item
> from queuing workqueue after connection is closed
>
> > -----Original Message-----
> > From: Ying Xue [mailto:[email protected]]
> > Sent: Wednesday, December 06, 2017 03:29
> > To: Jon Maloy <[email protected]>; [email protected]
> > Cc: [email protected]
> > Subject: Re: [net-next PATCH 1/2] tipc: prevent swork item from
> > queuing workqueue after connection is closed
> >
> > On 12/06/2017 03:31 AM, Jon Maloy wrote:
> > >> If you remove the connection item from the idr list *before* you
> > >> flush the queues and remove the socket, there is no risk that this
> > >> will happen. If tipc_conn_sendmsg() obtains a pointer, we can be
> > >> certain that the queue has not been flushed yet, and that this
> > >> will not happen until conn_put() is called. And, you won't need to
> > >> use the callback lock or touch the socket at all here.
> > > It's even better: if you introduce the !kref_get_unless_zero(&con->kref)
> > > test, there is no way tipc_conn_sendmsg() will obtain a pointer while
> > > tipc_conn_kref_release() is being executed, i.e., after the queue is
> > > flushed. I suggest you make both these changes.
> >
> > Please let me use the call chains below to show what the race condition is:
> >
> > CPU 1:                                    CPU 2:
> > ==========                                ===============
> > T1: tipc_conn_sendmsg
> > T2:   con = tipc_conn_lookup()
> > T3:   test_bit(CF_CONNECTED, &con->flags)
> > T4:   =====> interrupted by another thread
> >                                           T5: tipc_close_conn()
> >                                           T6:   test_and_clear_bit(con->flag)
> >                                           T7:   tipc_unregister_callbacks()
> >                                           T8:   flush_work(&con->swork)
> >                                           T9:   flush_work(&con->rwork)
> > T10:  queue_work(s->send_wq, &con->swork)
> >
> > The key point is that the whole process of identifying whether the
> > connection is closed or being closed (with test_bit(CF_CONNECTED,
> > &con->flags) or sk->sk_user_data) and queuing the swork item into the
> > workqueue is not atomic.

> That is true. But looking at the kref value is atomic: if that value is
> zero, nobody else has a pointer to the connection item. And vice versa,
> if it is non-zero the queues cannot have been flushed yet.
> I am not even sure that the CF_CONNECTED flag is needed anymore, but I
> need to think more about that.
> What I am suggesting is that we move most of the functionality that is
> currently in close_conn() to kref_release().
> Something like this:
>
> static void tipc_close_conn(struct tipc_conn *con, bool flush)
> {
> 	if (test_and_clear_bit(CF_CONNECTED, &con->flags))
> 		conn_put(con);
> }
>
> static void tipc_conn_kref_release(struct kref *kref)
> {
> 	struct tipc_conn *con = container_of(kref, struct tipc_conn, kref);
> 	struct tipc_server *s = con->server;
> 	struct sockaddr_tipc *saddr = s->saddr;
> 	struct socket *sock = con->sock;
> 	struct sock *sk;
>
> 	spin_lock_bh(&s->idr_lock);
> 	idr_remove(&s->conn_idr, con->conid);
> 	s->idr_in_use--;
> 	spin_unlock_bh(&s->idr_lock);
>
> 	if (sock) {
> 		sk = sock->sk;
> 		if (test_bit(CF_SERVER, &con->flags)) {
> 			__module_get(sock->ops->owner);
> 			__module_get(sk->sk_prot_creator->owner);
> 		}
> 		flush_work(&con->swork);
> 		flush_work(&con->rwork);
> 		saddr->scope = -TIPC_NODE_SCOPE;
> 		kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
> 		if (con->conid)
> 			s->tipc_conn_release(con->conid, con->usr_data);
> 		sock_release(sock);
> 		con->sock = NULL;
> 	}
> 	tipc_clean_outqueues(con);
> 	kfree(con);
> }
Of course the flush_work() calls should be outside the if (sock) condition.

///jon

> ///jon
>
> > Even if all checks of the connection status succeed and the
> > connection is still connected before the swork item is queued, the
> > thread running tipc_conn_sendmsg() may be interrupted at T4 and
> > resumed at T10, so a swork item can be queued into the workqueue
> > after that queue has already been flushed by tipc_close_conn() on
> > CPU 2.
> >
> > That's why we need to use the sk_callback_lock read lock: it
> > guarantees that the whole sequence of identifying the connection
> > status and queuing the swork item cannot be interrupted.
> > >
> > > ///jon
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most engaging
> tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
