On 01/11/2017 08:19 PM, Parthasarathy Bhuvaragan wrote:
> Commit 333f796235a527 ("tipc: fix a race condition leading to
> subscriber refcnt bug") reveals a soft lockup while acquiring
> nametbl_lock.
>
> Before commit 333f796235a527, we call tipc_conn_shutdown() from
> tipc_close_conn() in the context of tipc_topsrv_stop(). In that
> context, we are allowed to grab the nametbl_lock.
>
> Commit 333f796235a527, moved tipc_conn_release (renamed from
> tipc_conn_shutdown) to the connection refcount cleanup. This allows
> either tipc_nametbl_withdraw() or tipc_topsrv_stop() to the cleanup.
>
> Since tipc_exit_net() first calls tipc_topsrv_stop() and then
> tipc_nametble_withdraw() increases the chances for the later to
> perform the connection cleanup.
>
> The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
> when it performs the tipc_conn_kref_release() as it tries to grab
> nametbl_lock again while holding it already.
> tipc_nametbl_withdraw() grabs nametbl_lock
> tipc_nametbl_remove_publ()
> tipc_subscrp_report_overlap()
> tipc_subscrp_send_event()
> tipc_conn_sendmsg()
> << if (con->flags != CF_CONNECTED) we do conn_put(),
> triggering the cleanup as refcount=0. >>
> tipc_conn_kref_release
> tipc_sock_release
> tipc_conn_release
> tipc_subscrb_delete
> tipc_subscrp_delete
> tipc_nametbl_unsubscribe << Soft Lockup >>
>
> The previous changes in this series fixes the race conditions fixed
> by commit 333f796235a527. Hence we can now revert the commit.
>
> Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber
> refcnt bug")
> Reported-by: John Thompson <[email protected]>
> Signed-off-by: Parthasarathy Bhuvaragan
> <[email protected]>
Acked-by: Ying Xue <[email protected]>
> ---
> net/tipc/server.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/net/tipc/server.c b/net/tipc/server.c
> index e178d41e1a68..e206b8fb826f 100644
> --- a/net/tipc/server.c
> +++ b/net/tipc/server.c
> @@ -86,7 +86,6 @@ struct outqueue_entry {
> static void tipc_recv_work(struct work_struct *work);
> static void tipc_send_work(struct work_struct *work);
> static void tipc_clean_outqueues(struct tipc_conn *con);
> -static void tipc_sock_release(struct tipc_conn *con);
>
> static void tipc_conn_kref_release(struct kref *kref)
> {
> @@ -104,7 +103,6 @@ static void tipc_conn_kref_release(struct kref *kref)
> }
> saddr->scope = -TIPC_NODE_SCOPE;
> kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
> - tipc_sock_release(con);
> sock_release(sock);
> con->sock = NULL;
>
> @@ -192,19 +190,15 @@ static void tipc_unregister_callbacks(struct tipc_conn
> *con)
> write_unlock_bh(&sk->sk_callback_lock);
> }
>
> -static void tipc_sock_release(struct tipc_conn *con)
> +static void tipc_close_conn(struct tipc_conn *con)
> {
> struct tipc_server *s = con->server;
>
> - if (con->conid)
> - s->tipc_conn_release(con->conid, con->usr_data);
> -
> - tipc_unregister_callbacks(con);
> -}
> -
> -static void tipc_close_conn(struct tipc_conn *con)
> -{
> if (test_and_clear_bit(CF_CONNECTED, &con->flags)) {
> + tipc_unregister_callbacks(con);
> +
> + if (con->conid)
> + s->tipc_conn_release(con->conid, con->usr_data);
>
> /* We shouldn't flush pending works as we may be in the
> * thread. In fact the races with pending rx/tx work structs
>
------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion