We sometimes observe a 'deadly embrace' type of deadlock occurring
between mutually connected sockets on the same node. It happens when
the one-hour peer supervision timers expire simultaneously in both
sockets.
The scenario is as follows:
CPU 1:                        CPU 2:
--------                      --------
tipc_sk_timeout(sk1)          tipc_sk_timeout(sk2)
lock(sk1.slock)               lock(sk2.slock)
msg_create(probe)             msg_create(probe)
unlock(sk1.slock)             unlock(sk2.slock)
tipc_node_xmit_skb()          tipc_node_xmit_skb()
tipc_node_xmit()              tipc_node_xmit()
tipc_sk_rcv(sk2)              tipc_sk_rcv(sk1)
lock(sk2.slock)               lock(sk1.slock)
filter_rcv()                  filter_rcv()
tipc_sk_proto_rcv()           tipc_sk_proto_rcv()
msg_create(probe_rsp)         msg_create(probe_rsp)
tipc_sk_respond()             tipc_sk_respond()
tipc_node_xmit_skb()          tipc_node_xmit_skb()
tipc_node_xmit()              tipc_node_xmit()
tipc_sk_rcv(sk1)              tipc_sk_rcv(sk2)
lock(sk1.slock)               lock(sk2.slock)
===> DEADLOCK                 ===> DEADLOCK
Further analysis reveals that there are at least three locations in
the socket code where tipc_sk_respond() is called while the socket
lock is held, with an ensuing risk of similar deadlocks.
We solve this by ensuring that messages created by tipc_sk_respond()
are sent directly only if the sk_lock.owned mutex is held. Otherwise
they are queued up on a separate per-socket queue and sent after the
socket lock has been released.
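
In outline, the approach looks like this (a sketch only; the
authoritative code is the diff below). tipc_sk_respond() defers the
send to the new rspq queue whenever the sk_lock.owned mutex is not
held, and tipc_sk_rcv() flushes that queue once it has dropped
sk_lock.slock:

	/* In tipc_sk_respond(): send now only from a lock-owning
	 * (user) context; otherwise defer to the response queue.
	 */
	if (sock_owned_by_user(sk))
		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
	else
		skb_queue_tail(&tipc_sk(sk)->rspq, skb);

	/* In tipc_sk_rcv(), after spin_unlock_bh(&sk->sk_lock.slock):
	 * flush whatever was deferred while the lock was held. The
	 * dequeue result is tested because a concurrent flusher may
	 * have emptied the queue between the two calls.
	 */
	while (!skb_queue_empty(&tsk->rspq) &&
	       (skb = skb_dequeue(&tsk->rspq))) {
		dnode = msg_destnode(buf_msg(skb));
		tipc_node_xmit_skb(net, skb, dnode, dport);
	}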
v2: - Test on the sk_lock.owned mutex instead of sk_lock.slock in
      tipc_sk_respond(). This is safer, since sk_lock.slock may
      occasionally and briefly be held (by concurrent user contexts)
      even if we are in user context.
v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 ms
      and setting up 1000 connections, I could easily reproduce the
      deadlock and verify that my solution works.
    - When killing one of the processes I sometimes got a kernel crash
      in the loop emptying the socket write queue. Realizing that
      there may be concurrent processes emptying the write queue, I
      had to add a test that the dequeue actually returned a buffer.
      This solved the problem.
    - I tried Ying's suggestion of unconditionally adding all
      CONN_MANAGER messages to the backlog queue, and it didn't work.
      This is because we often add the message to the backlog when the
      socket is *not* owned, so there is nothing triggering execution
      of backlog_rcv() within an acceptable time. Apart from that, my
      solution solves the problem at all three locations where this
      deadlock may happen, as already stated above.
v4: - Introduced a separate queue in struct tipc_sock for the purpose
      above, instead of using the socket send queue. The socket send
      queue was used for regular message sending until commit
      f214fc402967e ("tipc: Revert tipc: use existing sk_write_queue
      for outgoing packet chain"), i.e., as recently as kernel 4.5, so
      using that queue would break older kernel versions.
    - Made a small cosmetic improvement to the dequeuing loop.
Reported-by: GUNA <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
---
net/tipc/socket.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..5ed6d5c 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -90,6 +90,7 @@ struct tipc_sock {
 	struct tipc_msg phdr;
 	struct list_head sock_list;
 	struct list_head publications;
+	struct sk_buff_head rspq;
 	u32 pub_count;
 	u32 probing_state;
 	unsigned long probing_intv;
@@ -278,7 +279,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)
 
 	dnode = msg_destnode(buf_msg(skb));
 	selector = msg_origport(buf_msg(skb));
-	tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+	if (sock_owned_by_user(sk))
+		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+	else
+		skb_queue_tail(&tipc_sk(sk)->rspq, skb);
 }
 
 /**
@@ -379,6 +384,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 	tsk = tipc_sk(sk);
 	tsk->max_pkt = MAX_PKT_DEFAULT;
 	INIT_LIST_HEAD(&tsk->publications);
+	skb_queue_head_init(&tsk->rspq);
 	msg = &tsk->phdr;
 	tn = net_generic(sock_net(sk), tipc_net_id);
 	tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
@@ -1830,6 +1836,12 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
 				tipc_sk_enqueue(inputq, sk, dport);
 				spin_unlock_bh(&sk->sk_lock.slock);
 			}
+			/* Send pending response/rejected messages, if any */
+			while (!skb_queue_empty(&tsk->rspq) &&
+			       (skb = skb_dequeue(&tsk->rspq))) {
+				dnode = msg_destnode(buf_msg(skb));
+				tipc_node_xmit_skb(net, skb, dnode, dport);
+			}
 			sock_put(sk);
 			continue;
 		}
--
1.9.1