We sometimes observe a 'deadly embrace' type deadlock situation
occurring between mutually connected sockets on the same node.
This happens when the one-hour peer supervision timers happen
to expire simultaneously in both sockets.

The scenario is as follows:

CPU 1:                          CPU 2:
--------                        --------
tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
  lock(sk1.slock)                 lock(sk2.slock)
  msg_create(probe)               msg_create(probe)
  unlock(sk1.slock)               unlock(sk2.slock)
  tipc_node_xmit_skb()            tipc_node_xmit_skb()
    tipc_node_xmit()                tipc_node_xmit()
      tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
        lock(s2.slock)                  lock((sk1.slock)
        filter_rcv()                    filter_rcv()
          tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
            msg_create(probe_rsp)           msg_create(probe_rsp)
            tipc_sk_respond()               tipc_sk_respond()
              tipc_node_xmit_skb()            tipc_node_xmit_skb()
                tipc_node_xmit()                tipc_node_xmit()
                  tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                    lock((sk1.slock)                lock((sk2.slock)
                    ===> DEADLOCK                   ===> DEADLOCK

Further analysis reveals that there are at least three different
locations in the socket code where tipc_sk_respond() is called within
the context of the socket lock, with ensuing risk of similar deadlocks.

We solve this by ensuring that messages created by tipc_sk_respond()
only are sent directly if sk_lock.slock is not held. Otherwise they are
queued up in the socket write queue and sent after the lock has been
released.

v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
      tipc_sk_respond(). This is safer, since sk_lock.slock may
      occasionally and briefly be held (by concurrent user contexts)
      even if we are in user context.

Reported-by: GUNA <gbala...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..5d9da91 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct 
sk_buff *skb, int err)
 
        dnode = msg_destnode(buf_msg(skb));
        selector = msg_origport(buf_msg(skb));
-       tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+       if (sock_owned_by_user(sk))
+               tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+       else
+               skb_queue_tail(&sk->sk_write_queue, skb);
 }
 
 /**
@@ -1830,6 +1834,12 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head 
*inputq)
                                tipc_sk_enqueue(inputq, sk, dport);
                                spin_unlock_bh(&sk->sk_lock.slock);
                        }
+                       /* Send pending response/rejected messages, if any */
+                       while (!skb_queue_empty(&sk->sk_write_queue)) {
+                               skb = skb_dequeue(&sk->sk_write_queue);
+                               dnode = msg_destnode(buf_msg(skb));
+                               tipc_node_xmit_skb(net, skb, dnode, dport);
+                       }
                        sock_put(sk);
                        continue;
                }
-- 
1.9.1


------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to