Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

Parthasarathy Bhuvaragan Wed, 16 Nov 2016 01:28:18 -0800

Hi Ying / John,

The soft lock is the call chain of tipc_nametbl_withdraw(), when it 
performs the tipc_conn_kref_release() as it tries to grab nametbl_lock 
again while holding it already.
 > tipc_nametbl_withdraw
 >   spin_lock_bh(&tn->nametbl_lock);
 >   tipc_nametbl_remove_publ
 >      spin_lock_bh(&seq->lock);
 >      tipc_nameseq_remove_publ
 >        tipc_subscrp_report_overlap
 >          tipc_subscrp_send_event
 >             tipc_conn_sendmsg
<< Here, the (test_bit(CF_CONNECTED, &con->flags)) Fails, leading to the 
else case where do do a conn_put() and that triggers the cleanup as 
refcount reached 0. Leading the call chain below : >>
tipc_conn_kref_release
    tipc_sock_release
      tipc_conn_release
         tipc_subscrb_delete
            tipc_subscrp_delete
               tipc_nametbl_unsubscribe
                  spin_lock_bh(&tn->nametbl_lock);  << !! Soft Lockup >>


One cause is that tipc_exit_net() calls first calls tipc_topsrv_stop() 
and then tipc_nametbl_withdraw() in scope of tipc_net_stop().

The above chain will only occur in a narrow window for a given connection:
CPU#1:
tipc_nametbl_withdraw() manages to perform tipc_conn_lookup() and steps 
the refcount to 2, while in CPU#2 the following occurs:
CPU#2:
tipc_server_stop() calls tipc_close_conn(con). This performs a 
conn_put() decrementing refcount to 1.
Now, CPU#1 continues and detects that the connection is not CF_CONNECTED 
and does a conn_put(), triggering the release callback.

Before commit 333f796235a527, the above wont happen.

/Partha

On 11/15/2016 04:11 PM, Xue, Ying wrote:
> Hi John,
>
> Regarding the stack trace you provided below, I get the two potential call 
> chains:
>
> tipc_nametbl_withdraw
>   spin_lock_bh(&tn->nametbl_lock);
>   tipc_nametbl_remove_publ
>      spin_lock_bh(&seq->lock);
>      tipc_nameseq_remove_publ
>        tipc_subscrp_report_overlap
>          tipc_subscrp_send_event
>             tipc_conn_sendmsg
>                spin_lock_bh(&con->outqueue_lock);
>                list_add_tail(&e->list, &con->outqueue);
>
>
> tipc_topsrv_stop
>   tipc_server_stop
>     tipc_close_conn
>       kernel_sock_shutdown
>         tipc_subscrb_delete
>           spin_lock_bh(&subscriber->lock);
>           tipc_nametbl_unsubscribe(sub);
>            spin_lock_bh(&tn->nametbl_lock);
>
> Although I suspect this is a revert lock issue leading to the soft lockup, I 
> am still unable to understand which lock together with nametbl_lock is taken 
> reversely on the two different paths above.
> However, you just gave us the log printed on CPU#2, but the logs outputted by 
> other cores are also important.  So if possible, please share them with us.
>
> By the way, I agree with you, and it seems that commit 333f796235a527 is 
> related to the soft lockup.
>
> Regards,
> Ying
>
> -----Original Message-----
> From: John Thompson [mailto:[email protected]]
> Sent: Tuesday, November 15, 2016 8:01 AM
> To: [email protected]
> Subject: [tipc-discussion] v4.7: soft lockup when releasing a socket
>
> Hi,
>
> I am seeing an occasional kernel soft lockup.  I have TIPC v4.7 and the 
> kernel dump occurs when the system is going down for a reboot.
>
> The kernel dump is:
>
> <0>NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [exfx:1474] 
> <6>Modules linked in: tipc jitterentropy_rng echainiv drbg
> platform_driver(O) ipifwd(PO)
> ...
> <6>
> <6>GPR00: c15333e8 a4e0fb80 a4ee3600 a51748ac 00000000 ae475024 a537feec 
> fffffffd
> <6>GPR08: a2197408 00000001 00000001 00000004 80691c00 <6>NIP [80691c40] 
> _raw_spin_lock_bh+0x40/0x70 <6>LR [c1534f30] 
> tipc_nametbl_unsubscribe+0x50/0x120 [tipc] <6>Call Trace:
> <6>[a4e0fba0] [c15333e8] tipc_named_reinit+0xf8/0x820 [tipc] <6>[a4e0fbb0] 
> [c15336a0] tipc_named_reinit+0x3b0/0x820 [tipc] <6>[a4e0fbd0] [c1540bac] 
> tipc_nl_publ_dump+0x50c/0xed0 [tipc] <6>[a4e0fc00] [c154164c] 
> tipc_conn_sendmsg+0xdc/0x170 [tipc] <6>[a4e0fc30] [c1533c9c] 
> tipc_subscrp_report_overlap+0xbc/0xd0 [tipc] <6>[a4e0fc70] [c153425c] 
> tipc_topsrv_stop+0x45c/0x4f0 [tipc] <6>[a4e0fca0] [c1534788] 
> tipc_nametbl_remove_publ+0x58/0x110 [tipc] <6>[a4e0fcd0] [c1534c48] 
> tipc_nametbl_withdraw+0x68/0x140 [tipc] <6>[a4e0fd00] [c153cc24] 
> tipc_nl_node_dump_link+0x1904/0x45d0 [tipc] <6>[a4e0fd30] [c153d838] 
> tipc_nl_node_dump_link+0x2518/0x45d0 [tipc] <6>[a4e0fd70] [804f2870] 
> sock_release+0x30/0xf0 <6>[a4e0fd80] [804f2944] sock_close+0x14/0x30 
> <6>[a4e0fd90] [80105844] __fput+0x94/0x200 <6>[a4e0fdb0] [8003dca4] 
> task_work_run+0xd4/0x100 <6>[a4e0fdd0] [80023620] do_exit+0x280/0x980 
> <6>[a4e0fe10] [80024c48] do_group_exit+0x48/0xb0 <6>[a4e0fe30] [80030344] 
> get_signal+0x244/0x4f0 <6>[a4e0fe80] [8000
 7734] do_signal+0x34/0x1c0 <6>[a4e0ff30] [800079a8] do_notify_resume+0x68/0x80 
<6>[a4e0ff40] [8000fa1c] do_user_signal+0x74/0xc4
>
>
>>From the stack dump it looks like tipc_named_reinit is trying to
> acquire nametbl_lock.
>
>>From looking at the call chain I can see that tipc_conn_sendmsg can
> send up calling conn_put
>
> which will go on and call the tipc_named_reinit via tipc_sock_release.
>
> As tipc_nametbl_withdraw (from the stack dump) has already acquired the 
> nametbl_lock, tipc_named_reinit
>
> cannot get it and so the process hangs.
>
>
> The call to tipc_sock_release (added in Commit 333f796235a527
> <http://git.atlnz.lc/cgit/cgit.cgi/upstream_imports/linux-stable.git/commit/?id=333f796235a52727db7e0a13888045f3aa3d5335>)
> seems to have changed the behaviour
>
> such that it tries to do a lot more when shutting the connection down.
>
>
> If there is other information I can provide please let me know.
>
> Regards,
>
> John
> ------------------------------------------------------------------------------
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
> ------------------------------------------------------------------------------
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
>

------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

Reply via email to