Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

Erik Hugne Thu, 19 May 2016 12:08:19 -0700

On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote:
> One of the card in my system is dead and rebooted to recover it.
> The system is running on Kernel 4.4.0 + some latest TIPC patches.
> Your earliest feedback of the issue is recommended.
>
At first I thought this might be a spinlock contention problem.

CPU2 is receiving TIPC traffic on a socket, and is trying to grab a
spinlock in tipc_sk_rcv context (probably sk->sk_lock.slock).
The first argument to spin_trylock_bh() is passed in RDI: ffffffffa01546cc

CPU3 is sending TIPC data; tipc_node_xmit()->tipc_sk_rcv() indicates
that it's traffic between sockets on the same machine.
And I think this is the same socket as on CPU2, because we see the same
address in RDI: ffffffffa01546cc
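
If you want to check the whole dump for this rather than eyeballing it, a
rough sketch along these lines could group the RDI values by CPU. The
function names are made up for illustration, and the two regexes assume the
usual x86_64 oops layout ("CPU: <n> PID: ..." followed by a register line
containing "RDI: <16 hex digits>"); adjust them if your log looks different.

```python
import re
from collections import defaultdict

# Assumed line shapes (x86_64 oops output); adjust if your log differs.
CPU_RE = re.compile(r"\bCPU: (\d+) PID:")
RDI_RE = re.compile(r"\bRDI: ([0-9a-fA-F]{16})\b")


def rdi_by_cpu(log_text):
    """Map each RDI value to the set of CPUs whose register dump shows it."""
    seen = defaultdict(set)
    cpu = None  # CPU of the trace currently being read
    for line in log_text.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))
            continue
        m = RDI_RE.search(line)
        if m and cpu is not None:
            seen[m.group(1).lower()].add(cpu)
    return dict(seen)


def shared_rdi(log_text):
    """Return only the RDI values reported by more than one CPU."""
    return {addr: cpus
            for addr, cpus in rdi_by_cpu(log_text).items()
            if len(cpus) > 1}
```

Feeding it the saved console log, e.g. shared_rdi(open("oops.txt").read()),
lists the addresses that show up in RDI on more than one CPU. A match only
says the register values are equal at dump time; it does not by itself
prove both CPUs are spinning on the same lock.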

But this made me unsure:
[686798.930348] ixgbe 0000:01:00.0 p19p2: initiating reset due to tx timeout
Is it contributing to the problem, or is it a side effect of the spinlock
contention?

Driver (or HW) bugs _are_ fatal for a network stack, but why would lock
contention in a network stack cause NIC TX timeouts?

Do all cards in your system have similar workloads?
Do you see this on multiple cards?

//E 

