Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-02 Thread Xue, Ying
Hi Guna, Please see my comments below. Regards, Ying -Original Message- From: GUNA [mailto:gbala...@gmail.com] Sent: June 1, 2016 23:26 To: Xue, Ying Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue Ying (ying.x...@gmail.com) Subject: Re: [tipc-discussion]

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread Erik Hugne
On May 31, 2016 6:12 PM, "GUNA" wrote: > > Could you provide me the exact code change for rescheduling, so I > don't want to make any mistake. > Nope, I'm travelling now. But if you want to try the resched-timer-if-owned hack, use: sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));
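
The resched-timer-if-owned idea mentioned above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the actual TIPC patch: `struct model_sock`, `model_sk_timeout()`, `MODEL_HZ` and `RETRY_DELAY` are hypothetical names invented for illustration. The model adds the delay to the current time because kernel timers (`mod_timer()`, which `sk_reset_timer()` wraps) take an absolute expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define MODEL_HZ     1000UL            /* assumed tick rate for the model */
#define RETRY_DELAY  (MODEL_HZ / 20)   /* same 1/20 s delay as in the mail */

struct model_sock {
    bool owned_by_user;           /* models sock_owned_by_user(sk) */
    unsigned long timer_expires;  /* absolute expiry, 0 = timer not armed */
    int timeouts_handled;         /* how often the real timeout work ran */
};

/*
 * Timer callback model: if a user context currently owns the socket,
 * do not touch its state; re-arm the timer and try again later.
 * Returns true when the timeout work ran, false when it was deferred.
 */
static bool model_sk_timeout(struct model_sock *sk, unsigned long now)
{
    assert(sk != NULL);

    if (sk->owned_by_user) {
        sk->timer_expires = now + RETRY_DELAY;  /* reschedule */
        return false;
    }

    sk->timer_expires = 0;     /* timer consumed */
    sk->timeouts_handled++;    /* placeholder for the probe/abort handling */
    return true;
}
```

In this model a callback that fires while the socket is owned simply returns after re-arming itself, so it never enters the receive path while user context holds the socket.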

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread Erik Hugne
On May 31, 2016 17:34, "GUNA" wrote: > > Which patch of Erik's are you talking about? > Is it this one, "tipc: fix timer handling when socket is owned"? I think he was referring to my earlier suggestion to reschedule the timer if the socket is owned by user when it fires. The

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread GUNA
Just to clarify: only the kernel was upgraded, from 3.4.2 to 4.4.0 plus some TIPC patches, on a Fedora distribution. That said, the patch "net: do not block BH while processing socket backlog" is not part of 4.4.0, so the issue is not due to this commit. If the patch, "tipc: block BH

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Jon Maloy
> -Original Message- > From: Xue, Ying [mailto:ying@windriver.com] > Sent: Monday, 30 May, 2016 14:15 > To: Jon Maloy; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik > Hugne; Xue Ying (ying.x...@gmail.com) > Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Xue, Ying
Hi Jon, First of all, the slock lock has a very particular and well-thought-out design. In process context, it behaves like a mutex. By contrast, in interrupt context, it behaves like a spin lock. Moreover, it can safely protect the members of the sock struct in both contexts. When we are in interrupt/softirq mode and the
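
The two-sided behaviour described above can be sketched as a userspace model. It assumes the usual kernel socket pattern, in which the softirq side queues work to a backlog while a process context owns the socket and the owner drains that backlog on release. All names here (`struct slock_model`, `softirq_rcv()`, `user_lock()`, `user_release()`) are hypothetical, and the model leaves out the real spinlock and any concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_MAX 8   /* arbitrary bound for the model */

/* Hypothetical model of a socket guarded by an "owned" flag plus backlog. */
struct slock_model {
    bool owned;                 /* set while process context holds the socket */
    int backlog[BACKLOG_MAX];   /* messages deferred by the softirq side */
    size_t backlog_len;
    int delivered;              /* messages fully processed */
};

/* Process context: take ownership (mutex-like side of the lock). */
static void user_lock(struct slock_model *sk)
{
    sk->owned = true;
}

/* Process context: drain deferred messages, then drop ownership. */
static void user_release(struct slock_model *sk)
{
    sk->delivered += (int)sk->backlog_len;
    sk->backlog_len = 0;
    sk->owned = false;
}

/*
 * Softirq side: process the message directly unless the socket is owned,
 * in which case it is queued. Returns true if processed immediately.
 */
static bool softirq_rcv(struct slock_model *sk, int msg)
{
    if (sk->owned) {
        assert(sk->backlog_len < BACKLOG_MAX);
        sk->backlog[sk->backlog_len++] = msg;
        return false;
    }
    sk->delivered++;
    return true;
}
```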

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-29 Thread Erik Hugne
On May 29, 2016 23:32, "Jon Maloy" wrote: > > Hi Guna, > I am looking at it, but don't have much time to spend on it right now. > A further study of your dump makes me believe this is a case of a race between tipc_recv_stream(), which in user context is setting the "owned"

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-24 Thread Jon Maloy
On 05/24/2016 12:16 PM, GUNA wrote: > I suspect a glitch on the switch may have caused the probe or > abort message to be lost. However, even if the messages are lost for whatever > reason, shouldn't the TIPC stack handle a graceful shutdown of the > TIPC connection by releasing all the

[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-24 Thread GUNA
I suspect a glitch on the switch may have caused the probe or abort message to be lost. However, even if the messages are lost for whatever reason, shouldn't the TIPC stack handle a graceful shutdown of the TIPC connection by releasing all the resources instead of panicking or dying? Does

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-20 Thread Jon Maloy
> -Original Message- > From: GUNA [mailto:gbala...@gmail.com] > Sent: Friday, 20 May, 2016 11:04 > To: Erik Hugne > Cc: Richard Alpe; Ying Xue; Parthasarathy Bhuvaragan; Jon Maloy; tipc- > discuss...@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-20 Thread GUNA
Thanks, Erik, for your quick analysis. If it is not a known issue, is there an expert available to investigate further why this lockup happens? Otherwise, let me know the patch or fix information. // Guna On Fri, May 20, 2016 at 1:19 AM, Erik Hugne wrote: > A little more

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread Erik Hugne
A little more awake now. Didn't see this yesterday. Look at the trace from CPU2 in Guna's initial mail. TIPC is recursing into the receive loop a second time, and freezes when it tries to take slock a second time. This is done in a timer callback, and the softirq lockup detector kicks in after ~20s. //E
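
The failure mode described above, taking a non-recursive lock a second time on the same path, can be illustrated with a minimal userspace model. It is a sketch only: `struct model_lock`, `model_trylock()`, `model_unlock()` and `model_rcv()` are hypothetical names, and a failed trylock stands in for the point where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock; a real spinlock would spin, not fail. */
struct model_lock {
    bool held;
};

static bool model_trylock(struct model_lock *l)
{
    if (l->held)
        return false;   /* a real spin_lock() would hang here */
    l->held = true;
    return true;
}

static void model_unlock(struct model_lock *l)
{
    l->held = false;
}

/*
 * Models a receive path that takes the lock and may re-enter itself
 * `nest` more times while still holding it.
 * Returns 0 on success, -1 where the real code would lock up.
 */
static int model_rcv(struct model_lock *slock, int nest)
{
    int rc = 0;

    assert(slock != NULL);
    if (!model_trylock(slock))
        return -1;      /* second acquisition on the same path */

    if (nest > 0)
        rc = model_rcv(slock, nest - 1);   /* recursion into the loop */

    model_unlock(slock);
    return rc;
}
```

Calling `model_rcv()` with `nest` set to 0 succeeds, while any nested entry reports the point at which the real code would spin until the lockup detector fires.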

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
All the CPU cards in the system are running the same load. I saw a similar issue about 6 weeks ago, but this time it occurred on only one card, compared to all cards last time. At the time, there was very light traffic (handshake). I saw the following in the log; I'm not sure whether it contributes to the issue:

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread Erik Hugne
On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote: > One of the cards in my system died and had to be rebooted to recover it. > The system is running on kernel 4.4.0 plus some of the latest TIPC patches. > Your earliest feedback on the issue would be appreciated. > At first I thought this might be a spinlock

[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
One of the cards in my system died and had to be rebooted to recover it. The system is running on kernel 4.4.0 plus some of the latest TIPC patches. Your earliest feedback on the issue would be appreciated. The cascaded failure logs follow: [686797.257405] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG