Re: Dead lock condition occured ipanic during register_netdevice_notifier call in 4.9.102

2018-05-31 Thread Bharadiya,Pankaj
On Thu, May 31, 2018 at 12:21:31PM +0300, Kirill Tkhai wrote:
> Hi, Illyas,
> 
> On 31.05.2018 11:43, Mansoor, Illyas wrote:
> > We are facing a mutex deadlock condition that we think might be related to a 
> > fix that you provided in:
> > Merge branch 
> > 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' 
> > commit b9a12601541eb55d07e00261a5112a4bc36fe7be
> > 
> > We tried to backport the patch series, but got stuck because its dependencies 
> > are not met in the 4.9.102 kernel.
> > Could you please provide some pointers, so that we can fix this in the 4.9.y kernel?
> > 
> > Appreciate any help or pointers on this one.
> > 
> > Ipanic logs pasted below:
> > 
> > <3>[ 6513.681473] INFO: task sensors@1.0-ser:2744 blocked for more than 120 seconds.
> > <3>[ 6513.689723]   Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <3>[ 6513.699108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <6>[ 6513.707997] sensors@1.0-ser D0  2744  1 0x
> > <4>[ 6513.708007]  880223f38040 88027fc980c0  880271987000
> > <4>[ 6513.708024]  88026f9ae040 c9d57d40 81b363d1 81396e0b
> > <4>[ 6513.708032]  00ffc9d57d20 88027fc980c0 c9d57d90 88026f9ae040
> > <4>[ 6513.708040] Call Trace:
> > <4>[ 6513.708056]  [] ? __schedule+0x221/0x6e0
> > <4>[ 6513.708063]  [] ? sidtab_context_to_sid+0x39b/0x410
> > <4>[ 6513.708068]  [] schedule+0x36/0x90
> > <4>[ 6513.708072]  [] schedule_preempt_disabled+0x18/0x30
> > <4>[ 6513.708078]  [] __mutex_lock_slowpath+0x185/0x3f0
> > <4>[ 6513.708083]  [] mutex_lock+0x25/0x30
> > <4>[ 6513.708089]  [] rtnl_lock+0x15/0x20
> > <4>[ 6513.708095]  [] register_netdevice_notifier+0x2d/0x200
> > <4>[ 6513.708107]  [] raw_init+0x8b/0x90
> > <4>[ 6513.708118]  [] can_create+0xe1/0x1c0
> > <4>[ 6513.708129]  [] __sock_create+0x12e/0x210
> > <4>[ 6513.708141]  [] SyS_socket+0x55/0xb0
> > <4>[ 6513.708156]  [] do_syscall_64+0x6a/0xe0
> > <4>[ 6513.708166]  [] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
> > <4>[ 6513.708171] NMI backtrace for cpu 2
> > <4>[ 6513.708178] CPU: 2 PID: 482 Comm: khungtaskd Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <4>[ 6513.708180]  c9eafdd0 813f56bc
> > <4>[ 6513.708188]  c9eafe00 813f9fe1 0002
> > <4>[ 6513.708195]  81042d80 826120f8 c9eafe30 813fa0a3
> 
> 1) I'm not sure commit b9a12601541eb55d07e00261a5112a4bc36fe7be will help here,
> because this stack looks to me like someone simply does not release the mutex.
> A good first step is to analyze who actually owns it.
> 
> 2) Also, note that rtnl_is_locked() is used in the wrong way in one driver there
> (see WILC_WFI_deinit_mon_interface()), so it may also introduce an imbalance
> (if you use the driver).
>

Thank you for your quick response. We will look into your suggestions and get 
back.

Thanks,
Pankaj
 
> Kirill


Re: Dead lock condition occured ipanic during register_netdevice_notifier call in 4.9.102

2018-05-31 Thread Kirill Tkhai
Hi, Illyas,

On 31.05.2018 11:43, Mansoor, Illyas wrote:
> We are facing a mutex deadlock condition that we think might be related to a 
> fix that you provided in:
> Merge branch 
> 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' 
> commit b9a12601541eb55d07e00261a5112a4bc36fe7be
> 
> We tried to backport the patch series, but got stuck because its dependencies 
> are not met in the 4.9.102 kernel.
> Could you please provide some pointers, so that we can fix this in the 4.9.y kernel?
> 
> Appreciate any help or pointers on this one.
> 
> Ipanic logs pasted below:
> 
> <3>[ 6513.681473] INFO: task sensors@1.0-ser:2744 blocked for more than 120 seconds.
> <3>[ 6513.689723]   Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> <3>[ 6513.699108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[ 6513.707997] sensors@1.0-ser D0  2744  1 0x
> <4>[ 6513.708007]  880223f38040 88027fc980c0  880271987000
> <4>[ 6513.708024]  88026f9ae040 c9d57d40 81b363d1 81396e0b
> <4>[ 6513.708032]  00ffc9d57d20 88027fc980c0 c9d57d90 88026f9ae040
> <4>[ 6513.708040] Call Trace:
> <4>[ 6513.708056]  [] ? __schedule+0x221/0x6e0
> <4>[ 6513.708063]  [] ? sidtab_context_to_sid+0x39b/0x410
> <4>[ 6513.708068]  [] schedule+0x36/0x90
> <4>[ 6513.708072]  [] schedule_preempt_disabled+0x18/0x30
> <4>[ 6513.708078]  [] __mutex_lock_slowpath+0x185/0x3f0
> <4>[ 6513.708083]  [] mutex_lock+0x25/0x30
> <4>[ 6513.708089]  [] rtnl_lock+0x15/0x20
> <4>[ 6513.708095]  [] register_netdevice_notifier+0x2d/0x200
> <4>[ 6513.708107]  [] raw_init+0x8b/0x90
> <4>[ 6513.708118]  [] can_create+0xe1/0x1c0
> <4>[ 6513.708129]  [] __sock_create+0x12e/0x210
> <4>[ 6513.708141]  [] SyS_socket+0x55/0xb0
> <4>[ 6513.708156]  [] do_syscall_64+0x6a/0xe0
> <4>[ 6513.708166]  [] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
> <4>[ 6513.708171] NMI backtrace for cpu 2
> <4>[ 6513.708178] CPU: 2 PID: 482 Comm: khungtaskd Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> <4>[ 6513.708180]  c9eafdd0 813f56bc
> <4>[ 6513.708188]  c9eafe00 813f9fe1 0002
> <4>[ 6513.708195]  81042d80 826120f8 c9eafe30 813fa0a3

1) I'm not sure commit b9a12601541eb55d07e00261a5112a4bc36fe7be will help here,
because this stack looks to me like someone simply does not release the mutex.
A good first step is to analyze who actually owns it.

2) Also, note that rtnl_is_locked() is used in the wrong way in one driver there
(see WILC_WFI_deinit_mon_interface()), so it may also introduce an imbalance
(if you use the driver).
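
To illustrate point 2, here is a minimal userspace sketch of how such an imbalance could arise. It is a toy, single-threaded model, not kernel code: model_mutex, model_lock(), model_unlock(), model_is_locked() and buggy_deinit() are hypothetical names, and the shape of the misuse (treating "is the lock held by anyone?" as "do I hold it?") is an assumption about the driver, not a copy of it. The only fact relied on is that rtnl_is_locked() reports whether the mutex is held at all, not who holds it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a global mutex such as rtnl.
 * All names below are hypothetical; none of them are kernel APIs. */
#define NO_OWNER 0

struct model_mutex {
    int owner; /* id of the task holding the lock, or NO_OWNER */
};

/* Like rtnl_is_locked(): true if ANY task holds the lock. */
static bool model_is_locked(const struct model_mutex *m)
{
    return m->owner != NO_OWNER;
}

/* A real mutex would sleep here; the model requires the lock to be free. */
static void model_lock(struct model_mutex *m, int task)
{
    assert(m->owner == NO_OWNER);
    m->owner = task;
}

static void model_unlock(struct model_mutex *m)
{
    m->owner = NO_OWNER;
}

/* Assumed misuse: "is it locked?" is treated as "do I hold it?". */
static void buggy_deinit(struct model_mutex *m, int task)
{
    bool rollback = false;

    if (model_is_locked(m)) {   /* may be another task's lock */
        model_unlock(m);
        rollback = true;
    }

    /* Stand-in for work that takes and releases the lock itself. */
    model_lock(m, task);
    model_unlock(m);

    if (rollback)
        model_lock(m, task);    /* "restores" a lock this task never held */
}

/* Task 1 holds the lock, then task 2 runs buggy_deinit().
 * Returns the id of the task that owns the lock afterwards. */
static int owner_after_buggy_deinit(void)
{
    struct model_mutex m = { NO_OWNER };

    model_lock(&m, 1);
    buggy_deinit(&m, 2);
    return m.owner;
}
```

In this model, task 2 ends up owning a lock it has no matching unlock for, so any later caller (for example one going through register_netdevice_notifier() and rtnl_lock()) would block indefinitely, which is consistent with the hung-task report above. Whether this is what actually happens on your system still has to be confirmed by finding the real owner.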

Kirill


Dead lock condition occured ipanic during register_netdevice_notifier call in 4.9.102

2018-05-31 Thread Mansoor, Illyas
Hi Tkhai/David,

We are facing a mutex deadlock condition that we think might be related to a fix 
that you provided in:
Merge branch 
'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' 
commit b9a12601541eb55d07e00261a5112a4bc36fe7be

We tried to backport the patch series, but got stuck because its dependencies 
are not met in the 4.9.102 kernel.
Could you please provide some pointers, so that we can fix this in the 4.9.y kernel?

Appreciate any help or pointers on this one.

Ipanic logs pasted below:

<3>[ 6513.681473] INFO: task sensors@1.0-ser:2744 blocked for more than 120 seconds.
<3>[ 6513.689723]   Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
<3>[ 6513.699108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 6513.707997] sensors@1.0-ser D0  2744  1 0x
<4>[ 6513.708007]  880223f38040 88027fc980c0  880271987000
<4>[ 6513.708024]  88026f9ae040 c9d57d40 81b363d1 81396e0b
<4>[ 6513.708032]  00ffc9d57d20 88027fc980c0 c9d57d90 88026f9ae040
<4>[ 6513.708040] Call Trace:
<4>[ 6513.708056]  [] ? __schedule+0x221/0x6e0
<4>[ 6513.708063]  [] ? sidtab_context_to_sid+0x39b/0x410
<4>[ 6513.708068]  [] schedule+0x36/0x90
<4>[ 6513.708072]  [] schedule_preempt_disabled+0x18/0x30
<4>[ 6513.708078]  [] __mutex_lock_slowpath+0x185/0x3f0
<4>[ 6513.708083]  [] mutex_lock+0x25/0x30
<4>[ 6513.708089]  [] rtnl_lock+0x15/0x20
<4>[ 6513.708095]  [] register_netdevice_notifier+0x2d/0x200
<4>[ 6513.708107]  [] raw_init+0x8b/0x90
<4>[ 6513.708118]  [] can_create+0xe1/0x1c0
<4>[ 6513.708129]  [] __sock_create+0x12e/0x210
<4>[ 6513.708141]  [] SyS_socket+0x55/0xb0
<4>[ 6513.708156]  [] do_syscall_64+0x6a/0xe0
<4>[ 6513.708166]  [] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
<4>[ 6513.708171] NMI backtrace for cpu 2
<4>[ 6513.708178] CPU: 2 PID: 482 Comm: khungtaskd Tainted: P U  W  O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
<4>[ 6513.708180]  c9eafdd0 813f56bc
<4>[ 6513.708188]  c9eafe00 813f9fe1 0002
<4>[ 6513.708195]  81042d80 826120f8 c9eafe30 813fa0a3

Thanks & Regards,
Illyas