Re: [PATCH] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Ding Tianhong
On 2016/6/28 3:50, Cong Wang wrote:
> On Fri, Jun 24, 2016 at 7:46 PM, Ding Tianhong wrote:
>> diff --git a/kernel/notifier.c b/kernel/notifier.c
>> index fd2c9ac..9c30411 100644
>> --- a/kernel/notifier.c
>> +++ b/kernel/notifier.c
>> @@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
>>  #endif
>> 		ret = nb->notifier_call(nb, val, v);
>>
>> +		cond_resched();
>> +
>> 		if (nr_calls)
>> 			(*nr_calls)++;
> 
> NAK.
> 
> You can't do a resched in atomic context in __atomic_notifier_call_chain().
> 
> 
Sorry, I missed this. I think adding touch_nmi_watchdog() looks like the best
solution for this problem.

Thanks
Ding






Re: [PATCH] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Cong Wang
On Fri, Jun 24, 2016 at 7:46 PM, Ding Tianhong wrote:
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index fd2c9ac..9c30411 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
>  #endif
> 		ret = nb->notifier_call(nb, val, v);
>
> +		cond_resched();
> +
> 		if (nr_calls)
> 			(*nr_calls)++;

NAK.

You can't do a resched in atomic context in __atomic_notifier_call_chain().


[PATCH] notifier: Fix soft lockup for notifier_call_chain().

2016-06-24 Thread Ding Tianhong
The problem occurs on my system when a lot of drivers register
their own handlers on the netdev_chain notifier call chain, and
I then create 4095 vlan devices on one NIC and add several IPv6 addresses
to each of them, like this:

for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done

ifconfig eth0 up
ifconfig eth0 down

Then the system stalls for several seconds and a soft lockup occurs:

<0>[ 7620.364058]NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ifconfig:19186]
<0>[ 7620.364592]Call trace:
<4>[ 7620.364599][] dump_backtrace+0x0/0x220
<4>[ 7620.364603][] show_stack+0x20/0x28
<4>[ 7620.364607][] dump_stack+0x90/0xb0
<4>[ 7620.364612][] watchdog_timer_fn+0x41c/0x460
<4>[ 7620.364617][] __run_hrtimer+0x98/0x2d8
<4>[ 7620.364620][] hrtimer_interrupt+0x110/0x288
<4>[ 7620.364624][] arch_timer_handler_phys+0x38/0x48
<4>[ 7620.364628][] handle_percpu_devid_irq+0x9c/0x190
<4>[ 7620.364632][] generic_handle_irq+0x40/0x58
<4>[ 7620.364635][] __handle_domain_irq+0x68/0xc0
<4>[ 7620.364638][] gic_handle_irq+0xc4/0x1c8
<4>[ 7620.364641]Exception stack(0xffc0309b3640 to 0xffc0309b3770)
<4>[ 7620.364644]3640: 1000  ffc0309b37c0 ffbfa1019cf8
<4>[ 7620.364647]3660: 8145 ffc0309b3958  ffbfa1013008
<4>[ 7620.364651]3680: 07f0 ffbfa131b770 ffd08aaadc40 ffbfa1019cf8
<4>[ 7620.364654]36a0: ffbfa1019cc4 ffd089c2b000 ffd08eff8000 ffc0309b3958
<4>[ 7620.364656]36c0: ffbfa101c5c0   ffbfa101c66c
<4>[ 7620.364659]36e0: 7f7f7f7f7f7f7f7f 0030
<4>[ 7620.364662]3700:   ffc000393d58 007f794d67b0
<4>[ 7620.364665]3720: 007fe62215d0 ffc0309b3830 ffc00021d8e0 ffbfa1049b68
<4>[ 7620.364668]3740: ffc000697578 ffc0006974b8 ffc0309b3958
<4>[ 7620.364670]3760: ffbfa1013008 07f0
<4>[ 7620.364673][] el1_irq+0x80/0x100
<4>[ 7620.364692][] fib6_walk+0x3c/0x70 [ipv6]
<4>[ 7620.364710][] fib6_clean_tree+0x68/0x90 [ipv6]
<4>[ 7620.364727][] __fib6_clean_all+0x88/0xc0 [ipv6]
<4>[ 7620.364746][] fib6_clean_all+0x28/0x30 [ipv6]
<4>[ 7620.364763][] rt6_ifdown+0x64/0x148 [ipv6]
<4>[ 7620.364781][] addrconf_ifdown+0x68/0x540 [ipv6]
<4>[ 7620.364798][] addrconf_notify+0xd0/0x8b8 [ipv6]
<4>[ 7620.364801][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364804][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364809][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364812][] dev_close_many+0xd0/0x138
<4>[ 7620.364821][] vlan_device_event+0x4a8/0x6a0 [8021q]
<4>[ 7620.364824][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364827][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364830][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364833][] __dev_notify_flags+0xb8/0xe0
<4>[ 7620.364836][] dev_change_flags+0x54/0x68
<4>[ 7620.364840][] devinet_ioctl+0x650/0x700
<4>[ 7620.364843][] inet_ioctl+0xa4/0xc8
<4>[ 7620.364847][] sock_do_ioctl+0x44/0x88
<4>[ 7620.364850][] sock_ioctl+0x23c/0x308
<4>[ 7620.364854][] do_vfs_ioctl+0x48c/0x620
<4>[ 7620.364857][] SyS_ioctl+0x94/0xa8

=cut here

It looks like notifier_call_chain() has to deal with too many handlers and does
not feed the watchdog until it finishes the work, so add cond_resched() in the
loop to fix this problem. With this change the system no longer panics.

Signed-off-by: Ding Tianhong 
---
 kernel/notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index fd2c9ac..9c30411 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
 #endif
 		ret = nb->notifier_call(nb, val, v);
 
+		cond_resched();
+
 		if (nr_calls)
 			(*nr_calls)++;
 
-- 
1.9.0



