Re: [PATCH] drm/vkms: Fix soft lockup.

2020-07-31 Thread daniel
On Wed, Jul 29, 2020 at 02:23:15AM +, Xu Qiang wrote:
> A soft lockup occurs when hrtimer_cancel is called in softirq context
> under the following conditions:
> 
>   a) The CPU clock frequency of the machine is very low
>   b) output->period_ns is very small, possibly only 1 ns
> 
> The problem can be solved as follows: set an hrtimer exit flag
> in the vkms_disable_vblank function and check the flag in the
> vkms_vblank_simulate function. If the flag is set, the hrtimer is not added
> to the hrtimer queue again, so the hrtimer_cancel function can exit quickly
> without locking up.
> 
> watchdog: BUG: soft lockup - CPU#2 stuck for 134s! [syz-executor.2:18027]
> Modules linked in:
> CPU: 2 PID: 18027 Comm: syz-executor.2 Tainted: GW 5.8.0-rc2-csan #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline]
> RIP: 0010:smp_call_function_many_cond+0x40c/0x470 kernel/smp.c:555
> RSP: 0018:c900028d3a88 EFLAGS: 0297
> RAX: 0002 RBX: 888237cacc80 RCX: 813240fc
> RDX: 0001 RSI:  RDI: 0005
> RBP: 0001 R08: 8881f497d000 R09: 
> R10:  R11: 0100 R12: 888237cf6fc0
> R13: 0003 R14: 888237cacc88 R15: 0001
> FS:  7fa6314a9700() GS:888237c8() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00513c80 CR3: 0001e3df3002 CR4: 000626e0
> Call Trace:
>  smp_call_function_many kernel/smp.c:577 [inline]
>  smp_call_function+0x40/0x80 kernel/smp.c:599
>  on_each_cpu+0x2a/0xc0 kernel/smp.c:717
>  __purge_vmap_area_lazy+0x7c/0xbc0 mm/vmalloc.c:1367
>  _vm_unmap_aliases.part.0+0x126/0x180 mm/vmalloc.c:1800
>  _vm_unmap_aliases mm/vmalloc.c:1769 [inline]
>  vm_unmap_aliases+0x2f/0x40 mm/vmalloc.c:1823
>  change_page_attr_set_clr+0x10a/0x4a0 arch/x86/mm/pat/set_memory.c:1732
>  change_page_attr_clear arch/x86/mm/pat/set_memory.c:1789 [inline]
>  set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1935
>  bpf_jit_binary_lock_ro include/linux/filter.h:815 [inline]
>  bpf_int_jit_compile+0x54f/0x663 arch/x86/net/bpf_jit_comp.c:1929
>  bpf_prog_select_runtime+0x1cc/0x2c0 kernel/bpf/core.c:1807
>  bpf_migrate_filter net/core/filter.c:1264 [inline]
>  bpf_prepare_filter net/core/filter.c:1312 [inline]
>  bpf_prepare_filter+0x518/0x620 net/core/filter.c:1278
>  __get_filter+0x107/0x160 net/core/filter.c:1481
>  sk_attach_filter+0x19/0xa0 net/core/filter.c:1496
>  sock_setsockopt+0x1208/0x1230 net/core/sock.c:1080
>  __sys_setsockopt+0x248/0x270 net/socket.c:2123
>  __do_sys_setsockopt net/socket.c:2143 [inline]
>  __se_sys_setsockopt net/socket.c:2140 [inline]
>  __x64_sys_setsockopt+0x22/0x30 net/socket.c:2140
>  do_syscall_64+0x48/0xb0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Sending NMI from CPU 2 to CPUs 0-1,3-5:
> NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
> NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
> NMI backtrace for cpu 1
> NMI backtrace for cpu 3
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: GW 5.8.0-rc2-csan #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
> RIP: 0010:preempt_count_sub+0xa/0x90 kernel/sched/core.c:3874
> RSP: 0018:c912cdd0 EFLAGS: 0046
> RAX: 0001 RBX: 88822ecb8fb0 RCX: 
> RDX:  RSI: 0086 RDI: 0001
> RBP: 888237d5edc0 R08: 888236ddc000 R09: 
> R10:  R11:  R12: 
> R13: 0086 R14: 888237d5edc0 R15: 85bc6d60
> FS:  () GS:888237cc() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f35cc47adb8 CR3: 05a23006 CR4: 000626e0
> Call Trace:
>  
>  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
>  _raw_spin_unlock_irqrestore+0x2f/0x50 kernel/locking/spinlock.c:191
>  unlock_hrtimer_base kernel/time/hrtimer.c:898 [inline]
>  hrtimer_try_to_cancel kernel/time/hrtimer.c:1171 [inline]
>  hrtimer_try_to_cancel+0xbd/0x1b0 kernel/time/hrtimer.c:1151
>  hrtimer_cancel+0x13/0x40 kernel/time/hrtimer.c:1278
>  __disable_vblank drivers/gpu/drm/drm_vblank.c:429 [inline]
>  drm_vblank_disable_and_save+0x122/0x140 drivers/gpu/drm/drm_vblank.c:470
>  vblank_disable_fn+0x96/0xa0 drivers/gpu/drm/drm_vblank.c:487
>  call_timer_fn+0x3a/0x230 kernel/time/timer.c:1404
>  expire_timers kernel/time/timer.c:1449 [inline]
>  __run_timers kernel/time/timer.c:1773 [inline]
>  __run_timers kernel/time/timer.c:1740 [inline]
>  run_timer_softirq+0x2b8/0x840 

[PATCH] drm/vkms: Fix soft lockup.

2020-07-28 Thread Xu Qiang
A soft lockup occurs when hrtimer_cancel is called in softirq context
under the following conditions:

a) The CPU clock frequency of the machine is very low
b) output->period_ns is very small, possibly only 1 ns

The problem can be solved as follows: set an hrtimer exit flag
in the vkms_disable_vblank function and check the flag in the
vkms_vblank_simulate function. If the flag is set, the hrtimer is not added
to the hrtimer queue again, so the hrtimer_cancel function can exit quickly
without locking up.
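
The exit-flag idea described above can be illustrated with a small
userspace model. This is only a sketch and not the patch itself: it does
not use the kernel hrtimer API, and every name in it (sim_output,
vblank_exit, sim_vblank_simulate, sim_disable_vblank, sim_cancel) is
hypothetical. It assumes a canceller that can only finish once the timer
callback stops re-arming itself, which roughly mirrors the situation
reported here when the period is tiny.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's HRTIMER_NORESTART / HRTIMER_RESTART. */
enum sim_restart { SIM_NORESTART, SIM_RESTART };

/* Hypothetical per-output state; vblank_exit is an assumed field name. */
struct sim_output {
	bool vblank_exit;   /* set by the disable path, read by the callback */
	int callbacks_run;  /* how many times the simulated callback fired */
};

/* Models vkms_vblank_simulate: re-arm unless the exit flag is set. */
static enum sim_restart sim_vblank_simulate(struct sim_output *out)
{
	out->callbacks_run++;
	return out->vblank_exit ? SIM_NORESTART : SIM_RESTART;
}

/* Models vkms_disable_vblank setting the flag before cancelling. */
static void sim_disable_vblank(struct sim_output *out)
{
	out->vblank_exit = true;
}

/*
 * Models a canceller that can only succeed once the callback stops
 * re-arming. Returns the number of callback runs it waited through,
 * or -1 if it gave up after max_spins (the simulated lockup).
 */
static int sim_cancel(struct sim_output *out, int max_spins)
{
	assert(max_spins > 0);
	for (int spins = 1; spins <= max_spins; spins++) {
		if (sim_vblank_simulate(out) == SIM_NORESTART)
			return spins;
	}
	return -1;
}
```

In this model, calling sim_cancel on an output whose flag is clear
exhausts max_spins and returns -1, while calling sim_disable_vblank first
lets sim_cancel return after a single callback run.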

watchdog: BUG: soft lockup - CPU#2 stuck for 134s! [syz-executor.2:18027]
Modules linked in:
CPU: 2 PID: 18027 Comm: syz-executor.2 Tainted: GW 5.8.0-rc2-csan #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline]
RIP: 0010:smp_call_function_many_cond+0x40c/0x470 kernel/smp.c:555
RSP: 0018:c900028d3a88 EFLAGS: 0297
RAX: 0002 RBX: 888237cacc80 RCX: 813240fc
RDX: 0001 RSI:  RDI: 0005
RBP: 0001 R08: 8881f497d000 R09: 
R10:  R11: 0100 R12: 888237cf6fc0
R13: 0003 R14: 888237cacc88 R15: 0001
FS:  7fa6314a9700() GS:888237c8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00513c80 CR3: 0001e3df3002 CR4: 000626e0
Call Trace:
 smp_call_function_many kernel/smp.c:577 [inline]
 smp_call_function+0x40/0x80 kernel/smp.c:599
 on_each_cpu+0x2a/0xc0 kernel/smp.c:717
 __purge_vmap_area_lazy+0x7c/0xbc0 mm/vmalloc.c:1367
 _vm_unmap_aliases.part.0+0x126/0x180 mm/vmalloc.c:1800
 _vm_unmap_aliases mm/vmalloc.c:1769 [inline]
 vm_unmap_aliases+0x2f/0x40 mm/vmalloc.c:1823
 change_page_attr_set_clr+0x10a/0x4a0 arch/x86/mm/pat/set_memory.c:1732
 change_page_attr_clear arch/x86/mm/pat/set_memory.c:1789 [inline]
 set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1935
 bpf_jit_binary_lock_ro include/linux/filter.h:815 [inline]
 bpf_int_jit_compile+0x54f/0x663 arch/x86/net/bpf_jit_comp.c:1929
 bpf_prog_select_runtime+0x1cc/0x2c0 kernel/bpf/core.c:1807
 bpf_migrate_filter net/core/filter.c:1264 [inline]
 bpf_prepare_filter net/core/filter.c:1312 [inline]
 bpf_prepare_filter+0x518/0x620 net/core/filter.c:1278
 __get_filter+0x107/0x160 net/core/filter.c:1481
 sk_attach_filter+0x19/0xa0 net/core/filter.c:1496
 sock_setsockopt+0x1208/0x1230 net/core/sock.c:1080
 __sys_setsockopt+0x248/0x270 net/socket.c:2123
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0x22/0x30 net/socket.c:2140
 do_syscall_64+0x48/0xb0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sending NMI from CPU 2 to CPUs 0-1,3-5:
NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
NMI backtrace for cpu 1
NMI backtrace for cpu 3
CPU: 3 PID: 0 Comm: swapper/3 Tainted: GW 5.8.0-rc2-csan #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:preempt_count_sub+0xa/0x90 kernel/sched/core.c:3874
RSP: 0018:c912cdd0 EFLAGS: 0046
RAX: 0001 RBX: 88822ecb8fb0 RCX: 
RDX:  RSI: 0086 RDI: 0001
RBP: 888237d5edc0 R08: 888236ddc000 R09: 
R10:  R11:  R12: 
R13: 0086 R14: 888237d5edc0 R15: 85bc6d60
FS:  () GS:888237cc() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f35cc47adb8 CR3: 05a23006 CR4: 000626e0
Call Trace:
 
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
 _raw_spin_unlock_irqrestore+0x2f/0x50 kernel/locking/spinlock.c:191
 unlock_hrtimer_base kernel/time/hrtimer.c:898 [inline]
 hrtimer_try_to_cancel kernel/time/hrtimer.c:1171 [inline]
 hrtimer_try_to_cancel+0xbd/0x1b0 kernel/time/hrtimer.c:1151
 hrtimer_cancel+0x13/0x40 kernel/time/hrtimer.c:1278
 __disable_vblank drivers/gpu/drm/drm_vblank.c:429 [inline]
 drm_vblank_disable_and_save+0x122/0x140 drivers/gpu/drm/drm_vblank.c:470
 vblank_disable_fn+0x96/0xa0 drivers/gpu/drm/drm_vblank.c:487
 call_timer_fn+0x3a/0x230 kernel/time/timer.c:1404
 expire_timers kernel/time/timer.c:1449 [inline]
 __run_timers kernel/time/timer.c:1773 [inline]
 __run_timers kernel/time/timer.c:1740 [inline]
 run_timer_softirq+0x2b8/0x840 kernel/time/timer.c:1786
 __do_softirq+0x118/0x344 kernel/softirq.c:292
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond