[PATCH] drm/mgag200: fix kernel hang in cursor code.
The machine hang completely with the following message on the console: [ 487.777538] BUG: unable to handle kernel NULL pointer dereference at 0060 [ 487.777554] IP: [] _raw_spin_lock+0xe/0x30 [ 487.777557] PGD 42e9f7067 PUD 42f2fa067 PMD 0 [ 487.777560] Oops: 0002 [#1] SMP ... [ 487.777618] CPU: 21 PID: 3190 Comm: Xorg Tainted: GE 4.4.0-rc1-3-default+ #6 [ 487.777620] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015 [ 487.777621] task: 880853ae4680 ti: 8808696d4000 task.ti: 8808696d4000 [ 487.777625] RIP: 0010:[] [] _raw_spin_lock+0xe/0x30 [ 487.777627] RSP: 0018:8808696d79c0 EFLAGS: 00010246 [ 487.777628] RAX: RBX: RCX: [ 487.777629] RDX: 0001 RSI: RDI: 0060 [ 487.777630] RBP: 8808696d79e0 R08: R09: 88086924a780 [ 487.777631] R10: 0001bb40 R11: 3246 R12: [ 487.777632] R13: 880463a27360 R14: 88046ca50218 R15: 0080 [ 487.777634] FS: 7f3f81c5a8c0() GS:88086f06() knlGS: [ 487.777635] CS: 0010 DS: ES: CR0: 80050033 [ 487.777636] CR2: 0060 CR3: 00042e678000 CR4: 001406e0 [ 487.777638] DR0: DR1: DR2: [ 487.777639] DR3: DR6: fffe0ff0 DR7: 0400 [ 487.777639] Stack: [ 487.777642] a00eb5fa 8808696d7b60 88086b87d800 [ 487.777644] 8808696d7ac8 a01694b6 8808696d7ae8 8109c8d5 [ 487.777647] 880469158740 880463a27000 88086b87d800 88086b87d800 [ 487.777647] Call Trace: [ 487.777674] [] ? drm_gem_object_lookup+0x1a/0xa0 [drm] [ 487.777681] [] mga_crtc_cursor_set+0xc6/0xb60 [mgag200] [ 487.777691] [] ? find_busiest_group+0x35/0x4a0 [ 487.777696] [] ? __might_sleep+0x44/0x80 [ 487.777699] [] ? __ww_mutex_lock+0x22/0x9c [ 487.22] [] ? drm_modeset_lock+0x34/0xf0 [drm] [ 487.33] [] restore_fbdev_mode+0xee/0x2a0 [drm_kms_helper] [ 487.42] [] drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper] [ 487.48] [] drm_fb_helper_set_par+0x27/0x50 [drm_kms_helper] [ 487.52] [] fb_set_var+0x18c/0x3f0 [ 487.77] [] ? __ext4_handle_dirty_metadata+0x8a/0x210 [ext4] [ 487.83] [] fbcon_blank+0x1b7/0x2b0 [ 487.90] [] do_unblank_screen+0xb3/0x1c0 [ 487.95] [] vt_ioctl+0x118a/0x1210 [ 487.777801] [] tty_ioctl+0x3f0/0xc90 [ 487.777808] [] ? kzfree+0x28/0x30 [ 487.777813] [] ? mntput+0x1f/0x30 [ 487.777817] [] do_vfs_ioctl+0x30d/0x570 [ 487.777822] [] ? task_work_run+0x8a/0xa0 [ 487.777825] [] SyS_ioctl+0x74/0x80 [ 487.777829] [] entry_SYSCALL_64_fastpath+0x12/0x71 [ 487.777851] Code: 65 ff 0d ce 02 a8 7e 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 e8 b0 01 5d c3 0f 1f 00 65 ff 05 b1 02 a8 7e 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 4e f5 b1 ff 5d [ 487.777854] RIP [] _raw_spin_lock+0xe/0x30 [ 487.777855] RSP [ 487.777856] CR2: 0060 [ 487.777860] ---[ end trace 672a2cd555e0ebd3 ]--- The cursor code may be entered with file_priv == NULL && handle == NULL. The problem was introduced by: "bf89209 drm/mga200g: Hold a proper reference for cursor_set" which calls drm_gem_object_lookup(dev, file_priv...). Previously this wasn't a problem because we checked the handle. Move the check early in the function can fix the problem. Signed-off-by: Rui Wang --- drivers/gpu/drm/mgag200/mgag200_cursor.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/mgag200/mgag200_cursor.c b/drivers/gpu/drm/mgag200/mgag200_cursor.c index 4f2068f..a7bf6a9 100644 --- a/drivers/gpu/drm/mgag200/mgag200_cursor.c +++ b/drivers/gpu/drm/mgag200/mgag200_cursor.c @@ -70,6 +70,11 @@ int mga_crtc_cursor_set(struct drm_crtc *crtc, BUG_ON(pixels_2 != pixels_current && pixels_2 != pixels_prev); BUG_ON(pixels_current == pixels_prev); + if (!handle || !file_priv) { + mga_hide_cursor(mdev); + return 0; + } + obj = drm_gem_object_lookup(dev, file_priv, handle); if (!obj) return -ENOENT; @@ -88,12 +93,6 @@ int mga_crtc_cursor_set(struct drm_crtc *crtc, goto out_unreserve1; } - if (!handle) { - mga_hide_cursor(mdev); - ret = 0; - goto out1; - } - /* Move cursor buffers into VRAM if they aren't already */ if (!pixels_1->pin_count) { ret = mgag200_bo_pin(pixels_1, TTM_PL_FLAG_VRAM, -- 1.8.5.2
BUG: drm/mgag200 NULL pointer dereference at 0000000000000060
Hi All, Just found the following bug causing machine hang: [ 487.777538] BUG: unable to handle kernel NULL pointer dereference at 0060 [ 487.777554] IP: [] _raw_spin_lock+0xe/0x30 [ 487.777557] PGD 42e9f7067 PUD 42f2fa067 PMD 0 [ 487.777560] Oops: 0002 [#1] SMP ... [ 487.777618] CPU: 21 PID: 3190 Comm: Xorg Tainted: GE 4.4.0-rc1-3-default+ #6 [ 487.777620] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015 [ 487.777621] task: 880853ae4680 ti: 8808696d4000 task.ti: 8808696d4000 [ 487.777625] RIP: 0010:[] [] _raw_spin_lock+0xe/0x30 [ 487.777627] RSP: 0018:8808696d79c0 EFLAGS: 00010246 [ 487.777628] RAX: RBX: RCX: [ 487.777629] RDX: 0001 RSI: RDI: 0060 [ 487.777630] RBP: 8808696d79e0 R08: R09: 88086924a780 [ 487.777631] R10: 0001bb40 R11: 3246 R12: [ 487.777632] R13: 880463a27360 R14: 88046ca50218 R15: 0080 [ 487.777634] FS: 7f3f81c5a8c0() GS:88086f06() knlGS: [ 487.777635] CS: 0010 DS: ES: CR0: 80050033 [ 487.777636] CR2: 0060 CR3: 00042e678000 CR4: 001406e0 [ 487.777638] DR0: DR1: DR2: [ 487.777639] DR3: DR6: fffe0ff0 DR7: 0400 [ 487.777639] Stack: [ 487.777642] a00eb5fa 8808696d7b60 88086b87d800 [ 487.777644] 8808696d7ac8 a01694b6 8808696d7ae8 8109c8d5 [ 487.777647] 880469158740 880463a27000 88086b87d800 88086b87d800 [ 487.777647] Call Trace: [ 487.777674] [] ? drm_gem_object_lookup+0x1a/0xa0 [drm] [ 487.777681] [] mga_crtc_cursor_set+0xc6/0xb60 [mgag200] [ 487.777691] [] ? find_busiest_group+0x35/0x4a0 [ 487.777696] [] ? __might_sleep+0x44/0x80 [ 487.777699] [] ? __ww_mutex_lock+0x22/0x9c [ 487.22] [] ? drm_modeset_lock+0x34/0xf0 [drm] [ 487.33] [] restore_fbdev_mode+0xee/0x2a0 [drm_kms_helper] [ 487.42] [] drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper] [ 487.48] [] drm_fb_helper_set_par+0x27/0x50 [drm_kms_helper] [ 487.52] [] fb_set_var+0x18c/0x3f0 [ 487.77] [] ? __ext4_handle_dirty_metadata+0x8a/0x210 [ext4] [ 487.83] [] fbcon_blank+0x1b7/0x2b0 [ 487.90] [] do_unblank_screen+0xb3/0x1c0 [ 487.95] [] vt_ioctl+0x118a/0x1210 [ 487.777801] [] tty_ioctl+0x3f0/0xc90 [ 487.777808] [] ? kzfree+0x28/0x30 [ 487.777813] [] ? mntput+0x1f/0x30 [ 487.777817] [] do_vfs_ioctl+0x30d/0x570 [ 487.777822] [] ? task_work_run+0x8a/0xa0 [ 487.777825] [] SyS_ioctl+0x74/0x80 [ 487.777829] [] entry_SYSCALL_64_fastpath+0x12/0x71 [ 487.777851] Code: 65 ff 0d ce 02 a8 7e 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 e8 b0 01 5d c3 0f 1f 00 65 ff 05 b1 02 a8 7e 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 4e f5 b1 ff 5d [ 487.777854] RIP [] _raw_spin_lock+0xe/0x30 [ 487.777855] RSP [ 487.777856] CR2: 0060 [ 487.777860] ---[ end trace 672a2cd555e0ebd3 ]--- Analysis: The faulting instruction is at _raw_spin_lock+0xe/0x30, which is: <_raw_spin_lock+14>: lock cmpxchg %edx,(%rdi) It is because rdi is an invalid pointer:0060 The source code: drm_gem_object_lookup(struct drm_device *dev, struct drm_file *filp, u32 handle) { struct drm_gem_object *obj; spin_lock(>table_lock); <== faulting In assembly: : push %rbp: lea0x60(%rsi),%rdi <== rdi=rsi+0x60, so rsi==NULL : mov%rsp,%rbp : push %r12 : mov%edx,%r12d : push %rbx : mov%rsi,%rbx : sub$0x8,%rsp : callq 0x8158aae0 <_raw_spin_lock> conclusion: %rsi, the second argument of drm_gem_object_lookup(), filp == NULL. I'll send a patch in a separate Email. Regards, Rui
[PATCH] drm: fb helper should avoid sleeping in panic context
Hi Rob, Yes it's exactly what I'm doing. Please scroll down and review my patch. Thanks Rui On 12/10/14, Rob Clark wrote: > perhaps fb helpers could use __drm_modeset_lock_all() w/ trylock=true > in panic context? > > BR, > -R > > On Tue, Dec 9, 2014 at 7:09 PM, rui wang wrote: >> Hi All, >> >> Any comment ? Or any better idea how this should be fixed? >> >> Regards, >> Rui >> >> -- Forwarded message -- >> From: ruiv.wang at gmail.com >> Date: Thu, 4 Dec 2014 22:11:35 +0800 >> Subject: [PATCH] drm: fb helper should avoid sleeping in panic context >> To: airlied at redhat.com, daniel.vetter at ffwll.ch, tony.luck at intel.com, >> bp at alien8.de, aris at redhat.com, rui.y.wang at intel.com >> Cc: linux-kernel at vger.kernel.org >> >> From: Rui Wang >> >> There are still some places in the fb helper that need to avoid >> sleeping in panic context. Here's an example: >> >> [ 65.615496] bad: scheduling from the idle thread! >> [ 65.620747] CPU: 92 PID: 0 Comm: swapper/92 Tainted: G ME >> 3.18.0-rc4-7-default+ #20 >> >> [ 65.630364] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, >> BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014 >> [ 65.641923] 88087f693d80 88087f689878 81566db9 >> >> [ 65.650226] 88087f693d80 88087f689898 810871ff >> 88046eb3e0d0 >> [ 65.658527] 88087f693d80 88087f6898c8 8107c1fa >> 00017f6898b8 >> [ 65.666830] Call Trace: >> [ 65.669557] <#MC> [] dump_stack+0x46/0x58 >> [ 65.675994] [] dequeue_task_idle+0x2f/0x40 >> [ 65.682412] [] dequeue_task+0x5a/0x80 >> [ 65.688345] [] deactivate_task+0x23/0x30 >> [ 65.694569] [] __schedule+0x580/0x7f0 >> [ 65.700502] [] schedule_preempt_disabled+0x29/0x70 >> [ 65.707696] [] __ww_mutex_lock_slowpath+0xb8/0x162 >> [ 65.714891] [] __ww_mutex_lock+0x53/0x85 >> [ 65.721125] [] drm_modeset_lock+0x3d/0x110 [drm] >> [ 65.728132] [] __drm_modeset_lock_all+0x8a/0x120 >> [drm] >> [ 65.735721] [] drm_modeset_lock_all+0x10/0x30 [drm] >> [ 65.743015] [] >> drm_fb_helper_pan_display+0x2f/0xf0 [drm_kms_helper] >> [ 65.751857] [] fb_pan_display+0xd1/0x1a0 >> [ 65.758081] [] bit_update_start+0x20/0x50 >> [ 65.764400] [] fbcon_switch+0x3a2/0x550 >> [ 65.770528] [] redraw_screen+0x189/0x240 >> [ 65.776750] [] fbcon_blank+0x20a/0x2d0 >> [ 65.782778] [] ? erst_writer+0x209/0x330 >> [ 65.789002] [] ? internal_add_timer+0x63/0x80 >> [ 65.795710] [] ? mod_timer+0x127/0x1e0 >> [ 65.801740] [] do_unblank_screen+0xa8/0x1d0 >> [ 65.808255] [] unblank_screen+0x10/0x20 >> [ 65.814381] [] bust_spinlocks+0x19/0x40 >> [ 65.820508] [] panic+0x106/0x1f5 >> [ 65.825955] [] mce_panic+0x2ac/0x2e0 >> [ 65.831789] [] ? delay_tsc+0x4a/0x80 >> [ 65.837625] [] do_machine_check+0xbaf/0xbf0 >> [ 65.844138] [] ? intel_idle+0xc7/0x150 >> [ 65.850166] [] machine_check+0x1f/0x30 >> [ 65.856195] [] ? intel_idle+0xc7/0x150 >> [ 65.86] <> [] >> cpuidle_enter_state+0x55/0x170 >> [ 65.869823] [] cpuidle_enter+0x17/0x20 >> [ 65.875852] [] cpu_startup_entry+0x2d8/0x370 >> [ 65.882467] [] start_secondary+0x159/0x180 >> >> There's __drm_modeset_lock_all() which Daniel Vetter introduced for this >> purpose. We can leverage that without reinventing anything. This patch >> works with the latest kernel. >> >> Signed-off-by: Rui Wang >> --- >> drivers/gpu/drm/drm_fb_helper.c |8 ++-- >> 1 files changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_fb_helper.c >> b/drivers/gpu/drm/drm_fb_helper.c >> index 0c0c39b..70dd2f4 100644 >> --- a/drivers/gpu/drm/drm_fb_helper.c >> +++ b/drivers/gpu/drm/drm_fb_helper.c >> @@ -732,7 +732,9 @@ int drm_fb_helper_setcmap(struct fb_cmap *cmap, >> struct fb_info *info) >> int i, j, rc = 0; >> int start; >> >> - drm_modeset_lock_all(dev); >> + if (__drm_modeset_lock_all(dev, !!oops_in_progress)) { >> + return -EBUSY; >> + } >> if (!drm_fb_helper_is_bound(fb_helper)) { >> drm_modeset_unlock_all(dev); >> return -EBUSY; >> @@ -910,7 +912,9 @@ int drm_fb_helper_pan_display(struct fb_var_screeninfo >> *var, >> int ret = 0; >> int i; >> >> - drm_modeset_lock_all(
[PATCH] drm: fb helper should avoid sleeping in panic context
Hi All, Any comment ? Or any better idea how this should be fixed? Regards, Rui -- Forwarded message -- From: ruiv.w...@gmail.com Date: Thu, 4 Dec 2014 22:11:35 +0800 Subject: [PATCH] drm: fb helper should avoid sleeping in panic context To: airlied at redhat.com, daniel.vetter at ffwll.ch, tony.luck at intel.com, bp at alien8.de, aris at redhat.com, rui.y.wang at intel.com Cc: linux-kernel at vger.kernel.org From: Rui Wang <rui.y.w...@intel.com> There are still some places in the fb helper that need to avoid sleeping in panic context. Here's an example: [ 65.615496] bad: scheduling from the idle thread! [ 65.620747] CPU: 92 PID: 0 Comm: swapper/92 Tainted: G ME 3.18.0-rc4-7-default+ #20 [ 65.630364] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014 [ 65.641923] 88087f693d80 88087f689878 81566db9 [ 65.650226] 88087f693d80 88087f689898 810871ff 88046eb3e0d0 [ 65.658527] 88087f693d80 88087f6898c8 8107c1fa 00017f6898b8 [ 65.666830] Call Trace: [ 65.669557] <#MC> [] dump_stack+0x46/0x58 [ 65.675994] [] dequeue_task_idle+0x2f/0x40 [ 65.682412] [] dequeue_task+0x5a/0x80 [ 65.688345] [] deactivate_task+0x23/0x30 [ 65.694569] [] __schedule+0x580/0x7f0 [ 65.700502] [] schedule_preempt_disabled+0x29/0x70 [ 65.707696] [] __ww_mutex_lock_slowpath+0xb8/0x162 [ 65.714891] [] __ww_mutex_lock+0x53/0x85 [ 65.721125] [] drm_modeset_lock+0x3d/0x110 [drm] [ 65.728132] [] __drm_modeset_lock_all+0x8a/0x120 [drm] [ 65.735721] [] drm_modeset_lock_all+0x10/0x30 [drm] [ 65.743015] [] drm_fb_helper_pan_display+0x2f/0xf0 [drm_kms_helper] [ 65.751857] [] fb_pan_display+0xd1/0x1a0 [ 65.758081] [] bit_update_start+0x20/0x50 [ 65.764400] [] fbcon_switch+0x3a2/0x550 [ 65.770528] [] redraw_screen+0x189/0x240 [ 65.776750] [] fbcon_blank+0x20a/0x2d0 [ 65.782778] [] ? erst_writer+0x209/0x330 [ 65.789002] [] ? internal_add_timer+0x63/0x80 [ 65.795710] [] ? mod_timer+0x127/0x1e0 [ 65.801740] [] do_unblank_screen+0xa8/0x1d0 [ 65.808255] [] unblank_screen+0x10/0x20 [ 65.814381] [] bust_spinlocks+0x19/0x40 [ 65.820508] [] panic+0x106/0x1f5 [ 65.825955] [] mce_panic+0x2ac/0x2e0 [ 65.831789] [] ? delay_tsc+0x4a/0x80 [ 65.837625] [] do_machine_check+0xbaf/0xbf0 [ 65.844138] [] ? intel_idle+0xc7/0x150 [ 65.850166] [] machine_check+0x1f/0x30 [ 65.856195] [] ? intel_idle+0xc7/0x150 [ 65.86] <> [] cpuidle_enter_state+0x55/0x170 [ 65.869823] [] cpuidle_enter+0x17/0x20 [ 65.875852] [] cpu_startup_entry+0x2d8/0x370 [ 65.882467] [] start_secondary+0x159/0x180 There's __drm_modeset_lock_all() which Daniel Vetter introduced for this purpose. We can leverage that without reinventing anything. This patch works with the latest kernel. Signed-off-by: Rui Wang --- drivers/gpu/drm/drm_fb_helper.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 0c0c39b..70dd2f4 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -732,7 +732,9 @@ int drm_fb_helper_setcmap(struct fb_cmap *cmap, struct fb_info *info) int i, j, rc = 0; int start; - drm_modeset_lock_all(dev); + if (__drm_modeset_lock_all(dev, !!oops_in_progress)) { + return -EBUSY; + } if (!drm_fb_helper_is_bound(fb_helper)) { drm_modeset_unlock_all(dev); return -EBUSY; @@ -910,7 +912,9 @@ int drm_fb_helper_pan_display(struct fb_var_screeninfo *var, int ret = 0; int i; - drm_modeset_lock_all(dev); + if (__drm_modeset_lock_all(dev, !!oops_in_progress)) { + return -EBUSY; + } if (!drm_fb_helper_is_bound(fb_helper)) { drm_modeset_unlock_all(dev); return -EBUSY; -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/