[PATCH] drm/mgag200: fix kernel hang in cursor code.

2015-11-18 Thread Rui Wang
The machine hang completely with the following message on the console:

[  487.777538] BUG: unable to handle kernel NULL pointer dereference at 
0060
[  487.777554] IP: [] _raw_spin_lock+0xe/0x30
[  487.777557] PGD 42e9f7067 PUD 42f2fa067 PMD 0
[  487.777560] Oops: 0002 [#1] SMP
...
[  487.777618] CPU: 21 PID: 3190 Comm: Xorg Tainted: GE   
4.4.0-rc1-3-default+ #6
[  487.777620] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[  487.777621] task: 880853ae4680 ti: 8808696d4000 task.ti: 
8808696d4000
[  487.777625] RIP: 0010:[]  [] 
_raw_spin_lock+0xe/0x30
[  487.777627] RSP: 0018:8808696d79c0  EFLAGS: 00010246
[  487.777628] RAX:  RBX:  RCX: 
[  487.777629] RDX: 0001 RSI:  RDI: 0060
[  487.777630] RBP: 8808696d79e0 R08:  R09: 88086924a780
[  487.777631] R10: 0001bb40 R11: 3246 R12: 
[  487.777632] R13: 880463a27360 R14: 88046ca50218 R15: 0080
[  487.777634] FS:  7f3f81c5a8c0() GS:88086f06() 
knlGS:
[  487.777635] CS:  0010 DS:  ES:  CR0: 80050033
[  487.777636] CR2: 0060 CR3: 00042e678000 CR4: 001406e0
[  487.777638] DR0:  DR1:  DR2: 
[  487.777639] DR3:  DR6: fffe0ff0 DR7: 0400
[  487.777639] Stack:
[  487.777642]  a00eb5fa 8808696d7b60 88086b87d800 

[  487.777644]  8808696d7ac8 a01694b6 8808696d7ae8 
8109c8d5
[  487.777647]  880469158740 880463a27000 88086b87d800 
88086b87d800
[  487.777647] Call Trace:
[  487.777674]  [] ? drm_gem_object_lookup+0x1a/0xa0 [drm]
[  487.777681]  [] mga_crtc_cursor_set+0xc6/0xb60 [mgag200]
[  487.777691]  [] ? find_busiest_group+0x35/0x4a0
[  487.777696]  [] ? __might_sleep+0x44/0x80
[  487.777699]  [] ? __ww_mutex_lock+0x22/0x9c
[  487.22]  [] ? drm_modeset_lock+0x34/0xf0 [drm]
[  487.33]  [] restore_fbdev_mode+0xee/0x2a0 
[drm_kms_helper]
[  487.42]  [] 
drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
[  487.48]  [] drm_fb_helper_set_par+0x27/0x50 
[drm_kms_helper]
[  487.52]  [] fb_set_var+0x18c/0x3f0
[  487.77]  [] ? __ext4_handle_dirty_metadata+0x8a/0x210 
[ext4]
[  487.83]  [] fbcon_blank+0x1b7/0x2b0
[  487.90]  [] do_unblank_screen+0xb3/0x1c0
[  487.95]  [] vt_ioctl+0x118a/0x1210
[  487.777801]  [] tty_ioctl+0x3f0/0xc90
[  487.777808]  [] ? kzfree+0x28/0x30
[  487.777813]  [] ? mntput+0x1f/0x30
[  487.777817]  [] do_vfs_ioctl+0x30d/0x570
[  487.777822]  [] ? task_work_run+0x8a/0xa0
[  487.777825]  [] SyS_ioctl+0x74/0x80
[  487.777829]  [] entry_SYSCALL_64_fastpath+0x12/0x71
[  487.777851] Code: 65 ff 0d ce 02 a8 7e 5d c3 ba 01 00 00 00 f0 0f b1 17 85 
c0 75 e8 b0 01 5d c3 0f 1f 00 65 ff 05 b1 02 a8 7e 31 c0 ba 01 00 00 00  0f 
b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 4e f5 b1 ff 5d
[  487.777854] RIP  [] _raw_spin_lock+0xe/0x30
[  487.777855]  RSP 
[  487.777856] CR2: 0060
[  487.777860] ---[ end trace 672a2cd555e0ebd3 ]---

The cursor code may be entered with file_priv == NULL && handle == NULL.
The problem was introduced by:

"bf89209 drm/mga200g: Hold a proper reference for cursor_set"

which calls drm_gem_object_lookup(dev, file_priv...). Previously this wasn't
a problem because we checked the handle. Move the check early in the function
can fix the problem.

Signed-off-by: Rui Wang 
---
 drivers/gpu/drm/mgag200/mgag200_cursor.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_cursor.c 
b/drivers/gpu/drm/mgag200/mgag200_cursor.c
index 4f2068f..a7bf6a9 100644
--- a/drivers/gpu/drm/mgag200/mgag200_cursor.c
+++ b/drivers/gpu/drm/mgag200/mgag200_cursor.c
@@ -70,6 +70,11 @@ int mga_crtc_cursor_set(struct drm_crtc *crtc,
BUG_ON(pixels_2 != pixels_current && pixels_2 != pixels_prev);
BUG_ON(pixels_current == pixels_prev);

+   if (!handle || !file_priv) {
+   mga_hide_cursor(mdev);
+   return 0;
+   }
+
obj = drm_gem_object_lookup(dev, file_priv, handle);
if (!obj)
return -ENOENT;
@@ -88,12 +93,6 @@ int mga_crtc_cursor_set(struct drm_crtc *crtc,
goto out_unreserve1;
}

-   if (!handle) {
-   mga_hide_cursor(mdev);
-   ret = 0;
-   goto out1;
-   }
-
/* Move cursor buffers into VRAM if they aren't already */
if (!pixels_1->pin_count) {
ret = mgag200_bo_pin(pixels_1, TTM_PL_FLAG_VRAM,
-- 
1.8.5.2



BUG: drm/mgag200 NULL pointer dereference at 0000000000000060

2015-11-18 Thread Rui Wang
Hi All,
Just found the following bug causing machine hang:

[  487.777538] BUG: unable to handle kernel NULL pointer dereference at 
0060
[  487.777554] IP: [] _raw_spin_lock+0xe/0x30
[  487.777557] PGD 42e9f7067 PUD 42f2fa067 PMD 0
[  487.777560] Oops: 0002 [#1] SMP
...
[  487.777618] CPU: 21 PID: 3190 Comm: Xorg Tainted: GE   
4.4.0-rc1-3-default+ #6
[  487.777620] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[  487.777621] task: 880853ae4680 ti: 8808696d4000 task.ti: 
8808696d4000
[  487.777625] RIP: 0010:[]  [] 
_raw_spin_lock+0xe/0x30
[  487.777627] RSP: 0018:8808696d79c0  EFLAGS: 00010246
[  487.777628] RAX:  RBX:  RCX: 
[  487.777629] RDX: 0001 RSI:  RDI: 0060
[  487.777630] RBP: 8808696d79e0 R08:  R09: 88086924a780
[  487.777631] R10: 0001bb40 R11: 3246 R12: 
[  487.777632] R13: 880463a27360 R14: 88046ca50218 R15: 0080
[  487.777634] FS:  7f3f81c5a8c0() GS:88086f06() 
knlGS:
[  487.777635] CS:  0010 DS:  ES:  CR0: 80050033
[  487.777636] CR2: 0060 CR3: 00042e678000 CR4: 001406e0
[  487.777638] DR0:  DR1:  DR2: 
[  487.777639] DR3:  DR6: fffe0ff0 DR7: 0400
[  487.777639] Stack:
[  487.777642]  a00eb5fa 8808696d7b60 88086b87d800 

[  487.777644]  8808696d7ac8 a01694b6 8808696d7ae8 
8109c8d5
[  487.777647]  880469158740 880463a27000 88086b87d800 
88086b87d800
[  487.777647] Call Trace:
[  487.777674]  [] ? drm_gem_object_lookup+0x1a/0xa0 [drm]
[  487.777681]  [] mga_crtc_cursor_set+0xc6/0xb60 [mgag200]
[  487.777691]  [] ? find_busiest_group+0x35/0x4a0
[  487.777696]  [] ? __might_sleep+0x44/0x80
[  487.777699]  [] ? __ww_mutex_lock+0x22/0x9c
[  487.22]  [] ? drm_modeset_lock+0x34/0xf0 [drm]
[  487.33]  [] restore_fbdev_mode+0xee/0x2a0 
[drm_kms_helper]
[  487.42]  [] 
drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
[  487.48]  [] drm_fb_helper_set_par+0x27/0x50 
[drm_kms_helper]
[  487.52]  [] fb_set_var+0x18c/0x3f0
[  487.77]  [] ? __ext4_handle_dirty_metadata+0x8a/0x210 
[ext4]
[  487.83]  [] fbcon_blank+0x1b7/0x2b0
[  487.90]  [] do_unblank_screen+0xb3/0x1c0
[  487.95]  [] vt_ioctl+0x118a/0x1210
[  487.777801]  [] tty_ioctl+0x3f0/0xc90
[  487.777808]  [] ? kzfree+0x28/0x30
[  487.777813]  [] ? mntput+0x1f/0x30
[  487.777817]  [] do_vfs_ioctl+0x30d/0x570
[  487.777822]  [] ? task_work_run+0x8a/0xa0
[  487.777825]  [] SyS_ioctl+0x74/0x80
[  487.777829]  [] entry_SYSCALL_64_fastpath+0x12/0x71
[  487.777851] Code: 65 ff 0d ce 02 a8 7e 5d c3 ba 01 00 00 00 f0 0f b1 17 85 
c0 75 e8 b0 01 5d c3 0f 1f 00 65 ff 05 b1 02 a8 7e 31 c0 ba 01 00 00 00  0f 
b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 4e f5 b1 ff 5d
[  487.777854] RIP  [] _raw_spin_lock+0xe/0x30
[  487.777855]  RSP 
[  487.777856] CR2: 0060
[  487.777860] ---[ end trace 672a2cd555e0ebd3 ]---

Analysis:
The faulting instruction is at _raw_spin_lock+0xe/0x30, which is:

<_raw_spin_lock+14>: lock cmpxchg %edx,(%rdi)

It is because rdi is an invalid pointer:0060

The source code:
drm_gem_object_lookup(struct drm_device *dev, struct drm_file *filp,
  u32 handle)
{
struct drm_gem_object *obj;

spin_lock(>table_lock); <== faulting

In assembly:
: push   %rbp
:   lea0x60(%rsi),%rdi <== rdi=rsi+0x60, so 
rsi==NULL
:   mov%rsp,%rbp
:   push   %r12
:  mov%edx,%r12d
:  push   %rbx
:  mov%rsi,%rbx
:  sub$0x8,%rsp
:  callq  0x8158aae0 <_raw_spin_lock>

conclusion:
%rsi, the second argument of drm_gem_object_lookup(), filp == NULL.

I'll send a patch in a separate Email. 

Regards,
Rui


[PATCH] drm: fb helper should avoid sleeping in panic context

2014-12-10 Thread rui wang
Hi Rob,
Yes it's exactly what I'm doing. Please scroll down and review my patch.

Thanks
Rui

On 12/10/14, Rob Clark  wrote:
> perhaps fb helpers could use __drm_modeset_lock_all() w/ trylock=true
> in panic context?
>
> BR,
> -R
>
> On Tue, Dec 9, 2014 at 7:09 PM, rui wang  wrote:
>> Hi All,
>>
>> Any comment ? Or any better idea how this should be fixed?
>>
>> Regards,
>> Rui
>>
>> -- Forwarded message --
>> From: ruiv.wang at gmail.com
>> Date: Thu,  4 Dec 2014 22:11:35 +0800
>> Subject: [PATCH] drm: fb helper should avoid sleeping in panic context
>> To: airlied at redhat.com, daniel.vetter at ffwll.ch, tony.luck at intel.com,
>> bp at alien8.de, aris at redhat.com, rui.y.wang at intel.com
>> Cc: linux-kernel at vger.kernel.org
>>
>> From: Rui Wang 
>>
>> There are still some places in the fb helper that need to avoid
>> sleeping in panic context. Here's an example:
>>
>> [   65.615496] bad: scheduling from the idle thread!
>> [   65.620747] CPU: 92 PID: 0 Comm: swapper/92 Tainted: G   ME
>>  3.18.0-rc4-7-default+ #20
>>
>> [   65.630364] Hardware name: Intel Corporation BRICKLAND/BRICKLAND,
>> BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
>> [   65.641923]  88087f693d80 88087f689878 81566db9
>> 
>> [   65.650226]  88087f693d80 88087f689898 810871ff
>> 88046eb3e0d0
>> [   65.658527]  88087f693d80 88087f6898c8 8107c1fa
>> 00017f6898b8
>> [   65.666830] Call Trace:
>> [   65.669557]  <#MC>  [] dump_stack+0x46/0x58
>> [   65.675994]  [] dequeue_task_idle+0x2f/0x40
>> [   65.682412]  [] dequeue_task+0x5a/0x80
>> [   65.688345]  [] deactivate_task+0x23/0x30
>> [   65.694569]  [] __schedule+0x580/0x7f0
>> [   65.700502]  [] schedule_preempt_disabled+0x29/0x70
>> [   65.707696]  [] __ww_mutex_lock_slowpath+0xb8/0x162
>> [   65.714891]  [] __ww_mutex_lock+0x53/0x85
>> [   65.721125]  [] drm_modeset_lock+0x3d/0x110 [drm]
>> [   65.728132]  [] __drm_modeset_lock_all+0x8a/0x120
>> [drm]
>> [   65.735721]  [] drm_modeset_lock_all+0x10/0x30 [drm]
>> [   65.743015]  []
>> drm_fb_helper_pan_display+0x2f/0xf0 [drm_kms_helper]
>> [   65.751857]  [] fb_pan_display+0xd1/0x1a0
>> [   65.758081]  [] bit_update_start+0x20/0x50
>> [   65.764400]  [] fbcon_switch+0x3a2/0x550
>> [   65.770528]  [] redraw_screen+0x189/0x240
>> [   65.776750]  [] fbcon_blank+0x20a/0x2d0
>> [   65.782778]  [] ? erst_writer+0x209/0x330
>> [   65.789002]  [] ? internal_add_timer+0x63/0x80
>> [   65.795710]  [] ? mod_timer+0x127/0x1e0
>> [   65.801740]  [] do_unblank_screen+0xa8/0x1d0
>> [   65.808255]  [] unblank_screen+0x10/0x20
>> [   65.814381]  [] bust_spinlocks+0x19/0x40
>> [   65.820508]  [] panic+0x106/0x1f5
>> [   65.825955]  [] mce_panic+0x2ac/0x2e0
>> [   65.831789]  [] ? delay_tsc+0x4a/0x80
>> [   65.837625]  [] do_machine_check+0xbaf/0xbf0
>> [   65.844138]  [] ? intel_idle+0xc7/0x150
>> [   65.850166]  [] machine_check+0x1f/0x30
>> [   65.856195]  [] ? intel_idle+0xc7/0x150
>> [   65.86]  <>  []
>> cpuidle_enter_state+0x55/0x170
>> [   65.869823]  [] cpuidle_enter+0x17/0x20
>> [   65.875852]  [] cpu_startup_entry+0x2d8/0x370
>> [   65.882467]  [] start_secondary+0x159/0x180
>>
>> There's __drm_modeset_lock_all() which Daniel Vetter introduced for this
>> purpose. We can leverage that without reinventing anything. This patch
>> works with the latest kernel.
>>
>> Signed-off-by: Rui Wang 
>> ---
>>  drivers/gpu/drm/drm_fb_helper.c |8 ++--
>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_fb_helper.c
>> b/drivers/gpu/drm/drm_fb_helper.c
>> index 0c0c39b..70dd2f4 100644
>> --- a/drivers/gpu/drm/drm_fb_helper.c
>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>> @@ -732,7 +732,9 @@ int drm_fb_helper_setcmap(struct fb_cmap *cmap,
>> struct fb_info *info)
>> int i, j, rc = 0;
>> int start;
>>
>> -   drm_modeset_lock_all(dev);
>> +   if (__drm_modeset_lock_all(dev, !!oops_in_progress)) {
>> +   return -EBUSY;
>> +   }
>> if (!drm_fb_helper_is_bound(fb_helper)) {
>> drm_modeset_unlock_all(dev);
>> return -EBUSY;
>> @@ -910,7 +912,9 @@ int drm_fb_helper_pan_display(struct fb_var_screeninfo
>> *var,
>> int ret = 0;
>> int i;
>>
>> -   drm_modeset_lock_all(

[PATCH] drm: fb helper should avoid sleeping in panic context

2014-12-10 Thread rui wang
Hi All,

Any comment ? Or any better idea how this should be fixed?

Regards,
Rui

-- Forwarded message --
From: ruiv.w...@gmail.com
Date: Thu,  4 Dec 2014 22:11:35 +0800
Subject: [PATCH] drm: fb helper should avoid sleeping in panic context
To: airlied at redhat.com, daniel.vetter at ffwll.ch, tony.luck at intel.com,
bp at alien8.de, aris at redhat.com, rui.y.wang at intel.com
Cc: linux-kernel at vger.kernel.org

From: Rui Wang <rui.y.w...@intel.com>

There are still some places in the fb helper that need to avoid
sleeping in panic context. Here's an example:

[   65.615496] bad: scheduling from the idle thread!
[   65.620747] CPU: 92 PID: 0 Comm: swapper/92 Tainted: G   ME
 3.18.0-rc4-7-default+ #20

[   65.630364] Hardware name: Intel Corporation BRICKLAND/BRICKLAND,
BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
[   65.641923]  88087f693d80 88087f689878 81566db9

[   65.650226]  88087f693d80 88087f689898 810871ff
88046eb3e0d0
[   65.658527]  88087f693d80 88087f6898c8 8107c1fa
00017f6898b8
[   65.666830] Call Trace:
[   65.669557]  <#MC>  [] dump_stack+0x46/0x58
[   65.675994]  [] dequeue_task_idle+0x2f/0x40
[   65.682412]  [] dequeue_task+0x5a/0x80
[   65.688345]  [] deactivate_task+0x23/0x30
[   65.694569]  [] __schedule+0x580/0x7f0
[   65.700502]  [] schedule_preempt_disabled+0x29/0x70
[   65.707696]  [] __ww_mutex_lock_slowpath+0xb8/0x162
[   65.714891]  [] __ww_mutex_lock+0x53/0x85
[   65.721125]  [] drm_modeset_lock+0x3d/0x110 [drm]
[   65.728132]  [] __drm_modeset_lock_all+0x8a/0x120 [drm]
[   65.735721]  [] drm_modeset_lock_all+0x10/0x30 [drm]
[   65.743015]  []
drm_fb_helper_pan_display+0x2f/0xf0 [drm_kms_helper]
[   65.751857]  [] fb_pan_display+0xd1/0x1a0
[   65.758081]  [] bit_update_start+0x20/0x50
[   65.764400]  [] fbcon_switch+0x3a2/0x550
[   65.770528]  [] redraw_screen+0x189/0x240
[   65.776750]  [] fbcon_blank+0x20a/0x2d0
[   65.782778]  [] ? erst_writer+0x209/0x330
[   65.789002]  [] ? internal_add_timer+0x63/0x80
[   65.795710]  [] ? mod_timer+0x127/0x1e0
[   65.801740]  [] do_unblank_screen+0xa8/0x1d0
[   65.808255]  [] unblank_screen+0x10/0x20
[   65.814381]  [] bust_spinlocks+0x19/0x40
[   65.820508]  [] panic+0x106/0x1f5
[   65.825955]  [] mce_panic+0x2ac/0x2e0
[   65.831789]  [] ? delay_tsc+0x4a/0x80
[   65.837625]  [] do_machine_check+0xbaf/0xbf0
[   65.844138]  [] ? intel_idle+0xc7/0x150
[   65.850166]  [] machine_check+0x1f/0x30
[   65.856195]  [] ? intel_idle+0xc7/0x150
[   65.86]  <>  [] cpuidle_enter_state+0x55/0x170
[   65.869823]  [] cpuidle_enter+0x17/0x20
[   65.875852]  [] cpu_startup_entry+0x2d8/0x370
[   65.882467]  [] start_secondary+0x159/0x180

There's __drm_modeset_lock_all() which Daniel Vetter introduced for this
purpose. We can leverage that without reinventing anything. This patch
works with the latest kernel.

Signed-off-by: Rui Wang 
---
 drivers/gpu/drm/drm_fb_helper.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 0c0c39b..70dd2f4 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -732,7 +732,9 @@ int drm_fb_helper_setcmap(struct fb_cmap *cmap,
struct fb_info *info)
int i, j, rc = 0;
int start;

-   drm_modeset_lock_all(dev);
+   if (__drm_modeset_lock_all(dev, !!oops_in_progress)) {
+   return -EBUSY;
+   }
if (!drm_fb_helper_is_bound(fb_helper)) {
drm_modeset_unlock_all(dev);
return -EBUSY;
@@ -910,7 +912,9 @@ int drm_fb_helper_pan_display(struct fb_var_screeninfo *var,
int ret = 0;
int i;

-   drm_modeset_lock_all(dev);
+   if (__drm_modeset_lock_all(dev, !!oops_in_progress)) {
+   return -EBUSY;
+   }
if (!drm_fb_helper_is_bound(fb_helper)) {
drm_modeset_unlock_all(dev);
return -EBUSY;
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/