Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-10-15 Thread Jani Nikula
On Fri, 15 Oct 2021, John Harrison  wrote:
> On 10/15/2021 07:52, Tvrtko Ursulin wrote:
>> On 04/10/2021 08:36, Jani Nikula wrote:
>>> On Fri, 24 Sep 2021, Ville Syrjälä  
>>> wrote:
 On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:
> From: Hugh Dickins 
>
> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> using i915.  Bisections converge convincingly, but arrive at different
> and surprising "culprits", none of them the actual culprit.
>
> netconsole (with init_netconsole() hacked to call i915_init() when
> logging has started, instead of by module_init()) tells the story:
>
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> with RSI: 814d408b pointing to sw_fence_dummy_notify().
> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> function needs to be 4-byte aligned.
>
> v2:
>   (Jani Nikula)
>    - Change BUG_ON to WARN_ON
> v3:
>   (Jani / Tvrtko)
>    - Short circuit __i915_sw_fence_init on WARN_ON
> v4:
>   (Lucas)
>    - Break WARN_ON changes out in a different patch
>
> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> Signed-off-by: Hugh Dickins 
> Signed-off-by: Matthew Brost 
> Reviewed-by: Matthew Brost 
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index ff637147b1a9..e7f78bc7ebfc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -362,8 +362,8 @@ static int __intel_context_active(struct 
> i915_active *active)
>   return 0;
>   }
>   -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> - enum i915_sw_fence_notify state)
> +static int __i915_sw_fence_call
> +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum 
> i915_sw_fence_notify state)
>   {
>   return NOTIFY_DONE;
>   }

 This thing seems broken beyond just this alignment stuff. I'm getting
 this spew from DEBUG_OBJECTS all the time on a glk here:
>>>
>>> Nobody followed through with this, so:
>>>
>>> https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home
>>>
>>> and
>>>
>>> cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes")
>>
>> John you pushed this yesterday? Will this cause a problem now if we 
>> have two commits for the same bug:
> I'm thoroughly confused.
>
> I finally got far enough down my backlog to reach this and it did not 
> appear to be in the tree yet so I tried pushing it. The DIM tool gave me 
> a bunch of errors that didn't seem to make any sense. It certainly gave 
> me the impression that it did not actually do anything. So I gave up on 
> it. But now it seems like it did actually push something? And it was 
> already merged after all?

Linus merged it directly to his tree because we didn't follow through
with this soon enough. Linus' tree feeds to drm-tip via drm-intel-fixes,
but I guess we haven't done backmerges from Linux' tree to drm-next to
drm-intel-gt-next with that commit. Once you applied the patch, dim
rebuilt drm-tip, which now got the same change from both drm-intel-fixes
and drm-intel-gt-next.

It happens.

But in general, if there are regressions with blank screens at boot
occurring in Linus' tree, the fix needs to move much faster.

BR,
Jani.


>
> John.
>
>>
>> commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e
>> Author: Hugh Dickins 
>> AuthorDate: Tue Sep 21 18:50:39 2021 -0700
>> Commit: John Harrison 
>> CommitDate: Thu Oct 14 18:29:01 2021 -0700
>>
>>     drm/i915: fix blank screen booting crashes
>>
>>
>>
>> commit cdc1e6e225e3256d56dc6648411630e71d7c776b
>> Author: Hugh Dickins 
>> AuthorDate: Sat Oct 2 03:17:29 2021 -0700
>> Commit: Linus Torvalds 
>> CommitDate: Sat Oct 2 09:39:15 2021 -0700
>>
>>     drm/i915: fix blank screen booting crashes
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>
>>> BR,
>>> Jani.
>>>
>>>

 [   48.122629] [ cut here ]
 [   48.122640] ODEBUG: init destroyed (active state 0) object type: 
 i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
 [   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
 debug_print_object+0x6e/0x90
 [   48.122976] Modules linked in: i915 i2c_algo_bit ttm 
 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
 prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat 
 fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal 
 r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci 
 cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class 
 wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4
 [   48.123119] CPU: 

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-10-15 Thread John Harrison

On 10/15/2021 07:52, Tvrtko Ursulin wrote:

On 04/10/2021 08:36, Jani Nikula wrote:
On Fri, 24 Sep 2021, Ville Syrjälä  
wrote:

On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
  (Jani Nikula)
   - Change BUG_ON to WARN_ON
v3:
  (Jani / Tvrtko)
   - Short circuit __i915_sw_fence_init on WARN_ON
v4:
  (Lucas)
   - Break WARN_ON changes out in a different patch

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c

index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct 
i915_active *active)

  return 0;
  }
  -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
- enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum 
i915_sw_fence_notify state)

  {
  return NOTIFY_DONE;
  }


This thing seems broken beyond just this alignment stuff. I'm getting
this spew from DEBUG_OBJECTS all the time on a glk here:


Nobody followed through with this, so:

https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home

and

cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes")


John you pushed this yesterday? Will this cause a problem now if we 
have two commits for the same bug:

I'm thoroughly confused.

I finally got far enough down my backlog to reach this and it did not 
appear to be in the tree yet so I tried pushing it. The DIM tool gave me 
a bunch of errors that didn't seem to make any sense. It certainly gave 
me the impression that it did not actually do anything. So I gave up on 
it. But now it seems like it did actually push something? And it was 
already merged after all?


John.



commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e
Author: Hugh Dickins 
AuthorDate: Tue Sep 21 18:50:39 2021 -0700
Commit: John Harrison 
CommitDate: Thu Oct 14 18:29:01 2021 -0700

    drm/i915: fix blank screen booting crashes



commit cdc1e6e225e3256d56dc6648411630e71d7c776b
Author: Hugh Dickins 
AuthorDate: Sat Oct 2 03:17:29 2021 -0700
Commit: Linus Torvalds 
CommitDate: Sat Oct 2 09:39:15 2021 -0700

    drm/i915: fix blank screen booting crashes

Regards,

Tvrtko




BR,
Jani.




[   48.122629] [ cut here ]
[   48.122640] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
[   48.122976] Modules linked in: i915 i2c_algo_bit ttm 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat 
fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal 
r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci 
cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class 
wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4
[   48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 
5.15.0-rc2-hsw+ #131
[   48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, 
BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018

[   48.123129] RIP: 0010:debug_print_object+0x6e/0x90
[   48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 
a0 19 0a 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 
2e 3c 00 <0f> 0b 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 
c0 0c 01

[   48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282
[   48.123150] RAX:  RBX: 88810004f848 RCX: 

[   48.123154] RDX: 8001 RSI: 8112673f RDI: 
8112673f
[   48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 
0009fffb
[   48.123163] R10: fffe R11: 3fff R12: 
88810a04d100
[   48.123167] R13: 88810a07d308 R14: 888109990800 R15: 
88810997b800
[   48.123171] FS:  7624b9c0() GS:888276e0() 
knlGS:

[   48.123176] CS:  0010 DS:  ES:  CR0: 80050033
[  

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-10-15 Thread Tvrtko Ursulin




On 04/10/2021 08:36, Jani Nikula wrote:

On Fri, 24 Sep 2021, Ville Syrjälä  wrote:

On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
  (Jani Nikula)
   - Change BUG_ON to WARN_ON
v3:
  (Jani / Tvrtko)
   - Short circuit __i915_sw_fence_init on WARN_ON
v4:
  (Lucas)
   - Break WARN_ON changes out in a different patch

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
  }
  
-static int sw_fence_dummy_notify(struct i915_sw_fence *sf,

-enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
state)
  {
return NOTIFY_DONE;
  }


This thing seems broken beyond just this alignment stuff. I'm getting
this spew from DEBUG_OBJECTS all the time on a glk here:


Nobody followed through with this, so:

https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home

and

cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes")


John you pushed this yesterday? Will this cause a problem now if we have 
two commits for the same bug:


commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e
Author: Hugh Dickins 
AuthorDate: Tue Sep 21 18:50:39 2021 -0700
Commit: John Harrison 
CommitDate: Thu Oct 14 18:29:01 2021 -0700

drm/i915: fix blank screen booting crashes



commit cdc1e6e225e3256d56dc6648411630e71d7c776b
Author: Hugh Dickins 
AuthorDate: Sat Oct 2 03:17:29 2021 -0700
Commit: Linus Torvalds 
CommitDate: Sat Oct 2 09:39:15 2021 -0700

drm/i915: fix blank screen booting crashes

Regards,

Tvrtko




BR,
Jani.




[   48.122629] [ cut here ]
[   48.122640] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
[   48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart 
fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common 
x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev 
sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi 
i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4
[   48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 
5.15.0-rc2-hsw+ #131
[   48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS 
JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
[   48.123129] RIP: 0010:debug_print_object+0x6e/0x90
[   48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 82 89 
05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 83 05 c5 
c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01
[   48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282
[   48.123150] RAX:  RBX: 88810004f848 RCX: 
[   48.123154] RDX: 8001 RSI: 8112673f RDI: 8112673f
[   48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 0009fffb
[   48.123163] R10: fffe R11: 3fff R12: 88810a04d100
[   48.123167] R13: 88810a07d308 R14: 888109990800 R15: 88810997b800
[   48.123171] FS:  7624b9c0() GS:888276e0() 
knlGS:
[   48.123176] CS:  0010 DS:  ES:  CR0: 80050033
[   48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: 00350ef0
[   48.123185] Call Trace:
[   48.123190]  i915_sw_fence_reinit+0x15/0x40 [i915]
[   48.123341]  intel_context_init+0x16b/0x1d0 [i915]
[   48.123492]  intel_context_create+0x33/0x100 [i915]
[   48.123642]  default_engines+0x9d/0x120 [i915]
[   48.123806]  i915_gem_create_context+0x465/0x630 [i915]
[   48.125964]  ? trace_kmalloc+0x29/0xd0
[   48.125976]  ? 

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-10-04 Thread Jani Nikula
On Fri, 24 Sep 2021, Ville Syrjälä  wrote:
> On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:
>> From: Hugh Dickins 
>> 
>> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
>> using i915.  Bisections converge convincingly, but arrive at different
>> and surprising "culprits", none of them the actual culprit.
>> 
>> netconsole (with init_netconsole() hacked to call i915_init() when
>> logging has started, instead of by module_init()) tells the story:
>> 
>> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
>> with RSI: 814d408b pointing to sw_fence_dummy_notify().
>> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
>> function needs to be 4-byte aligned.
>> 
>> v2:
>>  (Jani Nikula)
>>   - Change BUG_ON to WARN_ON
>> v3:
>>  (Jani / Tvrtko)
>>   - Short circuit __i915_sw_fence_init on WARN_ON
>> v4:
>>  (Lucas)
>>   - Break WARN_ON changes out in a different patch
>> 
>> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
>> Signed-off-by: Hugh Dickins 
>> Signed-off-by: Matthew Brost 
>> Reviewed-by: Matthew Brost 
>> ---
>>  drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
>> b/drivers/gpu/drm/i915/gt/intel_context.c
>> index ff637147b1a9..e7f78bc7ebfc 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>> @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
>> *active)
>>  return 0;
>>  }
>>  
>> -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
>> - enum i915_sw_fence_notify state)
>> +static int __i915_sw_fence_call
>> +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
>> state)
>>  {
>>  return NOTIFY_DONE;
>>  }
>
> This thing seems broken beyond just this alignment stuff. I'm getting
> this spew from DEBUG_OBJECTS all the time on a glk here:

Nobody followed through with this, so:

https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home

and

cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes")


BR,
Jani.


>
> [   48.122629] [ cut here ]
> [   48.122640] ODEBUG: init destroyed (active state 0) object type: 
> i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
> [   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
> debug_print_object+0x6e/0x90
> [   48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper 
> syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart 
> fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof 
> intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp 
> libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 
> intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables 
> x_tables ipv6 autofs4
> [   48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 
> 5.15.0-rc2-hsw+ #131
> [   48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS 
> JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
> [   48.123129] RIP: 0010:debug_print_object+0x6e/0x90
> [   48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 
> 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 
> 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01
> [   48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282
> [   48.123150] RAX:  RBX: 88810004f848 RCX: 
> 
> [   48.123154] RDX: 8001 RSI: 8112673f RDI: 
> 8112673f
> [   48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 
> 0009fffb
> [   48.123163] R10: fffe R11: 3fff R12: 
> 88810a04d100
> [   48.123167] R13: 88810a07d308 R14: 888109990800 R15: 
> 88810997b800
> [   48.123171] FS:  7624b9c0() GS:888276e0() 
> knlGS:
> [   48.123176] CS:  0010 DS:  ES:  CR0: 80050033
> [   48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: 
> 00350ef0
> [   48.123185] Call Trace:
> [   48.123190]  i915_sw_fence_reinit+0x15/0x40 [i915]
> [   48.123341]  intel_context_init+0x16b/0x1d0 [i915]
> [   48.123492]  intel_context_create+0x33/0x100 [i915]
> [   48.123642]  default_engines+0x9d/0x120 [i915]
> [   48.123806]  i915_gem_create_context+0x465/0x630 [i915]
> [   48.125964]  ? trace_kmalloc+0x29/0xd0
> [   48.125976]  ? kmem_cache_alloc_trace+0x121/0x620
> [   48.125984]  i915_gem_context_open+0x145/0x1d0 [i915]
> [   48.126172]  i915_gem_open+0x75/0xb0 [i915]
> [   48.126364]  drm_file_alloc+0x1b1/0x280 [drm]
> [   48.126427]  drm_open+0xde/0x250 [drm]
> [   48.126482]  drm_stub_open+0xa8/0x130 [drm]
> [   48.126538]  chrdev_open+0xbf/0x240
> [   48.126547]  ? cdev_device_add+0x90/0x90
> [   48.126553]  do_dentry_open+0x151/0x3a0
> [   48.126560]  

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-24 Thread Ville Syrjälä
On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote:
> From: Hugh Dickins 
> 
> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> using i915.  Bisections converge convincingly, but arrive at different
> and surprising "culprits", none of them the actual culprit.
> 
> netconsole (with init_netconsole() hacked to call i915_init() when
> logging has started, instead of by module_init()) tells the story:
> 
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> with RSI: 814d408b pointing to sw_fence_dummy_notify().
> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> function needs to be 4-byte aligned.
> 
> v2:
>  (Jani Nikula)
>   - Change BUG_ON to WARN_ON
> v3:
>  (Jani / Tvrtko)
>   - Short circuit __i915_sw_fence_init on WARN_ON
> v4:
>  (Lucas)
>   - Break WARN_ON changes out in a different patch
> 
> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> Signed-off-by: Hugh Dickins 
> Signed-off-by: Matthew Brost 
> Reviewed-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index ff637147b1a9..e7f78bc7ebfc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
> *active)
>   return 0;
>  }
>  
> -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> -  enum i915_sw_fence_notify state)
> +static int __i915_sw_fence_call
> +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
> state)
>  {
>   return NOTIFY_DONE;
>  }

This thing seems broken beyond just this alignment stuff. I'm getting
this spew from DEBUG_OBJECTS all the time on a glk here:

[   48.122629] [ cut here ]
[   48.122640] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[   48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
[   48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart 
fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common 
x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev 
sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi 
i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4
[   48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 
5.15.0-rc2-hsw+ #131
[   48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS 
JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
[   48.123129] RIP: 0010:debug_print_object+0x6e/0x90
[   48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 
82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 83 
05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01
[   48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282
[   48.123150] RAX:  RBX: 88810004f848 RCX: 
[   48.123154] RDX: 8001 RSI: 8112673f RDI: 8112673f
[   48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 0009fffb
[   48.123163] R10: fffe R11: 3fff R12: 88810a04d100
[   48.123167] R13: 88810a07d308 R14: 888109990800 R15: 88810997b800
[   48.123171] FS:  7624b9c0() GS:888276e0() 
knlGS:
[   48.123176] CS:  0010 DS:  ES:  CR0: 80050033
[   48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: 00350ef0
[   48.123185] Call Trace:
[   48.123190]  i915_sw_fence_reinit+0x15/0x40 [i915]
[   48.123341]  intel_context_init+0x16b/0x1d0 [i915]
[   48.123492]  intel_context_create+0x33/0x100 [i915]
[   48.123642]  default_engines+0x9d/0x120 [i915]
[   48.123806]  i915_gem_create_context+0x465/0x630 [i915]
[   48.125964]  ? trace_kmalloc+0x29/0xd0
[   48.125976]  ? kmem_cache_alloc_trace+0x121/0x620
[   48.125984]  i915_gem_context_open+0x145/0x1d0 [i915]
[   48.126172]  i915_gem_open+0x75/0xb0 [i915]
[   48.126364]  drm_file_alloc+0x1b1/0x280 [drm]
[   48.126427]  drm_open+0xde/0x250 [drm]
[   48.126482]  drm_stub_open+0xa8/0x130 [drm]
[   48.126538]  chrdev_open+0xbf/0x240
[   48.126547]  ? cdev_device_add+0x90/0x90
[   48.126553]  do_dentry_open+0x151/0x3a0
[   48.126560]  path_openat+0x76f/0xa10
[   48.126568]  do_filp_open+0xa9/0x150
[   48.126575]  ? preempt_count_sub+0x9b/0xd0
[   48.126584]  ? _raw_spin_unlock+0x29/0x40
[   48.126593]  do_sys_openat2+0x97/0x160
[   48.126600]  __x64_sys_openat+0x54/0x90
[   48.126607]  do_syscall_64+0x38/0x90
[   48.126614]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   48.126622] RIP: 0033:0x77cdca5b
[   48.126630] Code: 

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-22 Thread Lucas De Marchi

On Tue, Sep 21, 2021 at 06:40:41PM -0700, Matthew Brost wrote:

On Tue, Sep 21, 2021 at 04:29:31PM -0700, Lucas De Marchi wrote:

On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote:
> On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote:
> > On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote:
> > > From: Hugh Dickins 
> > >
> > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> > > using i915.  Bisections converge convincingly, but arrive at different
> > > and surprising "culprits", none of them the actual culprit.
> > >
> > > netconsole (with init_netconsole() hacked to call i915_init() when
> > > logging has started, instead of by module_init()) tells the story:
> > >
> > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> > > with RSI: 814d408b pointing to sw_fence_dummy_notify().
> > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> > > function needs to be 4-byte aligned.
> > >
> > > v2:
> > > (Jani Nikula)
> > >  - Change BUG_ON to WARN_ON
> > > v3:
> > > (Jani / Tvrtko)
> > >  - Short circuit __i915_sw_fence_init on WARN_ON
> > >
> > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > > Signed-off-by: Hugh Dickins 
> > > Signed-off-by: Matthew Brost 
> > > Reviewed-by: Matthew Brost 
> > > ---
> > > drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
> > > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
> > > 2 files changed, 12 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index ff637147b1a9..e7f78bc7ebfc 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
> > >  return 0;
> > > }
> > >
> >
> > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> > > - enum i915_sw_fence_notify state)
> > > +static int __i915_sw_fence_call
> > > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum 
i915_sw_fence_notify state)
> > > {
> > >  return NOTIFY_DONE;
> > > }
> > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > index c589a681da77..08cea73264e7 100644
> > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > @@ -13,9 +13,9 @@
> > > #include "i915_selftest.h"
> > >
> > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> > > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> > > #else
> > > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > #endif
> > >
> > > static DEFINE_SPINLOCK(i915_sw_fence_lock);
> > > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct 
i915_sw_fence *fence,
> > >  i915_sw_fence_notify_t fn;
> > >
> > >  fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK);
> > > -return fn(fence, state);
> > > +if (likely(fn))
> > > +return fn(fence, state);
> > > +else
> > > +return 0;
> >
> > since the knowledge for these being NULL (or with the wrong alignment)
> > are in the init/reinit functions,  wouldn't it be better to just add a
> > fence_nop() and assign it there instead this likely() here?
> >
>
> Maybe? I prefer the way it is.
>
> > > }
> > >
> > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
> > > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
> > >const char *name,
> > >struct lock_class_key *key)
> > > {
> > > -BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > > -
> > >  __init_waitqueue_head(>wait, name, key);
> > > +if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> > > +return;
> >
> > like:
> >   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> >   fence->flags = (unsigned long)sw_fence_dummy_notify;
> >   else
> >   fence->flags = (unsigned long)fn;
> >
> >
> > f you return here instead of calling i915_sw_fence_reinit(), aren't you
> > just going to use uninitialized memory later? At least in the selftests,
> > which allocate it with kmalloc()... I didn't check others.
> >
>
> I don't think so, maybe the fence won't work but it won't blow up
> either.
>
> >
> > For the bug fix we could just add the __aligned(4) and leave the rest to a
> > separate patch.
> >
>
> The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte
> align which triggered a BUG_ON during boot which blank screened a
> laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so
> if someone makes this mistake in the future kernel should boot albiet
> with a WARNING.

yes, I understood. But afaics with WARN_ON you 

[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Matthew Brost
From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
 (Jani Nikula)
  - Change BUG_ON to WARN_ON
v3:
 (Jani / Tvrtko)
  - Short circuit __i915_sw_fence_init on WARN_ON
v4:
 (Lucas)
  - Break WARN_ON changes out in a different patch

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
 }
 
-static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
-enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
state)
 {
return NOTIFY_DONE;
 }
-- 
2.32.0



Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Matthew Brost
On Tue, Sep 21, 2021 at 04:29:31PM -0700, Lucas De Marchi wrote:
> On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote:
> > On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote:
> > > On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote:
> > > > From: Hugh Dickins 
> > > >
> > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> > > > using i915.  Bisections converge convincingly, but arrive at different
> > > > and surprising "culprits", none of them the actual culprit.
> > > >
> > > > netconsole (with init_netconsole() hacked to call i915_init() when
> > > > logging has started, instead of by module_init()) tells the story:
> > > >
> > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> > > > with RSI: 814d408b pointing to sw_fence_dummy_notify().
> > > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> > > > function needs to be 4-byte aligned.
> > > >
> > > > v2:
> > > > (Jani Nikula)
> > > >  - Change BUG_ON to WARN_ON
> > > > v3:
> > > > (Jani / Tvrtko)
> > > >  - Short circuit __i915_sw_fence_init on WARN_ON
> > > >
> > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > > > Signed-off-by: Hugh Dickins 
> > > > Signed-off-by: Matthew Brost 
> > > > Reviewed-by: Matthew Brost 
> > > > ---
> > > > drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
> > > > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
> > > > 2 files changed, 12 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > index ff637147b1a9..e7f78bc7ebfc 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > @@ -362,8 +362,8 @@ static int __intel_context_active(struct 
> > > > i915_active *active)
> > > > return 0;
> > > > }
> > > >
> > > 
> > > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> > > > -enum i915_sw_fence_notify state)
> > > > +static int __i915_sw_fence_call
> > > > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum 
> > > > i915_sw_fence_notify state)
> > > > {
> > > > return NOTIFY_DONE;
> > > > }
> > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> > > > b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > index c589a681da77..08cea73264e7 100644
> > > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > @@ -13,9 +13,9 @@
> > > > #include "i915_selftest.h"
> > > >
> > > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> > > > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> > > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> > > > #else
> > > > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > > #endif
> > > >
> > > > static DEFINE_SPINLOCK(i915_sw_fence_lock);
> > > > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct 
> > > > i915_sw_fence *fence,
> > > > i915_sw_fence_notify_t fn;
> > > >
> > > > fn = (i915_sw_fence_notify_t)(fence->flags & 
> > > > I915_SW_FENCE_MASK);
> > > > -   return fn(fence, state);
> > > > +   if (likely(fn))
> > > > +   return fn(fence, state);
> > > > +   else
> > > > +   return 0;
> > > 
> > > since the knowledge for these being NULL (or with the wrong alignment)
> > > are in the init/reinit functions,  wouldn't it be better to just add a
> > > fence_nop() and assign it there instead this likely() here?
> > > 
> > 
> > Maybe? I prefer the way it is.
> > 
> > > > }
> > > >
> > > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
> > > > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence 
> > > > *fence,
> > > >   const char *name,
> > > >   struct lock_class_key *key)
> > > > {
> > > > -   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > > > -
> > > > __init_waitqueue_head(>wait, name, key);
> > > > +   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> > > > +   return;
> > > 
> > > like:
> > >   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> > >   fence->flags = (unsigned long)sw_fence_dummy_notify;
> > >   else
> > >   fence->flags = (unsigned long)fn;
> > > 
> > > 
> > > f you return here instead of calling i915_sw_fence_reinit(), aren't you
> > > just going to use uninitialized memory later? At least in the selftests,
> > > which allocate it with kmalloc()... I didn't check others.
> > > 
> > 
> > I don't think so, maybe the fence won't work but it won't blow up
> > either.
> > 
> > > 
> > > For the bug fix we could just add the __aligned(4) and leave the rest to a
> > > separate patch.
> > > 
> > 
> > The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte
> > align which 

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Lucas De Marchi

On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote:

On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote:

On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote:
> From: Hugh Dickins 
>
> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> using i915.  Bisections converge convincingly, but arrive at different
> and surprising "culprits", none of them the actual culprit.
>
> netconsole (with init_netconsole() hacked to call i915_init() when
> logging has started, instead of by module_init()) tells the story:
>
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> with RSI: 814d408b pointing to sw_fence_dummy_notify().
> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> function needs to be 4-byte aligned.
>
> v2:
> (Jani Nikula)
>  - Change BUG_ON to WARN_ON
> v3:
> (Jani / Tvrtko)
>  - Short circuit __i915_sw_fence_init on WARN_ON
>
> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> Signed-off-by: Hugh Dickins 
> Signed-off-by: Matthew Brost 
> Reviewed-by: Matthew Brost 
> ---
> drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
> drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
> 2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
> index ff637147b1a9..e7f78bc7ebfc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
>return 0;
> }
>

> -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> -   enum i915_sw_fence_notify state)
> +static int __i915_sw_fence_call
> +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
state)
> {
>return NOTIFY_DONE;
> }
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
> index c589a681da77..08cea73264e7 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -13,9 +13,9 @@
> #include "i915_selftest.h"
>
> #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> #else
> -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> #endif
>
> static DEFINE_SPINLOCK(i915_sw_fence_lock);
> @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence 
*fence,
>i915_sw_fence_notify_t fn;
>
>fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK);
> -  return fn(fence, state);
> +  if (likely(fn))
> +  return fn(fence, state);
> +  else
> +  return 0;

since the knowledge for these being NULL (or with the wrong alignment)
are in the init/reinit functions,  wouldn't it be better to just add a
fence_nop() and assign it there instead this likely() here?



Maybe? I prefer the way it is.


> }
>
> #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
> @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
>  const char *name,
>  struct lock_class_key *key)
> {
> -  BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> -
>__init_waitqueue_head(>wait, name, key);
> +  if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> +  return;

like:
if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
fence->flags = (unsigned long)sw_fence_dummy_notify;
else
fence->flags = (unsigned long)fn;


f you return here instead of calling i915_sw_fence_reinit(), aren't you
just going to use uninitialized memory later? At least in the selftests,
which allocate it with kmalloc()... I didn't check others.



I don't think so, maybe the fence won't work but it won't blow up
either.



For the bug fix we could just add the __aligned(4) and leave the rest to a
separate patch.



The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte
align which triggered a BUG_ON during boot which blank screened a
laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so
if someone makes this mistake in the future kernel should boot albiet
with a WARNING.


yes, I understood. But afaics with WARN_ON you are allowing it to
continue and may be using uninitialized memory later, just causing other
problems down the line, which may be equally difficult to debug.

what I suggested is that there is the easy fix to apply to the current
rcX kernel, adding __aligned(4) to sw_fence_dummy_notify() (patch 1).
And there is the additional protection being added here (patch 2) which
is subject to the debate.



The long term fix is just pull out the I915_SW_FENCE_MASK (stealing bits
from a poitner) and we don't have to worry any of this.


Patch 2 may not even be needed if you're going that route. 

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Matthew Brost
On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote:
> On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote:
> > From: Hugh Dickins 
> > 
> > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> > using i915.  Bisections converge convincingly, but arrive at different
> > and surprising "culprits", none of them the actual culprit.
> > 
> > netconsole (with init_netconsole() hacked to call i915_init() when
> > logging has started, instead of by module_init()) tells the story:
> > 
> > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> > with RSI: 814d408b pointing to sw_fence_dummy_notify().
> > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> > function needs to be 4-byte aligned.
> > 
> > v2:
> > (Jani Nikula)
> >  - Change BUG_ON to WARN_ON
> > v3:
> > (Jani / Tvrtko)
> >  - Short circuit __i915_sw_fence_init on WARN_ON
> > 
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Hugh Dickins 
> > Signed-off-by: Matthew Brost 
> > Reviewed-by: Matthew Brost 
> > ---
> > drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
> > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
> > 2 files changed, 12 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index ff637147b1a9..e7f78bc7ebfc 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
> > *active)
> > return 0;
> > }
> > 
> 
> > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> > -enum i915_sw_fence_notify state)
> > +static int __i915_sw_fence_call
> > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
> > state)
> > {
> > return NOTIFY_DONE;
> > }
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> > b/drivers/gpu/drm/i915/i915_sw_fence.c
> > index c589a681da77..08cea73264e7 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > @@ -13,9 +13,9 @@
> > #include "i915_selftest.h"
> > 
> > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> > #else
> > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > #endif
> > 
> > static DEFINE_SPINLOCK(i915_sw_fence_lock);
> > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence 
> > *fence,
> > i915_sw_fence_notify_t fn;
> > 
> > fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK);
> > -   return fn(fence, state);
> > +   if (likely(fn))
> > +   return fn(fence, state);
> > +   else
> > +   return 0;
> 
> since the knowledge for these being NULL (or with the wrong alignment)
> are in the init/reinit functions,  wouldn't it be better to just add a
> fence_nop() and assign it there instead this likely() here?
> 

Maybe? I prefer the way it is.

> > }
> > 
> > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
> > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
> >   const char *name,
> >   struct lock_class_key *key)
> > {
> > -   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > -
> > __init_waitqueue_head(>wait, name, key);
> > +   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
> > +   return;
> 
> like:
>   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
>   fence->flags = (unsigned long)sw_fence_dummy_notify;
>   else
>   fence->flags = (unsigned long)fn;
> 
> 
> f you return here instead of calling i915_sw_fence_reinit(), aren't you
> just going to use uninitialized memory later? At least in the selftests,
> which allocate it with kmalloc()... I didn't check others.
> 

I don't think so, maybe the fence won't work but it won't blow up
either.

> 
> For the bug fix we could just add the __aligned(4) and leave the rest to a
> separate patch.
> 

The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte
align which triggered a BUG_ON during boot which blank screened a
laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so
if someone makes this mistake in the future kernel should boot albiet
with a WARNING.

The long term fix is just pull out the I915_SW_FENCE_MASK (stealing bits
from a poitner) and we don't have to worry any of this.

Matt

> 
> Lucas De Marchi
> 
> > fence->flags = (unsigned long)fn;
> > 
> > i915_sw_fence_reinit(fence);
> > @@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence)
> > atomic_set(>pending, 1);
> > fence->error = 0;
> > 
> > -   I915_SW_FENCE_BUG_ON(!fence->flags);
> > -   

Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Lucas De Marchi

On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
(Jani Nikula)
 - Change BUG_ON to WARN_ON
v3:
(Jani / Tvrtko)
 - Short circuit __i915_sw_fence_init on WARN_ON

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
}




-static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
-enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
state)
{
return NOTIFY_DONE;
}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..08cea73264e7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -13,9 +13,9 @@
#include "i915_selftest.h"

#if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
-#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
#else
-#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
#endif

static DEFINE_SPINLOCK(i915_sw_fence_lock);
@@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence 
*fence,
i915_sw_fence_notify_t fn;

fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK);
-   return fn(fence, state);
+   if (likely(fn))
+   return fn(fence, state);
+   else
+   return 0;


since the knowledge for these being NULL (or with the wrong alignment)
are in the init/reinit functions,  wouldn't it be better to just add a
fence_nop() and assign it there instead this likely() here?


}

#ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
@@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
{
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
-
__init_waitqueue_head(>wait, name, key);
+   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
+   return;


like:
if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
fence->flags = (unsigned long)sw_fence_dummy_notify;
else
fence->flags = (unsigned long)fn;


f you return here instead of calling i915_sw_fence_reinit(), aren't you
just going to use uninitialized memory later? At least in the selftests,
which allocate it with kmalloc()... I didn't check others.


For the bug fix we could just add the __aligned(4) and leave the rest to a
separate patch.


Lucas De Marchi


fence->flags = (unsigned long)fn;

i915_sw_fence_reinit(fence);
@@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence)
atomic_set(>pending, 1);
fence->error = 0;

-   I915_SW_FENCE_BUG_ON(!fence->flags);
-   I915_SW_FENCE_BUG_ON(!list_empty(>wait.head));
+   I915_SW_FENCE_WARN_ON(!fence->flags);
+   I915_SW_FENCE_WARN_ON(!list_empty(>wait.head));
}

void i915_sw_fence_commit(struct i915_sw_fence *fence)
--
2.32.0



[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-21 Thread Matthew Brost
From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
 (Jani Nikula)
  - Change BUG_ON to WARN_ON
v3:
 (Jani / Tvrtko)
  - Short circuit __i915_sw_fence_init on WARN_ON

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c |  4 ++--
 drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..e7f78bc7ebfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
 }
 
-static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
-enum i915_sw_fence_notify state)
+static int __i915_sw_fence_call
+sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify 
state)
 {
return NOTIFY_DONE;
 }
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..08cea73264e7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -13,9 +13,9 @@
 #include "i915_selftest.h"
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
-#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
 #else
-#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
 #endif
 
 static DEFINE_SPINLOCK(i915_sw_fence_lock);
@@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence 
*fence,
i915_sw_fence_notify_t fn;
 
fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK);
-   return fn(fence, state);
+   if (likely(fn))
+   return fn(fence, state);
+   else
+   return 0;
 }
 
 #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS
@@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
 {
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
-
__init_waitqueue_head(>wait, name, key);
+   if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK))
+   return;
fence->flags = (unsigned long)fn;
 
i915_sw_fence_reinit(fence);
@@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence)
atomic_set(>pending, 1);
fence->error = 0;
 
-   I915_SW_FENCE_BUG_ON(!fence->flags);
-   I915_SW_FENCE_BUG_ON(!list_empty(>wait.head));
+   I915_SW_FENCE_WARN_ON(!fence->flags);
+   I915_SW_FENCE_WARN_ON(!list_empty(>wait.head));
 }
 
 void i915_sw_fence_commit(struct i915_sw_fence *fence)
-- 
2.32.0



Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-20 Thread Matthew Brost
On Mon, Sep 20, 2021 at 08:42:42AM +0100, Tvrtko Ursulin wrote:
> 
> On 20/09/2021 08:38, Jani Nikula wrote:
> > On Mon, 20 Sep 2021, Tvrtko Ursulin  wrote:
> > > On 18/09/2021 00:38, Matthew Brost wrote:
> > > > From: Hugh Dickins 
> > > > 
> > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> > > > using i915.  Bisections converge convincingly, but arrive at different
> > > > and surprising "culprits", none of them the actual culprit.
> > > 
> > > It is certainly surprising this patch crashed SNB and KBL.
> > > 
> > > How feasible would it be to make this code just not run when GuC is not
> > > used? Given the field it adds is called ce->guc_blocked it sounds like a
> > > natural and preferable thing to do... if possible.
> > > 
> > > > netconsole (with init_netconsole() hacked to call i915_init() when
> > > > logging has started, instead of by module_init()) tells the story:
> > > > 
> > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> > > > with RSI: 814d408b pointing to sw_fence_dummy_notify().
> > > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> > > > function needs to be 4-byte aligned.
> > > > 
> > > > v2:
> > > >(Jani Nikula)
> > > > - Change BUG_ON to WARN_ON
> > > 
> > > However in this case the code would then go on and call into a wrong
> > > function offset which may be worse than a BUG_ON, no?
> > 
> > So how about just
> > 
> > if (WARN_ON(...))
> > return;

I don't think it is quite that simple as if we short circuit this
function fence->flags will be NULL which would be bad too. I'll have
make a few more changes to make this safe.

Matt

> > 
> > or whatever is needed to give both the user and the CI a better
> > opportunity to see the error.
> 
> Sounds good to me.
> 
> Regards,
> 
> Tvrtko
> 
> 
> > 
> > BR,
> > Jani
> > 
> > 
> > > 
> > > > 
> > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > > > Signed-off-by: Hugh Dickins 
> > > > Signed-off-by: Matthew Brost 
> > > > Reviewed-by: Matthew Brost 
> > > > ---
> > > >drivers/gpu/drm/i915/gt/intel_context.c | 1 +
> > > >drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
> > > >2 files changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > index ff637147b1a9..f02c2202da9d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > > @@ -362,6 +362,7 @@ static int __intel_context_active(struct 
> > > > i915_active *active)
> > > > return 0;
> > > >}
> > > > +__aligned(4)   /* Respect the I915_SW_FENCE_MASK */
> > > 
> > > Hugh suggested __i915_sw_fence_call which I think would be the right
> > > thing to do.
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > >static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> > > >  enum i915_sw_fence_notify state)
> > > >{
> > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> > > > b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > index c589a681da77..1217b124c1d0 100644
> > > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > > > @@ -14,8 +14,10 @@
> > > >#if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> > > >#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> > > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> > > >#else
> > > >#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > > >#endif
> > > >static DEFINE_SPINLOCK(i915_sw_fence_lock);
> > > > @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence 
> > > > *fence,
> > > >   const char *name,
> > > >   struct lock_class_key *key)
> > > >{
> > > > -   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > > > +   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & 
> > > > ~I915_SW_FENCE_MASK);
> > > > __init_waitqueue_head(>wait, name, key);
> > > > fence->flags = (unsigned long)fn;
> > > > 
> > 


Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-20 Thread Matthew Brost
On Mon, Sep 20, 2021 at 08:28:13AM +0100, Tvrtko Ursulin wrote:
> 
> On 18/09/2021 00:38, Matthew Brost wrote:
> > From: Hugh Dickins 
> > 
> > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
> > using i915.  Bisections converge convincingly, but arrive at different
> > and surprising "culprits", none of them the actual culprit.
> 
> It is certainly surprising this patch crashed SNB and KBL.
> 
> How feasible would it be to make this code just not run when GuC is not
> used? Given the field it adds is called ce->guc_blocked it sounds like a
> natural and preferable thing to do... if possible.
> 

I can likely do this in a follow up patch.

> > netconsole (with init_netconsole() hacked to call i915_init() when
> > logging has started, instead of by module_init()) tells the story:
> > 
> > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> > with RSI: 814d408b pointing to sw_fence_dummy_notify().
> > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
> > function needs to be 4-byte aligned.
> > 
> > v2:
> >   (Jani Nikula)
> >- Change BUG_ON to WARN_ON
> 
> However in this case the code would then go on and call into a wrong
> function offset which may be worse than a BUG_ON, no?
>

Yea, I guess that would be bad too.
 
> > 
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Hugh Dickins 
> > Signed-off-by: Matthew Brost 
> > Reviewed-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c | 1 +
> >   drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
> >   2 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index ff637147b1a9..f02c2202da9d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
> > *active)
> > return 0;
> >   }
> > +__aligned(4)   /* Respect the I915_SW_FENCE_MASK */
> 
> Hugh suggested __i915_sw_fence_call which I think would be the right thing
> to do.
> 

Yep. Will do.

Matt

> Regards,
> 
> Tvrtko
> 
> >   static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> >  enum i915_sw_fence_notify state)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> > b/drivers/gpu/drm/i915/i915_sw_fence.c
> > index c589a681da77..1217b124c1d0 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > @@ -14,8 +14,10 @@
> >   #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> >   #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
> > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
> >   #else
> >   #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
> > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
> >   #endif
> >   static DEFINE_SPINLOCK(i915_sw_fence_lock);
> > @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
> >   const char *name,
> >   struct lock_class_key *key)
> >   {
> > -   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > +   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
> > __init_waitqueue_head(>wait, name, key);
> > fence->flags = (unsigned long)fn;
> > 


Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-20 Thread Tvrtko Ursulin



On 20/09/2021 08:38, Jani Nikula wrote:

On Mon, 20 Sep 2021, Tvrtko Ursulin  wrote:

On 18/09/2021 00:38, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.


It is certainly surprising this patch crashed SNB and KBL.

How feasible would it be to make this code just not run when GuC is not
used? Given the field it adds is called ce->guc_blocked it sounds like a
natural and preferable thing to do... if possible.


netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
   (Jani Nikula)
- Change BUG_ON to WARN_ON


However in this case the code would then go on and call into a wrong
function offset which may be worse than a BUG_ON, no?


So how about just

if (WARN_ON(...))
return;

or whatever is needed to give both the user and the CI a better
opportunity to see the error.


Sounds good to me.

Regards,

Tvrtko




BR,
Jani






Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
   drivers/gpu/drm/i915/gt/intel_context.c | 1 +
   drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
   2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..f02c2202da9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
   }
   
+__aligned(4)	/* Respect the I915_SW_FENCE_MASK */


Hugh suggested __i915_sw_fence_call which I think would be the right
thing to do.

Regards,

Tvrtko


   static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 enum i915_sw_fence_notify state)
   {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..1217b124c1d0 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -14,8 +14,10 @@
   
   #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)

   #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
   #else
   #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
   #endif
   
   static DEFINE_SPINLOCK(i915_sw_fence_lock);

@@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
   {
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
+   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
   
   	__init_waitqueue_head(>wait, name, key);

fence->flags = (unsigned long)fn;





Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-20 Thread Jani Nikula
On Mon, 20 Sep 2021, Tvrtko Ursulin  wrote:
> On 18/09/2021 00:38, Matthew Brost wrote:
>> From: Hugh Dickins 
>> 
>> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads
>> using i915.  Bisections converge convincingly, but arrive at different
>> and surprising "culprits", none of them the actual culprit.
>
> It is certainly surprising this patch crashed SNB and KBL.
>
> How feasible would it be to make this code just not run when GuC is not 
> used? Given the field it adds is called ce->guc_blocked it sounds like a 
> natural and preferable thing to do... if possible.
>
>> netconsole (with init_netconsole() hacked to call i915_init() when
>> logging has started, instead of by module_init()) tells the story:
>> 
>> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
>> with RSI: 814d408b pointing to sw_fence_dummy_notify().
>> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
>> function needs to be 4-byte aligned.
>> 
>> v2:
>>   (Jani Nikula)
>>- Change BUG_ON to WARN_ON
>
> However in this case the code would then go on and call into a wrong 
> function offset which may be worse than a BUG_ON, no?

So how about just

if (WARN_ON(...))
return;

or whatever is needed to give both the user and the CI a better
opportunity to see the error.


BR,
Jani


>
>> 
>> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
>> Signed-off-by: Hugh Dickins 
>> Signed-off-by: Matthew Brost 
>> Reviewed-by: Matthew Brost 
>> ---
>>   drivers/gpu/drm/i915/gt/intel_context.c | 1 +
>>   drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
>>   2 files changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
>> b/drivers/gpu/drm/i915/gt/intel_context.c
>> index ff637147b1a9..f02c2202da9d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>> @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
>> *active)
>>  return 0;
>>   }
>>   
>> +__aligned(4)/* Respect the I915_SW_FENCE_MASK */
>
> Hugh suggested __i915_sw_fence_call which I think would be the right 
> thing to do.
>
> Regards,
>
> Tvrtko
>
>>   static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
>>   enum i915_sw_fence_notify state)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
>> b/drivers/gpu/drm/i915/i915_sw_fence.c
>> index c589a681da77..1217b124c1d0 100644
>> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
>> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
>> @@ -14,8 +14,10 @@
>>   
>>   #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
>>   #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
>> +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
>>   #else
>>   #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
>> +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
>>   #endif
>>   
>>   static DEFINE_SPINLOCK(i915_sw_fence_lock);
>> @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
>>const char *name,
>>struct lock_class_key *key)
>>   {
>> -BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
>> +I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
>>   
>>  __init_waitqueue_head(>wait, name, key);
>>  fence->flags = (unsigned long)fn;
>> 

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-20 Thread Tvrtko Ursulin



On 18/09/2021 00:38, Matthew Brost wrote:

From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.


It is certainly surprising this patch crashed SNB and KBL.

How feasible would it be to make this code just not run when GuC is not 
used? Given the field it adds is called ce->guc_blocked it sounds like a 
natural and preferable thing to do... if possible.



netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
  (Jani Nikula)
   - Change BUG_ON to WARN_ON


However in this case the code would then go on and call into a wrong 
function offset which may be worse than a BUG_ON, no?




Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c | 1 +
  drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..f02c2202da9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
  }
  
+__aligned(4)	/* Respect the I915_SW_FENCE_MASK */


Hugh suggested __i915_sw_fence_call which I think would be the right 
thing to do.


Regards,

Tvrtko


  static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 enum i915_sw_fence_notify state)
  {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..1217b124c1d0 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -14,8 +14,10 @@
  
  #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)

  #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
  #else
  #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
  #endif
  
  static DEFINE_SPINLOCK(i915_sw_fence_lock);

@@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
  {
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
+   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
  
  	__init_waitqueue_head(>wait, name, key);

fence->flags = (unsigned long)fn;



[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-17 Thread Matthew Brost
From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
 (Jani Nikula)
  - Change BUG_ON to WARN_ON

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c | 1 +
 drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..f02c2202da9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
 }
 
+__aligned(4)   /* Respect the I915_SW_FENCE_MASK */
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 enum i915_sw_fence_notify state)
 {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..1217b124c1d0 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -14,8 +14,10 @@
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
 #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
 #else
 #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
 #endif
 
 static DEFINE_SPINLOCK(i915_sw_fence_lock);
@@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
 {
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
+   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
 
__init_waitqueue_head(>wait, name, key);
fence->flags = (unsigned long)fn;
-- 
2.32.0



[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes

2021-09-17 Thread Matthew Brost
From: Hugh Dickins 

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: 814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

v2:
 (Jani Nikula)
  - Change BUG_ON to WARN_ON

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c | 1 +
 drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ff637147b1a9..f02c2202da9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active 
*active)
return 0;
 }
 
+__aligned(4)   /* Respect the I915_SW_FENCE_MASK */
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 enum i915_sw_fence_notify state)
 {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index c589a681da77..1217b124c1d0 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -14,8 +14,10 @@
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
 #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr)
+#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr)
 #else
 #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr)
 #endif
 
 static DEFINE_SPINLOCK(i915_sw_fence_lock);
@@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
  const char *name,
  struct lock_class_key *key)
 {
-   BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
+   I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK);
 
__init_waitqueue_head(>wait, name, key);
fence->flags = (unsigned long)fn;
-- 
2.32.0