Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Fri, 15 Oct 2021, John Harrison wrote: > On 10/15/2021 07:52, Tvrtko Ursulin wrote: >> On 04/10/2021 08:36, Jani Nikula wrote: >>> On Fri, 24 Sep 2021, Ville Syrjälä >>> wrote: On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote: > From: Hugh Dickins > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > using i915. Bisections converge convincingly, but arrive at different > and surprising "culprits", none of them the actual culprit. > > netconsole (with init_netconsole() hacked to call i915_init() when > logging has started, instead of by module_init()) tells the story: > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > function needs to be 4-byte aligned. > > v2: > (Jani Nikula) > - Change BUG_ON to WARN_ON > v3: > (Jani / Tvrtko) > - Short circuit __i915_sw_fence_init on WARN_ON > v4: > (Lucas) > - Break WARN_ON changes out in a different patch > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > Signed-off-by: Hugh Dickins > Signed-off-by: Matthew Brost > Reviewed-by: Matthew Brost > --- > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index ff637147b1a9..e7f78bc7ebfc 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -362,8 +362,8 @@ static int __intel_context_active(struct > i915_active *active) > return 0; > } > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > - enum i915_sw_fence_notify state) > +static int __i915_sw_fence_call > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum > i915_sw_fence_notify state) > { > return NOTIFY_DONE; > } This thing seems broken beyond just this alignment stuff. I'm getting this spew from DEBUG_OBJECTS all the time on a glk here: >>> >>> Nobody followed through with this, so: >>> >>> https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home >>> >>> and >>> >>> cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes") >> >> John you pushed this yesterday? Will this cause a problem now if we >> have two commits for the same bug: > I'm thoroughly confused. > > I finally got far enough down my backlog to reach this and it did not > appear to be in the tree yet so I tried pushing it. The DIM tool gave me > a bunch of errors that didn't seem to make any sense. It certainly gave > me the impression that it did not actually do anything. So I gave up on > it. But now it seems like it did actually push something? And it was > already merged after all? Linus merged it directly to his tree because we didn't follow through with this soon enough. Linus' tree feeds to drm-tip via drm-intel-fixes, but I guess we haven't done backmerges from Linux' tree to drm-next to drm-intel-gt-next with that commit. Once you applied the patch, dim rebuilt drm-tip, which now got the same change from both drm-intel-fixes and drm-intel-gt-next. It happens. But in general, if there are regressions with blank screens at boot occurring in Linus' tree, the fix needs to move much faster. BR, Jani. > > John. > >> >> commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e >> Author: Hugh Dickins >> AuthorDate: Tue Sep 21 18:50:39 2021 -0700 >> Commit: John Harrison >> CommitDate: Thu Oct 14 18:29:01 2021 -0700 >> >> drm/i915: fix blank screen booting crashes >> >> >> >> commit cdc1e6e225e3256d56dc6648411630e71d7c776b >> Author: Hugh Dickins >> AuthorDate: Sat Oct 2 03:17:29 2021 -0700 >> Commit: Linus Torvalds >> CommitDate: Sat Oct 2 09:39:15 2021 -0700 >> >> drm/i915: fix blank screen booting crashes >> >> Regards, >> >> Tvrtko >> >>> >>> >>> BR, >>> Jani. >>> >>> [ 48.122629] [ cut here ] [ 48.122640] ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915] [ 48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 debug_print_object+0x6e/0x90 [ 48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4 [ 48.123119] CPU:
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On 10/15/2021 07:52, Tvrtko Ursulin wrote: On 04/10/2021 08:36, Jani Nikula wrote: On Fri, 24 Sep 2021, Ville Syrjälä wrote: On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote: From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON v3: (Jani / Tvrtko) - Short circuit __i915_sw_fence_init on WARN_ON v4: (Lucas) - Break WARN_ON changes out in a different patch Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..e7f78bc7ebfc 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) return 0; } -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, - enum i915_sw_fence_notify state) +static int __i915_sw_fence_call +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { return NOTIFY_DONE; } This thing seems broken beyond just this alignment stuff. I'm getting this spew from DEBUG_OBJECTS all the time on a glk here: Nobody followed through with this, so: https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home and cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes") John you pushed this yesterday? Will this cause a problem now if we have two commits for the same bug: I'm thoroughly confused. I finally got far enough down my backlog to reach this and it did not appear to be in the tree yet so I tried pushing it. The DIM tool gave me a bunch of errors that didn't seem to make any sense. It certainly gave me the impression that it did not actually do anything. So I gave up on it. But now it seems like it did actually push something? And it was already merged after all? John. commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e Author: Hugh Dickins AuthorDate: Tue Sep 21 18:50:39 2021 -0700 Commit: John Harrison CommitDate: Thu Oct 14 18:29:01 2021 -0700 drm/i915: fix blank screen booting crashes commit cdc1e6e225e3256d56dc6648411630e71d7c776b Author: Hugh Dickins AuthorDate: Sat Oct 2 03:17:29 2021 -0700 Commit: Linus Torvalds CommitDate: Sat Oct 2 09:39:15 2021 -0700 drm/i915: fix blank screen booting crashes Regards, Tvrtko BR, Jani. [ 48.122629] [ cut here ] [ 48.122640] ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915] [ 48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 debug_print_object+0x6e/0x90 [ 48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4 [ 48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 5.15.0-rc2-hsw+ #131 [ 48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018 [ 48.123129] RIP: 0010:debug_print_object+0x6e/0x90 [ 48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01 [ 48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282 [ 48.123150] RAX: RBX: 88810004f848 RCX: [ 48.123154] RDX: 8001 RSI: 8112673f RDI: 8112673f [ 48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 0009fffb [ 48.123163] R10: fffe R11: 3fff R12: 88810a04d100 [ 48.123167] R13: 88810a07d308 R14: 888109990800 R15: 88810997b800 [ 48.123171] FS: 7624b9c0() GS:888276e0() knlGS: [ 48.123176] CS: 0010 DS: ES: CR0: 80050033 [
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On 04/10/2021 08:36, Jani Nikula wrote: On Fri, 24 Sep 2021, Ville Syrjälä wrote: On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote: From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON v3: (Jani / Tvrtko) - Short circuit __i915_sw_fence_init on WARN_ON v4: (Lucas) - Break WARN_ON changes out in a different patch Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..e7f78bc7ebfc 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) return 0; } -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, -enum i915_sw_fence_notify state) +static int __i915_sw_fence_call +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { return NOTIFY_DONE; } This thing seems broken beyond just this alignment stuff. I'm getting this spew from DEBUG_OBJECTS all the time on a glk here: Nobody followed through with this, so: https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home and cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes") John you pushed this yesterday? Will this cause a problem now if we have two commits for the same bug: commit b0179f0d18dd7e6fb6b1c52c49ac21365257e97e Author: Hugh Dickins AuthorDate: Tue Sep 21 18:50:39 2021 -0700 Commit: John Harrison CommitDate: Thu Oct 14 18:29:01 2021 -0700 drm/i915: fix blank screen booting crashes commit cdc1e6e225e3256d56dc6648411630e71d7c776b Author: Hugh Dickins AuthorDate: Sat Oct 2 03:17:29 2021 -0700 Commit: Linus Torvalds CommitDate: Sat Oct 2 09:39:15 2021 -0700 drm/i915: fix blank screen booting crashes Regards, Tvrtko BR, Jani. [ 48.122629] [ cut here ] [ 48.122640] ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915] [ 48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 debug_print_object+0x6e/0x90 [ 48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4 [ 48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 5.15.0-rc2-hsw+ #131 [ 48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018 [ 48.123129] RIP: 0010:debug_print_object+0x6e/0x90 [ 48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01 [ 48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282 [ 48.123150] RAX: RBX: 88810004f848 RCX: [ 48.123154] RDX: 8001 RSI: 8112673f RDI: 8112673f [ 48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 0009fffb [ 48.123163] R10: fffe R11: 3fff R12: 88810a04d100 [ 48.123167] R13: 88810a07d308 R14: 888109990800 R15: 88810997b800 [ 48.123171] FS: 7624b9c0() GS:888276e0() knlGS: [ 48.123176] CS: 0010 DS: ES: CR0: 80050033 [ 48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: 00350ef0 [ 48.123185] Call Trace: [ 48.123190] i915_sw_fence_reinit+0x15/0x40 [i915] [ 48.123341] intel_context_init+0x16b/0x1d0 [i915] [ 48.123492] intel_context_create+0x33/0x100 [i915] [ 48.123642] default_engines+0x9d/0x120 [i915] [ 48.123806] i915_gem_create_context+0x465/0x630 [i915] [ 48.125964] ? trace_kmalloc+0x29/0xd0 [ 48.125976] ?
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Fri, 24 Sep 2021, Ville Syrjälä wrote: > On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote: >> From: Hugh Dickins >> >> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads >> using i915. Bisections converge convincingly, but arrive at different >> and surprising "culprits", none of them the actual culprit. >> >> netconsole (with init_netconsole() hacked to call i915_init() when >> logging has started, instead of by module_init()) tells the story: >> >> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! >> with RSI: 814d408b pointing to sw_fence_dummy_notify(). >> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that >> function needs to be 4-byte aligned. >> >> v2: >> (Jani Nikula) >> - Change BUG_ON to WARN_ON >> v3: >> (Jani / Tvrtko) >> - Short circuit __i915_sw_fence_init on WARN_ON >> v4: >> (Lucas) >> - Break WARN_ON changes out in a different patch >> >> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") >> Signed-off-by: Hugh Dickins >> Signed-off-by: Matthew Brost >> Reviewed-by: Matthew Brost >> --- >> drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c >> b/drivers/gpu/drm/i915/gt/intel_context.c >> index ff637147b1a9..e7f78bc7ebfc 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_context.c >> +++ b/drivers/gpu/drm/i915/gt/intel_context.c >> @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active >> *active) >> return 0; >> } >> >> -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, >> - enum i915_sw_fence_notify state) >> +static int __i915_sw_fence_call >> +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify >> state) >> { >> return NOTIFY_DONE; >> } > > This thing seems broken beyond just this alignment stuff. I'm getting > this spew from DEBUG_OBJECTS all the time on a glk here: Nobody followed through with this, so: https://lore.kernel.org/r/20211002020257.34a0e...@oasis.local.home and cdc1e6e225e3 ("drm/i915: fix blank screen booting crashes") BR, Jani. > > [ 48.122629] [ cut here ] > [ 48.122640] ODEBUG: init destroyed (active state 0) object type: > i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915] > [ 48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 > debug_print_object+0x6e/0x90 > [ 48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper > syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart > fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof > intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp > libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 > intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables > x_tables ipv6 autofs4 > [ 48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted > 5.15.0-rc2-hsw+ #131 > [ 48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS > JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018 > [ 48.123129] RIP: 0010:debug_print_object+0x6e/0x90 > [ 48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a > 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b > 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01 > [ 48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282 > [ 48.123150] RAX: RBX: 88810004f848 RCX: > > [ 48.123154] RDX: 8001 RSI: 8112673f RDI: > 8112673f > [ 48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: > 0009fffb > [ 48.123163] R10: fffe R11: 3fff R12: > 88810a04d100 > [ 48.123167] R13: 88810a07d308 R14: 888109990800 R15: > 88810997b800 > [ 48.123171] FS: 7624b9c0() GS:888276e0() > knlGS: > [ 48.123176] CS: 0010 DS: ES: CR0: 80050033 > [ 48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: > 00350ef0 > [ 48.123185] Call Trace: > [ 48.123190] i915_sw_fence_reinit+0x15/0x40 [i915] > [ 48.123341] intel_context_init+0x16b/0x1d0 [i915] > [ 48.123492] intel_context_create+0x33/0x100 [i915] > [ 48.123642] default_engines+0x9d/0x120 [i915] > [ 48.123806] i915_gem_create_context+0x465/0x630 [i915] > [ 48.125964] ? trace_kmalloc+0x29/0xd0 > [ 48.125976] ? kmem_cache_alloc_trace+0x121/0x620 > [ 48.125984] i915_gem_context_open+0x145/0x1d0 [i915] > [ 48.126172] i915_gem_open+0x75/0xb0 [i915] > [ 48.126364] drm_file_alloc+0x1b1/0x280 [drm] > [ 48.126427] drm_open+0xde/0x250 [drm] > [ 48.126482] drm_stub_open+0xa8/0x130 [drm] > [ 48.126538] chrdev_open+0xbf/0x240 > [ 48.126547] ? cdev_device_add+0x90/0x90 > [ 48.126553] do_dentry_open+0x151/0x3a0 > [ 48.126560]
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 06:50:39PM -0700, Matthew Brost wrote: > From: Hugh Dickins > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > using i915. Bisections converge convincingly, but arrive at different > and surprising "culprits", none of them the actual culprit. > > netconsole (with init_netconsole() hacked to call i915_init() when > logging has started, instead of by module_init()) tells the story: > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > function needs to be 4-byte aligned. > > v2: > (Jani Nikula) > - Change BUG_ON to WARN_ON > v3: > (Jani / Tvrtko) > - Short circuit __i915_sw_fence_init on WARN_ON > v4: > (Lucas) > - Break WARN_ON changes out in a different patch > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > Signed-off-by: Hugh Dickins > Signed-off-by: Matthew Brost > Reviewed-by: Matthew Brost > --- > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index ff637147b1a9..e7f78bc7ebfc 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active > *active) > return 0; > } > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > - enum i915_sw_fence_notify state) > +static int __i915_sw_fence_call > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify > state) > { > return NOTIFY_DONE; > } This thing seems broken beyond just this alignment stuff. I'm getting this spew from DEBUG_OBJECTS all the time on a glk here: [ 48.122629] [ cut here ] [ 48.122640] ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915] [ 48.122963] WARNING: CPU: 0 PID: 815 at lib/debugobjects.c:505 debug_print_object+0x6e/0x90 [ 48.122976] Modules linked in: i915 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_gtt agpgart fuse nls_iso8859_1 nls_cp437 vfat fat intel_rapl_msr wmi_bmof intel_rapl_common x86_pkg_temp_thermal r8169 realtek mdio_devres coretemp libphy efi_pstore evdev sdhci_pci cqhci sdhci mei_me mmc_core i2c_i801 intel_pmc_core mei led_class wmi i2c_smbus sch_fq_codel drm ip_tables x_tables ipv6 autofs4 [ 48.123119] CPU: 0 PID: 815 Comm: kms_async_flips Not tainted 5.15.0-rc2-hsw+ #131 [ 48.123125] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018 [ 48.123129] RIP: 0010:debug_print_object+0x6e/0x90 [ 48.123137] Code: 07 08 02 83 c0 01 8b 4b 14 4c 8b 45 00 48 c7 c7 a0 19 0a 82 89 05 66 07 08 02 8b 43 10 48 8b 14 c5 c0 0d e4 81 e8 d7 2e 3c 00 <0f> 0b 83 05 c5 c0 0c 01 01 48 83 c4 08 5b 5d c3 83 05 b7 c0 0c 01 [ 48.123142] RSP: 0018:c9dabae0 EFLAGS: 00010282 [ 48.123150] RAX: RBX: 88810004f848 RCX: [ 48.123154] RDX: 8001 RSI: 8112673f RDI: 8112673f [ 48.123159] RBP: a0577480 R08: 88827fbfcfe8 R09: 0009fffb [ 48.123163] R10: fffe R11: 3fff R12: 88810a04d100 [ 48.123167] R13: 88810a07d308 R14: 888109990800 R15: 88810997b800 [ 48.123171] FS: 7624b9c0() GS:888276e0() knlGS: [ 48.123176] CS: 0010 DS: ES: CR0: 80050033 [ 48.123181] CR2: 77f93bf0 CR3: 000108e56000 CR4: 00350ef0 [ 48.123185] Call Trace: [ 48.123190] i915_sw_fence_reinit+0x15/0x40 [i915] [ 48.123341] intel_context_init+0x16b/0x1d0 [i915] [ 48.123492] intel_context_create+0x33/0x100 [i915] [ 48.123642] default_engines+0x9d/0x120 [i915] [ 48.123806] i915_gem_create_context+0x465/0x630 [i915] [ 48.125964] ? trace_kmalloc+0x29/0xd0 [ 48.125976] ? kmem_cache_alloc_trace+0x121/0x620 [ 48.125984] i915_gem_context_open+0x145/0x1d0 [i915] [ 48.126172] i915_gem_open+0x75/0xb0 [i915] [ 48.126364] drm_file_alloc+0x1b1/0x280 [drm] [ 48.126427] drm_open+0xde/0x250 [drm] [ 48.126482] drm_stub_open+0xa8/0x130 [drm] [ 48.126538] chrdev_open+0xbf/0x240 [ 48.126547] ? cdev_device_add+0x90/0x90 [ 48.126553] do_dentry_open+0x151/0x3a0 [ 48.126560] path_openat+0x76f/0xa10 [ 48.126568] do_filp_open+0xa9/0x150 [ 48.126575] ? preempt_count_sub+0x9b/0xd0 [ 48.126584] ? _raw_spin_unlock+0x29/0x40 [ 48.126593] do_sys_openat2+0x97/0x160 [ 48.126600] __x64_sys_openat+0x54/0x90 [ 48.126607] do_syscall_64+0x38/0x90 [ 48.126614] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 48.126622] RIP: 0033:0x77cdca5b [ 48.126630] Code:
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 06:40:41PM -0700, Matthew Brost wrote: On Tue, Sep 21, 2021 at 04:29:31PM -0700, Lucas De Marchi wrote: On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote: > On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote: > > On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote: > > > From: Hugh Dickins > > > > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > > > using i915. Bisections converge convincingly, but arrive at different > > > and surprising "culprits", none of them the actual culprit. > > > > > > netconsole (with init_netconsole() hacked to call i915_init() when > > > logging has started, instead of by module_init()) tells the story: > > > > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > > > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > > > function needs to be 4-byte aligned. > > > > > > v2: > > > (Jani Nikula) > > > - Change BUG_ON to WARN_ON > > > v3: > > > (Jani / Tvrtko) > > > - Short circuit __i915_sw_fence_init on WARN_ON > > > > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > > > Signed-off-by: Hugh Dickins > > > Signed-off-by: Matthew Brost > > > Reviewed-by: Matthew Brost > > > --- > > > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > > > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- > > > 2 files changed, 12 insertions(+), 9 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c > > > index ff637147b1a9..e7f78bc7ebfc 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) > > > return 0; > > > } > > > > > > > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > > > - enum i915_sw_fence_notify state) > > > +static int __i915_sw_fence_call > > > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) > > > { > > > return NOTIFY_DONE; > > > } > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c > > > index c589a681da77..08cea73264e7 100644 > > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > > @@ -13,9 +13,9 @@ > > > #include "i915_selftest.h" > > > > > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > > > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > > > #else > > > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > #endif > > > > > > static DEFINE_SPINLOCK(i915_sw_fence_lock); > > > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence *fence, > > > i915_sw_fence_notify_t fn; > > > > > > fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK); > > > -return fn(fence, state); > > > +if (likely(fn)) > > > +return fn(fence, state); > > > +else > > > +return 0; > > > > since the knowledge for these being NULL (or with the wrong alignment) > > are in the init/reinit functions, wouldn't it be better to just add a > > fence_nop() and assign it there instead this likely() here? > > > > Maybe? I prefer the way it is. > > > > } > > > > > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS > > > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, > > >const char *name, > > >struct lock_class_key *key) > > > { > > > -BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > > - > > > __init_waitqueue_head(>wait, name, key); > > > +if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > > > +return; > > > > like: > > if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > > fence->flags = (unsigned long)sw_fence_dummy_notify; > > else > > fence->flags = (unsigned long)fn; > > > > > > f you return here instead of calling i915_sw_fence_reinit(), aren't you > > just going to use uninitialized memory later? At least in the selftests, > > which allocate it with kmalloc()... I didn't check others. > > > > I don't think so, maybe the fence won't work but it won't blow up > either. > > > > > For the bug fix we could just add the __aligned(4) and leave the rest to a > > separate patch. > > > > The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte > align which triggered a BUG_ON during boot which blank screened a > laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so > if someone makes this mistake in the future kernel should boot albiet > with a WARNING. yes, I understood. But afaics with WARN_ON you
[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON v3: (Jani / Tvrtko) - Short circuit __i915_sw_fence_init on WARN_ON v4: (Lucas) - Break WARN_ON changes out in a different patch Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..e7f78bc7ebfc 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) return 0; } -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, -enum i915_sw_fence_notify state) +static int __i915_sw_fence_call +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { return NOTIFY_DONE; } -- 2.32.0
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 04:29:31PM -0700, Lucas De Marchi wrote: > On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote: > > On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote: > > > On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote: > > > > From: Hugh Dickins > > > > > > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > > > > using i915. Bisections converge convincingly, but arrive at different > > > > and surprising "culprits", none of them the actual culprit. > > > > > > > > netconsole (with init_netconsole() hacked to call i915_init() when > > > > logging has started, instead of by module_init()) tells the story: > > > > > > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > > > > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > > > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > > > > function needs to be 4-byte aligned. > > > > > > > > v2: > > > > (Jani Nikula) > > > > - Change BUG_ON to WARN_ON > > > > v3: > > > > (Jani / Tvrtko) > > > > - Short circuit __i915_sw_fence_init on WARN_ON > > > > > > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > > > > Signed-off-by: Hugh Dickins > > > > Signed-off-by: Matthew Brost > > > > Reviewed-by: Matthew Brost > > > > --- > > > > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > > > > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- > > > > 2 files changed, 12 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > > > b/drivers/gpu/drm/i915/gt/intel_context.c > > > > index ff637147b1a9..e7f78bc7ebfc 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > > > @@ -362,8 +362,8 @@ static int __intel_context_active(struct > > > > i915_active *active) > > > > return 0; > > > > } > > > > > > > > > > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > > > > -enum i915_sw_fence_notify state) > > > > +static int __i915_sw_fence_call > > > > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum > > > > i915_sw_fence_notify state) > > > > { > > > > return NOTIFY_DONE; > > > > } > > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c > > > > b/drivers/gpu/drm/i915/i915_sw_fence.c > > > > index c589a681da77..08cea73264e7 100644 > > > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > > > @@ -13,9 +13,9 @@ > > > > #include "i915_selftest.h" > > > > > > > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > > > > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > > > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > > > > #else > > > > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > > #endif > > > > > > > > static DEFINE_SPINLOCK(i915_sw_fence_lock); > > > > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct > > > > i915_sw_fence *fence, > > > > i915_sw_fence_notify_t fn; > > > > > > > > fn = (i915_sw_fence_notify_t)(fence->flags & > > > > I915_SW_FENCE_MASK); > > > > - return fn(fence, state); > > > > + if (likely(fn)) > > > > + return fn(fence, state); > > > > + else > > > > + return 0; > > > > > > since the knowledge for these being NULL (or with the wrong alignment) > > > are in the init/reinit functions, wouldn't it be better to just add a > > > fence_nop() and assign it there instead this likely() here? > > > > > > > Maybe? I prefer the way it is. > > > > > > } > > > > > > > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS > > > > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence > > > > *fence, > > > > const char *name, > > > > struct lock_class_key *key) > > > > { > > > > - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > > > - > > > > __init_waitqueue_head(>wait, name, key); > > > > + if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > > > > + return; > > > > > > like: > > > if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > > > fence->flags = (unsigned long)sw_fence_dummy_notify; > > > else > > > fence->flags = (unsigned long)fn; > > > > > > > > > f you return here instead of calling i915_sw_fence_reinit(), aren't you > > > just going to use uninitialized memory later? At least in the selftests, > > > which allocate it with kmalloc()... I didn't check others. > > > > > > > I don't think so, maybe the fence won't work but it won't blow up > > either. > > > > > > > > For the bug fix we could just add the __aligned(4) and leave the rest to a > > > separate patch. > > > > > > > The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte > > align which
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 03:55:15PM -0700, Matthew Brost wrote: On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote: On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote: > From: Hugh Dickins > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > using i915. Bisections converge convincingly, but arrive at different > and surprising "culprits", none of them the actual culprit. > > netconsole (with init_netconsole() hacked to call i915_init() when > logging has started, instead of by module_init()) tells the story: > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > function needs to be 4-byte aligned. > > v2: > (Jani Nikula) > - Change BUG_ON to WARN_ON > v3: > (Jani / Tvrtko) > - Short circuit __i915_sw_fence_init on WARN_ON > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > Signed-off-by: Hugh Dickins > Signed-off-by: Matthew Brost > Reviewed-by: Matthew Brost > --- > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- > 2 files changed, 12 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c > index ff637147b1a9..e7f78bc7ebfc 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) >return 0; > } > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > - enum i915_sw_fence_notify state) > +static int __i915_sw_fence_call > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) > { >return NOTIFY_DONE; > } > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c > index c589a681da77..08cea73264e7 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > @@ -13,9 +13,9 @@ > #include "i915_selftest.h" > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > #else > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > #endif > > static DEFINE_SPINLOCK(i915_sw_fence_lock); > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence *fence, >i915_sw_fence_notify_t fn; > >fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK); > - return fn(fence, state); > + if (likely(fn)) > + return fn(fence, state); > + else > + return 0; since the knowledge for these being NULL (or with the wrong alignment) are in the init/reinit functions, wouldn't it be better to just add a fence_nop() and assign it there instead this likely() here? Maybe? I prefer the way it is. > } > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, > const char *name, > struct lock_class_key *key) > { > - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > - >__init_waitqueue_head(>wait, name, key); > + if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > + return; like: if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) fence->flags = (unsigned long)sw_fence_dummy_notify; else fence->flags = (unsigned long)fn; f you return here instead of calling i915_sw_fence_reinit(), aren't you just going to use uninitialized memory later? At least in the selftests, which allocate it with kmalloc()... I didn't check others. I don't think so, maybe the fence won't work but it won't blow up either. For the bug fix we could just add the __aligned(4) and leave the rest to a separate patch. The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte align which triggered a BUG_ON during boot which blank screened a laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so if someone makes this mistake in the future kernel should boot albiet with a WARNING. yes, I understood. But afaics with WARN_ON you are allowing it to continue and may be using uninitialized memory later, just causing other problems down the line, which may be equally difficult to debug. what I suggested is that there is the easy fix to apply to the current rcX kernel, adding __aligned(4) to sw_fence_dummy_notify() (patch 1). And there is the additional protection being added here (patch 2) which is subject to the debate. The long term fix is just pull out the I915_SW_FENCE_MASK (stealing bits from a poitner) and we don't have to worry any of this. Patch 2 may not even be needed if you're going that route.
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 11:46:37AM -0700, Lucas De Marchi wrote: > On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote: > > From: Hugh Dickins > > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > > using i915. Bisections converge convincingly, but arrive at different > > and surprising "culprits", none of them the actual culprit. > > > > netconsole (with init_netconsole() hacked to call i915_init() when > > logging has started, instead of by module_init()) tells the story: > > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > > function needs to be 4-byte aligned. > > > > v2: > > (Jani Nikula) > > - Change BUG_ON to WARN_ON > > v3: > > (Jani / Tvrtko) > > - Short circuit __i915_sw_fence_init on WARN_ON > > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > > Signed-off-by: Hugh Dickins > > Signed-off-by: Matthew Brost > > Reviewed-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- > > drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- > > 2 files changed, 12 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > b/drivers/gpu/drm/i915/gt/intel_context.c > > index ff637147b1a9..e7f78bc7ebfc 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active > > *active) > > return 0; > > } > > > > > -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > > -enum i915_sw_fence_notify state) > > +static int __i915_sw_fence_call > > +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify > > state) > > { > > return NOTIFY_DONE; > > } > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c > > b/drivers/gpu/drm/i915/i915_sw_fence.c > > index c589a681da77..08cea73264e7 100644 > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > @@ -13,9 +13,9 @@ > > #include "i915_selftest.h" > > > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > > -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > > #else > > -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > > #endif > > > > static DEFINE_SPINLOCK(i915_sw_fence_lock); > > @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence > > *fence, > > i915_sw_fence_notify_t fn; > > > > fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK); > > - return fn(fence, state); > > + if (likely(fn)) > > + return fn(fence, state); > > + else > > + return 0; > > since the knowledge for these being NULL (or with the wrong alignment) > are in the init/reinit functions, wouldn't it be better to just add a > fence_nop() and assign it there instead this likely() here? > Maybe? I prefer the way it is. > > } > > > > #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS > > @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, > > const char *name, > > struct lock_class_key *key) > > { > > - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > - > > __init_waitqueue_head(>wait, name, key); > > + if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > > + return; > > like: > if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) > fence->flags = (unsigned long)sw_fence_dummy_notify; > else > fence->flags = (unsigned long)fn; > > > f you return here instead of calling i915_sw_fence_reinit(), aren't you > just going to use uninitialized memory later? At least in the selftests, > which allocate it with kmalloc()... I didn't check others. > I don't think so, maybe the fence won't work but it won't blow up either. > > For the bug fix we could just add the __aligned(4) and leave the rest to a > separate patch. > The bug was sw_fence_dummy_notify in gt/intel_context.c was not 4 byte align which triggered a BUG_ON during boot which blank screened a laptop. Jani / Tvrtko suggested that we make the BUG_ON to WARN_ONs so if someone makes this mistake in the future kernel should boot albiet with a WARNING. The long term fix is just pull out the I915_SW_FENCE_MASK (stealing bits from a poitner) and we don't have to worry any of this. Matt > > Lucas De Marchi > > > fence->flags = (unsigned long)fn; > > > > i915_sw_fence_reinit(fence); > > @@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence) > > atomic_set(>pending, 1); > > fence->error = 0; > > > > - I915_SW_FENCE_BUG_ON(!fence->flags); > > -
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Tue, Sep 21, 2021 at 10:43:32AM -0700, Matthew Brost wrote: From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON v3: (Jani / Tvrtko) - Short circuit __i915_sw_fence_init on WARN_ON Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..e7f78bc7ebfc 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) return 0; } -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, -enum i915_sw_fence_notify state) +static int __i915_sw_fence_call +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { return NOTIFY_DONE; } diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..08cea73264e7 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -13,9 +13,9 @@ #include "i915_selftest.h" #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence *fence, i915_sw_fence_notify_t fn; fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK); - return fn(fence, state); + if (likely(fn)) + return fn(fence, state); + else + return 0; since the knowledge for these being NULL (or with the wrong alignment) are in the init/reinit functions, wouldn't it be better to just add a fence_nop() and assign it there instead this likely() here? } #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); - __init_waitqueue_head(>wait, name, key); + if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) + return; like: if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) fence->flags = (unsigned long)sw_fence_dummy_notify; else fence->flags = (unsigned long)fn; f you return here instead of calling i915_sw_fence_reinit(), aren't you just going to use uninitialized memory later? At least in the selftests, which allocate it with kmalloc()... I didn't check others. For the bug fix we could just add the __aligned(4) and leave the rest to a separate patch. Lucas De Marchi fence->flags = (unsigned long)fn; i915_sw_fence_reinit(fence); @@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence) atomic_set(>pending, 1); fence->error = 0; - I915_SW_FENCE_BUG_ON(!fence->flags); - I915_SW_FENCE_BUG_ON(!list_empty(>wait.head)); + I915_SW_FENCE_WARN_ON(!fence->flags); + I915_SW_FENCE_WARN_ON(!list_empty(>wait.head)); } void i915_sw_fence_commit(struct i915_sw_fence *fence) -- 2.32.0
[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON v3: (Jani / Tvrtko) - Short circuit __i915_sw_fence_init on WARN_ON Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 4 ++-- drivers/gpu/drm/i915/i915_sw_fence.c| 17 ++--- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..e7f78bc7ebfc 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,8 +362,8 @@ static int __intel_context_active(struct i915_active *active) return 0; } -static int sw_fence_dummy_notify(struct i915_sw_fence *sf, -enum i915_sw_fence_notify state) +static int __i915_sw_fence_call +sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { return NOTIFY_DONE; } diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..08cea73264e7 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -13,9 +13,9 @@ #include "i915_selftest.h" #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) -#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else -#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -129,7 +129,10 @@ static int __i915_sw_fence_notify(struct i915_sw_fence *fence, i915_sw_fence_notify_t fn; fn = (i915_sw_fence_notify_t)(fence->flags & I915_SW_FENCE_MASK); - return fn(fence, state); + if (likely(fn)) + return fn(fence, state); + else + return 0; } #ifdef CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS @@ -242,9 +245,9 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); - __init_waitqueue_head(>wait, name, key); + if (WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK)) + return; fence->flags = (unsigned long)fn; i915_sw_fence_reinit(fence); @@ -257,8 +260,8 @@ void i915_sw_fence_reinit(struct i915_sw_fence *fence) atomic_set(>pending, 1); fence->error = 0; - I915_SW_FENCE_BUG_ON(!fence->flags); - I915_SW_FENCE_BUG_ON(!list_empty(>wait.head)); + I915_SW_FENCE_WARN_ON(!fence->flags); + I915_SW_FENCE_WARN_ON(!list_empty(>wait.head)); } void i915_sw_fence_commit(struct i915_sw_fence *fence) -- 2.32.0
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Mon, Sep 20, 2021 at 08:42:42AM +0100, Tvrtko Ursulin wrote: > > On 20/09/2021 08:38, Jani Nikula wrote: > > On Mon, 20 Sep 2021, Tvrtko Ursulin wrote: > > > On 18/09/2021 00:38, Matthew Brost wrote: > > > > From: Hugh Dickins > > > > > > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > > > > using i915. Bisections converge convincingly, but arrive at different > > > > and surprising "culprits", none of them the actual culprit. > > > > > > It is certainly surprising this patch crashed SNB and KBL. > > > > > > How feasible would it be to make this code just not run when GuC is not > > > used? Given the field it adds is called ce->guc_blocked it sounds like a > > > natural and preferable thing to do... if possible. > > > > > > > netconsole (with init_netconsole() hacked to call i915_init() when > > > > logging has started, instead of by module_init()) tells the story: > > > > > > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > > > > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > > > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > > > > function needs to be 4-byte aligned. > > > > > > > > v2: > > > >(Jani Nikula) > > > > - Change BUG_ON to WARN_ON > > > > > > However in this case the code would then go on and call into a wrong > > > function offset which may be worse than a BUG_ON, no? > > > > So how about just > > > > if (WARN_ON(...)) > > return; I don't think it is quite that simple as if we short circuit this function fence->flags will be NULL which would be bad too. I'll have make a few more changes to make this safe. Matt > > > > or whatever is needed to give both the user and the CI a better > > opportunity to see the error. > > Sounds good to me. > > Regards, > > Tvrtko > > > > > > BR, > > Jani > > > > > > > > > > > > > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > > > > Signed-off-by: Hugh Dickins > > > > Signed-off-by: Matthew Brost > > > > Reviewed-by: Matthew Brost > > > > --- > > > >drivers/gpu/drm/i915/gt/intel_context.c | 1 + > > > >drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- > > > >2 files changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > > > b/drivers/gpu/drm/i915/gt/intel_context.c > > > > index ff637147b1a9..f02c2202da9d 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > > > @@ -362,6 +362,7 @@ static int __intel_context_active(struct > > > > i915_active *active) > > > > return 0; > > > >} > > > > +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ > > > > > > Hugh suggested __i915_sw_fence_call which I think would be the right > > > thing to do. > > > > > > Regards, > > > > > > Tvrtko > > > > > > >static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > > > > enum i915_sw_fence_notify state) > > > >{ > > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c > > > > b/drivers/gpu/drm/i915/i915_sw_fence.c > > > > index c589a681da77..1217b124c1d0 100644 > > > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > > > @@ -14,8 +14,10 @@ > > > >#if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > > > >#define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > > > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > > > >#else > > > >#define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > > > >#endif > > > >static DEFINE_SPINLOCK(i915_sw_fence_lock); > > > > @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence > > > > *fence, > > > > const char *name, > > > > struct lock_class_key *key) > > > >{ > > > > - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > > > + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & > > > > ~I915_SW_FENCE_MASK); > > > > __init_waitqueue_head(>wait, name, key); > > > > fence->flags = (unsigned long)fn; > > > > > >
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Mon, Sep 20, 2021 at 08:28:13AM +0100, Tvrtko Ursulin wrote: > > On 18/09/2021 00:38, Matthew Brost wrote: > > From: Hugh Dickins > > > > 5.15-rc1 crashes with blank screen when booting up on two ThinkPads > > using i915. Bisections converge convincingly, but arrive at different > > and surprising "culprits", none of them the actual culprit. > > It is certainly surprising this patch crashed SNB and KBL. > > How feasible would it be to make this code just not run when GuC is not > used? Given the field it adds is called ce->guc_blocked it sounds like a > natural and preferable thing to do... if possible. > I can likely do this in a follow up patch. > > netconsole (with init_netconsole() hacked to call i915_init() when > > logging has started, instead of by module_init()) tells the story: > > > > kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! > > with RSI: 814d408b pointing to sw_fence_dummy_notify(). > > I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that > > function needs to be 4-byte aligned. > > > > v2: > > (Jani Nikula) > >- Change BUG_ON to WARN_ON > > However in this case the code would then go on and call into a wrong > function offset which may be worse than a BUG_ON, no? > Yea, I guess that would be bad too. > > > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") > > Signed-off-by: Hugh Dickins > > Signed-off-by: Matthew Brost > > Reviewed-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gt/intel_context.c | 1 + > > drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- > > 2 files changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > b/drivers/gpu/drm/i915/gt/intel_context.c > > index ff637147b1a9..f02c2202da9d 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active > > *active) > > return 0; > > } > > +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ > > Hugh suggested __i915_sw_fence_call which I think would be the right thing > to do. > Yep. Will do. Matt > Regards, > > Tvrtko > > > static int sw_fence_dummy_notify(struct i915_sw_fence *sf, > > enum i915_sw_fence_notify state) > > { > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c > > b/drivers/gpu/drm/i915/i915_sw_fence.c > > index c589a681da77..1217b124c1d0 100644 > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > @@ -14,8 +14,10 @@ > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) > > #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) > > +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) > > #else > > #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) > > +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) > > #endif > > static DEFINE_SPINLOCK(i915_sw_fence_lock); > > @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, > > const char *name, > > struct lock_class_key *key) > > { > > - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); > > __init_waitqueue_head(>wait, name, key); > > fence->flags = (unsigned long)fn; > >
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On 20/09/2021 08:38, Jani Nikula wrote: On Mon, 20 Sep 2021, Tvrtko Ursulin wrote: On 18/09/2021 00:38, Matthew Brost wrote: From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. It is certainly surprising this patch crashed SNB and KBL. How feasible would it be to make this code just not run when GuC is not used? Given the field it adds is called ce->guc_blocked it sounds like a natural and preferable thing to do... if possible. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON However in this case the code would then go on and call into a wrong function offset which may be worse than a BUG_ON, no? So how about just if (WARN_ON(...)) return; or whatever is needed to give both the user and the CI a better opportunity to see the error. Sounds good to me. Regards, Tvrtko BR, Jani Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..f02c2202da9d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active *active) return 0; } +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ Hugh suggested __i915_sw_fence_call which I think would be the right thing to do. Regards, Tvrtko static int sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..1217b124c1d0 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -14,8 +14,10 @@ #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); __init_waitqueue_head(>wait, name, key); fence->flags = (unsigned long)fn;
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On Mon, 20 Sep 2021, Tvrtko Ursulin wrote: > On 18/09/2021 00:38, Matthew Brost wrote: >> From: Hugh Dickins >> >> 5.15-rc1 crashes with blank screen when booting up on two ThinkPads >> using i915. Bisections converge convincingly, but arrive at different >> and surprising "culprits", none of them the actual culprit. > > It is certainly surprising this patch crashed SNB and KBL. > > How feasible would it be to make this code just not run when GuC is not > used? Given the field it adds is called ce->guc_blocked it sounds like a > natural and preferable thing to do... if possible. > >> netconsole (with init_netconsole() hacked to call i915_init() when >> logging has started, instead of by module_init()) tells the story: >> >> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! >> with RSI: 814d408b pointing to sw_fence_dummy_notify(). >> I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that >> function needs to be 4-byte aligned. >> >> v2: >> (Jani Nikula) >>- Change BUG_ON to WARN_ON > > However in this case the code would then go on and call into a wrong > function offset which may be worse than a BUG_ON, no? So how about just if (WARN_ON(...)) return; or whatever is needed to give both the user and the CI a better opportunity to see the error. BR, Jani > >> >> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") >> Signed-off-by: Hugh Dickins >> Signed-off-by: Matthew Brost >> Reviewed-by: Matthew Brost >> --- >> drivers/gpu/drm/i915/gt/intel_context.c | 1 + >> drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- >> 2 files changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c >> b/drivers/gpu/drm/i915/gt/intel_context.c >> index ff637147b1a9..f02c2202da9d 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_context.c >> +++ b/drivers/gpu/drm/i915/gt/intel_context.c >> @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active >> *active) >> return 0; >> } >> >> +__aligned(4)/* Respect the I915_SW_FENCE_MASK */ > > Hugh suggested __i915_sw_fence_call which I think would be the right > thing to do. > > Regards, > > Tvrtko > >> static int sw_fence_dummy_notify(struct i915_sw_fence *sf, >> enum i915_sw_fence_notify state) >> { >> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c >> b/drivers/gpu/drm/i915/i915_sw_fence.c >> index c589a681da77..1217b124c1d0 100644 >> --- a/drivers/gpu/drm/i915/i915_sw_fence.c >> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c >> @@ -14,8 +14,10 @@ >> >> #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) >> #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) >> +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) >> #else >> #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) >> +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) >> #endif >> >> static DEFINE_SPINLOCK(i915_sw_fence_lock); >> @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, >>const char *name, >>struct lock_class_key *key) >> { >> -BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); >> +I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); >> >> __init_waitqueue_head(>wait, name, key); >> fence->flags = (unsigned long)fn; >> -- Jani Nikula, Intel Open Source Graphics Center
Re: [Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
On 18/09/2021 00:38, Matthew Brost wrote: From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. It is certainly surprising this patch crashed SNB and KBL. How feasible would it be to make this code just not run when GuC is not used? Given the field it adds is called ce->guc_blocked it sounds like a natural and preferable thing to do... if possible. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON However in this case the code would then go on and call into a wrong function offset which may be worse than a BUG_ON, no? Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..f02c2202da9d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active *active) return 0; } +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ Hugh suggested __i915_sw_fence_call which I think would be the right thing to do. Regards, Tvrtko static int sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..1217b124c1d0 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -14,8 +14,10 @@ #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); __init_waitqueue_head(>wait, name, key); fence->flags = (unsigned long)fn;
[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..f02c2202da9d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active *active) return 0; } +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ static int sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..1217b124c1d0 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -14,8 +14,10 @@ #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); __init_waitqueue_head(>wait, name, key); fence->flags = (unsigned long)fn; -- 2.32.0
[Intel-gfx] [PATCH] drm/i915: fix blank screen booting crashes
From: Hugh Dickins 5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and surprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: 814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. v2: (Jani Nikula) - Change BUG_ON to WARN_ON Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/i915_sw_fence.c| 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..f02c2202da9d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -362,6 +362,7 @@ static int __intel_context_active(struct i915_active *active) return 0; } +__aligned(4) /* Respect the I915_SW_FENCE_MASK */ static int sw_fence_dummy_notify(struct i915_sw_fence *sf, enum i915_sw_fence_notify state) { diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..1217b124c1d0 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -14,8 +14,10 @@ #if IS_ENABLED(CONFIG_DRM_I915_DEBUG) #define I915_SW_FENCE_BUG_ON(expr) BUG_ON(expr) +#define I915_SW_FENCE_WARN_ON(expr) WARN_ON(expr) #else #define I915_SW_FENCE_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr) +#define I915_SW_FENCE_WARN_ON(expr) BUILD_BUG_ON_INVALID(expr) #endif static DEFINE_SPINLOCK(i915_sw_fence_lock); @@ -242,7 +244,7 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, const char *name, struct lock_class_key *key) { - BUG_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); + I915_SW_FENCE_WARN_ON(!fn || (unsigned long)fn & ~I915_SW_FENCE_MASK); __init_waitqueue_head(>wait, name, key); fence->flags = (unsigned long)fn; -- 2.32.0