Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 08:11:47PM +0200, Oleksandr Natalenko wrote:
> I can confirm this one fixes the issue for me on T460s laptop. Thank you!

Yay!

> Should you submit it, please add:
> 
> Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")

Thanks for collecting all these; you're making my life really easy.
One minor point is that the standard format for Fixes: is 12 characters,
not 10.  eg,

Documentation/process/5.Posting.rst:Fixes: 1f2e3d4c5b6a ("The first line of 
the commit specified by the first 12 characters of its SHA-1 ID")

I have this in my .gitconfig:

[pretty]
fixes = Fixes: %h (\"%s\")

and in .git/config,

[core]
abbrev = 12
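
For illustration only, the 12-character rule above can also be sketched as a small standalone C helper. This is not part of git or the kernel tree; format_fixes() and FIXES_ABBREV are hypothetical names chosen for the example, and the sketch assumes the SHA-1 is passed as a hex string of at least 12 digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIXES_ABBREV 12	/* hypothetical constant: abbreviation length for Fixes: */

/*
 * Write 'Fixes: <first 12 hex digits of sha> ("<subject>")' into buf.
 * Returns 0 on success, or -1 if sha starts with fewer than 12 hex
 * digits or the formatted line does not fit in buf.
 */
static int format_fixes(char *buf, size_t len, const char *sha,
			const char *subject)
{
	int n;

	assert(buf != NULL && sha != NULL && subject != NULL);

	/* Reject abbreviations shorter than 12 hex digits (e.g. 10). */
	if (strspn(sha, "0123456789abcdefABCDEF") < FIXES_ABBREV)
		return -1;

	n = snprintf(buf, len, "Fixes: %.*s (\"%s\")",
		     FIXES_ABBREV, sha, subject);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "0b62af28f249" and the subject quoted above, the helper fills buf with the 12-character form of the tag; called with the 10-character "0b62af28f2", it returns -1.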

I'm working on the commit message now.


Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Oleksandr Natalenko
Hello.

On Tuesday, 19 September 2023 17:43:40 CEST Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> > 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> > e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> > assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> > folio_batch") is the culprit here.
> > 
> > Could you please check this?
> > 
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Wow, that is some great debugging.  Thanks for all the time & effort
> you and others have invested.  Sorry for breaking your system.
> 
> You're almost right about the "prerequisites", but it's in the other
> direction; 0b62af28f249 is a prerequisite for the later two cleanups,
> so reverting all three is necessary to test 0b62af28f249.
> 
> It seems to me that you've isolated the problem to constructing overly
> long sg lists.  I didn't realise that was going to be a problem, so
> that's my fault.
> 
> Could I ask you to try this patch?  I'll follow up with another patch
> later because I think I made another assumption that may not be valid.

I can confirm this one fixes the issue for me on T460s laptop. Thank you!

Should you submit it, please add:

Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: sta...@vger.kernel.org # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/6287208.lov4wx5...@natalenko.name/
Reported-by: Oleksandr Natalenko 
Tested-by: Oleksandr Natalenko 

> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 8f1633c3fb93..73a4a4eb29e0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  	st->nents = 0;
>  	for (i = 0; i < page_count; i++) {
>  		struct folio *folio;
> +		unsigned long nr_pages;
>  		const unsigned int shrink[] = {
>  			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
>  			0,
> @@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  			}
>  		} while (1);
>  
> +		nr_pages = min_t(unsigned long,
> +				folio_nr_pages(folio), page_count - i);
>  		if (!i ||
>  		    sg->length >= max_segment ||
>  		    folio_pfn(folio) != next_pfn) {
> @@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  				sg = sg_next(sg);
>  
>  			st->nents++;
> -			sg_set_folio(sg, folio, folio_size(folio), 0);
> +			sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
>  		} else {
>  			/* XXX: could overflow? */
> -			sg->length += folio_size(folio);
> +			sg->length += nr_pages * PAGE_SIZE;
>  		}
> -		next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> -		i += folio_nr_pages(folio) - 1;
> +		next_pfn = folio_pfn(folio) + nr_pages;
> +		i += nr_pages - 1;
>  
>  		/* Check that the i965g/gm workaround works. */
>  		GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);

-- 
Oleksandr Natalenko (post-factum)



Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 04:43:41PM +0100, Matthew Wilcox wrote:
> Could I ask you to try this patch?  I'll follow up with another patch
> later because I think I made another assumption that may not be valid.

Ah, no, never mind.  I thought we could start in the middle of a folio,
but we always start constructing the sg list from index 0 of the file,
so we always start at the first page of a folio.  If this patch solves
your problem, then I think we're done.


Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available at drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Wow, that is some great debugging.  Thanks for all the time & effort
you and others have invested.  Sorry for breaking your system.

You're almost right about the "prerequisites", but it's in the other
direction; 0b62af28f249 is a prerequisite for the later two cleanups,
so reverting all three is necessary to test 0b62af28f249.

It seems to me that you've isolated the problem to constructing overly
long sg lists.  I didn't realise that was going to be a problem, so
that's my fault.

Could I ask you to try this patch?  I'll follow up with another patch
later because I think I made another assumption that may not be valid.

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 	st->nents = 0;
 	for (i = 0; i < page_count; i++) {
 		struct folio *folio;
+		unsigned long nr_pages;
 		const unsigned int shrink[] = {
 			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
 			0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 			}
 		} while (1);
 
+		nr_pages = min_t(unsigned long,
+				folio_nr_pages(folio), page_count - i);
 		if (!i ||
 		    sg->length >= max_segment ||
 		    folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 				sg = sg_next(sg);
 
 			st->nents++;
-			sg_set_folio(sg, folio, folio_size(folio), 0);
+			sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
 		} else {
 			/* XXX: could overflow? */
-			sg->length += folio_size(folio);
+			sg->length += nr_pages * PAGE_SIZE;
 		}
-		next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
-		i += folio_nr_pages(folio) - 1;
+		next_pfn = folio_pfn(folio) + nr_pages;
+		i += nr_pages - 1;
 
 		/* Check that the i965g/gm workaround works. */
 		GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
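
To make the effect of the clamp concrete, here is a small userspace sketch (not kernel code) of the page accounting the loop performs. It assumes that the last folio backing an object can extend past the end of the object, which is the case the min_t() clamp in the patch guards against. sg_total_bytes() and DEMO_PAGE_SIZE are hypothetical names used only for this illustration, and each folio is reduced to its page count.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size for the illustration */

/*
 * Walk the folios backing an object of page_count pages and return the
 * number of bytes that would be accounted to the sg list.
 *
 * folio_pages[f] is the number of pages in folio f. When clamp is
 * non-zero, each folio contributes at most the pages still missing from
 * the object, mirroring min_t(unsigned long, folio_nr_pages(folio),
 * page_count - i) in the patch. When clamp is zero, whole folios are
 * added, as the code did before the patch.
 */
static unsigned long sg_total_bytes(const unsigned long *folio_pages,
				    size_t nr_folios,
				    unsigned long page_count, int clamp)
{
	unsigned long bytes = 0;
	unsigned long i = 0;	/* index of the next page in the object */
	size_t f;

	assert(folio_pages != NULL);

	for (f = 0; f < nr_folios && i < page_count; f++) {
		unsigned long nr_pages = folio_pages[f];

		if (clamp && nr_pages > page_count - i)
			nr_pages = page_count - i;

		bytes += nr_pages * DEMO_PAGE_SIZE;
		i += nr_pages;
	}

	return bytes;
}
```

With two 4-page folios backing a 6-page object, the unclamped walk accounts for 8 pages (32768 bytes), while the clamped walk accounts for exactly 6 pages (24576 bytes).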


Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Oleksandr Natalenko
On Tuesday, 19 September 2023 16:03:03 CEST Bagas Sanjaya wrote:
> …
> > > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> > 
> > Bagas,
> > 
> > would you mind adding this to the regression tracker please?
> > 
> 
> Will add shortly, thanks!

Thank you.

Please consider correcting the title, though: the Lenovo T14s (Tiger Lake) is not 
affected. The affected machines are the Lenovo T460s (Skylake) and the Lenovo T490s (Whiskey Lake).

-- 
Oleksandr Natalenko (post-factum)



Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Bagas Sanjaya
On Tue, Sep 19, 2023 at 03:23:28PM +0200, Oleksandr Natalenko wrote:
> /cc Bagas as well (see below).
> 
> On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> > /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> > 
> > On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > > Hello.
> > > 
> > > Since v6.5 kernel the following HW:
> > > 
> > > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > > 
> > > is affected by the following crash once KDE on either X11 or Wayland is 
> > > started:
> > > 
> > > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > > i915 0000:00:02.0: vgaarb: deactivate vga console
> > > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > > fbcon: i915drmfb (fb0) is primary device
> > > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > > …
> > > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 
> > > 'kwin_wayland'
> > > BUG: unable to handle page fault for address: b422c280
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 10067 P4D 10067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > > Oops: 0002 [#1] PREEMPT SMP PTI
> > > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 
> > > a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 
> > > 11/15/2022
> > > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > > …
> > > Call Trace:
> > >  
> > >  intel_ggtt_bind_vma+0x3e/0x60 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_vma_pin_ww+0x405/0xa80 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  __i915_ggtt_pin+0x5a/0x130 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  __intel_context_do_pin_ww+0x312/0x700 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 
> > > a83fdc6539431252dba13053979a8b680af86836]
> > >  drm_ioctl_kernel+0xca/0x170
> > >  drm_ioctl+0x30f/0x580
> > >  __x64_sys_ioctl+0x94/0xd0
> > >  do_syscall_64+0x5d/0x90
> > >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > …
> > > note: kwin_wayland[674] exited with irqs disabled
> > > 
> > > RIP seems to translate into this:
> > > 
> > > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o 
> > > gen8_ggtt_insert_entries+0xc2
> > > gen8_ggtt_insert_entries+0xc2/0x150:
> > > writeq at 
> > > /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > > (inlined by) gen8_set_pte at 
> > > /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > > (inlined by) gen8_ggtt_insert_entries at 
> > > /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > > 
> > > Probably, recent PTE-related changes are relevant:
> > > 
> > > $ git log --oneline --no-merges v6.4..v6.5 -- 
> > > drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 
> > > pte_encode callbacks
> > > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > > 5e352e32aec23 drm/i915: preparation for using PAT index
> > > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > > 
> > > Also note Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] 
> > > (rev 01) is not affected by this issue.
> > > 
> > > Full dmesg with DRM debug enabled is available in the bugreport I've 
> > > reported earlier [1]. I'm sending this email to make the issue more 
> > > visible.
> > > 
> > > Please help.
> > > 
> > > Thanks.
> > > 
> > > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> > 
> > Matthew,
> > 
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> > 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> > e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> > assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> > folio_batch") is the culprit here.
> > 
> > Could you please check this?
> > 
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Bagas,
> 
> would you mind adding this to the regression tracker please?
> 

Will add shortly, thanks!

-- 
An old man doll... just what I always wanted! - Clara

Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Oleksandr Natalenko
/cc Bagas as well (see below).

On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> 
> On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > Hello.
> > 
> > Since v6.5 kernel the following HW:
> > 
> > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > 
> > is affected by the following crash once KDE on either X11 or Wayland is 
> > started:
> > 
> > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > i915 0000:00:02.0: vgaarb: deactivate vga console
> > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > fbcon: i915drmfb (fb0) is primary device
> > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > …
> > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> > BUG: unable to handle page fault for address: b422c280
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 10067 P4D 10067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > Oops: 0002 [#1] PREEMPT SMP PTI
> > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 
> > a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 
> > 11/15/2022
> > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > …
> > Call Trace:
> >  
> >  intel_ggtt_bind_vma+0x3e/0x60 [i915 
> > a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __intel_context_do_pin_ww+0x312/0x700 [i915 
> > a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 
> > a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 
> > a83fdc6539431252dba13053979a8b680af86836]
> >  drm_ioctl_kernel+0xca/0x170
> >  drm_ioctl+0x30f/0x580
> >  __x64_sys_ioctl+0x94/0xd0
> >  do_syscall_64+0x5d/0x90
> >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > …
> > note: kwin_wayland[674] exited with irqs disabled
> > 
> > RIP seems to translate into this:
> > 
> > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o 
> > gen8_ggtt_insert_entries+0xc2
> > gen8_ggtt_insert_entries+0xc2/0x150:
> > writeq at 
> > /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > (inlined by) gen8_set_pte at 
> > /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > (inlined by) gen8_ggtt_insert_entries at 
> > /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > 
> > Probably, recent PTE-related changes are relevant:
> > 
> > $ git log --oneline --no-merges v6.4..v6.5 -- 
> > drivers/gpu/drm/i915/gt/intel_ggtt.c
> > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode 
> > callbacks
> > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > 5e352e32aec23 drm/i915: preparation for using PAT index
> > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > 
> > Also note Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 
> > 01) is not affected by this issue.
> > 
> > Full dmesg with DRM debug enabled is available in the bugreport I've 
> > reported earlier [1]. I'm sending this email to make the issue more visible.
> > 
> > Please help.
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Matthew,
> 
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available at drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Bagas,

would you mind adding this to the regression tracker please?

Thanks.

-- 
Oleksandr Natalenko (post-factum)



[REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Oleksandr Natalenko
/cc Matthew Wilcox and Andrew Morton because of folios (please see below).

On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> Hello.
> 
> Since v6.5 kernel the following HW:
> 
> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> is affected by the following crash once KDE on either X11 or Wayland is 
> started:
> 
> i915 0000:00:02.0: enabling device (0006 -> 0007)
> i915 0000:00:02.0: vgaarb: deactivate vga console
> i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> fbcon: i915drmfb (fb0) is primary device
> i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> …
> memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> BUG: unable to handle page fault for address: b422c280
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 10067 P4D 10067 PUD 1001df067 PMD 10d1cf067 PTE 0
> Oops: 0002 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 
> a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> …
> Call Trace:
>  
>  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __intel_context_do_pin_ww+0x312/0x700 [i915 
> a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 
> a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 
> a83fdc6539431252dba13053979a8b680af86836]
>  drm_ioctl_kernel+0xca/0x170
>  drm_ioctl+0x30f/0x580
>  __x64_sys_ioctl+0x94/0xd0
>  do_syscall_64+0x5d/0x90
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> …
> note: kwin_wayland[674] exited with irqs disabled
> 
> RIP seems to translate into this:
> 
> $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o 
> gen8_ggtt_insert_entries+0xc2
> gen8_ggtt_insert_entries+0xc2/0x150:
> writeq at 
> /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> (inlined by) gen8_set_pte at 
> /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> (inlined by) gen8_ggtt_insert_entries at 
> /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> 
> Probably, recent PTE-related changes are relevant:
> 
> $ git log --oneline --no-merges v6.4..v6.5 -- 
> drivers/gpu/drm/i915/gt/intel_ggtt.c
> 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode 
> callbacks
> 9275277d53248 drm/i915: use pat_index instead of cache_level
> 5e352e32aec23 drm/i915: preparation for using PAT index
> 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> 
> Also note Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 
> 01) is not affected by this issue.
> 
> Full dmesg with DRM debug enabled is available in the bugreport I've reported 
> earlier [1]. I'm sending this email to make the issue more visible.
> 
> Please help.
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Matthew,

Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc 
and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 
("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit 
here.

Could you please check this?

Our conversation with Andrzej is available at drm-intel GitLab [1].

Thanks.

[1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

-- 
Oleksandr Natalenko (post-factum)
