Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address: 0000000000000008
Hi Mirsad,

On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
> Hi, Ma Jun,
>
> Normally, I would reply under the quoted text, but I will adjust to your
> convention.
>
> I have just discovered that your patch causes Ubuntu 22.04 LTS GNOME XWayland
> session to block at typing password and ENTER in the graphical logon screen
> (tested several times).
>

This problem is not caused by my patch. Based on your syslog, it looks more
like a schedule issue.

I just saw a similar problem, please refer to the link below:
https://gitlab.freedesktop.org/drm/amd/-/issues/3124

Regards,
Ma Jun

> After that, I was not able to even log from another box with ssh, or the
> session would block (tested one time, second time too, third time it passed
> after I connected before attempt to login on XWayland console).
>
> You might find useful syslog and dmesg of the freeze on this link (they were
> +100K):
>
> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
>
> The exact applied patch was this:
>
> marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 73f6d7e72c73..6ef333df9adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>
>  	if (!amdgpu_sriov_vf(adev)) {
>  		snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
> -		err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
> -		/* don't check this. There are apparently firmwares in the wild with
> -		 * incorrect size in the header
> -		 */
> -		if (err == -ENODEV)
> -			goto out;
> +		err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>  		if (err)
> -			dev_dbg(adev->dev,
> -				"gfx10: amdgpu_ucode_request() failed \"%s\"\n",
> -				fw_name);
> +			goto out;
> +
> +		/* don't validate this firmware. There are apparently firmwares
> +		 * in the wild with incorrect size in the header
> +		 */
>  		rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
>  		version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
>  		version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> So, there seems to be a problem with the way the patch affects XWayland.
>
> Checked multiple times the exact commit with and without the diff.
>
> Hope this helps, because I am not familiar with the amdgpu driver.
>
> Best regards,
> Mirsad Todorovac
>
> On 1/22/24 09:34, Ma, Jun wrote:
>> Perhaps similar to the problem I encountered earlier, you can
>> try the following patch
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>
>> Regards,
>> Ma Jun
>>
>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> The last email did not pass to the most of the recipients due to banned .xz
>>> attachment.
>>>
>>> As the .config is too big to send inline or uncompressed either, I will
>>> omit it in this attempt. In the meantime, I had some success in decoding
>>> the stack trace, but sadly not complete.
>>>
>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>
>>> The platform is Ubuntu 22.04 LTS.
>>>
>>> Complete list of hardware and .config is available here:
>>>
>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7/
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> ---
>>> kernel: [5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [5.576710] #PF: error_code(0x) - not-present page
>>> kernel: [5.576712] PGD 0
Re: BUG [RESEND]: kernel NULL pointer dereference, address: 0000000000000008
Perhaps similar to the problem I encountered earlier, you can try the following patch https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html Regards, Ma Jun On 1/21/2024 3:54 AM, Mirsad Todorovac wrote: > Hi, > > The last email did not pass to the most of the recipients due to banned .xz > attachment. > > As the .config is too big to send inline or uncompressed either, I will omit > it in this > attempt. In the meantime, I had some success in decoding the stack trace, but > sadly not > complete. > > I don't think this Oops is deterministic, but I am working on a reproducer. > > The platform is Ubuntu 22.04 LTS. > > Complete list of hardware and .config is available here: > > https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7/ > > Best regards, > Mirsad > > --- > kernel: [5.576702] BUG: kernel NULL pointer dereference, address: > 0008 > kernel: [5.576707] #PF: supervisor read access in kernel mode > kernel: [5.576710] #PF: error_code(0x) - not-present page > kernel: [5.576712] PGD 0 P4D 0 > kernel: [5.576715] Oops: [#1] PREEMPT SMP NOPTI > kernel: [5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted > 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2 > kernel: [5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG > Lightning, BIOS 1.21 04/26/2023 > kernel: [5.576726] RIP: 0010:gfx_v10_0_early_init > (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu > kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed > 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff > <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0 > All code > > 0:8d 55 a8lea-0x58(%rbp),%edx > 3:4c 89 ffmov%r15,%rdi > 6:e8 e4 83 ec ff call 0xffec83ef > b:41 89 c2mov%eax,%r10d > e:83 f8 edcmp$0xffed,%eax >11:0f 84 b3 fd ff ff je 0xfdca >17:85 c0 test %eax,%eax >19:74 05 je 0x20 >1b:0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 
>20:49 8b 87 08 87 01 00mov0x18708(%r15),%rax >27:4c 89 ffmov%r15,%rdi >2a:* 48 8b 40 08 mov0x8(%rax),%rax <-- > trapping instruction >2e:0f b7 50 0a movzwl 0xa(%rax),%edx >32:0f b7 70 08 movzwl 0x8(%rax),%esi >36:e8 e4 42 fb ff call 0xfffb431f >3b:41 89 c2mov%eax,%r10d >3e:85 c0 test %eax,%eax > > Code starting with the faulting instruction > === > 0:48 8b 40 08 mov0x8(%rax),%rax > 4:0f b7 50 0a movzwl 0xa(%rax),%edx > 8:0f b7 70 08 movzwl 0x8(%rax),%esi > c:e8 e4 42 fb ff call 0xfffb42f5 >11:41 89 c2mov%eax,%r10d >14:85 c0 test %eax,%eax > kernel: [5.576878] RSP: 0018:a5b3c103f720 EFLAGS: 00010282 > kernel: [5.576881] RAX: RBX: c1d73489 RCX: > > kernel: [5.576884] RDX: RSI: RDI: > 91ae4fa8 > kernel: [5.576886] RBP: a5b3c103f7b0 R08: R09: > > kernel: [5.576889] R10: ffea R11: R12: > 91ae4fa986e8 > kernel: [5.576892] R13: 91ae4fa986d8 R14: 91ae4fa986f8 R15: > 91ae4fa8 > kernel: [5.576895] FS: 7fdaa343c8c0() GS:91bd5844() > knlGS: > kernel: [5.576898] CS: 0010 DS: ES: CR0: 80050033 > kernel: [5.576900] CR2: 0008 CR3: 0001222d CR4: > 00750ef0 > kernel: [5.576903] PKRU: 5554 > kernel: [5.576905] Call Trace: > kernel: [5.576907] > kernel: [5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479) > kernel: [5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 > arch/x86/kernel/dumpstack.c:434) > kernel: [5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707) > kernel: [5.576921] ? srso_alias_return_thunk > (arch/x86/lib/retpoline.S:181) > kernel: [
[PATCH] drm/buddy: Fix drm buddy info output format
[1] Change pages to blocks to avoid confusion.
[2] Fix output format to align the output info.

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/drm_buddy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 3d1f50f481cf..ef3dd15c334a 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -781,15 +781,15 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
 			count++;
 		}
 
-		drm_printf(p, "order-%d ", order);
+		drm_printf(p, "order-%2d ", order);
 
 		free = count * (mm->chunk_size << order);
 		if (free < SZ_1M)
-			drm_printf(p, "free: %lluKiB", free >> 10);
+			drm_printf(p, "free: %8llu KiB", free >> 10);
 		else
-			drm_printf(p, "free: %lluMiB", free >> 20);
+			drm_printf(p, "free: %8llu MiB", free >> 20);
 
-		drm_printf(p, ", pages: %llu\n", count);
+		drm_printf(p, ", blocks: %llu\n", count);
 	}
 }
 EXPORT_SYMBOL(drm_buddy_print);
-- 
2.34.1
Re: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
On 5/31/2023 4:45 PM, Christian König wrote:
> Am 31.05.23 um 08:20 schrieb Chen, Guchun:
>> [Public]
>>
>>> -----Original Message-----
>>> From: amd-gfx On Behalf Of Ma Jun
>>> Sent: Wednesday, May 31, 2023 1:31 PM
>>> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig, Christian
>>> Cc: Ma, Jun
>>> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>>>
>>> Remove redundant assignment code for ttm->caching as it's overwritten
>>>
>>> just a few lines later.
>> Please drop the blank line in above message. With it fixed, the patch is:
>> Reviewed-by: Guchun Chen
>
> Seconded, I'm going to pick that patch up and submit it to drm-misc-next
> with the commit message fixed.
>

Thanks for the help.

Regards,
Ma Jun

> Regards,
> Christian.
>
>>
>> Regards,
>> Guchun
>>
>>> v2:
>>>  - Update the commit message.
>>>
>>> Signed-off-by: Ma Jun
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>> index 02b812dacc5d..45a44544b656 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>  				unsigned long extra_pages)
>>>  {
>>>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>> -	ttm->caching = ttm_cached;
>>>  	ttm->page_flags = page_flags;
>>>  	ttm->dma_address = NULL;
>>>  	ttm->swap_storage = NULL;
>>> --
>>> 2.34.1
>
Re: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
On 5/31/2023 2:20 PM, Chen, Guchun wrote:
> [Public]
>
>> -----Original Message-----
>> From: amd-gfx On Behalf Of Ma Jun
>> Sent: Wednesday, May 31, 2023 1:31 PM
>> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig, Christian
>> Cc: Ma, Jun
>> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>>
>> Remove redundant assignment code for ttm->caching as it's overwritten
>>
>> just a few lines later.
>
> Please drop the blank line in above message. With it fixed, the patch is:
> Reviewed-by: Guchun Chen
>

Thanks for the review. I will fix it when pushing.

Regards,
Ma Jun

> Regards,
> Guchun
>
>> v2:
>>  - Update the commit message.
>>
>> Signed-off-by: Ma Jun
>> ---
>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 02b812dacc5d..45a44544b656 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>  				unsigned long extra_pages)
>>  {
>>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>> -	ttm->caching = ttm_cached;
>>  	ttm->page_flags = page_flags;
>>  	ttm->dma_address = NULL;
>>  	ttm->swap_storage = NULL;
>> --
>> 2.34.1
>
[PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
Remove redundant assignment code for ttm->caching as it's overwritten
just a few lines later.

v2:
 - Update the commit message.

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 02b812dacc5d..45a44544b656 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 				unsigned long extra_pages)
 {
 	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
-	ttm->caching = ttm_cached;
 	ttm->page_flags = page_flags;
 	ttm->dma_address = NULL;
 	ttm->swap_storage = NULL;
-- 
2.34.1
Re: [PATCH] drm/ttm: Remove redundant code in ttm_tt_init_fields
On 5/30/2023 4:59 PM, Christian König wrote:
> Am 29.05.23 um 11:28 schrieb Ma Jun:
>> Remove redundant assignment code for ttm->caching
>
> The explanation is missing why this is redundant, e.g. something like
> "this is overwritten just a few lines later".
>

Thanks for the review. I will update the commit message in v2.

Regards,
Ma Jun

> Apart from that looks good to me,
> Christian.
>
>>
>> Signed-off-by: Ma Jun
>> ---
>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 02b812dacc5d..45a44544b656 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>  				unsigned long extra_pages)
>>  {
>>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>> -	ttm->caching = ttm_cached;
>>  	ttm->page_flags = page_flags;
>>  	ttm->dma_address = NULL;
>>  	ttm->swap_storage = NULL;
>
[PATCH] drm/ttm: Remove redundant code in ttm_tt_init_fields
Remove redundant assignment code for ttm->caching

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 02b812dacc5d..45a44544b656 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 				unsigned long extra_pages)
 {
 	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
-	ttm->caching = ttm_cached;
 	ttm->page_flags = page_flags;
 	ttm->dma_address = NULL;
 	ttm->swap_storage = NULL;
-- 
2.34.1
Re: [PATCH 1/2] drm/ttm: Check ttm_debugfs_root before creating files under it
On 1/13/2023 5:37 PM, Christian König wrote:
> Am 13.01.23 um 06:34 schrieb Ma Jun:
>> Check the ttm_debugfs_root before creating files under it.
>> If the ttm_debugfs_root is NULL, all the files created for
>> ttm/ will be placed in the /sys/kernel/debug/ but not
>> /sys/kernel/debug/ttm/
>
> Well NAK for upstreaming. Why should ttm_debugfs_root be NULL here?
>

In my case, when the ttm/ removal fails during amdgpu uninstall and we then
try to modprobe amdgpu again, ttm_debugfs_root will be NULL because ttm/
already exists.

Regards,
Ma Jun

> Regards,
> Christian.
>
>>
>> Signed-off-by: Ma Jun
>> ---
>>  drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
>>  drivers/gpu/drm/ttm/ttm_pool.c   | 10 ++++++----
>>  drivers/gpu/drm/ttm/ttm_tt.c     |  5 +++--
>>  3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index e7147e304637..967bc2244df3 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -105,7 +105,8 @@ static int ttm_global_init(void)
>>  	INIT_LIST_HEAD(&glob->device_list);
>>  	atomic_set(&glob->bo_count, 0);
>>
>> -	debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
>> +	if(ttm_debugfs_root)
>> +		debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
>>  				&glob->bo_count);
>>  out:
>>  	if (ret && ttm_debugfs_root)
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 21b61631f73a..d95a65f759df 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -713,10 +713,12 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>>  	}
>>
>>  #ifdef CONFIG_DEBUG_FS
>> -	debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
>> -			    &ttm_pool_debugfs_globals_fops);
>> -	debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
>> -			    &ttm_pool_debugfs_shrink_fops);
>> +	if(ttm_debugfs_root) {
>> +		debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
>> +				    &ttm_pool_debugfs_globals_fops);
>> +		debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
>> +				    &ttm_pool_debugfs_shrink_fops);
>> +	}
>>  #endif
>>
>>  	mm_shrinker.count_objects = ttm_pool_shrinker_count;
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index d505603930a7..fec443494ef0 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -394,8 +394,9 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
>>  void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages)
>>  {
>>  #ifdef CONFIG_DEBUG_FS
>> -	debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
>> -			    &ttm_tt_debugfs_shrink_fops);
>> +	if(ttm_debugfs_root)
>> +		debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
>> +				    &ttm_tt_debugfs_shrink_fops);
>>  #endif
>>
>>  	if (!ttm_pages_limit)
>
Re: [PATCH 2/2] drm/ttm: Use debugfs_remove_recursive to remove ttm directory
On 1/13/2023 5:38 PM, Christian König wrote:
> Am 13.01.23 um 06:34 schrieb Ma Jun:
>> Use debugfs_remove_recursive to remove the /sys/kernel/debug/ttm
>> directory for better compatibility, because debugfs_remove fails
>> on older kernels.
>
> Again NAK for upstreaming.
>
> The upstream kernel is made for the newest kernel version and should not
> contain any compatibility handling for older kernels.
>

Yes, generally so. But debugfs_remove_recursive() and debugfs_remove() are
the same function now. debugfs_remove_recursive() is used here so that we
don't need to make a kcl patch for it.

Regards,
Ma Jun

> Christian.
>
>>
>> Signed-off-by: Ma Jun
>> ---
>>  drivers/gpu/drm/ttm/ttm_device.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index 967bc2244df3..590297123bb2 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -55,7 +55,7 @@ static void ttm_global_release(void)
>>  		goto out;
>>
>>  	ttm_pool_mgr_fini();
>> -	debugfs_remove(ttm_debugfs_root);
>> +	debugfs_remove_recursive(ttm_debugfs_root);
>>
>>  	__free_page(glob->dummy_read_page);
>>  	memset(glob, 0, sizeof(*glob));
>
[PATCH 2/2] drm/ttm: Use debugfs_remove_recursive to remove ttm directory
Use debugfs_remove_recursive to remove the /sys/kernel/debug/ttm
directory for better compatibility, because debugfs_remove fails
on older kernels.

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 967bc2244df3..590297123bb2 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -55,7 +55,7 @@ static void ttm_global_release(void)
 		goto out;
 
 	ttm_pool_mgr_fini();
-	debugfs_remove(ttm_debugfs_root);
+	debugfs_remove_recursive(ttm_debugfs_root);
 
 	__free_page(glob->dummy_read_page);
 	memset(glob, 0, sizeof(*glob));
-- 
2.25.1
[PATCH 1/2] drm/ttm: Check ttm_debugfs_root before creating files under it
Check the ttm_debugfs_root before creating files under it.
If the ttm_debugfs_root is NULL, all the files created for
ttm/ will be placed in /sys/kernel/debug/ instead of
/sys/kernel/debug/ttm/.

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
 drivers/gpu/drm/ttm/ttm_pool.c   | 10 ++++++----
 drivers/gpu/drm/ttm/ttm_tt.c     |  5 +++--
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e7147e304637..967bc2244df3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -105,7 +105,8 @@ static int ttm_global_init(void)
 	INIT_LIST_HEAD(&glob->device_list);
 	atomic_set(&glob->bo_count, 0);
 
-	debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
+	if(ttm_debugfs_root)
+		debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
 				&glob->bo_count);
 out:
 	if (ret && ttm_debugfs_root)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..d95a65f759df 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -713,10 +713,12 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	}
 
 #ifdef CONFIG_DEBUG_FS
-	debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
-			    &ttm_pool_debugfs_globals_fops);
-	debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
-			    &ttm_pool_debugfs_shrink_fops);
+	if(ttm_debugfs_root) {
+		debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
+				    &ttm_pool_debugfs_globals_fops);
+		debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
+				    &ttm_pool_debugfs_shrink_fops);
+	}
 #endif
 
 	mm_shrinker.count_objects = ttm_pool_shrinker_count;
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index d505603930a7..fec443494ef0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -394,8 +394,9 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
 void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages)
 {
 #ifdef CONFIG_DEBUG_FS
-	debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
-			    &ttm_tt_debugfs_shrink_fops);
+	if(ttm_debugfs_root)
+		debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
+				    &ttm_tt_debugfs_shrink_fops);
 #endif
 
 	if (!ttm_pages_limit)
-- 
2.25.1
[PATCH V2] drm/plane-helper: Add the missing declaration of drm_atomic_state
Add the missing declaration of struct drm_atomic_state to fix the
compile error below:

error: 'struct drm_atomic_state' declared inside parameter
list will not be visible outside of this definition or declaration [-Werror]

Signed-off-by: Ma Jun
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..90156e13ac11 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -26,6 +26,7 @@
 
 #include
 
+struct drm_atomic_state;
 struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
-- 
2.25.1
Re: [RESEND PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state
On 12/15/2022 4:40 PM, Thomas Zimmermann wrote:
> Hi
>
> Am 15.12.22 um 04:01 schrieb Ma Jun:
>> Add the missing declaration of struct drm_atomic_state to fix the
>> compile error below:
>>
>> error: 'struct drm_atomic_state' declared inside parameter
>> list will not be visible outside of this definition or declaration [-Werror]
>>
>> Signed-off-by: Ma Jun
>> ---
>>  include/drm/drm_plane_helper.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
>> index b00ad36cf5b6..530f88176db4 100644
>> --- a/include/drm/drm_plane_helper.h
>> +++ b/include/drm/drm_plane_helper.h
>> @@ -30,6 +30,7 @@ struct drm_crtc;
>>  struct drm_framebuffer;
>>  struct drm_modeset_acquire_ctx;
>>  struct drm_plane;
>> +struct drm_atomic_state;
>
> Thanks for the patch. Please sort the forward declarations alphabetically.
>

Thanks for the review. I will fix it in v2.

Regards,
Ma Jun

> Best regards
> Thomas
>
>>
>>  int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc *crtc,
>>  				    struct drm_framebuffer *fb,
>
[RESEND PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state
Add the missing declaration of struct drm_atomic_state to fix the
compile error below:

error: 'struct drm_atomic_state' declared inside parameter
list will not be visible outside of this definition or declaration [-Werror]

Signed-off-by: Ma Jun
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..530f88176db4 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -30,6 +30,7 @@ struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
 struct drm_plane;
+struct drm_atomic_state;
 
 int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc *crtc,
 				    struct drm_framebuffer *fb,
-- 
2.25.1
[PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state
Add the missing declaration of struct drm_atomic_state

Signed-off-by: Ma Jun
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..530f88176db4 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -30,6 +30,7 @@ struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
 struct drm_plane;
+struct drm_atomic_state;
 
 int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc *crtc,
 				    struct drm_framebuffer *fb,
-- 
2.25.1
Re: Coverity: kfd_parse_subtype_cache(): Memory - corruptions
Thanks, I will send the fix patch.

Regards,
Ma Jun

On 11/5/2022 4:40 AM, Felix Kuehling wrote:
> On 2022-11-04 15:41, coverity-bot wrote:
>> Hello!
>>
>> This is an experimental semi-automated report about issues detected by
>> Coverity from a scan of next-20221104 as part of the linux-next scan project:
>> https://scan.coverity.com/projects/linux-next-weekly-scan
>>
>> You're getting this email because you were associated with the identified
>> lines of code (noted below) that were touched by commits:
>>
>>   Fri Dec 8 23:08:59 2017 -0500
>>     3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
>>
>> Coverity reported the following:
>>
>> *** CID 1527133: Memory - corruptions (OVERRUN)
>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c:1113 in kfd_parse_subtype_cache()
>> 1107 			props->cache_size = cache->cache_size;
>> 1108 			props->cacheline_size = cache->cache_line_size;
>> 1109 			props->cachelines_per_tag = cache->lines_per_tag;
>> 1110 			props->cache_assoc = cache->associativity;
>> 1111 			props->cache_latency = cache->cache_latency;
>> 1112
>> vvv CID 1527133: Memory - corruptions (OVERRUN)
>> vvv Overrunning array "cache->sibling_map" of 32 bytes by passing it to
>> a function which accesses it at byte offset 63 using argument "64UL". [Note:
>> The source code implementation of the function has been overridden by a
>> builtin model.]
>> 1113 			memcpy(props->sibling_map, cache->sibling_map,
>> 1114 					sizeof(props->sibling_map));
>> 1115
>> 1116 			/* set the sibling_map_size as 32 for CRAT from ACPI */
>> 1117 			props->sibling_map_size = CRAT_SIBLINGMAP_SIZE;
>> 1118
>>
>> If this is a false positive, please let us know so we can mark it as
>> such, or teach the Coverity rules to be smarter. If not, please make
>> sure fixes get into linux-next. :) For patches fixing this, please
>> include these lines (but double-check the "Fixes" first):
>>
>> Reported-by: coverity-bot
>> Addresses-Coverity-ID: 1527133 ("Memory - corruptions")
>> Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
>>
>> I'm not sure why this suddenly appeared after 5 years, but the read
>> over-run looks legit:
>
>
> I think this was introduced by a more recent patch that was in fact
> meant to fix an array overrun on HW that is outgrowing the CRAT sibling
> map size:
>
>> commit 0938fbeb6f53fc44bc9b19784dee28496e68ba0c
>> Author: Ma Jun
>> Date: Wed Nov 2 15:53:26 2022 +0800
>>
>>     drm/amdkfd: Fix the warning of array-index-out-of-bounds
>>
>>     For some GPUs with more CUs, the original sibling_map[32]
>>     in struct crat_subtype_cache is not enough
>>     to save the cache information when create the VCRAT table,
>>     so skip filling the struct crat_subtype_cache info instead
>>     fill struct kfd_cache_properties directly to fix this problem.
>>
>>     Signed-off-by: Ma Jun
>>     Reviewed-by: Felix Kuehling
>>     Signed-off-by: Alex Deucher
>
> I added Ma Jun to the email.
>
> Regards,
> Felix
>
>
>>
>> struct crat_subtype_cache {
>> 	...
>> 	uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>>
>> #define CRAT_SIBLINGMAP_SIZE 32
>>
>>
>> struct kfd_cache_properties {
>> 	...
>> 	uint8_t sibling_map[CACHE_SIBLINGMAP_SIZE];
>>
>> #define CACHE_SIBLINGMAP_SIZE 64
>>
>> Thanks for your attention!
>>