RE: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity"
[Public]

Hi Xinhui,

That patch has already been reverted on Linux mainline:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/scheduler/sched_main.c?h=v6.5-rc6&id=baad10973fdb442912af676de3348e80bd8fe602

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx On Behalf Of
> xinhui pan
> Sent: Thursday, August 17, 2023 1:05 PM
> To: amd-...@lists.freedesktop.org
> Cc: Pan, Xinhui ; dri-devel@lists.freedesktop.org;
> Tuikov, Luben ; airl...@gmail.com; Koenig,
> Christian ; l.st...@pengutronix.de
> Subject: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU
> active time per entity"
>
> This patch partially reverts commit df622729ddbf ("drm/scheduler: track GPU
> active time per entity"), which touches the entity without holding any reference.
>
> I noticed a memory overwrite on the GPU scheduler side.
> The case is like below.
> A(drm_sched_main)                  B(vm fini)
> drm_sched_job_begin                drm_sched_entity_kill
> //job in pending_list              wait_for_completion
> complete_all                       ...
> ...                                kfree entity
> drm_sched_get_cleanup_job
> //fetch job from pending_list
> access job->entity //memory overwritten
>
> As long as we can NOT guarantee the entity is alive in this case, let's revert it
> for now.
>
> Signed-off-by: xinhui pan
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 602361c690c9..1b3f1a6a8514 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -907,12 +907,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>
>  	spin_unlock(&sched->job_list_lock);
>
> -	if (job) {
> -		job->entity->elapsed_ns += ktime_to_ns(
> -			ktime_sub(job->s_fence->finished.timestamp,
> -				  job->s_fence->scheduled.timestamp));
> -	}
> -
>  	return job;
>  }
>
> --
> 2.34.1
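The lifetime problem in the quoted race can be modelled in user space. What follows is only a minimal sketch, not the scheduler code: `struct entity`, `struct job` and every helper name are hypothetical, the reference count is not atomic (the model is single-threaded), and taking a counted reference from the job is just one generic way to keep such an access safe. Mainline, as noted in the reply, simply reverted the accounting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the scheduler objects; not the kernel types. */
struct entity {
	int refcount;        /* owners still holding the entity */
	uint64_t elapsed_ns; /* accumulated active time */
};

struct job {
	struct entity *entity; /* counted reference, valid until job_cleanup() */
	uint64_t scheduled_ns;
	uint64_t finished_ns;
};

static struct entity *entity_create(void)
{
	struct entity *e = calloc(1, sizeof(*e));

	if (e)
		e->refcount = 1; /* reference owned by the creator */
	return e;
}

static struct entity *entity_get(struct entity *e)
{
	assert(e && e->refcount > 0);
	e->refcount++;
	return e;
}

/* Drop one reference; returns 1 if this call freed the entity. */
static int entity_put(struct entity *e)
{
	assert(e && e->refcount > 0);
	if (--e->refcount > 0)
		return 0;
	free(e);
	return 1;
}

/* The job pins its entity for as long as the job itself exists. */
static void job_init(struct job *j, struct entity *e,
		     uint64_t scheduled_ns, uint64_t finished_ns)
{
	j->entity = entity_get(e);
	j->scheduled_ns = scheduled_ns;
	j->finished_ns = finished_ns;
}

/*
 * Cleanup path: touching j->entity is safe here because the job still
 * holds its own reference, even if the creator already dropped theirs.
 * Returns 1 if the entity was freed by this call.
 */
static int job_cleanup(struct job *j, uint64_t *elapsed_out)
{
	struct entity *e = j->entity;

	e->elapsed_ns += j->finished_ns - j->scheduled_ns;
	*elapsed_out = e->elapsed_ns;
	j->entity = NULL;
	return entity_put(e);
}
```

In this model the "vm fini" side calls `entity_put()` instead of freeing directly, so the entity outlives the teardown until the last job has been cleaned up.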
RE: [PATCH] drm/ttm: check null pointer before accessing when swapping
[Public]

> -----Original Message-----
> From: Koenig, Christian
> Sent: Thursday, July 27, 2023 3:28 PM
> To: Alex Deucher ; Chen, Guchun
> Cc: Deucher, Alexander ; airl...@gmail.com;
> dan...@ffwll.ch; dri-devel@lists.freedesktop.org; Mikhail Gavrilov
> Subject: Re: [PATCH] drm/ttm: check null pointer before accessing when
> swapping
>
> On 24.07.23 at 15:36, Alex Deucher wrote:
> > On Sun, Jul 23, 2023 at 10:43 PM Guchun Chen wrote:
> >> Add a check to avoid null pointer dereference as below:
> >>
> >> [ 90.002283] general protection fault, probably for non-canonical
> >> address 0xdc00: [#1] PREEMPT SMP KASAN NOPTI
> >> [ 90.002292] KASAN: null-ptr-deref in range
> >> [0x-0x0007]
> >> [ 90.002346] ? exc_general_protection+0x159/0x240
> >> [ 90.002352] ? asm_exc_general_protection+0x26/0x30
> >> [ 90.002357] ? ttm_bo_evict_swapout_allowable+0x322/0x5e0 [ttm]
> >> [ 90.002365] ? ttm_bo_evict_swapout_allowable+0x42e/0x5e0 [ttm]
> >> [ 90.002373] ttm_bo_swapout+0x134/0x7f0 [ttm]
> >> [ 90.002383] ? __pfx_ttm_bo_swapout+0x10/0x10 [ttm]
> >> [ 90.002391] ? lock_acquire+0x44d/0x4f0
> >> [ 90.002398] ? ttm_device_swapout+0xa5/0x260 [ttm]
> >> [ 90.002412] ? lock_acquired+0x355/0xa00
> >> [ 90.002416] ? do_raw_spin_trylock+0xb6/0x190
> >> [ 90.002421] ? __pfx_lock_acquired+0x10/0x10
> >> [ 90.002426] ? ttm_global_swapout+0x25/0x210 [ttm]
> >> [ 90.002442] ttm_device_swapout+0x198/0x260 [ttm]
> >> [ 90.002456] ? __pfx_ttm_device_swapout+0x10/0x10 [ttm]
> >> [ 90.002472] ttm_global_swapout+0x75/0x210 [ttm]
> >> [ 90.002486] ttm_tt_populate+0x187/0x3f0 [ttm]
> >> [ 90.002501] ttm_bo_handle_move_mem+0x437/0x590 [ttm]
> >> [ 90.002517] ttm_bo_validate+0x275/0x430 [ttm]
> >> [ 90.002530] ? __pfx_ttm_bo_validate+0x10/0x10 [ttm]
> >> [ 90.002544] ? kasan_save_stack+0x33/0x60
> >> [ 90.002550] ? kasan_set_track+0x25/0x30
> >> [ 90.002554] ? __kasan_kmalloc+0x8f/0xa0
> >> [ 90.002558] ? amdgpu_gtt_mgr_new+0x81/0x420 [amdgpu]
> >> [ 90.003023] ? ttm_resource_alloc+0xf6/0x220 [ttm]
> >> [ 90.003038] amdgpu_bo_pin_restricted+0x2dd/0x8b0 [amdgpu]
> >> [ 90.003210] ? __x64_sys_ioctl+0x131/0x1a0
> >> [ 90.003210] ? do_syscall_64+0x60/0x90
> >>
> >> Fixes: a2848d08742c ("drm/ttm: never consider pinned BOs for
> >> eviction")
> >> Tested-by: Mikhail Gavrilov
> >> Signed-off-by: Guchun Chen
> > Reviewed-by: Alex Deucher
>
> Reviewed-by: Christian König
>
> Has this already been pushed to drm-misc-next?
>
> Thanks,
> Christian.

Not yet, Christian, as I don't have push permission. I saw you were on vacation, so I planned to ping you to push it once you were back, fully recharged.

Regards,
Guchun

> >
> >> ---
> >>  drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> >> b/drivers/gpu/drm/ttm/ttm_bo.c index 7139a522b2f3..54e3083076b7
> >> 100644
> >> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >> @@ -519,7 +519,8 @@ static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> >>
> >>  	if (bo->pin_count) {
> >>  		*locked = false;
> >> -		*busy = false;
> >> +		if (busy)
> >> +			*busy = false;
> >>  		return false;
> >>  	}
> >>
> >> --
> >> 2.25.1
> >>
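The fix above follows a common C pattern: an out-parameter that some callers do not care about may be passed as NULL, so every write through it has to be guarded. The sketch below is a stand-alone illustration under that assumption; `struct sketch_bo` and `sketch_evict_allowable()` are hypothetical names and do not reproduce the TTM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced buffer object. */
struct sketch_bo {
	unsigned int pin_count;
};

/*
 * 'locked' is mandatory, 'busy' is optional: callers that do not need
 * the busy state pass NULL, so it must be checked before each store.
 */
static bool sketch_evict_allowable(const struct sketch_bo *bo,
				   bool *locked, bool *busy)
{
	assert(bo && locked);

	if (bo->pin_count) {
		*locked = false;
		if (busy)
			*busy = false;
		return false;
	}

	*locked = true;
	if (busy)
		*busy = false;
	return true;
}
```

A caller on the swapout path of this model would call `sketch_evict_allowable(&bo, &locked, NULL)`, which is exactly the case an unguarded `*busy = false` would crash on.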
RE: [PATCH] drm/amdgpu: display/Kconfig: replace leading spaces with tab
[Public]

It's https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-staging-drm-next?ref_type=heads.

The latest patches, including yours, will be pushed to this branch after a while.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx On Behalf Of Sui
> Jingfeng
> Sent: Wednesday, June 7, 2023 2:34 PM
> To: Alex Deucher
> Cc: Li, Sun peng (Leo) ; David Airlie
> ; Pan, Xinhui ; Siqueira, Rodrigo
> ; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Deucher, Alexander ;
> Wentland, Harry ; Koenig, Christian
> Subject: Re: [PATCH] drm/amdgpu: display/Kconfig: replace leading spaces
> with tab
>
> https://cgit.freedesktop.org/amd/drm-amd/
>
> This one has not been updated for a long time.
>
> On 2023/6/7 14:31, Sui Jingfeng wrote:
> > Hi,
> >
> > On 2023/6/7 03:15, Alex Deucher wrote:
> >> Applied. Thanks!
> >
> > Where is the official branch of drm/amdgpu? I can't find it on the
> > internet.
> >
> > Sorry for asking this silly question.
> >
> >> Alex
> >>
> >> On Tue, Jun 6, 2023 at 9:33 AM Sui Jingfeng wrote:
> >>> This patch replaces the leading spaces with tabs so that they stay
> >>> aligned with the rest of the config options. No functional change.
> >>>
> >>> Signed-off-by: Sui Jingfeng
> >>> ---
> >>>  drivers/gpu/drm/amd/display/Kconfig | 17 +++++++----------
> >>>  1 file changed, 7 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/Kconfig
> >>> b/drivers/gpu/drm/amd/display/Kconfig
> >>> index 2d8e55e29637..04ccfc70d583 100644
> >>> --- a/drivers/gpu/drm/amd/display/Kconfig
> >>> +++ b/drivers/gpu/drm/amd/display/Kconfig
> >>> @@ -42,16 +42,13 @@ config DEBUG_KERNEL_DC
> >>>  	  Choose this option if you want to hit kdgb_break in assert.
> >>>
> >>>  config DRM_AMD_SECURE_DISPLAY
> >>> -        bool "Enable secure display support"
> >>> -        depends on DEBUG_FS
> >>> -        depends on DRM_AMD_DC_FP
> >>> -        help
> >>> -            Choose this option if you want to
> >>> -            support secure display
> >>> -
> >>> -            This option enables the calculation
> >>> -            of crc of specific region via debugfs.
> >>> -            Cooperate with specific DMCU FW.
> >>> +	bool "Enable secure display support"
> >>> +	depends on DEBUG_FS
> >>> +	depends on DRM_AMD_DC_FP
> >>> +	help
> >>> +	  Choose this option if you want to support secure display
> >>>
> >>> +	  This option enables the calculation of crc of specific region via
> >>> +	  debugfs. Cooperate with specific DMCU FW.
> >>>
> >>>  endmenu
> >>> --
> >>> 2.25.1
> >>>
> --
> Jingfeng
RE: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
[Public]

> -----Original Message-----
> From: amd-gfx On Behalf Of Ma
> Jun
> Sent: Wednesday, May 31, 2023 1:31 PM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig,
> Christian
> Cc: Ma, Jun
> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>
> Remove redundant assignment code for ttm->caching as it's overwritten
>
> just a few lines later.

Please drop the blank line in the above message. With that fixed, the patch is:

Reviewed-by: Guchun Chen

Regards,
Guchun

> v2:
> - Update the commit message.
>
> Signed-off-by: Ma Jun
> ---
>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 02b812dacc5d..45a44544b656 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
> unsigned long extra_pages)
>  {
>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) +
> extra_pages;
> -	ttm->caching = ttm_cached;
>  	ttm->page_flags = page_flags;
>  	ttm->dma_address = NULL;
>  	ttm->swap_storage = NULL;
> --
> 2.34.1
RE: [PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit
Looks like you can drop the macro 'AMDGPU_DEFAULT_GTT_SIZE_MB' as well.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx On Behalf Of
> Mukul Joshi
> Sent: Wednesday, April 26, 2023 9:53 AM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Joshi, Mukul ; Kuehling, Felix
> ; Koenig, Christian
> Subject: [PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit
>
> Use the helper function in TTM to get the TTM mem limit and set the GTT size to be
> equal to the TTM mem limit.
>
> Signed-off-by: Mukul Joshi
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index ce34b73d05bc..ac220c779fc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1807,26 +1807,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  	DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
>  		 (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
>
> -	/* Compute GTT size, either based on 1/2 the size of RAM size
> -	 * or whatever the user passed on module init */
> -	if (amdgpu_gtt_size == -1) {
> -		struct sysinfo si;
> -
> -		si_meminfo(&si);
> -		/* Certain GL unit tests for large textures can cause problems
> -		 * with the OOM killer since there is no way to link this memory
> -		 * to a process. This was originally mitigated (but not necessarily
> -		 * eliminated) by limiting the GTT size. The problem is this limit
> -		 * is often too low for many modern games so just make the limit 1/2
> -		 * of system memory which aligns with TTM. The OOM accounting needs
> -		 * to be addressed, but we shouldn't prevent common 3D applications
> -		 * from being usable just to potentially mitigate that corner case.
> -		 */
> -		gtt_size = max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
> -			       (u64)si.totalram * si.mem_unit / 2);
> -	} else {
> +	/* Compute GTT size, either based on TTM limit
> +	 * or whatever the user passed on module init.
> +	 */
> +	if (amdgpu_gtt_size == -1)
> +		gtt_size = ttm_tt_pages_limit() << PAGE_SHIFT;
> +	else
>  		gtt_size = (uint64_t)amdgpu_gtt_size << 20;
> -	}
>
>  	/* Initialize GTT memory pool */
>  	r = amdgpu_gtt_mgr_init(adev, gtt_size);
> --
> 2.35.1
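The size selection in this patch mixes two units: the module parameter is given in MiB (shifted by 20) while the TTM limit is a page count (shifted by the page shift). A small user-space sketch of that arithmetic follows. It assumes a 4 KiB page (`SKETCH_PAGE_SHIFT` of 12), and `sketch_gtt_size_bytes()` is a hypothetical helper, not an amdgpu function; the page limit is passed in as a plain argument instead of being read from TTM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * gtt_size_mb == -1 means "not set by the user": fall back to the
 * page limit. Otherwise the user value is in MiB. The widening cast
 * happens before the shift so large values cannot overflow an int.
 */
static uint64_t sketch_gtt_size_bytes(int gtt_size_mb, uint64_t pages_limit)
{
	assert(gtt_size_mb >= -1);

	if (gtt_size_mb == -1)
		return pages_limit << SKETCH_PAGE_SHIFT;

	return (uint64_t)gtt_size_mb << 20;
}
```

For example, a limit of 1024 pages yields 4 MiB, and a user value of 4096 yields 4 GiB, which would not fit in 32 bits without the cast.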
RE: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED
From a coding style perspective, this lock/unlock handling should be put into amdgpu_ctx_priority_override.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx On Behalf Of Chia-
> I Wu
> Sent: Wednesday, April 26, 2023 8:48 AM
> To: dri-devel@lists.freedesktop.org
> Cc: Pan, Xinhui ; linux-ker...@vger.kernel.org;
> sta...@vger.kernel.org; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Deucher, Alexander ;
> David Airlie ; Koenig, Christian
> Subject: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED
>
> Signed-off-by: Chia-I Wu
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> index e9b45089a28a6..863b2a34b2d64 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> @@ -38,6 +38,7 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
>  {
>  	struct fd f = fdget(fd);
>  	struct amdgpu_fpriv *fpriv;
> +	struct amdgpu_ctx_mgr *mgr;
>  	struct amdgpu_ctx *ctx;
>  	uint32_t id;
>  	int r;
> @@ -51,8 +52,11 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
>  		return r;
>  	}
>
> -	idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
> +	mgr = &fpriv->ctx_mgr;
> +	mutex_lock(&mgr->lock);
> +	idr_for_each_entry(&mgr->ctx_handles, ctx, id)
>  		amdgpu_ctx_priority_override(ctx, priority);
> +	mutex_unlock(&mgr->lock);
>
>  	fdput(f);
>  	return 0;
> --
> 2.40.0.634.g4ca3ef3211-goog
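The patch holds the manager lock across the whole walk over the handle table, so entries cannot be added or removed while they are being visited. A user-space sketch of that shape is below, using a POSIX mutex and a fixed array in place of the kernel mutex and IDR; all `sketch_` names are hypothetical and the table layout is an assumption made purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_CTX 4

/* Hypothetical context with only the field this example needs. */
struct sketch_ctx {
	int priority;
};

/* Hypothetical manager: the lock protects the handle table. */
struct sketch_ctx_mgr {
	pthread_mutex_t lock;
	struct sketch_ctx *handles[SKETCH_MAX_CTX];
};

/*
 * Apply 'priority' to every registered context while holding the
 * manager lock for the entire iteration. Returns the number of
 * contexts that were updated.
 */
static int sketch_override_all(struct sketch_ctx_mgr *mgr, int priority)
{
	int updated = 0;
	size_t i;

	assert(mgr);

	pthread_mutex_lock(&mgr->lock);
	for (i = 0; i < SKETCH_MAX_CTX; i++) {
		if (!mgr->handles[i])
			continue;
		mgr->handles[i]->priority = priority;
		updated++;
	}
	pthread_mutex_unlock(&mgr->lock);

	return updated;
}
```

Taking the lock once around the loop, as here, protects the table walk itself; locking inside a per-context helper would protect each context update but not the iteration.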
RE: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
After reviewing the whole history, the attached patch may be able to fix your problem. Could you please give it a try?

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx On Behalf Of
> Mikhail Gavrilov
> Sent: Tuesday, April 25, 2023 9:20 PM
> To: Koenig, Christian
> Cc: Daniel Vetter ; dri-devel de...@lists.freedesktop.org>; amd-gfx list ;
> Linux List Kernel Mailing
> Subject: Re: BUG: KASAN: null-ptr-deref in
> drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
>
> On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov
> wrote:
> >
> > Important: don't give up.
> > https://youtu.be/25zhHBGIHJ8 [40 min]
> > https://youtu.be/utnDR26eYBY [50 min]
> > https://youtu.be/DJQ_tiimW6g [12 min]
> > https://youtu.be/Y6AH1oJKivA [6 min]
> > Yes, the issue is reproducible every time, but occasionally it does
> > not happen on the first attempt.
> > I also uploaded other videos which prove that the issue definitely
> > exists if someone launches those games in turn.
> > Reproducibility is only a matter of time.
> >
> > Anyway I didn't want you to spend so much time trying to reproduce it.
> > This monkey business fits me more than you.
> > It would be better if I could collect more useful info.
>
> Christian,
> Did you manage to reproduce the problem?
>
> At the weekend I ran into a slab-use-after-free in
> amdgpu_vm_handle_moved.
> I was not playing games at the time.
> The Xwayland process was affected, so it led to a desktop hang.
>
> ==
> BUG: KASAN: slab-use-after-free in amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> Read of size 8 at addr 888295c66190 by task Xwayland:cs0/173185
>
> CPU: 21 PID: 173185 Comm: Xwayland:cs0 Tainted: GWL
> --- --- 6.3.0-0.rc7.20230420gitcb0856346a60.59.fc39.x86_64+debug #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 4601 02/02/2023
> Call Trace:
>
> dump_stack_lvl+0x76/0xd0
> print_report+0xcf/0x670
> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> kasan_report+0xa8/0xe0
> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> amdgpu_cs_ioctl+0x2b7e/0x5630 [amdgpu]
> ? __pfx___lock_acquire+0x10/0x10
> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> ? mark_lock+0x101/0x16e0
> ? __lock_acquire+0xe54/0x59f0
> ? __pfx_lock_release+0x10/0x10
> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> drm_ioctl_kernel+0x1fc/0x3d0
> ? __pfx_drm_ioctl_kernel+0x10/0x10
> drm_ioctl+0x4c5/0xaa0
> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> ? __pfx_drm_ioctl+0x10/0x10
> ? _raw_spin_unlock_irqrestore+0x66/0x80
> ? lockdep_hardirqs_on+0x81/0x110
> ? _raw_spin_unlock_irqrestore+0x4f/0x80
> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
> __x64_sys_ioctl+0x131/0x1a0
> do_syscall_64+0x60/0x90
> ? do_syscall_64+0x6c/0x90
> ? lockdep_hardirqs_on+0x81/0x110
> ? do_syscall_64+0x6c/0x90
> ? lockdep_hardirqs_on+0x81/0x110
> ? do_syscall_64+0x6c/0x90
> ? lockdep_hardirqs_on+0x81/0x110
> ? do_syscall_64+0x6c/0x90
> ? lockdep_hardirqs_on+0x81/0x110
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7ffb71b0892d
> Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00
> 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00
> f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> RSP: 002b:7ffb677fe840 EFLAGS: 0246 ORIG_RAX:
> 0010
> RAX: ffda RBX: 7ffb677fe9f8 RCX: 7ffb71b0892d
> RDX: 7ffb677fe900 RSI: c0186444 RDI: 000d
> RBP: 7ffb677fe890 R08: 7ffb677fea50 R09: 7ffb677fe8e0
> R10: 556c4611bec0 R11: 0246 R12: 7ffb677fe900
> R13: c0186444 R14: 000d R15: 7ffb677fe9f8
>
> Allocated by task 173181:
> kasan_save_stack+0x33/0x60
> kasan_set_track+0x25/0x30
> __kasan_kmalloc+0x8f/0xa0
> __kmalloc_node+0x65/0x160
> amdgpu_bo_create+0x31e/0xfb0 [amdgpu]
> amdgpu_bo_create_user+0xca/0x160 [amdgpu]
> amdgpu_gem_create_ioctl+0x398/0x980 [amdgpu]
> drm_ioctl_kernel+0x1fc/0x3d0
> drm_ioctl+0x4c5/0xaa0
> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
> __x64_sys_ioctl+0x131/0x1a0
> do_syscall_64+0x60/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Freed by task 173185:
> kasan_save_stack+0x33/0x60
> kasan_set_track+0x25/0x30
> kasan_save_free_info+0x2e/0x50
> __kasan_slab_free+0x10b/0x1a0
> slab_free_freelist_hook+0x11e/0x1d0
> __kmem_cache_free+0xc0/0x2e0
> ttm_bo_release+0x667/0x9e0 [ttm]
> amdgpu_bo_unref+0x35/0x70 [amdgpu]
> amdgpu_gem_object_free+0x73/0xb0 [amdgpu]
> drm_gem_handle_delete+0xe3/0x150
> drm_ioctl_kernel+0x1fc/0x3d0
> drm_ioctl+0x4c5/0xaa0
> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
> __x64_sys_ioctl+0x131/0x1a0
> do_syscall_64+0x60/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Last potentially related work creation:
> kasan_save_stack+0x33/0x60
> __kasan_record_aux_stack+0x97/0xb0
>
RE: [PATCH v3 2/2] drm/probe_helper: warning on poll_enabled for issue catching
> -----Original Message-----
> From: Jani Nikula
> Sent: Friday, March 10, 2023 8:05 PM
> To: Chen, Guchun ; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher,
> Alexander ; Zhang, Hawking
> ; dmitry.barysh...@linaro.org;
> spassw...@web.de; m...@fireburn.co.uk
> Cc: Chen, Guchun
> Subject: Re: [PATCH v3 2/2] drm/probe_helper: warning on poll_enabled for
> issue catching
>
> On Fri, 10 Mar 2023, Guchun Chen wrote:
> > In order to catch issues in other drivers to ensure proper call
> > sequence of polling function.
> >
> > v2: drop Fixes tag in commit message
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> > Reported-by: Bert Karwatzki
> > Suggested-by: Dmitry Baryshkov
> > Signed-off-by: Guchun Chen
> > ---
> >  drivers/gpu/drm/drm_probe_helper.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_probe_helper.c
> > b/drivers/gpu/drm/drm_probe_helper.c
> > index 8127be134c39..85e0e80d4a52 100644
> > --- a/drivers/gpu/drm/drm_probe_helper.c
> > +++ b/drivers/gpu/drm/drm_probe_helper.c
> > @@ -852,6 +852,8 @@ EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> >   */
> >  void drm_kms_helper_poll_disable(struct drm_device *dev)
> >  {
> > +	WARN_ON(!dev->mode_config.poll_enabled);
>
> Please address all previous review comments [1].

Sorry for missing your previous review email. I will address it in the next patch set.

Regards,
Guchun

> BR,
> Jani.
>
>
> [1] https://lore.kernel.org/r/87o7p3bde6@intel.com
>
> > +
> >  	if (dev->mode_config.poll_running)
> >  		drm_kms_helper_disable_hpd(dev);
>
> --
> Jani Nikula, Intel Open Source Graphics Center
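The idea behind the warning is to flag callers that disable polling without ever having initialised it, while still letting the call proceed. The following user-space sketch models that behaviour under stated assumptions: `sketch_warn_on()` merely counts and prints (the kernel's `WARN_ON` also dumps a backtrace), and `struct sketch_mode_config` and `sketch_poll_disable()` are hypothetical reductions of the real structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reduction of the polling state to two flags. */
struct sketch_mode_config {
	bool poll_enabled; /* set once polling has been initialised */
	bool poll_running; /* set while polling is active */
};

/* Number of warnings raised so far; lets callers observe misuse. */
static int sketch_warn_count;

/* Report a broken call sequence but keep going. */
static bool sketch_warn_on(bool cond, const char *what)
{
	if (cond) {
		sketch_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static void sketch_poll_disable(struct sketch_mode_config *cfg)
{
	assert(cfg);

	/* Catch drivers that disable polling they never initialised. */
	sketch_warn_on(!cfg->poll_enabled,
		       "poll disable called without poll init");

	if (cfg->poll_running)
		cfg->poll_running = false;
}
```

A driver model that never initialises polling triggers exactly one warning per disable call, which is what makes the wrong call sequence visible during testing.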
RE: [PATCH] drm/amdgpu: resove reboot exception for si oland
> -----Original Message-----
> From: amd-gfx On Behalf Of
> Zhenneng Li
> Sent: Friday, March 10, 2023 3:40 PM
> To: Deucher, Alexander
> Cc: David Airlie ; Pan, Xinhui ;
> linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Zhenneng Li
> ; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Koenig, Christian
> Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
>
> During reboot tests on an arm64 platform, the boot may fail.
>
> The error messages are as follows:
> [6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
> late_init of IP block failed -22
> [7.006919][ 7] [ T295] amdgpu :04:00.0: amdgpu_device_ip_late_init
> failed
> [7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..dee51c757ac0 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
>  	if (!adev->pm.dpm_enabled)
>  		return 0;
>
> -	ret = si_set_temperature_range(adev);
> -	if (ret)
> -		return ret;

si_set_temperature_range should be platform agnostic. Can you please elaborate on why it fails here?

Regards,
Guchun

>  #if 0 //TODO ?
>  	si_dpm_powergate_uvd(adev, true);
>  #endif
> --
> 2.25.1
RE: [PATCH 2/2] drm/probe_helper: warning on poll_enabled for issue catching
> -----Original Message-----
> From: Dmitry Baryshkov
> Sent: Thursday, March 9, 2023 4:49 PM
> To: Chen, Guchun ; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher,
> Alexander ; Zhang, Hawking
> ; spassw...@web.de; m...@fireburn.co.uk
> Subject: Re: [PATCH 2/2] drm/probe_helper: warning on poll_enabled for
> issue catching
>
> On 09/03/2023 07:48, Guchun Chen wrote:
> > In order to catch issues in other drivers to ensure proper call
> > sequence of polling function.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> > Fixes: a4e771729a51 ("drm/probe_helper: sort out poll_running vs
> > poll_enabled")
>
> Previously it was suggested that this is not a fix, so the Fixes header is
> incorrect.
>
> Also please use -vN when preparing/sending patchsets. This is v2.

Will fix these in v3.

Regards,
Guchun

> > Reported-by: Bert Karwatzki
> > Suggested-by: Dmitry Baryshkov
> > Signed-off-by: Guchun Chen
> > ---
> >  drivers/gpu/drm/drm_probe_helper.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_probe_helper.c
> > b/drivers/gpu/drm/drm_probe_helper.c
> > index 8127be134c39..85e0e80d4a52 100644
> > --- a/drivers/gpu/drm/drm_probe_helper.c
> > +++ b/drivers/gpu/drm/drm_probe_helper.c
> > @@ -852,6 +852,8 @@ EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> >   */
> >  void drm_kms_helper_poll_disable(struct drm_device *dev)
> >  {
> > +	WARN_ON(!dev->mode_config.poll_enabled);
> > +
> >  	if (dev->mode_config.poll_running)
> >  		drm_kms_helper_disable_hpd(dev);
> >
>
> --
> With best wishes
> Dmitry
RE: [PATCH 1/2] drm/amdgpu: add flag to enable/disable poll in suspend/resume path
Relying on dc_enabled will be simpler; thanks for your suggestion. I will send v2 to address this.

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Alex Deucher
Sent: Thursday, March 9, 2023 12:29 AM
To: Chen, Guchun
Cc: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; dmitry.barysh...@linaro.org; spassw...@web.de; Deucher, Alexander ; Zhang, Hawking
Subject: Re: [PATCH 1/2] drm/amdgpu: add flag to enable/disable poll in suspend/resume path

On Wed, Mar 8, 2023 at 7:17 AM Guchun Chen wrote:
>
> Some amd asics having reliable hotplug support don't call
> drm_kms_helper_poll_init in driver init sequence. However, due to the
> unified suspend/resume path for all asics, because the
> output_poll_work->func is not set for these asics, a warning arrives
> when suspending.
>
> [ 90.656049]
> [ 90.656050] ? console_unlock+0x4d/0x100
> [ 90.656053] ? __irq_work_queue_local+0x27/0x60
> [ 90.656056] ? irq_work_queue+0x2b/0x50
> [ 90.656057] ? __wake_up_klogd+0x40/0x60
> [ 90.656059] __cancel_work_timer+0xed/0x180
> [ 90.656061] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
> [ 90.656072] amdgpu_device_suspend+0x81/0x170 [amdgpu]
> [ 90.656180] amdgpu_pmops_runtime_suspend+0xb5/0x1b0 [amdgpu]
> [ 90.656269] pci_pm_runtime_suspend+0x61/0x1b0
>
> So add use_kms_poll flag as the initialization check in amdgpu code
> before calling drm_kms_helper_poll_disable/drm_kms_helper_poll_enable
> in suspend/resume path.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> Fixes: a4e771729a51 ("drm/probe_helper: sort out poll_running vs
> poll_enabled")
> Reported-by: Bert Karwatzki
> Suggested-by: Dmitry Baryshkov
> Signed-off-by: Guchun Chen
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h   | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c   | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c     | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c     | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c      | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c      | 1 +
>  7 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4a4e2fe6681..74af0b8c0d08 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4145,7 +4145,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>  	if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
>  		DRM_WARN("smart shift update failed\n");
>
> -	drm_kms_helper_poll_disable(dev);
> +	if (adev->mode_info.use_kms_poll)
> +		drm_kms_helper_poll_disable(dev);
>
>  	if (fbcon)
>  		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
> @@ -4243,7 +4244,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>  	if (fbcon)
>  		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);
>
> -	drm_kms_helper_poll_enable(dev);
> +	if (adev->mode_info.use_kms_poll)
> +		drm_kms_helper_poll_enable(dev);
>

Since polling is only enabled for analog outputs and DC doesn't support any analog outputs, I think we can simplify this to

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c4a4e2fe6681..74af0b8c0d08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4145,7 +4145,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 	if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
 		DRM_WARN("smart shift update failed\n");

-	drm_kms_helper_poll_disable(dev);
+	if (!adev->dc_enabled)
+		drm_kms_helper_poll_disable(dev);

 	if (fbcon)
 		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
@@ -4243,7 +4244,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
 	if (fbcon)
 		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);

-	drm_kms_helper_poll_enable(dev);
+	if (!adev->dc_enabled)
+		drm_kms_helper_poll_enable(dev);

 	amdgpu_ras_resume(adev);

Alternatively, we could also just move drm_kms_helper_poll_disable() into amdgpu_display_suspend_helper() and drm_kms_helper_poll_enable() into amdgpu_display_resume_helper(), but I'm not sure offhand whether the ordering here is important.

Alex

>  	amdgpu_ras_resume(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> index 32fe05c81
RE: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Reviewed-by: Guchun Chen

Regards,
Guchun

-----Original Message-----
From: Guilherme G. Piccoli
Sent: Thursday, February 2, 2023 12:48 AM
To: amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Deucher, Alexander ; Koenig, Christian ; Pan, Xinhui ; ker...@gpiccoli.net; kernel-...@igalia.com; Guilherme G. Piccoli ; Chen, Guchun ; Tuikov, Luben ; Limonciello, Mario
Subject: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully.

It so happens that we faced a driver probe failure on the Steam Deck recently, and the function drm_sched_fini() was called even though its counterpart had not been called previously, causing the following oops:

amdgpu: probe of :04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counterpart.

Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended up using sched.ops as per Christian's suggestion [0].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König
Cc: Guchun Chen
Cc: Luben Tuikov
Cc: Mario Limonciello
Signed-off-by: Guilherme G. Piccoli
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..3b962cb680a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 		if (!ring || !ring->fence_drv.initialized)
 			continue;

-		if (!ring->no_scheduler)
+		/*
+		 * Notice we check for sched.ops since there's some
+		 * override on the meaning of sched.ready by amdgpu.
+		 * The natural check would be sched.ready, which is
+		 * set as drm_sched_init() finishes...
+		 */
+		if (!ring->no_scheduler && ring->sched.ops)
 			drm_sched_fini(&ring->sched);

 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
--
2.39.0
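The guard in this patch uses a field that only the init routine sets as a marker for "init actually ran", and skips teardown when the marker is absent. A reduced user-space sketch of that pattern follows; `struct sketch_sched`, `struct sketch_ring` and the helper functions are hypothetical, and the model clears `ops` on fini purely so that a double teardown is also harmless, which is an assumption of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend operations table. */
struct sketch_sched_ops {
	int unused;
};

/* Hypothetical scheduler: 'ops' is non-NULL only after init. */
struct sketch_sched {
	const struct sketch_sched_ops *ops;
	int fini_calls;
};

struct sketch_ring {
	bool no_scheduler;
	struct sketch_sched sched;
};

static void sketch_sched_init(struct sketch_sched *s,
			      const struct sketch_sched_ops *ops)
{
	s->ops = ops;
}

/* Must only run after a successful init. */
static void sketch_sched_fini(struct sketch_sched *s)
{
	assert(s->ops);
	s->fini_calls++;
	s->ops = NULL;
}

/*
 * Teardown path: skip the scheduler when it was never set up, for
 * example because probing failed before init was reached.
 * Returns true if the scheduler was torn down by this call.
 */
static bool sketch_ring_sw_fini(struct sketch_ring *r)
{
	if (!r->no_scheduler && r->sched.ops) {
		sketch_sched_fini(&r->sched);
		return true;
	}
	return false;
}
```

With a zero-initialised ring, `sketch_ring_sw_fini()` returns false and never reaches the fini routine, which mirrors the probe-failure case described in the commit message.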
RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Hi Christian,

Do you think it makes sense that we set 'ring->sched.ready' to true in each ring init, even before drm_sched_init is executed in amdgpu_device_init_schedulers? After all, 'ready' is a member of the GPU scheduler structure.

Regards,
Guchun

-----Original Message-----
From: Koenig, Christian
Sent: Tuesday, January 31, 2023 6:59 PM
To: Chen, Guchun ; Alex Deucher ; Guilherme G. Piccoli
Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui ; dri-devel@lists.freedesktop.org; Tuikov, Luben ; Limonciello, Mario ; kernel-...@igalia.com; Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

On 31.01.23 at 10:17, Chen, Guchun wrote:
> Hi Piccoli,
>
> Please ignore my request for a full dmesg log. I can reproduce the issue and get
> the same failure callstack by returning early with an error code prior to
> amdgpu_device_init_schedulers.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Chen, Guchun
> Sent: Tuesday, January 31, 2023 2:37 PM
> To: Alex Deucher ; Guilherme G. Piccoli
> Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui
> ; dri-devel@lists.freedesktop.org; Tuikov, Luben
> ; Limonciello, Mario
> ; kernel-...@igalia.com; Deucher, Alexander
> ; Koenig, Christian
> Subject: RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching
> drm_sched init/fini
>
> Hi Piccoli,
>
> I agree with Alex's point: using ring->sched.name for such a check is not a
> good way. BTW, can you please attach a full dmesg log from the bad case to help me
> understand more?
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Alex Deucher
> Sent: Tuesday, January 31, 2023 6:30 AM
> To: Guilherme G. Piccoli
> Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun
> ; Pan, Xinhui ;
> dri-devel@lists.freedesktop.org; Tuikov, Luben ;
> Limonciello, Mario ; kernel-...@igalia.com;
> Deucher, Alexander ; Koenig, Christian
> Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching
> drm_sched init/fini
>
> On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli wrote:
>> + Luben
>>
>> (sorry, missed that in the first submission).
>>
>> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
>>> Currently amdgpu calls drm_sched_fini() from the fence driver sw
>>> fini routine - such function is expected to be called only after the
>>> respective init function - drm_sched_init() - was executed successfully.
>>>
>>> It so happens that we faced a driver probe failure on the Steam Deck
>>> recently, and the function drm_sched_fini() was called even though
>>> its counterpart had not been called previously, causing the following oops:
>>>
>>> amdgpu: probe of :04:00.0 failed with error -110
>>> BUG: kernel NULL pointer dereference, address: 0090
>>> PGD 0 P4D 0
>>> Oops: 0002 [#1] PREEMPT SMP NOPTI
>>> CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
>>> Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
>>> RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
>>> [...]
>>> Call Trace:
>>>
>>> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
>>> amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
>>> amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
>>> devm_drm_dev_init_release+0x49/0x70
>>> [...]
>>>
>>> To prevent that, check if the drm_sched was properly initialized for
>>> a given ring before calling its fini counterpart.
>>>
>>> Notice ideally we'd use sched.ready for that; such field is set as
>>> the latest thing on drm_sched_init(). But amdgpu seems to "override"
>>> the meaning of such field - in the above oops for example, it was a
>>> GFX ring causing the crash, and the sched.ready field was set to
>>> true in the ring init routine, regardless of the state of the DRM
>>> scheduler. Hence, we ended up using another sched field.
>>>
>>> Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence
>>> driver fini in s3 test (v2)")
>>> Cc: Andrey Grodzovsky
>>> Cc: Guchun Chen
>>> Cc: Mario Limonciello
>>> Signed-off-by: Guilherme G. Piccoli
>>> ---
>>>
>>>
>>> Hi folks, first of all thanks in advance for reviews / comments!
>>> Notice that I've used the Fixes tag more in the sense to bring it to
>>> stable, I didn't find a good patch candidate that added the call to
>>> drm_sched_fini(), was reaching way too old commits...
RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Hi Piccoli, Please ignore my request of full dmesg log. I can reproduce the issue and get the same failure callstack by returning early with an error code prior to amdgpu_device_init_schedulers. Regards, Guchun -Original Message- From: Chen, Guchun Sent: Tuesday, January 31, 2023 2:37 PM To: Alex Deucher ; Guilherme G. Piccoli Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui ; dri-devel@lists.freedesktop.org; Tuikov, Luben ; Limonciello, Mario ; kernel-...@igalia.com; Deucher, Alexander ; Koenig, Christian Subject: RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini Hi Piccoli, I agree with Alex's point, using ring->sched.name for such check is not a good way. BTW, can you please attach a full dmesg long in bad case to help me understand more? Regards, Guchun -Original Message- From: Alex Deucher Sent: Tuesday, January 31, 2023 6:30 AM To: Guilherme G. Piccoli Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun ; Pan, Xinhui ; dri-devel@lists.freedesktop.org; Tuikov, Luben ; Limonciello, Mario ; kernel-...@igalia.com; Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli wrote: > > + Luben > > (sorry, missed that in the first submission). > > On 30/01/2023 18:45, Guilherme G. Piccoli wrote: > > Currently amdgpu calls drm_sched_fini() from the fence driver sw > > fini routine - such function is expected to be called only after the > > respective init function - drm_sched_init() - was executed successfully. 
> > > > Happens that we faced a driver probe failure in the Steam Deck > > recently, and the function drm_sched_fini() was called even without > > its counter-part had been previously called, causing the following oops: > > > > amdgpu: probe of :04:00.0 failed with error -110 > > BUG: kernel NULL pointer dereference, address: 0090 PGD > > 0 P4D 0 > > Oops: 0002 [#1] PREEMPT SMP NOPTI > > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli > > #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 > > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: > > > > amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] > > amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] > > amdgpu_driver_release_kms+0x16/0x30 [amdgpu] > > devm_drm_dev_init_release+0x49/0x70 > > [...] > > > > To prevent that, check if the drm_sched was properly initialized for > > a given ring before calling its fini counter-part. > > > > Notice ideally we'd use sched.ready for that; such field is set as > > the latest thing on drm_sched_init(). But amdgpu seems to "override" > > the meaning of such field - in the above oops for example, it was a > > GFX ring causing the crash, and the sched.ready field was set to > > true in the ring init routine, regardless of the state of the DRM > > scheduler. Hence, we ended-up using another sched field. > >> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence > >> > driver fini in s3 test (v2)") > > Cc: Andrey Grodzovsky > > Cc: Guchun Chen > > Cc: Mario Limonciello > > Signed-off-by: Guilherme G. Piccoli > > --- > > > > > > Hi folks, first of all thanks in advance for reviews / comments! > > Notice that I've used the Fixes tag more in the sense to bring it to > > stable, I didn't find a good patch candidate that added the call to > > drm_sched_fini(), was reaching way too old commits...so > > 067f44c8b459 seems a good candidate - or maybe not? 
> > > > Now, with regards sched.ready, spent a bit of time to figure what > > was happening...would be feasible maybe to stop using that to mark > > some kind ring status? I think it should be possible to add a flag > > to the ring structure for that, and free sched.ready from being > > manipulate by the amdgpu driver, what's your thoughts on that? It's been a while, but IIRC, we used to have a ring->ready field in the driver which at some point got migrated out of the driver into the GPU scheduler and the driver side code never got cleaned up. I think we should probably just drop the driver messing with that field and leave it up to the drm scheduler. Alex > > > > I could try myself, but first of course I'd like to raise the > > "temperature" on this topic and check if somebody is already working > > on that. > > > > Cheers, > > > > Guilherme > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Hi Piccoli, I agree with Alex's point, using ring->sched.name for such check is not a good way. BTW, can you please attach a full dmesg long in bad case to help me understand more? Regards, Guchun -Original Message- From: Alex Deucher Sent: Tuesday, January 31, 2023 6:30 AM To: Guilherme G. Piccoli Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun ; Pan, Xinhui ; dri-devel@lists.freedesktop.org; Tuikov, Luben ; Limonciello, Mario ; kernel-...@igalia.com; Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli wrote: > > + Luben > > (sorry, missed that in the first submission). > > On 30/01/2023 18:45, Guilherme G. Piccoli wrote: > > Currently amdgpu calls drm_sched_fini() from the fence driver sw > > fini routine - such function is expected to be called only after the > > respective init function - drm_sched_init() - was executed successfully. > > > > Happens that we faced a driver probe failure in the Steam Deck > > recently, and the function drm_sched_fini() was called even without > > its counter-part had been previously called, causing the following oops: > > > > amdgpu: probe of :04:00.0 failed with error -110 > > BUG: kernel NULL pointer dereference, address: 0090 PGD > > 0 P4D 0 > > Oops: 0002 [#1] PREEMPT SMP NOPTI > > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli > > #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 > > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: > > > > amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] > > amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] > > amdgpu_driver_release_kms+0x16/0x30 [amdgpu] > > devm_drm_dev_init_release+0x49/0x70 > > [...] > > > > To prevent that, check if the drm_sched was properly initialized for > > a given ring before calling its fini counter-part. 
> > > > Notice ideally we'd use sched.ready for that; such field is set as > > the latest thing on drm_sched_init(). But amdgpu seems to "override" > > the meaning of such field - in the above oops for example, it was a > > GFX ring causing the crash, and the sched.ready field was set to > > true in the ring init routine, regardless of the state of the DRM > > scheduler. Hence, we ended-up using another sched field. > >> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence > >> > driver fini in s3 test (v2)") > > Cc: Andrey Grodzovsky > > Cc: Guchun Chen > > Cc: Mario Limonciello > > Signed-off-by: Guilherme G. Piccoli > > --- > > > > > > Hi folks, first of all thanks in advance for reviews / comments! > > Notice that I've used the Fixes tag more in the sense to bring it to > > stable, I didn't find a good patch candidate that added the call to > > drm_sched_fini(), was reaching way too old commits...so > > 067f44c8b459 seems a good candidate - or maybe not? > > > > Now, with regards sched.ready, spent a bit of time to figure what > > was happening...would be feasible maybe to stop using that to mark > > some kind ring status? I think it should be possible to add a flag > > to the ring structure for that, and free sched.ready from being > > manipulate by the amdgpu driver, what's your thoughts on that? It's been a while, but IIRC, we used to have a ring->ready field in the driver which at some point got migrated out of the driver into the GPU scheduler and the driver side code never got cleaned up. I think we should probably just drop the driver messing with that field and leave it up to the drm scheduler. Alex > > > > I could try myself, but first of course I'd like to raise the > > "temperature" on this topic and check if somebody is already working > > on that. 
> > > > Cheers, > > > > Guilherme > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > index 00444203220d..e154eb8241fb 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device > > *adev) > > if (!ring || !ring->fence_drv.initialized) > > continue; > > > > - if (!ring->no_scheduler) > > + /* > > + * Notice we check for sched.name since there's some > > + * override on the meaning of sched.ready by amdgpu. > > + * The natural check would be sched.ready, which is > > + * set as drm_sched_init() finishes... > > + */ > > + if (!ring->no_scheduler && ring->sched.name) > > drm_sched_fini(>sched); > > > > for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
RE: [pull] amdgpu, amdkfd drm-fixes-6.1
Hello Alex, Regarding below patch, I guess we need to pick "8eb402f16d5b drm/amdgpu: Fix uninitialized warning in mmhub_v2_0_get_clockgating()" together, otherwise, build will possibly fail. Is it true? " Lijo Lazar (1): drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x" Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, October 27, 2022 10:41 AM To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; airl...@gmail.com; daniel.vet...@ffwll.ch Cc: Deucher, Alexander Subject: [pull] amdgpu, amdkfd drm-fixes-6.1 Hi Dave, Daniel, Fixes for 6.1. Fixes for new IPs and misc other fixes. The following changes since commit cbc543c59e8e7c8bc8604d6ac3e18a029e3d5118: Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (2022-10-21 09:56:14 +1000) are available in the Git repository at: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fagd5f%2Flinux.gitdata=05%7C01%7Cguchun.chen%40amd.com%7C6bbe7e42eb3d43bf622208dab7c4c906%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638024353059986195%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=Y%2BU1OrPyhCaS44nGQMTrtqBpdkcJwFdFJEAaqWGiaqo%3Dreserved=0 tags/amd-drm-fixes-6.1-2022-10-26-1 for you to fetch changes up to d61e1d1d5225a9baeb995bcbdb904f66f70ed87e: drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume (2022-10-26 17:48:43 -0400) amd-drm-fixes-6.1-2022-10-26-1: amdgpu: - Stable pstate fix - SMU 13.x updates - SR-IOV fixes - PCI AER fix - GC 11.x fixes - Display fixes - Expose IMU firmware version for debugging - Plane modifier fix - S0i3 fix amdkfd: - Fix possible memory leak - Fix GC 10.x cache info reporting UAPI: - Expose IMU firmware version via existing INFO firmware query Alvin Lee (1): drm/amd/display: Don't return false if no stream Chengming Gui (1): drm/amdgpu: fix pstate setting issue David Francis (1): 
drm/amd: Add IMU fw version to fw version queries Jesse Zhang (1): drm/amdkfd: correct the cache info for gfx1036 Joaquín Ignacio Aramendía (1): drm/amd/display: Revert logic for plane modifiers Kenneth Feng (2): drm/amd/pm: update driver-if header for smu_v13_0_10 drm/amd/pm: allow gfxoff on gc_11_0_3 Lijo Lazar (1): drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x Prike Liang (2): drm/amdkfd: update gfx1037 Lx cache setting drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume Rafael Mendonca (1): drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr() Rodrigo Siqueira (1): drm/amd/display: Remove wrong pipe control lock Yiqing Yao (1): drm/amdgpu: Adjust MES polling timeout for sriov YuBiao Wang (1): drm/amdgpu: skip mes self test for gc 11.0.3 in recover drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 13 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 9 +- drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c| 28 ++ drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 106 +++- .../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c| 50 ++ drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 12 +-- .../amd/display/dc/dcn32/dcn32_resource_helpers.c | 2 +- .../pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_0.h | 111 +++-- drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 7 +- include/uapi/drm/amdgpu_drm.h | 2 + 18 files changed, 259 insertions(+), 119 deletions(-)
RE: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap
Perhaps you need to update the prefix of the patch subject to 'drm/amd/pm: check return value ...'.

With the above addressed, it's:
Acked-by: Guchun Chen

Regards,
Guchun

-----Original Message-----
From: Li Zhong
Sent: Thursday, September 22, 2022 9:27 AM
To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org
Cc: jiapeng.ch...@linux.alibaba.com; Powell, Darren; Chen, Guchun; Limonciello, Mario; Quan, Evan; Lazar, Lijo; dan...@ffwll.ch; airl...@linux.ie; Pan, Xinhui; Koenig, Christian; Deucher, Alexander; Li Zhong
Subject: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap

amdgpu_bo_kmap() returns an error when it fails to map the buffer object. Add the error check and propagate the error.

Signed-off-by: Li Zhong
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
 {
 	struct pp_hwmgr *hwmgr = handle;
 	struct amdgpu_device *adev = hwmgr->adev;
+	int err;
 
 	if (!addr || !size)
 		return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
 	*addr = NULL;
 	*size = 0;
 	if (adev->pm.smu_prv_buffer) {
-		amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+		err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+		if (err)
+			return err;
 		*size = adev->pm.smu_prv_buffer_size;
 	}
 
-- 
2.25.1
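The change above follows the usual "check and propagate" pattern. Below is a minimal userspace sketch of that pattern; `mock_bo`, `mock_bo_kmap()` and `mock_get_buffer_details()` are hypothetical names invented for illustration and are not the amdgpu API. The point is that the size is written only after the mapping call has succeeded, so a caller never sees a non-zero size paired with a NULL address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer object; stands in for the real BO type. */
struct mock_bo {
    void *cpu_ptr;  /* NULL simulates a mapping failure */
    size_t size;
};

/* Hypothetical mapper: returns 0 on success, -ENOMEM on failure. */
static int mock_bo_kmap(struct mock_bo *bo, void **addr)
{
    if (!bo->cpu_ptr)
        return -ENOMEM;
    *addr = bo->cpu_ptr;
    return 0;
}

/* Reports address and size of the buffer, propagating mapping errors. */
static int mock_get_buffer_details(struct mock_bo *bo, void **addr, size_t *size)
{
    int err;

    if (!addr || !size)
        return -EINVAL;

    *addr = NULL;
    *size = 0;
    if (bo) {
        err = mock_bo_kmap(bo, addr);
        if (err)
            return err;     /* size stays 0, addr stays NULL */
        *size = bo->size;
    }
    return 0;
}
```

Without the check, the failing case would return success with `*addr == NULL` and a non-zero `*size`, which is the inconsistent state the patch is meant to rule out.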
RE: [PATCH] drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl
Hi Alex,

I think we need to revert this patch on the amd-staging-drm-next branch, as its base commit "drm/amdgpu: remove GTT accounting v2" is not present on 5.16. Instead, that series is part of the upcoming 5.18-based amd-staging-drm-next branch. Otherwise, the incorrect GTT size reporting (switched from pages to bytes) will crash several Vulkan apps.

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Alex Deucher
Sent: Saturday, June 11, 2022 12:01 AM
To: Michel Dänzer
Cc: Deucher, Alexander; Pan, Xinhui; amd-gfx list; Koenig, Christian; Maling list - DRI developers
Subject: Re: [PATCH] drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl

Applied. Thanks!

Alex

On Fri, Jun 10, 2022 at 10:01 AM Michel Dänzer wrote:
>
> From: Michel Dänzer
>
> The commit below changed the TTM manager size unit from pages to
> bytes, but failed to adjust the corresponding calculations in
> amdgpu_ioctl.
>
> Fixes: dfa714b88eb0 ("drm/amdgpu: remove GTT accounting v2")
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1930
> Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6642
> Signed-off-by: Michel Dänzer
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 801f6fa692e9..6de63ea6687e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -642,7 +642,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>  			    atomic64_read(&adev->visible_pin_size),
>  			    vram_gtt.vram_size);
>  		vram_gtt.gtt_size = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT)->size;
> -		vram_gtt.gtt_size *= PAGE_SIZE;
>  		vram_gtt.gtt_size -= atomic64_read(&adev->gart_pin_size);
>  		return copy_to_user(out, &vram_gtt,
>  				    min((size_t)size, sizeof(vram_gtt))) ? -EFAULT : 0;
> @@ -675,7 +674,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>  			mem.cpu_accessible_vram.usable_heap_size * 3 / 4;
>
>  		mem.gtt.total_heap_size = gtt_man->size;
> -		mem.gtt.total_heap_size *= PAGE_SIZE;
>  		mem.gtt.usable_heap_size = mem.gtt.total_heap_size -
>  			atomic64_read(&adev->gart_pin_size);
>  		mem.gtt.heap_usage = ttm_resource_manager_usage(gtt_man);
> --
> 2.36.1
>
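The unit mismatch discussed in this thread is easy to see with plain arithmetic. The sketch below is illustrative only: `MOCK_PAGE_SIZE` is an assumed 4 KiB page size and the three helper names are made up for this example. It contrasts the conversion that is correct when the manager stores pages, the pass-through that is correct when it stores bytes, and the stale multiplication applied to a byte count.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096ULL /* assumed page size for the illustration */

/* Correct when the manager size is stored in pages (before the unit change). */
static uint64_t gtt_bytes_from_pages(uint64_t size_in_pages)
{
    return size_in_pages * MOCK_PAGE_SIZE;
}

/* Correct when the manager size is already stored in bytes. */
static uint64_t gtt_bytes_from_bytes(uint64_t size_in_bytes)
{
    return size_in_bytes;
}

/* A stale "*= PAGE_SIZE" applied to a size that is already in bytes. */
static uint64_t gtt_bytes_double_scaled(uint64_t size_in_bytes)
{
    return size_in_bytes * MOCK_PAGE_SIZE;
}
```

With these helpers, a 4 MiB GTT stored as 1024 pages converts to 4194304 bytes, while double scaling the byte count reports 16 GiB. The reverse mistake is the one raised in the reply: dropping the multiplication on a tree where the manager still stores pages would report 1024 bytes instead of 4 MiB.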
RE: [PATCH -next 1/2 v2] drm/amdgpu: remove unneeded semicolon
Series is:
Reviewed-by: Guchun Chen

Regards,
Guchun

-----Original Message-----
From: Yang Li
Sent: Thursday, January 13, 2022 3:12 PM
To: airl...@linux.ie; Chen, Guchun
Cc: dan...@ffwll.ch; Deucher, Alexander; Koenig, Christian; Pan, Xinhui; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; Yang Li; Abaci Robot
Subject: [PATCH -next 1/2 v2] drm/amdgpu: remove unneeded semicolon

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d4d9b9ea8bbd..ff9bd5a844fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2722,7 +2722,7 @@ struct amdgpu_ras* amdgpu_ras_get_context(struct amdgpu_device *adev)
 int amdgpu_ras_set_context(struct amdgpu_device *adev, struct amdgpu_ras* ras_con)
 {
 	if (!adev)
-	return -EINVAL;;
+		return -EINVAL;
 
 	adev->psp.ras_context.ras = ras_con;
 	return 0;
-- 
2.20.1.7.g153144c
RE: [PATCH -next 1/2] drm/amdgpu: remove unneeded semicolon
Thanks for your patch, Yang.

Can you please also fix the original indentation problem?

 	if (!adev)
-	return -EINVAL;;
+	return -EINVAL;

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Yang Li
Sent: Thursday, January 13, 2022 9:22 AM
To: airl...@linux.ie
Cc: Pan, Xinhui; Abaci Robot; linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Yang Li; amd-...@lists.freedesktop.org; dan...@ffwll.ch; Deucher, Alexander; Koenig, Christian
Subject: [PATCH -next 1/2] drm/amdgpu: remove unneeded semicolon

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d4d9b9ea8bbd..7d9d99e581da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2722,7 +2722,7 @@ struct amdgpu_ras* amdgpu_ras_get_context(struct amdgpu_device *adev)
 int amdgpu_ras_set_context(struct amdgpu_device *adev, struct amdgpu_ras* ras_con)
 {
 	if (!adev)
-	return -EINVAL;;
+	return -EINVAL;
 
 	adev->psp.ras_context.ras = ras_con;
 	return 0;
-- 
2.20.1.7.g153144c
RE: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
[Public]

Hi Christian,

My bad, I just checked the discussion history of this. So if I read it correctly, the check at a different place that skips the eviction is "drm/ttm: Double check mem_type of BO while eviction"? It is in the 5.16 kernel.

Regards,
Guchun

-----Original Message-----
From: Christian König
Sent: Tuesday, January 11, 2022 7:27 PM
To: Chen, Guchun; Pan, Xinhui; Koenig, Christian; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

IIRC we have completely dropped this patch in favor of a check at a different place.

Regards,
Christian.

On 11.01.22 at 09:47, Chen, Guchun wrote:
> [Public]
>
> Hi Christian,
>
> It looks like this patch is still missing from the 5.16 kernel. Is that intentional?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_bo.c?h=v5.16
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx On Behalf Of Pan, Xinhui
> Sent: Tuesday, November 9, 2021 9:16 PM
> To: Koenig, Christian; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> [AMD Official Use Only]
>
> Actually this patch does not totally fix the mismatch of the lru list with
> mem_type, as mem_type is changed in ->move() and the lru list is changed after
> that.
>
> During this small period, another eviction could still happen and evict this
> mismatched BO from sMem (say, its lru list is on the vram domain) to sMem.
>
> From: Pan, Xinhui
> Sent: November 9, 2021 21:05
> To: Koenig, Christian; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Yes, a stable tag is needed. The Vulkan guys say 5.14 hits this issue too.
>
> I think that amdgpu_bo_move() does support copy from sysMem to sysMem
> correctly. Maybe something like the below is needed.
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c83ef42ca702..aa63ae7ddf1e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -485,7 +485,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
>  	}
>  	if (old_mem->mem_type == TTM_PL_SYSTEM &&
>  	    (new_mem->mem_type == TTM_PL_TT ||
> -	     new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
> +	     new_mem->mem_type == AMDGPU_PL_PREEMPT ||
> +	     new_mem->mem_type == TTM_PL_SYSTEM)) {
>  		ttm_bo_move_null(bo, new_mem);
>  		goto out;
>  	}
>
> Otherwise, amdgpu_move_blit() is called to do the system memory copy, which
> uses a wrong address.
>
> 206 	/* Map only what can't be accessed directly */
> 207 	if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
> 208 		*addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
> 209 			mm_cur->start;
> 210 		return 0;
> 211 	}
>
> At line 208, *addr is zero. So when amdgpu_copy_buffer submits a job with such
> an addr, a page fault happens.
>
> From: Koenig, Christian
> Sent: November 9, 2021 20:35
> To: Pan, Xinhui; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Mhm, I'm not sure what the rationale behind that is.
>
> Not moving the BO would make things less efficient, but should never cause a
> crash.
>
> Maybe we should add a CC: stable tag and push it to -fixes instead?
>
> Christian.
>
> On 09.11.21 at 13:28, Pan, Xinhui wrote:
>> [AMD Official Use Only]
>>
>> I hit a vulkan cts test hang with navi23.
>>
>> dmesg says gmc page fault with address 0x0, 0x1000, 0x2000
>> And some debug log also says amdgpu copies one BO from the system domain to
>> the system domain, which is really weird.
>>
>> From: Koenig, Christian
>> Sent: November 9, 2021 20:20
>> To: Pan
RE: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
[Public]

Hi Christian,

It looks like this patch is still missing from the 5.16 kernel. Is that intentional?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_bo.c?h=v5.16

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Pan, Xinhui
Sent: Tuesday, November 9, 2021 9:16 PM
To: Koenig, Christian; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

[AMD Official Use Only]

Actually this patch does not totally fix the mismatch of the lru list with mem_type, as mem_type is changed in ->move() and the lru list is changed after that.

During this small period, another eviction could still happen and evict this mismatched BO from sMem (say, its lru list is on the vram domain) to sMem.

From: Pan, Xinhui
Sent: November 9, 2021 21:05
To: Koenig, Christian; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

Yes, a stable tag is needed. The Vulkan guys say 5.14 hits this issue too.

I think that amdgpu_bo_move() does support copy from sysMem to sysMem correctly. Maybe something like the below is needed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c83ef42ca702..aa63ae7ddf1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -485,7 +485,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 	}
 	if (old_mem->mem_type == TTM_PL_SYSTEM &&
 	    (new_mem->mem_type == TTM_PL_TT ||
-	     new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
+	     new_mem->mem_type == AMDGPU_PL_PREEMPT ||
+	     new_mem->mem_type == TTM_PL_SYSTEM)) {
 		ttm_bo_move_null(bo, new_mem);
 		goto out;
 	}

Otherwise, amdgpu_move_blit() is called to do the system memory copy, which uses a wrong address.

206 	/* Map only what can't be accessed directly */
207 	if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
208 		*addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
209 			mm_cur->start;
210 		return 0;
211 	}

At line 208, *addr is zero. So when amdgpu_copy_buffer submits a job with such an addr, a page fault happens.

From: Koenig, Christian
Sent: November 9, 2021 20:35
To: Pan, Xinhui; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

Mhm, I'm not sure what the rationale behind that is.

Not moving the BO would make things less efficient, but should never cause a crash.

Maybe we should add a CC: stable tag and push it to -fixes instead?

Christian.

On 09.11.21 at 13:28, Pan, Xinhui wrote:
> [AMD Official Use Only]
>
> I hit a vulkan cts test hang with navi23.
>
> dmesg says gmc page fault with address 0x0, 0x1000, 0x2000
> And some debug log also says amdgpu copies one BO from the system domain to
> the system domain, which is really weird.
>
> From: Koenig, Christian
> Sent: November 9, 2021 20:20
> To: Pan, Xinhui; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> On 09.11.21 at 12:19, xinhui pan wrote:
>> After we move BO to a new memory region, we should put it to the new
>> memory manager's lru list regardless we unlock the resv or not.
>>
>> Signed-off-by: xinhui pan
> Interesting find, did you trigger that somehow or did you just
> stumble over it by reading the code?
>
> Patch is Reviewed-by: Christian König, I
> will pick that up for drm-misc-next.
>
> Thanks,
> Christian.
>
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index f1367107925b..e307004f0b28 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -701,6 +701,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>  	ret = ttm_bo_evict(bo, ctx);
>>  	if (locked)
>>  		ttm_bo_unreserve(bo);
>> +	else
>> +		ttm_bo_move_to_lru_tail_unlocked(bo);
>>
>>  	ttm_bo_put(bo);
>>  	return ret;
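The condition proposed for amdgpu_bo_move() in this thread decides when a move needs no data copy at all. The following sketch restates that decision as a standalone predicate. The enum values and the function name are hypothetical stand-ins for the TTM and amdgpu placement constants, and this is not how the driver is structured; it only mirrors the proposed condition, which per the later reply was not what ended up in 5.16.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placement types standing in for TTM_PL_* / AMDGPU_PL_*. */
enum mock_mem_type {
    MOCK_PL_SYSTEM,
    MOCK_PL_TT,
    MOCK_PL_VRAM,
    MOCK_PL_PREEMPT
};

/*
 * True when the move can be completed by bookkeeping alone (a "null move"),
 * following the condition from the proposed diff: the source is system
 * memory and the destination is TT, PREEMPT, or system memory again.
 */
static bool mock_move_is_null(enum mock_mem_type old_type,
                              enum mock_mem_type new_type)
{
    if (old_type != MOCK_PL_SYSTEM)
        return false;

    return new_type == MOCK_PL_TT ||
           new_type == MOCK_PL_PREEMPT ||
           new_type == MOCK_PL_SYSTEM;
}
```

The system-to-system case is the one the thread reports as reaching the blit path with a zero address; with the extra clause it is classified as a null move, so no copy job would be submitted.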
RE: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)
[Public]

Thanks for your suggestion, Robin. Do you agree with this as well, Christian and Xinhui?

Regards,
Guchun

-----Original Message-----
From: Robin Murphy
Sent: Saturday, September 11, 2021 2:25 AM
To: Chen, Guchun; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig, Christian; Pan, Xinhui; Deucher, Alexander
Cc: Shi, Leslie
Subject: Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

On 2021-09-10 11:09, Guchun Chen wrote:
> Vendor will define their own memory types on top of TTM_PL_PRIV, but
> call ttm_set_driver_manager directly without checking mem_type value
> when setting up memory manager. So add such check to aware the case
> when array bounds.
>
> v2: lower check level to WARN_ON
>
> Signed-off-by: Leslie Shi
> Signed-off-by: Guchun Chen
> ---
>  include/drm/ttm/ttm_device.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 07d722950d5b..aa79953c807c 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
>  static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
>  					  struct ttm_resource_manager *manager)
>  {
> +	WARN_ON(type >= TTM_NUM_MEM_TYPES);

Nit: I know nothing about this code, but from the context alone it would seem sensible to do

	if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
		return;

to avoid making the subsequent assignment when we *know* it's invalid and likely to corrupt memory.

Robin.

>  	bdev->man_drv[type] = manager;
>  }
>
>
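Robin's suggestion (warn, then return before the out-of-bounds store) can be sketched in userspace as follows. Everything here is an assumption made for illustration: `mock_warn_on()` is a hypothetical stand-in for the kernel's WARN_ON() that logs and returns the condition, the array size is arbitrary, the extra `type < 0` check is an addition not present in the patch, and the setter returns a bool only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MOCK_NUM_MEM_TYPES 8 /* arbitrary size for the illustration */

struct mock_manager { int id; };
struct mock_device { struct mock_manager *man_drv[MOCK_NUM_MEM_TYPES]; };

/* Hypothetical stand-in for WARN_ON(): logs and returns the condition. */
static bool mock_warn_on(bool cond, const char *what)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", what);
    return cond;
}

/* Stores the manager only when the index is inside the array. */
static bool mock_set_driver_manager(struct mock_device *dev, int type,
                                    struct mock_manager *manager)
{
    if (mock_warn_on(type < 0 || type >= MOCK_NUM_MEM_TYPES,
                     "memory type out of range"))
        return false;   /* skip the store instead of corrupting memory */

    dev->man_drv[type] = manager;
    return true;
}
```

A bare warning followed by the assignment would still write past the end of `man_drv`; returning early is what turns the check from a diagnostic into protection.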
RE: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds
[Public]

Hi Christian and Xinhui,

Thanks for your suggestion. The reason is that I saw data corruption in several proprietary use cases. Would BUILD_BUG_ON behave differently depending on the gcc version? Anyway, WARN_ON is fine with me, and I will send a new patch set soon to address this.

Regards,
Guchun

From: Koenig, Christian
Sent: Friday, September 10, 2021 2:37 PM
To: Pan, Xinhui; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher, Alexander; Chen, Guchun
Cc: Shi, Leslie
Subject: Re: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

Yeah, that's a good point.

If build_bug_on() doesn't work for some reason then we at least need to lower this to a WARN_ON.

A BUG_ON() is only justified if we prevent strong data corruption with it or note a NULL pointer earlier on or similar.

Regards,
Christian.

On 10.09.21 at 06:36, Pan, Xinhui wrote:

[AMD Official Use Only]

Looks good to me. But maybe build_bug_on works too and is a more reasonable way to detect such wrong usage.

From: Chen, Guchun
Sent: Friday, September 10, 2021 12:30:14 PM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig, Christian; Pan, Xinhui; Deucher, Alexander
Cc: Chen, Guchun; Shi, Leslie
Subject: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

Vendor will define their own memory types on top of TTM_PL_PRIV, but call ttm_set_driver_manager directly without checking mem_type value when setting up memory manager. So add such check to aware the case when array bounds.

Signed-off-by: Leslie Shi
Signed-off-by: Guchun Chen
---
 include/drm/ttm/ttm_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 7a0f561c57ee..24ad76ca8022 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -308,6 +308,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
 static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
 					  struct ttm_resource_manager *manager)
 {
+	BUG_ON(type >= TTM_NUM_MEM_TYPES);
 	bdev->man_drv[type] = manager;
 }
--
2.17.1
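The build-time alternative raised in this thread only helps when the index is a constant the compiler can see. As a rough sketch of that idea outside the kernel, a driver that defines its private memory types as enum constants could pin them against the array size with a C11 `_Static_assert`. Everything here is an assumption made for illustration: the `DRV_PL_*` names are hypothetical, and the numeric values of `PL_PRIV` and `NUM_MEM_TYPES` are placeholders rather than quotes from the TTM headers.

```c
/* Illustrative stand-ins; the real values live in the TTM headers. */
#define PL_PRIV       3 /* first driver-private memory type */
#define NUM_MEM_TYPES 8 /* size of the per-device manager array */

/* Hypothetical driver-private memory types layered on top of PL_PRIV. */
enum drv_mem_type {
	DRV_PL_A = PL_PRIV,
	DRV_PL_B,
	DRV_PL_C,
	DRV_PL_LAST = DRV_PL_C,
};

/* Fails the build, rather than the running system, if a driver adds
 * more private types than the manager array can hold. */
_Static_assert(DRV_PL_LAST < NUM_MEM_TYPES,
	       "driver-private memory types exceed the manager array");

/* Small accessor so the constant can be inspected at run time. */
static int last_private_type(void)
{
	return DRV_PL_LAST;
}
```

A check like this cannot cover a `type` value that is only known at run time, which is presumably why the thread settles on a runtime WARN_ON inside the setter.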
RE: [PATCH] drm/display: fix possible null-pointer dereference in dcn10_set_clock()
[Public]

Thanks for your patch.

I suggest moving the check of the function pointer dc->clk_mgr->funcs->get_clock earlier and returning early if it is NULL, since it is meaningless to continue the clock setting in that case.

	if (!dc->clk_mgr || !dc->clk_mgr->funcs->get_clock)
		return DC_FAIL_UNSUPPORTED_1;

	dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
			context, clock_type, &clock_cfg);

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Tuo Li
Sent: Tuesday, August 10, 2021 5:20 PM
To: Wentland, Harry; Li, Sun peng (Leo); Deucher, Alexander; Koenig, Christian; Pan, Xinhui; airl...@linux.ie; dan...@ffwll.ch; Cyr, Aric; Lei, Jun; Zhuo, Qingqing; Siqueira, Rodrigo; Lee, Alvin; Stempen, Vladimir; isabel.zh...@amd.com; Lee, Sung; Po-Yu Hsieh Paul; Wood, Wyatt
Cc: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; baijiaju1...@gmail.com; Tuo Li; TOTE Robot
Subject: [PATCH] drm/display: fix possible null-pointer dereference in dcn10_set_clock()

The variable dc->clk_mgr is checked in:

	if (dc->clk_mgr && dc->clk_mgr->funcs->get_clock)

This indicates dc->clk_mgr can be NULL. However, it is dereferenced in:

	if (!dc->clk_mgr->funcs->get_clock)

To fix this possible null-pointer dereference, check dc->clk_mgr before dereferencing it.

Reported-by: TOTE Robot
Signed-off-by: Tuo Li
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index c545eddabdcc..3a7c7c7efa68 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -3635,7 +3635,7 @@ enum dc_status dcn10_set_clock(struct dc *dc,
 		dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
 				context, clock_type, &clock_cfg);
 
-	if (!dc->clk_mgr->funcs->get_clock)
+	if (dc->clk_mgr && !dc->clk_mgr->funcs->get_clock)
 		return DC_FAIL_UNSUPPORTED_1;
 
 	if (clk_khz > clock_cfg.max_clock_khz)
--
2.25.1
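The ordering Guchun suggests (validate the whole pointer chain once, return early, then call through it unconditionally) can be sketched as follows. All type and function names below are hypothetical stand-ins rather than the display core's definitions, the extra `funcs` NULL check is an addition of this sketch, and what happens after the query (rejecting a request above the reported maximum) is only an assumed placeholder.

```c
#include <stddef.h>

enum status_sketch {
	STATUS_OK,
	STATUS_FAIL_UNSUPPORTED,
	STATUS_FAIL_EXCEED_MAX, /* hypothetical status for this sketch */
};

struct clk_mgr_sketch;

struct clk_funcs_sketch {
	void (*get_clock)(struct clk_mgr_sketch *mgr, int *max_khz);
};

struct clk_mgr_sketch { const struct clk_funcs_sketch *funcs; };
struct dc_sketch { struct clk_mgr_sketch *clk_mgr; };

/* Test double: reports a fixed maximum clock. */
static void fake_get_clock(struct clk_mgr_sketch *mgr, int *max_khz)
{
	(void)mgr;
	*max_khz = 600000;
}

static enum status_sketch set_clock(struct dc_sketch *dc, int clk_khz)
{
	int max_khz = 0;

	/* One guard up front: nothing below dereferences a NULL pointer. */
	if (!dc->clk_mgr || !dc->clk_mgr->funcs ||
	    !dc->clk_mgr->funcs->get_clock)
		return STATUS_FAIL_UNSUPPORTED;

	dc->clk_mgr->funcs->get_clock(dc->clk_mgr, &max_khz);

	if (clk_khz > max_khz)
		return STATUS_FAIL_EXCEED_MAX;

	return STATUS_OK;
}
```

Compared with adding a second `dc->clk_mgr &&` test after the call, the single early return keeps the function from ever reading an uninitialized clock configuration when the manager is absent.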
RE: [PATCH 1/3] drm/amdgpu: create amdgpu_vkms (v2)
[Public]

It looks like the copyright statement is missing from both amdgpu_vkms.c and amdgpu_vkms.h.

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Alex Deucher
Sent: Friday, July 23, 2021 10:32 PM
To: Taylor, Ryan
Cc: kernel test robot; Daniel Vetter; Siqueira, Rodrigo; amd-gfx list; Melissa Wen; Maling list - DRI developers
Subject: Re: [PATCH 1/3] drm/amdgpu: create amdgpu_vkms (v2)

On Wed, Jul 21, 2021 at 1:07 PM Ryan Taylor wrote:
>
> Modify the VKMS driver into an api that dce_virtual can use to create
> virtual displays that obey drm's atomic modesetting api.
>
> v2: Made local functions static.
>
> Reported-by: kernel test robot
> Signed-off-by: Ryan Taylor
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile      |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 411 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h |  29 ++
>  drivers/gpu/drm/amd/amdgpu/dce_virtual.c |  23 +-
>  7 files changed, 458 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index f089794bbdd5..30cbcd5ce1cc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -120,6 +120,7 @@ amdgpu-y += \
>  amdgpu-y += \
>  	dce_v10_0.o \
>  	dce_v11_0.o \
> +	amdgpu_vkms.o \
>  	dce_virtual.o
>
>  # add GFX block
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 54cf647bd018..d0a2f2ed433d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -919,6 +919,7 @@ struct amdgpu_device {
>
>  	/* display */
>  	bool enable_virtual_display;
> +	struct amdgpu_vkms_output *amdgpu_vkms_output;
>  	struct amdgpu_mode_info mode_info;
>  	/* For pre-DCE11. DCE11 and later are in "struct amdgpu_device->dm" */
>  	struct work_struct hotplug_work;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d0c935cf4f0f..1b016e5bc75f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1230,7 +1230,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>  	int ret, retry = 0;
>  	bool supports_atomic = false;
>
> -	if (!amdgpu_virtual_display &&
> +	if (amdgpu_virtual_display ||
>  	    amdgpu_device_asic_has_dc_support(flags & AMD_ASIC_MASK))
>  		supports_atomic = true;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> index 09b048647523..5a143ca02cf9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> @@ -344,7 +344,7 @@ int amdgpu_fbdev_init(struct amdgpu_device *adev)
>  	}
>
>  	/* disable all the possible outputs/crtcs before entering KMS mode */
> -	if (!amdgpu_device_has_dc_support(adev))
> +	if (!amdgpu_device_has_dc_support(adev) && !amdgpu_virtual_display)
>  		drm_helper_disable_unused_functions(adev_to_drm(adev));
>
>  	drm_fb_helper_initial_config(>helper, bpp_sel);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> new file mode 100644
> index 000000000000..d5c1f1c58f5f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> @@ -0,0 +1,411 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include
> +#include
> +#include
> +
> +#include "amdgpu.h"
> +#include "amdgpu_vkms.h"
> +#include "amdgpu_display.h"
> +
> +/**
> + * DOC: amdgpu_vkms
> + *
> + * The amdgpu vkms interface provides a virtual KMS interface for several use
> + * cases: devices without display hardware, platforms where the actual display
> + * hardware is not useful (e.g., servers), SR-IOV virtual functions, device
> + * emulation/simulation, and device bring up prior to display hardware being
> + * usable. We previously emulated a legacy KMS interface, but there was a desire
> + * to move to the atomic KMS interface. The vkms driver did everything we
> + * needed, but we wanted KMS support natively in the driver without buffer
> + * sharing and the ability to support an instance of VKMS per device. We first
> + * looked at splitting vkms into a stub driver and a helper module that other
> + * drivers could use to implement a virtual display, but this strategy ended up
> + * being messy due to driver specific callbacks needed for buffer management.
> + * Ultimately, it proved easier to import the vkms code as it mostly
>
RE: [PATCH -next] drm/amdgpu: Fix missing unlock on error in amdgpu_ras_debugfs_table_read()
[Public] Thank you for the patch, Yingliang. There is a similar patch sent out last Saturday and under review. Please check it. [PATCH 3/4] drm/amdgpu: unlock on error in amdgpu_ras_debugfs Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Yang Yingliang Sent: Monday, July 5, 2021 9:40 AM To: linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH -next] drm/amdgpu: Fix missing unlock on error in amdgpu_ras_debugfs_table_read() Add the missing unlock before return from function amdgpu_ras_debugfs_table_read() in the error handling case. Fixes: 9b790694a031 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Reported-by: Hulk Robot Signed-off-by: Yang Yingliang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c index fc70620369e4..dbeeb4986ca6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c @@ -912,8 +912,10 @@ static ssize_t amdgpu_ras_debugfs_table_read(struct file *f, char __user *buf, record.retired_page); data_len = min_t(size_t, rec_hdr_fmt_size - r, size); - if (copy_to_user(buf, [r], data_len)) - return -EINVAL; + if (copy_to_user(buf, [r], data_len)) { + res = -EINVAL; + goto Out; + } buf += data_len; size -= data_len; *pos += data_len; -- 2.25.1 ___ amd-gfx mailing list amd-...@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Cguchun.chen%40amd.com%7C895d0b06d5e54b3598cf08d93f83454a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637610655312805026%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=b8UQJCZDgKs7CkMFMMXtFUfGe%2FQA4Cnm%2FKJKOlvV1K0%3Dreserved=0
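The shape of the fix quoted above (set the result, jump to a single exit label, release the lock there) is the usual kernel error-path idiom. A minimal userspace sketch follows; the lock is modelled as a simple depth counter so the balance can be checked, and `fake_lock`, `copy_out()`, and `table_read()` are hypothetical stand-ins for the mutex, `copy_to_user()`, and the debugfs read handler.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock: depth must be back at 0 after every call. */
struct fake_lock { int depth; };

static void fake_lock_acquire(struct fake_lock *l) { l->depth++; }
static void fake_lock_release(struct fake_lock *l) { l->depth--; }

/* Stand-in for copy_to_user(): nonzero return means the copy failed. */
static int copy_out(char *dst, const char *src, size_t len)
{
	if (!dst)
		return 1;
	memcpy(dst, src, len);
	return 0;
}

/* Copies up to size bytes of a fixed table; returns the byte count or
 * a negative errno. Every path leaves through the out label, so the
 * lock is released exactly once. */
static long table_read(struct fake_lock *lock, char *buf, size_t size)
{
	static const char table[] = "header\n";
	size_t len = sizeof(table) - 1;
	long res;

	fake_lock_acquire(lock);

	if (len > size)
		len = size;

	if (copy_out(buf, table, len)) {
		res = -EINVAL; /* same error code the quoted patch keeps */
		goto out;
	}

	res = (long)len;
out:
	fake_lock_release(lock);
	return res;
}
```

A bare `return -EINVAL;` inside the locked region, as in the pre-patch code, would leave `depth` at 1 in this model, which is the userspace analogue of the missing unlock the report describes.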
RE: [pull] amdgpu, radeon, ttm, sched drm-next-5.13
[AMD Public Use]

Hi Felix and Christian,

If the regression you are talking about is the NULL pointer problem when running KFD tests, it should be fixed by the patch below in this series.

drm/amdgpu: fix NULL pointer dereference

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Christian König
Sent: Wednesday, April 7, 2021 2:57 PM
To: Kuehling, Felix; Deucher, Alexander; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; airl...@gmail.com; daniel.vet...@ffwll.ch
Subject: Re: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

On 06.04.21 at 17:42, Felix Kuehling wrote:
> On 2021-04-01 at 6:29 p.m., Alex Deucher wrote:
>> Hi Dave, Daniel,
>>
>> New stuff for 5.13. There are two small patches for ttm and
>> scheduler that were dependencies for amdgpu changes.
>>
>> The following changes since commit 2cbcb78c9ee5520c8d836c7ff57d1b60ebe8e9b7:
>>
>>   Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-03-26 15:53:21 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.13-2021-04-01
>>
>> for you to fetch changes up to ef95d2a98d642a537190d73c45ae3c308afee890:
>>
>>drm/amdgpu/display: fix warning on 32 bit in dmub (2021-04-01 >> 17:32:32 -0400) >> >> >> amd-drm-next-5.13-2021-04-01: >> >> amdgpu: >> - Re-enable GPU reset on VanGogh >> - Enable DPM flags for SMART_SUSPEND and MAY_SKIP_RESUME >> - Disentangle HG from vga_switcheroo >> - S0ix fixes >> - W=1 fixes >> - Resource iterator fixes >> - DMCUB updates >> - UBSAN fixes >> - More PM API cleanup >> - Aldebaran updates >> - Modifier fixes >> - Enable VCN load balancing with asymmetric engines >> - Rework BO structs >> - Aldebaran reset support >> - Initial LTTPR display work >> - Display MALL fixes >> - Fall back to YCbCr420 when YCbCr444 fails >> - SR-IOV fixes >> - Misc cleanups and fixes >> >> radeon: >> - Typo fixes >> >> ttm: >> - Handle cached requests (required for Aldebaran) >> >> scheduler: >> - Fix runqueue selection when changing priorities (required to fix VCN >>load balancing) >> >> >> Alex Deucher (20): >>drm/amdgpu/display/dm: add missing parameter documentation >>drm/amdgpu: Add additional Sienna Cichlid PCI ID >>drm/amdgpu: add a dev_pm_ops prepare callback (v2) >>drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and >> DPM_FLAG_SMART_SUSPEND flags (v2) >>drm/amdgpu: disentangle HG systems from vgaswitcheroo >>drm/amdgpu: rework S3/S4/S0ix state handling >>drm/amdgpu: don't evict vram on APUs for suspend to ram (v4) >>drm/amdgpu: clean up non-DC suspend/resume handling >>drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3) >>drm/amdgpu: re-enable suspend phase 2 for S0ix >>drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend >>drm/amdgpu: update comments about s0ix suspend/resume >>drm/amdgpu: drop S0ix checks around CG/PG in suspend >>drm/amdgpu: skip kfd suspend/resume for S0ix >>drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x >>drm/amdgpu/display: fix memory leak for dimgrey cavefish >>drm/amdgpu/pm: mark pcie link/speed arrays as const >>drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend >>drm/amdgpu/vangogh: don't 
check for dpm in is_dpm_running when in >> suspend >>drm/amdgpu/display: fix warning on 32 bit in dmub >> >> Alex Sierra (2): >>drm/amdgpu: replace per_device_list by array >>drm/amdgpu: ih reroute for newer asics than vega20 >> >> Alvin Lee (1): >>drm/amd/display: Change input parameter for set_drr >> >> Anson Jacob (2): >>drm/amd/display: Fix UBSAN: shift-out-of-bounds warning >>drm/amd/display: Removing unused code from dmub_cmd.h >> >> Anthony Koo (2): >>drm/amd/display: [FW Promotion] Release 0.0.57 >>drm/amd/display: [FW Promotion] Release 0.0.58 >> >> Aric Cyr (2): >>drm/amd/display: 3.2.128 >>
RE: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()
[AMD Public Use]

Thanks for your patch, Silva.

The issue has been fixed by "a5c6007e20e1 drm/amd/display: fix modprobe failure on vega series".

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Gustavo A. R. Silva
Sent: Monday, March 22, 2021 8:51 PM
To: Lee Jones; Wentland, Harry; Li, Sun peng (Leo); Deucher, Alexander; Koenig, Christian; David Airlie; Daniel Vetter
Cc: Gustavo A. R. Silva; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

The wrong sizeof values are currently being used as arguments to kzalloc(). Fix this by using the right arguments *dceip and *vbios, correspondingly.

Addresses-Coverity-ID: 1502901 ("Wrong sizeof argument")
Fixes: fca1e079055e ("drm/amd/display/dc/calcs/dce_calcs: Remove some large variables from the stack")
Signed-off-by: Gustavo A. R. Silva
---
 drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index 556ecfabc8d2..1244fcb0f446 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -2051,11 +2051,11 @@ void bw_calcs_init(struct bw_calcs_dceip *bw_dceip,
 
 	enum bw_calcs_version version = bw_calcs_version_from_asic_id(asic_id);
 
-	dceip = kzalloc(sizeof(dceip), GFP_KERNEL);
+	dceip = kzalloc(sizeof(*dceip), GFP_KERNEL);
 	if (!dceip)
 		return;
 
-	vbios = kzalloc(sizeof(vbios), GFP_KERNEL);
+	vbios = kzalloc(sizeof(*vbios), GFP_KERNEL);
 	if (!vbios) {
 		kfree(dceip);
 		return;
--
2.27.0
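The bug class in the patch above is easy to reproduce outside the kernel: `sizeof(p)` on a pointer yields the pointer size, while `sizeof(*p)` yields the size of the object it points to. The sketch below uses a hypothetical `struct big_params` and `calloc()` in place of the driver structures and `kzalloc()`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical large parameter block, far bigger than a pointer. */
struct big_params { long values[64]; };

/* What the pre-patch code asked the allocator for: one pointer's worth. */
static size_t size_of_pointer(void)
{
	struct big_params *p = NULL;

	return sizeof(p); /* sizeof does not evaluate p */
}

/* What the fixed code asks for: the whole structure. */
static size_t size_of_object(void)
{
	struct big_params *p = NULL;

	return sizeof(*p); /* unevaluated, so the NULL is never dereferenced */
}

/* Idiomatic allocation: the size follows the variable's type, so it
 * stays correct even if the type of p changes later. */
static struct big_params *alloc_params(void)
{
	struct big_params *p = calloc(1, sizeof(*p));

	return p;
}
```

With the undersized allocation, any write past the first pointer-sized chunk lands outside the buffer, which is consistent with the kind of load-time failure the commit referenced in the reply mentions.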
RE: [PATCH] drm/ttm: Do not add non-system domain BO into swap list
[AMD Public Use] Acked-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Wednesday, February 24, 2021 11:35 AM To: Pan, Xinhui Cc: Deucher, Alexander ; Maling list - DRI developers ; Koenig, Christian ; amd-gfx list Subject: Re: [PATCH] drm/ttm: Do not add non-system domain BO into swap list On Tue, Feb 23, 2021 at 10:28 PM xinhui pan wrote: > > BO would be added into swap list if it is validated into system domain. > If BO is validated again into non-system domain, say, VRAM domain. It > actually should not be in the swap list. > > Signed-off-by: xinhui pan Acked-by: Alex Deucher > --- > drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > b/drivers/gpu/drm/ttm/ttm_bo.c index a97d41f4ce3c..3a10bebb75d6 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -111,6 +111,8 @@ void ttm_bo_move_to_lru_tail(struct > ttm_buffer_object *bo, > > swap = _glob.swap_lru[bo->priority]; > list_move_tail(>swap, swap); > + } else { > + list_del_init(>swap); > } > > if (bdev->funcs->del_from_lru_notify) > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist > s.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-develdata=04%7C01%7C > guchun.chen%40amd.com%7C554dbc7fd1fe4438268508d8d87529da%7C3dd8961fe48 > 84e608e11a82d994e183d%7C0%7C0%7C637497345043233977%7CUnknown%7CTWFpbGZ > sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3 > D%7C1000sdata=2sWpQGXSETm6t%2FKwHXeuLjmcwHHMFKlIplpcL9T3VF8%3D > p;reserved=0 ___ amd-gfx mailing list amd-...@lists.freedesktop.org 
RE: [PATCH] drm/amd/display: use div_s64() for 64-bit division
[AMD Public Use] Hi Arnd Bergmann, Thanks for your patch. This link error during compile has been fixed by below commit and been submitted to drm-next branch already. 5da047444e82 drm/amd/display: fix 64-bit division issue on 32-bit OS Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Arnd Bergmann Sent: Monday, January 25, 2021 7:40 PM To: Wentland, Harry ; Li, Sun peng (Leo) ; Deucher, Alexander ; Koenig, Christian ; David Airlie ; Daniel Vetter ; Aberback, Joshua ; Lakha, Bhawanpreet ; Kazlauskas, Nicholas Cc: Arnd Bergmann ; Chalmers, Wesley ; Zhuo, Qingqing ; Siqueira, Rodrigo ; linux-ker...@vger.kernel.org; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Jacky Liao ; Leung, Martin Subject: [PATCH] drm/amd/display: use div_s64() for 64-bit division From: Arnd Bergmann The open-coded 64-bit division causes a link error on 32-bit machines: ERROR: modpost: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! ERROR: modpost: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! Use the div_s64() to perform the division here. One of them was an unsigned division originally, but it looks like signed division was intended, so use that to consistently allow a negative delay. Fixes: ea7154d8d9fb ("drm/amd/display: Update dcn30_apply_idle_power_optimizations() code") Signed-off-by: Arnd Bergmann --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c index dff83c6a142a..a133e399e76d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c @@ -772,8 +772,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable) cursor_cache_enable ? _attr : NULL)) { unsigned int v_total = stream->adjust.v_total_max ? 
stream->adjust.v_total_max : stream->timing.v_total; - unsigned int refresh_hz = (unsigned long long) stream->timing.pix_clk_100hz * - 100LL / (v_total * stream->timing.h_total); + unsigned int refresh_hz = div_s64((unsigned long long) stream->timing.pix_clk_100hz * + 100LL, v_total * stream->timing.h_total); /* * one frame time in microsec: @@ -800,8 +800,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable) unsigned int denom = refresh_hz * 6528; unsigned int stutter_period = dc->current_state->perf_params.stutter_period_us; - tmr_delay = (((100LL + 2 * stutter_period * refresh_hz) * - (100LL + dc->debug.mall_additional_timer_percent) + denom - 1) / + tmr_delay = div_s64(((100LL + 2 * stutter_period * refresh_hz) * + (100LL + dc->debug.mall_additional_timer_percent) + denom - 1), denom) - 64LL; /* scale should be increased until it fits into 6 bits */ @@ -815,8 +815,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable) } denom *= 2; - tmr_delay = (((100LL + 2 * stutter_period * refresh_hz) * - (100LL + dc->debug.mall_additional_timer_percent) + denom - 1) / + tmr_delay = div_s64(((100LL + 2 * stutter_period * refresh_hz) * + (100LL + dc->debug.mall_additional_timer_percent) + denom - 1), denom) - 64LL; } -- 2.29.2 ___ amd-gfx mailing list amd-...@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Cguchun.chen%40amd.com%7C4bb97aae9edc4153392c08d8c1260048%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637471716255231899%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=kLdkVHfkYx%2Bd249%2BmtG5GJTq295Pxzw7mgTe0FD8QvY%3Dreserved=0 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
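For context on the link error: on 32-bit targets the compiler lowers a plain 64-bit `/` into calls to libgcc helpers such as `__divdi3` and `__udivdi3`, which the kernel does not link against, so kernel code routes such divisions through helpers like `div_s64()`. The sketch below is a userspace stand-in that only mirrors the shape of that helper as I understand it (64-bit dividend, 32-bit divisor); `div_s64_sketch()` and `refresh_hz()` are hypothetical names, and the refresh-rate formula is the one visible in the quoted hunk.

```c
#include <stdint.h>

/* Userspace stand-in with the same shape as the kernel helper:
 * signed 64-bit dividend, signed 32-bit divisor. In userspace a plain
 * division is fine because libgcc is available. */
static int64_t div_s64_sketch(int64_t dividend, int32_t divisor)
{
	return dividend / divisor;
}

/* Refresh rate in Hz from a pixel clock given in units of 100 Hz and
 * the total frame dimensions, following the expression in the patch. */
static unsigned int refresh_hz(unsigned int pix_clk_100hz,
			       unsigned int v_total, unsigned int h_total)
{
	int64_t pix_clk_hz = (int64_t)pix_clk_100hz * 100;

	/* The divisor is narrowed to 32 bits, as the helper requires. */
	return (unsigned int)div_s64_sketch(pix_clk_hz,
					    (int32_t)(v_total * h_total));
}
```

As a usage example, a 148.5 MHz pixel clock with a 2200 x 1125 total frame gives 148500000 / 2475000 = 60 Hz. The narrowing of the divisor is worth keeping in mind when reviewing such conversions, since a product that does not fit in 32 bits would be truncated.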
RE: linux-next: Tree for Jan 22 (amdgpu)
[AMD Public Use]

The link error has been fixed by:

5da047444e82 drm/amd/display: fix 64-bit division issue on 32-bit OS

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Randy Dunlap
Sent: Saturday, January 23, 2021 2:02 AM
To: Stephen Rothwell; Linux Next Mailing List
Cc: amd-...@lists.freedesktop.org; Linux Kernel Mailing List; dri-devel
Subject: Re: linux-next: Tree for Jan 22 (amdgpu)

On 1/21/21 11:06 PM, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20210121:
>

on i386:

ERROR: modpost: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

--
~Randy
RE: [PATCH] drm/amdgpu:Fixed the wrong macro definition in amdgpu_trace.h
[AMD Public Use]

Nice catch and the patch is:

Reviewed-by: Guchun Chen

Regards,
Guchun

-----Original Message-----
From: amd-gfx On Behalf Of Chenyang Li
Sent: Wednesday, December 23, 2020 9:19 AM
To: Deucher, Alexander; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu:Fixed the wrong macro definition in amdgpu_trace.h

In line 24 "_AMDGPU_TRACE_H" is missing an underscore.

Signed-off-by: Chenyang Li
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index ee9480d14cbc..86cfb3d55477 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -21,7 +21,7 @@
  *
  */
 
-#if !defined(_AMDGPU_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#if !defined(_AMDGPU_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _AMDGPU_TRACE_H_
 
 #include
--
2.29.2
RE: [radeon-alex:amd-20.45 2387/2427] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute
[AMD Public Use] Hi there, I will fix this soon. The issue is reported on amd-20.45 branch, which was branched out ahead of the fix available on mainline. Regards, Guchun -Original Message- From: kernel test robot Sent: Tuesday, December 15, 2020 1:49 PM To: Chen, Guchun Cc: kbuild-...@lists.01.org; clang-built-li...@googlegroups.com; dri-devel@lists.freedesktop.org; Li, Dennis Subject: [radeon-alex:amd-20.45 2387/2427] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute Hi Guchun, FYI, the error/warning still remains. tree: git://people.freedesktop.org/~agd5f/linux.git amd-20.45 head: a3950d94b046fb206e58fd3ec717f071c0203ba3 commit: cf13e50dea28cde351fa32767e36135afb30386d [2387/2427] drm/amdgpu: clean up ras sysfs creation (v2) config: x86_64-randconfig-a002-20201214 (attached as .config) compiler: clang version 12.0.0 (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-projectdata=04%7C01%7Cguchun.chen%40amd.com%7C708cce12ecaa4d2ee1d108d8a0bd3135%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637436084052882308%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=I4ohtcRRR8iQs%2FfkMhy%2B7pnsAJ4V%2Br%2F0EpNjcoQp%2B4s%3Dreserved=0 a29ecca7819a6ed4250d3689b12b1f664bb790d7) reproduce (this is a W=1 build): wget https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fraw.githubusercontent.com%2Fintel%2Flkp-tests%2Fmaster%2Fsbin%2Fmake.crossdata=04%7C01%7Cguchun.chen%40amd.com%7C708cce12ecaa4d2ee1d108d8a0bd3135%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637436084052892302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=ijbznpvhsb43YLeQJ6UZb%2BfG4mnCiAA2ZmPQhLgz6Ig%3Dreserved=0 -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu git remote add 
radeon-alex git://people.freedesktop.org/~agd5f/linux.git git fetch --no-tags radeon-alex amd-20.45 git checkout cf13e50dea28cde351fa32767e36135afb30386d # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:5: warning: no previous prototype for function 'amdgpu_ras_error_cure' [-Wmissing-prototypes] int amdgpu_ras_error_cure(struct amdgpu_device *adev, ^ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int amdgpu_ras_error_cure(struct amdgpu_device *adev, ^ static >> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return >> value of function declared with 'warn_unused_result' attribute >> [-Wunused-result] sysfs_create_group(>dev->kobj, ); ^~ 2 warnings generated. 
vim +/warn_unused_result +1284 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 1249 1250 /* ras fs */ 1251 static BIN_ATTR(gpu_vram_bad_pages, S_IRUGO, 1252 amdgpu_ras_sysfs_badpages_read, NULL, 0); 1253 static DEVICE_ATTR(features, S_IRUGO, 1254 amdgpu_ras_sysfs_features_read, NULL); 1255 static int amdgpu_ras_fs_init(struct amdgpu_device *adev) 1256 { 1257 struct amdgpu_ras *con = amdgpu_ras_get_context(adev); 1258 struct attribute_group group = { 1259 .name = RAS_FS_NAME, 1260 }; 1261 struct attribute *attrs[] = { 1262 >features_attr.attr, 1263 NULL 1264 }; 1265 struct bin_attribute *bin_attrs[] = { 1266 NULL, 1267 NULL, 1268 }; 1269 1270 /* add features entry */ 1271 con->features_attr = dev_attr_features; 1272 group.attrs = attrs; 1273 sysfs_attr_init(attrs[0]); 1274 1275 if (amdgpu_bad_page_threshold != 0) { 1276 /* add bad_page_features entry */ 1277 bin_attr_gpu_vram_bad_pages.private = NULL; 1278 con->badpages_attr = bin_attr_gpu_vram_bad_pages; 1279 bin_attrs[0] = >badpages_attr; 1280 group.bin_attrs = bin_attrs; 1281 sysfs_bin_attr_init(bin_attrs[0]); 1282 } 1283 > 1284 sysfs_create_group(>dev->kobj, ); 1285 1286 return 0; 1287 } 1288 --- 0-DAY CI Kernel Test Service, Intel Corporation https://nam11.safelinks.protection.outlook.com/
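The warning quoted above fires because the callee is declared with the `warn_unused_result` attribute and the caller drops its return value. One way to satisfy it, shown here as a userspace sketch and not necessarily what the referenced fix does, is to propagate the error to the caller. `MUST_CHECK`, `create_group_sketch()`, and `ras_fs_init_sketch()` are hypothetical names introduced for this illustration.

```c
#include <errno.h>

/* Same attribute the compiler names in the warning; empty elsewhere. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical stand-in for a group-creation call that can fail. */
static MUST_CHECK int create_group_sketch(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Instead of calling the helper and unconditionally returning 0, hand
 * the result back so the caller can unwind on failure. */
static int ras_fs_init_sketch(int should_fail)
{
	int r = create_group_sketch(should_fail);

	if (r)
		return r;

	return 0;
}
```

Whether an init path should fail hard or merely log when an optional sysfs group cannot be created is a policy decision for the driver; the sketch only shows the mechanical part of consuming the return value.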
RE: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation
[AMD Public Use]

Reviewed-by: Guchun Chen

Regards,
Guchun

-----Original Message-----
From: Colin King
Sent: Wednesday, November 25, 2020 10:18 PM
To: Deucher, Alexander; Koenig, Christian; David Airlie; Daniel Vetter; Zhou1, Tao; Chen, Guchun; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

From: Colin Ian King

An incorrect sizeof() is being used, sizeof((*data)->bps_bo) is not
correct, it should be sizeof(*(*data)->bps_bo). It just so happens
to work because the sizes are the same. Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev)
 		return -ENOMEM;

 	bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-	bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL);
+	bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), GFP_KERNEL);

 	if (!bps || !bps_bo) {
 		kfree(bps);
--
2.29.2

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
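The sizeof() mismatch that the patch above describes can be reproduced in a small userspace sketch. It assumes, as the commit message implies, that bps_bo is an array of pointers, so the array pointer and one element are both pointer-sized. The names demo_bo, demo_data and the helper functions are hypothetical stand-ins, not the driver's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the driver's struct. */
struct demo_bo {
	char payload[64];
};

/* Hypothetical stand-in for the handler data: an array of pointers. */
struct demo_data {
	struct demo_bo **bps_bo;
};

/* Pre-patch expression: size of the array pointer itself. */
static size_t size_before_patch(const struct demo_data *d)
{
	return sizeof(d->bps_bo);	/* sizeof(struct demo_bo **) */
}

/* Patched expression: size of one array element. */
static size_t size_after_patch(const struct demo_data *d)
{
	return sizeof(*d->bps_bo);	/* sizeof(struct demo_bo *) */
}

/* For contrast: the size of the object each element points to. */
static size_t size_of_object(const struct demo_data *d)
{
	return sizeof(**d->bps_bo);	/* sizeof(struct demo_bo) */
}

/* Returns 1 when the old and new expressions give the same size. */
int demo_sizeof_selfcheck(void);
int demo_sizeof_selfcheck(void)
{
	struct demo_data d = { .bps_bo = NULL };

	/* sizeof does not evaluate its operand, so the NULL is never read. */
	assert(size_before_patch(&d) == sizeof(struct demo_bo **));
	assert(size_after_patch(&d) == sizeof(struct demo_bo *));
	assert(size_of_object(&d) == sizeof(struct demo_bo));

	return size_before_patch(&d) == size_after_patch(&d);
}
```

On common platforms the two pointer sizes match, which is why the old code happened to allocate the right amount. Writing the element size as sizeof(*ptr) keeps the allocation correct if the element type ever changes.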
RE: [radeon-alex:amd-20.45 2387/2417] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute
[AMD Public Use]

+Alex.

We have the following patch to fix this. Please check.

a069a9eb73f8 drm/amdgpu: fix a warning in amdgpu_ras.c (v2)

Regards,
Guchun

-----Original Message-----
From: kernel test robot
Sent: Saturday, November 21, 2020 2:02 PM
To: Chen, Guchun
Cc: kbuild-...@lists.01.org; clang-built-li...@googlegroups.com; dri-devel@lists.freedesktop.org; Li, Dennis
Subject: [radeon-alex:amd-20.45 2387/2417] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute

tree:   git://people.freedesktop.org/~agd5f/linux.git amd-20.45
head:   1807abbb3a7f17fc931a15d7fd4365ea148c6bb1
commit: cf13e50dea28cde351fa32767e36135afb30386d [2387/2417] drm/amdgpu: clean up ras sysfs creation (v2)
config: x86_64-randconfig-a011-20201120 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3ded927cf80ac519f9f9c4664fef08787f7c537d)

reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        git remote add radeon-alex git://people.freedesktop.org/~agd5f/linux.git
        git fetch --no-tags radeon-alex amd-20.45
        git checkout cf13e50dea28cde351fa32767e36135afb30386d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:5: warning: no previous prototype for function 'amdgpu_ras_error_cure' [-Wmissing-prototypes]
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
       ^
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
   ^
   static
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
           sysfs_create_group(&adev->dev->kobj, &group);
           ^~
   2 warnings generated.

vim +/warn_unused_result +1284 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

  1249
  1250  /* ras fs */
  1251  static BIN_ATTR(gpu_vram_bad_pages, S_IRUGO,
  1252                  amdgpu_ras_sysfs_badpages_read, NULL, 0);
  1253  static DEVICE_ATTR(features, S_IRUGO,
  1254                  amdgpu_ras_sysfs_features_read, NULL);
  1255  static int amdgpu_ras_fs_init(struct amdgpu_device *adev)
  1256  {
  1257          struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
  1258          struct attribute_group group = {
  1259                  .name = RAS_FS_NAME,
  1260          };
  1261          struct attribute *attrs[] = {
  1262                  &con->features_attr.attr,
  1263                  NULL
  1264          };
  1265          struct bin_attribute *bin_attrs[] = {
  1266                  NULL,
  1267                  NULL,
  1268          };
  1269
  1270          /* add features entry */
  1271          con->features_attr = dev_attr_features;
  1272          group.attrs = attrs;
  1273          sysfs_attr_init(attrs[0]);
  1274
  1275          if (amdgpu_bad_page_threshold != 0) {
  1276                  /* add bad_page_features entry */
  1277                  bin_attr_gpu_vram_bad_pages.private = NULL;
  1278                  con->badpages_attr = bin_attr_gpu_vram_bad_pages;
  1279                  bin_attrs[0] = &con->badpages_attr;
  1280                  group.bin_attrs = bin_attrs;
  1281                  sysfs_bin_attr_init(bin_attrs[0]);
  1282          }
  1283
> 1284          sysfs_create_group(&adev->dev->kobj, &group);
  1285
  1286          return 0;
  1287  }
  1288

---
0-DAY CI Kernel Test Service, Intel Corporation
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.01.org%2Fhyperkitty%2Flist%2Fkbuild-all
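The -Wunused-result warning reported above fires because the return value of a function marked 'warn_unused_result' is dropped at line 1284 and the caller then returns 0 unconditionally. A minimal userspace sketch of the usual remedy, propagating the status to the caller, might look like the following. This is an illustration only and is not the content of a069a9eb73f8; demo_create_group() and demo_fs_init() are hypothetical stand-ins, and the attribute syntax assumes GCC or clang.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical must-check helper standing in for a kernel call such as
 * sysfs_create_group(); it fails on request so both paths can be exercised.
 */
__attribute__((warn_unused_result))
static int demo_create_group(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/*
 * Same shape as the init function in the report, except that the helper's
 * status is returned instead of being ignored.
 */
static int demo_fs_init(int should_fail)
{
	int ret;

	ret = demo_create_group(should_fail);
	if (ret)
		return ret;	/* report the failure to the caller */

	return 0;
}

/* Exercises the success and failure paths; returns 0 when both behave. */
int demo_unused_result_selfcheck(void);
int demo_unused_result_selfcheck(void)
{
	assert(demo_fs_init(0) == 0);
	assert(demo_fs_init(1) == -ENOMEM);

	return demo_fs_init(0);
}
```

Calling demo_create_group() as a bare statement would reproduce the warning with clang or GCC. Returning the status lets the caller of the init function decide how to handle a failed sysfs registration.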