Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
On 1/26/2024 2:30 PM, Liang, Prike wrote: > [AMD Official Use Only - General] > >> >> On 1/25/2024 8:52 AM, Prike Liang wrote: >>> In the pm abort case the gfx power rail not turn off from FCH side and >>> this will lead to the gfx reinitialized failed base on the unknown gfx >>> HW status, so let's reset the gpu to a known good power state. >>> >> >> From the description, this an APU only problem (or this patch could only >> resolve APU abort sequence). However, there is no check for APU in the patch >> below. >> > [Prike] IIRC, there also has a similar problem on the dGPU side when suspend > abort and > now this patch is only drafted for a hot issue on the RV series. If need we > can add a TODO > item for drafting a more generic solution. > If this addresses a specific issue, then better to check the specific IP revision before presenting this as a generic one. Presently the patch logic considers this as a generic for all soc15 asics. >> >>> Signed-off-by: Prike Liang >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 + >>> drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++- >>> 2 files changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 56d9dfa61290..4c40ffaaa5c2 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device >> *dev, bool fbcon) >>> return r; >>> } >>> >>> + if(amdgpu_asic_need_reset_on_init(adev)) { >>> + DRM_INFO("PM abort case and let's reset asic \n"); >>> + amdgpu_asic_reset(adev); >>> + } >>> + >> >> suspend_noirq is specific for suspend scenarios and not valid for >> freeze/thaw. >> I guess this could trigger reset for successful restore on APUs. >> > [Prike] If doesn't run into noirq_suspend then still need further check > whether the PSP TOS is still alive before gpu reset. 
> AFAIU, for a successful resume from hibernate on APUs, TOS will still be running. The patch will trigger a reset in such cases also. Thanks, Lijo >>> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) >>> return 0; >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c >>> b/drivers/gpu/drm/amd/amdgpu/soc15.c >>> index 15033efec2ba..9329a00b6abc 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c >>> @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct >> amdgpu_device *adev) >>> if (adev->asic_type == CHIP_RENOIR) >>> return true; >>> >>> + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81); >>> + >>> /* Just return false for soc15 GPUs. Reset does not seem to >>> * be necessary. >>> */ >> >> The comment now doesn't make sense. >> >> Thanks, >> Lijo >> >>> + if (adev->in_suspend && !adev->in_s0ix && >>> + !adev->pm_complete && >>> + sol_reg) >>> + return true; >>> + >>> if (!amdgpu_passthrough(adev)) >>> return false; >>> >>> @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct >> amdgpu_device *adev) >>> /* Check sOS sign of life register to confirm sys driver and sOS >>> * are already been loaded. >>> */ >>> - sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81); >>> if (sol_reg) >>> return true; >>>
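To make the review point about scope concrete, here is a minimal standalone sketch of the posted condition next to an APU-only variant. It is not driver code: struct model_dev, its is_apu field and both function names are hypothetical stand-ins for the amdgpu_device state that the hunk reads, and the sketch does not address the hibernate/thaw concern raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few device fields the posted hunk reads. */
struct model_dev {
	bool is_apu;       /* assumption: models an "is this an APU" check */
	bool in_suspend;
	bool in_s0ix;
	bool pm_complete;
	uint32_t sol_reg;  /* models the sOS sign-of-life register value */
};

/* Condition as posted: evaluated for every ASIC that reaches this code. */
static bool need_reset_as_posted(const struct model_dev *d)
{
	assert(d != NULL);
	return d->in_suspend && !d->in_s0ix && !d->pm_complete &&
	       d->sol_reg != 0;
}

/* Narrowed variant along the lines of the review comment: APUs only. */
static bool need_reset_apu_only(const struct model_dev *d)
{
	assert(d != NULL);
	if (!d->is_apu)
		return false;
	return need_reset_as_posted(d);
}
```

With the same suspend state, a dGPU model trips the posted condition but not the narrowed one, which is the behavioural difference under discussion.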
[PATCH] drm/amdgpu: Fix the logic error when init mec fw
Remove redundant code to fix the logic error and potential null pointer
dereference if gfx.mec2_fw is null.

Signed-off-by: Ma Jun
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 057d7f3b8ce0..3395b83e969e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4027,8 +4027,6 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
 			err = 0;
 			adev->gfx.mec2_fw = NULL;
 		}
-		amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2);
-		amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2_JT);
 
 		gfx_v10_0_check_fw_write_wait(adev);
 out:
--
2.34.1
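For readers without the file open, the hazard can be sketched in standalone C. Everything here is a hypothetical model (model_fw, model_gfx, model_init_mec2, model_init_microcode); it assumes, as the word "redundant" in the commit message suggests, that the MEC2 init already runs in the branch where the optional firmware loaded successfully.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware blob and gfx state; not the amdgpu structures. */
struct model_fw {
	unsigned int size;
};

struct model_gfx {
	const struct model_fw *mec2_fw;
	unsigned int mec2_bytes;
};

/* Reads through gfx->mec2_fw, so it is only valid when firmware is present. */
static void model_init_mec2(struct model_gfx *gfx)
{
	assert(gfx->mec2_fw != NULL);
	gfx->mec2_bytes = gfx->mec2_fw->size;
}

/* 'err' models the result of requesting the optional MEC2 firmware. */
static int model_init_microcode(struct model_gfx *gfx,
				const struct model_fw *fw, int err)
{
	if (!err) {
		gfx->mec2_fw = fw;
		model_init_mec2(gfx);
	} else {
		/* Optional firmware is missing: not fatal, nothing to set up. */
		gfx->mec2_fw = NULL;
	}
	/*
	 * Calling model_init_mec2(gfx) again at this point, unconditionally,
	 * is the pattern the patch deletes: it would run with a NULL blob
	 * whenever the else branch was taken.
	 */
	return 0;
}
```

Keeping the init inside the success branch means the missing-firmware path never touches the blob pointer.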
RE: [PATCH] drm/amdgpu: disable ras feature when fini
[AMD Official Use Only - General]

The patch is

Reviewed-by: Hawking Zhang

BTW, we could take a further step and retire the if branch (bypass == 1) with
proper RAS_TA changes on legacy Vega20/Arcturus:

	if (bypass) {
		if (__amdgpu_ras_feature_enable(adev, &obj->head, 0))
			break;
	}

Regards,
Hawking

-----Original Message-----
From: amd-gfx On Behalf Of Tao Zhou
Sent: Monday, January 29, 2024 11:54
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao
Subject: [PATCH] drm/amdgpu: disable ras feature when fini

Send ras disable feature command in fini.

Signed-off-by: Tao Zhou
Change-Id: I95f1d1e0a46fb613631e5cd77497e64c0551c4c7
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index a249f24ed038..a9fa2d134670 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3437,7 +3437,7 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 	WARN(AMDGPU_RAS_GET_FEATURES(con->features), "Feature mask is not cleared");
 
 	if (AMDGPU_RAS_GET_FEATURES(con->features))
-		amdgpu_ras_disable_all_features(adev, 1);
+		amdgpu_ras_disable_all_features(adev, 0);
 
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);
 
--
2.34.1
Re: Two patches to improve gang submit with reserved VMIDs
On 2024-01-26 10:54, Christian König wrote:
> Hi guys,
>
> those two patches clean up gang submit. The first one should prevent
> crashes when gang submit is used together with a reserved VMID.
>
> The second rejects gang submits with a reserved VMID when this won't
> work because of HW limitations.
>
> Only smoke tested since I need more HW for this setup
>
> @Vitaly: If you have some time please test them with your stand alone
> gang submit test first and then we need to come up with a combined
> test case for gang submit with reserved VMID.

Hi Christian,

After applying both patches, I found no problems with the basic and gang
submit IGT tests.
When I modified a gang submit test (adding VMID allocation/free), the CS was
rejected, I think by amdgpu_cs_vm_handling (not sure here).
Based on the above, things look correct.
Let's discuss the details of gang submission with VMID.

Thanks,
Vitaly

>
> Regards,
> Christian.
>
>
[PATCH] drm/amdgpu: disable ras feature when fini
Send ras disable feature command in fini.

Signed-off-by: Tao Zhou
Change-Id: I95f1d1e0a46fb613631e5cd77497e64c0551c4c7
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index a249f24ed038..a9fa2d134670 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3437,7 +3437,7 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 	WARN(AMDGPU_RAS_GET_FEATURES(con->features), "Feature mask is not cleared");
 
 	if (AMDGPU_RAS_GET_FEATURES(con->features))
-		amdgpu_ras_disable_all_features(adev, 1);
+		amdgpu_ras_disable_all_features(adev, 0);
 
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);
 
--
2.34.1
RE: [PATCH v2] drm/amdkfd: reserve the BO before validating it
[AMD Official Use Only - General] >-Original Message- >From: Kuehling, Felix >Sent: Saturday, January 27, 2024 3:22 AM >To: Yu, Lang ; amd-gfx@lists.freedesktop.org >Cc: Francis, David >Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before validating it > > >On 2024-01-25 20:59, Yu, Lang wrote: >> [AMD Official Use Only - General] >> >>> -Original Message- >>> From: Kuehling, Felix >>> Sent: Thursday, January 25, 2024 5:41 AM >>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org >>> Cc: Francis, David >>> Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before validating >>> it >>> >>> On 2024-01-22 4:08, Lang Yu wrote: Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush") v2: Avoid unmapping attachment twice when ERESTARTSYS. [ 41.708711] WARNING: CPU: 0 PID: 1463 at >>> drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 >>> [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 >>> [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 >>> [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 >>> [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? 
tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 >>> +-- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++- 3 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 584a0cea5572..41854417e487 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -311,7 +311,7 @@ int >>> amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev, struct kgd_mem *mem, void >>> *drm_priv); int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( struct amdgpu_device *adev, struct kgd_mem *mem, void >>> *drm_priv); -void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv); +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void +*drm_priv); int amdgpu_amdkfd_gpuvm_sync_memory( struct amdgpu_device *adev, struct kgd_mem *mem, bool intr); int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6f3a4cb2a9ef..7a050d46fa4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -2088,21 +2088,43 @@ int >>> amdgpu_amdkfd_gpuvm_map_memory_to_gpu( return ret; } -void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv) +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void +*drm_priv) { struct kfd_mem_attachment *entry; struct amdgpu_vm *vm; +bool reserved = false; +int ret = 0; vm = drm_priv_to_vm(drm_priv); mutex_lock(>lock); list_for_each_entry(entry, >attachments, list) { -if (entry->bo_va->base.vm == vm) -kfd_mem_dmaunmap_attachment(mem, entry); +if (entry->bo_va->base.vm 
!= vm) +continue; +if (entry->type == KFD_MEM_ATT_SHARED || +entry->type == KFD_MEM_ATT_DMABUF) +continue; +if (!entry->bo_va->base.bo->tbo.ttm->sg) +continue; >>> You're going to great lengths to avoid the reservation when it's not >>> needed by kfd_mem_dmaunmap_attachment. However, this feels a bit
[PATCH v3 0/3] drm/atomic: Allow drivers to write their own plane check for async
Hi,

AMD hardware can do more on the async flip path than just the primary plane,
so to lift the current restrictions, this patchset allows drivers to write
their own check for planes for async flips.

This patchset allows async commits with IN_FENCE_FD in any driver, and with
overlay planes on AMD. Userspace can query whether a driver supports this with
TEST_ONLY commits.

Changes from v2:
- Allow IN_FENCE_FD for any driver
- Allow overlay planes again
v2: https://lore.kernel.org/lkml/20240119181235.255060-1-andrealm...@igalia.com/

Changes from v1:
- Drop overlay planes option for now
v1: https://lore.kernel.org/dri-devel/20240116045159.1015510-1-andrealm...@igalia.com/

André Almeida (3):
  drm/atomic: Allow drivers to write their own plane check for async flips
  drm/atomic: Allow userspace to use explicit sync with atomic async flips
  drm/amdgpu: Implement check_async_props for planes

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c | 29 +
 drivers/gpu/drm/drm_atomic_uapi.c           | 63 ++-
 include/drm/drm_atomic_uapi.h               | 12
 include/drm/drm_plane.h                     |  5 ++
 4 files changed, 92 insertions(+), 17 deletions(-)

--
2.43.0
[PATCH v3 3/3] drm/amdgpu: Implement check_async_props for planes
AMD GPUs can do async flips with changes on more properties than just the FB ID, so implement a custom check_async_props for AMD planes. Allow amdgpu to do async flips with overlay planes as well. Signed-off-by: André Almeida --- v3: allow overlay planes .../amd/display/amdgpu_dm/amdgpu_dm_plane.c | 29 +++ 1 file changed, 29 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c index 116121e647ca..ed75b69636b4 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c @@ -25,6 +25,7 @@ */ #include +#include #include #include #include @@ -1430,6 +1431,33 @@ static void amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane, drm_atomic_helper_plane_destroy_state(plane, state); } +static int amdgpu_dm_plane_check_async_props(struct drm_property *prop, + struct drm_plane *plane, + struct drm_plane_state *plane_state, + struct drm_mode_object *obj, + u64 prop_value, u64 old_val) +{ + struct drm_mode_config *config = >dev->mode_config; + int ret; + + if (prop != config->prop_fb_id && + prop != config->prop_in_fence_fd) { + ret = drm_atomic_plane_get_property(plane, plane_state, + prop, _val); + return drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); + } + + if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY && + plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY) { + drm_dbg_atomic(prop->dev, + "[OBJECT:%d] Only primary or overlay planes can be changed during async flip\n", + obj->id); + return -EINVAL; + } + + return 0; +} + static const struct drm_plane_funcs dm_plane_funcs = { .update_plane = drm_atomic_helper_update_plane, .disable_plane = drm_atomic_helper_disable_plane, @@ -1438,6 +1466,7 @@ static const struct drm_plane_funcs dm_plane_funcs = { .atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state, .atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state, 
.format_mod_supported = amdgpu_dm_plane_format_mod_supported, + .check_async_props = amdgpu_dm_plane_check_async_props, }; int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm, -- 2.43.0
[PATCH v3 1/3] drm/atomic: Allow drivers to write their own plane check for async flips
Some hardware are more flexible on what they can flip asynchronously, so rework the plane check so drivers can implement their own check, lifting up some of the restrictions. Signed-off-by: André Almeida --- v3: no changes drivers/gpu/drm/drm_atomic_uapi.c | 62 ++- include/drm/drm_atomic_uapi.h | 12 ++ include/drm/drm_plane.h | 5 +++ 3 files changed, 62 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c index aee4a65d4959..6d5b9fec90c7 100644 --- a/drivers/gpu/drm/drm_atomic_uapi.c +++ b/drivers/gpu/drm/drm_atomic_uapi.c @@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, return 0; } -static int +int drm_atomic_plane_get_property(struct drm_plane *plane, const struct drm_plane_state *state, struct drm_property *property, uint64_t *val) @@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane, return 0; } +EXPORT_SYMBOL(drm_atomic_plane_get_property); static int drm_atomic_set_writeback_fb_for_connector( struct drm_connector_state *conn_state, @@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct drm_atomic_state *state, return ret; } -static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t prop_value, +int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t prop_value, struct drm_property *prop) { if (ret != 0 || old_val != prop_value) { drm_dbg_atomic(prop->dev, - "[PROP:%d:%s] No prop can be changed during async flip\n", + "[PROP:%d:%s] This prop cannot be changed during async flip\n", prop->base.id, prop->name); return -EINVAL; } return 0; } +EXPORT_SYMBOL(drm_atomic_check_prop_changes); + +/* plane changes may have exceptions, so we have a special function for them */ +static int drm_atomic_check_plane_changes(struct drm_property *prop, + struct drm_plane *plane, + struct drm_plane_state *plane_state, + struct drm_mode_object *obj, + u64 prop_value, u64 old_val) +{ + struct drm_mode_config *config = 
>dev->mode_config; + int ret; + + if (plane->funcs->check_async_props) + return plane->funcs->check_async_props(prop, plane, plane_state, +obj, prop_value, old_val); + + /* +* if you are trying to change something other than the FB ID, your +* change will be either rejected or ignored, so we can stop the check +* here +*/ + if (prop != config->prop_fb_id) { + ret = drm_atomic_plane_get_property(plane, plane_state, + prop, _val); + return drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); + } + + if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) { + drm_dbg_atomic(prop->dev, + "[OBJECT:%d] Only primary planes can be changed during async flip\n", + obj->id); + return -EINVAL; + } + + return 0; +} int drm_atomic_set_property(struct drm_atomic_state *state, struct drm_file *file_priv, @@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state *state, case DRM_MODE_OBJECT_PLANE: { struct drm_plane *plane = obj_to_plane(obj); struct drm_plane_state *plane_state; - struct drm_mode_config *config = >dev->mode_config; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { @@ -1108,19 +1144,11 @@ int drm_atomic_set_property(struct drm_atomic_state *state, break; } - if (async_flip && prop != config->prop_fb_id) { - ret = drm_atomic_plane_get_property(plane, plane_state, - prop, _val); - ret = drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); - break; - } - - if (async_flip && plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) { - drm_dbg_atomic(prop->dev, - "[OBJECT:%d] Only primary planes can be changed during async flip\n", - obj->id); - ret = -EINVAL; - break; + if (async_flip) { +
[PATCH v3 2/3] drm/atomic: Allow userspace to use explicit sync with atomic async flips
Allow userspace to use explicit synchronization with atomic async flips.
That means that the flip will wait for some hardware fence, and will then
flip as soon as possible (async), without waiting for the vblank.

Signed-off-by: André Almeida
---
v3: new patch

 drivers/gpu/drm/drm_atomic_uapi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index 6d5b9fec90c7..edae7924ad69 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1060,7 +1060,8 @@ static int drm_atomic_check_plane_changes(struct drm_property *prop,
 	 * change will be either rejected or ignored, so we can stop the check
 	 * here
 	 */
-	if (prop != config->prop_fb_id) {
+	if (prop != config->prop_fb_id &&
+	    prop != config->prop_in_fence_fd) {
 		ret = drm_atomic_plane_get_property(plane, plane_state,
 						    prop, &old_val);
 		return drm_atomic_check_prop_changes(ret, old_val, prop_value, prop);
--
2.43.0
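The rule that the core check enforces after this patch can be modelled in a few lines of standalone C. This is only a sketch: the enums and model_check_plane_change() are hypothetical, and the model leaves out the error return from reading the old property value that the real helper also checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical property and plane identifiers for the model. */
enum model_prop {
	MODEL_PROP_FB_ID,
	MODEL_PROP_IN_FENCE_FD,
	MODEL_PROP_ROTATION,
};

enum model_plane_type {
	MODEL_PLANE_PRIMARY,
	MODEL_PLANE_OVERLAY,
	MODEL_PLANE_CURSOR,
};

/*
 * Returns 0 if the property update is acceptable in an async flip and
 * -EINVAL otherwise, following the default (non-driver) path.
 */
static int model_check_plane_change(enum model_prop prop,
				    enum model_plane_type type,
				    uint64_t old_val, uint64_t new_val)
{
	/* Anything other than FB_ID or IN_FENCE_FD must keep its old value. */
	if (prop != MODEL_PROP_FB_ID && prop != MODEL_PROP_IN_FENCE_FD)
		return old_val == new_val ? 0 : -EINVAL;

	/* Without a driver hook, only the primary plane may flip async. */
	if (type != MODEL_PLANE_PRIMARY)
		return -EINVAL;

	return 0;
}
```

A commit that sets FB_ID and IN_FENCE_FD on the primary plane passes, while changing any other property, or touching a non-primary plane, is rejected unless a driver supplies its own check.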
Re: Have WX 3200 Radeon graphics card -- cannot get X11 session to work
According to "Deucher, Alexander" on Fri, 01/26/24 at 16:28:
>
> [AMD Official Use Only - General]
>
> Make sure you have OS mouse and keyboard drivers loaded
> and configured within your X config?

I got it to work!!! Thanks to all who helped.

I got the clue I needed from this page this morning:

   https://fedoraproject.org/wiki/Input_device_configuration

Here is the config that finally works:

   unix% pwd
   /usr/local/etc/X11/xorg.conf.d
   unix% cat 10-driver.conf
   Section "InputClass"
       Identifier      "Keyboard0"
       MatchIsKeyboard "on"
       Driver          "libinput"
   EndSection

   Section "InputClass"
       Identifier      "Mouse0"
       MatchIsPointer  "on"
       Driver          "libinput"
   EndSection

   Section "Device"
       Identifier      "Card0"
       Driver          "amdgpu"
       BusID           "PCI:41:0:0"
       Option          "DisplayPort-0" "Monitor0"
   EndSection

--
William Bulley
E-MAIL: w...@umich.edu
[PATCH AUTOSEL 4.19 8/8] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84f475d4f13..ae28f72c73ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -823,6 +823,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
--
2.43.0
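The shape of the fix (copy out the one value that is needed, then drop the blob before any return) can be tried in userspace with a small model. All names below are hypothetical, with malloc/free standing in for request_firmware()/release_firmware(); the real function goes on to further checks after the version comparison, which the model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_VERSION_WORD 69  /* word index read by the patched code */

struct model_firmware {
	size_t size;
	uint8_t *data;
};

static int live_blobs;  /* outstanding blobs, used to spot leaks */

/* Stand-in for request_firmware(): builds a blob carrying 'version'. */
static struct model_firmware *model_request_firmware(uint32_t version)
{
	struct model_firmware *fw = malloc(sizeof(*fw));

	if (!fw)
		return NULL;
	fw->size = (MODEL_VERSION_WORD + 1) * sizeof(uint32_t);
	fw->data = calloc(1, fw->size);
	if (!fw->data) {
		free(fw);
		return NULL;
	}
	memcpy(fw->data + MODEL_VERSION_WORD * sizeof(uint32_t),
	       &version, sizeof(version));
	live_blobs++;
	return fw;
}

/* Stand-in for release_firmware(). */
static void model_release_firmware(struct model_firmware *fw)
{
	if (!fw)
		return;
	assert(live_blobs > 0);
	free(fw->data);
	free(fw);
	live_blobs--;
}

static bool model_need_post(uint32_t version)
{
	struct model_firmware *fw = model_request_firmware(version);
	uint32_t fw_ver;

	if (!fw)
		return true;

	memcpy(&fw_ver, fw->data + MODEL_VERSION_WORD * sizeof(uint32_t),
	       sizeof(fw_ver));
	/* The value is copied out, so release before either return path. */
	model_release_firmware(fw);

	return fw_ver < 0x00160e00;
}
```

Releasing right after the read keeps both outcomes of the version comparison leak-free without duplicating the release call on each path.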
[PATCH AUTOSEL 5.10 12/13] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
From: Srinivasan Shanmugam

[ Upstream commit 6616b5e1999146b1304abe78232af810080c67e3 ]

In 'struct phm_ppm_table *ptr' allocation using kzalloc, an incorrect
structure type is passed to sizeof() in kzalloc, larger structure types
were used, thus using correct type 'struct phm_ppm_table' fixes the
below:

drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/process_pptables_v1_0.c:203 get_platform_power_management_table() warn: struct type mismatch 'phm_ppm_table vs _ATOM_Tonga_PPM_Table'

Cc: Eric Huang
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
index b760f95e7fa7..5998c78ad536 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
@@ -204,7 +204,7 @@ static int get_platform_power_management_table(
 		struct pp_hwmgr *hwmgr,
 		ATOM_Tonga_PPM_Table *atom_ppm_table)
 {
-	struct phm_ppm_table *ptr = kzalloc(sizeof(ATOM_Tonga_PPM_Table), GFP_KERNEL);
+	struct phm_ppm_table *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
 	struct phm_ppt_v1_information *pp_table_information =
 		(struct phm_ppt_v1_information *)(hwmgr->pptable);
 
--
2.43.0
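The sizeof(*ptr) idiom used by the fix is easy to show outside the kernel. In this sketch both struct layouts are invented for illustration (they are not the real ATOM or phm tables), and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "wire format" table, deliberately larger than the one below. */
struct model_atom_table {
	uint8_t raw[64];
};

/* Hypothetical driver-side table that the pointer actually refers to. */
struct model_ppm_table {
	uint32_t design_kit;
	uint32_t cpu_core_count;
};

/* The two types differ in size, which is what makes a mix-up matter. */
static_assert(sizeof(struct model_atom_table) > sizeof(struct model_ppm_table),
	      "model assumes the wire table is the larger type");

static struct model_ppm_table *model_alloc_ppm_table(void)
{
	/*
	 * sizeof(*ptr) follows the pointer's own type, so the allocation
	 * cannot drift out of sync if the struct is later renamed or swapped.
	 */
	struct model_ppm_table *ptr = calloc(1, sizeof(*ptr));

	return ptr;
}
```

Naming a type explicitly inside sizeof() is what allowed the original mismatch; tying the size to the destination pointer removes that degree of freedom.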
[PATCH AUTOSEL 5.4 11/11] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e5032eb9ae29..9dcb38bab0e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -847,6 +847,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
--
2.43.0
[PATCH AUTOSEL 5.15 19/19] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'
From: Srinivasan Shanmugam

[ Upstream commit d7a254fad873775ce6c32b77796c81e81e6b7f2e ]

Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?

Suggested-by: Philip Yang
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e2d4e2b42a7c..7f55decc5f37 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2325,6 +2325,7 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 {
 	struct vm_area_struct *vma;
 	struct interval_tree_node *node;
+	struct rb_node *rb_node;
 	unsigned long start_limit, end_limit;
 
 	vma = find_vma(p->mm, addr << PAGE_SHIFT);
@@ -2341,16 +2342,15 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 	if (node) {
 		end_limit = min(end_limit, node->start);
 		/* Last range that ends before the fault address */
-		node = container_of(rb_prev(&node->rb),
-				    struct interval_tree_node, rb);
+		rb_node = rb_prev(&node->rb);
 	} else {
 		/* Last range must end before addr because
 		 * there was no range after addr
 		 */
-		node = container_of(rb_last(&p->svms.objects.rb_root),
-				    struct interval_tree_node, rb);
+		rb_node = rb_last(&p->svms.objects.rb_root);
 	}
-	if (node) {
+	if (rb_node) {
+		node = container_of(rb_node, struct interval_tree_node, rb);
 		if (node->last >= addr) {
 			WARN(1, "Overlap with prev node and page fault addr\n");
 			return -EFAULT;
--
2.43.0
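Why the NULL test belongs on the raw node pointer rather than on the container_of() result can be illustrated with a small userspace model. The types, the macro and the helper names are hypothetical; in particular, the embedded node is deliberately not the first member here, so that converting a NULL node would not give back NULL. No claim is made about the layout of the kernel structure.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal linked "tree node": only the predecessor link is modelled. */
struct model_rb_node {
	struct model_rb_node *prev;
};

struct model_interval {
	unsigned long start;
	unsigned long last;
	struct model_rb_node rb;  /* embedded at a non-zero offset */
};

/* Stand-in for rb_prev(): returns NULL when there is no predecessor. */
static struct model_rb_node *model_rb_prev(struct model_rb_node *n)
{
	assert(n != NULL);
	return n->prev;
}

/* Pattern after the fix: test the raw node first, convert afterwards. */
static struct model_interval *model_prev_interval(struct model_interval *node)
{
	struct model_rb_node *rb_node = model_rb_prev(&node->rb);

	if (!rb_node)
		return NULL;
	return model_container_of(rb_node, struct model_interval, rb);
}
```

Checking before converting keeps the code correct regardless of where the node sits inside its container, and it is also the form that the static checker quoted in the commit message accepts.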
[PATCH AUTOSEL 5.15 18/19] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 19e32f38a4c4..816dd59212c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1292,6 +1292,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
--
2.43.0
[PATCH AUTOSEL 5.10 13/13] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a093f1b27724..e833c02fabff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1184,6 +1184,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
--
2.43.0
[PATCH AUTOSEL 5.15 17/19] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
From: Srinivasan Shanmugam

[ Upstream commit 6616b5e1999146b1304abe78232af810080c67e3 ]

In 'struct phm_ppm_table *ptr' allocation using kzalloc, an incorrect
structure type is passed to sizeof() in kzalloc, larger structure types
were used, thus using correct type 'struct phm_ppm_table' fixes the
below:

drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/process_pptables_v1_0.c:203 get_platform_power_management_table() warn: struct type mismatch 'phm_ppm_table vs _ATOM_Tonga_PPM_Table'

Cc: Eric Huang
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
index f2a55c1413f5..17882f8dfdd3 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
@@ -200,7 +200,7 @@ static int get_platform_power_management_table(
 		struct pp_hwmgr *hwmgr,
 		ATOM_Tonga_PPM_Table *atom_ppm_table)
 {
-	struct phm_ppm_table *ptr = kzalloc(sizeof(ATOM_Tonga_PPM_Table), GFP_KERNEL);
+	struct phm_ppm_table *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
 	struct phm_ppt_v1_information *pp_table_information =
 		(struct phm_ppt_v1_information *)(hwmgr->pptable);
 
--
2.43.0
[PATCH AUTOSEL 5.15 13/19] drm/amdkfd: Fix lock dependency warning
From: Felix Kuehling

[ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

Signed-off-by: Felix Kuehling
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2cbe8ea16f24..e2d4e2b42a7c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -347,14 +347,9 @@ static void svm_range_bo_release(struct kref *kref)
 		spin_lock(&svm_bo->list_lock);
 	}
 	spin_unlock(&svm_bo->list_lock);
-	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
-		/* We're not in the eviction worker.
-		 * Signal the fence and synchronize with any
-		 * pending eviction work.
-		 */
+	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
+		/* We're not in the eviction worker. Signal the fence. */
 		dma_fence_signal(&svm_bo->eviction_fence->base);
-		cancel_work_sync(&svm_bo->eviction_work);
-	}
 	dma_fence_put(&svm_bo->eviction_fence->base);
 	amdgpu_bo_unref(&svm_bo->bo);
 	kfree(svm_bo);
@@ -2872,13 +2867,14 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange,
 
 int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence)
 {
-	if (!fence)
-		return -EINVAL;
-
-	if (dma_fence_is_signaled(&fence->base))
-		return 0;
-
-	if (fence->svm_bo) {
+	/* Dereferencing fence->svm_bo is safe here because the fence hasn't
+	 * signaled yet and we're under the protection of the fence->lock.
+	 * After the fence is signaled in svm_range_bo_release, we cannot get
+	 * here any more.
+	 *
+	 * Reference is dropped in svm_range_evict_svm_bo_worker.
+	 */
+	if (svm_bo_ref_unless_zero(fence->svm_bo)) {
 		WRITE_ONCE(fence->svm_bo->evicting, 1);
 		schedule_work(&fence->svm_bo->eviction_work);
 	}
@@ -2893,8 +2889,6 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
 	struct mm_struct *mm;
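For readers following the backports, the "reference unless zero" idea this patch relies on can be sketched in userspace C. This is only an illustration under stated assumptions: the demo_* names are hypothetical, C11 atomics stand in for the kernel's kref, the NULL check is an assumption of the sketch, and the actual work queueing is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct demo_bo {
	atomic_int refcount;
	int evicting;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool demo_bo_ref_unless_zero(struct demo_bo *bo)
{
	int old;

	if (!bo)
		return false;

	old = atomic_load(&bo->refcount);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&bo->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_bo_put(struct demo_bo *bo)
{
	return atomic_fetch_sub(&bo->refcount, 1) == 1;
}

/* Scheduler side: pin the object before any work is queued on it. */
static bool demo_schedule_evict(struct demo_bo *bo)
{
	if (!demo_bo_ref_unless_zero(bo))
		return false;	/* already on its way out, nothing to queue */

	bo->evicting = 1;
	/* A real driver would queue the worker here; the worker drops the ref. */
	return true;
}
```

Because the reference is taken before the work is queued, the release path never has to wait for pending work, which is the dependency the lockdep report complains about.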
[PATCH AUTOSEL 6.1 26/27] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'
From: Srinivasan Shanmugam

[ Upstream commit d7a254fad873775ce6c32b77796c81e81e6b7f2e ]

Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?

Suggested-by: Philip Yang
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 5188c4d2e7c0..7fa5e70f1aac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2553,6 +2553,7 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 {
 	struct vm_area_struct *vma;
 	struct interval_tree_node *node;
+	struct rb_node *rb_node;
 	unsigned long start_limit, end_limit;
 
 	vma = find_vma(p->mm, addr << PAGE_SHIFT);
@@ -2575,16 +2576,15 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 	if (node) {
 		end_limit = min(end_limit, node->start);
 		/* Last range that ends before the fault address */
-		node = container_of(rb_prev(&node->rb),
-				    struct interval_tree_node, rb);
+		rb_node = rb_prev(&node->rb);
 	} else {
 		/* Last range must end before addr because
 		 * there was no range after addr
 		 */
-		node = container_of(rb_last(&p->svms.objects.rb_root),
-				    struct interval_tree_node, rb);
+		rb_node = rb_last(&p->svms.objects.rb_root);
 	}
-	if (node) {
+	if (rb_node) {
+		node = container_of(rb_node, struct interval_tree_node, rb);
 		if (node->last >= addr) {
 			WARN(1, "Overlap with prev node and page fault addr\n");
 			return -EFAULT;
-- 
2.43.0
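The ordering this patch enforces (test the raw link pointer first, convert to the containing structure afterwards) can be shown with a small userspace sketch. Everything named demo_* is hypothetical, and the embedded link is deliberately not the first member here so that the offset is non-zero; the sketch makes no claim about the layout of the kernel's own structures.

```c
#include <stddef.h>

/* Userspace copy of the usual container_of idiom. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical link node, standing in for a tree or list linkage. */
struct demo_link {
	struct demo_link *prev;
};

struct demo_interval {
	unsigned long start;
	unsigned long last;
	struct demo_link link;	/* deliberately not the first member */
};

/* Return the interval before 'cur', or NULL if there is none. */
static struct demo_interval *demo_prev_interval(struct demo_interval *cur)
{
	struct demo_link *l = cur->link.prev;

	/*
	 * Check the raw link before converting: with a non-zero member
	 * offset, converting a NULL link would not yield NULL, so a later
	 * "if (interval)" test could not catch the missing neighbour.
	 */
	if (!l)
		return NULL;

	return demo_container_of(l, struct demo_interval, link);
}
```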
[PATCH AUTOSEL 6.1 25/27] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a5352e5e2bd4..4b91f95066ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1310,6 +1310,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
-- 
2.43.0
[PATCH AUTOSEL 6.1 24/27] drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'
From: Srinivasan Shanmugam

[ Upstream commit fac4ebd79fed60e79cccafdad45a2bb8d3795044 ]

The amdgpu_gmc_vram_checking() function in emulation checks whether
all of the memory range of shared system memory could be accessed by
GPU, from this aspect, -EIO is returned for error scenarios.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r'

Cc: Xiaojian Du
Cc: Lijo Lazar
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Christian König
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2bc791ed8830..ea0fb079f942 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -808,19 +808,26 @@ int amdgpu_gmc_vram_checking(struct amdgpu_device *adev)
 	 * seconds, so here, we just pick up three parts for emulation.
 	 */
 	ret = memcmp(vram_ptr, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + (size / 2), cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + size - 10, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
+release_buffer:
 	amdgpu_bo_free_kernel(&vram_bo, &vram_gpu,
 			&vram_ptr);
 
-	return 0;
+	return ret;
 }
-- 
2.43.0
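The shape of this fix (turn a raw memcmp() result into a proper errno and leave through one cleanup label) can be sketched in userspace C. The function and parameter names below are hypothetical, and a malloc'ed scratch copy stands in for the kernel buffer object that has to be released on every path.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compare three 10-byte samples of 'buf' (start, middle, end) against
 * 'pattern'. Returns 0 on match, -EIO on mismatch, and -EINVAL or
 * -ENOMEM if the check could not be set up.
 */
static int demo_check_buffer(const unsigned char *buf, size_t size,
			     const unsigned char *pattern)
{
	unsigned char *scratch;
	int ret = 0;

	if (size < 20)
		return -EINVAL;

	scratch = malloc(size);
	if (!scratch)
		return -ENOMEM;
	memcpy(scratch, buf, size);

	/* memcmp() only reports ordering; map any mismatch to -EIO. */
	if (memcmp(scratch, pattern, 10)) {
		ret = -EIO;
		goto release_buffer;
	}

	if (memcmp(scratch + size / 2, pattern, 10)) {
		ret = -EIO;
		goto release_buffer;
	}

	if (memcmp(scratch + size - 10, pattern, 10))
		ret = -EIO;

release_buffer:
	/* Single exit: the scratch buffer is freed on success and failure. */
	free(scratch);
	return ret;
}
```

Compared with returning straight out of each check, the single label also means the buffer is no longer leaked on the mismatch paths.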
[PATCH AUTOSEL 6.1 23/27] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
From: Srinivasan Shanmugam

[ Upstream commit 6616b5e1999146b1304abe78232af810080c67e3 ]

In 'struct phm_ppm_table *ptr' allocation using kzalloc, an incorrect
structure type is passed to sizeof() in kzalloc, larger structure types
were used, thus using correct type 'struct phm_ppm_table' fixes the
below:

drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/process_pptables_v1_0.c:203 get_platform_power_management_table() warn: struct type mismatch 'phm_ppm_table vs _ATOM_Tonga_PPM_Table'

Cc: Eric Huang
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
index f2a55c1413f5..17882f8dfdd3 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
@@ -200,7 +200,7 @@ static int get_platform_power_management_table(
 		struct pp_hwmgr *hwmgr,
 		ATOM_Tonga_PPM_Table *atom_ppm_table)
 {
-	struct phm_ppm_table *ptr = kzalloc(sizeof(ATOM_Tonga_PPM_Table), GFP_KERNEL);
+	struct phm_ppm_table *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
 	struct phm_ppt_v1_information *pp_table_information =
 		(struct phm_ppt_v1_information *)(hwmgr->pptable);
 
-- 
2.43.0
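The sizeof(*ptr) idiom adopted here can be illustrated in userspace with calloc() standing in for kzalloc(). Both structures below are hypothetical placeholders, not the real table layouts; the point is only that the allocation size follows the pointer's own type.

```c
#include <stdlib.h>

/* Hypothetical firmware-side layout (the "wrong" type to size by). */
struct demo_src_table {
	unsigned char raw[64];
};

/* Hypothetical driver-side structure that is actually being allocated. */
struct demo_dst_table {
	unsigned int a;
	unsigned int b;
};

/* Allocate a zeroed driver-side table; the caller frees it with free(). */
static struct demo_dst_table *demo_alloc_table(void)
{
	/*
	 * sizeof(*ptr) is derived from the pointer's type, so it stays
	 * correct if the type of 'ptr' changes and cannot be confused
	 * with struct demo_src_table.
	 */
	struct demo_dst_table *ptr = calloc(1, sizeof(*ptr));

	return ptr;
}
```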
[PATCH AUTOSEL 6.1 16/27] drm/amdkfd: Fix lock dependency warning
From: Felix Kuehling

[ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

Signed-off-by: Felix Kuehling
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 208812512d8a..4ecc4be1a910 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -380,14 +380,9 @@ static void svm_range_bo_release(struct kref *kref)
 		spin_lock(&svm_bo->list_lock);
 	}
 	spin_unlock(&svm_bo->list_lock);
-	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
-		/* We're not in the eviction worker.
-		 * Signal the fence and synchronize with any
-		 * pending eviction work.
-		 */
+	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
+		/* We're not in the eviction worker. Signal the fence. */
 		dma_fence_signal(&svm_bo->eviction_fence->base);
-		cancel_work_sync(&svm_bo->eviction_work);
-	}
 	dma_fence_put(&svm_bo->eviction_fence->base);
 	amdgpu_bo_unref(&svm_bo->bo);
 	kfree(svm_bo);
@@ -3310,13 +3305,14 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange,
 
 int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence)
 {
-	if (!fence)
-		return -EINVAL;
-
-	if (dma_fence_is_signaled(&fence->base))
-		return 0;
-
-	if (fence->svm_bo) {
+	/* Dereferencing fence->svm_bo is safe here because the fence hasn't
+	 * signaled yet and we're under the protection of the fence->lock.
+	 * After the fence is signaled in svm_range_bo_release, we cannot get
+	 * here any more.
+	 *
+	 * Reference is dropped in svm_range_evict_svm_bo_worker.
+	 */
+	if (svm_bo_ref_unless_zero(fence->svm_bo)) {
 		WRITE_ONCE(fence->svm_bo->evicting, 1);
 		schedule_work(&fence->svm_bo->eviction_work);
 	}
@@ -3331,8 +3327,6 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
 	int r = 0;
 
 	svm_bo =
[PATCH AUTOSEL 6.1 17/27] drm/amdkfd: Fix lock dependency warning with srcu
From: Philip Yang

[ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
(srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0

but task is already holding lock:
((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at: process_one_work+0x211/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
       __flush_work+0x88/0x4f0
       svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
       svm_range_set_attr+0xd6/0x14c0 [amdgpu]
       kfd_ioctl+0x1d1/0x630 [amdgpu]
       __x64_sys_ioctl+0x88/0xc0

-> #2 (>lock#2){+.+.}-{3:3}:
       __mutex_lock+0x99/0xc70
       amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
       restore_process_helper+0x22/0x80 [amdgpu]
       restore_process_worker+0x2d/0xa0 [amdgpu]
       process_one_work+0x29b/0x560
       worker_thread+0x3d/0x3d0

-> #1 ((work_completion)(&(>restore_work)->work)){+.+.}-{0:0}:
       __flush_work+0x88/0x4f0
       __cancel_work_timer+0x12c/0x1c0
       kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
       __mmu_notifier_release+0xad/0x240
       exit_mmap+0x6a/0x3a0
       mmput+0x6a/0x120
       do_exit+0x322/0xb90
       do_group_exit+0x37/0xa0
       __x64_sys_exit_group+0x18/0x20
       do_syscall_64+0x38/0x80

-> #0 (srcu){.+.+}-{0:0}:
       __lock_acquire+0x1521/0x2510
       lock_sync+0x5f/0x90
       __synchronize_srcu+0x4f/0x1a0
       __mmu_notifier_release+0x128/0x240
       exit_mmap+0x6a/0x3a0
       mmput+0x6a/0x120
       svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
       process_one_work+0x29b/0x560
       worker_thread+0x3d/0x3d0

other info that might help us debug this:

Chain exists of:
  srcu --> >lock#2 --> (work_completion)(&svms->deferred_list_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&svms->deferred_list_work));
                               lock(>lock#2);
                               lock((work_completion)(&svms->deferred_list_work));
  sync(srcu);

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4ecc4be1a910..5188c4d2e7c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2241,8 +2241,10 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		mutex_unlock(&svms->lock);
 		mmap_write_unlock(mm);
 
-		/* Pairs with mmget in svm_range_add_list_work */
-		mmput(mm);
+		/* Pairs with mmget in svm_range_add_list_work. If dropping the
+		 * last mm refcount, schedule release work to avoid circular locking
+		 */
+		mmput_async(mm);
 
 		spin_lock(&svms->deferred_list_lock);
 	}
-- 
2.43.0
[PATCH AUTOSEL 6.6 30/31] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'
From: Srinivasan Shanmugam

[ Upstream commit d7a254fad873775ce6c32b77796c81e81e6b7f2e ]

Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?

Suggested-by: Philip Yang
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b51224a85a38..87e9ca65e58e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2657,6 +2657,7 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 {
 	struct vm_area_struct *vma;
 	struct interval_tree_node *node;
+	struct rb_node *rb_node;
 	unsigned long start_limit, end_limit;
 
 	vma = vma_lookup(p->mm, addr << PAGE_SHIFT);
@@ -2676,16 +2677,15 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 	if (node) {
 		end_limit = min(end_limit, node->start);
 		/* Last range that ends before the fault address */
-		node = container_of(rb_prev(&node->rb),
-				    struct interval_tree_node, rb);
+		rb_node = rb_prev(&node->rb);
 	} else {
 		/* Last range must end before addr because
 		 * there was no range after addr
 		 */
-		node = container_of(rb_last(&p->svms.objects.rb_root),
-				    struct interval_tree_node, rb);
+		rb_node = rb_last(&p->svms.objects.rb_root);
 	}
-	if (node) {
+	if (rb_node) {
+		node = container_of(rb_node, struct interval_tree_node, rb);
 		if (node->last >= addr) {
 			WARN(1, "Overlap with prev node and page fault addr\n");
 			return -EFAULT;
-- 
2.43.0
[PATCH AUTOSEL 6.6 29/31] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 56d99ffbba2e..7791367e7c02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1218,6 +1218,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
-- 
2.43.0
[PATCH AUTOSEL 6.6 26/31] drm/amdgpu: fix avg vs input power reporting on smu7
From: Alex Deucher

[ Upstream commit 25852d4b97572ff62ffee574cb8bb4bc551af23a ]

Hawaii, Bonaire, Fiji, and Tonga support average power, the others
support current power.

Reviewed-by: Yang Wang
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 11372fcc59c8..a2c7b2e111fa 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -3995,6 +3995,7 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int idx,
 	uint32_t sclk, mclk, activity_percent;
 	uint32_t offset, val_vid;
 	struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
+	struct amdgpu_device *adev = hwmgr->adev;
 
 	/* size must be at least 4 bytes for all sensors */
 	if (*size < 4)
@@ -4038,7 +4039,21 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int idx,
 		*size = 4;
 		return 0;
 	case AMDGPU_PP_SENSOR_GPU_INPUT_POWER:
-		return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+		if ((adev->asic_type != CHIP_HAWAII) &&
+		    (adev->asic_type != CHIP_BONAIRE) &&
+		    (adev->asic_type != CHIP_FIJI) &&
+		    (adev->asic_type != CHIP_TONGA))
+			return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+		else
+			return -EOPNOTSUPP;
+	case AMDGPU_PP_SENSOR_GPU_AVG_POWER:
+		if ((adev->asic_type != CHIP_HAWAII) &&
+		    (adev->asic_type != CHIP_BONAIRE) &&
+		    (adev->asic_type != CHIP_FIJI) &&
+		    (adev->asic_type != CHIP_TONGA))
+			return -EOPNOTSUPP;
+		else
+			return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
 	case AMDGPU_PP_SENSOR_VDDGFX:
 		if ((data->vr_config & VRCONF_VDDGFX_MASK) ==
 		    (VR_SVI2_PLANE_2 << VRCONF_VDDGFX_SHIFT))
-- 
2.43.0
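The dispatch rule in this patch (four ASICs report average power, all others report input power, and the unsupported sensor returns -EOPNOTSUPP) can be restated as a small userspace sketch. The enums and function names are hypothetical, and the chip list is copied from the commit message; factoring the test into a helper is a choice of this sketch, not something the patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical ASIC and sensor identifiers for the illustration. */
enum demo_asic {
	DEMO_BONAIRE,
	DEMO_HAWAII,
	DEMO_TONGA,
	DEMO_FIJI,
	DEMO_OTHER,
};

enum demo_sensor {
	DEMO_SENSOR_INPUT_POWER,
	DEMO_SENSOR_AVG_POWER,
};

/* True for the chips the commit message lists as average-power parts. */
static bool demo_reports_avg_power(enum demo_asic asic)
{
	switch (asic) {
	case DEMO_HAWAII:
	case DEMO_BONAIRE:
	case DEMO_FIJI:
	case DEMO_TONGA:
		return true;
	default:
		return false;
	}
}

/* Returns 0 if 'sensor' is served on 'asic', -EOPNOTSUPP otherwise. */
static int demo_power_sensor_supported(enum demo_asic asic,
				       enum demo_sensor sensor)
{
	bool avg = demo_reports_avg_power(asic);

	if (sensor == DEMO_SENSOR_AVG_POWER)
		return avg ? 0 : -EOPNOTSUPP;

	return avg ? -EOPNOTSUPP : 0;
}
```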
[PATCH AUTOSEL 6.6 28/31] drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'
From: Srinivasan Shanmugam

[ Upstream commit fac4ebd79fed60e79cccafdad45a2bb8d3795044 ]

The amdgpu_gmc_vram_checking() function in emulation checks whether
all of the memory range of shared system memory could be accessed by
GPU, from this aspect, -EIO is returned for error scenarios.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r'

Cc: Xiaojian Du
Cc: Lijo Lazar
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Christian König
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index d78bd9732543..bc0eda1a729c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -876,21 +876,28 @@ int amdgpu_gmc_vram_checking(struct amdgpu_device *adev)
 	 * seconds, so here, we just pick up three parts for emulation.
 	 */
 	ret = memcmp(vram_ptr, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + (size / 2), cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + size - 10, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
+release_buffer:
 	amdgpu_bo_free_kernel(&vram_bo, &vram_gpu,
 			&vram_ptr);
 
-	return 0;
+	return ret;
 }
 
 static ssize_t current_memory_partition_show(
-- 
2.43.0
[PATCH AUTOSEL 6.6 27/31] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
From: Srinivasan Shanmugam

[ Upstream commit 6616b5e1999146b1304abe78232af810080c67e3 ]

In 'struct phm_ppm_table *ptr' allocation using kzalloc, an incorrect
structure type is passed to sizeof() in kzalloc, larger structure types
were used, thus using correct type 'struct phm_ppm_table' fixes the
below:

drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/process_pptables_v1_0.c:203 get_platform_power_management_table() warn: struct type mismatch 'phm_ppm_table vs _ATOM_Tonga_PPM_Table'

Cc: Eric Huang
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
index f2a55c1413f5..17882f8dfdd3 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
@@ -200,7 +200,7 @@ static int get_platform_power_management_table(
 		struct pp_hwmgr *hwmgr,
 		ATOM_Tonga_PPM_Table *atom_ppm_table)
 {
-	struct phm_ppm_table *ptr = kzalloc(sizeof(ATOM_Tonga_PPM_Table), GFP_KERNEL);
+	struct phm_ppm_table *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
 	struct phm_ppt_v1_information *pp_table_information =
 		(struct phm_ppt_v1_information *)(hwmgr->pptable);
 
-- 
2.43.0
[PATCH AUTOSEL 6.6 19/31] drm/amdkfd: Fix lock dependency warning
From: Felix Kuehling

[ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

Signed-off-by: Felix Kuehling
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8e368e4659fd..a4c911fa1675 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -391,14 +391,9 @@ static void svm_range_bo_release(struct kref *kref)
 		spin_lock(&svm_bo->list_lock);
 	}
 	spin_unlock(&svm_bo->list_lock);
-	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
-		/* We're not in the eviction worker.
-		 * Signal the fence and synchronize with any
-		 * pending eviction work.
-		 */
+	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
+		/* We're not in the eviction worker. Signal the fence. */
 		dma_fence_signal(&svm_bo->eviction_fence->base);
-		cancel_work_sync(&svm_bo->eviction_work);
-	}
 	dma_fence_put(&svm_bo->eviction_fence->base);
 	amdgpu_bo_unref(&svm_bo->bo);
 	kfree(svm_bo);
@@ -3424,13 +3419,14 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange,
 
 int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence)
 {
-	if (!fence)
-		return -EINVAL;
-
-	if (dma_fence_is_signaled(&fence->base))
-		return 0;
-
-	if (fence->svm_bo) {
+	/* Dereferencing fence->svm_bo is safe here because the fence hasn't
+	 * signaled yet and we're under the protection of the fence->lock.
+	 * After the fence is signaled in svm_range_bo_release, we cannot get
+	 * here any more.
+	 *
+	 * Reference is dropped in svm_range_evict_svm_bo_worker.
+	 */
+	if (svm_bo_ref_unless_zero(fence->svm_bo)) {
 		WRITE_ONCE(fence->svm_bo->evicting, 1);
 		schedule_work(&fence->svm_bo->eviction_work);
 	}
@@ -3445,8 +3441,6 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
 	int r = 0;
 
 	svm_bo =
[PATCH AUTOSEL 6.6 20/31] drm/amdkfd: Fix lock dependency warning with srcu
From: Philip Yang

[ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
(srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0

but task is already holding lock:
((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at: process_one_work+0x211/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
       __flush_work+0x88/0x4f0
       svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
       svm_range_set_attr+0xd6/0x14c0 [amdgpu]
       kfd_ioctl+0x1d1/0x630 [amdgpu]
       __x64_sys_ioctl+0x88/0xc0

-> #2 (>lock#2){+.+.}-{3:3}:
       __mutex_lock+0x99/0xc70
       amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
       restore_process_helper+0x22/0x80 [amdgpu]
       restore_process_worker+0x2d/0xa0 [amdgpu]
       process_one_work+0x29b/0x560
       worker_thread+0x3d/0x3d0

-> #1 ((work_completion)(&(>restore_work)->work)){+.+.}-{0:0}:
       __flush_work+0x88/0x4f0
       __cancel_work_timer+0x12c/0x1c0
       kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
       __mmu_notifier_release+0xad/0x240
       exit_mmap+0x6a/0x3a0
       mmput+0x6a/0x120
       do_exit+0x322/0xb90
       do_group_exit+0x37/0xa0
       __x64_sys_exit_group+0x18/0x20
       do_syscall_64+0x38/0x80

-> #0 (srcu){.+.+}-{0:0}:
       __lock_acquire+0x1521/0x2510
       lock_sync+0x5f/0x90
       __synchronize_srcu+0x4f/0x1a0
       __mmu_notifier_release+0x128/0x240
       exit_mmap+0x6a/0x3a0
       mmput+0x6a/0x120
       svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
       process_one_work+0x29b/0x560
       worker_thread+0x3d/0x3d0

other info that might help us debug this:

Chain exists of:
  srcu --> >lock#2 --> (work_completion)(&svms->deferred_list_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&svms->deferred_list_work));
                               lock(>lock#2);
                               lock((work_completion)(&svms->deferred_list_work));
  sync(srcu);

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index a4c911fa1675..b51224a85a38 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2343,8 +2343,10 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		mutex_unlock(&svms->lock);
 		mmap_write_unlock(mm);
 
-		/* Pairs with mmget in svm_range_add_list_work */
-		mmput(mm);
+		/* Pairs with mmget in svm_range_add_list_work. If dropping the
+		 * last mm refcount, schedule release work to avoid circular locking
+		 */
+		mmput_async(mm);
 
 		spin_lock(&svms->deferred_list_lock);
 	}
-- 
2.43.0
[PATCH AUTOSEL 6.7 38/39] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'
From: Srinivasan Shanmugam

[ Upstream commit d7a254fad873775ce6c32b77796c81e81e6b7f2e ]

Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?

Suggested-by: Philip Yang
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index f66f88d2b643..9af1d094385a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2680,6 +2680,7 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 {
 	struct vm_area_struct *vma;
 	struct interval_tree_node *node;
+	struct rb_node *rb_node;
 	unsigned long start_limit, end_limit;
 
 	vma = vma_lookup(p->mm, addr << PAGE_SHIFT);
@@ -2699,16 +2700,15 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
 	if (node) {
 		end_limit = min(end_limit, node->start);
 		/* Last range that ends before the fault address */
-		node = container_of(rb_prev(&node->rb),
-				    struct interval_tree_node, rb);
+		rb_node = rb_prev(&node->rb);
 	} else {
 		/* Last range must end before addr because
 		 * there was no range after addr
 		 */
-		node = container_of(rb_last(&p->svms.objects.rb_root),
-				    struct interval_tree_node, rb);
+		rb_node = rb_last(&p->svms.objects.rb_root);
 	}
-	if (node) {
+	if (rb_node) {
+		node = container_of(rb_node, struct interval_tree_node, rb);
 		if (node->last >= addr) {
 			WARN(1, "Overlap with prev node and page fault addr\n");
 			return -EFAULT;
-- 
2.43.0
[PATCH AUTOSEL 6.7 37/39] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
From: Srinivasan Shanmugam

[ Upstream commit 8a44fdd3cf91debbd09b43bd2519ad2b2486ccf4 ]

In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 93cf73d6fa11..16601d039dfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1485,6 +1485,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
 				return true;
 
 			fw_ver = *((uint32_t *)adev->pm.fw->data + 69);
+			release_firmware(adev->pm.fw);
 			if (fw_ver < 0x00160e00)
 				return true;
 		}
-- 
2.43.0
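The pattern in the patch above is "copy the one value you need out of the firmware image, then release the image before any return path". A rough user-space sketch of that ordering follows; it uses malloc/free in place of request_firmware()/release_firmware(), and `fw_blob_stub`, `fw_blob_alloc`, `fw_blob_release` and `read_word_and_release` are hypothetical names invented for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a loaded firmware image (hypothetical type). */
struct fw_blob_stub {
	size_t size;    /* size of data in bytes */
	uint8_t *data;
};

/* Allocate a zero-filled blob holding `words` 32-bit words. */
static struct fw_blob_stub *fw_blob_alloc(size_t words)
{
	struct fw_blob_stub *blob = calloc(1, sizeof(*blob));

	if (!blob)
		return NULL;
	blob->size = words * sizeof(uint32_t);
	blob->data = calloc(words, sizeof(uint32_t));
	if (!blob->data) {
		free(blob);
		return NULL;
	}
	return blob;
}

static void fw_blob_release(struct fw_blob_stub *blob)
{
	if (!blob)
		return;
	free(blob->data);
	free(blob);
}

/* Copy word `index` into *out, then release the blob on every path.
 * Returns 0 on success, -1 if the blob is missing or too short. */
static int read_word_and_release(struct fw_blob_stub *blob, size_t index,
				 uint32_t *out)
{
	int ret = -1;

	if (blob && (index + 1) * sizeof(uint32_t) <= blob->size) {
		memcpy(out, blob->data + index * sizeof(uint32_t), sizeof(*out));
		ret = 0;
	}
	fw_blob_release(blob);	/* the value is already copied out */
	return ret;
}
```

The bounds check is an addition of this sketch and is not part of the patch.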
[PATCH AUTOSEL 6.7 35/39] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
From: Srinivasan Shanmugam

[ Upstream commit 6616b5e1999146b1304abe78232af810080c67e3 ]

In 'struct phm_ppm_table *ptr' allocation using kzalloc, an incorrect
structure type is passed to sizeof() in kzalloc, larger structure types
were used, thus using correct type 'struct phm_ppm_table' fixes the
below:

drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/process_pptables_v1_0.c:203 get_platform_power_management_table() warn: struct type mismatch 'phm_ppm_table vs _ATOM_Tonga_PPM_Table'

Cc: Eric Huang
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
index f2a55c1413f5..17882f8dfdd3 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c
@@ -200,7 +200,7 @@ static int get_platform_power_management_table(
 		struct pp_hwmgr *hwmgr,
 		ATOM_Tonga_PPM_Table *atom_ppm_table)
 {
-	struct phm_ppm_table *ptr = kzalloc(sizeof(ATOM_Tonga_PPM_Table), GFP_KERNEL);
+	struct phm_ppm_table *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
 	struct phm_ppt_v1_information *pp_table_information =
 		(struct phm_ppt_v1_information *)(hwmgr->pptable);
 
-- 
2.43.0
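The sizeof(*ptr) idiom used in the fix above ties the allocation size to the pointer's own type, so the size cannot drift if a different structure name is typed by mistake. A small user-space sketch with calloc() standing in for kzalloc(); `wide_table_stub`, `narrow_table_stub` and `narrow_table_alloc` are hypothetical names, and the struct layouts are made up.

```c
#include <stdlib.h>

/* Stand-in for a larger, firmware-defined table (hypothetical layout). */
struct wide_table_stub {
	char raw[64];
};

/* Stand-in for the smaller driver-side table actually being allocated. */
struct narrow_table_stub {
	int a;
	int b;
};

/* Allocate one zeroed narrow table. sizeof(*ptr) follows the declared
 * pointer type, so no second type name has to be kept in sync by hand. */
static struct narrow_table_stub *narrow_table_alloc(void)
{
	struct narrow_table_stub *ptr = calloc(1, sizeof(*ptr));

	return ptr;	/* may be NULL on allocation failure */
}
```

Writing sizeof(struct wide_table_stub) here would still work at run time but would over-allocate, which is the kind of mismatch the static checker warning in the commit message points at.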
[PATCH AUTOSEL 6.7 34/39] drm/amdgpu: fix avg vs input power reporting on smu7
From: Alex Deucher

[ Upstream commit 25852d4b97572ff62ffee574cb8bb4bc551af23a ]

Hawaii, Bonaire, Fiji, and Tonga support average power, the others
support current power.

Reviewed-by: Yang Wang
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 11372fcc59c8..a2c7b2e111fa 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -3995,6 +3995,7 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int idx,
 	uint32_t sclk, mclk, activity_percent;
 	uint32_t offset, val_vid;
 	struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
+	struct amdgpu_device *adev = hwmgr->adev;
 
 	/* size must be at least 4 bytes for all sensors */
 	if (*size < 4)
@@ -4038,7 +4039,21 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int idx,
 		*size = 4;
 		return 0;
 	case AMDGPU_PP_SENSOR_GPU_INPUT_POWER:
-		return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+		if ((adev->asic_type != CHIP_HAWAII) &&
+		    (adev->asic_type != CHIP_BONAIRE) &&
+		    (adev->asic_type != CHIP_FIJI) &&
+		    (adev->asic_type != CHIP_TONGA))
+			return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+		else
+			return -EOPNOTSUPP;
+	case AMDGPU_PP_SENSOR_GPU_AVG_POWER:
+		if ((adev->asic_type != CHIP_HAWAII) &&
+		    (adev->asic_type != CHIP_BONAIRE) &&
+		    (adev->asic_type != CHIP_FIJI) &&
+		    (adev->asic_type != CHIP_TONGA))
+			return -EOPNOTSUPP;
+		else
+			return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
 	case AMDGPU_PP_SENSOR_VDDGFX:
 		if ((data->vr_config & VRCONF_VDDGFX_MASK) ==
 		    (VR_SVI2_PLANE_2 << VRCONF_VDDGFX_SHIFT))
-- 
2.43.0
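The two switch cases added above repeat the same four-way chip test with the branches swapped. One way to read the logic is as a single predicate ("does this chip report average power?") compared against the requested sensor kind. The sketch below restates that in user-space C; the enums, `chip_reports_average` and `read_power_stub` are hypothetical names for illustration only, and the returned reading is a placeholder.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the ASIC and sensor identifiers (hypothetical enums). */
enum chip_stub {
	CHIP_STUB_HAWAII,
	CHIP_STUB_BONAIRE,
	CHIP_STUB_FIJI,
	CHIP_STUB_TONGA,
	CHIP_STUB_OTHER,	/* any other chip handled by the same code */
};

enum sensor_stub {
	SENSOR_STUB_INPUT_POWER,
	SENSOR_STUB_AVG_POWER,
};

/* Per the commit message: these four report average power,
 * the others report current (input) power. */
static bool chip_reports_average(enum chip_stub chip)
{
	switch (chip) {
	case CHIP_STUB_HAWAII:
	case CHIP_STUB_BONAIRE:
	case CHIP_STUB_FIJI:
	case CHIP_STUB_TONGA:
		return true;
	default:
		return false;
	}
}

/* Serve the request only when the sensor kind matches what the chip
 * reports; otherwise fail with -EOPNOTSUPP. */
static int read_power_stub(enum chip_stub chip, enum sensor_stub sensor,
			   unsigned int *value)
{
	bool want_avg = (sensor == SENSOR_STUB_AVG_POWER);

	if (want_avg != chip_reports_average(chip))
		return -EOPNOTSUPP;

	*value = 42;	/* placeholder reading, not a real measurement */
	return 0;
}
```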
[PATCH AUTOSEL 6.7 27/39] Revert "drm/amd/display: Fix conversions between bytes and KB"
From: Daniel Miess [ Upstream commit bf282eb92b84709d99186ad5940b9997eb3c1ff2 ] This reverts commit d0f639c5869399bf6dde4d694d5f8c0ab8c0ec46. The previous commit causes failure to light up for 1080p eDP + 8k HDMI panel combo. Reviewed-by: Charlene Liu Acked-by: Rodrigo Siqueira Signed-off-by: Daniel Miess Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../amd/display/dc/dml2/display_mode_core.c| 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c index b95bf27f2fe2..a6b938a12de1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c @@ -6229,7 +6229,7 @@ static void set_calculate_prefetch_schedule_params(struct display_mode_lib_st *m CalculatePrefetchSchedule_params->GPUVMEnable = mode_lib->ms.cache_display_cfg.plane.GPUVMEnable; CalculatePrefetchSchedule_params->HostVMEnable = mode_lib->ms.cache_display_cfg.plane.HostVMEnable; CalculatePrefetchSchedule_params->HostVMMaxNonCachedPageTableLevels = mode_lib->ms.cache_display_cfg.plane.HostVMMaxPageTableLevels; - CalculatePrefetchSchedule_params->HostVMMinPageSize = mode_lib->ms.soc.hostvm_min_page_size_kbytes; + CalculatePrefetchSchedule_params->HostVMMinPageSize = mode_lib->ms.soc.hostvm_min_page_size_kbytes * 1024; CalculatePrefetchSchedule_params->DynamicMetadataEnable = mode_lib->ms.cache_display_cfg.plane.DynamicMetadataEnable[k]; CalculatePrefetchSchedule_params->DynamicMetadataVMEnabled = mode_lib->ms.ip.dynamic_metadata_vm_enabled; CalculatePrefetchSchedule_params->DynamicMetadataLinesBeforeActiveRequired = mode_lib->ms.cache_display_cfg.plane.DynamicMetadataLinesBeforeActiveRequired[k]; @@ -6329,7 +6329,7 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.NoOfDPPThisState, mode_lib->ms.dpte_group_bytes, s->HostVMInefficiencyFactor, - 
mode_lib->ms.soc.hostvm_min_page_size_kbytes, + mode_lib->ms.soc.hostvm_min_page_size_kbytes * 1024, mode_lib->ms.cache_display_cfg.plane.HostVMMaxPageTableLevels); s->NextMaxVStartup = s->MaxVStartupAllPlanes[j]; @@ -6542,7 +6542,7 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.cache_display_cfg.plane.HostVMEnable, mode_lib->ms.cache_display_cfg.plane.HostVMMaxPageTableLevels, mode_lib->ms.cache_display_cfg.plane.GPUVMEnable, - mode_lib->ms.soc.hostvm_min_page_size_kbytes, + mode_lib->ms.soc.hostvm_min_page_size_kbytes * 1024, mode_lib->ms.PDEAndMetaPTEBytesPerFrame[j][k], mode_lib->ms.MetaRowBytes[j][k], mode_lib->ms.DPTEBytesPerRow[j][k], @@ -7687,7 +7687,7 @@ dml_bool_t dml_core_mode_support(struct display_mode_lib_st *mode_lib) CalculateVMRowAndSwath_params->HostVMMaxNonCachedPageTableLevels = mode_lib->ms.cache_display_cfg.plane.HostVMMaxPageTableLevels; CalculateVMRowAndSwath_params->GPUVMMaxPageTableLevels = mode_lib->ms.cache_display_cfg.plane.GPUVMMaxPageTableLevels; CalculateVMRowAndSwath_params->GPUVMMinPageSizeKBytes = mode_lib->ms.cache_display_cfg.plane.GPUVMMinPageSizeKBytes; - CalculateVMRowAndSwath_params->HostVMMinPageSize = mode_lib->ms.soc.hostvm_min_page_size_kbytes; + CalculateVMRowAndSwath_params->HostVMMinPageSize = mode_lib->ms.soc.hostvm_min_page_size_kbytes * 1024; CalculateVMRowAndSwath_params->PTEBufferModeOverrideEn = mode_lib->ms.cache_display_cfg.plane.PTEBufferModeOverrideEn; CalculateVMRowAndSwath_params->PTEBufferModeOverrideVal = mode_lib->ms.cache_display_cfg.plane.PTEBufferMode; CalculateVMRowAndSwath_params->PTEBufferSizeNotExceeded = mode_lib->ms.PTEBufferSizeNotExceededPerState; @@ -7957,7 +7957,7 @@ dml_bool_t dml_core_mode_support(struct display_mode_lib_st *mode_lib) UseMinimumDCFCLK_params->GPUVMMaxPageTableLevels = mode_lib->ms.cache_display_cfg.plane.GPUVMMaxPageTableLevels; UseMinimumDCFCLK_params->HostVMEnable =
[PATCH AUTOSEL 6.7 28/39] drm/amdkfd: Fix lock dependency warning with srcu
From: Philip Yang

[ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]

==
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
--
kworker/0:2/996 is trying to acquire lock:
        (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0

but task is already holding lock:
        ((work_completion)(>deferred_list_work)){+.+.}-{0:0}, at:
        process_one_work+0x211/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(>deferred_list_work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
        svm_range_set_attr+0xd6/0x14c0 [amdgpu]
        kfd_ioctl+0x1d1/0x630 [amdgpu]
        __x64_sys_ioctl+0x88/0xc0

-> #2 (>lock#2){+.+.}-{3:3}:
        __mutex_lock+0x99/0xc70
        amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
        restore_process_helper+0x22/0x80 [amdgpu]
        restore_process_worker+0x2d/0xa0 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

-> #1 ((work_completion)(&(>restore_work)->work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        __cancel_work_timer+0x12c/0x1c0
        kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
        __mmu_notifier_release+0xad/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        do_exit+0x322/0xb90
        do_group_exit+0x37/0xa0
        __x64_sys_exit_group+0x18/0x20
        do_syscall_64+0x38/0x80

-> #0 (srcu){.+.+}-{0:0}:
        __lock_acquire+0x1521/0x2510
        lock_sync+0x5f/0x90
        __synchronize_srcu+0x4f/0x1a0
        __mmu_notifier_release+0x128/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

other info that might help us debug this:
Chain exists of:
  srcu --> >lock#2 --> (work_completion)(>deferred_list_work)

Possible unsafe locking scenario:

        CPU0                    CPU1
  lock((work_completion)(>deferred_list_work));
                        lock(>lock#2);
                        lock((work_completion)(>deferred_list_work));
  sync(srcu);

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 92d8b1513e57..f66f88d2b643 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2366,8 +2366,10 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		mutex_unlock(>lock);
 		mmap_write_unlock(mm);
 
-		/* Pairs with mmget in svm_range_add_list_work */
-		mmput(mm);
+		/* Pairs with mmget in svm_range_add_list_work. If dropping the
+		 * last mm refcount, schedule release work to avoid circular locking
+		 */
+		mmput_async(mm);
 
 		spin_lock(>deferred_list_lock);
 	}
-- 
2.43.0
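The change from mmput() to mmput_async() above means that when the last reference is dropped, the teardown does not run in the caller's context (where the deferred-list work item is still "held" from lockdep's point of view) but is handed to separate work. The following single-threaded user-space sketch illustrates only that ordering. It has no atomics and no real workqueue, and `obj_stub`, `obj_put_sync`, `obj_put_async` and `run_pending_releases` are hypothetical names rather than kernel API.

```c
#include <stddef.h>

/* Stand-in for a refcounted object (hypothetical; not struct mm_struct). */
struct obj_stub {
	int refs;
	int released;			/* set once teardown has run */
	struct obj_stub *next_pending;	/* link in the deferred-release list */
};

/* Objects whose last reference was dropped asynchronously. */
static struct obj_stub *pending_releases;

static void obj_release(struct obj_stub *o)
{
	o->released = 1;	/* real teardown could take further locks here */
}

/* Synchronous put: teardown runs inline, in the caller's locking context. */
static void obj_put_sync(struct obj_stub *o)
{
	if (--o->refs == 0)
		obj_release(o);
}

/* Asynchronous put: on the final drop, only queue the object. */
static void obj_put_async(struct obj_stub *o)
{
	if (--o->refs == 0) {
		o->next_pending = pending_releases;
		pending_releases = o;
	}
}

/* Stand-in for the worker that later performs the queued teardowns. */
static void run_pending_releases(void)
{
	while (pending_releases) {
		struct obj_stub *o = pending_releases;

		pending_releases = o->next_pending;
		obj_release(o);
	}
}
```

In this sketch the caller of obj_put_async() returns before obj_release() runs, which is the property the patch relies on to break the dependency chain shown in the trace.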
[PATCH AUTOSEL 6.7 36/39] drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'
From: Srinivasan Shanmugam

[ Upstream commit fac4ebd79fed60e79cccafdad45a2bb8d3795044 ]

The amdgpu_gmc_vram_checking() function in emulation checks whether
all of the memory range of shared system memory could be accessed by
GPU, from this aspect, -EIO is returned for error scenarios.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r'

Cc: Xiaojian Du
Cc: Lijo Lazar
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Christian König
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index d2f273d77e59..55784a9f26c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -1045,21 +1045,28 @@ int amdgpu_gmc_vram_checking(struct amdgpu_device *adev)
 	 * seconds, so here, we just pick up three parts for emulation.
 	 */
 	ret = memcmp(vram_ptr, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + (size / 2), cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
 	ret = memcmp(vram_ptr + size - 10, cptr, 10);
-	if (ret)
-		return ret;
+	if (ret) {
+		ret = -EIO;
+		goto release_buffer;
+	}
 
+release_buffer:
 	amdgpu_bo_free_kernel(_bo, _gpu,
 			_ptr);
 
-	return 0;
+	return ret;
 }
 
 static ssize_t current_memory_partition_show(
-- 
2.43.0
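Two things change in the patch above: a non-zero memcmp() result (which is only "less than" or "greater than", not an errno) is mapped to -EIO, and every failure path now goes through one label that frees the buffer. A minimal user-space sketch of the same shape follows, with malloc/free standing in for the buffer object; `check_three_spots` is a hypothetical name, and the extra -EINVAL and -ENOMEM checks are additions of this sketch rather than part of the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Compare the start, middle and end of mem against pattern.
 * Returns 0 on match, -EIO on mismatch, -EINVAL or -ENOMEM on setup errors.
 * The scratch buffer is freed on every path after allocation. */
static int check_three_spots(const unsigned char *mem, size_t size,
			     const void *pattern, size_t plen)
{
	unsigned char *scratch;
	int ret = 0;

	if (plen == 0 || size < 2 * plen)
		return -EINVAL;	/* middle probe must fit inside mem */

	scratch = malloc(plen);
	if (!scratch)
		return -ENOMEM;
	memcpy(scratch, pattern, plen);

	if (memcmp(mem, scratch, plen)) {
		ret = -EIO;	/* do not leak memcmp's raw result */
		goto release_buffer;
	}

	if (memcmp(mem + size / 2, scratch, plen)) {
		ret = -EIO;
		goto release_buffer;
	}

	if (memcmp(mem + size - plen, scratch, plen)) {
		ret = -EIO;
		goto release_buffer;
	}

release_buffer:
	free(scratch);
	return ret;
}
```

With the earlier `return ret;` style, a mismatch would have skipped the free and handed a positive or negative memcmp() value to callers expecting an error code.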
[PATCH AUTOSEL 6.7 26/39] drm/amd/display: To adjust dprefclk by down spread percentage
From: Martin Tsai [ Upstream commit 17e74e11ac2b46e7514705ae7abfb93ac0e20bd6 ] [Why] Panels show corruption with high refresh rate timings when ssc is enabled. [How] Read down-spread percentage from lut to adjust dprefclk. Issues come from S0i3 with this commit has been fixed by SMU. Reviewed-by: Nicholas Kazlauskas Acked-by: Rodrigo Siqueira Signed-off-by: Martin Tsai Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../dc/clk_mgr/dcn314/dcn314_clk_mgr.c| 71 ++- .../dc/clk_mgr/dcn314/dcn314_clk_mgr.h| 11 +++ .../gpu/drm/amd/display/dc/dce/dce_audio.c| 2 +- .../drm/amd/display/dc/dce/dce_clock_source.c | 9 ++- .../amd/display/dc/hwss/dce110/dce110_hwseq.c | 2 +- .../gpu/drm/amd/display/dc/inc/hw/clk_mgr.h | 1 + .../gpu/drm/amd/display/include/audio_types.h | 2 +- 7 files changed, 93 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c index 7326b7565846..bf17e78a0ae1 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c @@ -87,6 +87,20 @@ static const struct IP_BASE CLK_BASE = { { { { 0x00016C00, 0x02401800, 0, 0, 0, #define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0xF000L #define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xL +#define regCLK1_CLK2_BYPASS_CNTL 0x029c +#define regCLK1_CLK2_BYPASS_CNTL_BASE_IDX 0 + +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL__SHIFT 0x0 +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV__SHIFT 0x10 +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK0x0007L +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK0x000FL + +#define regCLK6_0_CLK6_spll_field_80x464b +#define regCLK6_0_CLK6_spll_field_8_BASE_IDX 0 + +#define CLK6_0_CLK6_spll_field_8__spll_ssc_en__SHIFT 0xd +#define CLK6_0_CLK6_spll_field_8__spll_ssc_en_MASK 0x2000L + #define REG(reg_name) \ (CLK_BASE.instance[0].segment[reg ## reg_name ## 
_BASE_IDX] + reg ## reg_name) @@ -160,6 +174,37 @@ static void dcn314_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct dc_state } } +bool dcn314_is_spll_ssc_enabled(struct clk_mgr *clk_mgr_base) +{ + struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base); + uint32_t ssc_enable; + + REG_GET(CLK6_0_CLK6_spll_field_8, spll_ssc_en, _enable); + + return ssc_enable == 1; +} + +void dcn314_init_clocks(struct clk_mgr *clk_mgr) +{ + struct clk_mgr_internal *clk_mgr_int = TO_CLK_MGR_INTERNAL(clk_mgr); + uint32_t ref_dtbclk = clk_mgr->clks.ref_dtbclk_khz; + + memset(&(clk_mgr->clks), 0, sizeof(struct dc_clocks)); + // Assumption is that boot state always supports pstate + clk_mgr->clks.ref_dtbclk_khz = ref_dtbclk; // restore ref_dtbclk + clk_mgr->clks.p_state_change_support = true; + clk_mgr->clks.prev_p_state_change_support = true; + clk_mgr->clks.pwr_state = DCN_PWR_STATE_UNKNOWN; + clk_mgr->clks.zstate_support = DCN_ZSTATE_SUPPORT_UNKNOWN; + + // to adjust dp_dto reference clock if ssc is enable otherwise to apply dprefclk + if (dcn314_is_spll_ssc_enabled(clk_mgr)) + clk_mgr->dp_dto_source_clock_in_khz = + dce_adjust_dp_ref_freq_for_ss(clk_mgr_int, clk_mgr->dprefclk_khz); + else + clk_mgr->dp_dto_source_clock_in_khz = clk_mgr->dprefclk_khz; +} + void dcn314_update_clocks(struct clk_mgr *clk_mgr_base, struct dc_state *context, bool safe_to_lower) @@ -436,6 +481,11 @@ static DpmClocks314_t dummy_clocks; static struct dcn314_watermarks dummy_wms = { 0 }; +static struct dcn314_ss_info_table ss_info_table = { + .ss_divider = 1000, + .ss_percentage = {0, 0, 375, 375, 375} +}; + static void dcn314_build_watermark_ranges(struct clk_bw_params *bw_params, struct dcn314_watermarks *table) { int i, num_valid_sets; @@ -708,13 +758,31 @@ static struct clk_mgr_funcs dcn314_funcs = { .get_dp_ref_clk_frequency = dce12_get_dp_ref_freq_khz, .get_dtb_ref_clk_frequency = dcn31_get_dtb_ref_freq_khz, .update_clocks = dcn314_update_clocks, - .init_clocks = dcn31_init_clocks, + 
.init_clocks = dcn314_init_clocks, .enable_pme_wa = dcn314_enable_pme_wa, .are_clock_states_equal = dcn314_are_clock_states_equal, .notify_wm_ranges = dcn314_notify_wm_ranges }; extern struct clk_mgr_funcs dcn3_fpga_funcs; +static void dcn314_read_ss_info_from_lut(struct clk_mgr_internal *clk_mgr) +{ + uint32_t clock_source; + //uint32_t ssc_enable; + + REG_GET(CLK1_CLK2_BYPASS_CNTL, CLK2_BYPASS_SEL, _source); + //REG_GET(CLK6_0_CLK6_spll_field_8, spll_ssc_en,
[PATCH AUTOSEL 6.7 25/39] drm/amdkfd: Fix lock dependency warning
From: Felix Kuehling [ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ] == WARNING: possible circular locking dependency detected 6.5.0-kfd-fkuehlin #276 Not tainted -- kworker/8:2/2676 is trying to acquire lock: 9435aae95c88 ((work_completion)(_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550 but task is already holding lock: 9435cd8e1720 (>lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (>lock){+.+.}-{3:3}: __mutex_lock+0x97/0xd30 kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu] kfd_ioctl+0x1b2/0x5d0 [amdgpu] __x64_sys_ioctl+0x86/0xc0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #1 (>mmap_lock){}-{3:3}: down_read+0x42/0x160 svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu] process_one_work+0x27a/0x540 worker_thread+0x53/0x3e0 kthread+0xeb/0x120 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x11/0x20 -> #0 ((work_completion)(_bo->eviction_work)){+.+.}-{0:0}: __lock_acquire+0x1426/0x2200 lock_acquire+0xc1/0x2b0 __flush_work+0x80/0x550 __cancel_work_timer+0x109/0x190 svm_range_bo_release+0xdc/0x1c0 [amdgpu] svm_range_free+0x175/0x180 [amdgpu] svm_range_deferred_list_work+0x15d/0x340 [amdgpu] process_one_work+0x27a/0x540 worker_thread+0x53/0x3e0 kthread+0xeb/0x120 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x11/0x20 other info that might help us debug this: Chain exists of: (work_completion)(_bo->eviction_work) --> >mmap_lock --> >lock Possible unsafe locking scenario: CPU0CPU1 lock(>lock); lock(>mmap_lock); lock(>lock); lock((work_completion)(_bo->eviction_work)); I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-0. That means it's impossible that svm_range_bo_release is running concurrently. However, there is no good way to annotate this. 
To avoid the problem, take a BO reference in svm_range_schedule_evict_svm_bo instead of in the worker. That way it's impossible for a BO to get freed while eviction work is pending and the cancel_work_sync call in svm_range_bo_release can be eliminated. v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also removed redundant checks that are already done in amdkfd_fence_enable_signaling. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index a15bfb5223e8..92d8b1513e57 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -400,14 +400,9 @@ static void svm_range_bo_release(struct kref *kref) spin_lock(_bo->list_lock); } spin_unlock(_bo->list_lock); - if (!dma_fence_is_signaled(_bo->eviction_fence->base)) { - /* We're not in the eviction worker. -* Signal the fence and synchronize with any -* pending eviction work. -*/ + if (!dma_fence_is_signaled(_bo->eviction_fence->base)) + /* We're not in the eviction worker. Signal the fence. */ dma_fence_signal(_bo->eviction_fence->base); - cancel_work_sync(_bo->eviction_work); - } dma_fence_put(_bo->eviction_fence->base); amdgpu_bo_unref(_bo->bo); kfree(svm_bo); @@ -3447,13 +3442,14 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange, int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence) { - if (!fence) - return -EINVAL; - - if (dma_fence_is_signaled(>base)) - return 0; - - if (fence->svm_bo) { + /* Dereferencing fence->svm_bo is safe here because the fence hasn't +* signaled yet and we're under the protection of the fence->lock. +* After the fence is signaled in svm_range_bo_release, we cannot get +* here any more. +* +* Reference is dropped in svm_range_evict_svm_bo_worker. 
+*/ + if (svm_bo_ref_unless_zero(fence->svm_bo)) { WRITE_ONCE(fence->svm_bo->evicting, 1); schedule_work(>svm_bo->eviction_work); } @@ -3468,8 +3464,6 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work) int r = 0; svm_bo =