[AMD Official Use Only - General] Hey,
Due to mode-1 reset (pending_reset), the amdgpu_amdkfd_device_init will not be called and hence adev->kfd.init_complete will not be set. The function amdgpu_amdkfd_drm_client_create has condition: if (!adev->kfd.init_complete) return 0; So, in probe function, when we return from device_init the KFD is not initialized and amdgpu_amdkfd_drm_client_create returns without doing anything. Thanks, Ahmad ________________________________ From: Kuehling, Felix <felix.kuehl...@amd.com> Sent: Monday, March 4, 2024 6:39 PM To: Rehman, Ahmad <ahmad.reh...@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org> Cc: Wan, Gavin <gavin....@amd.com> Subject: Re: [PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload On 2024-03-04 17:05, Ahmad Rehman wrote: > In passthrough environment, when amdgpu is reloaded after unload, mode-1 > is triggered after initializing the necessary IPs, That init does not > include KFD, and KFD init waits until the reset is completed. KFD init > is called in the reset handler, but in this case, the zone device and > drm client is not initialized, causing app to create kernel panic. > > Signed-off-by: Ahmad Rehman <ahmad.reh...@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 15b188aaf681..80b9642f2bc4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2479,8 +2479,11 @@ static void > amdgpu_drv_delayed_reset_work_handler(struct work_struct *work) > } > for (i = 0; i < mgpu_info.num_dgpu; i++) { > adev = mgpu_info.gpu_ins[i].adev; > - if (!adev->kfd.init_complete) > + if (!adev->kfd.init_complete) { > + kgd2kfd_init_zone_device(adev); > amdgpu_amdkfd_device_init(adev); > + amdgpu_amdkfd_drm_client_create(adev); I don't see what's preventing the DRM client initialization in the reset-on-driver-load case. It only needs to be created once and that happens in amdgpu_pci_probe. Am I missing anything? Regards, Felix > + } > amdgpu_ttm_set_buffer_funcs_status(adev, true); > } > }