Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
On Wed, Mar 15, 2023 at 08:52:30AM +0800, Stefano Stabellini wrote:
> On Mon, 13 Mar 2023, Jan Beulich wrote:
> > On 12.03.2023 13:01, Huang Rui wrote:
> > > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > > virtualization support when possible. It will using the hardware IOMMU
> > > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > > Xen PVH.
> >
> > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > away without resorting to swiotlb in certain cases (like I/O to an
> > address-restricted device)?
>
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> so we can use guest physical addresses instead of machine addresses for
> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> case is XENFEAT_not_direct_mapped).

Hi Jan, sorry for the late reply.

We are using the native kernel amdgpu and ttm drivers in Dom0. amdgpu/ttm would like to use the IOMMU to allocate coherent buffers for userptr, which maps user-space memory for GPU access; however, swiotlb doesn't support this. In other words, with swiotlb we can only handle the buffer page by page.

Thanks,
Ray

>
> Jurgen, what do you think? Would you rather make xen_swiotlb_detect
> common between ARM and x86?
RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
[AMD Official Use Only - General]

I'm OK with dropping si_set_temperature_range() from late_init. However, it is still not clear to me how this could lead to a reboot exception. Can you dig into this a little further? For example, can you check whether si_thermal_start_thermal_controller() has actually already failed in hw_init (in si_dpm_enable, more specifically)?

@@ -6918,7 +6918,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	si_start_dpm(adev);
 	si_enable_auto_throttle_source(adev, SI_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
-	si_thermal_start_thermal_controller(adev);
+	ret = si_thermal_start_thermal_controller(adev);
+	if (ret) {
+		DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+		return ret;
+	}
 	ni_update_current_ps(adev, boot_ps);

BR
Evan

> -----Original Message-----
> From: amd-gfx On Behalf Of Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun
> Cc: David Airlie; Pan, Xinhui; Zhenneng Li; amd-gfx@lists.freedesktop.org;
> Daniel Vetter; Deucher, Alexander; Koenig, Christian
> Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
>
> During reboot test on arm64 platform, it may failure on boot.
>
> The error message are as follows:
> [6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block failed -22
> [7.006919][ 7] [ T295] amdgpu :04:00.0: amdgpu_device_ip_late_init failed
> [7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12
> 1 file changed, 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
>
>  static int si_dpm_late_init(void *handle)
>  {
> -	int ret;
> -	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> -	if (!adev->pm.dpm_enabled)
> -		return 0;
> -
> -	ret = si_set_temperature_range(adev);
> -	if (ret)
> -		return ret;
> -#if 0 //TODO ?
> -	si_dpm_powergate_uvd(adev, true);
> -#endif
>  	return 0;
>  }
>
> --
> 2.25.1
Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
On Mon, 13 Mar 2023, Jan Beulich wrote:
> On 12.03.2023 13:01, Huang Rui wrote:
> > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > virtualization support when possible. It will using the hardware IOMMU
> > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > Xen PVH.
>
> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> away without resorting to swiotlb in certain cases (like I/O to an
> address-restricted device)?

I think Ray meant that, thanks to the IOMMU setup by Xen, there is no need for swiotlb-xen in Dom0. Address translations are done by the IOMMU so we can use guest physical addresses instead of machine addresses for DMA. This is a similar case to Dom0 on ARM when the IOMMU is available (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding case is XENFEAT_not_direct_mapped).

Jurgen, what do you think? Would you rather make xen_swiotlb_detect common between ARM and x86?
Re: [PATCH] drm/amdgpu: Don't resume IOMMU after incomplete init
On Tue, Mar 14, 2023 at 1:54 PM Felix Kuehling wrote:
>
> Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other
> kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
> KFD IOMMU initialization failed.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
> Cc: Vasant Hegde
> Cc: Linux regression tracking (Thorsten Leemhuis)
> Cc: sta...@vger.kernel.org
> Signed-off-by: Felix Kuehling

Acked-by: Alex Deucher

> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 ++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 521dfa88aad8..989c6aa2620b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -60,6 +60,7 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
>  				unsigned int chunk_size);
>  static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
>
> +static int kfd_resume_iommu(struct kfd_dev *kfd);
>  static int kfd_resume(struct kfd_dev *kfd);
>
>  static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
> @@ -625,7 +626,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
>  	svm_migrate_init(kfd->adev);
>
> -	if (kgd2kfd_resume_iommu(kfd))
> +	if (kfd_resume_iommu(kfd))
>  		goto device_iommu_error;
>
>  	if (kfd_resume(kfd))
> @@ -773,6 +774,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
>  }
>
>  int kgd2kfd_resume_iommu(struct kfd_dev *kfd)
> +{
> +	if (!kfd->init_complete)
> +		return 0;
> +
> +	return kfd_resume_iommu(kfd);
> +}
> +
> +static int kfd_resume_iommu(struct kfd_dev *kfd)
>  {
>  	int err = 0;
>
> --
> 2.34.1
>
[PATCH] drm/amdgpu: Don't resume IOMMU after incomplete init
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other kgd2kfd calls. This should fix IOMMU errors on resume from suspend when KFD IOMMU initialization failed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
Cc: Vasant Hegde
Cc: Linux regression tracking (Thorsten Leemhuis)
Cc: sta...@vger.kernel.org
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 ++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 521dfa88aad8..989c6aa2620b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -60,6 +60,7 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
 				unsigned int chunk_size);
 static void kfd_gtt_sa_fini(struct kfd_dev *kfd);

+static int kfd_resume_iommu(struct kfd_dev *kfd);
 static int kfd_resume(struct kfd_dev *kfd);

 static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
@@ -625,7 +626,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,

 	svm_migrate_init(kfd->adev);

-	if (kgd2kfd_resume_iommu(kfd))
+	if (kfd_resume_iommu(kfd))
 		goto device_iommu_error;

 	if (kfd_resume(kfd))
@@ -773,6 +774,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 }

 int kgd2kfd_resume_iommu(struct kfd_dev *kfd)
+{
+	if (!kfd->init_complete)
+		return 0;
+
+	return kfd_resume_iommu(kfd);
+}
+
+static int kfd_resume_iommu(struct kfd_dev *kfd)
 {
 	int err = 0;

--
2.34.1
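The shape of this patch (a public entry point that checks init_complete, plus a static helper that the init path calls directly) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct demo_dev, demo_resume_iommu() and demo_resume_iommu_internal() are hypothetical names, not the kfd code, and the helper merely counts calls instead of touching an IOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kfd_dev; only the fields used here. */
struct demo_dev {
	bool init_complete;	/* set once initialization fully succeeded */
	int resume_calls;	/* counts how often the real work ran */
};

/* Internal helper: does the work unconditionally. The init path would
 * call this directly, because init_complete is not yet set there. */
static int demo_resume_iommu_internal(struct demo_dev *dev)
{
	dev->resume_calls++;
	return 0;
}

/* Public entry point: a successful no-op unless init completed. */
int demo_resume_iommu(struct demo_dev *dev)
{
	assert(dev != NULL);

	if (!dev->init_complete)
		return 0;

	return demo_resume_iommu_internal(dev);
}
```

Splitting the function this way keeps the guard out of the init path, which still needs the unguarded behaviour, while external callers get the early return.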
Re: [PATCH] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
On Tue, Mar 14, 2023 at 12:35 AM Kai-Heng Feng wrote:
>
> S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
> caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").
>
> The root cause is still not clear for now.
>
> So extend and apply the ASPM quirk from commit e02fe3bc7aba
> ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
> workaround the issue on Navi cards too.
>
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> Signed-off-by: Kai-Heng Feng
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
> drivers/gpu/drm/amd/amdgpu/nv.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/vi.c | 15 ---
> 4 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 164141bc8b4a..c697580f1ee4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1272,6 +1272,7 @@ void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>  int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>  bool amdgpu_device_need_post(struct amdgpu_device *adev);
>  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> +bool aspm_support_quirk_check(void);
>
>  void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
>  				  u64 num_vis_bytes);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4a4e2fe6681..c09f19385628 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -80,6 +80,10 @@
>
>  #include
>
> +#if IS_ENABLED(CONFIG_X86)
> +#include
> +#endif
> +
>  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -1356,6 +1360,17 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev)
>  	return pcie_aspm_enabled(adev->pdev);
>  }
>
> +bool aspm_support_quirk_check(void)

For consistency with naming, rename this amdgpu_device_aspm_support_quirk(). Other than that, looks good to me. With that fixed:

Reviewed-by: Alex Deucher

Alex

> +{
> +#if IS_ENABLED(CONFIG_X86)
> +	struct cpuinfo_x86 *c = &cpu_data(0);
> +
> +	return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +#else
> +	return true;
> +#endif
> +}
> +
>  /* if we get transitioned to only one device, take VGA back */
>  /**
>   * amdgpu_device_vga_set_decode - enable/disable vga decode
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 855d390c41de..921adf66e3c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -578,7 +578,7 @@ static void nv_pcie_gen3_enable(struct amdgpu_device *adev)
>
>  static void nv_program_aspm(struct amdgpu_device *adev)
>  {
> -	if (!amdgpu_device_should_use_aspm(adev))
> +	if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
>  		return;
>
>  	if (!(adev->flags & AMD_IS_APU) &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 12ef782eb478..e61ae372d674 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,10 +81,6 @@
>  #include "mxgpu_vi.h"
>  #include "amdgpu_dm.h"
>
> -#if IS_ENABLED(CONFIG_X86)
> -#include
> -#endif
> -
>  #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 0x0001L
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 0x0002L
> @@ -1138,17 +1134,6 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>  	WREG32_PCIE(ixPCIE_LC_CNTL, data);
>  }
>
> -static bool aspm_support_quirk_check(void)
> -{
> -#if IS_ENABLED(CONFIG_X86)
> -	struct cpuinfo_x86 *c = &cpu_data(0);
> -
> -	return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> -#else
> -	return true;
> -#endif
> -}
> -
>  static void vi_program_aspm(struct amdgpu_device *adev)
>  {
>  	u32 data, data1, orig;
> --
> 2.34.1
>
[PATCH] drm/radeon: remove unused variable rbo
gcc with W=1 reports this error

drivers/gpu/drm/radeon/radeon_ttm.c:201:27: error: variable ‘rbo’ set but not used [-Werror=unused-but-set-variable]
  201 |         struct radeon_bo *rbo;
      |                           ^~~

rbo use was removed with
commit f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")

Since the variable is not used, remove it.

Signed-off-by: Tom Rix
---
drivers/gpu/drm/radeon/radeon_ttm.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 2220cdf6a3f6..0ea430ee5256 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -198,7 +198,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, bool evict,
 {
 	struct ttm_resource *old_mem = bo->resource;
 	struct radeon_device *rdev;
-	struct radeon_bo *rbo;
 	int r;

 	if (new_mem->mem_type == TTM_PL_TT) {
@@ -211,7 +210,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, bool evict,
 	if (r)
 		return r;

-	rbo = container_of(bo, struct radeon_bo, tbo);
 	rdev = radeon_get_rdev(bo->bdev);
 	if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
 			 bo->ttm == NULL)) {
--
2.27.0
Re: [PATCH] drm/radeon: remove unused variable rbo
On 14.03.23 at 14:06, Tom Rix wrote:
> gcc with W=1 reports this error
>
> drivers/gpu/drm/radeon/radeon_ttm.c:201:27: error: variable ‘rbo’ set but not used [-Werror=unused-but-set-variable]
>   201 |         struct radeon_bo *rbo;
>       |                           ^~~
>
> rbo use was removed with
> commit f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")
>
> Since the variable is not used, remove it.
>
> Signed-off-by: Tom Rix

Reviewed-by: Christian König

> ---
> drivers/gpu/drm/radeon/radeon_ttm.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 2220cdf6a3f6..0ea430ee5256 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -198,7 +198,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, bool evict,
>  {
>  	struct ttm_resource *old_mem = bo->resource;
>  	struct radeon_device *rdev;
> -	struct radeon_bo *rbo;
>  	int r;
>
>  	if (new_mem->mem_type == TTM_PL_TT) {
> @@ -211,7 +210,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, bool evict,
>  	if (r)
>  		return r;
>
> -	rbo = container_of(bo, struct radeon_bo, tbo);
>  	rdev = radeon_get_rdev(bo->bdev);
>  	if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
>  			 bo->ttm == NULL)) {
[PATCH AUTOSEL 4.19 6/7] drm/amdkfd: Fix an illegal memory access
From: Qu Huang

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check.

Signed-off-by: Qu Huang
Signed-off-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 892077377339a..8f23192b67095 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -529,16 +529,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 	struct kfd_event_waiter *event_waiters;
 	uint32_t i;

-	event_waiters = kmalloc_array(num_events,
-					sizeof(struct kfd_event_waiter),
-					GFP_KERNEL);
+	event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+				GFP_KERNEL);
 	if (!event_waiters)
 		return NULL;

-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++)
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
-	}

 	return event_waiters;
 }
--
2.39.2
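The failure mode described in this commit message (an error path that inspects a field nobody initialized) can be illustrated in userspace, with calloc() standing in for kcalloc(). This is a sketch under assumptions, not the kfd code: struct demo_waiter, demo_alloc_waiters() and demo_free_waiters() are hypothetical, and the cleanup is assumed to test the event pointer before using it, as the commit message implies free_waiters() does. It also relies on the all-zero bit pattern being a null pointer, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kfd_event_waiter. */
struct demo_waiter {
	void *event;	/* NULL means "not attached to any event" */
	int activated;
};

/* calloc() zero-fills the array, so every event pointer starts out
 * NULL and activated starts out 0, with no explicit init loop. */
struct demo_waiter *demo_alloc_waiters(size_t n)
{
	return calloc(n, sizeof(struct demo_waiter));
}

/* Cleanup that is safe on a partially set-up array: it only looks at
 * waiters whose event pointer was actually assigned.
 * Returns how many waiters were attached. */
size_t demo_free_waiters(struct demo_waiter *w, size_t n)
{
	size_t attached = 0;
	size_t i;

	assert(w != NULL);

	for (i = 0; i < n; i++)
		if (w[i].event)
			attached++;

	free(w);
	return attached;
}
```

With malloc() instead of calloc(), the loop in demo_free_waiters() would read indeterminate pointers whenever the caller bailed out before attaching events, which is the situation the crash log above corresponds to.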
[PATCH AUTOSEL 5.4 6/7] drm/amdkfd: Fix an illegal memory access
From: Qu Huang

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check.

Signed-off-by: Qu Huang
Signed-off-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index adbb2fec2e0f2..4fd7dcef2e382 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -529,16 +529,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 	struct kfd_event_waiter *event_waiters;
 	uint32_t i;

-	event_waiters = kmalloc_array(num_events,
-					sizeof(struct kfd_event_waiter),
-					GFP_KERNEL);
+	event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+				GFP_KERNEL);
 	if (!event_waiters)
 		return NULL;

-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++)
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
-	}

 	return event_waiters;
 }
--
2.39.2
[PATCH AUTOSEL 5.10 6/8] drm/amdkfd: Fix an illegal memory access
From: Qu Huang

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check.

Signed-off-by: Qu Huang
Signed-off-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 159be13ef20bb..2c19b3775179b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -528,16 +528,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 	struct kfd_event_waiter *event_waiters;
 	uint32_t i;

-	event_waiters = kmalloc_array(num_events,
-					sizeof(struct kfd_event_waiter),
-					GFP_KERNEL);
+	event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+				GFP_KERNEL);
 	if (!event_waiters)
 		return NULL;

-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++)
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
-	}

 	return event_waiters;
 }
--
2.39.2
[PATCH AUTOSEL 5.10 8/8] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
From: Alex Hung

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]
In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and assign the result directly.

Reviewed-by: Jun Lei
Acked-by: Qingqing Zhuo
Signed-off-by: Alex Hung
Tested-by: Daniel Wheeler
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
.../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index e427f4ffa0807..e5b1002d7f3f0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1868,7 +1868,10 @@ static unsigned int CalculateVMAndRowBytes(
 	}

 	if (SurfaceTiling == dm_sw_linear) {
-		*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+		if (PTEBufferSizeInRequests == 0)
+			*dpte_row_height = 1;
+		else
+			*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
 		*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / *PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
 		*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * *PTERequestSize;
 	} else if (ScanDirection != dm_vert) {
--
2.39.2
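The general idea behind this fix, checking the input before deriving a shift exponent from a logarithm, can be shown with a small integer-only sketch. It is not the DML code: demo_floor_log2() and demo_row_height() are hypothetical helpers, the arithmetic is unsigned integer rather than double, and the sketch is deliberately stricter than the patch in that it also returns 1 when pitch is zero or the ratio rounds down to zero.

```c
#include <assert.h>

/* Integer floor(log2(v)) for v >= 1. Hypothetical helper, not dml_log2(). */
static unsigned int demo_floor_log2(unsigned int v)
{
	unsigned int r = 0;

	assert(v >= 1);
	while (v >>= 1)
		r++;
	return r;
}

/* Returns min(128, 2^floor(log2(requests * req_width / pitch))),
 * falling back to 1 whenever the logarithm would be undefined. */
unsigned int demo_row_height(unsigned int requests, unsigned int req_width,
			     unsigned int pitch)
{
	unsigned int ratio, height;

	if (requests == 0 || pitch == 0)
		return 1;

	ratio = requests * req_width / pitch;
	if (ratio == 0)
		return 1;

	/* The exponent is at most 31 here, so the shift is well defined. */
	height = 1u << demo_floor_log2(ratio);
	return height < 128 ? height : 128;
}
```

Without the early returns, a zero input would reach the logarithm and the resulting exponent would be meaningless, which is what the UBSAN message quoted in the commit reports.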
[PATCH AUTOSEL 5.15 10/10] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
From: Alex Hung

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]
In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and assign the result directly.

Reviewed-by: Jun Lei
Acked-by: Qingqing Zhuo
Signed-off-by: Alex Hung
Tested-by: Daniel Wheeler
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
.../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 518672a2450f4..de0fa87b301a5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1868,7 +1868,10 @@ static unsigned int CalculateVMAndRowBytes(
 	}

 	if (SurfaceTiling == dm_sw_linear) {
-		*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+		if (PTEBufferSizeInRequests == 0)
+			*dpte_row_height = 1;
+		else
+			*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
 		*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / *PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
 		*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * *PTERequestSize;
 	} else if (ScanDirection != dm_vert) {
--
2.39.2
[PATCH AUTOSEL 5.15 07/10] drm/amdkfd: Fix an illegal memory access
From: Qu Huang

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check.

Signed-off-by: Qu Huang
Signed-off-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index b8bdd796cd911..8b5c82af2acd7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -528,16 +528,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 	struct kfd_event_waiter *event_waiters;
 	uint32_t i;

-	event_waiters = kmalloc_array(num_events,
-					sizeof(struct kfd_event_waiter),
-					GFP_KERNEL);
+	event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+				GFP_KERNEL);
 	if (!event_waiters)
 		return NULL;

-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++)
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
-	}

 	return event_waiters;
 }
--
2.39.2
[PATCH AUTOSEL 6.1 13/13] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
From: Alex Hung

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]
In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and assign the result directly.

Reviewed-by: Jun Lei
Acked-by: Qingqing Zhuo
Signed-off-by: Alex Hung
Tested-by: Daniel Wheeler
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
.../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 479e2c1a13018..49da8119b28e9 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1802,7 +1802,10 @@ static unsigned int CalculateVMAndRowBytes(
 	}

 	if (SurfaceTiling == dm_sw_linear) {
-		*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+		if (PTEBufferSizeInRequests == 0)
+			*dpte_row_height = 1;
+		else
+			*dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
 		*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / *PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
 		*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * *PTERequestSize;
 	} else if (ScanDirection != dm_vert) {
--
2.39.2
[PATCH AUTOSEL 6.1 12/13] drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
From: Horatio Zhang [ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ] The call trace occurs when the amdgpu is removed after the mode1 reset. During mode1 reset, from suspend to resume, there is no need to reinitialize the ta firmware buffer which caused the bo pin_count increase redundantly. [ 489.885525] Call Trace: [ 489.885525] [ 489.885526] amdttm_bo_put+0x34/0x50 [amdttm] [ 489.885529] amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu] [ 489.885620] psp_free_shared_bufs+0xb7/0x150 [amdgpu] [ 489.885720] psp_hw_fini+0xce/0x170 [amdgpu] [ 489.885815] amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu] [ 489.885960] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 489.885962] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 489.886049] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 489.886132] ? __pm_runtime_resume+0x60/0x90 [ 489.886134] pci_device_remove+0x3e/0xb0 [ 489.886135] __device_release_driver+0x1ab/0x2a0 [ 489.886137] driver_detach+0xf3/0x140 [ 489.886138] bus_remove_driver+0x6c/0xf0 [ 489.886140] driver_unregister+0x31/0x60 [ 489.886141] pci_unregister_driver+0x40/0x90 [ 489.886142] amdgpu_exit+0x15/0x451 [amdgpu] Signed-off-by: Horatio Zhang Signed-off-by: longlyao Reviewed-by: Guchun Chen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 087147f09933a..3b8825a3e2336 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -1695,7 +1695,7 @@ static int psp_hdcp_initialize(struct psp_context *psp) psp->hdcp_context.context.mem_context.shared_mem_size = PSP_HDCP_SHARED_MEM_SIZE; psp->hdcp_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->hdcp_context.context.initialized) { + if (!psp->hdcp_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->hdcp_context.context.mem_context); if 
(ret) return ret; @@ -1762,7 +1762,7 @@ static int psp_dtm_initialize(struct psp_context *psp) psp->dtm_context.context.mem_context.shared_mem_size = PSP_DTM_SHARED_MEM_SIZE; psp->dtm_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->dtm_context.context.initialized) { + if (!psp->dtm_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->dtm_context.context.mem_context); if (ret) return ret; @@ -1830,7 +1830,7 @@ static int psp_rap_initialize(struct psp_context *psp) psp->rap_context.context.mem_context.shared_mem_size = PSP_RAP_SHARED_MEM_SIZE; psp->rap_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->rap_context.context.initialized) { + if (!psp->rap_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->rap_context.context.mem_context); if (ret) return ret; -- 2.39.2
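The change above keys the allocation on the buffer pointer rather than on the "initialized" flag. A minimal user-space sketch of that idea follows, assuming the flag is cleared across suspend while the buffer itself is kept. The struct and function names (ta_context_model, ta_model_initialize and so on) are hypothetical and are not the driver's API; alloc_count merely stands in for the redundant pin_count increase mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a TA context; not the driver's structs. */
struct ta_context_model {
	bool initialized;	/* cleared again on suspend */
	void *shared_buf;	/* survives suspend/resume */
	size_t shared_mem_size;
	int alloc_count;	/* how many times the buffer was allocated */
};

/* Allocate the shared buffer only if it does not exist yet. */
int ta_model_initialize(struct ta_context_model *ctx)
{
	if (!ctx->shared_buf) {
		ctx->shared_buf = malloc(ctx->shared_mem_size);
		if (!ctx->shared_buf)
			return -1;
		ctx->alloc_count++;
	}
	ctx->initialized = true;
	return 0;
}

/* Suspend only drops the "initialized" state; the buffer is kept. */
void ta_model_suspend(struct ta_context_model *ctx)
{
	ctx->initialized = false;
}

/* Final teardown releases the buffer exactly once. */
void ta_model_fini(struct ta_context_model *ctx)
{
	free(ctx->shared_buf);
	ctx->shared_buf = NULL;
	ctx->initialized = false;
}
```

With a check on the flag instead of the pointer, the second ta_model_initialize() call after a suspend would allocate again, which is the kind of imbalance the teardown path later trips over.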
[PATCH AUTOSEL 6.1 08/13] drm/amdkfd: Fix an illegal memory access
From: Qu Huang [ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ] In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log: localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0 localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64 localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002 localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698 localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS: localhost kernel: CS: 0010 DS: ES: CR0: 80050033 localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0 localhost kernel: Call Trace: localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 localhost kernel: remove_wait_queue+0x12/0x50 localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: __x64_sys_ioctl+0x8e/0xd0 localhost kernel: ? 
syscall_trace_enter.isra.18+0x143/0x1b0 localhost kernel: do_syscall_64+0x33/0x80 localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 localhost kernel: RIP: 0033:0x152a4dff68d7 Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check. Signed-off-by: Qu Huang Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 729d26d648af3..2880ed96ac2e3 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -778,16 +778,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) struct kfd_event_waiter *event_waiters; uint32_t i; - event_waiters = kmalloc_array(num_events, - sizeof(struct kfd_event_waiter), - GFP_KERNEL); + event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter), + GFP_KERNEL); if (!event_waiters) return NULL; - for (i = 0; (event_waiters) && (i < num_events) ; i++) { + for (i = 0; i < num_events; i++) init_wait(&event_waiters[i].wait); - event_waiters[i].activated = false; - } return event_waiters; } -- 2.39.2
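To illustrate why zero-filled allocation fixes the error path, here is a minimal user-space sketch that uses calloc() as an analogue of kcalloc(). The struct and function names (waiter_model, alloc_waiters_model, free_waiters_model) are hypothetical, and the assumption that cleanup skips waiters whose event pointer is NULL is a simplification of what the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified waiter; the real struct has more fields. */
struct waiter_model {
	void *event;		/* NULL until the waiter is attached to an event */
	bool activated;
};

/* calloc() zero-fills, so every event pointer starts out as NULL. */
struct waiter_model *alloc_waiters_model(size_t num_events)
{
	return calloc(num_events, sizeof(struct waiter_model));
}

/*
 * Error-path cleanup: only waiters with a non-NULL event are detached.
 * Returns how many waiters were detached, for illustration.
 */
size_t free_waiters_model(struct waiter_model *waiters, size_t num_events)
{
	size_t i, detached = 0;

	for (i = 0; i < num_events; i++) {
		if (waiters[i].event) {
			waiters[i].event = NULL;
			detached++;
		}
	}
	free(waiters);
	return detached;
}
```

Had the array come from a non-zeroing allocator, the event test in the cleanup loop would read indeterminate memory whenever setup failed before every waiter was attached.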
[PATCH AUTOSEL 6.2 13/13] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
From: Alex Hung [ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ] [WHY] When PTEBufferSizeInRequests is zero, UBSAN reports the following warning because dml_log2 returns an unexpected negative value: shift exponent 4294966273 is too large for 32-bit type 'int' [HOW] In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and assign the result directly. Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo Signed-off-by: Alex Hung Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c index 379729b028474..c3d75e56410cc 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c @@ -1802,7 +1802,10 @@ static unsigned int CalculateVMAndRowBytes( } if (SurfaceTiling == dm_sw_linear) { - *dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1)); + if (PTEBufferSizeInRequests == 0) + *dpte_row_height = 1; + else + *dpte_row_height = dml_min(128, 1 << (unsigned int) dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1)); *dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / *PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth; *PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * *PTERequestSize; } else if (ScanDirection != dm_vert) { -- 2.39.2
[PATCH AUTOSEL 6.2 12/13] drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
From: Horatio Zhang [ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ] The call trace occurs when the amdgpu is removed after the mode1 reset. During mode1 reset, from suspend to resume, there is no need to reinitialize the ta firmware buffer which caused the bo pin_count increase redundantly. [ 489.885525] Call Trace: [ 489.885525] [ 489.885526] amdttm_bo_put+0x34/0x50 [amdttm] [ 489.885529] amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu] [ 489.885620] psp_free_shared_bufs+0xb7/0x150 [amdgpu] [ 489.885720] psp_hw_fini+0xce/0x170 [amdgpu] [ 489.885815] amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu] [ 489.885960] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 489.885962] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 489.886049] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 489.886132] ? __pm_runtime_resume+0x60/0x90 [ 489.886134] pci_device_remove+0x3e/0xb0 [ 489.886135] __device_release_driver+0x1ab/0x2a0 [ 489.886137] driver_detach+0xf3/0x140 [ 489.886138] bus_remove_driver+0x6c/0xf0 [ 489.886140] driver_unregister+0x31/0x60 [ 489.886141] pci_unregister_driver+0x40/0x90 [ 489.886142] amdgpu_exit+0x15/0x451 [amdgpu] Signed-off-by: Horatio Zhang Signed-off-by: longlyao Reviewed-by: Guchun Chen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index ba092072308fa..1b4105110f398 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -1685,7 +1685,7 @@ static int psp_hdcp_initialize(struct psp_context *psp) psp->hdcp_context.context.mem_context.shared_mem_size = PSP_HDCP_SHARED_MEM_SIZE; psp->hdcp_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->hdcp_context.context.initialized) { + if (!psp->hdcp_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->hdcp_context.context.mem_context); if 
(ret) return ret; @@ -1752,7 +1752,7 @@ static int psp_dtm_initialize(struct psp_context *psp) psp->dtm_context.context.mem_context.shared_mem_size = PSP_DTM_SHARED_MEM_SIZE; psp->dtm_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->dtm_context.context.initialized) { + if (!psp->dtm_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->dtm_context.context.mem_context); if (ret) return ret; @@ -1820,7 +1820,7 @@ static int psp_rap_initialize(struct psp_context *psp) psp->rap_context.context.mem_context.shared_mem_size = PSP_RAP_SHARED_MEM_SIZE; psp->rap_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA; - if (!psp->rap_context.context.initialized) { + if (!psp->rap_context.context.mem_context.shared_buf) { ret = psp_ta_init_shared_buf(psp, &psp->rap_context.context.mem_context); if (ret) return ret; -- 2.39.2
[PATCH AUTOSEL 6.2 08/13] drm/amdkfd: Fix an illegal memory access
From: Qu Huang [ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ] In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log: localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0 localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64 localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002 localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698 localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS: localhost kernel: CS: 0010 DS: ES: CR0: 80050033 localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0 localhost kernel: Call Trace: localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 localhost kernel: remove_wait_queue+0x12/0x50 localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: __x64_sys_ioctl+0x8e/0xd0 localhost kernel: ? 
syscall_trace_enter.isra.18+0x143/0x1b0 localhost kernel: do_syscall_64+0x33/0x80 localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 localhost kernel: RIP: 0033:0x152a4dff68d7 Allocate the structure with kcalloc, and remove redundant 0-initialization and a redundant loop condition check. Signed-off-by: Qu Huang Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 729d26d648af3..2880ed96ac2e3 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -778,16 +778,13 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) struct kfd_event_waiter *event_waiters; uint32_t i; - event_waiters = kmalloc_array(num_events, - sizeof(struct kfd_event_waiter), - GFP_KERNEL); + event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter), + GFP_KERNEL); if (!event_waiters) return NULL; - for (i = 0; (event_waiters) && (i < num_events) ; i++) { + for (i = 0; i < num_events; i++) init_wait(&event_waiters[i].wait); - event_waiters[i].activated = false; - } return event_waiters; } -- 2.39.2
RE: [PATCH] drm/amdgpu: skip ASIC reset for GC IP v11.0.4/11 when go to S4
[AMD Official Use Only - General] Please ignore this patch, will send out a new one to skip ASIC reset for all APUs. Thanks. -Original Message- From: Huang, Tim Sent: Monday, March 13, 2023 7:42 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Zhang, Yifan ; Du, Xiaojian ; Ma, Li ; Limonciello, Mario ; Huang, Tim Subject: [PATCH] drm/amdgpu: skip ASIC reset for GC IP v11.0.4/11 when go to S4 [Why] For GC IP v11.0.4/11, PSP TMR need to be reserved for ASIC mode2 reset. But for S4, when psp suspend, it will destroy the TMR that fails the ASIC reset. [ 96.006101] amdgpu :62:00.0: amdgpu: MODE2 reset [ 100.409717] amdgpu :62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0011 SMN_C2PMSG_82:0x0002 [ 100.411593] amdgpu :62:00.0: amdgpu: Mode2 reset failed! [ 100.412470] amdgpu :62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62 [ 100.414020] amdgpu :62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62 [ 100.415311] amdgpu :62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs [ 100.416608] amdgpu :62:00.0: PM: failed to freeze async: error -62 [How] Skip the ASIC reset for S4, assuming we can resume properly without reset. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c index 8fa9a36c38b6..ba02b0d9ef7e 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c @@ -980,6 +980,8 @@ static int smu_v13_0_4_set_performance_level(struct smu_context *smu, static int smu_v13_0_4_mode2_reset(struct smu_context *smu) { + if (!amdgpu_in_reset(smu->adev)) /* Skip the reset for S4 */ + return 0; return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_2, NULL); } -- 2.25.1
RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
[AMD Official Use Only - General] > -Original Message- > From: Lazar, Lijo > Sent: Tuesday, March 14, 2023 5:07 PM > To: Chen, Guchun ; Zhenneng Li > > Cc: David Airlie ; Pan, Xinhui ; > amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher, > Alexander ; Koenig, Christian > > Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland > > [AMD Official Use Only - General] > > Hi Guchun, > > This patch doesn't look correct. Without dpm enabled, temperature range > shouldn't be set at all. The patch posted by Zhenneng is good enough or > better to skip late init altogether as it remains an empty function with that > patch. My intention is to prevent setting temperature range again in late_init, as in hw_init prior to late_init, we have configured this range and set dpm_enabled to true already. Also this is a draft patch:) Leaving a NULL function in late_init looks good to me. Regards, Guchun > Thanks, > Lijo > > -Original Message- > From: amd-gfx On Behalf Of Chen, > Guchun > Sent: Tuesday, March 14, 2023 6:35 AM > To: Zhenneng Li > Cc: David Airlie ; Pan, Xinhui ; > amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher, > Alexander ; Koenig, Christian > > Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland > > Will attached patch help? > > Regards, > Guchun > > > -Original Message- > > From: Zhenneng Li > > Sent: Monday, March 13, 2023 10:57 AM > > To: Chen, Guchun > > Cc: Deucher, Alexander ; Koenig, Christian > > ; Pan, Xinhui ; David > > Airlie ; Daniel Vetter ; amd- > > g...@lists.freedesktop.org; Zhenneng Li > > Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland > > > > During reboot test on arm64 platform, it may failure on boot. 
> > > > The error message are as follows: > > [6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] > > *ERROR* > > late_init of IP block failed -22 > > [7.006919][ 7] [ T295] amdgpu :04:00.0: > amdgpu_device_ip_late_init > > failed > > [7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init > > --- > > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 > > 1 file changed, 12 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > > index d6d9e3b1b2c0..ca9bce895dbe 100644 > > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > > @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct > > amdgpu_device *adev, > > > > static int si_dpm_late_init(void *handle) { > > - int ret; > > - struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > - > > - if (!adev->pm.dpm_enabled) > > - return 0; > > - > > - ret = si_set_temperature_range(adev); > > - if (ret) > > - return ret; > > -#if 0 //TODO ? > > - si_dpm_powergate_uvd(adev, true); > > -#endif > > return 0; > > } > > > > -- > > 2.25.1
RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
[AMD Official Use Only - General] Hi Guchun, This patch doesn't look correct. Without dpm enabled, temperature range shouldn't be set at all. The patch posted by Zhenneng is good enough or better to skip late init altogether as it remains an empty function with that patch. Thanks, Lijo -Original Message- From: amd-gfx On Behalf Of Chen, Guchun Sent: Tuesday, March 14, 2023 6:35 AM To: Zhenneng Li Cc: David Airlie ; Pan, Xinhui ; amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher, Alexander ; Koenig, Christian Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland Will attached patch help? Regards, Guchun > -Original Message- > From: Zhenneng Li > Sent: Monday, March 13, 2023 10:57 AM > To: Chen, Guchun > Cc: Deucher, Alexander ; Koenig, Christian > ; Pan, Xinhui ; David > Airlie ; Daniel Vetter ; amd- > g...@lists.freedesktop.org; Zhenneng Li > Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland > > During reboot test on arm64 platform, it may failure on boot. 
> > The error message are as follows: > [6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] > *ERROR* > late_init of IP block failed -22 > [7.006919][ 7] [ T295] amdgpu :04:00.0: amdgpu_device_ip_late_init > failed > [7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init > --- > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 > 1 file changed, 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > index d6d9e3b1b2c0..ca9bce895dbe 100644 > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct > amdgpu_device *adev, > > static int si_dpm_late_init(void *handle) { > - int ret; > - struct amdgpu_device *adev = (struct amdgpu_device *)handle; > - > - if (!adev->pm.dpm_enabled) > - return 0; > - > - ret = si_set_temperature_range(adev); > - if (ret) > - return ret; > -#if 0 //TODO ? > - si_dpm_powergate_uvd(adev, true); > -#endif > return 0; > } > > -- > 2.25.1
[PATCH] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default"). The root cause is still not clear for now. So extend and apply the ASPM quirk from commit e02fe3bc7aba ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to workaround the issue on Navi cards too. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Signed-off-by: Kai-Heng Feng --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++ drivers/gpu/drm/amd/amdgpu/nv.c| 2 +- drivers/gpu/drm/amd/amdgpu/vi.c| 15 --- 4 files changed, 17 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 164141bc8b4a..c697580f1ee4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1272,6 +1272,7 @@ void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev); +bool aspm_support_quirk_check(void); void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes, u64 num_vis_bytes); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c4a4e2fe6681..c09f19385628 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -80,6 +80,10 @@ #include +#if IS_ENABLED(CONFIG_X86) +#include +#endif + MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -1356,6 +1360,17 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev) return pcie_aspm_enabled(adev->pdev); } +bool aspm_support_quirk_check(void) +{ +#if IS_ENABLED(CONFIG_X86) + struct 
cpuinfo_x86 *c = &cpu_data(0); + + return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); +#else + return true; +#endif +} + /* if we get transitioned to only one device, take VGA back */ /** * amdgpu_device_vga_set_decode - enable/disable vga decode diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index 855d390c41de..921adf66e3c4 100644 --- a/drivers/gpu/drm/amd/amdgpu/nv.c +++ b/drivers/gpu/drm/amd/amdgpu/nv.c @@ -578,7 +578,7 @@ static void nv_pcie_gen3_enable(struct amdgpu_device *adev) static void nv_program_aspm(struct amdgpu_device *adev) { - if (!amdgpu_device_should_use_aspm(adev)) + if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check()) return; if (!(adev->flags & AMD_IS_APU) && diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c index 12ef782eb478..e61ae372d674 100644 --- a/drivers/gpu/drm/amd/amdgpu/vi.c +++ b/drivers/gpu/drm/amd/amdgpu/vi.c @@ -81,10 +81,6 @@ #include "mxgpu_vi.h" #include "amdgpu_dm.h" -#if IS_ENABLED(CONFIG_X86) -#include -#endif - #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6 #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 0x0001L #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 0x0002L @@ -1138,17 +1134,6 @@ static void vi_enable_aspm(struct amdgpu_device *adev) WREG32_PCIE(ixPCIE_LC_CNTL, data); } -static bool aspm_support_quirk_check(void) -{ -#if IS_ENABLED(CONFIG_X86) - struct cpuinfo_x86 *c = &cpu_data(0); - - return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); -#else - return true; -#endif -} - static void vi_program_aspm(struct amdgpu_device *adev) { u32 data, data1, orig; -- 2.34.1
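The patch combines two conditions: the device must want ASPM, and the host must not match the quirk. A minimal user-space sketch of that decision follows. The host_cpu_model struct, the function names and the MODEL_ALDERLAKE_EXAMPLE value are assumptions made for illustration; the kernel takes the real family and model from its own CPU data and headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the kernel defines the real model IDs. */
#define MODEL_ALDERLAKE_EXAMPLE 0x97

/* Hypothetical description of the host CPU. */
struct host_cpu_model {
	bool is_x86;
	unsigned int family;
	unsigned int model;
};

/* Returns false when the host matches the quirk (family 6, Alder Lake). */
bool aspm_quirk_allows(const struct host_cpu_model *cpu)
{
	if (!cpu->is_x86)
		return true;
	return !(cpu->family == 6 && cpu->model == MODEL_ALDERLAKE_EXAMPLE);
}

/* ASPM is programmed only if the device wants it and no quirk applies. */
bool should_program_aspm(bool device_wants_aspm,
			 const struct host_cpu_model *cpu)
{
	return device_wants_aspm && aspm_quirk_allows(cpu);
}
```

Keeping the quirk in one shared helper, as the patch does by moving it out of vi.c, means each ASIC family only has to add one extra condition to its own program_aspm path.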
Re: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
The attached patch changes the code logic: if adev->pm.dpm_enabled is false, si_set_temperature_range(...) will be called, which is obviously wrong. Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland Date: 2023-03-14 09:04 From: Chen, Guchun To: 李真能; Will attached patch help? Regards, Guchun > -Original Message- > From: Zhenneng Li > Sent: Monday, March 13, 2023 10:57 AM > To: Chen, Guchun > Cc: Deucher, Alexander ; Koenig, Christian > ; Pan, Xinhui ; David > Airlie ; Daniel Vetter ; amd- > g...@lists.freedesktop.org; Zhenneng Li > Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland > > During reboot test on arm64 platform, it may failure on boot. > > The error message are as follows: > [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] > *ERROR* > late_init of IP block failed -22 > [ 7.006919][ 7] [ T295] amdgpu :04:00.0: amdgpu_device_ip_late_init > failed > [ 7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init > --- > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 > 1 file changed, 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > index d6d9e3b1b2c0..ca9bce895dbe 100644 > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c > @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct > amdgpu_device *adev, > > static int si_dpm_late_init(void *handle) { > - int ret; > - struct amdgpu_device *adev = (struct amdgpu_device *)handle; > - > - if (!adev->pm.dpm_enabled) > - return 0; > - > - ret = si_set_temperature_range(adev); > - if (ret) > - return ret; > -#if 0 //TODO ? > - si_dpm_powergate_uvd(adev, true); > -#endif > return 0; > } > > -- > 2.25.1
RE: [PATCH] drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV
[AMD Official Use Only - General] Reviewed-by: Horace Chen -Original Message- From: Yifan Zha Sent: Monday, March 6, 2023 3:25 PM To: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Zhang, Hawking Cc: Chen, Horace ; Chang, HaiJun ; Zha, YiFan(Even) Subject: [PATCH] drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV [Why] If the mmhub vm contexts are disabled (MMVM_CONTEXTS_DISABLE set to 0x), driver loading fails on the vf because the fence fallback timer expires on all rings. FLR cannot reset MMVM_CONTEXTS_DISABLE, so this vf cannot be recovered unless a whole gpu reset is triggered. [How] Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting. Signed-off-by: Yifan Zha --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 6 ++ drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c | 3 +++ 3 files changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 0305b660cd17..fad3034b35ee 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -104,6 +104,8 @@ struct amdgpu_vmhub { uint32_t vm_cntx_cntl_vm_fault; uint32_t vm_l2_bank_select_reserved_cid2; + uint32_t vm_contexts_disable; + const struct amdgpu_vmhub_funcs *vmhub_funcs; }; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c index 0a31a341aa43..7481f2f2804c 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -875,6 +875,12 @@ static int gmc_v11_0_sw_fini(void *handle) static void gmc_v11_0_init_golden_registers(struct amdgpu_device *adev) { + if (amdgpu_sriov_vf(adev)) { + struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_MMHUB_0]; + + WREG32(hub->vm_contexts_disable, 0); + return; + } } /** diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c index 164948c50ac3..17a792616979 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c @@ -517,6 +517,9 @@ static void mmhub_v3_0_init(struct amdgpu_device *adev) hub->vm_l2_bank_select_reserved_cid2 = SOC15_REG_OFFSET(MMHUB, 0, regMMVM_L2_BANK_SELECT_RESERVED_CID2); + hub->vm_contexts_disable = + SOC15_REG_OFFSET(MMHUB, 0, regMMVM_CONTEXTS_DISABLE); + hub->vmhub_funcs = &mmhub_v3_0_vmhub_funcs; } -- 2.25.1
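The patch has two parts: hub init caches the register offset, and the golden-register hook clears that register only for a virtual function. A minimal user-space sketch of that split is shown here with a tiny fake register file. Everything in it (gmc_model, model_wreg32, the register count and the offsets) is hypothetical and only models the control flow, not real hardware access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_REGS 16

/* Hypothetical device model with a tiny fake register file. */
struct gmc_model {
	bool is_sriov_vf;
	uint32_t vm_contexts_disable;	/* register offset cached at hub init */
	uint32_t regs[MODEL_NUM_REGS];
	unsigned int writes;		/* number of register writes issued */
};

/* Fake register write; the modulo keeps the index inside the model. */
static void model_wreg32(struct gmc_model *dev, uint32_t offset, uint32_t val)
{
	dev->regs[offset % MODEL_NUM_REGS] = val;
	dev->writes++;
}

/* Hub init caches the offset so later code does not hard-code it. */
void model_hub_init(struct gmc_model *dev, uint32_t contexts_disable_offset)
{
	dev->vm_contexts_disable = contexts_disable_offset;
}

/* Under SR-IOV, clear the contexts-disable register; otherwise do nothing. */
void model_init_golden_registers(struct gmc_model *dev)
{
	if (dev->is_sriov_vf)
		model_wreg32(dev, dev->vm_contexts_disable, 0);
}
```

In this model a stale non-zero value left behind by a previous user of the function is cleared on every load, which matches the motivation that a function-level reset does not restore the register.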
RE: [PATCH v2] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
Hi Luben, I have to ping you because we currently have a P1 ticket on this issue. Could you please give a rough estimate of when you can confirm whether this patch is safe? Thank you very much for helping to double-check this. Regards & Thanks, Yubiao -Original Message- From: Tuikov, Luben Sent: Saturday, March 11, 2023 12:56 AM To: Wang, YuBiao ; amd-gfx@lists.freedesktop.org Cc: Quan, Evan ; Chen, Horace ; Koenig, Christian ; Deucher, Alexander ; Zhang, Hawking ; Liu, Monk ; Xu, Feifei ; Wang, Yang(Kevin) Subject: Re: [PATCH v2] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs On 2023-03-08 21:27, YuBiao Wang wrote: > v2: Add comments to clarify in the code. > > [Why] > For engines not supporting soft reset, i.e. VCN, there will be a > failed ib test before mode 1 reset during asic reset. The fences in > this case are never signaled and next time when we try to free the > sa_bo, kernel will hang. > > [How] > During pre_asic_reset, driver will clear job fences and afterwards the > fences' refcount will be reduced to 1. For drm_sched_jobs it will be > released in job_free_cb, and for non-sched jobs like ib_test, it's > meant to be released in sa_bo_free but only when the fences are > signaled. So we have to force signal the non_sched bad job's fence > during pre_asic_reset or the clear is not complete. 
> > Signed-off-by: YuBiao Wang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > index faff4a3f96e6..ad7c5b70c35a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct > amdgpu_ring *ring) { > int i; > struct dma_fence *old, **ptr; > + struct amdgpu_job *job; > > for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) { > ptr = &ring->fence_drv.fences[i]; > @@ -680,6 +681,13 @@ void amdgpu_fence_driver_clear_job_fences(struct > amdgpu_ring *ring) > if (old && old->ops == &amdgpu_job_fence_ops) { > RCU_INIT_POINTER(*ptr, NULL); > dma_fence_put(old); > + /* For non-sched bad job, i.e. failed ib test, we need > to force > + * signal it right here or we won't be able to track > them in fence drv > + * and they will remain unsignaled during sa_bo free. > + */ > + job = container_of(old, struct amdgpu_job, hw_fence); > + if (!job->base.s_fence && !dma_fence_is_signaled(old)) > + dma_fence_signal(old); Conceptually, I don't mind this patch for what it does. The only thing which worries me is this check here, !job->base.s_fence, which is used here to qualify that we can signal the fence (and of course that the fence is not yet signalled.) We need to audit this check to make sure that it is not overloaded to mean other things. I'll take a look. > } > } > } -- Regards, Luben
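The check under discussion can be modelled in user space as follows. This is a minimal sketch, assuming a fence embedded in its job and a NULL s_fence for jobs that bypass the scheduler. The structs, the embedded_in_job flag (which stands in for the ops-pointer comparison) and the function name are hypothetical and are not the driver's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), spelled out for user space. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified fence and job; not the driver's structs. */
struct fence_model {
	int refcount;
	bool signaled;
	bool embedded_in_job;	/* stands in for the ops-pointer comparison */
};

struct job_model {
	void *s_fence;		/* NULL for jobs that bypass the scheduler */
	struct fence_model hw_fence;
};

/*
 * Clear job fences from the ring's slots. A fence embedded in a job that
 * never went through the scheduler is force-signaled so later waiters
 * are not left blocked on it.
 */
void clear_job_fences_model(struct fence_model **slots, size_t num_slots)
{
	size_t i;

	for (i = 0; i < num_slots; i++) {
		struct fence_model *old = slots[i];
		struct job_model *job;

		if (!old || !old->embedded_in_job)
			continue;

		slots[i] = NULL;
		old->refcount--;	/* drop the slot's reference */

		/* Recover the owning job from the embedded fence. */
		job = model_container_of(old, struct job_model, hw_fence);
		if (!job->s_fence && !old->signaled)
			old->signaled = true;
	}
}
```

The sketch also shows why the audit Luben asks for matters: the whole decision hinges on a NULL s_fence meaning "not a scheduler job" and nothing else.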