Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

2023-03-14 Thread Huang Rui
On Wed, Mar 15, 2023 at 08:52:30AM +0800, Stefano Stabellini wrote:
> On Mon, 13 Mar 2023, Jan Beulich wrote:
> > On 12.03.2023 13:01, Huang Rui wrote:
> > > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > > virtualization support when possible. It will use the hardware IOMMU
> > > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > > Xen PVH.
> > 
> > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > away without resorting to swiotlb in certain cases (like I/O to an
> > address-restricted device)?
> 
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> so we can use guest physical addresses instead of machine addresses for
> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> case is XENFEAT_not_direct_mapped).

Hi Jan, sorry for the late reply. We are using the native kernel amdgpu and
ttm drivers on Dom0. amdgpu/ttm would like to use the IOMMU to allocate
coherent buffers for userptr, which maps user-space memory for GPU access;
however, swiotlb doesn't support this. In other words, with swiotlb we can
only handle the buffer page by page.

Thanks,
Ray

> 
> Jurgen, what do you think? Would you rather make xen_swiotlb_detect
> common between ARM and x86?


RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland

2023-03-14 Thread Quan, Evan
[AMD Official Use Only - General]

I'm OK with dropping si_set_temperature_range() from late_init.
However, it's still not clear to me how this could lead to a reboot exception.
Can you dig into this a little further?
For example, can you check whether the operation
(si_thermal_start_thermal_controller()) had actually already failed in
hw_init (si_dpm_enable, more specifically)?

@@ -6918,7 +6918,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	si_start_dpm(adev);
 
 	si_enable_auto_throttle_source(adev, SI_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
-	si_thermal_start_thermal_controller(adev);
+	ret = si_thermal_start_thermal_controller(adev);
+	if (ret) {
+		DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+		return ret;
+	}
 
 	ni_update_current_ps(adev, boot_ps);

BR
Evan
> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun 
> Cc: David Airlie ; Pan, Xinhui ;
> Zhenneng Li ; amd-gfx@lists.freedesktop.org;
> Daniel Vetter ; Deucher, Alexander
> ; Koenig, Christian
> 
> Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
> 
> During reboot testing on an arm64 platform, the boot may fail.
> 
> The error messages are as follows:
> [6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
>   late_init of IP block  failed -22
> [7.006919][ 7] [  T295] amdgpu :04:00.0: amdgpu_device_ip_late_init
> failed
> [7.014224][ 7] [  T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct
> amdgpu_device *adev,
> 
>  static int si_dpm_late_init(void *handle)
>  {
> - int ret;
> - struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> - if (!adev->pm.dpm_enabled)
> - return 0;
> -
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;
> -#if 0 //TODO ?
> - si_dpm_powergate_uvd(adev, true);
> -#endif
>   return 0;
>  }
> 
> --
> 2.25.1


Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

2023-03-14 Thread Stefano Stabellini
On Mon, 13 Mar 2023, Jan Beulich wrote:
> On 12.03.2023 13:01, Huang Rui wrote:
> > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > virtualization support when possible. It will use the hardware IOMMU
> > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > Xen PVH.
> 
> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> away without resorting to swiotlb in certain cases (like I/O to an
> address-restricted device)?

I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
so we can use guest physical addresses instead of machine addresses for
DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
(see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
case is XENFEAT_not_direct_mapped).

Jurgen, what do you think? Would you rather make xen_swiotlb_detect
common between ARM and x86?


Re: [PATCH] drm/amdgpu: Don't resume IOMMU after incomplete init

2023-03-14 Thread Alex Deucher
On Tue, Mar 14, 2023 at 1:54 PM Felix Kuehling  wrote:
>
> Check kfd->init_complete in kgd2kfd_resume_iommu, consistent with other
> kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
> KFD IOMMU initialization failed.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
> Cc: Vasant Hegde 
> Cc: Linux regression tracking (Thorsten Leemhuis) 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Felix Kuehling 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 521dfa88aad8..989c6aa2620b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -60,6 +60,7 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned 
> int buf_size,
> unsigned int chunk_size);
>  static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
>
> +static int kfd_resume_iommu(struct kfd_dev *kfd);
>  static int kfd_resume(struct kfd_dev *kfd);
>
>  static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
> @@ -625,7 +626,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
> svm_migrate_init(kfd->adev);
>
> -   if (kgd2kfd_resume_iommu(kfd))
> +   if (kfd_resume_iommu(kfd))
> goto device_iommu_error;
>
> if (kfd_resume(kfd))
> @@ -773,6 +774,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
>  }
>
>  int kgd2kfd_resume_iommu(struct kfd_dev *kfd)
> +{
> +   if (!kfd->init_complete)
> +   return 0;
> +
> +   return kfd_resume_iommu(kfd);
> +}
> +
> +static int kfd_resume_iommu(struct kfd_dev *kfd)
>  {
> int err = 0;
>
> --
> 2.34.1
>


[PATCH] drm/amdgpu: Don't resume IOMMU after incomplete init

2023-03-14 Thread Felix Kuehling
Check kfd->init_complete in kgd2kfd_resume_iommu, consistent with other
kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
KFD IOMMU initialization failed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
Cc: Vasant Hegde 
Cc: Linux regression tracking (Thorsten Leemhuis) 
Cc: sta...@vger.kernel.org
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 521dfa88aad8..989c6aa2620b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -60,6 +60,7 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int 
buf_size,
unsigned int chunk_size);
 static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
 
+static int kfd_resume_iommu(struct kfd_dev *kfd);
 static int kfd_resume(struct kfd_dev *kfd);
 
 static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
@@ -625,7 +626,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
svm_migrate_init(kfd->adev);
 
-   if (kgd2kfd_resume_iommu(kfd))
+   if (kfd_resume_iommu(kfd))
goto device_iommu_error;
 
if (kfd_resume(kfd))
@@ -773,6 +774,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 }
 
 int kgd2kfd_resume_iommu(struct kfd_dev *kfd)
+{
+   if (!kfd->init_complete)
+   return 0;
+
+   return kfd_resume_iommu(kfd);
+}
+
+static int kfd_resume_iommu(struct kfd_dev *kfd)
 {
int err = 0;
 
-- 
2.34.1
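
The pattern in the patch above (an exported entry point that checks
init_complete, and an internal helper that device init itself can call
before init_complete is set) can be illustrated with a small userspace
sketch. This is only an illustration under stated assumptions: struct
demo_dev, demo_resume_iommu() and demo_resume_iommu_internal() are
hypothetical names that stand in for the kfd structures and functions, and
the counter exists purely so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kfd_dev; only what the sketch needs. */
struct demo_dev {
	bool init_complete;	/* set once device init finished successfully */
	int iommu_resumes;	/* counts how often the internal helper ran */
};

/*
 * Internal helper: also used while init is still in progress, so it must
 * not look at init_complete.
 */
static int demo_resume_iommu_internal(struct demo_dev *dev)
{
	dev->iommu_resumes++;
	return 0;
}

/* Exported entry point: a no-op for devices whose init never completed. */
static int demo_resume_iommu(struct demo_dev *dev)
{
	if (!dev->init_complete)
		return 0;

	return demo_resume_iommu_internal(dev);
}
```

With init_complete left false, demo_resume_iommu() returns 0 without
touching the device; once the flag is set, the call is forwarded to the
internal helper.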



Re: [PATCH] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi

2023-03-14 Thread Alex Deucher
On Tue, Mar 14, 2023 at 12:35 AM Kai-Heng Feng
 wrote:
>
> S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
> caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").
>
> The root cause is still not clear for now.
>
> So extend and apply the ASPM quirk from commit e02fe3bc7aba
> ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
> workaround the issue on Navi cards too.
>
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
>  drivers/gpu/drm/amd/amdgpu/nv.c|  2 +-
>  drivers/gpu/drm/amd/amdgpu/vi.c| 15 ---
>  4 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 164141bc8b4a..c697580f1ee4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1272,6 +1272,7 @@ void amdgpu_device_pci_config_reset(struct 
> amdgpu_device *adev);
>  int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>  bool amdgpu_device_need_post(struct amdgpu_device *adev);
>  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> +bool aspm_support_quirk_check(void);
>
>  void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
>   u64 num_vis_bytes);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4a4e2fe6681..c09f19385628 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -80,6 +80,10 @@
>
>  #include 
>
> +#if IS_ENABLED(CONFIG_X86)
> +#include 
> +#endif
> +
>  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -1356,6 +1360,17 @@ bool amdgpu_device_should_use_aspm(struct 
> amdgpu_device *adev)
> return pcie_aspm_enabled(adev->pdev);
>  }
>
> +bool aspm_support_quirk_check(void)

For consistency with naming, rename this
amdgpu_device_aspm_support_quirk().  Other than that, looks good to
me.  With that fixed:
Reviewed-by: Alex Deucher 

Alex


> +{
> +#if IS_ENABLED(CONFIG_X86)
> +   struct cpuinfo_x86 *c = &cpu_data(0);
> +
> +   return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +#else
> +   return true;
> +#endif
> +}
> +
>  /* if we get transitioned to only one device, take VGA back */
>  /**
>   * amdgpu_device_vga_set_decode - enable/disable vga decode
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 855d390c41de..921adf66e3c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -578,7 +578,7 @@ static void nv_pcie_gen3_enable(struct amdgpu_device 
> *adev)
>
>  static void nv_program_aspm(struct amdgpu_device *adev)
>  {
> -   if (!amdgpu_device_should_use_aspm(adev))
> +   if (!amdgpu_device_should_use_aspm(adev) || 
> !aspm_support_quirk_check())
> return;
>
> if (!(adev->flags & AMD_IS_APU) &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 12ef782eb478..e61ae372d674 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,10 +81,6 @@
>  #include "mxgpu_vi.h"
>  #include "amdgpu_dm.h"
>
> -#if IS_ENABLED(CONFIG_X86)
> -#include 
> -#endif
> -
>  #define ixPCIE_LC_L1_PM_SUBSTATE   0x100100C6
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK   
> 0x0001L
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK   0x0002L
> @@ -1138,17 +1134,6 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
> WREG32_PCIE(ixPCIE_LC_CNTL, data);
>  }
>
> -static bool aspm_support_quirk_check(void)
> -{
> -#if IS_ENABLED(CONFIG_X86)
> -   struct cpuinfo_x86 *c = &cpu_data(0);
> -
> -   return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> -#else
> -   return true;
> -#endif
> -}
> -
>  static void vi_program_aspm(struct amdgpu_device *adev)
>  {
> u32 data, data1, orig;
> --
> 2.34.1
>
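
The quirk check in the patch above boils down to a predicate on the CPU
family and model, combined with the existing "should use ASPM" decision. A
minimal userspace sketch of that logic follows. The names demo_cpuinfo,
demo_aspm_support_quirk() and demo_should_program_aspm() are hypothetical,
and the model value 0x97 for Alder Lake is an assumption taken from the
kernel's Intel family definitions rather than from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of INTEL_FAM6_ALDERLAKE; not stated in the thread. */
#define DEMO_INTEL_FAM6_ALDERLAKE 0x97

/* Hypothetical stand-in for the two struct cpuinfo_x86 fields used. */
struct demo_cpuinfo {
	unsigned int family;
	unsigned int model;
};

/* Returns false on the quirked platform (family 6, Alder Lake model). */
static bool demo_aspm_support_quirk(const struct demo_cpuinfo *c)
{
	return !(c->family == 6 && c->model == DEMO_INTEL_FAM6_ALDERLAKE);
}

/* ASPM is programmed only if it is enabled and the platform is not quirked. */
static bool demo_should_program_aspm(bool aspm_enabled,
				     const struct demo_cpuinfo *c)
{
	return aspm_enabled && demo_aspm_support_quirk(c);
}
```

On any CPU other than the quirked model the second function simply follows
the aspm_enabled input, which mirrors the early return added to
nv_program_aspm().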


[PATCH] drm/radeon: remove unused variable rbo

2023-03-14 Thread Tom Rix
gcc with W=1 reports this error
drivers/gpu/drm/radeon/radeon_ttm.c:201:27: error:
  variable ‘rbo’ set but not used [-Werror=unused-but-set-variable]
  201 | struct radeon_bo *rbo;
  |   ^~~

rbo use was removed with
commit f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")
Since the variable is not used, remove it.

Signed-off-by: Tom Rix 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 2220cdf6a3f6..0ea430ee5256 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -198,7 +198,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
 {
struct ttm_resource *old_mem = bo->resource;
struct radeon_device *rdev;
-   struct radeon_bo *rbo;
int r;
 
if (new_mem->mem_type == TTM_PL_TT) {
@@ -211,7 +210,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
if (r)
return r;
 
-   rbo = container_of(bo, struct radeon_bo, tbo);
rdev = radeon_get_rdev(bo->bdev);
if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
 bo->ttm == NULL)) {
-- 
2.27.0



Re: [PATCH] drm/radeon: remove unused variable rbo

2023-03-14 Thread Christian König

On 14.03.23 at 14:06, Tom Rix wrote:

gcc with W=1 reports this error
drivers/gpu/drm/radeon/radeon_ttm.c:201:27: error:
   variable ‘rbo’ set but not used [-Werror=unused-but-set-variable]
   201 | struct radeon_bo *rbo;
   |   ^~~

rbo use was removed with
commit f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")
Since the variable is not used, remove it.

Signed-off-by: Tom Rix 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/radeon_ttm.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 2220cdf6a3f6..0ea430ee5256 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -198,7 +198,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
  {
struct ttm_resource *old_mem = bo->resource;
struct radeon_device *rdev;
-   struct radeon_bo *rbo;
int r;
  
  	if (new_mem->mem_type == TTM_PL_TT) {

@@ -211,7 +210,6 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
if (r)
return r;
  
-	rbo = container_of(bo, struct radeon_bo, tbo);

rdev = radeon_get_rdev(bo->bdev);
if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
 bo->ttm == NULL)) {




[PATCH AUTOSEL 4.19 6/7] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 892077377339a..8f23192b67095 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -529,16 +529,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2
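
The effect of switching from kmalloc_array() to kcalloc() can be shown
with a userspace analogue: zero-filled allocation leaves the event pointer
NULL, so a cleanup path that checks the pointer is safe even when setup
failed early. This is a sketch under assumptions, not the driver code:
demo_waiter, demo_event, demo_alloc_waiters() and demo_free_waiters() are
hypothetical names, calloc() stands in for kcalloc(), and the NULL check in
the cleanup function is assumed to resemble what free_waiters() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event object; refs lets the cleanup be observed. */
struct demo_event {
	int refs;
};

/* Hypothetical stand-in for struct kfd_event_waiter. */
struct demo_waiter {
	struct demo_event *event;	/* NULL until the waiter is attached */
	bool activated;
};

static struct demo_waiter *demo_alloc_waiters(size_t n)
{
	/*
	 * calloc() zero-fills the array, so on the usual platforms every
	 * event pointer starts out NULL and every activated flag false.
	 */
	return calloc(n, sizeof(struct demo_waiter));
}

/* Detaches attached waiters, frees the array, returns how many detached. */
static size_t demo_free_waiters(struct demo_waiter *w, size_t n)
{
	size_t detached = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (w[i].event) {	/* safe only because of zero-fill */
			w[i].event->refs--;
			detached++;
		}
	}
	free(w);
	return detached;
}
```

If the array came from an allocator that does not zero memory, the
w[i].event test would read an indeterminate pointer, which is the failure
the commit message describes.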



[PATCH AUTOSEL 5.4 6/7] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index adbb2fec2e0f2..4fd7dcef2e382 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -529,16 +529,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2



[PATCH AUTOSEL 5.10 6/8] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 159be13ef20bb..2c19b3775179b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -528,16 +528,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2



[PATCH AUTOSEL 5.10 8/8] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes

2023-03-14 Thread Sasha Levin
From: Alex Hung 

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following
warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]

In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
assign the result directly.

Reviewed-by: Jun Lei 
Acked-by: Qingqing Zhuo 
Signed-off-by: Alex Hung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c   | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index e427f4ffa0807..e5b1002d7f3f0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1868,7 +1868,10 @@ static unsigned int CalculateVMAndRowBytes(
}
 
if (SurfaceTiling == dm_sw_linear) {
-   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+   if (PTEBufferSizeInRequests == 0)
+   *dpte_row_height = 1;
+   else
+   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / 
*PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * 
*PTERequestSize;
} else if (ScanDirection != dm_vert) {
-- 
2.39.2
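
The guard added by this patch (skip the log2 computation when the input is
zero, so no out-of-range value is ever used as a shift count) can be
sketched in plain C. The sketch is an assumption-laden simplification: it
uses integer arithmetic instead of the driver's double-based dml_log2() and
dml_floor(), the names demo_floor_log2() and demo_row_height() are
hypothetical, and the extra check for a ratio of zero is an addition of the
sketch, not part of the patch.

```c
#include <assert.h>

/* Integer floor(log2(v)); the caller must pass v > 0. */
static unsigned int demo_floor_log2(unsigned int v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Computes min(128, 2^floor(log2(buf_requests * req_width / pitch))),
 * returning 1 whenever the logarithm would be undefined.
 */
static unsigned int demo_row_height(unsigned int buf_requests,
				    unsigned int req_width,
				    unsigned int pitch)
{
	unsigned int ratio, height;

	assert(pitch > 0);

	/* Mirrors the patch: with a zero-sized buffer, assign directly. */
	if (buf_requests == 0)
		return 1;

	ratio = buf_requests * req_width / pitch;
	if (ratio == 0)		/* sketch-only guard for integer division */
		return 1;

	height = 1u << demo_floor_log2(ratio);
	return height < 128 ? height : 128;
}
```

For example, 64 requests of width 8 over a pitch of 64 give a ratio of 8
and therefore a height of 8, while a zero request count returns 1 without
evaluating the logarithm at all.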



[PATCH AUTOSEL 5.15 10/10] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes

2023-03-14 Thread Sasha Levin
From: Alex Hung 

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following
warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]

In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
assign the result directly.

Reviewed-by: Jun Lei 
Acked-by: Qingqing Zhuo 
Signed-off-by: Alex Hung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c   | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 518672a2450f4..de0fa87b301a5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1868,7 +1868,10 @@ static unsigned int CalculateVMAndRowBytes(
}
 
if (SurfaceTiling == dm_sw_linear) {
-   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+   if (PTEBufferSizeInRequests == 0)
+   *dpte_row_height = 1;
+   else
+   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / 
*PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * 
*PTERequestSize;
} else if (ScanDirection != dm_vert) {
-- 
2.39.2



[PATCH AUTOSEL 5.15 07/10] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index b8bdd796cd911..8b5c82af2acd7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -528,16 +528,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2



[PATCH AUTOSEL 6.1 13/13] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes

2023-03-14 Thread Sasha Levin
From: Alex Hung 

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following
warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]

In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
assign the result directly.

Reviewed-by: Jun Lei 
Acked-by: Qingqing Zhuo 
Signed-off-by: Alex Hung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c   | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 479e2c1a13018..49da8119b28e9 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1802,7 +1802,10 @@ static unsigned int CalculateVMAndRowBytes(
}
 
if (SurfaceTiling == dm_sw_linear) {
-   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+   if (PTEBufferSizeInRequests == 0)
+   *dpte_row_height = 1;
+   else
+   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / 
*PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * 
*PTERequestSize;
} else if (ScanDirection != dm_vert) {
-- 
2.39.2
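
For readers following the [WHY]/[HOW] above, here is a minimal userspace sketch of the same guard. It is not the DML code: row_height_sketch() and its integer arithmetic are hypothetical stand-ins for the dml_min/dml_floor/dml_log2 expression, and the sketch assumes a non-zero pitch.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the dpte_row_height expression.
 * Returns min(128, largest power of two <= buf_reqs * req_width / pitch),
 * with the zero case handled up front as in the patch.
 */
static unsigned int row_height_sketch(unsigned int buf_reqs,
				      unsigned int req_width,
				      unsigned int pitch)
{
	unsigned int ratio, height = 1;

	assert(pitch != 0);	/* assumption of this sketch */

	/* log2(0) is undefined, so never derive a shift count from it. */
	if (buf_reqs == 0)
		return 1;

	ratio = buf_reqs * req_width / pitch;

	/* Double until the next step would exceed ratio or the cap of 128. */
	while (height < 128 && height * 2 <= ratio)
		height *= 2;

	return height;
}
```

The point is only that the zero input is answered before any logarithm or shift is evaluated, so no out-of-range shift exponent can be produced from it.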



[PATCH AUTOSEL 6.1 12/13] drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini

2023-03-14 Thread Sasha Levin
From: Horatio Zhang 

[ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ]

The call trace occurs when amdgpu is removed after a mode1 reset.
During mode1 reset, from suspend to resume, there is no need to
reinitialize the TA firmware buffer; doing so redundantly increases
the BO pin_count.

[  489.885525] Call Trace:
[  489.885525]  
[  489.885526]  amdttm_bo_put+0x34/0x50 [amdttm]
[  489.885529]  amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[  489.885620]  psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[  489.885720]  psp_hw_fini+0xce/0x170 [amdgpu]
[  489.885815]  amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[  489.885960]  ? blocking_notifier_chain_unregister+0x56/0xb0
[  489.885962]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[  489.886049]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[  489.886132]  ? __pm_runtime_resume+0x60/0x90
[  489.886134]  pci_device_remove+0x3e/0xb0
[  489.886135]  __device_release_driver+0x1ab/0x2a0
[  489.886137]  driver_detach+0xf3/0x140
[  489.886138]  bus_remove_driver+0x6c/0xf0
[  489.886140]  driver_unregister+0x31/0x60
[  489.886141]  pci_unregister_driver+0x40/0x90
[  489.886142]  amdgpu_exit+0x15/0x451 [amdgpu]

Signed-off-by: Horatio Zhang 
Signed-off-by: longlyao 
Reviewed-by: Guchun Chen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 087147f09933a..3b8825a3e2336 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1695,7 +1695,7 @@ static int psp_hdcp_initialize(struct psp_context *psp)
psp->hdcp_context.context.mem_context.shared_mem_size = 
PSP_HDCP_SHARED_MEM_SIZE;
psp->hdcp_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->hdcp_context.context.initialized) {
+   if (!psp->hdcp_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->hdcp_context.context.mem_context);
if (ret)
return ret;
@@ -1762,7 +1762,7 @@ static int psp_dtm_initialize(struct psp_context *psp)
psp->dtm_context.context.mem_context.shared_mem_size = 
PSP_DTM_SHARED_MEM_SIZE;
psp->dtm_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->dtm_context.context.initialized) {
+   if (!psp->dtm_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->dtm_context.context.mem_context);
if (ret)
return ret;
@@ -1830,7 +1830,7 @@ static int psp_rap_initialize(struct psp_context *psp)
psp->rap_context.context.mem_context.shared_mem_size = 
PSP_RAP_SHARED_MEM_SIZE;
psp->rap_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->rap_context.context.initialized) {
+   if (!psp->rap_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->rap_context.context.mem_context);
if (ret)
return ret;
-- 
2.39.2
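
The change of condition in the three hunks above can be illustrated with a small userspace model. Everything here is hypothetical (struct ta_ctx_sketch, the helper names, the use of malloc in place of a pinned BO); it assumes, as the commit message suggests, that the initialized flag is cleared across the reset while the shared buffer survives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a TA context; not the amdgpu structs. */
struct ta_ctx_sketch {
	int initialized;	/* cleared across the reset in this model */
	void *shared_buf;	/* survives the reset */
	int pin_count;		/* how many times the buffer was pinned */
};

/* Allocate the buffer if needed and take one pin reference on it. */
static int init_shared_buf_sketch(struct ta_ctx_sketch *ctx)
{
	if (!ctx->shared_buf) {
		ctx->shared_buf = malloc(64);
		if (!ctx->shared_buf)
			return -1;
	}
	ctx->pin_count++;
	return 0;
}

/* Old check: keyed on the flag, so it runs again after a reset. */
static int ta_initialize_old(struct ta_ctx_sketch *ctx)
{
	if (!ctx->initialized) {
		int ret = init_shared_buf_sketch(ctx);

		if (ret)
			return ret;
	}
	ctx->initialized = 1;
	return 0;
}

/* New check: keyed on the buffer itself, so an existing buffer is reused. */
static int ta_initialize_new(struct ta_ctx_sketch *ctx)
{
	if (!ctx->shared_buf) {
		int ret = init_shared_buf_sketch(ctx);

		if (ret)
			return ret;
	}
	ctx->initialized = 1;
	return 0;
}

/* Init, clear only the flag (the modelled reset), init again; report pins. */
static int pins_after_reset_sketch(int use_new_check)
{
	struct ta_ctx_sketch ctx = { 0, NULL, 0 };
	int (*init)(struct ta_ctx_sketch *) =
		use_new_check ? ta_initialize_new : ta_initialize_old;
	int pins;

	if (init(&ctx))
		return -1;
	ctx.initialized = 0;	/* suspend/resume during the modelled reset */
	if (init(&ctx)) {
		free(ctx.shared_buf);
		return -1;
	}
	pins = ctx.pin_count;
	free(ctx.shared_buf);
	return pins;
}
```

In this model the old check leaves the buffer pinned twice after one reset, while the new check leaves it pinned once, which is the imbalance the call trace points at.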



[PATCH AUTOSEL 6.1 08/13] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d648af3..2880ed96ac2e3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -778,16 +778,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2
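
As a rough illustration of why zeroed allocation is enough here, the following userspace sketch uses calloc() in the role of kcalloc(). The struct and function names are hypothetical and heavily trimmed; the cleanup routine only mirrors the idea that an error path may run before any event pointer was ever stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down waiter; the real struct has more fields. */
struct waiter_sketch {
	void *event;	/* NULL means "not attached to any event" */
	int activated;
};

/*
 * calloc() plays the role of kcalloc(): every field starts out as zero
 * (all-bits-zero reads as NULL on the platforms Linux supports).
 */
static struct waiter_sketch *alloc_waiters_sketch(size_t n)
{
	return calloc(n, sizeof(struct waiter_sketch));
}

/*
 * Cleanup that, like an error path, may run before any event was attached.
 * It only touches waiters whose event pointer is set and returns how many
 * it had to detach.
 */
static size_t free_waiters_sketch(struct waiter_sketch *w, size_t n)
{
	size_t i, detached = 0;

	for (i = 0; i < n; i++) {
		if (w[i].event) {
			w[i].event = NULL;
			detached++;
		}
	}
	free(w);
	return detached;
}
```

With an uninitialized allocation the `if (w[i].event)` test would read garbage and could follow a stray pointer; with the zeroed allocation the early cleanup simply finds nothing to detach.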



[PATCH AUTOSEL 6.2 13/13] drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes

2023-03-14 Thread Sasha Levin
From: Alex Hung 

[ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]

[WHY]
When PTEBufferSizeInRequests is zero, UBSAN reports the following
warning because dml_log2 returns an unexpected negative value:

  shift exponent 4294966273 is too large for 32-bit type 'int'

[HOW]

In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
assign the result directly.

Reviewed-by: Jun Lei 
Acked-by: Qingqing Zhuo 
Signed-off-by: Alex Hung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c   | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 379729b028474..c3d75e56410cc 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -1802,7 +1802,10 @@ static unsigned int CalculateVMAndRowBytes(
}
 
if (SurfaceTiling == dm_sw_linear) {
-   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
+   if (PTEBufferSizeInRequests == 0)
+   *dpte_row_height = 1;
+   else
+   *dpte_row_height = dml_min(128, 1 << (unsigned int) 
dml_floor(dml_log2(PTEBufferSizeInRequests * *PixelPTEReqWidth / Pitch), 1));
*dpte_row_width_ub = (dml_ceil(((double) SwathWidth - 1) / 
*PixelPTEReqWidth, 1) + 1) * *PixelPTEReqWidth;
*PixelPTEBytesPerRow = *dpte_row_width_ub / *PixelPTEReqWidth * 
*PTERequestSize;
} else if (ScanDirection != dm_vert) {
-- 
2.39.2



[PATCH AUTOSEL 6.2 12/13] drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini

2023-03-14 Thread Sasha Levin
From: Horatio Zhang 

[ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ]

The call trace occurs when amdgpu is removed after a mode1 reset.
During mode1 reset, from suspend to resume, there is no need to
reinitialize the TA firmware buffer; doing so redundantly increases
the BO pin_count.

[  489.885525] Call Trace:
[  489.885525]  
[  489.885526]  amdttm_bo_put+0x34/0x50 [amdttm]
[  489.885529]  amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[  489.885620]  psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[  489.885720]  psp_hw_fini+0xce/0x170 [amdgpu]
[  489.885815]  amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[  489.885960]  ? blocking_notifier_chain_unregister+0x56/0xb0
[  489.885962]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[  489.886049]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[  489.886132]  ? __pm_runtime_resume+0x60/0x90
[  489.886134]  pci_device_remove+0x3e/0xb0
[  489.886135]  __device_release_driver+0x1ab/0x2a0
[  489.886137]  driver_detach+0xf3/0x140
[  489.886138]  bus_remove_driver+0x6c/0xf0
[  489.886140]  driver_unregister+0x31/0x60
[  489.886141]  pci_unregister_driver+0x40/0x90
[  489.886142]  amdgpu_exit+0x15/0x451 [amdgpu]

Signed-off-by: Horatio Zhang 
Signed-off-by: longlyao 
Reviewed-by: Guchun Chen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ba092072308fa..1b4105110f398 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1685,7 +1685,7 @@ static int psp_hdcp_initialize(struct psp_context *psp)
psp->hdcp_context.context.mem_context.shared_mem_size = 
PSP_HDCP_SHARED_MEM_SIZE;
psp->hdcp_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->hdcp_context.context.initialized) {
+   if (!psp->hdcp_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->hdcp_context.context.mem_context);
if (ret)
return ret;
@@ -1752,7 +1752,7 @@ static int psp_dtm_initialize(struct psp_context *psp)
psp->dtm_context.context.mem_context.shared_mem_size = 
PSP_DTM_SHARED_MEM_SIZE;
psp->dtm_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->dtm_context.context.initialized) {
+   if (!psp->dtm_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->dtm_context.context.mem_context);
if (ret)
return ret;
@@ -1820,7 +1820,7 @@ static int psp_rap_initialize(struct psp_context *psp)
psp->rap_context.context.mem_context.shared_mem_size = 
PSP_RAP_SHARED_MEM_SIZE;
psp->rap_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-   if (!psp->rap_context.context.initialized) {
+   if (!psp->rap_context.context.mem_context.shared_buf) {
ret = psp_ta_init_shared_buf(psp, 
&psp->rap_context.context.mem_context);
if (ret)
return ret;
-- 
2.39.2



[PATCH AUTOSEL 6.2 08/13] drm/amdkfd: Fix an illegal memory access

2023-03-14 Thread Sasha Levin
From: Qu Huang 

[ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]

In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the error path releases the previously allocated
waiter structures. Because free_waiters() accesses the uninitialized
event field of each waiter, this results in an illegal memory access
and a system crash. Here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 
002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: 
e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: 
aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 
0002
localhost kernel: R13: 9e7ead15d600 R14:  R15: 
9e7ead15d698
localhost kernel: FS:  152a3d111700() GS:9e855ee8() 
knlGS:
localhost kernel: CS:  0010 DS:  ES:  CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 
003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check.

Signed-off-by: Qu Huang 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d648af3..2880ed96ac2e3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -778,16 +778,13 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
struct kfd_event_waiter *event_waiters;
uint32_t i;
 
-   event_waiters = kmalloc_array(num_events,
-   sizeof(struct kfd_event_waiter),
-   GFP_KERNEL);
+   event_waiters = kcalloc(num_events, sizeof(struct kfd_event_waiter),
+   GFP_KERNEL);
if (!event_waiters)
return NULL;
 
-   for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+   for (i = 0; i < num_events; i++)
init_wait(&event_waiters[i].wait);
-   event_waiters[i].activated = false;
-   }
 
return event_waiters;
 }
-- 
2.39.2



RE: [PATCH] drm/amdgpu: skip ASIC reset for GC IP v11.0.4/11 when go to S4

2023-03-14 Thread Huang, Tim
[AMD Official Use Only - General]

Please ignore this patch; I will send out a new one to skip ASIC reset for all 
APUs. Thanks.

-Original Message-
From: Huang, Tim 
Sent: Monday, March 13, 2023 7:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Du, Xiaojian ; Ma, Li 
; Limonciello, Mario ; Huang, Tim 

Subject: [PATCH] drm/amdgpu: skip ASIC reset for GC IP v11.0.4/11 when go to S4

[Why]
For GC IP v11.0.4/11, the PSP TMR needs to be reserved for ASIC mode2 reset. But for 
S4, PSP suspend destroys the TMR, which makes the ASIC reset fail.

[  96.006101] amdgpu :62:00.0: amdgpu: MODE2 reset
[  100.409717] amdgpu :62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0011 SMN_C2PMSG_82:0x0002
[  100.411593] amdgpu :62:00.0: amdgpu: Mode2 reset failed!
[  100.412470] amdgpu :62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62
[  100.414020] amdgpu :62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62
[  100.415311] amdgpu :62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs
[  100.416608] amdgpu :62:00.0: PM: failed to freeze async: error -62

[How]
Skip the ASIC reset for S4, assuming we can resume properly without reset.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
index 8fa9a36c38b6..ba02b0d9ef7e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
@@ -980,6 +980,8 @@ static int smu_v13_0_4_set_performance_level(struct 
smu_context *smu,

 static int smu_v13_0_4_mode2_reset(struct smu_context *smu)  {
+   if (!amdgpu_in_reset(smu->adev)) /* Skip the reset for S4 */
+   return 0;
return smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_GfxDeviceDriverReset,
   SMU_RESET_MODE_2, NULL);
 }
--
2.25.1



RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland

2023-03-14 Thread Chen, Guchun
[AMD Official Use Only - General]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, March 14, 2023 5:07 PM
> To: Chen, Guchun ; Zhenneng Li
> 
> Cc: David Airlie ; Pan, Xinhui ;
> amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher,
> Alexander ; Koenig, Christian
> 
> Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
>
> [AMD Official Use Only - General]
>
> Hi Guchun,
>
> This patch doesn't look correct. Without dpm enabled, the temperature range
> shouldn't be set at all. The patch posted by Zhenneng is good enough; better
> still, skip late init altogether, as it remains an empty function with that
> patch.

My intention is to prevent setting temperature range again in late_init, as in 
hw_init prior to late_init, we have configured this range and set dpm_enabled 
to true already. Also this is a draft patch:)

Leaving a NULL function in late_init looks good to me.

Regards,
Guchun
> Thanks,
> Lijo
>
> -Original Message-
> From: amd-gfx  On Behalf Of Chen,
> Guchun
> Sent: Tuesday, March 14, 2023 6:35 AM
> To: Zhenneng Li 
> Cc: David Airlie ; Pan, Xinhui ;
> amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher,
> Alexander ; Koenig, Christian
> 
> Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
>
> Will attached patch help?
>
> Regards,
> Guchun
>
> > -Original Message-
> > From: Zhenneng Li 
> > Sent: Monday, March 13, 2023 10:57 AM
> > To: Chen, Guchun 
> > Cc: Deucher, Alexander ; Koenig, Christian
> > ; Pan, Xinhui ; David
> > Airlie ; Daniel Vetter ; amd-
> > g...@lists.freedesktop.org; Zhenneng Li 
> > Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
> >
> > During reboot testing on an arm64 platform, boot may fail.
> >
> > The error messages are as follows:
> > [6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> > *ERROR*
> > late_init of IP block  failed -22
> > [7.006919][ 7] [  T295] amdgpu :04:00.0:
> amdgpu_device_ip_late_init
> > failed
> > [7.014224][ 7] [  T295] amdgpu :04:00.0: Fatal error during GPU init
> > ---
> >  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 
> >  1 file changed, 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > index d6d9e3b1b2c0..ca9bce895dbe 100644
> > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct
> > amdgpu_device *adev,
> >
> >  static int si_dpm_late_init(void *handle)  {
> > -   int ret;
> > -   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > -
> > -   if (!adev->pm.dpm_enabled)
> > -   return 0;
> > -
> > -   ret = si_set_temperature_range(adev);
> > -   if (ret)
> > -   return ret;
> > -#if 0 //TODO ?
> > -   si_dpm_powergate_uvd(adev, true);
> > -#endif
> > return 0;
> >  }
> >
> > --
> > 2.25.1


RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland

2023-03-14 Thread Lazar, Lijo
[AMD Official Use Only - General]

Hi Guchun,

This patch doesn't look correct. Without dpm enabled, the temperature range 
shouldn't be set at all. The patch posted by Zhenneng is good enough; better 
still, skip late init altogether, as it remains an empty function with that patch.

Thanks,
Lijo

-Original Message-
From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Tuesday, March 14, 2023 6:35 AM
To: Zhenneng Li 
Cc: David Airlie ; Pan, Xinhui ; 
amd-gfx@lists.freedesktop.org; Daniel Vetter ; Deucher, 
Alexander ; Koenig, Christian 

Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland

Will attached patch help?

Regards,
Guchun

> -Original Message-
> From: Zhenneng Li 
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun 
> Cc: Deucher, Alexander ; Koenig, Christian 
> ; Pan, Xinhui ; David 
> Airlie ; Daniel Vetter ; amd- 
> g...@lists.freedesktop.org; Zhenneng Li 
> Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
> 
> During reboot testing on an arm64 platform, boot may fail.
> 
> The error messages are as follows:
> [6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
>   late_init of IP block  failed -22
> [7.006919][ 7] [  T295] amdgpu :04:00.0: amdgpu_device_ip_late_init
> failed
> [7.014224][ 7] [  T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct 
> amdgpu_device *adev,
> 
>  static int si_dpm_late_init(void *handle)  {
> - int ret;
> - struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> - if (!adev->pm.dpm_enabled)
> - return 0;
> -
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;
> -#if 0 //TODO ?
> - si_dpm_powergate_uvd(adev, true);
> -#endif
>   return 0;
>  }
> 
> --
> 2.25.1


[PATCH] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi

2023-03-14 Thread Kai-Heng Feng
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").

The root cause is still not clear for now.

So extend and apply the ASPM quirk from commit e02fe3bc7aba
("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems") to
work around the issue on Navi cards too.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
Signed-off-by: Kai-Heng Feng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
 drivers/gpu/drm/amd/amdgpu/nv.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/vi.c| 15 ---
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 164141bc8b4a..c697580f1ee4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1272,6 +1272,7 @@ void amdgpu_device_pci_config_reset(struct amdgpu_device 
*adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
 bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
+bool aspm_support_quirk_check(void);
 
 void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
  u64 num_vis_bytes);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c4a4e2fe6681..c09f19385628 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -80,6 +80,10 @@
 
 #include 
 
+#if IS_ENABLED(CONFIG_X86)
+#include <asm/intel-family.h>
+#endif
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -1356,6 +1360,17 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device 
*adev)
return pcie_aspm_enabled(adev->pdev);
 }
 
+bool aspm_support_quirk_check(void)
+{
+#if IS_ENABLED(CONFIG_X86)
+   struct cpuinfo_x86 *c = &cpu_data(0);
+
+   return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
+#else
+   return true;
+#endif
+}
+
 /* if we get transitioned to only one device, take VGA back */
 /**
  * amdgpu_device_vga_set_decode - enable/disable vga decode
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 855d390c41de..921adf66e3c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -578,7 +578,7 @@ static void nv_pcie_gen3_enable(struct amdgpu_device *adev)
 
 static void nv_program_aspm(struct amdgpu_device *adev)
 {
-   if (!amdgpu_device_should_use_aspm(adev))
+   if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
return;
 
if (!(adev->flags & AMD_IS_APU) &&
diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 12ef782eb478..e61ae372d674 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -81,10 +81,6 @@
 #include "mxgpu_vi.h"
 #include "amdgpu_dm.h"
 
-#if IS_ENABLED(CONFIG_X86)
-#include <asm/intel-family.h>
-#endif
-
 #define ixPCIE_LC_L1_PM_SUBSTATE   0x100100C6
 #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK   
0x0001L
 #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK   0x0002L
@@ -1138,17 +1134,6 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
WREG32_PCIE(ixPCIE_LC_CNTL, data);
 }
 
-static bool aspm_support_quirk_check(void)
-{
-#if IS_ENABLED(CONFIG_X86)
-   struct cpuinfo_x86 *c = &cpu_data(0);
-
-   return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
-#else
-   return true;
-#endif
-}
-
 static void vi_program_aspm(struct amdgpu_device *adev)
 {
u32 data, data1, orig;
-- 
2.34.1
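
The effect of the nv.c hunk above is that two independent conditions must both allow ASPM before it is programmed. A minimal sketch of that combination follows; struct cpu_sketch, SKETCH_MODEL_ALDERLAKE and both function names are hypothetical, and the model number is a placeholder rather than the real INTEL_FAM6_ALDERLAKE value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU description; the driver reads this from cpu_data(0). */
struct cpu_sketch {
	int family;
	int model;
};

/* Placeholder value for this sketch only, not the real INTEL_FAM6_ALDERLAKE. */
#define SKETCH_MODEL_ALDERLAKE 1

/* Mirrors the quirk's shape: true means ASPM may be programmed. */
static bool aspm_quirk_ok_sketch(const struct cpu_sketch *c)
{
	return !(c->family == 6 && c->model == SKETCH_MODEL_ALDERLAKE);
}

/* Mirrors the new early return in nv_program_aspm(): both checks must pass. */
static bool should_program_aspm_sketch(bool aspm_enabled,
				       const struct cpu_sketch *c)
{
	return aspm_enabled && aspm_quirk_ok_sketch(c);
}
```

Keeping the quirk as a separate predicate, as the patch does by moving it into amdgpu_device.c, lets both the VI and NV code paths share one platform check.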



Re: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland

2023-03-14 Thread 李真能
The attached patch changes the code logic: if adev->pm.dpm_enabled is false, si_set_temperature_range(...) will be called, which is obviously wrong.

 

Subject: RE: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
Date: 2023-03-14 09:04
From: Chen, Guchun
To: 李真能



Will attached patch help?

Regards,
Guchun

> -Original Message-
> From: Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ; David
> Airlie ; Daniel Vetter ; amd-
> g...@lists.freedesktop.org; Zhenneng Li
> Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
>
> During reboot testing on an arm64 platform, boot may fail.
>
> The error messages are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
> late_init of IP block failed -22
> [ 7.006919][ 7] [ T295] amdgpu :04:00.0: amdgpu_device_ip_late_init
> failed
> [ 7.014224][ 7] [ T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 
> 1 file changed, 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct
> amdgpu_device *adev,
>
> static int si_dpm_late_init(void *handle) {
> - int ret;
> - struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> - if (!adev->pm.dpm_enabled)
> - return 0;
> -
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;
> -#if 0 //TODO ?
> - si_dpm_powergate_uvd(adev, true);
> -#endif
> return 0;
> }
>
> --
> 2.25.1




RE: [PATCH] drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV

2023-03-14 Thread Chen, Horace
[AMD Official Use Only - General]

Reviewed-by: Horace Chen 

-Original Message-
From: Yifan Zha 
Sent: Monday, March 6, 2023 3:25 PM
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Zhang, Hawking 
Cc: Chen, Horace ; Chang, HaiJun ; 
Zha, YiFan(Even) 
Subject: [PATCH] drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting 
under SRIOV

[Why]
If the mmhub VM contexts are disabled (MMVM_CONTEXTS_DISABLE set to 0x), driver 
loading fails on the VF because the fence fallback timer expires on all rings.
FLR cannot reset MMVM_CONTEXTS_DISABLE, so the VF cannot be recovered
unless a whole GPU reset is triggered.

[How]
Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 6 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c | 3 +++
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 0305b660cd17..fad3034b35ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -104,6 +104,8 @@ struct amdgpu_vmhub {
uint32_tvm_cntx_cntl_vm_fault;
uint32_tvm_l2_bank_select_reserved_cid2;

+   uint32_tvm_contexts_disable;
+
const struct amdgpu_vmhub_funcs *vmhub_funcs;  };

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 0a31a341aa43..7481f2f2804c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -875,6 +875,12 @@ static int gmc_v11_0_sw_fini(void *handle)

 static void gmc_v11_0_init_golden_registers(struct amdgpu_device *adev)  {
+   if (amdgpu_sriov_vf(adev)) {
+   struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_MMHUB_0];
+
+   WREG32(hub->vm_contexts_disable, 0);
+   return;
+   }
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c
index 164948c50ac3..17a792616979 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c
@@ -517,6 +517,9 @@ static void mmhub_v3_0_init(struct amdgpu_device *adev)
 	hub->vm_l2_bank_select_reserved_cid2 =
 		SOC15_REG_OFFSET(MMHUB, 0, regMMVM_L2_BANK_SELECT_RESERVED_CID2);
 
+	hub->vm_contexts_disable =
+		SOC15_REG_OFFSET(MMHUB, 0, regMMVM_CONTEXTS_DISABLE);
+
 	hub->vmhub_funcs = &mmhub_v3_0_vmhub_funcs;
 }

--
2.25.1
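
For readers following the [How] above, here is a minimal user-space sketch of
the pattern the patch relies on: the hub init caches a register offset, and the
golden-register init writes 0 to that offset only when running as an SR-IOV VF.
Every toy_/TOY_ name is hypothetical and only stands in for the amdgpu
structures, WREG32() and SOC15_REG_OFFSET(); this is an illustration under
those assumptions, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file size and offset. In the patch the offset is
 * computed by SOC15_REG_OFFSET(MMHUB, 0, regMMVM_CONTEXTS_DISABLE). */
#define TOY_REG_COUNT 16u
#define TOY_REG_CONTEXTS_DISABLE 5u

/* Stand-in for struct amdgpu_vmhub: caches the register offset. */
struct toy_vmhub {
	uint32_t vm_contexts_disable;
};

/* Stand-in for struct amdgpu_device with a fake MMIO array. */
struct toy_device {
	bool is_sriov_vf;
	uint32_t mmio[TOY_REG_COUNT];
	struct toy_vmhub mmhub;
};

/* Stand-in for WREG32(): bounds-checked store into the fake MMIO array. */
static void toy_wreg32(struct toy_device *dev, uint32_t offset, uint32_t val)
{
	if (offset < TOY_REG_COUNT)
		dev->mmio[offset] = val;
}

/* Mirrors the mmhub init hunk: record where the register lives. */
static void toy_mmhub_init(struct toy_device *dev)
{
	dev->mmhub.vm_contexts_disable = TOY_REG_CONTEXTS_DISABLE;
}

/* Mirrors the golden-register hunk: only a VF clears the register. */
static void toy_init_golden_registers(struct toy_device *dev)
{
	if (!dev->is_sriov_vf)
		return;

	toy_wreg32(dev, dev->mmhub.vm_contexts_disable, 0);
}
```

Calling toy_mmhub_init() and then toy_init_golden_registers() on a device with
is_sriov_vf set clears the fake register; on a non-VF device the register is
left untouched, which matches the early return under amdgpu_sriov_vf() in the
patch.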



RE: [PATCH v2] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs

2023-03-14 Thread Wang, YuBiao
Hi Luben,

I have to ping you because we currently have a P1 ticket on this issue. 
Could you please give a rough estimate of when you can confirm whether this 
patch is safe? Thanks a lot for helping to double-check this.

Regards & Thanks,
Yubiao 

-----Original Message-----
From: Tuikov, Luben  
Sent: Saturday, March 11, 2023 12:56 AM
To: Wang, YuBiao ; amd-gfx@lists.freedesktop.org
Cc: Quan, Evan ; Chen, Horace ; Koenig, 
Christian ; Deucher, Alexander 
; Zhang, Hawking ; Liu, Monk 
; Xu, Feifei ; Wang, Yang(Kevin) 

Subject: Re: [PATCH v2] drm/amdgpu: Force signal hw_fences that are embedded in 
non-sched jobs

On 2023-03-08 21:27, YuBiao Wang wrote:
> v2: Add comments to clarify in the code.
> 
> [Why]
> For engines not supporting soft reset, i.e. VCN, there will be a 
> failed ib test before mode 1 reset during asic reset. The fences in 
> this case are never signaled and next time when we try to free the 
> sa_bo, kernel will hang.
> 
> [How]
> During pre_asic_reset, driver will clear job fences and afterwards the 
> fences' refcount will be reduced to 1. For drm_sched_jobs it will be 
> released in job_free_cb, and for non-sched jobs like ib_test, it's 
> meant to be released in sa_bo_free but only when the fences are 
> signaled. So we have to force signal the non_sched bad job's fence 
> during pre_asic_reset or the clear is not complete.
> 
> Signed-off-by: YuBiao Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index faff4a3f96e6..ad7c5b70c35a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct 
> amdgpu_ring *ring)  {
>   int i;
>   struct dma_fence *old, **ptr;
> + struct amdgpu_job *job;
>  
>   for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
>   ptr = &ring->fence_drv.fences[i];
> @@ -680,6 +681,13 @@ void amdgpu_fence_driver_clear_job_fences(struct 
> amdgpu_ring *ring)
>   if (old && old->ops == &amdgpu_job_fence_ops) {
>   RCU_INIT_POINTER(*ptr, NULL);
>   dma_fence_put(old);
> + /* For non-sched bad job, i.e. failed ib test, we need to force
> +  * signal it right here or we won't be able to track them in fence drv
> +  * and they will remain unsignaled during sa_bo free.
> +  */
> + job = container_of(old, struct amdgpu_job, hw_fence);
> + if (!job->base.s_fence && !dma_fence_is_signaled(old))
> + dma_fence_signal(old);

Conceptually, I don't mind this patch for what it does. The only thing which 
worries me is this check here, !job->base.s_fence, which is used here to 
qualify that we can signal the fence (and of course that the fence is not yet 
signalled.) We need to audit this check to make sure that it is not overloaded 
to mean other things. I'll take a look.

>   }
>   }
>  }
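
To make the check under discussion concrete, here is a small user-space sketch
of the same shape: recover the enclosing job from its embedded fence with a
container_of-style macro, then force-signal only when the job has no scheduler
fence and the fence is still unsignaled. The toy_ types and helpers are
hypothetical stand-ins for struct amdgpu_job, struct dma_fence,
dma_fence_is_signaled() and dma_fence_signal(); this is a sketch under those
assumptions, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the structure that embeds it. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct dma_fence; only tracks the signaled state. */
struct toy_fence {
	bool signaled;
};

/* Stand-in for struct amdgpu_job: s_fence is NULL for non-sched jobs
 * (for example an ib test), hw_fence is embedded in the job. */
struct toy_job {
	void *s_fence;
	struct toy_fence hw_fence;
};

/* Returns true if the fence was force-signaled by this call. */
static bool toy_force_signal_if_unscheduled(struct toy_fence *fence)
{
	struct toy_job *job = toy_container_of(fence, struct toy_job, hw_fence);

	if (job->s_fence == NULL && !fence->signaled) {
		fence->signaled = true;
		return true;
	}
	return false;
}
```

In this sketch a job with s_fence == NULL gets its fence signaled exactly once,
while a job that carries a scheduler fence is left alone. The sketch only
models the single meaning "no scheduler fence means non-sched job"; whether
!job->base.s_fence can also be true in other situations is the question the
audit mentioned above would have to settle.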

--
Regards,
Luben