[PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test

2024-02-13 Thread Arunpravin Paneer Selvam
From: Matthew Auld 

Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Signed-off-by: Matthew Auld 
Cc: Arunpravin Paneer Selvam 
Cc: Mario Limonciello 
Cc: Christian König 
Reviewed-by: Arunpravin Paneer Selvam 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++
 1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c
index ea2af6bd9abe..4215d8b5fcf0 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size)
return (1 << order) * chunk_size;
 }
 
+static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+   u64 mm_size, ps = SZ_4K, i, n_pages, total;
+   struct drm_buddy_block *block;
+   struct drm_buddy mm;
+   LIST_HEAD(left);
+   LIST_HEAD(middle);
+   LIST_HEAD(right);
+   LIST_HEAD(allocated);
+
+   mm_size = 16 * 3 * SZ_4K;
+
+   KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+   /*
+* Idea is to fragment the address space by alternating block
+* allocations between three different lists; one for left, middle and
+* right. We can then free a list to simulate fragmentation. In
+* particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+* including the try_harder path.
+*/
+
+   i = 0;
+   n_pages = mm_size / ps;
+   do {
+   struct list_head *list;
+   int slot = i % 3;
+
+   if (slot == 0)
+   list = 
+   else if (slot == 1)
+   list = 
+   else
+   list = 
+   KUNIT_ASSERT_FALSE_MSG(test,
+  drm_buddy_alloc_blocks(, 0, mm_size,
+ ps, ps, list, 0),
+  "buddy_alloc hit an error size=%d\n",
+  ps);
+   } while (++i < n_pages);
+
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%d\n", 3 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  2 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   /*
+* At this point we should have enough contiguous space for 2 blocks,
+* however they are never buddies (since we freed middle and right) so
+* will require the try_harder logic to find them.
+*/
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  2 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 3 * ps);
+
+   total = 0;
+   list_for_each_entry(block, , link)
+   total += drm_buddy_block_size(, block);
+
+   KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+   drm_buddy_free_list(, );
+   drm_buddy_fini();
+}
+
 static void drm_test_buddy_alloc_pathological(struct kunit *test)
 {
u64 mm_size, size, start = 0;
@@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = {

[PATCH 1/2] drm/buddy: Fix alloc_range() error handling code

2024-02-13 Thread Arunpravin Paneer Selvam
Few users have observed display corruption when they boot
the machine to KDE Plasma or playing games. We have root
caused the problem that whenever alloc_range() couldn't
find the required memory blocks the function was returning
SUCCESS in some of the corner cases.

The right approach would be if the total allocated size
is less than the required size, the function should
return -ENOSPC.

Cc:  # 6.7+
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Tested-by: Mario Limonciello 
Link: 
https://patchwork.kernel.org/project/dri-devel/patch/20240207174456.341121-1-arunpravin.paneersel...@amd.com/
Acked-by: Christian König 
Reviewed-by: Matthew Auld 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/drm_buddy.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..c1a99bf4dffd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
} while (1);
 
list_splice_tail(, blocks);
+
+   if (total_allocated < size) {
+   err = -ENOSPC;
+   goto err_free;
+   }
+
return 0;
 
 err_undo:

base-commit: 2c80a2b715df75881359d07dbaacff8ad411f40e
-- 
2.25.1



Coverity: jpeg_v4_0_5_start(): Code maintainability issues

2024-02-13 Thread coverity-bot
Hello!

This is an experimental semi-automated report about issues detected by
Coverity from a scan of next-20240213 as part of the linux-next scan project:
https://scan.coverity.com/projects/linux-next-weekly-scan

You're getting this email because you were associated with the identified
lines of code (noted below) that were touched by commits:

  Mon Feb 12 16:09:32 2024 -0500
0a119d53f74a ("drm/amdgpu/jpeg: add support for jpeg DPG mode")

Coverity reported the following:

*** CID 1583635:  Code maintainability issues  (UNUSED_VALUE)
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c:461 in jpeg_v4_0_5_start()
455
456 WREG32_SOC15(VCN, i, regVCN_JPEG_DB_CTRL,
457 ring->doorbell_index << 
VCN_JPEG_DB_CTRL__OFFSET__SHIFT |
458 VCN_JPEG_DB_CTRL__EN_MASK);
459
460 if (adev->pg_flags & AMD_PG_SUPPORT_JPEG_DPG) {
vvv CID 1583635:  Code maintainability issues  (UNUSED_VALUE)
vvv Assigning value from "jpeg_v4_0_5_start_dpg_mode(adev, i, 
adev->jpeg.indirect_sram)" to "r" here, but that stored value is overwritten 
before it can be used.
461 r = jpeg_v4_0_5_start_dpg_mode(adev, i, 
adev->jpeg.indirect_sram);
462 continue;
463 }
464
465 /* disable power gating */
466 r = jpeg_v4_0_5_disable_static_power_gating(adev, i);

If this is a false positive, please let us know so we can mark it as
such, or teach the Coverity rules to be smarter. If not, please make
sure fixes get into linux-next. :) For patches fixing this, please
include these lines (but double-check the "Fixes" first):

Reported-by: coverity-bot 
Addresses-Coverity-ID: 1583635 ("Code maintainability issues")
Fixes: 0a119d53f74a ("drm/amdgpu/jpeg: add support for jpeg DPG mode")

Thanks for your attention!

-- 
Coverity-bot



[PATCH 2/2] drm/amd/display: clean else not following close brace

2024-02-13 Thread Joao Paulo Pereira da Silva
From: jppaulo 

Put else statement in the same line and after the close brace.

Signed-off-by: Joao Paulo Pereira da Silva 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
index be5a6d008b29..e750c853890e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
@@ -396,8 +396,7 @@ void link_enc_cfg_link_encs_assign(
eng_id_req = 
stream->link->dpia_preferred_eng_id;
 
eng_id = find_first_avail_link_enc(stream->ctx, state, 
eng_id_req);
-   }
-   else
+   } else
eng_id =  link_enc->preferred_engine;
 
add_link_enc_assignment(state, stream, eng_id);
-- 
2.43.0



[PATCH 1/2] drm/amd/display: clean inconsistent indenting

2024-02-13 Thread Joao Paulo Pereira da Silva
From: jppaulo 

Clean some wrong indenting that throw errors in checkpatch.

Signed-off-by: Joao Paulo Pereira da Silva 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index aa7c02ba948e..7832832b973d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -962,7 +962,7 @@ static bool dc_construct(struct dc *dc,
goto fail;
}
 
-dc_ctx = dc->ctx;
+   dc_ctx = dc->ctx;
 
/* Resource should construct all asic specific resources.
 * This should be the only place where we need to parse the asic id
@@ -3155,10 +3155,10 @@ static void commit_planes_do_stream_update(struct dc 
*dc,
if (stream_update->mst_bw_update->is_increase)

dc->link_srv->increase_mst_payload(pipe_ctx,

stream_update->mst_bw_update->mst_stream_bw);
-   else
+   else

dc->link_srv->reduce_mst_payload(pipe_ctx,

stream_update->mst_bw_update->mst_stream_bw);
-   }
+   }
 
if (stream_update->pending_test_pattern) {
dc_link_dp_set_test_pattern(stream->link,
-- 
2.43.0



[PATCH 0/2] drm/amd/display: clean codestyle errors

2024-02-13 Thread Joao Paulo Pereira da Silva
jppaulo (2):
  drm/amd/display: clean inconsistent indenting
  drm/amd/display: clean else not following close brace

 drivers/gpu/drm/amd/display/dc/core/dc.c  | 6 +++---
 drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c | 3 +--
 2 files changed, 4 insertions(+), 5 deletions(-)

-- 
2.43.0



[PATCH] drm/amdgpu: Do not program IH_CHICKEN in vega20_ih.c under SRIOV

2024-02-13 Thread Victor Lu
IH_CHICKEN is blocked for VF writes; this access should be skipped.

Signed-off-by: Victor Lu 
---
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 38 ++
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index db66e6cccaf2..b9e785846637 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -291,27 +291,29 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
 
adev->nbio.funcs->ih_control(adev);
 
-   if ((amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 2, 1)) &&
-   adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
-   ih_chicken = RREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN);
-   if (adev->irq.ih.use_bus_addr) {
-   ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
-  MC_SPACE_GPA_ENABLE, 1);
+   if (!amdgpu_sriov_vf(adev)) {
+   if ((amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 
2, 1)) &&
+   adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
+   ih_chicken = RREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN);
+   if (adev->irq.ih.use_bus_addr) {
+   ih_chicken = REG_SET_FIELD(ih_chicken, 
IH_CHICKEN,
+  MC_SPACE_GPA_ENABLE, 
1);
+   }
+   WREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN, ih_chicken);
}
-   WREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN, ih_chicken);
-   }
 
-   /* psp firmware won't program IH_CHICKEN for aldebaran
-* driver needs to program it properly according to
-* MC_SPACE type in IH_RB_CNTL */
-   if ((amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 0)) ||
-   (amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2))) {
-   ih_chicken = RREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN_ALDEBARAN);
-   if (adev->irq.ih.use_bus_addr) {
-   ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
-  MC_SPACE_GPA_ENABLE, 1);
+   /* psp firmware won't program IH_CHICKEN for aldebaran
+* driver needs to program it properly according to
+* MC_SPACE type in IH_RB_CNTL */
+   if ((amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 
4, 0)) ||
+   (amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 
4, 2))) {
+   ih_chicken = RREG32_SOC15(OSSSYS, 0, 
mmIH_CHICKEN_ALDEBARAN);
+   if (adev->irq.ih.use_bus_addr) {
+   ih_chicken = REG_SET_FIELD(ih_chicken, 
IH_CHICKEN,
+  MC_SPACE_GPA_ENABLE, 
1);
+   }
+   WREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN_ALDEBARAN, 
ih_chicken);
}
-   WREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN_ALDEBARAN, ih_chicken);
}
 
for (i = 0; i < ARRAY_SIZE(ih); i++) {
-- 
2.34.1



[PATCH v3] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole

2024-02-13 Thread Felix Kuehling
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier

v3:
- Move VM max PFN calculation into AMDGPU_VA_RESERVED macros

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  6 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 11 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 29 ++--
 4 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..b0fb14a4b43c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,8 @@
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
 {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(adev);
 
-   addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
addr = amdgpu_gmc_sign_extend(addr);
 
return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..4b9afc4df031 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,7 @@
  */
 static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
 {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(adev);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2c4053b29bb3..42f6ddec50c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -137,9 +137,18 @@ struct amdgpu_mem_stats;
 
 /* Reserve space at top/bottom of address space for kernel use */
 #define AMDGPU_VA_RESERVED_CSA_SIZE(2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(adev) (((adev)->vm_manager.max_pfn \
+ << AMDGPU_GPU_PAGE_SHIFT)  \
+- AMDGPU_VA_RESERVED_CSA_SIZE)
 #define AMDGPU_VA_RESERVED_SEQ64_SIZE  (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(adev)   
(AMDGPU_VA_RESERVED_CSA_START(adev) \
+- 
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE   (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(adev)
(AMDGPU_VA_RESERVED_SEQ64_START(adev) \
+- AMDGPU_VA_RESERVED_TRAP_SIZE)
 #define AMDGPU_VA_RESERVED_BOTTOM  (1ULL << 16)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + 
\
+AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
 AMDGPU_VA_RESERVED_CSA_SIZE)
 
 /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..4a64307bc438 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include "amdgpu_vm.h"
 
 /*
  * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = max(SVM_USER_BASE, AMDGPU_VA_RESERVED_BOTTOM);
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
 
+   /* dGPUs: the reserved space for kernel
+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+   pdd->qpd.ib_base = SVM_IB_BASE;
+
pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 }
@@ -339,18 +346,18 @@ static void 

Re: [Patch v2 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Felix Kuehling



On 2024-02-13 16:39, Rajneesh Bhardwaj wrote:

In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.

Suggested-by: Joseph Greathouse 
Signed-off-by: Rajneesh Bhardwaj 


Reviewed-by: Felix Kuehling 



---
* Change the enum bitfield to 4 to avoid ORing condition of previous
   member flags.
* Incorporate review feedback from Felix from
   https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html
   and split one of the suggested gfx11 changes as a separate patch.


  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c| 9 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 42d881809dc7..697b6d530d12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
update_cu_mask(mm, mqd, minfo, 0);
set_priority(m, q);
  
+	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {

+   if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+   m->compute_resource_limits |=
+   COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   else
+   m->compute_resource_limits &=
+   ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   }
+
q->is_active = QUEUE_IS_ACTIVE(*q);
  }
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 677281c0793e..80320b8603fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -532,6 +532,7 @@ struct queue_properties {
  enum mqd_update_flag {
UPDATE_FLAG_DBG_WA_ENABLE = 1,
UPDATE_FLAG_DBG_WA_DISABLE = 2,
+   UPDATE_FLAG_IS_GWS = 4, /* quirk for gfx9 IP */
  };
  
  struct mqd_update_info {

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 43eff221eae5..4858112f9a53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct 
kfd_process_device *pdd)
  int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
void *gws)
  {
+   struct mqd_update_info minfo = {0};
struct kfd_node *dev = NULL;
struct process_queue_node *pqn;
struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, 
unsigned int qid,
}
  
  	pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;

+   minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
  
  	return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,

-   pqn->q, NULL);
+   pqn->q, );
  }
  
  void kfd_process_dequeue_from_all_devices(struct kfd_process *p)


[Patch v2 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Rajneesh Bhardwaj
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.

Suggested-by: Joseph Greathouse 
Signed-off-by: Rajneesh Bhardwaj 
---
* Change the enum bitfield to 4 to avoid ORing condition of previous
  member flags.
* Incorporate review feedback from Felix from
  https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html
  and split one of the suggested gfx11 changes as a separate patch.


 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c| 9 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 42d881809dc7..697b6d530d12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
update_cu_mask(mm, mqd, minfo, 0);
set_priority(m, q);
 
+   if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
+   if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+   m->compute_resource_limits |=
+   COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   else
+   m->compute_resource_limits &=
+   ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   }
+
q->is_active = QUEUE_IS_ACTIVE(*q);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 677281c0793e..80320b8603fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -532,6 +532,7 @@ struct queue_properties {
 enum mqd_update_flag {
UPDATE_FLAG_DBG_WA_ENABLE = 1,
UPDATE_FLAG_DBG_WA_DISABLE = 2,
+   UPDATE_FLAG_IS_GWS = 4, /* quirk for gfx9 IP */
 };
 
 struct mqd_update_info {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 43eff221eae5..4858112f9a53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct 
kfd_process_device *pdd)
 int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
void *gws)
 {
+   struct mqd_update_info minfo = {0};
struct kfd_node *dev = NULL;
struct process_queue_node *pqn;
struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, 
unsigned int qid,
}
 
pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
+   minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
 
return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
-   pqn->q, NULL);
+   pqn->q, );
 }
 
 void kfd_process_dequeue_from_all_devices(struct kfd_process *p)
-- 
2.34.1



[Patch v2 2/2] drm/amdgpu: Fix implicit assumption in gfx11 debug flags

2024-02-13 Thread Rajneesh Bhardwaj
Gfx11 debug flags mask is currently set with an implicit assumption that
no other mqd update flags exist. This needs to be fixed with newly
introduced flag UPDATE_FLAG_IS_GWS by the previous patch.

Reviewed-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index d722cbd31783..826bc4f6c8a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -55,8 +55,8 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
m = get_mqd(mqd);
 
if (has_wa_flag) {
-   uint32_t wa_mask = minfo->update_flag == 
UPDATE_FLAG_DBG_WA_ENABLE ?
-   0x : 0x;
+   uint32_t wa_mask =
+   (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ? 
0x : 0x;
 
m->compute_static_thread_mgmt_se0 = wa_mask;
m->compute_static_thread_mgmt_se1 = wa_mask;
-- 
2.34.1



Re: [PATCH 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Bhardwaj, Rajneesh



On 2/13/2024 3:52 PM, Felix Kuehling wrote:


On 2024-02-09 20:49, Rajneesh Bhardwaj wrote:

In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.

Suggested-by: Joseph Greathouse 
Signed-off-by: Rajneesh Bhardwaj 
---
* Incorporate review feedback from Felix from
https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html
   and split one of the suggested gfx11 changes as a seperate patch.


  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    | 9 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c

index 42d881809dc7..697b6d530d12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, 
void *mqd,

  update_cu_mask(mm, mqd, minfo, 0);
  set_priority(m, q);
  +    if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
+    if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+    m->compute_resource_limits |=
+    COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+    else
+    m->compute_resource_limits &=
+    ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+    }
+
  q->is_active = QUEUE_IS_ACTIVE(*q);
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 677281c0793e..65b504813576 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -532,6 +532,7 @@ struct queue_properties {
  enum mqd_update_flag {
  UPDATE_FLAG_DBG_WA_ENABLE = 1,
  UPDATE_FLAG_DBG_WA_DISABLE = 2,
+    UPDATE_FLAG_IS_GWS = 3, /* quirk for gfx9 IP */


This flag needs to be a separate bit. So it should be defined as 4. 
Otherwise it looks just like UPDATE_FLAG_DBG_WA_ENABLE | 
UPDATE_FLAG_DBG_WA_DISABLE. I agree that defining bit-masks in an enum 
is not ideal, but I've seen the same in other places.



Yes, I agree. I have sent the updated patches. When we extend this in 
future if required, it would be 8, 16 and so on...not a great way to 
extend/define an enum IMO.





Regards,
  Felix



  };
    struct mqd_update_info {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c

index 43eff221eae5..4858112f9a53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct 
kfd_process_device *pdd)

  int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
  void *gws)
  {
+    struct mqd_update_info minfo = {0};
  struct kfd_node *dev = NULL;
  struct process_queue_node *pqn;
  struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager 
*pqm, unsigned int qid,

  }
    pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
+    minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
    return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
-    pqn->q, NULL);
+    pqn->q, );
  }
    void kfd_process_dequeue_from_all_devices(struct kfd_process *p)


Re: [PATCH 2/2] drm/amdgpu: Fix implicit assumption in gfx11 debug flags

2024-02-13 Thread Felix Kuehling

On 2024-02-09 20:49, Rajneesh Bhardwaj wrote:

Gfx11 debug flags mask is currently set with an implicit assumption that
no other mqd update flags exist. This needs to be fixed with newly
introduced flag UPDATE_FLAG_IS_GWS by the previous patch.

Signed-off-by: Rajneesh Bhardwaj 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index d722cbd31783..826bc4f6c8a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -55,8 +55,8 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
m = get_mqd(mqd);
  
  	if (has_wa_flag) {

-   uint32_t wa_mask = minfo->update_flag == 
UPDATE_FLAG_DBG_WA_ENABLE ?
-   0x : 0x;
+   uint32_t wa_mask =
+   (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ? 
0x : 0x;
  
  		m->compute_static_thread_mgmt_se0 = wa_mask;

m->compute_static_thread_mgmt_se1 = wa_mask;


Re: [PATCH 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Felix Kuehling



On 2024-02-09 20:49, Rajneesh Bhardwaj wrote:

In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.

Suggested-by: Joseph Greathouse 
Signed-off-by: Rajneesh Bhardwaj 
---
* Incorporate review feedback from Felix from
   https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html
   and split one of the suggested gfx11 changes as a separate patch.


  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c| 9 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 42d881809dc7..697b6d530d12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
update_cu_mask(mm, mqd, minfo, 0);
set_priority(m, q);
  
+	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {

+   if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+   m->compute_resource_limits |=
+   COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   else
+   m->compute_resource_limits &=
+   ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+   }
+
q->is_active = QUEUE_IS_ACTIVE(*q);
  }
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 677281c0793e..65b504813576 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -532,6 +532,7 @@ struct queue_properties {
  enum mqd_update_flag {
UPDATE_FLAG_DBG_WA_ENABLE = 1,
UPDATE_FLAG_DBG_WA_DISABLE = 2,
+   UPDATE_FLAG_IS_GWS = 3, /* quirk for gfx9 IP */


This flag needs to be a separate bit. So it should be defined as 4. 
Otherwise it looks just like UPDATE_FLAG_DBG_WA_ENABLE | 
UPDATE_FLAG_DBG_WA_DISABLE. I agree that defining bit-masks in an enum 
is not ideal, but I've seen the same in other places.


Regards,
  Felix



  };
  
  struct mqd_update_info {

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 43eff221eae5..4858112f9a53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct 
kfd_process_device *pdd)
  int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
void *gws)
  {
+   struct mqd_update_info minfo = {0};
struct kfd_node *dev = NULL;
struct process_queue_node *pqn;
struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, 
unsigned int qid,
}
  
  	pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;

+   minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
  
  	return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,

-   pqn->q, NULL);
+   pqn->q, );
  }
  
  void kfd_process_dequeue_from_all_devices(struct kfd_process *p)


Re: [PATCH] drm/amdgpu: Allow secure submission on SDMAv4.4.2 rings

2024-02-13 Thread Alex Deucher
On Tue, Feb 13, 2024 at 1:58 PM David Francis  wrote:
>
> This flag was accidentally left off of SDMAv4.4.2 when it was
> added. SDMAv4.4.2, like all other SDMA engines, does support
> secure submission.
>
> Signed-off-by: David Francis 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fec5a3d1c4bc..bd29b13bc3d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -1839,6 +1839,7 @@ static const struct amdgpu_ring_funcs 
> sdma_v4_4_2_ring_funcs = {
> .align_mask = 0xff,
> .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
> .support_64bit_ptrs = true,
> +   .secure_submission_supported = true,
> .get_rptr = sdma_v4_4_2_ring_get_rptr,
> .get_wptr = sdma_v4_4_2_ring_get_wptr,
> .set_wptr = sdma_v4_4_2_ring_set_wptr,
> @@ -1870,6 +1871,7 @@ static const struct amdgpu_ring_funcs 
> sdma_v4_4_2_page_ring_funcs = {
> .align_mask = 0xff,
> .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
> .support_64bit_ptrs = true,
> +   .secure_submission_supported = true,
> .get_rptr = sdma_v4_4_2_ring_get_rptr,
> .get_wptr = sdma_v4_4_2_page_ring_get_wptr,
> .set_wptr = sdma_v4_4_2_page_ring_set_wptr,
> --
> 2.25.1
>


[PATCH] drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)

2024-02-13 Thread Victor Lu
The current error detection only looks for a timeout.
This should be changed to also check scratch_reg1 for any errors
returned from RLCG.

v2: remove new error value

Signed-off-by: Victor Lu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 6ff7d3fb2008..7a4eae36778a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -979,7 +979,7 @@ u32 amdgpu_virt_rlcg_reg_rw(struct amdgpu_device *adev, u32 
offset, u32 v, u32 f
 * SCRATCH_REG0 = read/write value
 * SCRATCH_REG1[30:28]  = command
 * SCRATCH_REG1[19:0]   = address in dword
-* SCRATCH_REG1[26:24]  = Error reporting
+* SCRATCH_REG1[27:24]  = Error reporting
 */
writel(v, scratch_reg0);
writel((offset | flag), scratch_reg1);
@@ -993,7 +993,8 @@ u32 amdgpu_virt_rlcg_reg_rw(struct amdgpu_device *adev, u32 
offset, u32 v, u32 f
udelay(10);
}
 
-   if (i >= timeout) {
+   tmp = readl(scratch_reg1);
+   if (i >= timeout || (tmp & AMDGPU_RLCG_SCRATCH1_ERROR_MASK) != 
0) {
if (amdgpu_sriov_rlcg_error_report_enabled(adev)) {
if (tmp & AMDGPU_RLCG_VFGATE_DISABLED) {
dev_err(adev->dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index fa7be5f277b9..3f59b7b5523f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -45,6 +45,7 @@
 #define AMDGPU_RLCG_REG_NOT_IN_RANGE   0x100
 
 #define AMDGPU_RLCG_SCRATCH1_ADDRESS_MASK  0xF
+#define AMDGPU_RLCG_SCRATCH1_ERROR_MASK0xF00
 
 /* all asic after AI use this offset */
 #define mmRCC_IOV_FUNC_IDENTIFIER 0xDE5
-- 
2.34.1



[PATCH] drm/amdgpu: Allow secure submission on SDMAv4.4.2 rings

2024-02-13 Thread David Francis
This flag was accidentally left off of SDMAv4.4.2 when it was
added. SDMAv4.4.2, like all other SDMA engines, does support
secure submission.

Signed-off-by: David Francis 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index fec5a3d1c4bc..bd29b13bc3d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1839,6 +1839,7 @@ static const struct amdgpu_ring_funcs 
sdma_v4_4_2_ring_funcs = {
.align_mask = 0xff,
.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
.support_64bit_ptrs = true,
+   .secure_submission_supported = true,
.get_rptr = sdma_v4_4_2_ring_get_rptr,
.get_wptr = sdma_v4_4_2_ring_get_wptr,
.set_wptr = sdma_v4_4_2_ring_set_wptr,
@@ -1870,6 +1871,7 @@ static const struct amdgpu_ring_funcs 
sdma_v4_4_2_page_ring_funcs = {
.align_mask = 0xff,
.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
.support_64bit_ptrs = true,
+   .secure_submission_supported = true,
.get_rptr = sdma_v4_4_2_ring_get_rptr,
.get_wptr = sdma_v4_4_2_page_ring_get_wptr,
.set_wptr = sdma_v4_4_2_page_ring_set_wptr,
-- 
2.25.1



Re: [PATCH v2 6/6] drm: add drm_mode_atomic_commit event

2024-02-13 Thread Steven Rostedt
On Tue, 13 Feb 2024 16:50:31 +0100
Pierre-Eric Pelloux-Prayer  wrote:

> @@ -1503,6 +1504,24 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
>   drm_mode_object_put(obj);
>   }
>  
> + if (trace_drm_mode_atomic_commit_enabled()) {
> + struct drm_crtc_state *crtc_state;
> + struct drm_crtc *crtc;
> + int *crtcs;
> + int i, num_crtcs;
> +
> + crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(int),
> + GFP_KERNEL);

If the above allocation fails, this will cause a NULL kernel dereference.

-- Steve

> +
> + num_crtcs = 0;
> + for_each_new_crtc_in_state(state, crtc, crtc_state, i)
> + crtcs[num_crtcs++] = drm_crtc_index(crtc);
> +
> + trace_drm_mode_atomic_commit(file_priv, crtcs, num_crtcs, 
> arg->flags);
> +
> + kfree(crtcs);
> + }
> +
>   ret = prepare_signaling(dev, state, arg, file_priv, _state,
>   _fences);
>   if (ret)



[PATCH v2 6/6] drm: add drm_mode_atomic_commit event

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
With this and the dma_fence_sync_to event, a tool can draw the
relationship between the compositing draw, the atomic commit, and vblank.

An example on a 2-monitor system looks like this:

gnome-shell-1638[018] .  2571.905124: drm_mode_atomic_commit: 
file=245c3f0c, pid=1165, flags=0201, crtcs={0x1}
gnome-shell-1638[018] .  2571.905147: dma_fence_sync_to: 
driver=drm_sched timeline=gfx_0.0.0 context=270 seqno=73240 
reason=dma_fence_chain_init
gnome-shell-1638[018] .  2571.913226: drm_mode_atomic_commit: 
file=245c3f0c, pid=1165, flags=0201, crtcs={0x0}
gnome-shell-1638[018] .  2571.913250: dma_fence_sync_to: 
driver=drm_sched timeline=gfx_0.0.0 context=270 seqno=73241 
reason=dma_fence_chain_init
 -0   [018] d.h3.  2571.915687: drm_vblank_event: crtc=1, 
seq=155747, time=2571916093743, high-prec=true
 -0   [018] d.h3.  2571.915968: drm_vblank_event: crtc=0, 
seq=153862, time=2571916377180, high-prec=true

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 19 +++
 drivers/gpu/drm/drm_trace.h   | 28 ++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 29d4940188d4..0d3767cd155a 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -41,6 +41,7 @@
 #include 
 
 #include "drm_crtc_internal.h"
+#include "drm_trace.h"
 
 /**
  * DOC: overview
@@ -1503,6 +1504,24 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
drm_mode_object_put(obj);
}
 
+   if (trace_drm_mode_atomic_commit_enabled()) {
+   struct drm_crtc_state *crtc_state;
+   struct drm_crtc *crtc;
+   int *crtcs;
+   int i, num_crtcs;
+
+   crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(int),
+   GFP_KERNEL);
+
+   num_crtcs = 0;
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i)
+   crtcs[num_crtcs++] = drm_crtc_index(crtc);
+
+   trace_drm_mode_atomic_commit(file_priv, crtcs, num_crtcs, 
arg->flags);
+
+   kfree(crtcs);
+   }
+
ret = prepare_signaling(dev, state, arg, file_priv, _state,
_fences);
if (ret)
diff --git a/drivers/gpu/drm/drm_trace.h b/drivers/gpu/drm/drm_trace.h
index 11c6dd577e8e..b62a44cb1270 100644
--- a/drivers/gpu/drm/drm_trace.h
+++ b/drivers/gpu/drm/drm_trace.h
@@ -62,8 +62,32 @@ TRACE_EVENT(drm_vblank_event_delivered,
__entry->crtc = crtc;
__entry->seq = seq;
),
-   TP_printk("file=%p, crtc=%d, seq=%u", __entry->file, __entry->crtc, 
\
- __entry->seq)
+   TP_printk("file=%p, crtc=%d, seq=%u, pid=%8d", \
+ __entry->file, __entry->crtc, __entry->seq, \
+ pid_nr(__entry->file->pid))
+);
+
+TRACE_EVENT(drm_mode_atomic_commit,
+   TP_PROTO(struct drm_file *file, int *crtcs, int ncrtcs, uint32_t 
flags),
+   TP_ARGS(file, crtcs, ncrtcs, flags),
+   TP_STRUCT__entry(
+   __field(struct drm_file *, file)
+   __dynamic_array(u32, crtcs, ncrtcs)
+   __field(uint32_t, ncrtcs)
+   __field(uint32_t, flags)
+   ),
+   TP_fast_assign(
+   unsigned int i;
+
+   __entry->file = file;
+   for (i = 0; i < ncrtcs; i++)
+   ((u32 *)__get_dynamic_array(crtcs))[i] = crtcs[i];
+   __entry->ncrtcs = ncrtcs;
+   __entry->flags = flags;
+   ),
+   TP_printk("file=%p, pid=%8d, flags=%08x, crtcs=%s", __entry->file, \
+ pid_nr(__entry->file->pid), __entry->flags, \
+ __print_array(__get_dynamic_array(crtcs), 
__entry->ncrtcs, 4))
 );
 
 #endif /* _DRM_TRACE_H_ */
-- 
2.40.1



[PATCH v2 5/6] drm/amdgpu: add a amdgpu_cs_ioctl2 event

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
amdgpu_cs_ioctl already exists but serves a different
purpose.

amdgpu_cs_ioctl2 marks the beginning of the kernel processing of
the ioctl which is useful for tools to map which events belong to
the same submission (without this, the first event would be the
amdgpu_bo_set_list ones).

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 12 
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 6830892383c3..29e43a66d0d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1402,6 +1402,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
return r;
}
 
+   trace_amdgpu_cs_ioctl2(data);
+
r = amdgpu_cs_pass1(, data);
if (r)
goto error_fini;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index e8ea1cfe7027..24e95560ede5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -189,6 +189,18 @@ TRACE_EVENT(amdgpu_cs_ioctl,
  __entry->seqno, __get_str(ring), __entry->num_ibs)
 );
 
+TRACE_EVENT(amdgpu_cs_ioctl2,
+   TP_PROTO(union drm_amdgpu_cs *cs),
+   TP_ARGS(cs),
+   TP_STRUCT__entry(
+__field(uint32_t, ctx_id)
+   ),
+   TP_fast_assign(
+  __entry->ctx_id = cs->in.ctx_id;
+   ),
+   TP_printk("context=%u", __entry->ctx_id)
+);
+
 TRACE_EVENT(amdgpu_sched_run_job,
TP_PROTO(struct amdgpu_job *job),
TP_ARGS(job),
-- 
2.40.1



[PATCH v2 4/6] drm/amdgpu: add BO clear event

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
Useful to identify why sdma jobs are submitted.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 16 
 2 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 425cebcc5cbf..7219f329d6f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -631,6 +631,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;
 
+   trace_amdgpu_bo_clear(bo);
+
r = amdgpu_fill_buffer(bo, 0, bo->tbo.base.resv, , true);
if (unlikely(r))
goto fail_unreserve;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index f539b1d00234..e8ea1cfe7027 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -514,6 +514,22 @@ TRACE_EVENT(amdgpu_bo_move,
__entry->new_placement, __entry->bo_size)
 );
 
+TRACE_EVENT(amdgpu_bo_clear,
+   TP_PROTO(struct amdgpu_bo *bo),
+   TP_ARGS(bo),
+   TP_STRUCT__entry(
+   __field(struct amdgpu_bo *, bo)
+   __field(u64, bo_size)
+   ),
+
+   TP_fast_assign(
+   __entry->bo  = bo;
+   __entry->bo_size = amdgpu_bo_size(bo);
+   ),
+   TP_printk("bo=%p, size=%lld",
+   __entry->bo, __entry->bo_size)
+);
+
 TRACE_EVENT(amdgpu_ib_pipe_sync,
TP_PROTO(struct amdgpu_job *sched_job, struct dma_fence *fence),
TP_ARGS(sched_job, fence),
-- 
2.40.1



[PATCH v2 3/6] amdgpu: use trace_dma_fence_sync_to in amdgpu_fence_sync

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
This makes it possible to understand the dependencies between jobs.
Possible usage of this trace:
* stuttering issues like Mesa !9189
* incorrect synchronization: I don't have a link for this one, but having
  these events was very useful to debug a virtio-gpu / native-context /
  radeonsi sync issue

I have prototype code using this in UMR, as can be see here:
   https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37

The 'reason' param currently uses __func__ but I didn't add a macro for
this because it'd be interesting to use more descriptive names for each
use of amdgpu_fence_sync.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 14 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c  |  8 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c |  4 ++--
 7 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d17b2452cb1f..fde98e48c84b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -491,7 +491,7 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct 
amdgpu_sync *sync)
if (ret)
return ret;
 
-   return amdgpu_sync_fence(sync, vm->last_update);
+   return amdgpu_sync_fence(sync, vm->last_update, __func__);
 }
 
 static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
@@ -1251,7 +1251,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 
amdgpu_vm_clear_freed(adev, vm, _va->last_pt_update);
 
-   amdgpu_sync_fence(sync, bo_va->last_pt_update);
+   amdgpu_sync_fence(sync, bo_va->last_pt_update, __func__);
 }
 
 static int update_gpuvm_pte(struct kgd_mem *mem,
@@ -1273,7 +1273,7 @@ static int update_gpuvm_pte(struct kgd_mem *mem,
return ret;
}
 
-   return amdgpu_sync_fence(sync, bo_va->last_pt_update);
+   return amdgpu_sync_fence(sync, bo_va->last_pt_update, __func__);
 }
 
 static int map_bo_to_gpuvm(struct kgd_mem *mem,
@@ -2910,7 +2910,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
}
dma_resv_for_each_fence(, bo->tbo.base.resv,
DMA_RESV_USAGE_KERNEL, fence) {
-   ret = amdgpu_sync_fence(_obj, fence);
+   ret = amdgpu_sync_fence(_obj, fence, __func__);
if (ret) {
pr_debug("Memory eviction: Sync BO fence 
failed. Try again\n");
goto validate_map_fail;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 6adeddfb3d56..6830892383c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -423,7 +423,7 @@ static int amdgpu_cs_p2_dependencies(struct 
amdgpu_cs_parser *p,
dma_fence_put(old);
}
 
-   r = amdgpu_sync_fence(>sync, fence);
+   r = amdgpu_sync_fence(>sync, fence, __func__);
dma_fence_put(fence);
if (r)
return r;
@@ -445,7 +445,7 @@ static int amdgpu_syncobj_lookup_and_add(struct 
amdgpu_cs_parser *p,
return r;
}
 
-   r = amdgpu_sync_fence(>sync, fence);
+   r = amdgpu_sync_fence(>sync, fence, __func__);
dma_fence_put(fence);
return r;
 }
@@ -1101,7 +1101,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   r = amdgpu_sync_fence(>sync, fpriv->prt_va->last_pt_update);
+   r = amdgpu_sync_fence(>sync, fpriv->prt_va->last_pt_update, 
__func__);
if (r)
return r;
 
@@ -1112,7 +1112,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   r = amdgpu_sync_fence(>sync, bo_va->last_pt_update);
+   r = amdgpu_sync_fence(>sync, bo_va->last_pt_update, 
__func__);
if (r)
return r;
}
@@ -1131,7 +1131,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   r = amdgpu_sync_fence(>sync, bo_va->last_pt_update);
+   r = amdgpu_sync_fence(>sync, bo_va->last_pt_update, 
__func__);
if (r)
return r;
}
@@ -1144,7 +1144,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   r = 

[PATCH v2 2/6] dma-buf/fence-chain: use trace_dma_fence_sync_to

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
To inform tools about the relationship between the fences.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/dma-buf/dma-fence-chain.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index 9663ba1bb6ac..a211b3d4156a 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,6 +9,8 @@
 
 #include 
 
+#include "trace/events/dma_fence.h"
+
 static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
 
 /**
@@ -251,6 +253,8 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
chain->fence = fence;
chain->prev_seqno = 0;
 
+   trace_dma_fence_sync_to(fence, __func__);
+
/* Try to reuse the context of the previous chain node. */
if (prev_chain && __dma_fence_is_later(seqno, prev->seqno, prev->ops)) {
context = prev->context;
-- 
2.40.1



[PATCH v2 1/6] tracing, dma-buf: add a trace_dma_fence_sync_to event

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
This new event can be used to trace where a given dma_fence is added
as a dependency of some other work.

I plan to use it in amdgpu.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/dma-buf/dma-fence.c  |  1 +
 include/trace/events/dma_fence.h | 34 
 2 files changed, 35 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index e0fd99e61a2d..671a499a5ccd 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -23,6 +23,7 @@
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_sync_to);
 
 static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
diff --git a/include/trace/events/dma_fence.h b/include/trace/events/dma_fence.h
index 3963e79ca7b4..9b3875f7aa79 100644
--- a/include/trace/events/dma_fence.h
+++ b/include/trace/events/dma_fence.h
@@ -83,6 +83,40 @@ DEFINE_EVENT(dma_fence, dma_fence_wait_end,
TP_ARGS(fence)
 );
 
+DECLARE_EVENT_CLASS(dma_fence_from,
+
+   TP_PROTO(struct dma_fence *fence, const char *reason),
+
+   TP_ARGS(fence, reason),
+
+   TP_STRUCT__entry(
+   __string(driver, fence->ops->get_driver_name(fence))
+   __string(timeline, fence->ops->get_timeline_name(fence))
+   __field(unsigned int, context)
+   __field(unsigned int, seqno)
+   __string(reason, reason)
+   ),
+
+   TP_fast_assign(
+   __assign_str(driver, fence->ops->get_driver_name(fence));
+   __assign_str(timeline, fence->ops->get_timeline_name(fence));
+   __entry->context = fence->context;
+   __entry->seqno = fence->seqno;
+   __assign_str(reason, reason);
+   ),
+
+   TP_printk("driver=%s timeline=%s context=%u seqno=%u reason=%s",
+ __get_str(driver), __get_str(timeline), __entry->context,
+ __entry->seqno, __get_str(reason))
+);
+
+DEFINE_EVENT(dma_fence_from, dma_fence_sync_to,
+
+   TP_PROTO(struct dma_fence *fence, const char *reason),
+
+   TP_ARGS(fence, reason)
+);
+
 #endif /*  _TRACE_DMA_FENCE_H */
 
 /* This part must be outside protection */
-- 
2.40.1



[PATCH v2 0/6] dma-fence, drm, amdgpu new trace events

2024-02-13 Thread Pierre-Eric Pelloux-Prayer
This series adds new events to make it easier for tools
like gpuvis or umr to graph the GPUs, kernel and applications
activity.

UMR patches using these events can be found here:
https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37

V1:
https://patchwork.kernel.org/project/linux-media/patch/20240117184329.479554-1-pierre-eric.pelloux-pra...@amd.com/

Changes from V1:
* uses trace_dma_fence_sync_to from dma-fence-chain.c
* new amdgpu events
* new drm plane commit event

Pierre-Eric Pelloux-Prayer (6):
  tracing, dma-buf: add a trace_dma_fence_sync_to event
  dma-buf/fence-chain: use trace_dma_fence_sync_to
  amdgpu: use trace_dma_fence_sync_to in amdgpu_fence_sync
  drm/amdgpu: add BO clear event
  drm/amdgpu: add a amdgpu_cs_ioctl2 event
  drm: add drm_mode_atomic_commit event

 drivers/dma-buf/dma-fence-chain.c |  4 +++
 drivers/dma-buf/dma-fence.c   |  1 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 16 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 11 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 28 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  |  4 +--
 drivers/gpu/drm/drm_atomic_uapi.c | 19 +++
 drivers/gpu/drm/drm_trace.h   | 28 +--
 include/trace/events/dma_fence.h  | 34 +++
 14 files changed, 144 insertions(+), 26 deletions(-)

-- 
2.40.1



Re: [PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test

2024-02-13 Thread Christian König

Am 13.02.24 um 15:28 schrieb Matthew Auld:

On 13/02/2024 13:52, Arunpravin Paneer Selvam wrote:

Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Signed-off-by: Matthew Auld 
Reviewed-by: Arunpravin Paneer Selvam 


It looks like you changed the patch authorship here.


Going to fix this if I get tasked with pushing this to drm-misc-fixes.

But I still have hope that Arun will figure out how to do this himself.

Christian.




Cc: Arunpravin Paneer Selvam 
Cc: Limonciello 
Cc: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++
  1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c

index ea2af6bd9abe..fee6bec757d1 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,6 +8,7 @@
    #include 
  #include 
+#include 
    #include 
  @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 
chunk_size)

  return (1 << order) * chunk_size;
  }
  +static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+    u64 mm_size, ps = SZ_4K, i, n_pages, total;
+    struct drm_buddy_block *block;
+    struct drm_buddy mm;
+    LIST_HEAD(left);
+    LIST_HEAD(middle);
+    LIST_HEAD(right);
+    LIST_HEAD(allocated);
+
+    mm_size = 16 * 3 * SZ_4K;
+
+    KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+    /*
+ * Idea is to fragment the address space by alternating block
+ * allocations between three different lists; one for left, 
middle and

+ * right. We can then free a list to simulate fragmentation. In
+ * particular we want to exercise the 
DRM_BUDDY_CONTIGUOUS_ALLOCATION,

+ * including the try_harder path.
+ */
+
+    i = 0;
+    n_pages = mm_size / ps;
+    do {
+    struct list_head *list;
+    int slot = i % 3;
+
+    if (slot == 0)
+    list = 
+    else if (slot == 1)
+    list = 
+    else
+    list = 
+    KUNIT_ASSERT_FALSE_MSG(test,
+   drm_buddy_alloc_blocks(, 0, mm_size,
+  ps, ps, list, 0),
+   "buddy_alloc hit an error size=%d\n",
+   ps);
+    } while (++i < n_pages);
+
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   3 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%d\n", 3 * ps);
+
+    drm_buddy_free_list(, );
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   3 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 3 * ps);
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   2 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 2 * ps);
+
+    drm_buddy_free_list(, );
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   3 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 3 * ps);
+    /*
+ * At this point we should have enough contiguous space for 2 
blocks,
+ * however they are never buddies (since we freed middle and 
right) so

+ * will require the try_harder logic to find them.
+ */
+    KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, 
mm_size,

+    2 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc hit an error size=%d\n", 2 * ps);
+
+    drm_buddy_free_list(, );
+    KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, 
mm_size,

+    3 * ps, ps, ,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc hit an error size=%d\n", 3 * ps);
+
+    total = 0;
+    list_for_each_entry(block, , link)
+    total += drm_buddy_block_size(, block);
+
+    KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+    drm_buddy_free_list(, );
+    drm_buddy_fini();
+}
+
  static void drm_test_buddy_alloc_pathological(struct kunit *test)
  {
  u64 mm_size, size, start = 0;
@@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = {
  KUNIT_CASE(drm_test_buddy_alloc_optimistic),
  KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
  KUNIT_CASE(drm_test_buddy_alloc_pathological),
+    KUNIT_CASE(drm_test_buddy_alloc_contiguous),
  {}
  };




RE: [PATCH v2] drm/amd/pm: Allow setting max UCLK on SMU v13.0.6

2024-02-13 Thread Kamal, Asad
[AMD Official Use Only - General]

Reviewed-by: Asad Kamal 
Tested-by: Asad Kamal 

Thanks & Regards
Asad

-Original Message-
From: Lazar, Lijo 
Sent: Friday, February 9, 2024 1:21 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Kamal, Asad ; Ma, Le 

Subject: [PATCH v2] drm/amd/pm: Allow setting max UCLK on SMU v13.0.6

Allow reducing max UCLK in MANUAL performance level. The new UCLK value
should be less than the max DPM UCLK level value.

Ex:
echo manual > "/sys/bus/pci/devices/.../power_dpm_force_performance_level"
echo m 1 900 > "/sys/bus/pci/devices/.../pp_od_clk_voltage”
echo c > "/sys/bus/pci/devices/.../pp_od_clk_voltage”

Signed-off-by: Lijo Lazar 
---
v2:
On switching perf level to auto, restore GFX and UCLK levels only if 
needed.

 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 122 +++---
 1 file changed, 102 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 03873d784be6..6e8a7eb1864d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -1578,6 +1578,8 @@ static int smu_v13_0_6_set_performance_level(struct 
smu_context *smu,
struct smu_13_0_dpm_context *dpm_context = smu_dpm->dpm_context;
struct smu_13_0_dpm_table *gfx_table =
_context->dpm_tables.gfx_table;
+   struct smu_13_0_dpm_table *uclk_table =
+   _context->dpm_tables.uclk_table;
struct smu_umd_pstate_table *pstate_table = >pstate_table;
int ret;

@@ -1593,17 +1595,27 @@ static int smu_v13_0_6_set_performance_level(struct 
smu_context *smu,
return 0;

case AMD_DPM_FORCED_LEVEL_AUTO:
-   if ((gfx_table->min == pstate_table->gfxclk_pstate.curr.min) &&
-   (gfx_table->max == pstate_table->gfxclk_pstate.curr.max))
-   return 0;
+   if ((gfx_table->min != pstate_table->gfxclk_pstate.curr.min) ||
+   (gfx_table->max != pstate_table->gfxclk_pstate.curr.max)) {
+   ret = smu_v13_0_6_set_gfx_soft_freq_limited_range(
+   smu, gfx_table->min, gfx_table->max);
+   if (ret)
+   return ret;

-   ret = smu_v13_0_6_set_gfx_soft_freq_limited_range(
-   smu, gfx_table->min, gfx_table->max);
-   if (ret)
-   return ret;
+   pstate_table->gfxclk_pstate.curr.min = gfx_table->min;
+   pstate_table->gfxclk_pstate.curr.max = gfx_table->max;
+   }
+
+   if (uclk_table->max != pstate_table->uclk_pstate.curr.max) {
+   /* Min UCLK is not expected to be changed */
+   ret = smu_v13_0_set_soft_freq_limited_range(
+   smu, SMU_UCLK, 0, uclk_table->max);
+   if (ret)
+   return ret;
+   pstate_table->uclk_pstate.curr.max = uclk_table->max;
+   }
+   pstate_table->uclk_pstate.custom.max = 0;

-   pstate_table->gfxclk_pstate.curr.min = gfx_table->min;
-   pstate_table->gfxclk_pstate.curr.max = gfx_table->max;
return 0;
case AMD_DPM_FORCED_LEVEL_MANUAL:
return 0;
@@ -1626,7 +1638,8 @@ static int smu_v13_0_6_set_soft_freq_limited_range(struct 
smu_context *smu,
uint32_t max_clk;
int ret = 0;

-   if (clk_type != SMU_GFXCLK && clk_type != SMU_SCLK)
+   if (clk_type != SMU_GFXCLK && clk_type != SMU_SCLK &&
+   clk_type != SMU_UCLK)
return -EINVAL;

if ((smu_dpm->dpm_level != AMD_DPM_FORCED_LEVEL_MANUAL) &&
@@ -1636,18 +1649,31 @@ static int 
smu_v13_0_6_set_soft_freq_limited_range(struct smu_context *smu,
if (smu_dpm->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) {
if (min >= max) {
dev_err(smu->adev->dev,
-   "Minimum GFX clk should be less than the 
maximum allowed clock\n");
+   "Minimum clk should be less than the maximum 
allowed clock\n");
return -EINVAL;
}

-   if ((min == pstate_table->gfxclk_pstate.curr.min) &&
-   (max == pstate_table->gfxclk_pstate.curr.max))
-   return 0;
+   if (clk_type == SMU_GFXCLK) {
+   if ((min == pstate_table->gfxclk_pstate.curr.min) &&
+   (max == pstate_table->gfxclk_pstate.curr.max))
+   return 0;

-   ret = smu_v13_0_6_set_gfx_soft_freq_limited_range(smu, min, 
max);
-   if (!ret) {
-   pstate_table->gfxclk_pstate.curr.min 

Re: [PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test

2024-02-13 Thread Matthew Auld

On 13/02/2024 13:52, Arunpravin Paneer Selvam wrote:

Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Signed-off-by: Matthew Auld 
Reviewed-by: Arunpravin Paneer Selvam 


It looks like you changed the patch authorship here.


Cc: Arunpravin Paneer Selvam 
Cc: Limonciello 
Cc: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
  drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++
  1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c
index ea2af6bd9abe..fee6bec757d1 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,6 +8,7 @@
  
  #include 

  #include 
+#include 
  
  #include 
  
@@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size)

return (1 << order) * chunk_size;
  }
  
+static void drm_test_buddy_alloc_contiguous(struct kunit *test)

+{
+   u64 mm_size, ps = SZ_4K, i, n_pages, total;
+   struct drm_buddy_block *block;
+   struct drm_buddy mm;
+   LIST_HEAD(left);
+   LIST_HEAD(middle);
+   LIST_HEAD(right);
+   LIST_HEAD(allocated);
+
+   mm_size = 16 * 3 * SZ_4K;
+
+   KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+   /*
+* Idea is to fragment the address space by alternating block
+* allocations between three different lists; one for left, middle and
+* right. We can then free a list to simulate fragmentation. In
+* particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+* including the try_harder path.
+*/
+
+   i = 0;
+   n_pages = mm_size / ps;
+   do {
+   struct list_head *list;
+   int slot = i % 3;
+
+   if (slot == 0)
+   list = 
+   else if (slot == 1)
+   list = 
+   else
+   list = 
+   KUNIT_ASSERT_FALSE_MSG(test,
+  drm_buddy_alloc_blocks(, 0, mm_size,
+ ps, ps, list, 0),
+  "buddy_alloc hit an error size=%d\n",
+  ps);
+   } while (++i < n_pages);
+
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%d\n", 3 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  2 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   /*
+* At this point we should have enough contiguous space for 2 blocks,
+* however they are never buddies (since we freed middle and right) so
+* will require the try_harder logic to find them.
+*/
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   2 * ps, ps, 
,
+   
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   3 * ps, ps, 
,
+   
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 3 * ps);
+
+   total = 0;
+   list_for_each_entry(block, , link)
+   total += drm_buddy_block_size(, block);
+
+   KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+   drm_buddy_free_list(, );
+   drm_buddy_fini();
+}
+
  static void drm_test_buddy_alloc_pathological(struct kunit *test)
  {
u64 mm_size, size, start 

[PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test

2024-02-13 Thread Arunpravin Paneer Selvam
Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Signed-off-by: Matthew Auld 
Reviewed-by: Arunpravin Paneer Selvam 
Cc: Arunpravin Paneer Selvam 
Cc: Mario Limonciello 
Cc: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++
 1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c
index ea2af6bd9abe..fee6bec757d1 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size)
return (1 << order) * chunk_size;
 }
 
+static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+   u64 mm_size, ps = SZ_4K, i, n_pages, total;
+   struct drm_buddy_block *block;
+   struct drm_buddy mm;
+   LIST_HEAD(left);
+   LIST_HEAD(middle);
+   LIST_HEAD(right);
+   LIST_HEAD(allocated);
+
+   mm_size = 16 * 3 * SZ_4K;
+
+   KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+   /*
+* Idea is to fragment the address space by alternating block
+* allocations between three different lists; one for left, middle and
+* right. We can then free a list to simulate fragmentation. In
+* particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+* including the try_harder path.
+*/
+
+   i = 0;
+   n_pages = mm_size / ps;
+   do {
+   struct list_head *list;
+   int slot = i % 3;
+
+   if (slot == 0)
+   list = 
+   else if (slot == 1)
+   list = 
+   else
+   list = 
+   KUNIT_ASSERT_FALSE_MSG(test,
+  drm_buddy_alloc_blocks(, 0, mm_size,
+ ps, ps, list, 0),
+  "buddy_alloc hit an error size=%d\n",
+  ps);
+   } while (++i < n_pages);
+
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%d\n", 3 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  2 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+  3 * ps, ps, 
,
+  
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc didn't error size=%llu\n", 3 * ps);
+   /*
+* At this point we should have enough contiguous space for 2 blocks,
+* however they are never buddies (since we freed middle and right) so
+* will require the try_harder logic to find them.
+*/
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   2 * ps, ps, 
,
+   
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 2 * ps);
+
+   drm_buddy_free_list(, );
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   3 * ps, ps, 
,
+   
DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+  "buddy_alloc hit an error size=%d\n", 3 * ps);
+
+   total = 0;
+   list_for_each_entry(block, , link)
+   total += drm_buddy_block_size(, block);
+
+   KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+   drm_buddy_free_list(, );
+   drm_buddy_fini();
+}
+
 static void drm_test_buddy_alloc_pathological(struct kunit *test)
 {
u64 mm_size, size, start = 0;
@@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = {
KUNIT_CASE(drm_test_buddy_alloc_optimistic),
   

[PATCH 1/2] drm/buddy: Fix alloc_range() error handling code

2024-02-13 Thread Arunpravin Paneer Selvam
Few users have observed display corruption when they boot
the machine to KDE Plasma or playing games. We have root
caused the problem that whenever alloc_range() couldn't
find the required memory blocks the function was returning
SUCCESS in some of the corner cases.

The right approach would be if the total allocated size
is less than the required size, the function should
return -ENOSPC.

Cc:  # 6.7+
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Tested-by: Mario Limonciello 
Link: 
https://patchwork.kernel.org/project/dri-devel/patch/20240207174456.341121-1-arunpravin.paneersel...@amd.com/
Acked-by: Christian König 
Reviewed-by: Matthew Auld 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/drm_buddy.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..c1a99bf4dffd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
} while (1);
 
list_splice_tail(, blocks);
+
+   if (total_allocated < size) {
+   err = -ENOSPC;
+   goto err_free;
+   }
+
return 0;
 
 err_undo:

base-commit: 2c80a2b715df75881359d07dbaacff8ad411f40e
-- 
2.25.1



Re: [PATCH v2] drm/amd/display: Add NULL test for 'timing generator' in 'dcn21_set_pipe()'

2024-02-13 Thread Kees Cook
On Thu, Feb 01, 2024 at 03:28:45PM +0530, Srinivasan Shanmugam wrote:
> In "u32 otg_inst = pipe_ctx->stream_res.tg->inst;"
> pipe_ctx->stream_res.tg could be NULL, it is relying on the caller to
> ensure the tg is not NULL.
> 
> Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call 
> backs.")
> Cc: Yongqiang Sun 
> Cc: Anthony Koo 
> Cc: Rodrigo Siqueira 
> Cc: Aurabindo Pillai 
> Signed-off-by: Srinivasan Shanmugam 
> ---
> v2:
>   - s/u32/uint32_t for consistency (Anthony)
> 
>  .../amd/display/dc/hwss/dcn21/dcn21_hwseq.c   | 24 +++
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c 
> b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
> index 8e88dcaf88f5..8323077bba15 100644
> --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
> +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
> @@ -206,28 +206,32 @@ void dcn21_set_abm_immediate_disable(struct pipe_ctx 
> *pipe_ctx)
>  void dcn21_set_pipe(struct pipe_ctx *pipe_ctx)
>  {
>   struct abm *abm = pipe_ctx->stream_res.abm;
> - uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
> + struct timing_generator *tg = pipe_ctx->stream_res.tg;
>   struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
>   struct dmcu *dmcu = pipe_ctx->stream->ctx->dc->res_pool->dmcu;
> + uint32_t otg_inst;
> +
> + if (!abm && !tg && !panel_cntl)
> + return;
> +
> + otg_inst = tg->inst;

Is the "if" supposed to be using "||"s instead of "&&"s? I noticed
Coverity complained "tg may be NULL" for the "tg->inst" dereference...

-Kees

-- 
Kees Cook


[PATCH] drm/amd/display: Fix memory leak in dm_sw_fini()

2024-02-13 Thread Armin Wolf
After destroying dmub_srv, the memory associated with it is
not freed, causing a memory leak:

unreferenced object 0x896302b45800 (size 1024):
  comm "(udev-worker)", pid 222, jiffies 4294894636
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace (crc 6265fd77):
[] kmalloc_trace+0x29d/0x340
[] dm_dmub_sw_init+0xb4/0x450 [amdgpu]
[] dm_sw_init+0x15/0x2b0 [amdgpu]
[] amdgpu_device_init+0x1417/0x24e0 [amdgpu]
[] amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
[] amdgpu_pci_probe+0x187/0x4e0 [amdgpu]
[] local_pci_probe+0x3e/0x90
[] pci_device_probe+0xc3/0x230
[] really_probe+0xe2/0x480
[] __driver_probe_device+0x78/0x160
[] driver_probe_device+0x1f/0x90
[] __driver_attach+0xce/0x1c0
[] bus_for_each_dev+0x70/0xc0
[] bus_add_driver+0x112/0x210
[] driver_register+0x55/0x100
[] do_one_initcall+0x41/0x300

Fix this by freeing dmub_srv after destroying it.

Fixes: 743b9786b14a ("drm/amd/display: Hook up the DMUB service in DM")
Signed-off-by: Armin Wolf 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 59d2eee72a32..9cbfc8d39dee 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2287,6 +2287,7 @@ static int dm_sw_fini(void *handle)

if (adev->dm.dmub_srv) {
dmub_srv_destroy(adev->dm.dmub_srv);
+   kfree(adev->dm.dmub_srv);
adev->dm.dmub_srv = NULL;
}

--
2.39.2