Re: [PATCH] drm/amdgpu: Fix out-of-bounds write warning

2024-04-25 Thread Christian König




On 25.04.24 12:00, Ma Jun wrote:

Check the ring type value to fix the out-of-bounds
write warning

Signed-off-by: Ma Jun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 06f0a6534a94..1e0b5bb47bc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -353,6 +353,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
ring->hw_prio = hw_prio;
  
  	if (!ring->no_scheduler) {

+   if (ring->funcs->type >= AMDGPU_HW_IP_NUM) {
+   dev_warn(adev->dev, "ring type %d has no scheduler\n",
+   ring->funcs->type);
+   return 0;
+   }
+


That check should probably be at the beginning of the function since 
trying to initialize a ring with an invalid type should be rejected 
immediately.
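
A minimal sketch of that suggestion (the parameter list of
amdgpu_ring_init() is elided here, and returning -EINVAL rather than 0 is
an assumption following from "should be rejected"):

	int amdgpu_ring_init(struct amdgpu_device *adev,
			     struct amdgpu_ring *ring, ...)
	{
		/* Reject invalid ring types before touching any state. */
		if (ring->funcs->type >= AMDGPU_HW_IP_NUM) {
			dev_warn(adev->dev, "invalid ring type %d\n",
				 ring->funcs->type);
			return -EINVAL;
		}
		/* ... existing initialization ... */
	}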


Regards,
Christian.


hw_ip = ring->funcs->type;
num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;




Re: [PATCH v4] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-25 Thread Christian König

On 25.04.24 10:15, Arunpravin Paneer Selvam wrote:

Now we have two flags for contiguous VRAM buffer allocation.
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- We will try to allocate a contiguous buffer. If the
   allocation fails, we fall back to allocating the individual pages.

When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- We set this in amdgpu_bo_pin_restricted() before bo validation
   and check this flag in the vram manager file.
- If this is set, we should allocate the buffer pages contiguously.
   If the allocation fails, we return -ENOSPC.
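
Sketched as pseudo-code (alloc_contiguous()/alloc_scattered() are
hypothetical helpers used purely for illustration, not functions from the
patch):

	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
		/* mandatory: set while pinning, e.g. for RDMA */
		r = alloc_contiguous(bo);
		if (r)
			return -ENOSPC;		/* no fallback allowed */
	} else if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
		/* best effort: try contiguous first ... */
		r = alloc_contiguous(bo);
		if (r)
			r = alloc_scattered(bo);	/* ... then fall back */
	}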

v2:
   - keep the mem_flags and bo->flags check as is (Christian)
   - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
 amdgpu_bo_pin_restricted function placement range iteration
 loop (Christian)
   - rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block
 (Christian)
   - Keep the kernel BO allocation as is (Christian)
   - If BO pin vram allocation failed, we need to return -ENOSPC as
 RDMA cannot work with scattered VRAM pages (Philip)

v3(Christian):
   - keep contiguous flag handling outside of pages_per_block
 calculation
   - remove the hacky implementation in contiguous flag error
 handling code

v4(Christian):
   - use any variable and return value for non-contiguous
 fallback

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  8 +++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 22 ++++++++++++++++------
  2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 492aebc44e51..c594d2a5978e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -154,8 +154,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
  
-		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)

+   if (abo->tbo.type == ttm_bo_type_kernel &&
+   flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
c++;
}
  
@@ -965,6 +967,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,

if (!bo->placements[i].lpfn ||
(lpfn && lpfn < bo->placements[i].lpfn))
bo->placements[i].lpfn = lpfn;
+
+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+   bo->placements[i].mem_type == TTM_PL_VRAM)
+   bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
}
  
  	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index e494f5bf136a..6c30eceec896 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -469,7 +469,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
if (tbo->type != ttm_bo_type_kernel)
max_bytes -= AMDGPU_VM_RESERVED_VRAM;
  
-	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {

+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
pages_per_block = ~0ul;
} else {
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -478,7 +478,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
/* default to 2MB */
pages_per_block = 2UL << (20UL - PAGE_SHIFT);
  #endif
-   pages_per_block = max_t(uint32_t, pages_per_block,
-   tbo->page_alignment);
+   pages_per_block = max_t(u32, pages_per_block,
+   tbo->page_alignment);
}
  
@@ -499,7 +499,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

if (place->flags & TTM_PL_FLAG_TOPDOWN)
vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
  
-	if (place->flags & TTM_PL_FLAG_CONTIGUOUS)

+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
  
  	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED)

@@ -518,21 +518,31 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
else
min_block_size = mgr->default_page_size;
  
-		BUG_ON(min_block_size < mm->chunk_size);

-
/* Limit maximum size to 2GiB due

Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM

2024-04-25 Thread Christian König

On 25.04.24 09:39, Friedrich Vock wrote:

On 25.04.24 08:25, Christian König wrote:

On 24.04.24 18:57, Friedrich Vock wrote:

This adds GTT to the "preferred domains" of this buffer object, which
will also prevent any attempts at moving the buffer back to VRAM if
there is space. If VRAM is full, GTT will already be chosen as a
fallback.


Big NAK to that one, this is mandatory for correct operation.


Hm, how is correctness affected here? We still fall back to GTT if
allocating in VRAM doesn't work, I don't see a difference except that
now we'll actually try moving it back into VRAM again.


Well, this is the fallback. Only during CS do we try to allocate from
GTT if allocating in VRAM doesn't work.


When you remove this, any failed allocation from VRAM becomes fatal.


It could be that the handling is buggy and that updating the initial
domain also adds GTT to the preferred domains, but that should then be
fixed.


Regards,
Christian.



Regards,
Friedrich


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
  2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6bbab141eaaeb..aea3770d3ea2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device
*dev, void *data,
  goto retry;
  }

-    if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-    initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-    goto retry;
-    }
  DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
  size, initial_domain, args->in.alignment, r);
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 85c10d8086188..9978b85ed6f40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
    AMDGPU_GEM_DOMAIN_GDS))
  amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
  else
-    amdgpu_bo_placement_from_domain(bo, bp->domain);
+    amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
  if (bp->type == ttm_bo_type_kernel)
  bo->tbo.priority = 2;
  else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
--
2.44.0







Re: [PATCH V2] drm/amdgpu: fix the warning about the expression (int)size - len

2024-04-25 Thread Christian König

On 25.04.24 09:11, Jesse Zhang wrote:

Converting size from size_t to int may overflow.
v2: keep reverse xmas tree order (Christian)

Signed-off-by: Jesse Zhang 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f5d0fa207a88..eed60d4b3390 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2065,12 +2065,13 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
	struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
char reg_offset[11];
uint32_t *new = NULL, *tmp = NULL;
+   unsigned int len = 0;
int ret, i = 0, len = 0;


Well now you have len defined twice :)

Christian.

  
  	do {

memset(reg_offset, 0, 11);
if (copy_from_user(reg_offset, buf + len,
-   min(10, ((int)size-len)))) {
+   min(10, (size-len)))) {
ret = -EFAULT;
goto error_free;
}




Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-25 Thread Christian König

On 25.04.24 09:06, Friedrich Vock wrote:

On 25.04.24 08:58, Christian König wrote:

On 25.04.24 08:46, Friedrich Vock wrote:

On 25.04.24 08:32, Christian König wrote:

On 24.04.24 18:57, Friedrich Vock wrote:

Used by userspace to adjust buffer priorities in response to
changes in
application demand and memory pressure.


Yeah, that was discussed over and over again. One big design criterion
is that we can't have global priorities from userspace!

The background here is that this can trivially be abused.


I see your point when apps are allowed to prioritize themselves above
other apps, and I agree that should probably be disallowed at least for
unprivileged apps.

Disallowing this is a pretty trivial change though, and I don't really
see the abuse potential in being able to downgrade your own priority?


Yeah, I know what you mean and I'm also leaning towards that
argument. But another good point is that it doesn't actually
help.

For example when you have desktop apps fighting with a game, you
probably don't want to use static priorities, but rather evict the
apps which are inactive and keep the apps which are active in the
background.


Sadly things are not as simple as "evict everything from app 1, keep
everything from app 2 active". The simplest failure case of this is
games that already oversubscribe VRAM on their own. Keeping the whole
app inside VRAM is literally impossible there, and it helps a lot to
know which buffers the app is most happy with evicting.

In other words the priority just tells you which stuff from each app
to evict first, but not which app to globally throw out.


Yeah, but a per-buffer priority system could do both of these.


Yeah, but we already have that. See amdgpu_bo_list_entry_cmp() and 
amdgpu_bo_list_create().


This is the per-application priority, which can be used by userspace to
give priority to each BO in a submission (or application-wide).


The problem is rather that amdgpu/TTM never really made good use of that 
information.
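
For reference, that per-BO field already exists in the uAPI
(include/uapi/drm/amdgpu_drm.h); the comments here are added annotations:

	struct drm_amdgpu_bo_list_entry {
		/* GEM handle of the BO used in the submission */
		__u32 bo_handle;
		/* 0 .. AMDGPU_BO_LIST_MAX_PRIORITY, evaluated per bo list */
		__u32 bo_priority;
	};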


Regards,
Christian.



Regards,
Friedrich


Regards,
Christian.



Regards,
Friedrich


What we can do is to have per process priorities, but that needs to be
in the VM subsystem.

That's also the reason why I personally think that the handling
shouldn't be inside TTM at all.

Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
  include/uapi/drm/amdgpu_drm.h   |  1 +
  2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5ca13e2e50f50..6107810a9c205 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
  {
  struct amdgpu_device *adev = drm_to_adev(dev);
  struct drm_amdgpu_gem_op *args = data;
+    struct ttm_resource_manager *man;
  struct drm_gem_object *gobj;
  struct amdgpu_vm_bo_base *base;
+    struct ttm_operation_ctx ctx;
  struct amdgpu_bo *robj;
  int r;

@@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
  if (unlikely(r))
  goto out;

+    memset(&ctx, 0, sizeof(ctx));
+    ctx.interruptible = true;
+
  switch (args->op) {
  case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
  struct drm_amdgpu_gem_create_in info;
@@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,

  amdgpu_bo_unreserve(robj);
  break;
+    case AMDGPU_GEM_OP_SET_PRIORITY:
+    if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
+    args->value = AMDGPU_BO_PRIORITY_MAX_USER;
+    ttm_bo_update_priority(&robj->tbo, args->value);
+    if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
+    ttm_bo_try_unevict(&robj->tbo, &ctx);
+    amdgpu_bo_unreserve(robj);
+    } else {
+    amdgpu_bo_unreserve(robj);
+    man = ttm_manager_type(robj->tbo.bdev,
+    robj->tbo.resource->mem_type);
+    ttm_mem_unevict_evicted(robj->tbo.bdev, man,
+    true);
+    }
+    break;
  default:
  amdgpu_bo_unreserve(robj);
  r = -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index bdbe6b262a78d..53552dd489b9b 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {

  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
  #define AMDGPU_GEM_OP_SET_PLACEMENT    1
+#define AMDGPU_GEM_OP_SET_PRIORITY  2

  /* Sets or returns a value associated with a buffer. */
  struct drm_amdgpu_gem_op {
--
2.44.0









Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-25 Thread Christian König

On 25.04.24 08:46, Friedrich Vock wrote:

On 25.04.24 08:32, Christian König wrote:

On 24.04.24 18:57, Friedrich Vock wrote:

Used by userspace to adjust buffer priorities in response to changes in
application demand and memory pressure.


Yeah, that was discussed over and over again. One big design criterion
is that we can't have global priorities from userspace!

The background here is that this can trivially be abused.


I see your point when apps are allowed to prioritize themselves above
other apps, and I agree that should probably be disallowed at least for
unprivileged apps.

Disallowing this is a pretty trivial change though, and I don't really
see the abuse potential in being able to downgrade your own priority?


Yeah, I know what you mean and I'm also leaning towards that
argument. But another good point is that it doesn't actually help.


For example when you have desktop apps fighting with a game, you 
probably don't want to use static priorities, but rather evict the apps 
which are inactive and keep the apps which are active in the background.


In other words the priority just tells you which stuff from each app to 
evict first, but not which app to globally throw out.


Regards,
Christian.



Regards,
Friedrich


What we can do is to have per process priorities, but that needs to be
in the VM subsystem.

That's also the reason why I personally think that the handling
shouldn't be inside TTM at all.

Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
  include/uapi/drm/amdgpu_drm.h   |  1 +
  2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5ca13e2e50f50..6107810a9c205 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
  {
  struct amdgpu_device *adev = drm_to_adev(dev);
  struct drm_amdgpu_gem_op *args = data;
+    struct ttm_resource_manager *man;
  struct drm_gem_object *gobj;
  struct amdgpu_vm_bo_base *base;
+    struct ttm_operation_ctx ctx;
  struct amdgpu_bo *robj;
  int r;

@@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
  if (unlikely(r))
  goto out;

+    memset(&ctx, 0, sizeof(ctx));
+    ctx.interruptible = true;
+
  switch (args->op) {
  case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
  struct drm_amdgpu_gem_create_in info;
@@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,

  amdgpu_bo_unreserve(robj);
  break;
+    case AMDGPU_GEM_OP_SET_PRIORITY:
+    if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
+    args->value = AMDGPU_BO_PRIORITY_MAX_USER;
+    ttm_bo_update_priority(&robj->tbo, args->value);
+    if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
+    ttm_bo_try_unevict(&robj->tbo, &ctx);
+    amdgpu_bo_unreserve(robj);
+    } else {
+    amdgpu_bo_unreserve(robj);
+    man = ttm_manager_type(robj->tbo.bdev,
+    robj->tbo.resource->mem_type);
+    ttm_mem_unevict_evicted(robj->tbo.bdev, man,
+    true);
+    }
+    break;
  default:
  amdgpu_bo_unreserve(robj);
  r = -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index bdbe6b262a78d..53552dd489b9b 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {

  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
  #define AMDGPU_GEM_OP_SET_PLACEMENT    1
+#define AMDGPU_GEM_OP_SET_PRIORITY  2

  /* Sets or returns a value associated with a buffer. */
  struct drm_amdgpu_gem_op {
--
2.44.0







Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

2024-04-25 Thread Christian König

In general: Yes please :)

But you are exercising a lot of ideas we have already thrown overboard
over the years.


The general idea Marek and I have been working on for a while now is 
rather to make TTM aware of userspace "clients".


In other words we should start with having a TTM structure in the fpriv 
of the drivers and then track there how much VRAM was evicted for each 
client.


This should then be balanced so that each client gets its equal share 
of VRAM and we pretty much end up with a static situation which only 
changes when applications become inactive/active (based on their GPU 
activity).
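
A rough sketch of that direction (the type and field names are invented
here purely for illustration):

	/* Hypothetical: one instance per client, stored in the driver's
	 * fpriv, so TTM can balance eviction between clients. */
	struct ttm_client {
		/* bytes of VRAM evicted on behalf of this client */
		atomic64_t evicted_vram;
		/* share of VRAM this client is entitled to, in bytes */
		u64 fair_share;
	};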


I will mail you some of the stuff we already came up with later on.

Regards,
Christian.

On 24.04.24 18:56, Friedrich Vock wrote:

Hi everyone,

recently I've been looking into remedies for apps (in particular, newer
games) that experience significant performance loss when they start to
hit VRAM limits, especially on older or lower-end cards that struggle
to fit both desktop apps and all the game data into VRAM at once.

The root of the problem lies in the fact that from userspace's POV,
buffer eviction is very opaque: Userspace applications/drivers cannot
tell how oversubscribed VRAM is, nor do they have fine-grained control
over which buffers get evicted.  At the same time, with GPU APIs becoming
increasingly lower-level and GPU-driven, only the application itself
can know which buffers are used within a particular submission, and
how important each buffer is. For this, GPU APIs include interfaces
to query oversubscription and specify memory priorities: In Vulkan,
oversubscription can be queried through the VK_EXT_memory_budget
extension. Different buffers can also be assigned priorities via the
VK_EXT_pageable_device_local_memory extension. Modern games, especially
D3D12 games via vkd3d-proton, rely on oversubscription being reported and
priorities being respected in order to perform their memory management.
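
For illustration, querying the budget and adjusting a priority from a
Vulkan application looks roughly like this (standard usage of the two
extensions named above, not code from this series):

	VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
		.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
	};
	VkPhysicalDeviceMemoryProperties2 props = {
		.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
		.pNext = &budget,
	};
	vkGetPhysicalDeviceMemoryProperties2(physical_device, &props);
	/* budget.heapBudget[i] - budget.heapUsage[i] is the headroom of heap i */

	/* VK_EXT_pageable_device_local_memory: lower one allocation's
	 * priority (range 0.0 .. 1.0, default 0.5) */
	vkSetDeviceMemoryPriorityEXT(device, memory, 0.2f);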

However, relaying this information to the kernel via the current KMD uAPIs
is not possible. On AMDGPU for example, all work submissions include a
"bo list" that contains any buffer object that is accessed during the
course of the submission. If VRAM is oversubscribed and a buffer in the
list was evicted to system memory, that buffer is moved back to VRAM
(potentially evicting other unused buffers).

Since the usermode driver doesn't know what buffers are used by the
application, its only choice is to submit a bo list that contains every
buffer the application has allocated. In case of VRAM oversubscription,
it is highly likely that some of the application's buffers were evicted,
which almost guarantees that some buffers will get moved around. Since
the bo list is only known at submit time, this also means the buffers
will get moved right before submitting application work, which is the
worst possible time to move buffers from a latency perspective. Another
consequence of the large bo list is that nearly all memory from other
applications will be evicted, too. When different applications (e.g. game
and compositor) submit work one after the other, this causes a ping-pong
effect where each app's submission evicts the other app's memory,
resulting in a large amount of unnecessary moves.

This overly aggressive eviction behavior led to RADV adopting a change
that effectively allows all VRAM applications to reside in system memory
[1].  This worked around the ping-ponging/excessive buffer moving problem,
but also meant that any memory evicted to system memory would forever
stay there, regardless of how VRAM is used.

My proposal aims at providing a middle ground between these extremes.
The goals I want to meet are:
- Userspace is accurately informed about VRAM oversubscription/how much
   VRAM has been evicted
- Buffer eviction respects priorities set by userspace
- Wasteful ping-ponging is avoided to the extent possible

I have been testing out some prototypes, and came up with this rough
sketch of an API:

- For each ttm_resource_manager, the amount of evicted memory is tracked
   (similarly to how "usage" tracks the memory usage). When memory is
   evicted via ttm_bo_evict, the size of the evicted memory is added, when
   memory is un-evicted (see below), its size is subtracted. The amount of
   evicted memory for e.g. VRAM can be queried by userspace via an ioctl.

- Each ttm_resource_manager maintains a list of evicted buffer objects.

- ttm_mem_unevict walks the list of evicted bos for a given
   ttm_resource_manager and tries moving evicted resources back. When a
   buffer is freed, this function is called to immediately restore some
   evicted memory.

- Each ttm_buffer_object independently tracks the mem_type it wants
   to reside in.

- ttm_bo_try_unevict is added as a helper function which attempts to
   move the buffer to its preferred mem_type. If no space is available
   there, it fails with -ENOSPC/-ENOMEM.

- Similar to how ttm_bo_evict works, each driver can implement
   

Re: [PATCH] drm/amdgpu: fix the warning about the expression (int)size - len

2024-04-25 Thread Christian König

On 25.04.24 08:20, Jesse Zhang wrote:

Converting size from size_t to int may overflow.

Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f5d0fa207a88..b828aad4f35e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2065,12 +2065,13 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
	struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
char reg_offset[11];
uint32_t *new = NULL, *tmp = NULL;
-   int ret, i = 0, len = 0;
+   int ret, i = 0;
+   unsigned int len = 0;


Please keep reverse xmas tree order here, apart from that looks good to me.

Regards,
Christian.

  
  	do {

memset(reg_offset, 0, 11);
if (copy_from_user(reg_offset, buf + len,
-   min(10, ((int)size-len)))) {
+   min(10, (size-len)))) {
ret = -EFAULT;
goto error_free;
}




Re: [PATCH] drm/amdgpu: fix overflowed array index read warning

2024-04-25 Thread Christian König

On 25.04.24 07:27, Tim Huang wrote:

From: Tim Huang 

Clear the warning that the cast operation might have overflowed.

Signed-off-by: Tim Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 06f0a6534a94..6dfcd62e83ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -473,7 +473,7 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
size_t size, loff_t *pos)
  {
struct amdgpu_ring *ring = file_inode(f)->i_private;
-   int r, i;
+   int r;
uint32_t value, result, early[3];


While at it please declare "int r;" last, e.g. keep reverse xmas tree 
order here.


Apart from that looks good to me.

Regards,
Christian.

  
  	if (*pos & 3 || size & 3)

@@ -485,7 +485,7 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
early[0] = amdgpu_ring_get_rptr(ring) & ring->buf_mask;
early[1] = amdgpu_ring_get_wptr(ring) & ring->buf_mask;
early[2] = ring->wptr & ring->buf_mask;
-   for (i = *pos / 4; i < 3 && size; i++) {
+   for (loff_t i = *pos / 4; i < 3 && size; i++) {
r = put_user(early[i], (uint32_t *)buf);
if (r)
return r;




Re: [PATCH] drm/amdgpu: fix potential resource leak warning

2024-04-25 Thread Christian König

On 25.04.24 05:33, Tim Huang wrote:

From: Tim Huang 

Clear the resource leak warning: when the prepare fails,
the allocated amdgpu job object will never be released.

Signed-off-by: Tim Huang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 66e8a016126b..9b748d7058b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -102,6 +102,11 @@ static int amdgpu_vm_sdma_prepare(struct 
amdgpu_vm_update_params *p,
if (!r)
r = amdgpu_sync_push_to_job(&sync, p->job);
amdgpu_sync_free(&sync);
+
+   if (r) {
+   p->num_dw_left = 0;
+   amdgpu_job_free(p->job);
+   }
return r;
  }
  




Re: [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit

2024-04-25 Thread Christian König

On 24.04.24 18:56, Friedrich Vock wrote:

TTM now takes care of moving buffers to the best possible domain.


Yeah, I've been planning to do this for a while as well. The problem is 
really that we need to keep the functionality.


For example, TTM currently doesn't have a concept of a userspace client. 
So it can't track the bytes already evicted for each client.


This needs to be added as infrastructure first and then we can start to 
change this over into moving more functionality into TTM.


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 191 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h |   4 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   7 -
  4 files changed, 3 insertions(+), 201 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cac0ca64367b3..3004adc6fa679 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1404,8 +1404,6 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev);
  bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev);
  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);

-void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
- u64 num_vis_bytes);
  int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
  void amdgpu_device_program_register_sequence(struct amdgpu_device *adev,
 const u32 *registers,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index e9168677ef0a6..92a0cffc1adc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -638,196 +638,19 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
return 0;
  }

-/* Convert microseconds to bytes. */
-static u64 us_to_bytes(struct amdgpu_device *adev, s64 us)
-{
-   if (us <= 0 || !adev->mm_stats.log2_max_MBps)
-   return 0;
-
-   /* Since accum_us is incremented by a million per second, just
-* multiply it by the number of MB/s to get the number of bytes.
-*/
-   return us << adev->mm_stats.log2_max_MBps;
-}
-
-static s64 bytes_to_us(struct amdgpu_device *adev, u64 bytes)
-{
-   if (!adev->mm_stats.log2_max_MBps)
-   return 0;
-
-   return bytes >> adev->mm_stats.log2_max_MBps;
-}
-
-/* Returns how many bytes TTM can move right now. If no bytes can be moved,
- * it returns 0. If it returns non-zero, it's OK to move at least one buffer,
- * which means it can go over the threshold once. If that happens, the driver
- * will be in debt and no other buffer migrations can be done until that debt
- * is repaid.
- *
- * This approach allows moving a buffer of any size (it's important to allow
- * that).
- *
- * The currency is simply time in microseconds and it increases as the clock
- * ticks. The accumulated microseconds (us) are converted to bytes and
- * returned.
- */
-static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
- u64 *max_bytes,
- u64 *max_vis_bytes)
-{
-   s64 time_us, increment_us;
-   u64 free_vram, total_vram, used_vram;
-   /* Allow a maximum of 200 accumulated ms. This is basically per-IB
-* throttling.
-*
-* It means that in order to get full max MBps, at least 5 IBs per
-* second must be submitted and not more than 200ms apart from each
-* other.
-*/
-   const s64 us_upper_bound = 200000;
-
-   if (!adev->mm_stats.log2_max_MBps) {
-   *max_bytes = 0;
-   *max_vis_bytes = 0;
-   return;
-   }
-
-   total_vram = adev->gmc.real_vram_size - atomic64_read(&adev->vram_pin_size);
-   used_vram = ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);
-   free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;
-
-   spin_lock(&adev->mm_stats.lock);
-
-   /* Increase the amount of accumulated us. */
-   time_us = ktime_to_us(ktime_get());
-   increment_us = time_us - adev->mm_stats.last_update_us;
-   adev->mm_stats.last_update_us = time_us;
-   adev->mm_stats.accum_us = min(adev->mm_stats.accum_us + increment_us,
- us_upper_bound);
-
-   /* This prevents the short period of low performance when the VRAM
-* usage is low and the driver is in debt or doesn't have enough
-* accumulated us to fill VRAM quickly.
-*
-* The situation can occur in these cases:
-* - a lot of VRAM is freed by userspace
-* - the presence of a big buffer causes a lot of evictions
-*   (solution: split buffers into smaller ones)
-*
-* If 128 MB or 1/8th of VRAM 

Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-25 Thread Christian König

On 24.04.24 18:57, Friedrich Vock wrote:

Used by userspace to adjust buffer priorities in response to changes in
application demand and memory pressure.


Yeah, that was discussed over and over again. One big design criterion is 
that we can't have global priorities from userspace!


The background here is that this can trivially be abused.

What we can do is to have per process priorities, but that needs to be 
in the VM subsystem.


That's also the reason why I personally think that the handling 
shouldn't be inside TTM at all.


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
  include/uapi/drm/amdgpu_drm.h   |  1 +
  2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5ca13e2e50f50..6107810a9c205 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
  {
struct amdgpu_device *adev = drm_to_adev(dev);
struct drm_amdgpu_gem_op *args = data;
+   struct ttm_resource_manager *man;
struct drm_gem_object *gobj;
struct amdgpu_vm_bo_base *base;
+   struct ttm_operation_ctx ctx;
struct amdgpu_bo *robj;
int r;

@@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
if (unlikely(r))
goto out;

+   memset(&ctx, 0, sizeof(ctx));
+   ctx.interruptible = true;
+
switch (args->op) {
case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
struct drm_amdgpu_gem_create_in info;
@@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,

amdgpu_bo_unreserve(robj);
break;
+   case AMDGPU_GEM_OP_SET_PRIORITY:
+   if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
+   args->value = AMDGPU_BO_PRIORITY_MAX_USER;
+   ttm_bo_update_priority(&robj->tbo, args->value);
+   if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
+   ttm_bo_try_unevict(&robj->tbo, &ctx);
+   amdgpu_bo_unreserve(robj);
+   } else {
+   amdgpu_bo_unreserve(robj);
+   man = ttm_manager_type(robj->tbo.bdev,
+   robj->tbo.resource->mem_type);
+   ttm_mem_unevict_evicted(robj->tbo.bdev, man,
+   true);
+   }
+   break;
default:
amdgpu_bo_unreserve(robj);
r = -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index bdbe6b262a78d..53552dd489b9b 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {

  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO 0
  #define AMDGPU_GEM_OP_SET_PLACEMENT   1
+#define AMDGPU_GEM_OP_SET_PRIORITY  2

  /* Sets or returns a value associated with a buffer. */
  struct drm_amdgpu_gem_op {
--
2.44.0





Re: [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority

2024-04-25 Thread Christian König

On 24.04.24 18:57, Friedrich Vock wrote:

Used to dynamically adjust priorities of buffers at runtime, to react to
changes in memory pressure/usage patterns.


And another big NAK. TTM priorities are meant to be static, based on
in-kernel decisions which are not exposed to userspace.


In other words we can group BOs based on kernel, user, SVM etc... but 
never on something userspace can influence.
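
For context, this is the kind of static, kernel-decided grouping amdgpu
already applies today (from amdgpu_bo_create(); the same lines are visible
in a diff quoted elsewhere in this digest):

	if (bp->type == ttm_bo_type_kernel)
		bo->tbo.priority = 2;
	else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
		bo->tbo.priority = 1;
	/* discardable user BOs keep the default priority of 0 */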


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++++++
  include/drm/ttm/ttm_bo.h |  2 ++
  2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index eae54cd4a7ce9..6ac939c58a6b8 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -112,6 +112,23 @@ void ttm_bo_set_bulk_move(struct ttm_buffer_object *bo,
  }
  EXPORT_SYMBOL(ttm_bo_set_bulk_move);

+void ttm_bo_update_priority(struct ttm_buffer_object *bo, unsigned int new_prio)
+{
+   struct ttm_resource_manager *man;
+
+   if (!bo->resource)
+   return;
+
+   man = ttm_manager_type(bo->bdev, bo->resource->mem_type);
+
+   spin_lock(&bo->bdev->lru_lock);
+   ttm_resource_del_bulk_move(bo->resource, bo);
+   bo->priority = new_prio;
+   ttm_resource_add_bulk_move(bo->resource, bo);
+   spin_unlock(&bo->bdev->lru_lock);
+}
+EXPORT_SYMBOL(ttm_bo_update_priority);
+
  static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
  struct ttm_resource *mem, bool evict,
  struct ttm_operation_ctx *ctx,
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 91299a3b6fcfa..51040bc443ea0 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -359,6 +359,8 @@ static inline void *ttm_kmap_obj_virtual(struct 
ttm_bo_kmap_obj *map,
return map->virtual;
  }

+void ttm_bo_update_priority(struct ttm_buffer_object *bo,
+   unsigned int new_prio);

  int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
struct ttm_operation_ctx *ctx);
--
2.44.0





Re: [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority

2024-04-25 Thread Christian König

On 24.04.24 18:57, Friedrich Vock wrote:

This makes buffer eviction significantly more stable by avoiding
ping-ponging caused by low-priority buffers evicting high-priority
buffers and vice versa.


And this creates a denial of service for the whole system via fork() bombing.

This is another very big NAK.

Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/ttm/ttm_bo.c   | 9 +++++++--
  drivers/gpu/drm/ttm/ttm_resource.c | 5 +++--
  include/drm/ttm/ttm_bo.h   | 1 +
  3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3047c763eb4eb..eae54cd4a7ce9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -776,6 +776,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object 
*busy_bo,
  int ttm_mem_evict_first(struct ttm_device *bdev,
struct ttm_resource_manager *man,
const struct ttm_place *place,
+   unsigned int max_priority,
struct ttm_operation_ctx *ctx,
struct ww_acquire_ctx *ticket)
  {
@@ -788,6 +789,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
spin_lock(>lru_lock);
	ttm_resource_manager_for_each_res(man, &cursor, res) {
bool busy;
+   if (res->bo->priority > max_priority)
+   break;

if (!ttm_bo_evict_swapout_allowable(res->bo, ctx, place,
&locked, &busy)) {
@@ -930,8 +933,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
return ret;
if (ctx->no_evict)
return -ENOSPC;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
+   if (!bo->priority)
+   return -ENOSPC;
+   ret = ttm_mem_evict_first(bdev, man, place, bo->priority - 1,
+ ctx, ticket);
if (unlikely(ret != 0))
return ret;
} while (1);
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c 
b/drivers/gpu/drm/ttm/ttm_resource.c
index 1d6755a1153b1..63d4371adb519 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -431,8 +431,9 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
while (!list_empty(>lru[i])) {
spin_unlock(>lru_lock);
-   ret = ttm_mem_evict_first(bdev, man, NULL, &ctx,
- NULL);
+   ret = ttm_mem_evict_first(bdev, man, NULL,
+ TTM_MAX_BO_PRIORITY,
+ , NULL);
if (ret)
return ret;
spin_lock(>lru_lock);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8f4e6366c0417..91299a3b6fcfa 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -396,6 +396,7 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
  int ttm_mem_evict_first(struct ttm_device *bdev,
struct ttm_resource_manager *man,
const struct ttm_place *place,
+   unsigned int max_priority,
struct ttm_operation_ctx *ctx,
struct ww_acquire_ctx *ticket);
  void ttm_mem_unevict_evicted(struct ttm_device *bdev,
--
2.44.0





Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM

2024-04-25 Thread Christian König

On 24.04.24 18:57, Friedrich Vock wrote:

This adds GTT to the "preferred domains" of this buffer object, which
will also prevent any attempts at moving the buffer back to VRAM if
there is space. If VRAM is full, GTT will already be chosen as a
fallback.


Big NAK to that one, this is mandatory for correct operation.

Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 4 ----
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
  2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6bbab141eaaeb..aea3770d3ea2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
goto retry;
}

-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
size, initial_domain, args->in.alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 85c10d8086188..9978b85ed6f40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
  AMDGPU_GEM_DOMAIN_GDS))
amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
else
-   amdgpu_bo_placement_from_domain(bo, bp->domain);
+   amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
if (bp->type == ttm_bo_type_kernel)
bo->tbo.priority = 2;
else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
--
2.44.0





Re: [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources

2024-04-25 Thread Christian König

On 24.04.24 18:56, Friedrich Vock wrote:

We will never try evicting things from VRAM for these resources anyway.
This affects TTM buffer uneviction logic, which would otherwise try to
move these buffers into VRAM (clashing with VRAM-only allocations).


You are working on outdated code. That change was already done by me as 
well.


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++++++++++++
  1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5834a95d680d9..85c10d8086188 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -127,6 +127,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
struct ttm_placement *placement = >placement;
struct ttm_place *places = abo->placements;
+   bool skip_vram_busy = false;
u64 flags = abo->flags;
u32 c = 0;

@@ -156,6 +157,13 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
c++;
+
+   /*
+* If GTT is preferred by the buffer as well, don't try VRAM when
+* it's busy.
+*/
+   if ((domain & abo->preferred_domains) & AMDGPU_GEM_DOMAIN_GTT)
+   skip_vram_busy = true;
}

if (domain & AMDGPU_GEM_DOMAIN_DOORBELL) {
@@ -223,6 +231,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)

placement->num_busy_placement = c;
placement->busy_placement = places;
+
+   if (skip_vram_busy) {
+   --placement->num_busy_placement;
+   ++placement->busy_placement;
+   }
  }

  /**
--
2.44.0





Re: [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation

2024-04-25 Thread Christian König

On 24.04.24 18:56, Friedrich Vock wrote:

When undoing evictions because of decreased memory pressure, it makes no
sense to try evicting other buffers.


That duplicates some functionality.

If a driver doesn't want eviction to happen it just needs to mark the 
desired placements as non-evictable with the TTM_PL_FLAG_DESIRED flag.
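
A minimal sketch of that alternative, assuming the semantics described
here (TTM tries such a placement but will not evict other BOs to make
room for it):

	/* in the driver's placement setup, instead of a new ctx->no_evict: */
	places[c].flags |= TTM_PL_FLAG_DESIRED;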


Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
  include/drm/ttm/ttm_bo.h | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9a0efbf79316c..3b89fabc2f00a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -764,6 +764,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
break;
if (unlikely(ret != -ENOSPC))
return ret;
+   if (ctx->no_evict)
+   return -ENOSPC;
ret = ttm_mem_evict_first(bdev, man, place, ctx,
  ticket);
if (unlikely(ret != 0))
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8a1a29c6fbc50..a8f21092403d6 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -192,6 +192,7 @@ struct ttm_operation_ctx {
bool gfp_retry_mayfail;
bool allow_res_evict;
bool force_alloc;
+   bool no_evict;
struct dma_resv *resv;
uint64_t bytes_moved;
  };
@@ -358,6 +359,7 @@ static inline void *ttm_kmap_obj_virtual(struct 
ttm_bo_kmap_obj *map,
return map->virtual;
  }

+
  int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
struct ttm_operation_ctx *ctx);
  int ttm_bo_validate(struct ttm_buffer_object *bo,
--
2.44.0





Re: [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking

2024-04-25 Thread Christian König

On 24.04.24 18:56, Friedrich Vock wrote:

Make each buffer object aware of whether it has been evicted or not.


That reverts some changes we made a couple of years ago.

In general the idea is that eviction isn't something we need to reverse 
in TTM.


Rather the driver gives the desired placement.

Regards,
Christian.



Signed-off-by: Friedrich Vock 
---
  drivers/gpu/drm/ttm/ttm_bo.c |  1 +
  include/drm/ttm/ttm_bo.h | 11 +++++++++++
  2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b2..3968b17453569 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct 
ttm_buffer_object *bo,
bo->pin_count = 0;
bo->sg = sg;
bo->bulk_move = NULL;
+   bo->evicted_type = TTM_NUM_MEM_TYPES;
if (resv)
bo->base.resv = resv;
else
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 0223a41a64b24..8a1a29c6fbc50 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -121,6 +121,17 @@ struct ttm_buffer_object {
unsigned priority;
unsigned pin_count;

+   /**
+* @evicted_type: Memory type this BO was evicted from, if any.
+* TTM_NUM_MEM_TYPES if this BO was not evicted.
+*/
+   int evicted_type;
+   /**
+* @evicted: Entry in the evicted list for the resource manager
+* this BO was evicted from.
+*/
+   struct list_head evicted;
+
/**
 * @delayed_delete: Work item used when we can't delete the BO
 * immediately
--
2.44.0





Re: [PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-24 Thread Christian König

On 23.04.24 16:31, Tim Huang wrote:

From: Tim Huang 

Clear the warning about use of the uninitialized value fw_size.

Signed-off-by: Tim Huang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..fb5de23fa8d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes);
break;
default:
-   break;
+   dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id);
+   return;
}
  
  	if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {




Re: [PATCH] drm/amd/display: re-indent dc_power_down_on_boot()

2024-04-24 Thread Christian König

On 24.04.24 15:20, Dan Carpenter wrote:

On Wed, Apr 24, 2024 at 03:11:08PM +0200, Christian König wrote:

On 24.04.24 13:41, Dan Carpenter wrote:

These lines are indented too far.  Clean the whitespace.

Signed-off-by: Dan Carpenter 
---
   drivers/gpu/drm/amd/display/dc/core/dc.c | 7 +++----
   1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 8eefba757da4..f64d7229eb6c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -5043,11 +5043,10 @@ void dc_interrupt_ack(struct dc *dc, enum dc_irq_source 
src)
   void dc_power_down_on_boot(struct dc *dc)
   {
if (dc->ctx->dce_environment != DCE_ENV_VIRTUAL_HW &&
-   dc->hwss.power_down_on_boot) {
-
-   if (dc->caps.ips_support)
-   dc_exit_ips_for_hw_access(dc);
+   dc->hwss.power_down_on_boot) {
+   if (dc->caps.ips_support)
+   dc_exit_ips_for_hw_access(dc);

Well, while at it, can't the two ifs be merged here?

(I don't know this code too well, but it looks like it.)


I'm sorry, I don't see what you're saying.


The indentation was so messed up that I thought the call to 
power_down_on_boot() was after both ifs, but it is still inside the first.


So your patch is actually right, sorry for the noise.

Regards,
Christian.



I probably should have deleted the other blank line as well, though.
It introduces a checkpatch.pl --strict warning.

regards,
dan carpenter





Re: [PATCH] drm/amd/display: re-indent dc_power_down_on_boot()

2024-04-24 Thread Christian König

On 24.04.24 13:41, Dan Carpenter wrote:

These lines are indented too far.  Clean the whitespace.

Signed-off-by: Dan Carpenter 
---
  drivers/gpu/drm/amd/display/dc/core/dc.c | 7 +++----
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 8eefba757da4..f64d7229eb6c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -5043,11 +5043,10 @@ void dc_interrupt_ack(struct dc *dc, enum dc_irq_source 
src)
  void dc_power_down_on_boot(struct dc *dc)
  {
if (dc->ctx->dce_environment != DCE_ENV_VIRTUAL_HW &&
-   dc->hwss.power_down_on_boot) {
-
-   if (dc->caps.ips_support)
-   dc_exit_ips_for_hw_access(dc);
+   dc->hwss.power_down_on_boot) {
  
+		if (dc->caps.ips_support)

+   dc_exit_ips_for_hw_access(dc);


Well, while at it, can't the two ifs be merged here?

(I don't know this code too well, but it looks like it.)

Regards,
Christian.


dc->hwss.power_down_on_boot(dc);
}
  }




Re: [PATCH 2/3] drm/amdgpu: Initialize timestamp for some legacy SOCs

2024-04-24 Thread Christian König

On 24.04.24 12:03, Ma Jun wrote:

Initialize the interrupt timestamp for some legacy SOCs
to fix the coverity issue "Uninitialized scalar variable"

Signed-off-by: Ma Jun 
Suggested-by: Christian König 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 7e6d09730e6d..665c63f55278 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -445,6 +445,14 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
  
  	entry.ih = ih;

	entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
+
+   /*
+* timestamp is not supported on some legacy SOCs (cik, cz, iceland,
+* si and tonga), so initialize timestamp and timestamp_src to 0
+*/
+   entry.timestamp = 0;
+   entry.timestamp_src = 0;
+
	amdgpu_ih_decode_iv(adev, &entry);
  
	trace_amdgpu_iv(ih - &adev->irq.ih, &entry);




Re: [PATCH v3] drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte

2024-04-24 Thread Christian König

On 24.04.24 11:36, Bob Zhou wrote:

After amdgpu_i2c_get_byte fails, amdgpu_i2c_put_byte shouldn't be
executed, as it would put a wrong value.
So return and check the i2c transfer result.

Signed-off-by: Bob Zhou 
Suggested-by: Christian König 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 47 +++--
  1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 82608df43396..e0f3bff335c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -280,7 +280,7 @@ amdgpu_i2c_lookup(struct amdgpu_device *adev,
return NULL;
  }
  
-static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus,

+static int amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus,
 u8 slave_addr,
 u8 addr,
 u8 *val)
@@ -305,16 +305,18 @@ static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan 
*i2c_bus,
out_buf[0] = addr;
out_buf[1] = 0;
  
-	if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) {

-   *val = in_buf[0];
-   DRM_DEBUG("val = 0x%02x\n", *val);
-   } else {
-   DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n",
- addr, *val);
+   if (i2c_transfer(&i2c_bus->adapter, msgs, 2) != 2) {
+   DRM_DEBUG("i2c 0x%02x read failed\n", addr);
+   return -EIO;
}
+
+   *val = in_buf[0];
+   DRM_DEBUG("val = 0x%02x\n", *val);
+
+   return 0;
  }
  
-static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus,

+static int amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus,
 u8 slave_addr,
 u8 addr,
 u8 val)
@@ -330,9 +332,12 @@ static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan 
*i2c_bus,
out_buf[0] = addr;
out_buf[1] = val;
  
-	if (i2c_transfer(&i2c_bus->adapter, &msg, 1) != 1)

-   DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n",
- addr, val);
+   if (i2c_transfer(&i2c_bus->adapter, &msg, 1) != 1) {
+   DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", addr, val);
+   return -EIO;
+   }
+
+   return 0;
  }
  
  /* ddc router switching */

@@ -347,16 +352,18 @@ amdgpu_i2c_router_select_ddc_port(const struct 
amdgpu_connector *amdgpu_connecto
if (!amdgpu_connector->router_bus)
return;
  
-	amdgpu_i2c_get_byte(amdgpu_connector->router_bus,

+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x3, &val);
+   0x3, &val))
+   return;
val &= ~amdgpu_connector->router.ddc_mux_control_pin;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
0x3, val);
-   amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x1, &val);
+   0x1, &val))
+   return;
val &= ~amdgpu_connector->router.ddc_mux_control_pin;
val |= amdgpu_connector->router.ddc_mux_state;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
@@ -376,16 +383,18 @@ amdgpu_i2c_router_select_cd_port(const struct 
amdgpu_connector *amdgpu_connector
if (!amdgpu_connector->router_bus)
return;
  
-	amdgpu_i2c_get_byte(amdgpu_connector->router_bus,

+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x3, &val);
+   0x3, &val))
+   return;
val &= ~amdgpu_connector->router.cd_mux_control_pin;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
0x3, val);
-   amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x1, &val);
+   0x1, &val))
+   return;
val &= ~amdgpu_connector->router.cd_mux_control_pin;
val |= amdgpu_connector->router.cd_mux_state;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,




Re: [PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-24 Thread Christian König

On 24.04.24 11:04, Jesse Zhang wrote:

Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
need to have a separate value of 0xffffffff. (Christian)

Signed-off-by: Jesse Zhang 
Suggested-by: Christian König 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..968ca2c84ef7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -743,7 +743,8 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t created = 0;
uint32_t allocated = 0;
uint32_t tmp, handle = 0;
-   uint32_t *size = &tmp;
+   uint32_t dummy = 0xffffffff;
+   uint32_t *size = &dummy;
unsigned int idx;
int i, r = 0;
  




Re: [PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-24 Thread Christian König

Am 24.04.24 um 10:41 schrieb Jesse Zhang:

Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
 need to have a separate value of 0xffffffff. (Christian)

Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..1929de0db3a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t destroyed = 0;
uint32_t created = 0;
uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0xffffffff, handle = 0;


That's close, but what I meant was to have something like this instead:

uint32_t dummy = 0xffffffff; *size = &dummy;

Because tmp is overwritten by user values while parsing the command stream.

Regards,
Christian.


uint32_t *size = &tmp;
unsigned int idx;
int i, r = 0;




Re: [PATCH v2] drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte

2024-04-24 Thread Christian König

On 24.04.24 09:52, Bob Zhou wrote:

After amdgpu_i2c_get_byte fails, amdgpu_i2c_put_byte shouldn't be
executed, as it would put a wrong value.
So return and check the i2c transfer result.

Signed-off-by: Bob Zhou 


Looks good in general, just some coding style comments below.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 42 +++--
  1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 82608df43396..c588704d56a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -280,11 +280,12 @@ amdgpu_i2c_lookup(struct amdgpu_device *adev,
return NULL;
  }
  
-static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus,

+static int amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus,
 u8 slave_addr,
 u8 addr,
 u8 *val)
  {
+   int r = 0;


BTW: Short variables like i and r should be declared last. I don't care 
much about that personally, but some upstream maintainers insist on that.


And never initialize a variable if you don't need it. This will be 
complained about by automated checkers as well.



u8 out_buf[2];
u8 in_buf[2];
struct i2c_msg msgs[] = {
@@ -309,16 +310,18 @@ static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan 
*i2c_bus,
*val = in_buf[0];
DRM_DEBUG("val = 0x%02x\n", *val);
} else {
-   DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n",
- addr, *val);
+   r = -EIO;
+   DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", addr, *val);
}
+   return r;


Better to write it like this:

if (error_condition) {
    DRM_DEBUG("i2c 0x%02x read failed\n", addr);
    return -EIO;
}

*val = in_buf[0];
DRM_DEBUG("val = 0x%02x\n", *val);

Printing *val in the error condition will result in use of uninitialized 
value as well.
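
Putting those pieces together, the whole function might look like this -- a
sketch of the suggested early-return shape, not the final patch; the message
setup is copied from the quoted diff:

static int amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus,
			       u8 slave_addr, u8 addr, u8 *val)
{
	u8 out_buf[2] = { addr, 0 };
	u8 in_buf[2];
	struct i2c_msg msgs[] = {
		{ .addr = slave_addr, .flags = 0,
		  .len = 1, .buf = out_buf },
		{ .addr = slave_addr, .flags = I2C_M_RD,
		  .len = 1, .buf = in_buf },
	};

	/* bail out first; *val is never touched on failure */
	if (i2c_transfer(&i2c_bus->adapter, msgs, 2) != 2) {
		DRM_DEBUG("i2c 0x%02x read failed\n", addr);
		return -EIO;
	}

	*val = in_buf[0];
	DRM_DEBUG("val = 0x%02x\n", *val);
	return 0;
}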



  }
  
-static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus,

+static int amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus,
 u8 slave_addr,
 u8 addr,
 u8 val)
  {
+   int r = 0;
uint8_t out_buf[2];
struct i2c_msg msg = {
.addr = slave_addr,
@@ -330,9 +333,12 @@ static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan 
*i2c_bus,
out_buf[0] = addr;
out_buf[1] = val;
  
-	if (i2c_transfer(&i2c_bus->adapter, &msg, 1) != 1)

-   DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n",
- addr, val);
+   if (i2c_transfer(&i2c_bus->adapter, &msg, 1) != 1) {
+   r = -EIO;
+   DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", addr, val);
+   }
+
+   return r;


Same here. As long as you don't have anything to cleanup just use 
"return -EIO" and "return 0;" directly.


Regards,
Christian.


  }
  
  /* ddc router switching */

@@ -347,16 +353,18 @@ amdgpu_i2c_router_select_ddc_port(const struct 
amdgpu_connector *amdgpu_connecto
if (!amdgpu_connector->router_bus)
return;
  
-	amdgpu_i2c_get_byte(amdgpu_connector->router_bus,

+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x3, &val);
+   0x3, &val))
+   return;
val &= ~amdgpu_connector->router.ddc_mux_control_pin;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
0x3, val);
-   amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x1, &val);
+   0x1, &val))
+   return;
val &= ~amdgpu_connector->router.ddc_mux_control_pin;
val |= amdgpu_connector->router.ddc_mux_state;
amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
@@ -368,7 +376,7 @@ amdgpu_i2c_router_select_ddc_port(const struct 
amdgpu_connector *amdgpu_connecto
  void
  amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector 
*amdgpu_connector)
  {
-   u8 val;
+   u8 val = 0;
  
  	if (!amdgpu_connector->router.cd_valid)

return;
@@ -376,16 +384,18 @@ amdgpu_i2c_router_select_cd_port(const struct 
amdgpu_connector *amdgpu_connector
if (!amdgpu_connector->router_bus)
return;
  
-	amdgpu_i2c_get_byte(amdgpu_connector->router_bus,

+   if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
amdgpu_connector->router.i2c_addr,
-   0x3, &val);
+   0x3, &val))
+   return;
val &= 

Re: [PATCH v3] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-24 Thread Christian König

Am 24.04.24 um 09:13 schrieb Arunpravin Paneer Selvam:

Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. If the
   allocation fails, we fall back to allocating individual pages.

When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
   and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously.
   If the allocation fails, we return -ENOSPC.

v2:
   - keep the mem_flags and bo->flags check as is(Christian)
   - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
 amdgpu_bo_pin_restricted function placement range iteration
 loop(Christian)
   - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block
 (Christian)
   - Keep the kernel BO allocation as is (Christian)
   - If BO pin vram allocation failed, we need to return -ENOSPC as
 RDMA cannot work with scattered VRAM pages(Philip)

v3(Christian):
   - keep contiguous flag handling outside of pages_per_block
 calculation
   - remove the hacky implementation in contiguous flag error
 handling code

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 83 ++--
  2 files changed, 65 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 492aebc44e51..c594d2a5978e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -154,8 +154,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
  
-		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)

+   if (abo->tbo.type == ttm_bo_type_kernel &&
+   flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
c++;
}
  
@@ -965,6 +967,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,

if (!bo->placements[i].lpfn ||
(lpfn && lpfn < bo->placements[i].lpfn))
bo->placements[i].lpfn = lpfn;
+
+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+   bo->placements[i].mem_type == TTM_PL_VRAM)
+   bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
}
  
   	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index e494f5bf136a..17c5d9ce9927 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,23 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct 
list_head *head)
return size;
  }
  
+static inline void amdgpu_vram_mgr_limit_min_block_size(unsigned long pages_per_block,

+   u64 size,
+   u64 *min_block_size,
+   bool contiguous_enabled)
+{
+   if (contiguous_enabled)
+   return;
+
+   /*
+* if size >= 2MiB, limit the min_block_size to 2MiB
+* for better TLB usage.
+*/
+   if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
+   !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
+   *min_block_size = (u64)pages_per_block << PAGE_SHIFT;
+}
+
  /**
   * DOC: mem_info_vram_total
   *
@@ -452,11 +469,12 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
struct amdgpu_device *adev = to_amdgpu_device(mgr);
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
u64 vis_usage = 0, max_bytes, min_block_size;
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
struct amdgpu_vram_mgr_resource *vres;
u64 size, remaining_size, lpfn, fpfn;
	struct drm_buddy *mm = &mgr->mm;
-   struct drm_buddy_block *block;
unsigned long pages_per_block;
+   struct drm_buddy_block *block;
int r;
  
  	lpfn = (u64)place->lpfn << PAGE_SHIFT;

@@ -469,18 +487,14 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
if (tbo->type != ttm_bo_type_kernel)
max_byt

Re: [PATCH 4/4] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-24 Thread Christian König

Am 24.04.24 um 04:50 schrieb jesse.zh...@amd.com:

From: Jesse Zhang 

Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301.

Signed-off-by: Jesse Zhang 


To really improve the handling we would actually need to have a separate 
value of 0xffffffff.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..60d97cd14855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t destroyed = 0;
uint32_t created = 0;
uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0, handle = 0;
	uint32_t *size = &tmp;
unsigned int idx;
int i, r = 0;




Re: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-24 Thread Christian König

Am 24.04.24 um 04:04 schrieb Zhang, Jesse(Jie):

[AMD Official Use Only - General]

Hi Alex,

-Original Message-
From: Alex Deucher 
Sent: Wednesday, April 24, 2024 9:46 AM
To: Zhang, Jesse(Jie) 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Koenig, 
Christian ; Huang, Tim 
Subject: Re: [PATCH] drm/amdgpu: fix some uninitialized variables

On Tue, Apr 23, 2024 at 9:27 PM Jesse Zhang  wrote:

Fix some variables that are not initialized before use.
They were found by scanning with Synopsys static analysis tools.

Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
  drivers/gpu/drm/amd/amdgpu/atom.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 3 ++-
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c  | 3 ++-
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c  | 3 ++-

Please split out the SDMA changes into a separate patch.

More comments below on the other hunks.


  6 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..60d97cd14855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
 uint32_t destroyed = 0;
 uint32_t created = 0;
 uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0, handle = 0;

Can you elaborate on what the issue is here?  Presumably it's warning about 
size being passed as a parameter in the function?
[Zhang, Jesse(Jie)]  Using uninitialized value *size when calling 
amdgpu_vce_cs_reloc for case 0x0301. Because uint32_t *size = 
 I'm not sure if other commands initialize the size before 
running case 0x0301.


Ah! Yeah, that handling is actually correct. The size might be 
uninitialized in this function when the command stream isn't valid.


We could instead set size to NULL and check that everywhere, but that 
would probably be overkill.


Well we could silence the warning by setting tmp to zero, but that 
actually doesn't improve anything.


Regards,
Christian.




 uint32_t *size = &tmp;
 unsigned int idx;
 int i, r = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 677eb141554e..13125ddd5e86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -410,6 +410,11 @@ static void amdgpu_vcn_idle_work_handler(struct 
work_struct *work)
 else
 new_state.fw_based =
VCN_DPG_STATE__UNPAUSE;

+   if 
(amdgpu_fence_count_emitted(adev->jpeg.inst->ring_dec))
+   new_state.jpeg = VCN_DPG_STATE__PAUSE;
+   else
+   new_state.jpeg =
+ VCN_DPG_STATE__UNPAUSE;
+
 adev->vcn.pause_dpg_mode(adev, j, &new_state);
 }


This should be a separate patch as well.
  Thanks for your reminder, Alex, I will.



diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c
b/drivers/gpu/drm/amd/amdgpu/atom.c
index 72362df352f6..d552e013354c 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -1243,6 +1243,7 @@ static int amdgpu_atom_execute_table_locked(struct 
atom_context *ctx, int index,
 ectx.ps_size = params_size;
 ectx.abort = false;
 ectx.last_jump = 0;
+   ectx.last_jump_jiffies = 0;
 if (ws) {
 ectx.ws = kcalloc(4, ws, GFP_KERNEL);
 ectx.ws_size = ws;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 45a2d0a5a2d7..b7d33d78bce0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -999,7 +999,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring 
*ring)
 r = amdgpu_ring_alloc(ring, 20);
 if (r) {
 DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
 return r;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 43e64b2da575..cc9e961f0078 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -839,7 +839,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring 
*ring)
 r = amdgpu_ring_alloc(ring, 20);
 if (r) {
 DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
 return r;
 }

diff --git 

Re: [PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-24 Thread Christian König

Am 24.04.24 um 03:19 schrieb Jesse Zhang:

Fix some variables that are not initialized before use.
They were found by scanning with Synopsys static analysis tools.

Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
  drivers/gpu/drm/amd/amdgpu/atom.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 3 ++-
  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c  | 3 ++-
  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c  | 3 ++-
  6 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..60d97cd14855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t destroyed = 0;
uint32_t created = 0;
uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0, handle = 0;


As far as I can see that isn't correct. tmp isn't used before it is written.

Is the tool maybe broken?

Regards,
Christian.


uint32_t *size = &tmp;
unsigned int idx;
int i, r = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 677eb141554e..13125ddd5e86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -410,6 +410,11 @@ static void amdgpu_vcn_idle_work_handler(struct 
work_struct *work)
else
new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
  
+			if (amdgpu_fence_count_emitted(adev->jpeg.inst->ring_dec))

+   new_state.jpeg = VCN_DPG_STATE__PAUSE;
+   else
+   new_state.jpeg = VCN_DPG_STATE__UNPAUSE;
+
adev->vcn.pause_dpg_mode(adev, j, &new_state);
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c

index 72362df352f6..d552e013354c 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -1243,6 +1243,7 @@ static int amdgpu_atom_execute_table_locked(struct 
atom_context *ctx, int index,
ectx.ps_size = params_size;
ectx.abort = false;
ectx.last_jump = 0;
+   ectx.last_jump_jiffies = 0;
if (ws) {
ectx.ws = kcalloc(4, ws, GFP_KERNEL);
ectx.ws_size = ws;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 45a2d0a5a2d7..b7d33d78bce0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -999,7 +999,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 20);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c

index 43e64b2da575..cc9e961f0078 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -839,7 +839,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 20);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c

index 1f4877195213..c833b6b8373b 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -861,7 +861,8 @@ static int sdma_v6_0_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 5);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
  




Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-24 Thread Christian König

Am 23.04.24 um 20:05 schrieb Felix Kuehling:


On 2024-04-23 01:50, Christian König wrote:

Am 22.04.24 um 21:45 schrieb Yunxiang Li:

Reset request from KFD is missing a check for if a reset is already in
progress, this causes a second reset to be triggered right after the
previous one finishes. Add the check to align with the other reset 
sources.


NAK, that isn't how this should be handled.

Instead all reset source which are handled by a previous reset should 
be canceled.


In other words there should be a cancel_work(&adev->kfd.reset_work); 
somewhere in the KFD code. When this doesn't work correctly then that 
is somehow missing.


If you see the use of amdgpu_in_reset() outside of the low level 
functions than that is clearly a bug.
Do we need to do that for all reset workers in the driver separately? 
I don't see where this is done for other reset workers.


Yeah, I think so. But we don't have so many reset workers if I'm not 
completely mistaken.


We have the KFD, FLR, the per engine one in the scheduler and IIRC one 
more for the CP (illegal operation and register write).


I'm not sure about the CP one, but all others should be handled 
correctly with the V2 patch as far as I can see.


Regards,
Christian.



Regards,
  Felix




Regards,
Christian.



Signed-off-by: Yunxiang Li 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 3b4591f554f1..ce3dbb1cc2da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -283,7 +283,7 @@ int amdgpu_amdkfd_post_reset(struct 
amdgpu_device *adev)

    void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev)
  {
-    if (amdgpu_device_should_recover_gpu(adev))
+    if (amdgpu_device_should_recover_gpu(adev) && 
!amdgpu_in_reset(adev))

  amdgpu_reset_domain_schedule(adev->reset_domain,
  &adev->kfd.reset_work);
  }






Re: [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-24 Thread Christian König

Am 23.04.24 um 17:28 schrieb Philip Yang:

Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length
is unsigned int, and some users of it cast to a signed int, so every
segment of sg table is limited to size 2GB maximum.

For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM memory. To workaround the sg table segment
size limit, allocate multiple segments if contiguous size is bigger than
MAX_SG_SEGMENT_SIZE.
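
As a concrete example of the split, a contiguous 5 GiB block would be
described by three sg entries (2 GiB + 2 GiB + 1 GiB). A small userspace
sketch of the counting walk, mirroring the cursor loop in the patch below:

#include <stdio.h>

#define MAX_SG_SEGMENT_SIZE (2UL << 30)	/* 2 GiB */

int main(void)
{
	unsigned long long remaining = 5ULL << 30;	/* 5 GiB block */
	unsigned int num_entries = 0;

	while (remaining) {
		unsigned long long seg = remaining < MAX_SG_SEGMENT_SIZE ?
					 remaining : MAX_SG_SEGMENT_SIZE;
		remaining -= seg;
		num_entries++;
	}
	printf("%u sg entries\n", num_entries);	/* prints 3 */
	return 0;
}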

Signed-off-by: Philip Yang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 4be8b091099a..ebffb58ea53a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -31,6 +31,8 @@
  #include "amdgpu_atomfirmware.h"
  #include "atom.h"
  
+#define AMDGPU_MAX_SG_SEGMENT_SIZE	(2UL << 30)

+
  struct amdgpu_vram_reservation {
u64 start;
u64 size;
@@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  
  		BUG_ON(min_block_size < mm->chunk_size);
  
-		/* Limit maximum size to 2GiB due to SG table limitations */

-   size = min(remaining_size, 2ULL << 30);
-
+   size = remaining_size;
if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
!(size & (((u64)pages_per_block << PAGE_SHIFT) 
- 1)))
min_block_size = (u64)pages_per_block << PAGE_SHIFT;
@@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
while (cursor.remaining) {
num_entries++;
-   amdgpu_res_next(&cursor, cursor.size);
+   amdgpu_res_next(&cursor, min(cursor.size, 
AMDGPU_MAX_SG_SEGMENT_SIZE));
}
  
  	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);

@@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
phys_addr_t phys = cursor.start + adev->gmc.aper_base;
-   size_t size = cursor.size;
+   unsigned long size = min(cursor.size, 
AMDGPU_MAX_SG_SEGMENT_SIZE);
dma_addr_t addr;
  
  		addr = dma_map_resource(dev, phys, size, dir,

@@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
sg_dma_address(sg) = addr;
sg_dma_len(sg) = size;
  
-		amdgpu_res_next(&cursor, cursor.size);

+   amdgpu_res_next(&cursor, size);
}
  
  	return 0;




Re: [PATCH v2] drm/amdgpu: Fix two reset triggered in a row

2024-04-23 Thread Christian König

Am 23.04.24 um 16:44 schrieb Yunxiang Li:

Sometimes a hung GPU causes multiple reset sources to schedule resets;
if the second source schedules after we call
amdgpu_device_stop_pending_resets they will be able to trigger an
unnecessary reset.

Move amdgpu_device_stop_pending_resets to after the reset is already
done, since any reset scheduled after that point would be legitimate.
Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li 


Looks really good to me off hand, especially that so many cases of using 
amdgpu_in_reset() are removed.


But I'm just not deeply into each component to fully judge everything here.

So only Acked-by: Christian König  for now, if 
you need a more in-depth review please ping me.


Regards,
Christian.


---
v2: instead of adding amdgpu_in_reset check, move when we cancel pending
resets

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |  2 +-
  5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f8a34db5d9e3..28f6a1c38b17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5061,8 +5061,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
  retry:
amdgpu_amdkfd_pre_reset(adev);
  
-	amdgpu_device_stop_pending_resets(adev);

-
if (from_hypervisor)
r = amdgpu_virt_request_full_gpu(adev, true);
else
@@ -5813,13 +5811,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  r, adev_to_drm(tmp_adev)->unique);
tmp_adev->asic_reset_res = r;
}
-
-   if (!amdgpu_sriov_vf(tmp_adev))
-   /*
-   * Drop all pending non scheduler resets. Scheduler 
resets
-   * were already dropped during drm_sched_stop
-   */
-   amdgpu_device_stop_pending_resets(tmp_adev);
}
  
  	/* Actual ASIC resets if needed.*/

@@ -5841,6 +5832,14 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
goto retry;
}
  
+	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {

+   /*
+* Drop all pending non scheduler resets. Scheduler resets
+* were already dropped during drm_sched_stop
+*/
+   amdgpu_device_stop_pending_resets(tmp_adev);
+   }
+
  skip_hw_reset:
  
  	/* Post ASIC reset for all devs .*/

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 54ab51a4ada7..c2385178d6b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -597,7 +597,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct 
work_struct *work)
if (ret) {
adev->virt.vf2pf_update_retry_cnt++;
if ((adev->virt.vf2pf_update_retry_cnt >= 
AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT) &&
-   amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) {
+   amdgpu_sriov_runtime(adev)) {
amdgpu_ras_set_fed(adev, true);
if (amdgpu_reset_domain_schedule(adev->reset_domain,
  &adev->virt.flr_work))
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 0c7275bca8f7..c5ba9c4757a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -319,7 +319,7 @@ static int xgpu_ai_mailbox_rcv_irq(struct amdgpu_device 
*adev,
  
  	switch (event) {

case IDH_FLR_NOTIFICATION:
-   if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev))
+   if (amdgpu_sriov_runtime(adev))

WARN_ONCE(!amdgpu_reset_domain_schedule(adev->reset_domain,

&adev->virt.flr_work),
  "Failed to queue work! at %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index aba00d961627..fa9d1b02f391 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -358,7 +358,7 @@ static int xgpu_nv_mailbox_rcv_irq(struct amdgpu_device 
*adev,
  
  	switch (event) {

case IDH_FLR_NOTIFICATION:
-   if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev))
+   if (amdgpu_sriov_runtime(adev))
  

Re: [PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 16:31 schrieb Tim Huang:

From: Tim Huang 

Clear warning that uses uninitialized value fw_size.

Signed-off-by: Tim Huang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..fb5de23fa8d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes);
break;
default:
-   break;
+   dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id);
+   return;
}
  
  	if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {




Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Christian König

Am 23.04.24 um 15:18 schrieb Alex Deucher:

On Tue, Apr 23, 2024 at 2:57 AM Christian König
 wrote:

Am 22.04.24 um 16:37 schrieb Alex Deucher:

As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.

Wait a second. Why are we using the wb slots dynamically?


See patch 2.  I needed a way to allocate small GPU accessible memory
locations on the fly.  Using WB seems like a good solution.


That's probably better done with the seq64 allocator. At least the 
original idea was that it is self-contained and can be used by many 
threads at the same time.


Apart from that I really think we need to talk with the MES guys about 
changing that behavior ASAP. This is really a bug we need to fix and not 
work around like that.


Christian.
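
For reference, the race the spinlock closes: two allocators can both observe
the same free bit between find_first_zero_bit() and __set_bit() and hand out
the same wb slot. A minimal userspace sketch of the locked pattern, with
simplified types (not the kernel code):

#include <pthread.h>
#include <stdio.h>

static unsigned long used;	/* one bit per slot, 64 slots */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int wb_get(unsigned int *wb)
{
	int ret = -1;

	pthread_mutex_lock(&lock);	/* find + set must be one atomic step */
	for (unsigned int i = 0; i < 64; i++) {
		if (!(used & (1UL << i))) {
			used |= 1UL << i;
			*wb = i;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&lock);
	return ret;
}

int main(void)
{
	unsigned int a, b;

	if (!wb_get(&a) && !wb_get(&b))
		printf("%u %u\n", a, b);	/* distinct slots: 0 1 */
	return 0;
}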



Alex


The number of slots made available is statically calculated, when this
is suddenly used dynamically we have quite a bug here.

Regards,
Christian.


Signed-off-by: Alex Deucher 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++-
   2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cac0ca64367b..f87d53e183c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -502,6 +502,7 @@ struct amdgpu_wb {
   uint64_tgpu_addr;
   u32 num_wb; /* Number of wb slots actually reserved 
for amdgpu. */
   unsigned long   used[DIV_ROUND_UP(AMDGPU_MAX_WB, BITS_PER_LONG)];
+ spinlock_t  lock;
   };

   int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f8a34db5d9e3..869256394136 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1482,13 +1482,17 @@ static int amdgpu_device_wb_init(struct amdgpu_device 
*adev)
*/
   int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb)
   {
- unsigned long offset = find_first_zero_bit(adev->wb.used, 
adev->wb.num_wb);
+ unsigned long flags, offset;

+ spin_lock_irqsave(&adev->wb.lock, flags);
+ offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb);
   if (offset < adev->wb.num_wb) {
   __set_bit(offset, adev->wb.used);
+ spin_unlock_irqrestore(&adev->wb.lock, flags);
   *wb = offset << 3; /* convert to dw offset */
   return 0;
   } else {
+ spin_unlock_irqrestore(&adev->wb.lock, flags);
   return -EINVAL;
   }
   }
@@ -1503,9 +1507,13 @@ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 
*wb)
*/
   void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb)
   {
+ unsigned long flags;
+
   wb >>= 3;
+ spin_lock_irqsave(&adev->wb.lock, flags);
   if (wb < adev->wb.num_wb)
   __clear_bit(wb, adev->wb.used);
+ spin_unlock_irqrestore(&adev->wb.lock, flags);
   }

   /**
@@ -4061,6 +4069,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
   spin_lock_init(>se_cac_idx_lock);
   spin_lock_init(>audio_endpt_idx_lock);
   spin_lock_init(>mm_stats.lock);
+ spin_lock_init(&adev->wb.lock);

   INIT_LIST_HEAD(&adev->shadow_list);
   mutex_init(&adev->shadow_list_lock);




Re: [PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Christian König

Am 23.04.24 um 15:04 schrieb Philip Yang:

To test RDMA using a dummy driver on a system without a NIC/RDMA
device, the get/put dma pages calls pass in a null device pointer; skip the
dma map/unmap of the resource and sg table to avoid a null pointer access.


Well just to make it clear this patch is really a no-go for upstreaming.

The RDMA code isn't upstream as far as I know and doing this here is 
really not a good idea even internally.


Regards,
Christian.



Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++-
  1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 6c7133bf51d8..101a85263b53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -698,12 +698,15 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE);
dma_addr_t addr;
  
-		addr = dma_map_resource(dev, phys, size, dir,

-   DMA_ATTR_SKIP_CPU_SYNC);
-   r = dma_mapping_error(dev, addr);
-   if (r)
-   goto error_unmap;
-
+   if (dev) {
+   addr = dma_map_resource(dev, phys, size, dir,
+   DMA_ATTR_SKIP_CPU_SYNC);
+   r = dma_mapping_error(dev, addr);
+   if (r)
+   goto error_unmap;
+   } else {
+   addr = phys;
+   }
sg_set_page(sg, NULL, size, 0);
sg_dma_address(sg) = addr;
sg_dma_len(sg) = size;
@@ -717,10 +720,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
for_each_sgtable_sg((*sgt), sg, i) {
if (!sg->length)
continue;
-
-   dma_unmap_resource(dev, sg->dma_address,
-  sg->length, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   if (dev)
+   dma_unmap_resource(dev, sg->dma_address,
+  sg->length, dir,
+  DMA_ATTR_SKIP_CPU_SYNC);
}
sg_free_table(*sgt);
  
@@ -745,10 +748,12 @@ void amdgpu_vram_mgr_free_sgt(struct device *dev,

struct scatterlist *sg;
int i;
  
-	for_each_sgtable_sg(sgt, sg, i)

-   dma_unmap_resource(dev, sg->dma_address,
-  sg->length, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   if (dev) {
+   for_each_sgtable_sg(sgt, sg, i)
+   dma_unmap_resource(dev, sg->dma_address,
+  sg->length, dir,
+  DMA_ATTR_SKIP_CPU_SYNC);
+   }
sg_free_table(sgt);
kfree(sgt);
  }




Re: [PATCH v4 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Christian König

Am 23.04.24 um 15:04 schrieb Philip Yang:

Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length
is unsigned int, and some users of it cast to a signed int, so every
segment of sg table is limited to size 2GB maximum.

For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM memory. To workaround the sg table segment
size limit, allocate multiple segments if contiguous size is bigger than
MAX_SG_SEGMENT_SIZE.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 4be8b091099a..6c7133bf51d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -31,6 +31,8 @@
  #include "amdgpu_atomfirmware.h"
  #include "atom.h"
  
+#define MAX_SG_SEGMENT_SIZE	(2UL << 30)

+


Please add an AMDGPU prefix before that name.

Apart from that looks good to me,
Christian.


  struct amdgpu_vram_reservation {
u64 start;
u64 size;
@@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  
  		BUG_ON(min_block_size < mm->chunk_size);
  
-		/* Limit maximum size to 2GiB due to SG table limitations */

-   size = min(remaining_size, 2ULL << 30);
-
+   size = remaining_size;
if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
!(size & (((u64)pages_per_block << PAGE_SHIFT) 
- 1)))
min_block_size = (u64)pages_per_block << PAGE_SHIFT;
@@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
	while (cursor.remaining) {
	num_entries++;
-   amdgpu_res_next(&cursor, cursor.size);
+   amdgpu_res_next(&cursor, min(cursor.size, MAX_SG_SEGMENT_SIZE));
}
  
  	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);

@@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
phys_addr_t phys = cursor.start + adev->gmc.aper_base;
-   size_t size = cursor.size;
+   unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE);
dma_addr_t addr;
  
  		addr = dma_map_resource(dev, phys, size, dir,

@@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
sg_dma_address(sg) = addr;
sg_dma_len(sg) = size;
  
-		amdgpu_res_next(&cursor, cursor.size);

+   amdgpu_res_next(&cursor, size);
}
  
  	return 0;




Re: [PATCH] drm/amdgpu: add error handle to avoid out-of-bounds

2024-04-23 Thread Christian König

Am 23.04.24 um 11:15 schrieb Bob Zhou:

if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should
be stop to avoid out-of-bounds read, so directly return -EINVAL.

Signed-off-by: Bob Zhou 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e2e3856938ed..101038395c3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -2021,6 +2021,9 @@ static int sdma_v4_0_process_trap_irq(struct 
amdgpu_device *adev,
  
  	DRM_DEBUG("IH: SDMA trap\n");

instance = sdma_v4_0_irq_id_to_seq(entry->client_id);
+   if (instance < 0)
+   return instance;
+
switch (entry->ring_id) {
case 0:
amdgpu_fence_process(>sdma.instance[instance].ring);




Re: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 10:43 schrieb Tim Huang:

From: Tim Huang 

Clear warning that uses uninitialized value fw_size.

Signed-off-by: Tim Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..8d5cdbb99d8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0;
struct amdgpu_firmware_info *info = NULL;
const struct firmware *ucode_fw;
-   unsigned int fw_size;
+   unsigned int fw_size = 0;


You don't need that any more when the default case returns.

Regards,
Christian.

  
  	switch (ucode_id) {

case AMDGPU_UCODE_ID_CP_PFP:
@@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes);
break;
default:
-   break;
+   dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id);
+   return;
}
  
  	if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {




Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 10:12 schrieb Huang, Tim:

[AMD Official Use Only - General]

-Original Message-
From: amd-gfx  On Behalf Of Huang, Tim
Sent: Tuesday, April 23, 2024 4:01 PM
To: Koenig, Christian ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

[AMD Official Use Only - General]

[AMD Official Use Only - General]

Hi Christian,

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, April 23, 2024 3:43 PM
To: Huang, Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 

Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

Am 23.04.24 um 08:28 schrieb Tim Huang:

Clear warning that uses uninitialized value fw_size.
In which case is the fw_size uninitialized and why setting it to zero helps in 
that case?
It's a warning reported by the Coverity scan. When the switch case "switch (ucode_id)" reaches
default and the condition "adev->firmware.load_type == AMDGPU_FW_LOAD_PSP" takes the true branch, it
reports "uses uninitialized value fw_size" on this line:
"adev->firmware.fw_size += ALIGN(fw_size, PAGE_SIZE);"
It may not happen if we call this function correctly, but it just clears the
warning and looks harmless.

Hi Christian,

I think there is more to fixing this warning; maybe I need to print an error and just return when
we go to the default case of "switch (ucode_id)", and will send out a v2 patch. 
Thanks.


Yeah, exactly that's the right idea.

Regards,
Christian.




Regards,
Christian.
Signed-off-by: Tim Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..6b8a58f501d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
   const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0;
   struct amdgpu_firmware_info *info = NULL;
   const struct firmware *ucode_fw;
- unsigned int fw_size;
+ unsigned int fw_size = 0;

   switch (ucode_id) {
   case AMDGPU_UCODE_ID_CP_PFP:




Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König
The problem is that it's a catch-all case and that's usually seen as bad 
coding style.


In other words when one branch by accident forgets to set the fw_size we 
wouldn't get a warning any more and just use zero.


Please rather add setting the fw_size to zero to the default branch and 
maybe even add a warning when that happens.


Regards,
Christian.
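
A sketch of the alternative being described here (the merged v3 went with
dev_err() plus an early return instead):

	default:
		fw_size = 0;
		dev_warn(adev->dev, "Unhandled ucode id %u\n", ucode_id);
		break;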

Am 23.04.24 um 10:01 schrieb Huang, Tim:

[AMD Official Use Only - General]

Hi Christian,

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, April 23, 2024 3:43 PM
To: Huang, Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 

Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

Am 23.04.24 um 08:28 schrieb Tim Huang:

Clear warning that uses uninitialized value fw_size.
In which case is the fw_size uninitialized and why setting it to zero helps in 
that case?

It's a warning reported by the Coverity scan. When the switch case "switch (ucode_id)"
reaches default and the condition "adev->firmware.load_type == AMDGPU_FW_LOAD_PSP"
takes the true branch,
it reports "uses uninitialized value fw_size" on this line:
"adev->firmware.fw_size += ALIGN(fw_size, PAGE_SIZE);"

It may not happen if we call this function correctly, but it just clears the
warning and looks harmless.

Regards,
Christian.


Signed-off-by: Tim Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..6b8a58f501d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
   const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0;
   struct amdgpu_firmware_info *info = NULL;
   const struct firmware *ucode_fw;
- unsigned int fw_size;
+ unsigned int fw_size = 0;

   switch (ucode_id) {
   case AMDGPU_UCODE_ID_CP_PFP:




Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 08:28 schrieb Tim Huang:

Clear warning that uses uninitialized value fw_size.


In which case is the fw_size uninitialized and why setting it to zero 
helps in that case?


Regards,
Christian.



Signed-off-by: Tim Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index d9dc5485..6b8a58f501d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device 
*adev,
const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0;
struct amdgpu_firmware_info *info = NULL;
const struct firmware *ucode_fw;
-   unsigned int fw_size;
+   unsigned int fw_size = 0;
  
  	switch (ucode_id) {

case AMDGPU_UCODE_ID_CP_PFP:




Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Christian König
In this case we should modify amdgpu_i2c_get_byte() to return an error 
and prevent writing the value back.


See zero is as random as any other value and initializing the variable 
here doesn't really help, it just makes your warning disappear.


Regards,
Christian.

Am 23.04.24 um 08:27 schrieb Zhou, Bob:

[AMD Official Use Only - General]

Thanks for your comments.

I should clarify the issue. As you see the amdgpu_i2c_get_byte code:
 if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) {
 *val = in_buf[0];
 DRM_DEBUG("val = 0x%02x\n", *val);
 } else {
 DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n",  addr, 
*val);
 }
If the read by amdgpu_i2c_get_byte() fails, the value will not be modified.
Then amdgpu_i2c_put_byte() successfully writes that random value back and it 
will cause an unexpected issue.

Regards,
Bob

-Original Message-
From: Koenig, Christian 
Sent: April 23, 2024 14:05
To: Zhou, Bob ; amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Koenig, Christian 
Subject: Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

Am 23.04.24 um 07:33 schrieb Bob Zhou:

Because val isn't initialized, a random value is written back by 
amdgpu_i2c_put_byte.
So fix the uninitialized issue.

Well that isn't correct. See the code here:

  amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
  amdgpu_connector->router.i2c_addr,
  0x3, &val);
  val &= ~amdgpu_connector->router.cd_mux_control_pin;
  amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
  amdgpu_connector->router.i2c_addr,
  0x3, val);

The value is first read by amdgpu_i2c_get_byte(), then modified and then 
written again by amdgpu_i2c_put_byte().

Was this an automated warning?

Either way the patch is clearly rejected.

Regards,
Christian.


Signed-off-by: Bob Zhou 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 82608df43396..d4d2dc792b60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -368,7 +368,7 @@ amdgpu_i2c_router_select_ddc_port(const struct 
amdgpu_connector *amdgpu_connecto
   void
   amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector 
*amdgpu_connector)
   {
- u8 val;
+ u8 val = 0;

   if (!amdgpu_connector->router.cd_valid)
   return;




Re: [PATCH v2] drm/amdgpu: IB test encode test package change for VCN5

2024-04-23 Thread Christian König

Am 22.04.24 um 21:59 schrieb Sonny Jiang:

From: Sonny Jiang 

VCN5 session info package interface changed

Signed-off-by: Sonny Jiang 


Mhm, in general we should push back on FW changes which make stuff like 
that necessary. So what is the justification?


If the FW has a good justification for it then in theory we should 
create new hw generation specific functions. But copying the whole 
function for vcn_v5_0.c is overkill as well.


Regards,
Christian.
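
One way to read that suggestion, sketched under the assumption that only this
session-info dword differs between generations; the helper name and the VCN5
constant are placeholders, since the archived diff above elides the new value:

/* hypothetical helper; AMDGPU_VCN5_SESSION_INFO stands in for the new
 * firmware value, which the archived diff does not preserve */
static uint32_t amdgpu_vcn_enc_session_info(struct amdgpu_device *adev)
{
	if (amdgpu_ip_version(adev, UVD_HWIP, 0) < IP_VERSION(5, 0, 0))
		return 0x0000000b;
	return AMDGPU_VCN5_SESSION_INFO;
}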


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index bb85772b1374..2bebdaaff533 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -851,6 +851,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring 
*ring, uint32_t hand
 struct amdgpu_ib *ib_msg,
 struct dma_fence **fence)
  {
+   struct amdgpu_device *adev = ring->adev;
unsigned int ib_size_dw = 16;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
@@ -882,7 +883,10 @@ static int amdgpu_vcn_enc_get_create_msg(struct 
amdgpu_ring *ring, uint32_t hand
ib->ptr[ib->length_dw++] = handle;
ib->ptr[ib->length_dw++] = upper_32_bits(addr);
ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   if (amdgpu_ip_version(adev, UVD_HWIP, 0) < IP_VERSION(5, 0, 0))
+   ib->ptr[ib->length_dw++] = 0x000b;
+   else
+   ib->ptr[ib->length_dw++] = 0x;
  
  	ib->ptr[ib->length_dw++] = 0x0014;

ib->ptr[ib->length_dw++] = 0x0002; /* task info */
@@ -918,6 +922,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
  struct amdgpu_ib *ib_msg,
  struct dma_fence **fence)
  {
+   struct amdgpu_device *adev = ring->adev;
unsigned int ib_size_dw = 16;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
@@ -949,7 +954,10 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
ib->ptr[ib->length_dw++] = handle;
ib->ptr[ib->length_dw++] = upper_32_bits(addr);
ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   if (amdgpu_ip_version(adev, UVD_HWIP, 0) < IP_VERSION(5, 0, 0))
+   ib->ptr[ib->length_dw++] = 0x000b;
+   else
+   ib->ptr[ib->length_dw++] = 0x;
  
  	ib->ptr[ib->length_dw++] = 0x0014;

ib->ptr[ib->length_dw++] = 0x0002;




Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation

2024-04-23 Thread Christian König

Am 22.04.24 um 16:37 schrieb Alex Deucher:

As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.


Wait a second. Why are we using the wb slots dynamically?

The number of slots made available is statically calculated, when this 
is suddenly used dynamically we have quite a bug here.


Regards,
Christian.



Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++-
  2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cac0ca64367b..f87d53e183c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -502,6 +502,7 @@ struct amdgpu_wb {
uint64_tgpu_addr;
u32 num_wb; /* Number of wb slots actually reserved 
for amdgpu. */
unsigned long   used[DIV_ROUND_UP(AMDGPU_MAX_WB, 
BITS_PER_LONG)];
+   spinlock_t  lock;
  };
  
  int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f8a34db5d9e3..869256394136 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1482,13 +1482,17 @@ static int amdgpu_device_wb_init(struct amdgpu_device 
*adev)
   */
  int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb)
  {
-   unsigned long offset = find_first_zero_bit(adev->wb.used, 
adev->wb.num_wb);
+   unsigned long flags, offset;
  
+	spin_lock_irqsave(&adev->wb.lock, flags);

+   offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb);
	if (offset < adev->wb.num_wb) {
	__set_bit(offset, adev->wb.used);
+   spin_unlock_irqrestore(&adev->wb.lock, flags);
	*wb = offset << 3; /* convert to dw offset */
	return 0;
	} else {
+   spin_unlock_irqrestore(&adev->wb.lock, flags);
return -EINVAL;
}
  }
@@ -1503,9 +1507,13 @@ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 
*wb)
   */
  void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb)
  {
+   unsigned long flags;
+
wb >>= 3;
+   spin_lock_irqsave(&adev->wb.lock, flags);
	if (wb < adev->wb.num_wb)
	__clear_bit(wb, adev->wb.used);
+   spin_unlock_irqrestore(&adev->wb.lock, flags);
  }
  
  /**

@@ -4061,6 +4069,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
spin_lock_init(>se_cac_idx_lock);
spin_lock_init(>audio_endpt_idx_lock);
spin_lock_init(>mm_stats.lock);
+   spin_lock_init(&adev->wb.lock);
  
  	INIT_LIST_HEAD(&adev->shadow_list);

	mutex_init(&adev->shadow_list_lock);




Re: [PATCH 3/3] drm/amdgpu: Fix Uninitialized scalar variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 04:53 schrieb Ma, Jun:

unsigned int client_id, src_id;
struct amdgpu_irq_src *src;
bool handled = false;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 924baf58e322..f0a63d084b4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1559,7 +1559,7 @@ static int amdgpu_debugfs_firmware_info_show(struct 
seq_file *m, void *unused)
   {
struct amdgpu_device *adev = m->private;
struct drm_amdgpu_info_firmware fw_info;
-   struct drm_amdgpu_query_fw query_fw;
+   struct drm_amdgpu_query_fw query_fw = {0};

Coverity warning:
uninit_use_in_call Using uninitialized value query_fw.index when calling 
amdgpu_firmware_info

Even though query_fw.index was assigned a value before it's used, there is 
still a Coverity warning.
We need to initialize query_fw when declaring it.


But initializing it to zero doesn't sound correct either.

The amdgpu_firmware_info() function is designed to return the FW info 
for a specific block; if the block isn't specified then Coverity is 
right that we have a coding error here.


Just initializing the value silences coverity but is most likely not the 
right thing to do.


Regards,
Christian.
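
The implied alternative -- fill in every field the callee reads before each
query, rather than zero-initializing at declaration -- might look like this
(a sketch; the fw_type value is just an example):

	struct drm_amdgpu_query_fw query_fw;

	/* set the queried block explicitly before each call */
	query_fw.fw_type = AMDGPU_INFO_FW_VCE;
	query_fw.ip_instance = 0;
	query_fw.index = 0;
	ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);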




struct atom_context *ctx = adev->mode_info.atom_context;
uint8_t smu_program, smu_major, smu_minor, smu_debug;
int ret, i;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 2b99eed5ba19..41ac3319108b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -120,7 +120,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr 
*xcp_mgr, int xcp_id,
   int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode)
   {
struct amdgpu_device *adev = xcp_mgr->adev;
-   struct amdgpu_xcp_ip ip;
+   struct amdgpu_xcp_ip ip = {0};

Coverity Warning:
Using uninitialized value ip. Field ip.valid is uninitialized when calling 
__amdgpu_xcp_add_block

The code is ok. We just need to initialize the variable ip.

Regards,
Ma Jun



uint8_t mem_id;
int i, j, ret;
   


Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning

2024-04-23 Thread Christian König

Am 23.04.24 um 07:33 schrieb Bob Zhou:

Because val isn't initialized, a random value is written back by 
amdgpu_i2c_put_byte.
So fix the uninitialized issue.


Well that isn't correct. See the code here:

    amdgpu_i2c_get_byte(amdgpu_connector->router_bus,
    amdgpu_connector->router.i2c_addr,
    0x3, &val);
    val &= ~amdgpu_connector->router.cd_mux_control_pin;
    amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
    amdgpu_connector->router.i2c_addr,
    0x3, val);

The value is first read by amdgpu_i2c_get_byte(), then modified and then 
written again by amdgpu_i2c_put_byte().


Was this an automated warning?

Either way the patch is clearly rejected.

Regards,
Christian.



Signed-off-by: Bob Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 82608df43396..d4d2dc792b60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -368,7 +368,7 @@ amdgpu_i2c_router_select_ddc_port(const struct 
amdgpu_connector *amdgpu_connecto
  void
  amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector 
*amdgpu_connector)
  {
-   u8 val;
+   u8 val = 0;
  
  	if (!amdgpu_connector->router.cd_valid)

return;




Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Christian König

Am 23.04.24 um 05:13 schrieb Li, Yunxiang (Teddy):

[Public]


We can't do this technically as there are cases where we skip full device reset 
(even then amdgpu_in_reset will return true). The better thing to do is to move 
amdgpu_device_stop_pending_resets() later in
gpu_recover() - if a device has undergone full reset, then cancel all pending 
resets. Presently it's happening earlier which could be why this issue is seen.

This sounds like it is a design issue then, if different reset workers expect 
different resets to be triggered but they all use the same flag. I wonder if 
the other places that check this flags are correct. FWIW I was testing with 
SRIOV where it always does full reset and ran into this issue.


Lijo is correct. The idea here is that all reset sources which have been 
covered by a reset are canceled directly after the reset is completed.


The approach with checking amdgpu_in_reset() is broken because it can 
still happen that multiple sources signal at the same time that a reset 
is necessary.


Regards,
Christian.


Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Christian König

Am 22.04.24 um 21:45 schrieb Yunxiang Li:

Reset request from KFD is missing a check for if a reset is already in
progress, this causes a second reset to be triggered right after the
previous one finishes. Add the check to align with the other reset sources.


NAK, that isn't how this should be handled.

Instead all reset source which are handled by a previous reset should be 
canceled.


In other words there should be a cancel_work(&adev->kfd.reset_work); 
somewhere in the KFD code. When this doesn't work correctly then that is 
somehow missing.


If you see the use of amdgpu_in_reset() outside of the low level 
functions than that is clearly a bug.


Regards,
Christian.
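
Sketched concretely, the missing piece would be something like this in the
post-reset path (one possible shape, reusing the names from the quoted code):

	/* after the reset completed: drop KFD reset requests that this
	 * reset already covered, instead of re-triggering */
	cancel_work(&adev->kfd.reset_work);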



Signed-off-by: Yunxiang Li 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3b4591f554f1..ce3dbb1cc2da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -283,7 +283,7 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
  
  void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev)

  {
-   if (amdgpu_device_should_recover_gpu(adev))
+   if (amdgpu_device_should_recover_gpu(adev) && !amdgpu_in_reset(adev))
amdgpu_reset_domain_schedule(adev->reset_domain,
 &adev->kfd.reset_work);
  }




Re: [PATCH] drm/amdgpu: once more fix the call order in amdgpu_ttm_move()

2024-04-22 Thread Christian König

Am 18.04.24 um 18:10 schrieb Alex Deucher:

On Thu, Mar 21, 2024 at 10:37 AM Christian König
 wrote:

Am 21.03.24 um 15:12 schrieb Tvrtko Ursulin:

On 21/03/2024 12:43, Christian König wrote:

This reverts "drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap". The basic problem here is that after the move the old
location is simply not available any more.

Some fixes where suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.

Also rework the statistic handling so that we don't update the eviction
counter before the move.

Signed-off-by: Christian König 

Don't forget:

Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move
always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171

Ah, thanks. I already wanted to ask if there is any bug report about
that as well.

Did this ever land?  I don't see it anywhere.


No, I never found time to actually rebase and push it.

Just did so 10 minutes ago, it should probably show up in 
amd-staging-drm-next unless there's another CI hiccup.


Christian.



Alex


Regards,
Christian.


;)

Regards,

Tvrtko


---
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  4 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 48 --
   3 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 425cebcc5cbf..eb7d824763b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1245,19 +1245,20 @@ int amdgpu_bo_get_metadata(struct amdgpu_bo
*bo, void *buffer,
* amdgpu_bo_move_notify - notification about a memory move
* @bo: pointer to a buffer object
* @evict: if this move is evicting the buffer from the graphics
address space
+ * @new_mem: new resource for backing the BO
*
* Marks the corresponding _bo buffer object as invalid,
also performs
* bookkeeping.
* TTM driver callback which is called when ttm moves a buffer.
*/
-void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, bool evict)
+void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
+   bool evict,
+   struct ttm_resource *new_mem)
   {
   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
+struct ttm_resource *old_mem = bo->resource;
   struct amdgpu_bo *abo;
   -if (!amdgpu_bo_is_amdgpu_bo(bo))
-return;
-
   abo = ttm_to_amdgpu_bo(bo);
   amdgpu_vm_bo_invalidate(adev, abo, evict);
   @@ -1267,9 +1268,9 @@ void amdgpu_bo_move_notify(struct
ttm_buffer_object *bo, bool evict)
   bo->resource->mem_type != TTM_PL_SYSTEM)
   dma_buf_move_notify(abo->tbo.base.dma_buf);
   -/* remember the eviction */
-if (evict)
-atomic64_inc(&adev->num_evictions);
+/* move_notify is called before move happens */
+trace_amdgpu_bo_move(abo, new_mem ? new_mem->mem_type : -1,
+ old_mem ? old_mem->mem_type : -1);
   }
 void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index a3ea8a82db23..d28e21baef16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -344,7 +344,9 @@ int amdgpu_bo_set_metadata (struct amdgpu_bo *bo,
void *metadata,
   int amdgpu_bo_get_metadata(struct amdgpu_bo *bo, void *buffer,
  size_t buffer_size, uint32_t *metadata_size,
  uint64_t *flags);
-void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, bool evict);
+void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
+   bool evict,
+   struct ttm_resource *new_mem);
   void amdgpu_bo_release_notify(struct ttm_buffer_object *bo);
   vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object
*bo);
   void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a5ceec7820cf..460b23918bfc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -471,14 +471,16 @@ static int amdgpu_bo_move(struct
ttm_buffer_object *bo, bool evict,
 if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
bo->ttm == NULL)) {
+amdgpu_bo_move_notify(bo, evict, new_mem);
   ttm_bo_move_null(bo, new_mem);
-goto out;
+return 0;
   }
   if (old_mem->mem_type == TTM_PL_SYSTEM &&
   (new_mem->mem_type == TTM_PL_TT ||
new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
+amdgpu_bo_move_notify(bo, evict, new_mem);
   tt

Re: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move speed metrics

2024-04-22 Thread Christian König

Am 16.04.24 um 10:51 schrieb Prike Liang:

Add the amdgpu buffer object move speed metrics.


What should that be good for? It adds quite a bunch of complexity for a 
feature we actually want to deprecate.


Regards,
Christian.
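
For context, the throughput numbers the benchmark prints come from a simple
formula; a standalone sketch with invented inputs:

	#include <stdio.h>

	/* n moves of size bytes in time_ms; (n * (size >> 10)) is total KiB,
	 * and KiB per millisecond is roughly MB/s, matching the log format. */
	int main(void)
	{
		long long n = 1024, size = 1024 * 1024, time_ms = 256;
		long long throughput = (n * (size >> 10)) / time_ms;

		printf("throughput: %lld Mb/s or %lld MB/s\n",
		       throughput * 8, throughput);
		return 0;
	}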



Signed-off-by: Prike Liang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 78 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
  3 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 163d221b3bbd..2840f1536b51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -502,7 +502,7 @@ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 
wb);
  /*
   * Benchmarking
   */
-int amdgpu_benchmark(struct amdgpu_device *adev, int test_number);
+int amdgpu_benchmark(struct amdgpu_device *adev, int test_number, struct 
seq_file *m);
  
  int amdgpu_benchmark_dump(struct amdgpu_device *adev, struct seq_file *m);

  /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index f6848b574dea..fcd186ca088a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -65,20 +65,27 @@ static void amdgpu_benchmark_log_results(struct 
amdgpu_device *adev,
 int n, unsigned size,
 s64 time_ms,
 unsigned sdomain, unsigned ddomain,
-char *kind)
+char *kind, struct seq_file *m)
  {
s64 throughput = (n * (size >> 10));
  
  	throughput = div64_s64(throughput, time_ms);
  
-	dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB from"

-" %d to %d in %lld ms, throughput: %lld Mb/s or %lld MB/s\n",
-kind, n, size >> 10, sdomain, ddomain, time_ms,
-throughput * 8, throughput);
+   if (m) {
+   seq_printf(m, "\tamdgpu: %s %u bo moves of %u kB from"
+" %d to %d in %lld ms, throughput: %lld Mb/s or %lld 
MB/s\n",
+   kind, n, size >> 10, sdomain, ddomain, time_ms,
+   throughput * 8, throughput);
+   } else {
+   dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB from"
+" %d to %d in %lld ms, throughput: %lld Mb/s or %lld 
MB/s\n",
+   kind, n, size >> 10, sdomain, ddomain, time_ms,
+   throughput * 8, throughput);
+   }
  }
  
  static int amdgpu_benchmark_move(struct amdgpu_device *adev, unsigned size,

-unsigned sdomain, unsigned ddomain)
+unsigned sdomain, unsigned ddomain, struct 
seq_file *m)
  {
struct amdgpu_bo *dobj = NULL;
struct amdgpu_bo *sobj = NULL;
@@ -109,7 +116,7 @@ static int amdgpu_benchmark_move(struct amdgpu_device 
*adev, unsigned size,
goto out_cleanup;
else
amdgpu_benchmark_log_results(adev, n, size, time_ms,
-sdomain, ddomain, "dma");
+sdomain, ddomain, "dma", 
m);
}
  
  out_cleanup:

@@ -124,7 +131,7 @@ static int amdgpu_benchmark_move(struct amdgpu_device 
*adev, unsigned size,
return r;
  }
  
-int amdgpu_benchmark(struct amdgpu_device *adev, int test_number)

+int amdgpu_benchmark(struct amdgpu_device *adev, int test_number, struct 
seq_file *m)
  {
int i, r;
static const int common_modes[AMDGPU_BENCHMARK_COMMON_MODES_N] = {
@@ -153,13 +160,16 @@ int amdgpu_benchmark(struct amdgpu_device *adev, int 
test_number)
dev_info(adev->dev,
 "benchmark test: %d (simple test, VRAM to GTT and GTT to 
VRAM)\n",
 test_number);
+   if (m)
+   seq_printf(m, "\tbenchmark test: %d (simple test, VRAM to 
GTT and GTT to VRAM)\n",
+test_number);
/* simple test, VRAM to GTT and GTT to VRAM */
r = amdgpu_benchmark_move(adev, 1024*1024, 
AMDGPU_GEM_DOMAIN_GTT,
- AMDGPU_GEM_DOMAIN_VRAM);
+ AMDGPU_GEM_DOMAIN_VRAM, m);
if (r)
goto done;
r = amdgpu_benchmark_move(adev, 1024*1024, 
AMDGPU_GEM_DOMAIN_VRAM,
- AMDGPU_GEM_DOMAIN_GTT);
+ AMDGPU_GEM_DOMAIN_GTT, m);
if (r)
goto done;
break;
@@ -167,9 +177,13 @@ int amdgpu_benchmark(struct amdgpu_device *adev, int 
test_number)

Re: [PATCH v3 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-22 Thread Christian König

Am 22.04.24 um 15:57 schrieb Philip Yang:

To test RDMA using dummy driver on the system without NIC/RDMA
device, the get/put dma pages pass in null device pointer, skip the
dma map/unmap resource and sg table to avoid null pointer access.


Well that is completely illegal and would break IOMMU.

Why does the RDMA driver do that in the first place?

Regards,
Christian.



Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++-
  1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 9fe56a21ef88..0caf2c89ef1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -705,12 +705,15 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE);
dma_addr_t addr;
  
-		addr = dma_map_resource(dev, phys, size, dir,

-   DMA_ATTR_SKIP_CPU_SYNC);
-   r = dma_mapping_error(dev, addr);
-   if (r)
-   goto error_unmap;
-
+   if (dev) {
+   addr = dma_map_resource(dev, phys, size, dir,
+   DMA_ATTR_SKIP_CPU_SYNC);
+   r = dma_mapping_error(dev, addr);
+   if (r)
+   goto error_unmap;
+   } else {
+   addr = phys;
+   }
sg_set_page(sg, NULL, size, 0);
sg_dma_address(sg) = addr;
sg_dma_len(sg) = size;
@@ -724,10 +727,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
for_each_sgtable_sg((*sgt), sg, i) {
if (!sg->length)
continue;
-
-   dma_unmap_resource(dev, sg->dma_address,
-  sg->length, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   if (dev)
+   dma_unmap_resource(dev, sg->dma_address,
+  sg->length, dir,
+  DMA_ATTR_SKIP_CPU_SYNC);
}
sg_free_table(*sgt);
  
@@ -752,10 +755,12 @@ void amdgpu_vram_mgr_free_sgt(struct device *dev,

struct scatterlist *sg;
int i;
  
-	for_each_sgtable_sg(sgt, sg, i)

-   dma_unmap_resource(dev, sg->dma_address,
-  sg->length, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   if (dev) {
+   for_each_sgtable_sg(sgt, sg, i)
+   dma_unmap_resource(dev, sg->dma_address,
+  sg->length, dir,
+  DMA_ATTR_SKIP_CPU_SYNC);
+   }
sg_free_table(sgt);
kfree(sgt);
  }




Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7

2024-04-22 Thread Christian König

Am 22.04.24 um 16:40 schrieb Alex Deucher:

On Mon, Apr 22, 2024 at 9:00 AM Christian König
 wrote:

Am 22.04.24 um 14:33 schrieb Qiang Ma:

On Mon, 22 Apr 2024 11:40:26 +0200
Christian König  wrote:


Am 22.04.24 um 07:26 schrieb Qiang Ma:

Some boards (like Oland PRO: 0x1002:0x6613) seem to have
garbage in the upper 16 bits of the vram size register,
kern log as follows:

[6.00] [drm] Detected VRAM RAM=2256537600M, BAR=256M
[6.007812] [drm] RAM width 64bits GDDR5
[6.031250] [drm] amdgpu: 2256537600M of VRAM memory ready

This is obviously not true, check for this and clamp the size
properly. Fixes boards reporting bogus amounts of vram,
kern log as follows:

[2.789062] [drm] Probable bad vram size: 0x86800800
[2.789062] [drm] Detected VRAM RAM=2048M, BAR=256M
[2.789062] [drm] RAM width 64bits GDDR5
[2.789062] [drm] amdgpu: 2048M of VRAM memory ready

Well we had patches like this one here before and so far we always
rejected them.

When the mmCONFIG_MEMSIZE register isn't properly initialized then
there is something wrong with your hardware.

Working around that in the software driver is not going to fly.

Regards,
Christian.


Hi Christian:
I see that two patches for this issue have been merged, and the
patches are as follows:

11544d77e397 drm/amdgpu: fixup bad vram size on gmc v8
0ca223b029a2 drm/radeon: fixup bad vram size on SI

Mhm, I remember that we discussed reverting those but it looks like that
never happened. I need to ask around internally.

Question is do you see any other problems with the board? E.g. incorrect
connector or harvesting configuration?

I'll need to dig up the past discussion again, but IIRC, the issue was
only seen on some non-x86 platforms.  Maybe something specific to MMIO
on those?


I honestly don't remember it either, but in general it's the job of 
the VBIOS to init this register.


So if we see the upper bits mangled the VBIOS hasn't done that correctly 
and it's quite likely that this is only the tip of the iceberg of problems.


Christian.
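
For illustration, the clamp from the patch as a standalone program, fed the
bogus register value from the log above:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t tmp = 0x86800800; /* raw mmCONFIG_MEMSIZE readout */

		/* keep only the low 16 bits (size in MiB) if the top half is garbage */
		if (tmp & 0xffff0000) {
			printf("Probable bad vram size: 0x%08x\n", tmp);
			if (tmp & 0xffff)
				tmp &= 0xffff;
		}
		printf("Detected VRAM RAM=%uM\n", tmp); /* prints 2048M */
		return 0;
	}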



Alex



Regards,
Christian.


Qiang Ma


Signed-off-by: Qiang Ma 
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 11 +--
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 ++---
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 23b478639921..3703695f7789 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -309,8 +309,15 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev)
 }
 adev->gmc.vram_width = numchan * chansize;
 /* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;

 if (!(adev->flags & AMD_IS_APU)) {
 r = amdgpu_device_resize_fb_bar(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 3da7b6a2b00d..1df1fc578ff6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -316,10 +316,10 @@ static void gmc_v7_0_mc_program(struct amdgpu_device *adev)
 static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
 {
 int r;
+   u32 tmp;

 	adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
 	if (!adev->gmc.vram_width) {
-   u32 tmp;
 int chansize, numchan;

 /* Get VRAM informations */
@@ -363,8 +363,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
 		adev->gmc.vram_width = numchan * chansize;
 }
 /* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;

 if (!(adev->flags & AMD_IS_APU)) {
 r = amdgpu_device_resize_fb_bar(adev);




Re: [PATCH v3 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-22 Thread Christian König

Am 22.04.24 um 15:57 schrieb Philip Yang:

Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length
is unsigned int, and some users of it cast to a signed int, so every
segment of sg table is limited to size 2GB maximum.

For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM memory. To workaround the sg table segment
size limit, allocate multiple segments if contiguous size is bigger than
MAX_SG_SEGMENT_SIZE.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 4be8b091099a..9fe56a21ef88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -31,6 +31,8 @@
  #include "amdgpu_atomfirmware.h"
  #include "atom.h"
  
+#define MAX_SG_SEGMENT_SIZE	(2UL << 30)

+
  struct amdgpu_vram_reservation {
u64 start;
u64 size;
@@ -532,8 +534,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  
  		BUG_ON(min_block_size < mm->chunk_size);
  
-		/* Limit maximum size to 2GiB due to SG table limitations */

-   size = min(remaining_size, 2ULL << 30);
+   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
+   size = remaining_size;
+   else
+   /* Limit maximum size to 2GiB due to SG table 
limitations
+* for no contiguous allocation.
+*/
+   size = min(remaining_size, MAX_SG_SEGMENT_SIZE);


Well that doesn't make sense; either fix the creation of the sg tables 
or limit the segment size. Not both.


  
  		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&

!(size & (((u64)pages_per_block << PAGE_SHIFT) 
- 1)))
@@ -675,7 +682,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
while (cursor.remaining) {
num_entries++;
-   amdgpu_res_next(&cursor, cursor.size);
+   amdgpu_res_next(&cursor, min(cursor.size, MAX_SG_SEGMENT_SIZE));
}
  
  	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);

@@ -695,7 +702,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
	amdgpu_res_first(res, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
phys_addr_t phys = cursor.start + adev->gmc.aper_base;
-   size_t size = cursor.size;
+   unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE);


Please keep size_t here or use unsigned int, using unsigned long just 
looks like trying to hide the problem.


And I wouldn't use a separate define but rather just INT_MAX instead.

Regards,
Christian.
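
To make the segment-size limit concrete, a standalone sketch of walking a
contiguous block in chunks no larger than the sg limit (all sizes invented):

	#include <stdint.h>
	#include <stdio.h>

	#define MAX_SG_SEGMENT_SIZE (2ULL << 30) /* sg length is an unsigned int */

	int main(void)
	{
		uint64_t remaining = 5ULL << 30; /* hypothetical 5 GiB contiguous block */
		uint64_t offset = 0;

		while (remaining) {
			uint64_t seg = remaining < MAX_SG_SEGMENT_SIZE ?
				       remaining : MAX_SG_SEGMENT_SIZE;
			printf("sg entry: offset %llu, len %llu\n",
			       (unsigned long long)offset,
			       (unsigned long long)seg);
			offset += seg;
			remaining -= seg;
		}
		return 0;
	}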


dma_addr_t addr;
  
  		addr = dma_map_resource(dev, phys, size, dir,

@@ -708,7 +715,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
sg_dma_address(sg) = addr;
sg_dma_len(sg) = size;
  
-		amdgpu_res_next(&cursor, cursor.size);

+   amdgpu_res_next(&cursor, size);
}
  
  	return 0;




Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7

2024-04-22 Thread Christian König

Am 22.04.24 um 14:33 schrieb Qiang Ma:

On Mon, 22 Apr 2024 11:40:26 +0200
Christian König  wrote:


Am 22.04.24 um 07:26 schrieb Qiang Ma:

Some boards (like Oland PRO: 0x1002:0x6613) seem to have
garbage in the upper 16 bits of the vram size register,
kern log as follows:

[6.00] [drm] Detected VRAM RAM=2256537600M, BAR=256M
[6.007812] [drm] RAM width 64bits GDDR5
[6.031250] [drm] amdgpu: 2256537600M of VRAM memory ready

This is obviously not true, check for this and clamp the size
properly. Fixes boards reporting bogus amounts of vram,
kern log as follows:

[2.789062] [drm] Probable bad vram size: 0x86800800
[2.789062] [drm] Detected VRAM RAM=2048M, BAR=256M
[2.789062] [drm] RAM width 64bits GDDR5
[2.789062] [drm] amdgpu: 2048M of VRAM memory ready

Well we had patches like this one here before and so far we always
rejected them.

When the mmCONFIG_MEMSIZE register isn't properly initialized then
there is something wrong with your hardware.

Working around that in the software driver is not going to fly.

Regards,
Christian.


Hi Christian:
I see that two patches for this issue have been merged, and the
patches are as follows:

11544d77e397 drm/amdgpu: fixup bad vram size on gmc v8
0ca223b029a2 drm/radeon: fixup bad vram size on SI


Mhm, I remember that we discussed reverting those but it looks like that 
never happened. I need to ask around internally.


Question is do you see any other problems with the board? E.g. incorrect 
connector or harvesting configuration?


Regards,
Christian.



Qiang Ma


Signed-off-by: Qiang Ma 
---
   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 11 +--
   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 ++---
   2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 23b478639921..3703695f7789 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -309,8 +309,15 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev)
 }
adev->gmc.vram_width = numchan * chansize;
/* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
   
   	if (!(adev->flags & AMD_IS_APU)) {

r = amdgpu_device_resize_fb_bar(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 3da7b6a2b00d..1df1fc578ff6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -316,10 +316,10 @@ static void gmc_v7_0_mc_program(struct amdgpu_device *adev)
 static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
 {
int r;
+   u32 tmp;
   
 	adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
 	if (!adev->gmc.vram_width) {
-   u32 tmp;
int chansize, numchan;
   
   		/* Get VRAM informations */

@@ -363,8 +363,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
 		adev->gmc.vram_width = numchan * chansize;
}
/* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
   
   	if (!(adev->flags & AMD_IS_APU)) {

r = amdgpu_device_resize_fb_bar(adev);






Re: [PATCH] drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3

2024-04-22 Thread Christian König

Am 20.04.24 um 21:02 schrieb Alex Deucher:

This avoids a potential conflict with firmwares with the newer
HDP flush mechanism.


The patch is fine, but I'm starting to wonder why we are using the newer 
HDP flush mechanism in the first place?




Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2156
Signed-off-by: Alex Deucher 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 26 +++---
  1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index b2417ba4759b..c44ec41f1cb6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -280,17 +280,21 @@ static void sdma_v5_2_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
u32 ref_and_mask = 0;
const struct nbio_hdp_flush_reg *nbio_hf_reg = adev->nbio.hdp_flush_reg;
  
-	ref_and_mask = nbio_hf_reg->ref_and_mask_sdma0 << ring->me;

-
-   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) |
- SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(1) |
- SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* == */
-   amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_done_offset(adev)) 
<< 2);
-   amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_req_offset(adev)) 
<< 2);
-   amdgpu_ring_write(ring, ref_and_mask); /* reference */
-   amdgpu_ring_write(ring, ref_and_mask); /* mask */
-   amdgpu_ring_write(ring, SDMA_PKT_POLL_REGMEM_DW5_RETRY_COUNT(0xfff) |
- SDMA_PKT_POLL_REGMEM_DW5_INTERVAL(10)); /* retry 
count, poll interval */
+   if (ring->me > 1) {
+   amdgpu_asic_flush_hdp(adev, ring);
+   } else {
+   ref_and_mask = nbio_hf_reg->ref_and_mask_sdma0 << ring->me;
+
+   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) 
|
+ SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(1) |
+ SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* == */
+   amdgpu_ring_write(ring, 
(adev->nbio.funcs->get_hdp_flush_done_offset(adev)) << 2);
+   amdgpu_ring_write(ring, 
(adev->nbio.funcs->get_hdp_flush_req_offset(adev)) << 2);
+   amdgpu_ring_write(ring, ref_and_mask); /* reference */
+   amdgpu_ring_write(ring, ref_and_mask); /* mask */
+   amdgpu_ring_write(ring, 
SDMA_PKT_POLL_REGMEM_DW5_RETRY_COUNT(0xfff) |
+ SDMA_PKT_POLL_REGMEM_DW5_INTERVAL(10)); /* 
retry count, poll interval */
+   }
  }
  
  /**




Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Christian König

Am 22.04.24 um 13:29 schrieb Lazar, Lijo:

On 4/22/2024 4:52 PM, Christian König wrote:

Am 22.04.24 um 11:37 schrieb Lazar, Lijo:

On 4/22/2024 2:59 PM, Christian König wrote:

Am 22.04.24 um 10:47 schrieb Jack Xiao:

Delete fence fallback timer to fix the random
use-after-free issue.

That's already done in amdgpu_fence_driver_hw_fini() and absolutely
shouldn't be in amdgpu_ring_fini().

And the kfree(ring->fence_drv.fences); shouldn't be there either since
that is done in amdgpu_fence_driver_sw_fini().


In the present logic, these are part of special rings dynamically
created for mes self tests with
amdgpu_mes_add_ring/amdgpu_mes_remove_ring.

Ok, we should probably stop doing that altogether.

Shashank's work of utilizing the MES in userspace is nearly finished and
we don't really need the MES test in the kernel any more.


A v2 of the patch is posted. Can we use it temporarily till Shashank's
work is in place?


Yes, absolutely.


Assuming Shashank's work will also include removing
MES self test in kernel.


Yes, that was the long term plan. But no idea when we can completely 
upstream that work.


Regards,
Christian.



Thanks,
Lijo


Regards,
Christian.


Thanks,
Lijo


Regards,
Christian.


Signed-off-by: Jack Xiao 
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 +
    1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 06f0a6534a94..93ab9faa2d72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
  &ring->gpu_addr,
  (void **)&ring->ring);
    } else {
+    del_timer_sync(&ring->fence_drv.fallback_timer);
    kfree(ring->fence_drv.fences);
    }





Re: [PATCH v2] drm/amdgpu/mes: fix use-after-free issue

2024-04-22 Thread Christian König

Am 22.04.24 um 13:12 schrieb Lazar, Lijo:


On 4/22/2024 3:09 PM, Jack Xiao wrote:

Delete fence fallback timer to fix the random
use-after-free issue.

v2: move to amdgpu_mes.c

Signed-off-by: Jack Xiao 

Acked-by: Lijo Lazar 


Acked-by: Christian König 



Thanks,
Lijo


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 78e4f88f5134..226751ea084b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1128,6 +1128,7 @@ void amdgpu_mes_remove_ring(struct amdgpu_device *adev,
return;
  
  	amdgpu_mes_remove_hw_queue(adev, ring->hw_queue_id);

+   del_timer_sync(&ring->fence_drv.fallback_timer);
amdgpu_ring_fini(ring);
kfree(ring);
  }
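
The ordering is the point here; a hedged sketch of the fixed teardown
(names from the patch, surrounding code elided):

	/* Stop the timer and wait for any running handler before the memory
	 * it references is freed; otherwise the handler can fire late and
	 * touch freed memory.
	 */
	del_timer_sync(&ring->fence_drv.fallback_timer);
	amdgpu_ring_fini(ring);
	kfree(ring);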




Re: [PATCH 3/3] drm/amdgpu: Fix Uninitialized scalar variable warning

2024-04-22 Thread Christian König

Am 22.04.24 um 11:49 schrieb Ma Jun:

Initialize the variables which were not initialized
to fix the coverity issue "Uninitialized scalar variable"


Feel free to add my Acked-by to the first two patches, but this here 
clearly doesn't look like a good idea to me.




Signed-off-by: Ma Jun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 7e6d09730e6d..7b28b6b8982b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -437,7 +437,7 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 struct amdgpu_ih_ring *ih)
  {
u32 ring_index = ih->rptr >> 2;
-   struct amdgpu_iv_entry entry;
+   struct amdgpu_iv_entry entry = {0};


When this needs to be initialized there is clearly something wrong with 
the code. I would guess it's similar for the other two.


What exactly does coverity complain about?

Regards,
Christian.


unsigned int client_id, src_id;
struct amdgpu_irq_src *src;
bool handled = false;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 924baf58e322..f0a63d084b4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1559,7 +1559,7 @@ static int amdgpu_debugfs_firmware_info_show(struct 
seq_file *m, void *unused)
  {
struct amdgpu_device *adev = m->private;
struct drm_amdgpu_info_firmware fw_info;
-   struct drm_amdgpu_query_fw query_fw;
+   struct drm_amdgpu_query_fw query_fw = {0};
struct atom_context *ctx = adev->mode_info.atom_context;
uint8_t smu_program, smu_major, smu_minor, smu_debug;
int ret, i;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 2b99eed5ba19..41ac3319108b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -120,7 +120,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr 
*xcp_mgr, int xcp_id,
  int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode)
  {
struct amdgpu_device *adev = xcp_mgr->adev;
-   struct amdgpu_xcp_ip ip;
+   struct amdgpu_xcp_ip ip = {0};
uint8_t mem_id;
int i, j, ret;
  




Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Christian König

Am 22.04.24 um 11:37 schrieb Lazar, Lijo:


On 4/22/2024 2:59 PM, Christian König wrote:

Am 22.04.24 um 10:47 schrieb Jack Xiao:

Delete fence fallback timer to fix the random
use-after-free issue.

That's already done in amdgpu_fence_driver_hw_fini() and absolutely
shouldn't be in amdgpu_ring_fini().

And the kfree(ring->fence_drv.fences); shouldn't be there either since
that is done in amdgpu_fence_driver_sw_fini().


In the present logic, these are part of special rings dynamically
created for mes self tests with amdgpu_mes_add_ring/amdgpu_mes_remove_ring.


Ok, we should probably stop doing that altogether.

Shashank's work of utilizing the MES in userspace is nearly finished and 
we don't really need the MES test in the kernel any more.


Regards,
Christian.



Thanks,
Lijo


Regards,
Christian.


Signed-off-by: Jack Xiao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 06f0a6534a94..93ab9faa2d72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
     &ring->gpu_addr,
     (void **)&ring->ring);
   } else {
+    del_timer_sync(&ring->fence_drv.fallback_timer);
   kfree(ring->fence_drv.fences);
   }
   




Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7

2024-04-22 Thread Christian König

Am 22.04.24 um 07:26 schrieb Qiang Ma:

Some boards (like Oland PRO: 0x1002:0x6613) seem to have
garbage in the upper 16 bits of the vram size register,
kern log as follows:

[6.00] [drm] Detected VRAM RAM=2256537600M, BAR=256M
[6.007812] [drm] RAM width 64bits GDDR5
[6.031250] [drm] amdgpu: 2256537600M of VRAM memory ready

This is obviously not true, check for this and clamp the size
properly. Fixes boards reporting bogus amounts of vram,
kern log as follows:

[2.789062] [drm] Probable bad vram size: 0x86800800
[2.789062] [drm] Detected VRAM RAM=2048M, BAR=256M
[2.789062] [drm] RAM width 64bits GDDR5
[2.789062] [drm] amdgpu: 2048M of VRAM memory ready


Well we had patches like this one here before and so far we always 
rejected them.


When the mmCONFIG_MEMSIZE register isn't properly initialized then there 
is something wrong with your hardware.


Working around that in the software driver is not going to fly.

Regards,
Christian.


Signed-off-by: Qiang Ma 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 11 +--
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 ++---
  2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 23b478639921..3703695f7789 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -309,8 +309,15 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev)
}
adev->gmc.vram_width = numchan * chansize;
/* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
  
  	if (!(adev->flags & AMD_IS_APU)) {

r = amdgpu_device_resize_fb_bar(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 3da7b6a2b00d..1df1fc578ff6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -316,10 +316,10 @@ static void gmc_v7_0_mc_program(struct amdgpu_device 
*adev)
  static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
  {
int r;
+   u32 tmp;
  
  	adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);

if (!adev->gmc.vram_width) {
-   u32 tmp;
int chansize, numchan;
  
  		/* Get VRAM informations */

@@ -363,8 +363,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
adev->gmc.vram_width = numchan * chansize;
}
/* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+   if (tmp & 0xffff0000) {
+   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+   if (tmp & 0xffff)
+   tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
  
  	if (!(adev->flags & AMD_IS_APU)) {

r = amdgpu_device_resize_fb_bar(adev);




Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Christian König

Am 22.04.24 um 10:47 schrieb Jack Xiao:

Delete fence fallback timer to fix the random
use-after-free issue.


That's already done in amdgpu_fence_driver_hw_fini() and absolutely 
shouldn't be in amdgpu_ring_fini().


And the kfree(ring->fence_drv.fences); shouldn't be there either since 
that is done in amdgpu_fence_driver_sw_fini().


Regards,
Christian.



Signed-off-by: Jack Xiao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 06f0a6534a94..93ab9faa2d72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
  &ring->gpu_addr,
  (void **)&ring->ring);
} else {
+   del_timer_sync(&ring->fence_drv.fallback_timer);
kfree(ring->fence_drv.fences);
}
  




Re: [PATCH 01/15] drm/amdgpu: Add interface to reserve bad page

2024-04-22 Thread Christian König

Am 18.04.24 um 04:58 schrieb YiPeng Chai:

Add interface to reserve bad page.

Signed-off-by: YiPeng Chai 


Yeah, that approach looks valid to me. Just keep in mind that 
amdgpu_vram_mgr_query_page_status() is not the fastest function because it 
does a linear search.


Apart from that Reviewed-by: Christian König  
for this patch, but can't really judge the rest of the patch set.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  4 
  2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 2c97cb80d79a..05782d68f073 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2782,6 +2782,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
}
}
  
+	mutex_init(&con->page_rsv_lock);

 	mutex_init(&con->page_retirement_lock);
 	init_waitqueue_head(&con->page_retirement_wq);
 	atomic_set(&con->page_retirement_req_cnt, 0);
@@ -2835,6 +2836,8 @@ static int amdgpu_ras_recovery_fini(struct amdgpu_device 
*adev)
  
 	atomic_set(&con->page_retirement_req_cnt, 0);

+	mutex_destroy(&con->page_rsv_lock);
+
 	cancel_work_sync(&con->recovery_work);

 	mutex_lock(&con->recovery_lock);

@@ -4278,3 +4281,19 @@ void amdgpu_ras_query_boot_status(struct amdgpu_device 
*adev, u32 num_instances)
amdgpu_ras_boot_time_error_reporting(adev, i, 
boot_error);
}
  }
+
+int amdgpu_ras_reserve_page(struct amdgpu_device *adev, uint64_t pfn)
+{
+   struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+   struct amdgpu_vram_mgr *mgr = >mman.vram_mgr;
+   uint64_t start = pfn << AMDGPU_GPU_PAGE_SHIFT;
+   int ret = 0;
+
+   mutex_lock(&con->page_rsv_lock);
+   ret = amdgpu_vram_mgr_query_page_status(mgr, start);
+   if (ret == -ENOENT)
+   ret = amdgpu_vram_mgr_reserve_range(mgr, start, 
AMDGPU_GPU_PAGE_SIZE);
+   mutex_unlock(&con->page_rsv_lock);
+
+   return ret;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 8d26989c75c8..ab5bf573378e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -500,6 +500,7 @@ struct amdgpu_ras {
wait_queue_head_t page_retirement_wq;
struct mutex page_retirement_lock;
atomic_t page_retirement_req_cnt;
+   struct mutex page_rsv_lock;
/* Fatal error detected flag */
atomic_t fed;
  
@@ -909,4 +910,7 @@ bool amdgpu_ras_get_fed_status(struct amdgpu_device *adev);
  
  bool amdgpu_ras_event_id_is_valid(struct amdgpu_device *adev, u64 id);

  u64 amdgpu_ras_acquire_event_id(struct amdgpu_device *adev, enum 
ras_event_type type);
+
+int amdgpu_ras_reserve_page(struct amdgpu_device *adev, uint64_t pfn);
+
  #endif




Re: [PATCH] drm/amdgpu/vcn: fix uninitialized variable warnings

2024-04-19 Thread Christian König

Am 18.04.24 um 20:07 schrieb Pierre-Eric Pelloux-Prayer:

Init r to 0 to avoid returning an uninitialized value if we never
enter the loop. This case should never be hit in practice, but
returning 0 doesn't hurt.

The same fix is applied to the 4 places using the same pattern.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c   | 2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   | 2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 8f82fb887e9c..724445545563 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -298,7 +298,7 @@ static int vcn_v3_0_hw_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct amdgpu_ring *ring;
-   int i, j, r;
+   int i, j, r = 0;


That is usually considered bad coding style.

Better insert a "return 0;" directly before the done label.

Regards,
Christian.
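
A standalone sketch of the suggested shape (the error path is invented for
illustration):

	#include <stdio.h>

	static int hw_init_sketch(int fail)
	{
		int r; /* no dummy initializer needed */

		if (fail) {
			r = -1;
			goto done; /* error paths set r explicitly */
		}

		return 0; /* success path never reads r */
	done:
		return r;
	}

	int main(void)
	{
		printf("ok=%d err=%d\n", hw_init_sketch(0), hw_init_sketch(1));
		return 0;
	}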

  
  	if (amdgpu_sriov_vf(adev)) {

r = vcn_v3_0_start_sriov(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 832d15f7b5f6..9be7ae7af4b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -253,7 +253,7 @@ static int vcn_v4_0_hw_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct amdgpu_ring *ring;
-   int i, r;
+   int i, r = 0;
  
  	if (amdgpu_sriov_vf(adev)) {

r = vcn_v4_0_start_sriov(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index 501e53e69f2a..593c64e4b8ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -221,7 +221,7 @@ static int vcn_v4_0_5_hw_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct amdgpu_ring *ring;
-   int i, r;
+   int i, r = 0;
  
  	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {

if (adev->vcn.harvest_config & (1 << i))
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
index bc60c554eb32..246f967e2e7d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
@@ -187,7 +187,7 @@ static int vcn_v5_0_0_hw_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct amdgpu_ring *ring;
-   int i, r;
+   int i, r = 0;
  
  	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {

if (adev->vcn.harvest_config & (1 << i))




Re: [PATCH] drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend

2024-04-19 Thread Christian König

Am 19.04.24 um 09:52 schrieb Lang Yu:

umsch test needs full GPU functionality (e.g., VM update, TLB flush,
possibly buffer moving under memory pressure) which may not be ready
in these states. Just skip it to avoid potential issues.

Signed-off-by: Lang Yu 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index 06ad68714172..9f9d6a6d5cf3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -774,6 +774,9 @@ static int umsch_mm_late_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
  
+	if (amdgpu_in_reset(adev) || adev->in_s0ix || adev->in_suspend)

+   return 0;
+
return umsch_mm_test(adev);
  }
  




Re: [PATCH] drm/amdgpu: Update BO eviction priorities

2024-04-19 Thread Christian König

Am 18.04.24 um 20:06 schrieb Felix Kuehling:

Make SVM BOs more likely to get evicted than other BOs. These BOs
opportunistically use available VRAM, but can fall back relatively
seamlessly to system memory. It also avoids SVM migrations evicting
other, more important BOs as they will evict other SVM allocations
first.

Signed-off-by: Felix Kuehling 


Good point, and at least offhand I can't think of anything which could go 
wrong here.


Just keep an eye on potentially failing CI tests since we haven't really 
exercised this functionality in recent years.
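
The resulting mapping, as a sketch (TTM walks its eviction LRU lists from
priority 0 upward, so lower values are evicted first; tying priority 0 to
SVM is an inference from the commit message):

	/* bo->tbo.priority after this patch:
	 *   2 - kernel BOs (evicted last)
	 *   1 - regular BOs
	 *   0 - AMDGPU_GEM_CREATE_DISCARDABLE BOs, e.g. SVM (evicted first)
	 */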


Reviewed-by: Christian König 

Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cd2dd3ed7153..d80671535ab3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -608,6 +608,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
else
amdgpu_bo_placement_from_domain(bo, bp->domain);
if (bp->type == ttm_bo_type_kernel)
+   bo->tbo.priority = 2;
+   else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
bo->tbo.priority = 1;
  
  	if (!bp->destroy)




Re: [PATCH v2 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-18 Thread Christian König




Am 18.04.24 um 15:57 schrieb Philip Yang:

RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.

Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pinning this bo to export for RDMA
peerdirect access, this will set the TTM_PL_FLAG_CONTIGUOUS flag, and ask the
VRAM buddy allocator to get contiguous VRAM.

Remove the 2GB max memory block size limit for contiguous allocation.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 +++--
  include/uapi/linux/kfd_ioctl.h   | 1 +
  3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..ef9154043757 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1712,6 +1712,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) 
?
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
+
+   /* For contiguous VRAM allocation */
+   if (flags & 
KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
+   alloc_flags |= 
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
}
xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
0 : fpriv->xcp_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 4be8b091099a..2f2ae711 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -532,8 +532,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  
  		BUG_ON(min_block_size < mm->chunk_size);
  
-		/* Limit maximum size to 2GiB due to SG table limitations */

-   size = min(remaining_size, 2ULL << 30);
+   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
+   size = remaining_size;
+   else
+   /* Limit maximum size to 2GiB due to SG table 
limitations
+* for no contiguous allocation.
+*/
+   size = min(remaining_size, 2ULL << 30);


Oh, I totally missed this in the first review. That won't work like that; 
the sg table limit is still there even if the BO is contiguous.


We could only fix up the VRAM P2P support to use multiple segments in 
the sg table.


Regards,
Christian.

  
  		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&

!(size & (((u64)pages_per_block << PAGE_SHIFT) 
- 1)))
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 2040a470ddb4..c1394c162d4e 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
  #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT  (1 << 26)
  #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED  (1 << 25)
  #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT  (1 << 24)
+#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1 << 23)
  
  /* Allocate memory for later SVM (shared virtual memory) mapping.

   *




Re: [PATCH 15/15] drm/amdgpu: Use new interface to reserve bad page

2024-04-18 Thread Christian König

Am 18.04.24 um 04:58 schrieb YiPeng Chai:

Use new interface to reserve bad page.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d1a2ab944b7d..dee66db10fa2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2548,9 +2548,7 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
goto out;
}
  
-		amdgpu_vram_mgr_reserve_range(&adev->mman.vram_mgr,

-   bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT,
-   AMDGPU_GPU_PAGE_SIZE);


Where is the call to reserve the VRAM range moved to now?

Regards,
Christian.


+   amdgpu_ras_reserve_page(adev, bps[i].retired_page);
  
  		memcpy(>bps[data->count], [i], sizeof(*data->bps));

data->count++;




Re: [PATCH v5 1/6] drm/amdgpu: add prototype for ip dump

2024-04-18 Thread Christian König

Am 17.04.24 um 17:45 schrieb Alex Deucher:

On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri  wrote:

Add the prototype to dump ip registers
for all ips of different asics and set
them to NULL for now. Based on the
requirement add a function pointer for
each of them.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/cik.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/cik_ih.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 1 +
  drivers/gpu/drm/amd/amdgpu/cz_ih.c| 1 +
  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/iceland_ih.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/ih_v6_0.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/ih_v6_1.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/ih_v7_0.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c| 2 ++
  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c  | 1 +
  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c| 1 +
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c| 1 +
  drivers/gpu/drm/amd/amdgpu/nv.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c| 1 +
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c| 1 +
  drivers/gpu/drm/amd/amdgpu/si.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/si_dma.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/si_ih.c| 1 +
  drivers/gpu/drm/amd/amdgpu/soc15.c| 1 +
  drivers/gpu/drm/amd/amdgpu/soc21.c| 1 +
  drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c | 1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 ++
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c   | 1 +
  drivers/gpu/drm/amd/amdgpu/vi.c   | 1 +
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
  drivers/gpu/drm/amd/include/amd_shared.h  | 1 +
  drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c| 1 +
  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c| 1 +
  drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 1 +
  64 files changed, 66 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
index 6d72355ac492..34a62033a388 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
@@ -637,6 +637,7 @@ static const struct amd_ip_funcs acp_ip_funcs = {
 .soft_reset = acp_soft_reset,
 .set_clockgating_state = acp_set_clockgating_state,
 .set_powergating_state = acp_set_powergating_state,
+   .dump_ip_state = NULL,

You can skip all of the NULL assignments.  Static global structures
will be 0 initialized.
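
A tiny standalone demonstration of that rule (objects with static storage
duration are zero-initialized, so omitted function pointers are already NULL):

	#include <stdio.h>

	struct ip_funcs {
		const char *name;
		void (*dump_ip_state)(void *handle);
	};

	/* .dump_ip_state intentionally omitted */
	static const struct ip_funcs acp_ip_funcs = {
		.name = "acp",
	};

	int main(void)
	{
		printf("dump_ip_state is %s\n",
		       acp_ip_funcs.dump_ip_state ? "set" : "NULL");
		return 0;
	}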


Oh, that's a really good point. We have automated checkers complaining 
about NULL initialization in structures.


So that here would cause tons of automated complaints.

Regards,
Christian.


   Either way:
Reviewed-by: Alex Deucher 

Alex


  };

  const struct amdgpu_ip_block_version acp_ip_block = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 

Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-18 Thread Christian König

Am 17.04.24 um 19:30 schrieb Alex Deucher:

On Wed, Apr 17, 2024 at 1:01 PM Khatri, Sunil  wrote:


On 4/17/2024 10:21 PM, Alex Deucher wrote:

On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo  wrote:

[AMD Official Use Only - General]

Yes, right now that API doesn't return anything. What I meant is to add that 
check as well, since the coredump API is essentially used in hang situations.

In the old days, access to registers while in GFXOFF resulted in a system hang 
(basically it won't go beyond this point). If that happens, then the purpose of 
the patch - to get the context of a device hang - is lost. We may not even get 
a proper dmesg log.

Maybe add a call to amdgpu_get_gfx_off_status(), but unfortunately,
it's not implemented on every chip yet.

So we need both things: disable gfx_off, then query the status, then read the
registers, and then enable gfx_off again.

Right, but first we need to implement the get_gfxoff_status smu
callback for all of the chips that are missing it.


The question is whether it's safe to query the status and disable it while 
the GPU is in a hung state.


I mean most unrecoverable hangs are caused by the GFX block or the 
memory interface getting into a state where it can't get out again.


Regards,
Christian.



Alex


   amdgpu_gfx_off_ctrl(adev, false);
   r = amdgpu_get_gfx_off_status(...);
   if (!r) {
	for (i = 0; i < reg_count; i++)
		adev->gfx.ip_dump[i] =
			RREG32(SOC15_REG_ENTRY_OFFSET(gc_reg_list_10_1[i]));
   }
   amdgpu_gfx_off_ctrl(adev, true);

Sunil


Alex


Thanks,
Lijo
-Original Message-
From: Khatri, Sunil 
Sent: Wednesday, April 17, 2024 9:42 PM
To: Lazar, Lijo ; Alex Deucher ; Khatri, 
Sunil 
Cc: Deucher, Alexander ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump


On 4/17/2024 9:31 PM, Lazar, Lijo wrote:

On 4/17/2024 9:21 PM, Alex Deucher wrote:

On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri  wrote:

Adding gfx10 gc registers to be used for register dump via
devcoredump during a gpu reset.

Signed-off-by: Sunil Khatri 

Reviewed-by: Alex Deucher 


---
drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   8 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 +-
drivers/gpu/drm/amd/amdgpu/soc15.h|   2 +
.../include/asic_reg/gc/gc_10_1_0_offset.h|  12 ++
5 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e0d7f4ee7e16..cac0ca64367b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -139,6 +139,14 @@ enum amdgpu_ss {
   AMDGPU_SS_DRV_UNLOAD
};

+struct amdgpu_hwip_reg_entry {
+   u32 hwip;
+   u32 inst;
+   u32 seg;
+   u32 reg_offset;
+   const char  *reg_name;
+};
+
struct amdgpu_watchdog_timer {
   bool timeout_fatal_disable;
   uint32_t period; /* maxCycles = (1 << period), the number
of cycles before a timeout */ diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 04a86dff71e6..64f197bbc866 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -433,6 +433,10 @@ struct amdgpu_gfx {
   uint32_tnum_xcc_per_xcp;
   struct mutexpartition_mutex;
   boolmcbp; /* mid command buffer 
preemption */
+
+   /* IP reg dump */
+   uint32_t*ip_dump;
+   uint32_treg_count;
};

struct amdgpu_gfx_ras_reg_entry {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a0bc4196ff8b..4a54161f4837 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin");
MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin");
MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin");

+static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = {
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS3),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 

Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous

2024-04-17 Thread Christian König

Am 17.04.24 um 21:21 schrieb Alex Deucher:

On Wed, Apr 17, 2024 at 3:17 PM Liu, Shaoyun  wrote:

[AMD Official Use Only - General]

I had a discussion with Christian about this before. The conclusion is that 
the driver should prevent multiple processes from using the MES ring at the same 
time. Also, for the current MES ring usage, the driver doesn't have the logic to 
prevent the ring from being overflowed, and we don't hit the issue because MES 
waits polling for each MES submission. If we want to change the MES to 
work asynchronously, we need to consider a way to avoid this (similar to adding 
the limit in the fence handling we use for kiq and HMM paging)


I think we need a separate fence (different GPU address and seq
number) per request.  Then each caller can wait independently.
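
A rough sketch of what that could look like (hypothetical names, not
existing driver API -- each request gets its own fence slot and seqno):

	/* Hypothetical per-request fence slot, for illustration only. */
	struct mes_submit_fence {
		u64 gpu_addr;	/* fence location the firmware writes to */
		u64 *cpu_addr;	/* CPU mapping of the same slot */
		u64 seq;	/* seqno expected for this request */
	};

	/* Every caller polls only its own slot, so callers no longer
	 * serialize on one shared seqno. */
	static int mes_submit_fence_wait(struct mes_submit_fence *f,
					 long timeout_us)
	{
		while (timeout_us--) {
			if (READ_ONCE(*f->cpu_addr) >= f->seq)
				return 0;
			udelay(1);
		}
		return -ETIMEDOUT;
	}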


Well no, we need to modify the MES firmware to stop abusing the fence as 
a signaling mechanism for the result of an operation.


I've pointed that out before and I think this is a hard requirement for 
correct operation.


In addition to that, retrying on the reset flag looks like another broken 
workaround to me.


So just to make it clear this approach is a NAK from my side, don't 
commit that.


Regards,
Christian.



Alex


Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Wednesday, April 17, 2024 8:49 AM
To: Chen, Horace ; amd-gfx@lists.freedesktop.org
Cc: Andrey Grodzovsky ; Kuehling, Felix ; Deucher, Alexander 
; Xiao, Jack ; Zhang, Hawking ; Liu, Monk 
; Xu, Feifei ; Chang, HaiJun ; Leo Liu 
; Liu, Jenny (Jing) 
Subject: Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous

Am 17.04.24 um 13:30 schrieb Horace Chen:

The MES firmware expects synchronous operation with the driver.  For
this to work asynchronously, each caller would need to provide its own
fence location and sequence number.

Well that's certainly not correct. The seqno takes care that we can wait async 
for the submission to complete.

So clear NAK for that patch here.

Regards,
Christian.


For now, add a mutex lock to serialize the MES submission.
For SR-IOV long-wait case, break the long-wait to separated part to
prevent this wait from impacting reset sequence.

Signed-off-by: Horace Chen 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  3 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  1 +
   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 18 ++
   3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 78e4f88f5134..8896be95b2c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -137,6 +137,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
   spin_lock_init(&adev->mes.queue_id_lock);
   spin_lock_init(&adev->mes.ring_lock);
   mutex_init(&adev->mes.mutex_hidden);
+ mutex_init(&adev->mes.submission_lock);

   adev->mes.total_max_queue = AMDGPU_FENCE_MES_QUEUE_ID_MASK;
   adev->mes.vmid_mask_mmhub = 0xffffff00; @@ -221,6 +222,7 @@ int
amdgpu_mes_init(struct amdgpu_device *adev)
   idr_destroy(&adev->mes.queue_id_idr);
   ida_destroy(&adev->mes.doorbell_ida);
   mutex_destroy(&adev->mes.mutex_hidden);
+ mutex_destroy(&adev->mes.submission_lock);
   return r;
   }

@@ -240,6 +242,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)
   idr_destroy(&adev->mes.queue_id_idr);
   ida_destroy(&adev->mes.doorbell_ida);
   mutex_destroy(&adev->mes.mutex_hidden);
+ mutex_destroy(&adev->mes.submission_lock);
   }

   static void amdgpu_mes_queue_free_mqd(struct amdgpu_mes_queue *q)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 6b3e1844eac5..90af935cc889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -85,6 +85,7 @@ struct amdgpu_mes {

   struct amdgpu_ring  ring;
   spinlock_t  ring_lock;
+ struct mutex submission_lock;

   const struct firmware   *fw[AMDGPU_MAX_MES_PIPES];

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index e40d00afd4f5..0a609a5b8835 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -162,6 +162,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct 
amdgpu_mes *mes,
	struct amdgpu_ring *ring = &mes->ring;
   unsigned long flags;
   signed long timeout = adev->usec_timeout;
+ signed long retry_count = 1;
   const char *op_str, *misc_op_str;

   if (x_pkt->header.opcode >= MES_SCH_API_MAX) @@ -169,15 +170,19 @@
static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes
*mes,

   if (amdgpu_emu_mode) {
   timeout *= 100;
- } else if (amdgpu_sriov_vf(adev)) {
+ }
+
+ if (amdgpu_sriov_vf(adev) && timeout > 0) {
   /* 

Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous

2024-04-17 Thread Christian König

Am 17.04.24 um 13:30 schrieb Horace Chen:

The MES firmware expects synchronous operation with the
driver.  For this to work asynchronously, each caller
would need to provide its own fence location and sequence
number.


Well that's certainly not correct. The seqno takes care that we can wait 
async for the submission to complete.


So clear NAK for that patch here.

Regards,
Christian.



For now, add a mutex lock to serialize the MES submission.
For SR-IOV long-wait case, break the long-wait to separated
part to prevent this wait from impacting reset sequence.

Signed-off-by: Horace Chen 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  3 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  1 +
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 18 ++
  3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 78e4f88f5134..8896be95b2c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -137,6 +137,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
	spin_lock_init(&adev->mes.queue_id_lock);
	spin_lock_init(&adev->mes.ring_lock);
	mutex_init(&adev->mes.mutex_hidden);
+   mutex_init(&adev->mes.submission_lock);
  
  	adev->mes.total_max_queue = AMDGPU_FENCE_MES_QUEUE_ID_MASK;

	adev->mes.vmid_mask_mmhub = 0xffffff00;
@@ -221,6 +222,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
	idr_destroy(&adev->mes.queue_id_idr);
	ida_destroy(&adev->mes.doorbell_ida);
	mutex_destroy(&adev->mes.mutex_hidden);
+   mutex_destroy(&adev->mes.submission_lock);
return r;
  }
  
@@ -240,6 +242,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)

	idr_destroy(&adev->mes.queue_id_idr);
	ida_destroy(&adev->mes.doorbell_ida);
	mutex_destroy(&adev->mes.mutex_hidden);
+   mutex_destroy(&adev->mes.submission_lock);
  }
  
  static void amdgpu_mes_queue_free_mqd(struct amdgpu_mes_queue *q)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 6b3e1844eac5..90af935cc889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -85,6 +85,7 @@ struct amdgpu_mes {
  
  	struct amdgpu_ring  ring;

spinlock_t  ring_lock;
+   struct mutex submission_lock;
  
  	const struct firmware   *fw[AMDGPU_MAX_MES_PIPES];
  
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c

index e40d00afd4f5..0a609a5b8835 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -162,6 +162,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct 
amdgpu_mes *mes,
	struct amdgpu_ring *ring = &mes->ring;
unsigned long flags;
signed long timeout = adev->usec_timeout;
+   signed long retry_count = 1;
const char *op_str, *misc_op_str;
  
  	if (x_pkt->header.opcode >= MES_SCH_API_MAX)

@@ -169,15 +170,19 @@ static int 
mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
  
  	if (amdgpu_emu_mode) {

timeout *= 100;
-   } else if (amdgpu_sriov_vf(adev)) {
+   }
+
+   if (amdgpu_sriov_vf(adev) && timeout > 0) {
/* Worst case in sriov where all other 15 VF timeout, each VF 
needs about 600ms */
-   timeout = 15 * 600 * 1000;
+   retry_count = (15 * 600 * 1000) / timeout;
}
BUG_ON(size % 4 != 0);
  
+	mutex_lock(&mes->submission_lock);

	spin_lock_irqsave(&mes->ring_lock, flags);
	if (amdgpu_ring_alloc(ring, ndw)) {
		spin_unlock_irqrestore(&mes->ring_lock, flags);
+   mutex_unlock(&mes->submission_lock);
return -ENOMEM;
}
  
@@ -199,8 +204,13 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,

else
dev_dbg(adev->dev, "MES msg=%d was emitted\n", 
x_pkt->header.opcode);
  
-	r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq,

- timeout);
+   do {
+   r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq,
+   timeout);
+   retry_count--;
+   } while (retry_count > 0 && !amdgpu_in_reset(adev));
+
+   mutex_unlock(&mes->submission_lock);
if (r < 1) {
  
  		if (misc_op_str)




Re: [PATCH Review 1/1] drm/amdgpu: Support setting reset_method at runtime

2024-04-17 Thread Christian König

Am 12.04.24 um 08:21 schrieb Stanley.Yang:

Signed-off-by: Stanley.Yang 


You are missing a commit message; without it the patch will 
automatically be rejected when you try to push it.


With that added Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 80b9642f2bc4..5f5bf0c26b1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -915,7 +915,7 @@ module_param_named(freesync_video, 
amdgpu_freesync_vid_mode, uint, 0444);
   * GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 
= mode2, 4 = baco)
   */
  MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legacy, 
1 = mode0, 2 = mode1, 3 = mode2, 4 = baco/bamaco)");
-module_param_named(reset_method, amdgpu_reset_method, int, 0444);
+module_param_named(reset_method, amdgpu_reset_method, int, 0644);
  
  /**

   * DOC: bad_page_threshold (int) Bad page threshold is specifies the




Re: [PATCH v4 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-17 Thread Christian König




Am 17.04.24 um 10:18 schrieb Sunil Khatri:

Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   8 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 +-
  drivers/gpu/drm/amd/amdgpu/soc15.h|   2 +
  .../include/asic_reg/gc/gc_10_1_0_offset.h|  12 ++
  5 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e0d7f4ee7e16..210af65a744c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -139,6 +139,14 @@ enum amdgpu_ss {
AMDGPU_SS_DRV_UNLOAD
  };
  
+struct amdgpu_hwip_reg_entry {

+   u32 hwip;
+   u32 inst;
+   u32 seg;
+   u32 reg_offset;



+   char reg_name[50];


Make that a const char *. Otherwise it bloats up the final binary 
because the compiler has to add zeros at the end.
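
A quick illustration of the size difference (illustrative struct names):

	/* A fixed array embeds the full, zero-padded 50-byte buffer in
	 * every table entry; a pointer stores only an 8-byte address of
	 * a string kept once in .rodata. */
	struct entry_arr { u32 hwip, inst, seg, reg_offset; char reg_name[50]; };
	struct entry_ptr { u32 hwip, inst, seg, reg_offset; const char *reg_name; };
	/* sizeof(struct entry_arr) == 68 after padding;
	 * sizeof(struct entry_ptr) == 24 on 64-bit. */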



+};
+
  struct amdgpu_watchdog_timer {
bool timeout_fatal_disable;
uint32_t period; /* maxCycles = (1 << period), the number of cycles 
before a timeout */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 04a86dff71e6..64f197bbc866 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -433,6 +433,10 @@ struct amdgpu_gfx {
	uint32_t num_xcc_per_xcp;
	struct mutex partition_mutex;
	bool mcbp; /* mid command buffer preemption 
*/
+
+   /* IP reg dump */
+   uint32_t *ip_dump;
+   uint32_t reg_count;
  };
  
  struct amdgpu_gfx_ras_reg_entry {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a0bc4196ff8b..4a54161f4837 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin");
  MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin");
  MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin");
  
+static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = {

+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS3),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STALLED_STAT1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_ERROR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HPD_STATUS0),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_CMD_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_CMD_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_CMD_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_CMD_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BASE_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BASE_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BASE_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BASE_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BUFSZ),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCPF_UTCL1_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCPC_UTCL1_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCPG_UTCL1_STATUS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGDS_PROTECTION_FAULT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGDS_VM_PROTECTION_FAULT),
+   

Re: [PATCH v3 2/5] drm:amdgpu: Enable IH ring1 for IH v6.1

2024-04-17 Thread Christian König

Am 17.04.24 um 08:43 schrieb Friedrich Vock:

On 16.04.24 15:34, Sunil Khatri wrote:

We need IH ring1 for handling the page fault
interrupts which overflow the default
ring for specific use cases.

Signed-off-by: Sunil Khatri
---
  drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c

index b8da0fc29378..73dba180fabd 100644
--- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
@@ -550,8 +550,15 @@ static int ih_v6_1_sw_init(void *handle)
  adev->irq.ih.use_doorbell = true;
  adev->irq.ih.doorbell_index = adev->doorbell_index.ih << 1;

-    adev->irq.ih1.ring_size = 0;
-    adev->irq.ih2.ring_size = 0;
+    if (!(adev->flags & AMD_IS_APU)) {


Why restrict this to dGPUs? Page faults can overflow the default ring on
APUs too (e.g. for Vangogh).


Because APUs don't have the necessary hw. In other words they have no 
secondary IH ring buffer :(


But we are working on a fw fix for them and Navi 1x and 2x as well.

Regards,
Christian.



Regards,
Friedrich


+    r = amdgpu_ih_ring_init(adev, &adev->irq.ih1, IH_RING_SIZE,
+    use_bus_addr);
+    if (r)
+    return r;
+
+    adev->irq.ih1.use_doorbell = true;
+    adev->irq.ih1.doorbell_index = (adev->doorbell_index.ih + 1) 
<< 1;

+    }

  /* initialize ih control register offset */
  ih_v6_1_init_register_offset(adev);




Re: [PATCH v2] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-17 Thread Christian König

Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam:

Now we have two flags for contiguous VRAM buffer allocation.
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. Say if the
   allocation fails, we fall back to allocating the individual pages.

When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
   and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously. If
   the allocation fails, we return -ENOSPC.

v2:
   - keep the mem_flags and bo->flags check as is(Christian)
   - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
 amdgpu_bo_pin_restricted function placement range iteration
 loop(Christian)
   - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block
 (Christian)
   - Keep the kernel BO allocation as is(Christain)
   - If BO pin vram allocation failed, we need to return -ENOSPC as
 RDMA cannot work with scattered VRAM pages(Philip)

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  8 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++-
  2 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..caaef7b1df49 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -153,8 +153,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
  
-		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)

+   if (abo->tbo.type == ttm_bo_type_kernel &&
+   flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
c++;
}
  
@@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,

if (!bo->placements[i].lpfn ||
(lpfn && lpfn < bo->placements[i].lpfn))
bo->placements[i].lpfn = lpfn;
+
+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+   bo->placements[i].mem_type == TTM_PL_VRAM)
+   bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
}
  
	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);


Nice work, up till here that looks exactly right as far as I can see.


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..4be8b091099a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,29 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct 
list_head *head)
return size;
  }
  
+static inline unsigned long

+amdgpu_vram_mgr_calculate_pages_per_block(struct ttm_buffer_object *tbo,
+ const struct ttm_place *place,
+ unsigned long bo_flags)
+{
+   unsigned long pages_per_block;
+
+   if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
+   pages_per_block = ~0ul;


If I understand it correctly this here enforces the allocation of a 
contiguous buffer in the way that it says we should have only one giant 
page for the whole BO.



+   } else {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   pages_per_block = HPAGE_PMD_NR;
+#else
+   /* default to 2MB */
+   pages_per_block = 2UL << (20UL - PAGE_SHIFT);
+#endif
+   pages_per_block = max_t(uint32_t, pages_per_block,
+   tbo->page_alignment);
+   }
+
+   return pages_per_block;
+}
+
  /**
   * DOC: mem_info_vram_total
   *
@@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
u64 vis_usage = 0, max_bytes, min_block_size;
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
struct amdgpu_vram_mgr_resource *vres;
u64 size, remaining_size, lpfn, fpfn;
+   unsigned long bo_flags = bo->flags;
	struct drm_buddy *mm = &mgr->mm;
struct drm_buddy_block *block;
unsigned long pages_per_block;
@@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
i

Re: [PATCH 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-16 Thread Christian König

Am 16.04.24 um 15:55 schrieb Alex Deucher:

On Tue, Apr 16, 2024 at 8:08 AM Sunil Khatri  wrote:

Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  12 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 131 +-
  .../include/asic_reg/gc/gc_10_1_0_offset.h|  12 ++
  4 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e0d7f4ee7e16..e016ac33629d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -139,6 +139,18 @@ enum amdgpu_ss {
 AMDGPU_SS_DRV_UNLOAD
  };

+struct hwip_reg_entry {
+   u32 hwip;
+   u32 inst;
+   u32 seg;
+   u32 reg_offset;
+};
+
+struct reg_pair {
+   u32 offset;
+   u32 value;
+};
+
  struct amdgpu_watchdog_timer {
 bool timeout_fatal_disable;
 uint32_t period; /* maxCycles = (1 << period), the number of cycles 
before a timeout */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 04a86dff71e6..295a2c8d2e48 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -433,6 +433,10 @@ struct amdgpu_gfx {
 uint32_t num_xcc_per_xcp;
 struct mutex partition_mutex;
 bool mcbp; /* mid command buffer preemption 
*/
+
+   /* IP reg dump */
+   struct reg_pair *ip_dump;
+   uint32_t reg_count;
  };

  struct amdgpu_gfx_ras_reg_entry {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a0bc4196ff8b..46e136609ff1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin");
  MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin");
  MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin");

+static const struct hwip_reg_entry gc_reg_list_10_1[] = {
+   { SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS2) },
+   { SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS3) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT1) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT2) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_STALLED_STAT1) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_BUSY_STAT) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_BUSY_STAT) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_BUSY_STAT) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_BUSY_STAT2) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_BUSY_STAT2) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_GFX_ERROR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_GFX_HPD_STATUS0) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB_BASE) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB_RPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB_WPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_BASE) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_RPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_WPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_BASE) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_RPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_WPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_BASE) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_WPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_WPTR) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_CMD_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_CMD_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_CMD_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_CMD_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BASE_LO) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BASE_HI) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BASE_LO) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BASE_HI) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BASE_LO) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BASE_HI) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BASE_LO) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BASE_HI) },
+   { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BUFSZ) },
+   { SOC15_REG_ENTRY(GC, 0, mmCPF_UTCL1_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmCPC_UTCL1_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmCPG_UTCL1_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmGDS_PROTECTION_FAULT) },
+   { SOC15_REG_ENTRY(GC, 0, mmGDS_VM_PROTECTION_FAULT) },
+   { SOC15_REG_ENTRY(GC, 0, mmIA_UTCL1_STATUS) },
+   { SOC15_REG_ENTRY(GC, 0, mmIA_UTCL1_STATUS_2) },
+   { 

Re: [PATCH v3 1/5] drm:amdgpu: enable IH RB ring1 for IH v6.0

2024-04-16 Thread Christian König

Am 16.04.24 um 15:34 schrieb Sunil Khatri:

We need IH ring1 for handling the page fault
interrupts which are overflowing the default
ring for specific use cases.

Signed-off-by: Sunil Khatri 


Reviewed-by: Christian König  for the entire 
series.



---
  drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
index ad4ad39f128f..26dc99232eb6 100644
--- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
@@ -549,8 +549,15 @@ static int ih_v6_0_sw_init(void *handle)
adev->irq.ih.use_doorbell = true;
adev->irq.ih.doorbell_index = adev->doorbell_index.ih << 1;
  
-	adev->irq.ih1.ring_size = 0;

-   adev->irq.ih2.ring_size = 0;
+   if (!(adev->flags & AMD_IS_APU)) {
+   r = amdgpu_ih_ring_init(adev, &adev->irq.ih1, IH_RING_SIZE,
+   use_bus_addr);
+   if (r)
+   return r;
+
+   adev->irq.ih1.use_doorbell = true;
+   adev->irq.ih1.doorbell_index = (adev->doorbell_index.ih + 1) << 
1;
+   }
  
  	/* initialize ih control register offset */

ih_v6_0_init_register_offset(adev);




Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 16.04.24 um 14:34 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 5:47 PM, Christian König wrote:

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would lead to abnormal 
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number 
after allocation.


It could be that we also have situations where the correct start 
value is 0x or something like that instead.


Why does this matter?
When the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL, which creates a new fence having 
the wptr as the seq number. In the
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr values (rptr >= wptr).
Since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we drop the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we don't see this problem.


Yeah, but that doesn't belong in the seq64 handling.

Instead the code which allocates the seq64 during userqueue creation 
needs to clear it to 0.

sure, got it.


Yeah, but fixing that aside, we should probably initialize the seq64 
array to something like 0xdeadbeef or a similar pattern to find issues 
where we forget to initialize the allocated slots.
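
A minimal sketch of that idea (SEQ64_POISON is a made-up name; the
fields are the ones from the patch quoted below):

	#define SEQ64_POISON	0xdeadbeefdeadbeefULL	/* made-up name */

	/* Poison every slot once at init; a slot that still contains the
	 * pattern when used means its initialization was forgotten. */
	static void amdgpu_seq64_poison(struct amdgpu_device *adev)
	{
		unsigned int i;

		for (i = 0; i < adev->seq64.num_sem; i++)
			adev->seq64.cpu_base_addr[i] = SEQ64_POISON;
	}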


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / 
sizeof(u64);

-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**












Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would lead to abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations where the correct start value 
is 0x or something like that instead.


Why does this matter?
When the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL, which creates a new fence having 
the wptr as the seq number. In the
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr values (rptr >= wptr).
Since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we drop the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we don't see this problem.


Yeah, but that doesn't belong in the seq64 handling.

Instead the code which allocates the seq64 during userqueue creation 
needs to clear it to 0.
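
A sketch of that, using the amdgpu_seq64_alloc() signature visible in
the patch quoted below (the userqueue call site is assumed):

	u64 va;
	u64 *cpu_addr;
	int r;

	r = amdgpu_seq64_alloc(adev, &va, &cpu_addr);
	if (r)
		return r;

	/* Start the new user queue from a known seqno instead of
	 * whatever the previous owner of this slot left behind. */
	*cpu_addr = 0;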


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);
-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**








Re: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations

2024-04-16 Thread Christian König

Looks valid to me off hand, but it's really Felix who needs to judge this.

On the other hand if it blocks any CI feel free to add my acked-by and 
submit it.


Christian.

Am 16.04.24 um 04:05 schrieb Yu, Lang:

[Public]

ping


-Original Message-
From: Yu, Lang 
Sent: Thursday, April 11, 2024 4:11 PM
To: amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Kuehling, Felix
; Yu, Lang 
Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating
operations

When page table BOs were evicted but not validated before updating page
tables, the VM is still in an evicting state, amdgpu_vm_update_range returns -EBUSY
and restore_process_worker runs into a dead loop.

v2: Split the BO validation and page table update into two separate loops in
amdgpu_amdkfd_restore_process_bos. (Felix)
  1.Validate BOs
  2.Validate VM (and DMABuf attachments)
  3.Update page tables for the BOs validated above

Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in
compute VMs")

Signed-off-by: Lang Yu 
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 34 +++

1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..e2c9e6ddb1d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2900,13 +2900,12 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *

   amdgpu_sync_create(&sync_obj);

-  /* Validate BOs and map them to GPUVM (update VM page tables).
*/
+  /* Validate BOs managed by KFD */
   list_for_each_entry(mem, &process_info->kfd_bo_list,
   validate_list) {

   struct amdgpu_bo *bo = mem->bo;
   uint32_t domain = mem->domain;
-  struct kfd_mem_attachment *attachment;
   struct dma_resv_iter cursor;
   struct dma_fence *fence;

@@ -2931,6 +2930,25 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *
   goto validate_map_fail;
   }
   }
+  }
+
+  if (failed_size)
+  pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
+
+  /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
+   * validations above would invalidate DMABuf imports again.
+   */
+  ret = process_validate_vms(process_info, &exec.ticket);
+  if (ret) {
+  pr_debug("Validating VMs failed, ret: %d\n", ret);
+  goto validate_map_fail;
+  }
+
+  /* Update mappings managed by KFD. */
+  list_for_each_entry(mem, &process_info->kfd_bo_list,
+  validate_list) {
+  struct kfd_mem_attachment *attachment;
+
   list_for_each_entry(attachment, >attachments, list) {
   if (!attachment->is_mapped)
   continue;
@@ -2947,18 +2965,6 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *
   }
   }

-  if (failed_size)
-  pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
-
-  /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
-   * validations above would invalidate DMABuf imports again.
-   */
-  ret = process_validate_vms(process_info, &exec.ticket);
-  if (ret) {
-  pr_debug("Validating VMs failed, ret: %d\n", ret);
-  goto validate_map_fail;
-  }
-
   /* Update mappings not managed by KFD */
   list_for_each_entry(peer_vm, &process_info->vm_list_head,
   vm_list_node) {
--
2.25.1




Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would lead to abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations where the correct start value is 
0x or something like that instead.


Why does this matter?

Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 
*va, u64 **cpu_addr)
  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
unsigned long bit_pos;
+   u64 *cpu_addr;
  
  	bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);

-   if (bit_pos < adev->seq64.num_sem)
+   if (bit_pos < adev->seq64.num_sem) {
+   cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+   memset(cpu_addr, 0, sizeof(u64));
__clear_bit(bit_pos, adev->seq64.used);
+   }
  }
  
  /**




Re: [PATCH V2] drm/amdgpu: Fix incorrect return value

2024-04-16 Thread Christian König

Well multiple things to consider here.

This is clearly not called from the interrupt, otherwise locking a mutex 
would be illegal. So the question is who is calling this? And can the 
function be called from different threads at the same time?


As far as I can see you don't take that into account in the patch.

When there is some kind of single threaded worker handling the RAS 
errors instead then I strongly suggest to solve this issue in the worker.


As far as I can see hacking around a broken caller by inserting 
amdgpu_vram_mgr_query_page_status() inside 
amdgpu_vram_mgr_reserve_range() is an absolute no-go. That is really 
bad coding style.


What could be is that the VRAM manager needs to be able to provide 
atomic uniqueness for the reserved addresses, e.g. that 
amdgpu_vram_mgr_reserve_range() can be called multiple times with the same 
address from multiple threads, but then we would need a different data 
structure than a linked list which is protected by a mutex.
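
One possible shape of that, sketched with an xarray keyed by the
reservation start (an illustration only, not a reviewed design):

	/* xa_insert() fails with -EBUSY when the index is already
	 * present, which gives atomic "reserve exactly once" semantics
	 * without walking a mutex-protected list. */
	static int reserve_once(struct xarray *reservations, u64 start,
				struct amdgpu_vram_reservation *rsv)
	{
		int ret = xa_insert(reservations, start >> PAGE_SHIFT,
				    rsv, GFP_KERNEL);

		if (ret == -EBUSY)	/* address already reserved */
			return 0;
		return ret;
	}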


Regards,
Christian.

Am 15.04.24 um 06:04 schrieb Chai, Thomas:

[AMD Official Use Only - General]

Hi Christian:
If an ecc error occurs at an address, HW will generate an interrupt to SW 
to retire all pages located in the same physical row as the error address based 
on the physical characteristics of the memory device.
Therefore, if other pages located on the same physical row as the error 
address also encounter ECC errors later, HW will also generate multiple interrupts 
to SW to retire these same pages again, so that amdgpu_vram_mgr_reserve_range 
will be called multiple times to reserve the same pages.

 I think it's more appropriate to do the status check inside the function. 
If the function entry is not checked, people who are not familiar with this 
part of the code can easily make mistakes when calling the function.
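
(For comparison, the caller-side variant of the check would be just a
couple of lines in the RAS worker -- a sketch:)

	/* Before asking for a reservation: */
	if (amdgpu_vram_mgr_query_page_status(mgr, start) != -ENOENT)
		return 0;	/* already reserved or still pending */

	return amdgpu_vram_mgr_reserve_range(mgr, start, size);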


-
Best Regards,
Thomas

-Original Message-
From: Christian König 
Sent: Friday, April 12, 2024 5:24 PM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; Zhou1, Tao 
; Li, Candice ; Wang, Yang(Kevin) 
; Yang, Stanley 
Subject: Re: [PATCH V2] drm/amdgpu: Fix incorrect return value

Am 12.04.24 um 10:55 schrieb YiPeng Chai:

[Why]
After calling amdgpu_vram_mgr_reserve_range multiple times with the
same address, calling amdgpu_vram_mgr_query_page_status will always
return -EBUSY.
From the second call to amdgpu_vram_mgr_reserve_range, the same
address will be added to the reservations_pending list again and is
never moved to the reserved_pages list because the address had been
reserved.

Well just to make it clear that approach is a NAK until my concerns are solved.

Regards,
Christian.


[How]
First add the address status check before calling
amdgpu_vram_mgr_do_reserve, if the address is already reserved, do
nothing; If the address is already in the reservations_pending list,
directly reserve memory; only add new nodes for the addresses that are
not in the reserved_pages list and reservations_pending list.

V2:
   Avoid repeated locking/unlocking.

Signed-off-by: YiPeng Chai 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +---
   1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 1e36c428d254..a636d3f650b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct
ttm_resource_manager *man)

   dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
   rsv->start, rsv->size);
-
   vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
   atomic64_add(vis_usage, &mgr->vis_usage);
   spin_lock(&man->bdev->lru_lock);
@@ -340,19 +339,27 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr 
*mgr,
 uint64_t start, uint64_t size)
   {
   struct amdgpu_vram_reservation *rsv;
+ int ret = 0;

- rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
- if (!rsv)
- return -ENOMEM;
+ ret = amdgpu_vram_mgr_query_page_status(mgr, start);
+ if (!ret)
+ return 0;

- INIT_LIST_HEAD(&rsv->allocated);
- INIT_LIST_HEAD(&rsv->blocks);
+ if (ret == -ENOENT) {
+ rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
+ if (!rsv)
+ return -ENOMEM;

- rsv->start = start;
- rsv->size = size;
+ INIT_LIST_HEAD(&rsv->allocated);
+ INIT_LIST_HEAD(&rsv->blocks);
+
+ rsv->start = start;
+ rsv->size = size;
+ }

   mutex_lock(&mgr->lock);
- list_add_tail(&rsv->blocks, &mgr->reservations_pending);
+ if (ret == -ENOENT)
+ list_add_tail(&rsv->blocks, &mgr->reservations_pending);
   amdgpu_vram_mgr_do_reserve(&mgr->manager);
   mutex_unlock(&mgr->lock);





Re: [PATCH] drm/ttm: Make ttm shrinkers NUMA aware

2024-04-16 Thread Christian König

Am 08.04.24 um 19:49 schrieb Rajneesh Bhardwaj:

Otherwise the nid is always passed as 0 during memory reclaim so
make TTM shrinkers NUMA aware.

Signed-off-by: Rajneesh Bhardwaj 
---
  drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index dbc96984d331..514261f44b78 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -794,7 +794,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
			    &ttm_pool_debugfs_shrink_fops);
  #endif
  
-	mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");

+   mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");


Well that won't do it.

Setting the flag is just one step, but both ttm_pool_shrinker_scan() and 
ttm_pool_type_count() needs to be made NUMA aware.


This means that allocated_pages needs to become a per NID array and 
ttm_pool_shrink() should not shrink any pool but only those with the 
correct nid (if the nid is set).
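
Roughly, the scan side would then have to honor sc->nid, e.g. (sketch,
assuming allocated_pages becomes a per-node counter and ttm_pool_shrink()
gains a nid parameter -- neither exists today):

	static unsigned long
	ttm_pool_shrinker_scan(struct shrinker *shrink,
			       struct shrink_control *sc)
	{
		unsigned long num_freed = 0;

		/* Only shrink pools living on the node being reclaimed. */
		do {
			num_freed += ttm_pool_shrink(sc->nid);
		} while (num_freed < sc->nr_to_scan &&
			 atomic_long_read(&allocated_pages[sc->nid]));

		return num_freed ?: SHRINK_STOP;
	}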


Regards,
Christian.


if (!mm_shrinker)
return -ENOMEM;
  




Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-16 Thread Christian König

Am 16.04.24 um 00:02 schrieb Philip Yang:

On 2024-04-14 10:57, Arunpravin Paneer Selvam wrote:

Now we have two flags for contiguous VRAM buffer allocation.
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.
This change will simplify the KFD best effort contiguous VRAM 
allocation, because KFD doesn't need to set a new GEM_ flag.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.


AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS is used in a couple of places. For page 
table BOs, it is fine as the BO size is the 4K page size. For 64KB reserved BOs 
and F/W size related BOs, does all allocation happen at driver 
initialization before the VRAM is fragmented?




Oh, that's a really good point, we need to keep the behavior as is for 
kernel allocations. Arun can you take care of that?


Thanks,
Christian.


- we will try to allocate the contiguous buffer. Say if the
   allocation fails, we fall back to allocating the individual pages.

When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
   and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously. If
   the allocation fails, we return -ENOSPC.

Signed-off-by: Arunpravin Paneer Selvam
Suggested-by: Christian König
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 14 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++-
  2 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..41926d631563 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -153,8 +153,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
  
-		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)

-   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
c++;
}
  
@@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,

  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = { false, false };
+   struct ttm_place *places = bo->placements;
+   u32 c = 0;
int r, i;
  
  	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))

@@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  
  	if (bo->tbo.pin_count) {

uint32_t mem_type = bo->tbo.resource->mem_type;
-   uint32_t mem_flags = bo->tbo.resource->placement;
  
  		if (!(domain & amdgpu_mem_type_to_domain(mem_type)))

return -EINVAL;
  
-		if ((mem_type == TTM_PL_VRAM) &&

-   (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) &&
-   !(mem_flags & TTM_PL_FLAG_CONTIGUOUS))
-   return -EINVAL;
-
This looks like a bug before, but with this patch, the check makes 
sense and is needed.

	ttm_bo_pin(&bo->tbo);
  
  		if (max_offset != 0) {

@@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
bo->placements[i].lpfn = lpfn;
}
  
+	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&

+   !WARN_ON(places[c].mem_type != TTM_PL_VRAM))
+   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+


If the pinned BO is not allocated with AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, 
we should pin and return scattered pages because RDMA supports 
scattered dmabuf. Christian also pointed this out.


if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
    bo->placements[i].mem_type == TTM_PL_VRAM)
        bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;

	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (unlikely(r)) {
dev_err(adev->dev, "%p pin failed\n", bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..ddbf302878f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,30 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct 
list_head *head)
return size;
  }
  
+static inline unsigned long

+amdgpu_vram_find_pages_per_block(struct ttm_buffer_object *tbo,
+const struct ttm_place *place,
+unsigned long bo_flags)
+{
+   unsigned long pages_per_block;
+
+   if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS ||
+   place->flags & TTM_PL_FLAG_CONTIGUOUS

Re: [PATCH] drm/radeon: make -fstrict-flex-arrays=3 happy

2024-04-15 Thread Christian König

Am 15.04.24 um 15:38 schrieb Alex Deucher:

The driver parses a union where the layout up through the first
array is the same, however, the array has different sizes
depending on the elements in the union.  Be explicit to
fix the UBSAN checker.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3323
Fixes: df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3")
Signed-off-by: Alex Deucher 
Cc: Kees Cook 


Acked-by: Christian König 

But I have a bad feeling messing with that old code.

Regards,
Christian.


---
  drivers/gpu/drm/radeon/radeon_atombios.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c 
b/drivers/gpu/drm/radeon/radeon_atombios.c
index bb1f0a3371ab5..10793a433bf58 100644
--- a/drivers/gpu/drm/radeon/radeon_atombios.c
+++ b/drivers/gpu/drm/radeon/radeon_atombios.c
@@ -923,8 +923,12 @@ bool 
radeon_get_atom_connector_info_from_supported_devices_table(struct
max_device = ATOM_MAX_SUPPORTED_DEVICE_INFO;
  
  	for (i = 0; i < max_device; i++) {

-   ATOM_CONNECTOR_INFO_I2C ci =
-   supported_devices->info.asConnInfo[i];
+   ATOM_CONNECTOR_INFO_I2C ci;
+
+   if (frev > 1)
+   ci = supported_devices->info_2d1.asConnInfo[i];
+   else
+   ci = supported_devices->info.asConnInfo[i];
  
  		bios_connectors[i].valid = false;
  




Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-15 Thread Christian König

Am 14.04.24 um 16:57 schrieb Arunpravin Paneer Selvam:

Now we have two flags for contiguous VRAM buffer allocation.
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. Say if the
   allocation fails, we fall back to allocating the individual pages.

When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
   and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously. If
   the allocation fails, we return -ENOSPC.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 14 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++-
  2 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..41926d631563 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -153,8 +153,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
  
-		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)

-   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
c++;
}
  
@@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,

  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = { false, false };
+   struct ttm_place *places = bo->placements;
+   u32 c = 0;
int r, i;
  
  	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))

@@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  
  	if (bo->tbo.pin_count) {

uint32_t mem_type = bo->tbo.resource->mem_type;
-   uint32_t mem_flags = bo->tbo.resource->placement;
  
  		if (!(domain & amdgpu_mem_type_to_domain(mem_type)))

return -EINVAL;
  
-		if ((mem_type == TTM_PL_VRAM) &&

-   (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) &&
-   !(mem_flags & TTM_PL_FLAG_CONTIGUOUS))
-   return -EINVAL;
-


I think that check here needs to stay.


		ttm_bo_pin(&bo->tbo);
  
  		if (max_offset != 0) {

@@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
bo->placements[i].lpfn = lpfn;
}
  
+	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&

+   !WARN_ON(places[c].mem_type != TTM_PL_VRAM))
+   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+


This needs to go into the loop directly above as something like this here:

if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
    bo->placements[i].mem_type == TTM_PL_VRAM)
        bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;

This essentially replaces the removed code in 
amdgpu_bo_placement_from_domain() and only applies it during pinning.



		r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (unlikely(r)) {
dev_err(adev->dev, "%p pin failed\n", bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..ddbf302878f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,30 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct 
list_head *head)
return size;
  }
  
+static inline unsigned long

+amdgpu_vram_find_pages_per_block(struct ttm_buffer_object *tbo,
+const struct ttm_place *place,
+unsigned long bo_flags)


Well I think we need a better name here. "find" usually means we look up 
something in a data structure. Maybe 
amdgpu_vram_mgr_calculate_pages_per_block.



+{
+   unsigned long pages_per_block;
+
+   if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS ||
+   place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+   pages_per_block = ~0ul;
+   } else {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   pages_per_block = HPAGE_PMD_NR;
+#else
+   /* default to 2MB */
+   pages_per_block = 2UL << (20UL - PAGE_SHIFT);
+#endif
+   pages_per_block = max_t(uint32_t, pages_per_block,
+   tbo->page_alignment);
+   }
+
+   return p

Re: [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-15 Thread Christian König

Am 12.04.24 um 22:12 schrieb Philip Yang:

An RDMA device with limited scatter-gather capability requires a physical
address contiguous VRAM buffer for RDMA peer direct access.

Add a new KFD alloc memory flag and store it as a new GEM bo alloc flag. When
pinning this buffer object to export for RDMA peerdirect access, set the
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag, and then vram_mgr will set the
TTM_PL_FLAG_CONTIGUOUS flag to ask the VRAM buddy allocator to get
contiguous VRAM.

Remove the 2GB max memory block size limit for contiguous allocation.


I'm going to sync up with Arun on this once more, but I think we won't 
even need the new flag.


We will just downgrade the existing flag to be a best effort allocation 
for contiguous buffers and only use the TTM flag internally to signal 
that we need to alter it while pinning.


Regards,
Christian.



Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 +++--
  include/uapi/drm/amdgpu_drm.h| 5 +
  include/uapi/linux/kfd_ioctl.h   | 1 +
  4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..3523b91f8add 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1470,6 +1470,9 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo 
*bo, u32 domain)
if (unlikely(ret))
return ret;
  
+	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT)

+   bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
+
ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
if (ret)
pr_err("Error in Pinning BO to domain: %d\n", domain);
@@ -1712,6 +1715,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) 
?
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
+
+   /* For contiguous VRAM allocation */
+   if (flags & 
KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
+   alloc_flags |= 
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT;
}
xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
0 : fpriv->xcp_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..1d6e45e238e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -516,8 +516,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  
  		BUG_ON(min_block_size < mm->chunk_size);
  
-		/* Limit maximum size to 2GiB due to SG table limitations */

-   size = min(remaining_size, 2ULL << 30);
+   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
+   size = remaining_size;
+   else
+   /* Limit maximum size to 2GiB due to SG table 
limitations
+* for no contiguous allocation.
+*/
+   size = min(remaining_size, 2ULL << 30);
  
  		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&

!(size & (((u64)pages_per_block << PAGE_SHIFT) 
- 1)))
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index ad21c613fec8..13645abb8e46 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -171,6 +171,11 @@ extern "C" {
   * may override the MTYPE selected in AMDGPU_VA_OP_MAP.
   */
  #define AMDGPU_GEM_CREATE_EXT_COHERENT(1 << 15)
+/* Flag that allocating the BO with best effort for contiguous VRAM.
+ * If no contiguous VRAM, fallback to scattered allocation.
+ * Pin the BO for peerdirect RDMA trigger VRAM defragmentation.
+ */
+#define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT  (1 << 16)
  
  struct drm_amdgpu_gem_create_in  {

/** the requested memory size */
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 2040a470ddb4..c1394c162d4e 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
  #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT  (1 << 26)
  #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED  (1 << 25)
  #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT  (1 << 24)
+#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1 << 23)
  
  /* Allocate memory for later SVM (shared virtual memory) mapping.

   *




Re: [PATCH V2] drm/amdgpu: Fix incorrect return value

2024-04-12 Thread Christian König

Am 12.04.24 um 10:55 schrieb YiPeng Chai:

[Why]
   After calling amdgpu_vram_mgr_reserve_range
multiple times with the same address, calling
amdgpu_vram_mgr_query_page_status will always
return -EBUSY.
   From the second call to amdgpu_vram_mgr_reserve_range
onwards, the same address is added to the reservations_pending
list again and never moved to the reserved_pages list,
because the address has already been reserved.


Well just to make it clear that approach is a NAK until my concerns are 
solved.


Regards,
Christian.



[How]
   First add an address status check before calling
amdgpu_vram_mgr_do_reserve: if the address is already
reserved, do nothing; if the address is already in the
reservations_pending list, directly reserve the memory;
only add new nodes for addresses that are in neither the
reserved_pages list nor the reservations_pending list.

V2:
  Avoid repeated locking/unlocking.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +---
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 1e36c428d254..a636d3f650b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct 
ttm_resource_manager *man)
  
 		dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
 			rsv->start, rsv->size);
-
 		vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
 		atomic64_add(vis_usage, &mgr->vis_usage);
 		spin_lock(&man->bdev->lru_lock);
@@ -340,19 +339,27 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr 
*mgr,
  uint64_t start, uint64_t size)
  {
struct amdgpu_vram_reservation *rsv;
+   int ret = 0;
  
-	rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
-	if (!rsv)
-		return -ENOMEM;
+	ret = amdgpu_vram_mgr_query_page_status(mgr, start);
+	if (!ret)
+		return 0;
  
-	INIT_LIST_HEAD(&rsv->allocated);
-	INIT_LIST_HEAD(&rsv->blocks);
+	if (ret == -ENOENT) {
+		rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
+		if (!rsv)
+			return -ENOMEM;
 
-	rsv->start = start;
-	rsv->size = size;
+		INIT_LIST_HEAD(&rsv->allocated);
+		INIT_LIST_HEAD(&rsv->blocks);
+
+		rsv->start = start;
+		rsv->size = size;
+	}
  
 	mutex_lock(&mgr->lock);
-	list_add_tail(&rsv->blocks, &mgr->reservations_pending);
+	if (ret == -ENOENT)
+		list_add_tail(&rsv->blocks, &mgr->reservations_pending);
 	amdgpu_vram_mgr_do_reserve(&mgr->manager);
 	mutex_unlock(&mgr->lock);
  




Re: [PATCH] drm/amdgpu: Fix incorrect return value

2024-04-12 Thread Christian König

On 03.04.24 at 09:06, YiPeng Chai wrote:

[Why]
   After calling amdgpu_vram_mgr_reserve_range
multiple times with the same address, calling
amdgpu_vram_mgr_query_page_status will always
return -EBUSY.
   From the second call to amdgpu_vram_mgr_reserve_range
onwards, the same address is added to the reservations_pending
list again and never moved to the reserved_pages list,
because the address has already been reserved.


Well that sounds like a really bad idea to me. Why is the function 
called multiple times with the same address in the first place ?


Apart from that a note on the coding style below.



[How]
   First add an address status check before calling
amdgpu_vram_mgr_do_reserve: if the address is already
reserved, do nothing; if the address is already in the
reservations_pending list, directly reserve the memory;
only add new nodes for addresses that are in neither the
reserved_pages list nor the reservations_pending list (a
sketch of the intended flow follows below).
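
A hedged sketch of that intended idempotent flow (the switch is
illustrative only; amdgpu_vram_mgr_query_page_status() and the list names
are taken from amdgpu_vram_mgr.c):

/* Hypothetical caller view of the intended semantics: */
switch (amdgpu_vram_mgr_query_page_status(mgr, start)) {
case 0:		/* already on reserved_pages: reserving again is a no-op */
	return 0;
case -EBUSY:	/* already on reservations_pending: just retry the reserve */
	break;
case -ENOENT:	/* unknown address: allocate a node and queue it first */
	break;
}
amdgpu_vram_mgr_do_reserve(&mgr->manager);	/* called under mgr->lock */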

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 +---
  1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 1e36c428d254..0bf3f4092900 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct 
ttm_resource_manager *man)
  
 		dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
 			rsv->start, rsv->size);
-
 		vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
 		atomic64_add(vis_usage, &mgr->vis_usage);
 		spin_lock(&man->bdev->lru_lock);
@@ -340,19 +339,30 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr 
*mgr,
  uint64_t start, uint64_t size)
  {
struct amdgpu_vram_reservation *rsv;
+   int ret = 0;


Don't initialize local variables when it isn't necessary.

  
-	rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
-	if (!rsv)
-		return -ENOMEM;
+   ret = amdgpu_vram_mgr_query_page_status(mgr, start);
+   if (!ret)
+   return 0;
+
+   if (ret == -ENOENT) {
+   rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
+   if (!rsv)
+   return -ENOMEM;
  
-	INIT_LIST_HEAD(&rsv->allocated);
-	INIT_LIST_HEAD(&rsv->blocks);
+		INIT_LIST_HEAD(&rsv->allocated);
+		INIT_LIST_HEAD(&rsv->blocks);
  
-	rsv->start = start;
-	rsv->size = size;
+		rsv->start = start;
+		rsv->size = size;
+
+		mutex_lock(&mgr->lock);
+		list_add_tail(&rsv->blocks, &mgr->reservations_pending);
+		mutex_unlock(&mgr->lock);
+
+	}


You should probably not lock/unlock here.

Regards,
Christian.

  
 	mutex_lock(&mgr->lock);
-	list_add_tail(&rsv->blocks, &mgr->reservations_pending);
 	amdgpu_vram_mgr_do_reserve(&mgr->manager);
 	mutex_unlock(&mgr->lock);
  




Re: [PATCH] drm/amdgpu: validate the parameters of bo mapping operations more clearly

2024-04-12 Thread Christian König

On 12.04.24 at 09:35, xinhui pan wrote:

Verify the parameters of
amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.

Reported-by: Vlad Stolyarov 
Suggested-by: Christian König 
Signed-off-by: xinhui pan 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 72 --
  1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..4e2391c83d7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1647,6 +1647,37 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
trace_amdgpu_vm_bo_map(bo_va, mapping);
  }
  
+/* Validate operation parameters to prevent potential abuse */

+static int amdgpu_vm_verify_parameters(struct amdgpu_device *adev,
+ struct amdgpu_bo *bo,
+ uint64_t saddr,
+ uint64_t offset,
+ uint64_t size)
+{
+   uint64_t tmp, lpfn;
+
+   if (saddr & AMDGPU_GPU_PAGE_MASK
+   || offset & AMDGPU_GPU_PAGE_MASK
+   || size & AMDGPU_GPU_PAGE_MASK)
+   return -EINVAL;
+
+	if (check_add_overflow(saddr, size, &tmp)
+	    || check_add_overflow(offset, size, &tmp)
+	    || size == 0 /* which also leads to end < begin */)
+		return -EINVAL;
+
+   /* make sure object fit at this offset */
+   if (bo && offset + size > amdgpu_bo_size(bo))
+   return -EINVAL;
+
+   /* Ensure last pfn not exceed max_pfn */
+   lpfn = (saddr + size - 1) >> AMDGPU_GPU_PAGE_SHIFT;
+   if (lpfn >= adev->vm_manager.max_pfn)
+   return -EINVAL;
+
+   return 0;
+}
+
  /**
   * amdgpu_vm_bo_map - map bo inside a vm
   *
@@ -1673,21 +1704,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
struct amdgpu_bo *bo = bo_va->base.bo;
struct amdgpu_vm *vm = bo_va->base.vm;
uint64_t eaddr;
+   int r;
  
-	/* validate the parameters */

-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
-   return -EINVAL;
-   if (saddr + size <= saddr || offset + size <= offset)
-   return -EINVAL;
-
-   /* make sure object fit at this offset */
-   eaddr = saddr + size - 1;
-   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
-   (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
-   return -EINVAL;
+   r = amdgpu_vm_verify_parameters(adev, bo, saddr, offset, size);
+   if (r)
+   return r;
  
  	saddr /= AMDGPU_GPU_PAGE_SIZE;

-   eaddr /= AMDGPU_GPU_PAGE_SIZE;
+   eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE;
  
 	tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr);

if (tmp) {
@@ -1740,17 +1764,9 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
uint64_t eaddr;
int r;
  
-	/* validate the parameters */

-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
-   return -EINVAL;
-   if (saddr + size <= saddr || offset + size <= offset)
-   return -EINVAL;
-
-   /* make sure object fit at this offset */
-   eaddr = saddr + size - 1;
-   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
-   (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
-   return -EINVAL;
+   r = amdgpu_vm_verify_parameters(adev, bo, saddr, offset, size);
+   if (r)
+   return r;
  
  	/* Allocate all the needed memory */

mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
@@ -1764,7 +1780,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
}
  
  	saddr /= AMDGPU_GPU_PAGE_SIZE;

-   eaddr /= AMDGPU_GPU_PAGE_SIZE;
+   eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE;
  
  	mapping->start = saddr;

mapping->last = eaddr;
@@ -1851,10 +1867,14 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
LIST_HEAD(removed);
uint64_t eaddr;
+   int r;
+
+   r = amdgpu_vm_verify_parameters(adev, NULL, saddr, 0, size);
+   if (r)
+   return r;
  
-	eaddr = saddr + size - 1;

saddr /= AMDGPU_GPU_PAGE_SIZE;
-   eaddr /= AMDGPU_GPU_PAGE_SIZE;
+   eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE;
  
  	/* Allocate all the needed memory */

before = kzalloc(sizeof(*before), GFP_KERNEL);
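
To make the new checks concrete, a few hypothetical outcomes, assuming
AMDGPU_GPU_PAGE_SIZE == 4096 and a bo of at least 0x3000 bytes
(illustration only, not part of the patch):

r = amdgpu_vm_verify_parameters(adev, bo, 0x1000, 0, 0x2000);	/* 0: aligned, fits */
r = amdgpu_vm_verify_parameters(adev, bo, 0x1001, 0, 0x2000);	/* -EINVAL: unaligned saddr */
r = amdgpu_vm_verify_parameters(adev, bo, 0x1000, 0, 0);	/* -EINVAL: zero size */
r = amdgpu_vm_verify_parameters(adev, NULL, -0x1000ULL, 0, 0x2000);	/* -EINVAL: saddr + size overflows */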




Re: [PATCH] drm/amdgpu: validate the parameters of bo mapping operations more clearly

2024-04-12 Thread Christian König

On 12.04.24 at 08:47, xinhui pan wrote:

Verify the parameters of
amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.

Reported-by: Vlad Stolyarov 
Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 63 --
  1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..ea9721666756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1647,6 +1647,37 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
trace_amdgpu_vm_bo_map(bo_va, mapping);
  }


Please add a one line comment here describing why we have the function.

E.g. "validate operation parameters to prevent potential abuse" or 
something like that.
  
+static int amdgpu_vm_bo_verify_parameters(struct amdgpu_device *adev,

+ struct amdgpu_bo *bo,
+ uint64_t saddr,
+ uint64_t offset,
+ uint64_t size)


Probably better to drop the _bo_ from the name cause this doesn't work 
on bo_va structures.



+{
+   size_t tmp, lpfn;


Extremely bad idea, size_t might only be 32bit. Please use uint64_t here 
as well.
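
A minimal illustration of the hazard (hypothetical values; on a 32-bit
kernel size_t is 32 bits wide and AMDGPU_GPU_PAGE_SHIFT is 12):

uint64_t saddr = 1ULL << 44;	/* a large but representable GPUVM address */
size_t lpfn = (saddr + 0x1000 - 1) >> AMDGPU_GPU_PAGE_SHIFT;
/* The shift yields 1ULL << 32, which truncates to 0 when stored in a
 * 32-bit size_t, so a later max_pfn comparison would wrongly pass;
 * with uint64_t the check rejects the range as intended.
 */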



+
+   if (saddr & AMDGPU_GPU_PAGE_MASK
+   || offset & AMDGPU_GPU_PAGE_MASK
+   || size & AMDGPU_GPU_PAGE_MASK)
+   return -EINVAL;
+
+   /* Check overflow */


That comment is a bit superfluous when check_add_overflow() is used. 
Maybe drop it.



+	if (check_add_overflow(saddr, size, &tmp)
+	    || check_add_overflow(offset, size, &tmp)
+	    || size == 0 /* which also leads to end < begin */)
+		return -EINVAL;
+
+   /* make sure object fit at this offset */
+   if (bo && offset + size > amdgpu_bo_size(bo))
+   return -EINVAL;
+
+   /* Ensure last pfn not exceed max_pfn */
+   lpfn = (saddr + size - 1) >> AMDGPU_GPU_PAGE_SHIFT;
+   if (lpfn >= adev->vm_manager.max_pfn)
+   return -EINVAL;
+
+   return 0;
+}
+
  /**
   * amdgpu_vm_bo_map - map bo inside a vm
   *
@@ -1674,20 +1705,11 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
struct amdgpu_vm *vm = bo_va->base.vm;
uint64_t eaddr;
  
-	/* validate the parameters */

-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
-   return -EINVAL;
-   if (saddr + size <= saddr || offset + size <= offset)
-   return -EINVAL;
-
-   /* make sure object fit at this offset */
-   eaddr = saddr + size - 1;
-   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
-   (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
+   if (amdgpu_vm_bo_verify_parameters(adev, bo, saddr, offset, size))
return -EINVAL;


Probably better to return the return value of 
amdgpu_vm_bo_verify_parameters().


  
+	eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE;

saddr /= AMDGPU_GPU_PAGE_SIZE;
-   eaddr /= AMDGPU_GPU_PAGE_SIZE;


Please keep the saddr, eaddr calculation order.

Apart from those nit picks looks really good to me.

Regards,
Christian.

  
 	tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr);

if (tmp) {
@@ -1740,16 +1762,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
uint64_t eaddr;
int r;
  
-	/* validate the parameters */

-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
-   return -EINVAL;
-   if (saddr + size <= saddr || offset + size <= offset)
-   return -EINVAL;
-
-   /* make sure object fit at this offset */
-   eaddr = saddr + size - 1;
-   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
-   (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
+   if (amdgpu_vm_bo_verify_parameters(adev, bo, saddr, offset, size))
return -EINVAL;
  
  	/* Allocate all the needed memory */

@@ -1763,8 +1776,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
return r;
}
  
+	eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE;

saddr /= AMDGPU_GPU_PAGE_SIZE;
-   eaddr /= AMDGPU_GPU_PAGE_SIZE;
  
  	mapping->start = saddr;

mapping->last = eaddr;
@@ -1852,9 +1865,11 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
LIST_HEAD(removed);
uint64_t eaddr;
  
-	eaddr = saddr + size - 1;

+   if (amdgpu_vm_bo_verify_parameters(adev, NULL, saddr, 0, size))
+   return -EINVAL;
+
+   eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE;
saddr /= AMDGPU_GPU_PAGE_SIZE;
-   eaddr /= AMDGPU_GPU_PAGE_SIZE;
  
  	/* Allocate all the needed memory */

before = kzalloc(sizeof(*before), GFP_KERNEL);




Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings

2024-04-11 Thread Christian König

On 11.04.24 at 17:44, Jann Horn wrote:

On Thu, Apr 11, 2024 at 12:25 PM Christian König
 wrote:

On 11.04.24 at 05:28, xinhui pan wrote:

Ensure there is no address overlapping.

Reported-by: Vlad Stolyarov 
Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++
   1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..f1315a854192 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
   LIST_HEAD(removed);
   uint64_t eaddr;

+ /* validate the parameters */
+ if (saddr & ~PAGE_MASK || size & ~PAGE_MASK)
+ return -EINVAL;

Well as general rule: *never* use PAGE_MASK and other PAGE_* macros
here. This is GPUVM and not related to the CPUVM space.


+ if (saddr + size <= saddr)
+ return -EINVAL;
+

Mhm, so basically size is not checked for a wraparound?

Yeah, exactly.


   eaddr = saddr + size - 1;
   saddr /= AMDGPU_GPU_PAGE_SIZE;
   eaddr /= AMDGPU_GPU_PAGE_SIZE;

If that's the case then I would rather check for saddr < eaddr here.

FWIW, it would probably a good idea to keep the added check analogous
to other functions called from amdgpu_gem_va_ioctl() like
amdgpu_vm_bo_replace_map(), which also checks "if (saddr + size <=
saddr || offset + size <= offset)" before the division.


I would change that function as well.

The eaddr needs to be checked against the max_pfn as well and we 
currently shift that around for this check which looks quite ugly.


Only the overflow check can probably be before it.




But that actually shouldn't matter since this code here:

  /* Now gather all removed mappings */
  tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr);
  while (tmp) {

Then shouldn't return anything, so the operation is basically a NO-OP then.

That's not how it works; the interval tree is not designed to be fed
bogus ranges that end before they start. (Or at least I don't think it
is - if it is, it is buggy.) I think basically if the specified start
and end addresses are both within an rbtree node, this rbtree node is
returned from the lookup, even if the specified end address is before
the specified start address.


Ah, yeah, that makes sense. The search function checks whether a node
partially intersects with start and end, not whether it is fully covered
by them.


Thanks,
Christian.



A more verbose example:
Let's assume the interval tree contains a single entry from address A
to address D.
Looking at the _iter_first implementation in interval_tree_generic.h,
when it is called with a start address C which is between A and D, and
an end address B (result of an addition that wraps around to an
address below C but above A), it does the following:

1. bails out if "node->ITSUBTREE < start" (meaning if the specified
start address C is after the range covered by the root node - which is
not the case)
2. bails out if "ITSTART(leftmost) > last" (meaning if the specified
end address is smaller than the entry start address A - which is not
the case)
3. enters _subtree_search. in there:
4. the root node has no children, so "node->ITRB.rb_left" is NULL
5. the specified end address B is after the node's start address A, so
"Cond1" is fulfilled
6. the specified start address C is before the node's end address D,
so "Cond2" is fulfilled
7. the root node is returned from the lookup
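
Condensed into code, the two conditions behave like this (a sketch of the
walkthrough above; the names loosely follow interval_tree_generic.h):

/* Cond1 && Cond2 from the _subtree_search step. */
static bool stabs(u64 node_start, u64 node_last, u64 start, u64 last)
{
	return node_start <= last &&	/* Cond1: A <= B */
	       start <= node_last;	/* Cond2: C <= D */
}

/* Node [A=0x1000, D=0x9000], query start C=0x5000, wrapped last B=0x2000:
 * stabs(0x1000, 0x9000, 0x5000, 0x2000) == true, so the bogus wrapped
 * range still returns the node, exactly as in step 7.
 */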




Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method

2024-04-11 Thread Christian König

On 11.04.24 at 13:49, Christian König wrote:

On 11.04.24 at 13:30, Yang, Stanley wrote:

[AMD Official Use Only - General]


-Original Message-
From: Christian König 
Sent: Thursday, April 11, 2024 7:17 PM
To: Yang, Stanley ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover 
method


On 11.04.24 at 13:11, Stanley.Yang wrote:

Don't modify the amdgpu gpu recover get operation; add an amdgpu gpu
recover set operation to select the reset method. Only mode1 and mode2
are currently supported.

Well I don't think setting this from userspace is valid.

The reset method to use is determined by the hardware and environment (e.g.
SRIOV, passthrough, whatever) and can't be chosen simply.
[Stanley]: Agree, the setting is invalid for devices that do not support
selecting the reset method; those devices still reset with the default
method. But it is valid for devices that do support selecting the reset
method: the user can conduct combination testing, e.g. a mode1 test
followed by a mode2 test, without re-modprobing the driver.


Well, the user could also shoot himself in the foot.

I really don't think that this is a valuable functionality.


To make clear what I mean: What you could do is to make the module 
parameter writeable.


In this case the hardware code still decides which reset method to use
based on the module parameter at the moment the reset is requested.


That would also avoid re-loading the driver.

Regards,
Christian.
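
Concretely, assuming the existing reset_method module parameter in
amdgpu_drv.c, the writable variant would only change the permission bits
(a sketch, untested):

-module_param_named(reset_method, amdgpu_reset_method, int, 0444);
+module_param_named(reset_method, amdgpu_reset_method, int, 0644);

The user could then write the desired method to
/sys/module/amdgpu/parameters/reset_method between test runs, and the
reset code would pick the value up the next time a reset is requested.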



Regards,
Christian.



Regards,
Stanley

Regards,
Christian.


Signed-off-by: Stanley.Yang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  3 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 37 +++---
   3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c62552bec34..c82976b2b977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1151,6 +1151,9 @@ struct amdgpu_device {
 bool    debug_largebar;
 bool debug_disable_soft_recovery;
 bool    debug_use_vram_fw_buf;
+
+   /* Used to set gpu reset method */
+   int recover_method;
   };

   static inline uint32_t amdgpu_ip_version(const struct amdgpu_device
*adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3204b8f6edeb..8411a793be18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device

*adev,

 else
 adev->asic_type = flags & AMD_ASIC_MASK;

+   adev->recover_method = AMD_RESET_METHOD_NONE;
 adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT;
 if (amdgpu_emu_mode == 1)
 adev->usec_timeout *= 10;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 10832b470448..e388a50d11d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val)
 return 0;
   }

+static int gpu_recover_set(void *data, u64 val)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)data;
+   struct drm_device *dev = adev_to_drm(adev);
+   int r;
+
+   /* TODO: support mode1 and mode2 currently */
+   if (val == AMD_RESET_METHOD_MODE1 ||
+   val == AMD_RESET_METHOD_MODE2)
+   adev->recover_method = val;
+   else
+   adev->recover_method = AMD_RESET_METHOD_NONE;
+
+   r = pm_runtime_get_sync(dev->dev);
+   if (r < 0) {
+   pm_runtime_put_autosuspend(dev->dev);
+   return 0;
+   }
+
+	if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
+		flush_work(&adev->reset_work);
+
+   pm_runtime_mark_last_busy(dev->dev);
+   pm_runtime_put_autosuspend(dev->dev);
+
+   return 0;
+}
+
   DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
-DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
-			 "%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get,
+			 gpu_recover_set, "%lld\n");

   static void amdgpu_debugfs_reset_work(struct work_struct *work)
   {
@@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct
work_struct *work)

 	memset(&reset_context, 0, sizeof(reset_context));

-   reset_context.method = AMD_RESET_METHOD_NONE;
+   reset_context.method = adev->recover_method;
 reset_context.reset_req_dev = adev;
 	set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
+   adev->recover_method = AMD_RESET_METHOD_NONE;

 	amdgpu_device_gpu_recover(adev, NULL, &reset_context);
   }
@@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct

amdgpu_device *adev)

 

Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method

2024-04-11 Thread Christian König

On 11.04.24 at 13:30, Yang, Stanley wrote:

[AMD Official Use Only - General]


-Original Message-
From: Christian König 
Sent: Thursday, April 11, 2024 7:17 PM
To: Yang, Stanley ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method

On 11.04.24 at 13:11, Stanley.Yang wrote:

Don't modify the amdgpu gpu recover get operation; add an amdgpu gpu
recover set operation to select the reset method. Only mode1 and mode2
are currently supported.

Well I don't think setting this from userspace is valid.

The reset method to use is determined by the hardware and environment (e.g.
SRIOV, passthrough, whatever) and can't be chosen simply.

[Stanley]: Agree, the setting is invalid for devices that do not support
selecting the reset method; those devices still reset with the default
method. But it is valid for devices that do support selecting the reset
method: the user can conduct combination testing, e.g. a mode1 test
followed by a mode2 test, without re-modprobing the driver.


Well, the user could also shoot himself in the foot.

I really don't think that this is a valuable functionality.

Regards,
Christian.



Regards,
Stanley

Regards,
Christian.


Signed-off-by: Stanley.Yang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 37 +++---
   3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c62552bec34..c82976b2b977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1151,6 +1151,9 @@ struct amdgpu_device {
 booldebug_largebar;
 booldebug_disable_soft_recovery;
 booldebug_use_vram_fw_buf;
+
+   /* Used to set gpu reset method */
+   int recover_method;
   };

   static inline uint32_t amdgpu_ip_version(const struct amdgpu_device
*adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3204b8f6edeb..8411a793be18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device

*adev,

 else
 adev->asic_type = flags & AMD_ASIC_MASK;

+   adev->recover_method = AMD_RESET_METHOD_NONE;
 adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT;
 if (amdgpu_emu_mode == 1)
 adev->usec_timeout *= 10;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 10832b470448..e388a50d11d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val)
 return 0;
   }

+static int gpu_recover_set(void *data, u64 val)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)data;
+   struct drm_device *dev = adev_to_drm(adev);
+   int r;
+
+   /* TODO: support mode1 and mode2 currently */
+   if (val == AMD_RESET_METHOD_MODE1 ||
+   val == AMD_RESET_METHOD_MODE2)
+   adev->recover_method = val;
+   else
+   adev->recover_method = AMD_RESET_METHOD_NONE;
+
+   r = pm_runtime_get_sync(dev->dev);
+   if (r < 0) {
+   pm_runtime_put_autosuspend(dev->dev);
+   return 0;
+   }
+
+	if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
+		flush_work(&adev->reset_work);
+
+   pm_runtime_mark_last_busy(dev->dev);
+   pm_runtime_put_autosuspend(dev->dev);
+
+   return 0;
+}
+
   DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
-DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
-			 "%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get,
+			 gpu_recover_set, "%lld\n");

   static void amdgpu_debugfs_reset_work(struct work_struct *work)
   {
@@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct
work_struct *work)

 	memset(&reset_context, 0, sizeof(reset_context));

-   reset_context.method = AMD_RESET_METHOD_NONE;
+   reset_context.method = adev->recover_method;
 reset_context.reset_req_dev = adev;
 	set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
+   adev->recover_method = AMD_RESET_METHOD_NONE;

 	amdgpu_device_gpu_recover(adev, NULL, &reset_context);
   }
@@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct

amdgpu_device *adev)

 if (!amdgpu_sriov_vf(adev)) {

 INIT_WORK(>reset_work, amdgpu_debugfs_reset_work);
-   debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev,
+   debugfs_create_file("amdgpu_gpu_recover", 0666, root, adev,
 			&amdgpu_debugfs_gpu_recover_fops);
 }
   #endif




Re: [PATCH V2] drm/ttm: remove unused parameter

2024-04-11 Thread Christian König

On 01.04.24 at 05:04, jesse.zh...@amd.com wrote:

From: Jesse Zhang 

Remove the unused parameter from the functions
ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence.
  V2: rebase the patch on top of drm-misc-next (Christian)


And pushed to drm-misc-next.

Thanks,
Christian.



Signed-off-by: Jesse Zhang 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e059b1e1b13b..6396dece0db1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
  EXPORT_SYMBOL(ttm_bo_put);
  
  static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,

-struct ttm_resource **mem,
 struct ttm_operation_ctx *ctx,
 struct ttm_place *hop)
  {
@@ -469,7 +468,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
if (ret != -EMULTIHOP)
break;
  
-		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
+		ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
} while (!ret);
  
  	if (ret) {

@@ -698,7 +697,6 @@ EXPORT_SYMBOL(ttm_bo_unpin);
   */
  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,
-struct ttm_resource *mem,
 bool no_wait_gpu)
  {
struct dma_fence *fence;
@@ -787,7 +785,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object 
*bo,
if (ret)
continue;
  
-		ret = ttm_bo_add_move_fence(bo, man, *res, ctx->no_wait_gpu);
+		ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
if (unlikely(ret)) {
ttm_resource_free(bo, res);
if (ret == -EBUSY)
@@ -894,7 +892,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
  bounce:
ret = ttm_bo_handle_move_mem(bo, res, false, ctx, );
if (ret == -EMULTIHOP) {
-		ret = ttm_bo_bounce_temp_buffer(bo, &res, ctx, &hop);
+		ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
/* try and move to final place now. */
if (!ret)
goto bounce;




Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method

2024-04-11 Thread Christian König

On 11.04.24 at 13:11, Stanley.Yang wrote:

Don't modify the amdgpu gpu recover get operation;
add an amdgpu gpu recover set operation to select the
reset method. Only mode1 and mode2 are currently supported.


Well I don't think setting this from userspace is valid.

The reset method to use is determined by the hardware and environment 
(e.g. SRIOV, passthrough, whatever) and can't be chosen simply.


Regards,
Christian.



Signed-off-by: Stanley.Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 37 +++---
  3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c62552bec34..c82976b2b977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1151,6 +1151,9 @@ struct amdgpu_device {
booldebug_largebar;
booldebug_disable_soft_recovery;
booldebug_use_vram_fw_buf;
+
+   /* Used to set gpu reset method */
+   int recover_method;
  };
  
  static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3204b8f6edeb..8411a793be18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
else
adev->asic_type = flags & AMD_ASIC_MASK;
  
+	adev->recover_method = AMD_RESET_METHOD_NONE;

adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT;
if (amdgpu_emu_mode == 1)
adev->usec_timeout *= 10;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 10832b470448..e388a50d11d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val)
return 0;
  }
  
+static int gpu_recover_set(void *data, u64 val)

+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)data;
+   struct drm_device *dev = adev_to_drm(adev);
+   int r;
+
+   /* TODO: support mode1 and mode2 currently */
+   if (val == AMD_RESET_METHOD_MODE1 ||
+   val == AMD_RESET_METHOD_MODE2)
+   adev->recover_method = val;
+   else
+   adev->recover_method = AMD_RESET_METHOD_NONE;
+
+   r = pm_runtime_get_sync(dev->dev);
+   if (r < 0) {
+   pm_runtime_put_autosuspend(dev->dev);
+   return 0;
+   }
+
+	if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
+		flush_work(&adev->reset_work);
+
+   pm_runtime_mark_last_busy(dev->dev);
+   pm_runtime_put_autosuspend(dev->dev);
+
+   return 0;
+}
+
  DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
-DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
-			 "%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get,
+			 gpu_recover_set, "%lld\n");
  
  static void amdgpu_debugfs_reset_work(struct work_struct *work)

  {
@@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct work_struct 
*work)
  
 	memset(&reset_context, 0, sizeof(reset_context));
  
-	reset_context.method = AMD_RESET_METHOD_NONE;
+	reset_context.method = adev->recover_method;
 	reset_context.reset_req_dev = adev;
 	set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
+	adev->recover_method = AMD_RESET_METHOD_NONE;
  
 	amdgpu_device_gpu_recover(adev, NULL, &reset_context);

  }
@@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
if (!amdgpu_sriov_vf(adev)) {
  
  		INIT_WORK(>reset_work, amdgpu_debugfs_reset_work);

-   debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev,
+   debugfs_create_file("amdgpu_gpu_recover", 0666, root, adev,
			&amdgpu_debugfs_gpu_recover_fops);
}
  #endif




Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings

2024-04-11 Thread Christian König

On 11.04.24 at 05:28, xinhui pan wrote:

Ensure there is no address overlapping.

Reported-by: Vlad Stolyarov 
Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..f1315a854192 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
LIST_HEAD(removed);
uint64_t eaddr;
  
+	/* validate the parameters */

+   if (saddr & ~PAGE_MASK || size & ~PAGE_MASK)
+   return -EINVAL;


Well as general rule: *never* use PAGE_MASK and other PAGE_* macros 
here. This is GPUVM and not related to the CPUVM space.



+   if (saddr + size <= saddr)
+   return -EINVAL;
+


Mhm, so basically size is not checked for a wraparound?


eaddr = saddr + size - 1;
saddr /= AMDGPU_GPU_PAGE_SIZE;
eaddr /= AMDGPU_GPU_PAGE_SIZE;


If that's the case then I would rather check for saddr < eaddr here.

But that actually shouldn't matter since this code here:

    /* Now gather all removed mappings */
    tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr);
    while (tmp) {

Then shouldn't return anything, so the operation is basically a NO-OP then.

Regards,
Christian.


Re: [PATCH] drm/amdgpu: set vm_update_mode=0 as default for NV32 in SRIOV case

2024-04-11 Thread Christian König

On 28.03.24 at 00:34, Danijel Slivka wrote:

For ASICs with VF MMIO access protection, avoid using the CPU for VM table
updates. CPU page table updates have issues with the HDP flush, as VF MMIO
access protection blocks writes to the
BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during SRIOV
runtime.


Well big NAK to the reasoning. HDP flush is *mandatory* to work correctly.

This not only includes flushes for CPU based VM updates, but also GART 
updates.


Without reliable HDP flushes the driver is simply not stable.

Regards,
Christian.
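
For context, the dependency being pointed out, as a sketch (assuming the
amdgpu_device_flush_hdp() helper and the CPU page-table update path;
illustration only):

/* CPU-based VM updates write PTEs through the HDP aperture; the GPU only
 * observes them after an HDP flush, which VF MMIO protection blocks here:
 */
amdgpu_gmc_set_pte_pde(adev, cpu_pt_addr, gpu_page_idx, addr, flags);
amdgpu_device_flush_hdp(adev, NULL);	/* mandatory before GPU access */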



Signed-off-by: Danijel Slivka 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index aed60aaf1a55..a3012c9aa92b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -724,7 +724,8 @@ void amdgpu_detect_virtualization(struct amdgpu_device 
*adev)
adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE;
}
  
-	if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID)
+	if ((amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) ||
+	    adev->pdev->device == 0x7461)
/* VF MMIO access (except mailbox range) from CPU
 * will be blocked during sriov runtime
 */




Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings

2024-04-11 Thread Christian König

On 11.04.24 at 05:28, xinhui pan wrote:

Ensure there is no address overlapping.

Reported-by: Vlad Stolyarov 
Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..f1315a854192 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
LIST_HEAD(removed);
uint64_t eaddr;
  
+	/* validate the parameters */

+   if (saddr & ~PAGE_MASK || size & ~PAGE_MASK)
+   return -EINVAL;
+   if (saddr + size <= saddr)
+   return -EINVAL;
+


Why the heck should we do that? Looks invalid to me.

Regards,
Christian.


eaddr = saddr + size - 1;
saddr /= AMDGPU_GPU_PAGE_SIZE;
eaddr /= AMDGPU_GPU_PAGE_SIZE;



