Re: [PATCH] drm/amdgpu: Fix out-of-bounds write warning
Am 25.04.24 um 12:00 schrieb Ma Jun:
> Check the ring type value to fix the out-of-bounds write warning
>
> Signed-off-by: Ma Jun
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 06f0a6534a94..1e0b5bb47bc9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -353,6 +353,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>  	ring->hw_prio = hw_prio;
>
>  	if (!ring->no_scheduler) {
> +		if (ring->funcs->type >= AMDGPU_HW_IP_NUM) {
> +			dev_warn(adev->dev, "ring type %d has no scheduler\n", ring->funcs->type);
> +			return 0;
> +		}
> +

That check should probably be at the beginning of the function, since trying to initialize a ring with an invalid type should be rejected immediately.

Regards,
Christian.

>  		hw_ip = ring->funcs->type;
>  		num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>  		adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] =
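For readers following the review comment, here is a minimal stand-alone sketch (hypothetical names, not the kernel code) of the validate-at-entry pattern Christian suggests: an out-of-range type is rejected at the top of the init function, before it can ever be used as an array index.

```c
#include <assert.h>

/* Illustrative stand-in for AMDGPU_HW_IP_NUM and the scheduler array. */
#define HW_IP_NUM 9

static int scheds[HW_IP_NUM];

/* Hypothetical miniature of amdgpu_ring_init(): validate first, so an
 * invalid ring type never reaches the array access below. */
static int ring_init(unsigned int ring_type)
{
	if (ring_type >= HW_IP_NUM)
		return -22; /* -EINVAL: reject immediately, as suggested */

	scheds[ring_type]++; /* safe: the index is proven in range */
	return 0;
}
```

The point of moving the check up is that the out-of-bounds index is rejected before any code path can dereference it, instead of only inside the `!ring->no_scheduler` branch.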
Re: [PATCH v4] drm/amdgpu: Modify the contiguous flags behaviour
Am 25.04.24 um 10:15 schrieb Arunpravin Paneer Selvam:
> Now we have two flags for contiguous VRAM buffer allocation. If the
> application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the
> ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function.
>
> This patch changes the default behaviour of the two flags.
>
> When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
> - This means contiguous is not mandatory.
> - We will try to allocate the contiguous buffer. If the allocation
>   fails, we fall back to allocating the individual pages.
>
> When we set TTM_PL_FLAG_CONTIGUOUS
> - This means contiguous allocation is mandatory.
> - We are setting this in amdgpu_bo_pin_restricted() before bo validation
>   and check this flag in the vram manager file.
> - If this is set, we should allocate the buffer pages contiguously.
>   If the allocation fails, we return -ENOSPC.
>
> v2:
>   - keep the mem_flags and bo->flags check as is (Christian)
>   - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
>     amdgpu_bo_pin_restricted() placement range iteration loop (Christian)
>   - rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block (Christian)
>   - keep the kernel BO allocation as is (Christian)
>   - if BO pin vram allocation failed, we need to return -ENOSPC as RDMA
>     cannot work with scattered VRAM pages (Philip)
>
> v3 (Christian):
>   - keep contiguous flag handling outside of pages_per_block calculation
>   - remove the hacky implementation in contiguous flag error handling code
>
> v4 (Christian):
>   - use any variable and return value for non-contiguous fallback
>
> Signed-off-by: Arunpravin Paneer Selvam
> Suggested-by: Christian König
> Reviewed-by: Christian König
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  8 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 22 ++--
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 492aebc44e51..c594d2a5978e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -154,8 +154,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>  		else
>  			places[c].flags |= TTM_PL_FLAG_TOPDOWN;
>
> -		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
> +		if (abo->tbo.type == ttm_bo_type_kernel &&
> +		    flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>  			places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
> +
>  		c++;
>  	}
>
> @@ -965,6 +967,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
>  		if (!bo->placements[i].lpfn ||
>  		    (lpfn && lpfn < bo->placements[i].lpfn))
>  			bo->placements[i].lpfn = lpfn;
> +
> +		if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
> +		    bo->placements[i].mem_type == TTM_PL_VRAM)
> +			bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
>  	}
>
>  	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index e494f5bf136a..6c30eceec896 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -469,7 +469,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>  	if (tbo->type != ttm_bo_type_kernel)
>  		max_bytes -= AMDGPU_VM_RESERVED_VRAM;
>
> -	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> +	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
>  		pages_per_block = ~0ul;
>  	} else {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> @@ -478,7 +478,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>  		/* default to 2MB */
>  		pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>  #endif
> -		pages_per_block = max_t(uint32_t, pages_per_block,
> +		pages_per_block = max_t(u32, pages_per_block,
>  					tbo->page_alignment);
>  	}
>
> @@ -499,7 +499,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>  	if (place->flags & TTM_PL_FLAG_TOPDOWN)
>  		vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>
> -	if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
> +	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>  		vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
>
>  	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED)
> @@ -518,21 +518,31 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>  	else
>  		min_block_size = mgr->default_page_size;
>
> -	BUG_ON(min_block_size < mm->chunk_size);
> -
> -	/* Limit maximum size to 2GiB due
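The two-flag policy the commit message describes can be summarized in a tiny stand-alone sketch (illustrative only, not the real vram manager): `AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS` is a hint with a scattered-pages fallback, while `TTM_PL_FLAG_CONTIGUOUS` (set only on the pin path) makes contiguity mandatory and fails with `-ENOSPC` otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for allocator state: whether a contiguous range is free. */
static bool contiguous_space_available = false;

/* Hypothetical miniature of the decision logic.
 * Returns 0 for a contiguous allocation, 1 for the scattered-pages
 * fallback, and -ENOSPC when contiguity is mandatory but unavailable. */
static int alloc_vram(bool want_contiguous, bool must_be_contiguous)
{
	if (want_contiguous || must_be_contiguous) {
		if (contiguous_space_available)
			return 0;		/* contiguous allocation */
		if (must_be_contiguous)
			return -ENOSPC;		/* pin path: no fallback */
	}
	return 1;				/* fall back to single pages */
}
```

The design point is that only pinning (e.g. for RDMA, as Philip notes) hard-requires contiguity; ordinary allocations degrade gracefully.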
Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
Am 25.04.24 um 09:39 schrieb Friedrich Vock:
> On 25.04.24 08:25, Christian König wrote:
>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>> This adds GTT to the "preferred domains" of this buffer object, which
>>> will also prevent any attempts at moving the buffer back to VRAM if
>>> there is space. If VRAM is full, GTT will already be chosen as a
>>> fallback.
>>
>> Big NAK to that one, this is mandatory for correct operation.
>
> Hm, how is correctness affected here? We still fall back to GTT if
> allocating in VRAM doesn't work, I don't see a difference except that
> now we'll actually try moving it back into VRAM again.

Well, this is the fallback. Only during CS do we try to allocate from GTT if allocating in VRAM doesn't work. When you remove this here, then any failed allocation from VRAM would be fatal.

What could be happening is that the handling is buggy and when we update the initial domain we also add GTT to the preferred domain, but that should then be fixed.

Regards,
Christian.

> Regards,
> Friedrich
>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Friedrich Vock
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>>>  2 files changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 6bbab141eaaeb..aea3770d3ea2e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>>>  			goto retry;
>>>  		}
>>>
>>> -		if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> -			initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>>> -			goto retry;
>>> -		}
>>>  		DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>>>  			  size, initial_domain, args->in.alignment, r);
>>>  	}
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 85c10d8086188..9978b85ed6f40 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>  				       AMDGPU_GEM_DOMAIN_GDS))
>>>  		amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>>>  	else
>>> -		amdgpu_bo_placement_from_domain(bo, bp->domain);
>>> +		amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
>>>  	if (bp->type == ttm_bo_type_kernel)
>>>  		bo->tbo.priority = 2;
>>>  	else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
>>> --
>>> 2.44.0
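The fallback Christian calls mandatory is a simple retry: if a VRAM-only GEM allocation fails, GTT is added to the initial domain and the allocation is retried, so VRAM pressure is not fatal. A stand-alone sketch of that control flow (simplified stand-ins, not the real `amdgpu_gem_create_ioctl()`):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical domain bits, mirroring the AMDGPU_GEM_DOMAIN_* idea. */
#define DOMAIN_GTT  0x2
#define DOMAIN_VRAM 0x4

/* Pretend VRAM is completely full: only requests allowing GTT succeed. */
static int try_alloc(unsigned int domain)
{
	return (domain & DOMAIN_GTT) ? 0 : -ENOMEM;
}

/* Miniature of the removed retry logic: widen the domain and try again. */
static int gem_create(unsigned int initial_domain)
{
	int r = try_alloc(initial_domain);

	if (r == -ENOMEM && initial_domain == DOMAIN_VRAM) {
		initial_domain |= DOMAIN_GTT;	/* fall back to GTT */
		r = try_alloc(initial_domain);
	}
	return r;
}
```

Removing the widening step (as the RFC patch does) would turn the `-ENOMEM` into a hard failure whenever VRAM is full, which is the correctness concern raised in the reply.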
Re: [PATCH V2] drm/amdgpu: fix the warning about the expression (int)size - len
Am 25.04.24 um 09:11 schrieb Jesse Zhang:
> Converting size from size_t to int may overflow.
>
> v2: keep reverse xmas tree order (Christian)
>
> Signed-off-by: Jesse Zhang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index f5d0fa207a88..eed60d4b3390 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -2065,12 +2065,13 @@ static ssize_t amdgpu_reset_dump_register_list_write(struct file *f,
>  	struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
>  	char reg_offset[11];
>  	uint32_t *new = NULL, *tmp = NULL;
> +	unsigned int len = 0;
>  	int ret, i = 0, len = 0;

Well now you have len defined twice :)

Christian.

>  	do {
>  		memset(reg_offset, 0, 11);
>  		if (copy_from_user(reg_offset, buf + len,
> -				   min(10, ((int)size - len)))) {
> +				   min(10, (size - len)))) {
>  			ret = -EFAULT;
>  			goto error_free;
>  		}
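Why the `(int)size` cast is worth fixing: a `size_t` above `INT_MAX` becomes negative when truncated to `int`, so `min(10, (int)size - len)` can go negative and a bad length reaches `copy_from_user()`. Keeping the arithmetic unsigned avoids that. A stand-alone demo (not the debugfs code itself):

```c
#include <assert.h>
#include <stddef.h>

/* Broken variant: mirrors min(10, ((int)size - len)). On typical
 * two's-complement systems a size_t past INT_MAX wraps to a negative int. */
static long chunk_signed(size_t size, int len)
{
	int s = (int)size;		/* truncates for size > INT_MAX */
	int rem = s - len;
	return rem < 10 ? rem : 10;
}

/* Fixed variant: mirrors min(10, (size - len)) with an unsigned len.
 * Assumes the caller guarantees len <= size, as the debugfs loop does. */
static unsigned long chunk_unsigned(size_t size, unsigned int len)
{
	size_t rem = size - len;
	return rem < 10 ? rem : 10;
}
```

Note the signed-wrap behavior of the broken variant is implementation-defined in C; the demo assumes the usual two's-complement conversion.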
Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
Am 25.04.24 um 09:06 schrieb Friedrich Vock:
> On 25.04.24 08:58, Christian König wrote:
>> Am 25.04.24 um 08:46 schrieb Friedrich Vock:
>>> On 25.04.24 08:32, Christian König wrote:
>>>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>>>> Used by userspace to adjust buffer priorities in response to
>>>>> changes in application demand and memory pressure.
>>>>
>>>> Yeah, that was discussed over and over again. One big design
>>>> criterion is that we can't have global priorities from userspace!
>>>>
>>>> The background here is that this can trivially be abused.
>>>
>>> I see your point when apps are allowed to prioritize themselves above
>>> other apps, and I agree that should probably be disallowed at least
>>> for unprivileged apps. Disallowing this is a pretty trivial change
>>> though, and I don't really see the abuse potential in being able to
>>> downgrade your own priority?
>>
>> Yeah, I know what you mean and I'm also leaning towards that
>> argumentation. But another good point is that it doesn't actually help.
>>
>> For example when you have desktop apps fighting with a game, you
>> probably don't want to use static priorities, but rather evict the
>> apps which are inactive and keep the apps which are active in the
>> background.
>
> Sadly things are not as simple as "evict everything from app 1, keep
> everything from app 2 active". The simplest failure case of this is
> games that already oversubscribe VRAM on their own. Keeping the whole
> app inside VRAM is literally impossible there, and it helps a lot to
> know which buffers the app is most happy with evicting.
>
>> In other words the priority just tells you which stuff from each app
>> to evict first, but not which app to globally throw out.
>
> Yeah, but a per-buffer priority system could do both of these.

Yeah, but we already have that. See amdgpu_bo_list_entry_cmp() and amdgpu_bo_list_create().

This is the per-application priority which can be used by userspace to give priority to each BO in a submission (or application wide). The problem is rather that amdgpu/TTM never really made good use of that information.

Regards,
Christian.

> Regards,
> Friedrich
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Friedrich
>>>
>>>> What we can do is to have per process priorities, but that needs to
>>>> be in the VM subsystem. That's also the reason why I personally
>>>> think that the handling shouldn't be inside TTM at all.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Signed-off-by: Friedrich Vock
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>>>  include/uapi/drm/amdgpu_drm.h           |  1 +
>>>>>  2 files changed, 21 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> index 5ca13e2e50f50..6107810a9c205 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>>>  {
>>>>>  	struct amdgpu_device *adev = drm_to_adev(dev);
>>>>>  	struct drm_amdgpu_gem_op *args = data;
>>>>> +	struct ttm_resource_manager *man;
>>>>>  	struct drm_gem_object *gobj;
>>>>>  	struct amdgpu_vm_bo_base *base;
>>>>> +	struct ttm_operation_ctx ctx;
>>>>>  	struct amdgpu_bo *robj;
>>>>>  	int r;
>>>>>
>>>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>>>  	if (unlikely(r))
>>>>>  		goto out;
>>>>>
>>>>> +	memset(&ctx, 0, sizeof(ctx));
>>>>> +	ctx.interruptible = true;
>>>>> +
>>>>>  	switch (args->op) {
>>>>>  	case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>>>  		struct drm_amdgpu_gem_create_in info;
>>>>>
>>>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>>>  		amdgpu_bo_unreserve(robj);
>>>>>  		break;
>>>>> +	case AMDGPU_GEM_OP_SET_PRIORITY:
>>>>> +		if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>>>> +			args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>>>> +		ttm_bo_update_priority(&robj->tbo, args->value);
>>>>> +		if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>>>> +			ttm_bo_try_unevict(&robj->tbo, &ctx);
>>>>> +			amdgpu_bo_unreserve(robj);
>>>>> +		} else {
>>>>> +			amdgpu_bo_unreserve(robj);
>>>>> +			man = ttm_manager_type(robj->tbo.bdev,
>>>>> +					       robj->tbo.resource->mem_type);
>>>>> +			ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>>>> +						true);
>>>>> +		}
>>>>> +		break;
>>>>>  	default:
>>>>>  		amdgpu_bo_unreserve(robj);
>>>>>  		r = -EINVAL;
>>>>>
>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>> index bdbe6b262a78d..53552dd489b9b 100644
>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>>>
>>>>>  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
>>>>>  #define AMDGPU_GEM_OP_SET_PLACEMENT		1
>>>>> +#define AMDGPU_GEM_OP_SET_PRIORITY		2
>>>>>
>>>>>  /* Sets or returns a value associated with a buffer. */
>>>>>  struct drm_amdgpu_gem_op {
>>>>> --
>>>>> 2.44.0
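The existing mechanism Christian points to, `amdgpu_bo_list_entry_cmp()` used by `amdgpu_bo_list_create()`, sorts the per-submission BO list by a userspace-supplied per-BO priority, so lower-priority buffers can be treated as eviction candidates first. A simplified stand-alone illustration of that sort (hypothetical entry struct, not the kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical BO-list entry carrying only the userspace priority. */
struct bo_entry {
	int priority;
};

/* Ascending comparator, in the spirit of amdgpu_bo_list_entry_cmp().
 * Difference-based compare is fine here because priorities are small. */
static int entry_cmp(const void *a, const void *b)
{
	const struct bo_entry *ea = a, *eb = b;

	return ea->priority - eb->priority;
}

/* Sort a small list and report which priority would be evicted first. */
static int demo_first_evicted(void)
{
	struct bo_entry list[] = { {3}, {0}, {2}, {1} };

	qsort(list, sizeof(list) / sizeof(list[0]), sizeof(list[0]), entry_cmp);
	return list[0].priority;	/* head of the sorted list */
}
```

As the reply notes, the information is already collected this way; the gap is that amdgpu/TTM never made good use of it during eviction.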
Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
Am 25.04.24 um 08:46 schrieb Friedrich Vock:
> On 25.04.24 08:32, Christian König wrote:
>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>> Used by userspace to adjust buffer priorities in response to changes
>>> in application demand and memory pressure.
>>
>> Yeah, that was discussed over and over again. One big design criterion
>> is that we can't have global priorities from userspace!
>>
>> The background here is that this can trivially be abused.
>
> I see your point when apps are allowed to prioritize themselves above
> other apps, and I agree that should probably be disallowed at least for
> unprivileged apps. Disallowing this is a pretty trivial change though,
> and I don't really see the abuse potential in being able to downgrade
> your own priority?

Yeah, I know what you mean and I'm also leaning towards that argumentation. But another good point is that it doesn't actually help.

For example when you have desktop apps fighting with a game, you probably don't want to use static priorities, but rather evict the apps which are inactive and keep the apps which are active in the background.

In other words the priority just tells you which stuff from each app to evict first, but not which app to globally throw out.

Regards,
Christian.

> Regards,
> Friedrich
>
>> What we can do is to have per process priorities, but that needs to be
>> in the VM subsystem. That's also the reason why I personally think
>> that the handling shouldn't be inside TTM at all.
>>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Friedrich Vock
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>  include/uapi/drm/amdgpu_drm.h           |  1 +
>>>  2 files changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 5ca13e2e50f50..6107810a9c205 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>  {
>>>  	struct amdgpu_device *adev = drm_to_adev(dev);
>>>  	struct drm_amdgpu_gem_op *args = data;
>>> +	struct ttm_resource_manager *man;
>>>  	struct drm_gem_object *gobj;
>>>  	struct amdgpu_vm_bo_base *base;
>>> +	struct ttm_operation_ctx ctx;
>>>  	struct amdgpu_bo *robj;
>>>  	int r;
>>>
>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>  	if (unlikely(r))
>>>  		goto out;
>>>
>>> +	memset(&ctx, 0, sizeof(ctx));
>>> +	ctx.interruptible = true;
>>> +
>>>  	switch (args->op) {
>>>  	case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>  		struct drm_amdgpu_gem_create_in info;
>>>
>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>>  		amdgpu_bo_unreserve(robj);
>>>  		break;
>>> +	case AMDGPU_GEM_OP_SET_PRIORITY:
>>> +		if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>> +			args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>> +		ttm_bo_update_priority(&robj->tbo, args->value);
>>> +		if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>> +			ttm_bo_try_unevict(&robj->tbo, &ctx);
>>> +			amdgpu_bo_unreserve(robj);
>>> +		} else {
>>> +			amdgpu_bo_unreserve(robj);
>>> +			man = ttm_manager_type(robj->tbo.bdev,
>>> +					       robj->tbo.resource->mem_type);
>>> +			ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>> +						true);
>>> +		}
>>> +		break;
>>>  	default:
>>>  		amdgpu_bo_unreserve(robj);
>>>  		r = -EINVAL;
>>>
>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>> index bdbe6b262a78d..53552dd489b9b 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>
>>>  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
>>>  #define AMDGPU_GEM_OP_SET_PLACEMENT		1
>>> +#define AMDGPU_GEM_OP_SET_PRIORITY		2
>>>
>>>  /* Sets or returns a value associated with a buffer. */
>>>  struct drm_amdgpu_gem_op {
>>> --
>>> 2.44.0
Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
In general: Yes please :) But you are exercising a lot of ideas we have already thrown overboard over the years.

The general idea Marek and I have been working on for a while now is rather to make TTM aware of userspace "clients". In other words we should start with having a TTM structure in the fpriv of the drivers and then track there how much VRAM was evicted for each client.

This should then be balanced so that each client gets its equal share of VRAM, and we pretty much end up with a static situation which only changes when applications become inactive/active (based on their GPU activity).

I will mail you some of the stuff we already came up with later on.

Regards,
Christian.

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> Hi everyone,
>
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
>
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted. At the same time, with GPU APIs
> becoming increasingly lower-level and GPU-driven, only the application
> itself can know which buffers are used within a particular submission,
> and how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported
> and priorities being respected in order to perform their memory
> management.
>
> However, relaying this information to the kernel via the current KMD
> uAPIs is not possible. On AMDGPU for example, all work submissions
> include a "bo list" that contains any buffer object that is accessed
> during the course of the submission. If VRAM is oversubscribed and a
> buffer in the list was evicted to system memory, that buffer is moved
> back to VRAM (potentially evicting other unused buffers).
>
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were
> evicted, which almost guarantees that some buffers will get moved
> around. Since the bo list is only known at submit time, this also means
> the buffers will get moved right before submitting application work,
> which is the worst possible time to move buffers from a latency
> perspective. Another consequence of the large bo list is that nearly
> all memory from other applications will be evicted, too. When different
> applications (e.g. game and compositor) submit work one after the
> other, this causes a ping-pong effect where each app's submission
> evicts the other app's memory, resulting in a large amount of
> unnecessary moves.
>
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM applications to reside in system
> memory [1]. This worked around the ping-ponging/excessive buffer moving
> problem, but also meant that any memory evicted to system memory would
> forever stay there, regardless of how VRAM is used.
>
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
>   VRAM has been evicted
> - Buffer eviction respects priorities set by userspace
> - Wasteful ping-ponging is avoided to the extent possible
>
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
>
> - For each ttm_resource_manager, the amount of evicted memory is
>   tracked (similarly to how "usage" tracks the memory usage). When
>   memory is evicted via ttm_bo_evict, the size of the evicted memory is
>   added, when memory is un-evicted (see below), its size is subtracted.
>   The amount of evicted memory for e.g. VRAM can be queried by
>   userspace via an ioctl.
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
> - ttm_mem_unevict walks the list of evicted bos for a given
>   ttm_resource_manager and tries moving evicted resources back. When a
>   buffer is freed, this function is called to immediately restore some
>   evicted memory.
> - Each ttm_buffer_object independently tracks the mem_type it wants to
>   reside in.
> - ttm_bo_try_unevict is added as a helper function which attempts to
>   move the buffer to its preferred mem_type. If no space is available
>   there, it fails with -ENOSPC/-ENOMEM.
> - Similar to how ttm_bo_evict works, each driver can implement
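The first bullet of the proposed API, per-manager tracking of evicted bytes, can be illustrated with a tiny stand-alone sketch (hypothetical struct and helpers, not the actual TTM prototype): eviction adds to an `evicted` counter that userspace could later query, and un-eviction subtracts from it.

```c
#include <assert.h>

/* Hypothetical miniature of a resource manager's accounting. */
struct res_manager {
	unsigned long long usage;	/* bytes currently placed here */
	unsigned long long evicted;	/* bytes pushed out of this domain */
};

/* Called when a BO is evicted out of this manager (cf. ttm_bo_evict). */
static void track_evict(struct res_manager *man, unsigned long long size)
{
	man->usage -= size;
	man->evicted += size;
}

/* Called when a BO is moved back (the proposed "un-evict" path). */
static void track_unevict(struct res_manager *man, unsigned long long size)
{
	man->usage += size;
	man->evicted -= size;
}

/* Simulate: two BOs evicted, one restored after memory is freed. */
static unsigned long long demo_evicted_bytes(void)
{
	struct res_manager vram = { .usage = 1024, .evicted = 0 };

	track_evict(&vram, 256);
	track_evict(&vram, 128);
	track_unevict(&vram, 128);
	return vram.evicted;		/* what the query ioctl would report */
}
```

This is only the bookkeeping half; the real proposal additionally keeps a per-manager list of evicted BOs so `ttm_mem_unevict` knows what to move back.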
Re: [PATCH] drm/amdgpu: fix the warning about the expression (int)size - len
Am 25.04.24 um 08:20 schrieb Jesse Zhang:
> Converting size from size_t to int may overflow.
>
> Signed-off-by: Jesse Zhang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index f5d0fa207a88..b828aad4f35e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -2065,12 +2065,13 @@ static ssize_t amdgpu_reset_dump_register_list_write(struct file *f,
>  	struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
>  	char reg_offset[11];
>  	uint32_t *new = NULL, *tmp = NULL;
> -	int ret, i = 0, len = 0;
> +	int ret, i = 0;
> +	unsigned int len = 0;

Please keep reverse xmas tree order here; apart from that it looks good to me.

Regards,
Christian.

>  	do {
>  		memset(reg_offset, 0, 11);
>  		if (copy_from_user(reg_offset, buf + len,
> -				   min(10, ((int)size - len)))) {
> +				   min(10, (size - len)))) {
>  			ret = -EFAULT;
>  			goto error_free;
>  		}
Re: [PATCH] drm/amdgpu: fix overflowed array index read warning
Am 25.04.24 um 07:27 schrieb Tim Huang:
> From: Tim Huang
>
> Clear the warning that a cast operation might have overflowed.
>
> Signed-off-by: Tim Huang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 06f0a6534a94..6dfcd62e83ae 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -473,7 +473,7 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, char __user *buf,
>  					size_t size, loff_t *pos)
>  {
>  	struct amdgpu_ring *ring = file_inode(f)->i_private;
> -	int r, i;
> +	int r;
>  	uint32_t value, result, early[3];

While at it please declare "int r;" last, e.g. keep reverse xmas tree order here.

Apart from that it looks good to me.

Regards,
Christian.

>  	if (*pos & 3 || size & 3)
>
> @@ -485,7 +485,7 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, char __user *buf,
>  		early[0] = amdgpu_ring_get_rptr(ring) & ring->buf_mask;
>  		early[1] = amdgpu_ring_get_wptr(ring) & ring->buf_mask;
>  		early[2] = ring->wptr & ring->buf_mask;
> -		for (i = *pos / 4; i < 3 && size; i++) {
> +		for (loff_t i = *pos / 4; i < 3 && size; i++) {
>  			r = put_user(early[i], (uint32_t *)buf);
>  			if (r)
>  				return r;
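The reason the loop index becomes `loff_t`: `*pos` is a 64-bit file offset, so `*pos / 4` can exceed `INT_MAX`; storing it in an `int` truncates the value and the bound check `i < 3` can spuriously pass for a huge offset. A simplified stand-alone demo (using a local typedef rather than the kernel's `loff_t`):

```c
#include <assert.h>

/* Local stand-in for the kernel's 64-bit loff_t. */
typedef long long loff_t_demo;

/* Broken variant: the 64-bit division result is truncated to int
 * (implementation-defined, but wraps on typical systems). */
static int starts_in_range_int(loff_t_demo pos)
{
	int i = pos / 4;
	return i < 3;
}

/* Fixed variant: the index keeps the full 64-bit value, so a huge
 * offset is correctly reported as out of range. */
static int starts_in_range_loff(loff_t_demo pos)
{
	loff_t_demo i = pos / 4;
	return i < 3;
}
```

For `pos = 4 << 32` the truncated index wraps to 0 and the bound check wrongly succeeds, which is exactly the class of warning the patch clears.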
Re: [PATCH] drm/amdgpu: fix potential resource leak warning
Am 25.04.24 um 05:33 schrieb Tim Huang:
> From: Tim Huang
>
> Clear the resource leak warning: when the prepare fails, the allocated
> amdgpu job object will never be released.
>
> Signed-off-by: Tim Huang

Reviewed-by: Christian König

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 66e8a016126b..9b748d7058b5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -102,6 +102,11 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
>  	if (!r)
>  		r = amdgpu_sync_push_to_job(&sync, p->job);
>  	amdgpu_sync_free(&sync);
> +
> +	if (r) {
> +		p->num_dw_left = 0;
> +		amdgpu_job_free(p->job);
> +	}
>  	return r;
>  }
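The bug class here is a classic: an object allocated early in a function must be released on every later error path, not just on success. A hypothetical miniature of the fixed pattern (illustrative names, not the real `amdgpu_vm_sdma_prepare()`):

```c
#include <assert.h>
#include <stdlib.h>

struct job {
	int dummy;
};

/* Track live allocations so the demo can verify no leak occurs. */
static int live_jobs;

static struct job *job_alloc(void)
{
	live_jobs++;
	return malloc(sizeof(struct job));
}

static void job_free(struct job *j)
{
	live_jobs--;
	free(j);
}

/* sync_result stands in for the amdgpu_sync_push_to_job() return code.
 * On success the job intentionally stays alive for later submission. */
static int prepare(int sync_result)
{
	struct job *job = job_alloc();
	int r = sync_result;

	if (r)
		job_free(job);	/* the fix: release on the error path */
	return r;
}
```

Without the `job_free()` in the error branch, every failed prepare would leak one job object, which is exactly what the static-analysis warning flagged.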
Re: [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit
Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> TTM now takes care of moving buffers to the best possible domain.

Yeah, I've been planning to do this for a while as well.

The problem is really that we need to keep the functionality. For example TTM currently doesn't have a concept of a userspace client. So it can't track the bytes already evicted for each client.

This needs to be added as infrastructure first and then we can start to change this over into moving more functionality into TTM.

Regards,
Christian.

> Signed-off-by: Friedrich Vock
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +--------------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   7 -
>  4 files changed, 3 insertions(+), 201 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index cac0ca64367b3..3004adc6fa679 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1404,8 +1404,6 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev);
>  bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev);
>  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
>
> -void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
> -				  u64 num_vis_bytes);
>  int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
>  void amdgpu_device_program_register_sequence(struct amdgpu_device *adev,
>  					     const u32 *registers,
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index e9168677ef0a6..92a0cffc1adc3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -638,196 +638,19 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>  	return 0;
>  }
>
> -/* Convert microseconds to bytes. */
> -static u64 us_to_bytes(struct amdgpu_device *adev, s64 us)
> -{
> -	if (us <= 0 || !adev->mm_stats.log2_max_MBps)
> -		return 0;
> -
> -	/* Since accum_us is incremented by a million per second, just
> -	 * multiply it by the number of MB/s to get the number of bytes.
> -	 */
> -	return us << adev->mm_stats.log2_max_MBps;
> -}
> -
> -static s64 bytes_to_us(struct amdgpu_device *adev, u64 bytes)
> -{
> -	if (!adev->mm_stats.log2_max_MBps)
> -		return 0;
> -
> -	return bytes >> adev->mm_stats.log2_max_MBps;
> -}
> -
> -/* Returns how many bytes TTM can move right now. If no bytes can be moved,
> - * it returns 0. If it returns non-zero, it's OK to move at least one buffer,
> - * which means it can go over the threshold once. If that happens, the driver
> - * will be in debt and no other buffer migrations can be done until that debt
> - * is repaid.
> - *
> - * This approach allows moving a buffer of any size (it's important to allow
> - * that).
> - *
> - * The currency is simply time in microseconds and it increases as the clock
> - * ticks. The accumulated microseconds (us) are converted to bytes and
> - * returned.
> - */
> -static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
> -					      u64 *max_bytes,
> -					      u64 *max_vis_bytes)
> -{
> -	s64 time_us, increment_us;
> -	u64 free_vram, total_vram, used_vram;
> -	/* Allow a maximum of 200 accumulated ms. This is basically per-IB
> -	 * throttling.
> -	 *
> -	 * It means that in order to get full max MBps, at least 5 IBs per
> -	 * second must be submitted and not more than 200ms apart from each
> -	 * other.
> -	 */
> -	const s64 us_upper_bound = 200000;
> -
> -	if (!adev->mm_stats.log2_max_MBps) {
> -		*max_bytes = 0;
> -		*max_vis_bytes = 0;
> -		return;
> -	}
> -
> -	total_vram = adev->gmc.real_vram_size - atomic64_read(&adev->vram_pin_size);
> -	used_vram = ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);
> -	free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;
> -
> -	spin_lock(&adev->mm_stats.lock);
> -
> -	/* Increase the amount of accumulated us. */
> -	time_us = ktime_to_us(ktime_get());
> -	increment_us = time_us - adev->mm_stats.last_update_us;
> -	adev->mm_stats.last_update_us = time_us;
> -	adev->mm_stats.accum_us = min(adev->mm_stats.accum_us + increment_us,
> -				      us_upper_bound);
> -
> -	/* This prevents the short period of low performance when the VRAM
> -	 * usage is low and the driver is in debt or doesn't have enough
> -	 * accumulated us to fill VRAM quickly.
> -	 *
> -	 * The situation can occur in these cases:
> -	 * - a lot of VRAM is freed by userspace
> -	 * - the presence of a big buffer causes a lot of evictions
> -	 *   (solution: split buffers into smaller ones)
> -	 *
> -	 * If 128 MB or 1/8th of VRAM
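The time-to-bytes conversion in the removed throttling code is worth a worked example: `accum_us` grows by one million per second, and shifting by `log2_max_MBps` multiplies the accumulated microseconds by the MB/s rate, yielding a byte budget (so with `log2_max_MBps = 20`, each microsecond buys 2^20 bytes). A self-contained restatement of `us_to_bytes()`/`bytes_to_us()` with the device struct replaced by a plain parameter:

```c
#include <assert.h>

/* Convert accumulated microseconds into a byte budget.
 * Mirrors the removed us_to_bytes(): us * 2^log2_max_MBps. */
static unsigned long long us_to_bytes(long long us, unsigned int log2_max_MBps)
{
	if (us <= 0 || !log2_max_MBps)
		return 0;
	return (unsigned long long)us << log2_max_MBps;
}

/* Inverse conversion, mirroring the removed bytes_to_us(). */
static long long bytes_to_us(unsigned long long bytes, unsigned int log2_max_MBps)
{
	if (!log2_max_MBps)
		return 0;
	return bytes >> log2_max_MBps;
}
```

The shift works as the comment in the original code explains: since a second contributes one million microseconds, multiplying by the MB/s figure (a power of two here) directly produces bytes.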
Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
Am 24.04.24 um 18:57 schrieb Friedrich Vock:
> Used by userspace to adjust buffer priorities in response to changes in
> application demand and memory pressure.

Yeah, that was discussed over and over again. One big design criterion is that we can't have global priorities from userspace!

The background here is that this can trivially be abused.

What we can do is to have per process priorities, but that needs to be in the VM subsystem. That's also the reason why I personally think that the handling shouldn't be inside TTM at all.

Regards,
Christian.

> Signed-off-by: Friedrich Vock
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>  include/uapi/drm/amdgpu_drm.h           |  1 +
>  2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 5ca13e2e50f50..6107810a9c205 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>  {
>  	struct amdgpu_device *adev = drm_to_adev(dev);
>  	struct drm_amdgpu_gem_op *args = data;
> +	struct ttm_resource_manager *man;
>  	struct drm_gem_object *gobj;
>  	struct amdgpu_vm_bo_base *base;
> +	struct ttm_operation_ctx ctx;
>  	struct amdgpu_bo *robj;
>  	int r;
>
> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>  	if (unlikely(r))
>  		goto out;
>
> +	memset(&ctx, 0, sizeof(ctx));
> +	ctx.interruptible = true;
> +
>  	switch (args->op) {
>  	case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>  		struct drm_amdgpu_gem_create_in info;
>
> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>  		amdgpu_bo_unreserve(robj);
>  		break;
> +	case AMDGPU_GEM_OP_SET_PRIORITY:
> +		if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
> +			args->value = AMDGPU_BO_PRIORITY_MAX_USER;
> +		ttm_bo_update_priority(&robj->tbo, args->value);
> +		if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
> +			ttm_bo_try_unevict(&robj->tbo, &ctx);
> +			amdgpu_bo_unreserve(robj);
> +		} else {
> +			amdgpu_bo_unreserve(robj);
> +			man = ttm_manager_type(robj->tbo.bdev,
> +					       robj->tbo.resource->mem_type);
> +			ttm_mem_unevict_evicted(robj->tbo.bdev, man,
> +						true);
> +		}
> +		break;
>  	default:
>  		amdgpu_bo_unreserve(robj);
>  		r = -EINVAL;
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index bdbe6b262a78d..53552dd489b9b 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>
>  #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
>  #define AMDGPU_GEM_OP_SET_PLACEMENT		1
> +#define AMDGPU_GEM_OP_SET_PRIORITY		2
>
>  /* Sets or returns a value associated with a buffer. */
>  struct drm_amdgpu_gem_op {
> --
> 2.44.0
Re: [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority
On 24.04.24 at 18:57, Friedrich Vock wrote: Used to dynamically adjust priorities of buffers at runtime, to react to changes in memory pressure/usage patterns. And another big NAK. TTM priorities are meant to be static, based on in-kernel decisions which are not exposed to userspace. In other words we can group BOs based on kernel, user, SVM etc... but never on something userspace can influence. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 17 + include/drm/ttm/ttm_bo.h | 2 ++ 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index eae54cd4a7ce9..6ac939c58a6b8 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -112,6 +112,23 @@ void ttm_bo_set_bulk_move(struct ttm_buffer_object *bo, } EXPORT_SYMBOL(ttm_bo_set_bulk_move); +void ttm_bo_update_priority(struct ttm_buffer_object *bo, unsigned int new_prio) +{ + struct ttm_resource_manager *man; + + if (!bo->resource) + return; + + man = ttm_manager_type(bo->bdev, bo->resource->mem_type); + + spin_lock(&bo->bdev->lru_lock); + ttm_resource_del_bulk_move(bo->resource, bo); + bo->priority = new_prio; + ttm_resource_add_bulk_move(bo->resource, bo); + spin_unlock(&bo->bdev->lru_lock); +} +EXPORT_SYMBOL(ttm_bo_update_priority); + static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, struct ttm_resource *mem, bool evict, struct ttm_operation_ctx *ctx, diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 91299a3b6fcfa..51040bc443ea0 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -359,6 +359,8 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map, return map->virtual; } +void ttm_bo_update_priority(struct ttm_buffer_object *bo, + unsigned int new_prio); int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx); -- 2.44.0
Re: [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority
Am 24.04.24 um 18:57 schrieb Friedrich Vock: This makes buffer eviction significantly more stable by avoiding ping-ponging caused by low-priority buffers evicting high-priority buffers and vice versa. And creates a deny of service for the whole system by fork() bombing. This is another very big NAK. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 9 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 5 +++-- include/drm/ttm/ttm_bo.h | 1 + 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3047c763eb4eb..eae54cd4a7ce9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -776,6 +776,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo, int ttm_mem_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man, const struct ttm_place *place, + unsigned int max_priority, struct ttm_operation_ctx *ctx, struct ww_acquire_ctx *ticket) { @@ -788,6 +789,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev, spin_lock(>lru_lock); ttm_resource_manager_for_each_res(man, , res) { bool busy; + if (res->bo->priority > max_priority) + break; if (!ttm_bo_evict_swapout_allowable(res->bo, ctx, place, , )) { @@ -930,8 +933,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, return ret; if (ctx->no_evict) return -ENOSPC; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); + if (!bo->priority) + return -ENOSPC; + ret = ttm_mem_evict_first(bdev, man, place, bo->priority - 1, + ctx, ticket); if (unlikely(ret != 0)) return ret; } while (1); diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c index 1d6755a1153b1..63d4371adb519 100644 --- a/drivers/gpu/drm/ttm/ttm_resource.c +++ b/drivers/gpu/drm/ttm/ttm_resource.c @@ -431,8 +431,9 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev, for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { while (!list_empty(>lru[i])) { 
spin_unlock(&bdev->lru_lock); - ret = ttm_mem_evict_first(bdev, man, NULL, &ctx, - NULL); + ret = ttm_mem_evict_first(bdev, man, NULL, + TTM_MAX_BO_PRIORITY, + &ctx, NULL); if (ret) return ret; spin_lock(&bdev->lru_lock); diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 8f4e6366c0417..91299a3b6fcfa 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -396,6 +396,7 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo); int ttm_mem_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man, const struct ttm_place *place, + unsigned int max_priority, struct ttm_operation_ctx *ctx, struct ww_acquire_ctx *ticket); void ttm_mem_unevict_evicted(struct ttm_device *bdev, -- 2.44.0
Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
Am 24.04.24 um 18:57 schrieb Friedrich Vock: This adds GTT to the "preferred domains" of this buffer object, which will also prevent any attempts at moving the buffer back to VRAM if there is space. If VRAM is full, GTT will already be chosen as a fallback. Big NAK to that one, this is mandatory for correct operation. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 4 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 6bbab141eaaeb..aea3770d3ea2e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, goto retry; } - if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) { - initial_domain |= AMDGPU_GEM_DOMAIN_GTT; - goto retry; - } DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n", size, initial_domain, args->in.alignment, r); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 85c10d8086188..9978b85ed6f40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev, AMDGPU_GEM_DOMAIN_GDS)) amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); else - amdgpu_bo_placement_from_domain(bo, bp->domain); + amdgpu_bo_placement_from_domain(bo, bo->allowed_domains); if (bp->type == ttm_bo_type_kernel) bo->tbo.priority = 2; else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE)) -- 2.44.0
Re: [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
Am 24.04.24 um 18:56 schrieb Friedrich Vock: We will never try evicting things from VRAM for these resources anyway. This affects TTM buffer uneviction logic, which would otherwise try to move these buffers into VRAM (clashing with VRAM-only allocations). You are working on outdated code. That change was already done by me as well. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 5834a95d680d9..85c10d8086188 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -127,6 +127,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev); struct ttm_placement *placement = >placement; struct ttm_place *places = abo->placements; + bool skip_vram_busy = false; u64 flags = abo->flags; u32 c = 0; @@ -156,6 +157,13 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; c++; + + /* +* If GTT is preferred by the buffer as well, don't try VRAM when it's +* busy. +*/ + if ((domain & abo->preferred_domains) & AMDGPU_GEM_DOMAIN_GTT) + skip_vram_busy = true; } if (domain & AMDGPU_GEM_DOMAIN_DOORBELL) { @@ -223,6 +231,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_busy_placement = c; placement->busy_placement = places; + + if (skip_vram_busy) { + --placement->num_busy_placement; + ++placement->busy_placement; + } } /** -- 2.44.0
Re: [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation
Am 24.04.24 um 18:56 schrieb Friedrich Vock: When undoing evictions because of decreased memory pressure, it makes no sense to try evicting other buffers. That duplicates some functionality. If a driver doesn't want eviction to happen it just needs to mark the desired placements as non-evictable with the TTM_PL_FLAG_DESIRED flag. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ include/drm/ttm/ttm_bo.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 9a0efbf79316c..3b89fabc2f00a 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -764,6 +764,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, break; if (unlikely(ret != -ENOSPC)) return ret; + if (ctx->no_evict) + return -ENOSPC; ret = ttm_mem_evict_first(bdev, man, place, ctx, ticket); if (unlikely(ret != 0)) diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 8a1a29c6fbc50..a8f21092403d6 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -192,6 +192,7 @@ struct ttm_operation_ctx { bool gfp_retry_mayfail; bool allow_res_evict; bool force_alloc; + bool no_evict; struct dma_resv *resv; uint64_t bytes_moved; }; @@ -358,6 +359,7 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map, return map->virtual; } + int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx); int ttm_bo_validate(struct ttm_buffer_object *bo, -- 2.44.0
Re: [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking
Am 24.04.24 um 18:56 schrieb Friedrich Vock: Make each buffer object aware of whether it has been evicted or not. That reverts some changes we made a couple of years ago. In general the idea is that eviction isn't something we need to reverse in TTM. Rather the driver gives the desired placement. Regards, Christian. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 1 + include/drm/ttm/ttm_bo.h | 11 +++ 2 files changed, 12 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b2..3968b17453569 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo, bo->pin_count = 0; bo->sg = sg; bo->bulk_move = NULL; + bo->evicted_type = TTM_NUM_MEM_TYPES; if (resv) bo->base.resv = resv; else diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 0223a41a64b24..8a1a29c6fbc50 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -121,6 +121,17 @@ struct ttm_buffer_object { unsigned priority; unsigned pin_count; + /** +* @evicted_type: Memory type this BO was evicted from, if any. +* TTM_NUM_MEM_TYPES if this BO was not evicted. +*/ + int evicted_type; + /** +* @evicted: Entry in the evicted list for the resource manager +* this BO was evicted from. +*/ + struct list_head evicted; + /** * @delayed_delete: Work item used when we can't delete the BO * immediately -- 2.44.0
Re: [PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 16:31 schrieb Tim Huang: From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..fb5de23fa8d8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes); break; default: - break; + dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id); + return; } if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
Re: [PATCH] drm/amd/display: re-indent dc_power_down_on_boot()
On 24.04.24 at 15:20, Dan Carpenter wrote: On Wed, Apr 24, 2024 at 03:11:08PM +0200, Christian König wrote: On 24.04.24 at 13:41, Dan Carpenter wrote: These lines are indented too far. Clean the whitespace. Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/display/dc/core/dc.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 8eefba757da4..f64d7229eb6c 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -5043,11 +5043,10 @@ void dc_interrupt_ack(struct dc *dc, enum dc_irq_source src) void dc_power_down_on_boot(struct dc *dc) { if (dc->ctx->dce_environment != DCE_ENV_VIRTUAL_HW && - dc->hwss.power_down_on_boot) { - - if (dc->caps.ips_support) - dc_exit_ips_for_hw_access(dc); + dc->hwss.power_down_on_boot) { + if (dc->caps.ips_support) + dc_exit_ips_for_hw_access(dc); Well, while at it, can't the two ifs be merged here? (I don't know this code too well, but it looks like it). I'm sorry, I don't see what you're saying. The indentation was so messed up that I thought the call to power_down_on_boot() was after both ifs, but it is still inside the first. So your patch is actually right, sorry for the noise. Regards, Christian. I probably should have deleted the other blank line as well, though. It introduces a checkpatch.pl --strict warning. regards, dan carpenter
Re: [PATCH] drm/amd/display: re-indent dc_power_down_on_boot()
Am 24.04.24 um 13:41 schrieb Dan Carpenter: These lines are indented too far. Clean the whitespace. Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/display/dc/core/dc.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 8eefba757da4..f64d7229eb6c 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -5043,11 +5043,10 @@ void dc_interrupt_ack(struct dc *dc, enum dc_irq_source src) void dc_power_down_on_boot(struct dc *dc) { if (dc->ctx->dce_environment != DCE_ENV_VIRTUAL_HW && - dc->hwss.power_down_on_boot) { - - if (dc->caps.ips_support) - dc_exit_ips_for_hw_access(dc); + dc->hwss.power_down_on_boot) { + if (dc->caps.ips_support) + dc_exit_ips_for_hw_access(dc); Well while at it can't the two ifs be merged here? (I don't know this code to well, but it looks like it). Regards, Christian. dc->hwss.power_down_on_boot(dc); } }
Re: [PATCH 2/3] drm/amdgpu: Initialize timestamp for some legacy SOCs
On 24.04.24 at 12:03, Ma Jun wrote: Initialize the interrupt timestamp for some legacy SOCs to fix the Coverity issue "Uninitialized scalar variable" Signed-off-by: Ma Jun Suggested-by: Christian König Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 7e6d09730e6d..665c63f55278 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -445,6 +445,14 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, entry.ih = ih; entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; + + /* +* timestamp is not supported on some legacy SOCs (cik, cz, iceland, +* si and tonga), so initialize timestamp and timestamp_src to 0 +*/ + entry.timestamp = 0; + entry.timestamp_src = 0; + amdgpu_ih_decode_iv(adev, &entry); trace_amdgpu_iv(ih - &adev->irq.ih, &entry);
Re: [PATCH v3] drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte
Am 24.04.24 um 11:36 schrieb Bob Zhou: After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou Suggested-by: Christian König Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 47 +++-- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c index 82608df43396..e0f3bff335c4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c @@ -280,7 +280,7 @@ amdgpu_i2c_lookup(struct amdgpu_device *adev, return NULL; } -static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, +static int amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, u8 slave_addr, u8 addr, u8 *val) @@ -305,16 +305,18 @@ static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, out_buf[0] = addr; out_buf[1] = 0; - if (i2c_transfer(_bus->adapter, msgs, 2) == 2) { - *val = in_buf[0]; - DRM_DEBUG("val = 0x%02x\n", *val); - } else { - DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", - addr, *val); + if (i2c_transfer(_bus->adapter, msgs, 2) != 2) { + DRM_DEBUG("i2c 0x%02x read failed\n", addr); + return -EIO; } + + *val = in_buf[0]; + DRM_DEBUG("val = 0x%02x\n", *val); + + return 0; } -static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, +static int amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, u8 slave_addr, u8 addr, u8 val) @@ -330,9 +332,12 @@ static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, out_buf[0] = addr; out_buf[1] = val; - if (i2c_transfer(_bus->adapter, , 1) != 1) - DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", - addr, val); + if (i2c_transfer(_bus->adapter, , 1) != 1) { + DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", addr, val); + return -EIO; + } + + return 0; } /* ddc router switching */ @@ -347,16 +352,18 @@ amdgpu_i2c_router_select_ddc_port(const struct amdgpu_connector *amdgpu_connecto if 
(!amdgpu_connector->router_bus) return; - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x3, ); + 0x3, )) + return; val &= ~amdgpu_connector->router.ddc_mux_control_pin; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, 0x3, val); - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x1, ); + 0x1, )) + return; val &= ~amdgpu_connector->router.ddc_mux_control_pin; val |= amdgpu_connector->router.ddc_mux_state; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, @@ -376,16 +383,18 @@ amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector *amdgpu_connector if (!amdgpu_connector->router_bus) return; - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x3, ); + 0x3, )) + return; val &= ~amdgpu_connector->router.cd_mux_control_pin; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, 0x3, val); - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x1, ); + 0x1, )) + return; val &= ~amdgpu_connector->router.cd_mux_control_pin; val |= amdgpu_connector->router.cd_mux_state; amdgpu_i2c_put_byte(amdgpu_connector->router_bus,
Re: [PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
On 24.04.24 at 11:04, Jesse Zhang wrote: Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff. (Christian) Signed-off-by: Jesse Zhang Suggested-by: Christian König Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 59acf424a078..968ca2c84ef7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -743,7 +743,8 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t created = 0; uint32_t allocated = 0; uint32_t tmp, handle = 0; - uint32_t *size = &tmp; + uint32_t dummy = 0xffffffff; + uint32_t *size = &dummy; unsigned int idx; int i, r = 0;
Re: [PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
On 24.04.24 at 10:41, Jesse Zhang wrote: Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff. (Christian) Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 59acf424a078..1929de0db3a1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t destroyed = 0; uint32_t created = 0; uint32_t allocated = 0; - uint32_t tmp, handle = 0; + uint32_t tmp = 0xffffffff, handle = 0; That's close, but what I meant was to have something like this instead: uint32_t dummy = 0xffffffff; uint32_t *size = &dummy; Because tmp is overwritten by user values while parsing the command stream. Regards, Christian. uint32_t *size = &tmp; unsigned int idx; int i, r = 0;
Re: [PATCH v2] drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte
Am 24.04.24 um 09:52 schrieb Bob Zhou: After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou Looks good in general, just some coding style comments below. --- drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 42 +++-- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c index 82608df43396..c588704d56a7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c @@ -280,11 +280,12 @@ amdgpu_i2c_lookup(struct amdgpu_device *adev, return NULL; } -static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, +static int amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, u8 slave_addr, u8 addr, u8 *val) { + int r = 0; BTW: Short variables like i and r should be declared last. I don't care much about that personally, but some upstream maintainers insist on that. And never initialize a variable if you don't need it. This will be complained about by automated checkers as well. u8 out_buf[2]; u8 in_buf[2]; struct i2c_msg msgs[] = { @@ -309,16 +310,18 @@ static void amdgpu_i2c_get_byte(struct amdgpu_i2c_chan *i2c_bus, *val = in_buf[0]; DRM_DEBUG("val = 0x%02x\n", *val); } else { - DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", - addr, *val); + r = -EIO; + DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", addr, *val); } + return r; Better to write it like this: if (error_condition) { DRM_DEBUG("i2c 0x%02x read failed\n", addr); return -EIO) } *val = in_buf[0]; DRM_DEBUG("val = 0x%02x\n", *val); Printing *val in the error condition will result in use of uninitialized value as well. 
} -static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, +static int amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, u8 slave_addr, u8 addr, u8 val) { + int r = 0; uint8_t out_buf[2]; struct i2c_msg msg = { .addr = slave_addr, @@ -330,9 +333,12 @@ static void amdgpu_i2c_put_byte(struct amdgpu_i2c_chan *i2c_bus, out_buf[0] = addr; out_buf[1] = val; - if (i2c_transfer(_bus->adapter, , 1) != 1) - DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", - addr, val); + if (i2c_transfer(_bus->adapter, , 1) != 1) { + r = -EIO; + DRM_DEBUG("i2c 0x%02x 0x%02x write failed\n", addr, val); + } + + return r; Same here. As long as you don't have anything to cleanup just use "return -EIO" and "return 0;" directly. Regards, Christian. } /* ddc router switching */ @@ -347,16 +353,18 @@ amdgpu_i2c_router_select_ddc_port(const struct amdgpu_connector *amdgpu_connecto if (!amdgpu_connector->router_bus) return; - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x3, ); + 0x3, )) + return; val &= ~amdgpu_connector->router.ddc_mux_control_pin; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, 0x3, val); - amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x1, ); + 0x1, )) + return; val &= ~amdgpu_connector->router.ddc_mux_control_pin; val |= amdgpu_connector->router.ddc_mux_state; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, @@ -368,7 +376,7 @@ amdgpu_i2c_router_select_ddc_port(const struct amdgpu_connector *amdgpu_connecto void amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector *amdgpu_connector) { - u8 val; + u8 val = 0; if (!amdgpu_connector->router.cd_valid) return; @@ -376,16 +384,18 @@ amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector *amdgpu_connector if (!amdgpu_connector->router_bus) return; - 
amdgpu_i2c_get_byte(amdgpu_connector->router_bus, + if (amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, - 0x3, ); + 0x3, )) + return; val &=
Re: [PATCH v3] drm/amdgpu: Modify the contiguous flags behaviour
Am 24.04.24 um 09:13 schrieb Arunpravin Paneer Selvam: Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we setTTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is(Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop(Christian) - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block (Christian) - Keep the kernel BO allocation as is(Christain) - If BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages(Philip) v3(Christian): - keep contiguous flag handling outside of pages_per_block calculation - remove the hacky implementation in contiguous flag error handling code Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 83 ++-- 2 files changed, 65 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 492aebc44e51..c594d2a5978e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -154,8 +154,10 @@ void amdgpu_bo_placement_from_domain(struct 
amdgpu_bo *abo, u32 domain) else places[c].flags |= TTM_PL_FLAG_TOPDOWN; - if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) + if (abo->tbo.type == ttm_bo_type_kernel && + flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; + c++; } @@ -965,6 +967,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, if (!bo->placements[i].lpfn || (lpfn && lpfn < bo->placements[i].lpfn)) bo->placements[i].lpfn = lpfn; + + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + bo->placements[i].mem_type == TTM_PL_VRAM) + bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; } r = ttm_bo_validate(>tbo, >placement, ); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..17c5d9ce9927 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -88,6 +88,23 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head) return size; } +static inline void amdgpu_vram_mgr_limit_min_block_size(unsigned long pages_per_block, + u64 size, + u64 *min_block_size, + bool contiguous_enabled) +{ + if (contiguous_enabled) + return; + + /* +* if size >= 2MiB, limit the min_block_size to 2MiB +* for better TLB usage. 
+*/ + if ((size >= (u64)pages_per_block << PAGE_SHIFT) && + !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) + *min_block_size = (u64)pages_per_block << PAGE_SHIFT; +} + /** * DOC: mem_info_vram_total * @@ -452,11 +469,12 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, struct amdgpu_device *adev = to_amdgpu_device(mgr); struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); u64 vis_usage = 0, max_bytes, min_block_size; + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); struct amdgpu_vram_mgr_resource *vres; u64 size, remaining_size, lpfn, fpfn; struct drm_buddy *mm = >mm; - struct drm_buddy_block *block; unsigned long pages_per_block; + struct drm_buddy_block *block; int r; lpfn = (u64)place->lpfn << PAGE_SHIFT; @@ -469,18 +487,14 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, if (tbo->type != ttm_bo_type_kernel) max_byt
Re: [PATCH 4/4] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
On 24.04.24 at 04:50, jesse.zh...@amd.com wrote: From: Jesse Zhang Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301. Signed-off-by: Jesse Zhang To really improve the handling we would actually need to have a separate value of 0xffffffff. Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 59acf424a078..60d97cd14855 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t destroyed = 0; uint32_t created = 0; uint32_t allocated = 0; - uint32_t tmp, handle = 0; + uint32_t tmp = 0, handle = 0; uint32_t *size = &tmp; unsigned int idx; int i, r = 0;
Re: [PATCH] drm/amdgpu: fix some uninitialized variables
Am 24.04.24 um 04:04 schrieb Zhang, Jesse(Jie): [AMD Official Use Only - General] Hi Alex, -Original Message- From: Alex Deucher Sent: Wednesday, April 24, 2024 9:46 AM To: Zhang, Jesse(Jie) Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Koenig, Christian ; Huang, Tim Subject: Re: [PATCH] drm/amdgpu: fix some uninitialized variables On Tue, Apr 23, 2024 at 9:27 PM Jesse Zhang wrote: Fix some variables not initialized before use. Scan them out using Synopsys tools. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 + drivers/gpu/drm/amd/amdgpu/atom.c | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 ++- Please split out the SDMA changes into a separate patch. More comments below on the other hunks. 6 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 59acf424a078..60d97cd14855 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t destroyed = 0; uint32_t created = 0; uint32_t allocated = 0; - uint32_t tmp, handle = 0; + uint32_t tmp = 0, handle = 0; Can you elaborate on what the issue is here? Presumably it's warning about size being passed as a parameter in the function? [Zhang, Jesse(Jie)] Using uninitialized value *size when calling amdgpu_vce_cs_reloc for case 0x0301. Because uint32_t *size = I'm not sure if other commands initialize the size before running case 0x0301. Ah! Yeah, that handling is actually correct. The size might be uninitialized in this function when the command stream isn't valid. We could instead set size to NULL and check that everywhere, but that would probably be overkill. 
Well we could silence the warning by setting tmp to zero, but that actually doesn't improve anything. Regards, Christian. uint32_t *size = &tmp; unsigned int idx; int i, r = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 677eb141554e..13125ddd5e86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -410,6 +410,11 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work) else new_state.fw_based = VCN_DPG_STATE__UNPAUSE; + if (amdgpu_fence_count_emitted(adev->jpeg.inst->ring_dec)) + new_state.jpeg = VCN_DPG_STATE__PAUSE; + else + new_state.jpeg = VCN_DPG_STATE__UNPAUSE; + adev->vcn.pause_dpg_mode(adev, j, &new_state); } This should be a separate patch as well. Thanks for your reminder, Alex, I will. diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c index 72362df352f6..d552e013354c 100644 --- a/drivers/gpu/drm/amd/amdgpu/atom.c +++ b/drivers/gpu/drm/amd/amdgpu/atom.c @@ -1243,6 +1243,7 @@ static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index, ectx.ps_size = params_size; ectx.abort = false; ectx.last_jump = 0; + ectx.last_jump_jiffies = 0; if (ws) { ectx.ws = kcalloc(4, ws, GFP_KERNEL); ectx.ws_size = ws; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c index 45a2d0a5a2d7..b7d33d78bce0 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c @@ -999,7 +999,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring *ring) r = amdgpu_ring_alloc(ring, 20); if (r) { DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", ring->idx, r); - amdgpu_device_wb_free(adev, index); + if (!ring->is_mes_queue) + amdgpu_device_wb_free(adev, index); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 43e64b2da575..cc9e961f0078 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -839,7 +839,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring *ring) r = amdgpu_ring_alloc(ring, 20); if (r) { DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", ring->idx, r); - amdgpu_device_wb_free(adev, index); + if (!ring->is_mes_queue) + amdgpu_device_wb_free(adev, index); return r; } diff --git
Re: [PATCH] drm/amdgpu: fix some uninitialized variables
Am 24.04.24 um 03:19 schrieb Jesse Zhang: Fix some variables not initialized before use. Scan them out using Synopsys tools. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 + drivers/gpu/drm/amd/amdgpu/atom.c | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 ++- 6 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 59acf424a078..60d97cd14855 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t destroyed = 0; uint32_t created = 0; uint32_t allocated = 0; - uint32_t tmp, handle = 0; + uint32_t tmp = 0, handle = 0; As far as I can see that isn't correct. tmp isn't used before it is written. Is the tool maybe broken? Regards, Christian. 
uint32_t *size = &tmp; unsigned int idx; int i, r = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 677eb141554e..13125ddd5e86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -410,6 +410,11 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work) else new_state.fw_based = VCN_DPG_STATE__UNPAUSE; + if (amdgpu_fence_count_emitted(adev->jpeg.inst->ring_dec)) + new_state.jpeg = VCN_DPG_STATE__PAUSE; + else + new_state.jpeg = VCN_DPG_STATE__UNPAUSE; + adev->vcn.pause_dpg_mode(adev, j, &new_state); } diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c index 72362df352f6..d552e013354c 100644 --- a/drivers/gpu/drm/amd/amdgpu/atom.c +++ b/drivers/gpu/drm/amd/amdgpu/atom.c @@ -1243,6 +1243,7 @@ static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index, ectx.ps_size = params_size; ectx.abort = false; ectx.last_jump = 0; + ectx.last_jump_jiffies = 0; if (ws) { ectx.ws = kcalloc(4, ws, GFP_KERNEL); ectx.ws_size = ws; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c index 45a2d0a5a2d7..b7d33d78bce0 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c @@ -999,7 +999,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring *ring) r = amdgpu_ring_alloc(ring, 20); if (r) { DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", ring->idx, r); - amdgpu_device_wb_free(adev, index); + if (!ring->is_mes_queue) + amdgpu_device_wb_free(adev, index); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 43e64b2da575..cc9e961f0078 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -839,7 +839,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring *ring) r = amdgpu_ring_alloc(ring, 20); if (r) { DRM_ERROR("amdgpu: dma failed to
lock ring %d (%d).\n", ring->idx, r); - amdgpu_device_wb_free(adev, index); + if (!ring->is_mes_queue) + amdgpu_device_wb_free(adev, index); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c index 1f4877195213..c833b6b8373b 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c @@ -861,7 +861,8 @@ static int sdma_v6_0_ring_test_ring(struct amdgpu_ring *ring) r = amdgpu_ring_alloc(ring, 5); if (r) { DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", ring->idx, r); - amdgpu_device_wb_free(adev, index); + if (!ring->is_mes_queue) + amdgpu_device_wb_free(adev, index); return r; }
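The point Alex and Christian settle on above — that `*size` points at the local `tmp` and is only written by some command cases, so an invalid command stream can leave it unread-but-consumed — can be illustrated with a small userspace sketch. This is not the kernel parser; the function and command values are illustrative stand-ins for the `amdgpu_vce_ring_parse_cs()` pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified sketch of the pattern under discussion: the parser hands out a
 * pointer to a local ("uint32_t *size = &tmp;") and only some command cases
 * write through it. If the stream is malformed, tmp is never written, so a
 * case that reads *size would consume an uninitialized value unless tmp
 * starts from a known value — which is exactly what "tmp = 0" guarantees. */
int parse_cmds(const uint32_t *cmds, size_t n, uint32_t *bytes_used)
{
	uint32_t tmp = 0;      /* the fix: deterministic starting value */
	uint32_t *size = &tmp; /* later cases write through this pointer */
	size_t i;

	for (i = 0; i < n; i++) {
		switch (cmds[i]) {
		case 0x0301:             /* a case that consumes *size */
			*bytes_used = *size; /* safe only because tmp == 0 */
			return 0;
		case 0x0002:             /* a case that sets the size */
			if (i + 1 >= n)
				return -1;
			*size = cmds[++i];
			break;
		default:
			return -1;       /* invalid stream: tmp never set */
		}
	}
	return -1;
}
```

With the initialization, a stream that hits case 0x0301 before any size-setting command reads a defined zero instead of stack garbage.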
Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row
Am 23.04.24 um 20:05 schrieb Felix Kuehling: On 2024-04-23 01:50, Christian König wrote: Am 22.04.24 um 21:45 schrieb Yunxiang Li: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other reset sources. NAK, that isn't how this should be handled. Instead all reset sources which are handled by a previous reset should be canceled. In other words there should be a cancel_work(&adev->kfd.reset_work); somewhere in the KFD code. When this doesn't work correctly then that is somehow missing. If you see the use of amdgpu_in_reset() outside of the low level functions then that is clearly a bug. Do we need to do that for all reset workers in the driver separately? I don't see where this is done for other reset workers. Yeah, I think so. But we don't have so many reset workers if I'm not completely mistaken. We have the KFD, FLR, the per engine one in the scheduler and IIRC one more for the CP (illegal operation and register write). I'm not sure about the CP one, but all others should be handled correctly with the V2 patch as far as I can see. Regards, Christian. Regards, Felix Regards, Christian. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 3b4591f554f1..ce3dbb1cc2da 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -283,7 +283,7 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev) void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev) { - if (amdgpu_device_should_recover_gpu(adev)) + if (amdgpu_device_should_recover_gpu(adev) && !amdgpu_in_reset(adev)) amdgpu_reset_domain_schedule(adev->reset_domain, &adev->kfd.reset_work); }
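Christian's argument — that a completed reset has already serviced every reset request queued before it, so those pending requests must be cancelled rather than filtered with amdgpu_in_reset() — can be shown with a toy model. This is not the amdgpu reset code; the sources and struct are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: several sources (KFD, FLR, CP, ...) queue reset requests.
 * When one reset actually runs, everything queued up to that point is
 * already handled and must be dropped, otherwise a second, redundant
 * reset fires right after the first one finishes. */
enum { SRC_KFD, SRC_FLR, SRC_CP, SRC_COUNT };

struct reset_domain {
	bool pending[SRC_COUNT];
	int resets_done;
};

void schedule_reset(struct reset_domain *d, int src)
{
	d->pending[src] = true;
}

void do_reset(struct reset_domain *d)
{
	int i;

	d->resets_done++;
	/* the key step: cancel all requests serviced by this reset */
	for (i = 0; i < SRC_COUNT; i++)
		d->pending[i] = false;
}

void run_pending(struct reset_domain *d)
{
	int i;

	for (i = 0; i < SRC_COUNT; i++)
		if (d->pending[i])
			do_reset(d);
}
```

Without the cancellation loop inside `do_reset()`, two sources queuing during the same hang would produce two back-to-back resets — the exact symptom the patch describes.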
Re: [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation
Am 23.04.24 um 17:28 schrieb Philip Yang: Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 4be8b091099a..ebffb58ea53a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -31,6 +31,8 @@ #include "amdgpu_atomfirmware.h" #include "atom.h" +#define AMDGPU_MAX_SG_SEGMENT_SIZE (2UL << 30) + struct amdgpu_vram_reservation { u64 start; u64 size; @@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); - + size = remaining_size; if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) min_block_size = (u64)pages_per_block << PAGE_SHIFT; @@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); while (cursor.remaining) { num_entries++; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE)); } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); @@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { phys_addr_t phys = cursor.start + adev->gmc.aper_base;
- size_t size = cursor.size; + unsigned long size = min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE); dma_addr_t addr; addr = dma_map_resource(dev, phys, size, dir, @@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, sg_dma_address(sg) = addr; sg_dma_len(sg) = size; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, size); } return 0;
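The counting logic the patch adds — walk the contiguous range and emit one sg entry per 2 GiB chunk, because `struct scatterlist` stores the length in an `unsigned int` — is easy to check in isolation. A minimal userspace sketch of that counting loop (not the kernel function itself):

```c
#include <assert.h>
#include <stdint.h>

#define AMDGPU_MAX_SG_SEGMENT_SIZE (2ULL << 30) /* 2 GiB cap, as in the patch */

/* Count how many scatterlist entries a contiguous allocation of 'size'
 * bytes needs when each segment is capped at 2 GiB. Mirrors the
 * num_entries counting loop around amdgpu_res_next() in the patch. */
unsigned int sg_segments_needed(uint64_t size)
{
	unsigned int num_entries = 0;

	while (size) {
		uint64_t seg = size < AMDGPU_MAX_SG_SEGMENT_SIZE ?
			       size : AMDGPU_MAX_SG_SEGMENT_SIZE;
		size -= seg;
		num_entries++;
	}
	return num_entries;
}
```

So a 5 GiB contiguous buffer maps as three segments (2 GiB + 2 GiB + 1 GiB) instead of failing or overflowing a signed 32-bit length.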
Re: [PATCH v2] drm/amdgpu: Fix two reset triggered in a row
Am 23.04.24 um 16:44 schrieb Yunxiang Li: Sometimes a hung GPU causes multiple reset sources to schedule resets, if the second source schedules after we call amdgpu_device_stop_pending_resets they will be able to trigger an unnecessary reset. Move amdgpu_device_stop_pending_resets to after the reset is already done, since any reset scheduled after that point would be legitimate. Remove unnecessary and incorrect checks for amdgpu_in_reset that were kinda serving this purpose. Signed-off-by: Yunxiang Li Looks really good to me off hand, especially that so many cases of using amdgpu_in_reset() are removed. But I'm just not deeply into each component to fully judge everything here. So only Acked-by: Christian König for now, if you need a more in depth review please ping me. Regards, Christian. --- v2: instead of adding amdgpu_in_reset check, move when we cancel pending resets drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 - drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +- 5 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f8a34db5d9e3..28f6a1c38b17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5061,8 +5061,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, retry: amdgpu_amdkfd_pre_reset(adev); - amdgpu_device_stop_pending_resets(adev); - if (from_hypervisor) r = amdgpu_virt_request_full_gpu(adev, true); else @@ -5813,13 +5811,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, r, adev_to_drm(tmp_adev)->unique); tmp_adev->asic_reset_res = r; } - - if (!amdgpu_sriov_vf(tmp_adev)) - /* - * Drop all pending non scheduler resets.
Scheduler resets * were already dropped during drm_sched_stop */ - amdgpu_device_stop_pending_resets(tmp_adev); } /* Actual ASIC resets if needed.*/ @@ -5841,6 +5832,14 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, goto retry; } + list_for_each_entry(tmp_adev, device_list_handle, reset_list) { + /* +* Drop all pending non scheduler resets. Scheduler resets +* were already dropped during drm_sched_stop +*/ + amdgpu_device_stop_pending_resets(tmp_adev); + } + skip_hw_reset: /* Post ASIC reset for all devs .*/ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 54ab51a4ada7..c2385178d6b3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -597,7 +597,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work) if (ret) { adev->virt.vf2pf_update_retry_cnt++; if ((adev->virt.vf2pf_update_retry_cnt >= AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT) && - amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) { + amdgpu_sriov_runtime(adev)) { amdgpu_ras_set_fed(adev, true); if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->virt.flr_work)) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 0c7275bca8f7..c5ba9c4757a8 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -319,7 +319,7 @@ static int xgpu_ai_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: - if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) + if (amdgpu_sriov_runtime(adev)) WARN_ONCE(!amdgpu_reset_domain_schedule(adev->reset_domain, &adev->virt.flr_work), "Failed to queue work!
at %s", diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index aba00d961627..fa9d1b02f391 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -358,7 +358,7 @@ static int xgpu_nv_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: - if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) + if (amdgpu_sriov_runtime(adev))
Re: [PATCH v3] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 16:31 schrieb Tim Huang: From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..fb5de23fa8d8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes); break; default: - break; + dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id); + return; } if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
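The shape of the v3 fix above — turn the `default:` case of the ucode-id switch into a hard error so `fw_size` can never reach the consumer uninitialized — generalizes to any switch that fills a value from an enum. A hedged userspace sketch (the enum values and sizes are illustrative, not the real amdgpu ucode table):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of the reviewed pattern: every valid case writes fw_size, and the
 * default case rejects the id instead of "break"-ing through with fw_size
 * untouched. This also keeps the compiler/Coverity able to flag a case
 * that forgets to set fw_size, which a blanket "fw_size = 0" would hide. */
enum ucode_id { UCODE_CP_PFP, UCODE_CP_ME, UCODE_INVALID };

int init_microcode(enum ucode_id id, uint32_t *fw_size)
{
	switch (id) {
	case UCODE_CP_PFP:
		*fw_size = 0x1000; /* illustrative size */
		break;
	case UCODE_CP_ME:
		*fw_size = 0x2000; /* illustrative size */
		break;
	default:
		fprintf(stderr, "Invalid ucode id %u\n", (unsigned int)id);
		return -22; /* -EINVAL */
	}
	return 0;
}
```

The design choice Christian pushes for is that the error stays loud at the point of misuse, rather than being papered over with a zero initializer.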
Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation
Am 23.04.24 um 15:18 schrieb Alex Deucher: On Tue, Apr 23, 2024 at 2:57 AM Christian König wrote: Am 22.04.24 um 16:37 schrieb Alex Deucher: As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Wait a second. Why are we using the wb slots dynamically? See patch 2. I needed a way to allocate small GPU accessible memory locations on the fly. Using WB seems like a good solution. That's probably better done with the seq64 allocator. At least the original idea was that it is self containing and can be used by many threads at the same time. Apart from that I really think we need to talk with the MES guys about changing that behavior ASAP. This is really a bug we need to fix and not work around like that. Christian. Alex The number of slots made available is statically calculated, when this is suddenly used dynamically we have quite a bug here. Regards, Christian. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index cac0ca64367b..f87d53e183c3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -502,6 +502,7 @@ struct amdgpu_wb { uint64_tgpu_addr; u32 num_wb; /* Number of wb slots actually reserved for amdgpu. 
*/ unsigned long used[DIV_ROUND_UP(AMDGPU_MAX_WB, BITS_PER_LONG)]; + spinlock_t lock; }; int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f8a34db5d9e3..869256394136 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1482,13 +1482,17 @@ static int amdgpu_device_wb_init(struct amdgpu_device *adev) */ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb) { - unsigned long offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb); + unsigned long flags, offset; + spin_lock_irqsave(&adev->wb.lock, flags); + offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb); if (offset < adev->wb.num_wb) { __set_bit(offset, adev->wb.used); + spin_unlock_irqrestore(&adev->wb.lock, flags); *wb = offset << 3; /* convert to dw offset */ return 0; } else { + spin_unlock_irqrestore(&adev->wb.lock, flags); return -EINVAL; } } @@ -1503,9 +1507,13 @@ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb) */ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb) { + unsigned long flags; + wb >>= 3; + spin_lock_irqsave(&adev->wb.lock, flags); if (wb < adev->wb.num_wb) __clear_bit(wb, adev->wb.used); + spin_unlock_irqrestore(&adev->wb.lock, flags); } /** @@ -4061,6 +4069,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, spin_lock_init(&adev->se_cac_idx_lock); spin_lock_init(&adev->audio_endpt_idx_lock); spin_lock_init(&adev->mm_stats.lock); + spin_lock_init(&adev->wb.lock); INIT_LIST_HEAD(&adev->shadow_list); mutex_init(&adev->shadow_list_lock);
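The race the patch closes is the classic search-then-set gap: without a lock, two callers can both see the same zero bit between `find_first_zero_bit()` and `__set_bit()` and hand out the same slot. A userspace sketch of the locked allocator (pthread mutex standing in for the spinlock, plain bit arithmetic standing in for the kernel bitmap helpers; not the amdgpu code):

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WB 64

/* Bitmap slot allocator: the find-first-zero search and the set must
 * happen atomically once slots are allocated/freed dynamically from
 * several contexts, hence the lock around the whole sequence. */
struct wb {
	unsigned long long used; /* one bit per slot */
	pthread_mutex_t lock;
};

int wb_get(struct wb *wb, unsigned int *slot)
{
	unsigned int i;

	pthread_mutex_lock(&wb->lock);
	for (i = 0; i < MAX_WB; i++) {
		if (!(wb->used & (1ULL << i))) {
			wb->used |= 1ULL << i; /* claim under the lock */
			pthread_mutex_unlock(&wb->lock);
			*slot = i;
			return 0;
		}
	}
	pthread_mutex_unlock(&wb->lock);
	return -1; /* no free slot */
}

void wb_free(struct wb *wb, unsigned int slot)
{
	pthread_mutex_lock(&wb->lock);
	if (slot < MAX_WB)
		wb->used &= ~(1ULL << slot);
	pthread_mutex_unlock(&wb->lock);
}
```

Christian's pushback still applies to the kernel version: the lock fixes the race, but if the slot pool is statically sized, dynamic consumers can also exhaust it, which is a separate design problem.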
Re: [PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device
Am 23.04.24 um 15:04 schrieb Philip Yang: To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource and sg table to avoid null pointer access. Well just to make it clear this patch is really a no-go for upstreaming. The RDMA code isn't upstream as far as I know and doing this here is really not a good idea even internally. Regards, Christian. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 6c7133bf51d8..101a85263b53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -698,12 +698,15 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE); dma_addr_t addr; - addr = dma_map_resource(dev, phys, size, dir, - DMA_ATTR_SKIP_CPU_SYNC); - r = dma_mapping_error(dev, addr); - if (r) - goto error_unmap; - + if (dev) { + addr = dma_map_resource(dev, phys, size, dir, + DMA_ATTR_SKIP_CPU_SYNC); + r = dma_mapping_error(dev, addr); + if (r) + goto error_unmap; + } else { + addr = phys; + } sg_set_page(sg, NULL, size, 0); sg_dma_address(sg) = addr; sg_dma_len(sg) = size; @@ -717,10 +720,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, for_each_sgtable_sg((*sgt), sg, i) { if (!sg->length) continue; - - dma_unmap_resource(dev, sg->dma_address, - sg->length, dir, - DMA_ATTR_SKIP_CPU_SYNC); + if (dev) + dma_unmap_resource(dev, sg->dma_address, + sg->length, dir, + DMA_ATTR_SKIP_CPU_SYNC); } sg_free_table(*sgt); @@ -745,10 +748,12 @@ void amdgpu_vram_mgr_free_sgt(struct device *dev, struct scatterlist *sg; int i; - for_each_sgtable_sg(sgt, sg, i) - dma_unmap_resource(dev, sg->dma_address, - sg->length, dir, - DMA_ATTR_SKIP_CPU_SYNC); + if (dev) { + 
for_each_sgtable_sg(sgt, sg, i) + dma_unmap_resource(dev, sg->dma_address, + sg->length, dir, + DMA_ATTR_SKIP_CPU_SYNC); + } sg_free_table(sgt); kfree(sgt); }
Re: [PATCH v4 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation
Am 23.04.24 um 15:04 schrieb Philip Yang: Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 4be8b091099a..6c7133bf51d8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -31,6 +31,8 @@ #include "amdgpu_atomfirmware.h" #include "atom.h" +#define MAX_SG_SEGMENT_SIZE (2UL << 30) + Please add an AMDGPU prefix before that name. Apart from that looks good to me, Christian. 
struct amdgpu_vram_reservation { u64 start; u64 size; @@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); - + size = remaining_size; if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) min_block_size = (u64)pages_per_block << PAGE_SHIFT; @@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); while (cursor.remaining) { num_entries++; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, min(cursor.size, MAX_SG_SEGMENT_SIZE)); } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); @@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { phys_addr_t phys = cursor.start + adev->gmc.aper_base; - size_t size = cursor.size; + unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE); dma_addr_t addr; addr = dma_map_resource(dev, phys, size, dir, @@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, sg_dma_address(sg) = addr; sg_dma_len(sg) = size; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, size); } return 0;
Re: [PATCH] drm/amdgpu: add error handle to avoid out-of-bounds
Am 23.04.24 um 11:15 schrieb Bob Zhou: If sdma_v4_0_irq_id_to_seq returns -EINVAL, the process should be stopped to avoid an out-of-bounds read, so directly return -EINVAL. Signed-off-by: Bob Zhou Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index e2e3856938ed..101038395c3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -2021,6 +2021,9 @@ static int sdma_v4_0_process_trap_irq(struct amdgpu_device *adev, DRM_DEBUG("IH: SDMA trap\n"); instance = sdma_v4_0_irq_id_to_seq(entry->client_id); + if (instance < 0) + return instance; + switch (entry->ring_id) { case 0: amdgpu_fence_process(&adev->sdma.instance[instance].ring);
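The bug class fixed here is using a "lookup may fail" return value directly as an array index: `sdma_v4_0_irq_id_to_seq()` can return -EINVAL, and `instance[-22]` is an out-of-bounds access. A minimal userspace sketch of the check-before-index pattern (client ids and counts are illustrative, not the real SDMA mapping):

```c
#include <assert.h>

/* Map an interrupt client id to an SDMA instance index, or -EINVAL. */
int irq_id_to_seq(unsigned int client_id)
{
	switch (client_id) {
	case 10: return 0;
	case 11: return 1;
	default: return -22; /* -EINVAL */
	}
}

/* The caller must bail out on a negative result BEFORE indexing,
 * which is exactly the two-line check the patch adds. */
int process_trap_irq(unsigned int client_id, int *fence_counts, int n)
{
	int instance = irq_id_to_seq(client_id);

	if (instance < 0)   /* the added check: propagate the error */
		return instance;
	if (instance >= n)
		return -22;
	fence_counts[instance]++; /* safe: 0 <= instance < n */
	return 0;
}
```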
Re: [PATCH v2] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 10:43 schrieb Tim Huang: From: Tim Huang Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..8d5cdbb99d8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0; struct amdgpu_firmware_info *info = NULL; const struct firmware *ucode_fw; - unsigned int fw_size; + unsigned int fw_size = 0; You don't need that any more when the default case returns. Regards, Christian. switch (ucode_id) { case AMDGPU_UCODE_ID_CP_PFP: @@ -1205,7 +1205,8 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, fw_size = le32_to_cpu(cp_hdr_v2_0->data_size_bytes); break; default: - break; + dev_err(adev->dev, "Invalid ucode id %u\n", ucode_id); + return; } if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 10:12 schrieb Huang, Tim: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Huang, Tim Sent: Tuesday, April 23, 2024 4:01 PM To: Koenig, Christian ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning [AMD Official Use Only - General] [AMD Official Use Only - General] Hi Christian, -Original Message- From: Koenig, Christian Sent: Tuesday, April 23, 2024 3:43 PM To: Huang, Tim ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning Am 23.04.24 um 08:28 schrieb Tim Huang: Clear warning that uses uninitialized value fw_size. In which case is the fw_size uninitialized and why setting it to zero helps in that case? It's a warning that reported by the Coverity scan. When the switch case " switch (ucode_id) " got to default and Condition "adev->firmware.load_type == AMDGPU_FW_LOAD_PSP", taking true branch, it reports " uses uninitialized value fw_size " by this line. "adev->firmware.fw_size += ALIGN(fw_size, PAGE_SIZE);“ It may not happen if we call this function correctly, but it just clears the warning and looks harmless. Hi Christian, I think it more to fix this warning, maybe I need to print an error and just return when go to the default case of "switch (ucode_id)" , will send out a v2 patch. Thanks. Yeah, exactly that's the right idea. Regards, Christian. Regards, Christian. 
Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..6b8a58f501d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0; struct amdgpu_firmware_info *info = NULL; const struct firmware *ucode_fw; - unsigned int fw_size; + unsigned int fw_size = 0; switch (ucode_id) { case AMDGPU_UCODE_ID_CP_PFP:
Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
The problem is that it's a hit all case and that's usually seen as bad coding style. In other words when one branch by accident forgets to set the fw_size we wouldn't get a warning any more and just use zero. Please rather add setting the fw_size to zero to the default branch and maybe even add a warning when that happens. Regards, Christian. Am 23.04.24 um 10:01 schrieb Huang, Tim: [AMD Official Use Only - General] Hi Christian, -Original Message- From: Koenig, Christian Sent: Tuesday, April 23, 2024 3:43 PM To: Huang, Tim ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning Am 23.04.24 um 08:28 schrieb Tim Huang: Clear warning that uses uninitialized value fw_size. In which case is the fw_size uninitialized and why setting it to zero helps in that case? It's a warning that reported by the Coverity scan. When the switch case " switch (ucode_id) " got to default and Condition "adev->firmware.load_type == AMDGPU_FW_LOAD_PSP", taking true branch, it reports " uses uninitialized value fw_size " by this line. "adev->firmware.fw_size += ALIGN(fw_size, PAGE_SIZE);“ It may not happen if we call this function correctly, but it just clears the warning and looks harmless. Regards, Christian. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..6b8a58f501d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0; struct amdgpu_firmware_info *info = NULL; const struct firmware *ucode_fw; - unsigned int fw_size; + unsigned int fw_size = 0; switch (ucode_id) { case AMDGPU_UCODE_ID_CP_PFP:
Re: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
Am 23.04.24 um 08:28 schrieb Tim Huang: Clear warning that uses uninitialized value fw_size. In which case is the fw_size uninitialized and why setting it to zero helps in that case? Regards, Christian. Signed-off-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index d9dc5485..6b8a58f501d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -1084,7 +1084,7 @@ void amdgpu_gfx_cp_init_microcode(struct amdgpu_device *adev, const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0; struct amdgpu_firmware_info *info = NULL; const struct firmware *ucode_fw; - unsigned int fw_size; + unsigned int fw_size = 0; switch (ucode_id) { case AMDGPU_UCODE_ID_CP_PFP:
Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning
In this case we should modify amdgpu_i2c_get_byte() to return an error and prevent writing the value back. See, zero is as random as any other value and initializing the variable here doesn't really help, it just makes your warning disappear. Regards, Christian. Am 23.04.24 um 08:27 schrieb Zhou, Bob: [AMD Official Use Only - General] Thanks for your comments. I should clarify the issue. As you see the amdgpu_i2c_get_byte code: if (i2c_transfer(&i2c_bus->adapter, msgs, 2) == 2) { *val = in_buf[0]; DRM_DEBUG("val = 0x%02x\n", *val); } else { DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", addr, *val); } If the read by amdgpu_i2c_get_byte() fails, the value will not be modified. Then amdgpu_i2c_put_byte() successfully writes the random value back, which will cause unexpected issues. Regards, Bob -Original Message- From: Koenig, Christian Sent: April 23, 2024 14:05 To: Zhou, Bob ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH 2/2] drm/amdgpu: fix uninitialized variable warning Am 23.04.24 um 07:33 schrieb Bob Zhou: Because the val isn't initialized, a random variable is set by amdgpu_i2c_put_byte. So fix the uninitialized issue. Well that isn't correct. See the code here: amdgpu_i2c_get_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, 0x3, &val); val &= ~amdgpu_connector->router.cd_mux_control_pin; amdgpu_i2c_put_byte(amdgpu_connector->router_bus, amdgpu_connector->router.i2c_addr, 0x3, val); The value is first read by amdgpu_i2c_get_byte(), then modified and then written again by amdgpu_i2c_put_byte(). Was this an automated warning? Either way the patch is clearly rejected. Regards, Christian.
Signed-off-by: Bob Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c index 82608df43396..d4d2dc792b60 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c @@ -368,7 +368,7 @@ amdgpu_i2c_router_select_ddc_port(const struct amdgpu_connector *amdgpu_connecto void amdgpu_i2c_router_select_cd_port(const struct amdgpu_connector *amdgpu_connector) { - u8 val; + u8 val = 0; if (!amdgpu_connector->router.cd_valid) return;
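Christian's suggested direction, having the read helper return an error so the caller can skip the write-back entirely, can be sketched in plain C. All names and signatures below are hypothetical stand-ins, not the real amdgpu i2c API (the real amdgpu_i2c_get_byte() currently returns void):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns 0 on success and fills *val, or a negative
 * error code on failure, leaving *val untouched. bus_ok/bus_val stand in
 * for the actual i2c transfer result. */
static int i2c_get_byte(bool bus_ok, uint8_t bus_val, uint8_t *val)
{
	if (!bus_ok)
		return -5; /* -EIO-style error */
	*val = bus_val;
	return 0;
}

/* Caller pattern: bail out instead of doing a read-modify-write on an
 * uninitialized value. Returns the byte that would be written back, or
 * a negative error code, in which case nothing is written. */
static int select_cd_port(bool bus_ok, uint8_t bus_val, uint8_t mux_pin)
{
	uint8_t val;
	int r;

	r = i2c_get_byte(bus_ok, bus_val, &val);
	if (r)
		return r; /* no write-back of garbage */
	val &= (uint8_t)~mux_pin;
	return val; /* would be passed to the put-byte helper */
}
```

With this shape, initializing val is unnecessary: on a failed read the function returns before val is ever used, which fixes the bug instead of just silencing the analyzer.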
Re: [PATCH v2] drm/amdgpu: IB test encode test package change for VCN5
On 22.04.24 at 21:59, Sonny Jiang wrote: From: Sonny Jiang VCN5 session info package interface changed Signed-off-by: Sonny Jiang Mhm, in general we should push back on FW changes that make stuff like that necessary. So what is the justification? If the FW has a good justification for it then in theory we should create new hw generation specific functions. But copying the whole function for vcn_v5_0.c is overkill as well. Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index bb85772b1374..2bebdaaff533 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -851,6 +851,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand struct amdgpu_ib *ib_msg, struct dma_fence **fence) { + struct amdgpu_device *adev = ring->adev; unsigned int ib_size_dw = 16; struct amdgpu_job *job; struct amdgpu_ib *ib; @@ -882,7 +883,10 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand ib->ptr[ib->length_dw++] = handle; ib->ptr[ib->length_dw++] = upper_32_bits(addr); ib->ptr[ib->length_dw++] = addr; - ib->ptr[ib->length_dw++] = 0x000b; + if (amdgpu_ip_version(adev, UVD_HWIP, 0) < IP_VERSION(5, 0, 0)) + ib->ptr[ib->length_dw++] = 0x000b; + else + ib->ptr[ib->length_dw++] = 0x; ib->ptr[ib->length_dw++] = 0x0014; ib->ptr[ib->length_dw++] = 0x0002; /* task info */ @@ -918,6 +922,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han struct amdgpu_ib *ib_msg, struct dma_fence **fence) { + struct amdgpu_device *adev = ring->adev; unsigned int ib_size_dw = 16; struct amdgpu_job *job; struct amdgpu_ib *ib; @@ -949,7 +954,10 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han ib->ptr[ib->length_dw++] = handle; ib->ptr[ib->length_dw++] = upper_32_bits(addr);
ib->ptr[ib->length_dw++] = addr; - ib->ptr[ib->length_dw++] = 0x000b; + if (amdgpu_ip_version(adev, UVD_HWIP, 0) < IP_VERSION(5, 0, 0)) + ib->ptr[ib->length_dw++] = 0x000b; + else + ib->ptr[ib->length_dw++] = 0x; ib->ptr[ib->length_dw++] = 0x0014; ib->ptr[ib->length_dw++] = 0x0002;
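The version gate in the patch relies on the kernel packing an IP version into a single integer so that hardware generations compare with an ordinary `<`. A minimal user-space sketch of that idea (the shift widths here are an assumption for illustration, not the kernel's authoritative IP_VERSION definition):

```c
#include <assert.h>
#include <stdint.h>

/* Pack major/minor/revision into one comparable integer; widths are an
 * illustrative assumption, not the kernel's exact encoding. */
#define IP_VERSION(maj, min, rev) \
	((uint32_t)(maj) << 16 | (uint32_t)(min) << 8 | (uint32_t)(rev))

/* "Older than VCN 5.0.0 keeps the legacy session info word" becomes a
 * single integer compare, which is what the patch's branch does. */
static int vcn_use_legacy_session_info(uint32_t vcn_ip_version)
{
	return vcn_ip_version < IP_VERSION(5, 0, 0);
}
```

Because the fields are packed most-significant first, lexicographic version ordering and numeric ordering coincide, which is why a single branch per packet word is enough and no per-generation function copy is needed.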
Re: [PATCH 1/2] drm/amdgpu: add a spinlock to wb allocation
On 22.04.24 at 16:37, Alex Deucher wrote: As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Wait a second. Why are we using the wb slots dynamically? The number of slots made available is statically calculated; when this is suddenly used dynamically, we have quite a bug here. Regards, Christian. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index cac0ca64367b..f87d53e183c3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -502,6 +502,7 @@ struct amdgpu_wb { uint64_t gpu_addr; u32 num_wb; /* Number of wb slots actually reserved for amdgpu. */ unsigned long used[DIV_ROUND_UP(AMDGPU_MAX_WB, BITS_PER_LONG)]; + spinlock_t lock; }; int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f8a34db5d9e3..869256394136 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1482,13 +1482,17 @@ static int amdgpu_device_wb_init(struct amdgpu_device *adev) */ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 *wb) { - unsigned long offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb); + unsigned long flags, offset; + spin_lock_irqsave(&adev->wb.lock, flags); + offset = find_first_zero_bit(adev->wb.used, adev->wb.num_wb); if (offset < adev->wb.num_wb) { __set_bit(offset, adev->wb.used); + spin_unlock_irqrestore(&adev->wb.lock, flags); *wb = offset << 3; /* convert to dw offset */ return 0; } else { + spin_unlock_irqrestore(&adev->wb.lock, flags); return -EINVAL; } } @@ -1503,9 +1507,13 @@ int amdgpu_device_wb_get(struct amdgpu_device *adev, u32 wb) */ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb) { +
unsigned long flags; + wb >>= 3; + spin_lock_irqsave(&adev->wb.lock, flags); if (wb < adev->wb.num_wb) __clear_bit(wb, adev->wb.used); + spin_unlock_irqrestore(&adev->wb.lock, flags); } /** @@ -4061,6 +4069,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, spin_lock_init(&adev->se_cac_idx_lock); spin_lock_init(&adev->audio_endpt_idx_lock); spin_lock_init(&adev->mm_stats.lock); + spin_lock_init(&adev->wb.lock); INIT_LIST_HEAD(&adev->shadow_list); mutex_init(&adev->shadow_list_lock);
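The allocator being protected here is a find-first-zero-bit scan plus a set, all of which must happen under one lock. A user-space sketch of the same shape, using a C11 atomic_flag spinlock in place of the kernel spinlock and a plain 64-bit mask in place of the used[] bitmap (all names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_WB 64

static atomic_flag wb_lock_flag = ATOMIC_FLAG_INIT;
static uint64_t wb_used;            /* bit i set => slot i allocated */
static unsigned int wb_num = MAX_WB;

static void wb_lock(void)
{
	while (atomic_flag_test_and_set(&wb_lock_flag))
		; /* spin, like the kernel spinlock */
}

static void wb_unlock(void)
{
	atomic_flag_clear(&wb_lock_flag);
}

/* Find-first-zero and set must be one atomic unit, otherwise two callers
 * can both see the same free slot; that is the race the patch closes. */
static int wb_get(unsigned int *wb)
{
	unsigned int offset;

	wb_lock();
	for (offset = 0; offset < wb_num; offset++) {
		if (!(wb_used & (1ULL << offset))) {
			wb_used |= 1ULL << offset;
			wb_unlock();
			*wb = offset << 3; /* convert to dw offset, as amdgpu does */
			return 0;
		}
	}
	wb_unlock();
	return -22; /* -EINVAL-style failure: no free slot */
}

static void wb_free(unsigned int wb)
{
	wb >>= 3;
	wb_lock();
	if (wb < wb_num)
		wb_used &= ~(1ULL << wb);
	wb_unlock();
}
```

The driver uses spin_lock_irqsave() rather than a plain spinlock because the free path can be reached from interrupt context; the sketch above only models the mutual exclusion, not the IRQ masking.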
Re: [PATCH 3/3] drm/amdgpu: Fix Uninitialized scalar variable warning
On 23.04.24 at 04:53, Ma, Jun wrote: unsigned int client_id, src_id; struct amdgpu_irq_src *src; bool handled = false; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 924baf58e322..f0a63d084b4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1559,7 +1559,7 @@ static int amdgpu_debugfs_firmware_info_show(struct seq_file *m, void *unused) { struct amdgpu_device *adev = m->private; struct drm_amdgpu_info_firmware fw_info; - struct drm_amdgpu_query_fw query_fw; + struct drm_amdgpu_query_fw query_fw = {0}; Coverity warning: uninit_use_in_call Using uninitialized value query_fw.index when calling amdgpu_firmware_info Even though query_fw.index was assigned a value before it is used, there is still a Coverity warning. We need to initialize query_fw when we declare it. But initializing it to zero doesn't sound correct either. The amdgpu_firmware_info() function is designed to return the FW info for a specific block; if the block isn't specified then Coverity is right that we have a coding error here. Just initializing the value silences Coverity but is most likely not the right thing to do. Regards, Christian. struct atom_context *ctx = adev->mode_info.atom_context; uint8_t smu_program, smu_major, smu_minor, smu_debug; int ret, i; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 2b99eed5ba19..41ac3319108b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -120,7 +120,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id, int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode) { struct amdgpu_device *adev = xcp_mgr->adev; - struct amdgpu_xcp_ip ip; + struct amdgpu_xcp_ip ip = {0}; Coverity warning: Using uninitialized value ip. Field ip.valid is uninitialized when calling __amdgpu_xcp_add_block The code is ok.
We just need to initialize the variable ip. Regards, Ma Jun uint8_t mem_id; int i, j, ret;
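Christian's distinction between silencing the tool and fixing the logic can be shown in a small sketch: `= {0}` makes every member defined, but a query is only meaningful if the fields the callee actually reads are assigned deliberately before the call. Everything below is a hypothetical stand-in, not the real drm_amdgpu_query_fw layout:

```c
#include <assert.h>

/* Hypothetical query struct: which firmware block, and which instance. */
struct query_fw {
	unsigned int fw_type;
	unsigned int index;
};

/* Hypothetical consumer: its result depends on index, which is why a
 * stale or uninitialized index is a real bug, not just a warning. */
static int firmware_info(const struct query_fw *q)
{
	return (int)(q->fw_type * 100 + q->index);
}

/* "= {0}" zero-fills every member, so the analyzer is satisfied, but the
 * call only returns the right data because each field is also assigned
 * explicitly before use. Zero-init alone would silently query block 0. */
static int query_block(unsigned int fw_type, unsigned int index)
{
	struct query_fw q = {0}; /* all members defined... */

	q.fw_type = fw_type;     /* ...but still set what the callee reads */
	q.index = index;
	return firmware_info(&q);
}
```

This is why the two hunks in the patch deserve different verdicts: for amdgpu_xcp_init() the fields are genuinely set before use and `{0}` is harmless, while for query_fw a zero index changes which firmware is reported, so the right fix is to prove or enforce the assignment, not to default it.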
Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row
On 23.04.24 at 05:13, Li, Yunxiang (Teddy) wrote: [Public] We can't do this technically as there are cases where we skip full device reset (even then amdgpu_in_reset will return true). The better thing to do is to move amdgpu_device_stop_pending_resets() later in gpu_recover() - if a device has undergone full reset, then cancel all pending resets. Presently it's happening earlier, which could be why this issue is seen. This sounds like it is a design issue then, if different reset workers expect different resets to be triggered but they all use the same flag. I wonder if the other places that check this flag are correct. FWIW I was testing with SRIOV where it always does full reset and ran into this issue. Lijo is correct. The idea here is that all reset sources which have been covered by a reset are canceled directly after the reset is completed. The approach with checking amdgpu_in_reset() is broken because it can still happen that multiple sources signal at the same time that a reset is necessary. Regards, Christian.
Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row
On 22.04.24 at 21:45, Yunxiang Li wrote: The reset request from KFD is missing a check for whether a reset is already in progress; this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other reset sources. NAK, that isn't how this should be handled. Instead, all reset sources which are handled by a previous reset should be canceled. In other words there should be a cancel_work(&adev->kfd.reset_work); somewhere in the KFD code. When this doesn't work correctly then that is somehow missing. If you see the use of amdgpu_in_reset() outside of the low level functions then that is clearly a bug. Regards, Christian. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 3b4591f554f1..ce3dbb1cc2da 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -283,7 +283,7 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev) void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev) { - if (amdgpu_device_should_recover_gpu(adev)) + if (amdgpu_device_should_recover_gpu(adev) && !amdgpu_in_reset(adev)) amdgpu_reset_domain_schedule(adev->reset_domain, &adev->kfd.reset_work); }
Re: [PATCH] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move()
On 18.04.24 at 18:10, Alex Deucher wrote: On Thu, Mar 21, 2024 at 10:37 AM Christian König wrote: On 21.03.24 at 15:12, Tvrtko Ursulin wrote: On 21/03/2024 12:43, Christian König wrote: This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. Signed-off-by: Christian König Don't forget: Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Ah, thanks. I already wanted to ask if there is any bug report about that as well. Did this ever land? I don't see it anywhere. No, I never found time to actually rebase and push it. Just did so 10 minutes ago; it should probably show up in amd-staging-drm-next unless there's another CI hiccup. Christian. Alex Regards, Christian.
;) Regards, Tvrtko --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 48 -- 3 files changed, 37 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 425cebcc5cbf..eb7d824763b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1245,19 +1245,20 @@ int amdgpu_bo_get_metadata(struct amdgpu_bo *bo, void *buffer, * amdgpu_bo_move_notify - notification about a memory move * @bo: pointer to a buffer object * @evict: if this move is evicting the buffer from the graphics address space + * @new_mem: new resource for backing the BO * * Marks the corresponding &amdgpu_bo buffer object as invalid, also performs * bookkeeping. * TTM driver callback which is called when ttm moves a buffer. */ -void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, bool evict) +void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, + bool evict, + struct ttm_resource *new_mem) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); +struct ttm_resource *old_mem = bo->resource; struct amdgpu_bo *abo; -if (!amdgpu_bo_is_amdgpu_bo(bo)) -return; - abo = ttm_to_amdgpu_bo(bo); amdgpu_vm_bo_invalidate(adev, abo, evict); @@ -1267,9 +1268,9 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, bool evict) bo->resource->mem_type != TTM_PL_SYSTEM) dma_buf_move_notify(abo->tbo.base.dma_buf); -/* remember the eviction */ -if (evict) -atomic64_inc(&adev->num_evictions); +/* move_notify is called before move happens */ +trace_amdgpu_bo_move(abo, new_mem ? new_mem->mem_type : -1, + old_mem ?
old_mem->mem_type : -1); } void amdgpu_bo_get_memory(struct amdgpu_bo *bo, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index a3ea8a82db23..d28e21baef16 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -344,7 +344,9 @@ int amdgpu_bo_set_metadata (struct amdgpu_bo *bo, void *metadata, int amdgpu_bo_get_metadata(struct amdgpu_bo *bo, void *buffer, size_t buffer_size, uint32_t *metadata_size, uint64_t *flags); -void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, bool evict); +void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, + bool evict, + struct ttm_resource *new_mem); void amdgpu_bo_release_notify(struct ttm_buffer_object *bo); vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo); void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index a5ceec7820cf..460b23918bfc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -471,14 +471,16 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict, if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM && bo->ttm == NULL)) { +amdgpu_bo_move_notify(bo, evict, new_mem); ttm_bo_move_null(bo, new_mem); -goto out; +return 0; } if (old_mem->mem_type == TTM_PL_SYSTEM && (new_mem->mem_type == TTM_PL_TT || new_mem->mem_type == AMDGPU_PL_PREEMPT)) { +amdgpu_bo_move_notify(bo, evict, new_mem); tt
Re: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move speed metrics
On 16.04.24 at 10:51, Prike Liang wrote: Add the amdgpu buffer object move speed metrics. What should that be good for? It adds quite a bunch of complexity for a feature we actually want to deprecate. Regards, Christian. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 78 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- 3 files changed, 61 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 163d221b3bbd..2840f1536b51 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -502,7 +502,7 @@ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb); /* * Benchmarking */ -int amdgpu_benchmark(struct amdgpu_device *adev, int test_number); +int amdgpu_benchmark(struct amdgpu_device *adev, int test_number, struct seq_file *m); int amdgpu_benchmark_dump(struct amdgpu_device *adev, struct seq_file *m); /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c index f6848b574dea..fcd186ca088a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c @@ -65,20 +65,27 @@ static void amdgpu_benchmark_log_results(struct amdgpu_device *adev, int n, unsigned size, s64 time_ms, unsigned sdomain, unsigned ddomain, -char *kind) +char *kind, struct seq_file *m) { s64 throughput = (n * (size >> 10)); throughput = div64_s64(throughput, time_ms); - dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB from" -" %d to %d in %lld ms, throughput: %lld Mb/s or %lld MB/s\n", -kind, n, size >> 10, sdomain, ddomain, time_ms, -throughput * 8, throughput); + if (m) { + seq_printf(m, "\tamdgpu: %s %u bo moves of %u kB from" +" %d to %d in %lld ms, throughput: %lld Mb/s or %lld MB/s\n", + kind, n, size >> 10, sdomain, ddomain, time_ms, + throughput * 8, throughput); + } else { + dev_info(adev->dev,
"amdgpu: %s %u bo moves of %u kB from" +" %d to %d in %lld ms, throughput: %lld Mb/s or %lld MB/s\n", + kind, n, size >> 10, sdomain, ddomain, time_ms, + throughput * 8, throughput); + } } static int amdgpu_benchmark_move(struct amdgpu_device *adev, unsigned size, -unsigned sdomain, unsigned ddomain) +unsigned sdomain, unsigned ddomain, struct seq_file *m) { struct amdgpu_bo *dobj = NULL; struct amdgpu_bo *sobj = NULL; @@ -109,7 +116,7 @@ static int amdgpu_benchmark_move(struct amdgpu_device *adev, unsigned size, goto out_cleanup; else amdgpu_benchmark_log_results(adev, n, size, time_ms, -sdomain, ddomain, "dma"); +sdomain, ddomain, "dma", m); } out_cleanup: @@ -124,7 +131,7 @@ static int amdgpu_benchmark_move(struct amdgpu_device *adev, unsigned size, return r; } -int amdgpu_benchmark(struct amdgpu_device *adev, int test_number) +int amdgpu_benchmark(struct amdgpu_device *adev, int test_number, struct seq_file *m) { int i, r; static const int common_modes[AMDGPU_BENCHMARK_COMMON_MODES_N] = { @@ -153,13 +160,16 @@ int amdgpu_benchmark(struct amdgpu_device *adev, int test_number) dev_info(adev->dev, "benchmark test: %d (simple test, VRAM to GTT and GTT to VRAM)\n", test_number); + if (m) + seq_printf(m, "\tbenchmark test: %d (simple test, VRAM to GTT and GTT to VRAM)\n", +test_number); /* simple test, VRAM to GTT and GTT to VRAM */ r = amdgpu_benchmark_move(adev, 1024*1024, AMDGPU_GEM_DOMAIN_GTT, - AMDGPU_GEM_DOMAIN_VRAM); + AMDGPU_GEM_DOMAIN_VRAM, m); if (r) goto done; r = amdgpu_benchmark_move(adev, 1024*1024, AMDGPU_GEM_DOMAIN_VRAM, - AMDGPU_GEM_DOMAIN_GTT); + AMDGPU_GEM_DOMAIN_GTT, m); if (r) goto done; break; @@ -167,9 +177,13 @@ int amdgpu_benchmark(struct amdgpu_device *adev, int test_number)
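The throughput number in the log comes from simple integer math: total KiB moved divided by elapsed milliseconds, which the driver then labels MB/s (KiB/ms and MB/s differ only by the 1000/1024 factor). A standalone mirror of that computation, with a function name of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the benchmark's reporting math: n copies of size_bytes each,
 * completed in time_ms milliseconds. Matches the driver's
 * div64_s64(n * (size >> 10), time_ms) expression. */
static int64_t throughput_mbs(int64_t n, int64_t size_bytes, int64_t time_ms)
{
	int64_t kib_moved = n * (size_bytes >> 10); /* total KiB moved */

	return kib_moved / time_ms; /* KiB per ms, logged as "MB/s" */
}
```

For example, 100 moves of a 1 MiB buffer in 50 ms comes out as 2048, and the "%lld Mb/s" field in the log line is just this value times 8.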
Re: [PATCH v3 6/7] drm/amdgpu: Skip dma map resource for null RDMA device
On 22.04.24 at 15:57, Philip Yang wrote: To test RDMA using a dummy driver on a system without a NIC/RDMA device, the get/put dma pages calls pass in a null device pointer; skip the dma map/unmap of the resource and sg table to avoid a null pointer access. Well that is completely illegal and would break IOMMU. Why does the RDMA driver do that in the first place? Regards, Christian. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 9fe56a21ef88..0caf2c89ef1d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -705,12 +705,15 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE); dma_addr_t addr; - addr = dma_map_resource(dev, phys, size, dir, - DMA_ATTR_SKIP_CPU_SYNC); - r = dma_mapping_error(dev, addr); - if (r) - goto error_unmap; - + if (dev) { + addr = dma_map_resource(dev, phys, size, dir, + DMA_ATTR_SKIP_CPU_SYNC); + r = dma_mapping_error(dev, addr); + if (r) + goto error_unmap; + } else { + addr = phys; + } sg_set_page(sg, NULL, size, 0); sg_dma_address(sg) = addr; sg_dma_len(sg) = size; @@ -724,10 +727,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, for_each_sgtable_sg((*sgt), sg, i) { if (!sg->length) continue; - - dma_unmap_resource(dev, sg->dma_address, - sg->length, dir, - DMA_ATTR_SKIP_CPU_SYNC); + if (dev) + dma_unmap_resource(dev, sg->dma_address, + sg->length, dir, + DMA_ATTR_SKIP_CPU_SYNC); } sg_free_table(*sgt); @@ -752,10 +755,12 @@ void amdgpu_vram_mgr_free_sgt(struct device *dev, struct scatterlist *sg; int i; - for_each_sgtable_sg(sgt, sg, i) - dma_unmap_resource(dev, sg->dma_address, - sg->length, dir, - DMA_ATTR_SKIP_CPU_SYNC); + if (dev) { + for_each_sgtable_sg(sgt, sg, i) + dma_unmap_resource(dev, sg->dma_address, +
sg->length, dir, + DMA_ATTR_SKIP_CPU_SYNC); + } sg_free_table(sgt); kfree(sgt); }
Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7
On 22.04.24 at 16:40, Alex Deucher wrote: On Mon, Apr 22, 2024 at 9:00 AM Christian König wrote: On 22.04.24 at 14:33, Qiang Ma wrote: On Mon, 22 Apr 2024 11:40:26 +0200 Christian König wrote: On 22.04.24 at 07:26, Qiang Ma wrote: Some boards (like Oland PRO: 0x1002:0x6613) seem to have garbage in the upper 16 bits of the vram size register, kern log as follows: [6.00] [drm] Detected VRAM RAM=2256537600M, BAR=256M [6.007812] [drm] RAM width 64bits GDDR5 [6.031250] [drm] amdgpu: 2256537600M of VRAM memory ready This is obviously not true, check for this and clamp the size properly. Fixes boards reporting bogus amounts of vram, kern log as follows: [2.789062] [drm] Probable bad vram size: 0x86800800 [2.789062] [drm] Detected VRAM RAM=2048M, BAR=256M [2.789062] [drm] RAM width 64bits GDDR5 [2.789062] [drm] amdgpu: 2048M of VRAM memory ready Well we had patches like this one here before and so far we always rejected them. When the mmCONFIG_MEMSIZE register isn't properly initialized then there is something wrong with your hardware. Working around that in the software driver is not going to fly. Regards, Christian. Hi Christian: I see that two patches for this issue have been merged, and the patches are as follows: 11544d77e397 drm/amdgpu: fixup bad vram size on gmc v8 0ca223b029a2 drm/radeon: fixup bad vram size on SI Mhm, I remember that we discussed reverting those but it looks like that never happened. I need to ask around internally. Question is do you see any other problems with the board? E.g. incorrect connector or harvesting configuration? I'll need to dig up the past discussion again, but IIRC, the issue was only seen on some non-x86 platforms. Maybe something specific to MMIO on those? I honestly don't remember it either, but in general it's the job of the VBIOS to init this register. So if we see the upper bits mangled the VBIOS hasn't done that correctly and it's quite likely that this is only the tip of the iceberg of problems. Christian.
Alex Regards, Christian. Qiang Ma Signed-off-by: Qiang Ma --- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 11 +-- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 ++--- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c index 23b478639921..3703695f7789 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c @@ -309,8 +309,15 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev) } adev->gmc.vram_width = numchan * chansize; /* size in MB on si */ - adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; - adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; + tmp = RREG32(mmCONFIG_MEMSIZE); + /* some boards may have garbage in the upper 16 bits */ + if (tmp & 0xffff0000) { + DRM_INFO("Probable bad vram size: 0x%08x\n", tmp); + if (tmp & 0xffff) + tmp &= 0xffff; + } + adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL; + adev->gmc.real_vram_size = adev->gmc.mc_vram_size; if (!(adev->flags & AMD_IS_APU)) { r = amdgpu_device_resize_fb_bar(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c index 3da7b6a2b00d..1df1fc578ff6 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c @@ -316,10 +316,10 @@ static void gmc_v7_0_mc_program(struct amdgpu_device *adev) static int gmc_v7_0_mc_init(struct amdgpu_device *adev) { int r; + u32 tmp; adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev); if (!adev->gmc.vram_width) { - u32 tmp; int chansize, numchan; /* Get VRAM informations */ @@ -363,8 +363,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev) adev->gmc.vram_width = numchan * chansize; } /* size in MB on si */ - adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; - adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; + tmp = RREG32(mmCONFIG_MEMSIZE); + /* some boards may have garbage in the upper
16 bits */ + if (tmp & 0xffff0000) { + DRM_INFO("Probable bad vram size: 0x%08x\n", tmp); + if (tmp & 0xffff) + tmp &= 0xffff; + } + adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL; + adev->gmc.real_vram_size = adev->gmc.mc_vram_size; if (!(adev->flags & AMD_IS_APU)) { r = amdgpu_device_resize_fb_bar(adev);
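The clamp the patch applies reduces to a small pure function: if the upper 16 bits of the register value look like garbage, keep only the low 16 bits (the size in MB). This sketch assumes the same masks as the earlier gmc v8/radeon fixups mentioned in the thread; it is an illustration, not the driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a (possibly mangled) mmCONFIG_MEMSIZE value, which reports the
 * VRAM size in MB, into a byte count. Upper-16-bit garbage is dropped,
 * but only when the low 16 bits carry a value to fall back to. */
static uint64_t vram_bytes_from_memsize(uint32_t tmp)
{
	if (tmp & 0xffff0000) {
		/* the "Probable bad vram size" case from the log */
		if (tmp & 0xffff)
			tmp &= 0xffff;
	}
	return (uint64_t)tmp * 1024ULL * 1024ULL;
}
```

With the value from the bug report, 0x86800800, this yields 0x0800 = 2048 MB, matching the fixed log output; a legitimately large low-16-bit value like plain 2048 passes through unchanged.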
Re: [PATCH v3 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation
On 22.04.24 at 15:57, Philip Yang wrote: Define the macro MAX_SG_SEGMENT_SIZE as 2GB, because the struct scatterlist length is an unsigned int and some users of it cast to a signed int, so every segment of the sg table is limited to 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 4be8b091099a..9fe56a21ef88 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -31,6 +31,8 @@ #include "amdgpu_atomfirmware.h" #include "atom.h" +#define MAX_SG_SEGMENT_SIZE (2UL << 30) + struct amdgpu_vram_reservation { u64 start; u64 size; @@ -532,8 +534,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) + size = remaining_size; + else + /* Limit maximum size to 2GiB due to SG table limitations +* for no contiguous allocation. +*/ + size = min(remaining_size, MAX_SG_SEGMENT_SIZE); Well that doesn't make sense, either fix the creation of the sg tables or limit the segment size. Not both.
if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) @@ -675,7 +682,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); while (cursor.remaining) { num_entries++; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, min(cursor.size, MAX_SG_SEGMENT_SIZE)); } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); @@ -695,7 +702,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, amdgpu_res_first(res, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { phys_addr_t phys = cursor.start + adev->gmc.aper_base; - size_t size = cursor.size; + unsigned long size = min(cursor.size, MAX_SG_SEGMENT_SIZE); Please keep size_t here or use unsigned int, using unsigned long just looks like trying to hide the problem. And I wouldn't use a separate define but rather just INT_MAX instead. Regards, Christian. dma_addr_t addr; addr = dma_map_resource(dev, phys, size, dir, @@ -708,7 +715,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, sg_dma_address(sg) = addr; sg_dma_len(sg) = size; - amdgpu_res_next(&cursor, cursor.size); + amdgpu_res_next(&cursor, size); } return 0;
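Christian's objection is about where the cap belongs, but the walk itself is simple: split a contiguous range into segments no larger than the cap (he suggests INT_MAX rather than a separate 2 GiB define, since the scatterlist length is an unsigned int that some users cast to signed). A standalone sketch of the segment-counting loop the two amdgpu_res_first/next passes perform:

```c
#include <assert.h>

/* Count how many sg entries a range of `length` bytes needs when every
 * segment is capped at `cap` bytes; mirrors the cursor walk that sizes
 * the sg table before filling it. */
static unsigned int sg_segments_needed(unsigned long long length,
				       unsigned long long cap)
{
	unsigned int n = 0;

	while (length) {
		unsigned long long chunk = length < cap ? length : cap;

		length -= chunk;
		n++;
	}
	return n;
}
```

Note that 2UL << 30 is exactly INT_MAX + 1, which is why a signed-int consumer of a full-size segment would overflow; capping at INT_MAX instead avoids the off-by-one entirely.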
Re: [PATCH] drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
Am 20.04.24 um 21:02 schrieb Alex Deucher: This avoids a potential conflict with firmwares with the newer HDP flush mechanism. The patch is fine, but I'm starting to wonder why we are using the newer HDP flush mechanism in the first place? Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2156 Signed-off-by: Alex Deucher Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index b2417ba4759b..c44ec41f1cb6 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -280,17 +280,21 @@ static void sdma_v5_2_ring_emit_hdp_flush(struct amdgpu_ring *ring) u32 ref_and_mask = 0; const struct nbio_hdp_flush_reg *nbio_hf_reg = adev->nbio.hdp_flush_reg; - ref_and_mask = nbio_hf_reg->ref_and_mask_sdma0 << ring->me; - - amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) | - SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(1) | - SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* == */ - amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_done_offset(adev)) << 2); - amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_req_offset(adev)) << 2); - amdgpu_ring_write(ring, ref_and_mask); /* reference */ - amdgpu_ring_write(ring, ref_and_mask); /* mask */ - amdgpu_ring_write(ring, SDMA_PKT_POLL_REGMEM_DW5_RETRY_COUNT(0xfff) | - SDMA_PKT_POLL_REGMEM_DW5_INTERVAL(10)); /* retry count, poll interval */ + if (ring->me > 1) { + amdgpu_asic_flush_hdp(adev, ring); + } else { + ref_and_mask = nbio_hf_reg->ref_and_mask_sdma0 << ring->me; + + amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) | + SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(1) | + SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* == */ + amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_done_offset(adev)) << 2); + amdgpu_ring_write(ring, (adev->nbio.funcs->get_hdp_flush_req_offset(adev)) << 2); + 
amdgpu_ring_write(ring, ref_and_mask); /* reference */ + amdgpu_ring_write(ring, ref_and_mask); /* mask */ + amdgpu_ring_write(ring, SDMA_PKT_POLL_REGMEM_DW5_RETRY_COUNT(0xfff) | + SDMA_PKT_POLL_REGMEM_DW5_INTERVAL(10)); /* retry count, poll interval */ + } } /**
Re: [PATCH] drm/amdgpu: fix use-after-free issue
Am 22.04.24 um 13:29 schrieb Lazar, Lijo: On 4/22/2024 4:52 PM, Christian König wrote: Am 22.04.24 um 11:37 schrieb Lazar, Lijo: On 4/22/2024 2:59 PM, Christian König wrote: Am 22.04.24 um 10:47 schrieb Jack Xiao: Delete fence fallback timer to fix the random use-after-free issue. That's already done in amdgpu_fence_driver_hw_fini() and absolutely shouldn't be in amdgpu_ring_fini(). And the kfree(ring->fence_drv.fences); shouldn't be there either since that is done in amdgpu_fence_driver_sw_fini(). In the present logic, these are part of special rings dynamically created for mes self tests with amdgpu_mes_add_ring/amdgpu_mes_remove_ring. Ok, we should probably stop doing that altogether. Shashank's work of utilizing the MES in userspace is nearly finished and we don't really need the MES test in the kernel any more. A v2 of the patch is posted. Can we use it temporarily till Shashank's work is in place? Yes, absolutely. Assuming Shashank's work will also include removing MES self test in kernel. Yes, that was the long term plan. But no idea when we can completely upstream that work. Regards, Christian. Thanks, Lijo Regards, Christian. Thanks, Lijo Regards, Christian. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 06f0a6534a94..93ab9faa2d72 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring) &ring->gpu_addr, (void **)&ring->ring); } else { + del_timer_sync(&ring->fence_drv.fallback_timer); kfree(ring->fence_drv.fences); }
Re: [PATCH v2] drm/amdgpu/mes: fix use-after-free issue
Am 22.04.24 um 13:12 schrieb Lazar, Lijo: On 4/22/2024 3:09 PM, Jack Xiao wrote: Delete fence fallback timer to fix the random use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Acked-by: Christian König Thanks, Lijo --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 78e4f88f5134..226751ea084b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1128,6 +1128,7 @@ void amdgpu_mes_remove_ring(struct amdgpu_device *adev, return; amdgpu_mes_remove_hw_queue(adev, ring->hw_queue_id); + del_timer_sync(&ring->fence_drv.fallback_timer); amdgpu_ring_fini(ring); kfree(ring); }
Re: [PATCH 3/3] drm/amdgpu: Fix Uninitialized scalar variable warning
Am 22.04.24 um 11:49 schrieb Ma Jun: Initialize the variables which were not initialized to fix the coverity issue "Uninitialized scalar variable" Feel free to add my Acked-by to the first two patches, but this here clearly doesn't look like a good idea to me. Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 7e6d09730e6d..7b28b6b8982b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -437,7 +437,7 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) { u32 ring_index = ih->rptr >> 2; - struct amdgpu_iv_entry entry; + struct amdgpu_iv_entry entry = {0}; When this needs to be initialized there is clearly something wrong with the code. I would guess similar for the other two. What exactly does Coverity complain about? Regards, Christian.
unsigned int client_id, src_id; struct amdgpu_irq_src *src; bool handled = false; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 924baf58e322..f0a63d084b4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1559,7 +1559,7 @@ static int amdgpu_debugfs_firmware_info_show(struct seq_file *m, void *unused) { struct amdgpu_device *adev = m->private; struct drm_amdgpu_info_firmware fw_info; - struct drm_amdgpu_query_fw query_fw; + struct drm_amdgpu_query_fw query_fw = {0}; struct atom_context *ctx = adev->mode_info.atom_context; uint8_t smu_program, smu_major, smu_minor, smu_debug; int ret, i; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 2b99eed5ba19..41ac3319108b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -120,7 +120,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id, int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode) { struct amdgpu_device *adev = xcp_mgr->adev; - struct amdgpu_xcp_ip ip; + struct amdgpu_xcp_ip ip = {0}; uint8_t mem_id; int i, j, ret;
Re: [PATCH] drm/amdgpu: fix use-after-free issue
Am 22.04.24 um 11:37 schrieb Lazar, Lijo: On 4/22/2024 2:59 PM, Christian König wrote: Am 22.04.24 um 10:47 schrieb Jack Xiao: Delete fence fallback timer to fix the random use-after-free issue. That's already done in amdgpu_fence_driver_hw_fini() and absolutely shouldn't be in amdgpu_ring_fini(). And the kfree(ring->fence_drv.fences); shouldn't be there either since that is done in amdgpu_fence_driver_sw_fini(). In the present logic, these are part of special rings dynamically created for mes self tests with amdgpu_mes_add_ring/amdgpu_mes_remove_ring. Ok, we should probably stop doing that altogether. Shashank's work of utilizing the MES in userspace is nearly finished and we don't really need the MES test in the kernel any more. Regards, Christian. Thanks, Lijo Regards, Christian. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 06f0a6534a94..93ab9faa2d72 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring) &ring->gpu_addr, (void **)&ring->ring); } else { + del_timer_sync(&ring->fence_drv.fallback_timer); kfree(ring->fence_drv.fences); }
Re: [PATCH] drm/amdgpu: Fixup bad vram size on gmc v6 and v7
Am 22.04.24 um 07:26 schrieb Qiang Ma: Some boards (like Oland PRO: 0x1002:0x6613) seem to have garbage in the upper 16 bits of the vram size register, kern log as follows: [6.00] [drm] Detected VRAM RAM=2256537600M, BAR=256M [6.007812] [drm] RAM width 64bits GDDR5 [6.031250] [drm] amdgpu: 2256537600M of VRAM memory ready This is obviously not true; check for this and clamp the size properly. Fixes boards reporting bogus amounts of vram, kern log as follows: [2.789062] [drm] Probable bad vram size: 0x86800800 [2.789062] [drm] Detected VRAM RAM=2048M, BAR=256M [2.789062] [drm] RAM width 64bits GDDR5 [2.789062] [drm] amdgpu: 2048M of VRAM memory ready Well, we had patches like this one here before and so far we always rejected them. When the mmCONFIG_MEMSIZE register isn't properly initialized then there is something wrong with your hardware. Working around that in the software driver is not going to fly. Regards, Christian. Signed-off-by: Qiang Ma --- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 11 +-- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 ++--- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c index 23b478639921..3703695f7789 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c @@ -309,8 +309,15 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev) } adev->gmc.vram_width = numchan * chansize; /* size in MB on si */ - adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; - adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; + tmp = RREG32(mmCONFIG_MEMSIZE); + /* some boards may have garbage in the upper 16 bits */ + if (tmp & 0xffff0000) { + DRM_INFO("Probable bad vram size: 0x%08x\n", tmp); + if (tmp & 0xffff) + tmp &= 0xffff; + } + adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL; + adev->gmc.real_vram_size = adev->gmc.mc_vram_size; if (!(adev->flags & AMD_IS_APU)) { r = amdgpu_device_resize_fb_bar(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c index 3da7b6a2b00d..1df1fc578ff6 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c @@ -316,10 +316,10 @@ static void gmc_v7_0_mc_program(struct amdgpu_device *adev) static int gmc_v7_0_mc_init(struct amdgpu_device *adev) { int r; + u32 tmp; adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev); if (!adev->gmc.vram_width) { - u32 tmp; int chansize, numchan; /* Get VRAM informations */ @@ -363,8 +363,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev) adev->gmc.vram_width = numchan * chansize; } /* size in MB on si */ - adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; - adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL; + tmp = RREG32(mmCONFIG_MEMSIZE); + /* some boards may have garbage in the upper 16 bits */ + if (tmp & 0xffff0000) { + DRM_INFO("Probable bad vram size: 0x%08x\n", tmp); + if (tmp & 0xffff) + tmp &= 0xffff; + } + adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL; + adev->gmc.real_vram_size = adev->gmc.mc_vram_size; if (!(adev->flags & AMD_IS_APU)) { r = amdgpu_device_resize_fb_bar(adev);
Re: [PATCH] drm/amdgpu: fix use-after-free issue
Am 22.04.24 um 10:47 schrieb Jack Xiao: Delete fence fallback timer to fix the random use-after-free issue. That's already done in amdgpu_fence_driver_hw_fini() and absolutely shouldn't be in amdgpu_ring_fini(). And the kfree(ring->fence_drv.fences); shouldn't be there either since that is done in amdgpu_fence_driver_sw_fini(). Regards, Christian. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 06f0a6534a94..93ab9faa2d72 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -390,6 +390,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring) &ring->gpu_addr, (void **)&ring->ring); } else { + del_timer_sync(&ring->fence_drv.fallback_timer); kfree(ring->fence_drv.fences); }
Re: [PATCH 01/15] drm/amdgpu: Add interface to reserve bad page
Am 18.04.24 um 04:58 schrieb YiPeng Chai: Add interface to reserve bad page. Signed-off-by: YiPeng Chai Yeah, that approach looks valid to me. Just keep in mind that amdgpu_vram_mgr_query_page_status() is not the fastest function because it does a linear search. Apart from that, Reviewed-by: Christian König for this patch, but can't really judge the rest of the patch set. Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 2 files changed, 23 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 2c97cb80d79a..05782d68f073 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2782,6 +2782,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev) } } + mutex_init(&con->page_rsv_lock); mutex_init(&con->page_retirement_lock); init_waitqueue_head(&con->page_retirement_wq); atomic_set(&con->page_retirement_req_cnt, 0); @@ -2835,6 +2836,8 @@ static int amdgpu_ras_recovery_fini(struct amdgpu_device *adev) atomic_set(&con->page_retirement_req_cnt, 0); + mutex_destroy(&con->page_rsv_lock); + cancel_work_sync(&con->recovery_work); mutex_lock(&con->recovery_lock); @@ -4278,3 +4281,19 @@ void amdgpu_ras_query_boot_status(struct amdgpu_device *adev, u32 num_instances) amdgpu_ras_boot_time_error_reporting(adev, i, boot_error); } } + +int amdgpu_ras_reserve_page(struct amdgpu_device *adev, uint64_t pfn) +{ + struct amdgpu_ras *con = amdgpu_ras_get_context(adev); + struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr; + uint64_t start = pfn << AMDGPU_GPU_PAGE_SHIFT; + int ret = 0; + + mutex_lock(&con->page_rsv_lock); + ret = amdgpu_vram_mgr_query_page_status(mgr, start); + if (ret == -ENOENT) + ret = amdgpu_vram_mgr_reserve_range(mgr, start, AMDGPU_GPU_PAGE_SIZE); + mutex_unlock(&con->page_rsv_lock); + + return ret; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h index 8d26989c75c8..ab5bf573378e 100644 ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h @@ -500,6 +500,7 @@ struct amdgpu_ras { wait_queue_head_t page_retirement_wq; struct mutex page_retirement_lock; atomic_t page_retirement_req_cnt; + struct mutex page_rsv_lock; /* Fatal error detected flag */ atomic_t fed; @@ -909,4 +910,7 @@ bool amdgpu_ras_get_fed_status(struct amdgpu_device *adev); bool amdgpu_ras_event_id_is_valid(struct amdgpu_device *adev, u64 id); u64 amdgpu_ras_acquire_event_id(struct amdgpu_device *adev, enum ras_event_type type); + +int amdgpu_ras_reserve_page(struct amdgpu_device *adev, uint64_t pfn); + #endif
Re: [PATCH] drm/amdgpu/vcn: fix unitialized variable warnings
Am 18.04.24 um 20:07 schrieb Pierre-Eric Pelloux-Prayer: Init r to 0 to avoid returning an uninitialized value if we never enter the loop. This case should never be hit in practice, but returning 0 doesn't hurt. The same fix is applied to the 4 places using the same pattern. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 2 +- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 8f82fb887e9c..724445545563 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -298,7 +298,7 @@ static int vcn_v3_0_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; struct amdgpu_ring *ring; - int i, j, r; + int i, j, r = 0; That is usually considered bad coding style. Better insert a "return 0;" directly before the done label. Regards, Christian.
if (amdgpu_sriov_vf(adev)) { r = vcn_v3_0_start_sriov(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c index 832d15f7b5f6..9be7ae7af4b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -253,7 +253,7 @@ static int vcn_v4_0_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; struct amdgpu_ring *ring; - int i, r; + int i, r = 0; if (amdgpu_sriov_vf(adev)) { r = vcn_v4_0_start_sriov(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c index 501e53e69f2a..593c64e4b8ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c @@ -221,7 +221,7 @@ static int vcn_v4_0_5_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; struct amdgpu_ring *ring; - int i, r; + int i, r = 0; for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index bc60c554eb32..246f967e2e7d 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -187,7 +187,7 @@ static int vcn_v5_0_0_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; struct amdgpu_ring *ring; - int i, r; + int i, r = 0; for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i))
Re: [PATCH] drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend
Am 19.04.24 um 09:52 schrieb Lang Yu: umsch test needs full GPU functionality (e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may not be ready in these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c index 06ad68714172..9f9d6a6d5cf3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c @@ -774,6 +774,9 @@ static int umsch_mm_late_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; + if (amdgpu_in_reset(adev) || adev->in_s0ix || adev->in_suspend) + return 0; + return umsch_mm_test(adev); }
Re: [PATCH] drm/amdgpu: Update BO eviction priorities
Am 18.04.24 um 20:06 schrieb Felix Kuehling: Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling Good point and at least offhand I can't think of anything which could go wrong here. Just keep an eye on potentially failing CI tests since we haven't really exercised this functionality in recent years. Reviewed-by: Christian König Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cd2dd3ed7153..d80671535ab3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -608,6 +608,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, else amdgpu_bo_placement_from_domain(bo, bp->domain); if (bp->type == ttm_bo_type_kernel) + bo->tbo.priority = 2; + else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE)) bo->tbo.priority = 1; if (!bp->destroy)
Re: [PATCH v2 1/6] drm/amdgpu: Support contiguous VRAM allocation
Am 18.04.24 um 15:57 schrieb Philip Yang: RDMA devices with limited scatter-gather ability require contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store it as the BO alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pinning this BO to export for RDMA peerdirect access, this will set the TTM_PL_FLAG_CONTIGUOUS flag and ask the VRAM buddy allocator to get contiguous VRAM. Remove the 2GB max memory block size limit for contiguous allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 +++-- include/uapi/linux/kfd_ioctl.h | 1 + 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 0ae9fd844623..ef9154043757 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1712,6 +1712,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ? AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; + + /* For contiguous VRAM allocation */ + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT) + alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; } xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
0 : fpriv->xcp_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 4be8b091099a..2f2ae711 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -532,8 +532,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) + size = remaining_size; + else + /* Limit maximum size to 2GiB due to SG table limitations +* for no contiguous allocation. +*/ + size = min(remaining_size, 2ULL << 30); Oh, I totally missed this in the first review. That won't work like that the sg table limit is still there even if the BO is contiguous. We could only fix up the VRAM P2P support to use multiple segments in the sg table. Regards, Christian. if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 2040a470ddb4..c1394c162d4e 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args { #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT (1 << 26) #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED (1 << 25) #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT (1 << 24) +#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1 << 23) /* Allocate memory for later SVM (shared virtual memory) mapping. *
Re: [PATCH 15/15] drm/amdgpu: Use new interface to reserve bad page
Am 18.04.24 um 04:58 schrieb YiPeng Chai: Use new interface to reserve bad page. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index d1a2ab944b7d..dee66db10fa2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2548,9 +2548,7 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev, goto out; } - amdgpu_vram_mgr_reserve_range(&adev->mman.vram_mgr, - bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT, - AMDGPU_GPU_PAGE_SIZE); Where is the call to reserve the VRAM range now moved? Regards, Christian. + amdgpu_ras_reserve_page(adev, bps[i].retired_page); memcpy(&data->bps[data->count], &bps[i], sizeof(*data->bps)); data->count++;
Re: [PATCH v5 1/6] drm/amdgpu: add prototype for ip dump
Am 17.04.24 um 17:45 schrieb Alex Deucher: On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote: Add the prototype to dump ip registers for all ips of different asics and set them to NULL for now. Based on the requirement add a function pointer for each of them. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 1 + drivers/gpu/drm/amd/amdgpu/cik.c | 1 + drivers/gpu/drm/amd/amdgpu/cik_ih.c | 1 + drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 1 + drivers/gpu/drm/amd/amdgpu/cz_ih.c| 1 + drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 1 + drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 + drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 1 + drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 1 + drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c| 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c| 2 ++ drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c| 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c| 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 1 + drivers/gpu/drm/amd/amdgpu/mes_v10_1.c| 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/navi10_ih.c| 1 + drivers/gpu/drm/amd/amdgpu/nv.c | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c| 1 + drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c| 1 + 
drivers/gpu/drm/amd/amdgpu/si.c | 1 + drivers/gpu/drm/amd/amdgpu/si_dma.c | 1 + drivers/gpu/drm/amd/amdgpu/si_ih.c| 1 + drivers/gpu/drm/amd/amdgpu/soc15.c| 1 + drivers/gpu/drm/amd/amdgpu/soc21.c| 1 + drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 1 + drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 1 + drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c | 1 + drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c | 1 + drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 ++ drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vi.c | 1 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + drivers/gpu/drm/amd/include/amd_shared.h | 1 + drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c| 1 + drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c| 1 + drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 1 + 64 files changed, 66 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c index 6d72355ac492..34a62033a388 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c @@ -637,6 +637,7 @@ static const struct amd_ip_funcs acp_ip_funcs = { .soft_reset = acp_soft_reset, .set_clockgating_state = acp_set_clockgating_state, .set_powergating_state = acp_set_powergating_state, + .dump_ip_state = NULL, You can skip all of the NULL assignments. Static global structures will be 0 initialized. Oh, that's a really good point. We have automated checkers complaining about NULL initialization in structures. So that here would cause tons of automated complains. Regards, Christian. 
Either way: Reviewed-by: Alex Deucher Alex }; const struct amdgpu_ip_block_version acp_ip_block = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump
Am 17.04.24 um 19:30 schrieb Alex Deucher: On Wed, Apr 17, 2024 at 1:01 PM Khatri, Sunil wrote: On 4/17/2024 10:21 PM, Alex Deucher wrote: On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo wrote: [AMD Official Use Only - General] Yes, right now that API doesn't return anything. What I meant is to add that check as well, as the coredump API is essentially used in hang situations. In old times, access to registers while in GFXOFF resulted in a system hang (basically it won't go beyond this point). If that happens, then the purpose of the patch - to get the context of a device hang - is lost. We may not even get a proper dmesg log. Maybe add a call to amdgpu_get_gfx_off_status(), but unfortunately, it's not implemented on every chip yet. So we need both things: disable gfxoff, then query the status, then read the registers, and enable gfxoff again. Right, but first we need to implement the get_gfxoff_status smu callback for all of the chips that are missing it. The question is if it's safe to query the status and disable it while the GPU is in a hung state? I mean most unrecoverable hangs are caused by the GFX block or the memory interface getting into a state where it can't get out again. Regards, Christian. Alex amdgpu_gfx_off_ctrl(adev, false); r = amdgpu_get_gfx_off_status(adev); if (!r) { for (i = 0; i < reg_count; i++) adev->gfx.ip_dump[i] = RREG32(SOC15_REG_ENTRY_OFFSET(gc_reg_list_10_1[i])); } amdgpu_gfx_off_ctrl(adev, true); Sunil Alex Thanks, Lijo -Original Message- From: Khatri, Sunil Sent: Wednesday, April 17, 2024 9:42 PM To: Lazar, Lijo ; Alex Deucher ; Khatri, Sunil Cc: Deucher, Alexander ; Koenig, Christian ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump On 4/17/2024 9:31 PM, Lazar, Lijo wrote: On 4/17/2024 9:21 PM, Alex Deucher wrote: On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote: Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 +- drivers/gpu/drm/amd/amdgpu/soc15.h| 2 + .../include/asic_reg/gc/gc_10_1_0_offset.h| 12 ++ 5 files changed, 155 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e0d7f4ee7e16..cac0ca64367b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -139,6 +139,14 @@ enum amdgpu_ss { AMDGPU_SS_DRV_UNLOAD }; +struct amdgpu_hwip_reg_entry { + u32 hwip; + u32 inst; + u32 seg; + u32 reg_offset; + const char *reg_name; +}; + struct amdgpu_watchdog_timer { bool timeout_fatal_disable; uint32_t period; /* maxCycles = (1 << period), the number of cycles before a timeout */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 04a86dff71e6..64f197bbc866 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -433,6 +433,10 @@ struct amdgpu_gfx { uint32_tnum_xcc_per_xcp; struct mutexpartition_mutex; boolmcbp; /* mid command buffer preemption */ + + /* IP reg dump */ + uint32_t*ip_dump; + uint32_treg_count; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index a0bc4196ff8b..4a54161f4837 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin"); +static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS2), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS3), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, 
mmCP_STALLED_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STATUS), + SOC15_REG_ENTRY_STR(GC,
Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous
Am 17.04.24 um 21:21 schrieb Alex Deucher: On Wed, Apr 17, 2024 at 3:17 PM Liu, Shaoyun wrote: [AMD Official Use Only - General] I had a discussion with Christian about this before. The conclusion is that the driver should prevent multiple processes from using the MES ring at the same time. Also, for the current MES ring usage, the driver doesn't have the logic to prevent the ring from overflowing, and we don't hit the issue because MES waits by polling on each MES submission. If we want to change the MES to work asynchronously, we need to consider a way to avoid this (similar to adding the limit in the fence handling we use for KIQ and HMM paging). I think we need a separate fence (different GPU address and seq number) per request. Then each caller can wait independently. Well no, we need to modify the MES firmware to stop abusing the fence as a signaling mechanism for the result of an operation. I've pointed that out before and I think this is a hard requirement for correct operation. In addition to that, retrying on the reset flag looks like another broken workaround to me. So just to make it clear: this approach is a NAK from my side, don't commit that. Regards, Christian. Alex Regards Shaoyun.liu -Original Message- From: amd-gfx On Behalf Of Christian König Sent: Wednesday, April 17, 2024 8:49 AM To: Chen, Horace ; amd-gfx@lists.freedesktop.org Cc: Andrey Grodzovsky ; Kuehling, Felix ; Deucher, Alexander ; Xiao, Jack ; Zhang, Hawking ; Liu, Monk ; Xu, Feifei ; Chang, HaiJun ; Leo Liu ; Liu, Jenny (Jing) Subject: Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous Am 17.04.24 um 13:30 schrieb Horace Chen: The MES firmware expects synchronous operation with the driver. For this to work asynchronously, each caller would need to provide its own fence location and sequence number. Well that's certainly not correct. The seqno takes care that we can wait async for the submission to complete. So clear NAK for that patch here. Regards, Christian.
For now, add a mutex lock to serialize the MES submissions. For the SR-IOV long-wait case, break the long wait into separate parts to prevent this wait from impacting the reset sequence. Signed-off-by: Horace Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++ 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 78e4f88f5134..8896be95b2c8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -137,6 +137,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev) spin_lock_init(&adev->mes.queue_id_lock); spin_lock_init(&adev->mes.ring_lock); mutex_init(&adev->mes.mutex_hidden); + mutex_init(&adev->mes.submission_lock); adev->mes.total_max_queue = AMDGPU_FENCE_MES_QUEUE_ID_MASK; adev->mes.vmid_mask_mmhub = 0xffffff00; @@ -221,6 +222,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev) idr_destroy(&adev->mes.queue_id_idr); ida_destroy(&adev->mes.doorbell_ida); mutex_destroy(&adev->mes.mutex_hidden); + mutex_destroy(&adev->mes.submission_lock); return r; } @@ -240,6 +242,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev) idr_destroy(&adev->mes.queue_id_idr); ida_destroy(&adev->mes.doorbell_ida); mutex_destroy(&adev->mes.mutex_hidden); + mutex_destroy(&adev->mes.submission_lock); } static void amdgpu_mes_queue_free_mqd(struct amdgpu_mes_queue *q) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index 6b3e1844eac5..90af935cc889 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -85,6 +85,7 @@ struct amdgpu_mes { struct amdgpu_ring ring; spinlock_t ring_lock; + struct mutex submission_lock; const struct firmware *fw[AMDGPU_MAX_MES_PIPES]; diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index e40d00afd4f5..0a609a5b8835 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -162,6 +162,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes, struct amdgpu_ring *ring = &mes->ring; unsigned long flags; signed long timeout = adev->usec_timeout; + signed long retry_count = 1; const char *op_str, *misc_op_str; if (x_pkt->header.opcode >= MES_SCH_API_MAX) @@ -169,15 +170,19 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes, if (amdgpu_emu_mode) { timeout *= 100; - } else if (amdgpu_sriov_vf(adev)) { + } + + if (amdgpu_sriov_vf(adev) && timeout > 0) { /*
Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous
Am 17.04.24 um 13:30 schrieb Horace Chen: The MES firmware expects synchronous operation with the driver. For this to work asynchronously, each caller would need to provide its own fence location and sequence number. Well that's certainly not correct. The seqno takes care that we can wait async for the submission to complete. So clear NAK for that patch here. Regards, Christian. For now, add a mutex lock to serialize the MES submissions. For the SR-IOV long-wait case, break the long wait into separate parts to prevent this wait from impacting the reset sequence. Signed-off-by: Horace Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++ 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 78e4f88f5134..8896be95b2c8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -137,6 +137,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev) spin_lock_init(&adev->mes.queue_id_lock); spin_lock_init(&adev->mes.ring_lock); mutex_init(&adev->mes.mutex_hidden); + mutex_init(&adev->mes.submission_lock); adev->mes.total_max_queue = AMDGPU_FENCE_MES_QUEUE_ID_MASK; adev->mes.vmid_mask_mmhub = 0xffffff00; @@ -221,6 +222,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev) idr_destroy(&adev->mes.queue_id_idr); ida_destroy(&adev->mes.doorbell_ida); mutex_destroy(&adev->mes.mutex_hidden); + mutex_destroy(&adev->mes.submission_lock); return r; } @@ -240,6 +242,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev) idr_destroy(&adev->mes.queue_id_idr); ida_destroy(&adev->mes.doorbell_ida); mutex_destroy(&adev->mes.mutex_hidden); + mutex_destroy(&adev->mes.submission_lock); } static void amdgpu_mes_queue_free_mqd(struct amdgpu_mes_queue *q) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index 6b3e1844eac5..90af935cc889 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -85,6 +85,7 @@ struct amdgpu_mes { struct amdgpu_ring ring; spinlock_t ring_lock; + struct mutex submission_lock; const struct firmware *fw[AMDGPU_MAX_MES_PIPES]; diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index e40d00afd4f5..0a609a5b8835 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -162,6 +162,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes, struct amdgpu_ring *ring = &mes->ring; unsigned long flags; signed long timeout = adev->usec_timeout; + signed long retry_count = 1; const char *op_str, *misc_op_str; if (x_pkt->header.opcode >= MES_SCH_API_MAX) @@ -169,15 +170,19 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes, if (amdgpu_emu_mode) { timeout *= 100; - } else if (amdgpu_sriov_vf(adev)) { + } + + if (amdgpu_sriov_vf(adev) && timeout > 0) { /* Worst case in sriov where all other 15 VF timeout, each VF needs about 600ms */ - timeout = 15 * 600 * 1000; + retry_count = (15 * 600 * 1000) / timeout; } BUG_ON(size % 4 != 0); + mutex_lock(&mes->submission_lock); spin_lock_irqsave(&mes->ring_lock, flags); if (amdgpu_ring_alloc(ring, ndw)) { spin_unlock_irqrestore(&mes->ring_lock, flags); + mutex_unlock(&mes->submission_lock); return -ENOMEM; } @@ -199,8 +204,13 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes, else dev_dbg(adev->dev, "MES msg=%d was emitted\n", x_pkt->header.opcode); - r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq, - timeout); + do { + r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq, + timeout); + retry_count--; + } while (retry_count > 0 && !amdgpu_in_reset(adev)); + + mutex_unlock(&mes->submission_lock); if (r < 1) { if (misc_op_str)
Re: [PATCH Review 1/1] drm/amdgpu: Support setting reset_method at runtime
Am 12.04.24 um 08:21 schrieb Stanley.Yang: Signed-off-by: Stanley.Yang You are missing a commit message, without it the patch will automatically be rejected when you try to push it. With that added Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 80b9642f2bc4..5f5bf0c26b1f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -915,7 +915,7 @@ module_param_named(freesync_video, amdgpu_freesync_vid_mode, uint, 0444); * GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco) */ MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco/bamaco)"); -module_param_named(reset_method, amdgpu_reset_method, int, 0444); +module_param_named(reset_method, amdgpu_reset_method, int, 0644); /** * DOC: bad_page_threshold (int) Bad page threshold is specifies the
Re: [PATCH v4 2/6] drm/amdgpu: add support of gfx10 register dump
Am 17.04.24 um 10:18 schrieb Sunil Khatri: Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 +- drivers/gpu/drm/amd/amdgpu/soc15.h| 2 + .../include/asic_reg/gc/gc_10_1_0_offset.h| 12 ++ 5 files changed, 155 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e0d7f4ee7e16..210af65a744c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -139,6 +139,14 @@ enum amdgpu_ss { AMDGPU_SS_DRV_UNLOAD }; +struct amdgpu_hwip_reg_entry { + u32 hwip; + u32 inst; + u32 seg; + u32 reg_offset; + charreg_name[50]; Make that a const char *. Otherwise it bloats up the final binary because the compiler has to add zeros at the end. +}; + struct amdgpu_watchdog_timer { bool timeout_fatal_disable; uint32_t period; /* maxCycles = (1 << period), the number of cycles before a timeout */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 04a86dff71e6..64f197bbc866 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -433,6 +433,10 @@ struct amdgpu_gfx { uint32_tnum_xcc_per_xcp; struct mutexpartition_mutex; boolmcbp; /* mid command buffer preemption */ + + /* IP reg dump */ + uint32_t*ip_dump; + uint32_treg_count; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index a0bc4196ff8b..4a54161f4837 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin"); +static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] 
= { + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS2), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS3), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_STALLED_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STALLED_STAT1), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPC_BUSY_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_BUSY_STAT2), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CPF_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HPD_STATUS0), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB0_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB1_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB2_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_CMD_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_CMD_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_CMD_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_CMD_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BASE_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB1_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BASE_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_IB2_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BASE_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB1_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BASE_LO), + 
SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_IB2_BUFSZ), + SOC15_REG_ENTRY_STR(GC, 0, mmCPF_UTCL1_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmCPC_UTCL1_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmCPG_UTCL1_STATUS), + SOC15_REG_ENTRY_STR(GC, 0, mmGDS_PROTECTION_FAULT), + SOC15_REG_ENTRY_STR(GC, 0, mmGDS_VM_PROTECTION_FAULT), +
Re: [PATCH v3 2/5] drm:amdgpu: Enable IH ring1 for IH v6.1
Am 17.04.24 um 08:43 schrieb Friedrich Vock: On 16.04.24 15:34, Sunil Khatri wrote: We need IH ring1 for handling the pagefault interrupts which overflow the default ring for specific usecases. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c index b8da0fc29378..73dba180fabd 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c @@ -550,8 +550,15 @@ static int ih_v6_1_sw_init(void *handle) adev->irq.ih.use_doorbell = true; adev->irq.ih.doorbell_index = adev->doorbell_index.ih << 1; - adev->irq.ih1.ring_size = 0; - adev->irq.ih2.ring_size = 0; + if (!(adev->flags & AMD_IS_APU)) { Why restrict this to dGPUs? Page faults can overflow the default ring on APUs too (e.g. for Vangogh). Because APUs don't have the necessary hw. In other words they have no secondary IH ring buffer :( But we are working on a fw fix for them and Navi 1x and 2x as well. Regards, Christian. Regards, Friedrich + r = amdgpu_ih_ring_init(adev, &adev->irq.ih1, IH_RING_SIZE, + use_bus_addr); + if (r) + return r; + + adev->irq.ih1.use_doorbell = true; + adev->irq.ih1.doorbell_index = (adev->doorbell_index.ih + 1) << 1; + } /* initialize ih control register offset */ ih_v6_1_init_register_offset(adev);
Re: [PATCH v2] drm/amdgpu: Modify the contiguous flags behaviour
Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam: Now we have two flags for contiguous VRAM buffer allocation. If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - We will try to allocate the contiguous buffer. If the allocation fails, we fall back to allocating the individual pages. When we set TTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - We are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - If this is set, we should allocate the buffer pages contiguously. If the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is (Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop (Christian) - rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block (Christian) - keep the kernel BO allocation as is (Christian) - if BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages (Philip) Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++- 2 files changed, 50 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8bc79924d171..caaef7b1df49 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -153,8 +153,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) else places[c].flags |= TTM_PL_FLAG_TOPDOWN; - if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) + if (abo->tbo.type == ttm_bo_type_kernel &&
+ flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; + c++; } @@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, if (!bo->placements[i].lpfn || (lpfn && lpfn < bo->placements[i].lpfn)) bo->placements[i].lpfn = lpfn; + + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + bo->placements[i].mem_type == TTM_PL_VRAM) + bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; } r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); Nice work, up till here that looks exactly right as far as I can see. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..4be8b091099a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -88,6 +88,29 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head) return size; } +static inline unsigned long +amdgpu_vram_mgr_calculate_pages_per_block(struct ttm_buffer_object *tbo, + const struct ttm_place *place, + unsigned long bo_flags) +{ + unsigned long pages_per_block; + + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) { + pages_per_block = ~0ul; If I understand it correctly this here enforces the allocation of a contiguous buffer in the way that it says we should have only one giant page for the whole BO.
+ } else { +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + pages_per_block = HPAGE_PMD_NR; +#else + /* default to 2MB */ + pages_per_block = 2UL << (20UL - PAGE_SHIFT); +#endif + pages_per_block = max_t(uint32_t, pages_per_block, + tbo->page_alignment); + } + + return pages_per_block; +} + /** * DOC: mem_info_vram_total * @@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, struct amdgpu_vram_mgr *mgr = to_vram_mgr(man); struct amdgpu_device *adev = to_amdgpu_device(mgr); u64 vis_usage = 0, max_bytes, min_block_size; + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); struct amdgpu_vram_mgr_resource *vres; u64 size, remaining_size, lpfn, fpfn; + unsigned long bo_flags = bo->flags; struct drm_buddy *mm = &mgr->mm; struct drm_buddy_block *block; unsigned long pages_per_block; @@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, i
Re: [PATCH 2/6] drm/amdgpu: add support of gfx10 register dump
Am 16.04.24 um 15:55 schrieb Alex Deucher: On Tue, Apr 16, 2024 at 8:08 AM Sunil Khatri wrote: Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 12 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 131 +- .../include/asic_reg/gc/gc_10_1_0_offset.h| 12 ++ 4 files changed, 158 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e0d7f4ee7e16..e016ac33629d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -139,6 +139,18 @@ enum amdgpu_ss { AMDGPU_SS_DRV_UNLOAD }; +struct hwip_reg_entry { + u32 hwip; + u32 inst; + u32 seg; + u32 reg_offset; +}; + +struct reg_pair { + u32 offset; + u32 value; +}; + struct amdgpu_watchdog_timer { bool timeout_fatal_disable; uint32_t period; /* maxCycles = (1 << period), the number of cycles before a timeout */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 04a86dff71e6..295a2c8d2e48 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -433,6 +433,10 @@ struct amdgpu_gfx { uint32_tnum_xcc_per_xcp; struct mutexpartition_mutex; boolmcbp; /* mid command buffer preemption */ + + /* IP reg dump */ + struct reg_pair *ip_dump; + uint32_treg_count; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index a0bc4196ff8b..46e136609ff1 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -276,6 +276,99 @@ MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_mec2.bin"); MODULE_FIRMWARE("amdgpu/gc_10_3_7_rlc.bin"); +static const struct hwip_reg_entry gc_reg_list_10_1[] = { + { SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, 
mmGRBM_STATUS2) }, + { SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS3) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT1) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT2) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_STALLED_STAT1) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_BUSY_STAT) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_BUSY_STAT) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_BUSY_STAT) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_BUSY_STAT2) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_BUSY_STAT2) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CPF_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_GFX_ERROR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_GFX_HPD_STATUS0) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB_BASE) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB_RPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB_WPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_BASE) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_RPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB0_WPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_BASE) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_RPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB1_WPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_BASE) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_WPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_RB2_WPTR) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_CMD_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_CMD_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_CMD_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_CMD_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BASE_LO) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BASE_HI) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB1_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BASE_LO) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BASE_HI) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_CE_IB2_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BASE_LO) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BASE_HI) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB1_BUFSZ) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BASE_LO) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BASE_HI) }, + { SOC15_REG_ENTRY(GC, 0, mmCP_IB2_BUFSZ) }, + { 
SOC15_REG_ENTRY(GC, 0, mmCPF_UTCL1_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, mmCPC_UTCL1_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, mmCPG_UTCL1_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, mmGDS_PROTECTION_FAULT) }, + { SOC15_REG_ENTRY(GC, 0, mmGDS_VM_PROTECTION_FAULT) }, + { SOC15_REG_ENTRY(GC, 0, mmIA_UTCL1_STATUS) }, + { SOC15_REG_ENTRY(GC, 0, mmIA_UTCL1_STATUS_2) }, + {
Re: [PATCH v3 1/5] drm:amdgpu: enable IH RB ring1 for IH v6.0
Am 16.04.24 um 15:34 schrieb Sunil Khatri: We need IH ring1 for handling the pagefault interrupts which are overflowing the default ring for specific usecases. Signed-off-by: Sunil Khatri Reviewed-by: Christian König for the entire series. --- drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c index ad4ad39f128f..26dc99232eb6 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c @@ -549,8 +549,15 @@ static int ih_v6_0_sw_init(void *handle) adev->irq.ih.use_doorbell = true; adev->irq.ih.doorbell_index = adev->doorbell_index.ih << 1; - adev->irq.ih1.ring_size = 0; - adev->irq.ih2.ring_size = 0; + if (!(adev->flags & AMD_IS_APU)) { + r = amdgpu_ih_ring_init(adev, &adev->irq.ih1, IH_RING_SIZE, + use_bus_addr); + if (r) + return r; + + adev->irq.ih1.use_doorbell = true; + adev->irq.ih1.doorbell_index = (adev->doorbell_index.ih + 1) << 1; + } /* initialize ih control register offset */ ih_v6_0_init_register_offset(adev);
Re: [PATCH] drm/amdgpu: clear seq64 memory on free
Am 16.04.24 um 14:34 schrieb Paneer Selvam, Arunpravin: Hi Christian, On 4/16/2024 5:47 PM, Christian König wrote: Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin: Hi Christian, On 4/16/2024 2:35 PM, Christian König wrote: Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam: We should clear the memory on free. Otherwise, there is a chance that we will access the previous application's data and this would lead to abnormal behaviour in the current application. Mhm, I would strongly expect that we initialize the seq number after allocation. It could be that we also have situations where the correct start value is 0xffffffff or something like that instead. Why does this matter? When the user queue A's u64 address (fence address) is allocated to the newly created user queue B, we see a problem. User queue B calls the signal IOCTL, which creates a new fence having the wptr as the seq number. In the amdgpu_userq_fence_create function we have a check where we are comparing the rptr and wptr values (rptr >= wptr). Since the rptr value is read from the u64 address which has the user queue A's wptr data, this rptr >= wptr condition gets satisfied and we are dropping the reference before the actual command gets processed in the hardware. If we clear this u64 value on free, we don't see this problem. Yeah, but that doesn't belong into the seq64 handling. Instead the code which allocates the seq64 during userqueue creation needs to clear it to 0. Sure, got it. Yeah, but fixing that aside: we should probably initialize the seq64 array to something like 0xdeadbeef or a similar pattern to find issues where we forget to initialize the allocated slots. Regards, Christian. Thanks, Arun. Regards, Christian. Thanks, Arun. Regards, Christian.
Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c index 4b9afc4df031..9613992c9804 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c @@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 *va, u64 **cpu_addr) void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va) { unsigned long bit_pos; + u64 *cpu_addr; bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64); - if (bit_pos < adev->seq64.num_sem) + if (bit_pos < adev->seq64.num_sem) { + cpu_addr = bit_pos + adev->seq64.cpu_base_addr; + memset(cpu_addr, 0, sizeof(u64)); __clear_bit(bit_pos, adev->seq64.used); + } } /**
Re: [PATCH] drm/amdgpu: clear seq64 memory on free
Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin: Hi Christian, On 4/16/2024 2:35 PM, Christian König wrote: Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam: We should clear the memory on free. Otherwise, there is a chance that we will access the previous application's data and this would lead to abnormal behaviour in the current application. Mhm, I would strongly expect that we initialize the seq number after allocation. It could be that we also have situations where the correct start value is 0xffffffff or something like that instead. Why does this matter? When the user queue A's u64 address (fence address) is allocated to the newly created user queue B, we see a problem. User queue B calls the signal IOCTL, which creates a new fence having the wptr as the seq number. In the amdgpu_userq_fence_create function we have a check where we are comparing the rptr and wptr values (rptr >= wptr). Since the rptr value is read from the u64 address which has the user queue A's wptr data, this rptr >= wptr condition gets satisfied and we are dropping the reference before the actual command gets processed in the hardware. If we clear this u64 value on free, we don't see this problem. Yeah, but that doesn't belong into the seq64 handling. Instead the code which allocates the seq64 during userqueue creation needs to clear it to 0. Regards, Christian. Thanks, Arun. Regards, Christian.
Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c index 4b9afc4df031..9613992c9804 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c @@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 *va, u64 **cpu_addr) void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va) { unsigned long bit_pos; + u64 *cpu_addr; bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64); - if (bit_pos < adev->seq64.num_sem) + if (bit_pos < adev->seq64.num_sem) { + cpu_addr = bit_pos + adev->seq64.cpu_base_addr; + memset(cpu_addr, 0, sizeof(u64)); __clear_bit(bit_pos, adev->seq64.used); + } } /**
Re: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations
Looks valid to me off hand, but it's really Felix who needs to judge this. On the other hand, if it blocks any CI feel free to add my acked-by and submit it. Christian. Am 16.04.24 um 04:05 schrieb Yu, Lang: [Public] ping -Original Message- From: Yu, Lang Sent: Thursday, April 11, 2024 4:11 PM To: amd-gfx@lists.freedesktop.org Cc: Koenig, Christian ; Kuehling, Felix ; Yu, Lang Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations When page table BOs were evicted but not validated before updating page tables, the VM is still in the evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1. Validate BOs 2. Validate VM (and DMABuf attachments) 3. Update page tables for the BOs validated above Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 34 +++ 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 0ae9fd844623..e2c9e6ddb1d1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -2900,13 +2900,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu * amdgpu_sync_create(&sync_obj); - /* Validate BOs and map them to GPUVM (update VM page tables).
*/ + /* Validate BOs managed by KFD */ list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list) { struct amdgpu_bo *bo = mem->bo; uint32_t domain = mem->domain; - struct kfd_mem_attachment *attachment; struct dma_resv_iter cursor; struct dma_fence *fence; @@ -2931,6 +2930,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu * goto validate_map_fail; } } + } + + if (failed_size) + pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size); + + /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO + * validations above would invalidate DMABuf imports again. + */ + ret = process_validate_vms(process_info, &sync_obj); + if (ret) { + pr_debug("Validating VMs failed, ret: %d\n", ret); + goto validate_map_fail; + } + + /* Update mappings managed by KFD. */ + list_for_each_entry(mem, &process_info->kfd_bo_list, + validate_list) { + struct kfd_mem_attachment *attachment; + list_for_each_entry(attachment, &mem->attachments, list) { if (!attachment->is_mapped) continue; @@ -2947,18 +2965,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu * } } - if (failed_size) - pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size); - - /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO - * validations above would invalidate DMABuf imports again. - */ - ret = process_validate_vms(process_info, &sync_obj); - if (ret) { - pr_debug("Validating VMs failed, ret: %d\n", ret); - goto validate_map_fail; - } - /* Update mappings not managed by KFD */ list_for_each_entry(peer_vm, &process_info->vm_list_head, vm_list_node) { -- 2.25.1
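The ordering problem this patch fixes can be modeled with a small toy in plain C (not the real KFD code — `toy_vm` and the helpers are invented stand-ins): updating a mapping while the VM is still in the evicting state fails with -EBUSY, so the VM has to be validated before any page-table update runs.

```c
/* Toy model of the restore ordering: validate the VM first, then
 * update mappings. Names and states are illustrative stand-ins. */
#include <errno.h>
#include <stdbool.h>

struct toy_vm { bool evicting; };

/* Models amdgpu_vm_update_range() returning -EBUSY on an evicting VM. */
static int toy_update_mapping(struct toy_vm *vm)
{
	return vm->evicting ? -EBUSY : 0;
}

/* Models process_validate_vms(): bring the VM out of the evicting state. */
static void toy_validate_vm(struct toy_vm *vm)
{
	vm->evicting = false;
}

/* The fixed ordering: VM validation happens before the mapping update,
 * so the update can no longer run into the -EBUSY dead loop. */
static int toy_restore(struct toy_vm *vm)
{
	toy_validate_vm(vm);
	return toy_update_mapping(vm);
}
```

Calling the update on an evicting VM directly reproduces the dead-loop symptom, while the reordered restore succeeds.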
Re: [PATCH] drm/amdgpu: clear seq64 memory on free
On 15.04.24 at 20:48, Arunpravin Paneer Selvam wrote: We should clear the memory on free. Otherwise, there is a chance that we will access the previous application's data and this would lead to abnormal behaviour in the current application. Mhm, I would strongly expect that we initialize the seq number after allocation. It could be that we also have situations where the correct start value is 0x or something like that instead. Why does this matter? Regards, Christian. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c index 4b9afc4df031..9613992c9804 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c @@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 *va, u64 **cpu_addr) void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va) { unsigned long bit_pos; + u64 *cpu_addr; bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64); - if (bit_pos < adev->seq64.num_sem) + if (bit_pos < adev->seq64.num_sem) { + cpu_addr = bit_pos + adev->seq64.cpu_base_addr; + memset(cpu_addr, 0, sizeof(u64)); __clear_bit(bit_pos, adev->seq64.used); + } } /**
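The free path under discussion can be sketched in userspace C. This is a minimal illustration, not driver code — `NUM_SEM` and the static pools are invented stand-ins for the seq64 structures:

```c
/* Sketch of the patched free logic: bounds-check the slot, zero the
 * 64-bit word so a future allocation never sees a stale sequence
 * value, then clear the corresponding bit in the used bitmap. */
#include <stdint.h>
#include <string.h>

#define NUM_SEM 64

static uint64_t seq_pool[NUM_SEM];
static unsigned long used_bits[NUM_SEM / (8 * sizeof(unsigned long)) + 1];

static void toy_seq64_free(unsigned long bit_pos)
{
	if (bit_pos >= NUM_SEM)
		return;                                /* out of range: ignore */
	memset(&seq_pool[bit_pos], 0, sizeof(uint64_t)); /* scrub old data */
	used_bits[bit_pos / (8 * sizeof(unsigned long))] &=
		~(1UL << (bit_pos % (8 * sizeof(unsigned long))));
}
```

Clearing on free is one answer to the stale-data concern; Christian's counterpoint above is that initializing the value after allocation would work regardless of what the start value should be.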
Re: [PATCH V2] drm/amdgpu: Fix incorrect return value
Well, multiple things to consider here. This is clearly not called from an interrupt, otherwise locking a mutex would be illegal. So the question is: who is calling this? And can the function be called from different threads at the same time? As far as I can see you don't take that into account in the patch. If there is some kind of single-threaded worker handling the RAS errors instead, then I strongly suggest solving this issue in the worker. As far as I can see, hacking around a broken caller by inserting amdgpu_vram_mgr_query_page_status() inside amdgpu_vram_mgr_reserve_range() is an absolute no-go. That is really bad coding style. What could be is that the VRAM manager needs to be able to provide atomic uniqueness for the reserved addresses, e.g. that amdgpu_vram_mgr_reserve_range() can be called multiple times with the same address from multiple threads, but then we would need a different data structure than a linked list protected by a mutex. Regards, Christian. On 15.04.24 at 06:04, Chai, Thomas wrote: [AMD Official Use Only - General] Hi Christian: If an ECC error occurs at an address, HW will generate an interrupt to SW to retire all pages located in the same physical row as the error address, based on the physical characteristics of the memory device. Therefore, if other pages located on the same physical row as the error address also encounter ECC errors later, HW will generate multiple interrupts to SW to retire these same pages again, so amdgpu_vram_mgr_reserve_range will be called multiple times to reserve the same pages. I think it's more appropriate to do the status check inside the function. If the function entry is not checked, people who are not familiar with this part of the code can easily make mistakes when calling the function.
- Best Regards, Thomas -Original Message- From: Christian König Sent: Friday, April 12, 2024 5:24 PM To: Chai, Thomas ; amd-gfx@lists.freedesktop.org Cc: Chai, Thomas ; Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Wang, Yang(Kevin) ; Yang, Stanley Subject: Re: [PATCH V2] drm/amdgpu: Fix incorrect return value Am 12.04.24 um 10:55 schrieb YiPeng Chai: [Why] After calling amdgpu_vram_mgr_reserve_range multiple times with the same address, calling amdgpu_vram_mgr_query_page_status will always return -EBUSY. From the second call to amdgpu_vram_mgr_reserve_range, the same address will be added to the reservations_pending list again and is never moved to the reserved_pages list because the address had been reserved. Well just to make it clear that approach is a NAK until my concerns are solved. Regards, Christian. [How] First add the address status check before calling amdgpu_vram_mgr_do_reserve, if the address is already reserved, do nothing; If the address is already in the reservations_pending list, directly reserve memory; only add new nodes for the addresses that are not in the reserved_pages list and reservations_pending list. V2: Avoid repeated locking/unlocking. 
Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +--- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 1e36c428d254..a636d3f650b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct ttm_resource_manager *man) dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n", rsv->start, rsv->size); - vis_usage = amdgpu_vram_mgr_vis_size(adev, block); atomic64_add(vis_usage, &mgr->vis_usage); spin_lock(&man->bdev->lru_lock); @@ -340,19 +339,27 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr *mgr, uint64_t start, uint64_t size) { struct amdgpu_vram_reservation *rsv; + int ret = 0; - rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); - if (!rsv) - return -ENOMEM; + ret = amdgpu_vram_mgr_query_page_status(mgr, start); + if (!ret) + return 0; - INIT_LIST_HEAD(&rsv->allocated); - INIT_LIST_HEAD(&rsv->blocks); + if (ret == -ENOENT) { + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); + if (!rsv) + return -ENOMEM; - rsv->start = start; - rsv->size = size; + INIT_LIST_HEAD(&rsv->allocated); + INIT_LIST_HEAD(&rsv->blocks); + + rsv->start = start; + rsv->size = size; + } mutex_lock(&mgr->lock); - list_add_tail(&rsv->blocks, &mgr->reservations_pending); + if (ret == -ENOENT) + list_add_tail(&rsv->blocks, &mgr->reservations_pending); amdgpu_vram_mgr_do_reserve(&mgr->manager); mutex_unlock(&mgr->lock);
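The duplicate-reservation problem being debated can be modeled with a toy state machine (illustrative only — Christian NAKed putting this check inside the reserve function itself, preferring to fix the caller, but the sketch shows the idempotency the patch is after): querying the page status first makes a repeated request for the same address a no-op instead of a second pending node.

```c
/* Toy model of idempotent reservation; the states and arrays are
 * invented stand-ins, not the amdgpu_vram_mgr structures. */
#include <errno.h>

enum rsv_state { RSV_NONE, RSV_PENDING, RSV_RESERVED };

#define MAX_SLOTS 16
static enum rsv_state slots[MAX_SLOTS];

/* Models amdgpu_vram_mgr_query_page_status(). */
static int toy_query(unsigned int idx)
{
	switch (slots[idx]) {
	case RSV_RESERVED: return 0;
	case RSV_PENDING:  return -EBUSY;
	default:           return -ENOENT;
	}
}

/* Models the proposed reserve: only unknown addresses get a new node. */
static int toy_reserve(unsigned int idx)
{
	int ret = toy_query(idx);

	if (ret == 0)
		return 0;                 /* already reserved: no duplicate node */
	if (ret == -ENOENT)
		slots[idx] = RSV_PENDING; /* first request: queue a node */
	slots[idx] = RSV_RESERVED;        /* models amdgpu_vram_mgr_do_reserve() */
	return 0;
}
```

With this shape, a second `toy_reserve()` on the same index finds the slot already reserved and returns without queuing anything — the behaviour the original caller was missing.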
Re: [PATCH] drm/ttm: Make ttm shrinkers NUMA aware
On 08.04.24 at 19:49, Rajneesh Bhardwaj wrote: Otherwise the nid is always passed as 0 during memory reclaim, so make the TTM shrinkers NUMA aware. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index dbc96984d331..514261f44b78 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -794,7 +794,7 @@ int ttm_pool_mgr_init(unsigned long num_pages) &ttm_pool_debugfs_shrink_fops); #endif - mm_shrinker = shrinker_alloc(0, "drm-ttm_pool"); + mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool"); Well, that won't do it. Setting the flag is just one step, but both ttm_pool_shrinker_scan() and ttm_pool_type_count() need to be made NUMA aware. This means that allocated_pages needs to become a per-NID array and ttm_pool_shrink() should not shrink any pool but only those with the correct nid (if the nid is set). Regards, Christian. if (!mm_shrinker) return -ENOMEM;
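Christian's feedback can be sketched as a toy in userspace C (illustrative only — `MAX_NID` and the arrays are not TTM structures): once `SHRINKER_NUMA_AWARE` is set, page counts must be tracked per node and a shrink request for node `nid` must drain only that node's pools.

```c
/* Sketch of a NUMA-aware shrink: per-node page counts, and a scan
 * that only touches the requested node. */
#define MAX_NID 4

static unsigned long node_pages[MAX_NID];

/* Models a NUMA-aware shrinker scan: only node `nid` shrinks. */
static unsigned long toy_shrink(int nid, unsigned long nr_to_scan)
{
	unsigned long freed = nr_to_scan < node_pages[nid]
			    ? nr_to_scan : node_pages[nid];

	node_pages[nid] -= freed;
	return freed;
}
```

The one-line flag change in the patch enables the per-node callbacks, but without splitting the counters this way the core would still see and shrink a single global pool.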
Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour
On 16.04.24 at 00:02, Philip Yang wrote: On 2024-04-14 10:57, Arunpravin Paneer Selvam wrote: Now we have two flags for contiguous VRAM buffer allocation. If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. This change will simplify the KFD best-effort contiguous VRAM allocation, because KFD doesn't need to set a new GEM_ flag. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS is used in a couple of places. For page table BOs, it is fine as the BO size is the 4K page size. For 64KB reserved BOs and F/W size related BOs, does all allocation happen at driver initialization, before the VRAM is fragmented? Oh, that's a really good point, we need to keep the behavior as is for kernel allocations. Arun, can you take care of that? Thanks, Christian. - we will try to allocate the contiguous buffer. If the allocation fails, we fall back to allocating the individual pages. When we set TTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. If the allocation fails, we return -ENOSPC.
Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++- 2 files changed, 49 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8bc79924d171..41926d631563 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -153,8 +153,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) else places[c].flags |= TTM_PL_FLAG_TOPDOWN; - if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) - places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; c++; } @@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct ttm_operation_ctx ctx = { false, false }; + struct ttm_place *places = bo->placements; + u32 c = 0; int r, i; if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) @@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, if (bo->tbo.pin_count) { uint32_t mem_type = bo->tbo.resource->mem_type; - uint32_t mem_flags = bo->tbo.resource->placement; if (!(domain & amdgpu_mem_type_to_domain(mem_type))) return -EINVAL; - if ((mem_type == TTM_PL_VRAM) && - (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) && - !(mem_flags & TTM_PL_FLAG_CONTIGUOUS)) - return -EINVAL; - This looks like a bug before, but with this patch, the check makes sense and is needed. ttm_bo_pin(>tbo); if (max_offset != 0) { @@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, bo->placements[i].lpfn = lpfn; } + if (domain & AMDGPU_GEM_DOMAIN_VRAM && + !WARN_ON(places[c].mem_type != TTM_PL_VRAM)) + places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; + If BO pinned is not allocated with AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, should pin and return scattered pages because the RDMA support scattered dmabuf. 
Christian also pointed this out. If (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && bo->placements[i].mem_type == TTM_PL_VRAM) o->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; r = ttm_bo_validate(>tbo, >placement, ); if (unlikely(r)) { dev_err(adev->dev, "%p pin failed\n", bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..ddbf302878f6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -88,6 +88,30 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head) return size; } +static inline unsigned long +amdgpu_vram_find_pages_per_block(struct ttm_buffer_object *tbo, +const struct ttm_place *place, +unsigned long bo_flags) +{ + unsigned long pages_per_block; + + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS || + place->flags & TTM_PL_FLAG_CONTIGUOUS
Re: [PATCH] drm/radeon: make -fstrict-flex-arrays=3 happy
Am 15.04.24 um 15:38 schrieb Alex Deucher: The driver parses a union where the layout up through the first array is the same, however, the array has different sizes depending on the elements in the union. Be explicit to fix the UBSAN checker. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3323 Fixes: df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3") Signed-off-by: Alex Deucher Cc: Kees Cook Acked-by: Christian König But I have a bad feeling messing with that old code. Regards, Christian. --- drivers/gpu/drm/radeon/radeon_atombios.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c b/drivers/gpu/drm/radeon/radeon_atombios.c index bb1f0a3371ab5..10793a433bf58 100644 --- a/drivers/gpu/drm/radeon/radeon_atombios.c +++ b/drivers/gpu/drm/radeon/radeon_atombios.c @@ -923,8 +923,12 @@ bool radeon_get_atom_connector_info_from_supported_devices_table(struct max_device = ATOM_MAX_SUPPORTED_DEVICE_INFO; for (i = 0; i < max_device; i++) { - ATOM_CONNECTOR_INFO_I2C ci = - supported_devices->info.asConnInfo[i]; + ATOM_CONNECTOR_INFO_I2C ci; + + if (frev > 1) + ci = supported_devices->info_2d1.asConnInfo[i]; + else + ci = supported_devices->info.asConnInfo[i]; bios_connectors[i].valid = false;
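The UBSAN issue fixed above can be reproduced in a small userspace illustration (the struct layouts and sizes here are invented, not the ATOM BIOS tables): two union members share a layout up to a trailing array of different bound, and with `-fstrict-flex-arrays=3` the access must go through the member whose declared array size actually covers the index.

```c
/* Sketch: pick the union member that matches the table revision
 * instead of always indexing through the smaller array. */
#include <stdint.h>

struct conn_info { uint16_t dev_acpi; uint16_t i2c; };

struct info_v1 { uint16_t rev; struct conn_info conn[8];  };
struct info_v2 { uint16_t rev; struct conn_info conn[16]; };

union dev_info {
	struct info_v1 v1;
	struct info_v2 v2;
};

/* Explicit selection by format revision, mirroring the frev check
 * the patch adds for asConnInfo. */
static struct conn_info get_conn(const union dev_info *di, int frev, int i)
{
	return frev > 1 ? di->v2.conn[i] : di->v1.conn[i];
}
```

Indexing `v1.conn[12]` would be flagged as out of bounds even though the union storage is large enough; routing the access through `v2` keeps the bounds checker honest without changing the data layout.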
Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour
Am 14.04.24 um 16:57 schrieb Arunpravin Paneer Selvam: Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we setTTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. the allocation fails, we return -ENOSPC. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++- 2 files changed, 49 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8bc79924d171..41926d631563 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -153,8 +153,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) else places[c].flags |= TTM_PL_FLAG_TOPDOWN; - if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) - places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; c++; } @@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct ttm_operation_ctx ctx = { false, false }; + struct ttm_place *places = bo->placements; + u32 c = 0; int r, i; if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) @@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, if (bo->tbo.pin_count) { uint32_t 
mem_type = bo->tbo.resource->mem_type; - uint32_t mem_flags = bo->tbo.resource->placement; if (!(domain & amdgpu_mem_type_to_domain(mem_type))) return -EINVAL; - if ((mem_type == TTM_PL_VRAM) && - (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) && - !(mem_flags & TTM_PL_FLAG_CONTIGUOUS)) - return -EINVAL; - I think that check here needs to stay. ttm_bo_pin(>tbo); if (max_offset != 0) { @@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, bo->placements[i].lpfn = lpfn; } + if (domain & AMDGPU_GEM_DOMAIN_VRAM && + !WARN_ON(places[c].mem_type != TTM_PL_VRAM)) + places[c].flags |= TTM_PL_FLAG_CONTIGUOUS; + This needs to go into the loop directly above as something like this here: If (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && bo->placements[i].mem_type == TTM_PL_VRAM) o->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; This essentially replaces the removed code in amdgpu_bo_placement_from_domain() and only applies it during pinning. r = ttm_bo_validate(>tbo, >placement, ); if (unlikely(r)) { dev_err(adev->dev, "%p pin failed\n", bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..ddbf302878f6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -88,6 +88,30 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head) return size; } +static inline unsigned long +amdgpu_vram_find_pages_per_block(struct ttm_buffer_object *tbo, +const struct ttm_place *place, +unsigned long bo_flags) Well I think we need a better name here. "find" usually means we look up something in a data structure. Maybe amdgpu_vram_mgr_calculate_pages_per_block. 
+{ + unsigned long pages_per_block; + + if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS || + place->flags & TTM_PL_FLAG_CONTIGUOUS) { + pages_per_block = ~0ul; + } else { +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + pages_per_block = HPAGE_PMD_NR; +#else + /* default to 2MB */ + pages_per_block = 2UL << (20UL - PAGE_SHIFT); +#endif + pages_per_block = max_t(uint32_t, pages_per_block, + tbo->page_alignment); + } + + return p
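The pages_per_block selection quoted above (cut off by the archive) can be sketched in userspace C — the constants and flag value are illustrative, not the amdgpu definitions: a contiguous request disables splitting entirely, while the default caps blocks at 2 MiB worth of pages and never drops below the BO's page alignment.

```c
/* Sketch of the pages-per-block choice for the buddy allocator. */
#include <stdint.h>

#define TOY_PAGE_SHIFT 12        /* 4 KiB pages, illustrative */
#define FLAG_CONTIGUOUS 0x1ul    /* stand-in for the contiguous flags */

static unsigned long toy_pages_per_block(unsigned long flags,
					 unsigned long page_alignment)
{
	unsigned long pages_per_block;

	if (flags & FLAG_CONTIGUOUS)
		return ~0ul;            /* one unsplit block, no size cap */

	/* default to 2 MiB worth of pages (the huge-page size case) */
	pages_per_block = 2ul << (20 - TOY_PAGE_SHIFT);
	if (pages_per_block < page_alignment)
		pages_per_block = page_alignment;
	return pages_per_block;
}
```

With 4 KiB pages the default works out to 512 pages per block, which is why `max_t(..., tbo->page_alignment)` in the quoted helper only matters for BOs aligned coarser than 2 MiB.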
Re: [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation
On 12.04.24 at 22:12, Philip Yang wrote: RDMA devices with limited scatter-gather capability require a physically contiguous VRAM buffer for RDMA peer direct access. Add a new KFD alloc memory flag and store it as a new GEM bo alloc flag. When pinning this buffer object to export it for RDMA peerdirect access, set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag, and then vram_mgr will set the TTM_PL_FLAG_CONTIGUOUS flag to ask the VRAM buddy allocator for contiguous VRAM. Remove the 2GB max memory block size limit for contiguous allocation. I'm going to sync up with Arun on this once more, but I think we won't even need the new flag. We will just downgrade the existing flag to be a best-effort allocation for contiguous buffers and only use the TTM flag internally to signal that we need to alter it while pinning. Regards, Christian. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 +++-- include/uapi/drm/amdgpu_drm.h| 5 + include/uapi/linux/kfd_ioctl.h | 1 + 4 files changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 0ae9fd844623..3523b91f8add 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1470,6 +1470,9 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain) if (unlikely(ret)) return ret; + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT) + bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; + ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0); if (ret) pr_err("Error in Pinning BO to domain: %d\n", domain); @@ -1712,6 +1715,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; + + /* For contiguous VRAM allocation */ + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT) + alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT; } xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ? 0 : fpriv->xcp_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..1d6e45e238e1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -516,8 +516,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, BUG_ON(min_block_size < mm->chunk_size); - /* Limit maximum size to 2GiB due to SG table limitations */ - size = min(remaining_size, 2ULL << 30); + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) + size = remaining_size; + else + /* Limit maximum size to 2GiB due to SG table limitations +* for no contiguous allocation. +*/ + size = min(remaining_size, 2ULL << 30); if ((size >= (u64)pages_per_block << PAGE_SHIFT) && !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index ad21c613fec8..13645abb8e46 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ b/include/uapi/drm/amdgpu_drm.h @@ -171,6 +171,11 @@ extern "C" { * may override the MTYPE selected in AMDGPU_VA_OP_MAP. */ #define AMDGPU_GEM_CREATE_EXT_COHERENT(1 << 15) +/* Flag that allocating the BO with best effort for contiguous VRAM. + * If no contiguous VRAM, fallback to scattered allocation. + * Pin the BO for peerdirect RDMA trigger VRAM defragmentation. 
+ */ +#define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT (1 << 16) struct drm_amdgpu_gem_create_in { /** the requested memory size */ diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 2040a470ddb4..c1394c162d4e 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args { #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT (1 << 26) #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED (1 << 25) #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT (1 << 24) +#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1 << 23) /* Allocate memory for later SVM (shared virtual memory) mapping. *
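The best-effort semantics converged on in this thread — try contiguous first, fall back to scattered pages unless contiguity is mandatory (the pin-for-RDMA case) — can be sketched with stand-in allocator callbacks; none of these functions are amdgpu APIs:

```c
/* Sketch of best-effort contiguous allocation with a mandatory mode. */
#include <errno.h>
#include <stdbool.h>

/* Stand-in allocators for the two placement outcomes. */
static int toy_contig_fails(void)  { return -ENOSPC; }
static int toy_contig_ok(void)     { return 0; }
static int toy_scattered_ok(void)  { return 0; }

static int toy_alloc_vram(bool want_contig, bool mandatory,
			  int (*alloc_contig)(void),
			  int (*alloc_scattered)(void))
{
	if (want_contig) {
		if (alloc_contig() == 0)
			return 0;           /* got contiguous pages */
		if (mandatory)
			return -ENOSPC;     /* RDMA pin: no fallback allowed */
	}
	return alloc_scattered();           /* best effort: scattered is fine */
}
```

This mirrors the split the review settled on: the GEM flag expresses the best-effort preference, while the TTM contiguous flag — set only while pinning — is what makes the failure hard (-ENOSPC) instead of falling back.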
Re: [PATCH V2] drm/amdgpu: Fix incorrect return value
Am 12.04.24 um 10:55 schrieb YiPeng Chai: [Why] After calling amdgpu_vram_mgr_reserve_range multiple times with the same address, calling amdgpu_vram_mgr_query_page_status will always return -EBUSY. From the second call to amdgpu_vram_mgr_reserve_range, the same address will be added to the reservations_pending list again and is never moved to the reserved_pages list because the address had been reserved. Well just to make it clear that approach is a NAK until my concerns are solved. Regards, Christian. [How] First add the address status check before calling amdgpu_vram_mgr_do_reserve, if the address is already reserved, do nothing; If the address is already in the reservations_pending list, directly reserve memory; only add new nodes for the addresses that are not in the reserved_pages list and reservations_pending list. V2: Avoid repeated locking/unlocking. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +--- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 1e36c428d254..a636d3f650b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct ttm_resource_manager *man) dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n", rsv->start, rsv->size); - vis_usage = amdgpu_vram_mgr_vis_size(adev, block); atomic64_add(vis_usage, >vis_usage); spin_lock(>bdev->lru_lock); @@ -340,19 +339,27 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr *mgr, uint64_t start, uint64_t size) { struct amdgpu_vram_reservation *rsv; + int ret = 0; - rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); - if (!rsv) - return -ENOMEM; + ret = amdgpu_vram_mgr_query_page_status(mgr, start); + if (!ret) + return 0; - INIT_LIST_HEAD(>allocated); - INIT_LIST_HEAD(>blocks); + if (ret == -ENOENT) { + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); + 
if (!rsv) + return -ENOMEM; - rsv->start = start; - rsv->size = size; + INIT_LIST_HEAD(>allocated); + INIT_LIST_HEAD(>blocks); + + rsv->start = start; + rsv->size = size; + } mutex_lock(>lock); - list_add_tail(>blocks, >reservations_pending); + if (ret == -ENOENT) + list_add_tail(>blocks, >reservations_pending); amdgpu_vram_mgr_do_reserve(>manager); mutex_unlock(>lock);
Re: [PATCH] drm/amdgpu: Fix incorrect return value
Am 03.04.24 um 09:06 schrieb YiPeng Chai: [Why] After calling amdgpu_vram_mgr_reserve_range multiple times with the same address, calling amdgpu_vram_mgr_query_page_status will always return -EBUSY. From the second call to amdgpu_vram_mgr_reserve_range, the same address will be added to the reservations_pending list again and is never moved to the reserved_pages list because the address had been reserved. Well that sounds like a really bad idea to me. Why is the function called multiple times with the same address in the first place ? Apart from that a note on the coding style below. [How] First add the address status check before calling amdgpu_vram_mgr_do_reserve, if the address is already reserved, do nothing; If the address is already in the reservations_pending list, directly reserve memory; only add new nodes for the addresses that are not in the reserved_pages list and reservations_pending list. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 +--- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 1e36c428d254..0bf3f4092900 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct ttm_resource_manager *man) dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n", rsv->start, rsv->size); - vis_usage = amdgpu_vram_mgr_vis_size(adev, block); atomic64_add(vis_usage, >vis_usage); spin_lock(>bdev->lru_lock); @@ -340,19 +339,30 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr *mgr, uint64_t start, uint64_t size) { struct amdgpu_vram_reservation *rsv; + int ret = 0; Don't initialize local variables when it isn't necessary. 
- rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); - if (!rsv) - return -ENOMEM; + ret = amdgpu_vram_mgr_query_page_status(mgr, start); + if (!ret) + return 0; + + if (ret == -ENOENT) { + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); + if (!rsv) + return -ENOMEM; - INIT_LIST_HEAD(>allocated); - INIT_LIST_HEAD(>blocks); + INIT_LIST_HEAD(>allocated); + INIT_LIST_HEAD(>blocks); - rsv->start = start; - rsv->size = size; + rsv->start = start; + rsv->size = size; + + mutex_lock(>lock); + list_add_tail(>blocks, >reservations_pending); + mutex_unlock(>lock); + + } You should probably not lock/unlock here. Regards, Christian. mutex_lock(>lock); - list_add_tail(>blocks, >reservations_pending); amdgpu_vram_mgr_do_reserve(>manager); mutex_unlock(>lock);
Re: [PATCH] drm/amdgpu: validate the parameters of bo mapping operations more clearly
Am 12.04.24 um 09:35 schrieb xinhui pan: Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Reported-by: Vlad Stolyarov Suggested-by: Christian König Signed-off-by: xinhui pan Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 72 -- 1 file changed, 46 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..4e2391c83d7c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1647,6 +1647,37 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, trace_amdgpu_vm_bo_map(bo_va, mapping); } +/* Validate operation parameters to prevent potential abuse */ +static int amdgpu_vm_verify_parameters(struct amdgpu_device *adev, + struct amdgpu_bo *bo, + uint64_t saddr, + uint64_t offset, + uint64_t size) +{ + uint64_t tmp, lpfn; + + if (saddr & AMDGPU_GPU_PAGE_MASK + || offset & AMDGPU_GPU_PAGE_MASK + || size & AMDGPU_GPU_PAGE_MASK) + return -EINVAL; + + if (check_add_overflow(saddr, size, ) + || check_add_overflow(offset, size, ) + || size == 0 /* which also leads to end < begin */) + return -EINVAL; + + /* make sure object fit at this offset */ + if (bo && offset + size > amdgpu_bo_size(bo)) + return -EINVAL; + + /* Ensure last pfn not exceed max_pfn */ + lpfn = (saddr + size - 1) >> AMDGPU_GPU_PAGE_SHIFT; + if (lpfn >= adev->vm_manager.max_pfn) + return -EINVAL; + + return 0; +} + /** * amdgpu_vm_bo_map - map bo inside a vm * @@ -1673,21 +1704,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev, struct amdgpu_bo *bo = bo_va->base.bo; struct amdgpu_vm *vm = bo_va->base.vm; uint64_t eaddr; + int r; - /* validate the parameters */ - if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK) - return -EINVAL; - if (saddr + size <= saddr || offset + size <= offset) - return -EINVAL; - - /* make sure object fit at this offset */ - eaddr = saddr + size - 1; - if 
((bo && offset + size > amdgpu_bo_size(bo)) || - (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT)) - return -EINVAL; + r = amdgpu_vm_verify_parameters(adev, bo, saddr, offset, size); + if (r) + return r; saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; + eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE; tmp = amdgpu_vm_it_iter_first(>va, saddr, eaddr); if (tmp) { @@ -1740,17 +1764,9 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev, uint64_t eaddr; int r; - /* validate the parameters */ - if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK) - return -EINVAL; - if (saddr + size <= saddr || offset + size <= offset) - return -EINVAL; - - /* make sure object fit at this offset */ - eaddr = saddr + size - 1; - if ((bo && offset + size > amdgpu_bo_size(bo)) || - (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT)) - return -EINVAL; + r = amdgpu_vm_verify_parameters(adev, bo, saddr, offset, size); + if (r) + return r; /* Allocate all the needed memory */ mapping = kmalloc(sizeof(*mapping), GFP_KERNEL); @@ -1764,7 +1780,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev, } saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; + eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE; mapping->start = saddr; mapping->last = eaddr; @@ -1851,10 +1867,14 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, struct amdgpu_bo_va_mapping *before, *after, *tmp, *next; LIST_HEAD(removed); uint64_t eaddr; + int r; + + r = amdgpu_vm_verify_parameters(adev, NULL, saddr, 0, size); + if (r) + return r; - eaddr = saddr + size - 1; saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; + eaddr = saddr + (size - 1) / AMDGPU_GPU_PAGE_SIZE; /* Allocate all the needed memory */ before = kzalloc(sizeof(*before), GFP_KERNEL);
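The consolidated checks in amdgpu_vm_verify_parameters() can be exercised as a userspace sketch. The page constants and `max_pfn` here are illustrative, and `__builtin_add_overflow` is the GCC/Clang builtin underlying the kernel's `check_add_overflow()`:

```c
/* Sketch of the merged parameter validation: page alignment,
 * overflow-safe end computation, fit within the BO, last-pfn bound. */
#include <errno.h>
#include <stdint.h>

#define TOY_GPU_PAGE_SHIFT 12
#define TOY_GPU_PAGE_MASK  ((1ull << TOY_GPU_PAGE_SHIFT) - 1)

/* bo_size == 0 models a NULL bo (clear_mappings has no object to fit). */
static int toy_verify(uint64_t bo_size, uint64_t saddr, uint64_t offset,
		      uint64_t size, uint64_t max_pfn)
{
	uint64_t tmp;

	if ((saddr | offset | size) & TOY_GPU_PAGE_MASK)
		return -EINVAL;               /* must be GPU-page aligned */

	if (__builtin_add_overflow(saddr, size, &tmp) ||
	    __builtin_add_overflow(offset, size, &tmp) || size == 0)
		return -EINVAL;               /* wraparound or empty range */

	if (bo_size && offset + size > bo_size)
		return -EINVAL;               /* range beyond the object */

	if (((saddr + size - 1) >> TOY_GPU_PAGE_SHIFT) >= max_pfn)
		return -EINVAL;               /* last pfn exceeds max_pfn */

	return 0;
}
```

The overflow builtin is what replaces the old `saddr + size <= saddr` idiom: it reports wraparound without relying on already-overflowed arithmetic.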
Re: [PATCH] drm/amdgpu: validate the parameters of bo mapping operations more clearly
Am 12.04.24 um 08:47 schrieb xinhui pan: Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Reported-by: Vlad Stolyarov Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 63 -- 1 file changed, 39 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..ea9721666756 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1647,6 +1647,37 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, trace_amdgpu_vm_bo_map(bo_va, mapping); } Please add a one line comment here describing why we have the function. E.g. "validate operation parameters to prevent potential abuse" or something like that. +static int amdgpu_vm_bo_verify_parameters(struct amdgpu_device *adev, + struct amdgpu_bo *bo, + uint64_t saddr, + uint64_t offset, + uint64_t size) Probably better to drop the _bo_ from the name cause this doesn't work on bo_va structures. +{ + size_t tmp, lpfn; Extremely bad idea, size_t might only be 32bit. Please use uint64_t here as well. + + if (saddr & AMDGPU_GPU_PAGE_MASK + || offset & AMDGPU_GPU_PAGE_MASK + || size & AMDGPU_GPU_PAGE_MASK) + return -EINVAL; + + /* Check overflow */ That comment is a bit superfluous when check_add_overflow() is used. Maybe drop it. 
+ if (check_add_overflow(saddr, size, &tmp) + || check_add_overflow(offset, size, &tmp) + || size == 0 /* which also leads to end < begin */) + return -EINVAL; + + /* make sure object fit at this offset */ + if (bo && offset + size > amdgpu_bo_size(bo)) + return -EINVAL; + + /* Ensure last pfn not exceed max_pfn */ + lpfn = (saddr + size - 1) >> AMDGPU_GPU_PAGE_SHIFT; + if (lpfn >= adev->vm_manager.max_pfn) + return -EINVAL; + + return 0; +} + /** * amdgpu_vm_bo_map - map bo inside a vm * @@ -1674,20 +1705,11 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev, struct amdgpu_vm *vm = bo_va->base.vm; uint64_t eaddr; - /* validate the parameters */ - if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK) - return -EINVAL; - if (saddr + size <= saddr || offset + size <= offset) - return -EINVAL; - - /* make sure object fit at this offset */ - eaddr = saddr + size - 1; - if ((bo && offset + size > amdgpu_bo_size(bo)) || - (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT)) + if (amdgpu_vm_bo_verify_parameters(adev, bo, saddr, offset, size)) return -EINVAL; Probably better to return the return value of amdgpu_vm_bo_verify_parameters(). + eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE; saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; Please keep the saddr, eaddr calculation order. Apart from those nit picks looks really good to me. Regards, Christian.
tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr); if (tmp) { @@ -1740,16 +1762,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev, uint64_t eaddr; int r; - /* validate the parameters */ - if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK) - return -EINVAL; - if (saddr + size <= saddr || offset + size <= offset) - return -EINVAL; - - /* make sure object fit at this offset */ - eaddr = saddr + size - 1; - if ((bo && offset + size > amdgpu_bo_size(bo)) || - (eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT)) + if (amdgpu_vm_bo_verify_parameters(adev, bo, saddr, offset, size)) return -EINVAL; /* Allocate all the needed memory */ @@ -1763,8 +1776,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev, return r; } + eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE; saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; mapping->start = saddr; mapping->last = eaddr; @@ -1852,9 +1865,11 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, LIST_HEAD(removed); uint64_t eaddr; - eaddr = saddr + size - 1; + if (amdgpu_vm_bo_verify_parameters(adev, NULL, saddr, 0, size)) + return -EINVAL; + + eaddr = (saddr + size - 1) / AMDGPU_GPU_PAGE_SIZE; saddr /= AMDGPU_GPU_PAGE_SIZE; - eaddr /= AMDGPU_GPU_PAGE_SIZE; /* Allocate all the needed memory */ before = kzalloc(sizeof(*before), GFP_KERNEL);
Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings
Am 11.04.24 um 17:44 schrieb Jann Horn: On Thu, Apr 11, 2024 at 12:25 PM Christian König wrote: Am 11.04.24 um 05:28 schrieb xinhui pan: Ensure there is no address overlapping. Reported-by: Vlad Stolyarov Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..f1315a854192 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, LIST_HEAD(removed); uint64_t eaddr; + /* validate the parameters */ + if (saddr & ~PAGE_MASK || size & ~PAGE_MASK) + return -EINVAL; Well as general rule: *never* use PAGE_MASK and other PAGE_* macros here. This is GPUVM and not related to the CPUVM space. + if (saddr + size <= saddr) + return -EINVAL; + Mhm, so basically size is not checked for a wraparound? Yeah, exactly. eaddr = saddr + size - 1; saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE; If that's the case then I would rather check for saddr < eaddr here. FWIW, it would probably be a good idea to keep the added check analogous to other functions called from amdgpu_gem_va_ioctl() like amdgpu_vm_bo_replace_map(), which also checks "if (saddr + size <= saddr || offset + size <= offset)" before the division. I would also change that function as well. The eaddr needs to be checked against the max_pfn as well and we currently shift that around for this check which looks quite ugly. Only the overflow check can probably be before it. But that actually shouldn't matter since this code here: /* Now gather all removed mappings */ tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr); while (tmp) { Then shouldn't return anything, so the operation is basically a NO-OP then. That's not how it works; the interval tree is not designed to be fed bogus ranges that end before they start.
(Or at least I don't think it is - if it is, it is buggy.) I think basically if the specified start and end addresses are both within an rbtree node, this rbtree node is returned from the lookup, even if the specified end address is before the specified start address. Ah, yeah that makes sense. The search functions checks if a node only partially intersects with start and end and not if it is covered by it. Thanks, Christian. A more verbose example: Let's assume the interval tree contains a single entry from address A to address D. Looking at the _iter_first implementation in interval_tree_generic.h, when it is called with a start address C which is between A and D, and an end address B (result of an addition that wraps around to an address below C but above A), it does the following: 1. bails out if "node->ITSUBTREE < start" (meaning if the specified start address C is after the range covered by the root node - which is not the case) 2. bails out if "ITSTART(leftmost) > last" (meaning if the specified end address is smaller than the entry start address A - which is not the case) 3. enters _subtree_search. in there: 4. the root node has no children, so "node->ITRB.rb_left" is NULL 5. the specified end address B is after the node's start address A, so "Cond1" is fulfilled 6. the specified start address C is before the node's end address D, so "Cond2" is fulfilled 7. the root node is returned from the lookup
Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method
Am 11.04.24 um 13:49 schrieb Christian König: Am 11.04.24 um 13:30 schrieb Yang, Stanley: [AMD Official Use Only - General] -Original Message- From: Christian König Sent: Thursday, April 11, 2024 7:17 PM To: Yang, Stanley ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method Am 11.04.24 um 13:11 schrieb Stanley.Yang: Don't modify amdgpu gpu recover get operation, add amdgpu gpu recover set operation to select reset method, only support mode1 and mode2 currently. Well I don't think setting this from userspace is valid. The reset method to use is determined by the hardware and environment (e.g. SRIOV, passthrough, whatever) and can't be chosen simply. [Stanley]: Agree, the setting is invalid for some devices not supported setting method and devices still reset with default method, but it's valid for those devices supported setting reset method, user can conduct combination testing like mode1 test then mode2 test without re-modprobe driver. Well and the user could also shoot himself into the foot. I really don't think that this is a valuable functionality. To make clear what I mean: What you could do is to make the module parameter writeable. In this case the hardware code still decides which reset method to use based on the module parameter in the moment the reset is requested. That would also avoid re-loading the driver. Regards, Christian. Regards, Christian. Regards, Stanley Regards, Christian. 
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 37 +++--- 3 files changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 9c62552bec34..c82976b2b977 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1151,6 +1151,9 @@ struct amdgpu_device { bool debug_largebar; bool debug_disable_soft_recovery; bool debug_use_vram_fw_buf; + + /* Used to set gpu reset method */ + int recover_method; }; static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 3204b8f6edeb..8411a793be18 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, else adev->asic_type = flags & AMD_ASIC_MASK; + adev->recover_method = AMD_RESET_METHOD_NONE; adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT; if (amdgpu_emu_mode == 1) adev->usec_timeout *= 10; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 10832b470448..e388a50d11d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } +static int gpu_recover_set(void *data, u64 val) { + struct amdgpu_device *adev = (struct amdgpu_device *)data; + struct drm_device *dev = adev_to_drm(adev); + int r; + + /* TODO: support mode1 and mode2 currently */ + if (val == AMD_RESET_METHOD_MODE1 || + val == AMD_RESET_METHOD_MODE2) + adev->recover_method = val; + else + adev->recover_method = AMD_RESET_METHOD_NONE; + + r = pm_runtime_get_sync(dev->dev); + if (r < 0) { + pm_runtime_put_autosuspend(dev->dev); + 
return 0; + } + + if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work)) + flush_work(&adev->reset_work); + + pm_runtime_mark_last_busy(dev->dev); + pm_runtime_put_autosuspend(dev->dev); + + return 0; +} + DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); -DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, - "%lld\n"); +DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, + gpu_recover_set, "%lld\n"); static void amdgpu_debugfs_reset_work(struct work_struct *work) { @@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work) memset(&reset_context, 0, sizeof(reset_context)); - reset_context.method = AMD_RESET_METHOD_NONE; + reset_context.method = adev->recover_method; reset_context.reset_req_dev = adev; set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags); + adev->recover_method = AMD_RESET_METHOD_NONE; amdgpu_device_gpu_recover(adev, NULL, &reset_context); } @@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method
Am 11.04.24 um 13:30 schrieb Yang, Stanley: [AMD Official Use Only - General] -Original Message- From: Christian König Sent: Thursday, April 11, 2024 7:17 PM To: Yang, Stanley ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method Am 11.04.24 um 13:11 schrieb Stanley.Yang: Don't modify amdgpu gpu recover get operation, add amdgpu gpu recover set operation to select reset method, only support mode1 and mode2 currently. Well I don't think setting this from userspace is valid. The reset method to use is determined by the hardware and environment (e.g. SRIOV, passthrough, whatever) and can't be chosen simply. [Stanley]: Agree, the setting is invalid for some devices not supported setting method and devices still reset with default method, but it's valid for those devices supported setting reset method, user can conduct combination testing like mode1 test then mode2 test without re-modprobe driver. Well and the user could also shoot himself into the foot. I really don't think that this is a valuable functionality. Regards, Christian. Regards, Stanley Regards, Christian. 
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 37 +++--- 3 files changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 9c62552bec34..c82976b2b977 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1151,6 +1151,9 @@ struct amdgpu_device { bool debug_largebar; bool debug_disable_soft_recovery; bool debug_use_vram_fw_buf; + + /* Used to set gpu reset method */ + int recover_method; }; static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 3204b8f6edeb..8411a793be18 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, else adev->asic_type = flags & AMD_ASIC_MASK; + adev->recover_method = AMD_RESET_METHOD_NONE; adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT; if (amdgpu_emu_mode == 1) adev->usec_timeout *= 10; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 10832b470448..e388a50d11d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } +static int gpu_recover_set(void *data, u64 val) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)data; + struct drm_device *dev = adev_to_drm(adev); + int r; + + /* TODO: support mode1 and mode2 currently */ + if (val == AMD_RESET_METHOD_MODE1 || + val == AMD_RESET_METHOD_MODE2) + adev->recover_method = val; + else + adev->recover_method = AMD_RESET_METHOD_NONE; + + r = pm_runtime_get_sync(dev->dev); + if (r < 0) { + pm_runtime_put_autosuspend(dev->dev); + return
0; + } + + if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work)) + flush_work(&adev->reset_work); + + pm_runtime_mark_last_busy(dev->dev); + pm_runtime_put_autosuspend(dev->dev); + + return 0; +} + DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); -DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, - "%lld\n"); +DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, + gpu_recover_set, "%lld\n"); static void amdgpu_debugfs_reset_work(struct work_struct *work) { @@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work) memset(&reset_context, 0, sizeof(reset_context)); - reset_context.method = AMD_RESET_METHOD_NONE; + reset_context.method = adev->recover_method; reset_context.reset_req_dev = adev; set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags); + adev->recover_method = AMD_RESET_METHOD_NONE; amdgpu_device_gpu_recover(adev, NULL, &reset_context); } @@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) if (!amdgpu_sriov_vf(adev)) { INIT_WORK(&adev->reset_work, amdgpu_debugfs_reset_work); - debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev, + debugfs_create_file("amdgpu_gpu_recover", 0666, root, adev, &amdgpu_debugfs_gpu_recover_fops); } #endif
Re: [PATCH V2] drm/ttm: remove unused paramter
Am 01.04.24 um 05:04 schrieb jesse.zh...@amd.com: From: Jesse Zhang remove the unused parameter in the functions ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence. V2: rebase the patch on top of drm-misc-next (Christian) And pushed to drm-misc-next. Thanks, Christian. Signed-off-by: Jesse Zhang Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index e059b1e1b13b..6396dece0db1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo) EXPORT_SYMBOL(ttm_bo_put); static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, - struct ttm_resource **mem, struct ttm_operation_ctx *ctx, struct ttm_place *hop) { @@ -469,7 +468,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, if (ret != -EMULTIHOP) break; - ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); } while (!ret); if (ret) { @@ -698,7 +697,6 @@ EXPORT_SYMBOL(ttm_bo_unpin); */ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, struct ttm_resource_manager *man, - struct ttm_resource *mem, bool no_wait_gpu) { struct dma_fence *fence; @@ -787,7 +785,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, if (ret) continue; - ret = ttm_bo_add_move_fence(bo, man, *res, ctx->no_wait_gpu); + ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); if (unlikely(ret)) { ttm_resource_free(bo, res); if (ret == -EBUSY) @@ -894,7 +892,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, bounce: ret = ttm_bo_handle_move_mem(bo, res, false, ctx, &hop); if (ret == -EMULTIHOP) { - ret = ttm_bo_bounce_temp_buffer(bo, &res, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); /* try and move to final place now. */ if (!ret) goto bounce;
Re: [PATCH Review 1/1] drm/amdgpu: Support setting recover method
Am 11.04.24 um 13:11 schrieb Stanley.Yang: Don't modify amdgpu gpu recover get operation, add amdgpu gpu recover set operation to select reset method, only support mode1 and mode2 currently. Well I don't think setting this from userspace is valid. The reset method to use is determined by the hardware and environment (e.g. SRIOV, passthrough, whatever) and can't be chosen simply. Regards, Christian. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 37 +++--- 3 files changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 9c62552bec34..c82976b2b977 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1151,6 +1151,9 @@ struct amdgpu_device { bool debug_largebar; bool debug_disable_soft_recovery; bool debug_use_vram_fw_buf; + + /* Used to set gpu reset method */ + int recover_method; }; static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 3204b8f6edeb..8411a793be18 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3908,6 +3908,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, else adev->asic_type = flags & AMD_ASIC_MASK; + adev->recover_method = AMD_RESET_METHOD_NONE; adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT; if (amdgpu_emu_mode == 1) adev->usec_timeout *= 10; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 10832b470448..e388a50d11d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -965,9 +965,37 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } +static int gpu_recover_set(void *data, u64 val) +{ + struct amdgpu_device
*adev = (struct amdgpu_device *)data; + struct drm_device *dev = adev_to_drm(adev); + int r; + + /* TODO: support mode1 and mode2 currently */ + if (val == AMD_RESET_METHOD_MODE1 || + val == AMD_RESET_METHOD_MODE2) + adev->recover_method = val; + else + adev->recover_method = AMD_RESET_METHOD_NONE; + + r = pm_runtime_get_sync(dev->dev); + if (r < 0) { + pm_runtime_put_autosuspend(dev->dev); + return 0; + } + + if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work)) + flush_work(&adev->reset_work); + + pm_runtime_mark_last_busy(dev->dev); + pm_runtime_put_autosuspend(dev->dev); + + return 0; +} + DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); -DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, - "%lld\n"); +DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, + gpu_recover_set, "%lld\n"); static void amdgpu_debugfs_reset_work(struct work_struct *work) { @@ -978,9 +1006,10 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work) memset(&reset_context, 0, sizeof(reset_context)); - reset_context.method = AMD_RESET_METHOD_NONE; + reset_context.method = adev->recover_method; reset_context.reset_req_dev = adev; set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags); + adev->recover_method = AMD_RESET_METHOD_NONE; amdgpu_device_gpu_recover(adev, NULL, &reset_context); } @@ -999,7 +1028,7 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) if (!amdgpu_sriov_vf(adev)) { INIT_WORK(&adev->reset_work, amdgpu_debugfs_reset_work); - debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev, + debugfs_create_file("amdgpu_gpu_recover", 0666, root, adev, &amdgpu_debugfs_gpu_recover_fops); } #endif
Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings
Am 11.04.24 um 05:28 schrieb xinhui pan: Ensure there is no address overlapping. Reported-by: Vlad Stolyarov Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..f1315a854192 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, LIST_HEAD(removed); uint64_t eaddr; + /* validate the parameters */ + if (saddr & ~PAGE_MASK || size & ~PAGE_MASK) + return -EINVAL; Well as general rule: *never* use PAGE_MASK and other PAGE_* macros here. This is GPUVM and not related to the CPUVM space. + if (saddr + size <= saddr) + return -EINVAL; + Mhm, so basically size is not checked for a wraparound? eaddr = saddr + size - 1; saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE; If that's the case then I would rather check for saddr < eaddr here. But that actually shouldn't matter since this code here: /* Now gather all removed mappings */ tmp = amdgpu_vm_it_iter_first(&vm->va, saddr, eaddr); while (tmp) { Then shouldn't return anything, so the operation is basically a NO-OP then. Regards, Christian.
Re: [PATCH] drm/amdgpu: set vm_update_mode=0 as default for NV32 in SRIOV case
Am 28.03.24 um 00:34 schrieb Danijel Slivka: For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Well big NAK to the reasoning. HDP flush is *mandatory* to work correctly. This not only includes flushes for CPU based VM updates, but also GART updates. Without reliable HDP flushes the driver is simply not stable. Regards, Christian. Signed-off-by: Danijel Slivka --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index aed60aaf1a55..a3012c9aa92b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -724,7 +724,8 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE; } - if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) + if ((amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) || + adev->pdev->device == 0x7461) /* VF MMIO access (except mailbox range) from CPU * will be blocked during sriov runtime */
Re: [PATCH] drm/amdgpu: validate the parameters of amdgpu_vm_bo_clear_mappings
Am 11.04.24 um 05:28 schrieb xinhui pan: Ensure there is no address overlapping. Reported-by: Vlad Stolyarov Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..f1315a854192 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1852,6 +1852,12 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, LIST_HEAD(removed); uint64_t eaddr; + /* validate the parameters */ + if (saddr & ~PAGE_MASK || size & ~PAGE_MASK) + return -EINVAL; + if (saddr + size <= saddr) + return -EINVAL; + Why the heck should we do that? Looks invalid to me. Regards, Christian. eaddr = saddr + size - 1; saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE;