Re: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm
On 11/25/2022 8:05 AM, Wang, Yang(Kevin) wrote:

[AMD Official Use Only - General]

-----Original Message-----
From: Paneer Selvam, Arunpravin
Sent: November 24, 2022 22:57
To: Christian König; Wang, Yang(Kevin); Koenig, Christian; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Zhang, Hawking
Subject: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm

On 11/24/2022 1:17 PM, Christian König wrote:

On 24.11.22 at 08:45, Wang, Yang(Kevin) wrote:

[AMD Official Use Only - General]

-----Original Message-----
From: Koenig, Christian
Sent: Thursday, November 24, 2022 3:25 PM
To: Wang, Yang(Kevin); amd-gfx@lists.freedesktop.org; Paneer Selvam, Arunpravin
Cc: Zhang, Hawking; Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm

On 24.11.22 at 06:49, Yang Wang wrote:

Add vram usage information to the dri debugfs amdgpu_vram_mm node.

Background: since the amdgpu driver introduced the drm buddy allocator, it is difficult for the kernel driver (and developers) to track vram usage information.

Fields:
0x-0x: the used vram range.
type: kernel, device, sg
usage: normal, vm, user
domain: C-CPU, G-GTT, V-VRAM, P-PRIV
@x: the address of the "amdgpu_bo" object in kernel space.
4096: the size of the vram range in bytes.
Example:

0x0003fea68000-0x0003fea68fff: (type:kernel usage:vm domain:--V- --V-) @1d33dfee 4096 bytes
0x0003fea69000-0x0003fea69fff: (type:kernel usage:vm domain:--V- --V-) @a79155b5 4096 bytes
0x0003fea6b000-0x0003fea6bfff: (type:kernel usage:vm domain:--V- --V-) @38ad633b 4096 bytes
0x0003fea6c000-0x0003fea6cfff: (type:device usage:user domain:--V- --V-) @e302f90b 4096 bytes
0x0003fea6d000-0x0003fea6dfff: (type:device usage:user domain:--V- --V-) @e664c172 4096 bytes
0x0003fea6e000-0x0003fea6efff: (type:kernel usage:vm domain:--V- --V-) @4528cb2f 4096 bytes
0x0003fea6f000-0x0003fea6: (type:kernel usage:vm domain:--V- --V-) @a446bdbf 4096 bytes
0x0003fea7-0x0003fea7: (type:device usage:user domain:--V- --V-) @78fae42f 65536 bytes
0x0003fead8000-0x0003feadbfff: (type:kernel usage:normal domain:--V- --V-) @1327b7ff 16384 bytes
0x0003feadc000-0x0003feadcfff: (type:kernel usage:normal domain:--V- --V-) @1327b7ff 4096 bytes
0x0003feadd000-0x0003feaddfff: (type:kernel usage:normal domain:--V- --V-) @b9706fc1 4096 bytes
0x0003feade000-0x0003feadefff: (type:kernel usage:vm domain:--V- --V-) @71a25571 4096 bytes

Note: although some vram ranges could be merged in the example above, the output reflects the actual distribution of the drm buddy allocator.

Well, completely NAK. This is way too much complexity for simple debugging.

Question is: what are your requirements here? E.g. what information do you want, and why doesn't the buddy allocator already expose this?

Regards,
Christian.

[Kevin]:
For KMD debugging. The DRM buddy interface doesn't provide a way to query which ranges of vram (resource) are used. It is not easy to debug on the KMD side if the driver fails to create a BO at a specific location, and from the KMD's point of view the vram at some locations has special purposes. With this patch we can know which ranges of vram are actually used.

Well, that's not a good reason to add this complexity. Debugging doesn't justify that.
Please work with Arun to add the necessary information to the buddy allocator interface.

Regards,
Christian.

Hi Kevin,

I will check and list down some of the necessary information that we can add to the buddy allocator interface.

Regards,
Arun.

[Kevin]:
Thanks, but some of the information is AMDGPU specific, so I hope we can add some flexible interfaces to accommodate different drivers' extensions.

We have a debug interface in the drm buddy module: the drm_buddy_print() function prints the available memory in each order at any instance, and the drm_buddy_block_print() function prints the address range of each block and its size. Please check whether that solves the purpose.

Regards,
Arun

Best Regards,
Kevin

Best Regards,
Kevin

Signed-off-by: Yang Wang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 130 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h |   1 +
 4 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 90eb07106609..117c754409b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -53,7 +53,7 @@
  *
  */

-static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
+void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 {
     struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);

@@ -66,7 +66,7 @@ static void amdgpu_bo_destr
[PATCH] drm/amdgpu: New method to check block continuous
Blocks are not guaranteed to be in ascending order.

Signed-off-by: xinhui pan
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 21
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 27159f1d112e..17175d284869 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -59,22 +59,17 @@ amdgpu_vram_mgr_first_block(struct list_head *list)
 static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
 {
     struct drm_buddy_block *block;
-    u64 start, size;
+    u64 start = LONG_MAX, end = 0, size = 0;

-    block = amdgpu_vram_mgr_first_block(head);
-    if (!block)
-        return false;
+    list_for_each_entry(block, head, link) {
+        u64 bstart = amdgpu_vram_mgr_block_start(block);
+        u64 bsize = amdgpu_vram_mgr_block_size(block);

-    while (head != block->link.next) {
-        start = amdgpu_vram_mgr_block_start(block);
-        size = amdgpu_vram_mgr_block_size(block);
-
-        block = list_entry(block->link.next, struct drm_buddy_block, link);
-        if (start + size != amdgpu_vram_mgr_block_start(block))
-            return false;
+        start = min(bstart, start);
+        end = max(bstart + bsize, end);
+        size += bsize;
     }
-
-    return true;
+    return end == start + size;
 }
--
2.34.1
[PATCH v2] drm: Optimise for continuous memory allocation
Currently drm-buddy does not have full knowledge of continuous memory. Let's consider the scenario below:

order 1:   L       R
order 0: LL  LR  RL  RR

For an order 1 allocation it can offer L, R, or LR+RL. For now we only implement the L or R case for continuous memory allocation. This patch aims to implement the LR+RL case.

Signed-off-by: xinhui pan
---
change from v1:
implement top-down continuous allocation
---
 drivers/gpu/drm/drm_buddy.c | 66 +
 1 file changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..eea14505070e 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -386,6 +386,46 @@ alloc_range_bias(struct drm_buddy *mm,
     return ERR_PTR(err);
 }

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+                       int order,
+                       unsigned long flags,
+                       struct drm_buddy_block **rn)
+{
+    struct list_head *head = &mm->free_list[order];
+    struct drm_buddy_block *node, *parent, *free_node, *max_node = NULL;
+
+    list_for_each_entry(free_node, head, link) {
+        if (max_node) {
+            if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+                break;
+
+            if (drm_buddy_block_offset(free_node) <
+                drm_buddy_block_offset(max_node))
+                continue;
+        }
+
+        parent = free_node;
+        do {
+            node = parent;
+            parent = parent->parent;
+        } while (parent && parent->right == node);
+        if (!parent)
+            continue;
+
+        node = parent->right;
+        while (drm_buddy_block_is_split(node))
+            node = node->left;
+
+        if (drm_buddy_block_is_free(node) &&
+            drm_buddy_block_order(node) == order) {
+            *rn = node;
+            max_node = free_node;
+        }
+    }
+    return max_node;
+}
+
 static struct drm_buddy_block *
 get_maxblock(struct list_head *head)
 {
@@ -637,7 +677,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
                            struct list_head *blocks,
                            unsigned long flags)
 {
-    struct drm_buddy_block *block = NULL;
+    struct drm_buddy_block *block = NULL, *rblock = NULL;
     unsigned int min_order, order;
     unsigned long pages;
     LIST_HEAD(allocated);
@@ -689,17 +729,29 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
             break;

         if (order-- == min_order) {
+            if (!(flags & DRM_BUDDY_RANGE_ALLOCATION) &&
+                min_order != 0 && pages == BIT(order + 1)) {
+                block = find_continuous_blocks(mm,
+                                               order,
+                                               flags,
+                                               &rblock);
+                if (block)
+                    break;
+            }
             err = -ENOSPC;
             goto err_free;
         }
     } while (1);

-    mark_allocated(block);
-    mm->avail -= drm_buddy_block_size(mm, block);
-    kmemleak_update_trace(block);
-    list_add_tail(&block->link, &allocated);
-
-    pages -= BIT(order);
+    do {
+        mark_allocated(block);
+        mm->avail -= drm_buddy_block_size(mm, block);
+        kmemleak_update_trace(block);
+        list_add_tail(&block->link, &allocated);
+        pages -= BIT(order);
+        block = rblock;
+        rblock = NULL;
+    } while (block);

     if (!pages)
         break;
--
2.34.1
Re: [PATCH] drm/amdgpu: Fix minmax error
On 26.11.22 at 06:25, Luben Tuikov wrote:

Fix the minmax compilation error by using the correct constant and the correct integer suffix.

Cc: James Zhu
Cc: Felix Kuehling
Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory")
Signed-off-by: Luben Tuikov
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 8a2e5716d8dba2..65715cb395d838 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -51,7 +51,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_hmm.h"

-#define MAX_WALK_BYTE (64ULL<<30)
+#define MAX_WALK_BYTE (2UL << 30)

 /**
  * amdgpu_hmm_invalidate_gfx - callback to notify about mm change
@@ -197,8 +197,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
             hmm_range->start, hmm_range->end);

     /* Assuming 512MB takes maxmium 1 second to fault page address */
-    timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL) *
-        HMM_RANGE_DEFAULT_TIMEOUT;
+    timeout = max((hmm_range->end - hmm_range->start) >> 29, 1UL);
+    timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
     timeout = jiffies + msecs_to_jiffies(timeout);

 retry:

base-commit: 9e95ce4c60631c1339204f8723008a715391f410
[PATCH] drm: Optimise for continuous memory allocation
Currently drm-buddy does not have full knowledge of continuous memory. Let's consider the scenario below:

order 1:   L       R
order 0: LL  LR  RL  RR

For an order 1 allocation it can offer L, R, or LR+RL. For now we only implement the L or R case for continuous memory allocation. This patch aims to implement the LR+RL case.

Signed-off-by: xinhui pan
---
 drivers/gpu/drm/drm_buddy.c | 56 -
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..550af558342e 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -386,6 +386,37 @@ alloc_range_bias(struct drm_buddy *mm,
     return ERR_PTR(err);
 }

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+                       int order,
+                       struct drm_buddy_block **rn)
+{
+    struct list_head *head = &mm->free_list[order];
+    struct drm_buddy_block *node, *parent, *free_node;
+
+    list_for_each_entry(free_node, head, link) {
+        node = free_node;
+        parent = node->parent;
+        while (parent && parent->right == node) {
+            node = parent;
+            parent = node->parent;
+        }
+        if (!parent)
+            continue;
+
+        node = parent->right;
+        while (drm_buddy_block_is_split(node))
+            node = node->left;
+
+        if (drm_buddy_block_is_free(node) &&
+            drm_buddy_block_order(node) == order) {
+            *rn = node;
+            return free_node;
+        }
+    }
+    return NULL;
+}
+
 static struct drm_buddy_block *
 get_maxblock(struct list_head *head)
 {
@@ -637,7 +668,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
                            struct list_head *blocks,
                            unsigned long flags)
 {
-    struct drm_buddy_block *block = NULL;
+    struct drm_buddy_block *block = NULL, *rblock = NULL;
     unsigned int min_order, order;
     unsigned long pages;
     LIST_HEAD(allocated);
@@ -689,17 +720,28 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
             break;

         if (order-- == min_order) {
+            if (!(flags & DRM_BUDDY_RANGE_ALLOCATION) &&
+                min_order != 0 && pages == BIT(order + 1)) {
+                block = find_continuous_blocks(mm,
+                                               order,
+                                               &rblock);
+                if (block)
+                    break;
+            }
             err = -ENOSPC;
             goto err_free;
         }
     } while (1);

-    mark_allocated(block);
-    mm->avail -= drm_buddy_block_size(mm, block);
-    kmemleak_update_trace(block);
-    list_add_tail(&block->link, &allocated);
-
-    pages -= BIT(order);
+    do {
+        mark_allocated(block);
+        mm->avail -= drm_buddy_block_size(mm, block);
+        kmemleak_update_trace(block);
+        list_add_tail(&block->link, &allocated);
+        pages -= BIT(order);
+        block = rblock;
+        rblock = NULL;
+    } while (block);

     if (!pages)
         break;
--
2.34.1