Re: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm

2022-11-26 Thread Arunpravin Paneer Selvam




On 11/25/2022 8:05 AM, Wang, Yang(Kevin) wrote:


-----Original Message-----
From: Paneer Selvam, Arunpravin
Sent: Thursday, November 24, 2022 10:57 PM
To: Christian König ; Wang, Yang(Kevin) ; Koenig, Christian ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Hawking
Subject: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm



On 11/24/2022 1:17 PM, Christian König wrote:

On 11/24/22 08:45, Wang, Yang(Kevin) wrote:


-----Original Message-----
From: Koenig, Christian
Sent: Thursday, November 24, 2022 3:25 PM
To: Wang, Yang(Kevin) ; amd-gfx@lists.freedesktop.org; Paneer Selvam, Arunpravin
Cc: Zhang, Hawking ; Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm

On 11/24/22 06:49, Yang Wang wrote:

add vram usage information into dri debugfs amdgpu_vram_mm node.

Background:
since the amdgpu driver introduced the DRM buddy allocator, it has been
difficult for the kernel driver (and its developers) to track VRAM usage.

Field:
0x...-0x...: the used VRAM range
type: kernel, device, sg
usage: normal, vm, user
domain: C-CPU, G-GTT, V-VRAM, P-PRIV
@...: the (hashed) address of the "amdgpu_bo" object in kernel space
4096: the size of the VRAM range in bytes
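
For illustration, a minimal sketch of how one such record could be
emitted (a hypothetical helper, since the patch body is truncated in
this archive; it only mirrors the field layout above):

	static void vram_range_print(struct drm_printer *p, const char *type,
				     const char *usage, const char *domains,
				     struct amdgpu_bo *bo, u64 start, u64 size)
	{
		/* inclusive end address; %p prints a hashed kernel pointer */
		drm_printf(p, "0x%012llx-0x%012llx: (type:%s usage:%s domain:%s) @%p %llu bytes\n",
			   start, start + size - 1, type, usage, domains, bo, size);
	}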

Example:
0x0003fea68000-0x0003fea68fff: (type:kernel usage:vm domain:--V- --V-) @1d33dfee 4096 bytes
0x0003fea69000-0x0003fea69fff: (type:kernel usage:vm domain:--V- --V-) @a79155b5 4096 bytes
0x0003fea6b000-0x0003fea6bfff: (type:kernel usage:vm domain:--V- --V-) @38ad633b 4096 bytes
0x0003fea6c000-0x0003fea6cfff: (type:device usage:user domain:--V- --V-) @e302f90b 4096 bytes
0x0003fea6d000-0x0003fea6dfff: (type:device usage:user domain:--V- --V-) @e664c172 4096 bytes
0x0003fea6e000-0x0003fea6efff: (type:kernel usage:vm domain:--V- --V-) @4528cb2f 4096 bytes
0x0003fea6f000-0x0003fea6ffff: (type:kernel usage:vm domain:--V- --V-) @a446bdbf 4096 bytes
0x0003fea70000-0x0003fea7ffff: (type:device usage:user domain:--V- --V-) @78fae42f 65536 bytes
0x0003fead8000-0x0003feadbfff: (type:kernel usage:normal domain:--V- --V-) @1327b7ff 16384 bytes
0x0003feadc000-0x0003feadcfff: (type:kernel usage:normal domain:--V- --V-) @1327b7ff 4096 bytes
0x0003feadd000-0x0003feaddfff: (type:kernel usage:normal domain:--V- --V-) @b9706fc1 4096 bytes
0x0003feade000-0x0003feadefff: (type:kernel usage:vm domain:--V- --V-) @71a25571 4096 bytes

Note:
although some VRAM ranges could be merged in the example above, keeping
them separate reflects the actual distribution inside the DRM buddy
allocator.

Well, completely NAK. This is way too much complexity for simple
debugging.

Question is what are your requirements here? E.g. what information do
you want and why doesn't the buddy allocator already expose this?

Regards,
Christian.

[Kevin]:

For KMD debugging.
The DRM buddy code doesn't provide an interface to query which ranges of
VRAM (the resource) are used.
It is not easy to debug on the KMD side if the driver fails to create a
BO at a specific location, and from the KMD's point of view, the VRAM at
some locations has special purposes.
With this patch, we can know which ranges of VRAM are actually used.

Well that's not a good reason to add this complexity. Debugging
doesn't justify that.

Please work with Arun to add the necessary information to the buddy
allocator interface.

Regards,
Christian.


Hi Kevin,

I will check and list down some of the necessary information that we can add to 
the buddy allocator interface.

Regards,
Arun.

[kevin]:

Thanks,
but some of the information is AMDGPU specific, so I hope we can add some
flexible interfaces to accommodate different drivers' extensions.
We have a debug interface in the DRM buddy module: the drm_buddy_print()
function prints the memory available at each order at any instant, and the
drm_buddy_block_print() function prints the address range and size of each
block. Please check if that serves the purpose; a short sketch of how the
two chain together follows below.
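
For reference, a minimal sketch of how the two helpers could be chained
in a debugfs show path (assuming the caller already tracks its list of
allocated blocks and holds the manager's lock):

	static void buddy_debug_print(struct drm_buddy *mm,
				      struct list_head *blocks,
				      struct drm_printer *p)
	{
		struct drm_buddy_block *block;

		/* summary: free memory still available at each order */
		drm_buddy_print(mm, p);

		/* detail: offset and size of every allocated block */
		list_for_each_entry(block, blocks, link)
			drm_buddy_block_print(mm, block, p);
	}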


Regards,
Arun


Best Regards,
Kevin



Signed-off-by: Yang Wang 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |   6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   |   3 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 130 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h |   1 +
4 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 90eb07106609..117c754409b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -53,7 +53,7 @@
 *
 */

-static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
+void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
{
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);

@@ -66,7 +66,7 @@ static void amdgpu_bo_destr

[PATCH] drm/amdgpu: New method to check block continuous

2022-11-26 Thread xinhui pan
Blocks are not guaranteed to be in ascending order.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 21 
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 27159f1d112e..17175d284869 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -59,22 +59,17 @@ amdgpu_vram_mgr_first_block(struct list_head *list)
 static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
 {
struct drm_buddy_block *block;
-   u64 start, size;
+   u64 start = LONG_MAX, end = 0, size = 0;
 
-   block = amdgpu_vram_mgr_first_block(head);
-   if (!block)
-   return false;
+   list_for_each_entry(block, head, link) {
+   u64 bstart = amdgpu_vram_mgr_block_start(block);
+   u64 bsize = amdgpu_vram_mgr_block_size(block);
 
-   while (head != block->link.next) {
-   start = amdgpu_vram_mgr_block_start(block);
-   size = amdgpu_vram_mgr_block_size(block);
-
-   block = list_entry(block->link.next, struct drm_buddy_block, link);
-   if (start + size != amdgpu_vram_mgr_block_start(block))
-   return false;
+   start = min(bstart, start);
+   end = max(bstart + bsize, end);
+   size += bsize;
}
-
-   return true;
+   return end == start + size;
 }
 
 
-- 
2.34.1
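
A quick worked check of the new logic, with two blocks listed out of
order (illustrative values, not taken from the patch):

	blocks: {start 0x2000, size 0x1000}, {start 0x1000, size 0x1000}
	first block : start = min(0x2000, ...)    = 0x2000, end = max(0x3000, 0)      = 0x3000, size = 0x1000
	second block: start = min(0x1000, 0x2000) = 0x1000, end = max(0x2000, 0x3000) = 0x3000, size = 0x2000
	check: end == start + size  ->  0x3000 == 0x1000 + 0x2000  ->  contiguous

The previous head-to-tail walk would have returned false here, because
the blocks do not appear in ascending order.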



[PATCH v2] drm: Optimise for continuous memory allocation

2022-11-26 Thread xinhui pan
Currently drm-buddy does not have full knowledge of which free blocks are
physically continuous.

Let's consider the scenario below.
order 1:  L       R
order 0:  LL  LR  RL  RR
For an order-1 allocation, it can offer L, R, or LR+RL.

For now, we only implement the L or R case for continuous memory
allocation, so this patch aims to implement the LR+RL case.

Signed-off-by: xinhui pan 
---
change from v1:
implement top-down continuous allocation
---
 drivers/gpu/drm/drm_buddy.c | 66 +
 1 file changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..eea14505070e 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -386,6 +386,46 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
 }
 
+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rn)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *node, *parent, *free_node, *max_node = NULL;
+
+   list_for_each_entry(free_node, head, link) {
+   if (max_node) {
+   if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+   break;
+
+   if (drm_buddy_block_offset(free_node) <
+   drm_buddy_block_offset(max_node))
+   continue;
+   }
+
+   parent = free_node;
+   do {
+   node = parent;
+   parent = parent->parent;
+   } while (parent && parent->right == node);
+   if (!parent)
+   continue;
+
+   node = parent->right;
+   while (drm_buddy_block_is_split(node))
+   node = node->left;
+
+   if (drm_buddy_block_is_free(node) &&
+   drm_buddy_block_order(node) == order) {
+   *rn = node;
+   max_node = free_node;
+   }
+   }
+   return max_node;
+}
+
 static struct drm_buddy_block *
 get_maxblock(struct list_head *head)
 {
@@ -637,7 +677,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
   struct list_head *blocks,
   unsigned long flags)
 {
-   struct drm_buddy_block *block = NULL;
+   struct drm_buddy_block *block = NULL, *rblock = NULL;
unsigned int min_order, order;
unsigned long pages;
LIST_HEAD(allocated);
@@ -689,17 +729,29 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
break;
 
if (order-- == min_order) {
+   if (!(flags & DRM_BUDDY_RANGE_ALLOCATION) &&
+   min_order != 0 && pages == BIT(order + 1)) {
+   block = find_continuous_blocks(mm,
+  order,
+  flags,
+  &rblock);
+   if (block)
+   break;
+   }
err = -ENOSPC;
goto err_free;
}
} while (1);
 
-   mark_allocated(block);
-   mm->avail -= drm_buddy_block_size(mm, block);
-   kmemleak_update_trace(block);
-   list_add_tail(&block->link, &allocated);
-
-   pages -= BIT(order);
+   do {
+   mark_allocated(block);
+   mm->avail -= drm_buddy_block_size(mm, block);
+   kmemleak_update_trace(block);
+   list_add_tail(&block->link, &allocated);
+   pages -= BIT(order);
+   block = rblock;
+   rblock = NULL;
+   } while (block);
 
if (!pages)
break;
-- 
2.34.1
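
To see how find_continuous_blocks() locates the LR+RL pair, one can
trace it on the commit-message example with free_node = LR (order 0):

- climb while the current node is a right child: LR is the right child
  of L, so the walk moves up to L; L is the left child of the root, so
  the loop stops with parent pointing at the root;
- parent->right is R; descending through split blocks via ->left lands
  on RL, the block whose start touches the end of free_node's span;
- RL is free and of the requested order, so (LR, RL) is recorded as a
  pair of blocks that are physically contiguous across the L/R boundary.

With DRM_BUDDY_TOPDOWN_ALLOCATION set, the scan keeps going and prefers
the candidate pair with the highest offset instead of the first match.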



Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-26 Thread Christian König

On 11/26/22 06:25, Luben Tuikov wrote:

Fix minmax compilation error by using the correct constant and correct integer
suffix.

Cc: James Zhu 
Cc: Felix Kuehling 
Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory")
Signed-off-by: Luben Tuikov 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 8a2e5716d8dba2..65715cb395d838 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -51,7 +51,7 @@
  #include "amdgpu_amdkfd.h"
  #include "amdgpu_hmm.h"
  
-#define MAX_WALK_BYTE	(64ULL<<30)
+#define MAX_WALK_BYTE  (2UL << 30)
  
 /**
  * amdgpu_hmm_invalidate_gfx - callback to notify about mm change
@@ -197,8 +197,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
hmm_range->start, hmm_range->end);
  
		/* Assuming 512MB takes maximum 1 second to fault page address */
-   timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL) *
-   HMM_RANGE_DEFAULT_TIMEOUT;
+   timeout = max((hmm_range->end - hmm_range->start) >> 29, 1UL);
+   timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
timeout = jiffies + msecs_to_jiffies(timeout);
  
  retry:


base-commit: 9e95ce4c60631c1339204f8723008a715391f410
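
A quick sanity check of the timeout arithmetic after the fix, assuming
HMM_RANGE_DEFAULT_TIMEOUT of 1000 ms: for a 1 GiB range,
(hmm_range->end - hmm_range->start) >> 29 = 2 (two 512 MiB chunks),
max(2, 1UL) = 2, so timeout becomes jiffies + msecs_to_jiffies(2000).
The point of the fix is that both max() arguments are now unsigned long,
which is what the kernel's type-checking minmax macros require.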




[PATCH] drm: Optimise for continuous memory allocation

2022-11-26 Thread xinhui pan
Currently drm-buddy does not have full knowledge of which free blocks are
physically continuous.

Let's consider the scenario below.
order 1:  L       R
order 0:  LL  LR  RL  RR
For an order-1 allocation, it can offer L, R, or LR+RL.

For now, we only implement the L or R case for continuous memory
allocation, so this patch aims to implement the LR+RL case.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/drm_buddy.c | 56 -
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..550af558342e 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -386,6 +386,37 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
 }
 
+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  struct drm_buddy_block **rn)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *node, *parent, *free_node;
+
+   list_for_each_entry(free_node, head, link) {
+   node = free_node;
+   parent = node->parent;
+   while (parent && parent->right == node) {
+   node = parent;
+   parent = node->parent;
+   }
+   if (!parent)
+   continue;
+
+   node = parent->right;
+   while (drm_buddy_block_is_split(node))
+   node = node->left;
+
+   if (drm_buddy_block_is_free(node) &&
+   drm_buddy_block_order(node) == order) {
+   *rn = node;
+   return free_node;
+   }
+   }
+   return NULL;
+}
+
 static struct drm_buddy_block *
 get_maxblock(struct list_head *head)
 {
@@ -637,7 +668,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
   struct list_head *blocks,
   unsigned long flags)
 {
-   struct drm_buddy_block *block = NULL;
+   struct drm_buddy_block *block = NULL, *rblock = NULL;
unsigned int min_order, order;
unsigned long pages;
LIST_HEAD(allocated);
@@ -689,17 +720,28 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
break;
 
if (order-- == min_order) {
+   if (!(flags & DRM_BUDDY_RANGE_ALLOCATION) &&
+   min_order != 0 && pages == BIT(order + 1)) {
+   block = find_continuous_blocks(mm,
+  order,
+  &rblock);
+   if (block)
+   break;
+   }
err = -ENOSPC;
goto err_free;
}
} while (1);
 
-   mark_allocated(block);
-   mm->avail -= drm_buddy_block_size(mm, block);
-   kmemleak_update_trace(block);
-   list_add_tail(&block->link, &allocated);
-
-   pages -= BIT(order);
+   do {
+   mark_allocated(block);
+   mm->avail -= drm_buddy_block_size(mm, block);
+   kmemleak_update_trace(block);
+   list_add_tail(&block->link, &allocated);
+   pages -= BIT(order);
+   block = rblock;
+   rblock = NULL;
+   } while (block);
 
if (!pages)
break;
-- 
2.34.1
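
To spell out the new tail of the allocation loop for the LR+RL hit: the
path only triggers once order has been decremented below min_order and
the remaining pages equal BIT(order + 1), so find_continuous_blocks()
hands back two physically adjacent order-`order` blocks. The first
do/while pass marks block (the left half) allocated and subtracts
BIT(order) pages; block then takes over rblock (the right half) and
rblock is cleared; the second pass marks the right half, block becomes
NULL, and the loop exits with pages at zero.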