### Re: 回复: 回复: [PATCH v4] drm: Optimise for continuous memory allocation

```
Am 29.11.22 um 14:14 schrieb Pan, Xinhui:

[AMD Official Use Only - General]

linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
intel-...@lists.freedesktop.org

Am 29.11.22 um 12:54 schrieb Pan, Xinhui:

[AMD Official Use Only - General]

linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
intel-...@lists.freedesktop.org

Am 29.11.22 um 11:56 schrieb xinhui pan:

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:L R
order 0: LL   LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

order. Now we can find more than 2 sub-order blocks easier.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Well that description is a bit confusing and doesn't make to much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.

[xh] sorry, The order above is not 4, should be 3.
order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, 0+2+1+0
the order 0 + 1 + 2 + 0 case does not have two same order 0 adjacent. they are
in different tree.
looks like below
order 3:L3
R3
order 2:L2  (R2)*L2*
order 1:L1 (R1) L1
order 0: L0   (R0) (L0)
R0 + R1+R2 +L0 with () around combined to be order 3.
R2 + L2 with * followed combined to be order 3.
etc

When you have the case of a free order 1 block with two adjacent free
order 0 blocks then we a fragmented address space. In this case the best
approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.

No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in
is the best we can do to de-fragment the address space.

This is expected behavior.

[xh]  I believe eviction is the best approach to cleanup memory.
But as its cost is not cheap, it should be the final step.
As long as we could find any room to satisfy the request, no need to trigger
eviction.

Just a test in theory
total memory is 128.
while true {
alloc 32
alloc 32
free 32
free 32
alloc 64
free 64
}

when thread 0 wants to alloc 64, the memory layout might be
(32) means allocated, _32_ means free.
case 1: (32) _32_ _32_ (32)
case 2: (32) _32_ (32) _32_
case 3: (32) (32)  _64_
case 4: (32) _32_ 64_
case 5: _128_
case 6: (64) _64_

without this patch, it would trigger eviction in case 1 and case 2.
with this patch, it would trigger eviction only in case 2.
obviously, the two threads totally consume memory at most 128 at any time, no
overcommit.
The eviction is the less the better.

No, once more: Eviction is part of why this works as it should.

In other words eviction is expected here and de-fragments the address
space into larger blocks.

This patch here breaks the general approach of the buddy allocator and
is a no-go as far as I can see.

If looking at adjacent blocks would come without extra cost then we
could consider it, but this here means extra overhead and complexity.

Regards,
Christian.

Regards,
Christian.

And if it still fails to find the continuous memory with this approach, then
let's evict.

So what exactly is the goal here?

Regards,
Christian.

Signed-off-by: xinhui pan
---
change from v3:

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
drivers/gpu/drm/drm_buddy.c | 108 +---
include/drm/drm_buddy.h |   1 +
2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int```

### 回复: 回复: [PATCH v4] drm: Optimise for continuous memory allocation

```[AMD Official Use Only - General]

linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
intel-...@lists.freedesktop.org

Am 29.11.22 um 12:54 schrieb Pan, Xinhui:
> [AMD Official Use Only - General]
>
>
>
> 发件人: Koenig, Christian
> 发送时间: 2022年11月29日 19:32
> 收件人: Pan, Xinhui; amd-gfx@lists.freedesktop.org
> 抄送: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org;
> linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
> intel-...@lists.freedesktop.org
> 主题: Re: [PATCH v4] drm: Optimise for continuous memory allocation
>
> Am 29.11.22 um 11:56 schrieb xinhui pan:
>> Currently drm-buddy does not have full knowledge of continuous memory.
>>
>> Lets consider scenario below.
>> order 1:L R
>> order 0: LL   LR  RL  RR
>> for order 1 allocation, it can offer L or R or LR+RL.
>>
>> For now, we only implement L or R case for continuous memory allocation.
>> So this patch aims to implement the rest cases.
>>
>> order. Now we can find more than 2 sub-order blocks easier.
>> Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
>> 0+1+2+0, 0+2+1+0.
> Well that description is a bit confusing and doesn't make to much sense
> to me.
>
> When you have two adjacent free order 0 blocks then those should be
> automatically combined into an order 1. This is a fundamental property
> of the buddy allocator, otherwise the whole algorithm won't work.
>
> [xh] sorry, The order above is not 4, should be 3.
> order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0,
> 0+2+1+0
> the order 0 + 1 + 2 + 0 case does not have two same order 0 adjacent. they
> are in different tree.
> looks like below
> order 3:L3
>R3
> order 2:L2  (R2)*
> L2*
> order 1:L1 (R1) L1
> order 0: L0   (R0) (L0)
> R0 + R1+R2 +L0 with () around combined to be order 3.
> R2 + L2 with * followed combined to be order 3.
> etc
>
> When you have the case of a free order 1 block with two adjacent free
> order 0 blocks then we a fragmented address space. In this case the best
> approach is to fail the allocation and start to swap things out.
>
> [xh] Eviction is expensive.

No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in
is the best we can do to de-fragment the address space.

This is expected behavior.

[xh]  I believe eviction is the best approach to cleanup memory.
But as its cost is not cheap, it should be the final step.
As long as we could find any room to satisfy the request, no need to trigger
eviction.

Just a test in theory
total memory is 128.
while true {
alloc 32
alloc 32
free 32
free 32
alloc 64
free 64
}

when thread 0 wants to alloc 64, the memory layout might be
(32) means allocated, _32_ means free.
case 1: (32) _32_ _32_ (32)
case 2: (32) _32_ (32) _32_
case 3: (32) (32)  _64_
case 4: (32) _32_ 64_
case 5: _128_
case 6: (64) _64_

without this patch, it would trigger eviction in case 1 and case 2.
with this patch, it would trigger eviction only in case 2.
obviously, the two threads totally consume memory at most 128 at any time, no
overcommit.
The eviction is the less the better.

Regards,
Christian.

> And if it still fails to find the continuous memory with this approach, then
> let's evict.
>
> So what exactly is the goal here?
>
> Regards,
> Christian.
>
>> Signed-off-by: xinhui pan
>> ---
>> change from v3:
>>
>> change from v2:
>> search continuous block in nearby root if needed
>>
>> change from v1:
>> implement top-down continuous allocation
>> ---
>>drivers/gpu/drm/drm_buddy.c | 108 +---
>>include/drm/drm_buddy.h |   1 +
>>2 files changed, 102 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index 11bb59399471..8edafb99b02c 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/d```

### Re: 回复: [PATCH v4] drm: Optimise for continuous memory allocation

```
Am 29.11.22 um 12:54 schrieb Pan, Xinhui:

[AMD Official Use Only - General]

linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
intel-...@lists.freedesktop.org

Am 29.11.22 um 11:56 schrieb xinhui pan:

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:L R
order 0: LL   LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

order. Now we can find more than 2 sub-order blocks easier.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Well that description is a bit confusing and doesn't make to much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.

[xh] sorry, The order above is not 4, should be 3.
order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, 0+2+1+0
the order 0 + 1 + 2 + 0 case does not have two same order 0 adjacent. they are
in different tree.
looks like below
order 3:L3
R3
order 2:L2  (R2)*L2*
order 1:L1 (R1) L1
order 0: L0   (R0) (L0)
R0 + R1+R2 +L0 with () around combined to be order 3.
R2 + L2 with * followed combined to be order 3.
etc

When you have the case of a free order 1 block with two adjacent free
order 0 blocks then we a fragmented address space. In this case the best
approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.

No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in
is the best we can do to de-fragment the address space.

This is expected behavior.

Regards,
Christian.

And if it still fails to find the continuous memory with this approach, then
let's evict.

So what exactly is the goal here?

Regards,
Christian.

Signed-off-by: xinhui pan
---
change from v3:

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
drivers/gpu/drm/drm_buddy.c | 108 +---
include/drm/drm_buddy.h |   1 +
2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
{
unsigned int i;
u64 offset;

if (size < chunk_size)
return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
goto out_free_roots;

mark_free(mm, root);

BUG_ON(i > mm->max_order);
BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
i++;
} while (size);

+ list_del();
return 0;

out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);

mark_split(block);

return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;

drm_block_free(mm, block);
drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
}

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+int order,
+unsigned long flags,
+struct drm_buddy_block **rblock)
+{
+ struct drm_buddy_block *free_block, ```

### 回复: [PATCH v4] drm: Optimise for continuous memory allocation

```[AMD Official Use Only - General]

linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin;
intel-...@lists.freedesktop.org

Am 29.11.22 um 11:56 schrieb xinhui pan:
> Currently drm-buddy does not have full knowledge of continuous memory.
>
> Lets consider scenario below.
> order 1:L R
> order 0: LL   LR  RL  RR
> for order 1 allocation, it can offer L or R or LR+RL.
>
> For now, we only implement L or R case for continuous memory allocation.
> So this patch aims to implement the rest cases.
>
> order. Now we can find more than 2 sub-order blocks easier.
> Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
> 0+1+2+0, 0+2+1+0.

Well that description is a bit confusing and doesn't make to much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.

[xh] sorry, The order above is not 4, should be 3.
order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, 0+2+1+0
the order 0 + 1 + 2 + 0 case does not have two same order 0 adjacent. they are
in different tree.
looks like below
order 3:L3
R3
order 2:L2  (R2)*L2*
order 1:L1 (R1) L1
order 0: L0   (R0) (L0)
R0 + R1+R2 +L0 with () around combined to be order 3.
R2 + L2 with * followed combined to be order 3.
etc

When you have the case of a free order 1 block with two adjacent free
order 0 blocks then we a fragmented address space. In this case the best
approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.
And if it still fails to find the continuous memory with this approach, then
let's evict.

So what exactly is the goal here?

Regards,
Christian.

>
> Signed-off-by: xinhui pan
> ---
> change from v3:
>
> change from v2:
> search continuous block in nearby root if needed
>
> change from v1:
> implement top-down continuous allocation
> ---
>   drivers/gpu/drm/drm_buddy.c | 108 +---
>   include/drm/drm_buddy.h |   1 +
>   2 files changed, 102 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index 11bb59399471..8edafb99b02c 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
> chunk_size)
>   {
>   unsigned int i;
>   u64 offset;
>
>   if (size < chunk_size)
>   return -EINVAL;
> @@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
> chunk_size)
>   goto out_free_roots;
>
>   mark_free(mm, root);
>
>   BUG_ON(i > mm->max_order);
>   BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
> @@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
> chunk_size)
>   i++;
>   } while (size);
>
> + list_del();
>   return 0;
>
>   out_free_roots:
> @@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
>   mark_free(mm, block->left);
>   mark_free(mm, block->right);
>
>   mark_split(block);
>
>   return 0;
> @@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
>   break;
>
>
>   drm_block_free(mm, block);
>   drm_block_free(mm, buddy);
> @@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
>   return ERR_PTR(err);
>   }
>
> +static struct drm_buddy_block *
> +find_continuous_blocks(struct drm_buddy *mm,
> +int order,
> +unsigned long flags,
> +struct drm_buddy_block **rblock)
> +{
> + struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
> + u64 pages = BIT(order + 1);
> + u64 cur_pages;
> +
> +   ```

### 回复: [PATCH v4] drm: Optimise for continuous memory allocation

```[AMD Official Use Only - General]

In one ROCM +  gdm restart test,
find_continuous_blocks() succeed with ratio 35%.
the cod coverage report is below.

7723998 : if (order-- == min_order) {
773 352 : if (!(flags &
DRM_BUDDY_RANGE_ALLOCATION) &&
774 352 : min_order != 0 &&
pages == BIT(order + 1)) {
775  79 : block =
find_continuous_blocks(mm,
776 :
order,
777 :
flags,
778 :
);
779  79 : if (block)
780 : break;
781 : }
782 300 : err = -ENOSPC;
783 300 : goto err_free;

thanks
xinhui

dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Paneer Selvam,
Arunpravin; intel-...@lists.freedesktop.org; Pan, Xinhui

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:L   R
order 0: LL LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

order. Now we can find more than 2 sub-order blocks easier.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Signed-off-by: xinhui pan
---
change from v3:

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
drivers/gpu/drm/drm_buddy.c | 108 +---
include/drm/drm_buddy.h |   1 +
2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
{
unsigned int i;
u64 offset;

if (size < chunk_size)
return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
goto out_free_roots;

mark_free(mm, root);

BUG_ON(i > mm->max_order);
BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64
chunk_size)
i++;
} while (size);

+   list_del();
return 0;

out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);

mark_split(block);

return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;

drm_block_free(mm, block);
drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
}

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rblock)
+{
+   struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
+   u64 pages = BIT(order + 1);
+   u64 cur_pages;
+