Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-18 Thread Christian König
Found and fixed. If nobody has any more objections are batter names for 
the new flags I'm going to push this to amd-staging-drm-next tomorrow.


Thanks,
Christian.

Am 17.05.22 um 08:33 schrieb Christian König:

Ok that sounds like a rather simple bug. I will try to take a look.

Thanks,
Christian.

Am 17.05.22 um 02:12 schrieb Marek Olšák:
Dmesg doesn't contain anything. There is no backtrace because it's 
not a crash. The VA map ioctl just fails with the new flag. It looks 
like the flag is considered invalid.


Marek

On Mon., May 16, 2022, 12:13 Christian König, 
 wrote:


I don't have access to any gfx10 hardware.

Can you give me a dmesg and/or backtrace, etc..?

I can't push this unless it's working properly.

Christian.

Am 16.05.22 um 14:56 schrieb Marek Olšák:

Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König
 wrote:

Crap, do you have a link to the failure?

Am 16.05.22 um 13:10 schrieb Marek Olšák:

I forgot to say: The NOALLOC flag causes an allocation
failure, so there is a kernel bug somewhere.

Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák
 wrote:

FYI, I think it's time to merge this because the Mesa
commits are going to be merged in ~30 minutes if Gitlab
CI is green, and that includes updated amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák
 wrote:

Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König,
 wrote:

It really *is* a NOALLOC feature. In other
words there is no latency improvement on reads
because the cache is always checked, even with
the noalloc flag set.

The only thing it affects is that misses not
enter the cache and so don't cause any
additional pressure on evicting cache lines.

You might want to double check with the
hardware guys, but I'm something like 95% sure
that it works this way.

Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:

Bypass means that the contents of the cache
are ignored, which decreases latency at the
cost of no coherency between bypassed and
normal memory requests. NOA (noalloc) means
that the cache is checked and can give you
cache hits, but misses are not cached and the
overall latency is higher. I don't know what
the hw does, but I hope it was misnamed and it
really means bypass because there is no point
in doing cache lookups on every memory request
if the driver wants to disable caching to
*decrease* latency in the situations when the
cache isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
 wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It
just doesn't allocate any MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC =
last level cache, * = some sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM
Christian König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to
let userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>  Signed-off-by: Christian König

>>     ---
>>
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
| 2 ++
 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-17 Thread Christian König

Ok that sounds like a rather simple bug. I will try to take a look.

Thanks,
Christian.

Am 17.05.22 um 02:12 schrieb Marek Olšák:
Dmesg doesn't contain anything. There is no backtrace because it's not 
a crash. The VA map ioctl just fails with the new flag. It looks like 
the flag is considered invalid.


Marek

On Mon., May 16, 2022, 12:13 Christian König, 
 wrote:


I don't have access to any gfx10 hardware.

Can you give me a dmesg and/or backtrace, etc..?

I can't push this unless it's working properly.

Christian.

Am 16.05.22 um 14:56 schrieb Marek Olšák:

Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König
 wrote:

Crap, do you have a link to the failure?

Am 16.05.22 um 13:10 schrieb Marek Olšák:

I forgot to say: The NOALLOC flag causes an allocation
failure, so there is a kernel bug somewhere.

Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák
 wrote:

FYI, I think it's time to merge this because the Mesa
commits are going to be merged in ~30 minutes if Gitlab
CI is green, and that includes updated amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák
 wrote:

Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König,
 wrote:

It really *is* a NOALLOC feature. In other words
there is no latency improvement on reads because
the cache is always checked, even with the
noalloc flag set.

The only thing it affects is that misses not
enter the cache and so don't cause any
additional pressure on evicting cache lines.

You might want to double check with the hardware
guys, but I'm something like 95% sure that it
works this way.

Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:

Bypass means that the contents of the cache are
ignored, which decreases latency at the cost of
no coherency between bypassed and normal memory
requests. NOA (noalloc) means that the cache is
checked and can give you cache hits, but misses
are not cached and the overall latency is
higher. I don't know what the hw does, but I
hope it was misnamed and it really means bypass
because there is no point in doing cache
lookups on every memory request if the driver
wants to disable caching to *decrease* latency
in the situations when the cache isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
 wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It
just doesn't allocate any MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC =
last level cache, * = some sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian
König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to
let userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>  Signed-off-by: Christian König

>>     ---
>>
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |
2 ++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
| 3 +++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
| 3 +++
>>   

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Marek Olšák
Dmesg doesn't contain anything. There is no backtrace because it's not a
crash. The VA map ioctl just fails with the new flag. It looks like the
flag is considered invalid.

Marek

On Mon., May 16, 2022, 12:13 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> I don't have access to any gfx10 hardware.
>
> Can you give me a dmesg and/or backtrace, etc..?
>
> I can't push this unless it's working properly.
>
> Christian.
>
> Am 16.05.22 um 14:56 schrieb Marek Olšák:
>
> Reproduction steps:
> - use mesa/main on gfx10.3 (not sure what other GPUs do)
> - run: radeonsi_mall_noalloc=true glxgears
>
> Marek
>
> On Mon, May 16, 2022 at 7:53 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Crap, do you have a link to the failure?
>>
>> Am 16.05.22 um 13:10 schrieb Marek Olšák:
>>
>> I forgot to say: The NOALLOC flag causes an allocation failure, so there
>> is a kernel bug somewhere.
>>
>> Marek
>>
>> On Mon, May 16, 2022 at 7:06 AM Marek Olšák  wrote:
>>
>>> FYI, I think it's time to merge this because the Mesa commits are going
>>> to be merged in ~30 minutes if Gitlab CI is green, and that includes
>>> updated amdgpu_drm.h.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák  wrote:
>>>
 Ok sounds good.

 Marek

 On Wed., May 11, 2022, 03:43 Christian König, <
 ckoenig.leichtzumer...@gmail.com> wrote:

> It really *is* a NOALLOC feature. In other words there is no latency
> improvement on reads because the cache is always checked, even with the
> noalloc flag set.
>
> The only thing it affects is that misses not enter the cache and so
> don't cause any additional pressure on evicting cache lines.
>
> You might want to double check with the hardware guys, but I'm
> something like 95% sure that it works this way.
>
> Christian.
>
> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>
> Bypass means that the contents of the cache are ignored, which
> decreases latency at the cost of no coherency between bypassed and normal
> memory requests. NOA (noalloc) means that the cache is checked and can 
> give
> you cache hits, but misses are not cached and the overall latency is
> higher. I don't know what the hw does, but I hope it was misnamed and it
> really means bypass because there is no point in doing cache lookups on
> every memory request if the driver wants to disable caching to *decrease*
> latency in the situations when the cache isn't helping.
>
> Marek
>
> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo 
> wrote:
>
>>
>>
>> On 5/11/2022 11:36 AM, Christian König wrote:
>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any
>> MALL
>> > entries on write.
>> >
>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>
>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
>> sort
>> of attribute which decides LLC behaviour]
>>
>> Thanks,
>> Lijo
>>
>> >
>> > Christian.
>> >
>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> >> A better name would be:
>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>> >>
>> >> Marek
>> >>
>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>> >>  wrote:
>> >>
>> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>> >> allocation.
>> >>
>> >> Only compile tested!
>> >>
>> >> Signed-off-by: Christian König 
>> >> ---
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
>> >>  4 files changed, 10 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> index bf97d8f07f57..d8129626581f 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>> >> amdgpu_device *adev, uint32_t flags)
>> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
>> >> if (flags & AMDGPU_VM_PAGE_PRT)
>> >> pte_flag |= AMDGPU_PTE_PRT;
>> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
>> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
>> >>
>> >> if (adev->gmc.gmc_funcs->map_mtype)
>> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> index b8c79789e1e4..9077dfccaf3c 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> +++ 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Christian König

I don't have access to any gfx10 hardware.

Can you give me a dmesg and/or backtrace, etc..?

I can't push this unless it's working properly.

Christian.

Am 16.05.22 um 14:56 schrieb Marek Olšák:

Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König 
 wrote:


Crap, do you have a link to the failure?

Am 16.05.22 um 13:10 schrieb Marek Olšák:

I forgot to say: The NOALLOC flag causes an allocation failure,
so there is a kernel bug somewhere.

Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák  wrote:

FYI, I think it's time to merge this because the Mesa commits
are going to be merged in ~30 minutes if Gitlab CI is green,
and that includes updated amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák
 wrote:

Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König,
 wrote:

It really *is* a NOALLOC feature. In other words
there is no latency improvement on reads because the
cache is always checked, even with the noalloc flag set.

The only thing it affects is that misses not enter
the cache and so don't cause any additional pressure
on evicting cache lines.

You might want to double check with the hardware
guys, but I'm something like 95% sure that it works
this way.

Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:

Bypass means that the contents of the cache are
ignored, which decreases latency at the cost of no
coherency between bypassed and normal memory
requests. NOA (noalloc) means that the cache is
checked and can give you cache hits, but misses are
not cached and the overall latency is higher. I
don't know what the hw does, but I hope it was
misnamed and it really means bypass because there is
no point in doing cache lookups on every memory
request if the driver wants to disable caching to
*decrease* latency in the situations when the cache
isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
 wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just
doesn't allocate any MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last
level cache, * = some sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to let
userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>     Signed-off-by: Christian König

>>     ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 +++
>>   include/uapi/drm/amdgpu_drm.h        | 2 ++
>>      4 files changed, 10 insertions(+)
>>
>>     diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     index bf97d8f07f57..d8129626581f 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     @@ -650,6 +650,8 @@ uint64_t
amdgpu_gem_va_map_flags(struct
>>     amdgpu_device *adev, uint32_t flags)
>> pte_flag |= AMDGPU_PTE_WRITEABLE;
>>             if (flags & AMDGPU_VM_PAGE_PRT)
>> pte_flag |= AMDGPU_PTE_PRT;
>>     +       if 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Marek Olšák
Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Crap, do you have a link to the failure?
>
> Am 16.05.22 um 13:10 schrieb Marek Olšák:
>
> I forgot to say: The NOALLOC flag causes an allocation failure, so there
> is a kernel bug somewhere.
>
> Marek
>
> On Mon, May 16, 2022 at 7:06 AM Marek Olšák  wrote:
>
>> FYI, I think it's time to merge this because the Mesa commits are going
>> to be merged in ~30 minutes if Gitlab CI is green, and that includes
>> updated amdgpu_drm.h.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák  wrote:
>>
>>> Ok sounds good.
>>>
>>> Marek
>>>
>>> On Wed., May 11, 2022, 03:43 Christian König, <
>>> ckoenig.leichtzumer...@gmail.com> wrote:
>>>
 It really *is* a NOALLOC feature. In other words there is no latency
 improvement on reads because the cache is always checked, even with the
 noalloc flag set.

 The only thing it affects is that misses not enter the cache and so
 don't cause any additional pressure on evicting cache lines.

 You might want to double check with the hardware guys, but I'm
 something like 95% sure that it works this way.

 Christian.

 Am 11.05.22 um 09:22 schrieb Marek Olšák:

 Bypass means that the contents of the cache are ignored, which
 decreases latency at the cost of no coherency between bypassed and normal
 memory requests. NOA (noalloc) means that the cache is checked and can give
 you cache hits, but misses are not cached and the overall latency is
 higher. I don't know what the hw does, but I hope it was misnamed and it
 really means bypass because there is no point in doing cache lookups on
 every memory request if the driver wants to disable caching to *decrease*
 latency in the situations when the cache isn't helping.

 Marek

 On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:

>
>
> On 5/11/2022 11:36 AM, Christian König wrote:
> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any
> MALL
> > entries on write.
> >
> > How about AMDGPU_VM_PAGE_NO_MALL ?
>
> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
> sort
> of attribute which decides LLC behaviour]
>
> Thanks,
> Lijo
>
> >
> > Christian.
> >
> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
> >> A better name would be:
> >> AMDGPU_VM_PAGE_BYPASS_MALL
> >>
> >> Marek
> >>
> >> On Fri, May 6, 2022 at 7:23 AM Christian König
> >>  wrote:
> >>
> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
> >> allocation.
> >>
> >> Only compile tested!
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
> >>  4 files changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> index bf97d8f07f57..d8129626581f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
> >> amdgpu_device *adev, uint32_t flags)
> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
> >> if (flags & AMDGPU_VM_PAGE_PRT)
> >> pte_flag |= AMDGPU_PTE_PRT;
> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
> >>
> >> if (adev->gmc.gmc_funcs->map_mtype)
> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index b8c79789e1e4..9077dfccaf3c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
> >> amdgpu_device *adev,
> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >> +
> >> if (mapping->flags & AMDGPU_PTE_PRT) {
> >> *flags |= AMDGPU_PTE_PRT;
> >> *flags |= AMDGPU_PTE_SNOOPED;
> >> diff --git 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Christian König

Crap, do you have a link to the failure?

Am 16.05.22 um 13:10 schrieb Marek Olšák:
I forgot to say: The NOALLOC flag causes an allocation failure, so 
there is a kernel bug somewhere.


Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák  wrote:

FYI, I think it's time to merge this because the Mesa commits are
going to be merged in ~30 minutes if Gitlab CI is green, and that
includes updated amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák  wrote:

Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König,
 wrote:

It really *is* a NOALLOC feature. In other words there is
no latency improvement on reads because the cache is
always checked, even with the noalloc flag set.

The only thing it affects is that misses not enter the
cache and so don't cause any additional pressure on
evicting cache lines.

You might want to double check with the hardware guys, but
I'm something like 95% sure that it works this way.

Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:

Bypass means that the contents of the cache are ignored,
which decreases latency at the cost of no coherency
between bypassed and normal memory requests. NOA
(noalloc) means that the cache is checked and can give
you cache hits, but misses are not cached and the overall
latency is higher. I don't know what the hw does, but I
hope it was misnamed and it really means bypass because
there is no point in doing cache lookups on every memory
request if the driver wants to disable caching to
*decrease* latency in the situations when the cache isn't
helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
 wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just doesn't
allocate any MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level
cache, * = some sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to let
userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>     Signed-off-by: Christian König

>>     ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>      include/uapi/drm/amdgpu_drm.h          | 2 ++
>>      4 files changed, 10 insertions(+)
>>
>>     diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     index bf97d8f07f57..d8129626581f 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     @@ -650,6 +650,8 @@ uint64_t
amdgpu_gem_va_map_flags(struct
>>     amdgpu_device *adev, uint32_t flags)
>>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>             if (flags & AMDGPU_VM_PAGE_PRT)
>>                     pte_flag |= AMDGPU_PTE_PRT;
>>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>
>>             if (adev->gmc.gmc_funcs->map_mtype)
>>                     pte_flag |=
amdgpu_gmc_map_mtype(adev,
>>     diff --git
a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>  b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     index b8c79789e1e4..9077dfccaf3c 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     @@ -613,6 +613,9 @@ static void
gmc_v10_0_get_vm_pte(struct
  

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Marek Olšák
I forgot to say: The NOALLOC flag causes an allocation failure, so there is
a kernel bug somewhere.

Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák  wrote:

> FYI, I think it's time to merge this because the Mesa commits are going to
> be merged in ~30 minutes if Gitlab CI is green, and that includes updated
> amdgpu_drm.h.
>
> Marek
>
> On Wed, May 11, 2022 at 2:55 PM Marek Olšák  wrote:
>
>> Ok sounds good.
>>
>> Marek
>>
>> On Wed., May 11, 2022, 03:43 Christian König, <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> It really *is* a NOALLOC feature. In other words there is no latency
>>> improvement on reads because the cache is always checked, even with the
>>> noalloc flag set.
>>>
>>> The only thing it affects is that misses not enter the cache and so
>>> don't cause any additional pressure on evicting cache lines.
>>>
>>> You might want to double check with the hardware guys, but I'm something
>>> like 95% sure that it works this way.
>>>
>>> Christian.
>>>
>>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>
>>> Bypass means that the contents of the cache are ignored, which decreases
>>> latency at the cost of no coherency between bypassed and normal memory
>>> requests. NOA (noalloc) means that the cache is checked and can give you
>>> cache hits, but misses are not cached and the overall latency is higher. I
>>> don't know what the hw does, but I hope it was misnamed and it really means
>>> bypass because there is no point in doing cache lookups on every memory
>>> request if the driver wants to disable caching to *decrease* latency in the
>>> situations when the cache isn't helping.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:
>>>


 On 5/11/2022 11:36 AM, Christian König wrote:
 > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
 > entries on write.
 >
 > How about AMDGPU_VM_PAGE_NO_MALL ?

 One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
 of attribute which decides LLC behaviour]

 Thanks,
 Lijo

 >
 > Christian.
 >
 > Am 10.05.22 um 23:21 schrieb Marek Olšák:
 >> A better name would be:
 >> AMDGPU_VM_PAGE_BYPASS_MALL
 >>
 >> Marek
 >>
 >> On Fri, May 6, 2022 at 7:23 AM Christian König
 >>  wrote:
 >>
 >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
 >> allocation.
 >>
 >> Only compile tested!
 >>
 >> Signed-off-by: Christian König 
 >> ---
 >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
 >>  4 files changed, 10 insertions(+)
 >>
 >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
 >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
 >> index bf97d8f07f57..d8129626581f 100644
 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
 >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
 >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
 >> amdgpu_device *adev, uint32_t flags)
 >> pte_flag |= AMDGPU_PTE_WRITEABLE;
 >> if (flags & AMDGPU_VM_PAGE_PRT)
 >> pte_flag |= AMDGPU_PTE_PRT;
 >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
 >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
 >>
 >> if (adev->gmc.gmc_funcs->map_mtype)
 >> pte_flag |= amdgpu_gmc_map_mtype(adev,
 >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
 >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
 >> index b8c79789e1e4..9077dfccaf3c 100644
 >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
 >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
 >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
 >> amdgpu_device *adev,
 >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 >>
 >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
 >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
 >> +
 >> if (mapping->flags & AMDGPU_PTE_PRT) {
 >> *flags |= AMDGPU_PTE_PRT;
 >> *flags |= AMDGPU_PTE_SNOOPED;
 >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
 >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
 >> index 8d733eeac556..32ee56adb602 100644
 >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
 >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
 >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
 >> amdgpu_device *adev,
 >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 >> *flags |= (mapping->flags & 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-16 Thread Marek Olšák
FYI, I think it's time to merge this because the Mesa commits are going to
be merged in ~30 minutes if Gitlab CI is green, and that includes updated
amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák  wrote:

> Ok sounds good.
>
> Marek
>
> On Wed., May 11, 2022, 03:43 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> It really *is* a NOALLOC feature. In other words there is no latency
>> improvement on reads because the cache is always checked, even with the
>> noalloc flag set.
>>
>> The only thing it affects is that misses not enter the cache and so don't
>> cause any additional pressure on evicting cache lines.
>>
>> You might want to double check with the hardware guys, but I'm something
>> like 95% sure that it works this way.
>>
>> Christian.
>>
>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>
>> Bypass means that the contents of the cache are ignored, which decreases
>> latency at the cost of no coherency between bypassed and normal memory
>> requests. NOA (noalloc) means that the cache is checked and can give you
>> cache hits, but misses are not cached and the overall latency is higher. I
>> don't know what the hw does, but I hope it was misnamed and it really means
>> bypass because there is no point in doing cache lookups on every memory
>> request if the driver wants to disable caching to *decrease* latency in the
>> situations when the cache isn't helping.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:
>>
>>>
>>>
>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>>> > entries on write.
>>> >
>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>
>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>>> of attribute which decides LLC behaviour]
>>>
>>> Thanks,
>>> Lijo
>>>
>>> >
>>> > Christian.
>>> >
>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>> >> A better name would be:
>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>> >>
>>> >> Marek
>>> >>
>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>> >>  wrote:
>>> >>
>>> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>> >> allocation.
>>> >>
>>> >> Only compile tested!
>>> >>
>>> >> Signed-off-by: Christian König 
>>> >> ---
>>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
>>> >>  4 files changed, 10 insertions(+)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >> index bf97d8f07f57..d8129626581f 100644
>>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>> >> amdgpu_device *adev, uint32_t flags)
>>> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
>>> >> if (flags & AMDGPU_VM_PAGE_PRT)
>>> >> pte_flag |= AMDGPU_PTE_PRT;
>>> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
>>> >>
>>> >> if (adev->gmc.gmc_funcs->map_mtype)
>>> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
>>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >> index b8c79789e1e4..9077dfccaf3c 100644
>>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>> >> amdgpu_device *adev,
>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>> >>
>>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>> >> +
>>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>>> >> *flags |= AMDGPU_PTE_PRT;
>>> >> *flags |= AMDGPU_PTE_SNOOPED;
>>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >> index 8d733eeac556..32ee56adb602 100644
>>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>> >> amdgpu_device *adev,
>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>> >>
>>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>> >> +
>>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>>> >> *flags |= AMDGPU_PTE_PRT;
>>> >>  

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Marek Olšák
Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> It really *is* a NOALLOC feature. In other words there is no latency
> improvement on reads because the cache is always checked, even with the
> noalloc flag set.
>
> The only thing it affects is that misses not enter the cache and so don't
> cause any additional pressure on evicting cache lines.
>
> You might want to double check with the hardware guys, but I'm something
> like 95% sure that it works this way.
>
> Christian.
>
> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>
> Bypass means that the contents of the cache are ignored, which decreases
> latency at the cost of no coherency between bypassed and normal memory
> requests. NOA (noalloc) means that the cache is checked and can give you
> cache hits, but misses are not cached and the overall latency is higher. I
> don't know what the hw does, but I hope it was misnamed and it really means
> bypass because there is no point in doing cache lookups on every memory
> request if the driver wants to disable caching to *decrease* latency in the
> situations when the cache isn't helping.
>
> Marek
>
> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:
>
>>
>>
>> On 5/11/2022 11:36 AM, Christian König wrote:
>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>> > entries on write.
>> >
>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>
>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>> of attribute which decides LLC behaviour]
>>
>> Thanks,
>> Lijo
>>
>> >
>> > Christian.
>> >
>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> >> A better name would be:
>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>> >>
>> >> Marek
>> >>
>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>> >>  wrote:
>> >>
>> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>> >> allocation.
>> >>
>> >> Only compile tested!
>> >>
>> >> Signed-off-by: Christian König 
>> >> ---
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
>> >>  4 files changed, 10 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> index bf97d8f07f57..d8129626581f 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>> >> amdgpu_device *adev, uint32_t flags)
>> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
>> >> if (flags & AMDGPU_VM_PAGE_PRT)
>> >> pte_flag |= AMDGPU_PTE_PRT;
>> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
>> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
>> >>
>> >> if (adev->gmc.gmc_funcs->map_mtype)
>> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> index b8c79789e1e4..9077dfccaf3c 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>> >> amdgpu_device *adev,
>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >> +
>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>> >> *flags |= AMDGPU_PTE_PRT;
>> >> *flags |= AMDGPU_PTE_SNOOPED;
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> index 8d733eeac556..32ee56adb602 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>> >> amdgpu_device *adev,
>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >> +
>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>> >> *flags |= AMDGPU_PTE_PRT;
>> >> *flags |= AMDGPU_PTE_SNOOPED;
>> >> diff --git a/include/uapi/drm/amdgpu_drm.h
>> >> b/include/uapi/drm/amdgpu_drm.h
>> >> index 57b9d8f0133a..9d71d6330687 100644
>> >> --- a/include/uapi/drm/amdgpu_drm.h
>> >> +++ b/include/uapi/drm/amdgpu_drm.h
>> >> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>> >>

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Christian König
It really *is* a NOALLOC feature. In other words there is no latency 
improvement on reads because the cache is always checked, even with the 
noalloc flag set.


The only thing it affects is that misses not enter the cache and so 
don't cause any additional pressure on evicting cache lines.


You might want to double check with the hardware guys, but I'm something 
like 95% sure that it works this way.


Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:
Bypass means that the contents of the cache are ignored, which 
decreases latency at the cost of no coherency between bypassed and 
normal memory requests. NOA (noalloc) means that the cache is checked 
and can give you cache hits, but misses are not cached and the overall 
latency is higher. I don't know what the hw does, but I hope it was 
misnamed and it really means bypass because there is no point in doing 
cache lookups on every memory request if the driver wants to disable 
caching to *decrease* latency in the situations when the cache isn't 
helping.


Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just doesn't allocate any
MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>     Signed-off-by: Christian König 
>>     ---
>>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>      4 files changed, 10 insertions(+)
>>
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     index bf97d8f07f57..d8129626581f 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>     amdgpu_device *adev, uint32_t flags)
>>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>             if (flags & AMDGPU_VM_PAGE_PRT)
>>                     pte_flag |= AMDGPU_PTE_PRT;
>>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>
>>             if (adev->gmc.gmc_funcs->map_mtype)
>>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     index b8c79789e1e4..9077dfccaf3c 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags &
AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     index 8d733eeac556..32ee56adb602 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags &
AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>     b/include/uapi/drm/amdgpu_drm.h
>>     index 57b9d8f0133a..9d71d6330687 100644
>>     --- a/include/uapi/drm/amdgpu_drm.h
>>     +++ b/include/uapi/drm/amdgpu_drm.h
>>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
 

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Marek Olšák
Bypass means that the contents of the cache are ignored, which decreases
latency at the cost of no coherency between bypassed and normal memory
requests. NOA (noalloc) means that the cache is checked and can give you
cache hits, but misses are not cached and the overall latency is higher. I
don't know what the hw does, but I hope it was misnamed and it really means
bypass because there is no point in doing cache lookups on every memory
request if the driver wants to disable caching to *decrease* latency in the
situations when the cache isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:

>
>
> On 5/11/2022 11:36 AM, Christian König wrote:
> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
> > entries on write.
> >
> > How about AMDGPU_VM_PAGE_NO_MALL ?
>
> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
> of attribute which decides LLC behaviour]
>
> Thanks,
> Lijo
>
> >
> > Christian.
> >
> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
> >> A better name would be:
> >> AMDGPU_VM_PAGE_BYPASS_MALL
> >>
> >> Marek
> >>
> >> On Fri, May 6, 2022 at 7:23 AM Christian König
> >>  wrote:
> >>
> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
> >> allocation.
> >>
> >> Only compile tested!
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
> >>  4 files changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> index bf97d8f07f57..d8129626581f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
> >> amdgpu_device *adev, uint32_t flags)
> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
> >> if (flags & AMDGPU_VM_PAGE_PRT)
> >> pte_flag |= AMDGPU_PTE_PRT;
> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
> >>
> >> if (adev->gmc.gmc_funcs->map_mtype)
> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index b8c79789e1e4..9077dfccaf3c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
> >> amdgpu_device *adev,
> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >> +
> >> if (mapping->flags & AMDGPU_PTE_PRT) {
> >> *flags |= AMDGPU_PTE_PRT;
> >> *flags |= AMDGPU_PTE_SNOOPED;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> index 8d733eeac556..32ee56adb602 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
> >> amdgpu_device *adev,
> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >> +
> >> if (mapping->flags & AMDGPU_PTE_PRT) {
> >> *flags |= AMDGPU_PTE_PRT;
> >> *flags |= AMDGPU_PTE_SNOOPED;
> >> diff --git a/include/uapi/drm/amdgpu_drm.h
> >> b/include/uapi/drm/amdgpu_drm.h
> >> index 57b9d8f0133a..9d71d6330687 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
> >>  #define AMDGPU_VM_MTYPE_UC (4 << 5)
> >>  /* Use Read Write MTYPE instead of default MTYPE */
> >>  #define AMDGPU_VM_MTYPE_RW (5 << 5)
> >> +/* don't allocate MALL */
> >> +#define AMDGPU_VM_PAGE_NOALLOC (1 << 9)
> >>
> >>  struct drm_amdgpu_gem_va {
> >> /** GEM object handle */
> >> --
> >> 2.25.1
> >>
> >
>


Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 11:36 AM, Christian König wrote:
Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL 
entries on write.


How about AMDGPU_VM_PAGE_NO_MALL ?


One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort 
of attribute which decides LLC behaviour]


Thanks,
Lijo



Christian.

Am 10.05.22 um 23:21 schrieb Marek Olšák:

A better name would be:
AMDGPU_VM_PAGE_BYPASS_MALL

Marek

On Fri, May 6, 2022 at 7:23 AM Christian König 
 wrote:


Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
allocation.

Only compile tested!

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 include/uapi/drm/amdgpu_drm.h           | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bf97d8f07f57..d8129626581f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
amdgpu_device *adev, uint32_t flags)
                pte_flag |= AMDGPU_PTE_WRITEABLE;
        if (flags & AMDGPU_VM_PAGE_PRT)
                pte_flag |= AMDGPU_PTE_PRT;
+       if (flags & AMDGPU_VM_PAGE_NOALLOC)
+               pte_flag |= AMDGPU_PTE_NOALLOC;

        if (adev->gmc.gmc_funcs->map_mtype)
                pte_flag |= amdgpu_gmc_map_mtype(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b8c79789e1e4..9077dfccaf3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
amdgpu_device *adev,
        *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
        *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);

+       *flags &= ~AMDGPU_PTE_NOALLOC;
+       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
        if (mapping->flags & AMDGPU_PTE_PRT) {
                *flags |= AMDGPU_PTE_PRT;
                *flags |= AMDGPU_PTE_SNOOPED;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 8d733eeac556..32ee56adb602 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
amdgpu_device *adev,
        *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
        *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);

+       *flags &= ~AMDGPU_PTE_NOALLOC;
+       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
        if (mapping->flags & AMDGPU_PTE_PRT) {
                *flags |= AMDGPU_PTE_PRT;
                *flags |= AMDGPU_PTE_SNOOPED;
diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index 57b9d8f0133a..9d71d6330687 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_UC             (4 << 5)
 /* Use Read Write MTYPE instead of default MTYPE */
 #define AMDGPU_VM_MTYPE_RW             (5 << 5)
+/* don't allocate MALL */
+#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)

 struct drm_amdgpu_gem_va {
        /** GEM object handle */
-- 
2.25.1






Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Christian König
Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL 
entries on write.


How about AMDGPU_VM_PAGE_NO_MALL ?

Christian.

Am 10.05.22 um 23:21 schrieb Marek Olšák:

A better name would be:
AMDGPU_VM_PAGE_BYPASS_MALL

Marek

On Fri, May 6, 2022 at 7:23 AM Christian König 
 wrote:


Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
allocation.

Only compile tested!

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 include/uapi/drm/amdgpu_drm.h           | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bf97d8f07f57..d8129626581f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
amdgpu_device *adev, uint32_t flags)
                pte_flag |= AMDGPU_PTE_WRITEABLE;
        if (flags & AMDGPU_VM_PAGE_PRT)
                pte_flag |= AMDGPU_PTE_PRT;
+       if (flags & AMDGPU_VM_PAGE_NOALLOC)
+               pte_flag |= AMDGPU_PTE_NOALLOC;

        if (adev->gmc.gmc_funcs->map_mtype)
                pte_flag |= amdgpu_gmc_map_mtype(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b8c79789e1e4..9077dfccaf3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
amdgpu_device *adev,
        *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
        *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);

+       *flags &= ~AMDGPU_PTE_NOALLOC;
+       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
        if (mapping->flags & AMDGPU_PTE_PRT) {
                *flags |= AMDGPU_PTE_PRT;
                *flags |= AMDGPU_PTE_SNOOPED;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 8d733eeac556..32ee56adb602 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
amdgpu_device *adev,
        *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
        *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);

+       *flags &= ~AMDGPU_PTE_NOALLOC;
+       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
        if (mapping->flags & AMDGPU_PTE_PRT) {
                *flags |= AMDGPU_PTE_PRT;
                *flags |= AMDGPU_PTE_SNOOPED;
diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index 57b9d8f0133a..9d71d6330687 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_UC             (4 << 5)
 /* Use Read Write MTYPE instead of default MTYPE */
 #define AMDGPU_VM_MTYPE_RW             (5 << 5)
+/* don't allocate MALL */
+#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)

 struct drm_amdgpu_gem_va {
        /** GEM object handle */
-- 
2.25.1




Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-10 Thread Marek Olšák
A better name would be:
AMDGPU_VM_PAGE_BYPASS_MALL

Marek

On Fri, May 6, 2022 at 7:23 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.
>
> Only compile tested!
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>  include/uapi/drm/amdgpu_drm.h   | 2 ++
>  4 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index bf97d8f07f57..d8129626581f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device
> *adev, uint32_t flags)
> pte_flag |= AMDGPU_PTE_WRITEABLE;
> if (flags & AMDGPU_VM_PAGE_PRT)
> pte_flag |= AMDGPU_PTE_PRT;
> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
> +   pte_flag |= AMDGPU_PTE_NOALLOC;
>
> if (adev->gmc.gmc_funcs->map_mtype)
> pte_flag |= amdgpu_gmc_map_mtype(adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index b8c79789e1e4..9077dfccaf3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device
> *adev,
> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> +
> if (mapping->flags & AMDGPU_PTE_PRT) {
> *flags |= AMDGPU_PTE_PRT;
> *flags |= AMDGPU_PTE_SNOOPED;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 8d733eeac556..32ee56adb602 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device
> *adev,
> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> +
> if (mapping->flags & AMDGPU_PTE_PRT) {
> *flags |= AMDGPU_PTE_PRT;
> *flags |= AMDGPU_PTE_SNOOPED;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 57b9d8f0133a..9d71d6330687 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>  #define AMDGPU_VM_MTYPE_UC (4 << 5)
>  /* Use Read Write MTYPE instead of default MTYPE */
>  #define AMDGPU_VM_MTYPE_RW (5 << 5)
> +/* don't allocate MALL */
> +#define AMDGPU_VM_PAGE_NOALLOC (1 << 9)
>
>  struct drm_amdgpu_gem_va {
> /** GEM object handle */
> --
> 2.25.1
>
>


[PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-06 Thread Christian König
Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.

Only compile tested!

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 include/uapi/drm/amdgpu_drm.h   | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bf97d8f07f57..d8129626581f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device 
*adev, uint32_t flags)
pte_flag |= AMDGPU_PTE_WRITEABLE;
if (flags & AMDGPU_VM_PAGE_PRT)
pte_flag |= AMDGPU_PTE_PRT;
+   if (flags & AMDGPU_VM_PAGE_NOALLOC)
+   pte_flag |= AMDGPU_PTE_NOALLOC;
 
if (adev->gmc.gmc_funcs->map_mtype)
pte_flag |= amdgpu_gmc_map_mtype(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b8c79789e1e4..9077dfccaf3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+   *flags &= ~AMDGPU_PTE_NOALLOC;
+   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
if (mapping->flags & AMDGPU_PTE_PRT) {
*flags |= AMDGPU_PTE_PRT;
*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 8d733eeac556..32ee56adb602 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+   *flags &= ~AMDGPU_PTE_NOALLOC;
+   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
if (mapping->flags & AMDGPU_PTE_PRT) {
*flags |= AMDGPU_PTE_PRT;
*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 57b9d8f0133a..9d71d6330687 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_UC (4 << 5)
 /* Use Read Write MTYPE instead of default MTYPE */
 #define AMDGPU_VM_MTYPE_RW (5 << 5)
+/* don't allocate MALL */
+#define AMDGPU_VM_PAGE_NOALLOC (1 << 9)
 
 struct drm_amdgpu_gem_va {
/** GEM object handle */
-- 
2.25.1