RE: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-27 Thread Kuehling, Felix
[AMD Official Use Only - General]

Re-mapping typically happens after evictions, before a new eviction fence gets 
attached. At that time the old eviction fence should be in the signaled state 
already, so it can't be signaled again. Therefore I would expect my patch to 
help with unmapping the DMABuf import, without breaking the eviction case.

Are you talking about remapping with a map-to-gpu call from user mode? I think 
that would only be a problem if the KFD BO was unmapped and remapped multiple 
times. The first time it's mapped, the fresh dmabuf import should be in the 
SYSTEM domain, so the validation in the SYSTEM domain before GTT would be a 
no-op.

I sort of agree that we don't really rely on the eviction fence on the DMABuf 
import. The reservation object is shared with the original BO. Moving the 
original BO triggers the eviction fence, so we don't need to trigger it again 
on the dmabuf import. Other than moving the original BO, I don't think we can 
do anything to the DMABuf import that would require an eviction for the KFD use 
case. It is a special use case because we control both the import and the 
export in the same context.

In the general case, DMABuf imports need their eviction fences. For example, 
when we're importing a DMABuf from somewhere else, the eviction fence is not 
shared with a BO that we already control. Even then, unmapping a dmabuf from 
our KFD VM does not need to wait for any fences on the DMABuf.
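
To make that concrete, here is a minimal sketch of the idea with stand-in
types (hypothetical names, not the actual patch):

	#include <stdbool.h>

	/* Stand-in types for illustration only. */
	struct fence { void *owner; bool is_kfd_eviction_fence; };
	struct bo    { int placement; void *kfd_process; };

	enum { PL_PREEMPT = 1 };	/* stand-in for AMDGPU_PL_PREEMPT */

	/* Does invalidating 'bo' need to wait on fence 'f'? For a preemptible
	 * DMABuf import, the KFD eviction fence of the owning process can be
	 * skipped: the import shares its reservation with the exported BO, so
	 * moving the original BO already triggers that fence. */
	static bool must_wait(const struct bo *bo, const struct fence *f)
	{
		if (bo->placement == PL_PREEMPT && f->is_kfd_eviction_fence &&
		    f->owner == bo->kfd_process)
			return false;
		return true;
	}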

Regards,
  Felix

-Original Message-
From: Huang, JinHuiEric  
Sent: Thursday, April 27, 2023 14:58
To: Kuehling, Felix ; Koenig, Christian 
; Christian König ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating 
preemptible DMABuf imports

Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, 
because the tests don't allocate DMABuf imports, so they never trigger an 
import's eviction fence. The only thing the patch affects is re-mapping DMABuf 
imports, where the eviction will still be triggered.

I have an idea that we can probably remove the eviction fence for GTT BOs, 
because currently the only way to trigger that eviction fence is the 
ttm_bo_validate call to the CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know 
of any other case that triggers a GTT BO's eviction?
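
For reference, the path I mean looks roughly like this (paraphrased from
amdgpu_amdkfd_gpuvm.c from memory, so treat the details as approximate):

	/* Roughly the dmaunmap path in question (paraphrased, not an exact
	 * copy): moving the import to the CPU domain via ttm_bo_validate is
	 * what ends up triggering the shared eviction fence. */
	static void kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
	{
		struct ttm_operation_ctx ctx = { .interruptible = true };
		struct amdgpu_bo *bo = attachment->bo_va->base.bo;

		amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
		ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
	}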

Regards,
Eric

On 2023-04-26 22:21, Felix Kuehling wrote:
> Hi Eric,
>
> Can you try if the attached patch fixes the problem without breaking 
> the eviction tests on a multi-GPU PCIe P2P system?
>
> Thanks,
>   Felix
>
>
> On 2023-04-26 13:02, Christian König wrote:
>> On 2023-04-26 18:58, Felix Kuehling wrote:
>>>
>>> On 2023-04-26 9:03, Christian König wrote:
>>>> On 2023-04-25 16:11, Eric Huang wrote:
>>>>> Hi Christian,
>>>>>
>>>>> What do you think about Felix's explanation?
>>>>
>>>> That's unfortunately not something we can do here.
>>>>
>>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>> On 2023-04-13 09:28, Felix Kuehling wrote:
>>>>>> On 2023-04-13 07:35, Christian König wrote:
>>>>>>> On 2023-04-13 03:01, Felix Kuehling wrote:
>>>>>>>> On 2023-04-12 18:25, Eric Huang wrote:
>>>>>>>>> It is to avoid redundant eviction for KFD's DMAbuf import bo 
>>>>>>>>> when dmaunmapping DMAbuf. The DMAbuf import bo has been set as 
>>>>>>>>> AMDGPU_PL_PREEMPT in KFD when mapping.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Eric Huang 
>>>>>>>>
>>>>>>>> Reviewed-by: Felix Kuehling 
>>>>>>>>
>>>>>>>> I'd like to get an Acked-by from Christian as well before 
>>>>>>>> submitting this.
>>>>>>>
>>>>>>> I have to admit that I only partially followed the internal 
>>>>>>> discussion, but in general you need a *really* good explanation 
>>>>>>> for this.
>>>>>>>
>>>>>>> E.g. add code comment and explain in the commit message 
>>>>>>> extensively why this is needed and why there are no alternatives.
>>>>>>
>>>>>> OK. I'll give it a shot:
>>>>>>
>>>>>>    This code path is used among other things when invalidating 
>>>>>> DMABuf
>>>>>>    imports. These imports share a reservation object with the 
>>>>>> exported
>>>>>>    BO. Waiting on all the fences in this reservation will trigger 
>>>>>> KFD
>>>

Re: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU ASIC

2022-07-15 Thread Kuehling, Felix
[AMD Official Use Only - General]

As a compromise we are considering a change that restores the old allocation 
behaviour, keeping the more conservative estimate only for the available-memory 
API.
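
In toy form, the split would look something like this (illustrative names and
numbers only, not the actual change):

	#include <stdint.h>

	/* Toy model of the compromise: allocation keeps the old optimistic
	 * limit; only the available-memory query subtracts the conservative
	 * page-table estimate, so it never over-promises. */
	#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)

	static uint64_t vram_total = 512ull << 20, vram_used;

	static uint64_t available_memory(void)	/* conservative, API only */
	{
		uint64_t free_vram = vram_total - vram_used;

		return free_vram - ESTIMATE_PT_SIZE(free_vram);
	}

	static int alloc_vram(uint64_t size)	/* old behaviour restored */
	{
		if (vram_used + size > vram_total)
			return -1;
		vram_used += size;
		return 0;
	}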

Regards,
  Felix


From: Guo, Shikai 
Sent: Thursday, July 14, 2022 11:21 PM
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org 

Cc: Phillips, Daniel ; Ji, Ruili ; 
Liu, Aaron 
Subject: RE: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU 
ASIC

[AMD Official Use Only - General]

Thanks for the comment, Felix. I will further debug this issue.

-Original Message-
From: Guo, Shikai
Sent: Friday, July 15, 2022 11:21 AM
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
Cc: Phillips, Daniel ; Ji, Ruili ; 
Liu, Aaron 
Subject: RE: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU 
ASIC

[AMD Official Use Only - General]

Thanks for the comment, Felix. I will further debug this issue.

-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, July 13, 2022 10:17 PM
To: Guo, Shikai ; amd-gfx@lists.freedesktop.org
Cc: Phillips, Daniel ; Ji, Ruili ; 
Liu, Aaron 
Subject: Re: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU 
ASIC


On 2022-07-13 05:14, shikai guo wrote:
> From: Shikai Guo 
>
> While executing KFDMemoryTest.MMBench, the test case allocates 4KB of
> memory 1000 times. Each time, user space actually gets 2MB. APU VRAM is
> 512MB, so there is not enough memory to satisfy the allocations.
> Hence the 2MB alignment feature is not suitable for APUs.
NAK. We can try to make the estimate of available VRAM more accurate.
But in the end, this comes down to limitations of the VRAM manager and how it 
handles memory fragmentation.

A large discrepancy between total VRAM and available VRAM can have a few
reasons:

  * Big system memory means we need to reserve more space for page tables
  * Many small allocations causing lots of fragmentation. This may be
the result of memory leaks in previous tests

This patch can "fix" a situation where a leak caused excessive fragmentation. 
But that just papers over the leak. And it will cause the opposite problem for 
the new AvailableMemory test that checks that we can really allocate as much 
memory as we promised.

Regards,
   Felix


>
> guoshikai@guoshikai-MayanKD-RMB:~/linth/libhsakmt/tests/kfdtest/build$ 
> ./kfdtest --gtest_filter=KFDMemoryTest.MMBench
> [  ] Profile: Full Test
> [  ] HW capabilities: 0x9
> Note: Google Test filter = KFDMemoryTest.MMBench [==] Running
> 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from KFDMemoryTest
> [ RUN  ] KFDMemoryTest.MMBench
> [  ] Found VRAM of 512MB.
> [  ] Available VRAM 328MB.
> [  ] Test (avg. ns) alloc   mapOne  umapOne   mapAll  umapAll 
> free
> [  ] 
> --
> [  ]   4K-SysMem-noSDMA 2656110350 5212 3787 3981 
>12372
> [  ]  64K-SysMem-noSDMA 42864 6648 3973 5223 3843 
>15100
> [  ]   2M-SysMem-noSDMA31290612614 4390 6254 4790 
>70260
> [  ]  32M-SysMem-noSDMA   4417812   130437216259768718500 
>   929562
> [  ]   1G-SysMem-noSDMA 132161000  2738000   583000  2181000   499000 
> 39091000
> [  ] 
> --
> /home/guoshikai/linth/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:92
> 2: Failure Value of: (hsaKmtAllocMemory(allocNode, bufSize, memFlags, 
> [i]))
>Actual: 6
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> [  FAILED  ] KFDMemoryTest.MMBench (749 ms)
>
> fix this issue by adding different treatments for apu and dgpu
>
> Signed-off-by: ruili ji 
> Signed-off-by: shikai guo 
> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 18 +-
>   1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index d1657de5f875..2ad2cd5e3e8b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -115,7 +115,9 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>* compromise that should work in most cases without reserving too
>* much memory for page tables unnecessarily (factor 16K, >> 14).
>*/
> -#define ESTIMATE_PT_SIZE(mem_size) max(((mem_size) >> 14),
> AMDGPU_VM_RESERVED_VRAM)
> +
> +#define ESTIMATE_PT_SIZE(adev, mem_size)   (adev->flags & AMD_IS_APU) ? \
> +(mem_size >> 14) : max(((mem_size) >> 14)
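
The hunk above is cut off in the archive; from the original macro it replaces,
the proposed definition presumably ended like this (a reconstruction, not the
patch itself):

	/* Reconstructed completion of the truncated hunk (best guess): APUs
	 * skip the AMDGPU_VM_RESERVED_VRAM floor, dGPUs keep it. */
	#define ESTIMATE_PT_SIZE(adev, mem_size)			\
		(((adev)->flags & AMD_IS_APU) ?				\
			((mem_size) >> 14) :				\
			max(((mem_size) >> 14), AMDGPU_VM_RESERVED_VRAM))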

Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-15 Thread Kuehling, Felix
[AMD Official Use Only - Internal Distribution Only]

The test does not access outside of the allocated memory. But it deliberately 
crosses a boundary where memory can be allocated non-contiguously. This is 
meant to catch problems where the access function doesn't handle non-contiguous 
VRAM allocations correctly. Given the way VRAM allocation has been optimized, 
though, I expect that most allocations are contiguous nowadays. The more 
interesting aspect of the test is that it performs misaligned memory accesses. 
The MMIO method of accessing VRAM explicitly handles misaligned accesses and 
breaks them down into dword-aligned accesses with proper masking and shifting.

Could the unaligned nature of the memory access have something to do with 
hitting RAS errors? That's something unique to this test that we wouldn't see 
on a normal page table update or memory eviction.

Regards,
  Felix


From: Koenig, Christian 
Sent: Wednesday, April 15, 2020 6:58 AM
To: Kim, Jonathan ; Kuehling, Felix 
; Deucher, Alexander 
Cc: Russell, Kent ; amd-gfx@lists.freedesktop.org 

Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in 
amdgpu_device_vram_access v2"


To elaborate on the PTRACE test, we PEEK 2 DWORDs inside thunk allocated mapped 
memory and 2 DWORDS outside that boundary (it’s only about 4MB to the 
boundary).  Then we POKE to swap the DWORD positions across the boundary.  The 
RAS event on the single failing machine happens on the out of boundary PEEK.

Well, when you access outside of an allocated buffer, I would expect that we 
never get as far as even touching the hardware, because the kernel should block 
the access with an -EPERM or -EFAULT. So it sounds like I'm not understanding 
something correctly here.

Apart from that I completely agree that we need to sort out any other RAS event 
first to make sure that the system is simply not failing randomly.

Regards,
Christian.

On 2020-04-15 11:49, Kim, Jonathan wrote:

[AMD Public Use]



Hi Christian,



That could potentially be it.  With additional testing, 2 of 3 Vega20 machines 
never hit error over BAR access with the PTRACE test.  3 of 3 machines (from 
the same pool) always hit error with CWSR.

To elaborate on the PTRACE test, we PEEK 2 DWORDs inside thunk allocated mapped 
memory and 2 DWORDS outside that boundary (it’s only about 4MB to the 
boundary).  Then we POKE to swap the DWORD positions across the boundary.  The 
RAS event on the single failing machine happens on the out of boundary PEEK.
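
In outline, the swap looks something like this (a hypothetical sketch; the
actual kfdtest code differs, and ptrace PEEK/POKE operates on native words
rather than DWORDs):

	#include <errno.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>

	/* Hypothetical outline: swap one word from each side of 'boundary'
	 * in a traced child. */
	static int swap_across_boundary(pid_t child, char *boundary)
	{
		long in, out;

		errno = 0;
		in  = ptrace(PTRACE_PEEKDATA, child,
			     boundary - sizeof(long), NULL);
		out = ptrace(PTRACE_PEEKDATA, child, boundary, NULL);
		if (errno)
			return -1;  /* failing machine sees the RAS event here */

		if (ptrace(PTRACE_POKEDATA, child,
			   boundary - sizeof(long), (void *)out) ||
		    ptrace(PTRACE_POKEDATA, child, boundary, (void *)in))
			return -1;
		return 0;
	}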



Felix mentioned we don't hit errors over general HDP access, but that may not 
be true. An Arcturus failure syslog that was posted (not tested by me) shows 
someone launched the rocm bandwidth test, hit a VM fault, and a RAS event 
ensued during evictions (I can point to the internal ticket or log snippet 
offline if interested). Whether the RAS event is triggered by BAR access or is 
the result of HW instability is beyond me, since I don't have access to the 
machine.



Thanks,



Jon



From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Wednesday, April 15, 2020 4:11 AM
To: Kim, Jonathan <mailto:jonathan@amd.com>; 
Kuehling, Felix <mailto:felix.kuehl...@amd.com>; 
Deucher, Alexander <mailto:alexander.deuc...@amd.com>
Cc: Russell, Kent <mailto:kent.russ...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in 
amdgpu_device_vram_access v2"



Hi Jon,


Also cwsr tests fail on Vega20 with or without the revert with the same RAS 
error.

That sounds like the system/setup has a more general problem.

Could it be that we are seeing RAS errors because there really is some hardware 
failure, but with the MM path we don't trigger a RAS interrupt?

Thanks,
Christian.

On 2020-04-14 22:30, Kim, Jonathan wrote:

[AMD Official Use Only - Internal Distribution Only]



If we’re passing the test on the revert, then the only thing that’s different 
is we’re not invalidating HDP and doing a copy to host anymore in 
amdgpu_device_vram_access since the function is still called in ttm 
access_memory with BAR.



Also cwsr tests fail on Vega20 with or without the revert with the same RAS 
error.



Thanks,



Jon



From: Kuehling, Felix <mailto:felix.kuehl...@amd.com>
Sent: Tuesday, April 14, 2020 2:32 PM
To: Kim, Jonathan <mailto:jonathan@amd.com>; Koenig, 
Christian <mailto:christian.koe...@amd.com>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Cc: Russell, Kent <mailto:kent.russ...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in 
amdgpu_device_vram_access v2"



I wouldn't call it premature. Reverting is a usual practice when there is a 
serious regression that isn't fully understood or root-caused. As

RE: [PATCH v2] drm/amdkfd: Provide SMI events watch

2020-04-03 Thread Kuehling, Felix
[AMD Official Use Only - Internal Distribution Only]

So are you saying you'll make the event descriptions text rather than binary?

If you switch to a text format, I wouldn't use a binary header. Rather I'd make 
it a text format completely. You could use one line per event, that makes it 
easy to use something like fgets to read a line (event) at a time in user mode.

Each line could still start with an event identifier, but it would be text 
rather than binary. And you don't need a size field if you define "\n" as the 
delimiter between events.
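
A user-mode consumer of such a text format could then be as simple as this
sketch (the line format shown is hypothetical):

	#include <poll.h>
	#include <stdio.h>

	/* Sketch of a line-oriented SMI event reader (hypothetical line
	 * format: "<event-id> <payload>\n"). Note that stdio buffering means
	 * fgets() may already hold the next event, so a production reader
	 * would drain the buffer before polling again. */
	static void watch_events(int smi_fd)
	{
		struct pollfd pfd = { .fd = smi_fd, .events = POLLIN };
		FILE *f = fdopen(smi_fd, "r");
		char line[256];

		if (!f)
			return;
		while (poll(&pfd, 1, 5000 /* ms */) > 0 &&
		       fgets(line, sizeof(line), f))
			printf("event: %s", line);  /* first token = event id */
		fclose(f);
	}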

Regards,
  Felix

-Original Message-
From: Lin, Amber  
Sent: Friday, April 3, 2020 11:38
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/amdkfd: Provide SMI events watch

Thinking about it further, I'll use struct kfd_smi_msg_header. Instead of using 
struct kfd_smi_msg_vmfault, it will be a description of the event. 
This way we make it generic to all events.

On 2020-04-03 9:38 a.m., Amber Lin wrote:
> Thanks Felix. I'll make changes accordingly but please pay attention 
> to my last reply inline.
>
> On 2020-04-02 7:51 p.m., Felix Kuehling wrote:
>> On 2020-04-02 4:46 p.m., Amber Lin wrote:
>>> When the compute is malfunctioning or performance drops, the system 
>>> admin will use an SMI (System Management Interface) tool to 
>>> monitor/diagnose what went wrong. This patch provides an event 
>>> watch interface for user space to register the events they are 
>>> interested in. After the event is registered, the user can use the 
>>> anonymous file descriptor's poll function with a wait time specified 
>>> to wait for the event to happen. Once the event happens, the user 
>>> can use read() to retrieve information related to the event.
>>>
>>> VM fault event is done in this patch.
>>>
>>> v2: - remove UNREGISTER and add event ENABLE/DISABLE
>>>  - correct kfifo usage
>>>  - move event message API to kfd_ioctl.h
>>>
>>> Signed-off-by: Amber Lin 
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/Makefile  |   3 +-
>>>   drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c |   2 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  30 
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c  |   1 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c  |   2 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  12 ++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c  | 177
>>> +++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h  |  31 
>>>   include/uapi/linux/kfd_ioctl.h   |  30 +++-
>>>   9 files changed, 286 insertions(+), 2 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
>>>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile
>>> b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> index 6147462..cc98b4a 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> @@ -53,7 +53,8 @@ AMDKFD_FILES    := $(AMDKFD_PATH)/kfd_module.o \
>>>   $(AMDKFD_PATH)/kfd_int_process_v9.o \
>>>   $(AMDKFD_PATH)/kfd_dbgdev.o \
>>>   $(AMDKFD_PATH)/kfd_dbgmgr.o \
>>> -    $(AMDKFD_PATH)/kfd_crat.o
>>> +    $(AMDKFD_PATH)/kfd_crat.o \
>>> +    $(AMDKFD_PATH)/kfd_smi_events.o
>>>     ifneq ($(CONFIG_AMD_IOMMU_V2),)
>>>   AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o diff --git 
>>> a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
>>> b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
>>> index 9f59ba9..24b4717 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
>>> @@ -24,6 +24,7 @@
>>>   #include "kfd_events.h"
>>>   #include "cik_int.h"
>>>   #include "amdgpu_amdkfd.h"
>>> +#include "kfd_smi_events.h"
>>>     static bool cik_event_interrupt_isr(struct kfd_dev *dev,
>>>   const uint32_t *ih_ring_entry, @@ -107,6 
>>> +108,7 @@ static void cik_event_interrupt_wq(struct kfd_dev *dev,
>>>   ihre->source_id == CIK_INTSRC_GFX_MEM_PROT_FAULT) {
>>>   struct kfd_vm_fault_info info;
>>>   +    kfd_smi_event_update_vmfault(dev, pasid);
>>>   kfd_process_vm_fault(dev->dqm, pasid);
>>>   memset(&info, 0, sizeof(info)); diff --git 
>>> a/drivers/gpu/drm/amd/amdkfd/k

RE: linux-next: Tree for Feb 26 (gpu/drm/amd/amdgpu/)

2020-02-26 Thread Kuehling, Felix
[AMD Official Use Only - Internal Distribution Only]

I pushed the fix to amd-staging-drm-next. How do we get this into linux-next?

-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, February 26, 2020 11:11
To: Pan, Xinhui ; Deucher, Alexander 
(alexander.deuc...@amd.com) 
Cc: amd-gfx@lists.freedesktop.org
Subject: RE: linux-next: Tree for Feb 26 (gpu/drm/amd/amdgpu/)

[AMD Official Use Only - Internal Distribution Only]

[Dropping most, +Alex, +Xinhui]

Looks like this was introduced by Xinhui's change:
commit be8e48e0849943fb53457e2fd83905eaf19cb1f7
Author: xinhui pan 
Date:   Tue Feb 11 11:28:34 2020 +0800

drm/amdgpu: Remove kfd eviction fence before release bo

No need to trigger eviction as the memory mapping will not be used
anymore.

All pt/pd BOs share the same resv, hence the same shared eviction fence.
Every time a page table is freed, the fence will be signaled, and that
causes unexpected KFD evictions.

CC: Christian König 
CC: Felix Kuehling 
CC: Alex Deucher 
Acked-by: Christian König 
Reviewed-by: Felix Kuehling 
Signed-off-by: xinhui pan 

I'm preparing a fix. Will send it out in a second.

Regards,
  Felix

-Original Message-
From: amd-gfx  On Behalf Of Randy Dunlap
Sent: Wednesday, February 26, 2020 10:03
To: Stephen Rothwell ; Linux Next Mailing List 

Cc: dri-devel ; Linux Kernel Mailing List 
; amd-gfx@lists.freedesktop.org
Subject: Re: linux-next: Tree for Feb 26 (gpu/drm/amd/amdgpu/)

On 2/25/20 8:34 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20200225:
> 

on i386:

ld: drivers/gpu/drm/amd/amdgpu/amdgpu_object.o: in function 
`amdgpu_bo_release_notify':
amdgpu_object.c:(.text+0xe07): undefined reference to 
`amdgpu_amdkfd_remove_fence_on_pt_pd_bos'


Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 


RE: linux-next: Tree for Feb 26 (gpu/drm/amd/amdgpu/)

2020-02-26 Thread Kuehling, Felix
[AMD Official Use Only - Internal Distribution Only]

[Dropping most, +Alex, +Xinhui]

Looks like this was introduced by Xinhui's change:
commit be8e48e0849943fb53457e2fd83905eaf19cb1f7
Author: xinhui pan 
Date:   Tue Feb 11 11:28:34 2020 +0800

drm/amdgpu: Remove kfd eviction fence before release bo

No need to trigger eviction as the memory mapping will not be used
anymore.

All pt/pd BOs share the same resv, hence the same shared eviction fence.
Every time a page table is freed, the fence will be signaled, and that
causes unexpected KFD evictions.

CC: Christian König 
CC: Felix Kuehling 
CC: Alex Deucher 
Acked-by: Christian König 
Reviewed-by: Felix Kuehling 
Signed-off-by: xinhui pan 

I'm preparing a fix. Will send it out in a second.

Regards,
  Felix

-Original Message-
From: amd-gfx  On Behalf Of Randy Dunlap
Sent: Wednesday, February 26, 2020 10:03
To: Stephen Rothwell ; Linux Next Mailing List 

Cc: dri-devel ; Linux Kernel Mailing List 
; amd-gfx@lists.freedesktop.org
Subject: Re: linux-next: Tree for Feb 26 (gpu/drm/amd/amdgpu/)

On 2/25/20 8:34 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20200225:
> 

on i386:

ld: drivers/gpu/drm/amd/amdgpu/amdgpu_object.o: in function 
`amdgpu_bo_release_notify':
amdgpu_object.c:(.text+0xe07): undefined reference to 
`amdgpu_amdkfd_remove_fence_on_pt_pd_bos'


Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 


Re: [PATCH] drm/amdkfd: Use kernel queue v9 functions for v10 (ver2)

2019-11-07 Thread Kuehling, Felix
Are you sure that setting the SQ_SHADER_TBA_HI__TRAP_EN bit on GFXv9 is 
completely harmless? If the field is not defined, maybe setting the bit 
makes the address invalid. It's probably worth running that through a 
PSDB, which would cover Vega10, Vega20 and Arcturus.

If it actually works, the patch is

Reviewed-by: Felix Kuehling 

Regards,
   Felix

On 2019-11-07 15:34, Zhao, Yong wrote:
> The kernel queue functions for v9 and v10 are the same except for
> pm_map_process_v*, which differ slightly, so they should be reused.
> This eliminates the need to reapply several patches which were
> applied on v9 but not on v10, such as bigger GWS and more than 2
> SDMA engine support, which were introduced on Arcturus.
>
> Change-Id: I2d385961e3c884db14e30b5afc98d0d9e4cb1802
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/Makefile   |   1 -
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |   4 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h |   1 -
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c | 317 --
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  16 +-
>   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |   4 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   4 -
>   7 files changed, 14 insertions(+), 333 deletions(-)
>   delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
> b/drivers/gpu/drm/amd/amdkfd/Makefile
> index 48155060a57c..017a8b7156da 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> @@ -41,7 +41,6 @@ AMDKFD_FILES:= $(AMDKFD_PATH)/kfd_module.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_cik.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_vi.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_v9.o \
> - $(AMDKFD_PATH)/kfd_kernel_queue_v10.o \
>   $(AMDKFD_PATH)/kfd_packet_manager.o \
>   $(AMDKFD_PATH)/kfd_process_queue_manager.o \
>   $(AMDKFD_PATH)/kfd_device_queue_manager.o \
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 11d244891393..0d966408ea87 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -332,12 +332,10 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev 
> *dev,
>   case CHIP_RAVEN:
>   case CHIP_RENOIR:
>   case CHIP_ARCTURUS:
> - kernel_queue_init_v9(&kq->ops_asic_specific);
> - break;
>   case CHIP_NAVI10:
>   case CHIP_NAVI12:
>   case CHIP_NAVI14:
> - kernel_queue_init_v10(&kq->ops_asic_specific);
> + kernel_queue_init_v9(&kq->ops_asic_specific);
>   break;
>   default:
>   WARN(1, "Unexpected ASIC family %u",
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> index 365fc674fea4..a7116a939029 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> @@ -102,6 +102,5 @@ struct kernel_queue {
>   void kernel_queue_init_cik(struct kernel_queue_ops *ops);
>   void kernel_queue_init_vi(struct kernel_queue_ops *ops);
>   void kernel_queue_init_v9(struct kernel_queue_ops *ops);
> -void kernel_queue_init_v10(struct kernel_queue_ops *ops);
>   
>   #endif /* KFD_KERNEL_QUEUE_H_ */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> deleted file mode 100644
> index bfd6221acae9..
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> +++ /dev/null
> @@ -1,317 +0,0 @@
> -/*
> - * Copyright 2018 Advanced Micro Devices, Inc.
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> - * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> - * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> - * OTHER DEALINGS IN THE SOFTWARE.
> - *
> - */
> -
> -#include "kfd_kernel_queue.h"
> -#include 

Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

2019-11-07 Thread Kuehling, Felix
On 2019-11-07 13:32, Alex Deucher wrote:
> On Thu, Nov 7, 2019 at 12:47 PM Kuehling, Felix  
> wrote:
>> No, please lets not add a new nomenclature for PM4 packet versions. GFX 
>> versions are agreed on between hardware, firmware, and software and it's 
>> generally understood what they mean. If we add a new PM4 packet versioning 
>> scheme on our own, then this will add a lot of confusion when talking to 
>> firmware teams.
>>
>> You know, this would all be more straight forward if we accepted a little 
>> bit of code duplication and had packet writing functions per GFX version. 
>> You'll see this pattern a lot in the amdgpu driver where each IP version 
>> duplicates a bunch of code. In many cases you may be able to save a few 
>> lines of code by sharing functions between IP versions. But you'll add some 
>> confusion and burden on future maintenance.
>>
>> This is the price we pay for micro-optimizing minor code duplication.
> What we've tried to do in amdgpu is to break out shared code into
> common helpers that are not IP specific and use that in each IP module
> (e.g., amdgpu_uvd.c amdgpu_gfx.c, etc.).  Sometimes we can use a
> particular chunk of code across multiple generations.  E.g., the uvd
> stuff is a good example.  We have shared generic uvd helpers that work
> for most UVD IP versions, and then if we need an IP specific version,
> we override that in the callbacks with a version specific one.  E.g.,
> for the video decode engines we use the generic helpers for a number
> of ring functions:
>
> static const struct amdgpu_ring_funcs uvd_v7_0_ring_vm_funcs = {
> ...
>  .test_ring = uvd_v7_0_ring_test_ring,
>  .test_ib = amdgpu_uvd_ring_test_ib,
>  .insert_nop = uvd_v7_0_ring_insert_nop,
>  .pad_ib = amdgpu_ring_generic_pad_ib,
>  .begin_use = amdgpu_uvd_ring_begin_use,
>  .end_use = amdgpu_uvd_ring_end_use,
> ...
> };
>
> while we override more of them for the video encode engines:
>
> static const struct amdgpu_ring_funcs uvd_v7_0_enc_ring_vm_funcs = {
> ...
>  .test_ring = uvd_v7_0_enc_ring_test_ring,
>  .test_ib = uvd_v7_0_enc_ring_test_ib,
>  .insert_nop = amdgpu_ring_insert_nop,
>  .insert_end = uvd_v7_0_enc_ring_insert_end,
>  .pad_ib = amdgpu_ring_generic_pad_ib,
>  .begin_use = amdgpu_uvd_ring_begin_use,
>  .end_use = amdgpu_uvd_ring_end_use,
> ...
> };
>
> But still maintain IP specific components.

Thanks Alex. In this case the common code is in kfd_packet_manager.c and 
the IP-version-specific code that writes the actual PM4 packets is in 
the kernel_queue_v*.c files. Yong is trying to merge the PM4 packet 
writing code for v9 and v10 because the packet formats are essentially 
unchanged. It makes the naming conventions in the code a bit meaningless 
because v9 now really means "v9 and v10". Apparently there is precedent 
for this, as we already did the same thing with v7 and v8, which I 
forgot about in my initial code review.
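
Sketched with invented names (not the driver's actual tables), the
shared-helper/override pattern looks like this:

	/* Invented names, illustrating the pattern: shared helpers are
	 * reused across versions, and only the packets that really changed
	 * get a version-specific writer. */
	struct pm_ops {
		int (*map_process)(unsigned int *buf);  /* differs per version */
		int (*unmap_queues)(unsigned int *buf); /* shared helper */
	};

	static int unmap_queues_common(unsigned int *buf) { buf[0] = 0x100; return 1; }
	static int map_process_v9(unsigned int *buf)      { buf[0] = 0x090; return 1; }
	static int map_process_v10(unsigned int *buf)     { buf[0] = 0x0a0; return 1; }

	static const struct pm_ops pm_ops_v9  = { map_process_v9,  unmap_queues_common };
	static const struct pm_ops pm_ops_v10 = { map_process_v10, unmap_queues_common };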

Regards,
   Felix


>
> Alex
>
>> Regards,
>>Felix
>>
>> On 2019-11-07 12:39, Zhao, Yong wrote:
>>
>> Hi Kent,
>>
>> I can't agree more on this. Also, the same applies to the file names. 
>> Definitely we need to agree on the naming scheme before making it happen.
>>
>> Yong
>>
>> On 2019-11-07 12:33 p.m., Russell, Kent wrote:
>>
>> I think that the versioning is getting a little confusing since we’re using 
>> the old GFX versions, but not really sticking to it due to the shareability 
>> of certain managers and shaders. Could we look into doing something like 
>> gen1 or gen2, or some other more ambiguous non-GFX-related versioning? 
>> Otherwise we’ll keep having these questions of “why is Hawaii GFX8”, “why is 
>> Arcturus GFX9”, etc. Then if things change, we just up the value concretely, 
>> instead of maybe doing a v11 if GFX11 changes things, and only GFX11 ASICs 
>> use those functions/variables.
>>
>>
>>
>> Obviously not high-priority, but maybe something to consider as you continue 
>> to consolidate and remove duplicate code.
>>
>>
>>
>> Kent
>>
>>
>>
>> From: amd-gfx  On Behalf Of Zhao, Yong
>> Sent: Thursday, November 7, 2019 11:57 AM
>> To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for 
>> Hawaii
>>
>>
>>
>> Hi Felix,
>>
>>
>>
>> That's because v8 and v7 share the same packet_manager_funcs. In this case, 
>> it is better to keep it as it is.
>>
>>
>>

Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

2019-11-07 Thread Kuehling, Felix
On 2019-11-07 14:40, Zhao, Yong wrote:
Hi Felix,

It is true that the code works fine, except that all the new features added 
after this duplication are broken on GFX10. If I want to make GFX10 feature 
complete, I have to either manually adapt several changes to the GFX10 file or 
do this consolidation. From this perspective, and for ease of my work, it is a 
must.

"A must" means there is no alternative. You already listed two alternatives 
yourself: "either manually adapt several duplications to the GFX10 file or do 
this consolidation."


In _your_ opinion, the consolidation means less work for _you_. That's _your_ 
point of view. The discussion in this code review pointed out other points of 
view. When you take all of them into account, you may reconsider what is less 
work overall, and what is easier to maintain.


I'm not opposing your change per-se. But I'd like you to consider the whole 
picture, including the consequences of any design decisions you're making and 
imposing on anyone working on this code in the future. In this cases I think 
it's a relatively minor issue and it may just come down to a matter of opinion 
that I don't feel terribly strongly about.


With that said, the change is

Reviewed-by: Felix Kuehling 
<mailto:felix.kuehl...@amd.com>


Regards,

  Felix


Regards,
Yong

____
From: Kuehling, Felix <mailto:felix.kuehl...@amd.com>
Sent: Thursday, November 7, 2019 2:12 PM
To: Zhao, Yong <mailto:yong.z...@amd.com>; Alex Deucher 
<mailto:alexdeuc...@gmail.com>
Cc: Russell, Kent <mailto:kent.russ...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On 2019-11-07 13:54, Zhao, Yong wrote:
Hi Kent,

This consolidation is a must, because we should not have duplicated it in the 
first place.

The code is working fine with the duplication. You disagree with duplicating 
the code in the first place. But that's just your opinion. It's not a must in 
any objective sense.



The kernel queue functions are generic by design. The reason GFX8 and GFX9 are 
different is that GFX9 is SOC15, where packet formats and the doorbell size 
changed. On the other hand, the kfd_kernel_queue_v7.c file is pretty much 
empty, reusing v8 functions, even though it exists. Furthermore, in my opinion 
kfd_kernel_queue_v7.c should be merged into its v8 counterpart. From GFX9 
onwards, packet formats should stay the same. For kernel queues, we should be 
able to differentiate by pre-SOC15 or not, and I have the impression that the 
MEC firmware team agrees to keep the kernel queue interface stable across 
generations most of the time.

OK, you're making assumptions about PM4 packets on future ASIC generations. 
It's true that the transition to SOC15 with 64-bit doorbells and 
read/write-pointers was particularly disruptive. Your assumption will hold 
until it gets broken by some other disruptive change.


For now, if you want clear naming, we could call the GFXv7/8 packet manager 
functions "pre-SOC15" or "legacy" and the GFXv9/10 and future functions 
"SOC15". This may work for a while. But I suspect at some point something is 
going to change and we'll need to create a new version for a newer ASIC 
generation. You already have a small taste of that with the different 
TBA-enable bit in the MAP_PROCESS packet in GFXv10.


Regards,

  Felix


Regards,
Yong

From: Alex Deucher <mailto:alexdeuc...@gmail.com>
Sent: Thursday, November 7, 2019 1:32 PM
To: Kuehling, Felix <mailto:felix.kuehl...@amd.com>
Cc: Zhao, Yong <mailto:yong.z...@amd.com>; Russell, Kent 
<mailto:kent.russ...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On Thu, Nov 7, 2019 at 12:47 PM Kuehling, Felix 
<mailto:felix.kuehl...@amd.com> wrote:
>
> No, please lets not add a new nomenclature for PM4 packet versions. GFX 
> versions are agreed on between hardware, firmware, and software and it's 
> generally understood what they mean. If we add a new PM4 packet versioning 
> scheme on our own, then this will add a lot of confusion when talking to 
> firmware teams.
>
> You know, this would all be more straight forward if we accepted a little bit 
> of code duplication and had packet writing functions per GFX version. You'll 
> see this pattern a lot in the amdgpu driver where each IP version duplicates 
> a bunch of code. In many cases you may be able to save a few lines of code by 
> sharing functions between IP versions. But you'll add some confusion and 
> burden on future maintenance.
>
> This is the price we pay for micro-optimizing min

Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

2019-11-07 Thread Kuehling, Felix
On 2019-11-07 13:54, Zhao, Yong wrote:
Hi Kent,

This consolidation is a must, because we should not have duplicated it in the 
first place.

The code is working fine with the duplication. You disagree with duplicating 
the code in the first place. But that's just your opinion. It's not a must in 
any objective sense.


The kernel queue functions are generic by design. The reason GFX8 and GFX9 are 
different is that GFX9 is SOC15, where packet formats and the doorbell size 
changed. On the other hand, the kfd_kernel_queue_v7.c file is pretty much 
empty, reusing v8 functions, even though it exists. Furthermore, in my opinion 
kfd_kernel_queue_v7.c should be merged into its v8 counterpart. From GFX9 
onwards, packet formats should stay the same. For kernel queues, we should be 
able to differentiate by pre-SOC15 or not, and I have the impression that the 
MEC firmware team agrees to keep the kernel queue interface stable across 
generations most of the time.

OK, you're making assumptions about PM4 packets on future ASIC generations. 
It's true that the transition to SOC15 with 64-bit doorbells and 
read/write-pointers was particularly disruptive. Your assumption will hold 
until it gets broken by some other disruptive change.


For now, if you want clear naming, we could call the GFXv7/8 packet manager 
functions "pre-SOC15" or "legacy" and the GFXv9/10 and future functions 
"SOC15". This may work for a while. But I suspect at some point something is 
going to change and we'll need to create a new version for a newer ASIC 
generation. You already have a small taste of that with the different 
TBA-enable bit in the MAP_PROCESS packet in GFXv10.


Regards,

  Felix


Regards,
Yong

From: Alex Deucher <mailto:alexdeuc...@gmail.com>
Sent: Thursday, November 7, 2019 1:32 PM
To: Kuehling, Felix <mailto:felix.kuehl...@amd.com>
Cc: Zhao, Yong <mailto:yong.z...@amd.com>; Russell, Kent 
<mailto:kent.russ...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On Thu, Nov 7, 2019 at 12:47 PM Kuehling, Felix 
<mailto:felix.kuehl...@amd.com> wrote:
>
> No, please lets not add a new nomenclature for PM4 packet versions. GFX 
> versions are agreed on between hardware, firmware, and software and it's 
> generally understood what they mean. If we add a new PM4 packet versioning 
> scheme on our own, then this will add a lot of confusion when talking to 
> firmware teams.
>
> You know, this would all be more straight forward if we accepted a little bit 
> of code duplication and had packet writing functions per GFX version. You'll 
> see this pattern a lot in the amdgpu driver where each IP version duplicates 
> a bunch of code. In many cases you may be able to save a few lines of code by 
> sharing functions between IP versions. But you'll add some confusion and 
> burden on future maintenance.
>
> This is the price we pay for micro-optimizing minor code duplication.

What we've tried to do in amdgpu is to break out shared code into
common helpers that are not IP specific and use that in each IP module
(e.g., amdgpu_uvd.c amdgpu_gfx.c, etc.).  Sometimes we can use a
particular chunk of code across multiple generations.  E.g., the uvd
stuff is a good example.  We have shared generic uvd helpers that work
for most UVD IP versions, and then if we need an IP specific version,
we override that in the callbacks with a version specific one.  E.g.,
for the video decode engines we use the generic helpers for a number
of ring functions:

static const struct amdgpu_ring_funcs uvd_v7_0_ring_vm_funcs = {
...
.test_ring = uvd_v7_0_ring_test_ring,
.test_ib = amdgpu_uvd_ring_test_ib,
.insert_nop = uvd_v7_0_ring_insert_nop,
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
...
};

while we override more of them for the video encode engines:

static const struct amdgpu_ring_funcs uvd_v7_0_enc_ring_vm_funcs = {
...
.test_ring = uvd_v7_0_enc_ring_test_ring,
.test_ib = uvd_v7_0_enc_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.insert_end = uvd_v7_0_enc_ring_insert_end,
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
...
};

But still maintain IP specific components.

Alex

>
> Regards,
>   Felix
>
> On 2019-11-07 12:39, Zhao, Yong wrote:
>
> Hi Kent,
>
> I can't agree more on this. Also, the same applies to the file names. 
> Definitely we need to agree on the naming scheme before making it happen.
>
> Yong
>
> On 2019-11-07 12:33 p.m., Russell, Kent wrote:
>
> I think that the version

Re: [PATCH] drm/amdkfd: Simplify the mmap offset related bit operations

2019-11-07 Thread Kuehling, Felix
On 2019-11-07 12:33, Zhao, Yong wrote:
> The new code uses straightforward bit shifts and thus has better readability.
>
> Change-Id: I0c1f7cca7e24ddb7b4ffe1cb0fa71943828ae373
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c  |  1 -
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  9 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +--
>   4 files changed, 11 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b91993753b82..e59c229861e6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -298,7 +298,6 @@ static int kfd_ioctl_create_queue(struct file *filep, 
> struct kfd_process *p,
>   /* Return gpu_id as doorbell offset for mmap usage */
>   args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
>   args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
> - args->doorbell_offset <<= PAGE_SHIFT;
>   if (KFD_IS_SOC15(dev->device_info->asic_family))
>   /* On SOC15 ASICs, include the doorbell offset within the
>* process doorbell frame, which could be 1 page or 2 pages.
> @@ -1312,10 +1311,9 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
>   /* MMIO is mapped through kfd device
>* Generate a kfd mmap offset
>*/
> - if (flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP) {
> - args->mmap_offset = KFD_MMAP_TYPE_MMIO | 
> KFD_MMAP_GPU_ID(args->gpu_id);
> - args->mmap_offset <<= PAGE_SHIFT;
> - }
> + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)
> + args->mmap_offset = KFD_MMAP_TYPE_MMIO
> + | KFD_MMAP_GPU_ID(args->gpu_id);
>   
>   return 0;
>   
> @@ -1938,20 +1936,19 @@ static int kfd_mmap(struct file *filp, struct 
> vm_area_struct *vma)
>   {
>   struct kfd_process *process;
>   struct kfd_dev *dev = NULL;
> - unsigned long vm_pgoff;
> + unsigned long mmap_offset;
>   unsigned int gpu_id;
>   
>   process = kfd_get_process(current);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - vm_pgoff = vma->vm_pgoff;
> - vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> - gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> + mmap_offset = vma->vm_pgoff << PAGE_SHIFT;
> + gpu_id = KFD_MMAP_GET_GPU_ID(mmap_offset);
>   if (gpu_id)
>   dev = kfd_device_by_id(gpu_id);
>   
> - switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> + switch (mmap_offset & KFD_MMAP_TYPE_MASK) {
>   case KFD_MMAP_TYPE_DOORBELL:
>   if (!dev)
>   return -ENODEV;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 908081c85de1..1f8365575b12 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -346,7 +346,6 @@ int kfd_event_create(struct file *devkfd, struct 
> kfd_process *p,
>   ret = create_signal_event(devkfd, p, ev);
>   if (!ret) {
>   *event_page_offset = KFD_MMAP_TYPE_EVENTS;
> - *event_page_offset <<= PAGE_SHIFT;
>   *event_slot_index = ev->event_id;
>   }
>   break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 66bae8f2dad1..8eecd2cd1fd2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -59,24 +59,21 @@
>* NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
>*  defines are w.r.t to PAGE_SIZE
>*/
> -#define KFD_MMAP_TYPE_SHIFT  (62 - PAGE_SHIFT)
> +#define KFD_MMAP_TYPE_SHIFT  (62)
>   #define KFD_MMAP_TYPE_MASK  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_DOORBELL  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_EVENTS(0x2ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_RESERVED_MEM  (0x1ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_MMIO  (0x0ULL << KFD_MMAP_TYPE_SHIFT)
>   
> -#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
> +#define KFD_MMAP_GPU_ID_SHIFT (46)
>   #define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
>   << KFD_MMAP_GPU_ID_SHIFT)
>   #define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << 
> KFD_MMAP_GPU_ID_SHIFT)\
>   & KFD_MMAP_GPU_ID_MASK)
> -#define KFD_MMAP_GPU_ID_GET(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
> +#define KFD_MMAP_GET_GPU_ID(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
>   >> KFD_MMAP_GPU_ID_SHIFT)
>   
> -#define KFD_MMAP_OFFSET_VALUE_MASK   (0x3FFFFFFFFFFFULL >> PAGE_SHIFT)
> -#define KFD_MMAP_OFFSET_VALUE_GET(offset) (offset & 
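
The diff is cut off above, but the layout it introduces is easy to
sanity-check in isolation (a self-contained sketch assuming 4K pages and the
16-bit gpu_id hash width):

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SHIFT		12	/* assume 4K pages for the demo */
	#define KFD_GPU_ID_HASH_WIDTH	16	/* assumed */
	#define KFD_MMAP_TYPE_SHIFT	62	/* now a byte-offset shift */
	#define KFD_MMAP_TYPE_MASK	(0x3ULL << KFD_MMAP_TYPE_SHIFT)
	#define KFD_MMAP_TYPE_DOORBELL	(0x3ULL << KFD_MMAP_TYPE_SHIFT)
	#define KFD_MMAP_GPU_ID_SHIFT	46
	#define KFD_MMAP_GPU_ID_MASK	(((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
					 << KFD_MMAP_GPU_ID_SHIFT)
	#define KFD_MMAP_GPU_ID(id)	((((uint64_t)(id)) << KFD_MMAP_GPU_ID_SHIFT) \
					 & KFD_MMAP_GPU_ID_MASK)
	#define KFD_MMAP_GET_GPU_ID(off) (((off) & KFD_MMAP_GPU_ID_MASK) \
					  >> KFD_MMAP_GPU_ID_SHIFT)

	int main(void)
	{
		/* what the ioctl returns to user space, in bytes */
		uint64_t offset = KFD_MMAP_TYPE_DOORBELL | KFD_MMAP_GPU_ID(0xbeef);
		/* mmap() hands the kernel vm_pgoff in pages; kfd_mmap()
		 * shifts it back up to a byte offset before decoding */
		uint64_t decoded = (offset >> PAGE_SHIFT) << PAGE_SHIFT;

		printf("type %llx gpu_id %llx\n",
		       (unsigned long long)((decoded & KFD_MMAP_TYPE_MASK) >>
					    KFD_MMAP_TYPE_SHIFT),
		       (unsigned long long)KFD_MMAP_GET_GPU_ID(decoded));
		return 0;	/* prints: type 3 gpu_id beef */
	}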

Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

2019-11-07 Thread Kuehling, Felix
No, please lets not add a new nomenclature for PM4 packet versions. GFX 
versions are agreed on between hardware, firmware, and software and it's 
generally understood what they mean. If we add a new PM4 packet versioning 
scheme on our own, then this will add a lot of confusion when talking to 
firmware teams.

You know, this would all be more straight forward if we accepted a little bit 
of code duplication and had packet writing functions per GFX version. You'll 
see this pattern a lot in the amdgpu driver where each IP version duplicates a 
bunch of code. In many cases you may be able to save a few lines of code by 
sharing functions between IP versions. But you'll add some confusion and burden 
on future maintenance.

This is the price we pay for micro-optimizing minor code duplication.

Regards,
  Felix

On 2019-11-07 12:39, Zhao, Yong wrote:

Hi Kent,

I can't agree more on this. Also, the same applies to the file names. 
Definitely we need to agree on the naming scheme before making it happen.

Yong

On 2019-11-07 12:33 p.m., Russell, Kent wrote:
I think that the versioning is getting a little confusing since we’re using the 
old GFX versions, but not really sticking to it due to the shareability of 
certain managers and shaders. Could we look into doing something like gen1 or 
gen2, or some other more ambiguous non-GFX-related versioning? Otherwise we’ll 
keep having these questions of “why is Hawaii GFX8”, “why is Arcturus GFX9”, 
etc. Then if things change, we just up the value concretely, instead of maybe 
doing a v11 if GFX11 changes things, and only GFX11 ASICs use those 
functions/variables.

Obviously not high-priority, but maybe something to consider as you continue to 
consolidate and remove duplicate code.

Kent

From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Zhao, Yong
Sent: Thursday, November 7, 2019 11:57 AM
To: Kuehling, Felix <mailto:felix.kuehl...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

Hi Felix,

That's because v8 and v7 share the same packet_manager_funcs. In this case, it 
is better to keep it as it is.

Regards,
Yong
____
From: Kuehling, Felix mailto:felix.kuehl...@amd.com>>
Sent: Wednesday, November 6, 2019 11:45 PM
To: Zhao, Yong mailto:yong.z...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On 2019-10-30 20:17, Zhao, Yong wrote:
> release_mem won't be used at all on GFX9 and GFX10, so delete it.

Hawaii was GFXv7. So we're not using the release_mem packet on GFXv8
either. Why arbitrarily limit this change to GFXv9 and 10?

Regards,
   Felix

>
> Change-Id: I13787a8a29b83e7516c582a7401f2e14721edf5f
> Signed-off-by: Yong Zhao mailto:yong.z...@amd.com>>
> ---
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c | 35 ++-
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  | 33 ++---
>   2 files changed, 4 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> index aed32ab7102e..bfd6221acae9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> @@ -298,37 +298,6 @@ static int pm_query_status_v10(struct packet_manager 
> *pm, uint32_t *buffer,
>return 0;
>   }
>
> -
> -static int pm_release_mem_v10(uint64_t gpu_addr, uint32_t *buffer)
> -{
> - struct pm4_mec_release_mem *packet;
> -
> - WARN_ON(!buffer);
> -
> - packet = (struct pm4_mec_release_mem *)buffer;
> - memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
> -
> - packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
> - sizeof(struct pm4_mec_release_mem));
> -
> - packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
> - packet->bitfields2.event_index = 
> event_index__mec_release_mem__end_of_pipe;
> - packet->bitfields2.tcl1_action_ena = 1;
> - packet->bitfields2.tc_action_ena = 1;
> - packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
> -
> - packet->bitfields3.data_sel = 
> data_sel__mec_release_mem__send_32_bit_low;
> - packet->bitfields3.int_sel =
> - int_sel__mec_release_mem__send_interrupt_after_write_confirm;
> -
> - packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
> - packet->address_hi = upper_32_bits(gpu_addr);
> -
> - packet->data_lo = 0;
> -
> - return sizeof(struct pm4_mec_releas

Re: [PATCH] drm/amdkfd: Simplify the mmap offset related bit operations

2019-11-06 Thread Kuehling, Felix
On 2019-11-05 18:18, Zhao, Yong wrote:
> The new code uses straightforward bit shifts and thus has better 
> readability.

You're missing the MMAP-related code for mmio remapping. In 
kfd_ioctl_alloc_memory_of_gpu:

     /* MMIO is mapped through kfd device
  * Generate a kfd mmap offset
  */
     if (flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP) {
     args->mmap_offset = KFD_MMAP_TYPE_MMIO | 
KFD_MMAP_GPU_ID(args->gpu_id);
     args->mmap_offset <<= PAGE_SHIFT;
     }

Regards,
   Felix

>
> Change-Id: I0c1f7cca7e24ddb7b4ffe1cb0fa71943828ae373
> Signed-off-by: Yong Zhao 
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 --
> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 -
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 +++--
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 +--
> 4 files changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b91993753b82..34078df36621 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -298,7 +298,6 @@ static int kfd_ioctl_create_queue(struct file 
> *filep, struct kfd_process *p,
> /* Return gpu_id as doorbell offset for mmap usage */
> args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
> args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
> - args->doorbell_offset <<= PAGE_SHIFT;
> if (KFD_IS_SOC15(dev->device_info->asic_family))
> /* On SOC15 ASICs, include the doorbell offset within the
> * process doorbell frame, which could be 1 page or 2 pages.
> @@ -1938,20 +1937,19 @@ static int kfd_mmap(struct file *filp, struct 
> vm_area_struct *vma)
> {
> struct kfd_process *process;
> struct kfd_dev *dev = NULL;
> - unsigned long vm_pgoff;
> + unsigned long mmap_offset;
> unsigned int gpu_id;
> process = kfd_get_process(current);
> if (IS_ERR(process))
> return PTR_ERR(process);
> - vm_pgoff = vma->vm_pgoff;
> - vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> - gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> + mmap_offset = vma->vm_pgoff << PAGE_SHIFT;
> + gpu_id = KFD_MMAP_GET_GPU_ID(mmap_offset);
> if (gpu_id)
> dev = kfd_device_by_id(gpu_id);
> - switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> + switch (mmap_offset & KFD_MMAP_TYPE_MASK) {
> case KFD_MMAP_TYPE_DOORBELL:
> if (!dev)
> return -ENODEV;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 908081c85de1..1f8365575b12 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -346,7 +346,6 @@ int kfd_event_create(struct file *devkfd, struct 
> kfd_process *p,
> ret = create_signal_event(devkfd, p, ev);
> if (!ret) {
> *event_page_offset = KFD_MMAP_TYPE_EVENTS;
> - *event_page_offset <<= PAGE_SHIFT;
> *event_slot_index = ev->event_id;
> }
> break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 66bae8f2dad1..8eecd2cd1fd2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -59,24 +59,21 @@
> * NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
> * defines are w.r.t to PAGE_SIZE
> */
> -#define KFD_MMAP_TYPE_SHIFT (62 - PAGE_SHIFT)
> +#define KFD_MMAP_TYPE_SHIFT (62)
> #define KFD_MMAP_TYPE_MASK (0x3ULL << KFD_MMAP_TYPE_SHIFT)
> #define KFD_MMAP_TYPE_DOORBELL (0x3ULL << KFD_MMAP_TYPE_SHIFT)
> #define KFD_MMAP_TYPE_EVENTS (0x2ULL << KFD_MMAP_TYPE_SHIFT)
> #define KFD_MMAP_TYPE_RESERVED_MEM (0x1ULL << KFD_MMAP_TYPE_SHIFT)
> #define KFD_MMAP_TYPE_MMIO (0x0ULL << KFD_MMAP_TYPE_SHIFT)
> -#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
> +#define KFD_MMAP_GPU_ID_SHIFT (46)
> #define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
> << KFD_MMAP_GPU_ID_SHIFT)
> #define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << 
> KFD_MMAP_GPU_ID_SHIFT)\
> & KFD_MMAP_GPU_ID_MASK)
> -#define KFD_MMAP_GPU_ID_GET(offset) ((offset & KFD_MMAP_GPU_ID_MASK) \
> +#define KFD_MMAP_GET_GPU_ID(offset) ((offset & KFD_MMAP_GPU_ID_MASK) \
> >> KFD_MMAP_GPU_ID_SHIFT)
> -#define KFD_MMAP_OFFSET_VALUE_MASK (0x3FFFFFFFFFFFULL >> PAGE_SHIFT)
> -#define KFD_MMAP_OFFSET_VALUE_GET(offset) (offset & 
> KFD_MMAP_OFFSET_VALUE_MASK)
> -
> /*
> * When working with cp scheduler we should assign the HIQ manually or via
> * the amdgpu driver to a fixed hqd slot, here are the fixed HIQ hqd slot
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 6abfb77ae540..39dc49b8fd85 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -554,8 +554,7 @@ static int kfd_process_init_cwsr_apu(struct 
> kfd_process *p, struct file *filep)
> if (!dev->cwsr_enabled || qpd->cwsr_kaddr || qpd->cwsr_base)
> continue;
> - offset = (KFD_MMAP_TYPE_RESERVED_MEM | KFD_MMAP_GPU_ID(dev->id))
> - << PAGE_SHIFT;
> + offset = 

Re: [PATCH 3/3] drm/amdkfd: Use kernel queue v9 functions for v10

2019-11-06 Thread Kuehling, Felix
On 2019-10-30 20:17, Zhao, Yong wrote:
> The kernel queue functions for v9 and v10 are the same except for
> pm_map_process_v*, which differ slightly, so they should be reused.
> This eliminates the need to reapply several patches which were
> applied on v9 but not on v10, such as bigger GWS and more than 2
> SDMA engine support, which were introduced on Arcturus.

This looks reasonable in principle. See one suggestion inline to 
simplify it further.


>
> Change-Id: I2d385961e3c884db14e30b5afc98d0d9e4cb1802
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/Makefile   |   1 -
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |   4 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h |   1 -
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c | 317 --
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  49 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 -
>   6 files changed, 44 insertions(+), 331 deletions(-)
>   delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
> b/drivers/gpu/drm/amd/amdkfd/Makefile
> index 48155060a57c..017a8b7156da 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> @@ -41,7 +41,6 @@ AMDKFD_FILES:= $(AMDKFD_PATH)/kfd_module.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_cik.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_vi.o \
>   $(AMDKFD_PATH)/kfd_kernel_queue_v9.o \
> - $(AMDKFD_PATH)/kfd_kernel_queue_v10.o \
>   $(AMDKFD_PATH)/kfd_packet_manager.o \
>   $(AMDKFD_PATH)/kfd_process_queue_manager.o \
>   $(AMDKFD_PATH)/kfd_device_queue_manager.o \
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 11d244891393..0d966408ea87 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -332,12 +332,10 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev 
> *dev,
>   case CHIP_RAVEN:
>   case CHIP_RENOIR:
>   case CHIP_ARCTURUS:
> - kernel_queue_init_v9(&kq->ops_asic_specific);
> - break;
>   case CHIP_NAVI10:
>   case CHIP_NAVI12:
>   case CHIP_NAVI14:
> - kernel_queue_init_v10(&kq->ops_asic_specific);
> + kernel_queue_init_v9(&kq->ops_asic_specific);
>   break;
>   default:
>   WARN(1, "Unexpected ASIC family %u",
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> index 365fc674fea4..a7116a939029 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> @@ -102,6 +102,5 @@ struct kernel_queue {
>   void kernel_queue_init_cik(struct kernel_queue_ops *ops);
>   void kernel_queue_init_vi(struct kernel_queue_ops *ops);
>   void kernel_queue_init_v9(struct kernel_queue_ops *ops);
> -void kernel_queue_init_v10(struct kernel_queue_ops *ops);
>   
>   #endif /* KFD_KERNEL_QUEUE_H_ */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> deleted file mode 100644
> index bfd6221acae9..
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> +++ /dev/null
[snip]
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> index f0e4910a8865..d8f7343bfe71 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> @@ -62,8 +62,9 @@ void kernel_queue_init_v9(struct kernel_queue_ops *ops)
>   ops->submit_packet = submit_packet_v9;
>   }
>   
> -static int pm_map_process_v9(struct packet_manager *pm,
> - uint32_t *buffer, struct qcm_process_device *qpd)
> +static int pm_map_process_v9_base(struct packet_manager *pm,
> + uint32_t *buffer, struct qcm_process_device *qpd,
> + unsigned int sq_shader_tba_hi_trap_en_shift)
>   {
>   struct pm4_mes_map_process *packet;
>   uint64_t vm_page_table_base_addr = qpd->page_table_base;
> @@ -85,10 +86,16 @@ static int pm_map_process_v9(struct packet_manager *pm,
>   
>   packet->sh_mem_config = qpd->sh_mem_config;
>   packet->sh_mem_bases = qpd->sh_mem_bases;
> - packet->sq_shader_tba_lo = lower_32_bits(qpd->tba_addr >> 8);
> - packet->sq_shader_tba_hi = upper_32_bits(qpd->tba_addr >> 8);
> - packet->sq_shader_tma_lo = lower_32_bits(qpd->tma_addr >> 8);
> - packet->sq_shader_tma_hi = upper_32_bits(qpd->tma_addr >> 8);
> + if (qpd->tba_addr) {
> + packet->sq_shader_tba_lo = lower_32_bits(qpd->tba_addr >> 8);
> + packet->sq_shader_tba_hi = upper_32_bits(qpd->tba_addr >> 8);
> + if (sq_shader_tba_hi_trap_en_shift) {
> + packet->sq_shader_tba_hi |=
> + 1 << sq_shader_tba_hi_trap_en_shift;
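
With the shared base function above, the per-ASIC map_process hooks would
presumably reduce to thin wrappers along these lines (a sketch, assuming
the GFX10 register headers provide SQ_SHADER_TBA_HI__TRAP_EN__SHIFT; this
is not the literal patch):

    static int pm_map_process_v9(struct packet_manager *pm,
                    uint32_t *buffer, struct qcm_process_device *qpd)
    {
            /* GFX9 has no TRAP_EN bit in SQ_SHADER_TBA_HI */
            return pm_map_process_v9_base(pm, buffer, qpd, 0);
    }

    static int pm_map_process_v10(struct packet_manager *pm,
                    uint32_t *buffer, struct qcm_process_device *qpd)
    {
            /* assumed define: GFX10 carries TRAP_EN in SQ_SHADER_TBA_HI */
            return pm_map_process_v9_base(pm, buffer, qpd,
                            SQ_SHADER_TBA_HI__TRAP_EN__SHIFT);
    }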

Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

2019-11-06 Thread Kuehling, Felix
On 2019-10-30 20:17, Zhao, Yong wrote:
> release_mem won't be used at all on GFX9 and GFX10, so delete it.

Hawaii was GFXv7. So we're not using the release_mem packet on GFXv8 
either. Why arbitrarily limit this change to GFXv9 and 10?

Regards,
   Felix

>
> Change-Id: I13787a8a29b83e7516c582a7401f2e14721edf5f
> Signed-off-by: Yong Zhao 
> ---
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c | 35 ++-
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  | 33 ++---
>   2 files changed, 4 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> index aed32ab7102e..bfd6221acae9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> @@ -298,37 +298,6 @@ static int pm_query_status_v10(struct packet_manager 
> *pm, uint32_t *buffer,
>   return 0;
>   }
>   
> -
> -static int pm_release_mem_v10(uint64_t gpu_addr, uint32_t *buffer)
> -{
> - struct pm4_mec_release_mem *packet;
> -
> - WARN_ON(!buffer);
> -
> - packet = (struct pm4_mec_release_mem *)buffer;
> - memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
> -
> - packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
> - sizeof(struct pm4_mec_release_mem));
> -
> - packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
> - packet->bitfields2.event_index = 
> event_index__mec_release_mem__end_of_pipe;
> - packet->bitfields2.tcl1_action_ena = 1;
> - packet->bitfields2.tc_action_ena = 1;
> - packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
> -
> - packet->bitfields3.data_sel = 
> data_sel__mec_release_mem__send_32_bit_low;
> - packet->bitfields3.int_sel =
> - int_sel__mec_release_mem__send_interrupt_after_write_confirm;
> -
> - packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
> - packet->address_hi = upper_32_bits(gpu_addr);
> -
> - packet->data_lo = 0;
> -
> - return sizeof(struct pm4_mec_release_mem) / sizeof(unsigned int);
> -}
> -
>   const struct packet_manager_funcs kfd_v10_pm_funcs = {
>   .map_process= pm_map_process_v10,
>   .runlist= pm_runlist_v10,
> @@ -336,13 +305,13 @@ const struct packet_manager_funcs kfd_v10_pm_funcs = {
>   .map_queues = pm_map_queues_v10,
>   .unmap_queues   = pm_unmap_queues_v10,
>   .query_status   = pm_query_status_v10,
> - .release_mem= pm_release_mem_v10,
> + .release_mem= NULL,
>   .map_process_size   = sizeof(struct pm4_mes_map_process),
>   .runlist_size   = sizeof(struct pm4_mes_runlist),
>   .set_resources_size = sizeof(struct pm4_mes_set_resources),
>   .map_queues_size= sizeof(struct pm4_mes_map_queues),
>   .unmap_queues_size  = sizeof(struct pm4_mes_unmap_queues),
>   .query_status_size  = sizeof(struct pm4_mes_query_status),
> - .release_mem_size   = sizeof(struct pm4_mec_release_mem)
> + .release_mem_size   = 0,
>   };
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> index 3b5ca2b1d7a6..f0e4910a8865 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> @@ -336,35 +336,6 @@ static int pm_query_status_v9(struct packet_manager *pm, 
> uint32_t *buffer,
>   return 0;
>   }
>   
> -
> -static int pm_release_mem_v9(uint64_t gpu_addr, uint32_t *buffer)
> -{
> - struct pm4_mec_release_mem *packet;
> -
> - packet = (struct pm4_mec_release_mem *)buffer;
> - memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
> -
> - packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
> - sizeof(struct pm4_mec_release_mem));
> -
> - packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
> - packet->bitfields2.event_index = 
> event_index__mec_release_mem__end_of_pipe;
> - packet->bitfields2.tcl1_action_ena = 1;
> - packet->bitfields2.tc_action_ena = 1;
> - packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
> -
> - packet->bitfields3.data_sel = 
> data_sel__mec_release_mem__send_32_bit_low;
> - packet->bitfields3.int_sel =
> - int_sel__mec_release_mem__send_interrupt_after_write_confirm;
> -
> - packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
> - packet->address_hi = upper_32_bits(gpu_addr);
> -
> - packet->data_lo = 0;
> -
> - return 0;
> -}
> -
>   const struct packet_manager_funcs kfd_v9_pm_funcs = {
>   .map_process= pm_map_process_v9,
>   .runlist

Re: [PATCH 1/3] drm/amdkfd: Adjust function sequences to avoid unnecessary declarations

2019-11-06 Thread Kuehling, Felix
On 2019-10-30 20:17, Zhao, Yong wrote:
> This is cleaner.
>
> Change-Id: I8cdecad387d8c547a088c6050f77385ee1135be1
> Signed-off-by: Yong Zhao 
Reviewed-by: Felix Kuehling 
> ---
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  | 19 +++
>   1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> index 9a4bafb2e175..3b5ca2b1d7a6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> @@ -26,18 +26,6 @@
>   #include "kfd_pm4_headers_ai.h"
>   #include "kfd_pm4_opcodes.h"
>   
> -static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
> - enum kfd_queue_type type, unsigned int queue_size);
> -static void uninitialize_v9(struct kernel_queue *kq);
> -static void submit_packet_v9(struct kernel_queue *kq);
> -
> -void kernel_queue_init_v9(struct kernel_queue_ops *ops)
> -{
> - ops->initialize = initialize_v9;
> - ops->uninitialize = uninitialize_v9;
> - ops->submit_packet = submit_packet_v9;
> -}
> -
>   static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
>   enum kfd_queue_type type, unsigned int queue_size)
>   {
> @@ -67,6 +55,13 @@ static void submit_packet_v9(struct kernel_queue *kq)
>   kq->pending_wptr64);
>   }
>   
> +void kernel_queue_init_v9(struct kernel_queue_ops *ops)
> +{
> + ops->initialize = initialize_v9;
> + ops->uninitialize = uninitialize_v9;
> + ops->submit_packet = submit_packet_v9;
> +}
> +
>   static int pm_map_process_v9(struct packet_manager *pm,
>   uint32_t *buffer, struct qcm_process_device *qpd)
>   {

Re: [PATCH] drm/amdgpu: change read of GPU clock counter on Vega10 VF

2019-11-05 Thread Kuehling, Felix
On 2019-11-05 5:26 p.m., Huang, JinHuiEric wrote:
> Using the unified VBIOS causes a performance drop in an SR-IOV environment.
> The fix is to switch to another register instead.
>
> Signed-off-by: Eric Huang 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 19 ---
>   1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 829d623..e44a3ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3885,9 +3885,22 @@ static uint64_t gfx_v9_0_get_gpu_clock_counter(struct 
> amdgpu_device *adev)
>   uint64_t clock;
>   
>   mutex_lock(&adev->gfx.gpu_clock_mutex);
> - WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
> - clock = (uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_LSB) |
> - ((uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_MSB) << 
> 32ULL);
> + if (adev->asic_type == CHIP_VEGA10 && amdgpu_sriov_runtime(adev)) {
> + uint32_t tmp, lsb, msb, i = 0;
> + do {
> + if (i != 0)
> + udelay(1);
> + tmp = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_MSB);
> + lsb = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_LSB);
> + msb = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_MSB);
> + i++;
> + } while (unlikely(tmp != msb) && (i < adev->usec_timeout));
> + clock = (uint64_t)lsb | ((uint64_t)msb << 32ULL);
> + } else {
> + WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
> + clock = (uint64_t)RREG32_SOC15(GC, 0, 
> mmRLC_GPU_CLOCK_COUNT_LSB) |
> + ((uint64_t)RREG32_SOC15(GC, 0, 
> mmRLC_GPU_CLOCK_COUNT_MSB) << 32ULL);
> + }
>   mutex_unlock(&adev->gfx.gpu_clock_mutex);
>   return clock;
>   }

Re: [PATCH] drm/amdkfd: Simplify the mmap offset related bit operations

2019-11-05 Thread Kuehling, Felix
On 2019-11-01 7:03 p.m., Zhao, Yong wrote:
>
> > + /* only leave the offset segment */
> > + vma->vm_pgoff &= (1ULL << (KFD_MMAP_GPU_ID_SHIFT - PAGE_SHIFT)) - 
> 1;
>
> You're now open-coding what used to be done by the
> KFD_MMAP_OFFSET_VALUE_GET macro. I don't see how this is an
> improvement.
> Maybe better to update the macro to do this.
>
>
> I can definitely do that, but I think we'd better delete this line 
> completely as it seems odd to change vm_pgoff. Moreover this vm_pgoff 
> is not used at all in the following function calls. What do you think?

I think you're right. Looks like a historical accident. I see that older 
versions of kfd_event_mmap used to access vm_pgoff and probably depended 
on this. We removed that in this commit:


commit 50cb7dd94cb43a6204813376e1be1d21780b71fb
Author: Felix Kuehling 
Date:   Fri Oct 27 19:35:26 2017 -0400

     drm/amdkfd: Simplify events page allocator

     The first event page is always big enough to handle all events.
     Handling of multiple events pages is not supported by user mode, and
     not necessary.

     Signed-off-by: Yong Zhao 
     Signed-off-by: Felix Kuehling 
     Acked-by: Oded Gabbay 
     Signed-off-by: Oded Gabbay 



Regards,
   Felix


>
> Regards,
> Yong
> --------
> *From:* Kuehling, Felix 
> *Sent:* Friday, November 1, 2019 6:48 PM
> *To:* Zhao, Yong ; amd-gfx@lists.freedesktop.org 
> 
> *Subject:* Re: [PATCH] drm/amdkfd: Simplify the mmap offset related 
> bit operations
> On 2019-11-01 4:48 p.m., Zhao, Yong wrote:
> > The new code is much cleaner and results in better readability.
> >
> > Change-Id: I0c1f7cca7e24ddb7b4ffe1cb0fa71943828ae373
> > Signed-off-by: Yong Zhao 
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 +++--
> >   drivers/gpu/drm/amd/amdkfd/kfd_events.c  |  1 -
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  9 +++--
> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +--
> >   4 files changed, 11 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index b91993753b82..590138727ca9 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -298,7 +298,6 @@ static int kfd_ioctl_create_queue(struct file 
> *filep, struct kfd_process *p,
> >    /* Return gpu_id as doorbell offset for mmap usage */
> >    args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
> >    args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
> > - args->doorbell_offset <<= PAGE_SHIFT;
> >    if (KFD_IS_SOC15(dev->device_info->asic_family))
> >    /* On SOC15 ASICs, include the doorbell offset within the
> > * process doorbell frame, which could be 1 page or 2 
> pages.
> > @@ -1938,20 +1937,22 @@ static int kfd_mmap(struct file *filp, 
> struct vm_area_struct *vma)
> >   {
> >    struct kfd_process *process;
> >    struct kfd_dev *dev = NULL;
> > - unsigned long vm_pgoff;
> > + unsigned long mmap_offset;
> >    unsigned int gpu_id;
> >
> >    process = kfd_get_process(current);
> >    if (IS_ERR(process))
> >    return PTR_ERR(process);
> >
> > - vm_pgoff = vma->vm_pgoff;
> > - vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> > - gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> > + mmap_offset = vma->vm_pgoff << PAGE_SHIFT;
> > + gpu_id = KFD_MMAP_GET_GPU_ID(mmap_offset);
> >    if (gpu_id)
> >    dev = kfd_device_by_id(gpu_id);
> >
> > - switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> > + /* only leave the offset segment */
> > + vma->vm_pgoff &= (1ULL << (KFD_MMAP_GPU_ID_SHIFT - 
> PAGE_SHIFT)) - 1;
>
> You're now open-coding what used to be done by the
> KFD_MMAP_OFFSET_VALUE_GET macro. I don't see how this is an improvement.
> Maybe better to update the macro to do this.
>
>
> > +
> > + switch (mmap_offset & KFD_MMAP_TYPE_MASK) {
> >    case KFD_MMAP_TYPE_DOORBELL:
> >    if (!dev)
> >    return -ENODEV;
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> > index 908081c85de1..1f8365575b12 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> > @@ -346,7 +346,6 @@ int kfd_event_

Re: [PATCH] drm/amdgpu: change read of GPU clock counter on Vega10 VF

2019-11-05 Thread Kuehling, Felix

On 2019-11-05 5:03 p.m., Huang, JinHuiEric wrote:
> Using the unified VBIOS causes a performance drop in an SR-IOV environment.
> The fix is to switch to another register instead.
>
> Signed-off-by: Eric Huang 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 18 +++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 829d623..6770bd1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3885,9 +3885,21 @@ static uint64_t gfx_v9_0_get_gpu_clock_counter(struct 
> amdgpu_device *adev)
>   uint64_t clock;
>   
>   mutex_lock(&adev->gfx.gpu_clock_mutex);
> - WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
> - clock = (uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_LSB) |
> - ((uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_MSB) << 
> 32ULL);
> + if (adev->asic_type == CHIP_VEGA10 && amdgpu_sriov_runtime(adev)) {
> + uint32_t tmp, lsb, msb, i = 0;
> + do {
> + tmp = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_MSB);
> + lsb = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_LSB);
> + msb = RREG32_SOC15(GC, 0, mmRLC_REFCLOCK_TIMESTAMP_MSB);
> + i++;
> + udelay(1);

This udelay should be conditional. In the likely case that tmp == msb, 
you should never have to delay at all. Maybe put the delay at the start 
of the loop with a condition:

     if (i != 0)
         udelay(1);

So that it only applies before the second and later loops.

Regards,
   Felix


> + } while (unlikely(tmp != msb) && (i < adev->usec_timeout));
> + clock = (uint64_t)lsb | ((uint64_t)msb << 32ULL);
> + } else {
> + WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
> + clock = (uint64_t)RREG32_SOC15(GC, 0, 
> mmRLC_GPU_CLOCK_COUNT_LSB) |
> + ((uint64_t)RREG32_SOC15(GC, 0, 
> mmRLC_GPU_CLOCK_COUNT_MSB) << 32ULL);
> + }
>   mutex_unlock(&adev->gfx.gpu_clock_mutex);
>   return clock;
>   }

Re: [PATCH] drm/amdkfd: Simplify the mmap offset related bit operations

2019-11-01 Thread Kuehling, Felix
On 2019-11-01 4:48 p.m., Zhao, Yong wrote:
> The new code is much cleaner and results in better readability.
>
> Change-Id: I0c1f7cca7e24ddb7b4ffe1cb0fa71943828ae373
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c  |  1 -
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  9 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +--
>   4 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b91993753b82..590138727ca9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -298,7 +298,6 @@ static int kfd_ioctl_create_queue(struct file *filep, 
> struct kfd_process *p,
>   /* Return gpu_id as doorbell offset for mmap usage */
>   args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
>   args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
> - args->doorbell_offset <<= PAGE_SHIFT;
>   if (KFD_IS_SOC15(dev->device_info->asic_family))
>   /* On SOC15 ASICs, include the doorbell offset within the
>* process doorbell frame, which could be 1 page or 2 pages.
> @@ -1938,20 +1937,22 @@ static int kfd_mmap(struct file *filp, struct 
> vm_area_struct *vma)
>   {
>   struct kfd_process *process;
>   struct kfd_dev *dev = NULL;
> - unsigned long vm_pgoff;
> + unsigned long mmap_offset;
>   unsigned int gpu_id;
>   
>   process = kfd_get_process(current);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - vm_pgoff = vma->vm_pgoff;
> - vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> - gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> + mmap_offset = vma->vm_pgoff << PAGE_SHIFT;
> + gpu_id = KFD_MMAP_GET_GPU_ID(mmap_offset);
>   if (gpu_id)
>   dev = kfd_device_by_id(gpu_id);
>   
> - switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> + /* only leave the offset segment */
> + vma->vm_pgoff &= (1ULL << (KFD_MMAP_GPU_ID_SHIFT - PAGE_SHIFT)) - 1;

You're now open-coding what used to be done by the 
KFD_MMAP_OFFSET_VALUE_GET macro. I don't see how this is an improvement. 
Maybe better to update the macro to do this.


> +
> + switch (mmap_offset & KFD_MMAP_TYPE_MASK) {
>   case KFD_MMAP_TYPE_DOORBELL:
>   if (!dev)
>   return -ENODEV;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 908081c85de1..1f8365575b12 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -346,7 +346,6 @@ int kfd_event_create(struct file *devkfd, struct 
> kfd_process *p,
>   ret = create_signal_event(devkfd, p, ev);
>   if (!ret) {
>   *event_page_offset = KFD_MMAP_TYPE_EVENTS;
> - *event_page_offset <<= PAGE_SHIFT;
>   *event_slot_index = ev->event_id;
>   }
>   break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 66bae8f2dad1..8eecd2cd1fd2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -59,24 +59,21 @@
>* NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
>*  defines are w.r.t to PAGE_SIZE
>*/
> -#define KFD_MMAP_TYPE_SHIFT  (62 - PAGE_SHIFT)
> +#define KFD_MMAP_TYPE_SHIFT  (62)
>   #define KFD_MMAP_TYPE_MASK  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_DOORBELL  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_EVENTS(0x2ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_RESERVED_MEM  (0x1ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_MMIO  (0x0ULL << KFD_MMAP_TYPE_SHIFT)
>   
> -#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
> +#define KFD_MMAP_GPU_ID_SHIFT (46)
>   #define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
>   << KFD_MMAP_GPU_ID_SHIFT)
>   #define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << 
> KFD_MMAP_GPU_ID_SHIFT)\
>   & KFD_MMAP_GPU_ID_MASK)
> -#define KFD_MMAP_GPU_ID_GET(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
> +#define KFD_MMAP_GET_GPU_ID(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
>   >> KFD_MMAP_GPU_ID_SHIFT)
>   
> -#define KFD_MMAP_OFFSET_VALUE_MASK   (0x3FFFFFFFFFFFULL >> PAGE_SHIFT)
> -#define KFD_MMAP_OFFSET_VALUE_GET(offset) (offset & 
> KFD_MMAP_OFFSET_VALUE_MASK)

This macro is still useful. See above. I think you should just update 
the mask and continue using it for clarity.

Regards,
   Felix


> -
>   /*
>* When working with cp scheduler we should assign the HIQ manually or via
>* the amdgpu driver to a fixed hqd slot, here are the fixed HIQ hqd slot
> diff --git 
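
To make the suggestion above concrete, keeping KFD_MMAP_OFFSET_VALUE_GET
with an updated mask for the new non-shifted encoding might look like this
(a sketch, not a patch from this thread):

    /* all bits below the gpu_id field hold the offset value */
    #define KFD_MMAP_OFFSET_VALUE_MASK  ((1ULL << KFD_MMAP_GPU_ID_SHIFT) - 1)
    #define KFD_MMAP_OFFSET_VALUE_GET(offset) \
            ((offset) & KFD_MMAP_OFFSET_VALUE_MASK)

    /* kfd_mmap() could then use the macro instead of open-coding the
     * mask, if the vm_pgoff rewrite is kept at all: */
    vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(mmap_offset) >> PAGE_SHIFT;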

Re: [PATCH] drm/amdkfd: Simplify the mmap offset related bit operations

2019-11-01 Thread Kuehling, Felix
NAK. This won't work for several reasons.

The mmap_offset is used as offset parameter in the mmap system call. If 
you check the man page of mmap, you'll see that "offset must be a 
multiple of the page size". Therefore the PAGE_SHIFT is necessary.

In the case of doorbell offsets, the offset is parsed and processed by 
the Thunk in user mode. On GFX9 GPUs the lower bits are used for the 
offset of the doorbell within the doorbell page. On GFX8 the queue ID 
was used, but on GFX9 we had to decoupled the doorbell ID from the queue 
ID. If you remove the PAGE_SHIFT, you'll need to put those bits 
somewhere else. But that change in the encoding would break the ABI with 
the Thunk.
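
For reference, the encoding being defended packs everything into the 64-bit
byte offset passed to mmap(); roughly (a sketch reconstructed from the
kfd_priv.h macros quoted in this thread):

    /*
     * bits 63..62  type (DOORBELL=3, EVENTS=2, RESERVED_MEM=1, MMIO=0)
     * bits 61..46  gpu_id hash (KFD_GPU_ID_HASH_WIDTH = 16 bits)
     * bits 45..0   value; for GFX9 doorbells the low bits carry the
     *              doorbell offset within the process doorbell frame
     */
    offset = KFD_MMAP_TYPE_DOORBELL | KFD_MMAP_GPU_ID(gpu_id);
    offset <<= PAGE_SHIFT;  /* mmap offsets must be page-size multiples */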

Regards,
   Felix

On 2019-11-01 4:48 p.m., Zhao, Yong wrote:
> The new code is much cleaner and results in better readability.
>
> Change-Id: I0c1f7cca7e24ddb7b4ffe1cb0fa71943828ae373
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c  |  1 -
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  9 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +--
>   4 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b91993753b82..590138727ca9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -298,7 +298,6 @@ static int kfd_ioctl_create_queue(struct file *filep, 
> struct kfd_process *p,
>   /* Return gpu_id as doorbell offset for mmap usage */
>   args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
>   args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
> - args->doorbell_offset <<= PAGE_SHIFT;
>   if (KFD_IS_SOC15(dev->device_info->asic_family))
>   /* On SOC15 ASICs, include the doorbell offset within the
>* process doorbell frame, which could be 1 page or 2 pages.
> @@ -1938,20 +1937,22 @@ static int kfd_mmap(struct file *filp, struct 
> vm_area_struct *vma)
>   {
>   struct kfd_process *process;
>   struct kfd_dev *dev = NULL;
> - unsigned long vm_pgoff;
> + unsigned long mmap_offset;
>   unsigned int gpu_id;
>   
>   process = kfd_get_process(current);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - vm_pgoff = vma->vm_pgoff;
> - vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> - gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> + mmap_offset = vma->vm_pgoff << PAGE_SHIFT;
> + gpu_id = KFD_MMAP_GET_GPU_ID(mmap_offset);
>   if (gpu_id)
>   dev = kfd_device_by_id(gpu_id);
>   
> - switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> + /* only leave the offset segment */
> + vma->vm_pgoff &= (1ULL << (KFD_MMAP_GPU_ID_SHIFT - PAGE_SHIFT)) - 1;
> +
> + switch (mmap_offset & KFD_MMAP_TYPE_MASK) {
>   case KFD_MMAP_TYPE_DOORBELL:
>   if (!dev)
>   return -ENODEV;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 908081c85de1..1f8365575b12 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -346,7 +346,6 @@ int kfd_event_create(struct file *devkfd, struct 
> kfd_process *p,
>   ret = create_signal_event(devkfd, p, ev);
>   if (!ret) {
>   *event_page_offset = KFD_MMAP_TYPE_EVENTS;
> - *event_page_offset <<= PAGE_SHIFT;
>   *event_slot_index = ev->event_id;
>   }
>   break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 66bae8f2dad1..8eecd2cd1fd2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -59,24 +59,21 @@
>* NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
>*  defines are w.r.t to PAGE_SIZE
>*/
> -#define KFD_MMAP_TYPE_SHIFT  (62 - PAGE_SHIFT)
> +#define KFD_MMAP_TYPE_SHIFT  (62)
>   #define KFD_MMAP_TYPE_MASK  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_DOORBELL  (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_EVENTS(0x2ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_RESERVED_MEM  (0x1ULL << KFD_MMAP_TYPE_SHIFT)
>   #define KFD_MMAP_TYPE_MMIO  (0x0ULL << KFD_MMAP_TYPE_SHIFT)
>   
> -#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
> +#define KFD_MMAP_GPU_ID_SHIFT (46)
>   #define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
>   << KFD_MMAP_GPU_ID_SHIFT)
>   #define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << 
> KFD_MMAP_GPU_ID_SHIFT)\
>   & KFD_MMAP_GPU_ID_MASK)
> -#define KFD_MMAP_GPU_ID_GET(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
> +#define KFD_MMAP_GET_GPU_ID(offset)((offset & KFD_MMAP_GPU_ID_MASK) \
>  

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-10-30 Thread Kuehling, Felix
On 2019-10-30 9:52 a.m., Christian König wrote:
> Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>> The issue is that PT BOs are not freed when unmapping a VA,
>> which causes VRAM usage to accumulate hugely in some
>> memory stress tests, such as the KFD big-buffer stress test.
>> Function amdgpu_vm_bo_update_mapping() is called by both
>> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>> solution is to replace amdgpu_vm_bo_update_mapping() in
>> amdgpu_vm_clear_freed() with a function that removes the
>> PT BOs, to save VRAM.
>
> NAK, that is intentional behavior.
>
> Otherwise we can run into out of memory situations when page tables 
> need to be allocated again under stress.

That's a bit arbitrary and inconsistent. We are freeing page tables in 
other situations, when a mapping uses huge pages in 
amdgpu_vm_update_ptes. Why not when a mapping is destroyed completely?

I'm actually a bit surprised that the huge-page handling in 
amdgpu_vm_update_ptes isn't kicking in to free up lower-level page 
tables when a BO is unmapped.

Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>> Signed-off-by: Eric Huang 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56 
>> +-
>>   1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 0f4c3b2..8a480c7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1930,6 +1930,51 @@ static void amdgpu_vm_prt_fini(struct 
>> amdgpu_device *adev, struct amdgpu_vm *vm)
>>   }
>>     /**
>> + * amdgpu_vm_remove_ptes - free PT BOs
>> + *
>> + * @adev: amdgpu device structure
>> + * @vm: amdgpu vm structure
>> + * @start: start of mapped range
>> + * @end: end of mapped entry
>> + *
>> + * Free the page table level.
>> + */
>> +static int amdgpu_vm_remove_ptes(struct amdgpu_device *adev,
>> +    struct amdgpu_vm *vm, uint64_t start, uint64_t end)
>> +{
>> +    struct amdgpu_vm_pt_cursor cursor;
>> +    unsigned shift, num_entries;
>> +
>> +    amdgpu_vm_pt_start(adev, vm, start, &cursor);
>> +    while (cursor.level < AMDGPU_VM_PTB) {
>> +    if (!amdgpu_vm_pt_descendant(adev, &cursor))
>> +    return -ENOENT;
>> +    }
>> +
>> +    while (cursor.pfn < end) {
>> +    amdgpu_vm_free_table(cursor.entry);
>> +    num_entries = amdgpu_vm_num_entries(adev, cursor.level - 1);
>> +
>> +    if (cursor.entry != &cursor.parent->entries[num_entries - 1]) {
>> +    /* Next ptb entry */
>> +    shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>> +    cursor.pfn += 1ULL << shift;
>> +    cursor.pfn &= ~((1ULL << shift) - 1);
>> +    cursor.entry++;
>> +    } else {
>> +    /* Next ptb entry in next pd0 entry */
>> +    amdgpu_vm_pt_ancestor(&cursor);
>> +    shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>> +    cursor.pfn += 1ULL << shift;
>> +    cursor.pfn &= ~((1ULL << shift) - 1);
>> +    amdgpu_vm_pt_descendant(adev, &cursor);
>> +    }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>>    * amdgpu_vm_clear_freed - clear freed BOs in the PT
>>    *
>>    * @adev: amdgpu_device pointer
>> @@ -1949,7 +1994,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device 
>> *adev,
>>     struct dma_fence **fence)
>>   {
>>   struct amdgpu_bo_va_mapping *mapping;
>> -    uint64_t init_pte_value = 0;
>>   struct dma_fence *f = NULL;
>>   int r;
>>   @@ -1958,13 +2002,10 @@ int amdgpu_vm_clear_freed(struct 
>> amdgpu_device *adev,
>>   struct amdgpu_bo_va_mapping, list);
>>   list_del(&mapping->list);
>>   -    if (vm->pte_support_ats &&
>> -    mapping->start < AMDGPU_GMC_HOLE_START)
>> -    init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>> +    r = amdgpu_vm_remove_ptes(adev, vm,
>> +    (mapping->start + 0x1ff) & (~0x1ffll),
>> +    (mapping->last + 1) & (~0x1ffll));
>>   -    r = amdgpu_vm_bo_update_mapping(adev, vm, false, NULL,
>> -    mapping->start, mapping->last,
>> -    init_pte_value, 0, NULL, );
>>   amdgpu_vm_free_mapping(adev, vm, mapping, f);
>>   if (r) {
>>   dma_fence_put(f);
>> @@ -1980,7 +2021,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device 
>> *adev,
>>   }
>>     return 0;
>> -
>>   }
>>     /**
>
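
The 0x1ff arithmetic in the hunk above rounds the unmapped range inward to
page-table-block boundaries, since one PTB maps 512 PTEs (9 bits of the
pfn) on GFX9. In effect (a sketch of the same math):

    start = (mapping->start + 0x1ff) & ~0x1ffULL;  /* round up */
    end   = (mapping->last + 1) & ~0x1ffULL;       /* round down */
    /* only PTBs fully contained in [start, end) are freed; partially
     * covered ones may still back neighbouring mappings */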

Re: [PATCH v2 13/15] drm/amdgpu: Use mmu_range_insert instead of hmm_mirror

2019-10-29 Thread Kuehling, Felix
On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
>
> Remove the interval tree in the driver and rely on the tree maintained by
> the mmu_notifier for delivering mmu_notifier invalidation callbacks.
>
> For some reason amdgpu has a very complicated arrangement where it tries
> to prevent duplicate entries in the interval_tree, this is not necessary,
> each amdgpu_bo can be its own stand alone entry. interval_tree already
> allows duplicates and overlaps in the tree.
>
> Also, there is no need to remove entries upon a release callback, the
> mmu_range API safely allows objects to remain registered beyond the
> lifetime of the mm. The driver only has to stop touching the pages during
> release.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David (ChunMing) Zhou 
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   2 +
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 341 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  13 +-
>   6 files changed, 84 insertions(+), 282 deletions(-)
[snip]
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 31d4deb5d29484..4ffd7b90f4d907 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
[snip]
> @@ -50,66 +50,6 @@
>   #include "amdgpu.h"
>   #include "amdgpu_amdkfd.h"
>   
> -/**
> - * struct amdgpu_mn_node
> - *
> - * @it: interval node defining start-last of the affected address range
> - * @bos: list of all BOs in the affected address range
> - *
> - * Manages all BOs which are affected of a certain range of address space.
> - */
> -struct amdgpu_mn_node {
> - struct interval_tree_node   it;
> - struct list_headbos;
> -};
> -
> -/**
> - * amdgpu_mn_destroy - destroy the HMM mirror
> - *
> - * @work: previously sheduled work item
> - *
> - * Lazy destroys the notifier from a work item
> - */
> -static void amdgpu_mn_destroy(struct work_struct *work)
> -{
> - struct amdgpu_mn *amn = container_of(work, struct amdgpu_mn, work);
> - struct amdgpu_device *adev = amn->adev;
> - struct amdgpu_mn_node *node, *next_node;
> - struct amdgpu_bo *bo, *next_bo;
> -
> - mutex_lock(&adev->mn_lock);
> - down_write(&amn->lock);
> - hash_del(&amn->node);
> - rbtree_postorder_for_each_entry_safe(node, next_node,
> -  &amn->objects.rb_root, it.rb) {
> - list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
> - bo->mn = NULL;
> - list_del_init(&bo->mn_list);
> - }
> - kfree(node);
> - }
> - up_write(&amn->lock);
> - mutex_unlock(&adev->mn_lock);
> -
> - hmm_mirror_unregister(&amn->mirror);
> - kfree(amn);
> -}
> -
> -/**
> - * amdgpu_hmm_mirror_release - callback to notify about mm destruction
> - *
> - * @mirror: the HMM mirror (mm) this callback is about
> - *
> - * Shedule a work item to lazy destroy HMM mirror.
> - */
> -static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
> -{
> - struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
> -
> - INIT_WORK(&amn->work, amdgpu_mn_destroy);
> - schedule_work(&amn->work);
> -}
> -
>   /**
>* amdgpu_mn_lock - take the write side lock for this notifier
>*
> @@ -133,157 +73,86 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn)
>   }
>   
>   /**
> - * amdgpu_mn_read_lock - take the read side lock for this notifier
> - *
> - * @amn: our notifier
> - */
> -static int amdgpu_mn_read_lock(struct amdgpu_mn *amn, bool blockable)
> -{
> - if (blockable)
> - down_read(&amn->lock);
> - else if (!down_read_trylock(&amn->lock))
> - return -EAGAIN;
> -
> - return 0;
> -}
> -
> -/**
> - * amdgpu_mn_read_unlock - drop the read side lock for this notifier
> - *
> - * @amn: our notifier
> - */
> -static void amdgpu_mn_read_unlock(struct amdgpu_mn *amn)
> -{
> - up_read(&amn->lock);
> -}
> -
> -/**
> - * amdgpu_mn_invalidate_node - unmap all BOs of a node
> + * amdgpu_mn_invalidate_gfx - callback to notify about mm change
>*
> - * @node: the node with the BOs to unmap
> - * @start: start of address range affected
> - * @end: end of address range affected
> + * @mrn: the range (mm) is about to update
> + * @range: details on the invalidation
>*
>* Block for operations on BOs to finish and mark pages as accessed and
>* potentially dirty.
>*/
> -static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
> -   unsigned long start,
> -   unsigned long end)
> +static bool amdgpu_mn_invalidate_gfx(struct mmu_range_notifier *mrn,
> +  const 

Re: [PATCH v2 02/15] mm/mmu_notifier: add an interval tree notifier

2019-10-29 Thread Kuehling, Felix
I haven't had enough time to fully understand the deferred logic in this 
change. I spotted one problem, see comments inline.

On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
>
> Of the 13 users of mmu_notifiers, 8 of them use only
> invalidate_range_start/end() and immediately intersect the
> mmu_notifier_range with some kind of internal list of VAs.  4 use an
> interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
> of some kind (scif_dma, vhost, gntdev, hmm)
>
> And the remaining 5 either don't use invalidate_range_start() or do some
> special thing with it.
>
> It turns out that building a correct scheme with an interval tree is
> pretty complicated, particularly if the use case is synchronizing against
> another thread doing get_user_pages().  Many of these implementations have
> various subtle and difficult to fix races.
>
> This approach puts the interval tree as common code at the top of the mmu
> notifier call tree and implements a shareable locking scheme.
>
> It includes:
>   - An interval tree tracking VA ranges, with per-range callbacks
>   - A read/write locking scheme for the interval tree that avoids
> sleeping in the notifier path (for OOM killer)
>   - A sequence counter based collision-retry locking scheme to tell
> device page fault that a VA range is being concurrently invalidated.
>
> This is based on various ideas:
> - hmm accumulates invalidated VA ranges and releases them when all
>invalidates are done, via active_invalidate_ranges count.
>This approach avoids having to intersect the interval tree twice (as
>umem_odp does) at the potential cost of a longer device page fault.
>
> - kvm/umem_odp use a sequence counter to drive the collision retry,
>via invalidate_seq
>
> - a deferred work todo list on unlock scheme like RTNL, via deferred_list.
>This makes adding/removing interval tree members more deterministic
>
> - seqlock, except this version makes the seqlock idea multi-holder on the
>write side by protecting it with active_invalidate_ranges and a spinlock
>
> To minimize MM overhead when only the interval tree is being used, the
> entire SRCU and hlist overheads are dropped using some simple
> branches. Similarly the interval tree overhead is dropped when in hlist
> mode.
>
> The overhead from the mandatory spinlock is broadly the same as most of
> existing users which already had a lock (or two) of some sort on the
> invalidation path.
>
> Cc: Andrea Arcangeli 
> Cc: Michal Hocko 
> Acked-by: Christian König 
> Signed-off-by: Jason Gunthorpe 
> ---
>   include/linux/mmu_notifier.h |  98 +++
>   mm/Kconfig   |   1 +
>   mm/mmu_notifier.c| 533 +--
>   3 files changed, 607 insertions(+), 25 deletions(-)
>
[snip]
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 367670cfd02b7b..d02d3c8c223eb7 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
[snip]
>* because mm->mm_users > 0 during mmu_notifier_register and exit_mmap
> @@ -52,17 +286,24 @@ struct mmu_notifier_mm {
>* can't go away from under us as exit_mmap holds an mm_count pin
>* itself.
>*/
> -void __mmu_notifier_release(struct mm_struct *mm)
> +static void mn_hlist_release(struct mmu_notifier_mm *mmn_mm,
> +  struct mm_struct *mm)
>   {
>   struct mmu_notifier *mn;
>   int id;
>   
> + if (mmn_mm->has_interval)
> + mn_itree_release(mmn_mm, mm);
> +
> + if (hlist_empty(&mmn_mm->list))
> + return;

This seems to duplicate the conditions in __mmu_notifier_release. See my 
comments below, I think one of them is wrong. I suspect this one, 
because __mmu_notifier_release follows the same pattern as the other 
notifiers.


> +
>   /*
>* SRCU here will block mmu_notifier_unregister until
>* ->release returns.
>*/
>   id = srcu_read_lock(&srcu);
> - hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist)
> + hlist_for_each_entry_rcu(mn, &mmn_mm->list, hlist)
>   /*
>* If ->release runs before mmu_notifier_unregister it must be
>* handled, as it's the only way for the driver to flush all
> @@ -72,9 +313,9 @@ void __mmu_notifier_release(struct mm_struct *mm)
>   if (mn->ops->release)
>   mn->ops->release(mn, mm);
>   
> - spin_lock(&mm->mmu_notifier_mm->lock);
> - while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
> - mn = hlist_entry(mm->mmu_notifier_mm->list.first,
> + spin_lock(&mmn_mm->lock);
> + while (unlikely(!hlist_empty(&mmn_mm->list))) {
> + mn = hlist_entry(mmn_mm->list.first,
>struct mmu_notifier,
>hlist);
>   /*
> @@ -85,7 +326,7 @@ void __mmu_notifier_release(struct mm_struct *mm)
>*/
>   hlist_del_init_rcu(&mn->hlist);
>   }
> - 

Re: [PATCH v2 12/15] drm/amdgpu: Call find_vma under mmap_sem

2019-10-29 Thread Kuehling, Felix
On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
>
> find_vma() must be called under the mmap_sem, reorganize this code to
> do the vma check after entering the lock.
>
> Further, fix the unlocked use of struct task_struct's mm, instead use
> the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> must be converted to a mm_get before acquiring mmap_sem or calling
> find_vma().
>
> Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in 
> worker threads")
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David (ChunMing) Zhou 
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe 

One question inline to confirm my understanding. Otherwise this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 ++---
>   1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index dff41d0a85fe96..c0e41f1f0c2365 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -35,6 +35,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -788,7 +789,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   struct hmm_mirror *mirror = bo->mn ? &bo->mn->mirror : NULL;
>   struct ttm_tt *ttm = bo->tbo.ttm;
>   struct amdgpu_ttm_tt *gtt = (void *)ttm;
> - struct mm_struct *mm = gtt->usertask->mm;
> + struct mm_struct *mm;
>   unsigned long start = gtt->userptr;
>   struct vm_area_struct *vma;
>   struct hmm_range *range;
> @@ -796,25 +797,14 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   uint64_t *pfns;
>   int r = 0;
>   
> - if (!mm) /* Happens during process shutdown */
> - return -ESRCH;
> -
>   if (unlikely(!mirror)) {
>   DRM_DEBUG_DRIVER("Failed to get hmm_mirror\n");
> - r = -EFAULT;
> - goto out;
> + return -EFAULT;
>   }
>   
> - vma = find_vma(mm, start);
> - if (unlikely(!vma || start < vma->vm_start)) {
> - r = -EFAULT;
> - goto out;
> - }
> - if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> - vma->vm_file)) {
> - r = -EPERM;
> - goto out;
> - }
> + mm = mirror->hmm->mmu_notifier.mm;
> + if (!mmget_not_zero(mm)) /* Happens during process shutdown */

This works because mirror->hmm->mmu_notifier holds an mmgrab reference 
to the mm? So the MM will not just go away, but if the mmget refcount is 
0, it means the mm is marked for destruction and shouldn't be used any more.
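
For context, the grab/get split works roughly like this (a sketch of the
pattern, not the driver code itself):

    /* mmgrab() keeps the struct mm_struct allocation alive (mm_count)
     * but not the address space; mmget_not_zero() additionally pins the
     * address space (mm_users) unless exit_mmap() has already begun. */
    mm = mirror->hmm->mmu_notifier.mm;  /* valid: notifier did mmgrab() */
    if (!mmget_not_zero(mm))            /* no users: mm being torn down */
            return -ESRCH;
    down_read(&mm->mmap_sem);
    /* find_vma()/hmm_range_fault() are safe here */
    up_read(&mm->mmap_sem);
    mmput(mm);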


> + return -ESRCH;
>   
>   range = kzalloc(sizeof(*range), GFP_KERNEL);
>   if (unlikely(!range)) {
> @@ -847,6 +837,17 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>   
>   down_read(&mm->mmap_sem);
> + vma = find_vma(mm, start);
> + if (unlikely(!vma || start < vma->vm_start)) {
> + r = -EFAULT;
> + goto out_unlock;
> + }
> + if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> + vma->vm_file)) {
> + r = -EPERM;
> + goto out_unlock;
> + }
> +
>   r = hmm_range_fault(range, 0);
>   up_read(&mm->mmap_sem);
>   
> @@ -865,15 +866,19 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   }
>   
>   gtt->range = range;
> + mmput(mm);
>   
>   return 0;
>   
> +out_unlock:
> + up_read(&mm->mmap_sem);
>   out_free_pfns:
>   hmm_range_unregister(range);
>   kvfree(pfns);
>   out_free_ranges:
>   kfree(range);
>   out:
> + mmput(mm);
>   return r;
>   }
>   

Re: [PATCH] drm/amdkfd: Delete duplicated queue bit map reservation

2019-10-28 Thread Kuehling, Felix
On 2019-10-24 5:14 p.m., Zhao, Yong wrote:
> The KIQ is on the second MEC and its reservation is covered by the
> later logic, so there is no need to reserve its bit twice.
>
> Change-Id: Ieee390953a60c7d43de5a9aec38803f1f583a4a9
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 
>   1 file changed, 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 8c531793fe17..d3da9dde4ee1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -130,14 +130,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device 
> *adev)
> adev->gfx.mec.queue_bitmap,
> KGD_MAX_QUEUES);
>   
> - /* remove the KIQ bit as well */
> - if (adev->gfx.kiq.ring.sched.ready)
> - clear_bit(amdgpu_gfx_mec_queue_to_bit(adev,
> -   adev->gfx.kiq.ring.me - 1,
> -   adev->gfx.kiq.ring.pipe,
> -   adev->gfx.kiq.ring.queue),
> -   gpu_resources.queue_bitmap);
> -
>   /* According to linux/bitmap.h we shouldn't use bitmap_clear if
>* nbits is not compile time constant
>*/

Re: [PATCH] drm/amdkfd: bug fix for out of bounds mem on gpu cache filling info

2019-10-24 Thread Kuehling, Felix
On 2019-10-24 14:46, Sierra Guiza, Alejandro (Alex) wrote:
> The bitmap in the cu_info structure is defined as a 4x4 array. On
> Arcturus this matrix is initialized as 4x2, based on the 8 shader engines.
> In the GPU cache filling initialization, the bitmap matrix was accessed
> as an 8x1 array instead of 4x2, causing an out-of-bounds memory access.
> Due to this, the number of GPU cache entries was inconsistent.
>
> Signed-off-by: Alex Sierra 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 0c327e0fc0f7..de9f68d5c312 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -710,7 +710,7 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   pcache_info,
>   cu_info,
>   mem_available,
> - cu_info->cu_bitmap[i][j],
> + cu_info->cu_bitmap[i % 4][j + i 
> / 4],
>   ct,
>   cu_processor_id,
>   k);
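
For illustration, the fix folds the 8 linear shader-engine indices into
the 4x4 bitmap as four rows of two columns (index math only; loop bounds
are hypothetical):

    /* linear engine index i in 0..7, bank j:
     *     row = i % 4, column = j + i / 4
     * so i = 0..3 use column j and i = 4..7 use column j + 1,
     * e.g. i = 5, j = 0 maps to cu_bitmap[1][1] */
    bits = cu_info->cu_bitmap[i % 4][j + i / 4];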

Re: [PATCH v2] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Kuehling, Felix
On 2019-10-22 14:28, Yang, Philip wrote:
> If device reset/suspend/resume fails for some reason, the dqm lock is
> held forever and this causes a deadlock. Below is a kernel backtrace from
> an application opening kfd after suspend/resume failed.
>
> Instead of holding dqm lock in pre_reset and releasing dqm lock in
> post_reset, add dqm->device_stopped flag which is modified in
> dqm->ops.start and dqm->ops.stop. The flag doesn't need lock protection
> because write/read are all inside dqm lock.
>
> For HWS case, map_queues_cpsch and unmap_queues_cpsch checks
> device_stopped flag before sending the updated runlist.
>
> v2: For no-HWS case, when device is stopped, don't call
> load/destroy_mqd for eviction, restore and create queue, and avoid
> debugfs dump hdqs.
>
> Backtrace of dqm lock deadlock:
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Suggested-by: Felix Kuehling 
> Signed-off-by: Philip Yang 

Three more comments inline. With those comments addressed, this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  6 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  5 --
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 47 +--
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  1 +
>   4 files changed, 46 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -

Is this part of the change still needed? I remember that this sequence 
was a bit tricky with some potential race condition when Shaoyun was 
working on it. This may have unintended side effects.


>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 8f4b24e84964..4fa8834ce7cb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -730,9 +730,6 @@ int kgd2kfd_pre_reset(struct kfd_dev *kfd)
>   return 0;
>   kgd2kfd_suspend(kfd);
>   
> - /* hold dqm->lock to prevent further execution*/
> - dqm_lock(kfd->dqm);
> -
>   kfd_signal_reset_event(kfd);
>   return 0;
>   }
> @@ -750,8 +747,6 @@ int kgd2kfd_post_reset(struct kfd_dev *kfd)
>   if (!kfd->init_complete)
>   return 0;
>   
> - dqm_unlock(kfd->dqm);
> -
>   ret = kfd_resume(kfd);
>   if (ret)
>   return ret;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 81fb545cf42c..82e1c6280d13 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -340,6 +340,10 @@ static int 

Re: [PATCH] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Kuehling, Felix
On 2019-10-21 22:02, Zeng, Oak wrote:
> If we decline the queue creation request in the suspended state by returning 
> -EAGAIN, then this approach works for both HWS and non-HWS. This way the 
> driver is clean, but the application needs to re-create the queue later when 
> it gets an EAGAIN. Currently the application is not aware of the 
> suspend/resume state, so it is hard for it to know when to re-create the queue.
>
> The main benefit of allowing queue creation in the suspended state is that it 
> is easier for the application writer, but there is no actual performance 
> gain, as no task will be executed while suspended.

We should not need to prevent queue creation while suspended. The 
processes are suspended. That means new queues will be created in 
evicted state:

     /*
  * Eviction state logic: mark all queues as evicted, even ones
  * not currently active. Restoring inactive queues later only
  * updates the is_evicted flag but is a no-op otherwise.
  */
     q->properties.is_evicted = !!qpd->evicted;

mqd_mgr->load_mqd will only be called for active queues. So even in the 
non-HWS case we should not be touching the HW while suspended. But I'd 
like to see some safeguards in place to make sure those assumptions are 
never violated.
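
Such a safeguard could be as simple as the following sketch (placement and
error code are illustrative, not from a posted patch):

    /* in any path that would touch the HQDs, e.g. before load_mqd: */
    if (WARN_ON_ONCE(!dqm->sched_running))
            return -EBUSY;  /* device is suspended or in reset */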

Regards,
   Felix


>
> Regards,
> Oak
>
> -Original Message-----
> From: amd-gfx  On Behalf Of Kuehling, 
> Felix
> Sent: Monday, October 21, 2019 9:04 PM
> To: Yang, Philip ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: don't use dqm lock during device 
> reset/suspend/resume
>
>
> On 2019-10-21 5:04 p.m., Yang, Philip wrote:
>> If device reset/suspend/resume fails for some reason, the dqm lock is
>> held forever and this causes a deadlock. Below is a kernel backtrace
>> from an application opening kfd after suspend/resume failed.
>>
>> Instead of holding dqm lock in pre_reset and releasing dqm lock in
>> post_reset, add dqm->device_stopped flag which is modified in
>> dqm->ops.start and dqm->ops.stop. The flag doesn't need lock
>> dqm->protection
>> because write/read are all inside dqm lock.
>>
>> For HWS case, map_queues_cpsch and unmap_queues_cpsch checks
>> device_stopped flag before sending the updated runlist.
> What about the non-HWS case?
>
> In theory in non-HWS case new queues should be created in evicted state while 
> the device (and all processes) are suspended. So we should never try to map 
> or unmap queues to HQDs during suspend. But I'd feel better with a WARN_ON 
> and error return in the right places to make sure we're not missing anything. 
> Basically, we can't call any of the load/destroy_mqd functions while 
> suspended.
>
> That reminds me, we also have to add some checks in the debugfs code to avoid 
> dumping HQDs of a DQM that's stopped.
>
> Last comment: dqm->device_stopped must be initialized as true. It will get 
> set to false when the device is first started. It may be easier to reverse 
> the logic, something like dqm->sched_running that gets implicitly initialized 
> as false.
>
> Regards,
>     Felix
>
>
>> Backtrace of dqm lock deadlock:
>>
>> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
>> than 120 seconds.
>> [Thu Oct 17 16:43:37 2019]   Not tainted
>> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1 [Thu Oct 17 16:43:37
>> 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
>> 0x8000
>> [Thu Oct 17 16:43:37 2019] Call Trace:
>> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0 [Thu Oct 17
>> 16:43:37 2019]  schedule+0x32/0x70 [Thu Oct 17 16:43:37 2019]
>> schedule_preempt_disabled+0xa/0x10
>> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0 [Thu Oct
>> 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0 [Thu Oct 17 16:43:37
>> 2019]  ? process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]
>> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu] [Thu Oct 17
>> 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0 [Thu Oct
>> 17 16:43:37 2019]  exit_mmap+0x160/0x1a0 [Thu Oct 17 16:43:37 2019]  ?
>> __handle_mm_fault+0xba3/0x1200 [Thu Oct 17 16:43:37 2019]  ?
>> exit_robust_list+0x5a/0x110 [Thu Oct 17 16:43:37 2019]
>> mmput+0x4a/0x120 [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20 [Thu
>> Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200 [Thu Oct 17
>> 16:

Re: [PATCH] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-21 Thread Kuehling, Felix

On 2019-10-21 5:04 p.m., Yang, Philip wrote:
> If device reset/suspend/resume fails for some reason, the dqm lock is
> held forever and this causes a deadlock. Below is a kernel backtrace from
> an application opening kfd after suspend/resume failed.
>
> Instead of holding the dqm lock in pre_reset and releasing it in
> post_reset, add a dqm->device_stopped flag which is modified in
> dqm->ops.start and dqm->ops.stop. The flag doesn't need lock protection
> because all writes/reads happen inside the dqm lock.
>
> For the HWS case, map_queues_cpsch and unmap_queues_cpsch check the
> device_stopped flag before sending the updated runlist.

What about the non-HWS case?

In theory in non-HWS case new queues should be created in evicted state 
while the device (and all processes) are suspended. So we should never 
try to map or unmap queues to HQDs during suspend. But I'd feel better 
with a WARN_ON and error return in the right places to make sure we're 
not missing anything. Basically, we can't call any of the 
load/destroy_mqd functions while suspended.

That reminds me, we also have to add some checks in the debugfs code to 
avoid dumping HQDs of a DQM that's stopped.

Last comment: dqm->device_stopped must be initialized as true. It will 
get set to false when the device is first started. It may be easier to 
reverse the logic, something like dqm->sched_running that gets 
implicitly initialized as false.
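
A rough sketch of that reversed-logic flag (illustrative; field placement and
error handling are my assumptions, not the actual patch):

	/* dqm->sched_running: implicitly false in zeroed memory, toggled
	 * only in dqm->ops.start/stop, read and written under the dqm lock.
	 */
	static int map_queues_cpsch(struct device_queue_manager *dqm)
	{
		if (!dqm->sched_running)
			return 0;	/* stopped: runlist update deferred to start() */
		/* ... build and send the updated runlist ... */
		return 0;
	}

	/* and in the load/destroy_mqd paths, which must never run while suspended: */
	if (WARN_ON(!dqm->sched_running))
		return -EIO;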

Regards,
   Felix


>
> Backtrace of dqm lock deadlock:
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Suggested-by: Felix Kuehling 
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c|  6 +++---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c |  5 -
>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   | 13 ++---
>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h   |  1 +
>   4 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 8f4b24e84964..4fa8834ce7cb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -730,9 +730,6 @@ int kgd2kfd_pre_reset(struct kfd_dev *kfd)
>   return 0;
>   kgd2kfd_suspend(kfd);
>   
> - /* hold dqm->lock to prevent further execution*/
> - dqm_lock(kfd->dqm);
> -
>   kfd_signal_reset_event(kfd);
>   return 0;
>   }
> @@ -750,8 +747,6 @@ int kgd2kfd_post_reset(struct kfd_dev *kfd)
>   if (!kfd->init_complete)
>   return 0;
>   
> - dqm_unlock(kfd->dqm);
> -
>   ret = 

Re: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-21 Thread Kuehling, Felix
On 2019-10-18 6:54 p.m., Zeng, Oak wrote:
> In the current implementation, even when dqm is stopped, the user can still
> create (and start) a new queue. This is not correct. We should forbid
> creating/starting new queues while dqm is stopped - stopped means the
> currently executing queues are halted and no new creation requests are
> accepted.

Queues being stopped should be no reason not to create new queues. 
Creating a new queue is just allocating a new MQD and populating it. If 
the process is suspended (which it is during reset and suspend and 
evictions), there is no need to touch hardware registers or send an 
updated runlist to the HWS.

When the process is resumed at the end of the reset/suspend/eviction, 
that's when any newly created queues would get mapped to the hardware.
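
A sketch of what that means for the create path (the helper names here are
made up for illustration, not the real code):

	static int create_queue_cpsch(struct device_queue_manager *dqm,
				      struct queue *q,
				      struct qcm_process_device *qpd)
	{
		int retval;

		retval = allocate_and_init_mqd(dqm, q);	/* CPU memory only, no HW */
		if (retval)
			return retval;

		if (!qpd->evicted)	/* evicted queues get mapped later, on restore */
			retval = execute_queues_cpsch(dqm);
		return retval;
	}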

Regards,
   Felix


>
> Regards,
> Oak
>
> -Original Message-
> From: amd-gfx  On Behalf Of Kuehling, 
> Felix
> Sent: Friday, October 18, 2019 3:08 PM
> To: amd-gfx@lists.freedesktop.org; Yang, Philip 
> Subject: Re: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked
>
> On 2019-10-18 1:36 p.m., Yang, Philip wrote:
>> If the device is locked for suspend and resume, kfd open should return
>> -EAGAIN without creating a process; otherwise the application's exit path,
>> which releases the process, will hang waiting for resume to finish if
>> suspend/resume is stuck somewhere. This is the backtrace:
>>
>> v2: fix processes that were created before suspend/resume got stuck
>>
>> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
>> than 120 seconds.
>> [Thu Oct 17 16:43:37 2019]   Not tainted
>> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
>> [Thu Oct 17 16:43:37 2019] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
>> 0x8000
>> [Thu Oct 17 16:43:37 2019] Call Trace:
>> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
>> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
>> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
>> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
>> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
>> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]
>> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
>> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
>> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
>> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
>> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
>> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
>> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
>> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
>> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
>> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
>> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: Philip Yang 
>> ---
>>drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 6 +++---
>>drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 6 ++
>>2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index d9e36dbf13d5..40d75c39f08e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
>> *filep)
>>  return -EPERM;
>>  }
>>
>> +if (kfd_is_locked())
>> +return -EAGAIN;
>> +
>>  process = kfd_create_process(filep);
>>  if (IS_ERR(process))
>>  return PTR_ERR(process);
>>
>> -if (kfd_is_locked())
>> -return -EAGAIN;
>> -
>>  dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>>  process->pasid, process->is_32bit_user_mode);
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 8509814a6ff0..3784013b92a0 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_ma

Re: Stack out of bounds in KFD on Arcturus

2019-10-18 Thread Kuehling, Felix
On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I'm aware of. Is there a special Kconfig flag to determine stack
> size?

I remember there used to be a Kconfig option to force a 4KB kernel 
stack. I don't see it in the current kernel any more.

I don't have time to work on this myself. I'll create a ticket and see 
if I can find someone to investigate.

Thanks,
   Felix


>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't see
>> any excessive allocations on the stack either. Also the code involved
>> here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it
>> specific to a compiler version or some optimization flags? I've
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>>  Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> Hey Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>
>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart
>>> [  103.610769]
>>> ==
>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.611646] Read of size 4 at addr 8883cb19ee38 by task modprobe/1122
>>>
>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G
>>> O  5.3.0-rc3+ #45
>>> [  103.611847] Hardware name: System manufacturer System Product
>>> Name/Z170-PRO, BIOS 1902 06/27/2016
>>> [  103.611856] Call Trace:
>>> [  103.611879]  dump_stack+0x71/0xab
>>> [  103.611907]  print_address_description+0x1da/0x3c0
>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.612479]  __kasan_report+0x13f/0x1a0
>>> [  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613604]  kasan_report+0xe/0x20
>>> [  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
>>> [  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.614898]  ? kmalloc_order+0x63/0x70
>>> [  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
>>> [  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
>>> [  103.616095]  ? up_write+0x4b/0x70
>>> [  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
>>> [  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
>>> [  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
>>> [  103.61]  ? mutex_lock_io_nested+0xac0/0xac0
>>> [  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617877]  ? wait_for_completion+0x200/0x200
>>> [  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
>>> [  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
>>> [  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
>>> [  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
>>> [  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
>>> [  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
>>> [  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
>>> [  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
>>> [  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
>>> [  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
>>> [  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
>>> [  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
>>> [  103.623842]  ? __isolate_free_page+0x290/0x290
>>> [  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
>>> [  103.623970]  ? kmalloc_order+0x63/0x70
>>> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
>>> [  103.624768]  ? __kasan_slab_free+0x133/0x160
>>> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
>>> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>> [  103.625565]  ? amdgpu_p

Re: [PATCH] drm/amdgpu: revert calling smu msg in df callbacks

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 4:29 p.m., Kim, Jonathan wrote:
> reverting the following changes:
> commit 7dd2eb31fcd5 ("drm/amdgpu: fix compiler warnings for df perfmons")
> commit 54275cd1649f ("drm/amdgpu: disable c-states on xgmi perfmons")
>
> perf events use spin-locks. Embedded smu messages have potentially long
> response times and can deadlock the system.
>
> Change-Id: Ic36c35a62dec116d0a2f5b69c22af4d414458679
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 

See one more comment inline below ...


> ---
>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 38 ++--
>   1 file changed, 2 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
> b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index e1cf7e9c616a..16fbd2bc8ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
>   NULL
>   };
>   
> -static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
> -{
> - int r = 0;
> -
> - if (is_support_sw_smu(adev)) {
> - r = smu_set_df_cstate(&adev->smu, allow);
> - } else if (adev->powerplay.pp_funcs
> - && adev->powerplay.pp_funcs->set_df_cstate) {
> - r = adev->powerplay.pp_funcs->set_df_cstate(
> - adev->powerplay.pp_handle, allow);
> - }
> -
> - return r;
> -}
> -
>   static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
>    uint32_t ficaa_val)
>   {
> @@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return 0x;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> -
>   return (((ficadh_val & 0x) << 32) | ficadl_val);
>   }
>   
> @@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
>   WREG32(data, ficadh_val);
> - spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   }
>   
>   /*
> @@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   *lo_val = RREG32(data);
>   WREG32(address, hi_addr);
>   *hi_val = RREG32(data);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /*
> @@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
> *adev, uint32_t lo_addr,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   WREG32(data, lo_val);
>   WREG32(address, hi_addr);
>   WREG32(data, hi_val);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /* get the number of df counters available */
> @@ -546,7 +512,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
> *adev,
> uint64_t config,
> uint64_t *count)
>   {
> - uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
> + uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;

This part looks like it was unrelated to the DF Cstate changes. If this 
addressed a real problem, maybe it can be reintroduced with 

Re: [PATCH 2/2] Revert "drm/amdgpu: disable c-states on xgmi perfmons"

2019-10-18 Thread Kuehling, Felix
You can squash the two reverts into a single commit so you avoid 
reintroducing a broken intermediate state. Mention both reverted commits 
in the squashed commit description. Checkpatch.pl prefers a different 
format for quoting reverted commits. Run checkpatch.pl on your commit to 
see a proper example.
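
For instance, the quoting format checkpatch.pl expects is the abbreviated
SHA plus the subject in parentheses, so the squashed commit message could
say something like:

	This reverts commit 54275cd1649f ("drm/amdgpu: disable c-states on
	xgmi perfmons") and commit 7dd2eb31fcd5 ("drm/amdgpu: fix compiler
	warnings for df perfmons").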

Regards,
   Felix


On 2019-10-18 1:59 p.m., Kim, Jonathan wrote:
> This reverts commit 54275cd1649f4034c6450b6c5a8358fcd4f7dda6.
>
> incomplete solution to df c-state race condition.  smu msg in perf events
> causes deadlock.
>
> Change-Id: Ia85179df2bd167657e42a2d828c4a7c475c392ff
> Signed-off-by: Jonathan Kim 
> ---
>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 36 +---
>   1 file changed, 1 insertion(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
> b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index f403c62c944e..16fbd2bc8ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
>   NULL
>   };
>   
> -static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
> -{
> - int r = 0;
> -
> - if (is_support_sw_smu(adev)) {
> - r = smu_set_df_cstate(&adev->smu, allow);
> - } else if (adev->powerplay.pp_funcs
> - && adev->powerplay.pp_funcs->set_df_cstate) {
> - r = adev->powerplay.pp_funcs->set_df_cstate(
> - adev->powerplay.pp_handle, allow);
> - }
> -
> - return r;
> -}
> -
>   static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
>    uint32_t ficaa_val)
>   {
> @@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return 0x;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> -
>   return (((ficadh_val & 0x) << 32) | ficadl_val);
>   }
>   
> @@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
>   WREG32(data, ficadh_val);
> - spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   }
>   
>   /*
> @@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   *lo_val = RREG32(data);
>   WREG32(address, hi_addr);
>   *hi_val = RREG32(data);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /*
> @@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
> *adev, uint32_t lo_addr,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   WREG32(data, lo_val);
>   WREG32(address, hi_addr);
>   WREG32(data, hi_val);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /* get the number of df counters available */

Re: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 1:36 p.m., Yang, Philip wrote:
> If the device is locked for suspend and resume, kfd open should return
> -EAGAIN without creating a process; otherwise the application's exit path,
> which releases the process, will hang waiting for resume to finish if
> suspend/resume is stuck somewhere. This is the backtrace:
>
> v2: fix processes that were created before suspend/resume got stuck
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 6 +++---
>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 6 ++
>   2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 8509814a6ff0..3784013b92a0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -128,6 +128,12 @@ void kfd_process_dequeue_from_all_devices(struct 
> kfd_process *p)
>   {
>   struct kfd_process_device *pdd;
>   
> + /* If suspend/resume got stuck, dqm_lock is held,
> +  * skip process_termination_cpsch to avoid deadlock
> +  */
> + if (kfd_is_locked())
> + return;
> +

Holding the DQM lock during reset has caused other problems (lock 
dependency issues and deadlocks) and I was thinking about getting rid of 
that completely. The intention of holding the DQM lock during reset was 
to prevent the device queue manager from accessing the CP hardware while 
a reset was in progress. However, I think there are smarter ways to 
achieve that. We already get a pre-reset callback (kgd2kfd_pre_reset) 
that executes the kgd2kfd_suspend, which suspends processes and stops 
DQM through kfd->dqm->ops.stop(kfd->dqm). This should take care of most 
of the problem. If there are any places in DQM that try to access the 
devices, they should add conditions to not access HW while DQM is 
stopped. Then we could avoid holding a lock indefinitely while a reset 
is in progress.

The DQM lock is particularly problematic in terms of lock dependencies 
because it can be taken in MMU notifiers. We want to avoid taking any 
other locks while holding the DQM lock. Holding the DQM lock for a long 
time during reset is counterproductive to that objective.
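
In code terms, the suggested conditions could look like this sketch (the
flag name is hypothetical):

	dqm_lock(dqm);
	if (!dqm->sched_running) {	/* stopped by kgd2kfd_pre_reset/suspend */
		dqm_unlock(dqm);
		return 0;		/* no HW access while a reset is in progress */
	}
	retval = unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
	dqm_unlock(dqm);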

Regards,
   Felix


>   list_for_each_entry(pdd, &p->per_device_data, per_device_list)
>   kfd_process_dequeue_from_device(pdd);
>   }

Re: [PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 10:27 a.m., Yang, Philip wrote:
> If the device is locked for suspend and resume, kfd open should return
> -EAGAIN without creating a process; otherwise the application's exit path,
> which releases the process, will hang waiting for resume to finish if
> suspend/resume is stuck somewhere. This is the backtrace:

This doesn't fix processes that were created before suspend/resume got 
stuck. They would still get stuck with the same backtrace. So this is 
just a band-aid. The real underlying problem, which is not getting 
addressed, is suspend/resume getting stuck.

Am I missing something?

Regards,
   Felix


>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   

Re: Stack out of bounds in KFD on Arcturus

2019-10-17 Thread Kuehling, Felix
I don't see why this problem would be specific to Arcturus. I don't see 
any excessive allocations on the stack either. Also the code involved 
here hasn't changed recently.

Are you using some weird kernel config with a smaller stack? Is it 
specific to a compiler version or some optimization flags? I've 
sometimes seen function inlining cause excessive stack usage.
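
As a toy illustration of that inlining effect (unrelated to the KFD code
itself):

#include <string.h>

static inline void fill(char *out)
{
	char tmp[1024];			/* 1 KB of locals per call */

	memset(tmp, 0, sizeof(tmp));
	*out = tmp[0];
}

void probe(char *out)
{
	fill(out);	/* once both calls are inlined, the compiler may keep */
	fill(out + 1);	/* both tmp[] buffers live in probe()'s single frame  */
}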

Regards,
   Felix

On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
> Hey Felix - I see this on boot when working with Arcturus.
>
> Andrey
>
>
> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart
> [  103.610769]
> ==
> [  103.611469] BUG: KASAN: stack-out-of-bounds in
> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
> [  103.611646] Read of size 4 at addr 8883cb19ee38 by task modprobe/1122
>
> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G
> O  5.3.0-rc3+ #45
> [  103.611847] Hardware name: System manufacturer System Product
> Name/Z170-PRO, BIOS 1902 06/27/2016
> [  103.611856] Call Trace:
> [  103.611879]  dump_stack+0x71/0xab
> [  103.611907]  print_address_description+0x1da/0x3c0
> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
> [  103.612479]  __kasan_report+0x13f/0x1a0
> [  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
> [  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
> [  103.613604]  kasan_report+0xe/0x20
> [  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
> [  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
> [  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
> [  103.614898]  ? kmalloc_order+0x63/0x70
> [  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
> [  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
> [  103.616095]  ? up_write+0x4b/0x70
> [  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
> [  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
> [  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
> [  103.61]  ? mutex_lock_io_nested+0xac0/0xac0
> [  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
> [  103.617877]  ? wait_for_completion+0x200/0x200
> [  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
> [  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
> [  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
> [  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
> [  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
> [  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
> [  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
> [  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
> [  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
> [  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
> [  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
> [  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
> [  103.623842]  ? __isolate_free_page+0x290/0x290
> [  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
> [  103.623970]  ? kmalloc_order+0x63/0x70
> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
> [  103.624768]  ? __kasan_slab_free+0x133/0x160
> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
> [  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
> [  103.625580]  local_pci_probe+0x74/0xd0
> [  103.625603]  pci_device_probe+0x1fa/0x310
> [  103.625620]  ? pci_device_remove+0x1c0/0x1c0
> [  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
> [  103.625673]  really_probe+0x367/0x5d0
> [  103.625700]  driver_probe_device+0x177/0x1b0
> [  103.625721]  device_driver_attach+0x8a/0x90
> [  103.625737]  ? device_driver_attach+0x90/0x90
> [  103.625746]  __driver_attach+0xeb/0x190
> [  103.625765]  ? device_driver_attach+0x90/0x90
> [  103.625773]  bus_for_each_dev+0xe4/0x160
> [  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
> [  103.625829]  bus_add_driver+0x277/0x330
> [  103.625855]  driver_register+0xc6/0x1a0
> [  103.625866]  ? 0xa0d88000
> [  103.625880]  do_one_initcall+0xd3/0x334
> [  103.625895]  ? trace_event_raw_event_initcall_finish+0x150/0x150
> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40
> [  103.625924]  ? __kasan_kmalloc+0xd5/0xf0
> [  103.625946]  ? kmem_cache_alloc_trace+0x154/0x300
> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40
> [  103.625985]  do_init_module+0xec/0x354
> [  103.626011]  load_module+0x3c91/0x4980
> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
> [  103.626132]  ? ima_read_file+0x10/0x10
> [  103.626142] 

Re: [PATCH v3] drm/amdgpu: user pages array memory leak fix

2019-10-11 Thread Kuehling, Felix
On 2019-10-11 10:36 a.m., Yang, Philip wrote:
> The user_pages array should always be freed after validation, regardless of
> whether the user pages changed after the bo was created, because with the
> HMM change, parsing the bo always allocates a user_pages array to get user
> pages for a userptr bo.
>
> v2: remove unused local variable and amend commit
>
> v3: add back get user pages in gem_userptr_ioctl, to detect the application
> bug where a userptr VMA is not anonymous memory and reject it.
>
> Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962
>
> Signed-off-by: Philip Yang 
> Tested-by: Joe Barnett 
> Reviewed-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index c18a153b3d2a..e7b39daa22f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -476,7 +476,6 @@ static int amdgpu_cs_list_validate(struct 
> amdgpu_cs_parser *p,
>   
>   list_for_each_entry(lobj, validated, tv.head) {
>   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(lobj->tv.bo);
> - bool binding_userptr = false;
>   struct mm_struct *usermm;
>   
>   usermm = amdgpu_ttm_tt_get_usermm(bo->tbo.ttm);
> @@ -493,14 +492,13 @@ static int amdgpu_cs_list_validate(struct 
> amdgpu_cs_parser *p,
>   
>   amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm,
>lobj->user_pages);
> - binding_userptr = true;
>   }
>   
>   r = amdgpu_cs_validate(p, bo);
>   if (r)
>   return r;
>   
> - if (binding_userptr) {
> + if (lobj->user_pages) {

This if is not needed. kvfree should be able to handle NULL pointers, 
and unconditionally setting the pointer to NULL afterwards is not a 
problem either. With that fixed, this commit is

Reviewed-by: Felix Kuehling 

However, I don't think this should be the final solution. My concern 
with this solution is that you end up freeing and regenerating the 
user_pages arrays more frequently than necessary: on every command 
submission, even if there was no MMU notifier since the last command 
submission. I was hoping we could get back to a solution where we can 
maintain the same user_pages array across command submissions, since MMU 
notifiers are rare. That should reduce the overhead of doing all those 
page-table walks in HMM on every command submission when using userptrs.
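
A hedged sketch of such a scheme, using an assumed per-BO flag that the MMU
notifier would set (bo->notifier_dirty is not existing amdgpu code):

	/* only redo the HMM page-table walk after an actual invalidation;
	 * bo->notifier_dirty is hypothetical, set from the MMU notifier
	 */
	if (bo->notifier_dirty) {
		r = amdgpu_ttm_tt_get_user_pages(bo, lobj->user_pages);
		if (r)
			return r;
		bo->notifier_dirty = false;
	}
	/* otherwise reuse the cached user_pages array from the last CS */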

Regards,
   Felix


>   kvfree(lobj->user_pages);
>   lobj->user_pages = NULL;
>   }

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-09 Thread Kuehling, Felix
On 2019-10-09 11:34, Daniel Vetter wrote:
> On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote:
>> On 2019-10-09 6:31, Daniel Vetter wrote:
>>> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
>>>> The description sounds reasonable to me and maps well to the CU masking
>>>> feature in our GPUs.
>>>>
>>>> It would also allow us to do more coarse-grained masking for example to
>>>> guarantee balanced allocation of CUs across shader engines or
>>>> partitioning of memory bandwidth or CP pipes (if that is supported by
>>>> the hardware/firmware).
>>> Hm, so this sounds like the definition for how this cgroup is supposed to
>>> work is "amd CU masking" (whatever that exactly is). And the abstract
>>> description is just prettification on top, but not actually the real
>>> definition you guys want.
>> I think you're reading this as the opposite of what I was trying to say.
>> Using CU masking is one possible implementation of LGPUs on AMD
>> hardware. It's the one that Kenny implemented at the end of this patch
>> series, and I pointed out some problems with that approach. Other ways
>> to partition the hardware into LGPUs are conceivable. For example we're
>> considering splitting it along the lines of shader engines, which is
>> more coarse-grain and would also affect memory bandwidth available to
>> each partition.
> If this is supposed to be useful for admins then "other ways to partition
> the hw are conceivable" is the problem. This should be unique for
> admins/end-users. Reading the implementation details and realizing that
> the actual meaning is "amd CU masking" isn't good enough by far, since
> that's meaningless on any other hw.
>
> And if there's other ways to implement this cgroup for amd, it's also
> meaningless (to sysadmins/users) for amd hw.
>
>> We could also consider partitioning pipes in our command processor,
>> although that is not supported by our current CP scheduler firmware.
>>
>> The bottom line is, the LGPU model proposed by Kenny is quite abstract
>> and allows drivers implementing it a lot of flexibility depending on the
>> capability of their hardware and firmware. We haven't settled on a final
>> implementation choice even for AMD.
> That abstract model of essentially "anything goes" is the problem here
> imo. E.g. for cpu cgroups this would be similar to allowing the bitmasks to
> mean "cpu core" on one machine "physical die" on the next and maybe
> "hyperthread unit" on the 3rd. Useless for admins.
>
> So if we have a gpu bitmask thing that might mean a command submission pipe
> on one hw (maybe matching what vk exposed, maybe not), some compute unit
> mask on the next and something entirely different (e.g. intel has so
> called GT slices with compute cores + more stuff around) on the 3rd vendor
> then that's not useful for admins.

The goal is to partition GPU compute resources to eliminate as much 
resource contention as possible between different partitions. Different 
hardware will have different capabilities to implement this. No 
implementation will be perfect. For example, even with CPU cores that 
are supposedly well defined, you can still have different behaviours 
depending on CPU cache architectures, NUMA and thermal management across 
CPU cores. The admin will need some knowledge of their hardware 
architecture to understand those effects that are not described by the 
abstract model of cgroups.

The LGPU model is deliberately flexible, because GPU architectures are 
much less standardized than CPU architectures. Expecting a common model 
that is both very specific and applicable to all GPUs is unrealistic, 
in my opinion.

Regards,
   Felix


> -Daniel
>
>> Regards,
>>     Felix
>>
>>
>>> I think adding a cgroup which is that much depending upon the hw
>>> implementation of the first driver supporting it is not a good idea.
>>> -Daniel
>>>
>>>> I can't comment on the code as I'm unfamiliar with the details of the
>>>> cgroup code.
>>>>
>>>> Acked-by: Felix Kuehling 
>>>>
>>>>
>>>>> ---
>>>>> Documentation/admin-guide/cgroup-v2.rst |  46 
>>>>> include/drm/drm_cgroup.h|   4 +
>>>>> include/linux/cgroup_drm.h  |   6 ++
>>>>> kernel/cgroup/drm.c | 135 
>>>>> 4 files changed, 191 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/adm

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-09 Thread Kuehling, Felix
On 2019-10-09 6:31, Daniel Vetter wrote:
> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
>>
>> The description sounds reasonable to me and maps well to the CU masking
>> feature in our GPUs.
>>
>> It would also allow us to do more coarse-grained masking for example to
>> guarantee balanced allocation of CUs across shader engines or
>> partitioning of memory bandwidth or CP pipes (if that is supported by
>> the hardware/firmware).
> Hm, so this sounds like the definition for how this cgroup is supposed to
> work is "amd CU masking" (whatever that exactly is). And the abstract
> description is just prettification on top, but not actually the real
> definition you guys want.

I think you're reading this as the opposite of what I was trying to say. 
Using CU masking is one possible implementation of LGPUs on AMD 
hardware. It's the one that Kenny implemented at the end of this patch 
series, and I pointed out some problems with that approach. Other ways 
to partition the hardware into LGPUs are conceivable. For example we're 
considering splitting it along the lines of shader engines, which is 
more coarse-grain and would also affect memory bandwidth available to 
each partition.

We could also consider partitioning pipes in our command processor, 
although that is not supported by our current CP scheduler firmware.

The bottom line is, the LGPU model proposed by Kenny is quite abstract 
and allows drivers implementing it a lot of flexibility depending on the 
capability of their hardware and firmware. We haven't settled on a final 
implementation choice even for AMD.

Regards,
   Felix


>
> I think adding a cgroup which is that much depending upon the hw
> implementation of the first driver supporting it is not a good idea.
> -Daniel
>
>> I can't comment on the code as I'm unfamiliar with the details of the
>> cgroup code.
>>
>> Acked-by: Felix Kuehling 
>>
>>
>>> ---
>>>Documentation/admin-guide/cgroup-v2.rst |  46 
>>>include/drm/drm_cgroup.h|   4 +
>>>include/linux/cgroup_drm.h  |   6 ++
>>>kernel/cgroup/drm.c | 135 
>>>4 files changed, 191 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst 
>>> b/Documentation/admin-guide/cgroup-v2.rst
>>> index 87a195133eaa..57f18469bd76 100644
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1958,6 +1958,52 @@ DRM Interface Files
>>> Set largest allocation for /dev/dri/card1 to 4MB
>>> echo "226:1 4m" > drm.buffer.peak.max
>>>
>>> +  drm.lgpu
>>> +   A read-write nested-keyed file which exists on all cgroups.
>>> +   Each entry is keyed by the DRM device's major:minor.
>>> +
>>> +   lgpu stands for logical GPU, it is an abstraction used to
>>> +   subdivide a physical DRM device for the purpose of resource
>>> +   management.
>>> +
>>> +   The lgpu is a discrete quantity that is device specific (i.e.
>>> +   some DRM devices may have 64 lgpus while others may have 100
>>> +   lgpus.)  The lgpu is a single quantity with two representations
>>> +   denoted by the following nested keys.
>>> +
>>> + ===== ==========================================
>>> + count Representing lgpu as anonymous resource
>>> + list  Representing lgpu as named resource
>>> + ===== ==========================================
>>> +
>>> +   For example:
>>> +   226:0 count=256 list=0-255
>>> +   226:1 count=4 list=0,2,4,6
>>> +   226:2 count=32 list=32-63
>>> +
>>> +   lgpu is represented by a bitmap and uses the bitmap_parselist
>>> +   kernel function so the list key input format is a
>>> +   comma-separated list of decimal numbers and ranges.
>>> +
>>> +   Consecutively set bits are shown as two hyphen-separated decimal
>>> +   numbers, the smallest and largest bit numbers set in the range.
>>> +   Optionally each range can be postfixed to denote that only parts
>>> +   of it should be set.  The range will be divided into groups of a
>>> +   specific size.
>>> +   Syntax: range:used_size/group_size
>>> +   Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>> +
>>> +   The count key is the hamming weight / hweight of the bitmap.
>>> +
>>> +   Both count and list accept the max and default keywords.
>>> +
>>> +   Some DRM 

Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup

2019-10-08 Thread Kuehling, Felix
On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> The number of logical gpu (lgpu) is defined to be the number of compute
> unit (CU) for a device.  The lgpu allocation limit only applies to
> compute workload for the moment (enforced via kfd queue creation.)  Any
> cu_mask update is validated against the availability of the compute unit
> as defined by the drmcg the kfd process belongs to.

There is something missing here. There is an API for the application to 
specify a CU mask. Right now it looks like the application-specified and 
CGroup-specified CU masks would clobber each other. Instead the two 
should be merged.

The CGroup-specified mask should specify a subset of CUs available for 
application-specified CU masks. When the cgroup CU mask changes, you'd 
need to take any application-specified CU masks into account before 
updating the hardware.
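
Merging could be as simple as ANDing the two bitmaps before the MQD update
(a sketch; the variable names are illustrative, not KFD code):

	/* effective mask = application's request restricted to the cgroup's CUs */
	for (i = 0; i < cu_mask_word_count; i++)
		effective_cu_mask[i] = app_cu_mask[i] & cgroup_cu_mask[i];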

The KFD topology APIs report the number of available CUs to the 
application. CGroups would change that number at runtime and 
applications would not expect that. I think the best way to deal with 
that would be to have multiple bits in the application-specified CU mask 
map to the same CU. How to do that in a fair way is not obvious. I guess 
a more coarse-grain division of the GPU into LGPUs would make this 
somewhat easier.

How is this problem handled for CPU cores and the interaction with CPU 
pthread_setaffinity_np?

Regards,
   Felix


>
> Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> Signed-off-by: Kenny Ho 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  21 +++
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   6 +
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
>   .../amd/amdkfd/kfd_process_queue_manager.c| 140 ++
>   5 files changed, 174 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 55cb1b2094fd..369915337213 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
> *dst, struct kgd_dev *s
>   valid;  \
>   })
>   
> +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> + struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> + unsigned int nbits);
> +
>   /* GPUVM API */
>   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
> pasid,
>   void **vm, void **process_info,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 163a4fbf0611..8abeffdd2e5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device 
> *dev, unsigned int pipe,
>   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
>   struct drmcg_props *props)
>   {
> + struct amdgpu_device *adev = dev->dev_private;
> +
> + props->lgpu_capacity = adev->gfx.cu_info.number;
> +
>   props->limit_enforced = true;
>   }
>   
> +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> + struct task_struct *task, struct drmcg_device_resource *ddr,
> + enum drmcg_res_type res_type)
> +{
> + struct amdgpu_device *adev = dev->dev_private;
> +
> + switch (res_type) {
> + case DRMCG_TYPE_LGPU:
> + amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> +ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> + break;
> + default:
> + break;
> + }
> +}
> +
>   static struct drm_driver kms_driver = {
>   .driver_features =
>   DRIVER_USE_AGP | DRIVER_ATOMIC |
> @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
>   .gem_prime_mmap = amdgpu_gem_prime_mmap,
>   
>   .drmcg_custom_init = amdgpu_drmcg_custom_init,
> + .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
>   
>   .name = DRIVER_NAME,
>   .desc = DRIVER_DESC,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 138c70454e2b..fa765b803f97 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, 
> struct kfd_process *p,
>   return -EFAULT;
>   }
>   
> + if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, 
> cu_mask_size)) {
> + pr_debug("CU mask not permitted by DRM Cgroup");
> + kfree(properties.cu_mask);
> + return -EACCES;
> + }
> +
>   mutex_lock(&p->mutex);
>   
>   retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> diff --git 

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-08 Thread Kuehling, Felix
On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> drm.lgpu
>  A read-write nested-keyed file which exists on all cgroups.
>  Each entry is keyed by the DRM device's major:minor.
>
>  lgpu stands for logical GPU, it is an abstraction used to
>  subdivide a physical DRM device for the purpose of resource
>  management.
>
>  The lgpu is a discrete quantity that is device specific (i.e.
>  some DRM devices may have 64 lgpus while others may have 100
>  lgpus.)  The lgpu is a single quantity with two representations
>  denoted by the following nested keys.
>
>   ===== ==========================================
>count Representing lgpu as anonymous resource
>list  Representing lgpu as named resource
>   ===== ==========================================
>
>  For example:
>  226:0 count=256 list=0-255
>  226:1 count=4 list=0,2,4,6
>  226:2 count=32 list=32-63
>
>  lgpu is represented by a bitmap and uses the bitmap_parselist
>  kernel function so the list key input format is a
>  comma-separated list of decimal numbers and ranges.
>
>  Consecutively set bits are shown as two hyphen-separated decimal
>  numbers, the smallest and largest bit numbers set in the range.
>  Optionally each range can be postfixed to denote that only parts
>  of it should be set.  The range will divided to groups of
>  specific size.
>  Syntax: range:used_size/group_size
>  Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>
>  The count key is the hamming weight / hweight of the bitmap.
>
>  Both count and list accept the max and default keywords.
>
>  Some DRM devices may only support lgpu as anonymous resources.
>  In such case, the significance of the position of the set bits
>  in list will be ignored.
>
>  This lgpu resource supports the 'allocation' resource
>  distribution model.
>
> Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> Signed-off-by: Kenny Ho 

The description sounds reasonable to me and maps well to the CU masking 
feature in our GPUs.

It would also allow us to do more coarse-grained masking for example to 
guarantee balanced allocation of CUs across shader engines or 
partitioning of memory bandwidth or CP pipes (if that is supported by 
the hardware/firmware).
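
For reference, the documented list syntax maps directly onto the kernel's
bitmap_parselist(); an illustrative use:

#include <linux/bitmap.h>

	DECLARE_BITMAP(lgpu, 1024);
	int err;

	/* sets bits 0,1,256,257,512,513,768,769 per the example above */
	err = bitmap_parselist("0-1023:2/256", lgpu, 1024);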

I can't comment on the code as I'm unfamiliar with the details of the 
cgroup code.

Acked-by: Felix Kuehling 


> ---
>   Documentation/admin-guide/cgroup-v2.rst |  46 
>   include/drm/drm_cgroup.h|   4 +
>   include/linux/cgroup_drm.h  |   6 ++
>   kernel/cgroup/drm.c | 135 
>   4 files changed, 191 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst 
> b/Documentation/admin-guide/cgroup-v2.rst
> index 87a195133eaa..57f18469bd76 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1958,6 +1958,52 @@ DRM Interface Files
>   Set largest allocation for /dev/dri/card1 to 4MB
>   echo "226:1 4m" > drm.buffer.peak.max
>   
> +  drm.lgpu
> + A read-write nested-keyed file which exists on all cgroups.
> + Each entry is keyed by the DRM device's major:minor.
> +
> + lgpu stands for logical GPU, it is an abstraction used to
> + subdivide a physical DRM device for the purpose of resource
> + management.
> +
> + The lgpu is a discrete quantity that is device specific (i.e.
> + some DRM devices may have 64 lgpus while others may have 100
> + lgpus.)  The lgpu is a single quantity with two representations
> + denoted by the following nested keys.
> +
> +   ===== ==========================================
> +   count Representing lgpu as anonymous resource
> +   list  Representing lgpu as named resource
> +   ===== ==========================================
> +
> + For example:
> + 226:0 count=256 list=0-255
> + 226:1 count=4 list=0,2,4,6
> + 226:2 count=32 list=32-63
> +
> + lgpu is represented by a bitmap and uses the bitmap_parselist
> + kernel function so the list key input format is a
> + comma-separated list of decimal numbers and ranges.
> +
> + Consecutively set bits are shown as two hyphen-separated decimal
> + numbers, the smallest and largest bit numbers set in the range.
> + Optionally each range can be postfixed to denote that only parts
> + of it should be set.  The range will be divided into groups of a
> + specific size.
> + Syntax: range:used_size/group_size
> + Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> +
> + The count key is the hamming weight / hweight of the bitmap.
> +
> + Both count and list accept the max and default keywords.
> +
> + Some DRM devices may only 

Re: [PATCH] drm/amdkfd: fix the build when CIK support is disabled

2019-10-07 Thread Kuehling, Felix
On 2019-10-04 10:15 a.m., Alex Deucher wrote:
> Add proper ifdefs around CIK code in kfd setup.
>
> Signed-off-by: Alex Deucher 

Reviewed-by: Felix Kuehling 

Thanks!

> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 ++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 0db273587af4..d898adf25fbb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -39,7 +39,9 @@
>*/
>   static atomic_t kfd_locked = ATOMIC_INIT(0);
>   
> +#ifdef CONFIG_DRM_AMDGPU_CIK
>   extern const struct kfd2kgd_calls gfx_v7_kfd2kgd;
> +#endif
>   extern const struct kfd2kgd_calls gfx_v8_kfd2kgd;
>   extern const struct kfd2kgd_calls gfx_v9_kfd2kgd;
>   extern const struct kfd2kgd_calls arcturus_kfd2kgd;
> @@ -47,11 +49,15 @@ extern const struct kfd2kgd_calls gfx_v10_kfd2kgd;
>   
>   static const struct kfd2kgd_calls *kfd2kgd_funcs[] = {
>   #ifdef KFD_SUPPORT_IOMMU_V2
> +#ifdef CONFIG_DRM_AMDGPU_CIK
>   [CHIP_KAVERI] = &gfx_v7_kfd2kgd,
> +#endif
>   [CHIP_CARRIZO] = &gfx_v8_kfd2kgd,
>   [CHIP_RAVEN] = &gfx_v9_kfd2kgd,
>   #endif
> +#ifdef CONFIG_DRM_AMDGPU_CIK
>   [CHIP_HAWAII] = &gfx_v7_kfd2kgd,
> +#endif
>   [CHIP_TONGA] = &gfx_v8_kfd2kgd,
>   [CHIP_FIJI] = &gfx_v8_kfd2kgd,
>   [CHIP_POLARIS10] = &gfx_v8_kfd2kgd,

Re: [PATCH] drm/amdkfd: add missing void argument to function kgd2kfd_init

2019-10-07 Thread Kuehling, Felix
On 2019-10-07 12:08 p.m., Alex Deucher wrote:
> On Sat, Oct 5, 2019 at 1:58 PM Colin King  wrote:
>> From: Colin Ian King 
>>
>> Function kgd2kfd_init is missing a void argument, add it
>> to clean up the non-ANSI function declaration.
>>
>> Signed-off-by: Colin Ian King 
> Applied.  thanks!

Thank you!


>
> Alex
>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_module.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>> index 986ff52d5750..f4b7f7e6c40e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>> @@ -82,7 +82,7 @@ static void kfd_exit(void)
>>  kfd_chardev_exit();
>>   }
>>
>> -int kgd2kfd_init()
>> +int kgd2kfd_init(void)
>>   {
>>  return kfd_init();
>>   }
>> --
>> 2.20.1
>>
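For context on why the one-character change matters: in pre-C23 C, an
empty parameter list declares a function with unspecified arguments, so
the compiler cannot check calls against it. A minimal standalone
illustration:

    #include <stdio.h>

    int old_style();      /* non-prototype: arguments are unchecked  */
    int new_style(void);  /* prototype: takes exactly zero arguments */

    int main(void)
    {
        old_style(1, 2, 3);   /* undefined behavior, but the compiler
                               * has no prototype to diagnose it with */
        /* new_style(1); */   /* would be rejected at compile time */
        return new_style();
    }

    int old_style() { return 0; }

    int new_style(void)
    {
        printf("prototype enforced\n");
        return 0;
    }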

Re: [PATCH] drm/amdgpu: Enable gfx cache probing on HDP write for arcturus

2019-10-04 Thread Kuehling, Felix
I'm pretty sure the gart_enable function is not the right place for 
this. GART is for GPU access to system memory. HDP is for host access to 
GPU memory. Also, I would expect anything done in gart_enable to be 
undone in gart_disable. If that's not the intention, maybe this should 
go in gmc_v9_0_hw_init.

Regards,
   Felix

On 2019-10-04 10:56, Zeng, Oak wrote:
> Ping...
>
> Regards,
> Oak
>
> -Original Message-
> From: Zeng, Oak 
> Sent: Thursday, September 19, 2019 5:17 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix ; Koenig, Christian 
> ; Zeng, Oak 
> Subject: [PATCH] drm/amdgpu: Enable gfx cache probing on HDP write for 
> arcturus
>
> This allows the gfx cache to be probed and invalidated (for non-dirty cache 
> lines) on an HDP write (from either another GPU or the CPU). This should work 
> only for memory mapped with the RW memory type newly added for arcturus, to 
> achieve some cache coherence between multiple memory clients.
>
> Change-Id: I0a69de48706bb713235bfbc83fcc67774614
> Signed-off-by: Oak Zeng 
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 57d76ee..e01a359 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -1272,6 +1272,9 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device 
> *adev)
>   /* TODO for renoir */
>   mmhub_v1_0_update_power_gating(adev, true);
>   break;
> + case CHIP_ARCTURUS:
> + WREG32_FIELD15(HDP, 0, HDP_MMHUB_CNTL, HDP_MMHUB_GCC, 1);
> + break;
>   default:
>   break;
>   }
> --
> 2.7.4
>

Re: [PATCH 2/2] drm/amdkfd: Print more sdma engine hqds in debug fs

2019-10-04 Thread Kuehling, Felix
On 2019-10-04 10:48, Zeng, Oak wrote:
> Previously only PCIe-optimized SDMA engine hqds were
> exposed in debug fs. Print all SDMA engine hqds.
>
> Change-Id: I03756fc0fa99169d88e265560f505ed186242b02
> Reported-by: Jonathan Kim 
> Signed-off-by: Jonathan Kim 
> Signed-off-by: Oak Zeng 
Minor cosmetic nit-pick inline that checkpatch.pl would probably warn 
about. With that fixed, this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e55d021..0ebc604 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2416,7 +2416,7 @@ int dqm_debugfs_hqds(struct seq_file *m, void *data)
>   }
>   }
>   
> - for (pipe = 0; pipe < get_num_sdma_engines(dqm); pipe++) {
> + for (pipe = 0; pipe < get_num_sdma_engines(dqm) + 
> get_num_xgmi_sdma_engines(dqm); pipe++) {

This line looks longer than 80 characters. Try to find a good place to 
break it.
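One possible way to break it, sketched as a fragment of the loop in
question (hoisting the sum into a local also avoids re-evaluating both
helpers on every iteration):

    unsigned int num_engines = get_num_sdma_engines(dqm) +
                               get_num_xgmi_sdma_engines(dqm);

    for (pipe = 0; pipe < num_engines; pipe++) {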


>   for (queue = 0;
>queue < dqm->dev->device_info->num_sdma_queues_per_engine;
>queue++) {

Re: [PATCH 1/2] drm/amdkfd: Fix MQD size calculation

2019-10-04 Thread Kuehling, Felix
On 2019-10-04 10:48, Zeng, Oak wrote:
> On device initialization, a chunk of GTT memory is pre-allocated for
> the HIQ and all SDMA queue MQDs. The size of this allocation was wrong.
> The correct SDMA engine count is the number of PCIe-optimized SDMA
> engines plus the number of xGMI SDMA engines.
>
> Change-Id: Iecd11ae4f5a314591566772aa2a23e1fe4b94275
> Reported-by: Jonathan Kim 
> Signed-off-by: Jonathan Kim 
> Signed-off-by: Oak Zeng 

Minor cosmetic nit-pick inline that checkpatch.pl would probably warn 
about. With that fixed, this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 16c04f8..e55d021 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1849,7 +1849,8 @@ static int allocate_hiq_sdma_mqd(struct 
> device_queue_manager *dqm)
>   struct kfd_dev *dev = dqm->dev;
>   struct kfd_mem_obj *mem_obj = &dqm->hiq_sdma_mqd;
>   uint32_t size = dqm->mqd_mgrs[KFD_MQD_TYPE_SDMA]->mqd_size *
> - dev->device_info->num_sdma_engines *
> + (dev->device_info->num_sdma_engines +
> + dev->device_info->num_xgmi_sdma_engines)*

There should be a space between ) and *.


>   dev->device_info->num_sdma_queues_per_engine +
>   dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ]->mqd_size;
>   
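To make the corrected arithmetic concrete, here is a standalone sketch
with hypothetical sizes and engine counts (the real values come from the
device info and the MQD managers, not from any real ASIC here):

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical example values */
        unsigned int sdma_mqd_size = 512;
        unsigned int hiq_mqd_size = 512;
        unsigned int num_sdma_engines = 2;       /* PCIe-optimized */
        unsigned int num_xgmi_sdma_engines = 6;  /* xGMI */
        unsigned int num_sdma_queues_per_engine = 8;

        /* one MQD per queue on every SDMA engine of either kind,
         * plus one for the HIQ */
        unsigned int size = sdma_mqd_size *
                            (num_sdma_engines + num_xgmi_sdma_engines) *
                            num_sdma_queues_per_engine +
                            hiq_mqd_size;

        printf("pre-allocated MQD chunk: %u bytes\n", size);  /* 33280 */
        return 0;
    }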

[PATCH 1/1] drm/amdgpu: Fix error handling in amdgpu_ras_recovery_init

2019-10-03 Thread Kuehling, Felix
Don't set a struct pointer to NULL before freeing its members. It's
hard to see what's happening due to a local pointer-to-pointer data
aliasing con->eh_data.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 486568ded6d6..0e2ee5869b5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1542,10 +1542,10 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
 release:
amdgpu_ras_release_bad_pages(adev);
 free:
-   con->eh_data = NULL;
kfree((*data)->bps);
kfree((*data)->bps_bo);
kfree(*data);
+   con->eh_data = NULL;
 out:
DRM_WARN("Failed to initialize ras recovery!\n");
 
-- 
2.17.1
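The aliasing trap can be reproduced in a few lines. In this standalone
sketch (hypothetical structs, not the amdgpu ones), *data and c->eh_data
are the same pointer, so swapping in broken_cleanup() crashes on a NULL
dereference:

    #include <stdlib.h>

    struct eh_data { int *bps; };
    struct ras_con { struct eh_data *eh_data; };

    static void broken_cleanup(struct ras_con *c, struct eh_data **data)
    {
        c->eh_data = NULL;    /* *data aliases c->eh_data: this nulls it too */
        free((*data)->bps);   /* NULL pointer dereference */
        free(*data);
    }

    static void fixed_cleanup(struct ras_con *c, struct eh_data **data)
    {
        free((*data)->bps);   /* free the members first... */
        free(*data);
        c->eh_data = NULL;    /* ...then clear the shared pointer */
    }

    int main(void)
    {
        struct ras_con con = { 0 };
        struct eh_data **data = &con.eh_data;  /* same aliasing as the driver */

        con.eh_data = calloc(1, sizeof(*con.eh_data));
        if (!con.eh_data)
            return 1;
        fixed_cleanup(&con, data);  /* broken_cleanup(&con, data) crashes */
        return 0;
    }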


Re: [PATCH] drm/amdkfd: Improve KFD IOCTL printing

2019-10-01 Thread Kuehling, Felix
On 2019-09-30 17:55, Zhao, Yong wrote:
> The code uses hex defines, so should the printing.
>
> Change-Id: Ia7cc7690553bb043915b3d8c0157216c64421a60
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index c28ba0c1d7ac..d9e36dbf13d5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1840,7 +1840,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
> cmd, unsigned long arg)
>   } else
>   goto err_i1;
>   
> - dev_dbg(kfd_device, "ioctl cmd 0x%x (#%d), arg 0x%lx\n", cmd, nr, arg);
> + dev_dbg(kfd_device, "ioctl cmd 0x%x (#0x%x), arg 0x%lx\n", cmd, nr, 
> arg);
>   
>   process = kfd_get_process(current);
>   if (IS_ERR(process)) {
> @@ -1895,7 +1895,8 @@ static long kfd_ioctl(struct file *filep, unsigned int 
> cmd, unsigned long arg)
>   kfree(kdata);
>   
>   if (retcode)
> - dev_dbg(kfd_device, "ret = %d\n", retcode);
> + dev_dbg(kfd_device, "ioctl cmd (#0x%x), arg 0x%lx, ret = %d\n",
> + nr, arg, retcode);
>   
>   return retcode;
>   }

Re: [PATCH 5/6] drm/amdgpu: Add the HDP flush support for Navi

2019-09-30 Thread Kuehling, Felix
As far as I can tell, this is the only patch with functional changes in 
the patch series. The rest are purely clean-up. Any relation I'm missing?

Anyway, patches 2,3,5 are

Reviewed-by: Felix Kuehling 

On 2019-09-27 11:41 p.m., Zhao, Yong wrote:
> The HDP flush support code was missing in the nbio and nv files.
>
> Change-Id: I046ff52567676b56bf16dc1728b02481233acb61
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 16 +---
>   drivers/gpu/drm/amd/amdgpu/nv.c|  9 +
>   2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
> index e7e36fb6113d..c699cbfe015a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
> @@ -27,11 +27,21 @@
>   #include "nbio/nbio_2_3_default.h"
>   #include "nbio/nbio_2_3_offset.h"
>   #include "nbio/nbio_2_3_sh_mask.h"
> +#include <uapi/linux/kfd_ioctl.h>
>   
>   #define smnPCIE_CONFIG_CNTL 0x11180044
>   #define smnCPM_CONTROL  0x11180460
>   #define smnPCIE_CNTL2   0x11180070
>   
> +
> +static void nbio_v2_3_remap_hdp_registers(struct amdgpu_device *adev)
> +{
> + WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
> + adev->rmmio_remap.reg_offset + 
> KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
> + WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
> + adev->rmmio_remap.reg_offset + 
> KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
> +}
> +
>   static u32 nbio_v2_3_get_rev_id(struct amdgpu_device *adev)
>   {
>   u32 tmp = RREG32_SOC15(NBIO, 0, mmRCC_DEV0_EPF0_STRAP0);
> @@ -56,10 +66,9 @@ static void nbio_v2_3_hdp_flush(struct amdgpu_device *adev,
>   struct amdgpu_ring *ring)
>   {
>   if (!ring || !ring->funcs->emit_wreg)
> - WREG32_SOC15_NO_KIQ(NBIO, 0, 
> mmBIF_BX_PF_HDP_MEM_COHERENCY_FLUSH_CNTL, 0);
> + WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + 
> KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
>   else
> - amdgpu_ring_emit_wreg(ring, SOC15_REG_OFFSET(
> - NBIO, 0, mmBIF_BX_PF_HDP_MEM_COHERENCY_FLUSH_CNTL), 0);
> + amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + 
> KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
>   }
>   
>   static u32 nbio_v2_3_get_memsize(struct amdgpu_device *adev)
> @@ -330,4 +339,5 @@ const struct amdgpu_nbio_funcs nbio_v2_3_funcs = {
>   .ih_control = nbio_v2_3_ih_control,
>   .init_registers = nbio_v2_3_init_registers,
>   .detect_hw_virt = nbio_v2_3_detect_hw_virt,
> + .remap_hdp_registers = nbio_v2_3_remap_hdp_registers,
>   };
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index b3e7756fcc4b..6699a45b88ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -587,8 +587,11 @@ static const struct amdgpu_asic_funcs nv_asic_funcs =
>   
>   static int nv_common_early_init(void *handle)
>   {
> +#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   
> + adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
> + adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
>   adev->smc_rreg = NULL;
>   adev->smc_wreg = NULL;
>   adev->pcie_rreg = &nv_pcie_rreg;
> @@ -714,6 +717,12 @@ static int nv_common_hw_init(void *handle)
>   nv_program_aspm(adev);
>   /* setup nbio registers */
>   adev->nbio.funcs->init_registers(adev);
> + /* remap HDP registers to a hole in mmio space,
> +  * for the purpose of exposing those registers
> +  * to process space
> +  */
> + if (adev->nbio.funcs->remap_hdp_registers)
> + adev->nbio.funcs->remap_hdp_registers(adev);
>   /* enable the doorbell aperture */
>   nv_enable_doorbell_aperture(adev, true);
>   
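For context on the remapping comment in the last hunk: once the flush
registers live in the hole, a host-side HDP flush reduces to a single
32-bit store to that page. A hedged sketch of the idea (how the page
reaches user space, and the convention that writing 0 triggers the
flush, are assumptions mirroring the kernel-side WREG32 above, not a
documented UAPI):

    #include <stdint.h>

    /* 'page' is assumed to be the remapped HDP flush page after the
     * driver has exposed it to the process, e.g. through an mmap. */
    static void hdp_mem_flush(volatile uint32_t *page)
    {
        page[0] = 0;   /* the store itself triggers the HDP memory flush */
    }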

Re: [PATCH 1/6] drm/amdkfd: Update parameter type of pasid to uint16_t

2019-09-30 Thread Kuehling, Felix
If you want to make this interface consistent, you should make the vmid 
parameter uint8_t at the same time. That said, you don't really save any 
resources, because 8-bit and 16-bit ints still consume 32 bits on the 
call stack.

Regards,
   Felix

On 2019-09-27 11:41 p.m., Zhao, Yong wrote:
> This is consistent with other code and registers in the code.
>
> Change-Id: I04dd12bdb465a43cfcd8936ed0f227a6546830e8
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++--
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h   | 2 +-
>   7 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 122698f8dd1e..33cbf1d073d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -59,7 +59,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   uint32_t sh_mem_config,
>   uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
>   uint32_t sh_mem_bases);
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid);
>   static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
>   static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> @@ -232,7 +232,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   unlock_srbm(kgd);
>   }
>   
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid)
>   {
>   struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> index f77ddf7dba2b..0210d791dea1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> @@ -94,7 +94,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   uint32_t sh_mem_config, uint32_t sh_mem_ape1_base,
>   uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
>   
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid);
>   
>   static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
> @@ -256,7 +256,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   unlock_srbm(kgd);
>   }
>   
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid)
>   {
>   struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> index 7478caf096ad..7a4c762e1209 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> @@ -52,7 +52,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   uint32_t sh_mem_config,
>   uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
>   uint32_t sh_mem_bases);
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid);
>   static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
>   static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> @@ -210,7 +210,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
> *kgd, uint32_t vmid,
>   unlock_srbm(kgd);
>   }
>   
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int 
> pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, uint16_t pasid,
>   unsigned int vmid)
>   {
>   struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index 50f885576bbe..6be6061c5554 100644
> 

Re: [PATCH 6/6] drm/amdkfd: Improve KFD IOCTL printing

2019-09-30 Thread Kuehling, Felix
On 2019-09-27 11:41 p.m., Zhao, Yong wrote:
> The code use hex define, so should the printing. Also, printf a message
> if there is a failure.
>
> Change-Id: Ia7cc7690553bb043915b3d8c0157216c64421a60
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index c28ba0c1d7ac..d1ab09c0f522 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1840,7 +1840,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
> cmd, unsigned long arg)
>   } else
>   goto err_i1;
>   
> - dev_dbg(kfd_device, "ioctl cmd 0x%x (#%d), arg 0x%lx\n", cmd, nr, arg);
> + dev_dbg(kfd_device, "ioctl cmd 0x%x (#0x%x), arg 0x%lx\n", cmd, nr, 
> arg);
>   
>   process = kfd_get_process(current);
>   if (IS_ERR(process)) {
> @@ -1895,7 +1895,8 @@ static long kfd_ioctl(struct file *filep, unsigned int 
> cmd, unsigned long arg)
>   kfree(kdata);
>   
>   if (retcode)
> - dev_dbg(kfd_device, "ret = %d\n", retcode);
> + dev_err(kfd_device, "ioctl cmd (#0x%x), arg 0x%lx, ret = %d\n",
> + nr, arg, retcode);

NAK. We don't want to spam the kernel log with cryptic error messages 
every time ioctl functions fail. Please leave this as a dev_dbg message. 
Failing ioctl functions could be perfectly normal for a number of 
reasons (system call interrupted by signal, running out of event slots, 
timeouts on event waiting, etc). But every bug report will incorrectly 
blame any unrelated problem on those messages if they happen to appear 
in the kernel log.

Regards,
   Felix


>   
>   return retcode;
>   }

Re: [PATCH 4/6] drm/amdkfd: Use array to probe kfd2kgd_calls

2019-09-30 Thread Kuehling, Felix
On 2019-09-27 11:41 p.m., Zhao, Yong wrote:
> This is the same idea as the kfd device info probe; it moves all the
> probe control together for easy maintenance.
>
> Change-Id: I85c98bb08eb2a4a1a80c3b913c32691cc74602d1
> Signed-off-by: Yong Zhao 

Nice clean-up. See one comment inline.

Also, please check that this doesn't break the build if CONFIG_HSA_AMD 
is undefined.

With that fixed and checked, this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 65 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  7 --
>   .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  8 +--
>   .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  7 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  7 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  7 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  7 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 39 +--
>   8 files changed, 41 insertions(+), 106 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 92666b197f6c..8c531793fe17 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -63,47 +63,10 @@ void amdgpu_amdkfd_fini(void)
>   
>   void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
>   {
> - const struct kfd2kgd_calls *kfd2kgd;
>   bool vf = amdgpu_sriov_vf(adev);
>   
> - switch (adev->asic_type) {
> -#ifdef CONFIG_DRM_AMDGPU_CIK
> - case CHIP_KAVERI:
> - case CHIP_HAWAII:
> - kfd2kgd = amdgpu_amdkfd_gfx_7_get_functions();
> - break;
> -#endif
> - case CHIP_CARRIZO:
> - case CHIP_TONGA:
> - case CHIP_FIJI:
> - case CHIP_POLARIS10:
> - case CHIP_POLARIS11:
> - case CHIP_POLARIS12:
> - case CHIP_VEGAM:
> - kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
> - break;
> - case CHIP_VEGA10:
> - case CHIP_VEGA12:
> - case CHIP_VEGA20:
> - case CHIP_RAVEN:
> - case CHIP_RENOIR:
> - kfd2kgd = amdgpu_amdkfd_gfx_9_0_get_functions();
> - break;
> - case CHIP_ARCTURUS:
> - kfd2kgd = amdgpu_amdkfd_arcturus_get_functions();
> - break;
> - case CHIP_NAVI10:
> - case CHIP_NAVI14:
> - case CHIP_NAVI12:
> - kfd2kgd = amdgpu_amdkfd_gfx_10_0_get_functions();
> - break;
> - default:
> - dev_info(adev->dev, "kfd not supported on this ASIC\n");
> - return;
> - }
> -
>   adev->kfd.dev = kgd2kfd_probe((struct kgd_dev *)adev,
> -   adev->pdev, kfd2kgd, adev->asic_type, vf);
> +   adev->pdev, adev->asic_type, vf);
>   
>   if (adev->kfd.dev)
>   amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
> @@ -711,33 +674,7 @@ int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem, 
> struct mm_struct *mm)
>   return 0;
>   }
>   
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
> -{
> - return NULL;
> -}
> -
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
> -{
> - return NULL;
> -}
> -
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void)
> -{
> - return NULL;
> -}
> -
> -struct kfd2kgd_calls *amdgpu_amdkfd_arcturus_get_functions(void)
> -{
> - return NULL;
> -}
> -
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void)
> -{
> - return NULL;
> -}
> -
>   struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
> -   const struct kfd2kgd_calls *f2g,
> unsigned int asic_type, bool vf)
>   {
>   return NULL;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 4eb2fb85de26..069d5d230810 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -137,12 +137,6 @@ int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum 
> kgd_engine_type engine,
>   void amdgpu_amdkfd_set_compute_idle(struct kgd_dev *kgd, bool idle);
>   bool amdgpu_amdkfd_have_atomics_support(struct kgd_dev *kgd);
>   
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void);
> -struct kfd2kgd_calls *amdgpu_amdkfd_arcturus_get_functions(void);
> -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void);
> -
>   bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
>   
>   int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev);
> @@ -248,7 +242,6 @@ void amdgpu_amdkfd_unreserve_memory_limit(struct 
> amdgpu_bo *bo);
>   int kgd2kfd_init(void);
>   void kgd2kfd_exit(void);
>   struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
> -   
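The probing pattern the patch adopts can be shown in isolation. A
standalone sketch with a hypothetical enum and function tables (not the
real amdgpu types): designated initializers leave unsupported entries
NULL, which is what replaces the old switch statement.

    #include <stdio.h>

    enum chip { CHIP_A, CHIP_B, CHIP_LAST };

    struct funcs { const char *name; };

    static const struct funcs funcs_a = { "a" };
    static const struct funcs funcs_b = { "b" };

    /* unsupported ASICs simply stay NULL in the table */
    static const struct funcs *chip_funcs[CHIP_LAST] = {
        [CHIP_A] = &funcs_a,
        [CHIP_B] = &funcs_b,
    };

    int main(void)
    {
        enum chip c = CHIP_B;
        const struct funcs *f = chip_funcs[c];

        printf("%s\n", f ? f->name : "not supported");
        return 0;
    }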

Re: [PATCH] drm/amdkfd: fix kgd2kfd_device_init() definition conflict error

2019-09-27 Thread Kuehling, Felix
On 2019-09-27 0:33, Liang, Prike wrote:
> The patch c670707 ("drm/amd: Pass drm_device to kfd") introduced this issue;
> fix the following compiler error.
>
>CC [M]  drivers/gpu/drm/amd/amdgpu//../powerplay/smumgr/fiji_smumgr.o
> drivers/gpu/drm/amd/amdgpu//amdgpu_amdkfd.c:746:6: error: conflicting types 
> for ‘kgd2kfd_device_init’
>   bool kgd2kfd_device_init(struct kfd_dev *kfd,
>^
> In file included from drivers/gpu/drm/amd/amdgpu//amdgpu_amdkfd.c:23:0:
> drivers/gpu/drm/amd/amdgpu//amdgpu_amdkfd.h:253:6: note: previous declaration 
> of ‘kgd2kfd_device_init’ was here
>   bool kgd2kfd_device_init(struct kfd_dev *kfd,
>^
> scripts/Makefile.build:273: recipe for target 
> 'drivers/gpu/drm/amd/amdgpu//amdgpu_amdkfd.o' failed
> make[1]: *** [drivers/gpu/drm/amd/amdgpu//amdgpu_amdkfd.o] Error 1
>
> Signed-off-by: Prike Liang 

This fix is for the case that CONFIG_HSA_AMD is disabled. Sorry for 
missing that when reviewing Harish's code.

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 221047d..92666b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -744,6 +744,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct 
> pci_dev *pdev,
>   }
>   
>   bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +  struct drm_device *ddev,
>const struct kgd2kfd_shared_resources *gpu_resources)
>   {
>   return false;

Re: [PATCH 2/3] drm/amdkfd: Use setup_vm_pt_regs function from base driver in KFD

2019-09-26 Thread Kuehling, Felix
On 2019-09-25 2:15 p.m., Zhao, Yong wrote:
> This was done on GFX9 previously, now do it for GFX10.
>
> Change-Id: I4442e60534c59bc9526a673559f018ba8058deac
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 23 +++
>   1 file changed, 3 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index fe5b702c75ce..64568ed32793 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -42,6 +42,7 @@
>   #include "v10_structs.h"
>   #include "nv.h"
>   #include "nvd.h"
> +#include "gfxhub_v2_0.h"
>   
>   enum hqd_dequeue_request_type {
>   NO_ACTION = 0,
> @@ -251,11 +252,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev 
> *kgd, unsigned int pasid,
>   ATC_VMID0_PASID_MAPPING__VALID_MASK;
>   
>   pr_debug("pasid 0x%x vmid %d, reg value %x\n", pasid, vmid, 
> pasid_mapping);
> - /*
> -  * need to do this twice, once for gfx and once for mmhub
> -  * for ATC add 16 to VMID for mmhub, for IH different registers.
> -  * ATC_VMID0..15 registers are separate from ATC_VMID16..31.
> -  */
>   
>   pr_debug("ATHUB, reg %x\n", SOC15_REG_OFFSET(ATHUB, 0, 
> mmATC_VMID0_PASID_MAPPING) + vmid);
>   WREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING) + vmid,
> @@ -910,7 +906,6 @@ static void set_vm_context_page_table_base(struct kgd_dev 
> *kgd, uint32_t vmid,
>   uint64_t page_table_base)
>   {
>   struct amdgpu_device *adev = get_amdgpu_device(kgd);
> - uint64_t base = page_table_base | AMDGPU_PTE_VALID;
>   
>   if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
>   pr_err("trying to set page table base for wrong VMID %u\n",
> @@ -918,18 +913,6 @@ static void set_vm_context_page_table_base(struct 
> kgd_dev *kgd, uint32_t vmid,
>   return;
>   }
>   
> - /* TODO: take advantage of per-process address space size. For
> -  * now, all processes share the same address space size, like
> -  * on GFX8 and older.
> -  */
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32) + (vmid*2), 0);
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_START_ADDR_HI32) + (vmid*2), 0);
> -
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_END_ADDR_LO32) + (vmid*2),
> - lower_32_bits(adev->vm_manager.max_pfn - 1));
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_END_ADDR_HI32) + (vmid*2),
> - upper_32_bits(adev->vm_manager.max_pfn - 1));
> -
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32) + (vmid*2), lower_32_bits(base));
> - WREG32(SOC15_REG_OFFSET(GC, 0, 
> mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32) + (vmid*2), upper_32_bits(base));
> + /* SDMA is on gfxhub as well on Navi1* series */
> + gfxhub_v2_0_setup_vm_pt_regs(adev, vmid, page_table_base);
>   }

Re: [PATCH 1/3] drm/amdgpu: Export setup_vm_pt_regs() logic for gfxhub 2.0

2019-09-26 Thread Kuehling, Felix
For GFXv9 you made an equivalent change for both GFXHub and MMHub 
("drm/amdgpu: Expose *_setup_vm_pt_regs for kfd to use"). Your GFXv9 
commit was also reviewed by Alex and Christian. You should get at least 
one of them to Ack or Review this change.

For GFXv10 you're only changing the GFXHub. I suspect that's because KFD 
doesn't care about MMHub on GFXv10. That's fine with me.

You can add
Reviewed-by: Felix Kuehling 

Thanks,
   Felix

On 2019-09-25 2:15 p.m., Zhao, Yong wrote:
> The KFD code will call this function later.
>
> Change-Id: I88a53368cdee719b2c75393e5cdbd8290584548e
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 20 
>   drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.h |  2 ++
>   2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> index a9238735d361..b601c6740ef5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> @@ -46,21 +46,25 @@ u64 gfxhub_v2_0_get_mc_fb_offset(struct amdgpu_device 
> *adev)
>   return (u64)RREG32_SOC15(GC, 0, mmGCMC_VM_FB_OFFSET) << 24;
>   }
>   
> -static void gfxhub_v2_0_init_gart_pt_regs(struct amdgpu_device *adev)
> +void gfxhub_v2_0_setup_vm_pt_regs(struct amdgpu_device *adev, uint32_t vmid,
> + uint64_t page_table_base)
>   {
> - uint64_t value = amdgpu_gmc_pd_addr(adev->gart.bo);
> + /* two registers distance between mmGCVM_CONTEXT0_* to 
> mmGCVM_CONTEXT1_* */
> + int offset = mmGCVM_CONTEXT1_PAGE_TABLE_BASE_ADDR_LO32
> + - mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32;
>   
> + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32,
> + offset * vmid, lower_32_bits(page_table_base));
>   
> - WREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32,
> -  lower_32_bits(value));
> -
> - WREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32,
> -  upper_32_bits(value));
> + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32,
> + offset * vmid, upper_32_bits(page_table_base));
>   }
>   
>   static void gfxhub_v2_0_init_gart_aperture_regs(struct amdgpu_device *adev)
>   {
> - gfxhub_v2_0_init_gart_pt_regs(adev);
> + uint64_t pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
> +
> + gfxhub_v2_0_setup_vm_pt_regs(adev, 0, pt_base);
>   
>   WREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32,
>(u32)(adev->gmc.gart_start >> 12));
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.h 
> b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.h
> index 06807940748b..392b8cd94fc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.h
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.h
> @@ -31,5 +31,7 @@ void gfxhub_v2_0_set_fault_enable_default(struct 
> amdgpu_device *adev,
> bool value);
>   void gfxhub_v2_0_init(struct amdgpu_device *adev);
>   u64 gfxhub_v2_0_get_mc_fb_offset(struct amdgpu_device *adev);
> +void gfxhub_v2_0_setup_vm_pt_regs(struct amdgpu_device *adev, uint32_t vmid,
> + uint64_t page_table_base);
>   
>   #endif
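The offset computation in setup_vm_pt_regs relies on the per-VMID
context registers being laid out at a fixed stride. A standalone sketch
of the same arithmetic with made-up offsets (the real mmGCVM_CONTEXT*
values come from the generated register headers):

    #include <stdio.h>

    /* hypothetical register offsets standing in for mmGCVM_CONTEXT0/1_* */
    #define CTX0_PT_BASE_LO 0x100
    #define CTX1_PT_BASE_LO 0x102   /* LO32 + HI32 per context => stride 2 */

    int main(void)
    {
        int stride = CTX1_PT_BASE_LO - CTX0_PT_BASE_LO;
        int vmid;

        for (vmid = 0; vmid < 4; vmid++)
            printf("vmid %d -> reg 0x%x\n", vmid,
                   CTX0_PT_BASE_LO + stride * vmid);
        return 0;
    }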

Re: [PATCH 1/2] drm/amdkfd: Record vmid pasid mapping in the driver for non HWS mode

2019-09-26 Thread Kuehling, Felix

On 2019-09-26 5:59 p.m., Zhao, Yong wrote:
> On 2019-09-26 5:36 p.m., Kuehling, Felix wrote:
>> Minor nit-pick inline. Otherwise this patch is
>>
>> Reviewed-by: Felix Kuehling 
>>
>> On 2019-09-26 5:27 p.m., Zhao, Yong wrote:
>>> This makes it possible to query the vmid-pasid mapping through software.
>>>
>>> Change-Id: Ib539aae277a227cc39f6469ae23c46c4d289b87b
>>> Signed-off-by: Yong Zhao 
>>> ---
>>> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 33 ---
>>> .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 ++
>>> 3 files changed, 25 insertions(+), 13 deletions(-)
>>>
[snip]
>>> 
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> index eed8f950b663..99c8b36301ef 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> @@ -188,7 +188,8 @@ struct device_queue_manager {
>>> 	unsigned int		*allocated_queues;
>>> 	uint64_t		sdma_bitmap;
>>> 	uint64_t		xgmi_sdma_bitmap;
>>> -	unsigned int		vmid_bitmap;
>>> +	/* the pasid mapping for each kfd vmid */
>>> +	uint16_t		vmid_pasid[VMID_NUM];
>>> 	uint64_t		pipelines_addr;
>>> 	struct kfd_mem_obj	*pipeline_mem;
>>> 	uint64_t		fence_gpu_addr;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 0d2c7fa1fa46..a08015720841 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -43,6 +43,8 @@
>>> 
>>> #include "amd_shared.h"
>>> 
>>> +#define VMID_NUM 16
>>> +
>> Any good reason why this is not defined in kfd_device_queue_manager.h?
>> It's only used there.
> [yz] It could be used by other places in the future, as they currently
> use 16 directly.

Can you point out those places? A quick grep for hard-coded 16 in kfd 
doesn't turn up anything VMID-related at first glance.

Regards,
   Felix


>>
>>> #define KFD_MAX_RING_ENTRY_SIZE 8
>>> 
>>> #define KFD_SYSFS_FILE_MODE 0444

Re: [PATCH 1/2] drm/amdkfd: Record vmid pasid mapping in the driver for non HWS mode

2019-09-26 Thread Kuehling, Felix
Minor nit-pick inline. Otherwise this patch is

Reviewed-by: Felix Kuehling 

On 2019-09-26 5:27 p.m., Zhao, Yong wrote:
> This makes it possible to query the vmid-pasid mapping through software.
>
> Change-Id: Ib539aae277a227cc39f6469ae23c46c4d289b87b
> Signed-off-by: Yong Zhao 
> ---
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 33 ---
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 ++
>   3 files changed, 25 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e7f0a32e0e44..455f49a25ccb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -224,20 +224,30 @@ static int allocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit, allocated_vmid;
> + int allocated_vmid = -1, i;
>   
> - if (dqm->vmid_bitmap == 0)
> - return -ENOMEM;
> + for (i = dqm->dev->vm_info.first_vmid_kfd;
> + i <= dqm->dev->vm_info.last_vmid_kfd; i++) {
> + if (!dqm->vmid_pasid[i]) {
> + allocated_vmid = i;
> + break;
> + }
> + }
> +
> + if (allocated_vmid < 0) {
> + pr_err("no more vmid to allocate\n");
> + return -ENOSPC;
> + }
> +
> + pr_debug("vmid allocated: %d\n", allocated_vmid);
> +
> + dqm->vmid_pasid[allocated_vmid] = q->process->pasid;
>   
> - bit = ffs(dqm->vmid_bitmap) - 1;
> - dqm->vmid_bitmap &= ~(1 << bit);
> + set_pasid_vmid_mapping(dqm, q->process->pasid, allocated_vmid);
>   
> - allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
> - pr_debug("vmid allocation %d\n", allocated_vmid);
>   qpd->vmid = allocated_vmid;
>   q->properties.vmid = allocated_vmid;
>   
> - set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
>   program_sh_mem_settings(dqm, qpd);
>   
>   /* qpd->page_table_base is set earlier when register_process()
> @@ -278,8 +288,6 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
> -
>   /* On GFX v7, CP doesn't flush TC at dequeue */
>   if (q->device->device_info->asic_family == CHIP_HAWAII)
>   if (flush_texture_cache_nocpsch(q->device, qpd))
> @@ -289,8 +297,8 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   
>   /* Release the vmid mapping */
>   set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
> + dqm->vmid_pasid[qpd->vmid] = 0;
>   
> - dqm->vmid_bitmap |= (1 << bit);
>   qpd->vmid = 0;
>   q->properties.vmid = 0;
>   }
> @@ -1017,7 +1025,8 @@ static int initialize_nocpsch(struct 
> device_queue_manager *dqm)
>   dqm->allocated_queues[pipe] |= 1 << queue;
>   }
>   
> - dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
> + memset(dqm->vmid_pasid, 0, sizeof(dqm->vmid_pasid));
> +
>   dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
>   dqm->xgmi_sdma_bitmap = ~0ULL >> (64 - get_num_xgmi_sdma_queues(dqm));
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index eed8f950b663..99c8b36301ef 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -188,7 +188,8 @@ struct device_queue_manager {
> 	unsigned int		*allocated_queues;
> 	uint64_t		sdma_bitmap;
> 	uint64_t		xgmi_sdma_bitmap;
> -	unsigned int		vmid_bitmap;
> +	/* the pasid mapping for each kfd vmid */
> +	uint16_t		vmid_pasid[VMID_NUM];
> 	uint64_t		pipelines_addr;
> 	struct kfd_mem_obj	*pipeline_mem;
> 	uint64_t		fence_gpu_addr;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0d2c7fa1fa46..a08015720841 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -43,6 +43,8 @@
>   
>   #include "amd_shared.h"
>   
> +#define VMID_NUM 16
> +

Any good reason why this is not defined in kfd_device_queue_manager.h? 
It's only used there.


>   #define KFD_MAX_RING_ENTRY_SIZE 8
>   
>   #define KFD_SYSFS_FILE_MODE 0444

Re: [PATCH 2/2] drm/amdkfd: Query vmid pasid mapping through stored info for non HWS

2019-09-26 Thread Kuehling, Felix
On 2019-09-26 5:27 p.m., Zhao, Yong wrote:
> Because we record the mapping in software under non-HWS mode, we can
> query the pasid through the vmid using the stored mapping instead of
> reading from ATC registers.
>
> This also prepares for the defeatured ATC block in future ASICs.
>
> Change-Id: I781cb9d30dc0cc93379908ff1cf8da798bb26f13
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index ab8a695c4a3c..9fff01c0fb9e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -58,8 +58,8 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>   memcpy(patched_ihre, ih_ring_entry,
>   dev->device_info->ih_ring_entry_size);
>   
> - pasid = dev->kfd2kgd->get_atc_vmid_pasid_mapping_pasid(
> - dev->kgd, vmid);
> + pasid = dev->dqm->vmid_pasid[vmid];
> + WARN_ONCE(pasid == 0, "No PASID assigned for VMID %d\n", vmid);

When this happens, you'll now get two WARN_ONCE messages: one here and 
then the one a few lines lower: WARN_ONCE(pasid == 0, "Bug: No PASID in 
KFD interrupt"). My point was, your message is redundant. The original 
WARN_ONCE already covers both the HWS and non-HWS cases.

Regards,
   Felix

>   
>   /* Patch the pasid field */
>   patched_ihre[3] = cpu_to_le32((le32_to_cpu(patched_ihre[3])

Re: [PATCH 2/2] drm/amdkfd: Query vmid pasid mapping through stored info

2019-09-26 Thread Kuehling, Felix

On 2019-09-26 5:02 p.m., Liu, Shaoyun wrote:
> I think this is only for none-hws case .
Yes. This is inside an if (!pasid && dev->dqm->sched_policy == 
KFD_SCHED_POLICY_NO_HWS).

Regards,
   Felix


> HWS may  dynamic change the
> mapping and driver will not get updated .  If that's the case , please
> specify this is for none hardware scheduler case in the header .
>
> Regards
>
> shaoyun.liu
>
> On 2019-09-26 4:07 p.m., Kuehling, Felix wrote:
>> On 2019-09-26 3:46 p.m., Zhao, Yong wrote:
>>> Because we record the mapping in software, we can query the pasid
>>> through the vmid using the stored mapping instead of reading from
>>> ATC registers.
>>>
>>> This also prepares for the defeatured ATC block in future ASICs.
>>>
>>> Change-Id: I781cb9d30dc0cc93379908ff1cf8da798bb26f13
>>> Signed-off-by: Yong Zhao 
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> index ab8a695c4a3c..754c052b7d72 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> @@ -58,8 +58,9 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>>> memcpy(patched_ihre, ih_ring_entry,
>>> dev->device_info->ih_ring_entry_size);
>>> 
>>> -   pasid = dev->kfd2kgd->get_atc_vmid_pasid_mapping_pasid(
>>> -   dev->kgd, vmid);
>>> +   pasid = dev->dqm->vmid_pasid[vmid];
>>> +   if (!pasid)
>>> +   pr_err("pasid is not queried correctly\n");
>> This error message is not helpful. A helpful message may be something
>> like "No PASID assigned for VMID %d". That said, printing error messages
>> in an interrupt handler that can be potentially very frequent is not the
>> best idea. There is already a WARN_ONCE a few lines below that should be
>> triggered if PASID is not assigned.
>>
>> Regards,
>>  Felix
>>
>>
>>> 
>>> /* Patch the pasid field */
>>> patched_ihre[3] = 
>>> cpu_to_le32((le32_to_cpu(patched_ihre[3])

Re: [PATCH 2/2] drm/amdkfd: Query vmid pasid mapping through stored info

2019-09-26 Thread Kuehling, Felix
On 2019-09-26 3:46 p.m., Zhao, Yong wrote:
> Because we record the mapping in software, we can query the pasid
> through the vmid using the stored mapping instead of reading from
> ATC registers.
>
> This also prepares for the defeatured ATC block in future ASICs.
>
> Change-Id: I781cb9d30dc0cc93379908ff1cf8da798bb26f13
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index ab8a695c4a3c..754c052b7d72 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -58,8 +58,9 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>   memcpy(patched_ihre, ih_ring_entry,
>   dev->device_info->ih_ring_entry_size);
>   
> - pasid = dev->kfd2kgd->get_atc_vmid_pasid_mapping_pasid(
> - dev->kgd, vmid);
> + pasid = dev->dqm->vmid_pasid[vmid];
> + if (!pasid)
> + pr_err("pasid is not queried correctly\n");

This error message is not helpful. A helpful message may be something 
like "No PASID assigned for VMID %d". That said, printing error messages 
in an interrupt handler that can be potentially very frequent is not the 
best idea. There is already a WARN_ONCE a few lines below that should be 
triggered if PASID is not assigned.

Regards,
   Felix


>   
>   /* Patch the pasid field */
>   patched_ihre[3] = cpu_to_le32((le32_to_cpu(patched_ihre[3])

Re: [PATCH 1/2] drm/amdkfd: Record vmid pasid mapping in the driver

2019-09-26 Thread Kuehling, Felix
On 2019-09-26 3:46 p.m., Zhao, Yong wrote:
> This makes it possible to query the vmid-pasid mapping through software.
>
> Change-Id: Ib539aae277a227cc39f6469ae23c46c4d289b87b
> Signed-off-by: Yong Zhao 
> ---
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 33 ---
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 ++
>   3 files changed, 25 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e7f0a32e0e44..92fede18bf1d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -224,20 +224,30 @@ static int allocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit, allocated_vmid;
> + int allocated_vmid = -1, i;
>   
> - if (dqm->vmid_bitmap == 0)
> - return -ENOMEM;
> + for (i = dqm->dev->vm_info.first_vmid_kfd;
> + i <= dqm->dev->vm_info.last_vmid_kfd; i++) {
> + if (!dqm->vmid_pasid[i]) {
> + allocated_vmid = i;
> + break;
> + }
> + }
> +
> + if (allocated_vmid < 0) {
> + pr_err("no more vmid to allocate\n");
> + return -ENOSPC;
> + }
> +
> + pr_debug("vmid allocated: %d\n", allocated_vmid);
> +
> + dqm->vmid_pasid[allocated_vmid] = q->process->pasid;
>   
> - bit = ffs(dqm->vmid_bitmap) - 1;
> - dqm->vmid_bitmap &= ~(1 << bit);
> + set_pasid_vmid_mapping(dqm, q->process->pasid, allocated_vmid);
>   
> - allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
> - pr_debug("vmid allocation %d\n", allocated_vmid);
>   qpd->vmid = allocated_vmid;
>   q->properties.vmid = allocated_vmid;
>   
> - set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
>   program_sh_mem_settings(dqm, qpd);
>   
>   /* qpd->page_table_base is set earlier when register_process()
> @@ -278,8 +288,6 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
> -
>   /* On GFX v7, CP doesn't flush TC at dequeue */
>   if (q->device->device_info->asic_family == CHIP_HAWAII)
>   if (flush_texture_cache_nocpsch(q->device, qpd))
> @@ -289,8 +297,8 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   
>   /* Release the vmid mapping */
>   set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
> + dqm->vmid_pasid[qpd->vmid] = 0;
>   
> - dqm->vmid_bitmap |= (1 << bit);
>   qpd->vmid = 0;
>   q->properties.vmid = 0;
>   }
> @@ -1017,7 +1025,8 @@ static int initialize_nocpsch(struct 
> device_queue_manager *dqm)
>   dqm->allocated_queues[pipe] |= 1 << queue;
>   }
>   
> - dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
> + memset(dqm->vmid_pasid, 0, VMID_NUM * sizeof(uint16_t));

Just use sizeof(dqm->vmid_pasid) to get the array size to avoid problems 
if the array size ever changes in the future.

Regards,
   Felix

> +
>   dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
>   dqm->xgmi_sdma_bitmap = ~0ULL >> (64 - get_num_xgmi_sdma_queues(dqm));
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index eed8f950b663..99c8b36301ef 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -188,7 +188,8 @@ struct device_queue_manager {
> 	unsigned int		*allocated_queues;
> 	uint64_t		sdma_bitmap;
> 	uint64_t		xgmi_sdma_bitmap;
> -	unsigned int		vmid_bitmap;
> +	/* the pasid mapping for each kfd vmid */
> +	uint16_t		vmid_pasid[VMID_NUM];
> 	uint64_t		pipelines_addr;
> 	struct kfd_mem_obj	*pipeline_mem;
> 	uint64_t		fence_gpu_addr;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0d2c7fa1fa46..a08015720841 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -43,6 +43,8 @@
>   
>   #include "amd_shared.h"
>   
> +#define VMID_NUM 16
> +
>   #define KFD_MAX_RING_ENTRY_SIZE 8
>   
>   #define KFD_SYSFS_FILE_MODE 0444
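To spell out the sizeof suggestion above: taking sizeof of the array
itself keeps the code correct if the element type or length ever
changes. A minimal sketch with a hypothetical struct mirroring the dqm
field:

    #include <stdint.h>
    #include <string.h>

    #define VMID_NUM 16

    struct dqm_like {
        uint16_t vmid_pasid[VMID_NUM];
    };

    static void reset_mappings(struct dqm_like *dqm)
    {
        /* no "VMID_NUM * sizeof(uint16_t)" to keep in sync by hand */
        memset(dqm->vmid_pasid, 0, sizeof(dqm->vmid_pasid));
    }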

Re: [PATCH 1/6] drm/amdkfd: Move the control stack on GFX10 to userspace buffer

2019-09-26 Thread Kuehling, Felix
Patches 1-3 and patch 5 are

Reviewed-by: Felix Kuehling 

See separate emails for patches 4 and 6.

On 2019-09-26 2:38 p.m., Zhao, Yong wrote:
> GFX10 does not require the control stack to be right after the mqd
> buffer any more, so move it back to the userspace-allocated CWSR buffer.
>
> Change-Id: I446c9685549a09ac8846a42ee22d86cfb93fd98c
> Signed-off-by: Yong Zhao 
> ---
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 37 ++-
>   1 file changed, 4 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> index 29d50d6af9d7..e2fb76247f47 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> @@ -69,35 +69,13 @@ static void update_cu_mask(struct mqd_manager *mm, void 
> *mqd,
>   static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
>   struct queue_properties *q)
>   {
> - int retval;
> - struct kfd_mem_obj *mqd_mem_obj = NULL;
> + struct kfd_mem_obj *mqd_mem_obj;
>   
> - /* From V9,  for CWSR, the control stack is located on the next page
> -  * boundary after the mqd, we will use the gtt allocation function
> -  * instead of sub-allocation function.
> -  */
> - if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
> - mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> - if (!mqd_mem_obj)
> - return NULL;
> - retval = amdgpu_amdkfd_alloc_gtt_mem(kfd->kgd,
> - ALIGN(q->ctl_stack_size, PAGE_SIZE) +
> - ALIGN(sizeof(struct v10_compute_mqd), 
> PAGE_SIZE),
> - &(mqd_mem_obj->gtt_mem),
> - &(mqd_mem_obj->gpu_addr),
> - (void *)&(mqd_mem_obj->cpu_ptr), true);
> - } else {
> - retval = kfd_gtt_sa_allocate(kfd, sizeof(struct 
> v10_compute_mqd),
> - &mqd_mem_obj);
> - }
> -
> - if (retval) {
> - kfree(mqd_mem_obj);
> + if (kfd_gtt_sa_allocate(kfd, sizeof(struct v10_compute_mqd),
> + &mqd_mem_obj))
>   return NULL;
> - }
>   
>   return mqd_mem_obj;
> -
>   }
>   
>   static void init_mqd(struct mqd_manager *mm, void **mqd,
> @@ -250,14 +228,7 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd,
>   static void free_mqd(struct mqd_manager *mm, void *mqd,
>   struct kfd_mem_obj *mqd_mem_obj)
>   {
> - struct kfd_dev *kfd = mm->dev;
> -
> - if (mqd_mem_obj->gtt_mem) {
> - amdgpu_amdkfd_free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
> - kfree(mqd_mem_obj);
> - } else {
> - kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
> - }
> + kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>   }
>   
>   static bool is_occupied(struct mqd_manager *mm, void *mqd,

Re: [PATCH 6/6] drm/amdkfd: Eliminate get_atc_vmid_pasid_mapping_valid

2019-09-26 Thread Kuehling, Felix
On 2019-09-26 2:38 p.m., Zhao, Yong wrote:
> get_atc_vmid_pasid_mapping_valid() is very similar to
> get_atc_vmid_pasid_mapping_pasid(), so they can be merged into a new
> function get_atc_vmid_pasid_mapping_info() to reduce register access
> times.

Hmm, the most important part may actually not be the time saved, but 
getting the PASID and the valid bit atomically with a single read. That 
could fix some potential race conditions where the mapping changes 
between the two reads.

Add that to the patch description and the patch is

Reviewed-by: Felix Kuehling 

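To spell out the atomicity point: with the merged helper both fields are
decoded from one register read, so a concurrent remap can never be
observed half-updated. A sketch of the decode with assumed field
positions (valid in bit 31, PASID in the low 16 bits; the authoritative
masks live in the register headers):

    #include <stdbool.h>
    #include <stdint.h>

    #define MAPPING_VALID_MASK 0x80000000u  /* assumed bit position */
    #define MAPPING_PASID_MASK 0x0000ffffu  /* assumed field width  */

    /* decode one snapshot of an ATC VMID->PASID mapping register */
    static bool get_mapping_info(uint32_t reg, uint16_t *p_pasid)
    {
        *p_pasid = (uint16_t)(reg & MAPPING_PASID_MASK);
        return (reg & MAPPING_VALID_MASK) != 0;
    }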

>
> Change-Id: I255ebf2629012400b07fe6a69c3d075cfd46612e
> Signed-off-by: Yong Zhao 
> ---
>   .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  6 +--
>   .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 49 +++
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 28 ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 32 
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 45 +++--
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  6 +--
>   .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  8 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   | 16 +++---
>   .../gpu/drm/amd/include/kgd_kfd_interface.h   |  8 ++-
>   9 files changed, 76 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> index eb6e8b232729..5e1bd6500fe2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> @@ -279,10 +279,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>   .address_watch_execute = kgd_gfx_v9_address_watch_execute,
>   .wave_control_execute = kgd_gfx_v9_wave_control_execute,
>   .address_watch_get_offset = kgd_gfx_v9_address_watch_get_offset,
> - .get_atc_vmid_pasid_mapping_pasid =
> - kgd_gfx_v9_get_atc_vmid_pasid_mapping_pasid,
> - .get_atc_vmid_pasid_mapping_valid =
> - kgd_gfx_v9_get_atc_vmid_pasid_mapping_valid,
> + .get_atc_vmid_pasid_mapping_info =
> + kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
>   .get_tile_config = kgd_gfx_v9_get_tile_config,
>   .set_vm_context_page_table_base = 
> kgd_gfx_v9_set_vm_context_page_table_base,
>   .invalidate_tlbs = kgd_gfx_v9_invalidate_tlbs,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 09d50949c5b9..57ff698f51bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -100,10 +100,8 @@ static uint32_t kgd_address_watch_get_offset(struct 
> kgd_dev *kgd,
>   unsigned int watch_point_id,
>   unsigned int reg_offset);
>   
> -static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
> - uint8_t vmid);
> -static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
> - uint8_t vmid);
> +static bool get_atc_vmid_pasid_mapping_info(struct kgd_dev *kgd,
> + uint8_t vmid, uint16_t *p_pasid);
>   static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t 
> vmid,
>   uint64_t page_table_base);
>   static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
> @@ -157,10 +155,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>   .address_watch_execute = kgd_address_watch_execute,
>   .wave_control_execute = kgd_wave_control_execute,
>   .address_watch_get_offset = kgd_address_watch_get_offset,
> - .get_atc_vmid_pasid_mapping_pasid =
> - get_atc_vmid_pasid_mapping_pasid,
> - .get_atc_vmid_pasid_mapping_valid =
> - get_atc_vmid_pasid_mapping_valid,
> + .get_atc_vmid_pasid_mapping_info =
> + get_atc_vmid_pasid_mapping_info,
>   .get_tile_config = amdgpu_amdkfd_get_tile_config,
>   .set_vm_context_page_table_base = set_vm_context_page_table_base,
>   .invalidate_tlbs = invalidate_tlbs,
> @@ -772,26 +768,17 @@ static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, 
> void *mqd,
>   return 0;
>   }
>   
> -static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
> - uint8_t vmid)
> +static bool get_atc_vmid_pasid_mapping_info(struct kgd_dev *kgd,
> + uint8_t vmid, uint16_t *p_pasid)
>   {
> - uint32_t reg;
> + uint32_t value;
>   struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
>   
> - reg = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
> + value = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
>      + vmid);
> - return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
> -}
> -
> -static uint16_t 

Re: [PATCH 4/6] drm/amdkfd: Record vmid pasid mapping in the driver

2019-09-26 Thread Kuehling, Felix
On 2019-09-26 2:38 p.m., Zhao, Yong wrote:
> This makes possible the vmid pasid mapping query through software.
>
> Change-Id: Ib539aae277a227cc39f6469ae23c46c4d289b87b
> Signed-off-by: Yong Zhao 
> ---
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 34 +--
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
>   2 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e7f0a32e0e44..d006adefef55 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -224,20 +224,30 @@ static int allocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit, allocated_vmid;
> + int idx = -1, allocated_vmid, i;
>   
> - if (dqm->vmid_bitmap == 0)
> + for (i = 0; i < dqm->dev->vm_info.vmid_num_kfd; i++) {
> + if (!dqm->vmid_pasid[i]) {
> + idx = i;
> + break;
> + }
> + }
> +
> + if (idx < 0) {
> + pr_err("no more vmid to allocate\n");
>   return -ENOMEM;
> + }
> +
> + dqm->vmid_pasid[idx] = q->process->pasid;
>   
> - bit = ffs(dqm->vmid_bitmap) - 1;
> - dqm->vmid_bitmap &= ~(1 << bit);
> + allocated_vmid = idx + dqm->dev->vm_info.first_vmid_kfd;
> + pr_debug("vmid allocated: %d\n", allocated_vmid);
> +
> + set_pasid_vmid_mapping(dqm, q->process->pasid, allocated_vmid);
>   
> - allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
> - pr_debug("vmid allocation %d\n", allocated_vmid);
>   qpd->vmid = allocated_vmid;
>   q->properties.vmid = allocated_vmid;
>   
> - set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
>   program_sh_mem_settings(dqm, qpd);
>   
>   /* qpd->page_table_base is set earlier when register_process()
> @@ -278,7 +288,7 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   struct qcm_process_device *qpd,
>   struct queue *q)
>   {
> - int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
> + int idx;
>   
>   /* On GFX v7, CP doesn't flush TC at dequeue */
>   if (q->device->device_info->asic_family == CHIP_HAWAII)
> @@ -290,7 +300,9 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
>   /* Release the vmid mapping */
>   set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
>   
> - dqm->vmid_bitmap |= (1 << bit);
> + idx = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
> + dqm->vmid_pasid[idx] = 0;
> +
>   qpd->vmid = 0;
>   q->properties.vmid = 0;
>   }
> @@ -1017,7 +1029,8 @@ static int initialize_nocpsch(struct 
> device_queue_manager *dqm)
>   dqm->allocated_queues[pipe] |= 1 << queue;
>   }
>   
> - dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
> + dqm->vmid_pasid = kcalloc(dqm->dev->vm_info.vmid_num_kfd,
> + sizeof(uint16_t), GFP_KERNEL);

If you allocate this dynamically, you need to check the return value. 
But see below ...
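
Something like this would be the minimum if the allocation stays dynamic
(untested sketch; initialize_nocpsch() already returns an int, so the
caller can handle the failure):

    dqm->vmid_pasid = kcalloc(dqm->dev->vm_info.vmid_num_kfd,
                              sizeof(uint16_t), GFP_KERNEL);
    if (!dqm->vmid_pasid)
        return -ENOMEM;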


>   dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
>   dqm->xgmi_sdma_bitmap = ~0ULL >> (64 - get_num_xgmi_sdma_queues(dqm));
>   
> @@ -1030,6 +1043,7 @@ static void uninitialize(struct device_queue_manager 
> *dqm)
>   
>   WARN_ON(dqm->queue_count > 0 || dqm->processes_count > 0);
>   
> + kfree(dqm->vmid_pasid);
>   kfree(dqm->allocated_queues);
>   for (i = 0 ; i < KFD_MQD_TYPE_MAX ; i++)
>   kfree(dqm->mqd_mgrs[i]);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index eed8f950b663..67b5e5fadd95 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -188,7 +188,8 @@ struct device_queue_manager {
>   unsigned int*allocated_queues;
>   uint64_tsdma_bitmap;
>   uint64_txgmi_sdma_bitmap;
> - unsigned intvmid_bitmap;
> + /* the pasid mapping for each kfd vmid */
> + uint16_t*vmid_pasid;

This could be a fixed-size array since the number of user mode VMIDs is 
limited to 15 by the HW. The size of the pointer alone is enough to 
store 4 PASIDs. Add overhead of kmalloc and you don't really save 
anything by allocating this dynamically. It only adds indirection, 
complexity (error handling) and the risk of memory leaks.
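
For illustration, something like this (sketch only; the array size and
comment are mine, based on the HW limit of 15 user-mode VMIDs):

    /* PASID for each of the up to 15 user-mode VMIDs; 0 = unused */
    uint16_t        vmid_pasid[16];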

Regards,
   Felix


>   uint64_tpipelines_addr;
>   struct kfd_mem_obj  *pipeline_mem;
>   uint64_tfence_gpu_addr;

Re: [PATCH] drm/amdgpu: Add NAVI12 support from kfd side

2019-09-25 Thread Kuehling, Felix
Agreed. KFD is part of amdgpu. We shouldn't use the CHIP_ IDs 
differently in KFD. The code duplication is minimal and we've done it 
for all chips so far. E.g. Fiji and all the Polaris versions are treated 
the same in KFD. Similarly Vega10, Vega20 and Arcturus are the same for 
most purposes.

Regards,
   Felix

On 2019-09-25 4:12 p.m., Alex Deucher wrote:
> I think it would be cleaner to add navi12 to all of the relevant
> cases.  We should double check what we did for navi14 as well.
>
> Alex
>
> On Wed, Sep 25, 2019 at 4:09 PM Liu, Shaoyun  wrote:
>> I sent out another change that sets the asic_family as CHIP_NAVI10 since
>> from the KFD side there is no difference between navi10 and navi12.
>>
>> Regards
>> Shaoyun.liu
>>
>> -Original Message-
>> From: Kuehling, Felix 
>> Sent: Wednesday, September 25, 2019 11:23 AM
>> To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Add NAVI12 support from kfd side
>>
>> You'll also need to add "case CHIP_NAVI12:" in a bunch of places. Grep for 
>> "CHIP_NAVI10" and you'll find them all pretty quickly.
>>
>> Regards,
>> Felix
>>
>> On 2019-09-24 6:13 p.m., Liu, Shaoyun wrote:
>>> Add device info for both navi12 PF and VF
>>>
>>> Change-Id: Ifb4035e65c12d153fc30e593fe109f9c7e0541f4
>>> Signed-off-by: shaoyunl 
>>> ---
>>>drivers/gpu/drm/amd/amdkfd/kfd_device.c | 19 +++
>>>1 file changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> index f329b82..edfbae5c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> @@ -387,6 +387,24 @@ static const struct kfd_device_info navi10_device_info 
>>> = {
>>>.num_sdma_queues_per_engine = 8,
>>>};
>>>
>>> +static const struct kfd_device_info navi12_device_info = {
>>> + .asic_family = CHIP_NAVI12,
>>> + .asic_name = "navi12",
>>> + .max_pasid_bits = 16,
>>> + .max_no_of_hqd  = 24,
>>> + .doorbell_size  = 8,
>>> + .ih_ring_entry_size = 8 * sizeof(uint32_t),
>>> + .event_interrupt_class = &event_interrupt_class_v9,
>>> + .num_of_watch_points = 4,
>>> + .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>> + .needs_iommu_device = false,
>>> + .supports_cwsr = true,
>>> + .needs_pci_atomics = false,
>>> + .num_sdma_engines = 2,
>>> + .num_xgmi_sdma_engines = 0,
>>> + .num_sdma_queues_per_engine = 8,
>>> +};
>>> +
>>>static const struct kfd_device_info navi14_device_info = {
>>>.asic_family = CHIP_NAVI14,
>>>.asic_name = "navi14",
>>> @@ -425,6 +443,7 @@ static const struct kfd_device_info 
>>> *kfd_supported_devices[][2] = {
>>>[CHIP_RENOIR] = {&renoir_device_info, NULL},
>>>[CHIP_ARCTURUS] = {&arcturus_device_info, &arcturus_device_info},
>>>[CHIP_NAVI10] = {&navi10_device_info, NULL},
>>> + [CHIP_NAVI12] = {&navi12_device_info, &navi12_device_info},
>>>[CHIP_NAVI14] = {&navi14_device_info, NULL},
>>>};
>>>

Re: [PATCH 3/3] drm/amdkfd: Remove the control stack workaround for GFX10

2019-09-25 Thread Kuehling, Felix
On 2019-09-25 2:15 p.m., Zhao, Yong wrote:
> The GFX10 does not have this hardware bug any more, so remove it.

I wouldn't call this a bug and a workaround. More like a change in the 
HW or FW behaviour and a corresponding driver change. I.e. in GFXv8 the 
control stack was in the user mode CWSR allocation. In GFXv9 it moved 
into a kernel mode buffer next to the MQD. So in GFXv10 the control 
stack moved back into the user mode CWSR buffer?

Regards,
   Felix

>
> Change-Id: I446c9685549a09ac8846a42ee22d86cfb93fd98c
> Signed-off-by: Yong Zhao 
> ---
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 37 ++-
>   1 file changed, 4 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> index 9cd3eb2d90bd..4a236b2c2354 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> @@ -69,35 +69,13 @@ static void update_cu_mask(struct mqd_manager *mm, void 
> *mqd,
>   static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
>   struct queue_properties *q)
>   {
> - int retval;
> - struct kfd_mem_obj *mqd_mem_obj = NULL;
> + struct kfd_mem_obj *mqd_mem_obj;
>   
> - /* From V9,  for CWSR, the control stack is located on the next page
> -  * boundary after the mqd, we will use the gtt allocation function
> -  * instead of sub-allocation function.
> -  */
> - if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
> - mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> - if (!mqd_mem_obj)
> - return NULL;
> - retval = amdgpu_amdkfd_alloc_gtt_mem(kfd->kgd,
> - ALIGN(q->ctl_stack_size, PAGE_SIZE) +
> - ALIGN(sizeof(struct v10_compute_mqd), 
> PAGE_SIZE),
> - &(mqd_mem_obj->gtt_mem),
> - &(mqd_mem_obj->gpu_addr),
> - (void *)&(mqd_mem_obj->cpu_ptr), true);
> - } else {
> - retval = kfd_gtt_sa_allocate(kfd, sizeof(struct 
> v10_compute_mqd),
> - &mqd_mem_obj);
> - }
> -
> - if (retval) {
> - kfree(mqd_mem_obj);
> + if (kfd_gtt_sa_allocate(kfd, sizeof(struct v10_compute_mqd),
> + &mqd_mem_obj))
>   return NULL;
> - }
>   
>   return mqd_mem_obj;
> -
>   }
>   
>   static void init_mqd(struct mqd_manager *mm, void **mqd,
> @@ -250,14 +228,7 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd,
>   static void free_mqd(struct mqd_manager *mm, void *mqd,
>   struct kfd_mem_obj *mqd_mem_obj)
>   {
> - struct kfd_dev *kfd = mm->dev;
> -
> - if (mqd_mem_obj->gtt_mem) {
> - amdgpu_amdkfd_free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
> - kfree(mqd_mem_obj);
> - } else {
> - kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
> - }
> + kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>   }
>   
>   static bool is_occupied(struct mqd_manager *mm, void *mqd,

Re: [PATCH] drm/amdgpu: Add NAVI12 support from kfd side

2019-09-25 Thread Kuehling, Felix
You'll also need to add "case CHIP_NAVI12:" in a bunch of places. Grep 
for "CHIP_NAVI10" and you'll find them all pretty quickly.

Regards,
   Felix

On 2019-09-24 6:13 p.m., Liu, Shaoyun wrote:
> Add device info for both navi12 PF and VF
>
> Change-Id: Ifb4035e65c12d153fc30e593fe109f9c7e0541f4
> Signed-off-by: shaoyunl 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 19 +++
>   1 file changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index f329b82..edfbae5c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -387,6 +387,24 @@ static const struct kfd_device_info navi10_device_info = 
> {
>   .num_sdma_queues_per_engine = 8,
>   };
>   
> +static const struct kfd_device_info navi12_device_info = {
> + .asic_family = CHIP_NAVI12,
> + .asic_name = "navi12",
> + .max_pasid_bits = 16,
> + .max_no_of_hqd  = 24,
> + .doorbell_size  = 8,
> + .ih_ring_entry_size = 8 * sizeof(uint32_t),
> + .event_interrupt_class = &event_interrupt_class_v9,
> + .num_of_watch_points = 4,
> + .mqd_size_aligned = MQD_SIZE_ALIGNED,
> + .needs_iommu_device = false,
> + .supports_cwsr = true,
> + .needs_pci_atomics = false,
> + .num_sdma_engines = 2,
> + .num_xgmi_sdma_engines = 0,
> + .num_sdma_queues_per_engine = 8,
> +};
> +
>   static const struct kfd_device_info navi14_device_info = {
>   .asic_family = CHIP_NAVI14,
>   .asic_name = "navi14",
> @@ -425,6 +443,7 @@ static const struct kfd_device_info 
> *kfd_supported_devices[][2] = {
>   [CHIP_RENOIR] = {&renoir_device_info, NULL},
>   [CHIP_ARCTURUS] = {&arcturus_device_info, &arcturus_device_info},
>   [CHIP_NAVI10] = {&navi10_device_info, NULL},
> + [CHIP_NAVI12] = {&navi12_device_info, &navi12_device_info},
>   [CHIP_NAVI14] = {&navi14_device_info, NULL},
>   };
>   

Re: [PATCH] drm/amdkfd: fix a potential NULL pointer dereference

2019-09-19 Thread Kuehling, Felix
On 2019-09-18 12:30 p.m., Allen Pais wrote:
> alloc_workqueue is not checked for errors and as a result,
> a potential NULL dereference could occur.
>
> Signed-off-by: Allen Pais 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 5 +
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
> index c56ac47..caa82a8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
> @@ -62,6 +62,11 @@ int kfd_interrupt_init(struct kfd_dev *kfd)
>   }
>   
>   kfd->ih_wq = alloc_workqueue("KFD IH", WQ_HIGHPRI, 1);
> + if (unlikely(!kfd->ih_wq)) {
> + fifo_free(&kfd->ih_fifo);

This does not compile. I think this should be kfifo_free.

> + dev_err(kfd_chardev(), "Failed to allocate KFD IH workqueue\n");
> + return kfd->ih_wq;

This throws a compiler warning "return makes integer from pointer 
without a cast". What's worse, kfd->ih_wq is NULL here and 
kfd_interrupt_init returns 0 to mean success, so returning a NULL 
pointer is definitely not what you want here. This function should 
return a negative error code on failure. I propose -ENOMEM.

I'm going to apply your patch with those fixes.
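
I.e. the fixed error path would look roughly like this (untested sketch
of what I mean by "those fixes"):

    kfd->ih_wq = alloc_workqueue("KFD IH", WQ_HIGHPRI, 1);
    if (unlikely(!kfd->ih_wq)) {
        kfifo_free(&kfd->ih_fifo);
        dev_err(kfd_chardev(), "Failed to allocate KFD IH workqueue\n");
        return -ENOMEM;
    }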

Regards,
   Felix


> + }
>   spin_lock_init(&kfd->interrupt_lock);
>   
>   INIT_WORK(&kfd->interrupt_work, interrupt_wq);

Re: [PATCH] drm/amdgpu: fix potential VM faults

2019-09-19 Thread Kuehling, Felix
I'm not disagreeing with the change. Just trying to understand how this 
could have caused a VM fault. If the page tables are reserved or fenced 
while you allocate a new one, they would not be evicted. If they are not 
reserved or fenced, there should be no expectation that they stay resident.

Is this related to recoverable page fault handling? Do we need some more 
generic way to handle eviction of page tables and update the parent page 
directory (invalidate the corresponding PDE)?

Regards,
   Felix

On 2019-09-19 4:41, Christian König wrote:
> When we allocate new page tables under memory
> pressure we should not evict old ones.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 70d45d48907a..8e44ecaada35 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   .interruptible = (bp->type != ttm_bo_type_kernel),
>   .no_wait_gpu = bp->no_wait_gpu,
>   .resv = bp->resv,
> - .flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
> + .flags = bp->type != ttm_bo_type_kernel ?
> + TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
>   };
>   struct amdgpu_bo *bo;
>   unsigned long page_align, size = bp->size;

Re: [PATCH] drm/amdkfd: Delete unused KFD_IS_* macro

2019-09-16 Thread Kuehling, Felix
On 2019-09-16 12:33 p.m., Zhao, Yong wrote:
> These were deleted before, but somehow showed up again. Delete them again.
>
> Change-Id: I19b3063932380cb74a01d505e8e92f897a2c2cb7
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 
>   1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 06bb2d7a9b39..0773dc4df4ff 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -179,10 +179,6 @@ enum cache_policy {
>   cache_policy_noncoherent
>   };
>   
> -#define KFD_IS_VI(chip) ((chip) >= CHIP_CARRIZO && (chip) <= CHIP_POLARIS11)
> -#define KFD_IS_DGPU(chip) (((chip) >= CHIP_TONGA && \
> -(chip) <= CHIP_NAVI10) || \
> -(chip) == CHIP_HAWAII)
>   #define KFD_IS_SOC15(chip) ((chip) >= CHIP_VEGA10)
>   
>   struct kfd_event_interrupt_class {

Re: [PATCH] drm/amdkfd: Delete unused KFD_IS_DGPU macro

2019-09-16 Thread Kuehling, Felix
On 2019-09-16 12:26 p.m., Zhao, Yong wrote:
> This was deleted before, but somehow showed up again. Delete it again.
>
> Change-Id: I19b3063932380cb74a01d505e8e92f897a2c2cb7
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ---
>   1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 06bb2d7a9b39..6ed31a76dfda 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -180,9 +180,6 @@ enum cache_policy {
>   };
>   
>   #define KFD_IS_VI(chip) ((chip) >= CHIP_CARRIZO && (chip) <= CHIP_POLARIS11)
> -#define KFD_IS_DGPU(chip) (((chip) >= CHIP_TONGA && \
> -(chip) <= CHIP_NAVI10) || \
> -(chip) == CHIP_HAWAII)

Are you familiar with "git blame"? It's really useful to track down 
which commit introduced code. In this case it shows that both IS_DGPU 
and IS_VI were added back by Philip Cox in his Navi10 change, probably 
accidentally while resolving a rebase conflict. You can probably remove 
IS_VI as well.
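
For example, something like this points at the commit that last touched
those lines:

    git blame -L '/KFD_IS_DGPU/,+3' drivers/gpu/drm/amd/amdkfd/kfd_priv.h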

Regards,
   Felix


>   #define KFD_IS_SOC15(chip) ((chip) >= CHIP_VEGA10)
>   
>   struct kfd_event_interrupt_class {

Re: [PATCH] drm/amdgpu: remove needless usage of #ifdef

2019-09-12 Thread Kuehling, Felix
On 2019-09-12 2:44 a.m., S, Shirish wrote:
> Define sched_policy in case CONFIG_HSA_AMD is not
> enabled; with this there is no need to check for CONFIG_HSA_AMD
> elsewhere in the driver code.
>
> Suggested-by: Felix Kuehling 
> Signed-off-by: Shirish S 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +-
>   2 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index a1516a3..6ff02bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -171,6 +171,8 @@ extern int amdgpu_noretry;
>   extern int amdgpu_force_asic_type;
>   #ifdef CONFIG_HSA_AMD
>   extern int sched_policy;
> +#else
> +static const int sched_policy = KFD_SCHED_POLICY_HWS;
>   #endif
>   
>   #ifdef CONFIG_DRM_AMDGPU_SI
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 740638e..3b5282b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1623,11 +1623,7 @@ static int amdgpu_device_ip_early_init(struct 
> amdgpu_device *adev)
>   }
>   
>   adev->pm.pp_feature = amdgpu_pp_feature_mask;
> - if (amdgpu_sriov_vf(adev)
> - #ifdef CONFIG_HSA_AMD
> - || sched_policy == KFD_SCHED_POLICY_NO_HWS
> - #endif
> - )
> + if (amdgpu_sriov_vf(adev) || sched_policy == KFD_SCHED_POLICY_NO_HWS)
>   adev->pm.pp_feature &= ~PP_GFXOFF_MASK;
>   
>   for (i = 0; i < adev->num_ip_blocks; i++) {

Re: [PATCH] drm/amdgpu: fix build error without CONFIG_HSA_AMD (V2)

2019-09-12 Thread Kuehling, Felix
On 2019-09-12 2:58 a.m., S, Shirish wrote:
> On 9/12/2019 3:29 AM, Kuehling, Felix wrote:
>> On 2019-09-11 2:52 a.m., S, Shirish wrote:
>>> If CONFIG_HSA_AMD is not set, build fails:
>>>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function 
>>> `amdgpu_device_ip_early_init':
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to 
>>> `sched_policy'
>>>
>>> Use CONFIG_HSA_AMD to guard this.
>>>
>>> Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W 
>>> scheduling policy")
>>>
>>> V2: declare sched_policy in amdgpu.h and remove changes in amdgpu_device.c
>> Which branch is this for? V1 of this patch was already submitted to
>> amd-staging-drm-next. So unless you're planning to revert v1 and submit
>> v2, I was expecting to see a change that fixes up the previous patch,
>> rather than a patch that replaces it.
> Have sent a patch that fixes up previous patch as well.
>
> Apparently, I did not send the revert but my plan was to revert and only
> then submit V2.

Reverts must be reviewed too. If you're planning to submit a revert, 
please do include it in code review. That also avoids this type of 
confusion.

I'll review your other patch.

Thanks,
   Felix


>
> Anyways both work for me as long as the kernel builds.
>
> Regards,
>
> Shirish S
>
>> Regards,
>>  Felix
>>
>>
>>> Signed-off-by: Shirish S 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 1030cb3..6ff02bb 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -169,7 +169,11 @@ extern int amdgpu_discovery;
>>> extern int amdgpu_mes;
>>> extern int amdgpu_noretry;
>>> extern int amdgpu_force_asic_type;
>>> +#ifdef CONFIG_HSA_AMD
>>> extern int sched_policy;
>>> +#else
>>> +static const int sched_policy = KFD_SCHED_POLICY_HWS;
>>> +#endif
>>> 
>>> #ifdef CONFIG_DRM_AMDGPU_SI
>>> extern int amdgpu_si_support;

Re: [PATCH] Revert "drm/amdgpu/nbio7.4: add hw bug workaround for vega20"

2019-09-12 Thread Kuehling, Felix
On 2019-09-10 3:59 p.m., Russell, Kent wrote:
> This reverts commit e01f2d41895102d824c6b8f5e011dd5e286d5e8b.
>
> VG20 did not require this workaround, as the fix is in the VBIOS.
> Leave VG10/12 workaround as some older shipped cards do not have the
> VBIOS fix in place, and the kernel workaround is required in those
> situations
>
> Change-Id: I2d7c394ce9d205d97be6acfa5edc4635951fdadf
> Signed-off-by: Kent Russell 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 --
>   1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> index 2d171bf07ad5..dafd9b7d31d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> @@ -308,13 +308,7 @@ static void nbio_v7_4_detect_hw_virt(struct 
> amdgpu_device *adev)
>   
>   static void nbio_v7_4_init_registers(struct amdgpu_device *adev)
>   {
> - uint32_t def, data;
> -
> - def = data = RREG32_PCIE(smnPCIE_CI_CNTL);
> - data = REG_SET_FIELD(data, PCIE_CI_CNTL, CI_SLV_ORDERING_DIS, 1);
>   
> - if (def != data)
> - WREG32_PCIE(smnPCIE_CI_CNTL, data);
>   }
>   
>   static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct 
> amdgpu_device *adev)

Re: [PATCH] drm/amdgpu: fix build error without CONFIG_HSA_AMD (V2)

2019-09-11 Thread Kuehling, Felix
On 2019-09-11 2:52 a.m., S, Shirish wrote:
> If CONFIG_HSA_AMD is not set, build fails:
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function 
> `amdgpu_device_ip_early_init':
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to 
> `sched_policy'
>
> Use CONFIG_HSA_AMD to guard this.
>
> Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling 
> policy")
>
> V2: declare sched_policy in amdgpu.h and remove changes in amdgpu_device.c

Which branch is this for? V1 of this patch was already submitted to 
amd-staging-drm-next. So unless you're planning to revert v1 and submit 
v2, I was expecting to see a change that fixes up the previous patch, 
rather than a patch that replaces it.

Regards,
   Felix


>
> Signed-off-by: Shirish S 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 1030cb3..6ff02bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -169,7 +169,11 @@ extern int amdgpu_discovery;
>   extern int amdgpu_mes;
>   extern int amdgpu_noretry;
>   extern int amdgpu_force_asic_type;
> +#ifdef CONFIG_HSA_AMD
>   extern int sched_policy;
> +#else
> +static const int sched_policy = KFD_SCHED_POLICY_HWS;
> +#endif
>   
>   #ifdef CONFIG_DRM_AMDGPU_SI
>   extern int amdgpu_si_support;

Re: [PATCH] drm/amdgpu: fix build error without CONFIG_HSA_AMD

2019-09-10 Thread Kuehling, Felix
This is pretty ugly. See a suggestion inline.

On 2019-09-10 4:12 a.m., Huang, Ray wrote:
>> -Original Message-
>> From: S, Shirish 
>> Sent: Tuesday, September 10, 2019 3:54 PM
>> To: Deucher, Alexander ; Koenig, Christian
>> ; Huang, Ray 
>> Cc: amd-gfx@lists.freedesktop.org; S, Shirish 
>> Subject: [PATCH] drm/amdgpu: fix build error without CONFIG_HSA_AMD
>>
>> If CONFIG_HSA_AMD is not set, build fails:
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function
>> `amdgpu_device_ip_early_init':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined
>> reference to `sched_policy'
>>
>> Use CONFIG_HSA_AMD to guard this.
>>
>> Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W
>> scheduling policy")
>>
>> Signed-off-by: Shirish S 
> + Felix for his awareness.
>
> Reviewed-by: Huang Rui 
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 1030cb30720c..a1516a3ae9a8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -169,7 +169,9 @@ extern int amdgpu_discovery;  extern int
>> amdgpu_mes;  extern int amdgpu_noretry;  extern int
>> amdgpu_force_asic_type;
>> +#ifdef CONFIG_HSA_AMD
>>   extern int sched_policy;

#else
static const int sched_policy = KFD_SCHED_POLICY_HWS;
#endif

This way you don't need another set of ugly #ifdefs in amdgpu_device.c.

Regards,
   Felix


>> +#endif
>>
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>>   extern int amdgpu_si_support;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bd423dd64e18..2535db27f821 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1623,7 +1623,11 @@ static int amdgpu_device_ip_early_init(struct
>> amdgpu_device *adev)
>>  }
>>
>>  adev->pm.pp_feature = amdgpu_pp_feature_mask;
>> -if (amdgpu_sriov_vf(adev) || sched_policy ==
>> KFD_SCHED_POLICY_NO_HWS)
>> +if (amdgpu_sriov_vf(adev)
>> +#ifdef CONFIG_HSA_AMD
>> +|| sched_policy == KFD_SCHED_POLICY_NO_HWS
>> +#endif
>> +)
>>  adev->pm.pp_feature &= ~PP_GFXOFF_MASK;
>>
>>  for (i = 0; i < adev->num_ip_blocks; i++) {
>> --
>> 2.20.1

[PATCH 1/1] drm/amdgpu: Fix KFD-related kernel oops on Hawaii

2019-09-05 Thread Kuehling, Felix
Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.

Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 6882eeb93b4e..d81e141a33fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -141,7 +141,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
/* ring tests don't use a job */
if (job) {
vm = job->vm;
-   fence_ctx = job->base.s_fence->scheduled.context;
+   fence_ctx = job->base.s_fence ?
+   job->base.s_fence->scheduled.context : 0;
} else {
vm = NULL;
fence_ctx = 0;
-- 
2.17.1


Re: [PATCH] drm/amdkfd: Support Navi14 in KFD

2019-09-05 Thread Kuehling, Felix
On 2019-09-05 3:22 p.m., Zhao, Yong wrote:
> Change-Id: Ie2c6226022ff4d389eaa05b1c84afa7ae4cea0aa
> Signed-off-by: Yong Zhao 

Please add a change description. With that fixed, this patch is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 19 +++
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  1 +
>   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>   7 files changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 3d7d5eb9ed7a..333b44eb72e6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -671,6 +671,7 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   num_of_cache_types = ARRAY_SIZE(raven_cache_info);
>   break;
>   case CHIP_NAVI10:
> + case CHIP_NAVI14:
>   pcache_info = navi10_cache_info;
>   num_of_cache_types = ARRAY_SIZE(navi10_cache_info);
>   break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 444396a2fb0a..e71018b57784 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -369,6 +369,24 @@ static const struct kfd_device_info navi10_device_info = 
> {
>   .num_sdma_queues_per_engine = 8,
>   };
>   
> +static const struct kfd_device_info navi14_device_info = {
> + .asic_family = CHIP_NAVI14,
> + .asic_name = "navi14",
> + .max_pasid_bits = 16,
> + .max_no_of_hqd  = 24,
> + .doorbell_size  = 8,
> + .ih_ring_entry_size = 8 * sizeof(uint32_t),
> + .event_interrupt_class = &event_interrupt_class_v9,
> + .num_of_watch_points = 4,
> + .mqd_size_aligned = MQD_SIZE_ALIGNED,
> + .needs_iommu_device = false,
> + .supports_cwsr = true,
> + .needs_pci_atomics = false,
> + .num_sdma_engines = 2,
> + .num_xgmi_sdma_engines = 0,
> + .num_sdma_queues_per_engine = 8,
> +};
> +
>   /* For each entry, [0] is regular and [1] is virtualisation device. */
>   static const struct kfd_device_info *kfd_supported_devices[][2] = {
>   #ifdef KFD_SUPPORT_IOMMU_V2
> @@ -388,6 +406,7 @@ static const struct kfd_device_info 
> *kfd_supported_devices[][2] = {
>   [CHIP_VEGA20] = {&vega20_device_info, NULL},
>   [CHIP_ARCTURUS] = {&arcturus_device_info, &arcturus_device_info},
>   [CHIP_NAVI10] = {&navi10_device_info, NULL},
> + [CHIP_NAVI14] = {&navi14_device_info, NULL},
>   };
>   
>   static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 56639ee78608..9a7b512049d6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1908,6 +1908,7 @@ struct device_queue_manager 
> *device_queue_manager_init(struct kfd_dev *dev)
>   device_queue_manager_init_v9(&dqm->asic_ops);
>   break;
>   case CHIP_NAVI10:
> + case CHIP_NAVI14:
>   device_queue_manager_init_v10_navi10(&dqm->asic_ops);
>   break;
>   default:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> index ee7ff6b0541b..ed4efab0a190 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> @@ -412,6 +412,7 @@ int kfd_init_apertures(struct kfd_process *process)
>   case CHIP_RAVEN:
>   case CHIP_ARCTURUS:
>   case CHIP_NAVI10:
> + case CHIP_NAVI14:
>   kfd_init_apertures_v9(pdd, id);
>   break;
>   default:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 7a3b0482ab1a..1097e047b4bb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -368,6 +368,7 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev 
> *dev,
>   kernel_queue_init_v9(&kq->ops_asic_specific);
>   break;
>   case CHIP_NAVI10:
> + case CHIP_NAVI14:
>   kernel_queue_init_v10(&kq->ops_asic_specific);
>   break;
>   default:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 6cf12422a7d8..b7828a241981 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -243,6 +243,7 @@ int pm_init(struct 

Re: [PATCH v2 02/10] drm/amdkfd: add renoir kfd device info (v2)

2019-09-05 Thread Kuehling, Felix
On 2019-09-05 2:36 p.m., Huang, Ray wrote:
> This patch inits renoir kfd device info, so we treat renoir as "dgpu"
> (bypass iommu v2). Will enable needs_iommu_device till renoir iommu is ready.
>
> v2: rebase and align the drm-next
>
> Signed-off-by: Huang Rui 

The series is

Reviewed-by: Felix Kuehling 

Note that you'll need to rebase this patch again after Yong's fix for 
builds without IOMMUv2 support. The CHIP_RAVEN entry in the device table 
moved. CHIP_RENOIR should stay where it is, because for now Renoir 
support doesn't depend on IOMMUv2.

Regards,
   Felix


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 18 ++
>   1 file changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 267eb2e..c5120f3 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -351,6 +351,23 @@ static const struct kfd_device_info arcturus_device_info 
> = {
>   .num_sdma_queues_per_engine = 8,
>   };
>   
> +static const struct kfd_device_info renoir_device_info = {
> + .asic_family = CHIP_RENOIR,
> + .max_pasid_bits = 16,
> + .max_no_of_hqd  = 24,
> + .doorbell_size  = 8,
> + .ih_ring_entry_size = 8 * sizeof(uint32_t),
> + .event_interrupt_class = &event_interrupt_class_v9,
> + .num_of_watch_points = 4,
> + .mqd_size_aligned = MQD_SIZE_ALIGNED,
> + .supports_cwsr = true,
> + .needs_iommu_device = false,
> + .needs_pci_atomics = false,
> + .num_sdma_engines = 1,
> + .num_xgmi_sdma_engines = 0,
> + .num_sdma_queues_per_engine = 2,
> +};
> +
>   static const struct kfd_device_info navi10_device_info = {
>   .asic_family = CHIP_NAVI10,
>   .asic_name = "navi10",
> @@ -384,6 +401,7 @@ static const struct kfd_device_info 
> *kfd_supported_devices[][2] = {
>   [CHIP_VEGA12] = {&vega12_device_info, NULL},
>   [CHIP_VEGA20] = {&vega20_device_info, NULL},
>   [CHIP_RAVEN] = {&raven_device_info, NULL},
> + [CHIP_RENOIR] = {&renoir_device_info, NULL},
>   [CHIP_ARCTURUS] = {&arcturus_device_info, &arcturus_device_info},
>   [CHIP_NAVI10] = {&navi10_device_info, NULL},
>   };

Re: [PATCH] drm/amdkfd: Fix a building error when KFD_SUPPORT_IOMMU_V2 is turned off

2019-09-05 Thread Kuehling, Felix
On 2019-09-05 11:01, Zhao, Yong wrote:
> The issue was accidentally introduced recently.
>
> Change-Id: I3b21caa1596d4f7de1866bed1cb5d8fe1b51504c
> Signed-off-by: Yong Zhao 

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 267eb2e01bec..21f5c597e699 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -371,11 +371,14 @@ static const struct kfd_device_info navi10_device_info 
> = {
>   
>   /* For each entry, [0] is regular and [1] is virtualisation device. */
>   static const struct kfd_device_info *kfd_supported_devices[][2] = {
> +#ifdef KFD_SUPPORT_IOMMU_V2
>   [CHIP_KAVERI] = {&kaveri_device_info, NULL},
> + [CHIP_CARRIZO] = {&carrizo_device_info, NULL},
> + [CHIP_RAVEN] = {&raven_device_info, NULL},
> +#endif
>   [CHIP_HAWAII] = {&hawaii_device_info, NULL},
>   [CHIP_TONGA] = {&tonga_device_info, NULL},
>   [CHIP_FIJI] = {&fiji_device_info, &fiji_vf_device_info},
> - [CHIP_CARRIZO] = {&carrizo_device_info, NULL},
>   [CHIP_POLARIS10] = {&polaris10_device_info, &polaris10_vf_device_info},
>   [CHIP_POLARIS11] = {&polaris11_device_info, NULL},
>   [CHIP_POLARIS12] = {&polaris12_device_info, NULL},
> @@ -383,7 +386,6 @@ static const struct kfd_device_info 
> *kfd_supported_devices[][2] = {
>   [CHIP_VEGA10] = {&vega10_device_info, &vega10_vf_device_info},
>   [CHIP_VEGA12] = {&vega12_device_info, NULL},
>   [CHIP_VEGA20] = {&vega20_device_info, NULL},
> - [CHIP_RAVEN] = {&raven_device_info, NULL},
>   [CHIP_ARCTURUS] = {&arcturus_device_info, &arcturus_device_info},
>   [CHIP_NAVI10] = {&navi10_device_info, NULL},
>   };

[PATCH 1/1] drm/amdgpu: Disable retry faults in VMID0

2019-09-04 Thread Kuehling, Felix
There is no point retrying page faults in VMID0. Those faults are
always fatal.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 2 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 2 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c  | 2 ++
 5 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 6ce37ce77d14..9ec4297e61e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -178,6 +178,8 @@ static void gfxhub_v1_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
index 8b789f750b72..a9238735d361 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
@@ -166,6 +166,8 @@ static void gfxhub_v2_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index b9d6c0bfa594..4c7e8c64a94e 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -207,6 +207,8 @@ static void mmhub_v1_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(MMHUB, 0, mmVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(MMHUB, 0, mmVM_CONTEXT0_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
index 3542c203c3c8..86ed8cb915a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
@@ -152,6 +152,8 @@ static void mmhub_v2_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(MMHUB, 0, mmMMVM_CONTEXT0_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
index 0cf7ef44b4b5..657970f9ebfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
@@ -240,6 +240,8 @@ static void mmhub_v9_4_enable_system_domain(struct 
amdgpu_device *adev,
  hubid * MMHUB_INSTANCE_REGISTER_OFFSET);
tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15_OFFSET(MMHUB, 0, mmVML2VC0_VM_CONTEXT0_CNTL,
hubid * MMHUB_INSTANCE_REGISTER_OFFSET, tmp);
 }
-- 
2.17.1


Re: Graceful page fault handling for Vega/Navi

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 7:03 p.m., Huang, Ray wrote:
> On Wed, Sep 04, 2019 at 05:02:21PM +0200, Christian König wrote:
>> Hi everyone,
>>
>> this series is the next puzzle piece for recoverable page fault handling on 
>> Vega and Navi.
>>
>> It adds a new direct scheduler entity for VM updates which is then used to 
>> update page tables during a fault.
>>
>> In other words previously an application doing an invalid memory access 
>> would just hang and/or repeat the invalid access over and over again. Now 
>> the handling is modified so that the invalid memory access is redirected to 
>> the dummy page.
>>
>> This needs the following prerequisites:
>> a) The firmware must be new enough to allow re-routing of page faults.
>> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
> On my side, I found the "noretry" parameter does not work for VMID0 VM faults.
> If you see the same on your side, I'd like to give it a check.

I think the noretry parameter is not meant to affect VMID0. I just find 
it surprising that retry faults are happening at all on VMID0. It 
doesn't make a lot of sense. I can't think of any good reason to retry 
any page faults in VMID0.

I see that the HW default for the RETRY_PERMISSION_OR_INVALID_PAGE_FAULT 
bit is 1 in VM_CONTEXT0_CNTL. I don't see us changing that value in the 
driver. We probably should. I'll send out a patch for that.

Regards,
   Felix


>
> Thanks,
> Ray
>
>
>> c) Enough free VRAM to allocate page tables to point to the dummy page.
>>
>> The re-routing of page faults currently only works on Vega10, so Vega20 and 
>> Navi will still need some more time.
>>
>> Please review and/or comment,
>> Christian.
>>
>>

Re: Graceful page fault handling for Vega/Navi

2019-09-04 Thread Kuehling, Felix

On 2019-09-04 11:02 a.m., Christian König wrote:
> Hi everyone,
>
> this series is the next puzzle piece for recoverable page fault handling on 
> Vega and Navi.
>
> It adds a new direct scheduler entity for VM updates which is then used to 
> update page tables during a fault.
>
> In other words previously an application doing an invalid memory access would 
> just hang and/or repeat the invalid access over and over again. Now the 
> handling is modified so that the invalid memory access is redirected to the 
> dummy page.
>
> This needs the following prerequisites:
> a) The firmware must be new enough to allow re-routing of page faults.
> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
> c) Enough free VRAM to allocate page tables to point to the dummy page.
>
> The re-routing of page faults currently only works on Vega10, so Vega20 and 
> Navi will still need some more time.

Wait, we don't do the page fault rerouting on Vega20 yet? So we're 
getting the full brunt of the fault storm on the main interrupt ring? In 
that case, we should probably change the default setting of 
amdgpu.noretry=1 at least until that's done.

Other than that the patch series looks reasonable to me. I commented on 
patches 4 and 9 separately.

Patch 1 is Acked-by: Felix Kuehling 

With the issues addressed that I pointed out, the rest is

Reviewed-by: Felix Kuehling 

Regards,
   Felix


> Please review and/or comment,
> Christian.
>
>

Re: [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates.

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 11:02 a.m., Christian König wrote:
> For handling PDE updates directly in the fault handler.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 8 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 4 ++--
>   5 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index b0f0e060ded6..d3942d9306c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -343,7 +343,7 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct 
> amdgpu_sync *sync)
>   struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
>   int ret;
>   
> - ret = amdgpu_vm_update_directories(adev, vm);
> + ret = amdgpu_vm_update_pdes(adev, vm, false);
>   if (ret)
>   return ret;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 51f3db08b8eb..bd6b88827447 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -845,7 +845,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
> *p)
>   if (r)
>   return r;
>   
> - r = amdgpu_vm_update_directories(adev, vm);
> + r = amdgpu_vm_update_pdes(adev, vm, false);
>   if (r)
>   return r;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index e7af35c7080d..a621e629d876 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -521,7 +521,7 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device 
> *adev,
>   goto error;
>   }
>   
> - r = amdgpu_vm_update_directories(adev, vm);
> + r = amdgpu_vm_update_pdes(adev, vm, false);
>   
>   error:
>   if (r && r != -ERESTARTSYS)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index fc103a9f20c5..b6c89ba9281c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1221,18 +1221,19 @@ static void amdgpu_vm_invalidate_pds(struct 
> amdgpu_device *adev,
>   }
>   
>   /*
> - * amdgpu_vm_update_directories - make sure that all directories are valid
> + * amdgpu_vm_update_ - make sure that all directories are valid

Typo.

Regards,
   Felix


>*
>* @adev: amdgpu_device pointer
>* @vm: requested vm
> + * @direct: submit directly to the paging queue
>*
>* Makes sure all directories are up to date.
>*
>* Returns:
>* 0 for success, error for failure.
>*/
> -int amdgpu_vm_update_directories(struct amdgpu_device *adev,
> -  struct amdgpu_vm *vm)
> +int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> +   struct amdgpu_vm *vm, bool direct)
>   {
>   struct amdgpu_vm_update_params params;
>   int r;
> @@ -1243,6 +1244,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
> *adev,
>   memset(&params, 0, sizeof(params));
>   params.adev = adev;
>   params.vm = vm;
> + params.direct = direct;
>   
>   r = vm->update_funcs->prepare(&params, AMDGPU_FENCE_OWNER_VM, NULL);
>   if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 54dcd0bcce1a..0a97dc839f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -366,8 +366,8 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
> struct amdgpu_vm *vm,
> int (*callback)(void *p, struct amdgpu_bo *bo),
> void *param);
>   int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, bool 
> need_pipe_sync);
> -int amdgpu_vm_update_directories(struct amdgpu_device *adev,
> -  struct amdgpu_vm *vm);
> +int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> +   struct amdgpu_vm *vm, bool direct);
>   int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
> struct amdgpu_vm *vm,
> struct dma_fence **fence);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 11:02 a.m., Christian König wrote:
> Next step towards HMM support. For now just silence the retry fault and
> optionally redirect the request to the dummy page.
>
> v2: make sure the VM is not destroyed while we handle the fault.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>   3 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 951608fc1925..410d89966a66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>   }
>   }
>   }
> +
> +/**
> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
> + * @adev: amdgpu device pointer
> + * @pasid: PASID of the VM
> + * @addr: Address of the fault
> + *
> + * Try to gracefully handle a VM fault. Return true if the fault was handled 
> and
> + * shouldn't be reported any more.
> + */
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> + uint64_t addr)
> +{
> + struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
> + struct amdgpu_bo *root;
> + uint64_t value, flags;
> + struct amdgpu_vm *vm;
> + long r;
> +
> + if (!ring->sched.ready)
> + return false;
> +
> + spin_lock(&adev->vm_manager.pasid_lock);
> + vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> + if (vm)
> + root = amdgpu_bo_ref(vm->root.base.bo);
> + else
> + root = NULL;
> + spin_unlock(&adev->vm_manager.pasid_lock);
> +
> + if (!root)
> + return false;
> +
> + r = amdgpu_bo_reserve(root, true);
> + if (r)
> + goto error_unref;
> +
> + spin_lock(&adev->vm_manager.pasid_lock);
> + vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> + spin_unlock(&adev->vm_manager.pasid_lock);

I think this deserves a comment. If I understand it correctly, you're 
looking up the vm twice so that you have the VM root reservation to 
protect against use-after-free. Otherwise the vm pointer is only valid 
as long as you're holding the spin-lock.


> +
> + if (!vm || vm->root.base.bo != root)

The check of vm->root.base.bo should probably still be under the 
spin_lock. Because you're not sure yet it's the right VM, you can't rely 
on the reservation here to prevent use-after-free.
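
I.e. something like this (sketch only; the point is keeping the pointer
check under the lock):

    spin_lock(&adev->vm_manager.pasid_lock);
    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
    if (vm && vm->root.base.bo != root)
        vm = NULL; /* wrong VM; don't touch the pointer outside the lock */
    spin_unlock(&adev->vm_manager.pasid_lock);

    if (!vm)
        goto error_unlock;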


> + goto error_unlock;
> +
> + addr /= AMDGPU_GPU_PAGE_SIZE;
> + flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> + AMDGPU_PTE_SYSTEM;
> +
> + if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> + /* Redirect the access to the dummy page */
> + value = adev->dummy_page_addr;
> + flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> + AMDGPU_PTE_WRITEABLE;
> + } else {
> + value = 0;
> + }
> +
> + r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
> + flags, value, NULL, NULL);
> + if (r)
> + goto error_unlock;
> +
> + r = amdgpu_vm_update_pdes(adev, vm, true);
> +
> +error_unlock:
> + amdgpu_bo_unreserve(root);
> + if (r < 0)
> + DRM_ERROR("Can't handle page fault (%ld)\n", r);
> +
> +error_unref:
> + amdgpu_bo_unref(&root);
> +
> + return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 0a97dc839f3b..4dbbe1b6b413 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device 
> *adev);
>   
>   void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>struct amdgpu_task_info *task_info);
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> + uint64_t addr);
>   
>   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 9d15679df6e0..15a1ce51befa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct 
> amdgpu_device *adev,
>   }
>   
>   /* If it's the first fault for this address, process it normally */
> + if (retry_fault && !in_interrupt() &&
> + amdgpu_vm_handle_fault(adev, entry->pasid, addr))
> + return 1; /* This also prevents sending it to KFD */

The !in_interrupt() is meant to only do this on the rerouted interrupt 
ring that's handled by a worker function?

Looks like amdgpu_vm_handle_fault never returns true for now. So we'll 

Re: [PATCH 2/2] drm/amdgpu: cleanup PTE flag generation v2

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 8:17 a.m., Christian König wrote:
> Move the ASIC specific code into a new callback function.
>
> v2: mask the flags for SI and CIK instead of a BUG_ON().
>
> Signed-off-by: Christian König 

Nit-pick inline. Otherwise the series is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 29 ++---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 22 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   |  9 
>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 11 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 24 +++-
>   7 files changed, 82 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index 6a74059b776c..232a8ff5642b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -104,6 +104,10 @@ struct amdgpu_gmc_funcs {
>   /* get the pde for a given mc addr */
>   void (*get_vm_pde)(struct amdgpu_device *adev, int level,
>  u64 *dst, u64 *flags);
> + /* get the pte flags to use for a BO VA mapping */
> + void (*get_vm_pte)(struct amdgpu_device *adev,
> +struct amdgpu_bo_va_mapping *mapping,
> +uint64_t *flags);
>   };
>   
>   struct amdgpu_xgmi {
> @@ -185,6 +189,7 @@ struct amdgpu_gmc {
>   #define amdgpu_gmc_emit_pasid_mapping(r, vmid, pasid) 
> (r)->adev->gmc.gmc_funcs->emit_pasid_mapping((r), (vmid), (pasid))
>   #define amdgpu_gmc_map_mtype(adev, flags) 
> (adev)->gmc.gmc_funcs->map_mtype((adev),(flags))
>   #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) 
> (adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags))
> +#define amdgpu_gmc_get_vm_pte(adev, mapping, flags) 
> (adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags))
>   
>   /**
>* amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the 
> BAR
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 2cb82b229802..b285ab25146d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1571,33 +1571,8 @@ static int amdgpu_vm_bo_split_mapping(struct 
> amdgpu_device *adev,
>   if (!(mapping->flags & AMDGPU_PTE_WRITEABLE))
>   flags &= ~AMDGPU_PTE_WRITEABLE;
>   
> - if (adev->asic_type >= CHIP_TONGA) {
> - flags &= ~AMDGPU_PTE_EXECUTABLE;
> - flags |= mapping->flags & AMDGPU_PTE_EXECUTABLE;
> - }
> -
> - if (adev->asic_type >= CHIP_NAVI10) {
> - flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> - flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> - } else {
> - flags &= ~AMDGPU_PTE_MTYPE_VG10_MASK;
> - flags |= (mapping->flags & AMDGPU_PTE_MTYPE_VG10_MASK);
> - }
> -
> - if ((mapping->flags & AMDGPU_PTE_PRT) &&
> - (adev->asic_type >= CHIP_VEGA10)) {
> - flags |= AMDGPU_PTE_PRT;
> - if (adev->asic_type >= CHIP_NAVI10) {
> - flags |= AMDGPU_PTE_SNOOPED;
> - flags |= AMDGPU_PTE_LOG;
> - flags |= AMDGPU_PTE_SYSTEM;
> - }
> - flags &= ~AMDGPU_PTE_VALID;
> - }
> - if (adev->asic_type == CHIP_ARCTURUS &&
> - !(flags & AMDGPU_PTE_SYSTEM) &&
> - mapping->bo_va->is_xgmi)
> - flags |= AMDGPU_PTE_SNOOPED;
> + /* Apply ASIC specific mapping flags */
> + amdgpu_gmc_get_vm_pte(adev, mapping, &flags);
>   
>   trace_amdgpu_vm_bo_update(mapping);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 7eb0ba87fef9..1a8d8f528b01 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -440,12 +440,32 @@ static void gmc_v10_0_get_vm_pde(struct amdgpu_device 
> *adev, int level,
>   }
>   }
>   
> +static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
> +  struct amdgpu_bo_va_mapping *mapping,
> +  uint64_t *flags)
> +{
> + *flags &= ~AMDGPU_PTE_EXECUTABLE;
> + *flags |= mapping->flags & AMDGPU_PTE_EXECUTABLE;
> +
> + *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> + *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> +
> + if (mapping->flags & AMDGPU_PTE_PRT) {
> + *flags |= AMDGPU_PTE_PRT;
> + *flags |= AMDGPU_PTE_SNOOPED;
> + *flags |= AMDGPU_PTE_LOG;
> + *flags |= AMDGPU_PTE_SYSTEM;
> + *flags &= ~AMDGPU_PTE_VALID;
> + }
> +}
> +
>   static const struct amdgpu_gmc_funcs gmc_v10_0_gmc_funcs = {
>   .flush_gpu_tlb = gmc_v10_0_flush_gpu_tlb,
>   .emit_flush_gpu_tlb = 

Re: [PATCH 00/10] drm/amdkfd: introduce the kfd support for Renoir

2019-09-04 Thread Kuehling, Felix
Patches 2-10 are

Reviewed-by: Felix Kuehling 

See my comments on patches 1 and 2 in separate emails. In patch 1 it 
looks like you're based on an outdated version of the branch or a 
different branch altogether. Please check that your series is based on 
the latest amd-staging-drm-next.

Regards,
   Felix

On 2019-09-04 11:48 a.m., Huang, Ray wrote:
> Hi all,
>
> This patch set provides the kfd support for the ROCm stack on Renoir APUs.
>
> Thanks,
> Ray
>
> Huang Rui (10):
>drm/amdkfd: add renoir cache info for CRAT
>drm/amdkfd: add renoir kfd device info
>drm/amdkfd: enable kfd device queue manager v9 for renoir
>drm/amdkfd: add renoir typs for the workaround of iommu v2
>drm/amdkfd: init kfd apertures v9 for renoir
>drm/amdkfd: init kernel queue for renoir
>drm/amdkfd: add package manager for renoir
>drm/amdkfd: add renoir kfd topology
>drm/amdgpu: disable gfxoff while use no H/W scheduling policy
>drm/amdkfd: enable renoir while device probes
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 19 
> +++
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  3 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>   11 files changed, 33 insertions(+), 2 deletions(-)
>

Re: [PATCH 02/10] drm/amdkfd: add renoir kfd device info

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 11:48 a.m., Huang, Ray wrote:
> This patch inits renoir kfd device info, so we treat renoir as "dgpu"
> (bypass iommu v2). We will enable needs_iommu_device once renoir iommu is ready.
>
> Signed-off-by: Huang Rui 

Looks good, but please coordinate with Yong who is changing the 
structure of the kfd device info table. See his patch "drm/amdkfd: Query 
kfd device info by CHIP id instead of pci device id" for details. 
Whoever goes in second will need to rebase and fix the conflict.
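
For illustration, the reworked lookup might take roughly this shape
(purely an assumption about Yong's patch; the function name and cases
are hypothetical):

static const struct kfd_device_info *
lookup_device_info(enum amd_asic_type asic_type)
{
        /* select kfd_device_info by CHIP id instead of walking the
         * PCI device-id table */
        switch (asic_type) {
        case CHIP_RAVEN:
                return &raven_device_info;
        case CHIP_RENOIR:
                return &renoir_device_info;
        default:
                return NULL;
        }
}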

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 19 +++
>   1 file changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 2514263..1f65585 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -317,6 +317,23 @@ static const struct kfd_device_info vega20_device_info = 
> {
>   .num_sdma_queues_per_engine = 8,
>   };
>   
> +static const struct kfd_device_info renoir_device_info = {
> + .asic_family = CHIP_RENOIR,
> + .max_pasid_bits = 16,
> + .max_no_of_hqd  = 24,
> + .doorbell_size  = 8,
> + .ih_ring_entry_size = 8 * sizeof(uint32_t),
> + .event_interrupt_class = &event_interrupt_class_v9,
> + .num_of_watch_points = 4,
> + .mqd_size_aligned = MQD_SIZE_ALIGNED,
> + .supports_cwsr = true,
> + .needs_iommu_device = false,
> + .needs_pci_atomics = false,
> + .num_sdma_engines = 1,
> + .num_xgmi_sdma_engines = 0,
> + .num_sdma_queues_per_engine = 2,
> +};
> +
>   static const struct kfd_device_info navi10_device_info = {
>   .asic_family = CHIP_NAVI10,
>   .max_pasid_bits = 16,
> @@ -452,6 +469,8 @@ static const struct kfd_deviceid supported_devices[] = {
>   { 0x66a4, &vega20_device_info },/* Vega20 */
>   { 0x66a7, &vega20_device_info },/* Vega20 */
>   { 0x66af, &vega20_device_info },/* Vega20 */
> + /* Renoir */
> + { 0x1636, &renoir_device_info },/* Renoir */
>   /* Navi10 */
>   { 0x7310, &navi10_device_info },/* Navi10 */
>   { 0x7312, &navi10_device_info },/* Navi10 */

Re: [PATCH 01/10] drm/amdkfd: add renoir cache info for CRAT

2019-09-04 Thread Kuehling, Felix
On 2019-09-04 11:48 a.m., Huang, Ray wrote:
> Renoir's cache info should be the same as raven's and carrizo's.
>
> Signed-off-by: Huang Rui 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index a84c810..2a428c9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -138,6 +138,7 @@ static struct kfd_gpu_cache_info carrizo_cache_info[] = {
>   /* TODO - check & update Vega10 cache details */
>   #define vega10_cache_info carrizo_cache_info
>   #define raven_cache_info carrizo_cache_info
> +#define renoir_cache_info carrizo_cache_info
>   /* TODO - check & update Navi10 cache details */
>   #define navi10_cache_info carrizo_cache_info
>   
> @@ -668,6 +669,9 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   case CHIP_RAVEN:
>   pcache_info = raven_cache_info;
>   num_of_cache_types = ARRAY_SIZE(raven_cache_info);
> + case CHIP_RENOIR:
> + pcache_info = renoir_cache_info;
> + num_of_cache_types = ARRAY_SIZE(renoir_cache_info);

I just noticed, there are break; statements missing here. Which branch 
are your changes based on? At least the surrounding code looks OK on 
amd-staging-drm-next, but seems to be missing a break statement at least 
in the Raven case here.
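
For clarity, the fixed cases in kfd_fill_gpu_cache_info() would look
like this, with the missing break statements added:

        case CHIP_RAVEN:
                pcache_info = raven_cache_info;
                num_of_cache_types = ARRAY_SIZE(raven_cache_info);
                break;
        case CHIP_RENOIR:
                pcache_info = renoir_cache_info;
                num_of_cache_types = ARRAY_SIZE(renoir_cache_info);
                break;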

Regards,
   Felix


>   case CHIP_NAVI10:
>   pcache_info = navi10_cache_info;
>   num_of_cache_types = ARRAY_SIZE(navi10_cache_info);

Re: [PATCH 2/3] drm/amdgpu: reserve at least 4MB of VRAM for page tables v2

2019-09-03 Thread Kuehling, Felix
On 2019-09-03 5:09 a.m., Christian König wrote:
> This hopefully helps reduce the contention for page tables.
>
> v2: adjust maximum reported VRAM size as well
>
> Signed-off-by: Christian König 

I'll need to do something similar (and also take the vram_pin_size into 
account) in amdgpu_amdkfd_reserve_mem_limit. It doesn't even account for 
the vram_pin_size at the moment, which I should fix too. Otherwise this 
commit looks good to me.
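
A rough sketch of the analogous accounting on the KFD side (field
names taken from the patch below; the surrounding logic is assumed):

        /* Usable VRAM for KFD allocations should subtract both the
         * pinned VRAM and the VRAM reserved for page tables. */
        uint64_t vram_available = adev->gmc.real_vram_size -
                                  atomic64_read(&adev->vram_pin_size) -
                                  AMDGPU_VM_RESERVED_VRAM;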

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 18 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   |  3 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c |  9 +++--
>   3 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 6cf61e01041f..5bc20d84351d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -616,9 +616,12 @@ static int amdgpu_info_ioctl(struct drm_device *dev, 
> void *data, struct drm_file
>   struct drm_amdgpu_info_vram_gtt vram_gtt;
>   
>   vram_gtt.vram_size = adev->gmc.real_vram_size -
> - atomic64_read(&adev->vram_pin_size);
> - vram_gtt.vram_cpu_accessible_size = adev->gmc.visible_vram_size 
> -
> - atomic64_read(&adev->visible_pin_size);
> + atomic64_read(&adev->vram_pin_size) -
> + AMDGPU_VM_RESERVED_VRAM;
> + vram_gtt.vram_cpu_accessible_size =
> + min(adev->gmc.visible_vram_size -
> + atomic64_read(&adev->visible_pin_size),
> + vram_gtt.vram_size);
>   vram_gtt.gtt_size = adev->mman.bdev.man[TTM_PL_TT].size;
>   vram_gtt.gtt_size *= PAGE_SIZE;
>   vram_gtt.gtt_size -= atomic64_read(&adev->gart_pin_size);
> @@ -631,15 +634,18 @@ static int amdgpu_info_ioctl(struct drm_device *dev, 
> void *data, struct drm_file
>   memset(, 0, sizeof(mem));
>   mem.vram.total_heap_size = adev->gmc.real_vram_size;
>   mem.vram.usable_heap_size = adev->gmc.real_vram_size -
> - atomic64_read(>vram_pin_size);
> + atomic64_read(&adev->vram_pin_size) -
> + AMDGPU_VM_RESERVED_VRAM;
>   mem.vram.heap_usage =
>   
> amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>   mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;
>   
>   mem.cpu_accessible_vram.total_heap_size =
>   adev->gmc.visible_vram_size;
> - mem.cpu_accessible_vram.usable_heap_size = 
> adev->gmc.visible_vram_size -
> - atomic64_read(&adev->visible_pin_size);
> + mem.cpu_accessible_vram.usable_heap_size =
> + min(adev->gmc.visible_vram_size -
> + atomic64_read(&adev->visible_pin_size),
> + mem.vram.usable_heap_size);
>   mem.cpu_accessible_vram.heap_usage =
>   
> amdgpu_vram_mgr_vis_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>   mem.cpu_accessible_vram.max_allocation =
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 2eda3a8c330d..3352a87b822e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -99,6 +99,9 @@ struct amdgpu_bo_list_entry;
>   #define AMDGPU_VM_FAULT_STOP_FIRST  1
>   #define AMDGPU_VM_FAULT_STOP_ALWAYS 2
>   
> +/* Reserve 4MB VRAM for page tables */
> +#define AMDGPU_VM_RESERVED_VRAM  (4ULL << 20)
> +
>   /* max number of VMHUB */
>   #define AMDGPU_MAX_VMHUBS   3
>   #define AMDGPU_GFXHUB_0 0
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 1150e34bc28f..59440f71d304 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -24,6 +24,7 @@
>   
>   #include 
>   #include "amdgpu.h"
> +#include "amdgpu_vm.h"
>   
>   struct amdgpu_vram_mgr {
>   struct drm_mm mm;
> @@ -276,7 +277,7 @@ static int amdgpu_vram_mgr_new(struct 
> ttm_mem_type_manager *man,
>   struct drm_mm_node *nodes;
>   enum drm_mm_insert_mode mode;
>   unsigned long lpfn, num_nodes, pages_per_node, pages_left;
> - uint64_t vis_usage = 0, mem_bytes;
> + uint64_t vis_usage = 0, mem_bytes, max_bytes;
>   unsigned i;
>   int r;
>   
> @@ -284,9 +285,13 @@ static int amdgpu_vram_mgr_new(struct 
> ttm_mem_type_manager *man,
>   if (!lpfn)
>   lpfn = man->size;
>   
> + max_bytes = adev->gmc.mc_vram_size;
> + if (tbo->type != ttm_bo_type_kernel)
> + max_bytes -= AMDGPU_VM_RESERVED_VRAM;
> +
>   /* bail out quickly if there's likely not enough 

Re: [PATCH 1/3] drm/amdgpu: use moving fence instead of exclusive for VM updates

2019-09-03 Thread Kuehling, Felix
On 2019-09-02 6:52 a.m., Christian König wrote:
> Make VM updates depend on the moving fence instead of the exclusive one.

In effect, this makes the page table update depend on the last move of 
the BO, rather than the last change of the buffer contents. Makes sense.
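
Annotated, the one-line change amounts to:

        /* before: wait for the last writer of the buffer contents */
        exclusive = reservation_object_get_excl(bo->tbo.resv);

        /* after: wait only for the last move of the backing storage,
         * which is what actually affects the page table entries */
        exclusive = bo->tbo.moving;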

Reviewed-by: Felix Kuehling 


>
> Makes it less likely to actually have a dependency.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 189ad5699946..501e13420786 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1706,7 +1706,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
>   ttm = container_of(bo->tbo.ttm, struct ttm_dma_tt, ttm);
>   pages_addr = ttm->dma_address;
>   }
> - exclusive = reservation_object_get_excl(bo->tbo.resv);
> + exclusive = bo->tbo.moving;
>   }
>   
>   if (bo) {

Re: [PATCH 2/2] drm/amdgpu: cleanup PTE flag generation

2019-09-03 Thread Kuehling, Felix
On 2019-09-02 10:57 a.m., Christian König wrote:
> Move the ASIC specific code into a new callback function.

NAK. I believe the BUG_ONs you're adding will trigger with ROCm on 
Hawaii and Kaveri. See inline ...

ROCm user mode doesn't care that the page table doesn't have an 
executable bit on those chips. If the HW makes all memory executable, we 
should just ignore the flag.
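
In other words, the older ASICs should mask the unsupported bits
instead of asserting. A sketch of what that could look like for GFXv7
(the v2 approach, not the patch as posted):

static void gmc_v7_0_get_vm_pte(struct amdgpu_device *adev,
                                struct amdgpu_bo_va_mapping *mapping,
                                uint64_t *flags)
{
        /* Pre-Tonga HW has no executable bit; all memory is
         * executable anyway, so silently drop the flag. */
        *flags &= ~AMDGPU_PTE_EXECUTABLE;

        /* No PRT support on this generation either. */
        *flags &= ~AMDGPU_PTE_PRT;
}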


>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 29 ++---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 22 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   |  9 
>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 11 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 24 +++-
>   7 files changed, 82 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index d5e4574afbc2..d3be51ba6349 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -104,6 +104,10 @@ struct amdgpu_gmc_funcs {
>   /* get the pde for a given mc addr */
>   void (*get_vm_pde)(struct amdgpu_device *adev, int level,
>  u64 *dst, u64 *flags);
> + /* get the pte flags to use for a BO VA mapping */
> + void (*get_vm_pte)(struct amdgpu_device *adev,
> +struct amdgpu_bo_va_mapping *mapping,
> +uint64_t *flags);
>   };
>   
>   struct amdgpu_xgmi {
> @@ -185,6 +189,7 @@ struct amdgpu_gmc {
>   #define amdgpu_gmc_emit_pasid_mapping(r, vmid, pasid) 
> (r)->adev->gmc.gmc_funcs->emit_pasid_mapping((r), (vmid), (pasid))
>   #define amdgpu_gmc_map_mtype(adev, flags) 
> (adev)->gmc.gmc_funcs->map_mtype((adev),(flags))
>   #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) 
> (adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags))
> +#define amdgpu_gmc_get_vm_pte(adev, mapping, flags) 
> (adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags))
>   
>   /**
>* amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the 
> BAR
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 2cb82b229802..b285ab25146d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1571,33 +1571,8 @@ static int amdgpu_vm_bo_split_mapping(struct 
> amdgpu_device *adev,
>   if (!(mapping->flags & AMDGPU_PTE_WRITEABLE))
>   flags &= ~AMDGPU_PTE_WRITEABLE;
>   
> - if (adev->asic_type >= CHIP_TONGA) {
> - flags &= ~AMDGPU_PTE_EXECUTABLE;
> - flags |= mapping->flags & AMDGPU_PTE_EXECUTABLE;
> - }
> -
> - if (adev->asic_type >= CHIP_NAVI10) {
> - flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> - flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> - } else {
> - flags &= ~AMDGPU_PTE_MTYPE_VG10_MASK;
> - flags |= (mapping->flags & AMDGPU_PTE_MTYPE_VG10_MASK);
> - }
> -
> - if ((mapping->flags & AMDGPU_PTE_PRT) &&
> - (adev->asic_type >= CHIP_VEGA10)) {
> - flags |= AMDGPU_PTE_PRT;
> - if (adev->asic_type >= CHIP_NAVI10) {
> - flags |= AMDGPU_PTE_SNOOPED;
> - flags |= AMDGPU_PTE_LOG;
> - flags |= AMDGPU_PTE_SYSTEM;
> - }
> - flags &= ~AMDGPU_PTE_VALID;
> - }
> - if (adev->asic_type == CHIP_ARCTURUS &&
> - !(flags & AMDGPU_PTE_SYSTEM) &&
> - mapping->bo_va->is_xgmi)
> - flags |= AMDGPU_PTE_SNOOPED;
> + /* Apply ASIC specific mapping flags */
> + amdgpu_gmc_get_vm_pte(adev, mapping, &flags);
>   
>   trace_amdgpu_vm_bo_update(mapping);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 7eb0ba87fef9..1a8d8f528b01 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -440,12 +440,32 @@ static void gmc_v10_0_get_vm_pde(struct amdgpu_device 
> *adev, int level,
>   }
>   }
>   
> +static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
> +  struct amdgpu_bo_va_mapping *mapping,
> +  uint64_t *flags)
> +{
> + *flags &= ~AMDGPU_PTE_EXECUTABLE;
> + *flags |= mapping->flags & AMDGPU_PTE_EXECUTABLE;
> +
> + *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> + *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> +
> + if (mapping->flags & AMDGPU_PTE_PRT) {
> + *flags |= AMDGPU_PTE_PRT;
> + *flags |= AMDGPU_PTE_SNOOPED;
> + *flags |= AMDGPU_PTE_LOG;
> + *flags |= AMDGPU_PTE_SYSTEM;
> + *flags &= ~AMDGPU_PTE_VALID;
> + }
> +}
> +
>   static 

Re: [PATCH v3 2/3] drm/amdgpu: Avoid HW GPU reset for RAS.

2019-08-30 Thread Kuehling, Felix
On 2019-08-30 12:39 p.m., Andrey Grodzovsky wrote:
> Problem:
> Under certain conditions, when some IP blocks take a RAS error,
> we can get into a situation where a GPU reset is not possible
> due to issues in RAS in SMU/PSP.
>
> Temporary fix until proper solution in PSP/SMU is ready:
> When an uncorrectable error happens, the DF will unconditionally
> broadcast error event packets to all its clients/slaves upon
> receiving the fatal error event and freeze all its outbound queues;
> the err_event_athub interrupt will be triggered.
> In such a case we use this interrupt
> to issue the GPU reset. The GPU reset code is modified for this case to avoid a HW
> reset: it only stops the schedulers, detaches all in-progress and not yet scheduled
> jobs' fences, sets an error code on them and signals them.
> Also reject any new incoming job submissions from user space.
> All this is done to notify the applications of the problem.
>
> v2:
> Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
> Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
> Remove print param from amdgpu_ras_query_error_count
>
> v3:
> Update based on previous bug fixing patch to properly call 
> amdgpu_amdkfd_pre_reset
> for other XGMI hive members.
>
> Signed-off-by: Andrey Grodzovsky 

The KFD part looks good to me. Acked-by: Felix Kuehling 
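
For reference, failing the pending job fences presumably boils down to
something like this per job in amdgpu_job_stop_all_jobs_on_sched (a
sketch; the scheduler list walking is omitted):

        struct drm_sched_fence *s_fence = s_job->s_fence;

        /* Complete the fence with an error so user space waiting on
         * it learns about the RAS event instead of hanging. */
        dma_fence_signal(&s_fence->scheduled);
        dma_fence_set_error(&s_fence->finished, -EHWPOISON);
        dma_fence_signal(&s_fence->finished);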



> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 
> ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  5 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 38 
> ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  3 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 22 +++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 10 
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 10 
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 24 ++-
>   drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c |  5 
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 32 +
>   12 files changed, 155 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d860170..494c384 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -38,6 +38,7 @@
>   #include "amdgpu_gmc.h"
>   #include "amdgpu_gem.h"
>   #include "amdgpu_display.h"
> +#include "amdgpu_ras.h"
>   
>   static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,
> struct drm_amdgpu_cs_chunk_fence *data,
> @@ -1438,6 +1439,9 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
> struct drm_file *filp)
>   bool reserved_buffers = false;
>   int i, r;
>   
> + if (amdgpu_ras_intr_triggered())
> + return -EHWPOISON;
> +
>   if (!adev->accel_working)
>   return -EBUSY;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 19f6624..c9825ae 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3727,25 +3727,18 @@ static bool amdgpu_device_lock_adev(struct 
> amdgpu_device *adev, bool trylock)
>   adev->mp1_state = PP_MP1_STATE_NONE;
>   break;
>   }
> - /* Block kfd: SRIOV would do it separately */
> - if (!amdgpu_sriov_vf(adev))
> -amdgpu_amdkfd_pre_reset(adev);
>   
>   return true;
>   }
>   
>   static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
>   {
> - /*unlock kfd: SRIOV would do it separately */
> - if (!amdgpu_sriov_vf(adev))
> -amdgpu_amdkfd_post_reset(adev);
>   amdgpu_vf_error_trans_all(adev);
>   adev->mp1_state = PP_MP1_STATE_NONE;
>   adev->in_gpu_reset = 0;
>   mutex_unlock(&adev->lock_reset);
>   }
>   
> -
>   /**
>* amdgpu_device_gpu_recover - reset the asic and recover scheduler
>*
> @@ -3765,11 +3758,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   struct amdgpu_hive_info *hive = NULL;
>   struct amdgpu_device *tmp_adev = NULL;
>   int i, r = 0;
> + bool in_ras_intr = amdgpu_ras_intr_triggered();
>   
>   need_full_reset = job_signaled = false;
>   INIT_LIST_HEAD(&device_list);
>   
> - dev_info(adev->dev, "GPU reset begin!\n");
> + dev_info(adev->dev, "GPU %s begin!\n", in_ras_intr ? "jobs 
> stop":"reset");
>   
>   cancel_delayed_work_sync(&adev->delayed_init_work);
>   
> @@ -3796,9 +3790,16 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   return 0;
>   }
>   
> + /* Block kfd: SRIOV would do it separately */
> + if (!amdgpu_sriov_vf(adev))
> +amdgpu_amdkfd_pre_reset(adev);
> +
>   /* Build list of devices to reset */
>   if  

Re: [PATCH v3 1/3] drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.

2019-08-30 Thread Kuehling, Felix
On 2019-08-30 12:39 p.m., Andrey Grodzovsky wrote:
> Issue 1:
> In the XGMI case amdgpu_device_lock_adev for the other devices in the hive
> was called too late, after access to their respective schedulers.
> So relocate the lock to the beginning of accessing the other devs.
>
> Issue 2:
> Using amdgpu_device_ip_need_full_reset to switch the device list from
> all devices in hive to the single 'master' device who owns this reset
> call is wrong because when stopping schedulers we iterate all the devices
> in hive but when restarting we will only reactivate the 'master' device.
> Also, in case amdgpu_device_pre_asic_reset concludes that a full reset IS
> needed we then have to stop schedulers for all devices in hive and not
> only the 'master' but with amdgpu_device_ip_need_full_reset  we
> already missed the opportunity to do so. So just remove this logic and
> always stop and start all schedulers for all devices in hive.
>
> Also minor cleanup and print fix.
>
> Signed-off-by: Andrey Grodzovsky 

Minor nit-pick inline. With that fixed this patch is Acked-by: Felix 
Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 +++--
>   1 file changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a5daccc..19f6624 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3814,15 +3814,16 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   device_list_handle = &device_list;
>   }
>   
> - /*
> -  * Mark these ASICs to be reseted as untracked first
> -  * And add them back after reset completed
> -  */
> - list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head)
> - amdgpu_unregister_gpu_instance(tmp_adev);
> -
>   /* block all schedulers and reset given job's ring */
>   list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> + if (tmp_adev != adev)
> + amdgpu_device_lock_adev(tmp_adev, false);
> + /*
> +  * Mark these ASICs to be reseted as untracked first
> +  * And add them back after reset completed
> +  */
> + amdgpu_unregister_gpu_instance(tmp_adev);
> +
>   /* disable ras on ALL IPs */
>   if (amdgpu_device_ip_need_full_reset(tmp_adev))
>   amdgpu_ras_suspend(tmp_adev);
> @@ -3848,9 +3849,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   dma_fence_is_signaled(job->base.s_fence->parent))
>   job_signaled = true;
>   
> - if (!amdgpu_device_ip_need_full_reset(adev))
> - device_list_handle = &device_list;
> -
>   if (job_signaled) {
>   dev_info(adev->dev, "Guilty job already signaled, skipping HW 
> reset");
>   goto skip_hw_reset;
> @@ -3869,10 +3867,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   retry:  /* Rest of adevs pre asic reset from XGMI hive. */
>   list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>   
> - if (tmp_adev == adev)
> + if(tmp_adev == adev)

The space before ( was correct coding style. This will trigger a 
checkpatch error or warning.


>   continue;
>   
> - amdgpu_device_lock_adev(tmp_adev, false);
>   r = amdgpu_device_pre_asic_reset(tmp_adev,
>NULL,
>&need_full_reset);
> @@ -3921,10 +3918,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   
>   if (r) {
>   /* bad news, how to tell it to userspace ? */
> - dev_info(tmp_adev->dev, "GPU reset(%d) failed\n", 
> atomic_read(&adev->gpu_reset_counter));
> + dev_info(tmp_adev->dev, "GPU reset(%d) failed\n", 
> atomic_read(&tmp_adev->gpu_reset_counter));
>   amdgpu_vf_error_put(tmp_adev, 
> AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
>   } else {
> - dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n", 
> atomic_read(&adev->gpu_reset_counter));
> + dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n", 
> atomic_read(&tmp_adev->gpu_reset_counter));
>   }
>   
>   amdgpu_device_unlock_adev(tmp_adev);

[PATCH 2/2] drm/amdgpu: Disable page faults while reading user wptrs

2019-08-29 Thread Kuehling, Felix
These wptrs must be pinned and GPU accessible when this is called
from hqd_load functions. So they should never fault. This resolves
a circular lock dependency issue involving four locks including the
DQM lock and mmap_sem.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 1af8f83f7e02..c003d9275837 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -179,10 +179,17 @@ uint64_t amdgpu_amdkfd_get_mmio_remap_phys_addr(struct 
kgd_dev *kgd);
 uint32_t amdgpu_amdkfd_get_num_gws(struct kgd_dev *kgd);
 uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev 
*src);
 
+/* Read user wptr from a specified user address space with page fault
+ * disabled. The memory must be pinned and mapped to the hardware when
+ * this is called in hqd_load functions, so it should never fault in
+ * the first place. This resolves a circular lock dependency involving
+ * four locks, including the DQM lock and mmap_sem.
+ */
 #define read_user_wptr(mmptr, wptr, dst)   \
({  \
bool valid = false; \
if ((mmptr) && (wptr)) {\
+   pagefault_disable();\
if ((mmptr) == current->mm) {   \
valid = !get_user((dst), (wptr));   \
} else if (current->mm == NULL) {   \
@@ -190,6 +197,7 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
valid = !get_user((dst), (wptr));   \
unuse_mm(mmptr);\
}   \
+   pagefault_enable(); \
}   \
valid;  \
})
-- 
2.17.1
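
Typical hqd_load-style usage might look like the following (the
register write is illustrative):

        uint64_t wptr_val = 0;

        /* wptr is pinned and GPU-mapped here, so the faults-disabled
         * read inside read_user_wptr() should always succeed. */
        if (read_user_wptr(mm, wptr, wptr_val))
                WREG32(mmCP_HQD_PQ_WPTR, lower_32_bits(wptr_val));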


[PATCH 1/2] drm/amdgpu: Remove unnecessary TLB workaround

2019-08-29 Thread Kuehling, Felix
This workaround is better handled in user mode in a way that doesn't
require allocating extra memory and breaking userptr BOs.

The TLB bug is a performance bug, not a functional or security bug.
Hence it is safe to remove this kernel part of the workaround to
allow a better workaround using only virtual address alignments in
user mode.
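
A sketch of the user-mode side: choose a virtual address alignment
from the buffer size so a large buffer never straddles the problematic
TLB boundary (the power-of-two policy and the 32 MB cap here are
illustrative, not taken from the actual user-mode change):

static uint64_t choose_va_alignment(uint64_t size, uint64_t page_size)
{
        uint64_t align = page_size;

        /* align large buffers to the next power of two of their
         * size, capped at an illustrative 32 MB */
        while (align < size && align < (1ULL << 25))
                align <<= 1;

        return align;
}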

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 42d209f5fd18..2c73ea7c425c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1110,7 +1110,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
uint64_t user_addr = 0;
struct amdgpu_bo *bo;
struct amdgpu_bo_param bp;
-   int byte_align;
u32 domain, alloc_domain;
u64 alloc_flags;
int ret;
@@ -1165,15 +1164,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
if ((*mem)->aql_queue)
size = size >> 1;
 
-   /* Workaround for TLB bug on older VI chips */
-   byte_align = (adev->family == AMDGPU_FAMILY_VI &&
-   adev->asic_type != CHIP_FIJI &&
-   adev->asic_type != CHIP_POLARIS10 &&
-   adev->asic_type != CHIP_POLARIS11 &&
-   adev->asic_type != CHIP_POLARIS12 &&
-   adev->asic_type != CHIP_VEGAM) ?
-   VI_BO_SIZE_ALIGN : 1;
-
(*mem)->alloc_flags = flags;
 
amdgpu_sync_create(&(*mem)->sync);
@@ -1189,7 +1179,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 
	memset(&bp, 0, sizeof(bp));
bp.size = size;
-   bp.byte_align = byte_align;
+   bp.byte_align = 1;
bp.domain = alloc_domain;
bp.flags = alloc_flags;
bp.type = bo_type;
-- 
2.17.1


Re: [PATCH v2 1/2] drm/amdgpu: Avoid HW GPU reset for RAS.

2019-08-29 Thread Kuehling, Felix

On 2019-08-29 8:53 p.m., Andrey Grodzovsky wrote:
> Problem:
> Under certain conditions, when some IP blocks take a RAS error,
> we can get into a situation where a GPU reset is not possible
> due to issues in RAS in SMU/PSP.
>
> Temporary fix until proper solution in PSP/SMU is ready:
> When an uncorrectable error happens, the DF will unconditionally
> broadcast error event packets to all its clients/slaves upon
> receiving the fatal error event and freeze all its outbound queues;
> the err_event_athub interrupt will be triggered.
> In such a case we use this interrupt
> to issue the GPU reset. The GPU reset code is modified for this case to avoid a HW
> reset: it only stops the schedulers, detaches all in-progress and not yet scheduled
> jobs' fences, sets an error code on them and signals them.
> Also reject any new incoming job submissions from user space.
> All this is done to notify the applications of the problem.
>
> v2:
> Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
> Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
> Remove print param from amdgpu_ras_query_error_count
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 
> +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  5 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 38 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  3 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  6 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 22 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 10 +++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 10 ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 24 +---
>   drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c |  5 
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 32 +++--
>   12 files changed, 163 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9da681e..300adb8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -38,6 +38,7 @@
>   #include "amdgpu_gmc.h"
>   #include "amdgpu_gem.h"
>   #include "amdgpu_display.h"
> +#include "amdgpu_ras.h"
>   
>   #if defined(HAVE_DRM_FREE_LARGE)
>   #define kvfree drm_free_large
> @@ -1461,6 +1462,9 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
> struct drm_file *filp)
>   bool reserved_buffers = false;
>   int i, r;
>   
> + if (amdgpu_ras_intr_triggered())
> + return -EHWPOISON;
> +
>   if (!adev->accel_working)
>   return -EBUSY;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a5daccc..d3a078b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3727,25 +3727,18 @@ static bool amdgpu_device_lock_adev(struct 
> amdgpu_device *adev, bool trylock)
>   adev->mp1_state = PP_MP1_STATE_NONE;
>   break;
>   }
> - /* Block kfd: SRIOV would do it separately */
> - if (!amdgpu_sriov_vf(adev))
> -amdgpu_amdkfd_pre_reset(adev);
>   
>   return true;
>   }
>   
>   static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
>   {
> - /*unlock kfd: SRIOV would do it separately */
> - if (!amdgpu_sriov_vf(adev))
> -amdgpu_amdkfd_post_reset(adev);
>   amdgpu_vf_error_trans_all(adev);
>   adev->mp1_state = PP_MP1_STATE_NONE;
>   adev->in_gpu_reset = 0;
>   mutex_unlock(&adev->lock_reset);
>   }
>   
> -
>   /**
>* amdgpu_device_gpu_recover - reset the asic and recover scheduler
>*
> @@ -3765,11 +3758,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   struct amdgpu_hive_info *hive = NULL;
>   struct amdgpu_device *tmp_adev = NULL;
>   int i, r = 0;
> + bool in_ras_intr = amdgpu_ras_intr_triggered();
>   
>   need_full_reset = job_signaled = false;
>   INIT_LIST_HEAD(&device_list);
>   
> - dev_info(adev->dev, "GPU reset begin!\n");
> + dev_info(adev->dev, "GPU %s begin!\n", in_ras_intr ? "jobs 
> stop":"reset");
>   
>   cancel_delayed_work_sync(&adev->delayed_init_work);
>   
> @@ -3796,9 +3790,16 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   return 0;
>   }
>   
> + /* Block kfd: SRIOV would do it separately */
> + if (!amdgpu_sriov_vf(adev))
> +amdgpu_amdkfd_pre_reset(adev);
> +
>   /* Build list of devices to reset */
>   if  (adev->gmc.xgmi.num_physical_nodes > 1) {
>   if (!hive) {
> + /*unlock kfd: SRIOV would do it separately */
> + if (!amdgpu_sriov_vf(adev))
> + amdgpu_amdkfd_post_reset(adev);
>   

[PATCH 3/4] drm/amdgpu: Determining PTE flags separately for each mapping (v3)

2019-08-29 Thread Kuehling, Felix
The same BO can be mapped with different PTE flags by different GPUs.
Therefore determine the PTE flags separately for each mapping instead
of storing them in the KFD buffer object.

Add a helper function to determine the PTE flags to be extended with
ASIC and memory-type-specific logic in subsequent commits.

v2: Split Arcturus-specific MTYPE changes into separate commit
v3: Fix return type of get_pte_flags to uint64_t

Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 39 +++
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index e519df3fd2b6..1af8f83f7e02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -57,7 +57,7 @@ struct kgd_mem {
unsigned int mapped_to_gpu_memory;
uint64_t va;
 
-   uint32_t mapping_flags;
+   uint32_t alloc_flags;
 
atomic_t invalid;
struct amdkfd_process_info *process_info;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 44a52b09cc58..aae19d221f42 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -355,6 +355,23 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct 
amdgpu_sync *sync)
return amdgpu_sync_fence(NULL, sync, vm->last_update, false);
 }
 
+static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
+{
+   bool coherent = mem->alloc_flags & ALLOC_MEM_FLAGS_COHERENT;
+   uint32_t mapping_flags;
+
+   mapping_flags = AMDGPU_VM_PAGE_READABLE;
+   if (mem->alloc_flags & ALLOC_MEM_FLAGS_WRITABLE)
+   mapping_flags |= AMDGPU_VM_PAGE_WRITEABLE;
+   if (mem->alloc_flags & ALLOC_MEM_FLAGS_EXECUTABLE)
+   mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
+
+   mapping_flags |= coherent ?
+   AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+
+   return amdgpu_gmc_get_pte_flags(adev, mapping_flags);
+}
+
 /* add_bo_to_vm - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -403,8 +420,7 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct 
kgd_mem *mem,
}
 
bo_va_entry->va = va;
-   bo_va_entry->pte_flags = amdgpu_gmc_get_pte_flags(adev,
-mem->mapping_flags);
+   bo_va_entry->pte_flags = get_pte_flags(adev, mem);
bo_va_entry->kgd_dev = (void *)adev;
	list_add(&bo_va_entry->bo_list, list_bo_va);
 
@@ -1081,7 +1097,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
int byte_align;
u32 domain, alloc_domain;
u64 alloc_flags;
-   uint32_t mapping_flags;
int ret;
 
/*
@@ -1143,16 +1158,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
adev->asic_type != CHIP_VEGAM) ?
VI_BO_SIZE_ALIGN : 1;
 
-   mapping_flags = AMDGPU_VM_PAGE_READABLE;
-   if (flags & ALLOC_MEM_FLAGS_WRITABLE)
-   mapping_flags |= AMDGPU_VM_PAGE_WRITEABLE;
-   if (flags & ALLOC_MEM_FLAGS_EXECUTABLE)
-   mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
-   if (flags & ALLOC_MEM_FLAGS_COHERENT)
-   mapping_flags |= AMDGPU_VM_MTYPE_UC;
-   else
-   mapping_flags |= AMDGPU_VM_MTYPE_NC;
-   (*mem)->mapping_flags = mapping_flags;
+   (*mem)->alloc_flags = flags;
 
amdgpu_sync_create(&(*mem)->sync);
 
@@ -1625,9 +1631,10 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
*kgd,
 
INIT_LIST_HEAD(&(*mem)->bo_va_list);
mutex_init(&(*mem)->lock);
-   (*mem)->mapping_flags =
-   AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
-   AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_NC;
+   (*mem)->alloc_flags =
+   ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
+ALLOC_MEM_FLAGS_VRAM : ALLOC_MEM_FLAGS_GTT) |
+   ALLOC_MEM_FLAGS_WRITABLE | ALLOC_MEM_FLAGS_EXECUTABLE;
 
(*mem)->bo = amdgpu_bo_ref(bo);
(*mem)->va = va;
-- 
2.17.1


Re: [PATCH 1/2] drm/amdgpu: Avoid HW GPU reset for RAS.

2019-08-29 Thread Kuehling, Felix
On 2019-08-29 1:21 p.m., Grodzovsky, Andrey wrote:
> On 8/29/19 12:18 PM, Kuehling, Felix wrote:
>> On 2019-08-29 10:08 a.m., Grodzovsky, Andrey wrote:
>>> Agree, the placement of amdgpu_amdkfd_pre/post _reset in
>>> amdgpu_device_lock/unlock_adev is a bit weird.
>>>
>> amdgpu_device_reset_sriov already calls amdgpu_amdkfd_pre/post_reset
>> itself while it has exclusive access to the GPU.
> So in that case amdgpu_amdkfd_pre/post_reset gets called twice - once
> from amdgpu_device_lock/unlock_adev and a second time from
> amdgpu_device_reset_sriov, no? Why is that?

No, it's not called twice because the bare metal case has conditions if 
(!amdgpu_sriov_vf(adev)). If you don't move the 
amdgpu_amdkfd_pre/post_reset calls into a bare-metal-specific code-path 
(such as amdgpu_do_asic_reset), you'll need to keep those conditions.


>
>
>> It would make sense to
>> move the same calls into amdgpu_do_asic_reset for the bare-metal case.
>
> Problem is I am skipping amdgpu_do_asic_reset totally in this case as
> there is no HW reset here, so I will just extract it from
> amdgpu_device_lock/unlock_adev

OK.

Regards,
   Felix


>
> Andrey
>
>
>> Regards,
>>  Felix
>>
>>
>>> Andrey
>>>
>>> On 8/29/19 10:06 AM, Koenig, Christian wrote:
>>>>> Felix advised that the way to stop all KFD activity is simply to NOT
>>>>> call amdgpu_amdkfd_post_reset so that's why I added this. Do you mean you
>>>>> prefer amdgpu_amdkfd_post_reset to be outside of 
>>>>> amdgpu_device_unlock_adev ?
>>>> Yes, exactly. It doesn't seem to be related to the unlock operation in
>>>> the first place, but rather only signals the KFD that the reset is
>>>> completed.
>>>>
>>>> Christian.
>>>>
