[PATCH] drm/amdgpu: remove the interrupt handling for the KIQ events

2018-10-04 Thread Shirish S
[Why]
1. we never submit IBs to the KIQ
2. there seems to be a ~500ms delay during amdgpu resume spent in the KIQ,
   which suggests that KIQ interrupts are not working correctly.

[How]
remove interrupt handling for KIQ.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 59 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 75 ---
 2 files changed, 134 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 2aeef2b..dc9d5bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2048,11 +2048,6 @@ static int gfx_v8_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 8;
 
-   /* KIQ event */
-   r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY, 
VISLANDS30_IV_SRCID_CP_INT_IB2, &adev->gfx.kiq.irq);
-   if (r)
-   return r;
-
/* EOP Event */
r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY, 
VISLANDS30_IV_SRCID_CP_END_OF_PIPE, &adev->gfx.eop_irq);
if (r)
@@ -7025,52 +7020,6 @@ static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
return 0;
 }
 
-static int gfx_v8_0_kiq_set_interrupt_state(struct amdgpu_device *adev,
-   struct amdgpu_irq_src *src,
-   unsigned int type,
-   enum amdgpu_interrupt_state state)
-{
-   struct amdgpu_ring *ring = &(adev->gfx.kiq.ring);
-
-   switch (type) {
-   case AMDGPU_CP_KIQ_IRQ_DRIVER0:
-   WREG32_FIELD(CPC_INT_CNTL, GENERIC2_INT_ENABLE,
-state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
-   if (ring->me == 1)
-   WREG32_FIELD_OFFSET(CP_ME1_PIPE0_INT_CNTL,
-ring->pipe,
-GENERIC2_INT_ENABLE,
-state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
-   else
-   WREG32_FIELD_OFFSET(CP_ME2_PIPE0_INT_CNTL,
-ring->pipe,
-GENERIC2_INT_ENABLE,
-state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
-   break;
-   default:
-   BUG(); /* kiq only support GENERIC2_INT now */
-   break;
-   }
-   return 0;
-}
-
-static int gfx_v8_0_kiq_irq(struct amdgpu_device *adev,
-   struct amdgpu_irq_src *source,
-   struct amdgpu_iv_entry *entry)
-{
-   u8 me_id, pipe_id, queue_id;
-   struct amdgpu_ring *ring = &(adev->gfx.kiq.ring);
-
-   me_id = (entry->ring_id & 0x0c) >> 2;
-   pipe_id = (entry->ring_id & 0x03) >> 0;
-   queue_id = (entry->ring_id & 0x70) >> 4;
-   DRM_DEBUG("IH: CPC GENERIC2_INT, me:%d, pipe:%d, queue:%d\n",
-  me_id, pipe_id, queue_id);
-
-   amdgpu_fence_process(ring);
-   return 0;
-}
-
 static const struct amd_ip_funcs gfx_v8_0_ip_funcs = {
.name = "gfx_v8_0",
.early_init = gfx_v8_0_early_init,
@@ -7221,11 +7170,6 @@ static const struct amdgpu_irq_src_funcs 
gfx_v8_0_priv_inst_irq_funcs = {
.process = gfx_v8_0_priv_inst_irq,
 };
 
-static const struct amdgpu_irq_src_funcs gfx_v8_0_kiq_irq_funcs = {
-   .set = gfx_v8_0_kiq_set_interrupt_state,
-   .process = gfx_v8_0_kiq_irq,
-};
-
 static const struct amdgpu_irq_src_funcs gfx_v8_0_cp_ecc_error_irq_funcs = {
.set = gfx_v8_0_set_cp_ecc_int_state,
.process = gfx_v8_0_cp_ecc_error_irq,
@@ -7247,9 +7191,6 @@ static void gfx_v8_0_set_irq_funcs(struct amdgpu_device 
*adev)
adev->gfx.priv_inst_irq.num_types = 1;
adev->gfx.priv_inst_irq.funcs = &gfx_v8_0_priv_inst_irq_funcs;
 
-   adev->gfx.kiq.irq.num_types = AMDGPU_CP_KIQ_IRQ_LAST;
-   adev->gfx.kiq.irq.funcs = &gfx_v8_0_kiq_irq_funcs;
-
adev->gfx.cp_ecc_error_irq.num_types = 1;
adev->gfx.cp_ecc_error_irq.funcs = &gfx_v8_0_cp_ecc_error_irq_funcs;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 7a6a814..edf23a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1711,11 +1711,6 @@ static int gfx_v9_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 8;
 
-   /* KIQ event */
-   r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP, 
GFX_9_0__SRCID__CP_IB2_INTERRUPT_PKT, &adev->gfx.kiq.irq);
-   if (r)
-   return r;
-
/* EOP Event */
r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP, 
GFX_9_0__SRCID__CP_EOP_INTERRUPT, &adev->gfx.eop_irq);
if (r)
@@ -4716,68 +4711,6 @@ static int gfx_v9_0_priv_inst_irq(struct amdgpu_device 
*adev,
return 0;
 }
 
-stat

Re: [PATCH] drm/amdgpu: skip IB tests for KIQ in general

2018-10-04 Thread S, Shirish



On 10/4/2018 12:41 PM, Christian König wrote:

On 03.10.2018 at 17:15, Shirish S wrote:

From: Pratik Vishwakarma 

[Why]
1. We never submit IBs to KIQ.
2. Ring tests pass without KIQ's ring as well.
3. By skipping we see an improvement of around 500ms
    in amdgpu's resume time.

[How]
skip IB tests for KIQ ring type.

Signed-off-by: Shirish S 
Signed-off-by: Pratik Vishwakarma 


Well I'm not sure if that is a good idea or not.

On the one hand it is true that we never submit IBs to the KIQ, so 
testing that doesn't make much sense actually.


But on the other hand the 500ms delay during resume points out a 
problem with the KIQ, e.g. interrupts are not working correctly!


Question is now if we should ignore that problem because we never use 
interrupts on the KIQ?


Yes Christian, that's the approach, as there's no point fixing something we
never use.
If the answer is to keep it as it is, we should remove the interrupt
handling for the KIQ as well.


I have sent a patch that shall remove interrupt handling for KIQ, please 
review.


Regards,
Shirish S
Otherwise I would say we should fix interrupts on the KIQ and then we 
also don't need this change any more.


Regards,
Christian.


---

This patch is a follow-up to the suggestion given by Alex
while reviewing the patch: 
https://patchwork.freedesktop.org/patch/250912/


-Shirish S

  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c

index 47817e0..b8963b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -354,6 +354,14 @@ int amdgpu_ib_ring_tests(struct amdgpu_device 
*adev)

  if (!ring || !ring->ready)
  continue;
  +    /* skip IB tests for KIQ in general for the below reasons:
+ * 1. We never submit IBs to the KIQ
+ * 2. KIQ doesn't use the EOP interrupts,
+ *    we use some other CP interrupt.
+ */
+    if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
+    continue;
+
  /* MM engine need more time */
  if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
  ring->funcs->type == AMDGPU_RING_TYPE_VCE ||




--
Regards,
Shirish S



Re: [PATCH libdrm] libdrm: Allow dynamic drm majors on linux

2018-10-04 Thread Thomas Hellstrom
Hi, Emil,

On 10/04/2018 04:12 PM, Emil Velikov wrote:
> On Sun, 30 Sep 2018 at 18:31, Thomas Hellstrom  wrote:
>> Hi, Emil,
>>
>> On 09/05/2018 03:53 PM, Emil Velikov wrote:
>>> On 5 September 2018 at 14:20, Thomas Hellstrom  
>>> wrote:
>>>
>>>>> In that case, please give me 24h to do a libdrm release before pushing.
>>>>> I had to push some workarounds for the sandboxing mentioned earlier :-\
>>>>>
>>>>> Thanks
>>>>> Emil
>>>> Ouch, I just pushed the patch, but feel free to cut the release just before
>>>> that commit.

>>> That doesn't quite work. Barring any objections I'll: revert, release, 
>>> reapply.
>>>
>>> -Emil
>> What happened here? I can't really see my commit nor a revert nor a
>> release in libdrm.
>>
> Coming back from holidays+XDC. I'm doing a release in a moment and
> will pick your patch just after that.
>
> Hmm you said you pushed the patch, yet it's not in master ... Not sure
> what happened there.
> Either way - it'll be there shortly.

Yes, that's strange. I'm offsite too so I can't check the system from 
which I pushed it. But anyway, I'll push it later today if you haven't 
already.

Thanks,
Thomas


>
> Thanks
> Emil




[pull] amdgpu drm-fixes-4.19

2018-10-04 Thread Alex Deucher
Hi Dave,

Just two small fixes for 4.19:
- Fix an ordering issue in DC with respect to atomic flips that could result
  in a crash
- Fix incorrect use of process->mm in KFD

The following changes since commit d8938c981f58ee344687b7910a611ac345960045:

  Merge branch 'drm-tda9950-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm 
into drm-fixes (2018-10-04 10:32:14 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-fixes-4.19

for you to fetch changes up to 11b29c9e25788d0afb2ddb67bcd89424bd25f2f7:

  drm/amdkfd: Fix incorrect use of process->mm (2018-10-04 11:37:25 -0400)


Felix Kuehling (1):
  drm/amdkfd: Fix incorrect use of process->mm

Shirish S (1):
  drm/amd/display: Signal hw_done() after waiting for flip_done()

 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 37 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 10 --
 2 files changed, 37 insertions(+), 10 deletions(-)


Re: [PATCH libdrm] libdrm: Allow dynamic drm majors on linux

2018-10-04 Thread Emil Velikov
On Sun, 30 Sep 2018 at 18:31, Thomas Hellstrom  wrote:
>
> Hi, Emil,
>
> On 09/05/2018 03:53 PM, Emil Velikov wrote:
> > On 5 September 2018 at 14:20, Thomas Hellstrom  
> > wrote:
> >
> >>> In that case, please give me 24h to do a libdrm release before pushing.
> >>> I had to push some workarounds for the sandboxing mentioned earlier :-\
> >>>
> >>> Thanks
> >>> Emil
> >>
> >> Ouch, I just pushed the patch, but feel free to cut the release just before
> >> that commit.
> >>
> > That doesn't quite work. Barring any objections I'll: revert, release, 
> > reapply.
> >
> > -Emil
>
> What happened here? I can't really see my commit nor a revert nor a
> release in libdrm.
>
Coming back from holidays+XDC. I'm doing a release in a moment and
will pick your patch just after that.

Hmm you said you pushed the patch, yet it's not in master ... Not sure
what happened there.
Either way - it'll be there shortly.

Thanks
Emil


Re: [PATCH libdrm 2/5] [libdrm] addr cs chunk for syncobj timeline

2018-10-04 Thread Emil Velikov
On Wed, 19 Sep 2018 at 10:31, Chunming Zhou  wrote:
>
> Signed-off-by: Chunming Zhou 
> ---
>  include/drm/amdgpu_drm.h | 10 ++
>  1 file changed, 10 insertions(+)
>
For this and 1/5 - see include/drm/README and git log for examples how
files in include/drm should be updated.
I'll pull merge some patches in a few moments so make sure you've got
latest master first.

-Emil


[PATCH 6/8] drm/amdgpu: always reserve two slots for the VM

2018-10-04 Thread Christian König
And drop the now superfluous extra reservations.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 ++-
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b8de56d1a866..ba406bd1b08f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -964,10 +964,6 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv, 1);
-   if (r)
-   return r;
-
p->job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.base.bo);
 
if (amdgpu_vm_debug) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 218527bb0156..1b39b0144698 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -616,7 +616,8 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 {
entry->priority = 0;
entry->tv.bo = &vm->root.base.bo->tbo;
-   entry->tv.num_shared = 1;
+   /* One for the VM updates and one for the CS job */
+   entry->tv.num_shared = 2;
entry->user_pages = NULL;
list_add(&entry->tv.head, validated);
 }
@@ -772,10 +773,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
 
ring = container_of(vm->entity.rq->sched, struct amdgpu_ring, sched);
 
-   r = reservation_object_reserve_shared(bo->tbo.resv, 1);
-   if (r)
-   return r;
-
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (r)
goto error;
@@ -1839,10 +1836,6 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (r)
goto error_free;
 
-   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv, 1);
-   if (r)
-   goto error_free;
-
r = amdgpu_vm_update_ptes(¶ms, start, last + 1, addr, flags);
if (r)
goto error_free;
@@ -3023,6 +3016,10 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
if (r)
goto error_free_root;
 
+   r = reservation_object_reserve_shared(root->tbo.resv, 1);
+   if (r)
+   goto error_unreserve;
+
r = amdgpu_vm_clear_bo(adev, vm, root,
   adev->vm_manager.root_level,
   vm->pte_support_ats);
-- 
2.14.1



[PATCH 7/8] drm/amdgpu: always reserve one more shared slot for pipelined BO moves

2018-10-04 Thread Christian König
This allows us to drop the extra reserve in TTM.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ba406bd1b08f..8edf75c76475 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -50,7 +50,8 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser 
*p,
bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
p->uf_entry.priority = 0;
p->uf_entry.tv.bo = &bo->tbo;
-   p->uf_entry.tv.num_shared = 1;
+   /* One for TTM and one for the CS job */
+   p->uf_entry.tv.num_shared = 2;
p->uf_entry.user_pages = NULL;
 
drm_gem_object_put_unlocked(gobj);
@@ -598,8 +599,9 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
return r;
}
 
+   /* One for TTM and one for the CS job */
amdgpu_bo_list_for_each_entry(e, p->bo_list)
-   e->tv.num_shared = 1;
+   e->tv.num_shared = 2;
 
amdgpu_bo_list_get_list(p->bo_list, &p->validated);
if (p->bo_list->first_userptr != p->bo_list->num_entries)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1b39b0144698..8c46acf893cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -616,8 +616,8 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 {
entry->priority = 0;
entry->tv.bo = &vm->root.base.bo->tbo;
-   /* One for the VM updates and one for the CS job */
-   entry->tv.num_shared = 2;
+   /* One for the VM updates, one for TTM and one for the CS job */
+   entry->tv.num_shared = 3;
entry->user_pages = NULL;
list_add(&entry->tv.head, validated);
 }
-- 
2.14.1



[PATCH 8/8] drm/ttm: drop the extra reservation for pipelined BO moves

2018-10-04 Thread Christian König
The driver is now responsible for allocating enough shared slots.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 83b4657ffb10..7bdfc1e8236d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -858,12 +858,11 @@ EXPORT_SYMBOL(ttm_bo_mem_put);
 /**
  * Add the last move fence to the BO and reserve a new shared slot.
  */
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
-struct ttm_mem_type_manager *man,
-struct ttm_mem_reg *mem)
+static void ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
+ struct ttm_mem_type_manager *man,
+ struct ttm_mem_reg *mem)
 {
struct dma_fence *fence;
-   int ret;
 
spin_lock(&man->move_lock);
fence = dma_fence_get(man->move);
@@ -871,16 +870,9 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object 
*bo,
 
if (fence) {
reservation_object_add_shared_fence(bo->resv, fence);
-
-   ret = reservation_object_reserve_shared(bo->resv, 1);
-   if (unlikely(ret))
-   return ret;
-
dma_fence_put(bo->moving);
bo->moving = fence;
}
-
-   return 0;
 }
 
 /**
@@ -908,7 +900,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
return ret;
} while (1);
mem->mem_type = mem_type;
-   return ttm_bo_add_move_fence(bo, man, mem);
+   ttm_bo_add_move_fence(bo, man, mem);
+   return 0;
 }
 
 static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man,
@@ -977,10 +970,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
bool has_erestartsys = false;
int i, ret;
 
-   ret = reservation_object_reserve_shared(bo->resv, 1);
-   if (unlikely(ret))
-   return ret;
-
mem->mm_node = NULL;
for (i = 0; i < placement->num_placement; ++i) {
const struct ttm_place *place = &placement->placement[i];
@@ -1016,11 +1005,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
return ret;
 
if (mem->mm_node) {
-   ret = ttm_bo_add_move_fence(bo, man, mem);
-   if (unlikely(ret)) {
-   (*man->func->put_node)(man, mem);
-   return ret;
-   }
+   ttm_bo_add_move_fence(bo, man, mem);
break;
}
}
-- 
2.14.1



[PATCH 4/8] drm/ttm: allow reserving more than one shared slot v2

2018-10-04 Thread Christian König
Let's support simultaneous submissions to multiple engines.

v2: rename the field to num_shared and fix up all users

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/qxl/qxl_release.c|  2 +-
 drivers/gpu/drm/radeon/radeon_cs.c   |  4 ++--
 drivers/gpu/drm/radeon/radeon_gem.c  |  2 +-
 drivers/gpu/drm/radeon/radeon_vm.c   |  4 ++--
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   | 12 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |  6 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 10 +-
 include/drm/ttm/ttm_execbuf_util.h   |  4 ++--
 14 files changed, 37 insertions(+), 32 deletions(-)
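
A note on the semantics, as far as the series shows them: num_shared == 0
keeps the old shared = false behaviour, i.e. the entry's fence ends up in the
exclusive slot, while a positive value is the number of shared slots reserved
for that entry during buffer reservation. Roughly (illustrative snippet, not
part of the patch; bo_is_prime_shared is a placeholder condition):

	/* 0 -> use the exclusive slot, N -> reserve N shared fence slots */
	entry->tv.num_shared = bo_is_prime_shared ? 0 : 1;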

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index df0a059565f9..d0bc0312430c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -536,7 +536,7 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
struct amdgpu_bo *bo = mem->bo;
 
INIT_LIST_HEAD(&entry->head);
-   entry->shared = true;
+   entry->num_shared = 1;
entry->bo = &bo->tbo;
mutex_lock(&process_info->lock);
if (userptr)
@@ -677,7 +677,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = &bo->tbo;
-   ctx->kfd_bo.tv.shared = true;
+   ctx->kfd_bo.tv.num_shared = 1;
ctx->kfd_bo.user_pages = NULL;
list_add(&ctx->kfd_bo.tv.head, &ctx->list);
 
@@ -741,7 +741,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = &bo->tbo;
-   ctx->kfd_bo.tv.shared = true;
+   ctx->kfd_bo.tv.num_shared = 1;
ctx->kfd_bo.user_pages = NULL;
list_add(&ctx->kfd_bo.tv.head, &ctx->list);
 
@@ -1808,7 +1808,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
validate_list.head) {
list_add_tail(&mem->resv_list.head, &resv_list);
mem->resv_list.bo = mem->validate_list.bo;
-   mem->resv_list.shared = mem->validate_list.shared;
+   mem->resv_list.num_shared = mem->validate_list.num_shared;
}
 
/* Reserve all BOs and page tables for validation */
@@ -2027,7 +2027,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
 
list_add_tail(&mem->resv_list.head, &ctx.list);
mem->resv_list.bo = mem->validate_list.bo;
-   mem->resv_list.shared = mem->validate_list.shared;
+   mem->resv_list.num_shared = mem->validate_list.num_shared;
}
 
ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 14d2982a47cc..b75d30ee80c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -118,7 +118,7 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
entry->priority = min(info[i].bo_priority,
  AMDGPU_BO_LIST_MAX_PRIORITY);
entry->tv.bo = &bo->tbo;
-   entry->tv.shared = !bo->prime_shared_count;
+   entry->tv.num_shared = !bo->prime_shared_count;
 
if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
list->gds_obj = bo;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 3243da67db9e..4476398e5b26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -50,7 +50,7 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser 
*p,
bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
p->uf_entry.priority = 0;
p->uf_entry.tv.bo = &bo->tbo;
-   p->uf_entry.tv.shared = true;
+   p->uf_entry.tv.num_shared = 1;
p->uf_entry.user_pages = NULL;
 
drm_gem_object_put_unlocked(gobj);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7b3d1ebda9df..f4f00217546e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -169,7 +169,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
INIT_LIST_HEAD(&duplicates);
 
tv.bo = &bo->tbo;
-   tv.shared = true;
+   tv.num_shared = 1;
list_add(&tv.head, 

[PATCH 5/8] drm/amdgpu: fix using shared fence for exported BOs v2

2018-10-04 Thread Christian König
It is perfectly possible that the BO list is created before the BO is
exported. While at it, clean up by setting shared to one instead of true.

v2: add comment and simplify logic

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 13 +++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b75d30ee80c6..5c79da8e1150 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -118,7 +118,6 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
entry->priority = min(info[i].bo_priority,
  AMDGPU_BO_LIST_MAX_PRIORITY);
entry->tv.bo = &bo->tbo;
-   entry->tv.num_shared = !bo->prime_shared_count;
 
if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
list->gds_obj = bo;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 4476398e5b26..b8de56d1a866 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -598,6 +598,9 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
return r;
}
 
+   amdgpu_bo_list_for_each_entry(e, p->bo_list)
+   e->tv.num_shared = 1;
+
amdgpu_bo_list_get_list(p->bo_list, &p->validated);
if (p->bo_list->first_userptr != p->bo_list->num_entries)
p->mn = amdgpu_mn_get(p->adev, AMDGPU_MN_TYPE_GFX);
@@ -717,8 +720,14 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
gws = p->bo_list->gws_obj;
oa = p->bo_list->oa_obj;
 
-   amdgpu_bo_list_for_each_entry(e, p->bo_list)
-   e->bo_va = amdgpu_vm_bo_find(vm, ttm_to_amdgpu_bo(e->tv.bo));
+   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+
+   /* Make sure we use the exclusive slot for shared BOs */
+   if (bo->prime_shared_count)
+   e->tv.num_shared = 0;
+   e->bo_va = amdgpu_vm_bo_find(vm, bo);
+   }
 
if (gds) {
p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
-- 
2.14.1



[PATCH 2/8] dma-buf: allow reserving more than one shared fence slot

2018-10-04 Thread Christian König
Let's support simultaneous submissions to multiple engines.

Signed-off-by: Christian König 
---
 drivers/dma-buf/reservation.c| 13 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 ++--
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  2 +-
 drivers/gpu/drm/i915/i915_vma.c  |  2 +-
 drivers/gpu/drm/msm/msm_gem_submit.c |  3 ++-
 drivers/gpu/drm/nouveau/nouveau_fence.c  |  2 +-
 drivers/gpu/drm/qxl/qxl_release.c|  2 +-
 drivers/gpu/drm/radeon/radeon_vm.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c |  4 ++--
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |  4 ++--
 drivers/gpu/drm/v3d/v3d_gem.c|  2 +-
 drivers/gpu/drm/vc4/vc4_gem.c|  2 +-
 drivers/gpu/drm/vgem/vgem_fence.c|  2 +-
 include/linux/reservation.h  |  3 ++-
 16 files changed, 28 insertions(+), 23 deletions(-)
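
As a usage sketch of the new parameter (illustrative only; the fence and BO
variables are placeholders, not code from this series), a caller that will
add fences from several engines can now reserve all the slots in one call
while holding the reservation lock, instead of reserving one slot per add:

	/* make room for three shared fences at once, then add them */
	int r = reservation_object_reserve_shared(bo->tbo.resv, 3);
	if (r)
		return r;

	reservation_object_add_shared_fence(bo->tbo.resv, gfx_fence);
	reservation_object_add_shared_fence(bo->tbo.resv, sdma_fence);
	reservation_object_add_shared_fence(bo->tbo.resv, compute_fence);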

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 5825fc336a13..5fb4fd461908 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -56,9 +56,10 @@ const char reservation_seqcount_string[] = 
"reservation_seqcount";
 EXPORT_SYMBOL(reservation_seqcount_string);
 
 /**
- * reservation_object_reserve_shared - Reserve space to add a shared
- * fence to a reservation_object.
+ * reservation_object_reserve_shared - Reserve space to add shared fences to
+ * a reservation_object.
  * @obj: reservation object
+ * @num_fences: number of fences we want to add
  *
  * Should be called before reservation_object_add_shared_fence().  Must
  * be called with obj->lock held.
@@ -66,7 +67,8 @@ EXPORT_SYMBOL(reservation_seqcount_string);
  * RETURNS
  * Zero for success, or -errno
  */
-int reservation_object_reserve_shared(struct reservation_object *obj)
+int reservation_object_reserve_shared(struct reservation_object *obj,
+ unsigned int num_fences)
 {
struct reservation_object_list *old, *new;
unsigned int i, j, k, max;
@@ -74,10 +76,11 @@ int reservation_object_reserve_shared(struct 
reservation_object *obj)
old = reservation_object_get_list(obj);
 
if (old && old->shared_max) {
-   if (old->shared_count < old->shared_max)
+   if ((old->shared_count + num_fences) <= old->shared_max)
return 0;
else
-   max = old->shared_max * 2;
+   max = max(old->shared_count + num_fences,
+ old->shared_max * 2);
} else {
max = 4;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8836186eb5ef..3243da67db9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -955,7 +955,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
if (r)
return r;
 
-   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv);
+   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv, 1);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 904014dc5915..cf768acb51dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -640,7 +640,7 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
bo_addr = amdgpu_bo_gpu_offset(bo);
shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
 
-   r = reservation_object_reserve_shared(bo->tbo.resv);
+   r = reservation_object_reserve_shared(bo->tbo.resv, 1);
if (r)
goto err;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6904d794d60a..bdce05183edb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -772,7 +772,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
 
ring = container_of(vm->entity.rq->sched, struct amdgpu_ring, sched);
 
-   r = reservation_object_reserve_shared(bo->tbo.resv);
+   r = reservation_object_reserve_shared(bo->tbo.resv, 1);
if (r)
return r;
 
@@ -1839,7 +1839,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (r)
goto error_free;
 
-   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv);
+   r = reservation_object_reserve_shared(vm->root.base.bo->tbo.resv, 1);
if (r)
goto error_free;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 983e67f19e45..30875f8f2933 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm

[PATCH 3/8] dma-buf: test shared slot allocation when mutex debugging is active

2018-10-04 Thread Christian König
Set shared_max to the number of shared fences right before we release
the lock.

This way every attempt to add a shared fence without previously
reserving a slot will cause an error.

Signed-off-by: Christian König 
---
 include/linux/reservation.h | 5 +
 1 file changed, 5 insertions(+)
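
To illustrate the invariant this enforces (a sketch against the series' API,
not code from the patch; locking and error handling are abbreviated): every
add has to be paired with a reserve inside the same locked section, because
the unlock clamps shared_max back down to shared_count:

	reservation_object_lock(resv, NULL);
	r = reservation_object_reserve_shared(resv, 1);   /* shared_max >= count + 1 */
	if (!r)
		reservation_object_add_shared_fence(resv, fence); /* fills the slot */
	reservation_object_unlock(resv);   /* debug build: shared_max = shared_count */

A later add_shared_fence() without a fresh reserve_shared() then has no spare
slot left, which is the error condition the commit message refers to.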

diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index 5ddb0e143721..2f0ffca35780 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -214,6 +214,11 @@ reservation_object_trylock(struct reservation_object *obj)
 static inline void
 reservation_object_unlock(struct reservation_object *obj)
 {
+#ifdef CONFIG_DEBUG_MUTEXES
+   /* Test shared fence slot reservation */
+   if (obj->fence)
+   obj->fence->shared_max = obj->fence->shared_count;
+#endif
ww_mutex_unlock(&obj->lock);
 }
 
-- 
2.14.1



[PATCH 1/8] dma-buf: remove shared fence staging in reservation object

2018-10-04 Thread Christian König
No need for that any more. Just replace the list when there isn't enough
room any more for the additional fence.

Signed-off-by: Christian König 
---
 drivers/dma-buf/reservation.c | 178 ++
 include/linux/reservation.h   |   4 -
 2 files changed, 58 insertions(+), 124 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 6c95f61a32e7..5825fc336a13 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -68,105 +68,23 @@ EXPORT_SYMBOL(reservation_seqcount_string);
  */
 int reservation_object_reserve_shared(struct reservation_object *obj)
 {
-   struct reservation_object_list *fobj, *old;
-   u32 max;
+   struct reservation_object_list *old, *new;
+   unsigned int i, j, k, max;
 
old = reservation_object_get_list(obj);
 
if (old && old->shared_max) {
-   if (old->shared_count < old->shared_max) {
-   /* perform an in-place update */
-   kfree(obj->staged);
-   obj->staged = NULL;
+   if (old->shared_count < old->shared_max)
return 0;
-   } else
+   else
max = old->shared_max * 2;
-   } else
-   max = 4;
-
-   /*
-* resize obj->staged or allocate if it doesn't exist,
-* noop if already correct size
-*/
-   fobj = krealloc(obj->staged, offsetof(typeof(*fobj), shared[max]),
-   GFP_KERNEL);
-   if (!fobj)
-   return -ENOMEM;
-
-   obj->staged = fobj;
-   fobj->shared_max = max;
-   return 0;
-}
-EXPORT_SYMBOL(reservation_object_reserve_shared);
-
-static void
-reservation_object_add_shared_inplace(struct reservation_object *obj,
- struct reservation_object_list *fobj,
- struct dma_fence *fence)
-{
-   struct dma_fence *signaled = NULL;
-   u32 i, signaled_idx;
-
-   dma_fence_get(fence);
-
-   preempt_disable();
-   write_seqcount_begin(&obj->seq);
-
-   for (i = 0; i < fobj->shared_count; ++i) {
-   struct dma_fence *old_fence;
-
-   old_fence = rcu_dereference_protected(fobj->shared[i],
-   reservation_object_held(obj));
-
-   if (old_fence->context == fence->context) {
-   /* memory barrier is added by write_seqcount_begin */
-   RCU_INIT_POINTER(fobj->shared[i], fence);
-   write_seqcount_end(&obj->seq);
-   preempt_enable();
-
-   dma_fence_put(old_fence);
-   return;
-   }
-
-   if (!signaled && dma_fence_is_signaled(old_fence)) {
-   signaled = old_fence;
-   signaled_idx = i;
-   }
-   }
-
-   /*
-* memory barrier is added by write_seqcount_begin,
-* fobj->shared_count is protected by this lock too
-*/
-   if (signaled) {
-   RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
} else {
-   BUG_ON(fobj->shared_count >= fobj->shared_max);
-   RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
-   fobj->shared_count++;
+   max = 4;
}
 
-   write_seqcount_end(&obj->seq);
-   preempt_enable();
-
-   dma_fence_put(signaled);
-}
-
-static void
-reservation_object_add_shared_replace(struct reservation_object *obj,
- struct reservation_object_list *old,
- struct reservation_object_list *fobj,
- struct dma_fence *fence)
-{
-   unsigned i, j, k;
-
-   dma_fence_get(fence);
-
-   if (!old) {
-   RCU_INIT_POINTER(fobj->shared[0], fence);
-   fobj->shared_count = 1;
-   goto done;
-   }
+   new = kmalloc(offsetof(typeof(*new), shared[max]), GFP_KERNEL);
+   if (!new)
+   return -ENOMEM;
 
/*
 * no need to bump fence refcounts, rcu_read access
@@ -174,46 +92,45 @@ reservation_object_add_shared_replace(struct 
reservation_object *obj,
 * references from the old struct are carried over to
 * the new.
 */
-   for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
-   struct dma_fence *check;
+   for (i = 0, j = 0, k = max; i < (old ? old->shared_count : 0); ++i) {
+   struct dma_fence *fence;
 
-   check = rcu_dereference_protected(old->shared[i],
-   reservation_object_held(obj));
-
-   if (check->context == fence->context ||
-   dma_fence_is_signaled(check))
-   RCU_INIT_POI

[PATCH] drm/amdgpu: fix AGP location with VRAM at 0x0

2018-10-04 Thread Christian König
That also simplifies handling quite a bit.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)
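
A quick worked example of the new placement (illustrative numbers; size_bf
and size_af are the free space below and above the framebuffer, computed
earlier in the function): with an 8 GB framebuffer at fb_start = 0 there is
no space below it, so size_bf = 0, the else branch is taken and the AGP
window starts at ALIGN(fb_end + 1, 16 GB) = 16 GB. When the space below the
framebuffer is larger, the start is instead (fb_start - size_bf) rounded down
to a 16 GB boundary.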

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 9a5b252784a1..999e15945355 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -200,16 +200,13 @@ void amdgpu_gmc_agp_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
}
 
if (size_bf > size_af) {
-   mc->agp_start = mc->fb_start > mc->gart_start ?
-   mc->gart_end + 1 : 0;
+   mc->agp_start = (mc->fb_start - size_bf) & sixteen_gb_mask;
mc->agp_size = size_bf;
} else {
-   mc->agp_start = (mc->fb_start > mc->gart_start ?
-   mc->fb_end : mc->gart_end) + 1,
+   mc->agp_start = ALIGN(mc->fb_end + 1, sixteen_gb);
mc->agp_size = size_af;
}
 
-   mc->agp_start = ALIGN(mc->agp_start, sixteen_gb);
mc->agp_end = mc->agp_start + mc->agp_size - 1;
dev_info(adev->dev, "AGP: %lluM 0x%016llX - 0x%016llX\n",
mc->agp_size >> 20, mc->agp_start, mc->agp_end);
-- 
2.14.1



Re: [PATCH] drm/amdgpu: Limit the max mc address to AMDGPU_VA_HOLE_START

2018-10-04 Thread Christian König
Looks like not only the SDMA has a problem with this, but also the VCE 
firmware :(


Please add a comment to explain why we do this; with that done the patch
is Reviewed-by: Christian König .


Regards,
Christian.

On 03.10.2018 at 16:39, Christian König wrote:

On 29.09.2018 at 11:48, Emily Deng wrote:
For the case where vram_start is 0, the gart range will be from
0x

to 0x1FFF, which will cause the sdma engine to hang.

So limit the mc address to AMDGPU_VA_HOLE_START.


Well NAK, but that's what I've done initially as well :)

That stuff should now be handled by amdgpu_gmc_sign_extend(), but 
looks like that is missing somewhere.


I can take a look tomorrow, most likely just a one liner somewhere.

Christian.
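
For reference, the sign extension mentioned above amounts to filling the
upper bits of the 48-bit GPU virtual address; a rough sketch of the idea
(illustrative only, not the actual amdgpu_gmc_sign_extend() implementation):

	static inline uint64_t sign_extend_va48(uint64_t addr)
	{
		/* if bit 47 is set, replicate it into bits 48..63 so the
		 * address lands in the upper canonical range instead of
		 * the hole in the middle of the address space */
		if (addr & (1ULL << 47))
			addr |= 0xffffULL << 48;
		return addr;
	}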



Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

index 9a5b252..7245260 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -146,6 +146,7 @@ void amdgpu_gmc_gart_location(struct 
amdgpu_device *adev, struct amdgpu_gmc *mc)

  {
  const uint64_t four_gb = 0x1ULL;
  u64 size_af, size_bf;
+    u64 max_mc_address = min(adev->gmc.mc_mask, AMDGPU_VA_HOLE_START);
    mc->gart_size += adev->pm.smu_prv_buffer_size;
  @@ -153,7 +154,7 @@ void amdgpu_gmc_gart_location(struct 
amdgpu_device *adev, struct amdgpu_gmc *mc)

   * the GART base on a 4GB boundary as well.
   */
  size_bf = mc->fb_start;
-    size_af = adev->gmc.mc_mask + 1 - ALIGN(mc->fb_end + 1, four_gb);
+    size_af = max_mc_address + 1 - ALIGN(mc->fb_end + 1, four_gb);
    if (mc->gart_size > max(size_bf, size_af)) {
  dev_warn(adev->dev, "limiting GART\n");
@@ -164,7 +165,7 @@ void amdgpu_gmc_gart_location(struct 
amdgpu_device *adev, struct amdgpu_gmc *mc)

  (size_af < mc->gart_size))
  mc->gart_start = 0;
  else
-    mc->gart_start = mc->mc_mask - mc->gart_size + 1;
+    mc->gart_start = max_mc_address - mc->gart_size + 1;
    mc->gart_start &= ~(four_gb - 1);
  mc->gart_end = mc->gart_start + mc->gart_size - 1;






Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-10-04 Thread Christian König

Is my understanding correct?

Yes, offhand that sounds correct to me.

The other occasions should just be early bail out to optimize things 
under memory pressure.


Christian.

On 03.10.2018 at 22:31, Philip Yang wrote:

Hi Christian,

Yes, I agree. I am working on patch 2 to replace get_user_pages() with
HMM. One problem is that in the current gfx path we check for mmu
invalidation multiple times in the amdgpu_cs_ioctl() path after
get_user_pages(): in amdgpu_cs_parser_bos(), amdgpu_cs_list_validate(), and
amdgpu_cs_submit(). For HMM, hmm_vma_range_done() has to be called
once and only once after hmm_vma_get_pfns()/hmm_vma_fault(), so I will
call hmm_vma_range_done() inside amdgpu_cs_submit() after taking the mn
lock. Is my understanding correct?


Philip

On 2018-10-02 11:05 AM, Christian König wrote:
Checking more code and documentation and thinking about it over my 
vacation I think I have some new conclusions here.


Currently we are using get_user_pages() together with an MMU notifier 
to guarantee coherent address space view, because get_user_pages() 
works by grabbing a reference to the pages and ignoring concurrent 
page table updates.


But HMM uses a different approach by checking the address space for 
modifications using hmm_vma_range_done() and re-trying when the 
address space has changed.


Now what you are trying to do is change that into get_user_pages()
plus an HMM callback, and this is what won't work. We can either use
get_user_pages() with an MMU notifier or we can use HMM for the work,
but we can't mix and match.


So my initial guess was correct that we just need to change both 
sides of the implementation at the same time.


Regards,
Christian.

On 28.09.2018 at 17:13, Koenig, Christian wrote:

No it definitely isn't.

We have literally worked for months on this with the core MM developers.

Making sure that we have a consistent page array is absolutely vital 
for correct operation.


Please also check Jerome's presentation from XDC; it perfectly
explains why this approach won't work correctly.


Christian.

On 28.09.2018 17:07, "Yang, Philip"  wrote:
For the B path, we take mm->mmap_sem, then call hmm_vma_get_pfns() or
get_user_pages(). This is obvious.


For the A path, the mmu notifier pair
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end()
is called in many places, and the calling path is quite complicated
inside mm, so it's not obvious. I checked many of them, for example:


do_munmap()
  down_write(&mm->mmap_sem)
  arch_unmap()
    mpx_notify_unmap()...
   zap_bt_entries_mapping()
 zap_page_range()
 up_write(&mm->mmap_sem)

void zap_page_range(struct vm_area_struct *vma, unsigned long start,
        unsigned long size)
{
    struct mm_struct *mm = vma->vm_mm;
    struct mmu_gather tlb;
    unsigned long end = start + size;

    lru_add_drain();
    tlb_gather_mmu(&tlb, mm, start, end);
    update_hiwater_rss(mm);
    mmu_notifier_invalidate_range_start(mm, start, end);
    for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
        unmap_single_vma(&tlb, vma, start, end, NULL);
    mmu_notifier_invalidate_range_end(mm, start, end);
    tlb_finish_mmu(&tlb, start, end);
}

So AFAIK it's okay without invalidate_range_end() callback.

Regards,
Philip

On 2018-09-28 01:25 AM, Koenig, Christian wrote:

No, that is incorrect as well :)

The mmap_sem isn't necessarily taken during page table updates.

What you could do is replace get_user_pages() directly with HMM. If 
I'm not completely mistaken that should work as expected.


Christian.

On 27.09.2018 22:18, "Yang, Philip"  wrote:
I was trying to understand how HMM handles this concurrency
issue and how we handle it in amdgpu_ttm_tt_userptr_needs_pages()
and amdgpu_ttm_tt_affect_userptr(). HMM uses the range->valid flag; we
use gtt->mmu_invalidations and gtt->last_set_pages. Both use the
same lock-plus-flag idea, actually.


Thanks for the information; now I understand that the fence
ttm_eu_fence_buffer_objects() puts on the BOs will block the CPU page table
update. This is another side of this concurrency issue I didn't know about.


I had the same worry that there is an issue without the invalidate_range_end()
callback, as in the calling sequence Felix lists. After taking another look
today I now think it's fine because of the mm->mmap_sem usage; this
is my understanding:


A path:

down_write(&mm->mmap_sem);
mmu_notifier_invalidate_range_start()
    take_lock()
    release_lock()
CPU page table update
mmu_notifier_invalidate_range_end()
up_write(&mm->mmap_sem);

B path:

again:
down_read(&mm->mmap_sem);
hmm_vma_get_pfns()
up_read(&mm->mmap_sem);


take_lock()
if (!hmm_vma_range_done()) {
   release_lock()
   goto again
}
submit command job...
release_lock()
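
A minimal C-style sketch of the B path above (assuming the 4.19-era HMM
calls hmm_vma_get_pfns()/hmm_vma_range_done(), whose exact signatures varied
between kernel versions; "mn_lock" stands in for the driver lock that
serialises against the notifier):

again:
	down_read(&mm->mmap_sem);
	ret = hmm_vma_get_pfns(&range);        /* snapshot the user pages */
	up_read(&mm->mmap_sem);
	if (ret)
		return ret;

	mutex_lock(&mn_lock);
	if (!hmm_vma_range_done(&range)) {     /* invalidated in the meantime? */
		mutex_unlock(&mn_lock);
		goto again;                    /* redo the snapshot */
	}
	/* ... submit the command job while still holding mn_lock ... */
	mutex_unlock(&mn_lock);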

If you agree, I will submit patch v5 with some minor changes, and 
submit another patch to replace get_user_page() with HMM.


Regards,
Philip

On 2018-09-27 11:36 AM, Christian König wrote:

Yeah, I've read that as well.

My best guess is that we just need to add a call to 
hmm_vma_ran

[PATCH 2/2] drm/amd/vega10: allow increased power limit

2018-10-04 Thread Aleksandr Mezin
Allow setting the power limit to 150% of the default one, like the
Windows driver does.

Signed-off-by: Aleksandr Mezin 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
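
Worked example (made-up number): for a board whose tdp_table reports a
220 W default power delivery limit, max_power_limit becomes
220 + 220 / 2 = 330 W, i.e. the 150% cap mentioned above.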

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
index a6432f9a75b4..8218aa9189da 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
@@ -1342,7 +1342,8 @@ int vega10_enable_power_containment(struct pp_hwmgr 
*hwmgr)
 
hwmgr->default_power_limit = hwmgr->power_limit =
(uint32_t)(tdp_table->usMaximumPowerDeliveryLimit);
-   hwmgr->max_power_limit = hwmgr->default_power_limit;
+   hwmgr->max_power_limit = hwmgr->default_power_limit +
+   hwmgr->default_power_limit / 2;
 
if (PP_CAP(PHM_PlatformCaps_PowerContainment)) {
if (data->smu_features[GNLD_PPT].supported)
-- 
2.19.0



[PATCH 1/2] drm/amd/powerplay: add max_power_limit to pp_hwmgr

2018-10-04 Thread Aleksandr Mezin
And initialize it to the default limit for now.
This will allow increasing the power limit on select cards.

Signed-off-by: Aleksandr Mezin 
---
 drivers/gpu/drm/amd/include/kgd_pp_interface.h | 2 +-
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 8 
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_powertune.c   | 2 ++
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c | 1 +
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 1 +
 drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  | 1 +
 6 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index bd7404532029..be6c6c9eb5d1 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -243,7 +243,7 @@ struct amd_pm_funcs {
uint32_t block_type, bool gate);
int (*set_clockgating_by_smu)(void *handle, uint32_t msg_id);
int (*set_power_limit)(void *handle, uint32_t n);
-   int (*get_power_limit)(void *handle, uint32_t *limit, bool 
default_limit);
+   int (*get_power_limit)(void *handle, uint32_t *limit, bool max_limit);
int (*get_power_profile_mode)(void *handle, char *buf);
int (*set_power_profile_mode)(void *handle, long *input, uint32_t size);
int (*odn_edit_dpm_table)(void *handle, uint32_t type, long *input, 
uint32_t size);
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index da4ebff5b74d..b8c96b6d4923 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -937,7 +937,7 @@ static int pp_set_power_limit(void *handle, uint32_t limit)
if (limit == 0)
limit = hwmgr->default_power_limit;
 
-   if (limit > hwmgr->default_power_limit)
+   if (limit > hwmgr->max_power_limit)
return -EINVAL;
 
mutex_lock(&hwmgr->smu_lock);
@@ -947,7 +947,7 @@ static int pp_set_power_limit(void *handle, uint32_t limit)
return 0;
 }
 
-static int pp_get_power_limit(void *handle, uint32_t *limit, bool 
default_limit)
+static int pp_get_power_limit(void *handle, uint32_t *limit, bool max_limit)
 {
struct pp_hwmgr *hwmgr = handle;
 
@@ -956,8 +956,8 @@ static int pp_get_power_limit(void *handle, uint32_t 
*limit, bool default_limit)
 
mutex_lock(&hwmgr->smu_lock);
 
-   if (default_limit)
-   *limit = hwmgr->default_power_limit;
+   if (max_limit)
+   *limit = hwmgr->max_power_limit;
else
*limit = hwmgr->power_limit;
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_powertune.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_powertune.c
index 5e19f5977eb1..5e273c8cd0c4 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_powertune.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_powertune.c
@@ -1140,6 +1140,8 @@ int smu7_enable_power_containment(struct pp_hwmgr *hwmgr)
if (0 == smc_result) {
hwmgr->default_power_limit = hwmgr->power_limit 
=

cac_table->usMaximumPowerDeliveryLimit;
+   hwmgr->max_power_limit =
+   hwmgr->default_power_limit;
data->power_containment_features |=

POWERCONTAINMENT_FEATURE_PkgPwrLimit;
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
index 2d88abf97e7b..a6432f9a75b4 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_powertune.c
@@ -1342,6 +1342,7 @@ int vega10_enable_power_containment(struct pp_hwmgr 
*hwmgr)
 
hwmgr->default_power_limit = hwmgr->power_limit =
(uint32_t)(tdp_table->usMaximumPowerDeliveryLimit);
+   hwmgr->max_power_limit = hwmgr->default_power_limit;
 
if (PP_CAP(PHM_PlatformCaps_PowerContainment)) {
if (data->smu_features[GNLD_PPT].supported)
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 7884ae3b1922..56060b943e7a 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -1554,6 +1554,7 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
return result);
hwmgr->power_limit =
hwmgr->default_power_limit = smum_get_argument(hwmgr);
+   hwmgr->max_power_limit = hwmgr->default_power_limit;
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h 
b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
index a6d92128b19c..094d25831e57 100644
--- a/drivers/gpu/drm

Re: [PATCH] drm/amdgpu: skip IB tests for KIQ in general

2018-10-04 Thread Christian König

On 03.10.2018 at 17:15, Shirish S wrote:

From: Pratik Vishwakarma 

[Why]
1. We never submit IBs to KIQ.
2. Ring tests pass without KIQ's ring as well.
3. By skipping we see an improvement of around 500ms
in amdgpu's resume time.

[How]
skip IB tests for KIQ ring type.

Signed-off-by: Shirish S 
Signed-off-by: Pratik Vishwakarma 


Well I'm not sure if that is a good idea or not.

On the one hand it is true that we never submit IBs to the KIQ, so 
testing that doesn't make much sense actually.


But on the other hand the 500ms delay during resume points out a problem 
with the KIQ, e.g. interrupts are not working correctly!


Question is now if we should ignore that problem because we never use 
interrupts on the KIQ?


If the answer is to keep it as it is, we should remove the interrupt
handling for the KIQ as well.


Otherwise I would say we should fix interrupts on the KIQ and then we 
also don't need this change any more.


Regards,
Christian.


---

This patch is a follow-up to the suggestion given by Alex
while reviewing the patch: https://patchwork.freedesktop.org/patch/250912/

-Shirish S

  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 47817e0..b8963b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -354,6 +354,14 @@ int amdgpu_ib_ring_tests(struct amdgpu_device *adev)
if (!ring || !ring->ready)
continue;
  
+		/* skip IB tests for KIQ in general for the below reasons:
+		 * 1. We never submit IBs to the KIQ
+		 * 2. KIQ doesn't use the EOP interrupts,
+		 *    we use some other CP interrupt.
+		 */
+   if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
+   continue;
+
/* MM engine need more time */
if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
ring->funcs->type == AMDGPU_RING_TYPE_VCE ||

