RE: [PATCH 10/10] drm/amdgpu: add gang submit frontend

2022-06-01 Thread Mohan Marimuthu, Yogesh
[Public]

Hi Christian,

No other comments. With p->jobs[i] fixed, the test case worked. I have to clean 
up the code and send it for review.
I wanted to add comparing the time with and without gang submission and fail 
test case if former is slow. I will do this later. I will send the test case 
for review first.

Thank you,
Yogesh

-Original Message-
From: Koenig, Christian  
Sent: Wednesday, June 1, 2022 5:42 PM
To: Mohan Marimuthu, Yogesh ; Christian König 
; amd-gfx@lists.freedesktop.org; Olsak, Marek 

Subject: Re: [PATCH 10/10] drm/amdgpu: add gang submit frontend



Am 01.06.22 um 14:09 schrieb Mohan Marimuthu, Yogesh:
> [SNIP]
> - /* Make sure all BOs are remembered as writers */
> - amdgpu_bo_list_for_each_entry(e, p->bo_list)
> + list_for_each_entry(e, >validated, tv.head) {
> +
> + /* Everybody except for the gang leader uses BOOKKEEP */
> + for (i = 0; i < (p->gang_size - 1); ++i) {
> + dma_resv_add_fence(e->tv.bo->base.resv,
> +>jobs[i]->base.s_fence->finished,
> +DMA_RESV_USAGE_BOOKKEEP);
> + }
> +
> + /* The gang leader as remembered as writer */
>   e->tv.num_shared = 0;
> + }
>
>
> p->jobs[i] = NULL is done in previous loop. Now when running 
> >jobs[i]->base.s_fence->finished there is NULL pointer crash.

Ah, yes good point. Going to fix that.

Any more comments on this? Did you finished the test cases?

Thanks,
Christian.


RE: [PATCH 10/10] drm/amdgpu: add gang submit frontend

2022-06-01 Thread Mohan Marimuthu, Yogesh
[AMD Official Use Only - General]



-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Thursday, March 3, 2022 1:53 PM
To: amd-gfx@lists.freedesktop.org; Olsak, Marek 
Cc: Koenig, Christian 
Subject: [PATCH 10/10] drm/amdgpu: add gang submit frontend

Allows submitting jobs as gang which needs to run on multiple engines at the 
same time.

All members of the gang get the same implicit, explicit and VM dependencies. So 
no gang member will start running until everything else is ready.

The last job is considered the gang leader (usually a submission to the GFX
ring) and used for signaling output dependencies.

Each job is remembered individually as user of a buffer object, so there is no 
joining of work at the end.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 244 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h|   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h |  12 +-
 3 files changed, 173 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c6541f7b8f54..7429e64919fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -69,6 +69,7 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
   unsigned int *num_ibs)
 {
struct drm_sched_entity *entity;
+   unsigned int i;
int r;
 
r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, @@ -83,11 +84,19 
@@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
return -EINVAL;
 
/* Currently we don't support submitting to multiple entities */
-   if (p->entity && p->entity != entity)
+   for (i = 0; i < p->gang_size; ++i) {
+   if (p->entities[i] == entity)
+   goto found;
+   }
+
+   if (i == AMDGPU_CS_GANG_SIZE)
return -EINVAL;
 
-   p->entity = entity;
-   ++(*num_ibs);
+   p->entities[i] = entity;
+   p->gang_size = i + 1;
+
+found:
+   ++(num_ibs[i]);
return 0;
 }
 
@@ -161,11 +170,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
   union drm_amdgpu_cs *cs)
 {
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { };
struct amdgpu_vm *vm = >vm;
uint64_t *chunk_array_user;
uint64_t *chunk_array;
-   unsigned size, num_ibs = 0;
uint32_t uf_offset = 0;
+   unsigned int size;
int ret;
int i;
 
@@ -228,7 +238,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
if (size < sizeof(struct drm_amdgpu_cs_chunk_ib))
goto free_partial_kdata;
 
-   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, _ibs);
+   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs);
if (ret)
goto free_partial_kdata;
break;
@@ -265,21 +275,27 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
}
}
 
-   ret = amdgpu_job_alloc(p->adev, num_ibs, >job, vm);
-   if (ret)
-   goto free_all_kdata;
+   if (!p->gang_size)
+   return -EINVAL;
 
-   ret = drm_sched_job_init(>job->base, p->entity, >vm);
-   if (ret)
-   goto free_all_kdata;
+   for (i = 0; i < p->gang_size; ++i) {
+   ret = amdgpu_job_alloc(p->adev, num_ibs[i], >jobs[i], vm);
+   if (ret)
+   goto free_all_kdata;
+
+   ret = drm_sched_job_init(>jobs[i]->base, p->entities[i],
+>vm);
+   if (ret)
+   goto free_all_kdata;
+   }
 
-   if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
+   if (p->ctx->vram_lost_counter != p->jobs[0]->vram_lost_counter) {
ret = -ECANCELED;
goto free_all_kdata;
}
 
if (p->uf_entry.tv.bo)
-   p->job->uf_addr = uf_offset;
+   p->jobs[p->gang_size - 1]->uf_addr = uf_offset;
kvfree(chunk_array);
 
/* Use this opportunity to fill in task info for the vm */ @@ -301,22 
+317,18 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
return ret;
 }
 
-static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
-  struct amdgpu_cs_chunk *chunk,
-  unsigned int *num_ibs,
-  unsigned int *ce_preempt,
-  unsigned int *de_preempt)
+static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
+  struct amdgpu_ib *ib, struct amdgpu_cs_chunk *chunk,
+  unsigned int *ce_preempt, unsigned int *de_preempt)
 {
-   struct amdgpu_ring *ring = 

Re: [PATCH] drm/amdgpu: fix radv vulkan fps drop after s3 resume

2021-08-18 Thread Mohan Marimuthu, Yogesh
[Public]

[Why]
After s3, In radv there is huge fps drop in games. This is because
when memory is allocated using radv_amdgpu_winsys_bo_create()
with both AMDGPU_GEM_DOMAIN_VRAM and AMDGPU_GEM_DOMAIN_GTT domains
set, the kernel memory management after resume fails to move the data
back to VRAM. In kernel memory management, ttm_bo_mem_compat()
function returns true and hence data is not moved back to VRAM.

[How]
Implement the idea suggested by Christian Koenig. During suspend
move the data to system RAM instead of GTT. Due to this ttm_bo_mem_compat()
will return false and data will be moved back to VRAM.

Suggested-by: Christian König mailto:christian.koe...@amd.com
Signed-off-by: Yogesh mohan marimuthu mailto:yogesh.mohanmarimu...@amd.com
Reviewed-by: Christian König mailto:christian.koe...@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 446943e32..44ec59998 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -136,7 +136,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
return;
 
case TTM_PL_VRAM:
-   if (!adev->mman.buffer_funcs_enabled) {
+   /* Move data to system memory for S3 so that while resume
+* ttm_bo_mem_compat() will return false and data will be
+* moved back to VRAM also in case of bo with both
+* AMDGPU_GEM_DOMAIN_GTT and AMDGPU_GEM_DOMAIN_VRAM domain
+* set in bo->preferred_domains.
+*/
+   if (!adev->mman.buffer_funcs_enabled || adev->in_s3) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
} else if (!amdgpu_gmc_vram_full_visible(>gmc) &&
-- 
2.25.1


[PATCH] drm/amdgpu: fix radv vulkan fps drop after s3 resume

2021-08-17 Thread Mohan Marimuthu, Yogesh
[Public]

[Why]
After s3, In radv there is huge fps drop in games. This is because
when memory is allocated using radv_amdgpu_winsys_bo_create()
with both AMDGPU_GEM_DOMAIN_VRAM and AMDGPU_GEM_DOMAIN_GTT domains
set, the kernel memory management after resume fails to move the data
back to VRAM. In kernel memory management, ttm_bo_mem_compat()
function returns true and hence data is not moved back to VRAM.

[How]
Implement the idea suggested by Christian Koenig. During suspend
move the data to system RAM instead of GTT. Due to this ttm_bo_mem_compat()
will return false and data will be moved back to VRAM.

Signed-off-by: Christian König 
christian.koe...@amd.com
Signed-off-by: Yogesh mohan marimuthu 
yogesh.mohanmarimu...@amd.com
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 446943e32..44ec59998 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -136,7 +136,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
return;
   case TTM_PL_VRAM:
-if (!adev->mman.buffer_funcs_enabled) {
+   /* Move data to system memory for S3 so that while 
resume
+   * ttm_bo_mem_compat() will return false and data 
will be
+   * moved back to VRAM also in case of bo with both
+   * AMDGPU_GEM_DOMAIN_GTT and AMDGPU_GEM_DOMAIN_VRAM 
domain
+   * set in bo->preferred_domains.
+   */
+   if (!adev->mman.buffer_funcs_enabled || 
adev->in_s3) {
   /* Move to system memory */
   amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
} else if 
(!amdgpu_gmc_vram_full_visible(>gmc) &&
--
2.25.1



[PATCH] drm/amdgpu: sort probed modes before adding common modes

2019-05-23 Thread Mohan Marimuthu, Yogesh
[Why]
There are monitors which can have more than one preferred mode
set. There are chances in these monitors that if common modes are
added in function amdgpu_dm_connector_add_common_modes(), these
common modes can be calculated with different preferred mode than
the one used in function decide_crtc_timing_for_drm_display_mode().
The preferred mode can be different because after common modes
are added, the mode list is sorted and this changes the order of
preferred modes in the list. The first mode in the list with
preferred flag set is selected as preferred mode. Due to this the
preferred mode selected varies.
If same preferred mode is not selected in common mode calculation
and crtc timing, then during mode set instead of setting preferred
timing, common mode timing will be applied which can cause "out of
range" message in the monitor with monitor blanking out.

[How]
Sort the modes before adding common modes. The same sorting function
is called during common mode addition and deciding crtc timing.

Signed-off-by: Yogesh Mohan Marimuthu 
Change-Id: I48036a476ad621de022e2fda5e1861e72e7e8c30
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 4a1755bce..418e3956a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4677,6 +4677,15 @@ static void amdgpu_dm_connector_ddc_get_modes(struct 
drm_connector *connector,
amdgpu_dm_connector->num_modes =
drm_add_edid_modes(connector, edid);
 
+   /* sorting the probed modes before calling function
+* amdgpu_dm_get_native_mode() since EDID can have
+* more than one preferred mode. The modes that are
+* later in the probed mode list could be of higher
+* and preferred resolution. For example, 3840x2160
+* resolution in base EDID preferred timing and 4096x2160
+* preferred resolution in DID extension block later.
+*/
+   drm_mode_sort(>probed_modes);
amdgpu_dm_get_native_mode(connector);
} else {
amdgpu_dm_connector->num_modes = 0;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx