RE: [PATCH 2/3] drm/amdgpu: drop the sched_sync

2018-11-03 Thread Liu, Monk
> NAK, that would result in a severe performance drop.
> We need the fence here to determine if we actually need to do the pipeline 
> sync or not.
> E.g. the explicitly requested fence could already be signaled.

Regarding the performance concern: merely inserting a WAIT_REG_MEM on the 
GFX/compute ring does *not* cause a "severe" drop (the cost is in fact 
minimal, see the sketch below) ... At least I didn't observe any performance 
drop with the 3DMark benchmark (I also tested the Vulkan CTS). Can you tell 
me which game/benchmark you expect to regress with this fix? Let me check it.
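
For reference, such a pipeline sync boils down to a single WAIT_REG_MEM
packet on the ring. The sketch below is only an illustration, modeled loosely
on the gfx_v8-style emit_pipeline_sync callback (packet field macros
simplified, engine selection omitted, not the exact per-ASIC code); note that
it waits on the ring's *latest* emitted fence sequence:

	/* Simplified sketch: stall the pipe until the ring's fence memory
	 * location reaches the most recently emitted sequence number.
	 */
	static void example_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
	{
		uint32_t seq = ring->fence_drv.sync_seq;   /* latest fence on this ring */
		uint64_t addr = ring->fence_drv.gpu_addr;  /* where that fence is written */

		amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
		amdgpu_ring_write(ring, WAIT_REG_MEM_MEM_SPACE(1) |  /* wait on memory */
					WAIT_REG_MEM_FUNCTION(3));   /* equal */
		amdgpu_ring_write(ring, addr & 0xfffffffc);
		amdgpu_ring_write(ring, upper_32_bits(addr));
		amdgpu_ring_write(ring, seq);
		amdgpu_ring_write(ring, 0xffffffff);                 /* compare mask */
		amdgpu_ring_write(ring, 4);                          /* poll interval */
	}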

The problem I hit occurred during a massive stress test with multiple 
processes plus quark: if the quark process hangs the engine while two more 
jobs follow the bad job, then after the TDR those two jobs lose their 
explicit dependency fence, and the pipeline sync that should follow it is 
lost as well (sketched below).
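
To make the failure mode concrete, here is a simplified sketch of the
pre-patch flow in amdgpu_ib_schedule(), condensed from the code the diff
below removes (error handling omitted). The sched_sync fence is consumed on
the first submission, so a resubmission after GPU recovery has nothing left
to trigger the pipeline sync:

	struct dma_fence *tmp;

	/* Pulls the fence out of job->sched_sync; this only succeeds once. */
	tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL);
	if (tmp) {
		need_pipe_sync = true;                /* set only on the first run */
		trace_amdgpu_ib_pipe_sync(job, tmp);
	}
	dma_fence_put(tmp);

	/* After the TDR the scheduler resubmits the jobs and calls
	 * amdgpu_ib_schedule() again, but sched_sync is now empty, so the
	 * pipeline sync between the following jobs is silently dropped.
	 */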


BTW: with the original logic, the pipeline sync has another corner case. 
Assume JobC depends on JobA with the explicit flag, and JobB is inserted in 
the ring between them:

JobA -> JobB -> (pipe sync) JobC

If JobA takes a long time to finish, then at the amdgpu_ib_schedule() stage a 
pipeline sync is inserted for JobC against its explicit dependency, JobA. But 
because JobB sits between A and C, and the WAIT_REG_MEM waits on the ring's 
latest fence sequence (see the sketch above), the pipeline sync before JobC 
wrongly waits on JobB ...

It is not a big issue, but it is clearly unnecessary: C has no relation to B.

/Monk



-Original Message-
From: Christian König  
Sent: Sunday, November 4, 2018 3:50 AM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/amdgpu: drop the sched_sync

On 03.11.18 at 06:33, Monk Liu wrote:
> Reasons to drop it:
>
> 1) simplify the code: just introducing the field member "need_pipe_sync"
> for the job is good enough to tell whether the explicit dependency fence 
> needs to be followed by a pipeline sync.
>
> 2) after GPU_recover the explicit fence from sched_sync will not come 
> back, so the required pipeline_sync following it is missed; consider the 
> scenario below:
>> now on ring buffer:
> Job-A -> pipe_sync -> Job-B
>> TDR occurred on Job-A, and after GPU recovery:
>> now on ring buffer:
> Job-A -> Job-B
>
> because the fence from sched_sync is used and freed after the first 
> ib_schedule, it will never come back; with this patch this issue is 
> avoided.

NAK, that would result in a severe performance drop.

We need the fence here to determine if we actually need to do the pipeline sync 
or not.

E.g. the explicitly requested fence could already be signaled.

Christian.

>
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 16 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 14 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 +--
>   3 files changed, 10 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index c48207b3..ac7d2da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -122,7 +122,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   {
>   struct amdgpu_device *adev = ring->adev;
>   struct amdgpu_ib *ib = &ibs[0];
> - struct dma_fence *tmp = NULL;
>   bool skip_preamble, need_ctx_switch;
>   unsigned patch_offset = ~0;
>   struct amdgpu_vm *vm;
> @@ -166,16 +165,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
> unsigned num_ibs,
>   }
>   
>   need_ctx_switch = ring->current_ctx != fence_ctx;
> - if (ring->funcs->emit_pipeline_sync && job &&
> - ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
> -  (amdgpu_sriov_vf(adev) && need_ctx_switch) ||
> -  amdgpu_vm_need_pipeline_sync(ring, job))) {
> - need_pipe_sync = true;
>   
> - if (tmp)
> - trace_amdgpu_ib_pipe_sync(job, tmp);
> -
> - dma_fence_put(tmp);
> + if (ring->funcs->emit_pipeline_sync && job) {
> + if ((need_ctx_switch && amdgpu_sriov_vf(adev)) ||
> + amdgpu_vm_need_pipeline_sync(ring, job))
> + need_pipe_sync = true;
> + else if (job->need_pipe_sync)
> + need_pipe_sync = true;
>   }
>   
>   if (ring->funcs->insert_start)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 1d71f8c..dae997d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -71,7 +71,6 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
> num_ibs,
>   (*job)->num_ibs = num_ibs;
>   
>   amdgpu_sync_create(&(*job)->sync);
> - amdgpu_sync_create(&(*job)->sched_sync);
> (*job)->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
>   (*job)->vm_pd_addr = AMDGPU_BO_INVALID_OFFSET;
>   
> @@ -117,7 +116,6 @@ static void amdgpu_job_free_cb(struct drm_sched_job 
> *s_job)
>   amdgpu_ring_priority_put(ring, 

RE: [PATCH 2/3] drm/amdgpu: drop the sched_sync

2018-11-03 Thread Liu, Monk
How do you solve the missing pipe-sync bug I illustrated? The original logic 
is actually loose here. I found this corner case during massive TDR testing a 
couple of weeks ago, and it is very hard to catch ...

/Monk

-Original Message-
From: Christian König  
Sent: Sunday, November 4, 2018 3:50 AM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/amdgpu: drop the sched_sync

On 03.11.18 at 06:33, Monk Liu wrote:
> Reasons to drop it:
>
> 1) simplify the code: just introducing the field member "need_pipe_sync"
> for the job is good enough to tell whether the explicit dependency fence 
> needs to be followed by a pipeline sync.
>
> 2) after GPU_recover the explicit fence from sched_sync will not come 
> back, so the required pipeline_sync following it is missed; consider the 
> scenario below:
>> now on ring buffer:
> Job-A -> pipe_sync -> Job-B
>> TDR occurred on Job-A, and after GPU recovery:
>> now on ring buffer:
> Job-A -> Job-B
>
> because the fence from sched_sync is used and freed after the first 
> ib_schedule, it will never come back; with this patch this issue is 
> avoided.

NAK, that would result in a severe performance drop.

We need the fence here to determine if we actually need to do the pipeline sync 
or not.

E.g. the explicitly requested fence could already be signaled.

Christian.

>
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 16 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 14 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 +--
>   3 files changed, 10 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index c48207b3..ac7d2da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -122,7 +122,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   {
>   struct amdgpu_device *adev = ring->adev;
>   struct amdgpu_ib *ib = &ibs[0];
> - struct dma_fence *tmp = NULL;
>   bool skip_preamble, need_ctx_switch;
>   unsigned patch_offset = ~0;
>   struct amdgpu_vm *vm;
> @@ -166,16 +165,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
> unsigned num_ibs,
>   }
>   
>   need_ctx_switch = ring->current_ctx != fence_ctx;
> - if (ring->funcs->emit_pipeline_sync && job &&
> - ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
> -  (amdgpu_sriov_vf(adev) && need_ctx_switch) ||
> -  amdgpu_vm_need_pipeline_sync(ring, job))) {
> - need_pipe_sync = true;
>   
> - if (tmp)
> - trace_amdgpu_ib_pipe_sync(job, tmp);
> -
> - dma_fence_put(tmp);
> + if (ring->funcs->emit_pipeline_sync && job) {
> + if ((need_ctx_switch && amdgpu_sriov_vf(adev)) ||
> + amdgpu_vm_need_pipeline_sync(ring, job))
> + need_pipe_sync = true;
> + else if (job->need_pipe_sync)
> + need_pipe_sync = true;
>   }
>   
>   if (ring->funcs->insert_start)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 1d71f8c..dae997d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -71,7 +71,6 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
> num_ibs,
>   (*job)->num_ibs = num_ibs;
>   
>   amdgpu_sync_create(&(*job)->sync);
> - amdgpu_sync_create(&(*job)->sched_sync);
> (*job)->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
>   (*job)->vm_pd_addr = AMDGPU_BO_INVALID_OFFSET;
>   
> @@ -117,7 +116,6 @@ static void amdgpu_job_free_cb(struct drm_sched_job 
> *s_job)
>   amdgpu_ring_priority_put(ring, s_job->s_priority);
>   dma_fence_put(job->fence);
> amdgpu_sync_free(&job->sync);
> - amdgpu_sync_free(&job->sched_sync);
>   kfree(job);
>   }
>   
> @@ -127,7 +125,6 @@ void amdgpu_job_free(struct amdgpu_job *job)
>   
>   dma_fence_put(job->fence);
> amdgpu_sync_free(&job->sync);
> - amdgpu_sync_free(&job->sched_sync);
>   kfree(job);
>   }
>   
> @@ -182,14 +179,9 @@ static struct dma_fence *amdgpu_job_dependency(struct 
> drm_sched_job *sched_job,
>   bool need_pipe_sync = false;
>   int r;
>   
> - fence = amdgpu_sync_get_fence(&job->sync, &need_pipe_sync);
> - if (fence && need_pipe_sync) {
> - if (drm_sched_dependency_optimized(fence, s_entity)) {
> - r = amdgpu_sync_fence(ring->adev, &job->sched_sync,
> -   fence, false);
> - if (r)
> - DRM_ERROR("Error adding fence (%d)\n", r);
> - }
> + if (fence && need_pipe_sync && drm_sched_dependency_optimized(fence, 
> s_entity)) {
> + trace_amdgpu_ib_pipe_sync(job, fence);
> + job->need_pipe_sync = true;
>   }
>   
>   while (fence == NULL && 

Re: [PATCH 2/3] drm/amdgpu: drop the sched_sync

2018-11-03 Thread Christian König

On 03.11.18 at 06:33, Monk Liu wrote:

Reasons to drop it:

1) simplify the code: just introducing the field member "need_pipe_sync"
for the job is good enough to tell whether the explicit dependency fence
needs to be followed by a pipeline sync.

2) after GPU_recover the explicit fence from sched_sync will not
come back, so the required pipeline_sync following it is missed;
consider the scenario below:

now on ring buffer:

Job-A -> pipe_sync -> Job-B

TDR occurred on Job-A, and after GPU recovery:
now on ring buffer:

Job-A -> Job-B

because the fence from sched_sync is used and freed after the first
ib_schedule, it will never come back; with this patch this issue is
avoided.


NAK, that would result in a severe performance drop.

We need the fence here to determine if we actually need to do the 
pipeline sync or not.


E.g. the explicitly requested fence could already be signaled.

Christian.
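
(To make the objection concrete: keeping the fence around allows a check at
submission time, and if the explicit dependency has already signaled, the
pipeline sync and its stall can be skipped entirely; a bool stored on the job
at dependency time cannot make that distinction. A minimal, hypothetical
sketch of the idea, not the actual driver code:)

	/* Hypothetical helper: only stall the pipe if the explicit
	 * dependency is still pending at submission time.
	 */
	static bool example_need_pipe_sync(struct dma_fence *dep)
	{
		if (!dep)
			return false;
		return !dma_fence_is_signaled(dep);
	}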



Signed-off-by: Monk Liu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 16 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 14 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 +--
  3 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index c48207b3..ac7d2da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -122,7 +122,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
  {
struct amdgpu_device *adev = ring->adev;
struct amdgpu_ib *ib = &ibs[0];
-   struct dma_fence *tmp = NULL;
bool skip_preamble, need_ctx_switch;
unsigned patch_offset = ~0;
struct amdgpu_vm *vm;
@@ -166,16 +165,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}
  
  	need_ctx_switch = ring->current_ctx != fence_ctx;

-   if (ring->funcs->emit_pipeline_sync && job &&
-   ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
-(amdgpu_sriov_vf(adev) && need_ctx_switch) ||
-amdgpu_vm_need_pipeline_sync(ring, job))) {
-   need_pipe_sync = true;
  
-		if (tmp)

-   trace_amdgpu_ib_pipe_sync(job, tmp);
-
-   dma_fence_put(tmp);
+   if (ring->funcs->emit_pipeline_sync && job) {
+   if ((need_ctx_switch && amdgpu_sriov_vf(adev)) ||
+   amdgpu_vm_need_pipeline_sync(ring, job))
+   need_pipe_sync = true;
+   else if (job->need_pipe_sync)
+   need_pipe_sync = true;
}
  
  	if (ring->funcs->insert_start)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1d71f8c..dae997d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -71,7 +71,6 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
num_ibs,
(*job)->num_ibs = num_ibs;
  
  	amdgpu_sync_create(&(*job)->sync);

-   amdgpu_sync_create(&(*job)->sched_sync);
(*job)->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
(*job)->vm_pd_addr = AMDGPU_BO_INVALID_OFFSET;
  
@@ -117,7 +116,6 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)

amdgpu_ring_priority_put(ring, s_job->s_priority);
dma_fence_put(job->fence);
amdgpu_sync_free(&job->sync);
-   amdgpu_sync_free(&job->sched_sync);
kfree(job);
  }
  
@@ -127,7 +125,6 @@ void amdgpu_job_free(struct amdgpu_job *job)
  
  	dma_fence_put(job->fence);

amdgpu_sync_free(&job->sync);
-   amdgpu_sync_free(&job->sched_sync);
kfree(job);
  }
  
@@ -182,14 +179,9 @@ static struct dma_fence *amdgpu_job_dependency(struct drm_sched_job *sched_job,

bool need_pipe_sync = false;
int r;
  
-	fence = amdgpu_sync_get_fence(&job->sync, &need_pipe_sync);

-   if (fence && need_pipe_sync) {
-   if (drm_sched_dependency_optimized(fence, s_entity)) {
-   r = amdgpu_sync_fence(ring->adev, &job->sched_sync,
- fence, false);
-   if (r)
-   DRM_ERROR("Error adding fence (%d)\n", r);
-   }
+   if (fence && need_pipe_sync && drm_sched_dependency_optimized(fence, 
s_entity)) {
+   trace_amdgpu_ib_pipe_sync(job, fence);
+   job->need_pipe_sync = true;
}
  
  	while (fence == NULL && vm && !job->vmid) {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index e1b46a6..c1d00f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -41,7 +41,6 @@ struct amdgpu_job {
struct drm_sched_job    base;
struct amdgpu_vm        *vm;
struct amdgpu_sync  sync;
-   struct amdgpu_sync  sched_sync;
struct amdgpu_ib        *ibs;
struct dma_fence        *fence; /* the hw fence */
uint32_t 

Re: [PATCH 3/3] drm/amdgpu: drop need_vm_flush/need_pipe_sync

2018-11-03 Thread Christian König

On 03.11.18 at 06:33, Monk Liu wrote:

use a flag to hold need_flush and need_pipe_sync


NAK, if you want to save space then use "bool variable:1" instead.

Flags in structures are not as self-explanatory as a member variable.

Christian.
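
(For illustration, a minimal sketch of the suggested alternative: single-bit
bool members instead of a flags word; the member names are only examples:)

	/* Bit-field bools keep the self-documenting member names while
	 * still packing down to one bit each inside the struct.
	 */
	struct amdgpu_job_example {
		bool	vm_needs_flush : 1;
		bool	need_pipe_sync : 1;
	};

	/* usage then reads like a plain bool:
	 *   job->vm_needs_flush = true;
	 *   if (job->need_pipe_sync) { ... }
	 */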



Signed-off-by: Monk Liu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 14 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  6 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 ++--
  7 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index ac7d2da..3231a3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -170,7 +170,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
if ((need_ctx_switch && amdgpu_sriov_vf(adev)) ||
amdgpu_vm_need_pipeline_sync(ring, job))
need_pipe_sync = true;
-   else if (job->need_pipe_sync)
+   else if (job->flags & JOB_NEED_PIPE_SYNC)
need_pipe_sync = true;
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c

index df9b173..ed9b6c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -311,7 +311,9 @@ static int amdgpu_vmid_grab_reserved(struct amdgpu_vm *vm,
dma_fence_put((*id)->flushed_updates);
(*id)->flushed_updates = dma_fence_get(updates);
}
-   job->vm_needs_flush = needs_flush;
+
+   if (needs_flush)
+   job->flags |= JOB_NEED_VM_FLUSH;
return 0;
  }
  
@@ -341,7 +343,8 @@ static int amdgpu_vmid_grab_used(struct amdgpu_vm *vm,

struct dma_fence *updates = sync->last_vm_update;
int r;
  
-	job->vm_needs_flush = vm->use_cpu_for_update;

+   if (vm->use_cpu_for_update)
+   job->flags |= JOB_NEED_VM_FLUSH;
  
  	/* Check if we can use a VMID already assigned to this VM */

list_for_each_entry_reverse((*id), &id_mgr->ids_lru, list) {
@@ -380,7 +383,8 @@ static int amdgpu_vmid_grab_used(struct amdgpu_vm *vm,
(*id)->flushed_updates = dma_fence_get(updates);
}
  
-		job->vm_needs_flush |= needs_flush;

+   if (needs_flush)
+   job->flags |= JOB_NEED_VM_FLUSH;
return 0;
}
  
@@ -438,7 +442,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
  
  			dma_fence_put(id->flushed_updates);

id->flushed_updates = dma_fence_get(updates);
-   job->vm_needs_flush = true;
+   job->flags |= JOB_NEED_VM_FLUSH;
}
  
list_move_tail(&id->list, &id_mgr->ids_lru);

@@ -447,7 +451,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct 
amdgpu_ring *ring,
id->pd_gpu_addr = job->vm_pd_addr;
id->owner = vm->entity.fence_context;
  
-	if (job->vm_needs_flush) {

+   if (job->flags & JOB_NEED_VM_FLUSH) {
dma_fence_put(id->last_flush);
id->last_flush = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index dae997d..82e190d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -181,7 +181,7 @@ static struct dma_fence *amdgpu_job_dependency(struct 
drm_sched_job *sched_job,
  
  	if (fence && need_pipe_sync && drm_sched_dependency_optimized(fence, s_entity)) {

trace_amdgpu_ib_pipe_sync(job, fence);
-   job->need_pipe_sync = true;
+   job->flags |= JOB_NEED_PIPE_SYNC;
}
  
  	while (fence == NULL && vm && !job->vmid) {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index c1d00f0..f9e8ecd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -37,6 +37,9 @@
  
  struct amdgpu_fence;
  
+#define JOB_NEED_VM_FLUSH   1   /* require a vm flush for this job */

+#define JOB_NEED_PIPE_SYNC  2   /* require a pipeline sync for this job */
+
  struct amdgpu_job {
struct drm_sched_job    base;
struct amdgpu_vm        *vm;
@@ -46,7 +49,6 @@ struct amdgpu_job {
uint32_t                preamble_status;
uint32_t                num_ibs;
void                    *owner;
-   bool                    vm_needs_flush;
uint64_t                vm_pd_addr;
unsigned                vmid;
unsigned                pasid;
@@ -58,7 +60,7 @@ struct amdgpu_job {
/* user fence handling */
uint64_t