Re: [PATCH] drm/amd/amdkfd/kfd_int_process_v9: Use true and false for boolean values

2018-08-06 Thread Huang Rui
On Sat, Aug 04, 2018 at 07:30:45PM -0500, Gustavo A. R. Silva wrote:
> Return statements in functions returning bool should use true or false
> instead of an integer value.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Huang Rui 

Also add Felix for his awareness.

Thanks,
Ray

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index f836897..42e92e3 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -37,12 +37,12 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>   vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
>   if (vmid < dev->vm_info.first_vmid_kfd ||
>   vmid > dev->vm_info.last_vmid_kfd)
> - return 0;
> + return false;
>  
>   /* If there is no valid PASID, it's likely a firmware bug */
>   pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
>   if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
> - return 0;
> + return false;
>  
>   source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
>   client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
> -- 
> 2.7.4
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: Use true and false for boolean values

2018-08-06 Thread Huang Rui
On Sat, Aug 04, 2018 at 07:27:02PM -0500, Gustavo A. R. Silva wrote:
> Return statements in functions returning bool should use true or false
> instead of an integer value.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Looks good for me.
Reviewed-by: Huang Rui 

Add Felix for his awareness.

> ---
>  drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
> b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> index 5d2475d..16af9d1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> +++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> @@ -62,12 +62,12 @@ static bool cik_event_interrupt_isr(struct kfd_dev *dev,
>   vmid  = (ihre->ring_id & 0xff00) >> 8;
>   if (vmid < dev->vm_info.first_vmid_kfd ||
>   vmid > dev->vm_info.last_vmid_kfd)
> - return 0;
> + return false;
>  
>   /* If there is no valid PASID, it's likely a firmware bug */
>   pasid = (ihre->ring_id & 0x) >> 16;
>   if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
> - return 0;
> + return false;
>  
>   /* Interrupt types we care about: various signals and faults.
>* They will be forwarded to a work queue (see below).
> -- 
> 2.7.4
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] gpu: drm: radeon: radeon_test: Replace mdelay() with msleep()

2018-08-06 Thread Alex Deucher
On Fri, Aug 3, 2018 at 8:01 PM, Jia-Ju Bai  wrote:
> radeon_test_ring_sync() and radeon_test_ring_sync2() are never
> called in atomic context.
> They call mdelay() to busily wait, which is not necessary.
> mdelay() can be replaced with msleep().
>
> This is found by a static analysis tool named DCNS written by myself.
>
> Signed-off-by: Jia-Ju Bai 

Applied.  thanks!

Alex

> ---
>  drivers/gpu/drm/radeon/radeon_test.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_test.c 
> b/drivers/gpu/drm/radeon/radeon_test.c
> index f5e9abfadb56..411efa303f54 100644
> --- a/drivers/gpu/drm/radeon/radeon_test.c
> +++ b/drivers/gpu/drm/radeon/radeon_test.c
> @@ -347,7 +347,7 @@ void radeon_test_ring_sync(struct radeon_device *rdev,
> if (r)
> goto out_cleanup;
>
> -   mdelay(1000);
> +   msleep(1000);
>
> if (radeon_fence_signaled(fence1)) {
> DRM_ERROR("Fence 1 signaled without waiting for 
> semaphore.\n");
> @@ -368,7 +368,7 @@ void radeon_test_ring_sync(struct radeon_device *rdev,
> goto out_cleanup;
> }
>
> -   mdelay(1000);
> +   msleep(1000);
>
> if (radeon_fence_signaled(fence2)) {
> DRM_ERROR("Fence 2 signaled without waiting for 
> semaphore.\n");
> @@ -441,7 +441,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
> *rdev,
> if (r)
> goto out_cleanup;
>
> -   mdelay(1000);
> +   msleep(1000);
>
> if (radeon_fence_signaled(fenceA)) {
> DRM_ERROR("Fence A signaled without waiting for 
> semaphore.\n");
> @@ -461,7 +461,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
> *rdev,
> radeon_ring_unlock_commit(rdev, ringC, false);
>
> for (i = 0; i < 30; ++i) {
> -   mdelay(100);
> +   msleep(100);
> sigA = radeon_fence_signaled(fenceA);
> sigB = radeon_fence_signaled(fenceB);
> if (sigA || sigB)
> @@ -486,7 +486,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
> *rdev,
> radeon_semaphore_emit_signal(rdev, ringC->idx, semaphore);
> radeon_ring_unlock_commit(rdev, ringC, false);
>
> -   mdelay(1000);
> +   msleep(1000);
>
> r = radeon_fence_wait(fenceA, false);
> if (r) {
> --
> 2.17.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] gpu: drm: radeon: si: Replace mdelay() with msleep() in si_pcie_gen3_enable()

2018-08-06 Thread Alex Deucher
On Fri, Aug 3, 2018 at 8:33 PM, Jia-Ju Bai  wrote:
> si_pcie_gen3_enable() is never called in atomic context.
> It calls mdelay() to busily wait, which is not necessary.
> mdelay() can be replaced with msleep().
>
> This is found by a static analysis tool named DCNS written by myself
>
> Signed-off-by: Jia-Ju Bai 

Applied.  thanks!

Alex

> ---
>  drivers/gpu/drm/radeon/si.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
> index 1907c950d76f..c28743443970 100644
> --- a/drivers/gpu/drm/radeon/si.c
> +++ b/drivers/gpu/drm/radeon/si.c
> @@ -7181,7 +7181,7 @@ static void si_pcie_gen3_enable(struct radeon_device 
> *rdev)
> tmp |= LC_REDO_EQ;
> WREG32_PCIE_PORT(PCIE_LC_CNTL4, tmp);
>
> -   mdelay(100);
> +   msleep(100);
>
> /* linkctl */
> pci_read_config_word(root, bridge_pos + 
> PCI_EXP_LNKCTL, );
> --
> 2.17.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] gpu: drm: radeon: cik: Replace mdelay() with msleep() in cik_pcie_gen3_enable()

2018-08-06 Thread Alex Deucher
On Fri, Aug 3, 2018 at 8:25 PM, Jia-Ju Bai  wrote:
> cik_pcie_gen3_enable() is never called in atomic context.
> It calls mdelay() to busily wait, which is not necessary.
> mdelay() can be replaced with msleep().
>
> This is found by a static analysis tool named DCNS written by myself.
>
> Signed-off-by: Jia-Ju Bai 

Applied.  thanks.

Alex

> ---
>  drivers/gpu/drm/radeon/cik.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
> index 7c73bc7e2f85..c1b4d4fbcf3a 100644
> --- a/drivers/gpu/drm/radeon/cik.c
> +++ b/drivers/gpu/drm/radeon/cik.c
> @@ -9598,7 +9598,7 @@ static void cik_pcie_gen3_enable(struct radeon_device 
> *rdev)
> tmp |= LC_REDO_EQ;
> WREG32_PCIE_PORT(PCIE_LC_CNTL4, tmp);
>
> -   mdelay(100);
> +   msleep(100);
>
> /* linkctl */
> pci_read_config_word(root, bridge_pos + 
> PCI_EXP_LNKCTL, );
> --
> 2.17.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-06 Thread Lucas Stach
Am Montag, den 06.08.2018, 10:12 +0200 schrieb Christian König:
> Am 03.08.2018 um 19:31 schrieb Lucas Stach:
> > Am Montag, den 06.08.2018, 14:57 +0200 schrieb Christian König:
> > > Am 03.08.2018 um 16:29 schrieb Lucas Stach:
> > > > drm_sched_job_finish() is a work item scheduled for each finished job on
> > > > a unbound system workqueue. This means the workers can execute out of 
> > > > order
> > > > with regard to the real hardware job completions.
> > > > 
> > > > If this happens queueing a timeout worker for the first job on the ring
> > > > mirror list is wrong, as this may be a job which has already finished
> > > > executing. Fix this by reorganizing the code to always queue the worker
> > > > for the next job on the list, if this job hasn't finished yet. This is
> > > > robust against a potential reordering of the finish workers.
> > > > 
> > > > Also move out the timeout worker cancelling, so that we don't need to
> > > > take the job list lock twice. As a small optimization list_del is used
> > > > to remove the job from the ring mirror list, as there is no need to
> > > > reinit the list head in the job we are about to free.
> > > > 
> > > > > > Signed-off-by: Lucas Stach 
> > > > 
> > > > ---
> > > > v2: - properly handle last job in the ring
> > > >   - check correct fence for compeletion
> > > > ---
> > > >    drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++
> > > >    1 file changed, 10 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> > > > b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > index 44d480768dfe..574875e2c206 100644
> > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct 
> > > > work_struct *work)
> > > > > > > > > > > >        
> > > > > > > > > > > > finish_work);
> > > > > >     struct drm_gpu_scheduler *sched = s_job->sched;
> > > > 
> > > >    
> > > > > > > > > > > > -   /* remove job from ring_mirror_list */
> > > > > > > > > > > > -   spin_lock(>job_list_lock);
> > > > > > > > > > > > -   list_del_init(_job->node);
> > > > > > > > > > > > -   if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > > > > > -   struct drm_sched_job *next;
> > > > 
> > > > -
> > > > > > > > > > > > -   spin_unlock(>job_list_lock);
> > > > > > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> > > > 
> > > >     cancel_delayed_work_sync(_job->work_tdr);
> > > 
> > > That is unfortunately still racy here.
> > > 
> > > Between canceling the job and removing it from the list someone could
> > > actually start the time (in theory) :)
> > > 
> > > Cancel it, remove it from the list and cancel it again.
> > 
> > I don't see how. If we end up in this worker the finished fence of the
> > job is already certainly signaled (as this is what triggers queueing of
> > the worker). So even if some other worker manages to find this job as
> > the next job in the list, the dma_fence_is_signaled check should
> > prevent the timeout worker from getting scheduled again.
> 
> Well that makes sense, but is a bit hard to understand.

I agree, all the possible parallelism and possible re-ordering makes
this seemingly simple part of the scheduler code a bit mind-breaking.
I've added a comment in v3 to capture the line of thought for future
reference.

Regards,
Lucas

> Anyway, please remove the extra "if" check. With that done the patch is 
> > Reviewed-by: Christian König .
> 
> Thanks,
> Christian.
> 
> 
> > 
> > > BTW: You could completely drop the "if (sched->timeout !=
> > > MAX_SCHEDULE_TIMEOUT)" here cause canceling is harmless as long as the
> > > structure is initialized.
> > 
> > Right.
> > 
> > Regards,
> > Lucas
> > 
> > > Christian.
> > > 
> > > > > > -   spin_lock(>job_list_lock);
> > > > 
> > > >    
> > > > > > > > > > > > -   /* queue TDR for next job */
> > > > > > > > > > > > -   next = 
> > > > > > > > > > > > list_first_entry_or_null(>ring_mirror_list,
> > > > > > > > > > > > -   struct 
> > > > > > > > > > > > drm_sched_job, node);
> > > > > > > > > > > > +   spin_lock(>job_list_lock);
> > > > > > > > > > > > +   /* queue TDR for next job */
> > > > > > > > > > > > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > > > > > > > > > > > +   !list_is_last(_job->node, 
> > > > > > > > > > > > >ring_mirror_list)) {
> > > > > > +   struct drm_sched_job *next = list_next_entry(s_job, 
> > > > > > node);
> > > > 
> > > >    
> > > > > > > > > > > > -   if (next)
> > > > > > > > > > > > +   if 
> > > > > > > > > > > > (!dma_fence_is_signaled(>s_fence->finished))
> > > > > > > > > > > >     
> > > > > > > > > > > > schedule_delayed_work(>work_tdr, 

[PATCH v3] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-06 Thread Lucas Stach
drm_sched_job_finish() is a work item scheduled for each finished job on
a unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move out the timeout worker cancelling, so that we don't need to
take the job list lock twice. As a small optimization list_del is used
to remove the job from the ring mirror list, as there is no need to
reinit the list head in the job we are about to free.

Signed-off-by: Lucas Stach 
Reviewed-by: Christian König 
---
v2: - properly handle last job in the ring
- check correct fence for completion
v3: - drop condition before cancel_delayed_work_sync
- add a comment explaining how we avoid the timeout to be
  restarted without removing the job from the ring mirror first
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 30 +--
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..fff3716ece6f 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,28 @@ static void drm_sched_job_finish(struct work_struct *work)
   finish_work);
struct drm_gpu_scheduler *sched = s_job->sched;
 
-   /* remove job from ring_mirror_list */
-   spin_lock(>job_list_lock);
-   list_del_init(_job->node);
-   if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-   struct drm_sched_job *next;
-
-   spin_unlock(>job_list_lock);
-   cancel_delayed_work_sync(_job->work_tdr);
-   spin_lock(>job_list_lock);
+   /*
+* Canceling the timeout without removing our job from the ring mirror
+* list is safe, as we will only end up in this worker if our jobs
+* finished fence has been signaled. So even if some another worker
+* manages to find this job as the next job in the list, the fence
+* signaled check below will prevent the timeout to be restarted.
+*/
+   cancel_delayed_work_sync(_job->work_tdr);
 
-   /* queue TDR for next job */
-   next = list_first_entry_or_null(>ring_mirror_list,
-   struct drm_sched_job, node);
+   spin_lock(>job_list_lock);
+   /* queue TDR for next job */
+   if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+   !list_is_last(_job->node, >ring_mirror_list)) {
+   struct drm_sched_job *next = list_next_entry(s_job, node);
 
-   if (next)
+   if (!dma_fence_is_signaled(>s_fence->finished))
schedule_delayed_work(>work_tdr, sched->timeout);
}
+   /* remove job from ring_mirror_list */
+   list_del(_job->node);
spin_unlock(>job_list_lock);
+
dma_fence_put(_job->s_fence->finished);
sched->ops->free_job(s_job);
 }
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: move missed gfxoff entry into amdgpu_gfx header

2018-08-06 Thread Christian König

Am 06.08.2018 um 14:16 schrieb Huang Rui:

Move missed gfxoff entry to amdgpu_gfx.h.

Signed-off-by: Huang Rui 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 +
  2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ef4fb6a..07924d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1247,7 +1247,6 @@ void amdgpu_device_program_register_sequence(struct 
amdgpu_device *adev,
 const u32 array_size);
  
  bool amdgpu_device_is_px(struct drm_device *dev);

-void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable);
  /* atpx handler */
  #if defined(CONFIG_VGA_SWITCHEROO)
  void amdgpu_register_atpx_handler(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 4e3d147..53e9e2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -356,5 +356,6 @@ void amdgpu_gfx_bit_to_queue(struct amdgpu_device *adev, 
int bit,
 int *mec, int *pipe, int *queue);
  bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, int mec,
 int pipe, int queue);
+void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable);
  
  #endif


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: move missed gfxoff entry into amdgpu_gfx header

2018-08-06 Thread Huang Rui
Move missed gfxoff entry to amdgpu_gfx.h.

Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ef4fb6a..07924d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1247,7 +1247,6 @@ void amdgpu_device_program_register_sequence(struct 
amdgpu_device *adev,
 const u32 array_size);
 
 bool amdgpu_device_is_px(struct drm_device *dev);
-void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable);
 /* atpx handler */
 #if defined(CONFIG_VGA_SWITCHEROO)
 void amdgpu_register_atpx_handler(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 4e3d147..53e9e2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -356,5 +356,6 @@ void amdgpu_gfx_bit_to_queue(struct amdgpu_device *adev, 
int bit,
 int *mec, int *pipe, int *queue);
 bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, int mec,
 int pipe, int queue);
+void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable);
 
 #endif
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: r600: Replace mdelay() and udelay() with msleep() and usleep_range()

2018-08-06 Thread Jia-Ju Bai
r600_gpu_soft_reset() and r600_gpu_pci_config_reset() are never
called in atomic context.
They call mdelay() and udelay() to busily wait, which is not necessary.
mdelay() and udelay() can be replaced with msleep() and usleep_range().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/r600.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index e06e2d8feab3..de5f6d9f251e 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1705,7 +1705,7 @@ static void r600_gpu_soft_reset(struct radeon_device 
*rdev, u32 reset_mask)
WREG32(DMA_RB_CNTL, tmp);
}
 
-   mdelay(50);
+   msleep(50);
 
rv515_mc_stop(rdev, );
if (r600_mc_wait_for_idle(rdev)) {
@@ -1782,7 +1782,7 @@ static void r600_gpu_soft_reset(struct radeon_device 
*rdev, u32 reset_mask)
WREG32(R_008020_GRBM_SOFT_RESET, tmp);
tmp = RREG32(R_008020_GRBM_SOFT_RESET);
 
-   udelay(50);
+   usleep_range(50, 100);
 
tmp &= ~grbm_soft_reset;
WREG32(R_008020_GRBM_SOFT_RESET, tmp);
@@ -1796,7 +1796,7 @@ static void r600_gpu_soft_reset(struct radeon_device 
*rdev, u32 reset_mask)
WREG32(SRBM_SOFT_RESET, tmp);
tmp = RREG32(SRBM_SOFT_RESET);
 
-   udelay(50);
+   usleep_range(50, 100);
 
tmp &= ~srbm_soft_reset;
WREG32(SRBM_SOFT_RESET, tmp);
@@ -1804,10 +1804,10 @@ static void r600_gpu_soft_reset(struct radeon_device 
*rdev, u32 reset_mask)
}
 
/* Wait a little for things to settle down */
-   mdelay(1);
+   usleep_range(1000, 2000);
 
rv515_mc_resume(rdev, );
-   udelay(50);
+   usleep_range(50, 100);
 
r600_print_gpu_status_regs(rdev);
 }
@@ -1835,7 +1835,7 @@ static void r600_gpu_pci_config_reset(struct 
radeon_device *rdev)
tmp &= ~DMA_RB_ENABLE;
WREG32(DMA_RB_CNTL, tmp);
 
-   mdelay(50);
+   msleep(50);
 
/* set mclk/sclk to bypass */
if (rdev->family >= CHIP_RV770)
@@ -1857,12 +1857,12 @@ static void r600_gpu_pci_config_reset(struct 
radeon_device *rdev)
 
/* reset */
radeon_pci_config_reset(rdev);
-   mdelay(1);
+   usleep_range(1000, 2000);
 
/* BIF reset workaround.  Not sure if this is needed on 6xx */
tmp = SOFT_RESET_BIF;
WREG32(SRBM_SOFT_RESET, tmp);
-   mdelay(1);
+   usleep_range(1000, 2000);
WREG32(SRBM_SOFT_RESET, 0);
 
/* wait for asic to come out of reset */
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: r300: Replace mdelay() with msleep() and usleep_range() in r300_asic_reset()

2018-08-06 Thread Jia-Ju Bai
r300_asic_reset() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep() and usleep_range().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/r300.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index 21161aa8acbf..55cf02400d5a 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -434,9 +434,9 @@ int r300_asic_reset(struct radeon_device *rdev, bool hard)
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_VAP(1) |
S_F0_SOFT_RESET_GA(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* resetting the CP seems to be problematic sometimes it end up
@@ -446,9 +446,9 @@ int r300_asic_reset(struct radeon_device *rdev, bool hard)
 */
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_CP(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* restore PCI & busmastering */
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: si: Replace mdelay() with msleep() in si_pcie_gen3_enable()

2018-08-06 Thread Jia-Ju Bai
si_pcie_gen3_enable() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().

This is found by a static analysis tool named DCNS written by myself

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/si.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index 1907c950d76f..c28743443970 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -7181,7 +7181,7 @@ static void si_pcie_gen3_enable(struct radeon_device 
*rdev)
tmp |= LC_REDO_EQ;
WREG32_PCIE_PORT(PCIE_LC_CNTL4, tmp);
 
-   mdelay(100);
+   msleep(100);
 
/* linkctl */
pci_read_config_word(root, bridge_pos + 
PCI_EXP_LNKCTL, );
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: r100: Replace mdelay() with msleep() and usleep_range() in r100_asic_reset()

2018-08-06 Thread Jia-Ju Bai
r100_asic_reset() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep() and usleep_range().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/r100.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 7d39ed63e5be..09c418113d9a 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -2576,17 +2576,17 @@ int r100_asic_reset(struct radeon_device *rdev, bool 
hard)
S_F0_SOFT_RESET_PP(1) |
S_F0_SOFT_RESET_RB(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* reset CP */
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_CP(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* restore PCI & busmastering */
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdkfd: Use true and false for boolean values

2018-08-06 Thread Gustavo A. R. Silva
Return statements in functions returning bool should use true or false
instead of an integer value.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
index 5d2475d..16af9d1 100644
--- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
@@ -62,12 +62,12 @@ static bool cik_event_interrupt_isr(struct kfd_dev *dev,
vmid  = (ihre->ring_id & 0xff00) >> 8;
if (vmid < dev->vm_info.first_vmid_kfd ||
vmid > dev->vm_info.last_vmid_kfd)
-   return 0;
+   return false;
 
/* If there is no valid PASID, it's likely a firmware bug */
pasid = (ihre->ring_id & 0x) >> 16;
if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
-   return 0;
+   return false;
 
/* Interrupt types we care about: various signals and faults.
 * They will be forwarded to a work queue (see below).
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/amdkfd/kfd_int_process_v9: Use true and false for boolean values

2018-08-06 Thread Gustavo A. R. Silva
Return statements in functions returning bool should use true or false
instead of an integer value.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index f836897..42e92e3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -37,12 +37,12 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
if (vmid < dev->vm_info.first_vmid_kfd ||
vmid > dev->vm_info.last_vmid_kfd)
-   return 0;
+   return false;
 
/* If there is no valid PASID, it's likely a firmware bug */
pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
-   return 0;
+   return false;
 
source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: rs600: Replace mdelay() with msleep() and usleep_range() in rs600_asic_reset()

2018-08-06 Thread Jia-Ju Bai
rs600_asic_reset() is never called in atomic context.
They call mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep() and usleep_range().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/rs600.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c
index f16af119c688..1a97f5fd719b 100644
--- a/drivers/gpu/drm/radeon/rs600.c
+++ b/drivers/gpu/drm/radeon/rs600.c
@@ -472,30 +472,30 @@ int rs600_asic_reset(struct radeon_device *rdev, bool 
hard)
pci_save_state(rdev->pdev);
/* disable bus mastering */
pci_clear_master(rdev->pdev);
-   mdelay(1);
+   usleep_range(1000, 2000);
/* reset GA+VAP */
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_VAP(1) |
S_F0_SOFT_RESET_GA(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* reset CP */
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_CP(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* reset MC */
WREG32(R_F0_RBBM_SOFT_RESET, S_F0_SOFT_RESET_MC(1));
RREG32(R_F0_RBBM_SOFT_RESET);
-   mdelay(500);
+   msleep(500);
WREG32(R_F0_RBBM_SOFT_RESET, 0);
-   mdelay(1);
+   usleep_range(1000, 2000);
status = RREG32(R_000E40_RBBM_STATUS);
dev_info(rdev->dev, "(%s:%d) RBBM_STATUS=0x%08X\n", __func__, __LINE__, 
status);
/* restore PCI & busmastering */
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: radeon_test: Replace mdelay() with msleep()

2018-08-06 Thread Jia-Ju Bai
radeon_test_ring_sync() and radeon_test_ring_sync2() are never 
called in atomic context.
They call mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/radeon_test.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_test.c 
b/drivers/gpu/drm/radeon/radeon_test.c
index f5e9abfadb56..411efa303f54 100644
--- a/drivers/gpu/drm/radeon/radeon_test.c
+++ b/drivers/gpu/drm/radeon/radeon_test.c
@@ -347,7 +347,7 @@ void radeon_test_ring_sync(struct radeon_device *rdev,
if (r)
goto out_cleanup;
 
-   mdelay(1000);
+   msleep(1000);
 
if (radeon_fence_signaled(fence1)) {
DRM_ERROR("Fence 1 signaled without waiting for semaphore.\n");
@@ -368,7 +368,7 @@ void radeon_test_ring_sync(struct radeon_device *rdev,
goto out_cleanup;
}
 
-   mdelay(1000);
+   msleep(1000);
 
if (radeon_fence_signaled(fence2)) {
DRM_ERROR("Fence 2 signaled without waiting for semaphore.\n");
@@ -441,7 +441,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
*rdev,
if (r)
goto out_cleanup;
 
-   mdelay(1000);
+   msleep(1000);
 
if (radeon_fence_signaled(fenceA)) {
DRM_ERROR("Fence A signaled without waiting for semaphore.\n");
@@ -461,7 +461,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
*rdev,
radeon_ring_unlock_commit(rdev, ringC, false);
 
for (i = 0; i < 30; ++i) {
-   mdelay(100);
+   msleep(100);
sigA = radeon_fence_signaled(fenceA);
sigB = radeon_fence_signaled(fenceB);
if (sigA || sigB)
@@ -486,7 +486,7 @@ static void radeon_test_ring_sync2(struct radeon_device 
*rdev,
radeon_semaphore_emit_signal(rdev, ringC->idx, semaphore);
radeon_ring_unlock_commit(rdev, ringC, false);
 
-   mdelay(1000);
+   msleep(1000);
 
r = radeon_fence_wait(fenceA, false);
if (r) {
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] gpu: drm: radeon: cik: Replace mdelay() with msleep() in cik_pcie_gen3_enable()

2018-08-06 Thread Jia-Ju Bai
cik_pcie_gen3_enable() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/gpu/drm/radeon/cik.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 7c73bc7e2f85..c1b4d4fbcf3a 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -9598,7 +9598,7 @@ static void cik_pcie_gen3_enable(struct radeon_device 
*rdev)
tmp |= LC_REDO_EQ;
WREG32_PCIE_PORT(PCIE_LC_CNTL4, tmp);
 
-   mdelay(100);
+   msleep(100);
 
/* linkctl */
pci_read_config_word(root, bridge_pos + 
PCI_EXP_LNKCTL, );
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 2/6] amdgpu: add handle table implementation v2

2018-08-06 Thread Zhang, Jerry (Junwei)

On 08/03/2018 07:34 PM, Christian König wrote:

The kernel handles are dense and the kernel always tries to use the
lowest free id. Use this to implement a more efficient handle table
by using a resizeable array instead of a hash.

v2: add handle_table_fini function, extra key checks,
 fix typo in function name

Signed-off-by: Christian König 
---
  amdgpu/Makefile.sources |  4 ++-
  amdgpu/handle_table.c   | 72 +
  amdgpu/handle_table.h   | 41 
  3 files changed, 116 insertions(+), 1 deletion(-)
  create mode 100644 amdgpu/handle_table.c
  create mode 100644 amdgpu/handle_table.h

diff --git a/amdgpu/Makefile.sources b/amdgpu/Makefile.sources
index 498b64cc..62577ba5 100644
--- a/amdgpu/Makefile.sources
+++ b/amdgpu/Makefile.sources
@@ -10,7 +10,9 @@ LIBDRM_AMDGPU_FILES := \
util_hash.c \
util_hash.h \
util_hash_table.c \
-   util_hash_table.h
+   util_hash_table.h \
+   handle_table.c \
+   handle_table.h

  LIBDRM_AMDGPU_H_FILES := \
amdgpu.h
diff --git a/amdgpu/handle_table.c b/amdgpu/handle_table.c
new file mode 100644
index ..15cd4763
--- /dev/null
+++ b/amdgpu/handle_table.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "handle_table.h"
+#include "util_math.h"
+
+drm_private int handle_table_insert(struct handle_table *table, uint32_t key,
+   void *value)
+{
+   if (key >= table->max_key) {
+   uint32_t alignment = sysconf(_SC_PAGESIZE) / sizeof(void*);
+   uint32_t max_key = ALIGN(key, alignment);


On 2nd thought, we may add table->count to take the role of current 
table->max_key,
and table_max_key could be updated by key rather than aligned value.

Then we could traverse the table by max_key instead of the entire table.

Regards,
Jerry


+   void **values;
+
+   values = realloc(table->values, max_key * sizeof(void *));
+   if (!values)
+   return -ENOMEM;
+
+   memset(values + table->max_key, 0, (max_key - table->max_key) *
+  sizeof(void *));
+
+   table->max_key = max_key;
+   table->values = values;
+   }
+   table->values[key] = value;
+   return 0;
+}
+
+drm_private void handle_table_remove(struct handle_table *table, uint32_t key)
+{
+   if (key < table->max_key)
+   table->values[key] = NULL;
+}
+
+drm_private void *handle_table_lookup(struct handle_table *table, uint32_t key)
+{
+   if (key < table->max_key)
+   return table->values[key];
+   else
+   return NULL;
+}
+
+drm_private void handle_table_fini(struct handle_table *table)
+{
+   free(table->values);
+   table->max_key = 0;
+   table->values = NULL;
+}
diff --git a/amdgpu/handle_table.h b/amdgpu/handle_table.h
new file mode 100644
index ..461193f6
--- /dev/null
+++ b/amdgpu/handle_table.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 

Re: [PATCH v2] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-06 Thread Christian König

Am 03.08.2018 um 19:31 schrieb Lucas Stach:

Am Montag, den 06.08.2018, 14:57 +0200 schrieb Christian König:

Am 03.08.2018 um 16:29 schrieb Lucas Stach:

drm_sched_job_finish() is a work item scheduled for each finished job on
a unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move out the timeout worker cancelling, so that we don't need to
take the job list lock twice. As a small optimization list_del is used
to remove the job from the ring mirror list, as there is no need to
reinit the list head in the job we are about to free.


Signed-off-by: Lucas Stach 

---
v2: - properly handle last job in the ring
  - check correct fence for compeletion
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++
   1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..574875e2c206 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)

       finish_work);
    struct drm_gpu_scheduler *sched = s_job->sched;
   

-   /* remove job from ring_mirror_list */
-   spin_lock(>job_list_lock);
-   list_del_init(_job->node);
-   if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-   struct drm_sched_job *next;

-

-   spin_unlock(>job_list_lock);
+   if (sched->timeout != MAX_SCHEDULE_TIMEOUT)

    cancel_delayed_work_sync(_job->work_tdr);

That is unfortunately still racy here.

Between canceling the job and removing it from the list someone could
actually start the time (in theory) :)

Cancel it, remove it from the list and cancel it again.

I don't see how. If we end up in this worker the finished fence of the
job is already certainly signaled (as this is what triggers queueing of
the worker). So even if some other worker manages to find this job as
the next job in the list, the dma_fence_is_signaled check should
prevent the timeout worker from getting scheduled again.


Well that makes sense, but is a bit hard to understand.

Anyway, please remove the extra "if" check. With that done the patch is 
Reviewed-by: Christian König .


Thanks,
Christian.





BTW: You could completely drop the "if (sched->timeout !=
MAX_SCHEDULE_TIMEOUT)" here cause canceling is harmless as long as the
structure is initialized.

Right.

Regards,
Lucas


Christian.


-   spin_lock(>job_list_lock);
   

-   /* queue TDR for next job */
-   next = list_first_entry_or_null(>ring_mirror_list,
-   struct drm_sched_job, node);
+   spin_lock(>job_list_lock);
+   /* queue TDR for next job */
+   if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+   !list_is_last(_job->node, >ring_mirror_list)) {
+   struct drm_sched_job *next = list_next_entry(s_job, node);
   

-   if (next)
+   if (!dma_fence_is_signaled(>s_fence->finished))
    schedule_delayed_work(>work_tdr, sched->timeout);
    }
+   /* remove job from ring_mirror_list */
+   list_del(_job->node);
    spin_unlock(>job_list_lock);

+

    dma_fence_put(_job->s_fence->finished);
    sched->ops->free_job(s_job);

   }



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx