Re: [PATCH 3/6] drm/amdgpu: Rework coredump to use memory dynamically

2023-07-12 Thread Lucas Stach
Sorry, I accidentally hit send on the previous mail.

On Wednesday, 2023-07-12 at 12:39 +0200, Christian König wrote:
> On 12.07.23 at 10:59, Lucas Stach wrote:
> > On Wednesday, 2023-07-12 at 10:37 +0200, Christian König wrote:
> > > On 11.07.23 at 23:34, André Almeida wrote:
> > > > Instead of storing coredump information inside amdgpu_device struct,
> > > > move it to a proper separate struct and allocate it dynamically. This
> > > > will make it easier to further expand the logged information.
> > > Very big NAK to this. The problem is that memory allocation isn't
> > > allowed during a GPU reset.
> > > 
> > > What you could do is to allocate the memory with GFP_ATOMIC or similar,
> > > but for a large structure that might not be possible.
> > > 
> > I'm still not fully clear on what the rules are here. In etnaviv we do
> > devcoredump allocation in the GPU reset path with __GFP_NOWARN |
> > __GFP_NORETRY, which means the allocation will kick memory reclaim if
> > necessary, but will just give up if no memory could be made available
> > easily. This satisfies the forward progress guarantee in the absence of
> > successful memory allocation, which is the most important property in
> > this path, I think.
> > 
> > However, I'm not sure if the reclaim could lead to locking issues or
> > something like that with the more complex use-cases with MMU notifiers
> > and stuff like that. Christian, do you have any experience or
> > information that would be good to share in this regard?
> 
> Yeah, very good question.
> 
> __GFP_NORETRY isn't sufficient as far as I know. Reclaim must be 
> completely suppressed to be able to allocate in a GPU reset handler.
> 
> Daniel added lockdep annotation to some of the dma-fence signaling paths 
> and this yielded quite a bunch of potential deadlocks.
> 
> It's not even that reclaim itself waits for dma_fences (that can happen, 
> but is quite unlikely), but rather that reclaim needs locks and under 
> those locks we then wait for dma_fences.
> 
> We should probably add a define somewhere which documents that 
> (GFP_ATOMIC | __GFP_NOWARN) should be used in the GPU reset handlers when 
> allocating memory for coredumps.
> 
Hm, if the problem is the direct reclaim path where we might recurse on
a lock through those indirect dependencies then we should document this
somewhere. kswapd reclaim should be fine as far as I can see, as we'll
guarantee progress without waiting for the background reclaim.

I don't think it's appropriate to dip into the atomic allocation
reserves for a best-effort thing like writing the devcoredump, so I
think this should be GFP_NOWAIT, which will also avoid the direct
reclaim path.
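
Concretely, a sketch of what the allocation site would then look like
(the struct is the one from the patch under discussion, the surrounding
function is elided, and the flag choice is the point):

```
	struct amdgpu_coredump_info *coredump;

	/*
	 * GFP_NOWAIT: no direct reclaim, so the locks taken under reclaim
	 * (and the dma_fence waits hiding behind them) can't be reached
	 * from the reset path. __GFP_NOWARN: the dump is best-effort, so
	 * an allocation failure isn't worth a splat, and we don't dip
	 * into the atomic reserves either.
	 */
	coredump = kzalloc(sizeof(*coredump), GFP_NOWAIT | __GFP_NOWARN);
	if (!coredump)
		return;	/* no dump this time, but the reset still progresses */
```

If the allocation fails we simply skip writing the devcoredump, which
keeps the forward progress guarantee of the reset path intact.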

Regards,
Lucas

> Regards,
> Christian.
> 
> > 
> > Regards,
> > Lucas
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Signed-off-by: André Almeida 
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 +++--
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 
> > > > ++
> > > >2 files changed, 51 insertions(+), 28 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > index dbe062a087c5..e1cc83a89d46 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > @@ -1068,11 +1068,6 @@ struct amdgpu_device {
> > > > uint32_t*reset_dump_reg_list;
> > > > uint32_t*reset_dump_reg_value;
> > > > int num_regs;
> > > > -#ifdef CONFIG_DEV_COREDUMP
> > > > -   struct amdgpu_task_info reset_task_info;
> > > > -   boolreset_vram_lost;
> > > > -   struct timespec64   reset_time;
> > > > -#endif
> > > >
> > > > boolscpm_enabled;
> > > > uint32_tscpm_status;
> > > > @@ -1085,6 +1080,15 @@ struct amdgpu_device {
> > > > uint32_taid_mask;
> > > >};
> > > >
> > > > +#ifdef CONFIG_DEV_COREDUMP
> > > > +struct amdgpu_coredump_info {
> > > > +   struct amdgpu_device*adev;
> > > > +   struct amdgpu_task_info reset_task_info;

Re: [PATCH 3/6] drm/amdgpu: Rework coredump to use memory dynamically

2023-07-12 Thread Lucas Stach
On Wednesday, 2023-07-12 at 10:37 +0200, Christian König wrote:
> On 11.07.23 at 23:34, André Almeida wrote:
> > Instead of storing coredump information inside amdgpu_device struct,
> > move it to a proper separate struct and allocate it dynamically. This
> > will make it easier to further expand the logged information.
> 
> Very big NAK to this. The problem is that memory allocation isn't 
> allowed during a GPU reset.
> 
> What you could do is to allocate the memory with GFP_ATOMIC or similar, 
> but for a large structure that might not be possible.
> 
I'm still not fully clear on what the rules are here. In etnaviv we do
devcoredump allocation in the GPU reset path with __GFP_NOWARN |
__GFP_NORETRY, which means the allocation will kick memory reclaim if
necessary, but will just give up if no memory could be made available
easily. This satisfies the forward progress guarantee in the absence of
successful memory allocation, which is the most important property in
this path, I think.

However, I'm not sure if the reclaim could lead to locking issues or
something like that with the more complex use-cases with MMU notifiers
and stuff like that. Christian, do you have any experience or
information that would be good to share in this regard?

Regards,
Lucas

> Regards,
> Christian.
> 
> > 
> > Signed-off-by: André Almeida 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 +++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++
> >   2 files changed, 51 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index dbe062a087c5..e1cc83a89d46 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1068,11 +1068,6 @@ struct amdgpu_device {
> > uint32_t*reset_dump_reg_list;
> > uint32_t*reset_dump_reg_value;
> > int num_regs;
> > -#ifdef CONFIG_DEV_COREDUMP
> > -   struct amdgpu_task_info reset_task_info;
> > -   boolreset_vram_lost;
> > -   struct timespec64   reset_time;
> > -#endif
> >   
> > boolscpm_enabled;
> > uint32_tscpm_status;
> > @@ -1085,6 +1080,15 @@ struct amdgpu_device {
> > uint32_taid_mask;
> >   };
> >   
> > +#ifdef CONFIG_DEV_COREDUMP
> > +struct amdgpu_coredump_info {
> > +   struct amdgpu_device*adev;
> > +   struct amdgpu_task_info reset_task_info;
> > +   struct timespec64   reset_time;
> > +   boolreset_vram_lost;
> > +};
> > +#endif
> > +
> >   static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> >   {
> > return container_of(ddev, struct amdgpu_device, ddev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e25f085ee886..23b9784e9787 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct 
> > amdgpu_device *adev)
> > return 0;
> >   }
> >   
> > -#ifdef CONFIG_DEV_COREDUMP
> > +#ifndef CONFIG_DEV_COREDUMP
> > +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
> > +   struct amdgpu_reset_context *reset_context)
> > +{
> > +}
> > +#else
> >   static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
> > size_t count, void *data, size_t datalen)
> >   {
> > struct drm_printer p;
> > -   struct amdgpu_device *adev = data;
> > +   struct amdgpu_coredump_info *coredump = data;
> > struct drm_print_iterator iter;
> > int i;
> >   
> > @@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char 
> > *buffer, loff_t offset,
> > drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
> > drm_printf(&p, "kernel: " UTS_RELEASE "\n");
> > drm_printf(&p, "module: " KBUILD_MODNAME "\n");
> > -   drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, 
> > adev->reset_time.tv_nsec);
> > -   if (adev->reset_task_info.pid)
> > +   drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
> > coredump->reset_time.tv_nsec);
> > +   if (coredump->reset_task_info.pid)
> > drm_printf(&p, "process_name: %s PID: %d\n",
> > -  adev->reset_task_info.process_name,
> > -  adev->reset_task_info.pid);
> > +  coredump->reset_task_info.process_name,
> > +  coredump->reset_task_info.pid);
> >   
> > -   if (adev->reset_vram_lost)
> > +   if (coredump->reset_vram_lost)
> > drm_printf(&p, "VRAM is lost due to GPU reset!\n");
> > -   if (adev->num_regs) {
> > +   if (coredump->adev->num_regs) {
> > drm_printf(&p, "AMDGPU register 

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2023-06-08 Thread Lucas Stach
Hi all,

and almost 2 years later I stumbled across this exact issue still being
present in the scheduler: if the driver bails out of the timeout
handling before calling drm_sched_stop(), the timeout job will be
leaked and the TDR timer will potentially not be restarted as the job
isn't put back in the pending_list.

How do we solve this? Apply the below suggestion?
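
To illustrate, a hedged sketch of the problematic driver-side pattern
(the foo_* helpers are hypothetical; the point is the early return
without the drm_sched_stop()/drm_sched_start() pair):

```
static enum drm_gpu_sched_stat foo_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;

	/*
	 * drm_sched_job_timedout() already did
	 * list_del_init(&sched_job->list) before calling us; only
	 * drm_sched_stop() puts the bad job back on the pending_list.
	 */
	if (!foo_reset_is_needed(sched_job))	/* hypothetical check */
		return DRM_GPU_SCHED_STAT_NOMINAL; /* job leaked, TDR timer not rearmed */

	drm_sched_stop(sched, sched_job);	/* re-inserts the bad job */
	foo_gpu_reset(sched);			/* hypothetical reset */
	drm_sched_start(sched, true);		/* also restarts the timeout */
	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```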

Regards,
Lucas

On Friday, 2021-08-20 at 07:12 +, Liu, Monk wrote:
> [AMD Official Use Only]
> 
> @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
>  
> Do you have any concerns about the kthread_park() approach?
> 
> Theoretically speaking, sched_main should run there exclusively with 
> job_timeout since they both touch jobs, and stopping the scheduler during 
> job_timeout won't impact performance, since in that scenario
> there was already something wrong/stuck on that ring/scheduler 
> 
> Thanks 
> 
> --
> Monk Liu | Cloud-GPU Core team
> --
> 
> -Original Message-
> From: Liu, Monk 
> Sent: Thursday, August 19, 2021 6:26 PM
> To: Daniel Vetter ; Grodzovsky, Andrey 
> 
> Cc: Alex Deucher ; Chen, JingWen 
> ; Maling list - DRI developers 
> ; amd-gfx list 
> ; Koenig, Christian 
> Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
> 
> [AMD Official Use Only]
> 
> Hi Daniel
> 
> > > Why can't we stop the scheduler thread first, so that there's guaranteed 
> > > no race? I've recently had a lot of discussions with panfrost folks about 
> > > their reset that spawns across engines, and without stopping the 
> > > scheduler thread first before you touch anything it's just plain 
> > > impossible.
> 
> Yeah, we had this thought in our minds as well.
> 
> Our second approach is to call kthread_stop() in the job_timedout() routine so 
> that the "bad" job is guaranteed to be used without the scheduler touching or 
> freeing it. Please check this sample patch as well:
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index a2a9536..50a49cb 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
> sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>  
> /* Protects against concurrent deletion in drm_sched_get_cleanup_job 
> */
> +   kthread_park(sched->thread);
> spin_lock(&sched->job_list_lock);
> job = list_first_entry_or_null(&sched->pending_list,
>struct drm_sched_job, list);
>  
> if (job) {
> -   /*
> -* Remove the bad job so it cannot be freed by concurrent
> -* drm_sched_cleanup_jobs. It will be reinserted back after 
> sched->thread
> -* is parked at which point it's safe.
> -*/
> -   list_del_init(&job->list);
> spin_unlock(&sched->job_list_lock);
>  
> status = job->sched->ops->timedout_job(job);
> @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
> } else {
> spin_unlock(&sched->job_list_lock);
> }
> +   kthread_unpark(sched->thread);
>  
> if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> spin_lock(&sched->job_list_lock); @@ -393,20 +389,6 @@ void 
> drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> kthread_park(sched->thread);
>  
> /*
> -* Reinsert back the bad job here - now it's safe as
> -* drm_sched_get_cleanup_job cannot race against us and release the
> -* bad job at this point - we parked (waited for) any in progress
> -* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> -* now until the scheduler thread is unparked.
> -*/
> -   if (bad && bad->sched == sched)
> -   /*
> -* Add at the head of the queue to reflect it was the earliest
> -* job extracted.
> -*/
> -   list_add(&bad->list, &sched->pending_list);
> -
> -   /*
>  * Iterate the job list from later to  earlier one and either deactive
>  * their HW callbacks or remove them from pending list if they already
>  * signaled.
> 
> 
> Thanks 
> 
> --
> Monk Liu | Cloud-GPU Core team
> --
> 
> -Original Message-
> From: Daniel Vetter 
> Sent: Thursday, August 19, 2021 5:31 PM
> To: Grodzovsky, Andrey 
> Cc: Daniel Vetter ; Alex Deucher ; 
> Chen, JingWen ; Maling list - DRI developers 
> ; amd-gfx list 
> ; Liu, Monk ; Koenig, 
> Christian 
> Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
> 
> On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
> > 
> > On 2021-08-18 10:42 a.m., Daniel 

Re: [PATCH 3/6] drm/etnaviv: stop getting the excl fence separately here

2021-11-01 Thread Lucas Stach
On Thursday, 2021-10-28 at 15:26 +0200, Christian König wrote:
> Just grab all fences in one go.
> 
> Signed-off-by: Christian König 

Reviewed-by: Lucas Stach 

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> index 8dc93863bf96..b5e8ce86dbe7 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> @@ -189,7 +189,7 @@ static int submit_fence_sync(struct etnaviv_gem_submit 
> *submit)
>   continue;
>  
>   if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> - ret = dma_resv_get_fences(robj, &bo->excl,
> + ret = dma_resv_get_fences(robj, NULL,
> &bo->nr_shared,
> &bo->shared);
>   if (ret)




Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

2020-12-17 Thread Lucas Stach
Hi Luben,

On Friday, 2020-12-11 at 15:36 -0500, Luben Tuikov wrote:
> On 2020-12-10 4:31 a.m., Lucas Stach wrote:
> > Hi Luben,
> > 
> > On Wednesday, 2020-12-09 at 21:14 -0500, Luben Tuikov wrote:
> > > This patch does not change current behaviour.
> > > 
> > > The driver's job timeout handler now returns
> > > status indicating back to the DRM layer whether
> > > the task (job) was successfully aborted or whether
> > > more time should be given to the task to complete.
> > > 
> > > Default behaviour as of this patch, is preserved,
> > > except in obvious-by-comment case in the Panfrost
> > > driver, as documented below.
> > > 
> > > All drivers which make use of the
> > > drm_sched_backend_ops' .timedout_job() callback
> > > have been accordingly renamed and return the
> > > would've-been default value of
> > > DRM_TASK_STATUS_ALIVE to restart the task's
> > > timeout timer--this is the old behaviour, and
> > > is preserved by this patch.
> > > 
> > > In the case of the Panfrost driver, its timedout
> > > callback correctly first checks if the job had
> > > completed in due time and if so, it now returns
> > > DRM_TASK_STATUS_COMPLETE to notify the DRM layer
> > > that the task can be moved to the done list, to be
> > > freed later. In the other two subsequent checks,
> > > the value of DRM_TASK_STATUS_ALIVE is returned, as
> > > per the default behaviour.
> > > 
> > > A more involved driver's solutions can be had
> > > in subsequent patches.
> > > 
> > > v2: Use enum as the status of a driver's job
> > > timeout callback method.
> > > 
> > > Cc: Alexander Deucher 
> > > Cc: Andrey Grodzovsky 
> > > Cc: Christian König 
> > > Cc: Daniel Vetter 
> > > Cc: Lucas Stach 
> > > Cc: Russell King 
> > > Cc: Christian Gmeiner 
> > > Cc: Qiang Yu 
> > > Cc: Rob Herring 
> > > Cc: Tomeu Vizoso 
> > > Cc: Steven Price 
> > > Cc: Alyssa Rosenzweig 
> > > Cc: Eric Anholt 
> > > Reported-by: kernel test robot 
> > > Signed-off-by: Luben Tuikov 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 +++--
> > >  drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
> > >  drivers/gpu/drm/lima/lima_sched.c   |  4 +++-
> > >  drivers/gpu/drm/panfrost/panfrost_job.c |  9 ---
> > >  drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
> > >  drivers/gpu/drm/v3d/v3d_sched.c | 32 +
> > >  include/drm/gpu_scheduler.h | 20 +---
> > >  7 files changed, 57 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > index ff48101bab55..a111326cbdde 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > @@ -28,7 +28,7 @@
> > >  #include "amdgpu.h"
> > >  #include "amdgpu_trace.h"
> > >  
> > > 
> > > 
> > > 
> > > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > +static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job 
> > > *s_job)
> > >  {
> > >   struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > >   struct amdgpu_job *job = to_amdgpu_job(s_job);
> > > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > > *s_job)
> > >   amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
> > > {
> > >   DRM_ERROR("ring %s timeout, but soft recovered\n",
> > >     s_job->sched->name);
> > > - return;
> > > + return DRM_TASK_STATUS_ALIVE;
> > >   }
> > >  
> > > 
> > > 
> > > 
> > >   amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > > *s_job)
> > >  
> > > 
> > > 
> > > 
> > >   if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > >   amdgpu_device_gpu_recover(ring->adev, job);
> > > + return DRM_TASK_STATUS_ALIVE;
> > >   } else {
> > >   drm_sched_suspend_timeout(&ring->sched);
> > >   

Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

2020-12-10 Thread Lucas Stach
Hi Luben,

On Wednesday, 2020-12-09 at 21:14 -0500, Luben Tuikov wrote:
> This patch does not change current behaviour.
> 
> The driver's job timeout handler now returns
> status indicating back to the DRM layer whether
> the task (job) was successfully aborted or whether
> more time should be given to the task to complete.
> 
> Default behaviour as of this patch, is preserved,
> except in obvious-by-comment case in the Panfrost
> driver, as documented below.
> 
> All drivers which make use of the
> drm_sched_backend_ops' .timedout_job() callback
> have been accordingly renamed and return the
> would've-been default value of
> DRM_TASK_STATUS_ALIVE to restart the task's
> timeout timer--this is the old behaviour, and
> is preserved by this patch.
> 
> In the case of the Panfrost driver, its timedout
> callback correctly first checks if the job had
> completed in due time and if so, it now returns
> DRM_TASK_STATUS_COMPLETE to notify the DRM layer
> that the task can be moved to the done list, to be
> freed later. In the other two subsequent checks,
> the value of DRM_TASK_STATUS_ALIVE is returned, as
> per the default behaviour.
> 
> A more involved driver's solutions can be had
> in subsequent patches.
> 
> v2: Use enum as the status of a driver's job
> timeout callback method.
> 
> Cc: Alexander Deucher 
> Cc: Andrey Grodzovsky 
> Cc: Christian König 
> Cc: Daniel Vetter 
> Cc: Lucas Stach 
> Cc: Russell King 
> Cc: Christian Gmeiner 
> Cc: Qiang Yu 
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Steven Price 
> Cc: Alyssa Rosenzweig 
> Cc: Eric Anholt 
> Reported-by: kernel test robot 
> Signed-off-by: Luben Tuikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 +++--
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
>  drivers/gpu/drm/lima/lima_sched.c   |  4 +++-
>  drivers/gpu/drm/panfrost/panfrost_job.c |  9 ---
>  drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
>  drivers/gpu/drm/v3d/v3d_sched.c | 32 +
>  include/drm/gpu_scheduler.h | 20 +---
>  7 files changed, 57 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index ff48101bab55..a111326cbdde 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -28,7 +28,7 @@
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
>  
> 
> 
> 
> -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> +static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job *s_job)
>  {
>   struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>   struct amdgpu_job *job = to_amdgpu_job(s_job);
> @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>   amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
> {
>   DRM_ERROR("ring %s timeout, but soft recovered\n",
>     s_job->sched->name);
> - return;
> + return DRM_TASK_STATUS_ALIVE;
>   }
>  
> 
> 
> 
>   amdgpu_vm_get_task_info(ring->adev, job->pasid, );
> @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> *s_job)
>  
> 
> 
> 
>   if (amdgpu_device_should_recover_gpu(ring->adev)) {
>   amdgpu_device_gpu_recover(ring->adev, job);
> + return DRM_TASK_STATUS_ALIVE;
>   } else {
>   drm_sched_suspend_timeout(>sched);
>   if (amdgpu_sriov_vf(adev))
>   adev->virt.tdr_debug = true;
> + return DRM_TASK_STATUS_ALIVE;
>   }
>  }
>  
> 
> 
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index cd46c882269c..c49516942328 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -82,7 +82,8 @@ static struct dma_fence *etnaviv_sched_run_job(struct 
> drm_sched_job *sched_job)
>   return fence;
>  }
>  
> 
> 
> 
> -static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
> +static enum drm_task_status etnaviv_sched_timedout_job(struct drm_sched_job
> +*sched_job)
>  {
>   struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
>   struct etnaviv_gpu *gpu = submit->gpu;
> @@ -120,9 +121,16 @@ static void etnaviv_sched_timedout_job(struct 
> drm_sched_job *sched_job)
>  
>   drm_sched_resubmit_jobs(>sched);

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
On Wednesday, 2020-11-25 at 11:22 +, Steven Price wrote:
> On 25/11/2020 11:15, Lucas Stach wrote:
> > On Wednesday, 2020-11-25 at 11:04 +, Steven Price wrote:
> > > On 25/11/2020 03:17, Luben Tuikov wrote:
> > > > The job timeout handler now returns status
> > > > indicating back to the DRM layer whether the job
> > > > was successfully cancelled or whether more time
> > > > should be given to the job to complete.
> > > 
> > > I'm not sure I understand in what circumstances you would want to give
> > > the job more time to complete. Could you expand on that?
> > 
> > On etnaviv we don't have the ability to preempt a running job, but we
> > can look at the GPU state to determine if it's still making progress
> > with the current job, so we want to extend the timeout in that case to
> > not kill a long running but valid job.
> 
> Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
> "get stuck" it's just that their run time can be excessive[1] causing 
> other processes to not make forward progress. So I'd expect the timeout 
> to be set based on how long a job can run before you need to stop it to 
> allow other processes to run their jobs.

Yea, we might want to kill the job eventually, but people tend to get
very angry if their use-case gets broken just because the userspace
driver manages to put enough blits in one job to run over the 500ms
timeout we allow for a job and the kernel then just hard-kills the job.

In an ideal world we would just preempt the job and allow something
else to run for a while, but without proper preemption support in HW
that's not an option right now.

> But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
> thing there.

It happens from time to time when our understanding of the HW isn't
complete and the userspace driver manages to create command streams
with missing semaphores between HW engines. ;)

Regards,
Lucas

> Thanks,
> 
> Steve
> 
> [1] Also on Mali it's quite possible to create an infinite duration job 
> which appears to be making forward progress, so in that case our measure 
> of progress isn't useful against these malicious jobs.
> 
> > Regards,
> > Lucas
> > 
> > > One thing we're missing at the moment in Panfrost is the ability to
> > > suspend ("soft stop" is the Mali jargon) a job and pick something else
> > > to run. The proprietary driver stack uses this to avoid timing out long
> > > running jobs while still allowing other processes to have time on the
> > > GPU. But this interface as it stands doesn't seem to provide that.
> > > 
> > > As the kernel test robot has already pointed out - you'll need to at the
> > > very least update the other uses of this interface.
> > > 
> > > Steve
> > > 
> > > > Signed-off-by: Luben Tuikov 
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> > > >include/drm/gpu_scheduler.h | 13 ++---
> > > >2 files changed, 14 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > index ff48101bab55..81b73790ecc6 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > @@ -28,7 +28,7 @@
> > > >#include "amdgpu.h"
> > > >#include "amdgpu_trace.h"
> > > >
> > > > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > >{
> > > > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > > > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > > > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > > > *s_job)
> > > > amdgpu_ring_soft_recovery(ring, job->vmid, 
> > > > s_job->s_fence->parent)) {
> > > > DRM_ERROR("ring %s timeout, but soft recovered\n",
> > > >   s_job->sched->name);
> > > > -   return;
> > > > +   return 0;
> > > > }
> > > >
> > > > amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > > > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct 
> > > > drm_sched_job

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
On Wednesday, 2020-11-25 at 11:04 +, Steven Price wrote:
> On 25/11/2020 03:17, Luben Tuikov wrote:
> > The job timeout handler now returns status
> > indicating back to the DRM layer whether the job
> > was successfully cancelled or whether more time
> > should be given to the job to complete.
> 
> I'm not sure I understand in what circumstances you would want to give 
> the job more time to complete. Could you expand on that?

On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.
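
As a sketch of how that can look with a status-returning timeout
handler (hedged: the foo_* names are hypothetical, and the progress
heuristic is only loosely modeled on etnaviv's FE-address hangcheck):

```
static enum drm_task_status foo_sched_timedout_job(struct drm_sched_job *sched_job)
{
	struct foo_gpu *gpu = foo_job_to_gpu(sched_job);	/* hypothetical */
	u32 dma_addr = foo_gpu_read_fe_addr(gpu);		/* hypothetical */

	/*
	 * The front end moved since the last timeout: the job is
	 * long-running but alive, so ask for more time instead of
	 * killing it.
	 */
	if (dma_addr != gpu->hangcheck_dma_addr) {
		gpu->hangcheck_dma_addr = dma_addr;
		return DRM_TASK_STATUS_ALIVE;
	}

	/* No forward progress: treat the job as hung and recover. */
	foo_gpu_recover(gpu);					/* hypothetical */
	return DRM_TASK_STATUS_ALIVE;	/* restart the timer, per the patch's default */
}
```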

Regards,
Lucas

> One thing we're missing at the moment in Panfrost is the ability to 
> suspend ("soft stop" is the Mali jargon) a job and pick something else 
> to run. The proprietary driver stack uses this to avoid timing out long 
> running jobs while still allowing other processes to have time on the 
> GPU. But this interface as it stands doesn't seem to provide that.
> 
> As the kernel test robot has already pointed out - you'll need to at the 
> very least update the other uses of this interface.
> 
> Steve
> 
> > Signed-off-by: Luben Tuikov 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> >   include/drm/gpu_scheduler.h | 13 ++---
> >   2 files changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index ff48101bab55..81b73790ecc6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -28,7 +28,7 @@
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   
> > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   {
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> > amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
> > {
> > DRM_ERROR("ring %s timeout, but soft recovered\n",
> >   s_job->sched->name);
> > -   return;
> > +   return 0;
> > }
> >   
> > amdgpu_vm_get_task_info(ring->adev, job->pasid, );
> > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> >   
> > if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > amdgpu_device_gpu_recover(ring->adev, job);
> > +   return 0;
> > } else {
> > drm_sched_suspend_timeout(&ring->sched);
> > if (amdgpu_sriov_vf(adev))
> > adev->virt.tdr_debug = true;
> > +   return 1;
> > }
> >   }
> >   
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 2e0c368e19f6..61f7121e1c19 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> > struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> > - * @timedout_job: Called when a job has taken too long to execute,
> > - * to trigger GPU recovery.
> > +* @timedout_job: Called when a job has taken too long to execute,
> > +* to trigger GPU recovery.
> > +*
> > +* Return 0, if the job has been aborted successfully and will
> > +* never be heard of from the device. Return non-zero if the
> > +* job wasn't able to be aborted, i.e. if more time should be
> > +* given to this job. The result is not "bool" as this
> > +* function is not a predicate, although its result may seem
> > +* as one.
> >  */
> > -   void (*timedout_job)(struct drm_sched_job *sched_job);
> > +   int (*timedout_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> >* @free_job: Called once the job's finished fence has been 
> > signaled
> > 

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3 03/22] drm/etnaviv: Introduce GEM object functions

2020-09-23 Thread Lucas Stach
On Mi, 2020-09-23 at 12:21 +0200, Thomas Zimmermann wrote:
> GEM object functions deprecate several similar callback interfaces in
> struct drm_driver. This patch replaces the per-driver callbacks with
> per-instance callbacks in etnaviv. The only exception is gem_prime_mmap,
> which is non-trivial to convert.
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Daniel Vetter 

Acked-by: Lucas Stach 
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 13 -
>  drivers/gpu/drm/etnaviv/etnaviv_drv.h |  1 -
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c | 19 ++-
>  3 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index a9a3afaef9a1..aa270b79e585 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -468,12 +468,6 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = {
>   ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW),
>  };
>  
> -static const struct vm_operations_struct vm_ops = {
> - .fault = etnaviv_gem_fault,
> - .open = drm_gem_vm_open,
> - .close = drm_gem_vm_close,
> -};
> -
>  static const struct file_operations fops = {
>   .owner  = THIS_MODULE,
>   .open   = drm_open,
> @@ -490,16 +484,9 @@ static struct drm_driver etnaviv_drm_driver = {
>   .driver_features= DRIVER_GEM | DRIVER_RENDER,
>   .open   = etnaviv_open,
>   .postclose   = etnaviv_postclose,
> - .gem_free_object_unlocked = etnaviv_gem_free_object,
> - .gem_vm_ops = &vm_ops,
>   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> - .gem_prime_pin  = etnaviv_gem_prime_pin,
> - .gem_prime_unpin= etnaviv_gem_prime_unpin,
> - .gem_prime_get_sg_table = etnaviv_gem_prime_get_sg_table,
>   .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table,
> - .gem_prime_vmap = etnaviv_gem_prime_vmap,
> - .gem_prime_vunmap   = etnaviv_gem_prime_vunmap,
>   .gem_prime_mmap = etnaviv_gem_prime_mmap,
>  #ifdef CONFIG_DEBUG_FS
>   .debugfs_init   = etnaviv_debugfs_init,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h 
> b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> index 4d8dc9236e5f..914f0867ff71 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> @@ -49,7 +49,6 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void 
> *data,
>   struct drm_file *file);
>  
>  int etnaviv_gem_mmap(struct file *filp, struct vm_area_struct *vma);
> -vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf);
>  int etnaviv_gem_mmap_offset(struct drm_gem_object *obj, u64 *offset);
>  struct sg_table *etnaviv_gem_prime_get_sg_table(struct drm_gem_object *obj);
>  void *etnaviv_gem_prime_vmap(struct drm_gem_object *obj);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index d1533bdc1335..67d9a2b9ea6a 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -171,7 +171,7 @@ int etnaviv_gem_mmap(struct file *filp, struct 
> vm_area_struct *vma)
>   return obj->ops->mmap(obj, vma);
>  }
>  
> -vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf)
> +static vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf)
>  {
>   struct vm_area_struct *vma = vmf->vma;
>   struct drm_gem_object *obj = vma->vm_private_data;
> @@ -559,6 +559,22 @@ void etnaviv_gem_obj_add(struct drm_device *dev, struct 
> drm_gem_object *obj)
>   mutex_unlock(>gem_lock);
>  }
>  
> +static const struct vm_operations_struct vm_ops = {
> + .fault = etnaviv_gem_fault,
> + .open = drm_gem_vm_open,
> + .close = drm_gem_vm_close,
> +};
> +
> +static const struct drm_gem_object_funcs etnaviv_gem_object_funcs = {
> + .free = etnaviv_gem_free_object,
> + .pin = etnaviv_gem_prime_pin,
> + .unpin = etnaviv_gem_prime_unpin,
> + .get_sg_table = etnaviv_gem_prime_get_sg_table,
> + .vmap = etnaviv_gem_prime_vmap,
> + .vunmap = etnaviv_gem_prime_vunmap,
> + .vm_ops = &vm_ops,
> +};
> +
>  static int etnaviv_gem_new_impl(struct drm_device *dev, u32 size, u32 flags,
>   const struct etnaviv_gem_ops *ops, struct drm_gem_object **obj)
>  {
> @@ -593,6 +609,7 @@ static int etnaviv_gem_new_impl(struct drm_device *dev, 
> u32 size, u32 flags,
>   INIT_LIST_HEAD(&etnaviv_obj->vram_list);
>  
>   *obj = &etnaviv_obj->base;
> + (*obj)->funcs = &etnaviv_gem_object_funcs;
>  
>   return 0;
>  }



Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-07-21 Thread Lucas Stach
Hi Andrey,

Am Mittwoch, den 12.02.2020, 11:33 -0500 schrieb Andrey Grodzovsky:
> On 2/11/20 7:53 PM, Luben Tuikov wrote:
> > On 2020-02-11 4:27 p.m., Andrey Grodzovsky wrote:
> > > On 2/11/20 10:55 AM, Andrey Grodzovsky wrote:
> > > > On 2/10/20 4:50 PM, Luben Tuikov wrote:
> > > > > Hi Lucas,
> > > > > 
> > > > > Thank you for bringing awareness of this issue, publicly.
> > > > > 
> > > > > As soon as this patch showed up back in November of 2019,
> > > > > I objected to it, privately.
> > > > 
> > > > I didn't find this objection in my mail actually
> > Yes, I didn't send it to you.
> > 
> > > > > I suggested to instead use a _list_ to store the "state" of
> > > > > all jobs of the same state. Then, at any time, timeout interrupt
> > > > > or whatever, we can atomically (irq spinlock) move the timeout/bad
> > > > > job to the timedout/cleanup/bad job list, and wake someone up
> > > > > to deal with that list asynchronously, and return from the
> > > > > interrupt/etc.
> > > > > immediately.
> > > > 
> > > > Sounds like a good idea to me. I think it's enough for us to have 2
> > > > lists: a timeout list for jobs scheduled to HW and not yet completed,
> > > > and a cleanup list for those that did complete (completion fence
> > > > signaled). This should give an alternative solution to the race
> > > > condition this patch was addressing without causing the breakage Lucas
> > > > reported. If no one objects I think I can try to implement it.
> > > > 
> > > > Andrey
> > > 
> > > Thinking more, I realize Luben is right about also having a bad job list,
> > > as this is needed for normal job completion (by fence callback from
> > > amdgpu_fence_process), and you need to decide whether to move a job from
> > > the timeout list to the cleanup list or not. If it's already in the bad
> > > job list - meaning it's being processed by the GPU recovery code - you
> > > don't touch it; otherwise you move it to the cleanup list, where it will
> > > eventually be freed by invocation of drm_sched_get_cleanup_job.
> > Yep...
> > 
> > Perhaps fewer lists than "timeout", "bad" and "cleanup" could be had.
> > I'd also name the "bad" list as "recovery" list, as that is what would
> > be done to commands on that list.
> > 
> > "Timeout" is a status "timed-out", so perhaps just set the timeout
> > flag and move it to a "done" list. (Note that the command can still
> > complete asynchronously while on that list and while it has status
> > "timed-out".)
> > 
> > The idea is that,
> > 1) it avoid contention and races when more than one context
> > can update the job at the same time, and
> > 2) easy to process all jobs of a certain state and/or
> > move them around, etc.
> > 
> > Let's discuss it and come up with a plan. :-)
> > 
> > Regards,
> > Luben
> 
> Sure, let me maybe come up with a draft patch so we have more concrete 
> stuff to discuss and review.

It seems we all dropped the ball on this one. I believe this is still
an open issue. Has there been any progress from your side on fixing
this?

Regards,
Lucas



Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Lucas Stach
Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
> Am 09.04.20 um 03:31 schrieb xinhui pan:
> > The delayed delete list is per device which might be very huge. And in
> > a heavy workload test, the list might never be empty. That can
> > trigger RCU stall warnings or soft lockups in non-preemptible kernels.
> > Let's schedule out if possible in that case.
> 
> Mhm, I'm not sure if that is actually allowed. This is called from a 
> work item and those are not really supposed to be scheduled away.

Huh? Workitems can schedule out just fine, otherwise they would be
horribly broken when it comes to sleeping locks. The workqueue code
even has measures to keep the workqueues at the expected concurrency
level by starting other workitems when one of them goes to sleep.

Regards,
Lucas

> Maybe rather change the while into while (!list_empty(&bdev->ddestroy) 
> && !should_reschedule(0)).
> 
> Christian.
> 
> > Signed-off-by: xinhui pan 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 9e07c3f75156..b8d853cab33b 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
> > *bdev, bool remove_all)
> > }
> >   
> > ttm_bo_put(bo);
> > +   cond_resched();
> > spin_lock(>lru_lock);
> > }
> > list_splice_tail(, >ddestroy);
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



latencies and code inefficiencies in amdgpu display handling

2020-03-24 Thread Lucas Stach
Hi guys,

recently I've been tracing some IRQ latencies in a system and the
display handling in amdgpu doesn't really look that good. To be honest
it also doesn't look too bad, but I still want to share my findings
here. The trace below is from a single vblank IRQ with a pageflip.

The most interesting points from my perspective:

1. While this is a single CRTC vblank IRQ, the handler calls
amdgpu_get_vblank_counter_kms 10(!) times. This isn't really a cheap
function as it also reads the current scanout position and thus makes
multiple MMIO device reads.
This seems like low-hanging fruit for optimization, as querying the
same thing this many times in a single IRQ invocation seems like total
overkill.

2. In this particular trace one of the scanout position reads seems to
block. The trace indicates that almost 300us are spent in this single
device read. Is this a known issue?

3. There are quite a few spinlocks being locked with spin_lock_irqsave,
while this handler code is never called from non-IRQ context, so all
those calls could use the cheaper spin_lock. This is a micro
optimization, but it caught my eye when looking over the trace/code.

Regards,
Lucas


# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 5.6.0-rc7+
# 
# latency: 417 us, #446/446, CPU#6 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:16)
#-
#| task: user-task-0 (uid:1000 nice:0 policy:0 rt_prio:0)
#-
#  => started at: interrupt_entry
#  => ended at:   swapgs_restore_regs_and_return_to_usermode
#
#
#  _--=> CPU#
# / _-=> irqs-off
#| / _=> need-resched
#|| / _---=> hardirq/softirq 
#||| / _--=> preempt-depth   
# / delay
#  cmd pid   | time  |   caller  
# \   /  |  \|   / 
  user-task-06d...0us : trace_hardirqs_off_thunk <-interrupt_entry
  user-task-06d...0us : do_IRQ <-ret_from_intr
  user-task-06d...0us : irq_enter <-do_IRQ
  user-task-06d...0us : rcu_irq_enter <-irq_enter
  user-task-06d...0us : irqtime_account_irq <-irq_enter
  user-task-06d.h.1us : handle_edge_irq <-do_IRQ
  user-task-06d.h.1us : _raw_spin_lock <-handle_edge_irq
  user-task-06d.h.1us : irq_may_run <-handle_edge_irq
  user-task-06d.h.1us : irq_chip_ack_parent <-handle_edge_irq
  user-task-06d.h.1us : apic_ack_irq <-handle_edge_irq
  user-task-06d.h.1us : handle_irq_event <-handle_edge_irq
  user-task-06d.h.1us : handle_irq_event_percpu <-handle_irq_event
  user-task-06d.h.1us : __handle_irq_event_percpu 
<-handle_irq_event_percpu
  user-task-06d.h.1us : amdgpu_irq_handler <-__handle_irq_event_percpu
  user-task-06d.h.2us : amdgpu_ih_process <-amdgpu_irq_handler
  user-task-06d.h.2us : tonga_ih_get_wptr <-amdgpu_ih_process
  user-task-06d.h.2us : __drm_dbg <-amdgpu_ih_process
  user-task-06d.h.2us : amdgpu_irq_dispatch <-amdgpu_ih_process
  user-task-06d.h.2us : tonga_ih_decode_iv <-amdgpu_irq_dispatch
  user-task-06d.h.3us : amdgpu_dm_irq_handler <-amdgpu_irq_dispatch
  user-task-06d.h.3us : dc_interrupt_to_irq_source 
<-amdgpu_dm_irq_handler
  user-task-06d.h.3us : dal_irq_service_to_irq_source 
<-amdgpu_dm_irq_handler
  user-task-06d.h.4us : to_dal_irq_source_dce110 <-amdgpu_dm_irq_handler
  user-task-06d.h.4us : dc_interrupt_ack <-amdgpu_dm_irq_handler
  user-task-06d.h.5us : dal_irq_service_ack <-amdgpu_dm_irq_handler
  user-task-06d.h.5us : dal_irq_service_ack_generic 
<-dal_irq_service_ack
  user-task-06d.h.5us : dm_read_reg_func <-dal_irq_service_ack_generic
  user-task-06d.h.5us : amdgpu_cgs_read_register <-dm_read_reg_func
  user-task-06d.h.6us : amdgpu_mm_rreg <-dm_read_reg_func
  user-task-06d.h.7us : amdgpu_cgs_write_register 
<-dal_irq_service_ack_generic
  user-task-06d.h.7us : amdgpu_mm_wreg <-dal_irq_service_ack_generic
  user-task-06d.h.7us : _raw_spin_lock_irqsave <-amdgpu_dm_irq_handler
  user-task-06d.h.8us : dm_crtc_high_irq <-amdgpu_dm_irq_handler
  user-task-06d.h.8us : get_crtc_by_otg_inst.isra.0 <-dm_crtc_high_irq
  user-task-06d.h.8us : __drm_dbg <-dm_crtc_high_irq
  user-task-06d.h.8us : drm_crtc_handle_vblank <-dm_crtc_high_irq
  user-task-06d.h.9us : drm_handle_vblank <-dm_crtc_high_irq
  user-task-06d.h.9us : _raw_spin_lock_irqsave <-drm_handle_vblank
  user-task-06d.h.9us : _raw_spin_lock <-drm_handle_vblank
  user-task-06d.h.9us : drm_update_vblank_count <-drm_handle_vblank
  user-task-06d.h.9us : __get_vblank_counter <-drm_update_vblank_count
  user-task-06d.h.   10us : drm_crtc_from_index <-__get_vblank_counter
  

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lucas Stach
On Fr, 2020-02-28 at 10:47 +0100, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and it's
> > > > breaking our bank account. With reasonable estimates for continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD, which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org does no longer sponsor any CI runners themselves,
> > > > we've stopped that. The huge additional expenses are all just in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May and
> > > > June
> > > > this year. The board is of course working on acquiring sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > 
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit. I'm
> > talking purely from a Mesa point of view here, so it might not solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g the i965 driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> > 
> > 2. A lot of us are working for a company that can probably pay for
> > their own needs in terms of CI. Perhaps moving some costs "up front" to
> > the company that needs it can make the future of CI for those who can't
> > do this
> > 
> > 3. I think we need a much more detailed break-down of the cost to make
> > educated changes. For instance, how expensive is Docker image
> > uploads/downloads (e.g intermediary artifacts) compared to build logs
> > and final test-results? What kind of artifacts?
> 
> We have logs somewhere, but no one yet got around to analyzing that.
> Which will be quite a bit of work to do since the cloud storage is
> totally disconnected from the gitlab front-end, making the connection
> to which project or CI job caused something is going to require
> scripting. Volunteers definitely very much welcome I think.

It's very surprising to me that this kind of cost monitoring is treated
as an afterthought, especially since one of the main jobs of the X.Org
board is to keep spending under control and transparent.

Also from all the conversations it's still unclear to me if the Google
hosting costs are already over the sponsored credits (so is burning a
hole into X.org bank account right now) or if this is only going to
happen at a later point in time.

Even with CI disabled it seems that the board estimates a cost of 30k
annually for the plain gitlab hosting. Is this covered by the credits
sponsored by Google? If not, why wasn't there a board vote on this
spending? All other spending seem to require pre-approval by the board.
Why wasn't gitlab hosting cost discussed much earlier in the public
board meetings, especially if it's going to be such a big chunk of the
overall spending of the X.Org foundation?

Regards,
Lucas



Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Lucas Stach
Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> Hi Andrey,
> 
> This commit breaks all drivers, which may bail out of the timeout
> processing as they wish to extend the timeout (etnaviv, v3d).
> 
> Those drivers currently just return from the timeout handler before
> calling drm_sched_stop(), which means with this commit applied we are
> removing the first job from the ring_mirror_list, but never put it
> back. This leads to jobs getting lost from the ring mirror, which then
> causes quite a bit of fallout like unsignaled fences.
> 
> Not sure yet what to do about it, we can either add a function to add
> the job back to the ring_mirror if the driver wants to extend the
> timeout, or we could look for another way to stop
> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.

It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas

> Regards,
> Lucas
> 
> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> > Problem:
> > Due to a race between drm_sched_cleanup_jobs in sched thread and
> > drm_sched_job_timedout in timeout work there is a possibility that
> > the bad job was already freed while still being accessed from the
> > timeout thread.
> > 
> > Fix:
> > Instead of just peeking at the bad job in the mirror list
> > remove it from the list under lock and then put it back later, when
> > we are guaranteed no race with the main sched thread is possible, which
> > is after the thread is parked.
> > 
> > v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> > 
> > v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> > drm_sched_get_cleanup_job already has a lock there.
> > 
> > v4: Fix comments to reflect latest code in drm-misc.
> > 
> > Signed-off-by: Andrey Grodzovsky 
> > Reviewed-by: Christian König 
> > Tested-by: Emily Deng 
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 27 +++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6774955..1bf9c40 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> > unsigned long flags;
> >  
> > sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> > +
> > +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> > > +   spin_lock_irqsave(&sched->job_list_lock, flags);
> > > job = list_first_entry_or_null(&sched->ring_mirror_list,
> >struct drm_sched_job, node);
> >  
> > if (job) {
> > +   /*
> > +* Remove the bad job so it cannot be freed by concurrent
> > +* drm_sched_cleanup_jobs. It will be reinserted back after 
> > sched->thread
> > +* is parked at which point it's safe.
> > +*/
> > > +   list_del_init(&job->node);
> > > +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> > +
> > job->sched->ops->timedout_job(job);
> >  
> > /*
> > @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> > job->sched->ops->free_job(job);
> > sched->free_guilty = false;
> > }
> > +   } else {
> > > +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> > }
> >  
> > > spin_lock_irqsave(&sched->job_list_lock, flags);
> > @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> > struct drm_sched_job *bad)
> > kthread_park(sched->thread);
> >  

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-05 Thread Lucas Stach
Hi Andrey,

This commit breaks all drivers, which may bail out of the timeout
processing as they wish to extend the timeout (etnaviv, v3d).

Those drivers currently just return from the timeout handler before
calling drm_sched_stop(), which means with this commit applied we are
removing the first job from the ring_mirror_list, but never put it
back. This leads to jobs getting lost from the ring mirror, which then
causes quite a bit of fallout like unsignaled fences.

Not sure yet what to do about it, we can either add a function to add
the job back to the ring_mirror if the driver wants to extend the
timeout, or we could look for another way to stop
drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
processing.

Regards,
Lucas

On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> Problem:
> Due to a race between drm_sched_cleanup_jobs in sched thread and
> drm_sched_job_timedout in timeout work there is a possiblity that
> bad job was already freed while still being accessed from the
> timeout thread.
> 
> Fix:
> Instead of just peeking at the bad job in the mirror list
> remove it from the list under lock and then put it back later, when
> we are guaranteed no race with the main sched thread is possible, which
> is after the thread is parked.
> 
> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> 
> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> drm_sched_get_cleanup_job already has a lock there.
> 
> v4: Fix comments to reflect latest code in drm-misc.
> 
> Signed-off-by: Andrey Grodzovsky 
> Reviewed-by: Christian König 
> Tested-by: Emily Deng 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 6774955..1bf9c40 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
>   unsigned long flags;
>  
>   sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> +
> + /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> + spin_lock_irqsave(&sched->job_list_lock, flags);
>   job = list_first_entry_or_null(&sched->ring_mirror_list,
>  struct drm_sched_job, node);
>  
>   if (job) {
> + /*
> +  * Remove the bad job so it cannot be freed by concurrent
> +  * drm_sched_cleanup_jobs. It will be reinserted back after 
> sched->thread
> +  * is parked at which point it's safe.
> +  */
> + list_del_init(&job->node);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
>   job->sched->ops->timedout_job(job);
>  
>   /*
> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
>   job->sched->ops->free_job(job);
>   sched->free_guilty = false;
>   }
> + } else {
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   }
>  
>   spin_lock_irqsave(&sched->job_list_lock, flags);
> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> struct drm_sched_job *bad)
>   kthread_park(sched->thread);
>  
>   /*
> +  * Reinsert back the bad job here - now it's safe as
> +  * drm_sched_get_cleanup_job cannot race against us and release the
> +  * bad job at this point - we parked (waited for) any in progress
> +  * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> +  * now until the scheduler thread is unparked.
> +  */
> + if (bad && bad->sched == sched)
> + /*
> +  * Add at the head of the queue to reflect it was the earliest
> +  * job extracted.
> +  */
> + list_add(&bad->node, &sched->ring_mirror_list);
> +
> + /*
>* Iterate the job list from later to  earlier one and either deactive
>* their HW callbacks or remove them from mirror list if they already
>* signaled.



Re: [PATCH] drm/amd/display: Use msleep instead of udelay for 8ms wait

2019-06-25 Thread Lucas Stach
Hi Harry,

Am Dienstag, den 25.06.2019, 10:00 -0400 schrieb Harry Wentland:
> arm32's udelay only allows values up to 2000 microseconds. msleep
> does the trick for us here as there is no problem if this isn't
> microsecond accurate and takes a tad longer.

A "tad" longer in this case means likely double the intended wait.
Please see "SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms)" in
Documentation/timers/timers-howto.txt.

The sleep here should use usleep_range. In general the DC code seems to
use quite a lot of the udelay busy waits. I doubt that many of those
occurrences are in atomic context, so could easily use a sleeping wait.

Digging further this seems to apply across amdgpu, not only DC.

Regards,
Lucas

> Signed-off-by: Harry Wentland 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> index 4c31930f1cdf..f5d02f89b3f9 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> @@ -548,7 +548,7 @@ static void
> read_edp_current_link_settings_on_detect(struct dc_link *link)
>   break;
>   }
>  
> - udelay(8000);
> + msleep(8);
>   }
>  
>   ASSERT(status == DC_OK);


Re: [PATCH v2] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-06 Thread Lucas Stach
Am Montag, den 06.08.2018, 10:12 +0200 schrieb Christian König:
> Am 03.08.2018 um 19:31 schrieb Lucas Stach:
> > Am Montag, den 06.08.2018, 14:57 +0200 schrieb Christian König:
> > > Am 03.08.2018 um 16:29 schrieb Lucas Stach:
> > > > drm_sched_job_finish() is a work item scheduled for each finished job on
> > > > an unbound system workqueue. This means the workers can execute out of 
> > > > order
> > > > with regard to the real hardware job completions.
> > > > 
> > > > If this happens queueing a timeout worker for the first job on the ring
> > > > mirror list is wrong, as this may be a job which has already finished
> > > > executing. Fix this by reorganizing the code to always queue the worker
> > > > for the next job on the list, if this job hasn't finished yet. This is
> > > > robust against a potential reordering of the finish workers.
> > > > 
> > > > Also move out the timeout worker cancelling, so that we don't need to
> > > > take the job list lock twice. As a small optimization list_del is used
> > > > to remove the job from the ring mirror list, as there is no need to
> > > > reinit the list head in the job we are about to free.
> > > > 
> > > > Signed-off-by: Lucas Stach 
> > > > 
> > > > ---
> > > > v2: - properly handle last job in the ring
> > > >   - check correct fence for completion
> > > > ---
> > > >    drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++
> > > >    1 file changed, 10 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> > > > b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > index 44d480768dfe..574875e2c206 100644
> > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct 
> > > > work_struct *work)
> > > >     struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
> > > >        finish_work);
> > > >     struct drm_gpu_scheduler *sched = s_job->sched;
> > > > 
> > > >    
> > > > -   /* remove job from ring_mirror_list */
> > > > -   spin_lock(&sched->job_list_lock);
> > > > -   list_del_init(&s_job->node);
> > > > -   if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > > > -   struct drm_sched_job *next;
> > > > -
> > > > -   spin_unlock(&sched->job_list_lock);
> > > > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> > > >     cancel_delayed_work_sync(&s_job->work_tdr);
> > > 
> > > That is unfortunately still racy here.
> > > 
> > > Between canceling the job and removing it from the list someone could
> > > actually start the timer (in theory) :)
> > > 
> > > Cancel it, remove it from the list and cancel it again.
> > 
> > I don't see how. If we end up in this worker the finished fence of the
> > job is already certainly signaled (as this is what triggers queueing of
> > the worker). So even if some other worker manages to find this job as
> > the next job in the list, the dma_fence_is_signaled check should
> > prevent the timeout worker from getting scheduled again.
> 
> Well that makes sense, but is a bit hard to understand.

I agree, all the possible parallelism and possible re-ordering makes
this seemingly simple part of the scheduler code a bit mind-breaking.
I've added a comment in v3 to capture the line of thought for future
reference.

Regards,
Lucas

> Anyway, please remove the extra "if" check. With that done the patch is
> Reviewed-by: Christian König.
> 
> Thanks,
> Christian.
> 
> 
> > 
> > > BTW: You could completely drop the "if (sched->timeout !=
> > > MAX_SCHEDULE_TIMEOUT)" here cause canceling is harmless as long as the
> > > structure is initialized.
> > 
> > Right.
> > 
> > Regards,
> > Lucas
> > 
> > > Christian.
> > > 

[PATCH v3] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-06 Thread Lucas Stach
drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move out the timeout worker cancelling, so that we don't need to
take the job list lock twice. As a small optimization list_del is used
to remove the job from the ring mirror list, as there is no need to
reinit the list head in the job we are about to free.

Signed-off-by: Lucas Stach 
Reviewed-by: Christian König 
---
v2: - properly handle last job in the ring
- check correct fence for completion
v3: - drop condition before cancel_delayed_work_sync
- add a comment explaining how we avoid the timeout being restarted
  without first removing the job from the ring mirror list
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..fff3716ece6f 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,28 @@ static void drm_sched_job_finish(struct work_struct *work)
 	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
-	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
-		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
+	/*
+	 * Canceling the timeout without removing our job from the ring mirror
+	 * list is safe, as we will only end up in this worker if our job's
+	 * finished fence has been signaled. So even if some other worker
+	 * manages to find this job as the next job in the list, the fence
+	 * signaled check below will prevent the timeout from being restarted.
+	 */
+	cancel_delayed_work_sync(&s_job->work_tdr);
 
-		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
+	spin_lock(&sched->job_list_lock);
+	/* queue TDR for next job */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
+		struct drm_sched_job *next = list_next_entry(s_job, node);
 
-		if (next)
+		if (!dma_fence_is_signaled(&next->s_fence->finished))
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-03 Thread Lucas Stach
Am Montag, den 06.08.2018, 14:57 +0200 schrieb Christian König:
> Am 03.08.2018 um 16:29 schrieb Lucas Stach:
> > drm_sched_job_finish() is a work item scheduled for each finished job on
> > an unbound system workqueue. This means the workers can execute out of order
> > with regard to the real hardware job completions.
> > 
> > If this happens queueing a timeout worker for the first job on the ring
> > mirror list is wrong, as this may be a job which has already finished
> > executing. Fix this by reorganizing the code to always queue the worker
> > for the next job on the list, if this job hasn't finished yet. This is
> > robust against a potential reordering of the finish workers.
> > 
> > Also move out the timeout worker cancelling, so that we don't need to
> > take the job list lock twice. As a small optimization list_del is used
> > to remove the job from the ring mirror list, as there is no need to
> > reinit the list head in the job we are about to free.
> > 
> > > > Signed-off-by: Lucas Stach 
> > ---
> > v2: - properly handle last job in the ring
> >  - check correct fence for completion
> > ---
> >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++
> >   1 file changed, 10 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> > b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > index 44d480768dfe..574875e2c206 100644
> > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct 
> > *work)
> >     struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
> >        finish_work);
> >     struct drm_gpu_scheduler *sched = s_job->sched;
> >   
> > -   /* remove job from ring_mirror_list */
> > -   spin_lock(&sched->job_list_lock);
> > -   list_del_init(&s_job->node);
> > -   if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > -   struct drm_sched_job *next;
> > -
> > -   spin_unlock(&sched->job_list_lock);
> > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> >     cancel_delayed_work_sync(&s_job->work_tdr);
> 
> That is unfortunately still racy here.
> 
> Between canceling the job and removing it from the list someone could 
> actually start the timer (in theory) :)
> 
> Cancel it, remove it from the list and cancel it again.

I don't see how. If we end up in this worker the finished fence of the
job is already certainly signaled (as this is what triggers queueing of
the worker). So even if some other worker manages to find this job as
the next job in the list, the dma_fence_is_signaled check should
prevent the timeout worker from getting scheduled again.

> BTW: You could completely drop the "if (sched->timeout != 
> MAX_SCHEDULE_TIMEOUT)" here cause canceling is harmless as long as the 
> structure is initialized.

Right.

Regards,
Lucas

> Christian.
> 
> > -   spin_lock(&sched->job_list_lock);
> > 
> > -   /* queue TDR for next job */
> > -   next = list_first_entry_or_null(&sched->ring_mirror_list,
> > -   struct drm_sched_job, node);
> > +   spin_lock(&sched->job_list_lock);
> > +   /* queue TDR for next job */
> > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > +   !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
> > +   struct drm_sched_job *next = list_next_entry(s_job, node);
> > 
> > -   if (next)
> > +   if (!dma_fence_is_signaled(&next->s_fence->finished))
> >     schedule_delayed_work(&next->work_tdr, sched->timeout);
> >     }
> > +   /* remove job from ring_mirror_list */
> > +   list_del(&s_job->node);
> >     spin_unlock(&sched->job_list_lock);
> > +
> >     dma_fence_put(&s_job->s_fence->finished);
> >     sched->ops->free_job(s_job);
> >   }
> 
> 


[PATCH v2] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-03 Thread Lucas Stach
drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move out the timeout worker cancelling, so that we don't need to
take the job list lock twice. As a small optimization list_del is used
to remove the job from the ring mirror list, as there is no need to
reinit the list head in the job we are about to free.

Signed-off-by: Lucas Stach 
---
v2: - properly handle last job in the ring
- check correct fence for completion
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..574875e2c206 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
 	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
-	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
 		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
 
-		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
+	spin_lock(&sched->job_list_lock);
+	/* queue TDR for next job */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
+		struct drm_sched_job *next = list_next_entry(s_job, node);
 
-		if (next)
+		if (!dma_fence_is_signaled(&next->s_fence->finished))
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }
-- 
2.18.0



Re: [PATCH] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-03 Thread Lucas Stach
Hi Christian,

Am Freitag, den 03.08.2018, 15:51 +0200 schrieb Christian König:
> Hi Lucas,
> 
> thanks a lot for taking care of that, but there is one thing you have 
> missed:
> 
> It is perfectly possible that the job is the last one on the list and 
> list_next_entry() doesn't test for that, i.e. it never returns NULL.

Urgh, right. For some reason I was thinking I would get the same
sched_job in that case again, thus the "next != s_job" in the condition
below. But of course the next pointer points to the list head in the
scheduler. Must be the temperature in here causing bitflips in my
brain. ;)

Updated patch coming up in a few minutes.

> BTW: There are also quite a lot of other things we could optimize here. 
> So if you need something todo, just ping me :)

Hehe, I think we are all in the same boat here: too much stuff on the
plate.

Regards,
Lucas

> Cheers,
> Christian.
> 
> Am 03.08.2018 um 15:18 schrieb Lucas Stach:
> > drm_sched_job_finish() is a work item scheduled for each finished job on
> > an unbound system workqueue. This means the workers can execute out of order
> > with regard to the real hardware job completions.
> > 
> > If this happens queueing a timeout worker for the first job on the ring
> > mirror list is wrong, as this may be a job which has already finished
> > executing. Fix this by reorganizing the code to always queue the worker
> > for the next job on the list, if this job hasn't finished yet. This is
> > robust against a potential reordering of the finish workers.
> > 
> > Also move out the timeout worker cancelling, so that we don't need to
> > take the job list lock twice. As a small optimization list_del is used
> > to remove the job from the ring mirror list, as there is no need to
> > reinit the list head in the job we are about to free.
> > 
> > > > Signed-off-by: Lucas Stach 
> > ---
> >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +---
> >   1 file changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> > b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > index 44d480768dfe..0be2859d7b80 100644
> > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct 
> > *work)
> >     struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
> >        finish_work);
> >     struct drm_gpu_scheduler *sched = s_job->sched;
> >   
> > -   /* remove job from ring_mirror_list */
> > +   if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> > +   cancel_delayed_work_sync(&s_job->work_tdr);
> > +
> >     spin_lock(&sched->job_list_lock);
> > -   list_del_init(&s_job->node);
> >     if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > -   struct drm_sched_job *next;
> > -
> > -   spin_unlock(&sched->job_list_lock);
> > -   cancel_delayed_work_sync(&s_job->work_tdr);
> > -   spin_lock(&sched->job_list_lock);
> > +   struct drm_sched_job *next = list_next_entry(s_job, node);
> > 
> >     /* queue TDR for next job */
> > -   next = list_first_entry_or_null(&sched->ring_mirror_list,
> > -   struct drm_sched_job, node);
> > -
> > -   if (next)
> > +   if (next != s_job &&
> > +   !dma_fence_is_signaled(&s_job->s_fence->finished))
> >     schedule_delayed_work(&next->work_tdr, sched->timeout);
> >     }
> > +   /* remove job from ring_mirror_list */
> > +   list_del(&s_job->node);
> >     spin_unlock(&sched->job_list_lock);
> > +
> >     dma_fence_put(&s_job->s_fence->finished);
> >     sched->ops->free_job(s_job);
> >   }
> 
> 


[PATCH] drm/scheduler: fix timeout worker setup for out of order job completions

2018-08-03 Thread Lucas Stach
drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move out the timeout worker cancelling, so that we don't need to
take the job list lock twice. As a small optimization list_del is used
to remove the job from the ring mirror list, as there is no need to
reinit the list head in the job we are about to free.

Signed-off-by: Lucas Stach 
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..0be2859d7b80 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
 	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
+		cancel_delayed_work_sync(&s_job->work_tdr);
+
 	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
-		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
+		struct drm_sched_job *next = list_next_entry(s_job, node);
 
 		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
-
-		if (next)
+		if (next != s_job &&
+		    !dma_fence_is_signaled(&s_job->s_fence->finished))
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }
-- 
2.18.0



Re: [PATCH 3/3] drm/v3d: Add a note about locking of v3d_fence_create().

2018-06-08 Thread Lucas Stach
Am Dienstag, den 05.06.2018, 12:03 -0700 schrieb Eric Anholt:
> This isn't the first time I've had to argue to myself why the '++' was
> safe.

And now you need to do the same thing with me...

> Signed-off-by: Eric Anholt 
> ---
>  drivers/gpu/drm/v3d/v3d_fence.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_fence.c b/drivers/gpu/drm/v3d/v3d_fence.c
> index bfe31a89668b..6265e9ab4a13 100644
> --- a/drivers/gpu/drm/v3d/v3d_fence.c
> +++ b/drivers/gpu/drm/v3d/v3d_fence.c
> @@ -3,6 +3,9 @@
>  
>  #include "v3d_drv.h"
>  
> +/* Note that V3D fences are created during v3d_job_run(), so we're
> + * already implictly locked.
> + */
I don't see where you would be locked in the job_run path. I think what
you mean is that this path needs no locks, as it is driven by a single
scheduler thread, right?

Regards,
Lucas


Re: [PATCH 2/3] drm/v3d: Remove the bad signaled() implementation.

2018-06-08 Thread Lucas Stach
Am Dienstag, den 05.06.2018, 12:03 -0700 schrieb Eric Anholt:
> Since our seqno value comes from a counter associated with the GPU
> ring, not the entity (aka client), they'll be completed out of order.
> There's actually no need for this code at all, since we don't have
> enable_signaling() and thus DMA_FENCE_SIGNALED_BIT will be set before
> we could be called.
> 
> Signed-off-by: Eric Anholt 

Reviewed-by: Lucas Stach 

> ---
>  drivers/gpu/drm/v3d/v3d_drv.h   |  1 -
>  drivers/gpu/drm/v3d/v3d_fence.c | 13 -
>  drivers/gpu/drm/v3d/v3d_gem.c   |  7 ++-
>  drivers/gpu/drm/v3d/v3d_irq.c   |  3 ---
>  4 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 26005abd9c5d..f32ac8c98f37 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -25,7 +25,6 @@ struct v3d_queue_state {
>  
>   u64 fence_context;
>   u64 emit_seqno;
> - u64 finished_seqno;
>  };
>  
>  struct v3d_dev {
> diff --git a/drivers/gpu/drm/v3d/v3d_fence.c
> b/drivers/gpu/drm/v3d/v3d_fence.c
> index 087d49c8cb12..bfe31a89668b 100644
> --- a/drivers/gpu/drm/v3d/v3d_fence.c
> +++ b/drivers/gpu/drm/v3d/v3d_fence.c
> @@ -40,19 +40,14 @@ static bool v3d_fence_enable_signaling(struct
> dma_fence *fence)
>   return true;
>  }
>  
> -static bool v3d_fence_signaled(struct dma_fence *fence)
> -{
> - struct v3d_fence *f = to_v3d_fence(fence);
> - struct v3d_dev *v3d = to_v3d_dev(f->dev);
> -
> - return v3d->queue[f->queue].finished_seqno >= f->seqno;
> -}
> -
>  const struct dma_fence_ops v3d_fence_ops = {
>   .get_driver_name = v3d_fence_get_driver_name,
>   .get_timeline_name = v3d_fence_get_timeline_name,
>   .enable_signaling = v3d_fence_enable_signaling,
> - .signaled = v3d_fence_signaled,
> + /* Each of our fences gets signaled as complete by the IRQ
> +  * handler, so we rely on the core's tracking of signaling.
> +  */
> + .signaled = NULL,
>   .wait = dma_fence_default_wait,
>   .release = dma_fence_free,
>  };
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9ea83bdb9a30..d06d6697e089 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -657,17 +657,14 @@ void
>  v3d_gem_destroy(struct drm_device *dev)
>  {
>   struct v3d_dev *v3d = to_v3d_dev(dev);
> - enum v3d_queue q;
>  
>   v3d_sched_fini(v3d);
>  
>   /* Waiting for exec to finish would need to be done before
>    * unregistering V3D.
>    */
> - for (q = 0; q < V3D_MAX_QUEUES; q++) {
> - WARN_ON(v3d->queue[q].emit_seqno !=
> - v3d->queue[q].finished_seqno);
> - }
> + WARN_ON(v3d->bin_job);
> + WARN_ON(v3d->render_job);
>  
>   drm_mm_takedown(&v3d->mm);
>  
> diff --git a/drivers/gpu/drm/v3d/v3d_irq.c
> b/drivers/gpu/drm/v3d/v3d_irq.c
> index 77e1fa046c10..e07514eb11b5 100644
> --- a/drivers/gpu/drm/v3d/v3d_irq.c
> +++ b/drivers/gpu/drm/v3d/v3d_irq.c
> @@ -87,15 +87,12 @@ v3d_irq(int irq, void *arg)
>   }
>  
>   if (intsts & V3D_INT_FLDONE) {
> - v3d->queue[V3D_BIN].finished_seqno++;
>   dma_fence_signal(v3d->bin_job->bin.done_fence);
>   status = IRQ_HANDLED;
>   }
>  
>   if (intsts & V3D_INT_FRDONE) {
> - v3d->queue[V3D_RENDER].finished_seqno++;
>   dma_fence_signal(v3d->render_job->render.done_fence);
> -
>   status = IRQ_HANDLED;
>   }
>  


Re: [PATCH 1/3 v2] drm/v3d: Take a lock across GPU scheduler job creation and queuing.

2018-06-07 Thread Lucas Stach
Am Mittwoch, den 06.06.2018, 10:48 -0700 schrieb Eric Anholt:
> Between creation and queueing of a job, you need to prevent any other
> job from being created and queued.  Otherwise the scheduler's fences
> may be signaled out of seqno order.
> 
> v2: move mutex unlock to the error label.
> 
> > Signed-off-by: Eric Anholt 
> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D 
> V3.x+")

Reviewed-by: Lucas Stach 

> ---
>  drivers/gpu/drm/v3d/v3d_drv.h | 5 +
>  drivers/gpu/drm/v3d/v3d_gem.c | 4 
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index a043ac3aae98..26005abd9c5d 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -85,6 +85,11 @@ struct v3d_dev {
> >      */
> >     struct mutex reset_lock;
>  
> > +   /* Lock taken when creating and pushing the GPU scheduler
> > +    * jobs, to keep the sched-fence seqnos in order.
> > +    */
> > +   struct mutex sched_lock;
> +
> >     struct {
> >     u32 num_allocated;
> >     u32 pages_allocated;
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index b513f9189caf..269fe16379c0 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -550,6 +550,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >     if (ret)
> >     goto fail;
>  
> > +   mutex_lock(&v3d->sched_lock);
> >     if (exec->bin.start != exec->bin.end) {
> >     ret = drm_sched_job_init(&exec->bin.base,
> >      &v3d->queue[V3D_BIN].sched,
> @@ -576,6 +577,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >     kref_get(&exec->refcount); /* put by scheduler job completion */
> >     drm_sched_entity_push_job(&exec->render.base,
> >       &v3d_priv->sched_entity[V3D_RENDER]);
> > +   mutex_unlock(&v3d->sched_lock);
>  
> >     v3d_attach_object_fences(exec);
>  
> @@ -594,6 +596,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >     return 0;
>  
>  fail_unreserve:
> > +   mutex_unlock(&v3d->sched_lock);
> >     v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
>  fail:
> >     v3d_exec_put(exec);
> @@ -615,6 +618,7 @@ v3d_gem_init(struct drm_device *dev)
> >     spin_lock_init(&v3d->job_lock);
> >     mutex_init(&v3d->bo_lock);
> >     mutex_init(&v3d->reset_lock);
> > +   mutex_init(&v3d->sched_lock);
>  
> >     /* Note: We don't allocate address 0.  Various bits of HW
> >      * treat 0 as special, such as the occlusion query counters


Re: [PATCH 1/2] drm/scheduler: Rename cleanup functions.

2018-06-06 Thread Lucas Stach
Am Dienstag, den 05.06.2018, 13:02 -0400 schrieb Andrey Grodzovsky:
> Everything in the flush code path (i.e. waiting for SW queue
> to become empty) names with *_flush()
> and everything in the release code path names *_fini()
> 
> This patch also affects the amdgpu and etnaviv drivers, which
> use those functions.
> 
> > Signed-off-by: Andrey Grodzovsky 
> > Suggested-by: Christian König 
> ---
[...]
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 23e73c2..3dff4d0 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -140,7 +140,7 @@ static void etnaviv_postclose(struct drm_device *dev, 
> struct drm_file *file)
> >     gpu->lastctx = NULL;
> >     mutex_unlock(&gpu->lock);
>  
> > -   drm_sched_entity_fini(&gpu->sched,
> > +   drm_sched_entity_destroy(&gpu->sched,
>     &ctx->sched_entity[i]);

Style nit: this disaligns the second row of parameters to the opening
parenthesis where it was previously aligned. I would prefer if the
second line is also changed to keep the alignment.

Acked-by: Lucas Stach 


Re: [PATCH 1/3] drm/v3d: Take a lock across GPU scheduler job creation and queuing.

2018-06-06 Thread Lucas Stach
Am Dienstag, den 05.06.2018, 12:03 -0700 schrieb Eric Anholt:
> Between creation and queueing of a job, you need to prevent any other
> job from being created and queued.  Otherwise the scheduler's fences
> may be signaled out of seqno order.
> 
> > Signed-off-by: Eric Anholt 
> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D 
> V3.x+")
> ---
> 
> ccing amd-gfx due to interaction of this series with the scheduler.
> 
>  drivers/gpu/drm/v3d/v3d_drv.h |  5 +
>  drivers/gpu/drm/v3d/v3d_gem.c | 11 +--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index a043ac3aae98..26005abd9c5d 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -85,6 +85,11 @@ struct v3d_dev {
> >      */
> >     struct mutex reset_lock;
>  
> > +   /* Lock taken when creating and pushing the GPU scheduler
> > +    * jobs, to keep the sched-fence seqnos in order.
> > +    */
> > +   struct mutex sched_lock;
> +
> >     struct {
> >     u32 num_allocated;
> >     u32 pages_allocated;
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index b513f9189caf..9ea83bdb9a30 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -550,13 +550,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >     if (ret)
> >     goto fail;
>  
> > +   mutex_lock(&v3d->sched_lock);
> >     if (exec->bin.start != exec->bin.end) {
> >     ret = drm_sched_job_init(&exec->bin.base,
> >      &v3d->queue[V3D_BIN].sched,
> >      &v3d_priv->sched_entity[V3D_BIN],
> >      v3d_priv);
> > -   if (ret)
> > +   if (ret) {
> > +   mutex_unlock(&v3d->sched_lock);
>   goto fail_unreserve;

I don't see any path where you would go to fail_unreserve with the
mutex not yet locked, so you could just fold the mutex_unlock into this
error path for a bit less code duplication.

Otherwise this looks fine.

Regards,
Lucas

> + }
>  
> >     exec->bin_done_fence =
> >     dma_fence_get(&exec->bin.base.s_fence->finished);
> @@ -570,12 +573,15 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >      &v3d->queue[V3D_RENDER].sched,
> >      &v3d_priv->sched_entity[V3D_RENDER],
> >      v3d_priv);
> > -   if (ret)
> > +   if (ret) {
> > +   mutex_unlock(&v3d->sched_lock);
> >     goto fail_unreserve;
> > +   }
>  
> >     kref_get(&exec->refcount); /* put by scheduler job completion */
> >     drm_sched_entity_push_job(&exec->render.base,
> >       &v3d_priv->sched_entity[V3D_RENDER]);
> > +   mutex_unlock(&v3d->sched_lock);
>  
> >     v3d_attach_object_fences(exec);
>  
> @@ -615,6 +621,7 @@ v3d_gem_init(struct drm_device *dev)
> >     spin_lock_init(&v3d->job_lock);
> >     mutex_init(&v3d->bo_lock);
> >     mutex_init(&v3d->reset_lock);
> > +   mutex_init(&v3d->sched_lock);
>  
> >     /* Note: We don't allocate address 0.  Various bits of HW
> >      * treat 0 as special, such as the occlusion query counters


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
Am Mittwoch, den 16.05.2018, 15:10 +0200 schrieb Christian König:
> Am 16.05.2018 um 15:00 schrieb Lucas Stach:
> > Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:
> > > Am 16.05.2018 um 14:28 schrieb Lucas Stach:
> > > > Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:
> > > > > Yes, exactly.
> > > > > 
> > > > > For normal user space command submission we should have tons of
> > > > > locks
> > > > > guaranteeing that (e.g. just the VM lock should do).
> > > > > 
> > > > > For kernel moves we have the mutex for the GTT windows which
> > > > > protects
> > > > > it.
> > > > > 
> > > > > There could be problems with the UVD/VCE queues to clean up the
> > > > > handles
> > > > > when an application crashes.
> > > > 
> > > > FWIW, etnaviv is currently completely unlocked in this path, but I
> > > > believe that this isn't an issue as the sched fence seq numbers are
> > > > per
> > > > entity. So to actually end up with reversed seqnos one context has
> > > > to
> > > > preempt itself to do another submit, while the current one hasn't
> > > > returned from kernel space, which I believe is a fairly theoretical
> > > > issue. Is my understanding correct?
> > > 
> > > Yes. The problem is that with the right timing this can be used to access
> > > freed up memory.
> > > 
> > > If you then manage to place a page table in that freed up memory
> > > taking
> > > over the system is just a typing exercise.
> > 
> > Thanks. I believe we don't have this problem in etnaviv, as memory
> > referencing is tied to the job and will only be unreferenced on
> > free_job, but I'll re-check this when I've got some time.
> 
> Well that depends on how you use the sequence numbers.
> 
> If you use them for job completion check somewhere then you certainly 
> have a problem if job A gets the 1 and B the 2, but B completes before A.

We don't do this anymore. All the etnaviv internals are driven by the
fence signal callbacks.

> At bare minimum that's still a bug we need to fix because it confuses 
> functions like dma_fence_is_later() and dma_fence_later().

Agreed. Also amending the documentation to state that
drm_sched_job_init() and drm_sched_entity_push_job() need to be called
under a common lock seems like a good idea.

Regards,
Lucas


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
On Wednesday, 16.05.2018, 14:32 +0200, Christian König wrote:
> On 16.05.2018 at 14:28, Lucas Stach wrote:
> > On Wednesday, 16.05.2018, 14:08 +0200, Christian König wrote:
> > > Yes, exactly.
> > > 
> > > For normal user space command submission we should have tons of
> > > locks
> > > guaranteeing that (e.g. just the VM lock should do).
> > > 
> > > For kernel moves we have the mutex for the GTT windows which
> > > protects
> > > it.
> > > 
> > > There could be problems with the UVD/VCE queues cleaning up the
> > > handles
> > > when an application crashes.
> > 
> > FWIW, etnaviv is currently completely unlocked in this path, but I
> > believe that this isn't an issue as the sched fence seq numbers are
> > per
> > entity. So to actually end up with reversed seqnos one context has
> > to
> > preempt itself to do another submit, while the current one hasn't
> > returned from kernel space, which I believe is a fairly theoretical
> > issue. Is my understanding correct?
> 
> Yes. The problem is that with the right timing this can be used to access 
> freed up memory.
> 
> If you then manage to place a page table in that freed up memory
> taking 
> over the system is just a typing exercise.

Thanks. I believe we don't have this problem in etnaviv, as memory
referencing is tied to the job and will only be unreferenced on
free_job, but I'll re-check this when I've got some time.

Regards,
Lucas


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
On Wednesday, 16.05.2018, 14:08 +0200, Christian König wrote:
> Yes, exactly.
> 
> For normal user space command submission we should have tons of
> locks 
> guaranteeing that (e.g. just the VM lock should do).
> 
> For kernel moves we have the mutex for the GTT windows which protects
> it.
> 
> There could be problems with the UVD/VCE queues cleaning up the handles 
> when an application crashes.

FWIW, etnaviv is currently completely unlocked in this path, but I
believe that this isn't an issue as the sched fence seq numbers are per
entity. So to actually end up with reversed seqnos one context has to
preempt itself to do another submit, while the current one hasn't
returned from kernel space, which I believe is a fairly theoretical
issue. Is my understanding correct?

Regards,
Lucas

> 
> On 16.05.2018 at 13:47, Andrey Grodzovsky wrote:
> > So are you saying that you expect this to already be in the code for
> > any usage of drm_sched_fence_create() and drm_sched_entity_push_job()?
> > 
> > lock()
> > 
> > drm_sched_fence_create()
> > 
> > ... (some code)
> > 
> > drm_sched_entity_push_job()
> > 
> > unlock()
> > 
> > 
> > Andrey
> > 
> > On 05/16/2018 07:23 AM, Christian König wrote:
> > > drm_sched_fence_create() assigns a sequence number to the fence
> > > it 
> > > creates.
> > > 
> > > Now drm_sched_fence_create() is called by drm_sched_job_init()
> > > to 
> > > initialize the jobs we want to push on the scheduler queue.
> > > 
> > > When you now call drm_sched_entity_push_job() without a protecting
> > > lock it can happen that you push two (or even more) jobs with
> > > reversed sequence numbers.
> > > 
> > > Since the sequence numbers are used to determine job completion
> > > order 
> > > reversing them can seriously mess things up.
> > > 
> > > So the spin lock should be superfluous, if it isn't we have a
> > > much 
> > > larger bug we need to fix.
> > > 
> > > Christian.
> > > 
> > > On 16.05.2018 at 13:15, Andrey Grodzovsky wrote:
> > > > Can you please elaborate more, maybe give an example - I don't 
> > > > understand yet the problematic scenario.
> > > > 
> > > > Andrey
> > > > 
> > > > 
> > > > On 05/16/2018 02:50 AM, Christian König wrote:
> > > > > No, that spinlock is indeed incorrect.
> > > > > 
> > > > > See, even when we protect the spsc queue with a spinlock, that
> > > > > doesn't make it correct. It can happen that the jobs pushed to the
> > > > > queue are reversed in their sequence order and that can cause
> > > > > severe problems in the memory management.
> > > > > 
> > > > > Christian.
> > > > > 
> > > > > On 16.05.2018 at 05:33, Grodzovsky, Andrey wrote:
> > > > > > Yeah, that's what I am not sure about... It's lockless in the
> > > > > > sense of single producer, single consumer, but not for multiple
> > > > > > concurrent producers... So now I think this spinlock should stay
> > > > > > there... It just looked useless to me at first sight...
> > > > > > 
> > > > > > Andrey
> > > > > > 
> > > > > > 
> > > > > > From: Zhou, David(ChunMing)
> > > > > > Sent: 15 May 2018 23:04:44
> > > > > > To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
> > > > > > dri-de...@lists.freedesktop.org
> > > > > > Cc: Koenig, Christian
> > > > > > Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete
> > > > > > spinlock.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 2018-05-16 03:31, Andrey Grodzovsky wrote:
> > > > > > > Signed-off-by: Andrey Grodzovsky
> > > > > > > ---
> > > > > > >    drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
> > > > > > >    include/drm/gpu_scheduler.h               | 1 -
> > > > > > >    2 files changed, 5 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> > > > > > > b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > > index 1f1dd70..2569a63 100644
> > > > > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > > @@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
> > > > > > >    entity->last_scheduled = NULL;
> > > > > > > 
> > > > > > >    spin_lock_init(&entity->rq_lock);
> > > > > > > - spin_lock_init(&entity->queue_lock);
> > > > > > >    spsc_queue_init(&entity->job_queue);
> > > > > > > 
> > > > > > >    atomic_set(&entity->fence_seq, 0);
> > > > > > > @@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
> > > > > > > 
> > > > > > >    trace_drm_sched_job(sched_job, entity);
> > > > > > > 
> > > > > > > - spin_lock(&entity->queue_lock);
> > > > > > >    first = spsc_queue_push(&entity->job_queue,
> > > > > > >                            &sched_job->queue_node);
> > > > > > > - spin_unlock(&entity->queue_lock);
> > > > > > 
> > > > > > Is it safe to push to your spsc queue concurrently?
> > > > > 

Re: [PATCH 01/13] ttm: abstruct evictable bo

2018-05-09 Thread Lucas Stach
All of these changes include a Change-Id, which has no place in upstream
patches, and are missing a proper commit description explaining why each
specific change is done.

Regards,
Lucas

On Wednesday, 09.05.2018, 14:45 +0800, Chunming Zhou wrote:
> Change-Id: Ie81985282fab1e564fc2948109fae2173613b465
> > Signed-off-by: Chunming Zhou 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 35 ---
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 98e06f8bf23b..15506682a0be 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -704,22 +704,20 @@ static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> >     return ret;
>  }
>  
> -static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> > -      uint32_t mem_type,
> > -      const struct ttm_place *place,
> > -      struct ttm_operation_ctx *ctx)
> +static struct ttm_buffer_object *
> +ttm_mem_get_evictable_bo(struct ttm_bo_device *bdev,
> > +    uint32_t mem_type,
> > +    const struct ttm_place *place,
> > +    struct ttm_operation_ctx *ctx,
> > +    bool *locked)
>  {
> > -   struct ttm_bo_global *glob = bdev->glob;
> > -   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
> >     struct ttm_buffer_object *bo = NULL;
> > -   bool locked = false;
> > -   unsigned i;
> > -   int ret;
> > +   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
> > +   int i;
>  
> > -   spin_lock(&glob->lru_lock);
> >     for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> >     list_for_each_entry(bo, &man->lru[i], lru) {
> > -   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
> > +   if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked))
> >     continue;
>  
> >     if (place && !bdev->driver->eviction_valuable(bo,
> @@ -738,6 +736,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> >     bo = NULL;
> >     }
>  
> > +   return bo;
> +}
> +
> +static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> > +      uint32_t mem_type,
> > +      const struct ttm_place *place,
> > +      struct ttm_operation_ctx *ctx)
> +{
> > +   struct ttm_bo_global *glob = bdev->glob;
> > +   struct ttm_buffer_object *bo = NULL;
> > +   bool locked = false;
> > +   int ret;
> +
> > +   spin_lock(&glob->lru_lock);
> > +   bo = ttm_mem_get_evictable_bo(bdev, mem_type, place, ctx, &locked);
> >     if (!bo) {
> >     spin_unlock(>lru_lock);
> >     return -EBUSY;


Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag

2018-05-04 Thread Lucas Stach
On Wednesday, 25.04.2018, 13:44 -0400, Alex Deucher wrote:
> On Wed, Apr 25, 2018 at 2:41 AM, Christoph Hellwig wrote:
> > On Wed, Apr 25, 2018 at 02:24:36AM -0400, Alex Deucher wrote:
> > > > It has a non-coherent transaction mode (which the chipset can opt to
> > > > not implement and still flush), to make sure the AGP horror show
> > > > doesn't happen again and GPU folks are happy with PCIe. That's at
> > > > least my understanding from digging around in amd the last time we had
> > > > coherency issues between intel and amd gpus. GPUs have some bits
> > > > somewhere (in the pagetables, or in the buffer object description
> > > > table created by userspace) to control that stuff.
> > > 
> > > Right.  We have a bit in the GPU page table entries that determines
> > > whether we snoop the CPU's cache or not.
> > 
> > I can see how that works with the GPU on the same SOC or SOC set as the
> > CPU.  But how is that going to work for a GPU that is a plain old PCIe
> > card?  The cache snooping in that case is happening in the PCIe root
> > complex.
> 
> I'm not a pci expert, but as far as I know, the device sends either a
> snooped or non-snooped transaction on the bus.  I think the
> transaction descriptor supports a no snoop attribute.  Our GPUs have
> supported this feature for probably 20 years if not more, going back
> to PCI.  Using non-snooped transactions has lower latency and faster
> throughput compared to snooped transactions.

PCI-X (as in the thing which was all the rage before PCIe) added an
attribute phase to each transaction where the requestor can enable
relaxed ordering and/or no-snoop on a per-transaction basis. As those are
strictly performance optimizations, the root complex isn't required to
honor those attributes, but implementations that care about performance
usually will.

Regards,
Lucas


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Lucas Stach
On Thursday, 12.04.2018, 11:35 +0200, Christian König wrote:
> On 12.04.2018 at 11:11, Lucas Stach wrote:
> > On Wednesday, 11.04.2018, 20:26 +0200, Christian König wrote:
> > > On 11.04.2018 at 19:11, Robin Murphy wrote:
> > > > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > > > segments into a single DMA mapping if conditions are suitable, thus the
> > > > resulting DMA addresses may be packed into fewer entries than
> > > > ttm->sg->nents implies.
> > > > 
> > > > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > > > its callers either have to reject the 0 < count < nents case or risk
> > > > getting bogus addresses back later. Fortunately this is relatively easy
> > > > to deal with without having to rejig structures to also store the mapped count,
> > > > since the total DMA length should still be equal to the total buffer
> > > > length. All we need is a separate scatterlist cursor to iterate the DMA
> > > > addresses separately from the CPU addresses.
> > > 
> > > Mhm, I think I like Sina's approach better.
> > > 
> > > See the hardware actually needs the dma_address on a page by page basis.
> > > 
> > > Joining multiple consecutive pages into one entry is just additional
> > > overhead which we don't need.
> > 
> > But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
> > be in front of your GPU from mapping large pages as such, causing additional
> > overhead by means of additional TLB misses and page walks on the IOMMU
> > side.
> > 
> > And wouldn't you like to use huge pages at the GPU side, if the IOMMU
> > already provides you the service of producing one large consecutive
> > address range, rather than mapping them via a number of small pages?
> 
> No, I wouldn't like to use that. We're already using that :)
> 
> But we use huge pages by allocating consecutive chunks of memory so that 
> both the CPU as well as the GPU can increase their TLB hit rate.

I'm not convinced that this is a universal win. Many GPU buffers aren't
accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.

> What currently happens is that the DMA subsystem tries to coalesce 
> multiple pages into one SG entry and we de-coalesce that inside the 
> driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.

> That is a huge waste of memory and CPU cycles and I actually wanted to 
> get rid of it for quite some time now. Sina's approach seems to be a good 
> step into the right direction to me to actually clean that up.
> 
> > Detecting such a condition is much easier if the DMA map implementation
> > gives you the coalesced scatter entries.
> 
> A way which preserves both paths would indeed be nice to have, but that 
> only allows for the TLB optimization for the GPU and not the CPU any 
> more. So I actually see that as a really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Lucas Stach
On Wednesday, 11.04.2018, 20:26 +0200, Christian König wrote:
> On 11.04.2018 at 19:11, Robin Murphy wrote:
> > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > segments into a single DMA mapping if conditions are suitable, thus the
> > resulting DMA addresses may be packed into fewer entries than
> > ttm->sg->nents implies.
> > 
> > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > its callers either have to reject the 0 < count < nents case or risk
> > getting bogus addresses back later. Fortunately this is relatively easy
> > to deal with without having to rejig structures to also store the mapped count,
> > since the total DMA length should still be equal to the total buffer
> > length. All we need is a separate scatterlist cursor to iterate the DMA
> > addresses separately from the CPU addresses.
> 
> Mhm, I think I like Sina's approach better.
> 
> See the hardware actually needs the dma_address on a page by page basis.
> 
> Joining multiple consecutive pages into one entry is just additional 
> overhead which we don't need.

But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU from mapping large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?
Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.

Regards,
Lucas


Re: [RFC] Per file OOM badness

2018-04-04 Thread Lucas Stach
On Wednesday, 04.04.2018, 11:09 +0200, Michel Dänzer wrote:
> On 2018-03-26 04:36 PM, Lucas Stach wrote:
> > On Tuesday, 30.01.2018, 11:28 +0100, Michal Hocko wrote:
> > > On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > > > 
> > > > > [...]
> > > > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > > > with another
> > > > > > > > process, this should result in the other process dropping
> > > > > > > > its references
> > > > > > > > to the BOs as well, at which point the memory is released.
> > > > > > > 
> > > > > > > OK. How exactly are those BOs mapped to the userspace?
> > > > > > 
> > > > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > > > handle to
> > > > > > refer to a BO. There can also be userspace CPU mappings of the
> > > > > > BO's
> > > > > > memory, but userspace doesn't need CPU mappings for all BOs and
> > > > > > only
> > > > > > creates them as needed.
> > > > > 
> > > > > OK, I guess you have to bear with me some more. This whole stack
> > > > > is a
> > > > > complete unknown. I am mostly after finding a boundary where you
> > > > > can
> > > > > charge the allocated memory to the process so that the oom killer
> > > > > can
> > > > > consider it. Is there anything like that? Except for the proposed
> > > > > file
> > > > > handle hack?
> > > > 
> > > > How about the other way around: what APIs can we use to charge /
> > > > "uncharge" memory to a process? If we have those, we can experiment
> > > > with
> > > > different places to call them.
> > > 
> > > add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
> > 
> > So is anyone still working on this? This is hurting us bad enough that
> > I don't want to keep this topic rotting for another year.
> > 
> > If no one is currently working on this I would volunteer to give the
> > simple "just account private, non-shared buffers in process RSS" a
> > spin.
> 
> Sounds good. FWIW, I think shared buffers can also be easily handled by
> accounting them in each process which has a reference. But that's more
> of a detail, shouldn't make a big difference overall either way.

Yes, both options, either never accounting shared buffers or always
accounting them in every process holding a reference, should be pretty
easy. Where it gets hard is when trying to account the buffer only in
the last process holding a reference or something like this.

For the OOM case I think it makes more sense to never account shared
buffers, as this may lead to a process (like the compositor) having
its RSS inflated by shared buffers, rendering it the likely victim for
the OOM killer/reaper, while killing this process will not lead to
freeing of any shared graphics memory, at least if the clients sharing
the buffer survive killing of the compositor.

This opens up the possibility to "hide" buffers from the accounting by
sharing them, but I guess it's still much better than the nothing we do
today to account for graphics buffers.

Regards,
Lucas


Re: [RFC] Per file OOM badness

2018-03-26 Thread Lucas Stach
Hi all,

On Tuesday, 30.01.2018, 11:28 +0100, Michal Hocko wrote:
> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > 
> > > [...]
> > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > with another
> > > > > > process, this should result in the other process dropping
> > > > > > its references
> > > > > > to the BOs as well, at which point the memory is released.
> > > > > 
> > > > > OK. How exactly are those BOs mapped to the userspace?
> > > > 
> > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > handle to
> > > > refer to a BO. There can also be userspace CPU mappings of the
> > > > BO's
> > > > memory, but userspace doesn't need CPU mappings for all BOs and
> > > > only
> > > > creates them as needed.
> > > 
> > > OK, I guess you have to bear with me some more. This whole stack
> > > is a
> > > complete unknown. I am mostly after finding a boundary where you
> > > can
> > > charge the allocated memory to the process so that the oom killer
> > > can
> > > consider it. Is there anything like that? Except for the proposed
> > > file
> > > handle hack?
> > 
> > How about the other way around: what APIs can we use to charge /
> > "uncharge" memory to a process? If we have those, we can experiment
> > with
> > different places to call them.
> 
> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.

So is anyone still working on this? This is hurting us bad enough that
I don't want to keep this topic rotting for another year.

If no one is currently working on this I would volunteer to give the
simple "just account private, non-shared buffers in process RSS" a
spin.

Regards,
Lucas


Re: [PATCH 1/5] drm/prime: fix potential race in drm_gem_map_detach

2018-02-28 Thread Lucas Stach
Hi Christian,

On Tuesday, 27.02.2018, 12:49 +0100, Christian König wrote:
> Unpin the GEM object only after freeing the sg table.

What is the race that is being fixed here? The SG table is private to
the importer and the importer should hopefully only call map_detach if
it is done with all operations using the SG table. Thus it shouldn't
matter that the SG table might point to moved pages during execution of
this function.

Regards,
Lucas

> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_prime.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_prime.c
> b/drivers/gpu/drm/drm_prime.c
> index e82a976f0fba..c38dacda6119 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -230,26 +230,26 @@ void drm_gem_map_detach(struct dma_buf *dma_buf,
> 	struct drm_prime_attachment *prime_attach = attach->priv;
> 	struct drm_gem_object *obj = dma_buf->priv;
> 	struct drm_device *dev = obj->dev;
> -	struct sg_table *sgt;
>  
> -	if (dev->driver->gem_prime_unpin)
> -		dev->driver->gem_prime_unpin(obj);
> +	if (prime_attach) {
> +		struct sg_table *sgt = prime_attach->sgt;
>  
> -	if (!prime_attach)
> -		return;
> -
> -	sgt = prime_attach->sgt;
> -	if (sgt) {
> -		if (prime_attach->dir != DMA_NONE)
> -			dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents,
> -					   prime_attach->dir,
> -					   DMA_ATTR_SKIP_CPU_SYNC);
> -		sg_free_table(sgt);
> +		if (sgt) {
> +			if (prime_attach->dir != DMA_NONE)
> +				dma_unmap_sg_attrs(attach->dev, sgt->sgl,
> +						   sgt->nents,
> +						   prime_attach->dir,
> +						   DMA_ATTR_SKIP_CPU_SYNC);
> +			sg_free_table(sgt);
> +		}
> +
> +		kfree(sgt);
> +		kfree(prime_attach);
> +		attach->priv = NULL;
> 	}
>  
> -	kfree(sgt);
> -	kfree(prime_attach);
> -	attach->priv = NULL;
> +	if (dev->driver->gem_prime_unpin)
> +		dev->driver->gem_prime_unpin(obj);
>  }
>  EXPORT_SYMBOL(drm_gem_map_detach);
>  


[PATCH v2 1/2] drm: move amd_gpu_scheduler into common location

2017-12-06 Thread Lucas Stach
This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a
straightforward rename with no code changes.

One notable exception is the function to_drm_sched_fence(), which is no
longer an inline header function, to avoid the need to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.

Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
---
 drivers/gpu/drm/Kconfig|   5 +
 drivers/gpu/drm/Makefile   |   1 +
 drivers/gpu/drm/amd/amdgpu/Makefile|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  38 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   7 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   8 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   8 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h  | 186 -
 drivers/gpu/drm/scheduler/Makefile |   4 +
 .../gpu/drm/{amd => }/scheduler/gpu_scheduler.c| 296 ++---
 drivers/gpu/drm/{amd => }/scheduler/sched_fence.c  | 118 
 include/drm/gpu_scheduler.h| 176 
 .../drm/gpu_scheduler_trace.h  |  14 +-
 .../drm/amd/scheduler => include/drm}/spsc_queue.h |   7 +-
 35 files changed, 530 insertions(+), 517 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
 create mode 100644 drivers/gpu/drm/scheduler/Makefile
 rename drivers/gpu/drm/{amd => }/scheduler/gpu_scheduler.c (65%)
 rename drivers/gpu/drm/{amd => }/scheduler/sched_fence.c (59%)
 create mode 100644 include/drm/gpu_scheduler.h
 rename drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h => include/drm/gpu_scheduler_trace.h (82%)
 rename {drivers/gpu/drm/amd/scheduler => include/drm}/spsc_queue.h (95%)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 4d9f21831741..ee38a3db1890 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -149,6 +149,10 @@ config DRM_VM
bool
depends on DRM && MMU
 
+config DRM_SCHED
+   tristate
+   depends on DRM
+
 source "drivers/gpu/drm/i2c/Kconfig"
 
 source "drivers/gpu/drm/arm/Kconfig"
@@ -178,6 +182,7 @@ config DRM_AMDGPU
depends on DRM && PCI && MMU
select FW_LOADER
 select DRM_KMS_HELPER
+   select DRM_SCHED
 select DRM_TTM
select POWER_SUPPLY
select HWMON
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index e9500844333e..1f6ba9e34e31 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -101,3 +101,4 @@ obj-$(CONFIG_DRM_MXSFB) += mxsfb/
 obj-$(CONFIG_DRM_TINYDRM) += tinydrm/
 obj-$(CONFIG_DRM_PL111) += pl111/
 obj-$(CONFIG_DRM_TVE200) += tve200/
+obj-$(CONFIG_DRM_SCHED)+= scheduler/
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 78d609123420..5f690f023e75 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -115,10 +115,7 @@ amdgpu-y += \
 amdgpu-y += amdgpu_cgs.o
 
 # GPU scheduler
-amdgpu-y += \
-   ../scheduler/gpu_scheduler.o \
-   ../scheduler/sched_fence.o \
-   amdgpu_job.o
+amdgpu-y += amdgpu_job.o
 
 # ACP componet
 ifneq ($(CONFIG_DRM_AMD_ACP),)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5e2958a79928..5c8648ec2cd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include <drm/gpu_scheduler.h>
 
 #include 

[PATCH v2 0/2] AMDGPU scheduler move, take 2

2017-12-06 Thread Lucas Stach
Hi all,

second try to move the AMDGPU scheduler into a common location, this
time rebased onto drm-next-4.16-wip.

I've tested my etnaviv series on top of this and things seem to work
fine. I checked that AMDGPU still builds, but I don't have any means
to actually runtime test this currently, so I would be very happy to
receive some tested-bys from the AMD side.

Alex, if this looks good please pull this in so it gets your usual
testing.

Thanks,
Lucas

Lucas Stach (2):
  drm: move amd_gpu_scheduler into common location
  drm/sched: move fence slab handling to module init/exit

 drivers/gpu/drm/Kconfig|   5 +
 drivers/gpu/drm/Makefile   |   1 +
 drivers/gpu/drm/amd/amdgpu/Makefile|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  38 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   8 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   7 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   8 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   8 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h  | 186 -
 drivers/gpu/drm/scheduler/Makefile |   4 +
 .../gpu/drm/{amd => }/scheduler/gpu_scheduler.c| 296 ++---
 drivers/gpu/drm/{amd => }/scheduler/sched_fence.c  | 122 +
 include/drm/gpu_scheduler.h| 173 
 .../drm/gpu_scheduler_trace.h  |  14 +-
 .../drm/amd/scheduler => include/drm}/spsc_queue.h |   7 +-
 35 files changed, 529 insertions(+), 523 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
 create mode 100644 drivers/gpu/drm/scheduler/Makefile
 rename drivers/gpu/drm/{amd => }/scheduler/gpu_scheduler.c (65%)
 rename drivers/gpu/drm/{amd => }/scheduler/sched_fence.c (58%)
 create mode 100644 include/drm/gpu_scheduler.h
 rename drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h => 
include/drm/gpu_scheduler_trace.h (82%)
 rename {drivers/gpu/drm/amd/scheduler => include/drm}/spsc_queue.h (95%)

-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/2] drm/sched: move fence slab handling to module init/exit

2017-12-06 Thread Lucas Stach
This is the only part of the scheduler that must not be initialized separately
by each driver. Move it to module init/exit so it runs a single time
when the scheduler module is loaded.

Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  8 
 drivers/gpu/drm/scheduler/sched_fence.c | 12 
 include/drm/gpu_scheduler.h |  3 ---
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1d8011bca182..51b76688ab90 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -912,10 +912,6 @@ static int __init amdgpu_init(void)
if (r)
goto error_fence;
 
-   r = drm_sched_fence_slab_init();
-   if (r)
-   goto error_sched;
-
if (vgacon_text_force()) {
DRM_ERROR("VGACON disables amdgpu kernel modesetting.\n");
return -EINVAL;
@@ -928,9 +924,6 @@ static int __init amdgpu_init(void)
/* let modprobe override vga console setting */
return pci_register_driver(pdriver);
 
-error_sched:
-   amdgpu_fence_slab_fini();
-
 error_fence:
amdgpu_sync_fini();
 
@@ -944,7 +937,6 @@ static void __exit amdgpu_exit(void)
pci_unregister_driver(pdriver);
amdgpu_unregister_atpx_handler();
amdgpu_sync_fini();
-   drm_sched_fence_slab_fini();
amdgpu_fence_slab_fini();
 }
 
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c
index f6f2955890c4..69aab086b913 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -29,7 +29,7 @@
 
 static struct kmem_cache *sched_fence_slab;
 
-int drm_sched_fence_slab_init(void)
+static int __init drm_sched_fence_slab_init(void)
 {
sched_fence_slab = kmem_cache_create(
"drm_sched_fence", sizeof(struct drm_sched_fence), 0,
@@ -39,14 +39,12 @@ int drm_sched_fence_slab_init(void)
 
return 0;
 }
-EXPORT_SYMBOL_GPL(drm_sched_fence_slab_init);
 
-void drm_sched_fence_slab_fini(void)
+static void __exit drm_sched_fence_slab_fini(void)
 {
rcu_barrier();
kmem_cache_destroy(sched_fence_slab);
 }
-EXPORT_SYMBOL_GPL(drm_sched_fence_slab_fini);
 
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence)
 {
@@ -185,3 +183,9 @@ struct drm_sched_fence *drm_sched_fence_create(struct 
drm_sched_entity *entity,
 
return fence;
 }
+
+module_init(drm_sched_fence_slab_init);
+module_exit(drm_sched_fence_slab_fini);
+
+MODULE_DESCRIPTION("DRM GPU scheduler");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d29da4cbb042..dfd54fb94e10 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -155,9 +155,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job,
 void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
 struct drm_sched_rq *rq);
 
-int drm_sched_fence_slab_init(void);
-void drm_sched_fence_slab_fini(void);
-
 struct drm_sched_fence *drm_sched_fence_create(
struct drm_sched_entity *s_entity, void *owner);
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
-- 
2.11.0



Re: [PATCH 0/2] Move scheduler out of AMDGPU

2017-12-04 Thread Lucas Stach
Hi,

On Saturday, 2017-12-02 at 12:14 +, Liu, Monk wrote:
> I'm wondering if GPU reset still works after this home move ...

Why wouldn't it continue to work? After all this is just a code move
that doesn't change anything about the inner workings of the scheduler.

Regards,
Lucas

> 
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On
> Behalf Of Christian König
> Sent: Friday, December 1, 2017 11:55 PM
> To: Lucas Stach <l.st...@pengutronix.de>; Deucher, Alexander  er.deuc...@amd.com>
> Cc: Grodzovsky, Andrey <andrey.grodzov...@amd.com>; David Airlie  l...@linux.ie>; amd-gfx@lists.freedesktop.org; patchwork-lst@pengutronix.de; dri-de...@lists.freedesktop.org; ker...@pengutronix.de
> Subject: Re: [PATCH 0/2] Move scheduler out of AMDGPU
> 
> On 2017-12-01 at 16:28, Lucas Stach wrote:
> > Hi all,
> > 
> > so this is the first step to make the marvelous AMDGPU scheduler 
> > useable for other drivers. I have a (mostly) working prototype of 
> > Etnaviv using the scheduler, but those patches need to keep baking
> > for a while.
> > 
> > I'm sending this out as I want to avoid rebasing this change too
> > much 
> > and don't want to take people by surprise when the Etnaviv 
> > implementation surfaces. Also this might need some coordination 
> > between AMDGPU and Etnaviv, which might be good to get going now.
> > 
> > Please speak up now if you have any objections or comments.
> 
> Looks good to me, but question is what is this based upon?
> 
> I strongly assume drm-next, so question is now if we have any patches
> inside amd branches we should apply before doing this.
> 
> CCing Andrey as well cause he has some tasks assigned around the
> scheduler as well.
> 
> Regards,
> Christian.
> 
> > 
> > Regards,
> > Lucas
> > 
> > Lucas Stach (2):
> >    drm: move amd_gpu_scheduler into common location
> >    drm/sched: move fence slab handling to module init/exit
> > 
> >   drivers/gpu/drm/Kconfig|   5 +
> >   drivers/gpu/drm/Makefile   |   1 +
> >   drivers/gpu/drm/amd/amdgpu/Makefile|   5 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  16 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   8 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  38 +--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  12 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   8 -
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  22 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  14 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  20 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h  |   2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   |   6 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   8 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|   4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|   8 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h|   4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|   8 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h|   2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  14 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  10 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 +-
> >   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   4 +-
> >   drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   8 +-
> >   drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   8 +-
> >   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h  | 185 
> > --
> >   drivers/gpu/drm/scheduler/Makefile |   4 +
> >   .../gpu/drm/{amd => }/scheduler/gpu_scheduler.c| 281
> > +++--
> >   drivers/gpu/drm/{amd => }/scheduler/sched_fence.c  | 122 +---
> > -
> >   include/drm/gpu_scheduler.h| 171
> > +
> >   .../drm/gpu_scheduler_trace.h  |  14 +-
> >   34 files changed, 525 insertions(+), 511 deletions(-)
> >   delete mode 100644 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> >   create mode 100644 drivers/gpu/drm/scheduler/Makefile
> >   rename drivers/gpu/drm/{amd => }/scheduler/gpu_scheduler.c 

Re: [PATCH 0/2] Move scheduler out of AMDGPU

2017-12-01 Thread Lucas Stach
On Friday, 2017-12-01 at 16:55 +0100, Christian König wrote:
> On 2017-12-01 at 16:28, Lucas Stach wrote:
> > Hi all,
> > 
> > so this is the first step to make the marvelous AMDGPU scheduler
> > useable
> > for other drivers. I have a (mostly) working prototype of Etnaviv
> > using
> > the scheduler, but those patches need to keep baking for a while.
> > 
> > I'm sending this out as I want to avoid rebasing this change too
> > much
> > and don't want to take people by surprise when the Etnaviv
> > implementation
> > surfaces. Also this might need some coordination between AMDGPU and
> > Etnaviv, which might be good to get going now.
> > 
> > Please speak up now if you have any objections or comments.
> 
> Looks good to me, but question is what is this based upon?
> 
> I strongly assume drm-next, so question is now if we have any
> patches 
> inside amd branches we should apply before doing this.

For now this is based on 4.15-rc1, where the only difference from drm-
next in the scheduler code is currently this:

>8--

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h 
b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
index 8bd38102b58e..283a0dc25e84 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #if !defined(_GPU_SCHED_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _GPU_SCHED_TRACE_H_
 
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index e4d3b4ec4e92..92ec663fdada 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -188,7 +188,7 @@ static bool amd_sched_entity_is_ready(struct 
amd_sched_entity *entity)
if (kfifo_is_empty(&entity->job_queue))
return false;
 
-   if (ACCESS_ONCE(entity->dependency))
+   if (READ_ONCE(entity->dependency))
return false;
 
return true;

--->8--

I'm fine with rebasing this on whatever AMDGPU guys prefer, but this
requires a stable branch with all relevant patches included, so we can
use this as a synchronization point for the move.

> CCing Andrey as well cause he has some tasks assigned around the 
> scheduler as well.

Thanks,
Lucas


[PATCH 1/2] drm: move amd_gpu_scheduler into common location

2017-12-01 Thread Lucas Stach
This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a
straightforward rename with no code changes.

One notable exception is the function to_drm_sched_fence(), which is no
longer an inline header function, to avoid having to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.

Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
---
 drivers/gpu/drm/Kconfig|   5 +
 drivers/gpu/drm/Makefile   |   1 +
 drivers/gpu/drm/amd/amdgpu/Makefile|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  38 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  22 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   8 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   8 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h  | 185 --
 drivers/gpu/drm/scheduler/Makefile |   4 +
 .../gpu/drm/{amd => }/scheduler/gpu_scheduler.c| 281 +++--
 drivers/gpu/drm/{amd => }/scheduler/sched_fence.c  | 118 +
 include/drm/gpu_scheduler.h| 174 +
 .../drm/gpu_scheduler_trace.h  |  14 +-
 34 files changed, 526 insertions(+), 505 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
 create mode 100644 drivers/gpu/drm/scheduler/Makefile
 rename drivers/gpu/drm/{amd => }/scheduler/gpu_scheduler.c (64%)
 rename drivers/gpu/drm/{amd => }/scheduler/sched_fence.c (59%)
 create mode 100644 include/drm/gpu_scheduler.h
 rename drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h => 
include/drm/gpu_scheduler_trace.h (83%)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 4d9f21831741..ee38a3db1890 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -149,6 +149,10 @@ config DRM_VM
bool
depends on DRM && MMU
 
+config DRM_SCHED
+   tristate
+   depends on DRM
+
 source "drivers/gpu/drm/i2c/Kconfig"
 
 source "drivers/gpu/drm/arm/Kconfig"
@@ -178,6 +182,7 @@ config DRM_AMDGPU
depends on DRM && PCI && MMU
select FW_LOADER
 select DRM_KMS_HELPER
+   select DRM_SCHED
 select DRM_TTM
select POWER_SUPPLY
select HWMON
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index e9500844333e..1f6ba9e34e31 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -101,3 +101,4 @@ obj-$(CONFIG_DRM_MXSFB) += mxsfb/
 obj-$(CONFIG_DRM_TINYDRM) += tinydrm/
 obj-$(CONFIG_DRM_PL111) += pl111/
 obj-$(CONFIG_DRM_TVE200) += tve200/
+obj-$(CONFIG_DRM_SCHED) += scheduler/
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 78d609123420..5f690f023e75 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -115,10 +115,7 @@ amdgpu-y += \
 amdgpu-y += amdgpu_cgs.o
 
 # GPU scheduler
-amdgpu-y += \
-   ../scheduler/gpu_scheduler.o \
-   ../scheduler/sched_fence.o \
-   amdgpu_job.o
+amdgpu-y += amdgpu_job.o
 
 # ACP componet
 ifneq ($(CONFIG_DRM_AMD_ACP),)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5afaf6016b4a..f17882b87cf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -68,7 +69,6 @@
 #include "amdgpu_mn.h"
 #include "amdgpu_dm.h"
 
-#include "gpu_scheduler.h"
 #include &quo

[PATCH 0/2] Move scheduler out of AMDGPU

2017-12-01 Thread Lucas Stach
Hi all,

so this is the first step to make the marvelous AMDGPU scheduler useable
for other drivers. I have a (mostly) working prototype of Etnaviv using
the scheduler, but those patches need to keep baking for a while.

I'm sending this out as I want to avoid rebasing this change too much
and don't want to take people by surprise when the Etnaviv implementation
surfaces. Also this might need some coordination between AMDGPU and
Etnaviv, which might be good to get going now.

Please speak up now if you have any objections or comments.

Regards,
Lucas

Lucas Stach (2):
  drm: move amd_gpu_scheduler into common location
  drm/sched: move fence slab handling to module init/exit

 drivers/gpu/drm/Kconfig|   5 +
 drivers/gpu/drm/Makefile   |   1 +
 drivers/gpu/drm/amd/amdgpu/Makefile|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  38 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   8 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  22 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   8 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   8 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h  | 185 --
 drivers/gpu/drm/scheduler/Makefile |   4 +
 .../gpu/drm/{amd => }/scheduler/gpu_scheduler.c| 281 +++--
 drivers/gpu/drm/{amd => }/scheduler/sched_fence.c  | 122 +
 include/drm/gpu_scheduler.h| 171 +
 .../drm/gpu_scheduler_trace.h  |  14 +-
 34 files changed, 525 insertions(+), 511 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
 create mode 100644 drivers/gpu/drm/scheduler/Makefile
 rename drivers/gpu/drm/{amd => }/scheduler/gpu_scheduler.c (64%)
 rename drivers/gpu/drm/{amd => }/scheduler/sched_fence.c (58%)
 create mode 100644 include/drm/gpu_scheduler.h
 rename drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h => 
include/drm/gpu_scheduler_trace.h (83%)

-- 
2.11.0



[PATCH 2/2] drm/sched: move fence slab handling to module init/exit

2017-12-01 Thread Lucas Stach
This is the only part of the scheduler that must not be initialized separately
by each driver. Move it to module init/exit so it runs a single time
when the scheduler module is loaded.

Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  8 
 drivers/gpu/drm/scheduler/sched_fence.c | 12 
 include/drm/gpu_scheduler.h |  3 ---
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b23c83c59725..18b7fce2fb27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -906,10 +906,6 @@ static int __init amdgpu_init(void)
if (r)
goto error_fence;
 
-   r = drm_sched_fence_slab_init();
-   if (r)
-   goto error_sched;
-
if (vgacon_text_force()) {
DRM_ERROR("VGACON disables amdgpu kernel modesetting.\n");
return -EINVAL;
@@ -922,9 +918,6 @@ static int __init amdgpu_init(void)
/* let modprobe override vga console setting */
return pci_register_driver(pdriver);
 
-error_sched:
-   amdgpu_fence_slab_fini();
-
 error_fence:
amdgpu_sync_fini();
 
@@ -938,7 +931,6 @@ static void __exit amdgpu_exit(void)
pci_unregister_driver(pdriver);
amdgpu_unregister_atpx_handler();
amdgpu_sync_fini();
-   drm_sched_fence_slab_fini();
amdgpu_fence_slab_fini();
 }
 
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c
index f6f2955890c4..69aab086b913 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -29,7 +29,7 @@
 
 static struct kmem_cache *sched_fence_slab;
 
-int drm_sched_fence_slab_init(void)
+static int __init drm_sched_fence_slab_init(void)
 {
sched_fence_slab = kmem_cache_create(
"drm_sched_fence", sizeof(struct drm_sched_fence), 0,
@@ -39,14 +39,12 @@ int drm_sched_fence_slab_init(void)
 
return 0;
 }
-EXPORT_SYMBOL_GPL(drm_sched_fence_slab_init);
 
-void drm_sched_fence_slab_fini(void)
+static void __exit drm_sched_fence_slab_fini(void)
 {
rcu_barrier();
kmem_cache_destroy(sched_fence_slab);
 }
-EXPORT_SYMBOL_GPL(drm_sched_fence_slab_fini);
 
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence)
 {
@@ -185,3 +183,9 @@ struct drm_sched_fence *drm_sched_fence_create(struct 
drm_sched_entity *entity,
 
return fence;
 }
+
+module_init(drm_sched_fence_slab_init);
+module_exit(drm_sched_fence_slab_fini);
+
+MODULE_DESCRIPTION("DRM GPU scheduler");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 870ce9a693d3..2e165093a789 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -149,9 +149,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job);
 void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
 struct drm_sched_rq *rq);
 
-int drm_sched_fence_slab_init(void);
-void drm_sched_fence_slab_fini(void);
-
 struct drm_sched_fence *drm_sched_fence_create(
struct drm_sched_entity *s_entity, void *owner);
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
-- 
2.11.0
