[Nouveau] [PATCH v7 3/6] drm/sched: Convert the GPU scheduler to variable number of run-queues

2023-10-25 Thread Matthew Brost
From: Luben Tuikov 

The GPU scheduler now has a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) for each GPU scheduler it creates. Note that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. a single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
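
For illustration, a minimal driver-side sketch (not part of this patch) of the
1:1 case described above: a scheduler created with a single run-queue and a
single entity bound to it. The my_*/q identifiers are hypothetical, and the
drm_sched_init() argument order is modeled on the amdgpu hunk below; the
authoritative signature is the one introduced by this series.

static int my_queue_sched_init(struct my_queue *q, struct device *dev,
			       long timeout)
{
	struct drm_gpu_scheduler *sched = &q->sched;
	int ret;

	ret = drm_sched_init(&q->sched, &my_sched_ops, NULL,
			     1,	/* num_rqs: one run-queue, i.e. one "priority" */
			     q->num_hw_submission, 0, timeout,
			     NULL, NULL, q->name, dev);
	if (ret)
		return ret;

	/* A single entity on that single run-queue: 1:1 with the scheduler. */
	return drm_sched_entity_init(&q->entity, DRM_SCHED_PRIORITY_MIN,
				     &sched, 1, NULL);
}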

Cc: Lucas Stach 
Cc: Russell King 
Cc: Qiang Yu 
Cc: Rob Clark 
Cc: Abhinav Kumar 
Cc: Dmitry Baryshkov 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Boris Brezillon 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Emma Anholt 
Cc: etna...@lists.freedesktop.org
Cc: l...@lists.freedesktop.org
Cc: linux-arm-...@vger.kernel.org
Cc: freedr...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Luben Tuikov 
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  3 +-
 drivers/gpu/drm/lima/lima_sched.c          |  3 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c       |  3 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c    |  1 +
 drivers/gpu/drm/panfrost/panfrost_job.c    |  3 +-
 drivers/gpu/drm/scheduler/sched_entity.c   | 18 +-
 drivers/gpu/drm/scheduler/sched_main.c     | 75 ++
 drivers/gpu/drm/v3d/v3d_sched.c            | 14 ++--
 include/drm/gpu_scheduler.h                |  9 ++-
 11 files changed, 102 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b54c4d771104..94d073bfbd13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2280,6 +2280,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
}
 
r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
+  DRM_SCHED_PRIORITY_COUNT,
   ring->num_hw_submission, 0,
   timeout, adev->reset_domain->wq,
   ring->sched_score, ring->name,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 78476bc75b4e..1f357198533f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -325,8 +325,8 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
int i;
 
/* Signal all jobs not yet scheduled */
-   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
-   struct drm_sched_rq *rq = &sched->sched_rq[i];
+   for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+   struct drm_sched_rq *rq = sched->sched_rq[i];
    spin_lock(&rq->lock);
    list_for_each_entry(s_entity, &rq->entities, list) {
    while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 618a804ddc34..396334984e4d 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -134,7 +134,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 {
int ret;
 
-   ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
+   ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+                        DRM_SCHED_PRIORITY_COUNT, NULL,
 etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 msecs_to_jiffies(500), NULL, NULL,
 dev_name(gpu->dev), gpu->dev);
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 8d858aed0e56..23a6276f1332 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -488,7 +488,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 
INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
-   return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
+   return drm_sched_init(&pipe->base, &lima_sched_ops,
+                         DRM_SCHED_PRIORITY_COUNT, NULL, 1,
  lima_job_hang_limit,
  msecs_to_jiffies(timeout), NULL,
  NULL, name, pipe->ldev->dev);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 1097f8e93d6b..935154979fc2 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -94,7 +94,8 @@ struct msm_ringbuffer *msm_ringbu

Re: [Nouveau] [PATCH drm-misc-next] drm/sched: support multiple rings per gpu_scheduler

2023-08-09 Thread Matthew Brost
On Thu, Aug 10, 2023 at 12:17:23AM +0200, Danilo Krummrich wrote:
> With the current mental model every GPU scheduler instance represents
> a single HW ring, while every entity represents a software queue feeding
> into one or multiple GPU scheduler instances and hence into one or
> multiple HW rings.
> 
> This does not really scale with firmware schedulers feeding the actual
> HW rings, while the driver feeds the firmware scheduler through an
> arbitrary number of dynamically created rings, since for each of those
> rings a GPU scheduler instance would be required and a separate kthread
> would be created.
> 
> To overcome this we can think of the scheduler having two modes of
> operation, single ring mode and multi ring mode. Depending on the mode
> of operation, the mental model differs.
> 
> Single ring mode (which is the default) keeps the original behaviour of
> the scheduler and its entities.
> 
> In multi ring mode (enabled by passing the GPU scheduler the
> DRM_GPU_SCHEDULER_MODE_MULTI_RING flag) each entity represents a ring,
> while the GPU scheduler itself only exists to sort out job dependencies
> and actually process the jobs of each entity.
> 
> By just applying this different mental model, the required code change
> is pretty minimalistic: While in single ring mode, if a job depends on a
> dma-fence from the same scheduler instance we only wait for the
> dependency to be scheduled (sched_fence->scheduled), in multi ring mode
> we fall through and just wait for the dependency to fully complete
> (sched_fence->finished) in order to avoid races between separate rings.
> Or in other words, to pick up the scheduler's existing terminology,
> prevent dependency pipelining.
> 
> Signed-off-by: Danilo Krummrich 
> ---
> Just before sending out this patch I was made aware of the "DRM Scheduler
> changes for XE" [1] patch series.
> 
> However, I think bringing this alternative approach into discussion could
> be useful.
> 
> At first glance it looks like both approaches semantically aim for
> the same goal, namely addressing firmware schedulers with arbitrary amounts
> of software rings to feed from the driver side.
> 
> The "XE approach" for that seems to be to make GPU scheduler instances
> scalable by submitting jobs to a workqueue rather than creating a kthread
> per scheduler instance, such that per software ring a scheduler instance can
> be created. Hence the definition of a 1:1 relationship between scheduler and
> entity.
> 
> In Nouveau we approached it differently, such that we have just one GPU
> scheduler per driver instance, while software rings are represented as
> entities. As explained in the commit message, this seems to work fine, as
> long as we can teach the scheduler to prevent dependency pipelining.
> 
> As a hack this basically already works without this patch by just setting the
> DRM_SCHED_FENCE_DONT_PIPELINE flag for all out-fences.
> 
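
A minimal sketch (not Danilo's actual patch) of the dependency decision
described in the quoted commit message. The scheduled/finished fences are the
existing drm_sched_fence members; the multi_ring parameter stands in for the
proposed DRM_GPU_SCHEDULER_MODE_MULTI_RING flag and is an assumption about how
the mode would be queried:

/* Pick which fence of a same-scheduler dependency to wait on. */
static struct dma_fence *
my_pick_dependency_fence(struct drm_gpu_scheduler *sched,
			 struct drm_sched_fence *s_fence,
			 bool multi_ring)
{
	/*
	 * Single ring mode: the dependency runs on the same HW ring, so
	 * waiting for it to be *scheduled* is enough (dependency pipelining).
	 * Multi ring mode: entities map to different HW rings, so wait for
	 * the dependency to fully *finish* to avoid cross-ring races.
	 */
	if (s_fence->sched == sched && !multi_ring)
		return dma_fence_get(&s_fence->scheduled);

	return dma_fence_get(&s_fence->finished);
}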

A few questions.

1. With different rings + a firmware scheduler, how do you deal with the
pending list being in-order while jobs from different rings complete out of
order? The pending list's in-order nature is required for free_job to be
called and for the TDR to work properly (see the sketch after these questions).

2. How do you flow control each ring in Nouveau?

3. Do you support per-ring reset? Again, how does this work if the
pending list has jobs from multiple rings?
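
For reference, a simplified sketch of why the in-order pending list matters:
job cleanup only ever examines the head of the pending list (this loosely
mirrors the existing drm_sched_get_cleanup_job() logic; the helper name here
is made up):

static struct drm_sched_job *
my_peek_cleanup_job(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);

	/*
	 * Only the head is considered: a signaled job stuck behind an
	 * unsignaled head is never handed to free_job(), and the TDR
	 * assumes the head is the oldest unfinished job.
	 */
	if (job && dma_fence_is_signaled(&job->s_fence->finished))
		return job;

	return NULL;
}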

The answer to all of these in Xe is the 1-to-1 relationship between
scheduler and entity. In addition to addressing the above questions, the
1-to-1 relationship allows sleeping in the main worker in the callback
functions, which we use for our preempt fence implementation.

This implementation as-is will not work for Xe. Also, in Xe it is desirable
to have multiple CPUs submitting jobs in parallel.

Matt

> [1] 
> https://lore.kernel.org/intel-xe/20230801205103.627779-1-matthew.br...@intel.com/T/#t
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c|  2 +-
>  drivers/gpu/drm/lima/lima_sched.c  |  2 +-
>  drivers/gpu/drm/msm/msm_ringbuffer.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c|  7 +--
>  drivers/gpu/drm/panfrost/panfrost_job.c|  2 +-
>  drivers/gpu/drm/scheduler/sched_entity.c   |  3 +-
>  drivers/gpu/drm/scheduler/sched_main.c | 55 +++---
>  drivers/gpu/drm/v3d/v3d_sched.c| 10 ++--
>  include/drm/gpu_scheduler.h| 18 ++-
>  10 files changed, 72 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a2cdde0ca0a7..eed6f56e3957 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2491,7 +2491,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>   break;
>   }
>  
> - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> + r = drm_sched_init(&ring->sched, 0, 

Re: [Nouveau] [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-10 Thread Matthew Brost
On Fri, Jul 07, 2023 at 02:52:41PM +0200, Boris Brezillon wrote:
> On Fri, 7 Jul 2023 14:41:23 +0200
> Danilo Krummrich  wrote:
> 
> > >> + va__ && (va__->va.addr < (end__)) && \
> > >> + !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> > >> + va__ = list_next_entry(va__, rb.entry))  
> > > 
> > > If you define:
> > > 
> > > static inline struct drm_gpuva *
> > > drm_gpuva_next(struct drm_gpuva *va)
> > > {
> > >   if (va && !list_is_last(&va->rb.entry, &va->mgr->rb.list))
> > >   return list_next_entry(va, rb.entry);
> > > 
> > >   return NULL;
> > > }
> > > the for loop becomes a bit more readable:  
> > 
> > Yes, it would. However, I don't want it to be confused with 
> > drm_gpuva_find_next(). Maybe I should rename the latter to something 
> > like drm_gpuva_find_next_neighbor() then.
> 
> If you want to keep drm_gpuva_find_next(), feel free to rename/prefix
> the drm_gpuva_next() function. I was just posting it as a reference.
> 
> > 
> > > 
> > >   for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \
> > >va__ && (va__->va.addr < (end__)); \
> > >va__ = drm_gpuva_next(va__))
> > >   
> > >> +
> > >> +/**
> > >> + * drm_gpuva_for_each_va_range_safe - iterator to safely walk over a range of
> > >> + * &drm_gpuvas
> > >> + * @va__: &drm_gpuva to assign to in each iteration step
> > >> + * @next__: another &drm_gpuva to use as temporary storage
> > >> + * @mgr__: &drm_gpuva_manager to walk over
> > >> + * @start__: starting offset, the first gpuva will overlap this
> > >> + * @end__: ending offset, the last gpuva will start before this (but may
> > >> + * overlap)
> > >> + *
> > >> + * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
> > >> + * between @start__ and @end__. It is implemented similarly to
> > >> + * list_for_each_safe(), but is using the &drm_gpuva_manager's internal interval
> > >> + * tree to accelerate the search for the starting &drm_gpuva, and hence is safe
> > >> + * against removal of elements. It assumes that @end__ is within (or is the
> > >> + * upper limit of) the &drm_gpuva_manager. This iterator does not skip over the
> > >> + * &drm_gpuva_manager's @kernel_alloc_node.
> > >> + */
> > >> +#define drm_gpuva_for_each_va_range_safe(va__, next__, mgr__, start__, end__) \
> > >> +for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)), \
> > >> + next__ = va ? list_next_entry(va__, rb.entry) : NULL; \
> > >> + va__ && (va__->va.addr < (end__)) && \
> > >> + !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> > >> + va__ = next__, next__ = list_next_entry(va__, rb.entry))  
> > > 
> > > And this is the safe version using the drm_gpuva_next() helper:
> > > 
> > >   for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)), \
> > >next__ = drm_gpuva_next(va__); \
> > >va__ && (va__->va.addr < (end__)); \
> > >va__ = next__, next__ = drm_gpuva_next(va__))
> > > 
> > > Those changes fixed an invalid pointer access I had in the sm_unmap()
> > > path.
> > >   
> > 
> > Sorry you did run into this bug.
> 
> No worries, that's what testing/debugging/reviewing is for. And I'm glad
> someone decided to work on this gpuva stuff so I don't have to code it
> myself, so that's the least I can do.
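
For completeness, a usage sketch of the range iterator discussed above
(hypothetical driver code: my_driver_gpuva_unmap() is made up, while the
iterator, types, and manager come from the quoted patch):

/* Tear down all mappings overlapping [start, end) in one VA space. */
static void my_driver_unmap_range(struct drm_gpuva_manager *mgr,
				  u64 start, u64 end)
{
	struct drm_gpuva *va, *next;

	/* The _safe variant allows removing 'va' from the tree in the body. */
	drm_gpuva_for_each_va_range_safe(va, next, mgr, start, end) {
		my_driver_gpuva_unmap(va);
	}
}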

With Boris's changes this version works in Xe.

With that:

Acked-by: Matthew Brost 
Tested-by: Matthew Brost 


Re: [Nouveau] [PATCH drm-next v5 03/14] drm: manager to keep track of GPUs VA mappings

2023-06-27 Thread Matthew Brost
On Thu, Jun 22, 2023 at 05:07:11PM +0200, Danilo Krummrich wrote:
> On 6/22/23 17:04, Danilo Krummrich wrote:
> > On 6/22/23 16:42, Christian König wrote:
> > > Am 22.06.23 um 16:22 schrieb Danilo Krummrich:
> > > > On 6/22/23 15:54, Christian König wrote:
> > > > > Am 20.06.23 um 14:23 schrieb Danilo Krummrich:
> > > > > > Hi Christian,
> > > > > > 
> > > > > > On 6/20/23 08:45, Christian König wrote:
> > > > > > > Hi Danilo,
> > > > > > > 
> > > > > > > sorry for the delayed reply. I've been trying to dig
> > > > > > > myself out of a hole at the moment.
> > > > > > 
> > > > > > No worries, thank you for taking a look anyway!
> > > > > > 
> > > > > > > 
> > > > > > > Am 20.06.23 um 02:42 schrieb Danilo Krummrich:
> > > > > > > > [SNIP]
> > > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > > index bbc721870c13..5ec8148a30ee 100644
> > > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > > @@ -36,6 +36,8 @@
> > > > > > > >   #include 
> > > > > > > >   #include 
> > > > > > > > +#include 
> > > > > > > > +#include 
> > > > > > > >   #include 
> > > > > > > > @@ -379,6 +381,18 @@ struct drm_gem_object {
> > > > > > > >    */
> > > > > > > >   struct dma_resv _resv;
> > > > > > > > +    /**
> > > > > > > > + * @gpuva:
> > > > > > > > + *
> > > > > > > > + * Provides the list of GPU VAs attached to this GEM object.
> > > > > > > > + *
> > > > > > > > + * Drivers should lock list accesses with the GEMs &dma_resv lock
> > > > > > > > + * (&drm_gem_object.resv).
> > > > > > > > + */
> > > > > > > > +    struct {
> > > > > > > > +    struct list_head list;
> > > > > > > > +    } gpuva;
> > > > > > > > +
> > > > > > > >   /**
> > > > > > > >    * @funcs:
> > > > > > > >    *
> > > > > > > 
> > > > > > > I'm pretty sure that it's not a good idea to attach
> > > > > > > this directly to the GEM object.
> > > > > > 
> > > > > > Why do you think so? IMHO having a common way to connect
> > > > > > mappings to their backing buffers is a good thing, since
> > > > > > every driver needs this connection anyway.
> > > > > > 
> > > > > > E.g. when a BO gets evicted, drivers can just iterate
> > > > > > the list of mappings and, as the circumstances require,
> > > > > > invalidate the corresponding mappings or unmap all
> > > > > > existing mappings of a given buffer.
> > > > > > 
> > > > > > What would be the advantage of letting every driver
> > > > > > implement a driver-specific way of keeping this
> > > > > > connection?
> > > > > 
> > > > > Flexibility. For example on amdgpu the mappings of a BO are
> > > > > grouped by VM address space.
> > > > > 
> > > > > E.g. the BO points to multiple bo_vm structures which in
> > > > > turn have lists of their mappings.
> > > > 
> > > > Isn't this (almost) the same relationship I introduce with the
> > > > GPUVA manager?
> > > > 
> > > > If you would switch over to the GPUVA manager right now, it
> > > > would be that every GEM has a list of its mappings (the gpuva
> > > > list). The mapping is represented by struct drm_gpuva (of course
> > > > embedded in driver specific structure(s)) which has a pointer to
> > > > the VM address space it is part of, namely the GPUVA manager
> > > > instance. And the GPUVA manager keeps a maple tree of its
> > > > mappings as well.
> > > > 
> > > > If you still would like to *directly* (indirectly you already
> > > > have that relationship) keep a list of GPUVA managers (VM
> > > > address spaces) per GEM, you could still do that in a driver
> > > > specific way.
> > > > 
> > > > Do I miss something?
> > > 
> > > How do you efficiently find only the mappings of a BO in one VM?
> > 
> > Actually, I think this case should even be more efficient than with a BO
> > having a list of GPUVAs (or mappings):
> 
> *than with a BO having a list of VMs:
> 
> > 
> > Having a list of GPUVAs per GEM, each GPUVA has a pointer to its VM.
> > Hence, you'd only need to iterate the list of mappings for a given BO
> > and check each mapping's VM pointer.
> > 
> > Having a list of VMs per BO, you'd have to iterate the whole VM to find
> > the mappings having a pointer to the given BO, right?
> > 
> > I'd think that a single VM potentially has more mapping entries than the
> > number of VMs a single BO is mapped into.
> > 
> > Another case to consider is the one I originally had in mind when choosing
> > this relationship: finding all mappings for a given BO, which I guess
> > all drivers need to do in order to invalidate mappings on BO eviction.
> > 
> > Having a list of VMs per BO, wouldn't you need to iterate all of the VMs
> > entirely?
> > 
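
A minimal sketch of the per-VM lookup Danilo describes above, assuming the
GEM-side gpuva list from the quoted patch; the list-entry member name, the
va->mgr pointer name, and my_driver_invalidate_mapping() are assumptions or
hypothetical:

/* Invalidate the mappings of a BO that belong to one VM (GPUVA manager). */
static void my_driver_invalidate_bo_mappings(struct drm_gem_object *obj,
					     struct drm_gpuva_manager *mgr)
{
	struct drm_gpuva *va;

	/* The gpuva list is expected to be protected by the GEM's dma_resv. */
	dma_resv_assert_held(obj->resv);

	list_for_each_entry(va, &obj->gpuva.list, gem.entry) {
		if (va->mgr != mgr)	/* keep only mappings of this VM */
			continue;
		my_driver_invalidate_mapping(va);
	}
}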

FWIW I agree with Danilo here on basically all points.

- Having a GPUVA list per GEM makes a ton of sense wrt eviction and
  invalidation. Xe had this list prior to GPUVA; after GPUVA it is just
  in a common place.
- From a GPUVA you can resolve a GEM
- A GPUVA can have a NULL GEM