Re: [PATCH v4 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-26 Thread Boris Brezillon
On Thu, 8 Jul 2021 14:10:45 +0200
Christian König  wrote:

> >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >> @@ -254,6 +254,9 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
> >>return ret;
> >>}
> >>   
> >> +  if (job->bo_flags[i] & PANFROST_BO_REF_NO_IMPLICIT_DEP)
> >> +  continue;  
> > This breaks dma_resv rules. I'll send out patch set fixing this pattern in
> > other drivers, I'll ping you on that for what you need to change. Should
> > go out today or so.

I guess you're talking about [1]. TBH, I don't quite see the point of
exposing a 'no-implicit' flag if we end up forcing this implicit dep
anyway, but I'm probably missing something.

> 
> I'm really wondering if the behavior where the exclusive fence replaces 
> all the shared fences was such a good idea.

Is that what's done in [1], or are you talking about a different
patchset/approach?

> 
> It just allows drivers to mess up things in a way which can be easily 
> used to compromise the system.

I must admit I'm a bit lost, so I'm tempted to drop that flag for now
:-).

[1] https://patchwork.freedesktop.org/patch/443711/?series=92334&rev=3


Re: [PATCH v4 00/18] drm/sched dependency tracking and dma-resv fixes

2021-07-27 Thread Boris Brezillon
On Mon, 12 Jul 2021 19:53:34 +0200
Daniel Vetter  wrote:

> Hi all,
> 
> Quick new version since the previous one was a bit too broken:
> - dropped the bug-on patch to avoid breaking amdgpu's gpu reset failure
>   games
> - another attempt at splitting job_init/arm, hopefully we're getting
>   there.
> 
> Note that Christian has brought up a bikeshed on the new functions to add
> dependencies to drm_sched_jobs. I'm happy to repaint, if there's some kind
> of consensus on what it should be.
> 
> Testing and review very much welcome, as usual.
> 
> Cheers, Daniel
> 
> Daniel Vetter (18):
>   drm/sched: Split drm_sched_job_init
>   drm/sched: Barriers are needed for entity->last_scheduled
>   drm/sched: Add dependency tracking
>   drm/sched: drop entity parameter from drm_sched_push_job
>   drm/sched: improve docs around drm_sched_entity

Patches 1, 3, 4 and 5 are

Reviewed-by: Boris Brezillon 

>   drm/panfrost: use scheduler dependency tracking
>   drm/lima: use scheduler dependency tracking
>   drm/v3d: Move drm_sched_job_init to v3d_job_init
>   drm/v3d: Use scheduler dependency handling
>   drm/etnaviv: Use scheduler dependency handling
>   drm/gem: Delete gem array fencing helpers
>   drm/sched: Don't store self-dependencies
>   drm/sched: Check locking in drm_sched_job_await_implicit
>   drm/msm: Don't break exclusive fence ordering
>   drm/etnaviv: Don't break exclusive fence ordering
>   drm/i915: delete exclude argument from i915_sw_fence_await_reservation
>   drm/i915: Don't break exclusive fence ordering
>   dma-resv: Give the docs a do-over
> 
>  Documentation/gpu/drm-mm.rst  |   3 +
>  drivers/dma-buf/dma-resv.c|  24 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |   4 +-
>  drivers/gpu/drm/drm_gem.c |  96 -
>  drivers/gpu/drm/etnaviv/etnaviv_gem.h |   5 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |  64 +++---
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  65 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.h   |   3 +-
>  drivers/gpu/drm/i915/display/intel_display.c  |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |   2 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   8 +-
>  drivers/gpu/drm/i915/i915_sw_fence.c  |   6 +-
>  drivers/gpu/drm/i915/i915_sw_fence.h  |   1 -
>  drivers/gpu/drm/lima/lima_gem.c   |   7 +-
>  drivers/gpu/drm/lima/lima_sched.c |  28 +--
>  drivers/gpu/drm/lima/lima_sched.h |   6 +-
>  drivers/gpu/drm/msm/msm_gem_submit.c  |   3 +-
>  drivers/gpu/drm/panfrost/panfrost_drv.c   |  16 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c   |  39 +---
>  drivers/gpu/drm/panfrost/panfrost_job.h   |   5 +-
>  drivers/gpu/drm/scheduler/sched_entity.c  | 140 +++--
>  drivers/gpu/drm/scheduler/sched_fence.c   |  19 +-
>  drivers/gpu/drm/scheduler/sched_main.c| 181 +++--
>  drivers/gpu/drm/v3d/v3d_drv.h |   6 +-
>  drivers/gpu/drm/v3d/v3d_gem.c | 115 +--
>  drivers/gpu/drm/v3d/v3d_sched.c   |  44 +
>  include/drm/drm_gem.h |   5 -
>  include/drm/gpu_scheduler.h   | 186 ++
>  include/linux/dma-buf.h   |   7 +
>  include/linux/dma-resv.h  | 104 +-
>  31 files changed, 672 insertions(+), 528 deletions(-)
> 



Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-12 Thread Boris Brezillon
On Thu, 11 Mar 2021 12:16:33 +
Steven Price  wrote:

> Also the current code completely ignores PANFROST_BO_REF_READ. So either 
> that should be defined as 0, or even better we support 3 modes:
> 
>   * Exclusive ('write' access)
>   * Shared ('read' access)
>   * No fence - ensures the BO is mapped, but doesn't add any implicit 
> fences.

I looked at other drivers and they seem to have this
NO_FENCE/NO_IMPLICIT flag at the job/submit level, conditioned on
the presence of an in_sync_fd file. I can see per-BO NO_FENCE being
useful for sub-buffer accesses, assuming we don't want to deal with
other deps explicitly, but we could also force the NO_IMPLICIT behavior
for the whole job and let the user deal with all deps explicitly. TBH,
it's a bit unclear to me how that feature would be used by the vulkan
driver, so I'd rather leave that NO_FENCE use case for later.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-12 Thread Boris Brezillon
On Fri, 12 Mar 2021 09:37:49 -0600
Jason Ekstrand  wrote:

> On Fri, Mar 12, 2021 at 1:31 AM Boris Brezillon
>  wrote:
> >
> > On Thu, 11 Mar 2021 12:11:48 -0600
> > Jason Ekstrand  wrote:
> >  
> > > > > > > > 2/ Queued jobs might be executed out-of-order (unless they have
> > > > > > > > explicit/implicit deps between them), and Vulkan asks that 
> > > > > > > > the out
> > > > > > > > fence be signaled when all jobs are done. Timeline syncobjs 
> > > > > > > > are a
> > > > > > > > good match for that use case. All we need to do is pass the 
> > > > > > > > same
> > > > > > > > fence syncobj to all jobs being attached to a single 
> > > > > > > > QueueSubmit
> > > > > > > > request, but a different point on the timeline. The syncobj
> > > > > > > > timeline wait does the rest and guarantees that we've 
> > > > > > > > reached a
> > > > > > > > given timeline point (IOW, all jobs before that point are 
> > > > > > > > done)
> > > > > > > > before declaring the fence as signaled.
> > > > > > > > One alternative would be to have dummy 'synchronization' 
> > > > > > > > jobs that
> > > > > > > > don't actually execute anything on the GPU but declare a 
> > > > > > > > dependency
> > > > > > > > on all other jobs that are part of the QueueSubmit request, 
> > > > > > > > and
> > > > > > > > signal the out fence (the scheduler would do most of the 
> > > > > > > > work for
> > > > > > > > us, all we have to do is support NULL job heads and signal 
> > > > > > > > the
> > > > > > > > fence directly when that happens instead of queueing the 
> > > > > > > > job).  
> > > > > > >
> > > > > > > I have to admit to being rather hazy on the details of timeline
> > > > > > > syncobjs, but I thought there was a requirement that the timeline 
> > > > > > > moves
> > > > > > > monotonically. I.e. if you have multiple jobs signalling the same
> > > > > > > syncobj just with different points, then AFAIU the API requires 
> > > > > > > that the
> > > > > > > points are triggered in order.  
> > > > > >
> > > > > > I only started looking at the SYNCOBJ_TIMELINE API a few days ago, 
> > > > > > so I
> > > > > > might be wrong, but my understanding is that queuing fences 
> > > > > > (addition
> > > > > > of new points in the timeline) should happen in order, but signaling
> > > > > > can happen in any order. When checking for a signaled fence the
> > > > > > fence-chain logic starts from the last point (or from an explicit 
> > > > > > point
> > > > > > if you use the timeline wait flavor) and goes backward, stopping at 
> > > > > > the
> > > > > > first un-signaled node. If I'm correct, that means that fences that
> > > > > > are part of a chain can be signaled in any order.  
> > > > >
> > > > > You don't even need a timeline for this.  Just have a single syncobj
> > > > > per-queue and make each submit wait on it and then signal it.
> > > > > Alternatively, you can just always hang on to the out-fence from the
> > > > > previous submit and make the next one wait on that.  
> > > >
> > > > That's what I have right now, but it forces the serialization of all
> > > > jobs that are pushed during a submit (and there can be more than one
> > > > per command buffer on panfrost :-/). Maybe I'm wrong, but I thought it'd
> > > > be better to not force this serialization if we can avoid it.  
> > >
> > > I'm not familiar with panfrost's needs and I don't work on a tiler and
> > > I know there are different issues there.  But...
> > >
> > > The Vulkan spec requires that all the submits that
> > > happen on a given vkQueue happen in-order.  Search the spec for

Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-12 Thread Boris Brezillon
On Fri, 12 Mar 2021 19:25:13 +0100
Boris Brezillon  wrote:

> > So where does this leave us?  Well, it depends on your submit model
> > and exactly how you handle pipeline barriers that sync between
> > engines.  If you're taking option 3 above and doing two command
> > buffers for each VkCommandBuffer, then you probably want two
> > serialized timelines, one for each engine, and some mechanism to tell
> > the kernel driver "these two command buffers have to run in parallel"
> > so that your ping-pong works.  If you're doing 1 or 2 above, I think
> > you probably still want two simple syncobjs, one for each engine.  You
> > don't really have any need to go all that far back in history.  All
> > you really need to describe is "command buffer X depends on previous
> > compute work" or "command buffer X depends on previous binning work".  
> 
> Okay, so this will effectively force in-order execution. Let's take your
> previous example and add 2 more jobs at the end that have no deps on
> previous commands:
> 
> vkBeginRenderPass() /* Writes to ImageA */
> vkCmdDraw()
> vkCmdDraw()
> ...
> vkEndRenderPass()
> vkPipelineBarrier(imageA /* fragment -> compute */)
> vkCmdDispatch() /* reads imageA, writes BufferB */
> vkBeginRenderPass() /* Writes to ImageC */
> vkCmdBindVertexBuffers(bufferB)
> vkCmdDraw();
> ...
> vkEndRenderPass()
> vkBeginRenderPass() /* Writes to ImageD */
> vkCmdDraw()
> ...
> vkEndRenderPass()
> 
> A: Vertex for the first draw on the compute engine
> B: Vertex for the second draw on the compute engine
> C: Fragment for the first draw on the binning engine; depends on A
> D: Fragment for the second draw on the binning engine; depends on B
> E: Compute on the compute engine; depends on C and D
> F: Vertex for the third draw on the compute engine; depends on E
> G: Fragment for the third draw on the binning engine; depends on F
> H: Vertex for the fourth draw on the compute engine
> I: Fragment for the fourth draw on the binning engine
> 
> When we reach E, we might be waiting for D to finish before scheduling
> the job, and because of the implicit serialization we have on the
> compute queue (F implicitly depends on E, and H on F) we can't schedule
> H either, which could, in theory, be started. I guess that's where the
> term submission order is a bit unclear to me. The action of starting a
> job sounds like execution order to me (the order you start jobs
> determines the execution order since we only have one HW queue per job
> type). All implicit deps have been calculated when we queued the job to
> the SW queue, and I thought that would be enough to meet the submission
> order requirements, but I might be wrong.
> 
> The PoC I have was trying to get rid of this explicit serialization on
> the compute and fragment queues by having one syncobj timeline
> (queue()) and synchronization points (Sx).
> 
> S0: in-fences=, out-fences= #waitSemaphore 
> sync point
> A: in-fences=, out-fences=
> B: in-fences=, out-fences=
> C: in-fences=, out-fence= #implicit dep on A through 
> the tiler context
> D: in-fences=, out-fence= #implicit dep on B through 
> the tiler context
> E: in-fences=, out-fence= #implicit dep on D through 
> imageA
> F: in-fences=, out-fence= #implicit dep on E through 
> buffer B
> G: in-fences=, out-fence= #implicit dep on F through 
> the tiler context
> H: in-fences=, out-fence=
> I: in-fences=, out-fence= #implicit dep on H through 
> the tiler buffer
> S1: in-fences=, out-fences= 
> #signalSemaphore,fence sync point
> # QueueWaitIdle is implemented with a wait(queue(0)), AKA wait on the last 
> point
> 
> With this solution H can be started before E if the compute slot
> is empty and E's implicit deps are not done. It's probably overkill,
> but I thought maximizing GPU utilization was important.

Nevermind, I forgot the drm scheduler was dequeuing jobs in order, so 2
syncobjs (one per queue type) is indeed the right approach.


[PATCH 0/3] drm/panfrost: MMU fixes

2021-02-01 Thread Boris Brezillon
Hello,

Here are 2 fixes and one improvement for the page fault handling. Those
bugs were found while working on indirect draw support, which requires
the allocation of a big heap buffer for varyings, and the vertex/tiler
shaders seem to have access patterns that trigger those issues. I
remember discussing the first issue with Steve or Robin a while back,
but we never hit it before (now we do :)).

The last patch is a perf improvement: no need to re-enable hardware
interrupts if we know the threaded irq handler will be woken up right
away.

Regards,

Boris

Boris Brezillon (3):
  drm/panfrost: Clear MMU irqs before handling the fault
  drm/panfrost: Don't try to map pages that are already mapped
  drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
all IRQs

 drivers/gpu/drm/panfrost/panfrost_mmu.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

-- 
2.26.2



[PATCH 2/3] drm/panfrost: Don't try to map pages that are already mapped

2021-02-01 Thread Boris Brezillon
We allocate 2MB chunks at a time, so it might appear that a page fault
has already been handled by a previous page fault when we reach
panfrost_mmu_map_fault_addr(). Bail out in that case to avoid mapping the
same area twice.

Cc: 
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 904d63450862..21e552d1ac71 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -488,8 +488,14 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
}
bo->base.pages = pages;
bo->base.pages_use_count = 1;
-   } else
+   } else {
pages = bo->base.pages;
+   if (pages[page_offset]) {
+   /* Pages are already mapped, bail out. */
+   mutex_unlock(&bo->base.pages_lock);
+   goto out;
+   }
+   }
 
mapping = bo->base.base.filp->f_mapping;
mapping_set_unevictable(mapping);
@@ -522,6 +528,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 
dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
+out:
panfrost_gem_mapping_put(bomapping);
 
return 0;
-- 
2.26.2



[PATCH 1/3] drm/panfrost: Clear MMU irqs before handling the fault

2021-02-01 Thread Boris Brezillon
When a fault is handled it will unblock the GPU which will continue
executing its shader and might fault almost immediately on a different
page. If we clear interrupts after handling the fault we might miss new
faults, so clear them before.

Cc: 
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 7c1b3481b785..904d63450862 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -593,6 +593,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
access_type = (fault_status >> 8) & 0x3;
source_id = (fault_status >> 16);
 
+   mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
/* Page fault only */
ret = -1;
if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
@@ -616,8 +618,6 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
-   mmu_write(pfdev, MMU_INT_CLEAR, mask);
-
status &= ~mask;
}
 
-- 
2.26.2



[PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs

2021-02-01 Thread Boris Brezillon
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..65bc20628c4e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
int i, ret;
 
+again:
+
for (i = 0; status; i++) {
u32 mask = BIT(i) | BIT(i + 16);
u64 addr;
@@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
status &= ~mask;
}
 
+   /* If we received new MMU interrupts, process them before returning. */
+   status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+   if (status)
+   goto again;
+
mmu_write(pfdev, MMU_INT_MASK, ~0);
return IRQ_HANDLED;
 };
-- 
2.26.2



Re: [PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs

2021-02-01 Thread Boris Brezillon
On Mon, 1 Feb 2021 12:13:49 +
Steven Price  wrote:

> On 01/02/2021 08:21, Boris Brezillon wrote:
> > Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> > in the threaded irq handler as long as we can.
> > 
> > Signed-off-by: Boris Brezillon   
> 
> Looks fine to me, but I'm interested to know if you actually saw a 
> performance improvement. Back-to-back MMU faults should (hopefully) be 
> fairly uncommon.

I actually didn't check the perf improvement or the actual number of
back-to-back MMU faults, but
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
seemed to generate a few of those, so I thought it'd be good to
optimize that case given how trivial it is.

> 
> Regardless:
> 
> Reviewed-by: Steven Price 
> 
> > ---
> >   drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 21e552d1ac71..65bc20628c4e 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> > u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> > int i, ret;
> >   
> > +again:
> > +
> > for (i = 0; status; i++) {
> > u32 mask = BIT(i) | BIT(i + 16);
> > u64 addr;
> > @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> > status &= ~mask;
> > }
> >   
> > +   /* If we received new MMU interrupts, process them before returning. */
> > +   status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> > +   if (status)
> > +   goto again;
> > +
> > mmu_write(pfdev, MMU_INT_MASK, ~0);
> > return IRQ_HANDLED;
> >   };
> >   
> 



Re: [PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs

2021-02-01 Thread Boris Brezillon
On Mon, 1 Feb 2021 13:24:00 +
Steven Price  wrote:

> On 01/02/2021 12:59, Boris Brezillon wrote:
> > On Mon, 1 Feb 2021 12:13:49 +
> > Steven Price  wrote:
> >   
> >> On 01/02/2021 08:21, Boris Brezillon wrote:  
> >>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> >>> in the threaded irq handler as long as we can.
> >>>
> >>> Signed-off-by: Boris Brezillon   
> >>
> >> Looks fine to me, but I'm interested to know if you actually saw a
> >> performance improvement. Back-to-back MMU faults should (hopefully) be
> >> fairly uncommon.  
> > 
> > I actually didn't check the perf improvement or the actual number of
> > back-to-back MMU faults, but
> > dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> > seemed to generate a few of those, so I thought it'd be good to
> > optimize that case given how trivial it is.  
> 
> Fair enough! I was just a little concerned that Panfrost was somehow 
> provoking enough interrupts that this was a measurable performance 
> improvement.
> 
> I assume you'll push these to drm-misc-next (/fixes) as appropriate.

I think I'll just queue everything to misc-next and let the stable
maintainers backport the fixes (no one complained about this issue so
far). 

> 
> Thanks,
> 
> Steve
> 
> >>
> >> Regardless:
> >>
> >> Reviewed-by: Steven Price 
> >>  
> >>> ---
> >>>drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
> >>>1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> index 21e552d1ac71..65bc20628c4e 100644
> >>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>   u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>>   int i, ret;
> >>>
> >>> +again:
> >>> +
> >>>   for (i = 0; status; i++) {
> >>>   u32 mask = BIT(i) | BIT(i + 16);
> >>>   u64 addr;
> >>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>   status &= ~mask;
> >>>   }
> >>>
> >>> + /* If we received new MMU interrupts, process them before returning. */
> >>> + status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>> + if (status)
> >>> + goto again;
> >>> +
> >>>   mmu_write(pfdev, MMU_INT_MASK, ~0);
> >>>   return IRQ_HANDLED;
> >>>};
> >>>  
> >>  
> >   
> 



[PATCH v2 0/3] drm/panfrost: MMU fixes

2021-02-05 Thread Boris Brezillon
Hello,

Here are 2 fixes and one improvement for the page fault handling. Those
bugs were found while working on indirect draw support, which requires
the allocation of a big heap buffer for varyings, and the vertex/tiler
shaders seem to have access patterns that trigger those issues. I
remember discussing the first issue with Steve or Robin a while back,
but we never hit it before (now we do :)).

The last patch is a perf improvement: no need to re-enable hardware
interrupts if we know the threaded irq handler will be woken up right
away.

Regards,

Boris

Changes in v2:
* Rework the MMU irq handling loop to avoid a goto

Boris Brezillon (3):
  drm/panfrost: Clear MMU irqs before handling the fault
  drm/panfrost: Don't try to map pages that are already mapped
  drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
all IRQs

 drivers/gpu/drm/panfrost/panfrost_mmu.c | 39 +++--
 1 file changed, 24 insertions(+), 15 deletions(-)

-- 
2.26.2



[PATCH v2 1/3] drm/panfrost: Clear MMU irqs before handling the fault

2021-02-05 Thread Boris Brezillon
When a fault is handled it will unblock the GPU which will continue
executing its shader and might fault almost immediately on a different
page. If we clear interrupts after handling the fault we might miss new
faults, so clear them before.

Cc: 
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 7c1b3481b785..904d63450862 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -593,6 +593,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
access_type = (fault_status >> 8) & 0x3;
source_id = (fault_status >> 16);
 
+   mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
/* Page fault only */
ret = -1;
if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
@@ -616,8 +618,6 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
-   mmu_write(pfdev, MMU_INT_CLEAR, mask);
-
status &= ~mask;
}
 
-- 
2.26.2



[PATCH v2 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs

2021-02-05 Thread Boris Brezillon
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

v2:
* Rework the loop to avoid a goto

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 26 +
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..0581186ebfb3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -578,22 +578,20 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 {
struct panfrost_device *pfdev = data;
u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
-   int i, ret;
+   int ret;
 
-   for (i = 0; status; i++) {
-   u32 mask = BIT(i) | BIT(i + 16);
+   while (status) {
+   u32 as = ffs(status | (status >> 16)) - 1;
+   u32 mask = BIT(as) | BIT(as + 16);
u64 addr;
u32 fault_status;
u32 exception_type;
u32 access_type;
u32 source_id;
 
-   if (!(status & mask))
-   continue;
-
-   fault_status = mmu_read(pfdev, AS_FAULTSTATUS(i));
-   addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(i));
-   addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(i)) << 32;
+   fault_status = mmu_read(pfdev, AS_FAULTSTATUS(as));
+   addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(as));
+   addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(as)) << 32;
 
/* decode the fault status */
exception_type = fault_status & 0xFF;
@@ -604,8 +602,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 
/* Page fault only */
ret = -1;
-   if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
-   ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
+   if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0)
+   ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
 
if (ret)
/* terminal fault, print info about the fault */
@@ -617,7 +615,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
"exception type 0x%X: %s\n"
"access type 0x%X: %s\n"
"source id 0x%X\n",
-   i, addr,
+   as, addr,
"TODO",
fault_status,
(fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"),
@@ -626,6 +624,10 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
source_id);
 
status &= ~mask;
+
+   /* If we received new MMU interrupts, process them before returning. */
+   if (!status)
+   status = mmu_read(pfdev, MMU_INT_RAWSTAT);
}
 
mmu_write(pfdev, MMU_INT_MASK, ~0);
-- 
2.26.2



[PATCH v2 2/3] drm/panfrost: Don't try to map pages that are already mapped

2021-02-05 Thread Boris Brezillon
We allocate 2MB chunks at a time, so it might appear that a page fault
has already been handled by a previous page fault when we reach
panfrost_mmu_map_fault_addr(). Bail out in that case to avoid mapping the
same area twice.

Cc: 
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 904d63450862..21e552d1ac71 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -488,8 +488,14 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
}
bo->base.pages = pages;
bo->base.pages_use_count = 1;
-   } else
+   } else {
pages = bo->base.pages;
+   if (pages[page_offset]) {
+   /* Pages are already mapped, bail out. */
+   mutex_unlock(&bo->base.pages_lock);
+   goto out;
+   }
+   }
 
mapping = bo->base.base.filp->f_mapping;
mapping_set_unevictable(mapping);
@@ -522,6 +528,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 
dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
+out:
panfrost_gem_mapping_put(bomapping);
 
return 0;
-- 
2.26.2



Re: [PATCH v2 0/3] drm/panfrost: MMU fixes

2021-02-15 Thread Boris Brezillon
On Fri,  5 Feb 2021 12:17:54 +0100
Boris Brezillon  wrote:

> Hello,
> 
> Here are 2 fixes and one improvement for the page fault handling. Those
> bugs were found while working on indirect draw support, which requires
> the allocation of a big heap buffer for varyings, and the vertex/tiler
> shaders seem to have access patterns that trigger those issues. I
> remember discussing the first issue with Steve or Robin a while back,
> but we never hit it before (now we do :)).
> 
> The last patch is a perf improvement: no need to re-enable hardware
> interrupts if we know the threaded irq handler will be woken up right
> away.
> 
> Regards,
> 
> Boris
> 
> Changes in v2:
> * Rework the MMU irq handling loop to avoid a goto

Queued to drm-misc-next.

> 
> Boris Brezillon (3):
>   drm/panfrost: Clear MMU irqs before handling the fault
>   drm/panfrost: Don't try to map pages that are already mapped
>   drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
> all IRQs
> 
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 39 +++--
>  1 file changed, 24 insertions(+), 15 deletions(-)
> 



Re: [PATCH] drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff

2021-02-22 Thread Boris Brezillon
 whether the
> > >>>>>>> best (short term) solution is to actually disable the shrinker. I'm
> > >>>>>>> somewhat surprised that nobody has got fed up with the "Purging xxx
> > >>>>>>> bytes" message spam - which makes me think that most people are not
> > >>>>>>> hitting memory pressure to trigger the shrinker.  
> > >>>>>>
> > >>>>>> If the shrinker is dodgy, then it's probably good to have the 
> > >>>>>> messages
> > >>>>>> to know if it ran.
> > >>>>>>  
> > >>>>>>> The shrinker on kbase caused a lot of grief - and the only way I 
> > >>>>>>> managed
> > >>>>>>> to get that under control was by writing a static analysis tool for 
> > >>>>>>> the
> > >>>>>>> locking, and by upsetting people by enforcing the (rather dumb) 
> > >>>>>>> rules of
> > >>>>>>> the tool on the code base. I've been meaning to look at whether 
> > >>>>>>> sparse
> > >>>>>>> can do a similar check of locks.  
> > >>>>>>
> > >>>>>> Lockdep doesn't cover it?  
> > >>>>>
> > >>>>> Short answer: no ;)  
> > >>>
> > >>> It's pretty good actually, if you correctly annotate things up.  
> > >>
> > >> I agree - it's pretty good, the problem is you need reasonable test
> > >> coverage, and getting good test coverage of shrinkers is hard.
> > >>  
> > >>>>> The problem with lockdep is that you have to trigger the locking
> > >>>>> scenario to get a warning out of it. For example you obviously won't 
> > >>>>> get
> > >>>>> any warnings about the shrinker without triggering the shrinker (which
> > >>>>> means memory pressure since we don't have the debugfs file to trigger 
> > >>>>> it).  
> > >>>>
> > >>>> Actually, you don't need debugfs. Writing to /proc/sys/vm/drop_caches
> > >>>> will do it. Though maybe there's other code path scenarios that
> > >>>> wouldn't cover.  
> > >>>
> > >>> Huh didn't know, but it's a bit a shotgun, plus it doesn't use
> > >>> fs_reclaim shrinker annotations, which means you don't have lockdep
> > >>> checks. I think at least, would need some deadlock and testing.  
> > >>
> > >> The big problem with this sort of method for triggering the shrinkers is
> > >> that they are called without (many) locks held. Whereas it's entirely
> > >> possible for a shrinker to be called at (almost) any allocation in the
> > >> kernel.
> > >>
> > >> Admittedly the Panfrost shrinkers are fairly safe - because most things 
> > >> are
> > >> xxx_trylock(). kbase avoids trylock which makes reclaim more reliable, 
> > >> but
> > >> means deadlocks are much easier.  
> > >
> > > This is why you need the fs_reclaim annotation. With that lockdep can
> > > connect the dots. See also might_alloc() annotations I've added in 5.11 or
> > > so.
> > >
> > > Validating shrinkers for deadlocks is actually not that hard, you just
> > > need the debugfs interface to run your shrinker at will under the
> > > fs_reclaim_acquire/release annotations. You do _not_ need to hit the full
> > > combinatorial test matrix of making sure that your shrinker is called in
> > > any possible place where memory is allocated.  
> >
> > Cool - I hadn't looked at that code before, but it does look like it
> > should pick up the problem cases. I wish that had existed back when I
> > was dealing with kbase! :)
> >  
> > >>>>> I have to admit I'm not 100% sure I've seen any lockdep warnings based
> > >>>>> on buffer objects recently. I can trigger them based on jobs:
> > >>>>>  
> > >> [snip]  
> > >>>>>
> > >>>>> Certainly here the mutex causing the problem is the shrinker_lock!
> > >>>>>
> > >>>>> The above is triggered by chucking a whole ton of jobs which
> > >>>>> fault at the GPU.
> > >>>>>
> > >>>>> Sadly I haven't found time to work out how to untangle the locks.  
> > >>>>
> > >>>> They are tricky because pretty much any memory allocation can trigger
> > >>>> things as I recall.  
> > >>>
> > >>> The above should only be possible with my dma_fence annotations, and
> > >>> yes the point to bugs in the drm/scheduler. They shouldn't matter for
> > >>> panfrost, and those patches aren't in upstream yet.  
> > >>
> > >> Yes that's on a (random version of) drm-misc - just what I happened to 
> > >> have
> > >> built recently. Good news if that's not actually Panfrost's bug. I 
> > >> haven't
> > >> had the time to track down what's going on yet.  
> > >
> > > Are you sure this is really drm-misc? The patch should never have been
> > > there which adds these annotations.  
> >
> > Well drm-misc-next. It's based on commit e4abd7ad2b77 with some pending
> > Panfrost fixes on top.
> >  
> > > Also help for fixing common code is appreciated :-)  
> >
> > Indeed - I have put it on my todo list, but I'm not sure when I'll be
> > able to spend time on it.  
> 
> Oh there's indeed panfrost annotations in panfrost_reset(), but I
> guess that stuff just wasn't ever really tested when it landed?
> 
> commit 5bc5cc2819c2c0adb644919e3e790b504ea47e0a
> Author: Boris Brezillon 
> Date:   Thu Nov 5 16:17:04 2020 +0100
> 
>drm/panfrost: Move the GPU reset bits outside the timeout handler
> 
> Adding Boris, since my idea wasn't to just add more lockdep splats to
> panfrost, but help make sure the reset code works correctly.

Okay, I'm a bit tired of this "oh, this is not really what I asked"
reply I got when you realize the code you acked/reviewed is not what
you expected. The dmafence annotation stuff is far from trivial, and I
remember asking for help when I submitted the version adding those
annotations in the reset path. Note that I didn't intend to spend days
on that stuff since all I was trying to do with this patch was to fix a
deadlock we had in the reset path, deadlock that was not related to dma
fence signaling at all. So yes, it was not thoroughly tested
(especially the fence signaling stuff), but enough to validate the
workloads that triggered the deadlock I was trying to fix no longer
caused this deadlock.

Don't get me wrong, I'm all for improving the DRM subsystem, and this
fencing annotation is probably a great thing, but you can't expect
people to take on those complex tasks without a bit of supervision.

Regards,

Boris


[RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-11 Thread Boris Brezillon
Hello,

I've been playing with Vulkan lately and struggled quite a bit to
implement VkQueueSubmit with the submit ioctl we have. There are
several limiting factors that can be worked around if we really have to,
but I think it'd be much easier and future-proof if we introduce a new
ioctl that addresses the current limitations:

1/ There can only be one out_sync, but Vulkan might ask us to signal
   several VkSemaphores and possibly one VkFence too, both of those
   being based on sync objects in my PoC. Making out_sync an array of
   syncobjs to attach the render_done fence to would make that possible.
   The other option would be to collect syncobj updates in userspace
   in a separate thread and propagate those updates to all
   semaphores+fences waiting on those events (I think the v3dv driver
   does something like that, but I didn't spend enough time studying
   the code to be sure, so I might be wrong).

2/ Queued jobs might be executed out-of-order (unless they have
   explicit/implicit deps between them), and Vulkan asks that the out
   fence be signaled when all jobs are done. Timeline syncobjs are a
   good match for that use case. All we need to do is pass the same
   fence syncobj to all jobs being attached to a single QueueSubmit
   request, but a different point on the timeline. The syncobj
   timeline wait does the rest and guarantees that we've reached a
   given timeline point (IOW, all jobs before that point are done)
   before declaring the fence as signaled.
   One alternative would be to have dummy 'synchronization' jobs that
   don't actually execute anything on the GPU but declare a dependency
   on all other jobs that are part of the QueueSubmit request, and
   signal the out fence (the scheduler would do most of the work for
   us, all we have to do is support NULL job heads and signal the
   fence directly when that happens instead of queueing the job).

3/ The current implementation lacks information about BO access,
   so we serialize all jobs accessing the same set of BOs, even
   if those jobs might just be reading from them (which can
   happen concurrently). Other drivers pass an access type to the
   list of referenced BOs to address that. Another option would be
   to disable implicit deps (deps based on BOs) and force the driver
   to pass all deps explicitly (interestingly, some drivers have
   both the no-implicit-dep and r/w flags, probably to support
   sub-resource access, so we might want to add that one too).
   I don't see any userspace workaround to that problem, so that one
   alone would justify extending the existing ioctl or adding a new
   one.

4/ There's also the fact that submitting one job at a time adds an
   overhead when QueueSubmit is being passed more than one
   CommandBuffer. That one is less problematic, but if we're adding
   a new ioctl we'd better design it to limit the userspace -> kernel
   transition overhead.

Right now I'm just trying to collect feedback. I don't intend to get
those patches merged until we have a userspace user, but I thought
starting the discussion early would be a good thing.

Feel free to suggest other approaches.

Regards,

Boris

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach_object_fences}()
  drm/panfrost: Collect implicit and explicit deps in an XArray
  drm/panfrost: Move the mappings collection out of
panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/panfrost_drv.c | 408 +---
 drivers/gpu/drm/panfrost/panfrost_job.c |  80 +++--
 drivers/gpu/drm/panfrost/panfrost_job.h |   8 +-
 include/uapi/drm/panfrost_drm.h |  83 +
 4 files changed, 483 insertions(+), 96 deletions(-)

-- 
2.26.2



[RFC PATCH 3/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()

2021-03-11 Thread Boris Brezillon
So we can re-use it from elsewhere.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 86c1347c6f0e..32e822e00a08 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -108,6 +108,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
return 0;
 }
 
+static int
+panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   unsigned int i;
+
+   job->mappings = kvmalloc_array(job->bo_count,
+  sizeof(*job->mappings),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->mappings)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++) {
+   struct panfrost_gem_mapping *mapping;
+   struct panfrost_gem_object *bo;
+
+   bo = to_panfrost_bo(job->bos[i]);
+   mapping = panfrost_gem_mapping_get(bo, priv);
+   if (!mapping)
+   return -EINVAL;
+
+   atomic_inc(&bo->gpu_usecount);
+   job->mappings[i] = mapping;
+   }
+
+   return 0;
+}
+
 /**
  * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
  * referenced by the job.
@@ -127,8 +155,6 @@ panfrost_lookup_bos(struct drm_device *dev,
  struct drm_panfrost_submit *args,
  struct panfrost_job *job)
 {
-   struct panfrost_file_priv *priv = file_priv->driver_priv;
-   struct panfrost_gem_object *bo;
unsigned int i;
int ret;
 
@@ -143,27 +169,7 @@ panfrost_lookup_bos(struct drm_device *dev,
if (ret)
return ret;
 
-   job->mappings = kvmalloc_array(job->bo_count,
-  sizeof(struct panfrost_gem_mapping *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!job->mappings)
-   return -ENOMEM;
-
-   for (i = 0; i < job->bo_count; i++) {
-   struct panfrost_gem_mapping *mapping;
-
-   bo = to_panfrost_bo(job->bos[i]);
-   mapping = panfrost_gem_mapping_get(bo, priv);
-   if (!mapping) {
-   ret = -EINVAL;
-   break;
-   }
-
-   atomic_inc(&bo->gpu_usecount);
-   job->mappings[i] = mapping;
-   }
-
-   return ret;
+   return panfrost_get_job_mappings(file_priv, job);
 }
 
 /**
-- 
2.26.2



[RFC PATCH 2/7] drm/panfrost: Collect implicit and explicit deps in an XArray

2021-03-11 Thread Boris Brezillon
This way we can re-use the standard drm_gem_fence_array_add_implicit()
helper and simplify the panfrost_job_dependency() logic.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 42 +++---
 drivers/gpu/drm/panfrost/panfrost_job.c | 57 +++--
 drivers/gpu/drm/panfrost/panfrost_job.h |  7 +--
 3 files changed, 42 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 83a461bdeea8..86c1347c6f0e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -137,12 +137,6 @@ panfrost_lookup_bos(struct drm_device *dev,
if (!job->bo_count)
return 0;
 
-   job->implicit_fences = kvmalloc_array(job->bo_count,
- sizeof(struct dma_fence *),
- GFP_KERNEL | __GFP_ZERO);
-   if (!job->implicit_fences)
-   return -ENOMEM;
-
ret = drm_gem_objects_lookup(file_priv,
 (void __user *)(uintptr_t)args->bo_handles,
 job->bo_count, &job->bos);
@@ -173,8 +167,7 @@ panfrost_lookup_bos(struct drm_device *dev,
 }
 
 /**
- * panfrost_copy_in_sync() - Sets up job->in_fences[] with the sync objects
- * referenced by the job.
+ * panfrost_copy_in_sync() - Add in_syncs to the job->deps array.
  * @dev: DRM device
  * @file_priv: DRM file for this fd
  * @args: IOCTL args
@@ -187,28 +180,18 @@ panfrost_lookup_bos(struct drm_device *dev,
  */
 static int
 panfrost_copy_in_sync(struct drm_device *dev,
- struct drm_file *file_priv,
- struct drm_panfrost_submit *args,
- struct panfrost_job *job)
+ struct drm_file *file_priv,
+ struct drm_panfrost_submit *args,
+ struct panfrost_job *job)
 {
u32 *handles;
int ret = 0;
int i;
 
-   job->in_fence_count = args->in_sync_count;
-
-   if (!job->in_fence_count)
+   if (!args->in_sync_count)
return 0;
 
-   job->in_fences = kvmalloc_array(job->in_fence_count,
-   sizeof(struct dma_fence *),
-   GFP_KERNEL | __GFP_ZERO);
-   if (!job->in_fences) {
-   DRM_DEBUG("Failed to allocate job in fences\n");
-   return -ENOMEM;
-   }
-
-   handles = kvmalloc_array(job->in_fence_count, sizeof(u32), GFP_KERNEL);
+   handles = kvmalloc_array(args->in_sync_count, sizeof(u32), GFP_KERNEL);
if (!handles) {
ret = -ENOMEM;
DRM_DEBUG("Failed to allocate incoming syncobj handles\n");
@@ -217,17 +200,23 @@ panfrost_copy_in_sync(struct drm_device *dev,
 
if (copy_from_user(handles,
   (void __user *)(uintptr_t)args->in_syncs,
-  job->in_fence_count * sizeof(u32))) {
+  args->in_sync_count * sizeof(u32))) {
ret = -EFAULT;
DRM_DEBUG("Failed to copy in syncobj handles\n");
goto fail;
}
 
-   for (i = 0; i < job->in_fence_count; i++) {
+   for (i = 0; i < args->in_sync_count; i++) {
+   struct dma_fence *fence;
+
ret = drm_syncobj_find_fence(file_priv, handles[i], 0, 0,
-&job->in_fences[i]);
+&fence);
if (ret == -EINVAL)
goto fail;
+
+   ret = drm_gem_fence_array_add(&job->deps, fence);
+   if (ret)
+   goto fail;
}
 
 fail:
@@ -269,6 +258,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
job->requirements = args->requirements;
job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
job->file_priv = file->driver_priv;
+   xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
 
ret = panfrost_copy_in_sync(dev, file, args, job);
if (ret)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 564054a96ca9..87bfbd77d753 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -196,20 +196,31 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
 }
 
-static void panfrost_acquire_object_fences(struct panfrost_job *job)
+static int panfrost_acquire_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < job->bo_count; i++)
job->implicit_fences[i] = dma_resv_get_excl_rcu(

[RFC PATCH 4/7] drm/panfrost: Add BO access flags to relax dependencies between jobs

2021-03-11 Thread Boris Brezillon
Jobs reading from the same BO should not be serialized. Add access flags
so we can relax the implicit dependencies in that case. We force RW
access for now to keep the behavior unchanged, but a new SUBMIT ioctl
taking explicit access flags will be introduced.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 15 +--
 drivers/gpu/drm/panfrost/panfrost_job.h |  1 +
 include/uapi/drm/panfrost_drm.h |  4 
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 32e822e00a08..be45efee47a2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -163,6 +163,15 @@ panfrost_lookup_bos(struct drm_device *dev,
if (!job->bo_count)
return 0;
 
+   job->bo_flags = kvmalloc_array(job->bo_count,
+  sizeof(*job->bo_flags),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->bo_flags)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++)
+   job->bo_flags[i] = PANFROST_BO_REF_RW;
+
ret = drm_gem_objects_lookup(file_priv,
 (void __user *)(uintptr_t)args->bo_handles,
 job->bo_count, &job->bos);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 87bfbd77d753..170c3b0c8641 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -201,10 +201,17 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
int i;
 
for (i = 0; i < job->bo_count; i++) {
+   bool write = job->bo_flags[i] & PANFROST_BO_REF_WRITE;
int ret;
 
+   if (!(job->bo_flags[i] & PANFROST_BO_REF_WRITE)) {
+   ret = dma_resv_reserve_shared(job->bos[i]->resv, 1);
+   if (ret)
+   return ret;
+   }
+
ret = drm_gem_fence_array_add_implicit(&job->deps,
-  job->bos[i], true);
+  job->bos[i], write);
if (ret)
return ret;
}
@@ -219,7 +226,10 @@ static void panfrost_attach_object_fences(struct panfrost_job *job)
for (i = 0; i < job->bo_count; i++) {
struct dma_resv *robj = job->bos[i]->resv;
 
-   dma_resv_add_excl_fence(robj, job->render_done_fence);
+   if (job->bo_flags[i] & PANFROST_BO_REF_WRITE)
+   dma_resv_add_excl_fence(robj, job->render_done_fence);
+   else
+   dma_resv_add_shared_fence(robj, job->render_done_fence);
}
 }
 
@@ -298,6 +308,7 @@ static void panfrost_job_cleanup(struct kref *ref)
kvfree(job->bos);
}
 
+   kvfree(job->bo_flags);
kfree(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index b10b5be57dd2..c3f662771ed9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -31,6 +31,7 @@ struct panfrost_job {
 
struct panfrost_gem_mapping **mappings;
struct drm_gem_object **bos;
+   u32 *bo_flags;
u32 bo_count;
 
/* Fence to be signaled by drm-sched once its done with the job */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index ec19db1eead8..eab96d7e0e0e 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -223,6 +223,10 @@ struct drm_panfrost_madvise {
__u32 retained;   /* out, whether backing store still exists */
 };
 
+#define PANFROST_BO_REF_READ   0x1
+#define PANFROST_BO_REF_WRITE  0x2
+#define PANFROST_BO_REF_RW (PANFROST_BO_REF_READ | PANFROST_BO_REF_WRITE)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.26.2



[RFC PATCH 7/7] drm/panfrost: Bump minor version to reflect the feature additions

2021-03-11 Thread Boris Brezillon
We now have a new ioctl that allows submitting multiple jobs at once
(among other things) and we support timelined syncobjs. Bump the
minor version number to reflect those changes.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b3cc6fecb18d..7fb9b20ba27f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -855,6 +855,7 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * Panfrost driver version:
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
+ * - 1.2 - adds BATCH_SUBMIT ioctl and SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
.driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -868,7 +869,7 @@ static const struct drm_driver panfrost_drm_driver = {
.desc   = "panfrost DRM",
.date   = "20180908",
.major  = 1,
-   .minor  = 1,
+   .minor  = 2,
 
.gem_create_object  = panfrost_gem_create_object,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
-- 
2.26.2



[RFC PATCH 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature

2021-03-11 Thread Boris Brezillon
Now that we have a new SUBMIT ioctl dealing with timelined syncobjs we
can advertise the feature.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3f8addab8bab..b3cc6fecb18d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -857,7 +857,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  */
 static const struct drm_driver panfrost_drm_driver = {
-   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+ DRIVER_SYNCOBJ_TIMELINE,
.open   = panfrost_open,
.postclose  = panfrost_postclose,
.ioctls = panfrost_drm_driver_ioctls,
-- 
2.26.2



[RFC PATCH 1/7] drm/panfrost: Pass a job to panfrost_{acquire, attach_object_fences}()

2021-03-11 Thread Boris Brezillon
So we don't have to change the prototype if we extend the function.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..564054a96ca9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -196,24 +196,20 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
 }
 
-static void panfrost_acquire_object_fences(struct drm_gem_object **bos,
-  int bo_count,
-  struct dma_fence **implicit_fences)
+static void panfrost_acquire_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < bo_count; i++)
-   implicit_fences[i] = dma_resv_get_excl_rcu(bos[i]->resv);
+   for (i = 0; i < job->bo_count; i++)
+   job->implicit_fences[i] = dma_resv_get_excl_rcu(job->bos[i]->resv);
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
- int bo_count,
- struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < bo_count; i++)
-   dma_resv_add_excl_fence(bos[i]->resv, fence);
+   for (i = 0; i < job->bo_count; i++)
+   dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -243,15 +239,13 @@ int panfrost_job_push(struct panfrost_job *job)
 
kref_get(&job->refcount); /* put by scheduler job completion */
 
-   panfrost_acquire_object_fences(job->bos, job->bo_count,
-  job->implicit_fences);
+   panfrost_acquire_object_fences(job);
 
drm_sched_entity_push_job(&job->base, entity);
 
mutex_unlock(&pfdev->sched_lock);
 
-   panfrost_attach_object_fences(job->bos, job->bo_count,
- job->render_done_fence);
+   panfrost_attach_object_fences(job);
 
 unlock:
drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
-- 
2.26.2



[RFC PATCH 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-03-11 Thread Boris Brezillon
This should help limit the number of ioctls when submitting multiple
jobs. The new ioctl also supports syncobj timelines and BO access flags.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 303 
 include/uapi/drm/panfrost_drm.h |  79 ++
 2 files changed, 382 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index be45efee47a2..3f8addab8bab 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -446,6 +446,308 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
return ret;
 }
 
+static int
+panfrost_get_job_in_syncs(struct drm_file *file_priv,
+ u64 refs, u32 ref_stride,
+ u32 count, struct panfrost_job *job)
+{
+   const void __user *in = u64_to_user_ptr(refs);
+   unsigned int i;
+   int ret;
+
+   if (!count)
+   return 0;
+
+   for (i = 0; i < count; i++) {
+   struct drm_panfrost_syncobj_ref ref = { };
+   struct dma_fence *fence;
+
+   ret = copy_struct_from_user(&ref, sizeof(ref),
+   in + (i * ref_stride),
+   ref_stride);
+   if (ret)
+   return ret;
+
+   if (ref.pad)
+   return -EINVAL;
+
+   ret = drm_syncobj_find_fence(file_priv, ref.handle, ref.point,
+0, &fence);
+   if (ret)
+   return ret;
+
+   ret = drm_gem_fence_array_add(&job->deps, fence);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
+struct panfrost_job_out_sync {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
+static void
+panfrost_put_job_out_syncs(struct panfrost_job_out_sync *out_syncs, u32 count)
+{
+   unsigned int i;
+
+   for (i = 0; i < count; i++) {
+   if (!out_syncs[i].syncobj)
+   break;
+
+   drm_syncobj_put(out_syncs[i].syncobj);
+   kvfree(out_syncs[i].chain);
+   }
+
+   kvfree(out_syncs);
+}
+
+static struct panfrost_job_out_sync *
+panfrost_get_job_out_syncs(struct drm_file *file_priv,
+  u64 refs, u32 ref_stride,
+  u32 count)
+{
+   void __user *in = u64_to_user_ptr(refs);
+   struct panfrost_job_out_sync *out_syncs;
+   unsigned int i;
+   int ret;
+
+   if (!count)
+   return NULL;
+
+   out_syncs = kvmalloc_array(count, sizeof(*out_syncs),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!out_syncs)
+   return ERR_PTR(-ENOMEM);
+
+   for (i = 0; i < count; i++) {
+   struct drm_panfrost_syncobj_ref ref = { };
+
+   ret = copy_struct_from_user(&ref, sizeof(ref),
+   in + (i * ref_stride),
+   ref_stride);
+   if (ret)
+   goto err_free_out_syncs;
+
+   if (ref.pad) {
+   ret = -EINVAL;
+   goto err_free_out_syncs;
+   }
+
+   out_syncs[i].syncobj = drm_syncobj_find(file_priv, ref.handle);
+   if (!out_syncs[i].syncobj) {
+   ret = -EINVAL;
+   goto err_free_out_syncs;
+   }
+
+   out_syncs[i].point = ref.point;
+   if (!out_syncs[i].point)
+   continue;
+
+   out_syncs[i].chain = kmalloc(sizeof(*out_syncs[i].chain),
+GFP_KERNEL);
+   if (!out_syncs[i].chain) {
+   ret = -ENOMEM;
+   goto err_free_out_syncs;
+   }
+   }
+
+   return out_syncs;
+
+err_free_out_syncs:
+   panfrost_put_job_out_syncs(out_syncs, count);
+   return ERR_PTR(ret);
+}
+
+static void
+panfrost_set_job_out_fence(struct panfrost_job_out_sync *out_syncs,
+  unsigned int count, struct dma_fence *fence)
+{
+   unsigned int i;
+
+   for (i = 0; i < count; i++) {
+   if (out_syncs[i].chain) {
+   drm_syncobj_add_point(out_syncs[i].syncobj,
+ out_syncs[i].chain,
+ fence, out_syncs[i].point);
+   out_syncs[i].chain = NULL;
+   } else {
+   drm_syncobj_replace_fence(out_syncs[i].syncobj,
+ fence);
+   }
+   }
+}
+
+#define PANFROST_BO_REF_ALLOWED_FLAGS P

Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-11 Thread Boris Brezillon
Hi Steven,

On Thu, 11 Mar 2021 12:16:33 +
Steven Price  wrote:

> On 11/03/2021 09:25, Boris Brezillon wrote:
> > Hello,
> > 
> > I've been playing with Vulkan lately and struggled quite a bit to
> > implement VkQueueSubmit with the submit ioctl we have. There are
> > several limiting factors that can be worked around if we really have to,
> > but I think it'd be much easier and future-proof if we introduce a new
> > ioctl that addresses the current limitations:  
> 
> Hi Boris,
> 
> I think what you've proposed is quite reasonable, some detailed comments 
> to your points below.
> 
> > 
> > 1/ There can only be one out_sync, but Vulkan might ask us to signal
> > several VkSemaphores and possibly one VkFence too, both of those
> > being based on sync objects in my PoC. Making out_sync an array of
> > syncobjs to attach the render_done fence to would make that possible.
> > The other option would be to collect syncobj updates in userspace
> > in a separate thread and propagate those updates to all
> > semaphores+fences waiting on those events (I think the v3dv driver
> > does something like that, but I didn't spend enough time studying
> > the code to be sure, so I might be wrong).  
> 
> You should be able to avoid the separate thread to propagate by having a 
> proxy object in user space that maps between the one outsync of the job 
> and the possibly many Vulkan objects. But I've had this argument before 
> with the DDK... and the upshot of it was that the Vulkan API is 
> unnecessarily complex here and makes this really hard to do in practice. 
> So I agree adding this capability to the kernel is likely the best approach.
> 
> > 2/ Queued jobs might be executed out-of-order (unless they have
> > explicit/implicit deps between them), and Vulkan asks that the out
> > fence be signaled when all jobs are done. Timeline syncobjs are a
> > good match for that use case. All we need to do is pass the same
> > fence syncobj to all jobs being attached to a single QueueSubmit
> > request, but a different point on the timeline. The syncobj
> > timeline wait does the rest and guarantees that we've reached a
> > given timeline point (IOW, all jobs before that point are done)
> > before declaring the fence as signaled.
> > One alternative would be to have dummy 'synchronization' jobs that
> > don't actually execute anything on the GPU but declare a dependency
> > on all other jobs that are part of the QueueSubmit request, and
> > signal the out fence (the scheduler would do most of the work for
> > us, all we have to do is support NULL job heads and signal the
> > fence directly when that happens instead of queueing the job).  
> 
> I have to admit to being rather hazy on the details of timeline 
> syncobjs, but I thought there was a requirement that the timeline moves 
> monotonically. I.e. if you have multiple jobs signalling the same 
> syncobj just with different points, then AFAIU the API requires that the 
> points are triggered in order.

I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I
might be wrong, but my understanding is that queuing fences (addition
of new points in the timeline) should happen in order, but signaling
can happen in any order. When checking for a signaled fence the
fence-chain logic starts from the last point (or from an explicit point
if you use the timeline wait flavor) and goes backward, stopping at the
first un-signaled node. If I'm correct, that means that fences that
are part of a chain can be signaled in any order.

Note that I also considered using a sync file, which has the ability to
merge fences, but that required 3 extra ioctls for each syncobj to merge
(for the export/merge/import round trip), and AFAICT, fences stay around
until the sync file is destroyed, which forces some garbage collection
if we want to keep the number of active fences low. One nice thing
about the fence-chain/syncobj-timeline logic is that signaled fences
are collected automatically when walking the chain.

> 
> So I'm not sure that you've actually fixed this point - you either need 
> to force an order (in which case the last job can signal the Vulkan 
> fence)

That options requires adding deps that do not really exist on the last
jobs, so I'm not sure I like it.

> or you still need a dummy job to do the many-to-one dependency.

Yeah, that's what I've considered doing before realizing timeline
syncobjs could solve this problem (assuming I got it right :-/).

> 
> Or I may have completely misunderstood timeline syncobjs - definitely a 
> possibility :)

Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-11 Thread Boris Brezillon
Hi Jason,

On Thu, 11 Mar 2021 10:58:46 -0600
Jason Ekstrand  wrote:

> Hi all,
> 
> Dropping in where I may or may not be wanted. Feel free to ignore. :-)

I'm glad you decided to chime in. :-)

 
> > > > 2/ Queued jobs might be executed out-of-order (unless they have
> > > > explicit/implicit deps between them), and Vulkan asks that the out
> > > > fence be signaled when all jobs are done. Timeline syncobjs are a
> > > > good match for that use case. All we need to do is pass the same
> > > > fence syncobj to all jobs being attached to a single QueueSubmit
> > > > request, but a different point on the timeline. The syncobj
> > > > timeline wait does the rest and guarantees that we've reached a
> > > > given timeline point (IOW, all jobs before that point are done)
> > > > before declaring the fence as signaled.
> > > > One alternative would be to have dummy 'synchronization' jobs that
> > > > don't actually execute anything on the GPU but declare a dependency
> > > > on all other jobs that are part of the QueueSubmit request, and
> > > > signal the out fence (the scheduler would do most of the work for
> > > > us, all we have to do is support NULL job heads and signal the
> > > > fence directly when that happens instead of queueing the job).  
> > >
> > > I have to admit to being rather hazy on the details of timeline
> > > syncobjs, but I thought there was a requirement that the timeline moves
> > > monotonically. I.e. if you have multiple jobs signalling the same
> > > syncobj just with different points, then AFAIU the API requires that the
> > > points are triggered in order.  
> >
> > I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I
> > might be wrong, but my understanding is that queuing fences (addition
> > of new points in the timeline) should happen in order, but signaling
> > can happen in any order. When checking for a signaled fence the
> > fence-chain logic starts from the last point (or from an explicit point
> > if you use the timeline wait flavor) and goes backward, stopping at the
> > first un-signaled node. If I'm correct, that means that fences that
> > are part of a chain can be signaled in any order.  
> 
> You don't even need a timeline for this.  Just have a single syncobj
> per-queue and make each submit wait on it and then signal it.
> Alternatively, you can just always hang on to the out-fence from the
> previous submit and make the next one wait on that.

That's what I have right now, but it forces the serialization of all
jobs that are pushed during a submit (and there can be more than one
per command buffer on panfrost :-/). Maybe I'm wrong, but I thought it'd
be better to not force this serialization if we can avoid it.

> Timelines are overkill here, IMO.

Mind developing why you think this is overkill? After looking at the
kernel implementation I thought using timeline syncobjs would be
pretty cheap compared to the other options I considered.

> 
> > Note that I also considered using a sync file, which has the ability to
> > merge fences, but that required 3 extra ioctls for each syncobj to merge
> > (for the export/merge/import round trip), and AFAICT, fences stay around
> > until the sync file is destroyed, which forces some garbage collection
> > if we want to keep the number of active fences low. One nice thing
> > about the fence-chain/syncobj-timeline logic is that signaled fences
> > are collected automatically when walking the chain.  
> 
> Yeah, that's the pain when working with sync files.  This is one of
> the reasons why our driver takes an arbitrary number of in/out
> syncobjs.
> 
> > > So I'm not sure that you've actually fixed this point - you either need
> > > to force an order (in which case the last job can signal the Vulkan
> > > fence)  
> >
> > That options requires adding deps that do not really exist on the last
> > jobs, so I'm not sure I like it.
> >  
> > > or you still need a dummy job to do the many-to-one dependency.  
> >
> > Yeah, that's what I've considered doing before realizing timelined
> > syncobjs could solve this problem (assuming I got it right :-/).  
> >  
> > >
> > > Or I may have completely misunderstood timeline syncobjs - definitely a
> > > possibility :)  
> >
> > I wouldn't pretend I got it right either :-).
> >  
> > >  
> > > > 3/ The current implementation lacks information about BO access,
> > > > so we serialize all jobs accessing the same set of BOs, even
> > > > if those jobs might just be reading from them (which can
> > > > happen concurrently). Other drivers pass an access type to the
> > > > list of referenced BOs to address that. Another option would be
> > > > to disable implicit deps (deps based on BOs) and force the driver
> > > > to pass all deps explicitly (interestingly, some drivers have
> > > > both the no-implicit-dep and r/w flags, probably to support
> > > > sub-resource access, so we might want

Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl

2021-03-11 Thread Boris Brezillon
On Thu, 11 Mar 2021 12:11:48 -0600
Jason Ekstrand  wrote:

> > > > > > 2/ Queued jobs might be executed out-of-order (unless they have
> > > > > > explicit/implicit deps between them), and Vulkan asks that the 
> > > > > > out
> > > > > > fence be signaled when all jobs are done. Timeline syncobjs are 
> > > > > > a
> > > > > > good match for that use case. All we need to do is pass the same
> > > > > > fence syncobj to all jobs being attached to a single QueueSubmit
> > > > > > request, but a different point on the timeline. The syncobj
> > > > > > timeline wait does the rest and guarantees that we've reached a
> > > > > > given timeline point (IOW, all jobs before that point are done)
> > > > > > before declaring the fence as signaled.
> > > > > > One alternative would be to have dummy 'synchronization' jobs 
> > > > > > that
> > > > > > don't actually execute anything on the GPU but declare a 
> > > > > > dependency
> > > > > > on all other jobs that are part of the QueueSubmit request, and
> > > > > > signal the out fence (the scheduler would do most of the work 
> > > > > > for
> > > > > > us, all we have to do is support NULL job heads and signal the
> > > > > > fence directly when that happens instead of queueing the job).  
> > > > >
> > > > > I have to admit to being rather hazy on the details of timeline
> > > > > syncobjs, but I thought there was a requirement that the timeline 
> > > > > moves
> > > > > monotonically. I.e. if you have multiple jobs signalling the same
> > > > > syncobj just with different points, then AFAIU the API requires that 
> > > > > the
> > > > > points are triggered in order.  
> > > >
> > > > I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I
> > > > might be wrong, but my understanding is that queuing fences (addition
> > > > of new points in the timeline) should happen in order, but signaling
> > > > can happen in any order. When checking for a signaled fence the
> > > > fence-chain logic starts from the last point (or from an explicit point
> > > > if you use the timeline wait flavor) and goes backward, stopping at the
> > > > first un-signaled node. If I'm correct, that means that fences that
> > > > are part of a chain can be signaled in any order.  
> > >
> > > You don't even need a timeline for this.  Just have a single syncobj
> > > per-queue and make each submit wait on it and then signal it.
> > > Alternatively, you can just always hang on to the out-fence from the
> > > previous submit and make the next one wait on that.  
> >
> > That's what I have right now, but it forces the serialization of all
> > jobs that are pushed during a submit (and there can be more than one
> > per command buffer on panfrost :-/). Maybe I'm wrong, but I thought it'd
> > be better to not force this serialization if we can avoid it.  
> 
> I'm not familiar with panfrost's needs and I don't work on a tiler and
> I know there are different issues there.  But...
> 
> The Vulkan spec requires that everything that all the submits that
> happen on a given vkQueue happen in-order.  Search the spec for
> "Submission order" for more details.

Duh, looks like I completely overlooked the "Submission order"
guarantees. This being said, even after reading this chapter multiple
times I'm not sure what kind of guarantee this gives us, given the
execution itself can be out-of-order. My understanding is that
submission order matters for implicit deps, say you have 2 distinct
VkSubmitInfo, the first one (in submission order) writing to a buffer
and the second one reading from it, you really want the first one to
be submitted first and the second one to wait on the implicit BO fence
created by the first one. If we were to submit out-of-order, this
guarantee wouldn't be met. OTOH, if we have 2 completely independent
submits, I don't really see what submission order gives us if execution
is out-of-order.

In our case, the kernel driver takes care of the submission
serialization (gathering implicit and explicit deps, queuing the job and
assigning the "done" fence to the output sync objects). Once things
are queued, it's the scheduler (drm_sched) deciding of the execution
order.

> 
> So, generally speaking, there are some in-order requirements there.
> Again, not having a lot of tiler experience, I'm not the one to weigh
> in.
> 
> > > Timelines are overkill here, IMO.  
> >
> > Mind developing why you think this is overkill? After looking at the
> > kernel implementation I thought using timeline syncobjs would be
> > pretty cheap compared to the other options I considered.  
> 
> If you use a regular syncobj, every time you wait on it it inserts a
> dependency between the current submit and the last thing to signal it
> on the CPU timeline.  The internal dma_fences will hang around
> as-needed to ensure those dependencies.  If you use a timeline, you
> have to also track a uint64_t to reference the current time point.
> This ma

[PATCH] drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path

2021-05-21 Thread Boris Brezillon
Make sure all bo->base.pages entries are either NULL or pointing to a
valid page before calling drm_gem_shmem_put_pages().

Reported-by: Tomeu Vizoso 
Cc: 
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 569509c2ba27..d76dff201ea6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,6 +460,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
if (IS_ERR(pages[i])) {
mutex_unlock(&bo->base.pages_lock);
ret = PTR_ERR(pages[i]);
+   pages[i] = NULL;
goto err_pages;
}
}
-- 
2.31.1



Re: [PATCH v5] drm/bridge: tfp410: Set input_bus_flags in atomic_check

2020-12-04 Thread Boris Brezillon
On Thu, 3 Dec 2020 18:20:48 +0530
Nikhil Devshatwar  wrote:

> input_bus_flags are specified in drm_bridge_timings (legacy) as well
> as drm_bridge_state->input_bus_cfg.flags
> 
> The flags from the timings will be deprecated. Bridges are supposed
> to validate and set the bridge state flags from atomic_check.
> 
> Implement atomic_check hook for the same.
> 
> Signed-off-by: Nikhil Devshatwar 
> ---
> 
> Notes:
> changes from v4:
> * fix a warning Reported-by: kernel test robot 
> 
>  drivers/gpu/drm/bridge/ti-tfp410.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bridge/ti-tfp410.c b/drivers/gpu/drm/bridge/ti-tfp410.c
> index 3def9acba86b..92963d12106b 100644
> --- a/drivers/gpu/drm/bridge/ti-tfp410.c
> +++ b/drivers/gpu/drm/bridge/ti-tfp410.c
> @@ -233,6 +233,20 @@ static u32 *tfp410_get_input_bus_fmts(struct drm_bridge *bridge,
>   return input_fmts;
>  }
>  
> +static int tfp410_atomic_check(struct drm_bridge *bridge,
> + struct drm_bridge_state *bridge_state,
> + struct drm_crtc_state *crtc_state,
> + struct drm_connector_state *conn_state)
> +{
> + struct tfp410 *dvi = drm_bridge_to_tfp410(bridge);
> +
> + /*
> +  * There might be flags negotiation supported in future
> +  * Set the bus flags in atomic_check statically for now
> +  */
> + bridge_state->input_bus_cfg.flags = dvi->timings.input_bus_flags;

The return statement is still missing :-).

> +}
> +
>  static const struct drm_bridge_funcs tfp410_bridge_funcs = {
>   .attach = tfp410_attach,
>   .detach = tfp410_detach,
> @@ -243,6 +257,7 @@ static const struct drm_bridge_funcs tfp410_bridge_funcs = {
>   .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state,
>   .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state,
>   .atomic_get_input_bus_fmts = tfp410_get_input_bus_fmts,
> + .atomic_check = tfp410_atomic_check,
>  };
>  
>  static const struct drm_bridge_timings tfp410_default_timings = {

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 4/7] drm/bridge: mhdp8546: Set input_bus_flags from atomic_check

2020-12-04 Thread Boris Brezillon
On Tue, 1 Dec 2020 17:48:27 +0530
Nikhil Devshatwar  wrote:

> input_bus_flags are specified in drm_bridge_timings (legacy) as well
> as drm_bridge_state->input_bus_cfg.flags
> 
> The flags from the timings will be deprecated. Bridges are supposed
> to validate and set the bridge state flags from atomic_check.
> 
> Signed-off-by: Nikhil Devshatwar 
> ---
>  drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 ++
>  drivers/gpu/drm/bridge/ti-tfp410.c  | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> index 2cd809eed827..9c17e4bb517e 100644
> --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> @@ -2121,6 +2121,12 @@ static int cdns_mhdp_atomic_check(struct drm_bridge *bridge,
>   return -EINVAL;
>   }
>  
> + /*
> +  * There might be flags negotiation supported in future
> +  * Set the bus flags in atomic_check statically for now
> +  */
> + bridge_state->input_bus_cfg.flags = bridge->timings->input_bus_flags;
> +
>   mutex_unlock(&mhdp->link_mutex);
>   return 0;
>  }
> diff --git a/drivers/gpu/drm/bridge/ti-tfp410.c b/drivers/gpu/drm/bridge/ti-tfp410.c
> index 4c536df003c8..9035d2145a28 100644
> --- a/drivers/gpu/drm/bridge/ti-tfp410.c
> +++ b/drivers/gpu/drm/bridge/ti-tfp410.c
> @@ -245,6 +245,7 @@ int tfp410_atomic_check(struct drm_bridge *bridge,
>* Set the bus flags in atomic_check statically for now
>*/
>   bridge_state->input_bus_cfg.flags = dvi->timings.input_bus_flags;
> + return 0;

And here is the return statement that was missing in patch 2 :-).

>  }
>  
>  static const struct drm_bridge_funcs tfp410_bridge_funcs = {



Re: [PATCH v4 3/7] drm/bridge: mhdp8546: Add minimal format negotiation

2020-12-04 Thread Boris Brezillon
On Tue, 1 Dec 2020 17:48:26 +0530
Nikhil Devshatwar  wrote:

> With new connector model, mhdp bridge will not create the connector and
> SoC driver will rely on format negotiation to setup the encoder format.
> 
> Support minimal format negotiations hooks in the drm_bridge_funcs.
> Complete format negotiation can be added based on EDID data.
> This patch adds the minimal required support to avoid failure
> after moving to new connector model.
> 
> Signed-off-by: Nikhil Devshatwar 
> Reviewed-by: Tomi Valkeinen 
> ---
> 
> Notes:
> changes from v1:
> * cosmetic fixes, commit message update
> 
>  .../drm/bridge/cadence/cdns-mhdp8546-core.c   | 25 +++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> index d0c65610ebb5..2cd809eed827 100644
> --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> @@ -2078,6 +2078,30 @@ cdns_mhdp_bridge_atomic_reset(struct drm_bridge *bridge)
>   return &cdns_mhdp_state->base;
>  }
>  
> +static u32 *cdns_mhdp_get_input_bus_fmts(struct drm_bridge *bridge,
> +   struct drm_bridge_state *bridge_state,
> +   struct drm_crtc_state *crtc_state,
> +   struct drm_connector_state *conn_state,
> +   u32 output_fmt,
> +   unsigned int *num_input_fmts)
> +{
> + u32 *input_fmts;
> + u32 default_bus_format = MEDIA_BUS_FMT_RGB121212_1X36;
> +
> + *num_input_fmts = 0;
> +
> + if (output_fmt != MEDIA_BUS_FMT_FIXED)
> + return NULL;
> +
> + input_fmts = kzalloc(sizeof(*input_fmts), GFP_KERNEL);
> + if (!input_fmts)
> + return NULL;
> +
> + *num_input_fmts = 1;
> + input_fmts[0] = default_bus_format;

Why not 

input_fmts[0] = MEDIA_BUS_FMT_RGB121212_1X36;

?

This way you could get rid of the default_bus_format variable.

> + return input_fmts;
> +}
> +
>  static int cdns_mhdp_atomic_check(struct drm_bridge *bridge,
> struct drm_bridge_state *bridge_state,
> struct drm_crtc_state *crtc_state,
> @@ -2142,6 +2166,7 @@ static const struct drm_bridge_funcs cdns_mhdp_bridge_funcs = {
>   .atomic_duplicate_state = cdns_mhdp_bridge_atomic_duplicate_state,
>   .atomic_destroy_state = cdns_mhdp_bridge_atomic_destroy_state,
>   .atomic_reset = cdns_mhdp_bridge_atomic_reset,
> + .atomic_get_input_bus_fmts = cdns_mhdp_get_input_bus_fmts,
>   .detect = cdns_mhdp_bridge_detect,
>   .get_edid = cdns_mhdp_bridge_get_edid,
>   .hpd_enable = cdns_mhdp_bridge_hpd_enable,



Re: [PATCH v4 4/7] drm/bridge: mhdp8546: Set input_bus_flags from atomic_check

2020-12-04 Thread Boris Brezillon
On Tue, 1 Dec 2020 17:48:27 +0530
Nikhil Devshatwar  wrote:

> input_bus_flags are specified in drm_bridge_timings (legacy) as well
> as drm_bridge_state->input_bus_cfg.flags
> 
> The flags from the timings will be deprecated. Bridges are supposed
> to validate and set the bridge state flags from atomic_check.
> 
> Signed-off-by: Nikhil Devshatwar 
> ---
>  drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 ++
>  drivers/gpu/drm/bridge/ti-tfp410.c  | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> index 2cd809eed827..9c17e4bb517e 100644
> --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> @@ -2121,6 +2121,12 @@ static int cdns_mhdp_atomic_check(struct drm_bridge *bridge,
>   return -EINVAL;
>   }
>  
> + /*
> +  * There might be flags negotiation supported in future
> +  * Set the bus flags in atomic_check statically for now
> +  */
> + bridge_state->input_bus_cfg.flags = bridge->timings->input_bus_flags;

I'd go even further and replace the timings field in
cdns_mhdp_platform_info by an input_bus_flags field, you'll then
have the following assignment here.

	if (mhdp->info)
		bridge_state->input_bus_cfg.flags = mhdp->info->input_bus_flags;

This way you no longer rely on the presence of bridge->timings and can
get rid of the mhdp->bridge.timings assignment in the probe path.


> +
>   mutex_unlock(&mhdp->link_mutex);
>   return 0;
>  }



Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector

2020-12-04 Thread Boris Brezillon
On Tue, 1 Dec 2020 17:48:28 +0530
Nikhil Devshatwar  wrote:

> Remove the old code to iterate over the bridge chain, as this is
> already done by the framework.
> The bridge state should have the negotiated bus format and flags.
> Use these from the bridge's state.
> If the bridge does not support format negotiation, error out
> and fail.

That'd be even better if you implement the bridge interface instead of
the encoder one so we can get rid of the encoder_{helper}_funcs and use
drm_simple_encoder_init().

> 
> Signed-off-by: Nikhil Devshatwar 
> Reviewed-by: Tomi Valkeinen 
> ---
> 
> Notes:
> changes from v3:
> * cosmetic updates
> changes from v2:
> * Remove the old code and use the flags from the bridge state
> 
>  drivers/gpu/drm/tidss/tidss_encoder.c | 36 +++
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tidss/tidss_encoder.c b/drivers/gpu/drm/tidss/tidss_encoder.c
> index e278a9c89476..5deb8102e4d3 100644
> --- a/drivers/gpu/drm/tidss/tidss_encoder.c
> +++ b/drivers/gpu/drm/tidss/tidss_encoder.c
> @@ -21,37 +21,29 @@ static int tidss_encoder_atomic_check(struct drm_encoder *encoder,
>  {
>   struct drm_device *ddev = encoder->dev;
>   struct tidss_crtc_state *tcrtc_state = to_tidss_crtc_state(crtc_state);
> - struct drm_display_info *di = &conn_state->connector->display_info;
> + struct drm_bridge_state *bstate;
>   struct drm_bridge *bridge;
> - bool bus_flags_set = false;
>  
>   dev_dbg(ddev->dev, "%s\n", __func__);
>  
> - /*
> -  * Take the bus_flags from the first bridge that defines
> -  * bridge timings, or from the connector's display_info if no
> -  * bridge defines the timings.
> -  */
> - drm_for_each_bridge_in_chain(encoder, bridge) {
> - if (!bridge->timings)
> - continue;
> -
> - tcrtc_state->bus_flags = bridge->timings->input_bus_flags;
> - bus_flags_set = true;
> - break;
> + /* Copy the bus_format and flags from the first bridge's state */
> + bridge = drm_bridge_chain_get_first_bridge(encoder);
> + bstate = drm_atomic_get_new_bridge_state(crtc_state->state, bridge);
> + if (!bstate) {
> + dev_err(ddev->dev, "Could not get the bridge state\n");
> + return -EINVAL;
>   }
>  
> - if (!di->bus_formats || di->num_bus_formats == 0)  {
> - dev_err(ddev->dev, "%s: No bus_formats in connected display\n",
> - __func__);
> + tcrtc_state->bus_format = bstate->input_bus_cfg.format;
> + tcrtc_state->bus_flags = bstate->input_bus_cfg.flags;
> +
> + if (tcrtc_state->bus_format == 0 ||
> + tcrtc_state->bus_format == MEDIA_BUS_FMT_FIXED) {
> +
> + dev_err(ddev->dev, "Bridge connected to the encoder did not specify media bus format\n");
>   return -EINVAL;
>   }
>  
> - // XXX any cleaner way to set bus format and flags?
> - tcrtc_state->bus_format = di->bus_formats[0];
> - if (!bus_flags_set)
> - tcrtc_state->bus_flags = di->bus_flags;
> -
>   return 0;
>  }
>  



Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector

2020-12-04 Thread Boris Brezillon
On Fri, 4 Dec 2020 12:56:27 +0200
Tomi Valkeinen  wrote:

> Hi Boris,
> 
> On 04/12/2020 12:50, Boris Brezillon wrote:
> > On Tue, 1 Dec 2020 17:48:28 +0530
> > Nikhil Devshatwar  wrote:
> >   
> >> Remove the old code to iterate over the bridge chain, as this is
> >> already done by the framework.
> >> The bridge state should have the negotiated bus format and flags.
> >> Use these from the bridge's state.
> >> If the bridge does not support format negotiation, error out
> >> and fail.  
> > 
> > That'd be even better if you implement the bridge interface instead of
> > the encoder one so we can get rid of the encoder_{helper}_funcs and use
> > drm_simple_encoder_init().  
> 
> I'm a bit confused here. What should be the design here...
> 
> These formats need to be handled by the crtc (well, the display controller, 
> which is modeled as the
> crtc). Should we have a tidss_crtc.c file, which implements a crtc, a simple 
> encoder and a bridge?
> And just consider those three DRM components as different API parts of the 
> display controller?

The idea is to hide the encoder abstraction from drivers as much as we
can. We have to keep the encoder concept because that's what we expose
to userspace, but drivers shouldn't have to worry about the distinction
between the first bridge in the chain (what we currently call encoders)
and other bridges. The bridge interface provides pretty much the same
functionality, so all you need to do is turn your encoder driver into a
bridge driver (see what we did for
drivers/gpu/drm/imx/parallel-display.c), the only particularity here is
that the bridge knows it's the first in the chain, and has access to
the CRTC it's directly connected to.

IMHO, the fewer interfaces we keep, the clearer it gets for our users.
Getting rid of the encoder_{helper_}funcs in favor of bridge_ops would
clarify the fact that any kind of encoder, no matter if it's the first
in the chain or not, should be represented as a bridge object.


Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector

2020-12-04 Thread Boris Brezillon
On Fri, 4 Dec 2020 13:47:05 +0200
Tomi Valkeinen  wrote:

> On 04/12/2020 13:12, Boris Brezillon wrote:
> 
> >>> That'd be even better if you implement the bridge interface instead of
> >>> the encoder one so we can get rid of the encoder_{helper}_funcs and use
> >>> drm_simple_encoder_init().
> >>
> >> I'm a bit confused here. What should be the design here...
> >>
> >> These formats need to be handled by the crtc (well, the display 
> >> controller, which is modeled as the
> >> crtc). Should we have a tidss_crtc.c file, which implements a crtc, a 
> >> simple encoder and a bridge?
> >> And just consider those three DRM components as different API parts of the 
> >> display controller?  
> > 
> > The idea is to hide the encoder abstraction from drivers as much as we
> > can. We have to keep the encoder concept because that's what we expose
> > to userspace, but drivers shouldn't have to worry about the distinction
> > between the first bridge in the chain (what we currently call encoders)
> > and other bridges. The bridge interface provides pretty much the same
> > functionality, so all you need to do is turn your encoder driver into a
> > bridge driver (see what we did for
> > drivers/gpu/drm/imx/parallel-display.c), the only particularity here is
> > that the bridge knows it's the first in the chain, and has access to
> > the CRTC it's directly connected to.  
> 
> With a quick look, the imx parallel-display.c seems to be really part of the 
> crtc. Shouldn't we then
> take one more step forward, and just combine the crtc, encoder and bridge 
> (somehow)? That's kind of
> what parallel-display.c is doing, isn't it? It's directly poking the crtc 
> state, but the code just
> happens to be in a separate file from the crtc.

Right. If we want to keep the code split, the logic should probably be
reversed, with the CRTC driver checking the first bridge state to setup
its internal state. Those DPI encoders are always a bit special, since
they tend to be directly embedded in the block responsible for timing
control (what we call CRTCs), so you're right, maybe we should model
that as a CRTC+bridge pair.

> 
> Thinking about the TI HW, we have a bunch of internal IPs which are clearly 
> bridges: HDMI, DSI,
> which get the pixel data from the display controller, and they have their own 
> registers, so clearly
> independent bridges.
> 
> Then we have MIPI DPI, which doesn't really have its own registers, as it's 
> just plain parallel RGB,
> the same as what is sent to e.g. HDMI and DSI bridges.

I still consider that one a bridge, even if the translation is almost
transparent from a HW PoV.

> However, there might be some muxes or
> regulators to set up to get the external signals working. So a bridge would 
> be ok here too to handle
> that external side.

Exactly.

> 
> But in all the above cases, we have the display controller (crtc), which in 
> all the cases needs to
> do e.g. pixel/bus format configuration. So why add direct crtc poking into 
> the first bridges (HDMI,
> DSI, DPI), when we could just model the crtc as a bridge, and thus the first 
> bridges wouldn't need
> to touch the crtc.

Yes, that's an option. I mean, you can already modify your CRTC
logic to implement the bridge and CRTC interface and have your
driver-specific CRTC object embed both a drm_crtc and a drm_bridge.



[PATCH v5 01/16] drm/sched: Document what the timedout_job method should do

2021-06-29 Thread Boris Brezillon
The documentation is a bit vague and doesn't really describe what the
->timedout_job() is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter 
Signed-off-by: Boris Brezillon 
---
 include/drm/gpu_scheduler.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 * @timedout_job: Called when a job has taken too long to execute,
 * to trigger GPU recovery.
 *
+* This method is called in a workqueue context.
+*
+* Drivers typically issue a reset to recover from GPU hangs, and this
+* procedure usually follows the following workflow:
+*
+* 1. Stop the scheduler using drm_sched_stop(). This will park the
+*scheduler thread and cancel the timeout work, guaranteeing that
+*nothing is queued while we reset the hardware queue
+* 2. Try to gracefully stop non-faulty jobs (optional)
+* 3. Issue a GPU reset (driver-specific)
+* 4. Re-submit jobs using drm_sched_resubmit_jobs()
+* 5. Restart the scheduler using drm_sched_start(). At that point, new
+*jobs can be queued, and the scheduler thread is unblocked
+*
 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 * and the underlying driver has started or completed recovery.
 *
-- 
2.31.1



[PATCH v5 00/16] drm/panfrost: Misc improvements

2021-06-29 Thread Boris Brezillon


Hello,

This is a merge of [1] and [2] since the second series depends on
patches in the preparatory series.

Main changes in this v5:
* Document what's expected in the ->timedout_job() hook
* Add a patch increasing the AS_ACTIVE polling timeout
* Fix a few minor things here and there (see each commit for a detailed
  changelog) and collect R-b/A-b tags

Regards,

Boris

Boris Brezillon (15):
  drm/sched: Document what the timedout_job method should do
  drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
  drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate
  drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition
  drm/panfrost: Drop the pfdev argument passed to
panfrost_exception_name()
  drm/panfrost: Do the exception -> string translation using a table
  drm/panfrost: Expose a helper to trigger a GPU reset
  drm/panfrost: Use a threaded IRQ for job interrupts
  drm/panfrost: Simplify the reset serialization logic
  drm/panfrost: Make sure job interrupts are masked before resetting
  drm/panfrost: Disable the AS on unhandled page faults
  drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck
  drm/panfrost: Don't reset the GPU on job faults unless we really have
to
  drm/panfrost: Kill in-flight jobs on FD close
  drm/panfrost: Increase the AS_ACTIVE polling timeout

Steven Price (1):
  drm/panfrost: Queue jobs on the hardware

 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c|   3 +-
 drivers/gpu/drm/lima/lima_sched.c  |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++--
 drivers/gpu/drm/panfrost/panfrost_device.h |  91 ++-
 drivers/gpu/drm/panfrost/panfrost_gpu.c|   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 630 +++--
 drivers/gpu/drm/panfrost/panfrost_mmu.c|  43 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h   |   3 -
 drivers/gpu/drm/scheduler/sched_main.c |  14 +-
 drivers/gpu/drm/v3d/v3d_sched.c|  10 +-
 include/drm/gpu_scheduler.h|  37 +-
 12 files changed, 721 insertions(+), 256 deletions(-)

-- 
2.31.1



[PATCH v5 03/16] drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate

2021-06-29 Thread Boris Brezillon
If the fence creation fails, we can return the error pointer directly.
The core will update the fence error accordingly.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 8ff79fd49577..d6c9698bca3b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -355,7 +355,7 @@ static struct dma_fence *panfrost_job_run(struct 
drm_sched_job *sched_job)
 
fence = panfrost_fence_create(pfdev, slot);
if (IS_ERR(fence))
-   return NULL;
+   return fence;
 
if (job->done_fence)
dma_fence_put(job->done_fence);
-- 
2.31.1



[PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-06-29 Thread Boris Brezillon
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
reset. This leads to extra complexity when we need to synchronize the
timeout work items with the reset work. One solution to address that is
to have an
ordered workqueue at the driver level that will be used by the different
schedulers to queue their timeout work. Thanks to the serialization
provided by the ordered workqueue we are guaranteed that timeout
handlers are executed sequentially, and can thus easily reset the GPU
from the timeout handler without extra synchronization.

v5:
* Add a new paragraph to the timedout_job() method

v3:
* New patch

v4:
* Actually use the timeout_wq to queue the timeout work

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Lucas Stach 
Cc: Qiang Yu 
Cc: Emma Anholt 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
 drivers/gpu/drm/lima/lima_sched.c |  3 ++-
 drivers/gpu/drm/panfrost/panfrost_job.c   |  3 ++-
 drivers/gpu/drm/scheduler/sched_main.c| 14 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 10 +-
 include/drm/gpu_scheduler.h   | 23 ++-
 7 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 47ea46859618..532636ea20bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -488,7 +488,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 
r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
   num_hw_submission, amdgpu_job_hang_limit,
-  timeout, sched_score, ring->name);
+  timeout, NULL, sched_score, ring->name);
if (r) {
DRM_ERROR("Failed to create scheduler on ring %s.\n",
  ring->name);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 19826e504efc..feb6da1b6ceb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -190,7 +190,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 
ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
 etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
-msecs_to_jiffies(500), NULL, dev_name(gpu->dev));
+msecs_to_jiffies(500), NULL, NULL,
+dev_name(gpu->dev));
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c 
b/drivers/gpu/drm/lima/lima_sched.c
index ecf3267334ff..dba8329937a3 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -508,7 +508,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, 
const char *name)
INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
- lima_job_hang_limit, msecs_to_jiffies(timeout),
+ lima_job_hang_limit,
+ msecs_to_jiffies(timeout), NULL,
  NULL, name);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 682f2161b999..8ff79fd49577 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -626,7 +626,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 
ret = drm_sched_init(&js->queue[j].sched,
 &panfrost_sched_ops,
-1, 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
+1, 0,
+msecs_to_jiffies(JOB_TIMEOUT_MS), NULL,
 NULL, "pan_js");
if (ret) {
dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
ret);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c0a2f8f8d472..3e180f0d4305 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -232,7 +232,7 @@ static void drm_sched_start_timeout(struct 
drm_gpu_scheduler *sched)
 {
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
!list_empty(&sched->pending_list))
-   schedule_delayed_work(&sched->work_tdr, sched->timeout);
+   queue_delayed_work(sched->timeout_wq, &sched->work_tdr, 
sched->timeout);
 }
 
 /**
@@ -244,7 +244,7 @@ static void drm_sched_start_timeout(struct 
drm_gpu_scheduler *sched)
  */
 void drm_sched_fault(struct 

[PATCH v5 09/16] drm/panfrost: Simplify the reset serialization logic

2021-06-29 Thread Boris Brezillon
Now that we can pass our own workqueue to drm_sched_init(), we can use
an ordered workqueue for both the scheduler timeout tdr and our own
reset work (which we use when the reset is not caused by a fault/timeout
on a specific job, like when the AS_ACTIVE bit is stuck). This
guarantees that the timeout handlers and reset handler can't run
concurrently which drastically simplifies the locking.

v4:
* Actually pass the reset workqueue to drm_sched_init()
* Don't call cancel_work_sync() in panfrost_reset(). It will deadlock
  since it might be called from the reset work, which is executing and
  cancel_work_sync() will wait for the handler to return. Checking the
  reset pending status should avoid spurious resets

v3:
* New patch

Suggested-by: Daniel Vetter 
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |   6 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 187 -
 2 files changed, 72 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index f2190f90be75..59a487e8aba3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -108,6 +108,7 @@ struct panfrost_device {
struct mutex sched_lock;
 
struct {
+   struct workqueue_struct *wq;
struct work_struct work;
atomic_t pending;
} reset;
@@ -246,9 +247,8 @@ const char *panfrost_exception_name(u32 exception_code);
 static inline void
 panfrost_device_schedule_reset(struct panfrost_device *pfdev)
 {
-   /* Schedule a reset if there's no reset in progress. */
-   if (!atomic_xchg(&pfdev->reset.pending, 1))
-   schedule_work(&pfdev->reset.work);
+   atomic_set(&pfdev->reset.pending, 1);
+   queue_work(pfdev->reset.wq, &pfdev->reset.work);
 }
 
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index e0c479e67304..98193a557a2d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -25,17 +25,8 @@
 #define job_write(dev, reg, data) writel(data, dev->iomem + (reg))
 #define job_read(dev, reg) readl(dev->iomem + (reg))
 
-enum panfrost_queue_status {
-   PANFROST_QUEUE_STATUS_ACTIVE,
-   PANFROST_QUEUE_STATUS_STOPPED,
-   PANFROST_QUEUE_STATUS_STARTING,
-   PANFROST_QUEUE_STATUS_FAULT_PENDING,
-};
-
 struct panfrost_queue_state {
struct drm_gpu_scheduler sched;
-   atomic_t status;
-   struct mutex lock;
u64 fence_context;
u64 emit_seqno;
 };
@@ -379,57 +370,72 @@ void panfrost_job_enable_interrupts(struct 
panfrost_device *pfdev)
job_write(pfdev, JOB_INT_MASK, irq_mask);
 }
 
-static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue,
-   struct drm_sched_job *bad)
+static void panfrost_reset(struct panfrost_device *pfdev,
+  struct drm_sched_job *bad)
 {
-   enum panfrost_queue_status old_status;
-   bool stopped = false;
+   unsigned int i;
+   bool cookie;
 
-   mutex_lock(&queue->lock);
-   old_status = atomic_xchg(&queue->status,
-PANFROST_QUEUE_STATUS_STOPPED);
-   if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
-   goto out;
+   if (!atomic_read(&pfdev->reset.pending))
+   return;
+
+   /* Stop the schedulers.
+*
+* FIXME: We temporarily get out of the dma_fence_signalling section
+* because the cleanup path generates lockdep splats when taking locks
+* to release job resources. We should rework the code to follow this
+* pattern:
+*
+*  try_lock
+*  if (locked)
+*  release
+*  else
+*  schedule_work_to_release_later
+*/
+   for (i = 0; i < NUM_JOB_SLOTS; i++)
+   drm_sched_stop(&pfdev->js->queue[i].sched, bad);
+
+   cookie = dma_fence_begin_signalling();
 
-   WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);
-   drm_sched_stop(&queue->sched, bad);
if (bad)
drm_sched_increase_karma(bad);
 
-   stopped = true;
+   spin_lock(&pfdev->js->job_lock);
+   for (i = 0; i < NUM_JOB_SLOTS; i++) {
+   if (pfdev->jobs[i]) {
+   pm_runtime_put_noidle(pfdev->dev);
+   panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
+   pfdev->jobs[i] = NULL;
+   }
+   }
+   spin_unlock(&pfdev->js->job_lock);
 
-   /*
-* Set the timeout to max so the timer doesn't get started
-* when we return from the timeout handler (restore

[PATCH v5 07/16] drm/panfrost: Expose a helper to trigger a GPU reset

2021-06-29 Thread Boris Brezillon
Expose a helper to trigger a GPU reset so we can easily trigger reset
operations outside the job timeout handler.
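The atomic_xchg() trick in the helper below — only queue the reset work
when the pending flag was not already set — can be demonstrated with C11
atomics in userspace; schedule_work() is stubbed, and only the
deduplication logic mirrors the patch:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int reset_pending;
static int queued; /* counts stubbed schedule_work() calls */

static void schedule_work_stub(void) { queued++; }

/* Mirrors panfrost_device_schedule_reset(): schedule a reset only
 * if no reset is already pending. */
static void schedule_reset(void)
{
	if (!atomic_exchange(&reset_pending, 1))
		schedule_work_stub();
}

/* The worker clears the flag before performing the actual reset, so a
 * new request arriving afterwards queues the work again. */
static void reset_work(void)
{
	atomic_store(&reset_pending, 0);
	/* ... reset the GPU here ... */
}
```

Two back-to-back requests thus coalesce into a single queued work item.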

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 8 
 drivers/gpu/drm/panfrost/panfrost_job.c| 4 +---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 4c876476268f..f2190f90be75 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -243,4 +243,12 @@ enum drm_panfrost_exception_type {
 
 const char *panfrost_exception_name(u32 exception_code);
 
+static inline void
+panfrost_device_schedule_reset(struct panfrost_device *pfdev)
+{
+   /* Schedule a reset if there's no reset in progress. */
+   if (!atomic_xchg(&pfdev->reset.pending, 1))
+   schedule_work(&pfdev->reset.work);
+}
+
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 3cd1aec6c261..be8f68f63974 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -458,9 +458,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct 
drm_sched_job
if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
return DRM_GPU_SCHED_STAT_NOMINAL;
 
-   /* Schedule a reset if there's no reset in progress. */
-   if (!atomic_xchg(&pfdev->reset.pending, 1))
-   schedule_work(&pfdev->reset.work);
+   panfrost_device_schedule_reset(pfdev);
 
return DRM_GPU_SCHED_STAT_NOMINAL;
 }
-- 
2.31.1



[PATCH v5 12/16] drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck

2021-06-29 Thread Boris Brezillon
Things are unlikely to resolve until we reset the GPU. Let's not wait
for other faults/timeouts to happen before triggering this reset.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 65e98c51cb66..5267c3a1f02f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -36,8 +36,11 @@ static int wait_ready(struct panfrost_device *pfdev, u32 
as_nr)
ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr),
val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000);
 
-   if (ret)
+   if (ret) {
+   /* The GPU hung, let's trigger a reset */
+   panfrost_device_schedule_reset(pfdev);
dev_err(pfdev->dev, "AS_ACTIVE bit stuck\n");
+   }
 
return ret;
 }
-- 
2.31.1



[PATCH v5 08/16] drm/panfrost: Use a threaded IRQ for job interrupts

2021-06-29 Thread Boris Brezillon
This should avoid switching to interrupt context when the GPU is under
heavy use.
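The hard-half/threaded-half split used in this patch can be sketched in
userspace C: the hard handler masks the interrupt and wakes the thread,
and the thread drains the raw status in a loop before unmasking.
Register accesses are simulated with plain variables and the names are
illustrative, not the driver's actual helpers:

```c
#include <assert.h>

/* Simulated job interrupt registers. */
static unsigned int int_rawstat; /* latched status, survives masking */
static unsigned int int_mask;
static int handled;              /* count of serviced status bits */

static void handle_status(unsigned int status)
{
	/* Service every set bit, as panfrost_job_handle_irq() would. */
	while (status) {
		status &= status - 1; /* clear lowest set bit */
		handled++;
	}
	int_rawstat = 0; /* acknowledge what we just handled */
}

/* Hard half: mask the interrupt and defer to the thread. */
static int hard_irq(void)
{
	if (!int_rawstat)
		return 0; /* IRQ_NONE */
	int_mask = 0;
	return 1;         /* IRQ_WAKE_THREAD */
}

/* Threaded half: drain until no new status is latched, then unmask. */
static void thread_irq(void)
{
	unsigned int status = int_rawstat;

	while (status) {
		handle_status(status);
		status = int_rawstat; /* pick up IRQs raised meanwhile */
	}
	int_mask = ~0u;
}
```

The re-read of the raw status before unmasking is what keeps the thread
from missing interrupts that arrive while it is running.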

v3:
* Don't take the job_lock in panfrost_job_handle_irq()

Signed-off-by: Boris Brezillon 
Acked-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 53 ++---
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index be8f68f63974..e0c479e67304 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -470,19 +470,12 @@ static const struct drm_sched_backend_ops 
panfrost_sched_ops = {
.free_job = panfrost_job_free
 };
 
-static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
+static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
 {
-   struct panfrost_device *pfdev = data;
-   u32 status = job_read(pfdev, JOB_INT_STAT);
int j;
 
dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status);
 
-   if (!status)
-   return IRQ_NONE;
-
-   pm_runtime_mark_last_busy(pfdev->dev);
-
for (j = 0; status; j++) {
u32 mask = MK_JS_MASK(j);
 
@@ -519,7 +512,6 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
if (status & JOB_INT_MASK_DONE(j)) {
struct panfrost_job *job;
 
-   spin_lock(&pfdev->js->job_lock);
job = pfdev->jobs[j];
/* Only NULL if job timeout occurred */
if (job) {
@@ -531,21 +523,49 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
dma_fence_signal_locked(job->done_fence);
pm_runtime_put_autosuspend(pfdev->dev);
}
-   spin_unlock(&pfdev->js->job_lock);
}
 
status &= ~mask;
}
+}
 
+static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data)
+{
+   struct panfrost_device *pfdev = data;
+   u32 status = job_read(pfdev, JOB_INT_RAWSTAT);
+
+   while (status) {
+   pm_runtime_mark_last_busy(pfdev->dev);
+
+   spin_lock(&pfdev->js->job_lock);
+   panfrost_job_handle_irq(pfdev, status);
+   spin_unlock(&pfdev->js->job_lock);
+   status = job_read(pfdev, JOB_INT_RAWSTAT);
+   }
+
+   job_write(pfdev, JOB_INT_MASK,
+ GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
+ GENMASK(NUM_JOB_SLOTS - 1, 0));
return IRQ_HANDLED;
 }
 
+static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
+{
+   struct panfrost_device *pfdev = data;
+   u32 status = job_read(pfdev, JOB_INT_STAT);
+
+   if (!status)
+   return IRQ_NONE;
+
+   job_write(pfdev, JOB_INT_MASK, 0);
+   return IRQ_WAKE_THREAD;
+}
+
 static void panfrost_reset(struct work_struct *work)
 {
struct panfrost_device *pfdev = container_of(work,
 struct panfrost_device,
 reset.work);
-   unsigned long flags;
unsigned int i;
bool cookie;
 
@@ -575,7 +595,7 @@ static void panfrost_reset(struct work_struct *work)
/* All timers have been stopped, we can safely reset the pending state. 
*/
atomic_set(&pfdev->reset.pending, 0);
 
-   spin_lock_irqsave(&pfdev->js->job_lock, flags);
+   spin_lock(&pfdev->js->job_lock);
for (i = 0; i < NUM_JOB_SLOTS; i++) {
if (pfdev->jobs[i]) {
pm_runtime_put_noidle(pfdev->dev);
@@ -583,7 +603,7 @@ static void panfrost_reset(struct work_struct *work)
pfdev->jobs[i] = NULL;
}
}
-   spin_unlock_irqrestore(&pfdev->js->job_lock, flags);
+   spin_unlock(&pfdev->js->job_lock);
 
panfrost_device_reset(pfdev);
 
@@ -610,8 +630,11 @@ int panfrost_job_init(struct panfrost_device *pfdev)
if (irq <= 0)
return -ENODEV;
 
-   ret = devm_request_irq(pfdev->dev, irq, panfrost_job_irq_handler,
-  IRQF_SHARED, KBUILD_MODNAME "-job", pfdev);
+   ret = devm_request_threaded_irq(pfdev->dev, irq,
+   panfrost_job_irq_handler,
+   panfrost_job_irq_handler_thread,
+   IRQF_SHARED, KBUILD_MODNAME "-job",
+   pfdev);
if (ret) {
dev_err(pfdev->dev, "failed to request job irq");
return ret;
-- 
2.31.1



[PATCH v5 05/16] drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name()

2021-06-29 Thread Boris Brezillon
Currently unused. We'll add it back if we need per-GPU definitions.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 2 +-
 drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
 drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 2 +-
 drivers/gpu/drm/panfrost/panfrost_mmu.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index fbcf5edbe367..bce6b0aff05e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -292,7 +292,7 @@ void panfrost_device_fini(struct panfrost_device *pfdev)
panfrost_clk_fini(pfdev);
 }
 
-const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 
exception_code)
+const char *panfrost_exception_name(u32 exception_code)
 {
switch (exception_code) {
/* Non-Fault Status code */
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 4c6bdea5537b..ade8a1974ee9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -172,6 +172,6 @@ void panfrost_device_reset(struct panfrost_device *pfdev);
 int panfrost_device_resume(struct device *dev);
 int panfrost_device_suspend(struct device *dev);
 
-const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 
exception_code);
+const char *panfrost_exception_name(u32 exception_code);
 
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 2aae636f1cf5..ec59f15940fb 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void 
*data)
address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
 
dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
-fault_status & 0xFF, panfrost_exception_name(pfdev, 
fault_status),
+fault_status & 0xFF, 
panfrost_exception_name(fault_status),
 address);
 
if (state & GPU_IRQ_MULTIPLE_FAULT)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index d6c9698bca3b..3cd1aec6c261 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -500,7 +500,7 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
 
dev_err(pfdev->dev, "js fault, js=%d, status=%s, 
head=0x%x, tail=0x%x",
j,
-   panfrost_exception_name(pfdev, job_read(pfdev, 
JS_STATUS(j))),
+   panfrost_exception_name(job_read(pfdev, 
JS_STATUS(j))),
job_read(pfdev, JS_HEAD_LO(j)),
job_read(pfdev, JS_TAIL_LO(j)));
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index d76dff201ea6..b4f0c673cd7f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -676,7 +676,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, 
void *data)
"TODO",
fault_status,
(fault_status & (1 << 10) ? "DECODER FAULT" : 
"SLAVE FAULT"),
-   exception_type, panfrost_exception_name(pfdev, 
exception_type),
+   exception_type, 
panfrost_exception_name(exception_type),
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
-- 
2.31.1



[PATCH v5 06/16] drm/panfrost: Do the exception -> string translation using a table

2021-06-29 Thread Boris Brezillon
Do the exception -> string translation using a table. This way we get
rid of those magic numbers and can easily add new fields if we need
to attach extra information to exception types.
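The table-based lookup this patch introduces can be reduced to a small
standalone C example. The enum values and macro mirror the patch's
structure, with the name list shortened for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum exception_type {
	EXCEPTION_OK = 0x00,
	EXCEPTION_DONE = 0x01,
	EXCEPTION_JOB_CONFIG_FAULT = 0x40,
	EXCEPTION_OOM = 0x60,
};

struct exception_info {
	const char *name;
};

/* Designated initializers keep the table sparse but indexable by the
 * raw exception code, like panfrost_exception_infos[]. */
#define EXCEPTION(id) [EXCEPTION_ ## id] = { .name = #id }

static const struct exception_info exception_infos[] = {
	EXCEPTION(OK),
	EXCEPTION(DONE),
	EXCEPTION(JOB_CONFIG_FAULT),
	EXCEPTION(OOM),
};

static const char *exception_name(unsigned int code)
{
	/* Out-of-range or unpopulated slots fall back to "UNKNOWN". */
	if (code >= sizeof(exception_infos) / sizeof(exception_infos[0]) ||
	    !exception_infos[code].name)
		return "UNKNOWN";
	return exception_infos[code].name;
}
```

Adding a field (e.g. a per-exception flag) then only means extending
struct exception_info and the macro, not every switch arm.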

v4:
* Don't expose exception type to userspace
* Merge the enum definition and the enum -> string table declaration
  in the same patch

v3:
* Drop the error field

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 130 +
 drivers/gpu/drm/panfrost/panfrost_device.h |  69 +++
 2 files changed, 152 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index bce6b0aff05e..736854542b05 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -292,55 +292,91 @@ void panfrost_device_fini(struct panfrost_device *pfdev)
panfrost_clk_fini(pfdev);
 }
 
-const char *panfrost_exception_name(u32 exception_code)
-{
-   switch (exception_code) {
-   /* Non-Fault Status code */
-   case 0x00: return "NOT_STARTED/IDLE/OK";
-   case 0x01: return "DONE";
-   case 0x02: return "INTERRUPTED";
-   case 0x03: return "STOPPED";
-   case 0x04: return "TERMINATED";
-   case 0x08: return "ACTIVE";
-   /* Job exceptions */
-   case 0x40: return "JOB_CONFIG_FAULT";
-   case 0x41: return "JOB_POWER_FAULT";
-   case 0x42: return "JOB_READ_FAULT";
-   case 0x43: return "JOB_WRITE_FAULT";
-   case 0x44: return "JOB_AFFINITY_FAULT";
-   case 0x48: return "JOB_BUS_FAULT";
-   case 0x50: return "INSTR_INVALID_PC";
-   case 0x51: return "INSTR_INVALID_ENC";
-   case 0x52: return "INSTR_TYPE_MISMATCH";
-   case 0x53: return "INSTR_OPERAND_FAULT";
-   case 0x54: return "INSTR_TLS_FAULT";
-   case 0x55: return "INSTR_BARRIER_FAULT";
-   case 0x56: return "INSTR_ALIGN_FAULT";
-   case 0x58: return "DATA_INVALID_FAULT";
-   case 0x59: return "TILE_RANGE_FAULT";
-   case 0x5A: return "ADDR_RANGE_FAULT";
-   case 0x60: return "OUT_OF_MEMORY";
-   /* GPU exceptions */
-   case 0x80: return "DELAYED_BUS_FAULT";
-   case 0x88: return "SHAREABILITY_FAULT";
-   /* MMU exceptions */
-   case 0xC1: return "TRANSLATION_FAULT_LEVEL1";
-   case 0xC2: return "TRANSLATION_FAULT_LEVEL2";
-   case 0xC3: return "TRANSLATION_FAULT_LEVEL3";
-   case 0xC4: return "TRANSLATION_FAULT_LEVEL4";
-   case 0xC8: return "PERMISSION_FAULT";
-   case 0xC9 ... 0xCF: return "PERMISSION_FAULT";
-   case 0xD1: return "TRANSTAB_BUS_FAULT_LEVEL1";
-   case 0xD2: return "TRANSTAB_BUS_FAULT_LEVEL2";
-   case 0xD3: return "TRANSTAB_BUS_FAULT_LEVEL3";
-   case 0xD4: return "TRANSTAB_BUS_FAULT_LEVEL4";
-   case 0xD8: return "ACCESS_FLAG";
-   case 0xD9 ... 0xDF: return "ACCESS_FLAG";
-   case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
-   case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
+#define PANFROST_EXCEPTION(id) \
+   [DRM_PANFROST_EXCEPTION_ ## id] = { \
+   .name = #id, \
}
 
-   return "UNKNOWN";
+struct panfrost_exception_info {
+   const char *name;
+};
+
+static const struct panfrost_exception_info panfrost_exception_infos[] = {
+   PANFROST_EXCEPTION(OK),
+   PANFROST_EXCEPTION(DONE),
+   PANFROST_EXCEPTION(INTERRUPTED),
+   PANFROST_EXCEPTION(STOPPED),
+   PANFROST_EXCEPTION(TERMINATED),
+   PANFROST_EXCEPTION(KABOOM),
+   PANFROST_EXCEPTION(EUREKA),
+   PANFROST_EXCEPTION(ACTIVE),
+   PANFROST_EXCEPTION(JOB_CONFIG_FAULT),
+   PANFROST_EXCEPTION(JOB_POWER_FAULT),
+   PANFROST_EXCEPTION(JOB_READ_FAULT),
+   PANFROST_EXCEPTION(JOB_WRITE_FAULT),
+   PANFROST_EXCEPTION(JOB_AFFINITY_FAULT),
+   PANFROST_EXCEPTION(JOB_BUS_FAULT),
+   PANFROST_EXCEPTION(INSTR_INVALID_PC),
+   PANFROST_EXCEPTION(INSTR_INVALID_ENC),
+   PANFROST_EXCEPTION(INSTR_TYPE_MISMATCH),
+   PANFROST_EXCEPTION(INSTR_OPERAND_FAULT),
+   PANFROST_EXCEPTION(INSTR_TLS_FAULT),
+   PANFROST_EXCEPTION(INSTR_BARRIER_FAULT),
+   PANFROST_EXCEPTION(INSTR_ALIGN_FAULT),
+   PANFROST_EXCEPTION(DATA_INVALID_FAULT),
+   PANFROST_EXCEPTION(TILE_RANGE_FAULT),
+   PANFROST_EXCEPTION(ADDR_RANGE_FAULT),
+   PANFROST_EXCEPTION(IMPRECISE_FAULT),
+   PANFROST_EXCEPTION(OOM),
+   PANFROST_EXCEPTION(OOM_AFBC),
+   PANFROST_EXCEP

[PATCH v5 11/16] drm/panfrost: Disable the AS on unhandled page faults

2021-06-29 Thread Boris Brezillon
If we don't do that, we have to wait for the job timeout to expire
before the faulty jobs get killed.

v3:
* Make sure the AS is re-enabled when new jobs are submitted to the
  context

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  1 +
 drivers/gpu/drm/panfrost/panfrost_mmu.c| 34 --
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 59a487e8aba3..2dc8c0d1d987 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -96,6 +96,7 @@ struct panfrost_device {
spinlock_t as_lock;
unsigned long as_in_use_mask;
unsigned long as_alloc_mask;
+   unsigned long as_faulty_mask;
struct list_head as_lru_list;
 
struct panfrost_job_slot *js;
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index b4f0c673cd7f..65e98c51cb66 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -154,6 +154,7 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, 
struct panfrost_mmu *mmu)
as = mmu->as;
if (as >= 0) {
int en = atomic_inc_return(&mmu->as_count);
+   u32 mask = BIT(as) | BIT(16 + as);
 
/*
 * AS can be retained by active jobs or a perfcnt context,
@@ -162,6 +163,18 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, 
struct panfrost_mmu *mmu)
WARN_ON(en >= (NUM_JOB_SLOTS + 1));
 
list_move(&mmu->list, &pfdev->as_lru_list);
+
+   if (pfdev->as_faulty_mask & mask) {
+   /* Unhandled pagefault on this AS, the MMU was
+* disabled. We need to re-enable the MMU after
+* clearing+unmasking the AS interrupts.
+*/
+   mmu_write(pfdev, MMU_INT_CLEAR, mask);
+   mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask);
+   pfdev->as_faulty_mask &= ~mask;
+   panfrost_mmu_enable(pfdev, mmu);
+   }
+
goto out;
}
 
@@ -211,6 +224,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
spin_lock(&pfdev->as_lock);
 
pfdev->as_alloc_mask = 0;
+   pfdev->as_faulty_mask = 0;
 
list_for_each_entry_safe(mmu, mmu_tmp, &pfdev->as_lru_list, list) {
mmu->as = -1;
@@ -662,7 +676,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, 
void *data)
if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 
0xC0)
ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
 
-   if (ret)
+   if (ret) {
/* terminal fault, print info about the fault */
dev_err(pfdev->dev,
"Unhandled Page fault in AS%d at VA 0x%016llX\n"
@@ -680,14 +694,28 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int 
irq, void *data)
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
+   spin_lock(&pfdev->as_lock);
+   /* Ignore MMU interrupts on this AS until it's been
+* re-enabled.
+*/
+   pfdev->as_faulty_mask |= mask;
+
+   /* Disable the MMU to kill jobs on this AS. */
+   panfrost_mmu_disable(pfdev, as);
+   spin_unlock(&pfdev->as_lock);
+   }
+
status &= ~mask;
 
/* If we received new MMU interrupts, process them before 
returning. */
if (!status)
-   status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+   status = mmu_read(pfdev, MMU_INT_RAWSTAT) & 
~pfdev->as_faulty_mask;
}
 
-   mmu_write(pfdev, MMU_INT_MASK, ~0);
+   spin_lock(&pfdev->as_lock);
+   mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask);
+   spin_unlock(&pfdev->as_lock);
+
return IRQ_HANDLED;
 };
 
-- 
2.31.1



[PATCH v5 14/16] drm/panfrost: Kill in-flight jobs on FD close

2021-06-29 Thread Boris Brezillon
If the process that submitted these jobs closes the FD before the jobs
are done, it probably means it doesn't care about the result.
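The exception-to-errno mapping this patch applies when completing a job
can be expressed as a small standalone function; the constants below are
illustrative stand-ins for the DRM_PANFROST_EXCEPTION_* values:

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the exception codes used by the patch. */
#define EXCEPTION_DONE            0x01
#define EXCEPTION_TERMINATED      0x04
#define EXCEPTION_MAX_NON_FAULT   0x3f
#define EXCEPTION_JOB_POWER_FAULT 0x41

static int exception_is_fault(unsigned int code)
{
	return code > EXCEPTION_MAX_NON_FAULT;
}

/* Mirrors the error selection in panfrost_job_handle_irq():
 * TERMINATED means the job was killed on purpose (e.g. FD close),
 * real faults map to -EINVAL, everything else is success. */
static int fence_error(unsigned int js_status)
{
	if (js_status == EXCEPTION_TERMINATED)
		return -ECANCELED;
	if (exception_is_fault(js_status))
		return -EINVAL;
	return 0;
}
```

Userspace waiting on the fence can thus distinguish a deliberate
cancellation from a genuine hardware fault.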

v5:
* Add a panfrost_exception_is_fault() helper and the
  DRM_PANFROST_EXCEPTION_MAX_NON_FAULT value

v4:
* Don't disable/restore irqs when taking the job_lock (not needed since
  this lock is never taken from an interrupt context)

v3:
* Set fence error to ECANCELED when a TERMINATED exception is received

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  7 
 drivers/gpu/drm/panfrost/panfrost_job.c| 42 ++
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index d91f71366214..d2ee6e5fe5d8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -183,6 +183,7 @@ enum drm_panfrost_exception_type {
DRM_PANFROST_EXCEPTION_KABOOM = 0x05,
DRM_PANFROST_EXCEPTION_EUREKA = 0x06,
DRM_PANFROST_EXCEPTION_ACTIVE = 0x08,
+   DRM_PANFROST_EXCEPTION_MAX_NON_FAULT = 0x3f,
DRM_PANFROST_EXCEPTION_JOB_CONFIG_FAULT = 0x40,
DRM_PANFROST_EXCEPTION_JOB_POWER_FAULT = 0x41,
DRM_PANFROST_EXCEPTION_JOB_READ_FAULT = 0x42,
@@ -243,6 +244,12 @@ enum drm_panfrost_exception_type {
DRM_PANFROST_EXCEPTION_MEM_ATTR_NONCACHE_3 = 0xef,
 };
 
+static inline bool
+panfrost_exception_is_fault(u32 exception_code)
+{
+   return exception_code > DRM_PANFROST_EXCEPTION_MAX_NON_FAULT;
+}
+
 const char *panfrost_exception_name(u32 exception_code);
 bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
u32 exception_code);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index b0f4857ca084..d8e1bc227455 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -499,14 +499,21 @@ static void panfrost_job_handle_irq(struct 
panfrost_device *pfdev, u32 status)
 
if (status & JOB_INT_MASK_ERR(j)) {
u32 js_status = job_read(pfdev, JS_STATUS(j));
+   const char *exception_name = 
panfrost_exception_name(js_status);
 
job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
 
-   dev_err(pfdev->dev, "js fault, js=%d, status=%s, 
head=0x%x, tail=0x%x",
-   j,
-   panfrost_exception_name(js_status),
-   job_read(pfdev, JS_HEAD_LO(j)),
-   job_read(pfdev, JS_TAIL_LO(j)));
+   if (!panfrost_exception_is_fault(js_status)) {
+   dev_dbg(pfdev->dev, "js interrupt, js=%d, 
status=%s, head=0x%x, tail=0x%x",
+   j, exception_name,
+   job_read(pfdev, JS_HEAD_LO(j)),
+   job_read(pfdev, JS_TAIL_LO(j)));
+   } else {
+   dev_err(pfdev->dev, "js fault, js=%d, 
status=%s, head=0x%x, tail=0x%x",
+   j, exception_name,
+   job_read(pfdev, JS_HEAD_LO(j)),
+   job_read(pfdev, JS_TAIL_LO(j)));
+   }
 
/* If we need a reset, signal it to the timeout
 * handler, otherwise, update the fence error field and
@@ -515,7 +522,16 @@ static void panfrost_job_handle_irq(struct panfrost_device 
*pfdev, u32 status)
if (panfrost_exception_needs_reset(pfdev, js_status)) {
drm_sched_fault(&pfdev->js->queue[j].sched);
} else {
-   dma_fence_set_error(pfdev->jobs[j]->done_fence, 
-EINVAL);
+   int error = 0;
+
+   if (js_status == 
DRM_PANFROST_EXCEPTION_TERMINATED)
+   error = -ECANCELED;
+   else if (panfrost_exception_is_fault(js_status))
+   error = -EINVAL;
+
+   if (error)
+   
dma_fence_set_error(pfdev->jobs[j]->done_fence, error);
+
status |= JOB_INT_MASK_DONE(j);
}
}
@@ -681,10 +697,24 @@ int panfrost_job_open(struct panfrost_file_priv 
*panfrost_priv)
 
 void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
 {
+   struct panfrost_device *pfdev = panfrost_priv->pfdev;
int i;
 
for (i = 0; i < NUM_JOB_SLOTS; i++)
drm_sch

[PATCH v5 13/16] drm/panfrost: Don't reset the GPU on job faults unless we really have to

2021-06-29 Thread Boris Brezillon
If we can recover from a fault without a reset, there's no reason to
issue one.

v3:
* Drop the mention of Valhall requiring a reset on JOB_BUS_FAULT
* Set the fence error to -EINVAL instead of having per-exception
  error codes

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_device.h |  2 ++
 drivers/gpu/drm/panfrost/panfrost_job.c| 16 ++--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index 736854542b05..f4e42009526d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -379,6 +379,15 @@ const char *panfrost_exception_name(u32 exception_code)
return panfrost_exception_infos[exception_code].name;
 }
 
+bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
+   u32 exception_code)
+{
+   /* Right now, none of the GPU we support need a reset, but this
+* might change.
+*/
+   return false;
+}
+
 void panfrost_device_reset(struct panfrost_device *pfdev)
 {
panfrost_gpu_soft_reset(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 2dc8c0d1d987..d91f71366214 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -244,6 +244,8 @@ enum drm_panfrost_exception_type {
 };
 
 const char *panfrost_exception_name(u32 exception_code);
+bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
+   u32 exception_code);
 
 static inline void
 panfrost_device_schedule_reset(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 4bd4d11377b7..b0f4857ca084 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -498,14 +498,26 @@ static void panfrost_job_handle_irq(struct 
panfrost_device *pfdev, u32 status)
job_write(pfdev, JOB_INT_CLEAR, mask);
 
if (status & JOB_INT_MASK_ERR(j)) {
+   u32 js_status = job_read(pfdev, JS_STATUS(j));
+
job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
 
dev_err(pfdev->dev, "js fault, js=%d, status=%s, 
head=0x%x, tail=0x%x",
j,
-   panfrost_exception_name(job_read(pfdev, 
JS_STATUS(j))),
+   panfrost_exception_name(js_status),
job_read(pfdev, JS_HEAD_LO(j)),
job_read(pfdev, JS_TAIL_LO(j)));
-   drm_sched_fault(&pfdev->js->queue[j].sched);
+
+   /* If we need a reset, signal it to the timeout
+* handler, otherwise, update the fence error field and
+* signal the job fence.
+*/
+   if (panfrost_exception_needs_reset(pfdev, js_status)) {
+   drm_sched_fault(&pfdev->js->queue[j].sched);
+   } else {
+   dma_fence_set_error(pfdev->jobs[j]->done_fence, 
-EINVAL);
+   status |= JOB_INT_MASK_DONE(j);
+   }
}
 
if (status & JOB_INT_MASK_DONE(j)) {
-- 
2.31.1



[PATCH v5 15/16] drm/panfrost: Queue jobs on the hardware

2021-06-29 Thread Boris Brezillon
From: Steven Price 

The hardware has a set of '_NEXT' registers that can hold a second job
while the first is executing. Make use of these registers to enqueue a
second job per slot.

v5:
* Fix a comment in panfrost_job_init()

v3:
* Fix the done/err job dequeuing logic to get a valid active state
* Only enable the second slot on GPUs supporting jobchain disambiguation
* Split interrupt handling in sub-functions

Signed-off-by: Steven Price 
Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 468 +++--
 2 files changed, 351 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index d2ee6e5fe5d8..9f1f2a603208 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -101,7 +101,7 @@ struct panfrost_device {
 
struct panfrost_job_slot *js;
 
-   struct panfrost_job *jobs[NUM_JOB_SLOTS];
+   struct panfrost_job *jobs[NUM_JOB_SLOTS][2];
struct list_head scheduled_jobs;
 
struct panfrost_perfcnt *perfcnt;
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index d8e1bc227455..e89e6adef481 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -140,9 +141,52 @@ static void panfrost_job_write_affinity(struct 
panfrost_device *pfdev,
job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32);
 }
 
+static u32
+panfrost_get_job_chain_flag(const struct panfrost_job *job)
+{
+   struct panfrost_fence *f = to_panfrost_fence(job->done_fence);
+
+   if (!panfrost_has_hw_feature(job->pfdev, 
HW_FEATURE_JOBCHAIN_DISAMBIGUATION))
+   return 0;
+
+   return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
+}
+
+static struct panfrost_job *
+panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
+{
+   struct panfrost_job *job = pfdev->jobs[slot][0];
+
+   WARN_ON(!job);
+   pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
+   pfdev->jobs[slot][1] = NULL;
+
+   return job;
+}
+
+static unsigned int
+panfrost_enqueue_job(struct panfrost_device *pfdev, int slot,
+struct panfrost_job *job)
+{
+   if (WARN_ON(!job))
+   return 0;
+
+   if (!pfdev->jobs[slot][0]) {
+   pfdev->jobs[slot][0] = job;
+   return 0;
+   }
+
+   WARN_ON(pfdev->jobs[slot][1]);
+   pfdev->jobs[slot][1] = job;
+   WARN_ON(panfrost_get_job_chain_flag(job) ==
+   panfrost_get_job_chain_flag(pfdev->jobs[slot][0]));
+   return 1;
+}
+
 static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 {
struct panfrost_device *pfdev = job->pfdev;
+   unsigned int subslot;
u32 cfg;
u64 jc_head = job->jc;
int ret;
@@ -168,7 +212,8 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
 * start */
cfg |= JS_CONFIG_THREAD_PRI(8) |
JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE |
-   JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE;
+   JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE |
+   panfrost_get_job_chain_flag(job);
 
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION;
@@ -182,10 +227,17 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
job_write(pfdev, JS_FLUSH_ID_NEXT(js), job->flush_id);
 
/* GO ! */
-   dev_dbg(pfdev->dev, "JS: Submitting atom %p to js[%d] with head=0x%llx",
-   job, js, jc_head);
 
-   job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
+   spin_lock(&pfdev->js->job_lock);
+   subslot = panfrost_enqueue_job(pfdev, js, job);
+   /* Don't queue the job if a reset is in progress */
+   if (!atomic_read(&pfdev->reset.pending)) {
+   job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
+   dev_dbg(pfdev->dev,
+   "JS: Submitting atom %p to js[%d][%d] with head=0x%llx 
AS %d",
+   job, js, subslot, jc_head, cfg & 0xf);
+   }
+   spin_unlock(&pfdev->js->job_lock);
 }
 
 static void panfrost_acquire_object_fences(struct drm_gem_object **bos,
@@ -343,7 +395,11 @@ static struct dma_fence *panfrost_job_run(struct 
drm_sched_job *sched_job)
if (unlikely(job->base.s_fence->finished.error))
return NULL;
 
-   pfdev->jobs[slot] = job;
+   /* Nothing to execute: can happen if the job has finished while
+* we were resetting the GPU.
+*/
+ 

[PATCH v5 04/16] drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition

2021-06-29 Thread Boris Brezillon
Exception types will be defined as an enum.

v4:
* Fix typo in the commit message

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index eddaa62ad8b0..151cfebd80a0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -261,9 +261,6 @@
 #define JS_COMMAND_SOFT_STOP_1 0x06/* Execute SOFT_STOP if 
JOB_CHAIN_FLAG is 1 */
 #define JS_COMMAND_HARD_STOP_1 0x07/* Execute HARD_STOP if 
JOB_CHAIN_FLAG is 1 */
 
-#define JS_STATUS_EVENT_ACTIVE 0x08
-
-
 /* MMU regs */
 #define MMU_INT_RAWSTAT0x2000
 #define MMU_INT_CLEAR  0x2004
-- 
2.31.1



[PATCH v5 16/16] drm/panfrost: Increase the AS_ACTIVE polling timeout

2021-06-29 Thread Boris Brezillon
Experience has shown that 1ms is sometimes not enough, even when the GPU
is running at its maximum frequency, not to mention that an MMU operation
might take longer if the GPU is running at a lower frequency, which is
likely to be the case if devfreq is active.

Let's pick a significantly bigger timeout value (1ms -> 100ms) to be on
the safe side.

v5:
* New patch

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 5267c3a1f02f..a32a3df5358e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -34,7 +34,7 @@ static int wait_ready(struct panfrost_device *pfdev, u32 
as_nr)
/* Wait for the MMU status to indicate there is no active command, in
 * case one is pending. */
ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr),
-   val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000);
+   val, !(val & AS_STATUS_AS_ACTIVE), 10, 100000);
 
if (ret) {
/* The GPU hung, let's trigger a reset */
-- 
2.31.1



[PATCH v5 10/16] drm/panfrost: Make sure job interrupts are masked before resetting

2021-06-29 Thread Boris Brezillon
This is not yet needed because we let active jobs be killed during the
reset and we don't really bother making sure they can be restarted.
But once we start adding soft-stop support, controlling when we deal
with the remaining interrupts and making sure those are handled before
the reset is issued gets tricky if we keep job interrupts active.

Let's prepare for that and mask+flush job IRQs before issuing a reset.

v4:
* Add a comment explaining why we WARN_ON(!job) in the irq handler
* Keep taking the job_lock when evicting stalled jobs

v3:
* New patch

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 27 -
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 98193a557a2d..4bd4d11377b7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -34,6 +34,7 @@ struct panfrost_queue_state {
 struct panfrost_job_slot {
struct panfrost_queue_state queue[NUM_JOB_SLOTS];
spinlock_t job_lock;
+   int irq;
 };
 
 static struct panfrost_job *
@@ -400,6 +401,15 @@ static void panfrost_reset(struct panfrost_device *pfdev,
if (bad)
drm_sched_increase_karma(bad);
 
+   /* Mask job interrupts and synchronize to make sure we won't be
+* interrupted during our reset.
+*/
+   job_write(pfdev, JOB_INT_MASK, 0);
+   synchronize_irq(pfdev->js->irq);
+
+   /* Schedulers are stopped and interrupts are masked+flushed, we don't
+* need to protect the 'evict unfinished jobs' lock with the job_lock.
+*/
spin_lock(&pfdev->js->job_lock);
for (i = 0; i < NUM_JOB_SLOTS; i++) {
if (pfdev->jobs[i]) {
@@ -502,7 +512,14 @@ static void panfrost_job_handle_irq(struct panfrost_device 
*pfdev, u32 status)
struct panfrost_job *job;
 
job = pfdev->jobs[j];
-   /* Only NULL if job timeout occurred */
+   /* The only reason this job could be NULL is if the
+* job IRQ handler is called just after the
+* in-flight job eviction in the reset path, and
+* this shouldn't happen because the job IRQ has
+* been masked and synchronized when this eviction
+* happens.
+*/
+   WARN_ON(!job);
if (job) {
pfdev->jobs[j] = NULL;
 
@@ -562,7 +579,7 @@ static void panfrost_reset_work(struct work_struct *work)
 int panfrost_job_init(struct panfrost_device *pfdev)
 {
struct panfrost_job_slot *js;
-   int ret, j, irq;
+   int ret, j;
 
INIT_WORK(&pfdev->reset.work, panfrost_reset_work);
 
@@ -572,11 +589,11 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 
spin_lock_init(&js->job_lock);
 
-   irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job");
-   if (irq <= 0)
+   js->irq = platform_get_irq_byname(to_platform_device(pfdev->dev), 
"job");
+   if (js->irq <= 0)
return -ENODEV;
 
-   ret = devm_request_threaded_irq(pfdev->dev, irq,
+   ret = devm_request_threaded_irq(pfdev->dev, js->irq,
panfrost_job_irq_handler,
panfrost_job_irq_handler_thread,
IRQF_SHARED, KBUILD_MODNAME "-job",
-- 
2.31.1



Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-06-29 Thread Boris Brezillon
On Tue, 29 Jun 2021 10:50:36 +0200
Daniel Vetter  wrote:

> On Tue, Jun 29, 2021 at 09:34:56AM +0200, Boris Brezillon wrote:
> > Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
> > reset. This leads to extra complexity when we need to synchronize timeout
> > works with the reset work. One solution to address that is to have an
> > ordered workqueue at the driver level that will be used by the different
> > schedulers to queue their timeout work. Thanks to the serialization
> > provided by the ordered workqueue we are guaranteed that timeout
> > handlers are executed sequentially, and can thus easily reset the GPU
> > from the timeout handler without extra synchronization.
> > 
> > v5:
> > * Add a new paragraph to the timedout_job() method
> > 
> > v3:
> > * New patch
> > 
> > v4:
> > * Actually use the timeout_wq to queue the timeout work
> > 
> > Signed-off-by: Boris Brezillon 
> > Reviewed-by: Steven Price 
> > Reviewed-by: Lucas Stach 
> > Cc: Qiang Yu 
> > Cc: Emma Anholt 
> > Cc: Alex Deucher 
> > Cc: "Christian König"   
> 
> Acked-by: Daniel Vetter 
> 
> Also since I'm occasionally blinded by my own pride, add suggested-by: me?

Duh, it's an oversight (I thought I had that 'Suggested-by: Daniel
Vetter ...' already).

> I did spend quite a bit pondering how to untangle your various lockdep
> splats in the trd handler :-)

And I'm grateful for your help ;-).


Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-06-29 Thread Boris Brezillon
Hi Christian,

On Tue, 29 Jun 2021 13:03:58 +0200
Christian König  wrote:

> Am 29.06.21 um 09:34 schrieb Boris Brezillon:
> > Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
> > reset. This leads to extra complexity when we need to synchronize timeout
> > works with the reset work. One solution to address that is to have an
> > ordered workqueue at the driver level that will be used by the different
> > schedulers to queue their timeout work. Thanks to the serialization
> > provided by the ordered workqueue we are guaranteed that timeout
> > handlers are executed sequentially, and can thus easily reset the GPU
> > from the timeout handler without extra synchronization.  
> 
> Well, we had already tried this and it didn't work the way we expected.
> 
> The major problem is that you not only want to serialize the queue, but 
> rather have a single reset for all queues.
> 
> Otherwise you schedule multiple resets for each hardware queue. E.g. for 
> your 3 hardware queues you would reset the GPU 3 times if all of them 
> time out at the same time (which is rather likely).
> 
> Using a single delayed work item doesn't work either because you then 
> only have one timeout.
> 
> What could be done is to cancel all delayed work items from all stopped 
> schedulers.

drm_sched_stop() does that already, and since we call drm_sched_stop()
on all queues in the timeout handler, we end up with only one global
reset happening even if several queues report a timeout at the same
time.

Regards,

Boris


Re: [PATCH v5 09/16] drm/panfrost: Simplify the reset serialization logic

2021-06-29 Thread Boris Brezillon
On Tue, 29 Jun 2021 09:35:03 +0200
Boris Brezillon  wrote:

> @@ -379,57 +370,72 @@ void panfrost_job_enable_interrupts(struct 
> panfrost_device *pfdev)
>   job_write(pfdev, JOB_INT_MASK, irq_mask);
>  }
>  
> -static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue,
> - struct drm_sched_job *bad)
> +static void panfrost_reset(struct panfrost_device *pfdev,
> +struct drm_sched_job *bad)
>  {
> - enum panfrost_queue_status old_status;
> - bool stopped = false;
> + unsigned int i;
> + bool cookie;
>  
> - mutex_lock(&queue->lock);
> - old_status = atomic_xchg(&queue->status,
> -  PANFROST_QUEUE_STATUS_STOPPED);
> - if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
> - goto out;
> + if (!atomic_read(&pfdev->reset.pending))
> + return;
> +
> + /* Stop the schedulers.
> +  *
> +  * FIXME: We temporarily get out of the dma_fence_signalling section
> +  * because the cleanup path generate lockdep splats when taking locks
> +  * to release job resources. We should rework the code to follow this
> +  * pattern:
> +  *
> +  *  try_lock
> +  *  if (locked)
> +  *  release
> +  *  else
> +  *  schedule_work_to_release_later
> +  */
> + for (i = 0; i < NUM_JOB_SLOTS; i++)
> + drm_sched_stop(&pfdev->js->queue[i].sched, bad);
> +
> + cookie = dma_fence_begin_signalling();
>  
> - WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);
> - drm_sched_stop(&queue->sched, bad);
>   if (bad)
>   drm_sched_increase_karma(bad);
>  
> - stopped = true;
> + spin_lock(&pfdev->js->job_lock);
> + for (i = 0; i < NUM_JOB_SLOTS; i++) {
> + if (pfdev->jobs[i]) {
> + pm_runtime_put_noidle(pfdev->dev);
> + panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
> + pfdev->jobs[i] = NULL;
> + }
> + }
> + spin_unlock(&pfdev->js->job_lock);
>  
> - /*
> -  * Set the timeout to max so the timer doesn't get started
> -  * when we return from the timeout handler (restored in
> -  * panfrost_scheduler_start()).
> + panfrost_device_reset(pfdev);
> +
> + /* GPU has been reset, we can cancel timeout/fault work that may have
> +  * been queued in the meantime and clear the reset pending bit.
>*/
> - queue->sched.timeout = MAX_SCHEDULE_TIMEOUT;
> + atomic_set(&pfdev->reset.pending, 0);
> + for (i = 0; i < NUM_JOB_SLOTS; i++)
> + cancel_delayed_work(&pfdev->js->queue[i].sched.work_tdr);
>  

Those cancel_delayed_work() calls are useless: drm_sched_stop() already
cancels those works. I'll get rid of them in v6.


[PATCH v6 01/16] drm/sched: Document what the timedout_job method should do

2021-06-29 Thread Boris Brezillon
The documentation is a bit vague and doesn't really describe what the
->timedout_job() hook is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter 
Signed-off-by: Boris Brezillon 
Reviewed-by: Daniel Vetter 
---
 include/drm/gpu_scheduler.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d18af49fd009..aa90ed1f1b2b 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 * @timedout_job: Called when a job has taken too long to execute,
 * to trigger GPU recovery.
 *
+* This method is called in a workqueue context.
+*
+* Drivers typically issue a reset to recover from GPU hangs, and this
+* procedure usually follows the following workflow:
+*
+* 1. Stop the scheduler using drm_sched_stop(). This will park the
+*scheduler thread and cancel the timeout work, guaranteeing that
+*nothing is queued while we reset the hardware queue
+* 2. Try to gracefully stop non-faulty jobs (optional)
+* 3. Issue a GPU reset (driver-specific)
+* 4. Re-submit jobs using drm_sched_resubmit_jobs()
+* 5. Restart the scheduler using drm_sched_start(). At that point, new
+*jobs can be queued, and the scheduler thread is unblocked
+*
 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 * and the underlying driver has started or completed recovery.
 *
-- 
2.31.1



[PATCH v6 00/16] drm/panfrost

2021-06-29 Thread Boris Brezillon
Hello,

Bunch of improvements to make the panfrost driver more robust and allow
queuing jobs at the HW level.

Changes in v6:
* Collected acks/reviews
* Got rid of the cancel_delayed_work() calls in the reset path

Changes in this v5:
* Document what's expected in the ->timedout_job() hook
* Add a patch increasing the AS_ACTIVE polling timeout
* Fix a few minor things here and there (see each commit for a detailed
  changelog) and collect R-b/A-b tags

Changes in this v4:
* fixing the reset serialization
* fixing a deadlock in the reset path
* moving the exception enum to a private header

Regards,

Boris

Boris Brezillon (15):
  drm/sched: Document what the timedout_job method should do
  drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
  drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate
  drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition
  drm/panfrost: Drop the pfdev argument passed to
panfrost_exception_name()
  drm/panfrost: Do the exception -> string translation using a table
  drm/panfrost: Expose a helper to trigger a GPU reset
  drm/panfrost: Use a threaded IRQ for job interrupts
  drm/panfrost: Simplify the reset serialization logic
  drm/panfrost: Make sure job interrupts are masked before resetting
  drm/panfrost: Disable the AS on unhandled page faults
  drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck
  drm/panfrost: Don't reset the GPU on job faults unless we really have
to
  drm/panfrost: Kill in-flight jobs on FD close
  drm/panfrost: Increase the AS_ACTIVE polling timeout

Steven Price (1):
  drm/panfrost: Queue jobs on the hardware

 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c|   3 +-
 drivers/gpu/drm/lima/lima_sched.c  |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++--
 drivers/gpu/drm/panfrost/panfrost_device.h |  91 ++-
 drivers/gpu/drm/panfrost/panfrost_gpu.c|   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 626 +++--
 drivers/gpu/drm/panfrost/panfrost_mmu.c|  43 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h   |   3 -
 drivers/gpu/drm/scheduler/sched_main.c |  14 +-
 drivers/gpu/drm/v3d/v3d_sched.c|  10 +-
 include/drm/gpu_scheduler.h|  37 +-
 12 files changed, 717 insertions(+), 256 deletions(-)

-- 
2.31.1



[PATCH v6 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-06-29 Thread Boris Brezillon
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
reset. This leads to extra complexity when we need to synchronize timeout
works with the reset work. One solution to address that is to have an
ordered workqueue at the driver level that will be used by the different
schedulers to queue their timeout work. Thanks to the serialization
provided by the ordered workqueue we are guaranteed that timeout
handlers are executed sequentially, and can thus easily reset the GPU
from the timeout handler without extra synchronization.

v5:
* Add a new paragraph to the timedout_job() method

v3:
* New patch

v4:
* Actually use the timeout_wq to queue the timeout work

Suggested-by: Daniel Vetter 
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Lucas Stach 
Acked-by: Daniel Vetter 
Acked-by: Christian König 
Cc: Qiang Yu 
Cc: Emma Anholt 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
 drivers/gpu/drm/lima/lima_sched.c |  3 ++-
 drivers/gpu/drm/panfrost/panfrost_job.c   |  3 ++-
 drivers/gpu/drm/scheduler/sched_main.c| 14 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 10 +-
 include/drm/gpu_scheduler.h   | 23 ++-
 7 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 72d9b92b1754..d4547d195173 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -490,7 +490,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 
r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
   num_hw_submission, amdgpu_job_hang_limit,
-  timeout, sched_score, ring->name);
+  timeout, NULL, sched_score, ring->name);
if (r) {
DRM_ERROR("Failed to create scheduler on ring %s.\n",
  ring->name);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 19826e504efc..feb6da1b6ceb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -190,7 +190,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 
ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
 etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
-msecs_to_jiffies(500), NULL, dev_name(gpu->dev));
+msecs_to_jiffies(500), NULL, NULL,
+dev_name(gpu->dev));
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c 
b/drivers/gpu/drm/lima/lima_sched.c
index ecf3267334ff..dba8329937a3 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -508,7 +508,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, 
const char *name)
INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
- lima_job_hang_limit, msecs_to_jiffies(timeout),
+ lima_job_hang_limit,
+ msecs_to_jiffies(timeout), NULL,
  NULL, name);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index beb62c8fc851..17bc5e3bfe0e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -615,7 +615,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 
ret = drm_sched_init(&js->queue[j].sched,
 &panfrost_sched_ops,
-1, 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
+1, 0,
+msecs_to_jiffies(JOB_TIMEOUT_MS), NULL,
 NULL, "pan_js");
if (ret) {
dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
ret);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 60125fbe7bb5..67382621b429 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -232,7 +232,7 @@ static void drm_sched_start_timeout(struct 
drm_gpu_scheduler *sched)
 {
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
!list_empty(&sched->pending_list))
-   schedule_delayed_work(&sched->work_tdr, sched->timeout);
+   queue_delayed_work(sched->timeout_wq, &sched->work_tdr, 
sched->timeout);
 }
 
 /**
@@ -244,7 +244,7 @@ static void drm_sched_sta

[PATCH v6 04/16] drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition

2021-06-29 Thread Boris Brezillon
Exception types will be defined as an enum.

v4:
* Fix typo in the commit message

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index dc9df5457f1c..1940ff86e49a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -262,9 +262,6 @@
 #define JS_COMMAND_SOFT_STOP_1 0x06/* Execute SOFT_STOP if 
JOB_CHAIN_FLAG is 1 */
 #define JS_COMMAND_HARD_STOP_1 0x07/* Execute HARD_STOP if 
JOB_CHAIN_FLAG is 1 */
 
-#define JS_STATUS_EVENT_ACTIVE 0x08
-
-
 /* MMU regs */
 #define MMU_INT_RAWSTAT0x2000
 #define MMU_INT_CLEAR  0x2004
-- 
2.31.1



[PATCH v6 03/16] drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate

2021-06-29 Thread Boris Brezillon
If the fence creation fail, we can return the error pointer directly.
The core will update the fence error accordingly.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 17bc5e3bfe0e..3c1dbae3ebdd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -344,7 +344,7 @@ static struct dma_fence *panfrost_job_run(struct 
drm_sched_job *sched_job)
 
fence = panfrost_fence_create(pfdev, slot);
if (IS_ERR(fence))
-   return NULL;
+   return fence;
 
if (job->done_fence)
dma_fence_put(job->done_fence);
-- 
2.31.1



[PATCH v6 09/16] drm/panfrost: Simplify the reset serialization logic

2021-06-29 Thread Boris Brezillon
Now that we can pass our own workqueue to drm_sched_init(), we can use
an ordered workqueue for both the scheduler timeout tdr and our own
reset work (which we use when the reset is not caused by a fault/timeout
on a specific job, like when we have the AS_ACTIVE bit stuck). This
guarantees that the timeout handlers and the reset handler can't run
concurrently, which drastically simplifies the locking.

v5:
* Don't call cancel_delayed_work() in the reset path (those works
  are canceled in drm_sched_stop())

v4:
* Actually pass the reset workqueue to drm_sched_init()
* Don't call cancel_work_sync() in panfrost_reset(). It will deadlock
  since it might be called from the reset work, which is executing and
  cancel_work_sync() will wait for the handler to return. Checking the
  reset pending status should avoid spurious resets

v3:
* New patch

Suggested-by: Daniel Vetter 
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |   6 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 186 -
 2 files changed, 69 insertions(+), 123 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index aec11b6f5abc..d37bc09809e8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -109,6 +109,7 @@ struct panfrost_device {
struct mutex sched_lock;
 
struct {
+   struct workqueue_struct *wq;
struct work_struct work;
atomic_t pending;
} reset;
@@ -247,9 +248,8 @@ const char *panfrost_exception_name(u32 exception_code);
 static inline void
 panfrost_device_schedule_reset(struct panfrost_device *pfdev)
 {
-   /* Schedule a reset if there's no reset in progress. */
-   if (!atomic_xchg(&pfdev->reset.pending, 1))
-   schedule_work(&pfdev->reset.work);
+   atomic_set(&pfdev->reset.pending, 1);
+   queue_work(pfdev->reset.wq, &pfdev->reset.work);
 }
 
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6149e4c7133f..59c23c91e47c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -25,17 +25,8 @@
 #define job_write(dev, reg, data) writel(data, dev->iomem + (reg))
 #define job_read(dev, reg) readl(dev->iomem + (reg))
 
-enum panfrost_queue_status {
-   PANFROST_QUEUE_STATUS_ACTIVE,
-   PANFROST_QUEUE_STATUS_STOPPED,
-   PANFROST_QUEUE_STATUS_STARTING,
-   PANFROST_QUEUE_STATUS_FAULT_PENDING,
-};
-
 struct panfrost_queue_state {
struct drm_gpu_scheduler sched;
-   atomic_t status;
-   struct mutex lock;
u64 fence_context;
u64 emit_seqno;
 };
@@ -368,57 +359,67 @@ void panfrost_job_enable_interrupts(struct 
panfrost_device *pfdev)
job_write(pfdev, JOB_INT_MASK, irq_mask);
 }
 
-static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue,
-   struct drm_sched_job *bad)
+static void panfrost_reset(struct panfrost_device *pfdev,
+  struct drm_sched_job *bad)
 {
-   enum panfrost_queue_status old_status;
-   bool stopped = false;
+   unsigned int i;
+   bool cookie;
 
-   mutex_lock(&queue->lock);
-   old_status = atomic_xchg(&queue->status,
-PANFROST_QUEUE_STATUS_STOPPED);
-   if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
-   goto out;
+   if (!atomic_read(&pfdev->reset.pending))
+   return;
+
+   /* Stop the schedulers.
+*
+* FIXME: We temporarily get out of the dma_fence_signalling section
+* because the cleanup path generates lockdep splats when taking locks
+* to release job resources. We should rework the code to follow this
+* pattern:
+*
+*  try_lock
+*  if (locked)
+*  release
+*  else
+*  schedule_work_to_release_later
+*/
+   for (i = 0; i < NUM_JOB_SLOTS; i++)
+   drm_sched_stop(&pfdev->js->queue[i].sched, bad);
+
+   cookie = dma_fence_begin_signalling();
 
-   WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);
-   drm_sched_stop(&queue->sched, bad);
if (bad)
drm_sched_increase_karma(bad);
 
-   stopped = true;
+   spin_lock(&pfdev->js->job_lock);
+   for (i = 0; i < NUM_JOB_SLOTS; i++) {
+   if (pfdev->jobs[i]) {
+   pm_runtime_put_noidle(pfdev->dev);
+   panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
+   pfdev->jobs[i] = NULL;
+   }
+   }
+   spin_unlock(&pfdev->js->job_lock);
 
-   

[PATCH v6 05/16] drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name()

2021-06-29 Thread Boris Brezillon
Currently unused. We'll add it back if we need per-GPU definitions.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 2 +-
 drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
 drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 2 +-
 drivers/gpu/drm/panfrost/panfrost_mmu.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index a2a09c51eed7..f7f5ca94f910 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -292,7 +292,7 @@ void panfrost_device_fini(struct panfrost_device *pfdev)
panfrost_clk_fini(pfdev);
 }
 
-const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 
exception_code)
+const char *panfrost_exception_name(u32 exception_code)
 {
switch (exception_code) {
/* Non-Fault Status code */
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b2cdb8c701d..2fe1550da7f8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -173,6 +173,6 @@ void panfrost_device_reset(struct panfrost_device *pfdev);
 int panfrost_device_resume(struct device *dev);
 int panfrost_device_suspend(struct device *dev);
 
-const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 
exception_code);
+const char *panfrost_exception_name(u32 exception_code);
 
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 0e70e27fd8c3..26e4196b6c90 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void 
*data)
address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
 
dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
-fault_status & 0xFF, panfrost_exception_name(pfdev, 
fault_status),
+fault_status & 0xFF, 
panfrost_exception_name(fault_status),
 address);
 
if (state & GPU_IRQ_MULTIPLE_FAULT)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 3c1dbae3ebdd..ea3432ffde40 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -489,7 +489,7 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
 
dev_err(pfdev->dev, "js fault, js=%d, status=%s, 
head=0x%x, tail=0x%x",
j,
-   panfrost_exception_name(pfdev, job_read(pfdev, 
JS_STATUS(j))),
+   panfrost_exception_name(job_read(pfdev, 
JS_STATUS(j))),
job_read(pfdev, JS_HEAD_LO(j)),
job_read(pfdev, JS_TAIL_LO(j)));
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 569509c2ba27..2a9bf30edc9d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -675,7 +675,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, 
void *data)
"TODO",
fault_status,
(fault_status & (1 << 10) ? "DECODER FAULT" : 
"SLAVE FAULT"),
-   exception_type, panfrost_exception_name(pfdev, 
exception_type),
+   exception_type, 
panfrost_exception_name(exception_type),
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
-- 
2.31.1



[PATCH v6 07/16] drm/panfrost: Expose a helper to trigger a GPU reset

2021-06-29 Thread Boris Brezillon
Expose a helper to trigger a GPU reset so we can easily schedule reset
operations outside the job timeout handler.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 8 
 drivers/gpu/drm/panfrost/panfrost_job.c| 4 +---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 3639288ee8a2..aec11b6f5abc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -244,4 +244,12 @@ enum drm_panfrost_exception_type {
 
 const char *panfrost_exception_name(u32 exception_code);
 
+static inline void
+panfrost_device_schedule_reset(struct panfrost_device *pfdev)
+{
+   /* Schedule a reset if there's no reset in progress. */
+   if (!atomic_xchg(&pfdev->reset.pending, 1))
+   schedule_work(&pfdev->reset.work);
+}
+
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index ea3432ffde40..12285a285765 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -447,9 +447,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct 
drm_sched_job
if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
return DRM_GPU_SCHED_STAT_NOMINAL;
 
-   /* Schedule a reset if there's no reset in progress. */
-   if (!atomic_xchg(&pfdev->reset.pending, 1))
-   schedule_work(&pfdev->reset.work);
+   panfrost_device_schedule_reset(pfdev);
 
return DRM_GPU_SCHED_STAT_NOMINAL;
 }
-- 
2.31.1
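For readers outside the kernel tree, the atomic_xchg() trick used by panfrost_device_schedule_reset() can be modeled in user space with C11 atomics. This is a hypothetical, simplified sketch: `reset_work_scheduled` stands in for schedule_work(), and the names are ours, not the driver's. The point is that atomic_exchange() returns the previous value, so only the caller that flips the flag from 0 to 1 schedules the work; concurrent callers see 1 and do nothing.

```c
#include <stdatomic.h>
#include <assert.h>

/* User-space model of the "schedule a reset only once" pattern.
 * reset_work_scheduled is a stand-in for schedule_work(). */
static atomic_int reset_pending;
static int reset_work_scheduled;

static void schedule_reset(void)
{
	/* Schedule a reset only if there's no reset in progress:
	 * atomic_exchange() returns the old value of the flag. */
	if (!atomic_exchange(&reset_pending, 1))
		reset_work_scheduled++;
}

static void reset_work_done(void)
{
	/* The reset handler clears the flag once the reset completed,
	 * re-arming the one-shot behaviour. */
	atomic_store(&reset_pending, 0);
}
```

The same pattern generalizes to any idempotent deferred work that must not be queued twice while pending.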



[PATCH v6 08/16] drm/panfrost: Use a threaded IRQ for job interrupts

2021-06-29 Thread Boris Brezillon
This should avoid switching to interrupt context when the GPU is under
heavy use.

v3:
* Don't take the job_lock in panfrost_job_handle_irq()

Signed-off-by: Boris Brezillon 
Acked-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 53 ++---
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 12285a285765..6149e4c7133f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -459,19 +459,12 @@ static const struct drm_sched_backend_ops 
panfrost_sched_ops = {
.free_job = panfrost_job_free
 };
 
-static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
+static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
 {
-   struct panfrost_device *pfdev = data;
-   u32 status = job_read(pfdev, JOB_INT_STAT);
int j;
 
dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status);
 
-   if (!status)
-   return IRQ_NONE;
-
-   pm_runtime_mark_last_busy(pfdev->dev);
-
for (j = 0; status; j++) {
u32 mask = MK_JS_MASK(j);
 
@@ -508,7 +501,6 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
if (status & JOB_INT_MASK_DONE(j)) {
struct panfrost_job *job;
 
-   spin_lock(&pfdev->js->job_lock);
job = pfdev->jobs[j];
/* Only NULL if job timeout occurred */
if (job) {
@@ -520,21 +512,49 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void 
*data)
dma_fence_signal_locked(job->done_fence);
pm_runtime_put_autosuspend(pfdev->dev);
}
-   spin_unlock(&pfdev->js->job_lock);
}
 
status &= ~mask;
}
+}
 
+static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data)
+{
+   struct panfrost_device *pfdev = data;
+   u32 status = job_read(pfdev, JOB_INT_RAWSTAT);
+
+   while (status) {
+   pm_runtime_mark_last_busy(pfdev->dev);
+
+   spin_lock(&pfdev->js->job_lock);
+   panfrost_job_handle_irq(pfdev, status);
+   spin_unlock(&pfdev->js->job_lock);
+   status = job_read(pfdev, JOB_INT_RAWSTAT);
+   }
+
+   job_write(pfdev, JOB_INT_MASK,
+ GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
+ GENMASK(NUM_JOB_SLOTS - 1, 0));
return IRQ_HANDLED;
 }
 
+static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
+{
+   struct panfrost_device *pfdev = data;
+   u32 status = job_read(pfdev, JOB_INT_STAT);
+
+   if (!status)
+   return IRQ_NONE;
+
+   job_write(pfdev, JOB_INT_MASK, 0);
+   return IRQ_WAKE_THREAD;
+}
+
 static void panfrost_reset(struct work_struct *work)
 {
struct panfrost_device *pfdev = container_of(work,
 struct panfrost_device,
 reset.work);
-   unsigned long flags;
unsigned int i;
bool cookie;
 
@@ -564,7 +584,7 @@ static void panfrost_reset(struct work_struct *work)
/* All timers have been stopped, we can safely reset the pending state. 
*/
atomic_set(&pfdev->reset.pending, 0);
 
-   spin_lock_irqsave(&pfdev->js->job_lock, flags);
+   spin_lock(&pfdev->js->job_lock);
for (i = 0; i < NUM_JOB_SLOTS; i++) {
if (pfdev->jobs[i]) {
pm_runtime_put_noidle(pfdev->dev);
@@ -572,7 +592,7 @@ static void panfrost_reset(struct work_struct *work)
pfdev->jobs[i] = NULL;
}
}
-   spin_unlock_irqrestore(&pfdev->js->job_lock, flags);
+   spin_unlock(&pfdev->js->job_lock);
 
panfrost_device_reset(pfdev);
 
@@ -599,8 +619,11 @@ int panfrost_job_init(struct panfrost_device *pfdev)
if (irq <= 0)
return -ENODEV;
 
-   ret = devm_request_irq(pfdev->dev, irq, panfrost_job_irq_handler,
-  IRQF_SHARED, KBUILD_MODNAME "-job", pfdev);
+   ret = devm_request_threaded_irq(pfdev->dev, irq,
+   panfrost_job_irq_handler,
+   panfrost_job_irq_handler_thread,
+   IRQF_SHARED, KBUILD_MODNAME "-job",
+   pfdev);
if (ret) {
dev_err(pfdev->dev, "failed to request job irq");
return ret;
-- 
2.31.1
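The hard-IRQ/threaded-IRQ split above can be sketched outside the kernel with plain variables standing in for the MMIO registers. This is an illustrative model only (no real concurrency, made-up names): the hard handler just masks the line and defers, and the thread drains the raw status in a loop before unmasking, mirroring the JOB_INT_MASK / JOB_INT_RAWSTAT dance in the patch.

```c
#include <stdint.h>
#include <assert.h>

/* User-space sketch of the threaded-IRQ split: the "registers" below
 * are plain variables standing in for JOB_INT_STAT/RAWSTAT/MASK. */
enum irqreturn { IRQ_NONE, IRQ_WAKE_THREAD, IRQ_HANDLED };

static uint32_t int_stat, int_rawstat, int_mask = ~0u;
static uint32_t handled_bits;

static enum irqreturn job_irq_hard(void)
{
	if (!(int_stat & int_mask))
		return IRQ_NONE;
	int_mask = 0;		/* silence the line until the thread runs */
	return IRQ_WAKE_THREAD;
}

static enum irqreturn job_irq_thread(void)
{
	while (int_rawstat) {
		handled_bits |= int_rawstat;	/* "process" the events */
		int_rawstat = 0;
		int_stat = 0;
	}
	int_mask = ~0u;		/* unmask once everything is drained */
	return IRQ_HANDLED;
}
```

Keeping the hard handler down to a mask-and-wake keeps time spent in interrupt context minimal, which is the stated goal of the patch.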



[PATCH v6 13/16] drm/panfrost: Don't reset the GPU on job faults unless we really have to

2021-06-29 Thread Boris Brezillon
If we can recover from a fault without a reset, there's no reason to
issue one.

v3:
* Drop the mention of Valhall requiring a reset on JOB_BUS_FAULT
* Set the fence error to -EINVAL instead of having per-exception
  error codes

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_device.h |  2 ++
 drivers/gpu/drm/panfrost/panfrost_job.c| 16 ++--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index cd76d2ff5034..bd9b7be63b0f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -379,6 +379,15 @@ const char *panfrost_exception_name(u32 exception_code)
return panfrost_exception_infos[exception_code].name;
 }
 
+bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
+   u32 exception_code)
+{
+   /* Right now, none of the GPUs we support need a reset, but this
+* might change.
+*/
+   return false;
+}
+
 void panfrost_device_reset(struct panfrost_device *pfdev)
 {
panfrost_gpu_soft_reset(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index cb5aadf7ae90..68e93b7e5b61 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -245,6 +245,8 @@ enum drm_panfrost_exception_type {
 };
 
 const char *panfrost_exception_name(u32 exception_code);
+bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
+   u32 exception_code);
 
 static inline void
 panfrost_device_schedule_reset(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 11ff33841caf..cf5f9e8b2a27 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -482,14 +482,26 @@ static void panfrost_job_handle_irq(struct 
panfrost_device *pfdev, u32 status)
job_write(pfdev, JOB_INT_CLEAR, mask);
 
if (status & JOB_INT_MASK_ERR(j)) {
+   u32 js_status = job_read(pfdev, JS_STATUS(j));
+
job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
 
dev_err(pfdev->dev, "js fault, js=%d, status=%s, 
head=0x%x, tail=0x%x",
j,
-   panfrost_exception_name(job_read(pfdev, 
JS_STATUS(j))),
+   panfrost_exception_name(js_status),
job_read(pfdev, JS_HEAD_LO(j)),
job_read(pfdev, JS_TAIL_LO(j)));
-   drm_sched_fault(&pfdev->js->queue[j].sched);
+
+   /* If we need a reset, signal it to the timeout
+* handler, otherwise, update the fence error field and
+* signal the job fence.
+*/
+   if (panfrost_exception_needs_reset(pfdev, js_status)) {
+   drm_sched_fault(&pfdev->js->queue[j].sched);
+   } else {
+   dma_fence_set_error(pfdev->jobs[j]->done_fence, 
-EINVAL);
+   status |= JOB_INT_MASK_DONE(j);
+   }
}
 
if (status & JOB_INT_MASK_DONE(j)) {
-- 
2.31.1
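The decision logic this patch adds can be summarized in a small standalone sketch (hypothetical names, user-space only): recoverable faults fail the job's fence with -EINVAL and are treated as done, while exceptions flagged as needing a reset would go through the drm_sched_fault() path. exception_needs_reset() mirrors panfrost_exception_needs_reset(), which currently returns false for all supported GPUs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <errno.h>
#include <assert.h>

/* Sketch of the per-slot error handling: recover when we can,
 * reset only when we really have to. */
static bool exception_needs_reset(uint32_t exception_code)
{
	/* As in the patch: no supported GPU needs a reset today. */
	return false;
}

/* Returns the fence error to set, or 0 when a reset was scheduled. */
static int handle_job_fault(uint32_t js_status, bool *reset_scheduled)
{
	if (exception_needs_reset(js_status)) {
		*reset_scheduled = true;	/* drm_sched_fault() path */
		return 0;
	}
	*reset_scheduled = false;	/* fail the fence, job is done */
	return -EINVAL;
}
```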



[PATCH v6 11/16] drm/panfrost: Disable the AS on unhandled page faults

2021-06-29 Thread Boris Brezillon
If we don't do that, we have to wait for the job timeout to expire
before the faulty jobs get killed.

v3:
* Make sure the AS is re-enabled when new jobs are submitted to the
  context

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  1 +
 drivers/gpu/drm/panfrost/panfrost_mmu.c| 34 --
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index d37bc09809e8..cb5aadf7ae90 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -97,6 +97,7 @@ struct panfrost_device {
spinlock_t as_lock;
unsigned long as_in_use_mask;
unsigned long as_alloc_mask;
+   unsigned long as_faulty_mask;
struct list_head as_lru_list;
 
struct panfrost_job_slot *js;
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 2a9bf30edc9d..959da770295c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -154,6 +154,7 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, 
struct panfrost_mmu *mmu)
as = mmu->as;
if (as >= 0) {
int en = atomic_inc_return(&mmu->as_count);
+   u32 mask = BIT(as) | BIT(16 + as);
 
/*
 * AS can be retained by active jobs or a perfcnt context,
@@ -162,6 +163,18 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, 
struct panfrost_mmu *mmu)
WARN_ON(en >= (NUM_JOB_SLOTS + 1));
 
list_move(&mmu->list, &pfdev->as_lru_list);
+
+   if (pfdev->as_faulty_mask & mask) {
+   /* Unhandled pagefault on this AS, the MMU was
+* disabled. We need to re-enable the MMU after
+* clearing+unmasking the AS interrupts.
+*/
+   mmu_write(pfdev, MMU_INT_CLEAR, mask);
+   mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask);
+   pfdev->as_faulty_mask &= ~mask;
+   panfrost_mmu_enable(pfdev, mmu);
+   }
+
goto out;
}
 
@@ -211,6 +224,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
spin_lock(&pfdev->as_lock);
 
pfdev->as_alloc_mask = 0;
+   pfdev->as_faulty_mask = 0;
 
list_for_each_entry_safe(mmu, mmu_tmp, &pfdev->as_lru_list, list) {
mmu->as = -1;
@@ -661,7 +675,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, 
void *data)
if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 
0xC0)
ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
 
-   if (ret)
+   if (ret) {
/* terminal fault, print info about the fault */
dev_err(pfdev->dev,
"Unhandled Page fault in AS%d at VA 0x%016llX\n"
@@ -679,14 +693,28 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int 
irq, void *data)
access_type, access_type_name(pfdev, 
fault_status),
source_id);
 
+   spin_lock(&pfdev->as_lock);
+   /* Ignore MMU interrupts on this AS until it's been
+* re-enabled.
+*/
+   pfdev->as_faulty_mask |= mask;
+
+   /* Disable the MMU to kill jobs on this AS. */
+   panfrost_mmu_disable(pfdev, as);
+   spin_unlock(&pfdev->as_lock);
+   }
+
status &= ~mask;
 
/* If we received new MMU interrupts, process them before 
returning. */
if (!status)
-   status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+   status = mmu_read(pfdev, MMU_INT_RAWSTAT) & 
~pfdev->as_faulty_mask;
}
 
-   mmu_write(pfdev, MMU_INT_MASK, ~0);
+   spin_lock(&pfdev->as_lock);
+   mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask);
+   spin_unlock(&pfdev->as_lock);
+
return IRQ_HANDLED;
 };
 
-- 
2.31.1
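The as_faulty_mask bookkeeping is easier to follow in isolation. MMU interrupt bits come in pairs, BIT(as) for the page fault and BIT(16 + as) for the bus-error flavour, so both are set and cleared together; the mask written to MMU_INT_MASK is simply the complement of the faulty set. A minimal sketch (our helper names, not the driver's, and no locking):

```c
#include <stdint.h>
#include <assert.h>

#define BIT(n) (1ul << (n))

/* Address spaces with an unhandled fault, interrupts ignored. */
static unsigned long as_faulty_mask;

static void as_mark_faulty(int as)
{
	/* Unhandled page fault: mute both IRQ bits for this AS. */
	as_faulty_mask |= BIT(as) | BIT(16 + as);
}

static unsigned long as_irq_mask(void)
{
	/* Value for MMU_INT_MASK: everything but the faulty ASes. */
	return ~as_faulty_mask;
}

static void as_reenable(int as)
{
	/* When a new job picks this AS up again: clear + unmask. */
	as_faulty_mask &= ~(BIT(as) | BIT(16 + as));
}
```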



[PATCH v6 06/16] drm/panfrost: Do the exception -> string translation using a table

2021-06-29 Thread Boris Brezillon
Do the exception -> string translation using a table. This way we get
rid of those magic numbers and can easily add new fields if we need
to attach extra information to exception types.

v4:
* Don't expose exception type to userspace
* Merge the enum definition and the enum -> string table declaration
  in the same patch

v3:
* Drop the error field

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 130 +
 drivers/gpu/drm/panfrost/panfrost_device.h |  69 +++
 2 files changed, 152 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index f7f5ca94f910..cd76d2ff5034 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -292,55 +292,91 @@ void panfrost_device_fini(struct panfrost_device *pfdev)
panfrost_clk_fini(pfdev);
 }
 
-const char *panfrost_exception_name(u32 exception_code)
-{
-   switch (exception_code) {
-   /* Non-Fault Status code */
-   case 0x00: return "NOT_STARTED/IDLE/OK";
-   case 0x01: return "DONE";
-   case 0x02: return "INTERRUPTED";
-   case 0x03: return "STOPPED";
-   case 0x04: return "TERMINATED";
-   case 0x08: return "ACTIVE";
-   /* Job exceptions */
-   case 0x40: return "JOB_CONFIG_FAULT";
-   case 0x41: return "JOB_POWER_FAULT";
-   case 0x42: return "JOB_READ_FAULT";
-   case 0x43: return "JOB_WRITE_FAULT";
-   case 0x44: return "JOB_AFFINITY_FAULT";
-   case 0x48: return "JOB_BUS_FAULT";
-   case 0x50: return "INSTR_INVALID_PC";
-   case 0x51: return "INSTR_INVALID_ENC";
-   case 0x52: return "INSTR_TYPE_MISMATCH";
-   case 0x53: return "INSTR_OPERAND_FAULT";
-   case 0x54: return "INSTR_TLS_FAULT";
-   case 0x55: return "INSTR_BARRIER_FAULT";
-   case 0x56: return "INSTR_ALIGN_FAULT";
-   case 0x58: return "DATA_INVALID_FAULT";
-   case 0x59: return "TILE_RANGE_FAULT";
-   case 0x5A: return "ADDR_RANGE_FAULT";
-   case 0x60: return "OUT_OF_MEMORY";
-   /* GPU exceptions */
-   case 0x80: return "DELAYED_BUS_FAULT";
-   case 0x88: return "SHAREABILITY_FAULT";
-   /* MMU exceptions */
-   case 0xC1: return "TRANSLATION_FAULT_LEVEL1";
-   case 0xC2: return "TRANSLATION_FAULT_LEVEL2";
-   case 0xC3: return "TRANSLATION_FAULT_LEVEL3";
-   case 0xC4: return "TRANSLATION_FAULT_LEVEL4";
-   case 0xC8: return "PERMISSION_FAULT";
-   case 0xC9 ... 0xCF: return "PERMISSION_FAULT";
-   case 0xD1: return "TRANSTAB_BUS_FAULT_LEVEL1";
-   case 0xD2: return "TRANSTAB_BUS_FAULT_LEVEL2";
-   case 0xD3: return "TRANSTAB_BUS_FAULT_LEVEL3";
-   case 0xD4: return "TRANSTAB_BUS_FAULT_LEVEL4";
-   case 0xD8: return "ACCESS_FLAG";
-   case 0xD9 ... 0xDF: return "ACCESS_FLAG";
-   case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
-   case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
+#define PANFROST_EXCEPTION(id) \
+   [DRM_PANFROST_EXCEPTION_ ## id] = { \
+   .name = #id, \
}
 
-   return "UNKNOWN";
+struct panfrost_exception_info {
+   const char *name;
+};
+
+static const struct panfrost_exception_info panfrost_exception_infos[] = {
+   PANFROST_EXCEPTION(OK),
+   PANFROST_EXCEPTION(DONE),
+   PANFROST_EXCEPTION(INTERRUPTED),
+   PANFROST_EXCEPTION(STOPPED),
+   PANFROST_EXCEPTION(TERMINATED),
+   PANFROST_EXCEPTION(KABOOM),
+   PANFROST_EXCEPTION(EUREKA),
+   PANFROST_EXCEPTION(ACTIVE),
+   PANFROST_EXCEPTION(JOB_CONFIG_FAULT),
+   PANFROST_EXCEPTION(JOB_POWER_FAULT),
+   PANFROST_EXCEPTION(JOB_READ_FAULT),
+   PANFROST_EXCEPTION(JOB_WRITE_FAULT),
+   PANFROST_EXCEPTION(JOB_AFFINITY_FAULT),
+   PANFROST_EXCEPTION(JOB_BUS_FAULT),
+   PANFROST_EXCEPTION(INSTR_INVALID_PC),
+   PANFROST_EXCEPTION(INSTR_INVALID_ENC),
+   PANFROST_EXCEPTION(INSTR_TYPE_MISMATCH),
+   PANFROST_EXCEPTION(INSTR_OPERAND_FAULT),
+   PANFROST_EXCEPTION(INSTR_TLS_FAULT),
+   PANFROST_EXCEPTION(INSTR_BARRIER_FAULT),
+   PANFROST_EXCEPTION(INSTR_ALIGN_FAULT),
+   PANFROST_EXCEPTION(DATA_INVALID_FAULT),
+   PANFROST_EXCEPTION(TILE_RANGE_FAULT),
+   PANFROST_EXCEPTION(ADDR_RANGE_FAULT),
+   PANFROST_EXCEPTION(IMPRECISE_FAULT),
+   PANFROST_EXCEPTION(OOM),
+   PANFROST_EXCEPTION(OOM_AFBC),
+   PANFROST_EXCEP
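The switch-to-table conversion above (truncated in this archive) relies on C designated initializers. The same technique can be shown standalone; the codes and the missing DRM_PANFROST_ prefixes below are simplified for illustration. Slots not named in the initializer stay NULL, which is what makes the "UNKNOWN" fallback cheap.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Minimal version of the designated-initializer lookup table:
 * unknown codes fall back to "UNKNOWN" because their .name slots
 * are left NULL by the initializer. */
#define EXCEPTION(id, code) [code] = { .name = #id }

struct exception_info {
	const char *name;
};

static const struct exception_info exception_infos[] = {
	EXCEPTION(OK, 0x00),
	EXCEPTION(DONE, 0x01),
	EXCEPTION(JOB_BUS_FAULT, 0x48),
};

static const char *exception_name(unsigned int code)
{
	if (code >= sizeof(exception_infos) / sizeof(exception_infos[0]) ||
	    !exception_infos[code].name)
		return "UNKNOWN";
	return exception_infos[code].name;
}
```

Compared with the old switch, adding a field (say, a per-exception severity) now only touches the struct and the table, not every case label.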

[PATCH v6 15/16] drm/panfrost: Queue jobs on the hardware

2021-06-29 Thread Boris Brezillon
From: Steven Price 

The hardware has a set of '_NEXT' registers that can hold a second job
while the first is executing. Make use of these registers to enqueue a
second job per slot.

v5:
* Fix a comment in panfrost_job_init()

v3:
* Fix the done/err job dequeuing logic to get a valid active state
* Only enable the second slot on GPUs supporting jobchain disambiguation
* Split interrupt handling in sub-functions

Signed-off-by: Steven Price 
Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c| 467 +++--
 2 files changed, 351 insertions(+), 118 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 193cd87f643c..8b25278f34c8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -102,7 +102,7 @@ struct panfrost_device {
 
struct panfrost_job_slot *js;
 
-   struct panfrost_job *jobs[NUM_JOB_SLOTS];
+   struct panfrost_job *jobs[NUM_JOB_SLOTS][2];
struct list_head scheduled_jobs;
 
struct panfrost_perfcnt *perfcnt;
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 8a0db9571bfd..71a72fb50e6b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -140,9 +141,52 @@ static void panfrost_job_write_affinity(struct 
panfrost_device *pfdev,
job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32);
 }
 
+static u32
+panfrost_get_job_chain_flag(const struct panfrost_job *job)
+{
+   struct panfrost_fence *f = to_panfrost_fence(job->done_fence);
+
+   if (!panfrost_has_hw_feature(job->pfdev, 
HW_FEATURE_JOBCHAIN_DISAMBIGUATION))
+   return 0;
+
+   return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
+}
+
+static struct panfrost_job *
+panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
+{
+   struct panfrost_job *job = pfdev->jobs[slot][0];
+
+   WARN_ON(!job);
+   pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
+   pfdev->jobs[slot][1] = NULL;
+
+   return job;
+}
+
+static unsigned int
+panfrost_enqueue_job(struct panfrost_device *pfdev, int slot,
+struct panfrost_job *job)
+{
+   if (WARN_ON(!job))
+   return 0;
+
+   if (!pfdev->jobs[slot][0]) {
+   pfdev->jobs[slot][0] = job;
+   return 0;
+   }
+
+   WARN_ON(pfdev->jobs[slot][1]);
+   pfdev->jobs[slot][1] = job;
+   WARN_ON(panfrost_get_job_chain_flag(job) ==
+   panfrost_get_job_chain_flag(pfdev->jobs[slot][0]));
+   return 1;
+}
+
 static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 {
struct panfrost_device *pfdev = job->pfdev;
+   unsigned int subslot;
u32 cfg;
u64 jc_head = job->jc;
int ret;
@@ -168,7 +212,8 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
 * start */
cfg |= JS_CONFIG_THREAD_PRI(8) |
JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE |
-   JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE;
+   JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE |
+   panfrost_get_job_chain_flag(job);
 
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION;
@@ -182,10 +227,17 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
job_write(pfdev, JS_FLUSH_ID_NEXT(js), job->flush_id);
 
/* GO ! */
-   dev_dbg(pfdev->dev, "JS: Submitting atom %p to js[%d] with head=0x%llx",
-   job, js, jc_head);
 
-   job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
+   spin_lock(&pfdev->js->job_lock);
+   subslot = panfrost_enqueue_job(pfdev, js, job);
+   /* Don't queue the job if a reset is in progress */
+   if (!atomic_read(&pfdev->reset.pending)) {
+   job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
+   dev_dbg(pfdev->dev,
+   "JS: Submitting atom %p to js[%d][%d] with head=0x%llx 
AS %d",
+   job, js, subslot, jc_head, cfg & 0xf);
+   }
+   spin_unlock(&pfdev->js->job_lock);
 }
 
 static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
@@ -332,7 +384,11 @@ static struct dma_fence *panfrost_job_run(struct 
drm_sched_job *sched_job)
if (unlikely(job->base.s_fence->finished.error))
return NULL;
 
-   pfdev->jobs[slot] = job;
+   /* Nothing to execute: can happen if the job has finished while
+* we were resetting the GPU.
+*/
+ 
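The two-deep queue built on the hardware _NEXT registers boils down to simple subslot bookkeeping: subslot 0 is the running job, subslot 1 the one latched in JS_*_NEXT, and dequeuing promotes the queued job to the head. A single-slot, lock-free sketch of that core logic (jobs reduced to opaque pointers; this is an illustration, not the driver code):

```c
#include <stddef.h>
#include <assert.h>

#define NUM_SUBSLOTS 2

/* jobs[0] = running job, jobs[1] = job latched in the _NEXT regs. */
static void *jobs[NUM_SUBSLOTS];

/* Returns the subslot the job landed in (0 = head, 1 = next). */
static unsigned int enqueue_job(void *job)
{
	if (!jobs[0]) {
		jobs[0] = job;
		return 0;
	}
	jobs[1] = job;
	return 1;
}

/* Pops the head job and promotes the queued one. */
static void *dequeue_job(void)
{
	void *job = jobs[0];

	jobs[0] = jobs[1];
	jobs[1] = NULL;
	return job;
}
```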

[PATCH v6 10/16] drm/panfrost: Make sure job interrupts are masked before resetting

2021-06-29 Thread Boris Brezillon
This is not yet needed because we let active jobs be killed during the
reset and we don't really bother making sure they can be restarted.
But once we start adding soft-stop support, controlling when we deal
with the remaining interrupts and making sure those are handled before
the reset is issued gets tricky if we keep job interrupts active.

Let's prepare for that and mask+flush job IRQs before issuing a reset.

v4:
* Add a comment explaining why we WARN_ON(!job) in the irq handler
* Keep taking the job_lock when evicting stalled jobs

v3:
* New patch

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 27 -
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 59c23c91e47c..11ff33841caf 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -34,6 +34,7 @@ struct panfrost_queue_state {
 struct panfrost_job_slot {
struct panfrost_queue_state queue[NUM_JOB_SLOTS];
spinlock_t job_lock;
+   int irq;
 };
 
 static struct panfrost_job *
@@ -389,6 +390,15 @@ static void panfrost_reset(struct panfrost_device *pfdev,
if (bad)
drm_sched_increase_karma(bad);
 
+   /* Mask job interrupts and synchronize to make sure we won't be
+* interrupted during our reset.
+*/
+   job_write(pfdev, JOB_INT_MASK, 0);
+   synchronize_irq(pfdev->js->irq);
+
+   /* Schedulers are stopped and interrupts are masked+flushed, we don't
+* need to protect the 'evict unfinished jobs' lock with the job_lock.
+*/
spin_lock(&pfdev->js->job_lock);
for (i = 0; i < NUM_JOB_SLOTS; i++) {
if (pfdev->jobs[i]) {
@@ -486,7 +496,14 @@ static void panfrost_job_handle_irq(struct panfrost_device 
*pfdev, u32 status)
struct panfrost_job *job;
 
job = pfdev->jobs[j];
-   /* Only NULL if job timeout occurred */
+   /* The only reason this job could be NULL is if the
+* job IRQ handler is called just after the
+* in-flight job eviction in the reset path, and
+* this shouldn't happen because the job IRQ has
+* been masked and synchronized when this eviction
+* happens.
+*/
+   WARN_ON(!job);
if (job) {
pfdev->jobs[j] = NULL;
 
@@ -546,7 +563,7 @@ static void panfrost_reset_work(struct work_struct *work)
 int panfrost_job_init(struct panfrost_device *pfdev)
 {
struct panfrost_job_slot *js;
-   int ret, j, irq;
+   int ret, j;
 
INIT_WORK(&pfdev->reset.work, panfrost_reset_work);
 
@@ -556,11 +573,11 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 
spin_lock_init(&js->job_lock);
 
-   irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job");
-   if (irq <= 0)
+   js->irq = platform_get_irq_byname(to_platform_device(pfdev->dev), 
"job");
+   if (js->irq <= 0)
return -ENODEV;
 
-   ret = devm_request_threaded_irq(pfdev->dev, irq,
+   ret = devm_request_threaded_irq(pfdev->dev, js->irq,
panfrost_job_irq_handler,
panfrost_job_irq_handler_thread,
IRQF_SHARED, KBUILD_MODNAME "-job",
-- 
2.31.1



[PATCH v6 16/16] drm/panfrost: Increase the AS_ACTIVE polling timeout

2021-06-29 Thread Boris Brezillon
Experience has shown that 1ms is sometimes not enough, even when the GPU
is running at its maximum frequency, not to mention that an MMU operation
might take longer if the GPU is running at a lower frequency, which is
likely to be the case if devfreq is active.

Let's pick a significantly bigger timeout value (1ms -> 100ms) to be on
the safe side.

v5:
* New patch

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e0356e68e768..0da5b3100ab1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -34,7 +34,7 @@ static int wait_ready(struct panfrost_device *pfdev, u32 
as_nr)
/* Wait for the MMU status to indicate there is no active command, in
 * case one is pending. */
ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr),
-   val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000);
+   val, !(val & AS_STATUS_AS_ACTIVE), 10, 100000);
 
if (ret) {
/* The GPU hung, let's trigger a reset */
-- 
2.31.1
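The readl_relaxed_poll_timeout_atomic() pattern that wait_ready() relies on is a poll loop with a delay and a time budget. A user-space sketch follows; the budget is counted in iterations rather than microseconds so the logic stays testable, and read_fake_status()/read_stuck_status() are stand-ins for the MMIO read (hypothetical names, not driver code).

```c
#include <stdint.h>
#include <errno.h>
#include <assert.h>

#define AS_STATUS_AS_ACTIVE (1u << 0)

/* Poll a status source until AS_ACTIVE clears or the budget runs out. */
static int poll_as_idle(uint32_t (*read_status)(void), unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!(read_status() & AS_STATUS_AS_ACTIVE))
			return 0;
	}
	return -ETIMEDOUT;	/* the caller then triggers a GPU reset */
}

static uint32_t read_fake_status(void)
{
	/* Pretend the MMU finishes its command on the third poll. */
	static unsigned int polls;

	return ++polls < 3 ? AS_STATUS_AS_ACTIVE : 0;
}

static uint32_t read_stuck_status(void)
{
	return AS_STATUS_AS_ACTIVE;	/* a hung GPU never goes idle */
}
```

Picking the budget is the whole point of this patch: too short and a slow-clocked (devfreq-throttled) GPU gets a spurious reset, too long and a genuine hang takes longer to detect.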



[PATCH v6 14/16] drm/panfrost: Kill in-flight jobs on FD close

2021-06-29 Thread Boris Brezillon
If the process that submitted these jobs decided to close the FD before
the jobs are done, it probably means it doesn't care about the result.

v5:
* Add a panfrost_exception_is_fault() helper and the
  DRM_PANFROST_EXCEPTION_MAX_NON_FAULT value

v4:
* Don't disable/restore irqs when taking the job_lock (not needed since
  this lock is never taken from an interrupt context)

v3:
* Set fence error to ECANCELED when a TERMINATED exception is received

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  7 
 drivers/gpu/drm/panfrost/panfrost_job.c| 42 ++
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 68e93b7e5b61..193cd87f643c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -184,6 +184,7 @@ enum drm_panfrost_exception_type {
DRM_PANFROST_EXCEPTION_KABOOM = 0x05,
DRM_PANFROST_EXCEPTION_EUREKA = 0x06,
DRM_PANFROST_EXCEPTION_ACTIVE = 0x08,
+   DRM_PANFROST_EXCEPTION_MAX_NON_FAULT = 0x3f,
DRM_PANFROST_EXCEPTION_JOB_CONFIG_FAULT = 0x40,
DRM_PANFROST_EXCEPTION_JOB_POWER_FAULT = 0x41,
DRM_PANFROST_EXCEPTION_JOB_READ_FAULT = 0x42,
@@ -244,6 +245,12 @@ enum drm_panfrost_exception_type {
DRM_PANFROST_EXCEPTION_MEM_ATTR_NONCACHE_3 = 0xef,
 };
 
+static inline bool
+panfrost_exception_is_fault(u32 exception_code)
+{
+   return exception_code > DRM_PANFROST_EXCEPTION_MAX_NON_FAULT;
+}
+
 const char *panfrost_exception_name(u32 exception_code);
 bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
u32 exception_code);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index cf5f9e8b2a27..8a0db9571bfd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -483,14 +483,21 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
 
if (status & JOB_INT_MASK_ERR(j)) {
u32 js_status = job_read(pfdev, JS_STATUS(j));
+   const char *exception_name = panfrost_exception_name(js_status);
 
job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
 
-   dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x",
-   j,
-   panfrost_exception_name(js_status),
-   job_read(pfdev, JS_HEAD_LO(j)),
-   job_read(pfdev, JS_TAIL_LO(j)));
+   if (!panfrost_exception_is_fault(js_status)) {
+   dev_dbg(pfdev->dev, "js interrupt, js=%d, status=%s, head=0x%x, tail=0x%x",
+   j, exception_name,
+   job_read(pfdev, JS_HEAD_LO(j)),
+   job_read(pfdev, JS_TAIL_LO(j)));
+   } else {
+   dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x",
+   j, exception_name,
+   job_read(pfdev, JS_HEAD_LO(j)),
+   job_read(pfdev, JS_TAIL_LO(j)));
+   }
 
/* If we need a reset, signal it to the timeout
 * handler, otherwise, update the fence error field and
@@ -499,7 +506,16 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
if (panfrost_exception_needs_reset(pfdev, js_status)) {
drm_sched_fault(&pfdev->js->queue[j].sched);
} else {
-   dma_fence_set_error(pfdev->jobs[j]->done_fence, -EINVAL);
+   int error = 0;
+
+   if (js_status == DRM_PANFROST_EXCEPTION_TERMINATED)
+   error = -ECANCELED;
+   else if (panfrost_exception_is_fault(js_status))
+   error = -EINVAL;
+
+   if (error)
+   dma_fence_set_error(pfdev->jobs[j]->done_fence, error);
+
status |= JOB_INT_MASK_DONE(j);
}
}
@@ -665,10 +681,24 @@ int panfrost_job_open(struct panfrost_file_priv *panfrost_priv)
 
 void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
 {
+   struct panfrost_device *pfdev = panfrost_priv->pfdev;
int i;
 
for (i = 0; i < NUM_JOB_SLOTS; i++)
drm_sch

[PATCH v6 12/16] drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck

2021-06-29 Thread Boris Brezillon
Things are unlikely to resolve until we reset the GPU. Let's not wait
for other faults/timeout to happen to trigger this reset.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 959da770295c..e0356e68e768 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -36,8 +36,11 @@ static int wait_ready(struct panfrost_device *pfdev, u32 as_nr)
ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr),
val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000);
 
-   if (ret)
+   if (ret) {
+   /* The GPU hung, let's trigger a reset */
+   panfrost_device_schedule_reset(pfdev);
dev_err(pfdev->dev, "AS_ACTIVE bit stuck\n");
+   }
 
return ret;
 }
-- 
2.31.1



Re: [PATCH v6 00/16] drm/panfrost

2021-06-30 Thread Boris Brezillon
On Wed, 30 Jun 2021 08:27:35 +0200
Boris Brezillon  wrote:

> Hello,
> 
> Bunch of improvements to make the panfrost driver more robust and allow
> queuing jobs at the HW level.
> 
> Changes in v6:
> * Collected acks/reviews
> * Got rid of the cancel_delayed_work() calls in the reset path
> 
> Changes in this v5:
> * Document what's expected in the ->timedout_job() hook
> * Add a patch increasing the AS_ACTIVE polling timeout
> * Fix a few minor things here and there (see each commit for a detailed
>   changelog) and collect R-b/A-b tags
> 
> Changes in this v4:
> * fixing the reset serialization
> * fixing a deadlock in the reset path
> * moving the exception enum to a private header
> 
> Regards,
> 
> Boris
> 
> Boris Brezillon (15):
>   drm/sched: Document what the timedout_job method should do
>   drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
>   drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate
>   drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition
>   drm/panfrost: Drop the pfdev argument passed to
> panfrost_exception_name()
>   drm/panfrost: Do the exception -> string translation using a table
>   drm/panfrost: Expose a helper to trigger a GPU reset
>   drm/panfrost: Use a threaded IRQ for job interrupts
>   drm/panfrost: Simplify the reset serialization logic
>   drm/panfrost: Make sure job interrupts are masked before resetting
>   drm/panfrost: Disable the AS on unhandled page faults
>   drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck
>   drm/panfrost: Don't reset the GPU on job faults unless we really have
> to
>   drm/panfrost: Kill in-flight jobs on FD close
>   drm/panfrost: Increase the AS_ACTIVE polling timeout
> 
> Steven Price (1):
>   drm/panfrost: Queue jobs on the hardware

Series queued to drm-misc-next.

> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |   2 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c|   3 +-
>  drivers/gpu/drm/lima/lima_sched.c  |   3 +-
>  drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++--
>  drivers/gpu/drm/panfrost/panfrost_device.h |  91 ++-
>  drivers/gpu/drm/panfrost/panfrost_gpu.c|   2 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c| 626 +++--
>  drivers/gpu/drm/panfrost/panfrost_mmu.c|  43 +-
>  drivers/gpu/drm/panfrost/panfrost_regs.h   |   3 -
>  drivers/gpu/drm/scheduler/sched_main.c |  14 +-
>  drivers/gpu/drm/v3d/v3d_sched.c|  10 +-
>  include/drm/gpu_scheduler.h|  37 +-
>  12 files changed, 717 insertions(+), 256 deletions(-)
> 



[PATCH v2 2/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()

2021-07-01 Thread Boris Brezillon
So we can re-use it from elsewhere.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9bbc9e78cc85 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -109,6 +109,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
return 0;
 }
 
+static int
+panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   unsigned int i;
+
+   job->mappings = kvmalloc_array(job->bo_count,
+  sizeof(*job->mappings),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->mappings)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++) {
+   struct panfrost_gem_mapping *mapping;
+   struct panfrost_gem_object *bo;
+
+   bo = to_panfrost_bo(job->bos[i]);
+   mapping = panfrost_gem_mapping_get(bo, priv);
+   if (!mapping)
+   return -EINVAL;
+
+   atomic_inc(&bo->gpu_usecount);
+   job->mappings[i] = mapping;
+   }
+
+   return 0;
+}
+
 /**
  * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
  * referenced by the job.
@@ -128,8 +156,6 @@ panfrost_lookup_bos(struct drm_device *dev,
  struct drm_panfrost_submit *args,
  struct panfrost_job *job)
 {
-   struct panfrost_file_priv *priv = file_priv->driver_priv;
-   struct panfrost_gem_object *bo;
unsigned int i;
int ret;
 
@@ -144,27 +170,7 @@ panfrost_lookup_bos(struct drm_device *dev,
if (ret)
return ret;
 
-   job->mappings = kvmalloc_array(job->bo_count,
-  sizeof(struct panfrost_gem_mapping *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!job->mappings)
-   return -ENOMEM;
-
-   for (i = 0; i < job->bo_count; i++) {
-   struct panfrost_gem_mapping *mapping;
-
-   bo = to_panfrost_bo(job->bos[i]);
-   mapping = panfrost_gem_mapping_get(bo, priv);
-   if (!mapping) {
-   ret = -EINVAL;
-   break;
-   }
-
-   atomic_inc(&bo->gpu_usecount);
-   job->mappings[i] = mapping;
-   }
-
-   return ret;
+   return panfrost_get_job_mappings(file_priv, job);
 }
 
 /**
-- 
2.31.1



[PATCH v2 0/7] drm/panfrost: Add a new submit ioctl

2021-07-01 Thread Boris Brezillon
Hello,

This is an attempt at providing a new submit ioctl that's more
Vulkan-friendly than the existing one. This ioctl

1/ allows passing several out syncobjs so we can easily update
   several fence/semaphore in a single ioctl() call
2/ allows passing several jobs so we don't have to have one ioctl
   per job-chain recorded in the command buffer
3/ supports disabling implicit dependencies as well as 
   non-exclusive access to BOs, thus removing unnecessary
   synchronization

I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing
one to pass at most one sync_file object in input and/or creating a
sync_file FD embedding the render out fence), but it's not entirely
clear to me when that's useful. Indeed, we can already do the
sync_file <-> syncobj conversion using the
SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to.
Note that, unlike Turnip, PanVk is using syncobjs to implement
vkQueueWaitIdle(), so the syncobj -> sync_file conversion doesn't
have to happen for each submission, but maybe there's a good reason
to use sync_files for that too. Any feedback on that aspect would
be useful I guess.

Any feedback on this new ioctl is welcome, in particular, do you
think other things are missing/would be nice to have for Vulkan?

Regards,

Boris

P.S.: basic igt tests for these new ioctls are available there [1]

[1]https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach_object_fences}()
  drm/panfrost: Move the mappings collection out of
panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add the ability to create submit queues
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/Makefile |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   | 442 --
 drivers/gpu/drm/panfrost/panfrost_job.c   |  89 ++--
 drivers/gpu/drm/panfrost/panfrost_job.h   |  10 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 130 ++
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 ++
 include/uapi/drm/panfrost_drm.h   | 103 
 8 files changed, 720 insertions(+), 86 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

-- 
2.31.1



[PATCH v2 1/7] drm/panfrost: Pass a job to panfrost_{acquire,attach_object_fences}()

2021-07-01 Thread Boris Brezillon
So we don't have to change the prototype if we extend the function.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..fdc1bd7ecf12 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -240,15 +240,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
spin_unlock(&pfdev->js->job_lock);
 }
 
-static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
- int bo_count,
- struct xarray *deps)
+static int panfrost_acquire_object_fences(struct panfrost_job *job)
 {
int i, ret;
 
-   for (i = 0; i < bo_count; i++) {
+   for (i = 0; i < job->bo_count; i++) {
/* panfrost always uses write mode in its current uapi */
-   ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
if (ret)
return ret;
}
@@ -256,14 +254,12 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
return 0;
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
- int bo_count,
- struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < bo_count; i++)
-   dma_resv_add_excl_fence(bos[i]->resv, fence);
+   for (i = 0; i < job->bo_count; i++)
+   dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -290,8 +286,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
-   ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-&job->deps);
+   ret = panfrost_acquire_object_fences(job);
if (ret) {
mutex_unlock(&pfdev->sched_lock);
goto unlock;
@@ -303,8 +298,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
mutex_unlock(&pfdev->sched_lock);
 
-   panfrost_attach_object_fences(job->bos, job->bo_count,
- job->render_done_fence);
+   panfrost_attach_object_fences(job);
 
 unlock:
drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
-- 
2.31.1



[PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-01 Thread Boris Brezillon
Needed to keep VkQueues isolated from each other.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/Makefile |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   |  69 --
 drivers/gpu/drm/panfrost/panfrost_job.c   |  47 ++-
 drivers/gpu/drm/panfrost/panfrost_job.h   |   9 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 130 ++
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 
 include/uapi/drm/panfrost_drm.h   |  17 +++
 8 files changed, 258 insertions(+), 46 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile
index b71935862417..e99192b66ec9 100644
--- a/drivers/gpu/drm/panfrost/Makefile
+++ b/drivers/gpu/drm/panfrost/Makefile
@@ -9,6 +9,7 @@ panfrost-y := \
panfrost_gpu.o \
panfrost_job.o \
panfrost_mmu.o \
-   panfrost_perfcnt.o
+   panfrost_perfcnt.o \
+   panfrost_submitqueue.o
 
 obj-$(CONFIG_DRM_PANFROST) += panfrost.o
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..51c0ba4e50f5 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -137,7 +137,7 @@ struct panfrost_mmu {
 struct panfrost_file_priv {
struct panfrost_device *pfdev;
 
-   struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
+   struct idr queues;
 
struct panfrost_mmu *mmu;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b6b5997c9366..6529e5972b47 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -19,6 +19,7 @@
 #include "panfrost_job.h"
 #include "panfrost_gpu.h"
 #include "panfrost_perfcnt.h"
+#include "panfrost_submitqueue.h"
 
 static bool unstable_ioctls;
 module_param_unsafe(unstable_ioctls, bool, 0600);
@@ -250,6 +251,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
struct panfrost_device *pfdev = dev->dev_private;
struct drm_panfrost_submit *args = data;
struct drm_syncobj *sync_out = NULL;
+   struct panfrost_submitqueue *queue;
struct panfrost_job *job;
int ret = 0;
 
@@ -259,10 +261,16 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
if (args->requirements && args->requirements != PANFROST_JD_REQ_FS)
return -EINVAL;
 
+   queue = panfrost_submitqueue_get(file->driver_priv, 0);
+   if (IS_ERR(queue))
+   return PTR_ERR(queue);
+
if (args->out_sync > 0) {
sync_out = drm_syncobj_find(file, args->out_sync);
-   if (!sync_out)
-   return -ENODEV;
+   if (!sync_out) {
+   ret = -ENODEV;
+   goto fail_put_queue;
+   }
}
 
job = kzalloc(sizeof(*job), GFP_KERNEL);
@@ -289,7 +297,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
if (ret)
goto fail_job;
 
-   ret = panfrost_job_push(job);
+   ret = panfrost_job_push(queue, job);
if (ret)
goto fail_job;
 
@@ -302,6 +310,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 fail_out_sync:
if (sync_out)
drm_syncobj_put(sync_out);
+fail_put_queue:
+   panfrost_submitqueue_put(queue);
 
return ret;
 }
@@ -451,6 +461,36 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
return ret;
 }
 
+static int
+panfrost_ioctl_create_submitqueue(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   struct drm_panfrost_create_submitqueue *args = data;
+   struct panfrost_submitqueue *queue;
+
+   queue = panfrost_submitqueue_create(priv, args->priority, args->flags);
+   if (IS_ERR(queue))
+   return PTR_ERR(queue);
+
+   args->id = queue->id;
+   return 0;
+}
+
+static int
+panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data,
+  struct drm_file *file_priv)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   u32 id = *((u32 *)data);
+
+   /* Default queue can't be destroyed. */
+   if (!id)
+   return -ENOENT;
+
+   return panfrost_submitqueue_destroy(priv, id);
+}
+
 int panfrost_unstable_ioctl_check(void)
 {
if (!unstable_ioctls)
@@ -465,6 +505,7 @@ panfrost_open(struct drm_device *dev, struct drm_file *fil

[PATCH v2 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-01 Thread Boris Brezillon
This should help limit the number of ioctls when submitting multiple
jobs. The new ioctl also supports syncobj timelines and BO access flags.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 305 
 drivers/gpu/drm/panfrost/panfrost_job.c |   3 +
 include/uapi/drm/panfrost_drm.h |  84 +++
 3 files changed, 392 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 6529e5972b47..7ed0773a5c19 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -491,6 +491,310 @@ panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data,
return panfrost_submitqueue_destroy(priv, id);
 }
 
+static int
+panfrost_get_job_in_syncs(struct drm_file *file_priv,
+ u64 refs, u32 ref_stride,
+ u32 count, struct panfrost_job *job)
+{
+   const void __user *in = u64_to_user_ptr(refs);
+   unsigned int i;
+   int ret;
+
+   if (!count)
+   return 0;
+
+   for (i = 0; i < count; i++) {
+   struct drm_panfrost_syncobj_ref ref = { };
+   struct dma_fence *fence;
+
+   ret = copy_struct_from_user(&ref, sizeof(ref),
+   in + (i * ref_stride),
+   ref_stride);
+   if (ret)
+   return ret;
+
+   if (ref.pad)
+   return -EINVAL;
+
+   ret = drm_syncobj_find_fence(file_priv, ref.handle, ref.point,
+0, &fence);
+   if (ret)
+   return ret;
+
+   ret = drm_gem_fence_array_add(&job->deps, fence);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
+struct panfrost_job_out_sync {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
+static void
+panfrost_put_job_out_syncs(struct panfrost_job_out_sync *out_syncs, u32 count)
+{
+   unsigned int i;
+
+   for (i = 0; i < count; i++) {
+   if (!out_syncs[i].syncobj)
+   break;
+
+   drm_syncobj_put(out_syncs[i].syncobj);
+   kvfree(out_syncs[i].chain);
+   }
+
+   kvfree(out_syncs);
+}
+
+static struct panfrost_job_out_sync *
+panfrost_get_job_out_syncs(struct drm_file *file_priv,
+  u64 refs, u32 ref_stride,
+  u32 count)
+{
+   void __user *in = u64_to_user_ptr(refs);
+   struct panfrost_job_out_sync *out_syncs;
+   unsigned int i;
+   int ret;
+
+   if (!count)
+   return NULL;
+
+   out_syncs = kvmalloc_array(count, sizeof(*out_syncs),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!out_syncs)
+   return ERR_PTR(-ENOMEM);
+
+   for (i = 0; i < count; i++) {
+   struct drm_panfrost_syncobj_ref ref = { };
+
+   ret = copy_struct_from_user(&ref, sizeof(ref),
+   in + (i * ref_stride),
+   ref_stride);
+   if (ret)
+   goto err_free_out_syncs;
+
+   if (ref.pad) {
+   ret = -EINVAL;
+   goto err_free_out_syncs;
+   }
+
+   out_syncs[i].syncobj = drm_syncobj_find(file_priv, ref.handle);
+   if (!out_syncs[i].syncobj) {
+   ret = -EINVAL;
+   goto err_free_out_syncs;
+   }
+
+   out_syncs[i].point = ref.point;
+   if (!out_syncs[i].point)
+   continue;
+
+   out_syncs[i].chain = kmalloc(sizeof(*out_syncs[i].chain),
+GFP_KERNEL);
+   if (!out_syncs[i].chain) {
+   ret = -ENOMEM;
+   goto err_free_out_syncs;
+   }
+   }
+
+   return out_syncs;
+
+err_free_out_syncs:
+   panfrost_put_job_out_syncs(out_syncs, count);
+   return ERR_PTR(ret);
+}
+
+static void
+panfrost_set_job_out_fence(struct panfrost_job_out_sync *out_syncs,
+  unsigned int count, struct dma_fence *fence)
+{
+   unsigned int i;
+
+   for (i = 0; i < count; i++) {
+   if (out_syncs[i].chain) {
+   drm_syncobj_add_point(out_syncs[i].syncobj,
+ out_syncs[i].chain,
+ fence, out_syncs[i].point);
+   out_syncs[i].chain = NULL;
+   } else {
+   drm_syncobj_replace_fence(out_syncs[i].syncobj,
+   

[PATCH v2 3/7] drm/panfrost: Add BO access flags to relax dependencies between jobs

2021-07-01 Thread Boris Brezillon
Jobs reading from the same BO should not be serialized. Add access
flags so we can relax the implicit dependencies in that case. We force
exclusive access for now to keep the behavior unchanged, but a new
SUBMIT ioctl taking explicit access flags will be introduced.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 23 +++
 drivers/gpu/drm/panfrost/panfrost_job.h |  1 +
 include/uapi/drm/panfrost_drm.h |  2 ++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 9bbc9e78cc85..b6b5997c9366 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -164,6 +164,15 @@ panfrost_lookup_bos(struct drm_device *dev,
if (!job->bo_count)
return 0;
 
+   job->bo_flags = kvmalloc_array(job->bo_count,
+  sizeof(*job->bo_flags),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->bo_flags)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++)
+   job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+
ret = drm_gem_objects_lookup(file_priv,
 (void __user *)(uintptr_t)args->bo_handles,
 job->bo_count, &job->bos);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index fdc1bd7ecf12..152245b122be 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -245,8 +245,16 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
int i, ret;
 
for (i = 0; i < job->bo_count; i++) {
-   /* panfrost always uses write mode in its current uapi */
-   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
+   bool exclusive = job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE;
+
+   if (!exclusive) {
+   ret = dma_resv_reserve_shared(job->bos[i]->resv, 1);
+   if (ret)
+   return ret;
+   }
+
+   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i],
+  exclusive);
if (ret)
return ret;
}
@@ -258,8 +266,14 @@ static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < job->bo_count; i++)
-   dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
+   for (i = 0; i < job->bo_count; i++) {
+   struct dma_resv *robj = job->bos[i]->resv;
+
+   if (job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE)
+   dma_resv_add_excl_fence(robj, job->render_done_fence);
+   else
+   dma_resv_add_shared_fence(robj, job->render_done_fence);
+   }
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -340,6 +354,7 @@ static void panfrost_job_cleanup(struct kref *ref)
kvfree(job->bos);
}
 
+   kvfree(job->bo_flags);
kfree(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..1cbc3621b663 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,7 @@ struct panfrost_job {
 
struct panfrost_gem_mapping **mappings;
struct drm_gem_object **bos;
+   u32 *bo_flags;
u32 bo_count;
 
/* Fence to be signaled by drm-sched once its done with the job */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 061e700dd06c..45d6c600475c 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -224,6 +224,8 @@ struct drm_panfrost_madvise {
__u32 retained;   /* out, whether backing store still exists */
 };
 
+#define PANFROST_BO_REF_EXCLUSIVE  0x1
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.31.1



[PATCH v2 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature

2021-07-01 Thread Boris Brezillon
Now that we have a new SUBMIT ioctl dealing with timelined syncobjs we
can advertise the feature.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 7ed0773a5c19..b0978fe4fa36 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -886,7 +886,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.2 - adds AFBC_FEATURES query
  */
 static const struct drm_driver panfrost_drm_driver = {
-   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+ DRIVER_SYNCOBJ_TIMELINE,
.open   = panfrost_open,
.postclose  = panfrost_postclose,
.ioctls = panfrost_drm_driver_ioctls,
-- 
2.31.1



[PATCH v2 7/7] drm/panfrost: Bump minor version to reflect the feature additions

2021-07-01 Thread Boris Brezillon
We now have a new ioctl that allows submitting multiple jobs at once
(among other things) and we support timelined syncobjs. Bump the
minor version number to reflect those changes.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b0978fe4fa36..1859e6887877 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -884,6 +884,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  * - 1.2 - adds AFBC_FEATURES query
+ * - 1.3 - adds the BATCH_SUBMIT, CREATE_SUBMITQUEUE, DESTROY_SUBMITQUEUE
+ *ioctls and advertises the SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
.driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -897,7 +899,7 @@ static const struct drm_driver panfrost_drm_driver = {
.desc   = "panfrost DRM",
.date   = "20180908",
.major  = 1,
-   .minor  = 2,
+   .minor  = 3,
 
.gem_create_object  = panfrost_gem_create_object,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
-- 
2.31.1



Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 10:56:29 +0100
Steven Price  wrote:

> On 01/07/2021 10:12, Boris Brezillon wrote:
> > Needed to keep VkQueues isolated from each other.
> > 
> > Signed-off-by: Boris Brezillon   
> 
> My Vulkan knowledge is limited so I'm not sure whether this is the right
> approach or not. In particular is it correct that an application can
> create a high priority queue which could affect other (normal priority)
> applications?

That's what msm does (with no extra CAPS check AFAICT), and the
freedreno driver can already create high priority queues if
PIPE_CONTEXT_HIGH_PRIORITY is passed. Not saying that's okay to allow
userspace to tweak the priority, but if that's a problem, other drivers
are in trouble too ;-).

> 
> Also does it really make sense to allow user space to create an
> unlimited number of queues? It feels like an ideal way for an malicious
> application to waste kernel memory.

Same here, I see no limit on the number of queues the msm driver can
create. I can definitely pick an arbitrary limit of 2^16 (or 2^8 if
2^16 is too high) if you prefer, but I feel like there's plenty of ways
to force kernel allocations already, like allocating a gazillion of 4k
GEM buffers (cgroup can probably limit the total amount of memory
allocated, but you'd still have all gem-buf meta data in kernel memory).

> 
> In terms of implementation it looks correct, but one comment below
> 
> > ---
> >  drivers/gpu/drm/panfrost/Makefile |   3 +-
> >  drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
> >  drivers/gpu/drm/panfrost/panfrost_drv.c   |  69 --
> >  drivers/gpu/drm/panfrost/panfrost_job.c   |  47 ++-
> >  drivers/gpu/drm/panfrost/panfrost_job.h   |   9 +-
> >  .../gpu/drm/panfrost/panfrost_submitqueue.c   | 130 ++
> >  .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 
> >  include/uapi/drm/panfrost_drm.h   |  17 +++
> >  8 files changed, 258 insertions(+), 46 deletions(-)
> >  create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
> >  create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h
> >   
> [...]
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_submitqueue.c b/drivers/gpu/drm/panfrost/panfrost_submitqueue.c
> > new file mode 100644
> > index ..98050f7690df
> > --- /dev/null
> > +++ b/drivers/gpu/drm/panfrost/panfrost_submitqueue.c
> > @@ -0,0 +1,130 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright 2021 Collabora ltd. */
> > +
> > +#include 
> > +
> > +#include "panfrost_device.h"
> > +#include "panfrost_job.h"
> > +#include "panfrost_submitqueue.h"
> > +
> > +static enum drm_sched_priority
> > +to_sched_prio(enum panfrost_submitqueue_priority priority)
> > +{
> > +   switch (priority) {
> > +   case PANFROST_SUBMITQUEUE_PRIORITY_LOW:
> > +   return DRM_SCHED_PRIORITY_MIN;
> > +   case PANFROST_SUBMITQUEUE_PRIORITY_MEDIUM:
> > +   return DRM_SCHED_PRIORITY_NORMAL;
> > +   case PANFROST_SUBMITQUEUE_PRIORITY_HIGH:
> > +   return DRM_SCHED_PRIORITY_HIGH;
> > +   default:
> > +   break;
> > +   }
> > +
> > +   return DRM_SCHED_PRIORITY_UNSET;
> > +}
> > +
> > +static void
> > +panfrost_submitqueue_cleanup(struct kref *ref)
> > +{
> > +   struct panfrost_submitqueue *queue;
> > +   unsigned int i;
> > +
> > +   queue = container_of(ref, struct panfrost_submitqueue, refcount);
> > +
> > +   for (i = 0; i < NUM_JOB_SLOTS; i++)
> > +   drm_sched_entity_destroy(&queue->sched_entity[i]);
> > +
> > +   /* Kill in-flight jobs */
> > +   panfrost_job_kill_queue(queue);
> > +
> > +   kfree(queue);
> > +}
> > +
> > +void panfrost_submitqueue_put(struct panfrost_submitqueue *queue)
> > +{
> > +   if (!IS_ERR_OR_NULL(queue))
> > +   kref_put(&queue->refcount, panfrost_submitqueue_cleanup);
> > +}
> > +
> > +struct panfrost_submitqueue *
> > +panfrost_submitqueue_create(struct panfrost_file_priv *ctx,
> > +   enum panfrost_submitqueue_priority priority,
> > +   u32 flags)  
> 
> If this function returned an 'int' we could simplify some code. So
> instead of returning the struct panfrost_submitqueue just return the ID
> (or negative error). The only caller (panfrost_ioctl_create_submitqueue)
> doesn't actually want the object just the ID and we can ditch the 'id'
> field from panfrost_submitqueue.

Sure, I can do that.


Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 11:08:58 +0100
Steven Price  wrote:

> On 01/07/2021 10:12, Boris Brezillon wrote:
> > Needed to keep VkQueues isolated from each other.  
> 
> One more comment I noticed when I tried this out:
> 
> [...]
> > +struct panfrost_submitqueue *
> > +panfrost_submitqueue_create(struct panfrost_file_priv *ctx,
> > +   enum panfrost_submitqueue_priority priority,
> > +   u32 flags)
> > +{
> > +   struct panfrost_submitqueue *queue;
> > +   enum drm_sched_priority sched_prio;
> > +   int ret, i;
> > +
> > +   if (flags || priority >= PANFROST_SUBMITQUEUE_PRIORITY_COUNT)
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> > +   if (!queue)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   queue->pfdev = ctx->pfdev;
> > +   sched_prio = to_sched_prio(priority);
> > +   for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > +   struct drm_gpu_scheduler *sched;
> > +
> > +   sched = panfrost_job_get_sched(ctx->pfdev, i);
> > +   ret = drm_sched_entity_init(&queue->sched_entity[i],
> > +   sched_prio, &sched, 1, NULL);
> > +   if (ret)
> > +   break;
> > +   }
> > +
> > +   if (ret) {
> > +   for (i--; i >= 0; i--)
> > +   drm_sched_entity_destroy(&queue->sched_entity[i]);
> > +
> > +   return ERR_PTR(ret);
> > +   }
> > +
> > +   kref_init(&queue->refcount);
> > +   idr_lock(&ctx->queues);
> > +   ret = idr_alloc(&ctx->queues, queue, 0, INT_MAX, GFP_KERNEL);  
> 
> This makes lockdep complain. idr_lock() is a spinlock and GFP_KERNEL can
> sleep. So either we need to bring our own mutex here or not use GFP_KERNEL.
> 

Ouch! I wonder why I don't see that (I have lockdep enabled, and the
igt tests should have exercised this path).

> Steve



Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 11:58:34 +0100
Steven Price  wrote:

> On 02/07/2021 11:52, Boris Brezillon wrote:
> > On Fri, 2 Jul 2021 11:08:58 +0100
> > Steven Price  wrote:
> >   
> >> On 01/07/2021 10:12, Boris Brezillon wrote:  
> >>> Needed to keep VkQueues isolated from each other.
> >>
> >> One more comment I noticed when I tried this out:
> >>
> >> [...]  
> >>> +struct panfrost_submitqueue *
> >>> +panfrost_submitqueue_create(struct panfrost_file_priv *ctx,
> >>> + enum panfrost_submitqueue_priority priority,
> >>> + u32 flags)
> >>> +{
> >>> + struct panfrost_submitqueue *queue;
> >>> + enum drm_sched_priority sched_prio;
> >>> + int ret, i;
> >>> +
> >>> + if (flags || priority >= PANFROST_SUBMITQUEUE_PRIORITY_COUNT)
> >>> + return ERR_PTR(-EINVAL);
> >>> +
> >>> + queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> >>> + if (!queue)
> >>> + return ERR_PTR(-ENOMEM);
> >>> +
> >>> + queue->pfdev = ctx->pfdev;
> >>> + sched_prio = to_sched_prio(priority);
> >>> + for (i = 0; i < NUM_JOB_SLOTS; i++) {
> >>> + struct drm_gpu_scheduler *sched;
> >>> +
> >>> + sched = panfrost_job_get_sched(ctx->pfdev, i);
> >>> + ret = drm_sched_entity_init(&queue->sched_entity[i],
> >>> + sched_prio, &sched, 1, NULL);
> >>> + if (ret)
> >>> + break;
> >>> + }
> >>> +
> >>> + if (ret) {
> >>> + for (i--; i >= 0; i--)
> >>> + drm_sched_entity_destroy(&queue->sched_entity[i]);
> >>> +
> >>> + return ERR_PTR(ret);
> >>> + }
> >>> +
> >>> + kref_init(&queue->refcount);
> >>> + idr_lock(&ctx->queues);
> >>> + ret = idr_alloc(&ctx->queues, queue, 0, INT_MAX, GFP_KERNEL);
> >>
> >> This makes lockdep complain. idr_lock() is a spinlock and GFP_KERNEL can
> >> sleep. So either we need to bring our own mutex here or not use GFP_KERNEL.
> >>  
> > 
> > Ouch! I wonder why I don't see that (I have lockdep enabled, and the
> > igt tests should have exercised this path).  
> 
> Actually I'm not sure it's technically lockdep - have you got
> CONFIG_DEBUG_ATOMIC_SLEEP set?

Nope, I was missing that one :-/.


Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 09:58:06 -0400
Alyssa Rosenzweig  wrote:

> > > My Vulkan knowledge is limited so I'm not sure whether this is the right
> > > approach or not. In particular is it correct that an application can
> > > create a high priority queue which could affect other (normal priority)
> > > applications?  
> > 
> > That's what msm does (with no extra CAPS check AFAICT), and the
> > freedreno driver can already create high priority queues if
> > PIPE_CONTEXT_HIGH_PRIORITY is passed. Not saying that's okay to allow
> > userspace to tweak the priority, but if that's a problem, other drivers
> > are in trouble too ;-).  
> 
> Speaking of, how will PIPE_CONTEXT_HIGH_PRIORITY be implemented with the
> new ioctl()? I envisioned something much simpler (for the old ioctl),
> just adding a "high priority?" flag to the submit and internally
> creating the two queues of normal/high priority for drm_sched to work
> out. Is this juggling now moved to userspace?

That's what freedreno does. I guess we could create 2 default queues
(one normal and one high prio) and extend the old submit ioctl() to do
what you suggest if you see a good reason to not switch to the new
ioctl() directly. I mean, we'll have to keep support for both anyway,
but switching to the new ioctl() shouldn't be that hard (I can prepare
an MR transitioning the gallium driver to BATCH_SUBMIT if you want).


[PATCH v3 0/7] drm/panfrost: drm/panfrost: Add a new submit ioctl

2021-07-02 Thread Boris Brezillon
Hello,

This is an attempt at providing a new submit ioctl that's more
Vulkan-friendly than the existing one. This ioctl

1/ allows passing several out syncobjs so we can easily update
   several fence/semaphore in a single ioctl() call
2/ allows passing several jobs so we don't have to have one ioctl
   per job-chain recorded in the command buffer
3/ supports disabling implicit dependencies as well as 
   non-exclusive access to BOs, thus removing unnecessary
   synchronization

I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing
one to pass at most one sync_file object in input and/or creating a
sync_file FD embedding the render out fence), but it's not entirely
clear to me when that's useful. Indeed, we can already do the
sync_file <-> syncobj conversion using the
SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to.
Note that, unlike Turnip, PanVk is using syncobjs to implement
vkQueueWaitIdle(), so the syncobj -> sync_file conversion doesn't
have to happen for each submission, but maybe there's a good reason
to use sync_files for that too. Any feedback on that aspect would
be useful I guess.

Any feedback on this new ioctl is welcome, in particular, do you
think other things are missing/would be nice to have for Vulkan?

Regards,

Boris

P.S.: basic igt tests for these new ioctls are available there [1]

[1] https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit

Changes in v3:
* Fix a deadlock in the submitqueue logic
* Limit the number of submitqueue per context to 16

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach}_object_fences()
  drm/panfrost: Move the mappings collection out of
panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add the ability to create submit queues
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/Makefile |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   | 463 ++
 drivers/gpu/drm/panfrost/panfrost_job.c   |  89 ++--
 drivers/gpu/drm/panfrost/panfrost_job.h   |  10 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 136 +
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 +
 include/uapi/drm/panfrost_drm.h   | 103 
 8 files changed, 689 insertions(+), 144 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

-- 
2.31.1



[PATCH v3 1/7] drm/panfrost: Pass a job to panfrost_{acquire, attach}_object_fences()

2021-07-02 Thread Boris Brezillon
So we don't have to change the prototype if we extend the function.

v3:
* Fix subject

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..fdc1bd7ecf12 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -240,15 +240,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
spin_unlock(&pfdev->js->job_lock);
 }
 
-static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
- int bo_count,
- struct xarray *deps)
+static int panfrost_acquire_object_fences(struct panfrost_job *job)
 {
int i, ret;
 
-   for (i = 0; i < bo_count; i++) {
+   for (i = 0; i < job->bo_count; i++) {
/* panfrost always uses write mode in its current uapi */
-   ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
if (ret)
return ret;
}
@@ -256,14 +254,12 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
return 0;
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
- int bo_count,
- struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < bo_count; i++)
-   dma_resv_add_excl_fence(bos[i]->resv, fence);
+   for (i = 0; i < job->bo_count; i++)
+   dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -290,8 +286,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
-   ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-&job->deps);
+   ret = panfrost_acquire_object_fences(job);
if (ret) {
mutex_unlock(&pfdev->sched_lock);
goto unlock;
@@ -303,8 +298,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
mutex_unlock(&pfdev->sched_lock);
 
-   panfrost_attach_object_fences(job->bos, job->bo_count,
- job->render_done_fence);
+   panfrost_attach_object_fences(job);
 
 unlock:
drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
-- 
2.31.1



[PATCH v3 2/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()

2021-07-02 Thread Boris Brezillon
So we can re-use it from elsewhere.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9bbc9e78cc85 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -109,6 +109,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
return 0;
 }
 
+static int
+panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   unsigned int i;
+
+   job->mappings = kvmalloc_array(job->bo_count,
+  sizeof(*job->mappings),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->mappings)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++) {
+   struct panfrost_gem_mapping *mapping;
+   struct panfrost_gem_object *bo;
+
+   bo = to_panfrost_bo(job->bos[i]);
+   mapping = panfrost_gem_mapping_get(bo, priv);
+   if (!mapping)
+   return -EINVAL;
+
+   atomic_inc(&bo->gpu_usecount);
+   job->mappings[i] = mapping;
+   }
+
+   return 0;
+}
+
 /**
  * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
  * referenced by the job.
@@ -128,8 +156,6 @@ panfrost_lookup_bos(struct drm_device *dev,
  struct drm_panfrost_submit *args,
  struct panfrost_job *job)
 {
-   struct panfrost_file_priv *priv = file_priv->driver_priv;
-   struct panfrost_gem_object *bo;
unsigned int i;
int ret;
 
@@ -144,27 +170,7 @@ panfrost_lookup_bos(struct drm_device *dev,
if (ret)
return ret;
 
-   job->mappings = kvmalloc_array(job->bo_count,
-  sizeof(struct panfrost_gem_mapping *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!job->mappings)
-   return -ENOMEM;
-
-   for (i = 0; i < job->bo_count; i++) {
-   struct panfrost_gem_mapping *mapping;
-
-   bo = to_panfrost_bo(job->bos[i]);
-   mapping = panfrost_gem_mapping_get(bo, priv);
-   if (!mapping) {
-   ret = -EINVAL;
-   break;
-   }
-
-   atomic_inc(&bo->gpu_usecount);
-   job->mappings[i] = mapping;
-   }
-
-   return ret;
+   return panfrost_get_job_mappings(file_priv, job);
 }
 
 /**
-- 
2.31.1



[PATCH v3 3/7] drm/panfrost: Add BO access flags to relax dependencies between jobs

2021-07-02 Thread Boris Brezillon
Jobs reading from the same BO should not be serialized. Add access
flags so we can relax the implicit dependencies in that case. We force
exclusive access for now to keep the behavior unchanged, but a new
SUBMIT ioctl taking explicit access flags will be introduced.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 23 +++
 drivers/gpu/drm/panfrost/panfrost_job.h |  1 +
 include/uapi/drm/panfrost_drm.h |  2 ++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 9bbc9e78cc85..b6b5997c9366 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -164,6 +164,15 @@ panfrost_lookup_bos(struct drm_device *dev,
if (!job->bo_count)
return 0;
 
+   job->bo_flags = kvmalloc_array(job->bo_count,
+  sizeof(*job->bo_flags),
+  GFP_KERNEL | __GFP_ZERO);
+   if (!job->bo_flags)
+   return -ENOMEM;
+
+   for (i = 0; i < job->bo_count; i++)
+   job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+
ret = drm_gem_objects_lookup(file_priv,
 (void __user *)(uintptr_t)args->bo_handles,
 job->bo_count, &job->bos);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index fdc1bd7ecf12..152245b122be 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -245,8 +245,16 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
int i, ret;
 
for (i = 0; i < job->bo_count; i++) {
-   /* panfrost always uses write mode in its current uapi */
-   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
+   bool exclusive = job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE;
+
+   if (!exclusive) {
+   ret = dma_resv_reserve_shared(job->bos[i]->resv, 1);
+   if (ret)
+   return ret;
+   }
+
+   ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i],
+  exclusive);
if (ret)
return ret;
}
@@ -258,8 +266,14 @@ static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
int i;
 
-   for (i = 0; i < job->bo_count; i++)
-   dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
+   for (i = 0; i < job->bo_count; i++) {
+   struct dma_resv *robj = job->bos[i]->resv;
+
+   if (job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE)
+   dma_resv_add_excl_fence(robj, job->render_done_fence);
+   else
+   dma_resv_add_shared_fence(robj, job->render_done_fence);
+   }
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -340,6 +354,7 @@ static void panfrost_job_cleanup(struct kref *ref)
kvfree(job->bos);
}
 
+   kvfree(job->bo_flags);
kfree(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..1cbc3621b663 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,7 @@ struct panfrost_job {
 
struct panfrost_gem_mapping **mappings;
struct drm_gem_object **bos;
+   u32 *bo_flags;
u32 bo_count;
 
/* Fence to be signaled by drm-sched once its done with the job */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 061e700dd06c..45d6c600475c 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -224,6 +224,8 @@ struct drm_panfrost_madvise {
__u32 retained;   /* out, whether backing store still exists */
 };
 
+#define PANFROST_BO_REF_EXCLUSIVE  0x1
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.31.1



[PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-02 Thread Boris Brezillon
This should help limit the number of ioctls when submitting multiple
jobs. The new ioctl also supports syncobj timelines and BO access flags.

v3:
* Re-use panfrost_get_job_bos() and panfrost_get_job_in_syncs() in the
  old submit path

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 366 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c |   3 +
 include/uapi/drm/panfrost_drm.h |  84 ++
 3 files changed, 375 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 6529e5972b47..e2897de6e77d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -138,111 +138,95 @@ panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
return 0;
 }
 
-/**
- * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
- * referenced by the job.
- * @dev: DRM device
- * @file_priv: DRM file for this fd
- * @args: IOCTL args
- * @job: job being set up
- *
- * Resolve handles from userspace to BOs and attach them to job.
- *
- * Note that this function doesn't need to unreference the BOs on
- * failure, because that will happen at panfrost_job_cleanup() time.
- */
+#define PANFROST_BO_REF_ALLOWED_FLAGS \
+   (PANFROST_BO_REF_EXCLUSIVE | PANFROST_BO_REF_NO_IMPLICIT_DEP)
+
 static int
-panfrost_lookup_bos(struct drm_device *dev,
- struct drm_file *file_priv,
- struct drm_panfrost_submit *args,
- struct panfrost_job *job)
+panfrost_get_job_bos(struct drm_file *file_priv,
+u64 refs, u32 ref_stride, u32 count,
+struct panfrost_job *job)
 {
+   void __user *in = u64_to_user_ptr(refs);
unsigned int i;
-   int ret;
 
-   job->bo_count = args->bo_handle_count;
+   job->bo_count = count;
 
-   if (!job->bo_count)
+   if (!count)
return 0;
 
+   job->bos = kvmalloc_array(job->bo_count, sizeof(*job->bos),
+ GFP_KERNEL | __GFP_ZERO);
job->bo_flags = kvmalloc_array(job->bo_count,
   sizeof(*job->bo_flags),
   GFP_KERNEL | __GFP_ZERO);
-   if (!job->bo_flags)
+   if (!job->bos || !job->bo_flags)
return -ENOMEM;
 
-   for (i = 0; i < job->bo_count; i++)
-   job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+   for (i = 0; i < count; i++) {
+   struct drm_panfrost_bo_ref ref = { };
+   int ret;
 
-   ret = drm_gem_objects_lookup(file_priv,
-(void __user *)(uintptr_t)args->bo_handles,
-job->bo_count, &job->bos);
-   if (ret)
-   return ret;
+   ret = copy_struct_from_user(&ref, sizeof(ref),
+   in + (i * ref_stride),
+   ref_stride);
+   if (ret)
+   return ret;
 
-   return panfrost_get_job_mappings(file_priv, job);
+   /* Prior to the BATCH_SUBMIT ioctl all accessed BOs were
+* treated as exclusive.
+*/
+   if (ref_stride == sizeof(u32))
+   ref.flags = PANFROST_BO_REF_EXCLUSIVE;
+
+   if ((ref.flags & ~PANFROST_BO_REF_ALLOWED_FLAGS))
+   return -EINVAL;
+
+   job->bos[i] = drm_gem_object_lookup(file_priv, ref.handle);
+   if (!job->bos[i])
+   return -EINVAL;
+
+   job->bo_flags[i] = ref.flags;
+   }
+
+   return 0;
 }
 
-/**
- * panfrost_copy_in_sync() - Sets up job->deps with the sync objects
- * referenced by the job.
- * @dev: DRM device
- * @file_priv: DRM file for this fd
- * @args: IOCTL args
- * @job: job being set up
- *
- * Resolve syncobjs from userspace to fences and attach them to job.
- *
- * Note that this function doesn't need to unreference the fences on
- * failure, because that will happen at panfrost_job_cleanup() time.
- */
 static int
-panfrost_copy_in_sync(struct drm_device *dev,
- struct drm_file *file_priv,
- struct drm_panfrost_submit *args,
- struct panfrost_job *job)
+panfrost_get_job_in_syncs(struct drm_file *file_priv,
+ u64 refs, u32 ref_stride,
+ u32 count, struct panfrost_job *job)
 {
-   u32 *handles;
-   int ret = 0;
-   int i, in_fence_count;
+   const void __user *in = u64_to_user_ptr(refs);
+   unsigned int i;
+   int ret;
 
-   in_fence_count = args->in_sync_count;
-
-   if (!in_fence_count)
+   if (!count)
return 0;
 
-   handles = 

[PATCH v3 7/7] drm/panfrost: Bump minor version to reflect the feature additions

2021-07-02 Thread Boris Brezillon
We now have a new ioctl that allows submitting multiple jobs at once
(among other things) and we support timelined syncobjs. Bump the
minor version number to reflect those changes.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 242a16246d79..33cd34a1213c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -789,6 +789,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  * - 1.2 - adds AFBC_FEATURES query
+ * - 1.3 - adds the BATCH_SUBMIT, CREATE_SUBMITQUEUE, DESTROY_SUBMITQUEUE
+ * ioctls and advertises the SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
.driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -802,7 +804,7 @@ static const struct drm_driver panfrost_drm_driver = {
.desc   = "panfrost DRM",
.date   = "20180908",
.major  = 1,
-   .minor  = 2,
+   .minor  = 3,
 
.gem_create_object  = panfrost_gem_create_object,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
-- 
2.31.1



[PATCH v3 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature

2021-07-02 Thread Boris Brezillon
Now that we have a new SUBMIT ioctl dealing with timelined syncojbs we
can advertise the feature.

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index e2897de6e77d..242a16246d79 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -791,7 +791,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.2 - adds AFBC_FEATURES query
  */
 static const struct drm_driver panfrost_drm_driver = {
-   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+   .driver_features= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+ DRIVER_SYNCOBJ_TIMELINE,
.open   = panfrost_open,
.postclose  = panfrost_postclose,
.ioctls = panfrost_drm_driver_ioctls,
-- 
2.31.1



[PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
Needed to keep VkQueues isolated from each other.

v3:
* Limit the number of submitqueue per context to 16
* Fix a deadlock

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panfrost/Makefile |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   |  69 +++--
 drivers/gpu/drm/panfrost/panfrost_job.c   |  47 ++
 drivers/gpu/drm/panfrost/panfrost_job.h   |   9 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 136 ++
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 
 include/uapi/drm/panfrost_drm.h   |  17 +++
 8 files changed, 264 insertions(+), 46 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile
index b71935862417..e99192b66ec9 100644
--- a/drivers/gpu/drm/panfrost/Makefile
+++ b/drivers/gpu/drm/panfrost/Makefile
@@ -9,6 +9,7 @@ panfrost-y := \
panfrost_gpu.o \
panfrost_job.o \
panfrost_mmu.o \
-   panfrost_perfcnt.o
+   panfrost_perfcnt.o \
+   panfrost_submitqueue.o
 
 obj-$(CONFIG_DRM_PANFROST) += panfrost.o
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..51c0ba4e50f5 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -137,7 +137,7 @@ struct panfrost_mmu {
 struct panfrost_file_priv {
struct panfrost_device *pfdev;
 
-   struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
+   struct idr queues;
 
struct panfrost_mmu *mmu;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b6b5997c9366..6529e5972b47 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -19,6 +19,7 @@
 #include "panfrost_job.h"
 #include "panfrost_gpu.h"
 #include "panfrost_perfcnt.h"
+#include "panfrost_submitqueue.h"
 
 static bool unstable_ioctls;
 module_param_unsafe(unstable_ioctls, bool, 0600);
@@ -250,6 +251,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
struct panfrost_device *pfdev = dev->dev_private;
struct drm_panfrost_submit *args = data;
struct drm_syncobj *sync_out = NULL;
+   struct panfrost_submitqueue *queue;
struct panfrost_job *job;
int ret = 0;
 
@@ -259,10 +261,16 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
if (args->requirements && args->requirements != PANFROST_JD_REQ_FS)
return -EINVAL;
 
+   queue = panfrost_submitqueue_get(file->driver_priv, 0);
+   if (IS_ERR(queue))
+   return PTR_ERR(queue);
+
if (args->out_sync > 0) {
sync_out = drm_syncobj_find(file, args->out_sync);
-   if (!sync_out)
-   return -ENODEV;
+   if (!sync_out) {
+   ret = -ENODEV;
+   goto fail_put_queue;
+   }
}
 
job = kzalloc(sizeof(*job), GFP_KERNEL);
@@ -289,7 +297,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
if (ret)
goto fail_job;
 
-   ret = panfrost_job_push(job);
+   ret = panfrost_job_push(queue, job);
if (ret)
goto fail_job;
 
@@ -302,6 +310,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 fail_out_sync:
if (sync_out)
drm_syncobj_put(sync_out);
+fail_put_queue:
+   panfrost_submitqueue_put(queue);
 
return ret;
 }
@@ -451,6 +461,36 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
return ret;
 }
 
+static int
+panfrost_ioctl_create_submitqueue(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   struct drm_panfrost_create_submitqueue *args = data;
+   struct panfrost_submitqueue *queue;
+
+   queue = panfrost_submitqueue_create(priv, args->priority, args->flags);
+   if (IS_ERR(queue))
+   return PTR_ERR(queue);
+
+   args->id = queue->id;
+   return 0;
+}
+
+static int
+panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data,
+  struct drm_file *file_priv)
+{
+   struct panfrost_file_priv *priv = file_priv->driver_priv;
+   u32 id = *((u32 *)data);
+
+   /* Default queue can't be destroyed. */
+   if (!id)
+   return -ENOENT;
+
+   return panfrost_submitqueue_destroy(priv, id);
+}
+
 int panfrost_unstable_ioctl_check(void)
 {
if (!unstable_ioctls)
@@ -

Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 11:13:16 -0400
Alyssa Rosenzweig  wrote:

> ```
> > +/* Syncobj reference passed at job submission time to encode explicit
> > + * input/output fences.
> > + */
> > +struct drm_panfrost_syncobj_ref {
> > +   __u32 handle;
> > +   __u32 pad;
> > +   __u64 point;
> > +};  
> ```
> 
> What is handle? What is point?

Handle is a syncobj handle, point is the point in a syncobj timeline.
I'll document those fields.

> Why is there padding instead of putting point first?

We can move the point field first, but we need to keep the explicit
padding: the struct has to be 64-bit aligned because of the __u64 field
(which the compiler takes care of), but without explicit padding the
unused 32 bits are undefined, which might cause trouble if we extend
the struct at some point, since we sort of expect old userspace to keep
this unused 32-bit slot set to 0, while new userspace sets non-zero
values if it has to.

> 
> ```
> >  #define PANFROST_BO_REF_EXCLUSIVE  0x1
> > +#define PANFROST_BO_REF_NO_IMPLICIT_DEP0x2  
> ```
> 
> This seems logically backwards. NO_IMPLICIT_DEP makes sense if we're
> trying to keep backwards compatibility, but here you're crafting a new
> interface totally from scratch. If anything, isn't BO_REF_IMPLICIT_DEP
> the flag you'd want?

AFAICT, all other drivers make the no-implicit-dep an opt-in, and I
didn't want to do things differently in panfrost. But if that's really
an issue, I can make it an opt-out.

> 
> ```
> > +   /**
> > +* Stride of the jobs array (needed to ease extension of the
> > +* BATCH_SUBMIT ioctl). Should be set to
> > +* sizeof(struct drm_panfrost_job).
> > +*/
> > +   __u32 job_stride;  
> ...
> > +   /**
> > +* Stride of the BO and syncobj reference arrays (needed to ease
> > +* extension of the BATCH_SUBMIT ioctl). Should be set to
> > +* sizeof(struct drm_panfrost_bo_ref).
> > +*/
> > +   __u32 bo_ref_stride;
> > +   __u32 syncobj_ref_stride;  
> ```
> 
> Hmm. I'm not /opposed/ and I know kbase uses strides but it seems like
> somewhat unwarranted complexity, and there is a combinatoric explosion
> here (if jobs, bo refs, and syncobj refs use 3 different versions, as
> this encoding permits... as opposed to just specifying a UABI version or
> something like that)

Sounds like a good idea. I'll add a version field and map that
to a (job_stride, bo_ref_stride, syncobj_ref_stride) tuple.

> 
> ```
> > +   /**
> > +* If the submission fails, this encodes the index of the job
> > +* failed.
> > +*/
> > +   __u32 fail_idx;  
> ```
> 
> What if multiple jobs fail?

We stop at the first failure. Note that it's not an execution failure,
but a submission failure (i.e., userspace passed wrong params, like
invalid BO or syncobj handles).


Re: [PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 16:05:30 +0100
Steven Price  wrote:

> On 02/07/2021 15:32, Boris Brezillon wrote:
> > Needed to keep VkQueues isolated from each other.
> > 
> > v3:
> > * Limit the number of submitqueue per context to 16
> > * Fix a deadlock
> > 
> > Signed-off-by: Boris Brezillon   
> 
> 16 ought to be enough for anyone ;)
> 
> Reviewed-by: Steven Price 

Oops, forgot to change the submitqueue_get() prototype. Will address
that in v4.


Re: [PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 17:49:10 +0200
Boris Brezillon  wrote:

> On Fri, 2 Jul 2021 16:05:30 +0100
> Steven Price  wrote:
> 
> > On 02/07/2021 15:32, Boris Brezillon wrote:  
> > > Needed to keep VkQueues isolated from each other.
> > > 
> > > v3:
> > > * Limit the number of submitqueue per context to 16
> > > * Fix a deadlock
> > > 
> > > Signed-off-by: Boris Brezillon 
> > 
> > 16 ought to be enough for anyone ;)
> > 
> > Reviewed-by: Steven Price   
> 
> Oops, forgot to change the submitqueue_get() prototype. Will address
> that in v4.

I meant submitqueue_create().


Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 12:49:55 -0400
Alyssa Rosenzweig  wrote:

> > > Why is there padding instead of putting point first?  
> > 
> > We can move the point field first, but we need to keep the explicit
> > padding: the struct has to be 64bit aligned because of the __u64 field
> > (which the compiler takes care of) but if we don't have an explicit
> > padding, the unused 32bits are undefined, which might cause trouble if
> > we extend the struct at some point, since we sort of expect that old
> > userspace keep this unused 32bit slot to 0, while new users set
> > non-zero values if they have to.  
> 
> Makes sense. Reordering still probably makes sense.

Actually, I can't re-order if I want the new in_syncs parser to work
with the old ioctl(), which you and Steven asked me to do :-).


Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches

2021-07-02 Thread Boris Brezillon
On Fri, 2 Jul 2021 12:49:55 -0400
Alyssa Rosenzweig  wrote:

> > > ```  
> > > >  #define PANFROST_BO_REF_EXCLUSIVE  0x1
> > > > +#define PANFROST_BO_REF_NO_IMPLICIT_DEP0x2
> > > ```
> > > 
> > > This seems logically backwards. NO_IMPLICIT_DEP makes sense if we're
> > > trying to keep backwards compatibility, but here you're crafting a new
> > > interface totally from scratch. If anything, isn't BO_REF_IMPLICIT_DEP
> > > the flag you'd want?  
> > 
> > AFAICT, all other drivers make the no-implicit-dep an opt-in, and I
> > didn't want to do things differently in panfrost. But if that's really
> > an issue, I can make it an opt-out.  
> 
> I don't have strong feelings either way. I was just under the
> impressions other drivers did this for b/w compat reasons which don't
> apply here.

Okay, I think I'll keep it like that unless there's a strong reason to
make no-implicit-dep the default. It's safer to oversync than to skip
the synchronization, so it does feel like something the user should
explicitly enable.

> 
> > > Hmm. I'm not /opposed/ and I know kbase uses strides but it seems like
> > > somewhat unwarranted complexity, and there is a combinatoric explosion
> > > here (if jobs, bo refs, and syncobj refs use 3 different versions, as
> > > this encoding permits... as opposed to just specifying a UABI version or
> > > something like that)  
> > 
> > Sounds like a good idea. I'll add a version field and map that
> > to a tuple of descriptor sizes.  
> 
> Cc Steven, does this make sense?

I have this approach working, and I must admit I prefer it to the
per-object stride field passed to the submit struct.



[PATCH v4 0/7] drm/panfrost: Add a new submit ioctl

2021-07-05 Thread Boris Brezillon
Hello,

This is an attempt at providing a new submit ioctl that's more
Vulkan-friendly than the existing one. This ioctl

1/ allows passing several out syncobjs so we can easily update
   several fence/semaphore in a single ioctl() call
2/ allows passing several jobs so we don't have to have one ioctl
   per job-chain recorded in the command buffer
3/ supports disabling implicit dependencies as well as 
   non-exclusive access to BOs, thus removing unnecessary
   synchronization

I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing
one to pass at most one sync_file object in input and/or creating a
sync_file FD embedding the render out fence), but it's not entirely
clear to me when that's useful. Indeed, we can already do the
sync_file <-> syncobj conversion using the
SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to.
Note that, unlike Turnip, PanVk is using syncobjs to implement
vkQueueWaitIdle(), so the syncobj -> sync_file conversion doesn't
have to happen for each submission, but maybe there's a good reason
to use sync_files for that too. Any feedback on that aspect would
be useful I guess.

Any feedback on this new ioctl is welcome, in particular, do you
think other things are missing/would be nice to have for Vulkan?

Regards,

Boris

P.S.: basic igt tests for these new ioctls are available there [1]

[1] https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit

Changes in v4:
* Replace the desc strides by a version field
* Change the submitqueue_create() prototype to return the queue id
  directly
* Implement the old submit ioctl() as a simple wrapper around
  panfrost_submit_job()

Changes in v3:
* Fix a deadlock in the submitqueue logic
* Limit the number of submitqueue per context to 16

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach}_object_fences()
  drm/panfrost: Move the mappings collection out of
panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add the ability to create submit queues
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/Makefile |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h|   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   | 611 +-
 drivers/gpu/drm/panfrost/panfrost_job.c   |  89 ++-
 drivers/gpu/drm/panfrost/panfrost_job.h   |  10 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 132 
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  26 +
 include/uapi/drm/panfrost_drm.h   | 112 
 8 files changed, 766 insertions(+), 219 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

-- 
2.31.1


