Re: [PATCH v4 5/7] drm/panfrost: Add a new ioctl to submit batches
On Thu, 8 Jul 2021 14:10:45 +0200 Christian König wrote:

> >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >> @@ -254,6 +254,9 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
> >>  			return ret;
> >>  	}
> >>
> >> +		if (job->bo_flags[i] & PANFROST_BO_REF_NO_IMPLICIT_DEP)
> >> +			continue;
>
> This breaks dma_resv rules. I'll send out a patch set fixing this
> pattern in other drivers, I'll ping you on that for what you need to
> change. Should go out today or so.

I guess you're talking about [1]. TBH, I don't quite see the point of
exposing a 'no-implicit' flag if we end up forcing this implicit dep
anyway, but I'm probably missing something.

> I'm really wondering if the behavior that the exclusive fence replaces
> all the shared fences was such a good idea.

Is that what's done in [1], or are you talking about a different
patchset/approach?

> It just allows drivers to mess up things in a way which can be easily
> used to compromise the system.

I must admit I'm a bit lost, so I'm tempted to drop that flag for now :-).

[1] https://patchwork.freedesktop.org/patch/443711/?series=92334&rev=3
Re: [PATCH v4 00/18] drm/sched dependency tracking and dma-resv fixes
On Mon, 12 Jul 2021 19:53:34 +0200 Daniel Vetter wrote:

> Hi all,
>
> Quick new version since the previous one was a bit too broken:
> - dropped the bug-on patch to avoid breaking amdgpu's gpu reset failure
>   games
> - another attempt at splitting job_init/arm, hopefully we're getting
>   there.
>
> Note that Christian has brought up a bikeshed on the new functions to
> add dependencies to drm_sched_jobs. I'm happy to repaint, if there's
> some kind of consensus on what it should be.
>
> Testing and review very much welcome, as usual.
>
> Cheers, Daniel
>
> Daniel Vetter (18):
>   drm/sched: Split drm_sched_job_init
>   drm/sched: Barriers are needed for entity->last_scheduled
>   drm/sched: Add dependency tracking
>   drm/sched: drop entity parameter from drm_sched_push_job
>   drm/sched: improve docs around drm_sched_entity

Patches 1, 3, 4 and 5 are

Reviewed-by: Boris Brezillon

>   drm/panfrost: use scheduler dependency tracking
>   drm/lima: use scheduler dependency tracking
>   drm/v3d: Move drm_sched_job_init to v3d_job_init
>   drm/v3d: Use scheduler dependency handling
>   drm/etnaviv: Use scheduler dependency handling
>   drm/gem: Delete gem array fencing helpers
>   drm/sched: Don't store self-dependencies
>   drm/sched: Check locking in drm_sched_job_await_implicit
>   drm/msm: Don't break exclusive fence ordering
>   drm/etnaviv: Don't break exclusive fence ordering
>   drm/i915: delete exclude argument from i915_sw_fence_await_reservation
>   drm/i915: Don't break exclusive fence ordering
>   dma-resv: Give the docs a do-over
>
>  Documentation/gpu/drm-mm.rst                 |   3 +
>  drivers/dma-buf/dma-resv.c                   |  24 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c       |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |   4 +-
>  drivers/gpu/drm/drm_gem.c                    |  96 -
>  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |   5 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  64 +++---
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c      |  65 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |   3 +-
>  drivers/gpu/drm/i915/display/intel_display.c |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_clflush.c  |   2 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c   |   8 +-
>  drivers/gpu/drm/i915/i915_sw_fence.c         |   6 +-
>  drivers/gpu/drm/i915/i915_sw_fence.h         |   1 -
>  drivers/gpu/drm/lima/lima_gem.c              |   7 +-
>  drivers/gpu/drm/lima/lima_sched.c            |  28 +--
>  drivers/gpu/drm/lima/lima_sched.h            |   6 +-
>  drivers/gpu/drm/msm/msm_gem_submit.c         |   3 +-
>  drivers/gpu/drm/panfrost/panfrost_drv.c      |  16 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c      |  39 +---
>  drivers/gpu/drm/panfrost/panfrost_job.h      |   5 +-
>  drivers/gpu/drm/scheduler/sched_entity.c     | 140 +++--
>  drivers/gpu/drm/scheduler/sched_fence.c      |  19 +-
>  drivers/gpu/drm/scheduler/sched_main.c       | 181 +++--
>  drivers/gpu/drm/v3d/v3d_drv.h                |   6 +-
>  drivers/gpu/drm/v3d/v3d_gem.c                | 115 +--
>  drivers/gpu/drm/v3d/v3d_sched.c              |  44 +
>  include/drm/drm_gem.h                        |   5 -
>  include/drm/gpu_scheduler.h                  | 186 ++
>  include/linux/dma-buf.h                      |   7 +
>  include/linux/dma-resv.h                     | 104 +-
>  31 files changed, 672 insertions(+), 528 deletions(-)
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
On Thu, 11 Mar 2021 12:16:33 + Steven Price wrote:

> Also the current code completely ignores PANFROST_BO_REF_READ. So either
> that should be defined as 0, or even better we support 3 modes:
>
> * Exclusive ('write' access)
> * Shared ('read' access)
> * No fence - ensures the BO is mapped, but doesn't add any implicit
>   fences.

I looked at other drivers and they seem to have this NO_FENCE/NO_IMPLICIT
flag at the job/submit level, conditioned on the presence of an
in_sync_fd file. I can see per-BO NO_FENCE being useful for sub-buffer
accesses, assuming we don't want to deal with other deps explicitly, but
we could also force the NO_IMPLICIT behavior for the whole job and let
the user deal with all deps explicitly. TBH, it's a bit unclear to me
how that feature would be used by the vulkan driver, so I'd rather leave
that NO_FENCE use case for later.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
On Fri, 12 Mar 2021 09:37:49 -0600 Jason Ekstrand wrote:

> On Fri, Mar 12, 2021 at 1:31 AM Boris Brezillon wrote:
> >
> > On Thu, 11 Mar 2021 12:11:48 -0600
> > Jason Ekstrand wrote:
> >
> > > > > > > > 2/ Queued jobs might be executed out-of-order (unless they
> > > > > > > > have explicit/implicit deps between them), and Vulkan asks
> > > > > > > > that the out fence be signaled when all jobs are done.
> > > > > > > > Timeline syncobjs are a good match for that use case. All
> > > > > > > > we need to do is pass the same fence syncobj to all jobs
> > > > > > > > being attached to a single QueueSubmit request, but a
> > > > > > > > different point on the timeline. The syncobj timeline wait
> > > > > > > > does the rest and guarantees that we've reached a given
> > > > > > > > timeline point (IOW, all jobs before that point are done)
> > > > > > > > before declaring the fence as signaled.
> > > > > > > > One alternative would be to have dummy 'synchronization'
> > > > > > > > jobs that don't actually execute anything on the GPU but
> > > > > > > > declare a dependency on all other jobs that are part of
> > > > > > > > the QueueSubmit request, and signal the out fence (the
> > > > > > > > scheduler would do most of the work for us, all we have to
> > > > > > > > do is support NULL job heads and signal the fence directly
> > > > > > > > when that happens instead of queueing the job).
> > > > > > >
> > > > > > > I have to admit to being rather hazy on the details of
> > > > > > > timeline syncobjs, but I thought there was a requirement
> > > > > > > that the timeline moves monotonically. I.e. if you have
> > > > > > > multiple jobs signalling the same syncobj just with
> > > > > > > different points, then AFAIU the API requires that the
> > > > > > > points are triggered in order.
> > > > > >
> > > > > > I only started looking at the SYNCOBJ_TIMELINE API a few days
> > > > > > ago, so I might be wrong, but my understanding is that queuing
> > > > > > fences (addition of new points in the timeline) should happen
> > > > > > in order, but signaling can happen in any order. When checking
> > > > > > for a signaled fence the fence-chain logic starts from the
> > > > > > last point (or from an explicit point if you use the timeline
> > > > > > wait flavor) and goes backward, stopping at the first
> > > > > > un-signaled node. If I'm correct, that means that fences that
> > > > > > are part of a chain can be signaled in any order.
> > > > >
> > > > > You don't even need a timeline for this. Just have a single
> > > > > syncobj per-queue and make each submit wait on it and then
> > > > > signal it. Alternatively, you can just always hang on to the
> > > > > out-fence from the previous submit and make the next one wait
> > > > > on that.
> > > >
> > > > That's what I have right now, but it forces the serialization of
> > > > all jobs that are pushed during a submit (and there can be more
> > > > than one per command buffer on panfrost :-/). Maybe I'm wrong,
> > > > but I thought it'd be better to not force this serialization if
> > > > we can avoid it.
> > >
> > > I'm not familiar with panfrost's needs and I don't work on a tiler
> > > and I know there are different issues there. But...
> > >
> > > The Vulkan spec requires that all the submits that happen on a
> > > given vkQueue happen in-order. Search the spec for
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
On Fri, 12 Mar 2021 19:25:13 +0100 Boris Brezillon wrote:

> > So where does this leave us? Well, it depends on your submit model
> > and exactly how you handle pipeline barriers that sync between
> > engines. If you're taking option 3 above and doing two command
> > buffers for each VkCommandBuffer, then you probably want two
> > serialized timelines, one for each engine, and some mechanism to tell
> > the kernel driver "these two command buffers have to run in parallel"
> > so that your ping-pong works. If you're doing 1 or 2 above, I think
> > you probably still want two simple syncobjs, one for each engine. You
> > don't really have any need to go all that far back in history. All
> > you really need to describe is "command buffer X depends on previous
> > compute work" or "command buffer X depends on previous binning work".
>
> Okay, so this will effectively force in-order execution. Let's take
> your previous example and add 2 more jobs at the end that have no deps
> on previous commands:
>
> vkBeginRenderPass() /* Writes to ImageA */
> vkCmdDraw()
> vkCmdDraw()
> ...
> vkEndRenderPass()
> vkPipelineBarrier(imageA /* fragment -> compute */)
> vkCmdDispatch() /* reads imageA, writes BufferB */
> vkBeginRenderPass() /* Writes to ImageC */
> vkCmdBindVertexBuffers(bufferB)
> vkCmdDraw();
> ...
> vkEndRenderPass()
> vkBeginRenderPass() /* Writes to ImageD */
> vkCmdDraw()
> ...
> vkEndRenderPass()
>
> A: Vertex for the first draw on the compute engine
> B: Vertex for the second draw on the compute engine
> C: Fragment for the first draw on the binning engine; depends on A
> D: Fragment for the second draw on the binning engine; depends on B
> E: Compute on the compute engine; depends on C and D
> F: Vertex for the third draw on the compute engine; depends on E
> G: Fragment for the third draw on the binning engine; depends on F
> H: Vertex for the fourth draw on the compute engine
> I: Fragment for the fourth draw on the binning engine
>
> When we reach E, we might be waiting for D to finish before scheduling
> the job, and because of the implicit serialization we have on the
> compute queue (F implicitly depends on E, and H on F) we can't
> schedule H either, which could, in theory, be started. I guess that's
> where the term submission order is a bit unclear to me. The action of
> starting a job sounds like execution order to me (the order you start
> jobs determines the execution order since we only have one HW queue
> per job type). All implicit deps have been calculated when we queued
> the job to the SW queue, and I thought that would be enough to meet
> the submission order requirements, but I might be wrong.
>
> The PoC I have was trying to get rid of this explicit serialization on
> the compute and fragment queues by having one syncobj timeline
> (queue()) and synchronization points (Sx).
>
> S0: in-fences=, out-fences= #waitSemaphore sync point
> A: in-fences=, out-fences=
> B: in-fences=, out-fences=
> C: in-fences=, out-fence= #implicit dep on A through the tiler context
> D: in-fences=, out-fence= #implicit dep on B through the tiler context
> E: in-fences=, out-fence= #implicit dep on D through imageA
> F: in-fences=, out-fence= #implicit dep on E through buffer B
> G: in-fences=, out-fence= #implicit dep on F through the tiler context
> H: in-fences=, out-fence=
> I: in-fences=, out-fence= #implicit dep on H through the tiler buffer
> S1: in-fences=, out-fences= #signalSemaphore,fence sync point
> # QueueWaitIdle is implemented with a wait(queue(0)), AKA wait on the
> # last point
>
> With this solution H can be started before E if the compute slot
> is empty and E's implicit deps are not done. It's probably overkill,
> but I thought maximizing GPU utilization was important.

Nevermind, I forgot the drm scheduler was dequeuing jobs in order, so
2 syncobjs (one per queue type) is indeed the right approach.
[PATCH 0/3] drm/panfrost: MMU fixes
Hello,

Here are 2 fixes and one improvement for the page fault handling. Those
bugs were found while working on indirect draw support, which requires
the allocation of a big heap buffer for varyings, and the vertex/tiler
shaders seem to have access patterns that trigger those issues. I
remember discussing the first issue with Steve or Robin a while back,
but we never hit it before (now we do :)).

The last patch is a perf improvement: no need to re-enable hardware
interrupts if we know the threaded irq handler will be woken up right
away.

Regards,

Boris

Boris Brezillon (3):
  drm/panfrost: Clear MMU irqs before handling the fault
  drm/panfrost: Don't try to map pages that are already mapped
  drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
    all IRQs

 drivers/gpu/drm/panfrost/panfrost_mmu.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

--
2.26.2
[PATCH 2/3] drm/panfrost: Don't try to map pages that are already mapped
We allocate 2MB chunks at a time, so it might appear that a page fault
has already been handled by a previous page fault when we reach
panfrost_mmu_map_fault_addr(). Bail out in that case to avoid mapping
the same area twice.

Cc:
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 904d63450862..21e552d1ac71 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -488,8 +488,14 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 		}
 		bo->base.pages = pages;
 		bo->base.pages_use_count = 1;
-	} else
+	} else {
 		pages = bo->base.pages;
+		if (pages[page_offset]) {
+			/* Pages are already mapped, bail out. */
+			mutex_unlock(&bo->base.pages_lock);
+			goto out;
+		}
+	}
 
 	mapping = bo->base.base.filp->f_mapping;
 	mapping_set_unevictable(mapping);
@@ -522,6 +528,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 
 	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
+out:
 	panfrost_gem_mapping_put(bomapping);
 
 	return 0;
--
2.26.2
[PATCH 1/3] drm/panfrost: Clear MMU irqs before handling the fault
When a fault is handled it will unblock the GPU which will continue
executing its shader and might fault almost immediately on a different
page. If we clear interrupts after handling the fault we might miss new
faults, so clear them before.

Cc:
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 7c1b3481b785..904d63450862 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -593,6 +593,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		access_type = (fault_status >> 8) & 0x3;
 		source_id = (fault_status >> 16);
 
+		mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
 		/* Page fault only */
 		ret = -1;
 		if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
@@ -616,8 +618,6 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 			access_type, access_type_name(pfdev, fault_status),
 			source_id);
 
-		mmu_write(pfdev, MMU_INT_CLEAR, mask);
-
 		status &= ~mask;
 	}
--
2.26.2
[PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..65bc20628c4e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
 	int i, ret;
 
+again:
+
 	for (i = 0; status; i++) {
 		u32 mask = BIT(i) | BIT(i + 16);
 		u64 addr;
@@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		status &= ~mask;
 	}
 
+	/* If we received new MMU interrupts, process them before returning. */
+	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+	if (status)
+		goto again;
+
 	mmu_write(pfdev, MMU_INT_MASK, ~0);
 	return IRQ_HANDLED;
 };
--
2.26.2
Re: [PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs
On Mon, 1 Feb 2021 12:13:49 + Steven Price wrote:

> On 01/02/2021 08:21, Boris Brezillon wrote:
> > Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> > in the threaded irq handler as long as we can.
> >
> > Signed-off-by: Boris Brezillon
>
> Looks fine to me, but I'm interested to know if you actually saw a
> performance improvement. Back-to-back MMU faults should (hopefully) be
> fairly uncommon.

I actually didn't check the perf improvement or the actual number of
back-to-back MMU faults, but
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
seemed to generate a few of those, so I thought it'd be good to
optimize that case given how trivial it is.

> Regardless:
>
> Reviewed-by: Steven Price
>
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 21e552d1ac71..65bc20628c4e 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >  	int i, ret;
> >  
> > +again:
> > +
> >  	for (i = 0; status; i++) {
> >  		u32 mask = BIT(i) | BIT(i + 16);
> >  		u64 addr;
> > @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  		status &= ~mask;
> >  	}
> >  
> > +	/* If we received new MMU interrupts, process them before returning. */
> > +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> > +	if (status)
> > +		goto again;
> > +
> >  	mmu_write(pfdev, MMU_INT_MASK, ~0);
> >  	return IRQ_HANDLED;
> >  };
Re: [PATCH 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs
On Mon, 1 Feb 2021 13:24:00 + Steven Price wrote:

> On 01/02/2021 12:59, Boris Brezillon wrote:
> > On Mon, 1 Feb 2021 12:13:49 +
> > Steven Price wrote:
> >
> >> On 01/02/2021 08:21, Boris Brezillon wrote:
> >>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> >>> in the threaded irq handler as long as we can.
> >>>
> >>> Signed-off-by: Boris Brezillon
> >>
> >> Looks fine to me, but I'm interested to know if you actually saw a
> >> performance improvement. Back-to-back MMU faults should (hopefully) be
> >> fairly uncommon.
> >
> > I actually didn't check the perf improvement or the actual number of
> > back-to-back MMU faults, but
> > dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> > seemed to generate a few of those, so I thought it'd be good to
> > optimize that case given how trivial it is.
>
> Fair enough! I was just a little concerned that Panfrost was somehow
> provoking enough interrupts that this was a measurable performance
> improvement.
>
> I assume you'll push these to drm-misc-next (/fixes) as appropriate.

I think I'll just queue everything to misc-next and let the stable
maintainers backport the fixes (no one complained about this issue so
far).

> Thanks,
> Steve
>
> >> Regardless:
> >>
> >> Reviewed-by: Steven Price
> >>
> >>> ---
> >>>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
> >>>  1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> index 21e552d1ac71..65bc20628c4e 100644
> >>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>  	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>>  	int i, ret;
> >>>  
> >>> +again:
> >>> +
> >>>  	for (i = 0; status; i++) {
> >>>  		u32 mask = BIT(i) | BIT(i + 16);
> >>>  		u64 addr;
> >>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>  		status &= ~mask;
> >>>  	}
> >>>  
> >>> +	/* If we received new MMU interrupts, process them before returning. */
> >>> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>> +	if (status)
> >>> +		goto again;
> >>> +
> >>>  	mmu_write(pfdev, MMU_INT_MASK, ~0);
> >>>  	return IRQ_HANDLED;
> >>>  };
[PATCH v2 0/3] drm/panfrost: MMU fixes
Hello,

Here are 2 fixes and one improvement for the page fault handling. Those
bugs were found while working on indirect draw support, which requires
the allocation of a big heap buffer for varyings, and the vertex/tiler
shaders seem to have access patterns that trigger those issues. I
remember discussing the first issue with Steve or Robin a while back,
but we never hit it before (now we do :)).

The last patch is a perf improvement: no need to re-enable hardware
interrupts if we know the threaded irq handler will be woken up right
away.

Regards,

Boris

Changes in v2:
* Rework the MMU irq handling loop to avoid a goto

Boris Brezillon (3):
  drm/panfrost: Clear MMU irqs before handling the fault
  drm/panfrost: Don't try to map pages that are already mapped
  drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
    all IRQs

 drivers/gpu/drm/panfrost/panfrost_mmu.c | 39 +++--
 1 file changed, 24 insertions(+), 15 deletions(-)

--
2.26.2
[PATCH v2 1/3] drm/panfrost: Clear MMU irqs before handling the fault
When a fault is handled it will unblock the GPU which will continue
executing its shader and might fault almost immediately on a different
page. If we clear interrupts after handling the fault we might miss new
faults, so clear them before.

Cc:
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 7c1b3481b785..904d63450862 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -593,6 +593,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		access_type = (fault_status >> 8) & 0x3;
 		source_id = (fault_status >> 16);
 
+		mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
 		/* Page fault only */
 		ret = -1;
 		if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
@@ -616,8 +618,6 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 			access_type, access_type_name(pfdev, fault_status),
 			source_id);
 
-		mmu_write(pfdev, MMU_INT_CLEAR, mask);
-
 		status &= ~mask;
 	}
--
2.26.2
[PATCH v2 3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

v2:
* Rework the loop to avoid a goto

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 26 +
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..0581186ebfb3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -578,22 +578,20 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 {
 	struct panfrost_device *pfdev = data;
 	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
-	int i, ret;
+	int ret;
 
-	for (i = 0; status; i++) {
-		u32 mask = BIT(i) | BIT(i + 16);
+	while (status) {
+		u32 as = ffs(status | (status >> 16)) - 1;
+		u32 mask = BIT(as) | BIT(as + 16);
 		u64 addr;
 		u32 fault_status;
 		u32 exception_type;
 		u32 access_type;
 		u32 source_id;
 
-		if (!(status & mask))
-			continue;
-
-		fault_status = mmu_read(pfdev, AS_FAULTSTATUS(i));
-		addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(i));
-		addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(i)) << 32;
+		fault_status = mmu_read(pfdev, AS_FAULTSTATUS(as));
+		addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(as));
+		addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(as)) << 32;
 
 		/* decode the fault status */
 		exception_type = fault_status & 0xFF;
@@ -604,8 +602,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 
 		/* Page fault only */
 		ret = -1;
-		if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
-			ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
+		if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0)
+			ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
 
 		if (ret)
 			/* terminal fault, print info about the fault */
@@ -617,7 +615,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 				"exception type 0x%X: %s\n"
 				"access type 0x%X: %s\n"
 				"source id 0x%X\n",
-				i, addr,
+				as, addr,
 				"TODO",
 				fault_status,
 				(fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"),
@@ -626,6 +624,10 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 				source_id);
 
 		status &= ~mask;
+
+		/* If we received new MMU interrupts, process them before returning. */
+		if (!status)
+			status = mmu_read(pfdev, MMU_INT_RAWSTAT);
 	}
 
 	mmu_write(pfdev, MMU_INT_MASK, ~0);
--
2.26.2
[PATCH v2 2/3] drm/panfrost: Don't try to map pages that are already mapped
We allocate 2MB chunks at a time, so it might appear that a page fault
has already been handled by a previous page fault when we reach
panfrost_mmu_map_fault_addr(). Bail out in that case to avoid mapping
the same area twice.

Cc:
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 904d63450862..21e552d1ac71 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -488,8 +488,14 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 		}
 		bo->base.pages = pages;
 		bo->base.pages_use_count = 1;
-	} else
+	} else {
 		pages = bo->base.pages;
+		if (pages[page_offset]) {
+			/* Pages are already mapped, bail out. */
+			mutex_unlock(&bo->base.pages_lock);
+			goto out;
+		}
+	}
 
 	mapping = bo->base.base.filp->f_mapping;
 	mapping_set_unevictable(mapping);
@@ -522,6 +528,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 
 	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
+out:
 	panfrost_gem_mapping_put(bomapping);
 
 	return 0;
--
2.26.2
Re: [PATCH v2 0/3] drm/panfrost: MMU fixes
On Fri, 5 Feb 2021 12:17:54 +0100 Boris Brezillon wrote:

> Hello,
>
> Here are 2 fixes and one improvement for the page fault handling. Those
> bugs were found while working on indirect draw supports which requires
> the allocation of a big heap buffer for varyings, and the vertex/tiler
> shaders seem to have access pattern that trigger those issues. I
> remember discussing the first issue with Steve or Robin a while back,
> but we never hit it before (now we do :)).
>
> The last patch is a perf improvement: no need to re-enable hardware
> interrupts if we know the threaded irq handler will be woken up right
> away.
>
> Regards,
>
> Boris
>
> Changes in v2:
> * Rework the MMU irq handling loop to avoid a goto

Queued to drm-misc-next.

> Boris Brezillon (3):
>   drm/panfrost: Clear MMU irqs before handling the fault
>   drm/panfrost: Don't try to map pages that are already mapped
>   drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled
>     all IRQs
>
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 39 +++--
>  1 file changed, 24 insertions(+), 15 deletions(-)
Re: [PATCH] drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
> >>>>>>> whether the best (short term) solution is to actually disable
> >>>>>>> the shrinker. I'm somewhat surprised that nobody has got fed up
> >>>>>>> with the "Purging xxx bytes" message spam - which makes me think
> >>>>>>> that most people are not hitting memory pressure to trigger the
> >>>>>>> shrinker.
> >>>>>>
> >>>>>> If the shrinker is dodgy, then it's probably good to have the
> >>>>>> messages to know if it ran.
> >>>>>>
> >>>>>>> The shrinker on kbase caused a lot of grief - and the only way I
> >>>>>>> managed to get that under control was by writing a static
> >>>>>>> analysis tool for the locking, and by upsetting people by
> >>>>>>> enforcing the (rather dumb) rules of the tool on the code base.
> >>>>>>> I've been meaning to look at whether sparse can do a similar
> >>>>>>> check of locks.
> >>>>>>
> >>>>>> Lockdep doesn't cover it?
> >>>>>
> >>>>> Short answer: no ;)
> >>>
> >>> It's pretty good actually, if you correctly annotate things up.
> >>
> >> I agree - it's pretty good, the problem is you need reasonable test
> >> coverage, and getting good test coverage of shrinkers is hard.
> >>
> >>>>> The problem with lockdep is that you have to trigger the locking
> >>>>> scenario to get a warning out of it. For example you obviously
> >>>>> won't get any warnings about the shrinker without triggering the
> >>>>> shrinker (which means memory pressure since we don't have the
> >>>>> debugfs file to trigger it).
> >>>>
> >>>> Actually, you don't need debugfs. Writing to /proc/sys/vm/drop_caches
> >>>> will do it. Though maybe there's other code path scenarios that
> >>>> wouldn't cover.
> >>>
> >>> Huh, didn't know, but it's a bit of a shotgun, plus it doesn't use
> >>> fs_reclaim shrinker annotations, which means you don't have lockdep
> >>> checks. I think at least, would need some deadlock and testing.
> >>
> >> The big problem with this sort of method for triggering the shrinkers
> >> is that they are called without (many) locks held. Whereas it's
> >> entirely possible for a shrinker to be called at (almost) any
> >> allocation in the kernel.
> >>
> >> Admittedly the Panfrost shrinkers are fairly safe - because most
> >> things are xxx_trylock(). kbase avoids trylock which makes reclaim
> >> more reliable, but means deadlocks are much easier.
> >
> > This is why you need the fs_reclaim annotation. With that lockdep can
> > connect the dots. See also might_alloc() annotations I've added in
> > 5.11 or so.
> >
> > Validating shrinkers for deadlocks is actually not that hard, you just
> > need the debugfs interface to run your shrinker at will under the
> > fs_reclaim_acquire/release annotations. You do _not_ need to hit the
> > full combinatorial test matrix of making sure that your shrinker is
> > called in any possible place where memory is allocated.
>
> Cool - I hadn't looked at that code before, but it does look like it
> should pick up the problem cases. I wish that had existed back when I
> was dealing with kbase! :)
>
> >>>>> I have to admit I'm not 100% sure I've seen any lockdep warnings
> >>>>> based on buffer objects recently. I can trigger them based on
> >>>>> jobs:
> >>
> >> [snip]
> >>>>>
> >>>>> Certainly here the mutex causing the problem is the shrinker_lock!
> >>>>>
> >>>>> The above is triggered by chucking a whole ton of jobs which
> >>>>> fault at the GPU.
> >>>>>
> >>>>> Sadly I haven't found time to work out how to untangle the locks.
> >>>>
> >>>> They are tricky because pretty much any memory allocation can
> >>>> trigger things as I recall.
> >>>
> >>> The above should only be possible with my dma_fence annotations,
> >>> and yes they point to bugs in the drm/scheduler. They shouldn't
> >>> matter for panfrost, and those patches aren't in upstream yet.
> >>
> >> Yes that's on a (random version of) drm-misc - just what I happened
> >> to have built recently. Good news if that's not actually Panfrost's
> >> bug. I haven't had the time to track down what's going on yet.
> >
> > Are you sure this is really drm-misc? The patch which adds these
> > annotations should never have been there.
>
> Well drm-misc-next. It's based on commit e4abd7ad2b77 with some pending
> Panfrost fixes on top.
>
> > Also help for fixing common code is appreciated :-)
>
> Indeed - I have put it on my todo list, but I'm not sure when I'll be
> able to spend time on it.

Oh there's indeed panfrost annotations in panfrost_reset(), but I
guess that stuff just wasn't ever really tested when it landed?

commit 5bc5cc2819c2c0adb644919e3e790b504ea47e0a
Author: Boris Brezillon
Date:   Thu Nov 5 16:17:04 2020 +0100

    drm/panfrost: Move the GPU reset bits outside the timeout handler

Adding Boris, since my idea wasn't to just add more lockdep splats to
panfrost, but help make sure the reset code works correctly.

Okay, I'm a bit tired of this "oh, this is not really what I asked"
reply I get when you realize the code you acked/reviewed is not what
you expected. The dma-fence annotation stuff is far from trivial, and I
remember asking for help when I submitted the version adding those
annotations in the reset path. Note that I didn't intend to spend days
on that stuff since all I was trying to do with this patch was to fix a
deadlock we had in the reset path, a deadlock that was not related to
dma-fence signaling at all. So yes, it was not thoroughly tested
(especially the fence signaling stuff), but enough to validate that the
workloads that triggered the deadlock I was trying to fix no longer
caused this deadlock.
Don't get me wrong, I'm all for improving the DRM subsystem, and this fencing annotation is probably a great thing, but you can't expect people to take on those complex tasks without a bit of supervision. Regards, Boris ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
Hello,

I've been playing with Vulkan lately and struggled quite a bit to implement VkQueueSubmit with the submit ioctl we have. There are several limiting factors that can be worked around if we really have to, but I think it'd be much easier and future-proof if we introduce a new ioctl that addresses the current limitations:

1/ There can only be one out_sync, but Vulkan might ask us to signal several VkSemaphores and possibly one VkFence too, both of those being based on sync objects in my PoC. Making out_sync an array of syncobjs to attach the render_done fence to would make that possible. The other option would be to collect syncobj updates in userspace in a separate thread and propagate those updates to all semaphores+fences waiting on those events (I think the v3dv driver does something like that, but I didn't spend enough time studying the code to be sure, so I might be wrong).

2/ Queued jobs might be executed out-of-order (unless they have explicit/implicit deps between them), and Vulkan asks that the out fence be signaled when all jobs are done. Timeline syncobjs are a good match for that use case. All we need to do is pass the same fence syncobj to all jobs being attached to a single QueueSubmit request, but a different point on the timeline. The syncobj timeline wait does the rest and guarantees that we've reached a given timeline point (IOW, all jobs before that point are done) before declaring the fence as signaled.

One alternative would be to have dummy 'synchronization' jobs that don't actually execute anything on the GPU but declare a dependency on all other jobs that are part of the QueueSubmit request, and signal the out fence (the scheduler would do most of the work for us, all we have to do is support NULL job heads and signal the fence directly when that happens instead of queueing the job).
3/ The current implementation lacks information about BO access, so we serialize all jobs accessing the same set of BOs, even if those jobs might just be reading from them (which can happen concurrently). Other drivers pass an access type to the list of referenced BOs to address that. Another option would be to disable implicit deps (deps based on BOs) and force the driver to pass all deps explicitly (interestingly, some drivers have both the no-implicit-dep and r/w flags, probably to support sub-resource access, so we might want to add that one too).

I don't see any userspace workaround to that problem, so that one alone would justify extending the existing ioctl or adding a new one.

4/ There's also the fact that submitting one job at a time adds an overhead when QueueSubmit is being passed more than one CommandBuffer. That one is less problematic, but if we're adding a new ioctl we'd better design it to limit the userspace -> kernel transition overhead.

Right now I'm just trying to collect feedback. I don't intend to get those patches merged until we have a userspace user, but I thought starting the discussion early would be a good thing. Feel free to suggest other approaches.
Regards,

Boris

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach_object_fences}()
  drm/panfrost: Collect implicit and explicit deps in an XArray
  drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/panfrost_drv.c | 408 +---
 drivers/gpu/drm/panfrost/panfrost_job.c |  80 +++--
 drivers/gpu/drm/panfrost/panfrost_job.h |   8 +-
 include/uapi/drm/panfrost_drm.h         |  83 +
 4 files changed, 483 insertions(+), 96 deletions(-)

-- 
2.26.2
[RFC PATCH 3/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
So we can re-use it from elsewhere. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++--- 1 file changed, 29 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 86c1347c6f0e..32e822e00a08 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -108,6 +108,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, return 0; } +static int +panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job) +{ + struct panfrost_file_priv *priv = file_priv->driver_priv; + unsigned int i; + + job->mappings = kvmalloc_array(job->bo_count, + sizeof(*job->mappings), + GFP_KERNEL | __GFP_ZERO); + if (!job->mappings) + return -ENOMEM; + + for (i = 0; i < job->bo_count; i++) { + struct panfrost_gem_mapping *mapping; + struct panfrost_gem_object *bo; + + bo = to_panfrost_bo(job->bos[i]); + mapping = panfrost_gem_mapping_get(bo, priv); + if (!mapping) + return -EINVAL; + + atomic_inc(&bo->gpu_usecount); + job->mappings[i] = mapping; + } + + return 0; +} + /** * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects * referenced by the job. 
@@ -127,8 +155,6 @@ panfrost_lookup_bos(struct drm_device *dev, struct drm_panfrost_submit *args, struct panfrost_job *job) { - struct panfrost_file_priv *priv = file_priv->driver_priv; - struct panfrost_gem_object *bo; unsigned int i; int ret; @@ -143,27 +169,7 @@ panfrost_lookup_bos(struct drm_device *dev, if (ret) return ret; - job->mappings = kvmalloc_array(job->bo_count, - sizeof(struct panfrost_gem_mapping *), - GFP_KERNEL | __GFP_ZERO); - if (!job->mappings) - return -ENOMEM; - - for (i = 0; i < job->bo_count; i++) { - struct panfrost_gem_mapping *mapping; - - bo = to_panfrost_bo(job->bos[i]); - mapping = panfrost_gem_mapping_get(bo, priv); - if (!mapping) { - ret = -EINVAL; - break; - } - - atomic_inc(&bo->gpu_usecount); - job->mappings[i] = mapping; - } - - return ret; + return panfrost_get_job_mappings(file_priv, job); } /** -- 2.26.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC PATCH 2/7] drm/panfrost: Collect implicit and explicit deps in an XArray
This way we can re-use the standard drm_gem_fence_array_add_implicit() helper and simplify the panfrost_job_dependency() logic. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 42 +++--- drivers/gpu/drm/panfrost/panfrost_job.c | 57 +++-- drivers/gpu/drm/panfrost/panfrost_job.h | 7 +-- 3 files changed, 42 insertions(+), 64 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 83a461bdeea8..86c1347c6f0e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -137,12 +137,6 @@ panfrost_lookup_bos(struct drm_device *dev, if (!job->bo_count) return 0; - job->implicit_fences = kvmalloc_array(job->bo_count, - sizeof(struct dma_fence *), - GFP_KERNEL | __GFP_ZERO); - if (!job->implicit_fences) - return -ENOMEM; - ret = drm_gem_objects_lookup(file_priv, (void __user *)(uintptr_t)args->bo_handles, job->bo_count, &job->bos); @@ -173,8 +167,7 @@ panfrost_lookup_bos(struct drm_device *dev, } /** - * panfrost_copy_in_sync() - Sets up job->in_fences[] with the sync objects - * referenced by the job. + * panfrost_copy_in_sync() - Add in_syncs to the job->deps array. 
* @dev: DRM device * @file_priv: DRM file for this fd * @args: IOCTL args @@ -187,28 +180,18 @@ panfrost_lookup_bos(struct drm_device *dev, */ static int panfrost_copy_in_sync(struct drm_device *dev, - struct drm_file *file_priv, - struct drm_panfrost_submit *args, - struct panfrost_job *job) + struct drm_file *file_priv, + struct drm_panfrost_submit *args, + struct panfrost_job *job) { u32 *handles; int ret = 0; int i; - job->in_fence_count = args->in_sync_count; - - if (!job->in_fence_count) + if (!args->in_sync_count) return 0; - job->in_fences = kvmalloc_array(job->in_fence_count, - sizeof(struct dma_fence *), - GFP_KERNEL | __GFP_ZERO); - if (!job->in_fences) { - DRM_DEBUG("Failed to allocate job in fences\n"); - return -ENOMEM; - } - - handles = kvmalloc_array(job->in_fence_count, sizeof(u32), GFP_KERNEL); + handles = kvmalloc_array(args->in_sync_count, sizeof(u32), GFP_KERNEL); if (!handles) { ret = -ENOMEM; DRM_DEBUG("Failed to allocate incoming syncobj handles\n"); @@ -217,17 +200,23 @@ panfrost_copy_in_sync(struct drm_device *dev, if (copy_from_user(handles, (void __user *)(uintptr_t)args->in_syncs, - job->in_fence_count * sizeof(u32))) { + args->in_sync_count * sizeof(u32))) { ret = -EFAULT; DRM_DEBUG("Failed to copy in syncobj handles\n"); goto fail; } - for (i = 0; i < job->in_fence_count; i++) { + for (i = 0; i < args->in_sync_count; i++) { + struct dma_fence *fence; + ret = drm_syncobj_find_fence(file_priv, handles[i], 0, 0, -&job->in_fences[i]); +&fence); if (ret == -EINVAL) goto fail; + + ret = drm_gem_fence_array_add(&job->deps, fence); + if (ret) + goto fail; } fail: @@ -269,6 +258,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data, job->requirements = args->requirements; job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev); job->file_priv = file->driver_priv; + xa_init_flags(&job->deps, XA_FLAGS_ALLOC); ret = panfrost_copy_in_sync(dev, file, args, job); if (ret) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c index 564054a96ca9..87bfbd77d753 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -196,20 +196,31 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); } -static void panfrost_acquire_object_fences(struct panfrost_job *job) +static int panfrost_acquire_object_fences(struct panfrost_job *job) { int i; - for (i = 0; i < job->bo_count; i++) - job->implicit_fences[i] = dma_resv_get_excl_rcu(
[RFC PATCH 4/7] drm/panfrost: Add BO access flags to relax dependencies between jobs
Jobs reading from the same BO should not be serialized. Add access flags so we can relax the implicit dependencies in that case. We force RW access for now to keep the behavior unchanged, but a new SUBMIT ioctl taking explicit access flags will be introduced. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 9 + drivers/gpu/drm/panfrost/panfrost_job.c | 15 +-- drivers/gpu/drm/panfrost/panfrost_job.h | 1 + include/uapi/drm/panfrost_drm.h | 4 4 files changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 32e822e00a08..be45efee47a2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -163,6 +163,15 @@ panfrost_lookup_bos(struct drm_device *dev, if (!job->bo_count) return 0; + job->bo_flags = kvmalloc_array(job->bo_count, + sizeof(*job->bo_flags), + GFP_KERNEL | __GFP_ZERO); + if (!job->bo_flags) + return -ENOMEM; + + for (i = 0; i < job->bo_count; i++) + job->bo_flags[i] = PANFROST_BO_REF_RW; + ret = drm_gem_objects_lookup(file_priv, (void __user *)(uintptr_t)args->bo_handles, job->bo_count, &job->bos); diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 87bfbd77d753..170c3b0c8641 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -201,10 +201,17 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job) int i; for (i = 0; i < job->bo_count; i++) { + bool write = job->bo_flags[i] & PANFROST_BO_REF_WRITE; int ret; + if (!(job->bo_flags[i] & PANFROST_BO_REF_WRITE)) { + ret = dma_resv_reserve_shared(job->bos[i]->resv, 1); + if (ret) + return ret; + } + ret = drm_gem_fence_array_add_implicit(&job->deps, - job->bos[i], true); + job->bos[i], write); if (ret) return ret; } @@ -219,7 +226,10 @@ static void panfrost_attach_object_fences(struct panfrost_job *job) for (i = 0; i < job->bo_count; i++) { struct 
dma_resv *robj = job->bos[i]->resv; - dma_resv_add_excl_fence(robj, job->render_done_fence); + if (job->bo_flags[i] & PANFROST_BO_REF_WRITE) + dma_resv_add_excl_fence(robj, job->render_done_fence); + else + dma_resv_add_shared_fence(robj, job->render_done_fence); } } @@ -298,6 +308,7 @@ static void panfrost_job_cleanup(struct kref *ref) kvfree(job->bos); } + kvfree(job->bo_flags); kfree(job); } diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h index b10b5be57dd2..c3f662771ed9 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.h +++ b/drivers/gpu/drm/panfrost/panfrost_job.h @@ -31,6 +31,7 @@ struct panfrost_job { struct panfrost_gem_mapping **mappings; struct drm_gem_object **bos; + u32 *bo_flags; u32 bo_count; /* Fence to be signaled by drm-sched once its done with the job */ diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h index ec19db1eead8..eab96d7e0e0e 100644 --- a/include/uapi/drm/panfrost_drm.h +++ b/include/uapi/drm/panfrost_drm.h @@ -223,6 +223,10 @@ struct drm_panfrost_madvise { __u32 retained; /* out, whether backing store still exists */ }; +#define PANFROST_BO_REF_READ 0x1 +#define PANFROST_BO_REF_WRITE 0x2 +#define PANFROST_BO_REF_RW (PANFROST_BO_REF_READ | PANFROST_BO_REF_WRITE) + #if defined(__cplusplus) } #endif -- 2.26.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC PATCH 7/7] drm/panfrost: Bump minor version to reflect the feature additions
We now have a new ioctl that allows submitting multiple jobs at once (among other things) and we support timelined syncobjs. Bump the minor version number to reflect those changes.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b3cc6fecb18d..7fb9b20ba27f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -855,6 +855,7 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * Panfrost driver version:
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
+ * - 1.2 - adds BATCH_SUBMIT ioctl and SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -868,7 +869,7 @@ static const struct drm_driver panfrost_drm_driver = {
 	.desc			= "panfrost DRM",
 	.date			= "20180908",
 	.major			= 1,
-	.minor			= 1,
+	.minor			= 2,
 
 	.gem_create_object	= panfrost_gem_create_object,
 	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
-- 
2.26.2
[RFC PATCH 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
Now that we have a new SUBMIT ioctl dealing with timelined syncobjs we can advertise the feature.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3f8addab8bab..b3cc6fecb18d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -857,7 +857,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  */
 static const struct drm_driver panfrost_drm_driver = {
-	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+				  DRIVER_SYNCOBJ_TIMELINE,
 	.open			= panfrost_open,
 	.postclose		= panfrost_postclose,
 	.ioctls			= panfrost_drm_driver_ioctls,
-- 
2.26.2
[RFC PATCH 1/7] drm/panfrost: Pass a job to panfrost_{acquire, attach_object_fences}()
So we don't have to change the prototype if we extend the function.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..564054a96ca9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -196,24 +196,20 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 	job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
 }
 
-static void panfrost_acquire_object_fences(struct drm_gem_object **bos,
-					   int bo_count,
-					   struct dma_fence **implicit_fences)
+static void panfrost_acquire_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < bo_count; i++)
-		implicit_fences[i] = dma_resv_get_excl_rcu(bos[i]->resv);
+	for (i = 0; i < job->bo_count; i++)
+		job->implicit_fences[i] = dma_resv_get_excl_rcu(job->bos[i]->resv);
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
-					  int bo_count,
-					  struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < bo_count; i++)
-		dma_resv_add_excl_fence(bos[i]->resv, fence);
+	for (i = 0; i < job->bo_count; i++)
+		dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -243,15 +239,13 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	kref_get(&job->refcount); /* put by scheduler job completion */
 
-	panfrost_acquire_object_fences(job->bos, job->bo_count,
-				       job->implicit_fences);
+	panfrost_acquire_object_fences(job);
 
 	drm_sched_entity_push_job(&job->base, entity);
 
 	mutex_unlock(&pfdev->sched_lock);
 
-	panfrost_attach_object_fences(job->bos, job->bo_count,
-				      job->render_done_fence);
+	panfrost_attach_object_fences(job);
 
 unlock:
 	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
-- 
2.26.2
[RFC PATCH 5/7] drm/panfrost: Add a new ioctl to submit batches
This should help limit the number of ioctls when submitting multiple jobs. The new ioctl also supports syncobj timelines and BO access flags. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 303 include/uapi/drm/panfrost_drm.h | 79 ++ 2 files changed, 382 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index be45efee47a2..3f8addab8bab 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -446,6 +446,308 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data, return ret; } +static int +panfrost_get_job_in_syncs(struct drm_file *file_priv, + u64 refs, u32 ref_stride, + u32 count, struct panfrost_job *job) +{ + const void __user *in = u64_to_user_ptr(refs); + unsigned int i; + int ret; + + if (!count) + return 0; + + for (i = 0; i < count; i++) { + struct drm_panfrost_syncobj_ref ref = { }; + struct dma_fence *fence; + + ret = copy_struct_from_user(&ref, sizeof(ref), + in + (i * ref_stride), + ref_stride); + if (ret) + return ret; + + if (ref.pad) + return -EINVAL; + + ret = drm_syncobj_find_fence(file_priv, ref.handle, ref.point, +0, &fence); + if (ret) + return ret; + + ret = drm_gem_fence_array_add(&job->deps, fence); + if (ret) + return ret; + } + + return 0; +} + +struct panfrost_job_out_sync { + struct drm_syncobj *syncobj; + struct dma_fence_chain *chain; + u64 point; +}; + +static void +panfrost_put_job_out_syncs(struct panfrost_job_out_sync *out_syncs, u32 count) +{ + unsigned int i; + + for (i = 0; i < count; i++) { + if (!out_syncs[i].syncobj) + break; + + drm_syncobj_put(out_syncs[i].syncobj); + kvfree(out_syncs[i].chain); + } + + kvfree(out_syncs); +} + +static struct panfrost_job_out_sync * +panfrost_get_job_out_syncs(struct drm_file *file_priv, + u64 refs, u32 ref_stride, + u32 count) +{ + void __user *in = u64_to_user_ptr(refs); + struct panfrost_job_out_sync *out_syncs; + unsigned int i; + int ret; + 
+ if (!count) + return NULL; + + out_syncs = kvmalloc_array(count, sizeof(*out_syncs), + GFP_KERNEL | __GFP_ZERO); + if (!out_syncs) + return ERR_PTR(-ENOMEM); + + for (i = 0; i < count; i++) { + struct drm_panfrost_syncobj_ref ref = { }; + + ret = copy_struct_from_user(&ref, sizeof(ref), + in + (i * ref_stride), + ref_stride); + if (ret) + goto err_free_out_syncs; + + if (ref.pad) { + ret = -EINVAL; + goto err_free_out_syncs; + } + + out_syncs[i].syncobj = drm_syncobj_find(file_priv, ref.handle); + if (!out_syncs[i].syncobj) { + ret = -EINVAL; + goto err_free_out_syncs; + } + + out_syncs[i].point = ref.point; + if (!out_syncs[i].point) + continue; + + out_syncs[i].chain = kmalloc(sizeof(*out_syncs[i].chain), +GFP_KERNEL); + if (!out_syncs[i].chain) { + ret = -ENOMEM; + goto err_free_out_syncs; + } + } + + return out_syncs; + +err_free_out_syncs: + panfrost_put_job_out_syncs(out_syncs, count); + return ERR_PTR(ret); +} + +static void +panfrost_set_job_out_fence(struct panfrost_job_out_sync *out_syncs, + unsigned int count, struct dma_fence *fence) +{ + unsigned int i; + + for (i = 0; i < count; i++) { + if (out_syncs[i].chain) { + drm_syncobj_add_point(out_syncs[i].syncobj, + out_syncs[i].chain, + fence, out_syncs[i].point); + out_syncs[i].chain = NULL; + } else { + drm_syncobj_replace_fence(out_syncs[i].syncobj, + fence); + } + } +} + +#define PANFROST_BO_REF_ALLOWED_FLAGS P
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
Hi Steven, On Thu, 11 Mar 2021 12:16:33 + Steven Price wrote: > On 11/03/2021 09:25, Boris Brezillon wrote: > > Hello, > > > > I've been playing with Vulkan lately and struggled quite a bit to > > implement VkQueueSubmit with the submit ioctl we have. There are > > several limiting factors that can be worked around if we really have to, > > but I think it'd be much easier and future-proof if we introduce a new > > ioctl that addresses the current limitations: > > Hi Boris, > > I think what you've proposed is quite reasonable, some detailed comments > to your points below. > > > > > 1/ There can only be one out_sync, but Vulkan might ask us to signal > > several VkSemaphores and possibly one VkFence too, both of those > > being based on sync objects in my PoC. Making out_sync an array of > > syncobjs to attach the render_done fence to would make that possible. > > The other option would be to collect syncobj updates in userspace > > in a separate thread and propagate those updates to all > > semaphores+fences waiting on those events (I think the v3dv driver > > does something like that, but I didn't spend enough time studying > > the code to be sure, so I might be wrong). > > You should be able to avoid the separate thread to propagate by having a > proxy object in user space that maps between the one outsync of the job > and the possibly many Vulkan objects. But I've had this argument before > with the DDK... and the upshot of it was that he Vulkan API is > unnecessarily complex here and makes this really hard to do in practise. > So I agree adding this capability to the kernel is likely the best approach. > > > 2/ Queued jobs might be executed out-of-order (unless they have > > explicit/implicit deps between them), and Vulkan asks that the out > > fence be signaled when all jobs are done. Timeline syncobjs are a > > good match for that use case. 
All we need to do is pass the same > > fence syncobj to all jobs being attached to a single QueueSubmit > > request, but a different point on the timeline. The syncobj > > timeline wait does the rest and guarantees that we've reached a > > given timeline point (IOW, all jobs before that point are done) > > before declaring the fence as signaled. > > One alternative would be to have dummy 'synchronization' jobs that > > don't actually execute anything on the GPU but declare a dependency > > on all other jobs that are part of the QueueSubmit request, and > > signal the out fence (the scheduler would do most of the work for > > us, all we have to do is support NULL job heads and signal the > > fence directly when that happens instead of queueing the job). > > I have to admit to being rather hazy on the details of timeline > syncobjs, but I thought there was a requirement that the timeline moves > monotonically. I.e. if you have multiple jobs signalling the same > syncobj just with different points, then AFAIU the API requires that the > points are triggered in order. I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I might be wrong, but my understanding is that queuing fences (addition of new points in the timeline) should happen in order, but signaling can happen in any order. When checking for a signaled fence the fence-chain logic starts from the last point (or from an explicit point if you use the timeline wait flavor) and goes backward, stopping at the first un-signaled node. If I'm correct, that means that fences that are part of a chain can be signaled in any order. Note that I also considered using a sync file, which has the ability to merge fences, but that required 3 extra ioctls for each syncobj to merge (for the export/merge/import round trip), and AFAICT, fences stay around until the sync file is destroyed, which forces some garbage collection if we want to keep the number of active fences low. 
One nice thing about the fence-chain/syncobj-timeline logic is that signaled fences are collected automatically when walking the chain.

> So I'm not sure that you've actually fixed this point - you either need
> to force an order (in which case the last job can signal the Vulkan
> fence)

That option requires adding deps that do not really exist on the last jobs, so I'm not sure I like it.

> or you still need a dummy job to do the many-to-one dependency.

Yeah, that's what I've considered doing before realizing timelined syncobjs could solve this problem (assuming I got it right :-/).

> Or I may have completely misunderstood timeline syncobjs - definitely a
> possibility :)
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
Hi Jason, On Thu, 11 Mar 2021 10:58:46 -0600 Jason Ekstrand wrote: > Hi all, > > Dropping in where I may or may not be wanted to feel free to ignore. : -) I'm glad you decided to chime in. :-) > > > > 2/ Queued jobs might be executed out-of-order (unless they have > > > > explicit/implicit deps between them), and Vulkan asks that the out > > > > fence be signaled when all jobs are done. Timeline syncobjs are a > > > > good match for that use case. All we need to do is pass the same > > > > fence syncobj to all jobs being attached to a single QueueSubmit > > > > request, but a different point on the timeline. The syncobj > > > > timeline wait does the rest and guarantees that we've reached a > > > > given timeline point (IOW, all jobs before that point are done) > > > > before declaring the fence as signaled. > > > > One alternative would be to have dummy 'synchronization' jobs that > > > > don't actually execute anything on the GPU but declare a dependency > > > > on all other jobs that are part of the QueueSubmit request, and > > > > signal the out fence (the scheduler would do most of the work for > > > > us, all we have to do is support NULL job heads and signal the > > > > fence directly when that happens instead of queueing the job). > > > > > > I have to admit to being rather hazy on the details of timeline > > > syncobjs, but I thought there was a requirement that the timeline moves > > > monotonically. I.e. if you have multiple jobs signalling the same > > > syncobj just with different points, then AFAIU the API requires that the > > > points are triggered in order. > > > > I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I > > might be wrong, but my understanding is that queuing fences (addition > > of new points in the timeline) should happen in order, but signaling > > can happen in any order. 
When checking for a signaled fence the > > fence-chain logic starts from the last point (or from an explicit point > > if you use the timeline wait flavor) and goes backward, stopping at the > > first un-signaled node. If I'm correct, that means that fences that > > are part of a chain can be signaled in any order. > > You don't even need a timeline for this. Just have a single syncobj > per-queue and make each submit wait on it and then signal it. > Alternatively, you can just always hang on to the out-fence from the > previous submit and make the next one wait on that. That's what I have right now, but it forces the serialization of all jobs that are pushed during a submit (and there can be more than one per command buffer on panfrost :-/). Maybe I'm wrong, but I thought it'd be better to not force this serialization if we can avoid it. > Timelines are overkill here, IMO. Mind developing why you think this is overkill? After looking at the kernel implementation I thought using timeline syncobjs would be pretty cheap compared to the other options I considered. > > > Note that I also considered using a sync file, which has the ability to > > merge fences, but that required 3 extra ioctls for each syncobj to merge > > (for the export/merge/import round trip), and AFAICT, fences stay around > > until the sync file is destroyed, which forces some garbage collection > > if we want to keep the number of active fences low. One nice thing > > about the fence-chain/syncobj-timeline logic is that signaled fences > > are collected automatically when walking the chain. > > Yeah, that's the pain when working with sync files. This is one of > the reasons why our driver takes an arbitrary number of in/out > syncobjs. 
> > > > So I'm not sure that you've actually fixed this point - you either need > > > to force an order (in which case the last job can signal the Vulkan > > > fence) > > > > That option requires adding deps that do not really exist on the last > > jobs, so I'm not sure I like it. > > > > > or you still need a dummy job to do the many-to-one dependency. > > > > Yeah, that's what I've considered doing before realizing timelined > > syncobjs could solve this problem (assuming I got it right :-/). > > > > > > > > Or I may have completely misunderstood timeline syncobjs - definitely a > > > possibility :) > > > > I wouldn't pretend I got it right either :-). > > > > > > > > > 3/ The current implementation lacks information about BO access, > > > > so we serialize all jobs accessing the same set of BOs, even > > > > if those jobs might just be reading from them (which can > > > > happen concurrently). Other drivers pass an access type to the > > > > list of referenced BOs to address that. Another option would be > > > > to disable implicit deps (deps based on BOs) and force the driver > > > > to pass all deps explicitly (interestingly, some drivers have > > > > both the no-implicit-dep and r/w flags, probably to support > > > > sub-resource access, so we might want
Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
On Thu, 11 Mar 2021 12:11:48 -0600 Jason Ekstrand wrote: > > > > > > 2/ Queued jobs might be executed out-of-order (unless they have > > > > > > explicit/implicit deps between them), and Vulkan asks that the > > > > > > out > > > > > > fence be signaled when all jobs are done. Timeline syncobjs are > > > > > > a > > > > > > good match for that use case. All we need to do is pass the same > > > > > > fence syncobj to all jobs being attached to a single QueueSubmit > > > > > > request, but a different point on the timeline. The syncobj > > > > > > timeline wait does the rest and guarantees that we've reached a > > > > > > given timeline point (IOW, all jobs before that point are done) > > > > > > before declaring the fence as signaled. > > > > > > One alternative would be to have dummy 'synchronization' jobs > > > > > > that > > > > > > don't actually execute anything on the GPU but declare a > > > > > > dependency > > > > > > on all other jobs that are part of the QueueSubmit request, and > > > > > > signal the out fence (the scheduler would do most of the work > > > > > > for > > > > > > us, all we have to do is support NULL job heads and signal the > > > > > > fence directly when that happens instead of queueing the job). > > > > > > > > > > I have to admit to being rather hazy on the details of timeline > > > > > syncobjs, but I thought there was a requirement that the timeline > > > > > moves > > > > > monotonically. I.e. if you have multiple jobs signalling the same > > > > > syncobj just with different points, then AFAIU the API requires that > > > > > the > > > > > points are triggered in order. > > > > > > > > I only started looking at the SYNCOBJ_TIMELINE API a few days ago, so I > > > > might be wrong, but my understanding is that queuing fences (addition > > > > of new points in the timeline) should happen in order, but signaling > > > > can happen in any order. 
When checking for a signaled fence the > > > > fence-chain logic starts from the last point (or from an explicit point > > > > if you use the timeline wait flavor) and goes backward, stopping at the > > > > first un-signaled node. If I'm correct, that means that fences that > > > > are part of a chain can be signaled in any order. > > > > > > You don't even need a timeline for this. Just have a single syncobj > > > per-queue and make each submit wait on it and then signal it. > > > Alternatively, you can just always hang on to the out-fence from the > > > previous submit and make the next one wait on that. > > > > That's what I have right now, but it forces the serialization of all > > jobs that are pushed during a submit (and there can be more than one > > per command buffer on panfrost :-/). Maybe I'm wrong, but I thought it'd > > be better to not force this serialization if we can avoid it. > > I'm not familiar with panfrost's needs and I don't work on a tiler and > I know there are different issues there. But... > > The Vulkan spec requires that all the submits that > happen on a given vkQueue happen in-order. Search the spec for > "Submission order" for more details. Duh, looks like I completely overlooked the "Submission order" guarantees. This being said, even after reading this chapter multiple times I'm not sure what kind of guarantee this gives us, given the execution itself can be out-of-order. My understanding is that submission order matters for implicit deps: say you have 2 distinct VkSubmitInfo, the first one (in submission order) writing to a buffer and the second one reading from it, you really want the first one to be submitted first and the second one to wait on the implicit BO fence created by the first one. If we were to submit out-of-order, this guarantee wouldn't be met. OTOH, if we have 2 completely independent submits, I don't really see what submission order gives us if execution is out-of-order. 
In our case, the kernel driver takes care of the submission serialization (gathering implicit and explicit deps, queuing the job and assigning the "done" fence to the output sync objects). Once things are queued, it's the scheduler (drm_sched) deciding of the execution order. > > So, generally speaking, there are some in-order requirements there. > Again, not having a lot of tiler experience, I'm not the one to weigh > in. > > > > Timelines are overkill here, IMO. > > > > Mind developing why you think this is overkill? After looking at the > > kernel implementation I thought using timeline syncobjs would be > > pretty cheap compared to the other options I considered. > > If you use a regular syncobj, every time you wait on it it inserts a > dependency between the current submit and the last thing to signal it > on the CPU timeline. The internal dma_fences will hang around > as-needed to ensure those dependencies. If you use a timeline, you > have to also track a uint64_t to reference the current time point. > This ma
[PATCH] drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
Make sure all bo->base.pages entries are either NULL or pointing to a valid page before calling drm_gem_shmem_put_pages(). Reported-by: Tomeu Vizoso Cc: Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations") Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 569509c2ba27..d76dff201ea6 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -460,6 +460,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, if (IS_ERR(pages[i])) { mutex_unlock(&bo->base.pages_lock); ret = PTR_ERR(pages[i]); + pages[i] = NULL; goto err_pages; } } -- 2.31.1
Re: [PATCH v5] drm/bridge: tfp410: Set input_bus_flags in atomic_check
On Thu, 3 Dec 2020 18:20:48 +0530 Nikhil Devshatwar wrote: > input_bus_flags are specified in drm_bridge_timings (legacy) as well > as drm_bridge_state->input_bus_cfg.flags > > The flags from the timings will be deprecated. Bridges are supposed > to validate and set the bridge state flags from atomic_check. > > Implement atomic_check hook for the same. > > Signed-off-by: Nikhil Devshatwar > --- > > Notes: > changes from v4: > * fix a warning Reported-by: kernel test robot > > drivers/gpu/drm/bridge/ti-tfp410.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/gpu/drm/bridge/ti-tfp410.c > b/drivers/gpu/drm/bridge/ti-tfp410.c > index 3def9acba86b..92963d12106b 100644 > --- a/drivers/gpu/drm/bridge/ti-tfp410.c > +++ b/drivers/gpu/drm/bridge/ti-tfp410.c > @@ -233,6 +233,20 @@ static u32 *tfp410_get_input_bus_fmts(struct drm_bridge > *bridge, > return input_fmts; > } > > +static int tfp410_atomic_check(struct drm_bridge *bridge, > + struct drm_bridge_state *bridge_state, > + struct drm_crtc_state *crtc_state, > + struct drm_connector_state *conn_state) > +{ > + struct tfp410 *dvi = drm_bridge_to_tfp410(bridge); > + > + /* > + * There might be flags negotiation supported in future > + * Set the bus flags in atomic_check statically for now > + */ > + bridge_state->input_bus_cfg.flags = dvi->timings.input_bus_flags; The return statement is still missing :-). 
> +} > + > static const struct drm_bridge_funcs tfp410_bridge_funcs = { > .attach = tfp410_attach, > .detach = tfp410_detach, > @@ -243,6 +257,7 @@ static const struct drm_bridge_funcs tfp410_bridge_funcs > = { > .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state, > .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state, > .atomic_get_input_bus_fmts = tfp410_get_input_bus_fmts, > + .atomic_check = tfp410_atomic_check, > }; > > static const struct drm_bridge_timings tfp410_default_timings = { ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 4/7] drm/bridge: mhdp8546: Set input_bus_flags from atomic_check
On Tue, 1 Dec 2020 17:48:27 +0530 Nikhil Devshatwar wrote: > input_bus_flags are specified in drm_bridge_timings (legacy) as well > as drm_bridge_state->input_bus_cfg.flags > > The flags from the timings will be deprecated. Bridges are supposed > to validate and set the bridge state flags from atomic_check. > > Signed-off-by: Nikhil Devshatwar > --- > drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 ++ > drivers/gpu/drm/bridge/ti-tfp410.c | 1 + > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > index 2cd809eed827..9c17e4bb517e 100644 > --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > @@ -2121,6 +2121,12 @@ static int cdns_mhdp_atomic_check(struct drm_bridge > *bridge, > return -EINVAL; > } > > + /* > + * There might be flags negotiation supported in future > + * Set the bus flags in atomic_check statically for now > + */ > + bridge_state->input_bus_cfg.flags = bridge->timings->input_bus_flags; > + > mutex_unlock(&mhdp->link_mutex); > return 0; > } > diff --git a/drivers/gpu/drm/bridge/ti-tfp410.c > b/drivers/gpu/drm/bridge/ti-tfp410.c > index 4c536df003c8..9035d2145a28 100644 > --- a/drivers/gpu/drm/bridge/ti-tfp410.c > +++ b/drivers/gpu/drm/bridge/ti-tfp410.c > @@ -245,6 +245,7 @@ int tfp410_atomic_check(struct drm_bridge *bridge, >* Set the bus flags in atomic_check statically for now >*/ > bridge_state->input_bus_cfg.flags = dvi->timings.input_bus_flags; > + return 0; And here is the return statement that was missing in patch 2 :-). > } > > static const struct drm_bridge_funcs tfp410_bridge_funcs = { ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 3/7] drm/bridge: mhdp8546: Add minimal format negotiation
On Tue, 1 Dec 2020 17:48:26 +0530 Nikhil Devshatwar wrote: > With new connector model, mhdp bridge will not create the connector and > SoC driver will rely on format negotiation to setup the encoder format. > > Support minimal format negotiations hooks in the drm_bridge_funcs. > Complete format negotiation can be added based on EDID data. > This patch adds the minimal required support to avoid failure > after moving to new connector model. > > Signed-off-by: Nikhil Devshatwar > Reviewed-by: Tomi Valkeinen > --- > > Notes: > changes from v1: > * cosmetic fixes, commit message update > > .../drm/bridge/cadence/cdns-mhdp8546-core.c | 25 +++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > index d0c65610ebb5..2cd809eed827 100644 > --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > @@ -2078,6 +2078,30 @@ cdns_mhdp_bridge_atomic_reset(struct drm_bridge > *bridge) > return &cdns_mhdp_state->base; > } > > +static u32 *cdns_mhdp_get_input_bus_fmts(struct drm_bridge *bridge, > + struct drm_bridge_state *bridge_state, > + struct drm_crtc_state *crtc_state, > + struct drm_connector_state *conn_state, > + u32 output_fmt, > + unsigned int *num_input_fmts) > +{ > + u32 *input_fmts; > + u32 default_bus_format = MEDIA_BUS_FMT_RGB121212_1X36; > + > + *num_input_fmts = 0; > + > + if (output_fmt != MEDIA_BUS_FMT_FIXED) > + return NULL; > + > + input_fmts = kzalloc(sizeof(*input_fmts), GFP_KERNEL); > + if (!input_fmts) > + return NULL; > + > + *num_input_fmts = 1; > + input_fmts[0] = default_bus_format; Why not input_fmts[0] = MEDIA_BUS_FMT_RGB121212_1X36; ? This way you could get rid of the default_bus_format variable. 
> + return input_fmts; > +} > + > static int cdns_mhdp_atomic_check(struct drm_bridge *bridge, > struct drm_bridge_state *bridge_state, > struct drm_crtc_state *crtc_state, > @@ -2142,6 +2166,7 @@ static const struct drm_bridge_funcs > cdns_mhdp_bridge_funcs = { > .atomic_duplicate_state = cdns_mhdp_bridge_atomic_duplicate_state, > .atomic_destroy_state = cdns_mhdp_bridge_atomic_destroy_state, > .atomic_reset = cdns_mhdp_bridge_atomic_reset, > + .atomic_get_input_bus_fmts = cdns_mhdp_get_input_bus_fmts, > .detect = cdns_mhdp_bridge_detect, > .get_edid = cdns_mhdp_bridge_get_edid, > .hpd_enable = cdns_mhdp_bridge_hpd_enable, ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 4/7] drm/bridge: mhdp8546: Set input_bus_flags from atomic_check
On Tue, 1 Dec 2020 17:48:27 +0530 Nikhil Devshatwar wrote: > input_bus_flags are specified in drm_bridge_timings (legacy) as well > as drm_bridge_state->input_bus_cfg.flags > > The flags from the timings will be deprecated. Bridges are supposed > to validate and set the bridge state flags from atomic_check. > > Signed-off-by: Nikhil Devshatwar > --- > drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 ++ > drivers/gpu/drm/bridge/ti-tfp410.c | 1 + > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > index 2cd809eed827..9c17e4bb517e 100644 > --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c > @@ -2121,6 +2121,12 @@ static int cdns_mhdp_atomic_check(struct drm_bridge > *bridge, > return -EINVAL; > } > > + /* > + * There might be flags negotiation supported in future > + * Set the bus flags in atomic_check statically for now > + */ > + bridge_state->input_bus_cfg.flags = bridge->timings->input_bus_flags; I'd go even further and replace the timings field in cdns_mhdp_platform_info with an input_bus_flags field; you'd then have the following assignment here: if (mhdp->info) bridge_state->input_bus_cfg.flags = mhdp->info->input_bus_flags; This way you no longer rely on the bridge->timings presence and can get rid of the mhdp->bridge.timings assignment in the probe path. > + > mutex_unlock(&mhdp->link_mutex); > return 0; > }
Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector
On Tue, 1 Dec 2020 17:48:28 +0530 Nikhil Devshatwar wrote: > Remove the old code to iterate over the bridge chain, as this is > already done by the framework. > The bridge state should have the negotiated bus format and flags. > Use these from the bridge's state. > If the bridge does not support format negotiation, error out > and fail. That'd be even better if you implement the bridge interface instead of the encoder one so we can get rid of the encoder_{helper}_funcs and use drm_simple_encoder_init(). > > Signed-off-by: Nikhil Devshatwar > Reviewed-by: Tomi Valkeinen > --- > > Notes: > changes from v3: > * cosmetic updates > changes from v2: > * Remove the old code and use the flags from the bridge state > > drivers/gpu/drm/tidss/tidss_encoder.c | 36 +++ > 1 file changed, 14 insertions(+), 22 deletions(-) > > diff --git a/drivers/gpu/drm/tidss/tidss_encoder.c > b/drivers/gpu/drm/tidss/tidss_encoder.c > index e278a9c89476..5deb8102e4d3 100644 > --- a/drivers/gpu/drm/tidss/tidss_encoder.c > +++ b/drivers/gpu/drm/tidss/tidss_encoder.c > @@ -21,37 +21,29 @@ static int tidss_encoder_atomic_check(struct drm_encoder > *encoder, > { > struct drm_device *ddev = encoder->dev; > struct tidss_crtc_state *tcrtc_state = to_tidss_crtc_state(crtc_state); > - struct drm_display_info *di = &conn_state->connector->display_info; > + struct drm_bridge_state *bstate; > struct drm_bridge *bridge; > - bool bus_flags_set = false; > > dev_dbg(ddev->dev, "%s\n", __func__); > > - /* > - * Take the bus_flags from the first bridge that defines > - * bridge timings, or from the connector's display_info if no > - * bridge defines the timings. 
> - */ > - drm_for_each_bridge_in_chain(encoder, bridge) { > - if (!bridge->timings) > - continue; > - > - tcrtc_state->bus_flags = bridge->timings->input_bus_flags; > - bus_flags_set = true; > - break; > + /* Copy the bus_format and flags from the first bridge's state */ > + bridge = drm_bridge_chain_get_first_bridge(encoder); > + bstate = drm_atomic_get_new_bridge_state(crtc_state->state, bridge); > + if (!bstate) { > + dev_err(ddev->dev, "Could not get the bridge state\n"); > + return -EINVAL; > } > > - if (!di->bus_formats || di->num_bus_formats == 0) { > - dev_err(ddev->dev, "%s: No bus_formats in connected display\n", > - __func__); > + tcrtc_state->bus_format = bstate->input_bus_cfg.format; > + tcrtc_state->bus_flags = bstate->input_bus_cfg.flags; > + > + if (tcrtc_state->bus_format == 0 || > + tcrtc_state->bus_format == MEDIA_BUS_FMT_FIXED) { > + > + dev_err(ddev->dev, "Bridge connected to the encoder did not > specify media bus format\n"); > return -EINVAL; > } > > - // XXX any cleaner way to set bus format and flags? > - tcrtc_state->bus_format = di->bus_formats[0]; > - if (!bus_flags_set) > - tcrtc_state->bus_flags = di->bus_flags; > - > return 0; > } > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector
On Fri, 4 Dec 2020 12:56:27 +0200 Tomi Valkeinen wrote: > Hi Boris, > > On 04/12/2020 12:50, Boris Brezillon wrote: > > On Tue, 1 Dec 2020 17:48:28 +0530 > > Nikhil Devshatwar wrote: > > > >> Remove the old code to iterate over the bridge chain, as this is > >> already done by the framework. > >> The bridge state should have the negotiated bus format and flags. > >> Use these from the bridge's state. > >> If the bridge does not support format negotiation, error out > >> and fail. > > > > That'd be even better if you implement the bridge interface instead of > > the encoder one so we can get rid of the encoder_{helper}_funcs and use > > drm_simple_encoder_init(). > > I'm a bit confused here. What should be the design here... > > These formats need to be handled by the crtc (well, the display controller, which is modeled as the > crtc). Should we have a tidss_crtc.c file, which implements a crtc, a simple encoder and a bridge? > And just consider those three DRM components as different API parts of the display controller? The idea is to hide the encoder abstraction from drivers as much as we can. We have to keep the encoder concept because that's what we expose to userspace, but drivers shouldn't have to worry about the distinction between the first bridge in the chain (what we currently call encoders) and other bridges. The bridge interface provides pretty much the same functionality, so all you need to do is turn your encoder driver into a bridge driver (see what we did for drivers/gpu/drm/imx/parallel-display.c), the only particularity here is that the bridge knows it's the first in the chain, and has access to the CRTC it's directly connected to. IMHO, the fewer interfaces we keep, the clearer it gets for our users. Getting rid of the encoder_{helper_}funcs in favor of bridge_ops would clarify the fact that any kind of encoder, no matter if it's the first in the chain or not, should be represented as a bridge object. 
Re: [PATCH v4 5/7] drm/tidss: Set bus_format correctly from bridge/connector
On Fri, 4 Dec 2020 13:47:05 +0200 Tomi Valkeinen wrote: > On 04/12/2020 13:12, Boris Brezillon wrote: > > >>> That'd be even better if you implement the bridge interface instead of > >>> the encoder one so we can get rid of the encoder_{helper}_funcs and use > >>> drm_simple_encoder_init(). > >> > >> I'm a bit confused here. What should be the design here... > >> > >> These formats need to be handled by the crtc (well, the display > >> controller, which is modeled as the > >> crtc). Should we have a tidss_crtc.c file, which implements a crtc, a > >> simple encoder and a bridge? > >> And just consider those three DRM components as different API parts of the > >> display controller? > > > > The idea is to hide the encoder abstraction from drivers as much as we > > can. We have to keep the encoder concept because that's what we expose > > to userspace, but drivers shouldn't have to worry about the distinction > > between the first bridge in the chain (what we currently call encoders) > > and other bridges. The bridge interface provides pretty much the same > > functionality, so all you need to do is turn your encoder driver into a > > bridge driver (see what we did for > > drivers/gpu/drm/imx/parallel-display.c), the only particularity here is > > that the bridge knows it's the first in the chain, and has access to > > the CRTC it's directly connected to. > > With a quick look, the imx parallel-display.c seems to be really part of the > crtc. Shouldn't we then > take one more step forward, and just combine the crtc, encoder and bridge > (somehow)? That's kind of > what parallel-display.c is doing, isn't it? It's directly poking the crtc > state, but the code just > happens to be in a separate file from the crtc. Right. If we want to keep the code split, the logic should probably be reversed, with the CRTC driver checking the first bridge state to setup its internal state. 
Those DPI encoders are always a bit special, since they tend to be directly embedded in the block responsible for timing control (what we call CRTCs), so you're right, maybe we should model that as a CRTC+bridge pair. > > Thinking about the TI HW, we have a bunch of internal IPs which are clearly > bridges: HDMI, DSI, > which get the pixel data from the display controller, and they have their own > registers, so clearly > independent bridges. > > Then we have MIPI DPI, which doesn't really have its own registers, as it's > just plain parallel RGB, > the same as what is sent to e.g. HDMI and DSI bridges. I still consider that one a bridge, even if the translation is almost transparent from a HW PoV. > However, there might be some muxes or > regulators to set up to get the external signals working. So a bridge would > be ok here too to handle > that external side. Exactly. > > But in all the above cases, we have the display controller (crtc), which in > all the cases needs to > do e.g. pixel/bus format configuration. So why add direct crtc poking into > the first bridges (HDMI, > DSI, DPI), when we could just model the crtc as a bridge, and thus the first > bridges wouldn't need > to touch the crtc. Yes, that's an option. I mean, you can already modify your CRTC logic to implement the bridge and CRTC interface and have your driver-specific CRTC object embed both a drm_crtc and a drm_bridge. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v5 01/16] drm/sched: Document what the timedout_job method should do
The documentation is a bit vague and doesn't really describe what the ->timedout_job() hook is expected to do. Let's add a few more details. v5: * New patch Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon --- include/drm/gpu_scheduler.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 10225a0a35d0..65700511e074 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -239,6 +239,20 @@ struct drm_sched_backend_ops { * @timedout_job: Called when a job has taken too long to execute, * to trigger GPU recovery. * +* This method is called in a workqueue context. +* +* Drivers typically issue a reset to recover from GPU hangs, and this +* procedure usually follows the following workflow: +* +* 1. Stop the scheduler using drm_sched_stop(). This will park the +*scheduler thread and cancel the timeout work, guaranteeing that +*nothing is queued while we reset the hardware queue +* 2. Try to gracefully stop non-faulty jobs (optional) +* 3. Issue a GPU reset (driver-specific) +* 4. Re-submit jobs using drm_sched_resubmit_jobs() +* 5. Restart the scheduler using drm_sched_start(). At that point, new +*jobs can be queued, and the scheduler thread is unblocked +* * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal, * and the underlying driver has started or completed recovery. * -- 2.31.1
[PATCH v5 00/16] drm/panfrost: Misc improvements
Hello, This is a merge of [1] and [2] since the second series depends on patches in the preparatory series. Main changes in this v5: * Document what's expected in the ->timedout_job() hook * Add a patch increasing the AS_ACTIVE polling timeout * Fix a few minor things here and there (see each commit for a detailed changelog) and collect R-b/A-b tags Regards, Boris Boris Brezillon (15): drm/sched: Document what the timedout_job method should do drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name() drm/panfrost: Do the exception -> string translation using a table drm/panfrost: Expose a helper to trigger a GPU reset drm/panfrost: Use a threaded IRQ for job interrupts drm/panfrost: Simplify the reset serialization logic drm/panfrost: Make sure job interrupts are masked before resetting drm/panfrost: Disable the AS on unhandled page faults drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck drm/panfrost: Don't reset the GPU on job faults unless we really have to drm/panfrost: Kill in-flight jobs on FD close drm/panfrost: Increase the AS_ACTIVE polling timeout Steven Price (1): drm/panfrost: Queue jobs on the hardware drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_sched.c| 3 +- drivers/gpu/drm/lima/lima_sched.c | 3 +- drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++-- drivers/gpu/drm/panfrost/panfrost_device.h | 91 ++- drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 630 +++-- drivers/gpu/drm/panfrost/panfrost_mmu.c| 43 +- drivers/gpu/drm/panfrost/panfrost_regs.h | 3 - drivers/gpu/drm/scheduler/sched_main.c | 14 +- drivers/gpu/drm/v3d/v3d_sched.c| 10 +- include/drm/gpu_scheduler.h| 37 +- 12 files changed, 721 insertions(+), 256 deletions(-) -- 2.31.1
[PATCH v5 03/16] drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate
If the fence creation fails, we can return the error pointer directly. The core will update the fence error accordingly. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 8ff79fd49577..d6c9698bca3b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -355,7 +355,7 @@ static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job) fence = panfrost_fence_create(pfdev, slot); if (IS_ERR(fence)) - return NULL; + return fence; if (job->done_fence) dma_fence_put(job->done_fence); -- 2.31.1
[PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Lucas Stach Cc: Qiang Yu Cc: Emma Anholt Cc: Alex Deucher Cc: "Christian König" --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++- drivers/gpu/drm/lima/lima_sched.c | 3 ++- drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++- drivers/gpu/drm/scheduler/sched_main.c| 14 +- drivers/gpu/drm/v3d/v3d_sched.c | 10 +- include/drm/gpu_scheduler.h | 23 ++- 7 files changed, 43 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 47ea46859618..532636ea20bc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -488,7 +488,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, num_hw_submission, amdgpu_job_hang_limit, - timeout, sched_score, ring->name); + timeout, NULL, sched_score, ring->name); if (r) { DRM_ERROR("Failed to create scheduler on ring %s.\n", ring->name); diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 19826e504efc..feb6da1b6ceb 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -190,7 
+190,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, -msecs_to_jiffies(500), NULL, dev_name(gpu->dev)); +msecs_to_jiffies(500), NULL, NULL, +dev_name(gpu->dev)); if (ret) return ret; diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index ecf3267334ff..dba8329937a3 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -508,7 +508,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name) INIT_WORK(&pipe->recover_work, lima_sched_recover_work); return drm_sched_init(&pipe->base, &lima_sched_ops, 1, - lima_job_hang_limit, msecs_to_jiffies(timeout), + lima_job_hang_limit, + msecs_to_jiffies(timeout), NULL, NULL, name); } diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 682f2161b999..8ff79fd49577 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -626,7 +626,8 @@ int panfrost_job_init(struct panfrost_device *pfdev) ret = drm_sched_init(&js->queue[j].sched, &panfrost_sched_ops, -1, 0, msecs_to_jiffies(JOB_TIMEOUT_MS), +1, 0, +msecs_to_jiffies(JOB_TIMEOUT_MS), NULL, NULL, "pan_js"); if (ret) { dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index c0a2f8f8d472..3e180f0d4305 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -232,7 +232,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched) { if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !list_empty(&sched->pending_list)) - schedule_delayed_work(&sched->work_tdr, sched->timeout); + queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout); } /** @@ -244,7 +244,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched) */ void 
drm_sched_fault(struct
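The property patch 02 relies on is that an ordered workqueue executes at most one work item at a time, in queueing order, so the per-queue timeout handlers and the reset work can never run concurrently. Here is a minimal single-threaded userspace model of that guarantee (toy names — the real driver uses `alloc_ordered_workqueue()` and `queue_delayed_work()`; this only illustrates the FIFO, one-at-a-time semantics):

```c
#include <assert.h>
#include <stddef.h>

/* Toy "ordered workqueue": items run strictly in FIFO order, one at a
 * time, which is the serialization the Panfrost reset logic depends on. */
#define MAX_WORK 8

struct work_item {
	void (*fn)(int id);
	int id;
};

struct ordered_wq {
	struct work_item items[MAX_WORK];
	size_t head, tail;
};

static void wq_queue(struct ordered_wq *wq, void (*fn)(int), int id)
{
	wq->items[wq->tail % MAX_WORK] = (struct work_item){ fn, id };
	wq->tail++;
}

/* Drain the queue: each handler fully finishes before the next starts,
 * so a timeout handler never observes the reset handler mid-flight. */
static void wq_drain(struct ordered_wq *wq)
{
	while (wq->head != wq->tail) {
		struct work_item *w = &wq->items[wq->head % MAX_WORK];

		w->fn(w->id);
		wq->head++;
	}
}

/* Record execution order so the serialization can be checked. */
static int exec_order[MAX_WORK];
static size_t exec_count;

static void record(int id) { exec_order[exec_count++] = id; }
```

On a real multi-threaded ordered workqueue the same ordering holds; the kernel guarantees at most one work item of the queue executes at any time.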
[PATCH v5 09/16] drm/panfrost: Simplify the reset serialization logic
Now that we can pass our own workqueue to drm_sched_init(), we can use an ordered workqueue for both the scheduler timeout tdr and our own reset work (which we use when the reset is not caused by a fault/timeout on a specific job, like when we have the AS_ACTIVE bit stuck). This guarantees that the timeout handlers and reset handler can't run concurrently, which drastically simplifies the locking. v4: * Actually pass the reset workqueue to drm_sched_init() * Don't call cancel_work_sync() in panfrost_reset(). It will deadlock since it might be called from the reset work, which is executing, and cancel_work_sync() will wait for the handler to return. Checking the reset pending status should avoid spurious resets v3: * New patch Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.h | 6 +- drivers/gpu/drm/panfrost/panfrost_job.c| 187 - 2 files changed, 72 insertions(+), 121 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index f2190f90be75..59a487e8aba3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -108,6 +108,7 @@ struct panfrost_device { struct mutex sched_lock; struct { + struct workqueue_struct *wq; struct work_struct work; atomic_t pending; } reset; @@ -246,9 +247,8 @@ const char *panfrost_exception_name(u32 exception_code); static inline void panfrost_device_schedule_reset(struct panfrost_device *pfdev) { - /* Schedule a reset if there's no reset in progress.
*/ - if (!atomic_xchg(&pfdev->reset.pending, 1)) - schedule_work(&pfdev->reset.work); + atomic_set(&pfdev->reset.pending, 1); + queue_work(pfdev->reset.wq, &pfdev->reset.work); } #endif diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index e0c479e67304..98193a557a2d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -25,17 +25,8 @@ #define job_write(dev, reg, data) writel(data, dev->iomem + (reg)) #define job_read(dev, reg) readl(dev->iomem + (reg)) -enum panfrost_queue_status { - PANFROST_QUEUE_STATUS_ACTIVE, - PANFROST_QUEUE_STATUS_STOPPED, - PANFROST_QUEUE_STATUS_STARTING, - PANFROST_QUEUE_STATUS_FAULT_PENDING, -}; - struct panfrost_queue_state { struct drm_gpu_scheduler sched; - atomic_t status; - struct mutex lock; u64 fence_context; u64 emit_seqno; }; @@ -379,57 +370,72 @@ void panfrost_job_enable_interrupts(struct panfrost_device *pfdev) job_write(pfdev, JOB_INT_MASK, irq_mask); } -static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue, - struct drm_sched_job *bad) +static void panfrost_reset(struct panfrost_device *pfdev, + struct drm_sched_job *bad) { - enum panfrost_queue_status old_status; - bool stopped = false; + unsigned int i; + bool cookie; - mutex_lock(&queue->lock); - old_status = atomic_xchg(&queue->status, -PANFROST_QUEUE_STATUS_STOPPED); - if (old_status == PANFROST_QUEUE_STATUS_STOPPED) - goto out; + if (!atomic_read(&pfdev->reset.pending)) + return; + + /* Stop the schedulers. +* +* FIXME: We temporarily get out of the dma_fence_signalling section +* because the cleanup path generate lockdep splats when taking locks +* to release job resources. 
We should rework the code to follow this +* pattern: +* +* try_lock +* if (locked) +* release +* else +* schedule_work_to_release_later +*/ + for (i = 0; i < NUM_JOB_SLOTS; i++) + drm_sched_stop(&pfdev->js->queue[i].sched, bad); + + cookie = dma_fence_begin_signalling(); - WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE); - drm_sched_stop(&queue->sched, bad); if (bad) drm_sched_increase_karma(bad); - stopped = true; + spin_lock(&pfdev->js->job_lock); + for (i = 0; i < NUM_JOB_SLOTS; i++) { + if (pfdev->jobs[i]) { + pm_runtime_put_noidle(pfdev->dev); + panfrost_devfreq_record_idle(&pfdev->pfdevfreq); + pfdev->jobs[i] = NULL; + } + } + spin_unlock(&pfdev->js->job_lock); - /* -* Set the timeout to max so the timer doesn't get started -* when we return from the timeout handler (restore
[PATCH v5 07/16] drm/panfrost: Expose a helper to trigger a GPU reset
Expose a helper to trigger a GPU reset so we can easily trigger reset
operations outside the job timeout handler.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
Reviewed-by: Alyssa Rosenzweig
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 8 ++++++++
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 +---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 4c876476268f..f2190f90be75 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -243,4 +243,12 @@ enum drm_panfrost_exception_type {
 
 const char *panfrost_exception_name(u32 exception_code);
 
+static inline void
+panfrost_device_schedule_reset(struct panfrost_device *pfdev)
+{
+	/* Schedule a reset if there's no reset in progress. */
+	if (!atomic_xchg(&pfdev->reset.pending, 1))
+		schedule_work(&pfdev->reset.work);
+}
+
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 3cd1aec6c261..be8f68f63974 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -458,9 +458,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
 		return DRM_GPU_SCHED_STAT_NOMINAL;
 
-	/* Schedule a reset if there's no reset in progress. */
-	if (!atomic_xchg(&pfdev->reset.pending, 1))
-		schedule_work(&pfdev->reset.work);
+	panfrost_device_schedule_reset(pfdev);
 
 	return DRM_GPU_SCHED_STAT_NOMINAL;
 }
-- 
2.31.1
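The helper's `atomic_xchg()` is what makes reset scheduling idempotent: only the caller that flips `pending` from 0 to 1 actually queues the work, so concurrent fault paths can all request a reset without queueing it twice. A standalone C11 sketch of the same pattern (with `schedule_work()` replaced by a hypothetical counter, since there is no kernel workqueue here):

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int reset_pending;
static int work_queued;	/* stands in for the schedule_work() side effect */

/* Mirror of panfrost_device_schedule_reset(): atomic_exchange() returns
 * the previous value, so only the first caller (old value 0) queues work. */
static void schedule_reset(void)
{
	if (!atomic_exchange(&reset_pending, 1))
		work_queued++;
}

/* The reset handler clears the flag once it runs, re-arming the helper. */
static void reset_done(void)
{
	atomic_store(&reset_pending, 0);
}
```

The exchange is a single atomic read-modify-write, so two racing callers can never both see 0 and double-queue the work.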
[PATCH v5 12/16] drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck
Things are unlikely to resolve until we reset the GPU. Let's not wait
for other faults/timeouts to happen to trigger this reset.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 65e98c51cb66..5267c3a1f02f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -36,8 +36,11 @@ static int wait_ready(struct panfrost_device *pfdev, u32 as_nr)
 	ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr),
 		val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000);
 
-	if (ret)
+	if (ret) {
+		/* The GPU hung, let's trigger a reset */
+		panfrost_device_schedule_reset(pfdev);
 		dev_err(pfdev->dev, "AS_ACTIVE bit stuck\n");
+	}
 
 	return ret;
 }
-- 
2.31.1
[PATCH v5 08/16] drm/panfrost: Use a threaded IRQ for job interrupts
This should avoid switching to interrupt context when the GPU is under heavy use. v3: * Don't take the job_lock in panfrost_job_handle_irq() Signed-off-by: Boris Brezillon Acked-by: Alyssa Rosenzweig Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_job.c | 53 ++--- 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index be8f68f63974..e0c479e67304 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -470,19 +470,12 @@ static const struct drm_sched_backend_ops panfrost_sched_ops = { .free_job = panfrost_job_free }; -static irqreturn_t panfrost_job_irq_handler(int irq, void *data) +static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) { - struct panfrost_device *pfdev = data; - u32 status = job_read(pfdev, JOB_INT_STAT); int j; dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status); - if (!status) - return IRQ_NONE; - - pm_runtime_mark_last_busy(pfdev->dev); - for (j = 0; status; j++) { u32 mask = MK_JS_MASK(j); @@ -519,7 +512,6 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) if (status & JOB_INT_MASK_DONE(j)) { struct panfrost_job *job; - spin_lock(&pfdev->js->job_lock); job = pfdev->jobs[j]; /* Only NULL if job timeout occurred */ if (job) { @@ -531,21 +523,49 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) dma_fence_signal_locked(job->done_fence); pm_runtime_put_autosuspend(pfdev->dev); } - spin_unlock(&pfdev->js->job_lock); } status &= ~mask; } +} +static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data) +{ + struct panfrost_device *pfdev = data; + u32 status = job_read(pfdev, JOB_INT_RAWSTAT); + + while (status) { + pm_runtime_mark_last_busy(pfdev->dev); + + spin_lock(&pfdev->js->job_lock); + panfrost_job_handle_irq(pfdev, status); + spin_unlock(&pfdev->js->job_lock); + status = job_read(pfdev, JOB_INT_RAWSTAT); + } + + 
job_write(pfdev, JOB_INT_MASK, + GENMASK(16 + NUM_JOB_SLOTS - 1, 16) | + GENMASK(NUM_JOB_SLOTS - 1, 0)); return IRQ_HANDLED; } +static irqreturn_t panfrost_job_irq_handler(int irq, void *data) +{ + struct panfrost_device *pfdev = data; + u32 status = job_read(pfdev, JOB_INT_STAT); + + if (!status) + return IRQ_NONE; + + job_write(pfdev, JOB_INT_MASK, 0); + return IRQ_WAKE_THREAD; +} + static void panfrost_reset(struct work_struct *work) { struct panfrost_device *pfdev = container_of(work, struct panfrost_device, reset.work); - unsigned long flags; unsigned int i; bool cookie; @@ -575,7 +595,7 @@ static void panfrost_reset(struct work_struct *work) /* All timers have been stopped, we can safely reset the pending state. */ atomic_set(&pfdev->reset.pending, 0); - spin_lock_irqsave(&pfdev->js->job_lock, flags); + spin_lock(&pfdev->js->job_lock); for (i = 0; i < NUM_JOB_SLOTS; i++) { if (pfdev->jobs[i]) { pm_runtime_put_noidle(pfdev->dev); @@ -583,7 +603,7 @@ static void panfrost_reset(struct work_struct *work) pfdev->jobs[i] = NULL; } } - spin_unlock_irqrestore(&pfdev->js->job_lock, flags); + spin_unlock(&pfdev->js->job_lock); panfrost_device_reset(pfdev); @@ -610,8 +630,11 @@ int panfrost_job_init(struct panfrost_device *pfdev) if (irq <= 0) return -ENODEV; - ret = devm_request_irq(pfdev->dev, irq, panfrost_job_irq_handler, - IRQF_SHARED, KBUILD_MODNAME "-job", pfdev); + ret = devm_request_threaded_irq(pfdev->dev, irq, + panfrost_job_irq_handler, + panfrost_job_irq_handler_thread, + IRQF_SHARED, KBUILD_MODNAME "-job", + pfdev); if (ret) { dev_err(pfdev->dev, "failed to request job irq"); return ret; -- 2.31.1
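The hard-handler/thread split follows the usual masked-IRQ pattern: the hard handler masks job interrupts and returns IRQ_WAKE_THREAD, then the thread loops on JOB_INT_RAWSTAT until it reads zero (so events raised while it was processing are not lost) before restoring the mask. A single-threaded C model of that loop, with fake registers and hypothetical names standing in for the MMIO accessors:

```c
#include <assert.h>
#include <stdint.h>

/* Fake job registers: RAWSTAT latches events, MASK gates delivery. */
static uint32_t job_rawstat;
static uint32_t job_mask = 0x3;
static int handled;

static void fire_irq(uint32_t bits) { job_rawstat |= bits; }

static void handle_status(uint32_t status)
{
	job_rawstat &= ~status;	/* "ack" the slots we processed */
	handled++;
}

/* Hard handler: mask everything, let the thread do the real work. */
static void hard_handler(void)
{
	job_mask = 0;
}

/* Threaded handler: keep draining RAWSTAT so events raised while we
 * were processing are picked up before we unmask and return. */
static void thread_handler(void)
{
	uint32_t status = job_rawstat;

	while (status) {
		handle_status(status);
		status = job_rawstat;
	}
	job_mask = 0x3;
}
```

Re-reading RAWSTAT at the bottom of the loop is the important bit: with the interrupt line masked, a new event would otherwise sit latched until the next unrelated interrupt.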
[PATCH v5 05/16] drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name()
Currently unused. We'll add it back if we need per-GPU definitions. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.c | 2 +- drivers/gpu/drm/panfrost/panfrost_device.h | 2 +- drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- drivers/gpu/drm/panfrost/panfrost_mmu.c| 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index fbcf5edbe367..bce6b0aff05e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -292,7 +292,7 @@ void panfrost_device_fini(struct panfrost_device *pfdev) panfrost_clk_fini(pfdev); } -const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 exception_code) +const char *panfrost_exception_name(u32 exception_code) { switch (exception_code) { /* Non-Fault Status code */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 4c6bdea5537b..ade8a1974ee9 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -172,6 +172,6 @@ void panfrost_device_reset(struct panfrost_device *pfdev); int panfrost_device_resume(struct device *dev); int panfrost_device_suspend(struct device *dev); -const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 exception_code); +const char *panfrost_exception_name(u32 exception_code); #endif diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index 2aae636f1cf5..ec59f15940fb 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data) address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO); dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n", -fault_status & 0xFF, 
panfrost_exception_name(pfdev, fault_status), +fault_status & 0xFF, panfrost_exception_name(fault_status), address); if (state & GPU_IRQ_MULTIPLE_FAULT) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index d6c9698bca3b..3cd1aec6c261 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -500,7 +500,7 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", j, - panfrost_exception_name(pfdev, job_read(pfdev, JS_STATUS(j))), + panfrost_exception_name(job_read(pfdev, JS_STATUS(j))), job_read(pfdev, JS_HEAD_LO(j)), job_read(pfdev, JS_TAIL_LO(j))); diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d76dff201ea6..b4f0c673cd7f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -676,7 +676,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) "TODO", fault_status, (fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"), - exception_type, panfrost_exception_name(pfdev, exception_type), + exception_type, panfrost_exception_name(exception_type), access_type, access_type_name(pfdev, fault_status), source_id); -- 2.31.1
[PATCH v5 06/16] drm/panfrost: Do the exception -> string translation using a table
Do the exception -> string translation using a table. This way we get rid of those magic numbers and can easily add new fields if we need to attach extra information to exception types. v4: * Don't expose exception type to userspace * Merge the enum definition and the enum -> string table declaration in the same patch v3: * Drop the error field Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_device.c | 130 + drivers/gpu/drm/panfrost/panfrost_device.h | 69 +++ 2 files changed, 152 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index bce6b0aff05e..736854542b05 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -292,55 +292,91 @@ void panfrost_device_fini(struct panfrost_device *pfdev) panfrost_clk_fini(pfdev); } -const char *panfrost_exception_name(u32 exception_code) -{ - switch (exception_code) { - /* Non-Fault Status code */ - case 0x00: return "NOT_STARTED/IDLE/OK"; - case 0x01: return "DONE"; - case 0x02: return "INTERRUPTED"; - case 0x03: return "STOPPED"; - case 0x04: return "TERMINATED"; - case 0x08: return "ACTIVE"; - /* Job exceptions */ - case 0x40: return "JOB_CONFIG_FAULT"; - case 0x41: return "JOB_POWER_FAULT"; - case 0x42: return "JOB_READ_FAULT"; - case 0x43: return "JOB_WRITE_FAULT"; - case 0x44: return "JOB_AFFINITY_FAULT"; - case 0x48: return "JOB_BUS_FAULT"; - case 0x50: return "INSTR_INVALID_PC"; - case 0x51: return "INSTR_INVALID_ENC"; - case 0x52: return "INSTR_TYPE_MISMATCH"; - case 0x53: return "INSTR_OPERAND_FAULT"; - case 0x54: return "INSTR_TLS_FAULT"; - case 0x55: return "INSTR_BARRIER_FAULT"; - case 0x56: return "INSTR_ALIGN_FAULT"; - case 0x58: return "DATA_INVALID_FAULT"; - case 0x59: return "TILE_RANGE_FAULT"; - case 0x5A: return "ADDR_RANGE_FAULT"; - case 0x60: return "OUT_OF_MEMORY"; - /* GPU exceptions */ - case 
0x80: return "DELAYED_BUS_FAULT"; - case 0x88: return "SHAREABILITY_FAULT"; - /* MMU exceptions */ - case 0xC1: return "TRANSLATION_FAULT_LEVEL1"; - case 0xC2: return "TRANSLATION_FAULT_LEVEL2"; - case 0xC3: return "TRANSLATION_FAULT_LEVEL3"; - case 0xC4: return "TRANSLATION_FAULT_LEVEL4"; - case 0xC8: return "PERMISSION_FAULT"; - case 0xC9 ... 0xCF: return "PERMISSION_FAULT"; - case 0xD1: return "TRANSTAB_BUS_FAULT_LEVEL1"; - case 0xD2: return "TRANSTAB_BUS_FAULT_LEVEL2"; - case 0xD3: return "TRANSTAB_BUS_FAULT_LEVEL3"; - case 0xD4: return "TRANSTAB_BUS_FAULT_LEVEL4"; - case 0xD8: return "ACCESS_FLAG"; - case 0xD9 ... 0xDF: return "ACCESS_FLAG"; - case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT"; - case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT"; +#define PANFROST_EXCEPTION(id) \ + [DRM_PANFROST_EXCEPTION_ ## id] = { \ + .name = #id, \ } - return "UNKNOWN"; +struct panfrost_exception_info { + const char *name; +}; + +static const struct panfrost_exception_info panfrost_exception_infos[] = { + PANFROST_EXCEPTION(OK), + PANFROST_EXCEPTION(DONE), + PANFROST_EXCEPTION(INTERRUPTED), + PANFROST_EXCEPTION(STOPPED), + PANFROST_EXCEPTION(TERMINATED), + PANFROST_EXCEPTION(KABOOM), + PANFROST_EXCEPTION(EUREKA), + PANFROST_EXCEPTION(ACTIVE), + PANFROST_EXCEPTION(JOB_CONFIG_FAULT), + PANFROST_EXCEPTION(JOB_POWER_FAULT), + PANFROST_EXCEPTION(JOB_READ_FAULT), + PANFROST_EXCEPTION(JOB_WRITE_FAULT), + PANFROST_EXCEPTION(JOB_AFFINITY_FAULT), + PANFROST_EXCEPTION(JOB_BUS_FAULT), + PANFROST_EXCEPTION(INSTR_INVALID_PC), + PANFROST_EXCEPTION(INSTR_INVALID_ENC), + PANFROST_EXCEPTION(INSTR_TYPE_MISMATCH), + PANFROST_EXCEPTION(INSTR_OPERAND_FAULT), + PANFROST_EXCEPTION(INSTR_TLS_FAULT), + PANFROST_EXCEPTION(INSTR_BARRIER_FAULT), + PANFROST_EXCEPTION(INSTR_ALIGN_FAULT), + PANFROST_EXCEPTION(DATA_INVALID_FAULT), + PANFROST_EXCEPTION(TILE_RANGE_FAULT), + PANFROST_EXCEPTION(ADDR_RANGE_FAULT), + PANFROST_EXCEPTION(IMPRECISE_FAULT), + PANFROST_EXCEPTION(OOM), + PANFROST_EXCEPTION(OOM_AFBC), 
+ PANFROST_EXCEP
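The stringifying macro keeps the enum and the name table in sync automatically: the table index and the string are both generated from the same token. A compilable userspace reduction of the pattern (only a few demo enum values shown; holes in the table and out-of-range codes fall back to "UNKNOWN", like the patch's lookup does):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_exception_type {
	DEMO_EXCEPTION_OK               = 0x00,
	DEMO_EXCEPTION_DONE             = 0x01,
	DEMO_EXCEPTION_JOB_CONFIG_FAULT = 0x40,
};

struct demo_exception_info {
	const char *name;
};

/* Designated initializer keyed by the enum value, name generated by
 * the preprocessor so it can never drift from the enum. */
#define DEMO_EXCEPTION(id) \
	[DEMO_EXCEPTION_ ## id] = { .name = #id }

static const struct demo_exception_info demo_exception_infos[] = {
	DEMO_EXCEPTION(OK),
	DEMO_EXCEPTION(DONE),
	DEMO_EXCEPTION(JOB_CONFIG_FAULT),
};

static const char *demo_exception_name(unsigned int code)
{
	if (code >= sizeof(demo_exception_infos) / sizeof(demo_exception_infos[0]) ||
	    !demo_exception_infos[code].name)
		return "UNKNOWN";

	return demo_exception_infos[code].name;
}
```

Unlisted indices are zero-initialized by the designated initializer, so their `.name` is NULL and the lookup degrades gracefully instead of dereferencing garbage.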
[PATCH v5 11/16] drm/panfrost: Disable the AS on unhandled page faults
If we don't do that, we have to wait for the job timeout to expire before the faulty jobs get killed. v3: * Make sure the AS is re-enabled when new jobs are submitted to the context Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_mmu.c| 34 -- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 59a487e8aba3..2dc8c0d1d987 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -96,6 +96,7 @@ struct panfrost_device { spinlock_t as_lock; unsigned long as_in_use_mask; unsigned long as_alloc_mask; + unsigned long as_faulty_mask; struct list_head as_lru_list; struct panfrost_job_slot *js; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index b4f0c673cd7f..65e98c51cb66 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -154,6 +154,7 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, struct panfrost_mmu *mmu) as = mmu->as; if (as >= 0) { int en = atomic_inc_return(&mmu->as_count); + u32 mask = BIT(as) | BIT(16 + as); /* * AS can be retained by active jobs or a perfcnt context, @@ -162,6 +163,18 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, struct panfrost_mmu *mmu) WARN_ON(en >= (NUM_JOB_SLOTS + 1)); list_move(&mmu->list, &pfdev->as_lru_list); + + if (pfdev->as_faulty_mask & mask) { + /* Unhandled pagefault on this AS, the MMU was +* disabled. We need to re-enable the MMU after +* clearing+unmasking the AS interrupts.
+*/ + mmu_write(pfdev, MMU_INT_CLEAR, mask); + mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask); + pfdev->as_faulty_mask &= ~mask; + panfrost_mmu_enable(pfdev, mmu); + } + goto out; } @@ -211,6 +224,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev) spin_lock(&pfdev->as_lock); pfdev->as_alloc_mask = 0; + pfdev->as_faulty_mask = 0; list_for_each_entry_safe(mmu, mmu_tmp, &pfdev->as_lru_list, list) { mmu->as = -1; @@ -662,7 +676,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0) ret = panfrost_mmu_map_fault_addr(pfdev, as, addr); - if (ret) + if (ret) { /* terminal fault, print info about the fault */ dev_err(pfdev->dev, "Unhandled Page fault in AS%d at VA 0x%016llX\n" @@ -680,14 +694,28 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) access_type, access_type_name(pfdev, fault_status), source_id); + spin_lock(&pfdev->as_lock); + /* Ignore MMU interrupts on this AS until it's been +* re-enabled. +*/ + pfdev->as_faulty_mask |= mask; + + /* Disable the MMU to kill jobs on this AS. */ + panfrost_mmu_disable(pfdev, as); + spin_unlock(&pfdev->as_lock); + } + status &= ~mask; /* If we received new MMU interrupts, process them before returning. */ if (!status) - status = mmu_read(pfdev, MMU_INT_RAWSTAT); + status = mmu_read(pfdev, MMU_INT_RAWSTAT) & ~pfdev->as_faulty_mask; } - mmu_write(pfdev, MMU_INT_MASK, ~0); + spin_lock(&pfdev->as_lock); + mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask); + spin_unlock(&pfdev->as_lock); + return IRQ_HANDLED; }; -- 2.31.1
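Each AS contributes two MMU interrupt bits: BIT(as) for the page fault and BIT(16 + as) for the bus fault, which is why the patch builds `mask = BIT(as) | BIT(16 + as)` and keeps both masked via `as_faulty_mask` until the AS is re-enabled. A small C check of that bookkeeping (hypothetical helper names, with the returned value standing in for what would be written to MMU_INT_MASK):

```c
#include <assert.h>

#define BIT(n) (1UL << (n))

static unsigned long as_faulty_mask;

/* On an unhandled fault, remember both interrupt bits of that AS so
 * the IRQ handler keeps them masked until the AS is re-enabled. */
static unsigned long as_fault(unsigned int as)
{
	unsigned long mask = BIT(as) | BIT(16 + as);

	as_faulty_mask |= mask;
	return ~as_faulty_mask;		/* new MMU_INT_MASK value */
}

/* When new jobs re-acquire the AS, clear its bits and unmask again. */
static unsigned long as_reenable(unsigned int as)
{
	as_faulty_mask &= ~(BIT(as) | BIT(16 + as));
	return ~as_faulty_mask;
}
```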
[PATCH v5 14/16] drm/panfrost: Kill in-flight jobs on FD close
If the process that submitted these jobs decided to close the FD before the jobs are done, it probably means it doesn't care about the result. v5: * Add a panfrost_exception_is_fault() helper and the DRM_PANFROST_EXCEPTION_MAX_NON_FAULT value v4: * Don't disable/restore irqs when taking the job_lock (not needed since this lock is never taken from an interrupt context) v3: * Set fence error to ECANCELED when a TERMINATED exception is received Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_device.h | 7 drivers/gpu/drm/panfrost/panfrost_job.c| 42 ++ 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index d91f71366214..d2ee6e5fe5d8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -183,6 +183,7 @@ enum drm_panfrost_exception_type { DRM_PANFROST_EXCEPTION_KABOOM = 0x05, DRM_PANFROST_EXCEPTION_EUREKA = 0x06, DRM_PANFROST_EXCEPTION_ACTIVE = 0x08, + DRM_PANFROST_EXCEPTION_MAX_NON_FAULT = 0x3f, DRM_PANFROST_EXCEPTION_JOB_CONFIG_FAULT = 0x40, DRM_PANFROST_EXCEPTION_JOB_POWER_FAULT = 0x41, DRM_PANFROST_EXCEPTION_JOB_READ_FAULT = 0x42, @@ -243,6 +244,12 @@ enum drm_panfrost_exception_type { DRM_PANFROST_EXCEPTION_MEM_ATTR_NONCACHE_3 = 0xef, }; +static inline bool +panfrost_exception_is_fault(u32 exception_code) +{ + return exception_code > DRM_PANFROST_EXCEPTION_MAX_NON_FAULT; +} + const char *panfrost_exception_name(u32 exception_code); bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, u32 exception_code); diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index b0f4857ca084..d8e1bc227455 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -499,14 +499,21 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) if (status & JOB_INT_MASK_ERR(j)) { u32 js_status
= job_read(pfdev, JS_STATUS(j)); + const char *exception_name = panfrost_exception_name(js_status); job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP); - dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", - j, - panfrost_exception_name(js_status), - job_read(pfdev, JS_HEAD_LO(j)), - job_read(pfdev, JS_TAIL_LO(j))); + if (!panfrost_exception_is_fault(js_status)) { + dev_dbg(pfdev->dev, "js interrupt, js=%d, status=%s, head=0x%x, tail=0x%x", + j, exception_name, + job_read(pfdev, JS_HEAD_LO(j)), + job_read(pfdev, JS_TAIL_LO(j))); + } else { + dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", + j, exception_name, + job_read(pfdev, JS_HEAD_LO(j)), + job_read(pfdev, JS_TAIL_LO(j))); + } /* If we need a reset, signal it to the timeout * handler, otherwise, update the fence error field and @@ -515,7 +522,16 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) if (panfrost_exception_needs_reset(pfdev, js_status)) { drm_sched_fault(&pfdev->js->queue[j].sched); } else { - dma_fence_set_error(pfdev->jobs[j]->done_fence, -EINVAL); + int error = 0; + + if (js_status == DRM_PANFROST_EXCEPTION_TERMINATED) + error = -ECANCELED; + else if (panfrost_exception_is_fault(js_status)) + error = -EINVAL; + + if (error) + dma_fence_set_error(pfdev->jobs[j]->done_fence, error); + status |= JOB_INT_MASK_DONE(j); } } @@ -681,10 +697,24 @@ int panfrost_job_open(struct panfrost_file_priv *panfrost_priv) void panfrost_job_close(struct panfrost_file_priv *panfrost_priv) { + struct panfrost_device *pfdev = panfrost_priv->pfdev; int i; for (i = 0; i < NUM_JOB_SLOTS; i++) drm_sch
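Exception codes at or below 0x3f report job-state transitions, not faults, so the new helper is a single comparison; combined with the error mapping in the IRQ handler it decides what error, if any, lands in the done fence. A compilable reduction (demo constants stand in for the DRM_PANFROST_EXCEPTION_* values and the -ECANCELED/-EINVAL errnos):

```c
#include <assert.h>

#define DEMO_EXCEPTION_DONE          0x01
#define DEMO_EXCEPTION_TERMINATED    0x04
#define DEMO_EXCEPTION_MAX_NON_FAULT 0x3f
#define DEMO_EXCEPTION_JOB_BUS_FAULT 0x48

#define DEMO_ECANCELED (-125)
#define DEMO_EINVAL    (-22)

/* Everything above 0x3f is a real fault; lower codes are just job
 * state updates (DONE, STOPPED, TERMINATED, ...). */
static int exception_is_fault(unsigned int code)
{
	return code > DEMO_EXCEPTION_MAX_NON_FAULT;
}

/* Fence error selection from the patch: TERMINATED means the job was
 * cancelled on purpose, faults are reported as invalid jobs. */
static int fence_error(unsigned int code)
{
	if (code == DEMO_EXCEPTION_TERMINATED)
		return DEMO_ECANCELED;
	if (exception_is_fault(code))
		return DEMO_EINVAL;
	return 0;
}
```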
[PATCH v5 13/16] drm/panfrost: Don't reset the GPU on job faults unless we really have to
If we can recover from a fault without a reset, there's no reason to issue one. v3: * Drop the mention of Valhall requiring a reset on JOB_BUS_FAULT * Set the fence error to -EINVAL instead of having per-exception error codes Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.c | 9 + drivers/gpu/drm/panfrost/panfrost_device.h | 2 ++ drivers/gpu/drm/panfrost/panfrost_job.c| 16 ++-- 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index 736854542b05..f4e42009526d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -379,6 +379,15 @@ const char *panfrost_exception_name(u32 exception_code) return panfrost_exception_infos[exception_code].name; } +bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, + u32 exception_code) +{ + /* Right now, none of the GPUs we support need a reset, but this +* might change.
+*/ + return false; +} + void panfrost_device_reset(struct panfrost_device *pfdev) { panfrost_gpu_soft_reset(pfdev); diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 2dc8c0d1d987..d91f71366214 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -244,6 +244,8 @@ enum drm_panfrost_exception_type { }; const char *panfrost_exception_name(u32 exception_code); +bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, + u32 exception_code); static inline void panfrost_device_schedule_reset(struct panfrost_device *pfdev) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 4bd4d11377b7..b0f4857ca084 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -498,14 +498,26 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) job_write(pfdev, JOB_INT_CLEAR, mask); if (status & JOB_INT_MASK_ERR(j)) { + u32 js_status = job_read(pfdev, JS_STATUS(j)); + job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP); dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", j, - panfrost_exception_name(job_read(pfdev, JS_STATUS(j))), + panfrost_exception_name(js_status), job_read(pfdev, JS_HEAD_LO(j)), job_read(pfdev, JS_TAIL_LO(j))); - drm_sched_fault(&pfdev->js->queue[j].sched); + + /* If we need a reset, signal it to the timeout +* handler, otherwise, update the fence error field and +* signal the job fence. +*/ + if (panfrost_exception_needs_reset(pfdev, js_status)) { + drm_sched_fault(&pfdev->js->queue[j].sched); + } else { + dma_fence_set_error(pfdev->jobs[j]->done_fence, -EINVAL); + status |= JOB_INT_MASK_DONE(j); + } } if (status & JOB_INT_MASK_DONE(j)) { -- 2.31.1
[PATCH v5 15/16] drm/panfrost: Queue jobs on the hardware
From: Steven Price The hardware has a set of '_NEXT' registers that can hold a second job while the first is executing. Make use of these registers to enqueue a second job per slot. v5: * Fix a comment in panfrost_job_init() v3: * Fix the done/err job dequeuing logic to get a valid active state * Only enable the second slot on GPUs supporting jobchain disambiguation * Split interrupt handling in sub-functions Signed-off-by: Steven Price Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_device.h | 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 468 +++-- 2 files changed, 351 insertions(+), 119 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index d2ee6e5fe5d8..9f1f2a603208 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -101,7 +101,7 @@ struct panfrost_device { struct panfrost_job_slot *js; - struct panfrost_job *jobs[NUM_JOB_SLOTS]; + struct panfrost_job *jobs[NUM_JOB_SLOTS][2]; struct list_head scheduled_jobs; struct panfrost_perfcnt *perfcnt; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index d8e1bc227455..e89e6adef481 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -140,9 +141,52 @@ static void panfrost_job_write_affinity(struct panfrost_device *pfdev, job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32); } +static u32 +panfrost_get_job_chain_flag(const struct panfrost_job *job) +{ + struct panfrost_fence *f = to_panfrost_fence(job->done_fence); + + if (!panfrost_has_hw_feature(job->pfdev, HW_FEATURE_JOBCHAIN_DISAMBIGUATION)) + return 0; + + return (f->seqno & 1) ? 
JS_CONFIG_JOB_CHAIN_FLAG : 0; +} + +static struct panfrost_job * +panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) +{ + struct panfrost_job *job = pfdev->jobs[slot][0]; + + WARN_ON(!job); + pfdev->jobs[slot][0] = pfdev->jobs[slot][1]; + pfdev->jobs[slot][1] = NULL; + + return job; +} + +static unsigned int +panfrost_enqueue_job(struct panfrost_device *pfdev, int slot, +struct panfrost_job *job) +{ + if (WARN_ON(!job)) + return 0; + + if (!pfdev->jobs[slot][0]) { + pfdev->jobs[slot][0] = job; + return 0; + } + + WARN_ON(pfdev->jobs[slot][1]); + pfdev->jobs[slot][1] = job; + WARN_ON(panfrost_get_job_chain_flag(job) == + panfrost_get_job_chain_flag(pfdev->jobs[slot][0])); + return 1; +} + static void panfrost_job_hw_submit(struct panfrost_job *job, int js) { struct panfrost_device *pfdev = job->pfdev; + unsigned int subslot; u32 cfg; u64 jc_head = job->jc; int ret; @@ -168,7 +212,8 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) * start */ cfg |= JS_CONFIG_THREAD_PRI(8) | JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE | - JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE | + panfrost_get_job_chain_flag(job); if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION)) cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; @@ -182,10 +227,17 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) job_write(pfdev, JS_FLUSH_ID_NEXT(js), job->flush_id); /* GO ! 
*/ - dev_dbg(pfdev->dev, "JS: Submitting atom %p to js[%d] with head=0x%llx", - job, js, jc_head); - job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); + spin_lock(&pfdev->js->job_lock); + subslot = panfrost_enqueue_job(pfdev, js, job); + /* Don't queue the job if a reset is in progress */ + if (!atomic_read(&pfdev->reset.pending)) { + job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); + dev_dbg(pfdev->dev, + "JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d", + job, js, subslot, jc_head, cfg & 0xf); + } + spin_unlock(&pfdev->js->job_lock); } static void panfrost_acquire_object_fences(struct drm_gem_object **bos, @@ -343,7 +395,11 @@ static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job) if (unlikely(job->base.s_fence->finished.error)) return NULL; - pfdev->jobs[slot] = job; + /* Nothing to execute: can happen if the job has finished while +* we were resetting the GPU. +*/ +
[PATCH v5 04/16] drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition
Exception types will be defined as an enum. v4: * Fix typo in the commit message Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_regs.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index eddaa62ad8b0..151cfebd80a0 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -261,9 +261,6 @@ #define JS_COMMAND_SOFT_STOP_1 0x06/* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ #define JS_COMMAND_HARD_STOP_1 0x07/* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_STATUS_EVENT_ACTIVE 0x08 - - /* MMU regs */ #define MMU_INT_RAWSTAT0x2000 #define MMU_INT_CLEAR 0x2004 -- 2.31.1
[PATCH v5 16/16] drm/panfrost: Increase the AS_ACTIVE polling timeout
Experience has shown that 1ms is sometimes not enough, even when the GPU is running at its maximum frequency, not to mention that an MMU operation might take longer if the GPU is running at a lower frequency, which is likely to be the case if devfreq is active. Let's pick a significantly bigger timeout value (1ms -> 100ms) to be on the safe side. v5: * New patch Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 5267c3a1f02f..a32a3df5358e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -34,7 +34,7 @@ static int wait_ready(struct panfrost_device *pfdev, u32 as_nr) /* Wait for the MMU status to indicate there is no active command, in * case one is pending. */ ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr), - val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000); + val, !(val & AS_STATUS_AS_ACTIVE), 10, 100000); if (ret) { /* The GPU hung, let's trigger a reset */ -- 2.31.1
[PATCH v5 10/16] drm/panfrost: Make sure job interrupts are masked before resetting
This is not yet needed because we let active jobs be killed by the reset and we don't really bother making sure they can be restarted. But once we start adding soft-stop support, controlling when we deal with the remaining interrupts and making sure those are handled before the reset is issued gets tricky if we keep job interrupts active. Let's prepare for that and mask+flush job IRQs before issuing a reset. v4: * Add a comment explaining why we WARN_ON(!job) in the irq handler * Keep taking the job_lock when evicting stalled jobs v3: * New patch Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_job.c | 27 - 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 98193a557a2d..4bd4d11377b7 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -34,6 +34,7 @@ struct panfrost_queue_state { struct panfrost_job_slot { struct panfrost_queue_state queue[NUM_JOB_SLOTS]; spinlock_t job_lock; + int irq; }; static struct panfrost_job * @@ -400,6 +401,15 @@ static void panfrost_reset(struct panfrost_device *pfdev, if (bad) drm_sched_increase_karma(bad); + /* Mask job interrupts and synchronize to make sure we won't be +* interrupted during our reset. +*/ + job_write(pfdev, JOB_INT_MASK, 0); + synchronize_irq(pfdev->js->irq); + + /* Schedulers are stopped and interrupts are masked+flushed, we don't +* need to protect the 'evict unfinished jobs' lock with the job_lock. 
+*/ spin_lock(&pfdev->js->job_lock); for (i = 0; i < NUM_JOB_SLOTS; i++) { if (pfdev->jobs[i]) { @@ -502,7 +512,14 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) struct panfrost_job *job; job = pfdev->jobs[j]; - /* Only NULL if job timeout occurred */ + /* The only reason this job could be NULL is if the +* job IRQ handler is called just after the +* in-flight job eviction in the reset path, and +* this shouldn't happen because the job IRQ has +* been masked and synchronized when this eviction +* happens. +*/ + WARN_ON(!job); if (job) { pfdev->jobs[j] = NULL; @@ -562,7 +579,7 @@ static void panfrost_reset_work(struct work_struct *work) int panfrost_job_init(struct panfrost_device *pfdev) { struct panfrost_job_slot *js; - int ret, j, irq; + int ret, j; INIT_WORK(&pfdev->reset.work, panfrost_reset_work); @@ -572,11 +589,11 @@ int panfrost_job_init(struct panfrost_device *pfdev) spin_lock_init(&js->job_lock); - irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job"); - if (irq <= 0) + js->irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job"); + if (js->irq <= 0) return -ENODEV; - ret = devm_request_threaded_irq(pfdev->dev, irq, + ret = devm_request_threaded_irq(pfdev->dev, js->irq, panfrost_job_irq_handler, panfrost_job_irq_handler_thread, IRQF_SHARED, KBUILD_MODNAME "-job", -- 2.31.1
Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
On Tue, 29 Jun 2021 10:50:36 +0200 Daniel Vetter wrote: > On Tue, Jun 29, 2021 at 09:34:56AM +0200, Boris Brezillon wrote: > > Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU > > reset. This leads to extra complexity when we need to synchronize timeout > > works with the reset work. One solution to address that is to have an > > ordered workqueue at the driver level that will be used by the different > > schedulers to queue their timeout work. Thanks to the serialization > > provided by the ordered workqueue we are guaranteed that timeout > > handlers are executed sequentially, and can thus easily reset the GPU > > from the timeout handler without extra synchronization. > > > > v5: > > * Add a new paragraph to the timedout_job() method > > > > v3: > > * New patch > > > > v4: > > * Actually use the timeout_wq to queue the timeout work > > > > Signed-off-by: Boris Brezillon > > Reviewed-by: Steven Price > > Reviewed-by: Lucas Stach > > Cc: Qiang Yu > > Cc: Emma Anholt > > Cc: Alex Deucher > > Cc: "Christian König" > > Acked-by: Daniel Vetter > > Also since I'm occasionally blinded by my own pride, add suggested-by: me? Duh, it's an oversight (I thought I had that 'Suggested-by: Daniel Vetter ...' already). > I did spend quite a bit pondering how to untangle your various lockdep > splats in the trd handler :-) And I'm grateful for your help ;-).
Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
Hi Christian, On Tue, 29 Jun 2021 13:03:58 +0200 Christian König wrote: > Am 29.06.21 um 09:34 schrieb Boris Brezillon: > > Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU > > reset. This leads to extra complexity when we need to synchronize timeout > > works with the reset work. One solution to address that is to have an > > ordered workqueue at the driver level that will be used by the different > > schedulers to queue their timeout work. Thanks to the serialization > > provided by the ordered workqueue we are guaranteed that timeout > > handlers are executed sequentially, and can thus easily reset the GPU > > from the timeout handler without extra synchronization. > > Well, we had already tried this and it didn't worked the way it is expected. > > The major problem is that you not only want to serialize the queue, but > rather have a single reset for all queues. > > Otherwise you schedule multiple resets for each hardware queue. E.g. for > your 3 hardware queues you would reset the GPU 3 times if all of them > time out at the same time (which is rather likely). > > Using a single delayed work item doesn't work either because you then > only have one timeout. > > What could be done is to cancel all delayed work items from all stopped > schedulers. drm_sched_stop() does that already, and since we call drm_sched_stop() on all queues in the timeout handler, we end up with only one global reset happening even if several queues report a timeout at the same time. Regards, Boris
Re: [PATCH v5 09/16] drm/panfrost: Simplify the reset serialization logic
On Tue, 29 Jun 2021 09:35:03 +0200 Boris Brezillon wrote: > @@ -379,57 +370,72 @@ void panfrost_job_enable_interrupts(struct > panfrost_device *pfdev) > job_write(pfdev, JOB_INT_MASK, irq_mask); > } > > -static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue, > - struct drm_sched_job *bad) > +static void panfrost_reset(struct panfrost_device *pfdev, > +struct drm_sched_job *bad) > { > - enum panfrost_queue_status old_status; > - bool stopped = false; > + unsigned int i; > + bool cookie; > > - mutex_lock(&queue->lock); > - old_status = atomic_xchg(&queue->status, > - PANFROST_QUEUE_STATUS_STOPPED); > - if (old_status == PANFROST_QUEUE_STATUS_STOPPED) > - goto out; > + if (!atomic_read(&pfdev->reset.pending)) > + return; > + > + /* Stop the schedulers. > + * > + * FIXME: We temporarily get out of the dma_fence_signalling section > + * because the cleanup path generate lockdep splats when taking locks > + * to release job resources. We should rework the code to follow this > + * pattern: > + * > + * try_lock > + * if (locked) > + * release > + * else > + * schedule_work_to_release_later > + */ > + for (i = 0; i < NUM_JOB_SLOTS; i++) > + drm_sched_stop(&pfdev->js->queue[i].sched, bad); > + > + cookie = dma_fence_begin_signalling(); > > - WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE); > - drm_sched_stop(&queue->sched, bad); > if (bad) > drm_sched_increase_karma(bad); > > - stopped = true; > + spin_lock(&pfdev->js->job_lock); > + for (i = 0; i < NUM_JOB_SLOTS; i++) { > + if (pfdev->jobs[i]) { > + pm_runtime_put_noidle(pfdev->dev); > + panfrost_devfreq_record_idle(&pfdev->pfdevfreq); > + pfdev->jobs[i] = NULL; > + } > + } > + spin_unlock(&pfdev->js->job_lock); > > - /* > - * Set the timeout to max so the timer doesn't get started > - * when we return from the timeout handler (restored in > - * panfrost_scheduler_start()). 
> + panfrost_device_reset(pfdev); > + > + /* GPU has been reset, we can cancel timeout/fault work that may have > + * been queued in the meantime and clear the reset pending bit. >*/ > - queue->sched.timeout = MAX_SCHEDULE_TIMEOUT; > + atomic_set(&pfdev->reset.pending, 0); > + for (i = 0; i < NUM_JOB_SLOTS; i++) > + cancel_delayed_work(&pfdev->js->queue[i].sched.work_tdr); > Those cancel_delayed_work() calls are useless, drm_sched_stop() canceled those works already. I'll get rid of them in v6.
[PATCH v6 01/16] drm/sched: Document what the timedout_job method should do
The documentation is a bit vague and doesn't really describe what the ->timedout_job() is expected to do. Let's add a few more details. v5: * New patch Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon Reviewed-by: Daniel Vetter --- include/drm/gpu_scheduler.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index d18af49fd009..aa90ed1f1b2b 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -239,6 +239,20 @@ struct drm_sched_backend_ops { * @timedout_job: Called when a job has taken too long to execute, * to trigger GPU recovery. * +* This method is called in a workqueue context. +* +* Drivers typically issue a reset to recover from GPU hangs, and this +* procedure usually follows the following workflow: +* +* 1. Stop the scheduler using drm_sched_stop(). This will park the +*scheduler thread and cancel the timeout work, guaranteeing that +*nothing is queued while we reset the hardware queue +* 2. Try to gracefully stop non-faulty jobs (optional) +* 3. Issue a GPU reset (driver-specific) +* 4. Re-submit jobs using drm_sched_resubmit_jobs() +* 5. Restart the scheduler using drm_sched_start(). At that point, new +*jobs can be queued, and the scheduler thread is unblocked +* * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal, * and the underlying driver has started or completed recovery. * -- 2.31.1
[PATCH v6 00/16] drm/panfrost
Hello, Bunch of improvements to make the panfrost driver more robust and allow queuing jobs at the HW level. Changes in v6: * Collected acks/reviews * Got rid of the cancel_delayed_work() calls in the reset path Changes in this v5: * Document what's expected in the ->timedout_job() hook * Add a patch increasing the AS_ACTIVE polling timeout * Fix a few minor things here and there (see each commit for a detailed changelog) and collect R-b/A-b tags Changes in this v4: * fixing the reset serialization * fixing a deadlock in the reset path * moving the exception enum to a private header Regards, Boris Boris Brezillon (15): drm/sched: Document what the timedout_job method should do drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name() drm/panfrost: Do the exception -> string translation using a table drm/panfrost: Expose a helper to trigger a GPU reset drm/panfrost: Use a threaded IRQ for job interrupts drm/panfrost: Simplify the reset serialization logic drm/panfrost: Make sure job interrupts are masked before resetting drm/panfrost: Disable the AS on unhandled page faults drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck drm/panfrost: Don't reset the GPU on job faults unless we really have to drm/panfrost: Kill in-flight jobs on FD close drm/panfrost: Increase the AS_ACTIVE polling timeout Steven Price (1): drm/panfrost: Queue jobs on the hardware drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_sched.c| 3 +- drivers/gpu/drm/lima/lima_sched.c | 3 +- drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++-- drivers/gpu/drm/panfrost/panfrost_device.h | 91 ++- drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 626 +++-- drivers/gpu/drm/panfrost/panfrost_mmu.c| 43 +- 
drivers/gpu/drm/panfrost/panfrost_regs.h | 3 - drivers/gpu/drm/scheduler/sched_main.c | 14 +- drivers/gpu/drm/v3d/v3d_sched.c| 10 +- include/drm/gpu_scheduler.h| 37 +- 12 files changed, 717 insertions(+), 256 deletions(-) -- 2.31.1
[PATCH v6 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Lucas Stach Acked-by: Daniel Vetter Acked-by: Christian König Cc: Qiang Yu Cc: Emma Anholt Cc: Alex Deucher Cc: "Christian König" --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++- drivers/gpu/drm/lima/lima_sched.c | 3 ++- drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++- drivers/gpu/drm/scheduler/sched_main.c| 14 +- drivers/gpu/drm/v3d/v3d_sched.c | 10 +- include/drm/gpu_scheduler.h | 23 ++- 7 files changed, 43 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 72d9b92b1754..d4547d195173 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -490,7 +490,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, num_hw_submission, amdgpu_job_hang_limit, - timeout, sched_score, ring->name); + timeout, NULL, sched_score, ring->name); if (r) { DRM_ERROR("Failed to create scheduler on ring %s.\n", ring->name); diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 19826e504efc..feb6da1b6ceb 100644 --- 
a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -190,7 +190,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, -msecs_to_jiffies(500), NULL, dev_name(gpu->dev)); +msecs_to_jiffies(500), NULL, NULL, +dev_name(gpu->dev)); if (ret) return ret; diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index ecf3267334ff..dba8329937a3 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -508,7 +508,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name) INIT_WORK(&pipe->recover_work, lima_sched_recover_work); return drm_sched_init(&pipe->base, &lima_sched_ops, 1, - lima_job_hang_limit, msecs_to_jiffies(timeout), + lima_job_hang_limit, + msecs_to_jiffies(timeout), NULL, NULL, name); } diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index beb62c8fc851..17bc5e3bfe0e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -615,7 +615,8 @@ int panfrost_job_init(struct panfrost_device *pfdev) ret = drm_sched_init(&js->queue[j].sched, &panfrost_sched_ops, -1, 0, msecs_to_jiffies(JOB_TIMEOUT_MS), +1, 0, +msecs_to_jiffies(JOB_TIMEOUT_MS), NULL, NULL, "pan_js"); if (ret) { dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 60125fbe7bb5..67382621b429 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -232,7 +232,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched) { if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !list_empty(&sched->pending_list)) - schedule_delayed_work(&sched->work_tdr, sched->timeout); + queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout); } /** @@ 
-244,7 +244,7 @@ static void drm_sched_sta
[PATCH v6 04/16] drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition
Exception types will be defined as an enum. v4: * Fix typo in the commit message Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_regs.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index dc9df5457f1c..1940ff86e49a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -262,9 +262,6 @@ #define JS_COMMAND_SOFT_STOP_1 0x06/* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ #define JS_COMMAND_HARD_STOP_1 0x07/* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_STATUS_EVENT_ACTIVE 0x08 - - /* MMU regs */ #define MMU_INT_RAWSTAT0x2000 #define MMU_INT_CLEAR 0x2004 -- 2.31.1
[PATCH v6 03/16] drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate
If the fence creation fails, we can return the error pointer directly. The core will update the fence error accordingly. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 17bc5e3bfe0e..3c1dbae3ebdd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -344,7 +344,7 @@ static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job) fence = panfrost_fence_create(pfdev, slot); if (IS_ERR(fence)) - return NULL; + return fence; if (job->done_fence) dma_fence_put(job->done_fence); -- 2.31.1
[PATCH v6 09/16] drm/panfrost: Simplify the reset serialization logic
Now that we can pass our own workqueue to drm_sched_init(), we can use an ordered workqueue for both the scheduler timeout tdr and our own reset work (which we use when the reset is not caused by a fault/timeout on a specific job, like when we have the AS_ACTIVE bit stuck). This guarantees that the timeout handlers and reset handler can't run concurrently, which drastically simplifies the locking. v5: * Don't call cancel_delayed_work() in the reset path (those works are canceled in drm_sched_stop()) v4: * Actually pass the reset workqueue to drm_sched_init() * Don't call cancel_work_sync() in panfrost_reset(). It will deadlock since it might be called from the reset work, which is executing and cancel_work_sync() will wait for the handler to return. Checking the reset pending status should avoid spurious resets v3: * New patch Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.h | 6 +- drivers/gpu/drm/panfrost/panfrost_job.c| 186 - 2 files changed, 69 insertions(+), 123 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index aec11b6f5abc..d37bc09809e8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -109,6 +109,7 @@ struct panfrost_device { struct mutex sched_lock; struct { + struct workqueue_struct *wq; struct work_struct work; atomic_t pending; } reset; @@ -247,9 +248,8 @@ const char *panfrost_exception_name(u32 exception_code); static inline void panfrost_device_schedule_reset(struct panfrost_device *pfdev) { - /* Schedule a reset if there's no reset in progress. 
*/ - if (!atomic_xchg(&pfdev->reset.pending, 1)) - schedule_work(&pfdev->reset.work); + atomic_set(&pfdev->reset.pending, 1); + queue_work(pfdev->reset.wq, &pfdev->reset.work); } #endif diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 6149e4c7133f..59c23c91e47c 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -25,17 +25,8 @@ #define job_write(dev, reg, data) writel(data, dev->iomem + (reg)) #define job_read(dev, reg) readl(dev->iomem + (reg)) -enum panfrost_queue_status { - PANFROST_QUEUE_STATUS_ACTIVE, - PANFROST_QUEUE_STATUS_STOPPED, - PANFROST_QUEUE_STATUS_STARTING, - PANFROST_QUEUE_STATUS_FAULT_PENDING, -}; - struct panfrost_queue_state { struct drm_gpu_scheduler sched; - atomic_t status; - struct mutex lock; u64 fence_context; u64 emit_seqno; }; @@ -368,57 +359,67 @@ void panfrost_job_enable_interrupts(struct panfrost_device *pfdev) job_write(pfdev, JOB_INT_MASK, irq_mask); } -static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue, - struct drm_sched_job *bad) +static void panfrost_reset(struct panfrost_device *pfdev, + struct drm_sched_job *bad) { - enum panfrost_queue_status old_status; - bool stopped = false; + unsigned int i; + bool cookie; - mutex_lock(&queue->lock); - old_status = atomic_xchg(&queue->status, -PANFROST_QUEUE_STATUS_STOPPED); - if (old_status == PANFROST_QUEUE_STATUS_STOPPED) - goto out; + if (!atomic_read(&pfdev->reset.pending)) + return; + + /* Stop the schedulers. +* +* FIXME: We temporarily get out of the dma_fence_signalling section +* because the cleanup path generate lockdep splats when taking locks +* to release job resources. 
We should rework the code to follow this +* pattern: +* +* try_lock +* if (locked) +* release +* else +* schedule_work_to_release_later +*/ + for (i = 0; i < NUM_JOB_SLOTS; i++) + drm_sched_stop(&pfdev->js->queue[i].sched, bad); + + cookie = dma_fence_begin_signalling(); - WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE); - drm_sched_stop(&queue->sched, bad); if (bad) drm_sched_increase_karma(bad); - stopped = true; + spin_lock(&pfdev->js->job_lock); + for (i = 0; i < NUM_JOB_SLOTS; i++) { + if (pfdev->jobs[i]) { + pm_runtime_put_noidle(pfdev->dev); + panfrost_devfreq_record_idle(&pfdev->pfdevfreq); + pfdev->jobs[i] = NULL; + } + } + spin_unlock(&pfdev->js->job_lock); -
[PATCH v6 05/16] drm/panfrost: Drop the pfdev argument passed to panfrost_exception_name()
Currently unused. We'll add it back if we need per-GPU definitions. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.c | 2 +- drivers/gpu/drm/panfrost/panfrost_device.h | 2 +- drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- drivers/gpu/drm/panfrost/panfrost_mmu.c| 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index a2a09c51eed7..f7f5ca94f910 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -292,7 +292,7 @@ void panfrost_device_fini(struct panfrost_device *pfdev) panfrost_clk_fini(pfdev); } -const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 exception_code) +const char *panfrost_exception_name(u32 exception_code) { switch (exception_code) { /* Non-Fault Status code */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 8b2cdb8c701d..2fe1550da7f8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -173,6 +173,6 @@ void panfrost_device_reset(struct panfrost_device *pfdev); int panfrost_device_resume(struct device *dev); int panfrost_device_suspend(struct device *dev); -const char *panfrost_exception_name(struct panfrost_device *pfdev, u32 exception_code); +const char *panfrost_exception_name(u32 exception_code); #endif diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index 0e70e27fd8c3..26e4196b6c90 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data) address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO); dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n", -fault_status & 0xFF, 
panfrost_exception_name(pfdev, fault_status), +fault_status & 0xFF, panfrost_exception_name(fault_status), address); if (state & GPU_IRQ_MULTIPLE_FAULT) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 3c1dbae3ebdd..ea3432ffde40 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -489,7 +489,7 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", j, - panfrost_exception_name(pfdev, job_read(pfdev, JS_STATUS(j))), + panfrost_exception_name(job_read(pfdev, JS_STATUS(j))), job_read(pfdev, JS_HEAD_LO(j)), job_read(pfdev, JS_TAIL_LO(j))); diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 569509c2ba27..2a9bf30edc9d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -675,7 +675,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) "TODO", fault_status, (fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"), - exception_type, panfrost_exception_name(pfdev, exception_type), + exception_type, panfrost_exception_name(exception_type), access_type, access_type_name(pfdev, fault_status), source_id); -- 2.31.1
[PATCH v6 07/16] drm/panfrost: Expose a helper to trigger a GPU reset
Expose a helper to trigger a GPU reset so we can easily trigger reset operations outside the job timeout handler. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_device.h | 8 drivers/gpu/drm/panfrost/panfrost_job.c| 4 +--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 3639288ee8a2..aec11b6f5abc 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -244,4 +244,12 @@ enum drm_panfrost_exception_type { const char *panfrost_exception_name(u32 exception_code); +static inline void +panfrost_device_schedule_reset(struct panfrost_device *pfdev) +{ + /* Schedule a reset if there's no reset in progress. */ + if (!atomic_xchg(&pfdev->reset.pending, 1)) + schedule_work(&pfdev->reset.work); +} + #endif diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index ea3432ffde40..12285a285765 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -447,9 +447,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job)) return DRM_GPU_SCHED_STAT_NOMINAL; - /* Schedule a reset if there's no reset in progress. */ - if (!atomic_xchg(&pfdev->reset.pending, 1)) - schedule_work(&pfdev->reset.work); + panfrost_device_schedule_reset(pfdev); return DRM_GPU_SCHED_STAT_NOMINAL; } -- 2.31.1
[PATCH v6 08/16] drm/panfrost: Use a threaded IRQ for job interrupts
This should avoid switching to interrupt context when the GPU is under heavy use. v3: * Don't take the job_lock in panfrost_job_handle_irq() Signed-off-by: Boris Brezillon Acked-by: Alyssa Rosenzweig Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_job.c | 53 ++--- 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 12285a285765..6149e4c7133f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -459,19 +459,12 @@ static const struct drm_sched_backend_ops panfrost_sched_ops = { .free_job = panfrost_job_free }; -static irqreturn_t panfrost_job_irq_handler(int irq, void *data) +static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) { - struct panfrost_device *pfdev = data; - u32 status = job_read(pfdev, JOB_INT_STAT); int j; dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status); - if (!status) - return IRQ_NONE; - - pm_runtime_mark_last_busy(pfdev->dev); - for (j = 0; status; j++) { u32 mask = MK_JS_MASK(j); @@ -508,7 +501,6 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) if (status & JOB_INT_MASK_DONE(j)) { struct panfrost_job *job; - spin_lock(&pfdev->js->job_lock); job = pfdev->jobs[j]; /* Only NULL if job timeout occurred */ if (job) { @@ -520,21 +512,49 @@ static irqreturn_t panfrost_job_irq_handler(int irq, void *data) dma_fence_signal_locked(job->done_fence); pm_runtime_put_autosuspend(pfdev->dev); } - spin_unlock(&pfdev->js->job_lock); } status &= ~mask; } +} +static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data) +{ + struct panfrost_device *pfdev = data; + u32 status = job_read(pfdev, JOB_INT_RAWSTAT); + + while (status) { + pm_runtime_mark_last_busy(pfdev->dev); + + spin_lock(&pfdev->js->job_lock); + panfrost_job_handle_irq(pfdev, status); + spin_unlock(&pfdev->js->job_lock); + status = job_read(pfdev, JOB_INT_RAWSTAT); + } + + 
job_write(pfdev, JOB_INT_MASK, + GENMASK(16 + NUM_JOB_SLOTS - 1, 16) | + GENMASK(NUM_JOB_SLOTS - 1, 0)); return IRQ_HANDLED; } +static irqreturn_t panfrost_job_irq_handler(int irq, void *data) +{ + struct panfrost_device *pfdev = data; + u32 status = job_read(pfdev, JOB_INT_STAT); + + if (!status) + return IRQ_NONE; + + job_write(pfdev, JOB_INT_MASK, 0); + return IRQ_WAKE_THREAD; +} + static void panfrost_reset(struct work_struct *work) { struct panfrost_device *pfdev = container_of(work, struct panfrost_device, reset.work); - unsigned long flags; unsigned int i; bool cookie; @@ -564,7 +584,7 @@ static void panfrost_reset(struct work_struct *work) /* All timers have been stopped, we can safely reset the pending state. */ atomic_set(&pfdev->reset.pending, 0); - spin_lock_irqsave(&pfdev->js->job_lock, flags); + spin_lock(&pfdev->js->job_lock); for (i = 0; i < NUM_JOB_SLOTS; i++) { if (pfdev->jobs[i]) { pm_runtime_put_noidle(pfdev->dev); @@ -572,7 +592,7 @@ static void panfrost_reset(struct work_struct *work) pfdev->jobs[i] = NULL; } } - spin_unlock_irqrestore(&pfdev->js->job_lock, flags); + spin_unlock(&pfdev->js->job_lock); panfrost_device_reset(pfdev); @@ -599,8 +619,11 @@ int panfrost_job_init(struct panfrost_device *pfdev) if (irq <= 0) return -ENODEV; - ret = devm_request_irq(pfdev->dev, irq, panfrost_job_irq_handler, - IRQF_SHARED, KBUILD_MODNAME "-job", pfdev); + ret = devm_request_threaded_irq(pfdev->dev, irq, + panfrost_job_irq_handler, + panfrost_job_irq_handler_thread, + IRQF_SHARED, KBUILD_MODNAME "-job", + pfdev); if (ret) { dev_err(pfdev->dev, "failed to request job irq"); return ret; -- 2.31.1
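The hard-IRQ/threaded-IRQ split above can be sketched in plain userspace C (registers and IRQ plumbing are mocked stand-ins, not the real panfrost MMIO map): the hard handler does a cheap status check, masks the line, and wakes the thread; the thread loops on the raw status so late-arriving events are handled before it unmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Mock register file standing in for JOB_INT_STAT/RAWSTAT/MASK. */
enum { INT_STAT, INT_RAWSTAT, INT_MASK, NUM_REGS };
static uint32_t regs[NUM_REGS];

/* Stand-in for the per-slot processing done by panfrost_job_handle_irq(). */
static void handle_pending(uint32_t status)
{
    regs[INT_RAWSTAT] &= ~status;                 /* ack what we handled */
    regs[INT_STAT] = regs[INT_RAWSTAT] & regs[INT_MASK];
}

/* Hard IRQ handler: cheap check, mask the line, defer to the thread. */
static int hard_irq(void)
{
    if (!regs[INT_STAT])
        return 0;          /* IRQ_NONE */
    regs[INT_MASK] = 0;    /* silence interrupts until the thread is done */
    return 1;              /* IRQ_WAKE_THREAD */
}

/* Threaded handler: loop on RAWSTAT so events that fired while we were
 * processing are drained before we unmask and return. */
static void thread_irq(void)
{
    uint32_t status = regs[INT_RAWSTAT];

    while (status) {
        handle_pending(status);
        status = regs[INT_RAWSTAT];
    }
    regs[INT_MASK] = ~0u;  /* re-enable job interrupts */
}
```

Because the thread re-reads RAWSTAT before unmasking, no interrupt is lost during the window where the line is masked — which is the property the patch relies on.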
[PATCH v6 13/16] drm/panfrost: Don't reset the GPU on job faults unless we really have to
If we can recover from a fault without a reset there's no reason to issue one. v3: * Drop the mention of Valhall requiring a reset on JOB_BUS_FAULT * Set the fence error to -EINVAL instead of having per-exception error codes Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.c | 9 + drivers/gpu/drm/panfrost/panfrost_device.h | 2 ++ drivers/gpu/drm/panfrost/panfrost_job.c| 16 ++-- 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index cd76d2ff5034..bd9b7be63b0f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -379,6 +379,15 @@ const char *panfrost_exception_name(u32 exception_code) return panfrost_exception_infos[exception_code].name; } +bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, + u32 exception_code) +{ + /* Right now, none of the GPU we support need a reset, but this +* might change. 
+*/ + return false; +} + void panfrost_device_reset(struct panfrost_device *pfdev) { panfrost_gpu_soft_reset(pfdev); diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index cb5aadf7ae90..68e93b7e5b61 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -245,6 +245,8 @@ enum drm_panfrost_exception_type { }; const char *panfrost_exception_name(u32 exception_code); +bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, + u32 exception_code); static inline void panfrost_device_schedule_reset(struct panfrost_device *pfdev) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 11ff33841caf..cf5f9e8b2a27 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -482,14 +482,26 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) job_write(pfdev, JOB_INT_CLEAR, mask); if (status & JOB_INT_MASK_ERR(j)) { + u32 js_status = job_read(pfdev, JS_STATUS(j)); + job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP); dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", j, - panfrost_exception_name(job_read(pfdev, JS_STATUS(j))), + panfrost_exception_name(js_status), job_read(pfdev, JS_HEAD_LO(j)), job_read(pfdev, JS_TAIL_LO(j))); - drm_sched_fault(&pfdev->js->queue[j].sched); + + /* If we need a reset, signal it to the timeout +* handler, otherwise, update the fence error field and +* signal the job fence. +*/ + if (panfrost_exception_needs_reset(pfdev, js_status)) { + drm_sched_fault(&pfdev->js->queue[j].sched); + } else { + dma_fence_set_error(pfdev->jobs[j]->done_fence, -EINVAL); + status |= JOB_INT_MASK_DONE(j); + } } if (status & JOB_INT_MASK_DONE(j)) { -- 2.31.1
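The decision the patch adds to the error-interrupt path can be modeled as a small pure function (driver state mocked; function names mirror the patch but everything around them is a stand-in): if the exception needs a reset, hand the slot to the scheduler's timeout machinery, otherwise fail only that job's fence and treat the slot as done.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static bool exception_needs_reset(uint32_t exception_code)
{
    (void)exception_code;
    /* Right now, none of the supported GPUs need a reset, but this
     * might change. */
    return false;
}

/* Returns the fence error to set for the faulted job, or 0 when the
 * problem must instead be escalated via *trigger_reset (drm_sched_fault
 * in the real driver). */
static int handle_job_fault(uint32_t js_status, bool *trigger_reset)
{
    if (exception_needs_reset(js_status)) {
        *trigger_reset = true;
        return 0;
    }
    *trigger_reset = false;
    return -EINVAL;   /* recoverable fault: fail only this job */
}
```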
[PATCH v6 11/16] drm/panfrost: Disable the AS on unhandled page faults
If we don't do that, we have to wait for the job timeout to expire before the faulting jobs get killed. v3: * Make sure the AS is re-enabled when new jobs are submitted to the context Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_mmu.c| 34 -- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index d37bc09809e8..cb5aadf7ae90 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -97,6 +97,7 @@ struct panfrost_device { spinlock_t as_lock; unsigned long as_in_use_mask; unsigned long as_alloc_mask; + unsigned long as_faulty_mask; struct list_head as_lru_list; struct panfrost_job_slot *js; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 2a9bf30edc9d..959da770295c 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -154,6 +154,7 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, struct panfrost_mmu *mmu) as = mmu->as; if (as >= 0) { int en = atomic_inc_return(&mmu->as_count); + u32 mask = BIT(as) | BIT(16 + as); /* * AS can be retained by active jobs or a perfcnt context, @@ -162,6 +163,18 @@ u32 panfrost_mmu_as_get(struct panfrost_device *pfdev, struct panfrost_mmu *mmu) WARN_ON(en >= (NUM_JOB_SLOTS + 1)); list_move(&mmu->list, &pfdev->as_lru_list); + + if (pfdev->as_faulty_mask & mask) { + /* Unhandled pagefault on this AS, the MMU was +* disabled. We need to re-enable the MMU after +* clearing+unmasking the AS interrupts.
+*/ + mmu_write(pfdev, MMU_INT_CLEAR, mask); + mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask); + pfdev->as_faulty_mask &= ~mask; + panfrost_mmu_enable(pfdev, mmu); + } + goto out; } @@ -211,6 +224,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev) spin_lock(&pfdev->as_lock); pfdev->as_alloc_mask = 0; + pfdev->as_faulty_mask = 0; list_for_each_entry_safe(mmu, mmu_tmp, &pfdev->as_lru_list, list) { mmu->as = -1; @@ -661,7 +675,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0) ret = panfrost_mmu_map_fault_addr(pfdev, as, addr); - if (ret) + if (ret) { /* terminal fault, print info about the fault */ dev_err(pfdev->dev, "Unhandled Page fault in AS%d at VA 0x%016llX\n" @@ -679,14 +693,28 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data) access_type, access_type_name(pfdev, fault_status), source_id); + spin_lock(&pfdev->as_lock); + /* Ignore MMU interrupts on this AS until it's been +* re-enabled. +*/ + pfdev->as_faulty_mask |= mask; + + /* Disable the MMU to kill jobs on this AS. */ + panfrost_mmu_disable(pfdev, as); + spin_unlock(&pfdev->as_lock); + } + status &= ~mask; /* If we received new MMU interrupts, process them before returning. */ if (!status) - status = mmu_read(pfdev, MMU_INT_RAWSTAT); + status = mmu_read(pfdev, MMU_INT_RAWSTAT) & ~pfdev->as_faulty_mask; } - mmu_write(pfdev, MMU_INT_MASK, ~0); + spin_lock(&pfdev->as_lock); + mmu_write(pfdev, MMU_INT_MASK, ~pfdev->as_faulty_mask); + spin_unlock(&pfdev->as_lock); + return IRQ_HANDLED; }; -- 2.31.1
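The `as_faulty_mask` bookkeeping above boils down to the following bit manipulation (a minimal sketch with register writes mocked): each AS owns a page-fault bit and a parallel bus-fault bit 16 positions up; on an unhandled fault both are recorded and masked out of the MMU interrupt mask, and when the AS is picked up again they are cleared and the mask restored.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t as_faulty_mask;
static uint32_t mmu_int_mask;   /* stand-in for the MMU_INT_MASK register */

static uint32_t as_mask(int as)
{
    return (1u << as) | (1u << (16 + as));   /* fault + bus-fault bits */
}

static void on_unhandled_fault(int as)
{
    as_faulty_mask |= as_mask(as);
    mmu_int_mask = ~as_faulty_mask;   /* ignore this AS until re-enabled */
}

static void on_as_reuse(int as)
{
    as_faulty_mask &= ~as_mask(as);
    mmu_int_mask = ~as_faulty_mask;   /* interrupts for this AS flow again */
}
```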
[PATCH v6 06/16] drm/panfrost: Do the exception -> string translation using a table
Do the exception -> string translation using a table. This way we get rid of those magic numbers and can easily add new fields if we need to attach extra information to exception types. v4: * Don't expose exception type to userspace * Merge the enum definition and the enum -> string table declaration in the same patch v3: * Drop the error field Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_device.c | 130 + drivers/gpu/drm/panfrost/panfrost_device.h | 69 +++ 2 files changed, 152 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index f7f5ca94f910..cd76d2ff5034 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -292,55 +292,91 @@ void panfrost_device_fini(struct panfrost_device *pfdev) panfrost_clk_fini(pfdev); } -const char *panfrost_exception_name(u32 exception_code) -{ - switch (exception_code) { - /* Non-Fault Status code */ - case 0x00: return "NOT_STARTED/IDLE/OK"; - case 0x01: return "DONE"; - case 0x02: return "INTERRUPTED"; - case 0x03: return "STOPPED"; - case 0x04: return "TERMINATED"; - case 0x08: return "ACTIVE"; - /* Job exceptions */ - case 0x40: return "JOB_CONFIG_FAULT"; - case 0x41: return "JOB_POWER_FAULT"; - case 0x42: return "JOB_READ_FAULT"; - case 0x43: return "JOB_WRITE_FAULT"; - case 0x44: return "JOB_AFFINITY_FAULT"; - case 0x48: return "JOB_BUS_FAULT"; - case 0x50: return "INSTR_INVALID_PC"; - case 0x51: return "INSTR_INVALID_ENC"; - case 0x52: return "INSTR_TYPE_MISMATCH"; - case 0x53: return "INSTR_OPERAND_FAULT"; - case 0x54: return "INSTR_TLS_FAULT"; - case 0x55: return "INSTR_BARRIER_FAULT"; - case 0x56: return "INSTR_ALIGN_FAULT"; - case 0x58: return "DATA_INVALID_FAULT"; - case 0x59: return "TILE_RANGE_FAULT"; - case 0x5A: return "ADDR_RANGE_FAULT"; - case 0x60: return "OUT_OF_MEMORY"; - /* GPU exceptions */ - case 
0x80: return "DELAYED_BUS_FAULT"; - case 0x88: return "SHAREABILITY_FAULT"; - /* MMU exceptions */ - case 0xC1: return "TRANSLATION_FAULT_LEVEL1"; - case 0xC2: return "TRANSLATION_FAULT_LEVEL2"; - case 0xC3: return "TRANSLATION_FAULT_LEVEL3"; - case 0xC4: return "TRANSLATION_FAULT_LEVEL4"; - case 0xC8: return "PERMISSION_FAULT"; - case 0xC9 ... 0xCF: return "PERMISSION_FAULT"; - case 0xD1: return "TRANSTAB_BUS_FAULT_LEVEL1"; - case 0xD2: return "TRANSTAB_BUS_FAULT_LEVEL2"; - case 0xD3: return "TRANSTAB_BUS_FAULT_LEVEL3"; - case 0xD4: return "TRANSTAB_BUS_FAULT_LEVEL4"; - case 0xD8: return "ACCESS_FLAG"; - case 0xD9 ... 0xDF: return "ACCESS_FLAG"; - case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT"; - case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT"; +#define PANFROST_EXCEPTION(id) \ + [DRM_PANFROST_EXCEPTION_ ## id] = { \ + .name = #id, \ } - return "UNKNOWN"; +struct panfrost_exception_info { + const char *name; +}; + +static const struct panfrost_exception_info panfrost_exception_infos[] = { + PANFROST_EXCEPTION(OK), + PANFROST_EXCEPTION(DONE), + PANFROST_EXCEPTION(INTERRUPTED), + PANFROST_EXCEPTION(STOPPED), + PANFROST_EXCEPTION(TERMINATED), + PANFROST_EXCEPTION(KABOOM), + PANFROST_EXCEPTION(EUREKA), + PANFROST_EXCEPTION(ACTIVE), + PANFROST_EXCEPTION(JOB_CONFIG_FAULT), + PANFROST_EXCEPTION(JOB_POWER_FAULT), + PANFROST_EXCEPTION(JOB_READ_FAULT), + PANFROST_EXCEPTION(JOB_WRITE_FAULT), + PANFROST_EXCEPTION(JOB_AFFINITY_FAULT), + PANFROST_EXCEPTION(JOB_BUS_FAULT), + PANFROST_EXCEPTION(INSTR_INVALID_PC), + PANFROST_EXCEPTION(INSTR_INVALID_ENC), + PANFROST_EXCEPTION(INSTR_TYPE_MISMATCH), + PANFROST_EXCEPTION(INSTR_OPERAND_FAULT), + PANFROST_EXCEPTION(INSTR_TLS_FAULT), + PANFROST_EXCEPTION(INSTR_BARRIER_FAULT), + PANFROST_EXCEPTION(INSTR_ALIGN_FAULT), + PANFROST_EXCEPTION(DATA_INVALID_FAULT), + PANFROST_EXCEPTION(TILE_RANGE_FAULT), + PANFROST_EXCEPTION(ADDR_RANGE_FAULT), + PANFROST_EXCEPTION(IMPRECISE_FAULT), + PANFROST_EXCEPTION(OOM), + PANFROST_EXCEPTION(OOM_AFBC), 
+ PANFROST_EXCEP
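The table-lookup technique this patch switches to — a designated-initializer array indexed by exception code, populated by a stringify macro — can be shown in miniature (only a few illustrative entries; the real enum and prefix differ):

```c
#include <assert.h>
#include <string.h>

enum {
    EXC_OK = 0x00,
    EXC_DONE = 0x01,
    EXC_JOB_CONFIG_FAULT = 0x40,
};

struct exception_info {
    const char *name;
};

/* Designated initializer + token-paste/stringify: one macro invocation
 * per exception, no magic numbers at the use sites. */
#define EXCEPTION(id) [EXC_ ## id] = { .name = #id }

static const struct exception_info exception_infos[] = {
    EXCEPTION(OK),
    EXCEPTION(DONE),
    EXCEPTION(JOB_CONFIG_FAULT),
};

static const char *exception_name(unsigned int code)
{
    /* Gaps in the table are zero-initialized, so .name is NULL there. */
    if (code >= sizeof(exception_infos) / sizeof(exception_infos[0]) ||
        !exception_infos[code].name)
        return "UNKNOWN";
    return exception_infos[code].name;
}
```

Extending the table later (e.g. with a per-exception severity field) only means adding a member to `struct exception_info`, which is the point of the patch.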
[PATCH v6 15/16] drm/panfrost: Queue jobs on the hardware
From: Steven Price The hardware has a set of '_NEXT' registers that can hold a second job while the first is executing. Make use of these registers to enqueue a second job per slot. v5: * Fix a comment in panfrost_job_init() v3: * Fix the done/err job dequeuing logic to get a valid active state * Only enable the second slot on GPUs supporting jobchain disambiguation * Split interrupt handling in sub-functions Signed-off-by: Steven Price Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_device.h | 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 467 +++-- 2 files changed, 351 insertions(+), 118 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 193cd87f643c..8b25278f34c8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -102,7 +102,7 @@ struct panfrost_device { struct panfrost_job_slot *js; - struct panfrost_job *jobs[NUM_JOB_SLOTS]; + struct panfrost_job *jobs[NUM_JOB_SLOTS][2]; struct list_head scheduled_jobs; struct panfrost_perfcnt *perfcnt; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 8a0db9571bfd..71a72fb50e6b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -140,9 +141,52 @@ static void panfrost_job_write_affinity(struct panfrost_device *pfdev, job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32); } +static u32 +panfrost_get_job_chain_flag(const struct panfrost_job *job) +{ + struct panfrost_fence *f = to_panfrost_fence(job->done_fence); + + if (!panfrost_has_hw_feature(job->pfdev, HW_FEATURE_JOBCHAIN_DISAMBIGUATION)) + return 0; + + return (f->seqno & 1) ? 
JS_CONFIG_JOB_CHAIN_FLAG : 0; +} + +static struct panfrost_job * +panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) +{ + struct panfrost_job *job = pfdev->jobs[slot][0]; + + WARN_ON(!job); + pfdev->jobs[slot][0] = pfdev->jobs[slot][1]; + pfdev->jobs[slot][1] = NULL; + + return job; +} + +static unsigned int +panfrost_enqueue_job(struct panfrost_device *pfdev, int slot, +struct panfrost_job *job) +{ + if (WARN_ON(!job)) + return 0; + + if (!pfdev->jobs[slot][0]) { + pfdev->jobs[slot][0] = job; + return 0; + } + + WARN_ON(pfdev->jobs[slot][1]); + pfdev->jobs[slot][1] = job; + WARN_ON(panfrost_get_job_chain_flag(job) == + panfrost_get_job_chain_flag(pfdev->jobs[slot][0])); + return 1; +} + static void panfrost_job_hw_submit(struct panfrost_job *job, int js) { struct panfrost_device *pfdev = job->pfdev; + unsigned int subslot; u32 cfg; u64 jc_head = job->jc; int ret; @@ -168,7 +212,8 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) * start */ cfg |= JS_CONFIG_THREAD_PRI(8) | JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE | - JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE | + panfrost_get_job_chain_flag(job); if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION)) cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; @@ -182,10 +227,17 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) job_write(pfdev, JS_FLUSH_ID_NEXT(js), job->flush_id); /* GO ! 
*/ - dev_dbg(pfdev->dev, "JS: Submitting atom %p to js[%d] with head=0x%llx", - job, js, jc_head); - job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); + spin_lock(&pfdev->js->job_lock); + subslot = panfrost_enqueue_job(pfdev, js, job); + /* Don't queue the job if a reset is in progress */ + if (!atomic_read(&pfdev->reset.pending)) { + job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); + dev_dbg(pfdev->dev, + "JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d", + job, js, subslot, jc_head, cfg & 0xf); + } + spin_unlock(&pfdev->js->job_lock); } static int panfrost_acquire_object_fences(struct drm_gem_object **bos, @@ -332,7 +384,11 @@ static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job) if (unlikely(job->base.s_fence->finished.error)) return NULL; - pfdev->jobs[slot] = job; + /* Nothing to execute: can happen if the job has finished while +* we were resetting the GPU. +*/ +
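The per-slot two-entry FIFO the patch builds on top of the `_NEXT` registers can be sketched in userspace (jobs reduced to opaque pointers; locking and the jobchain-disambiguation flag omitted): subslot 0 holds the running job, subslot 1 the one queued behind it, and dequeuing promotes the queued job.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_JOB_SLOTS 3

static void *jobs[NUM_JOB_SLOTS][2];

/* Mirrors panfrost_enqueue_job(): returns the subslot the job landed in
 * (0 = submitted to the HW, 1 = parked in the _NEXT registers). */
static unsigned int enqueue_job(int slot, void *job)
{
    if (!jobs[slot][0]) {
        jobs[slot][0] = job;
        return 0;
    }
    assert(!jobs[slot][1]);   /* HW holds at most one pending job per slot */
    jobs[slot][1] = job;
    return 1;
}

/* Mirrors panfrost_dequeue_job(): pop the head, promote the queued job. */
static void *dequeue_job(int slot)
{
    void *job = jobs[slot][0];

    jobs[slot][0] = jobs[slot][1];
    jobs[slot][1] = NULL;
    return job;
}
```

On GPUs with jobchain disambiguation the real driver additionally tags the two queued chains with alternating `JS_CONFIG_JOB_CHAIN_FLAG` values (derived from the fence seqno parity) so completion interrupts can be attributed to the right job.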
[PATCH v6 10/16] drm/panfrost: Make sure job interrupts are masked before resetting
This is not yet needed because we let active jobs be killed by the reset and we don't really bother making sure they can be restarted. But once we start adding soft-stop support, controlling when we deal with the remaining interrupts and making sure those are handled before the reset is issued gets tricky if we keep job interrupts active. Let's prepare for that and mask+flush job IRQs before issuing a reset. v4: * Add a comment explaining why we WARN_ON(!job) in the irq handler * Keep taking the job_lock when evicting stalled jobs v3: * New patch Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_job.c | 27 - 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 59c23c91e47c..11ff33841caf 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -34,6 +34,7 @@ struct panfrost_queue_state { struct panfrost_job_slot { struct panfrost_queue_state queue[NUM_JOB_SLOTS]; spinlock_t job_lock; + int irq; }; static struct panfrost_job * @@ -389,6 +390,15 @@ static void panfrost_reset(struct panfrost_device *pfdev, if (bad) drm_sched_increase_karma(bad); + /* Mask job interrupts and synchronize to make sure we won't be +* interrupted during our reset. +*/ + job_write(pfdev, JOB_INT_MASK, 0); + synchronize_irq(pfdev->js->irq); + + /* Schedulers are stopped and interrupts are masked+flushed, we don't +* need to protect the 'evict unfinished jobs' lock with the job_lock.
+*/ spin_lock(&pfdev->js->job_lock); for (i = 0; i < NUM_JOB_SLOTS; i++) { if (pfdev->jobs[i]) { @@ -486,7 +496,14 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) struct panfrost_job *job; job = pfdev->jobs[j]; - /* Only NULL if job timeout occurred */ + /* The only reason this job could be NULL is if the +* job IRQ handler is called just after the +* in-flight job eviction in the reset path, and +* this shouldn't happen because the job IRQ has +* been masked and synchronized when this eviction +* happens. +*/ + WARN_ON(!job); if (job) { pfdev->jobs[j] = NULL; @@ -546,7 +563,7 @@ static void panfrost_reset_work(struct work_struct *work) int panfrost_job_init(struct panfrost_device *pfdev) { struct panfrost_job_slot *js; - int ret, j, irq; + int ret, j; INIT_WORK(&pfdev->reset.work, panfrost_reset_work); @@ -556,11 +573,11 @@ int panfrost_job_init(struct panfrost_device *pfdev) spin_lock_init(&js->job_lock); - irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job"); - if (irq <= 0) + js->irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job"); + if (js->irq <= 0) return -ENODEV; - ret = devm_request_threaded_irq(pfdev->dev, irq, + ret = devm_request_threaded_irq(pfdev->dev, js->irq, panfrost_job_irq_handler, panfrost_job_irq_handler_thread, IRQF_SHARED, KBUILD_MODNAME "-job", -- 2.31.1
[PATCH v6 16/16] drm/panfrost: Increase the AS_ACTIVE polling timeout
Experience has shown that 1ms is sometimes not enough, even when the GPU is running at its maximum frequency, not to mention that an MMU operation might take longer if the GPU is running at a lower frequency, which is likely to be the case if devfreq is active. Let's pick a significantly bigger timeout value (1ms -> 100ms) to be on the safe side. v5: * New patch Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index e0356e68e768..0da5b3100ab1 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -34,7 +34,7 @@ static int wait_ready(struct panfrost_device *pfdev, u32 as_nr) /* Wait for the MMU status to indicate there is no active command, in * case one is pending. */ ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr), - val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000); + val, !(val & AS_STATUS_AS_ACTIVE), 10, 100000); if (ret) { /* The GPU hung, let's trigger a reset */ -- 2.31.1
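A quick sanity check of the numbers (constants mirror the `readl_relaxed_poll_timeout_atomic()` arguments: 10µs between reads, total budget in µs): the change keeps the polling cadence but multiplies the number of attempts by 100.

```c
#include <assert.h>

#define POLL_DELAY_US  10
#define OLD_TIMEOUT_US 1000     /* 1ms, the value that proved too short */
#define NEW_TIMEOUT_US 100000   /* 100ms */

/* Upper bound on how many times the status register gets sampled before
 * the poll helper gives up. */
static unsigned int max_polls(unsigned int timeout_us)
{
    return timeout_us / POLL_DELAY_US;
}
```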
[PATCH v6 14/16] drm/panfrost: Kill in-flight jobs on FD close
If the process who submitted these jobs decided to close the FD before the jobs are done it probably means it doesn't care about the result. v5: * Add a panfrost_exception_is_fault() helper and the DRM_PANFROST_EXCEPTION_MAX_NON_FAULT value v4: * Don't disable/restore irqs when taking the job_lock (not needed since this lock is never taken from an interrupt context) v3: * Set fence error to ECANCELED when a TERMINATED exception is received Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_device.h | 7 drivers/gpu/drm/panfrost/panfrost_job.c| 42 ++ 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 68e93b7e5b61..193cd87f643c 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -184,6 +184,7 @@ enum drm_panfrost_exception_type { DRM_PANFROST_EXCEPTION_KABOOM = 0x05, DRM_PANFROST_EXCEPTION_EUREKA = 0x06, DRM_PANFROST_EXCEPTION_ACTIVE = 0x08, + DRM_PANFROST_EXCEPTION_MAX_NON_FAULT = 0x3f, DRM_PANFROST_EXCEPTION_JOB_CONFIG_FAULT = 0x40, DRM_PANFROST_EXCEPTION_JOB_POWER_FAULT = 0x41, DRM_PANFROST_EXCEPTION_JOB_READ_FAULT = 0x42, @@ -244,6 +245,12 @@ enum drm_panfrost_exception_type { DRM_PANFROST_EXCEPTION_MEM_ATTR_NONCACHE_3 = 0xef, }; +static inline bool +panfrost_exception_is_fault(u32 exception_code) +{ + return exception_code > DRM_PANFROST_EXCEPTION_MAX_NON_FAULT; +} + const char *panfrost_exception_name(u32 exception_code); bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, u32 exception_code); diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index cf5f9e8b2a27..8a0db9571bfd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -483,14 +483,21 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) if (status & JOB_INT_MASK_ERR(j)) { u32 js_status 
= job_read(pfdev, JS_STATUS(j)); + const char *exception_name = panfrost_exception_name(js_status); job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP); - dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", - j, - panfrost_exception_name(js_status), - job_read(pfdev, JS_HEAD_LO(j)), - job_read(pfdev, JS_TAIL_LO(j))); + if (!panfrost_exception_is_fault(js_status)) { + dev_dbg(pfdev->dev, "js interrupt, js=%d, status=%s, head=0x%x, tail=0x%x", + j, exception_name, + job_read(pfdev, JS_HEAD_LO(j)), + job_read(pfdev, JS_TAIL_LO(j))); + } else { + dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x", + j, exception_name, + job_read(pfdev, JS_HEAD_LO(j)), + job_read(pfdev, JS_TAIL_LO(j))); + } /* If we need a reset, signal it to the timeout * handler, otherwise, update the fence error field and @@ -499,7 +506,16 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status) if (panfrost_exception_needs_reset(pfdev, js_status)) { drm_sched_fault(&pfdev->js->queue[j].sched); } else { - dma_fence_set_error(pfdev->jobs[j]->done_fence, -EINVAL); + int error = 0; + + if (js_status == DRM_PANFROST_EXCEPTION_TERMINATED) + error = -ECANCELED; + else if (panfrost_exception_is_fault(js_status)) + error = -EINVAL; + + if (error) + dma_fence_set_error(pfdev->jobs[j]->done_fence, error); + status |= JOB_INT_MASK_DONE(j); } } @@ -665,10 +681,24 @@ int panfrost_job_open(struct panfrost_file_priv *panfrost_priv) void panfrost_job_close(struct panfrost_file_priv *panfrost_priv) { + struct panfrost_device *pfdev = panfrost_priv->pfdev; int i; for (i = 0; i < NUM_JOB_SLOTS; i++) drm_sch
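The fence-error selection this patch introduces is a three-way mapping that can be written as a standalone function (exception values mirror the driver's enum; everything else is mocked): TERMINATED — a job killed because its FD was closed — maps to -ECANCELED, real faults to -EINVAL, and non-fault completion codes leave the fence error untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCEPTION_TERMINATED    0x04
#define EXCEPTION_MAX_NON_FAULT 0x3f

/* Mirrors panfrost_exception_is_fault(): everything above the non-fault
 * range is a genuine fault. */
static bool exception_is_fault(uint32_t code)
{
    return code > EXCEPTION_MAX_NON_FAULT;
}

/* Error to record on the job's done fence, 0 meaning "no error to set". */
static int fence_error_for(uint32_t js_status)
{
    if (js_status == EXCEPTION_TERMINATED)
        return -ECANCELED;   /* owner closed the FD: expected cancellation */
    if (exception_is_fault(js_status))
        return -EINVAL;      /* the job actually faulted */
    return 0;
}
```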
[PATCH v6 12/16] drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck
Things are unlikely to resolve until we reset the GPU. Let's not wait for other faults/timeout to happen to trigger this reset. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 959da770295c..e0356e68e768 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -36,8 +36,11 @@ static int wait_ready(struct panfrost_device *pfdev, u32 as_nr) ret = readl_relaxed_poll_timeout_atomic(pfdev->iomem + AS_STATUS(as_nr), val, !(val & AS_STATUS_AS_ACTIVE), 10, 1000); - if (ret) + if (ret) { + /* The GPU hung, let's trigger a reset */ + panfrost_device_schedule_reset(pfdev); dev_err(pfdev->dev, "AS_ACTIVE bit stuck\n"); + } return ret; } -- 2.31.1
Re: [PATCH v6 00/16] drm/panfrost
On Wed, 30 Jun 2021 08:27:35 +0200 Boris Brezillon wrote: > Hello, > > Bunch of improvements to make the panfrost driver more robust and allow > queuing jobs at the HW level. > > Changes in v6: > * Collected acks/reviews > * Got rid of the cancel_delayed_work() calls in the reset path > > Changes in this v5: > * Document what's expected in the ->timedout_job() hook > * Add a patch increasing the AS_ACTIVE polling timeout > * Fix a few minor things here and there (see each commit for a detailed > changelog) and collect R-b/A-b tags > > Changes in this v4: > * fixing the reset serialization > * fixing a deadlock in the reset path > * moving the exception enum to a private header > > Regards, > > Boris > > Boris Brezillon (15): > drm/sched: Document what the timedout_job method should do > drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr > drm/panfrost: Make ->run_job() return an ERR_PTR() when appropriate > drm/panfrost: Get rid of the unused JS_STATUS_EVENT_ACTIVE definition > drm/panfrost: Drop the pfdev argument passed to > panfrost_exception_name() > drm/panfrost: Do the exception -> string translation using a table > drm/panfrost: Expose a helper to trigger a GPU reset > drm/panfrost: Use a threaded IRQ for job interrupts > drm/panfrost: Simplify the reset serialization logic > drm/panfrost: Make sure job interrupts are masked before resetting > drm/panfrost: Disable the AS on unhandled page faults > drm/panfrost: Reset the GPU when the AS_ACTIVE bit is stuck > drm/panfrost: Don't reset the GPU on job faults unless we really have > to > drm/panfrost: Kill in-flight jobs on FD close > drm/panfrost: Increase the AS_ACTIVE polling timeout > > Steven Price (1): > drm/panfrost: Queue jobs on the hardware Series queued to drm-misc-next. 
> > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 3 +- > drivers/gpu/drm/lima/lima_sched.c | 3 +- > drivers/gpu/drm/panfrost/panfrost_device.c | 139 +++-- > drivers/gpu/drm/panfrost/panfrost_device.h | 91 ++- > drivers/gpu/drm/panfrost/panfrost_gpu.c| 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c| 626 +++-- > drivers/gpu/drm/panfrost/panfrost_mmu.c| 43 +- > drivers/gpu/drm/panfrost/panfrost_regs.h | 3 - > drivers/gpu/drm/scheduler/sched_main.c | 14 +- > drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > include/drm/gpu_scheduler.h| 37 +- > 12 files changed, 717 insertions(+), 256 deletions(-) >
[PATCH v2 2/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
So we can re-use it from elsewhere. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++--- 1 file changed, 29 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 1ffaef5ec5ff..9bbc9e78cc85 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -109,6 +109,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, return 0; } +static int +panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job) +{ + struct panfrost_file_priv *priv = file_priv->driver_priv; + unsigned int i; + + job->mappings = kvmalloc_array(job->bo_count, + sizeof(*job->mappings), + GFP_KERNEL | __GFP_ZERO); + if (!job->mappings) + return -ENOMEM; + + for (i = 0; i < job->bo_count; i++) { + struct panfrost_gem_mapping *mapping; + struct panfrost_gem_object *bo; + + bo = to_panfrost_bo(job->bos[i]); + mapping = panfrost_gem_mapping_get(bo, priv); + if (!mapping) + return -EINVAL; + + atomic_inc(&bo->gpu_usecount); + job->mappings[i] = mapping; + } + + return 0; +} + /** * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects * referenced by the job. 
@@ -128,8 +156,6 @@ panfrost_lookup_bos(struct drm_device *dev, struct drm_panfrost_submit *args, struct panfrost_job *job) { - struct panfrost_file_priv *priv = file_priv->driver_priv; - struct panfrost_gem_object *bo; unsigned int i; int ret; @@ -144,27 +170,7 @@ panfrost_lookup_bos(struct drm_device *dev, if (ret) return ret; - job->mappings = kvmalloc_array(job->bo_count, - sizeof(struct panfrost_gem_mapping *), - GFP_KERNEL | __GFP_ZERO); - if (!job->mappings) - return -ENOMEM; - - for (i = 0; i < job->bo_count; i++) { - struct panfrost_gem_mapping *mapping; - - bo = to_panfrost_bo(job->bos[i]); - mapping = panfrost_gem_mapping_get(bo, priv); - if (!mapping) { - ret = -EINVAL; - break; - } - - atomic_inc(&bo->gpu_usecount); - job->mappings[i] = mapping; - } - - return ret; + return panfrost_get_job_mappings(file_priv, job); } /** -- 2.31.1
[PATCH v2 0/7] drm/panfrost: drm/panfrost: Add a new submit ioctl
Hello, This is an attempt at providing a new submit ioctl that's more Vulkan-friendly than the existing one. This ioctl 1/ allows passing several out syncobjs so we can easily update several fence/semaphore in a single ioctl() call 2/ allows passing several jobs so we don't have to have one ioctl per job-chain recorded in the command buffer 3/ supports disabling implicit dependencies as well as non-exclusive access to BOs, thus removing unnecessary synchronization I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing one to pass at most one sync_file object in input and/or creating a sync_file FD embedding the render out fence), but it's not entirely clear to me when that's useful. Indeed, we can already do the sync_file <-> syncobj conversion using the SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to. Note that, unlike Turnip, PanVk is using syncobjs to implement vkQueueWaitIdle(), so the syncobj -> sync_file conversion doesn't have to happen for each submission, but maybe there's a good reason to use sync_files for that too. Any feedback on that aspect would be useful I guess. Any feedback on this new ioctl is welcome, in particular, do you think other things are missing/would be nice to have for Vulkan? 
Regards, Boris P.S.: basic igt tests for these new ioctls are available there [1] [1]https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit Boris Brezillon (7): drm/panfrost: Pass a job to panfrost_{acquire,attach_object_fences}() drm/panfrost: Move the mappings collection out of panfrost_lookup_bos() drm/panfrost: Add BO access flags to relax dependencies between jobs drm/panfrost: Add the ability to create submit queues drm/panfrost: Add a new ioctl to submit batches drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature drm/panfrost: Bump minor version to reflect the feature additions drivers/gpu/drm/panfrost/Makefile | 3 +- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 442 -- drivers/gpu/drm/panfrost/panfrost_job.c | 89 ++-- drivers/gpu/drm/panfrost/panfrost_job.h | 10 +- .../gpu/drm/panfrost/panfrost_submitqueue.c | 130 ++ .../gpu/drm/panfrost/panfrost_submitqueue.h | 27 ++ include/uapi/drm/panfrost_drm.h | 103 8 files changed, 720 insertions(+), 86 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h -- 2.31.1
[PATCH v2 1/7] drm/panfrost: Pass a job to panfrost_{acquire, attach_object_fences}()
So we don't have to change the prototype if we extend the function.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..fdc1bd7ecf12 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -240,15 +240,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 	spin_unlock(&pfdev->js->job_lock);
 }
 
-static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
-					  int bo_count,
-					  struct xarray *deps)
+static int panfrost_acquire_object_fences(struct panfrost_job *job)
 {
 	int i, ret;
 
-	for (i = 0; i < bo_count; i++) {
+	for (i = 0; i < job->bo_count; i++) {
 		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
 		if (ret)
 			return ret;
 	}
@@ -256,14 +254,12 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 	return 0;
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
-					  int bo_count,
-					  struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < bo_count; i++)
-		dma_resv_add_excl_fence(bos[i]->resv, fence);
+	for (i = 0; i < job->bo_count; i++)
+		dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -290,8 +286,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
-	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-					     &job->deps);
+	ret = panfrost_acquire_object_fences(job);
 	if (ret) {
 		mutex_unlock(&pfdev->sched_lock);
 		goto unlock;
@@ -303,8 +298,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	mutex_unlock(&pfdev->sched_lock);
 
-	panfrost_attach_object_fences(job->bos, job->bo_count,
-				      job->render_done_fence);
+	panfrost_attach_object_fences(job);
 
 unlock:
 	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
--
2.31.1
[PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues
Needed to keep VkQueues isolated from each other. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/Makefile | 3 +- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 69 -- drivers/gpu/drm/panfrost/panfrost_job.c | 47 ++- drivers/gpu/drm/panfrost/panfrost_job.h | 9 +- .../gpu/drm/panfrost/panfrost_submitqueue.c | 130 ++ .../gpu/drm/panfrost/panfrost_submitqueue.h | 27 include/uapi/drm/panfrost_drm.h | 17 +++ 8 files changed, 258 insertions(+), 46 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index b71935862417..e99192b66ec9 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -9,6 +9,7 @@ panfrost-y := \ panfrost_gpu.o \ panfrost_job.o \ panfrost_mmu.o \ - panfrost_perfcnt.o + panfrost_perfcnt.o \ + panfrost_submitqueue.o obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 8b25278f34c8..51c0ba4e50f5 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -137,7 +137,7 @@ struct panfrost_mmu { struct panfrost_file_priv { struct panfrost_device *pfdev; - struct drm_sched_entity sched_entity[NUM_JOB_SLOTS]; + struct idr queues; struct panfrost_mmu *mmu; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index b6b5997c9366..6529e5972b47 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -19,6 +19,7 @@ #include "panfrost_job.h" #include "panfrost_gpu.h" #include "panfrost_perfcnt.h" +#include "panfrost_submitqueue.h" static bool unstable_ioctls; module_param_unsafe(unstable_ioctls, bool, 0600); @@ -250,6 +251,7 @@ static int panfrost_ioctl_submit(struct 
drm_device *dev, void *data, struct panfrost_device *pfdev = dev->dev_private; struct drm_panfrost_submit *args = data; struct drm_syncobj *sync_out = NULL; + struct panfrost_submitqueue *queue; struct panfrost_job *job; int ret = 0; @@ -259,10 +261,16 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data, if (args->requirements && args->requirements != PANFROST_JD_REQ_FS) return -EINVAL; + queue = panfrost_submitqueue_get(file->driver_priv, 0); + if (IS_ERR(queue)) + return PTR_ERR(queue); + if (args->out_sync > 0) { sync_out = drm_syncobj_find(file, args->out_sync); - if (!sync_out) - return -ENODEV; + if (!sync_out) { + ret = -ENODEV; + goto fail_put_queue; + } } job = kzalloc(sizeof(*job), GFP_KERNEL); @@ -289,7 +297,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data, if (ret) goto fail_job; - ret = panfrost_job_push(job); + ret = panfrost_job_push(queue, job); if (ret) goto fail_job; @@ -302,6 +310,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data, fail_out_sync: if (sync_out) drm_syncobj_put(sync_out); +fail_put_queue: + panfrost_submitqueue_put(queue); return ret; } @@ -451,6 +461,36 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data, return ret; } +static int +panfrost_ioctl_create_submitqueue(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct panfrost_file_priv *priv = file_priv->driver_priv; + struct drm_panfrost_create_submitqueue *args = data; + struct panfrost_submitqueue *queue; + + queue = panfrost_submitqueue_create(priv, args->priority, args->flags); + if (IS_ERR(queue)) + return PTR_ERR(queue); + + args->id = queue->id; + return 0; +} + +static int +panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct panfrost_file_priv *priv = file_priv->driver_priv; + u32 id = *((u32 *)data); + + /* Default queue can't be destroyed. 
*/ + if (!id) + return -ENOENT; + + return panfrost_submitqueue_destroy(priv, id); +} + int panfrost_unstable_ioctl_check(void) { if (!unstable_ioctls) @@ -465,6 +505,7 @@ panfrost_open(struct drm_device *dev, struct drm_file *fil
[PATCH v2 5/7] drm/panfrost: Add a new ioctl to submit batches
This should help limit the number of ioctls when submitting multiple jobs. The new ioctl also supports syncobj timelines and BO access flags. Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 305 drivers/gpu/drm/panfrost/panfrost_job.c | 3 + include/uapi/drm/panfrost_drm.h | 84 +++ 3 files changed, 392 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 6529e5972b47..7ed0773a5c19 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -491,6 +491,310 @@ panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data, return panfrost_submitqueue_destroy(priv, id); } +static int +panfrost_get_job_in_syncs(struct drm_file *file_priv, + u64 refs, u32 ref_stride, + u32 count, struct panfrost_job *job) +{ + const void __user *in = u64_to_user_ptr(refs); + unsigned int i; + int ret; + + if (!count) + return 0; + + for (i = 0; i < count; i++) { + struct drm_panfrost_syncobj_ref ref = { }; + struct dma_fence *fence; + + ret = copy_struct_from_user(&ref, sizeof(ref), + in + (i * ref_stride), + ref_stride); + if (ret) + return ret; + + if (ref.pad) + return -EINVAL; + + ret = drm_syncobj_find_fence(file_priv, ref.handle, ref.point, +0, &fence); + if (ret) + return ret; + + ret = drm_gem_fence_array_add(&job->deps, fence); + if (ret) + return ret; + } + + return 0; +} + +struct panfrost_job_out_sync { + struct drm_syncobj *syncobj; + struct dma_fence_chain *chain; + u64 point; +}; + +static void +panfrost_put_job_out_syncs(struct panfrost_job_out_sync *out_syncs, u32 count) +{ + unsigned int i; + + for (i = 0; i < count; i++) { + if (!out_syncs[i].syncobj) + break; + + drm_syncobj_put(out_syncs[i].syncobj); + kvfree(out_syncs[i].chain); + } + + kvfree(out_syncs); +} + +static struct panfrost_job_out_sync * +panfrost_get_job_out_syncs(struct drm_file *file_priv, + u64 refs, u32 ref_stride, + u32 count) +{ + void __user *in = 
u64_to_user_ptr(refs); + struct panfrost_job_out_sync *out_syncs; + unsigned int i; + int ret; + + if (!count) + return NULL; + + out_syncs = kvmalloc_array(count, sizeof(*out_syncs), + GFP_KERNEL | __GFP_ZERO); + if (!out_syncs) + return ERR_PTR(-ENOMEM); + + for (i = 0; i < count; i++) { + struct drm_panfrost_syncobj_ref ref = { }; + + ret = copy_struct_from_user(&ref, sizeof(ref), + in + (i * ref_stride), + ref_stride); + if (ret) + goto err_free_out_syncs; + + if (ref.pad) { + ret = -EINVAL; + goto err_free_out_syncs; + } + + out_syncs[i].syncobj = drm_syncobj_find(file_priv, ref.handle); + if (!out_syncs[i].syncobj) { + ret = -EINVAL; + goto err_free_out_syncs; + } + + out_syncs[i].point = ref.point; + if (!out_syncs[i].point) + continue; + + out_syncs[i].chain = kmalloc(sizeof(*out_syncs[i].chain), +GFP_KERNEL); + if (!out_syncs[i].chain) { + ret = -ENOMEM; + goto err_free_out_syncs; + } + } + + return out_syncs; + +err_free_out_syncs: + panfrost_put_job_out_syncs(out_syncs, count); + return ERR_PTR(ret); +} + +static void +panfrost_set_job_out_fence(struct panfrost_job_out_sync *out_syncs, + unsigned int count, struct dma_fence *fence) +{ + unsigned int i; + + for (i = 0; i < count; i++) { + if (out_syncs[i].chain) { + drm_syncobj_add_point(out_syncs[i].syncobj, + out_syncs[i].chain, + fence, out_syncs[i].point); + out_syncs[i].chain = NULL; + } else { + drm_syncobj_replace_fence(out_syncs[i].syncobj, +
[PATCH v2 3/7] drm/panfrost: Add BO access flags to relax dependencies between jobs
Jobs reading from the same BO should not be serialized. Add access flags so we can relax the implicit dependencies in that case. We force exclusive access for now to keep the behavior unchanged, but a new SUBMIT ioctl taking explicit access flags will be introduced.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 9 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 23 +++
 drivers/gpu/drm/panfrost/panfrost_job.h | 1 +
 include/uapi/drm/panfrost_drm.h | 2 ++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 9bbc9e78cc85..b6b5997c9366 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -164,6 +164,15 @@ panfrost_lookup_bos(struct drm_device *dev,
 	if (!job->bo_count)
 		return 0;
 
+	job->bo_flags = kvmalloc_array(job->bo_count,
+				       sizeof(*job->bo_flags),
+				       GFP_KERNEL | __GFP_ZERO);
+	if (!job->bo_flags)
+		return -ENOMEM;
+
+	for (i = 0; i < job->bo_count; i++)
+		job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+
 	ret = drm_gem_objects_lookup(file_priv,
 				     (void __user *)(uintptr_t)args->bo_handles,
 				     job->bo_count, &job->bos);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index fdc1bd7ecf12..152245b122be 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -245,8 +245,16 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
 	int i, ret;
 
 	for (i = 0; i < job->bo_count; i++) {
-		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
+		bool exclusive = job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE;
+
+		if (!exclusive) {
+			ret = dma_resv_reserve_shared(job->bos[i]->resv, 1);
+			if (ret)
+				return ret;
+		}
+
+		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i],
+						       exclusive);
 		if (ret)
 			return ret;
 	}
@@ -258,8 +266,14 @@ static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < job->bo_count; i++)
-		dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
+	for (i = 0; i < job->bo_count; i++) {
+		struct dma_resv *robj = job->bos[i]->resv;
+
+		if (job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE)
+			dma_resv_add_excl_fence(robj, job->render_done_fence);
+		else
+			dma_resv_add_shared_fence(robj, job->render_done_fence);
+	}
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -340,6 +354,7 @@ static void panfrost_job_cleanup(struct kref *ref)
 		kvfree(job->bos);
 	}
 
+	kvfree(job->bo_flags);
 	kfree(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..1cbc3621b663 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,7 @@ struct panfrost_job {
 	struct panfrost_gem_mapping **mappings;
 	struct drm_gem_object **bos;
+	u32 *bo_flags;
 	u32 bo_count;
 
 	/* Fence to be signaled by drm-sched once its done with the job */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 061e700dd06c..45d6c600475c 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -224,6 +224,8 @@ struct drm_panfrost_madvise {
 	__u32 retained;       /* out, whether backing store still exists */
 };
 
+#define PANFROST_BO_REF_EXCLUSIVE	0x1
+
 #if defined(__cplusplus)
 }
 #endif
--
2.31.1
[PATCH v2 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
Now that we have a new SUBMIT ioctl dealing with timelined syncobjs we can advertise the feature.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 7ed0773a5c19..b0978fe4fa36 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -886,7 +886,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.2 - adds AFBC_FEATURES query
  */
 static const struct drm_driver panfrost_drm_driver = {
-	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+				  DRIVER_SYNCOBJ_TIMELINE,
 	.open			= panfrost_open,
 	.postclose		= panfrost_postclose,
 	.ioctls			= panfrost_drm_driver_ioctls,
--
2.31.1
[PATCH v2 7/7] drm/panfrost: Bump minor version to reflect the feature additions
We now have a new ioctl that allows submitting multiple jobs at once (among other things) and we support timelined syncobjs. Bump the minor version number to reflect those changes.

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b0978fe4fa36..1859e6887877 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -884,6 +884,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  * - 1.2 - adds AFBC_FEATURES query
+ * - 1.3 - adds the BATCH_SUBMIT, CREATE_SUBMITQUEUE, DESTROY_SUBMITQUEUE
+ *	   ioctls and advertises the SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -897,7 +899,7 @@ static const struct drm_driver panfrost_drm_driver = {
 	.desc			= "panfrost DRM",
 	.date			= "20180908",
 	.major			= 1,
-	.minor			= 2,
+	.minor			= 3,
 
 	.gem_create_object	= panfrost_gem_create_object,
 	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
--
2.31.1
Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 10:56:29 +0100 Steven Price wrote: > On 01/07/2021 10:12, Boris Brezillon wrote: > > Needed to keep VkQueues isolated from each other. > > > > Signed-off-by: Boris Brezillon > > My Vulkan knowledge is limited so I'm not sure whether this is the right > approach or not. In particular is it correct that an application can > create a high priority queue which could affect other (normal priority) > applications? That's what msm does (with no extra CAPS check AFAICT), and the freedreno driver can already create high priority queues if PIPE_CONTEXT_HIGH_PRIORITY is passed. Not saying that's okay to allow userspace to tweak the priority, but if that's a problem, other drivers are in trouble too ;-). > > Also does it really make sense to allow user space to create an > unlimited number of queues? It feels like an ideal way for an malicious > application to waste kernel memory. Same here, I see no limit on the number of queues the msm driver can create. I can definitely pick an arbitrary limit of 2^16 (or 2^8 if 2^16 is too high) if you prefer, but I feel like there's plenty of ways to force kernel allocations already, like allocating a gazillion of 4k GEM buffers (cgroup can probably limit the total amount of memory allocated, but you'd still have all gem-buf meta data in kernel memory). 
> > In terms of implementation it looks correct, but one comment below > > > --- > > drivers/gpu/drm/panfrost/Makefile | 3 +- > > drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 69 -- > > drivers/gpu/drm/panfrost/panfrost_job.c | 47 ++- > > drivers/gpu/drm/panfrost/panfrost_job.h | 9 +- > > .../gpu/drm/panfrost/panfrost_submitqueue.c | 130 ++ > > .../gpu/drm/panfrost/panfrost_submitqueue.h | 27 > > include/uapi/drm/panfrost_drm.h | 17 +++ > > 8 files changed, 258 insertions(+), 46 deletions(-) > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h > > > [...] > > diff --git a/drivers/gpu/drm/panfrost/panfrost_submitqueue.c > > b/drivers/gpu/drm/panfrost/panfrost_submitqueue.c > > new file mode 100644 > > index ..98050f7690df > > --- /dev/null > > +++ b/drivers/gpu/drm/panfrost/panfrost_submitqueue.c > > @@ -0,0 +1,130 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* Copyright 2021 Collabora ltd. 
*/ > > + > > +#include > > + > > +#include "panfrost_device.h" > > +#include "panfrost_job.h" > > +#include "panfrost_submitqueue.h" > > + > > +static enum drm_sched_priority > > +to_sched_prio(enum panfrost_submitqueue_priority priority) > > +{ > > + switch (priority) { > > + case PANFROST_SUBMITQUEUE_PRIORITY_LOW: > > + return DRM_SCHED_PRIORITY_MIN; > > + case PANFROST_SUBMITQUEUE_PRIORITY_MEDIUM: > > + return DRM_SCHED_PRIORITY_NORMAL; > > + case PANFROST_SUBMITQUEUE_PRIORITY_HIGH: > > + return DRM_SCHED_PRIORITY_HIGH; > > + default: > > + break; > > + } > > + > > + return DRM_SCHED_PRIORITY_UNSET; > > +} > > + > > +static void > > +panfrost_submitqueue_cleanup(struct kref *ref) > > +{ > > + struct panfrost_submitqueue *queue; > > + unsigned int i; > > + > > + queue = container_of(ref, struct panfrost_submitqueue, refcount); > > + > > + for (i = 0; i < NUM_JOB_SLOTS; i++) > > + drm_sched_entity_destroy(&queue->sched_entity[i]); > > + > > + /* Kill in-flight jobs */ > > + panfrost_job_kill_queue(queue); > > + > > + kfree(queue); > > +} > > + > > +void panfrost_submitqueue_put(struct panfrost_submitqueue *queue) > > +{ > > + if (!IS_ERR_OR_NULL(queue)) > > + kref_put(&queue->refcount, panfrost_submitqueue_cleanup); > > +} > > + > > +struct panfrost_submitqueue * > > +panfrost_submitqueue_create(struct panfrost_file_priv *ctx, > > + enum panfrost_submitqueue_priority priority, > > + u32 flags) > > If this function returned an 'int' we could simplify some code. So > instead of returning the struct panfrost_submitqueue just return the ID > (or negative error). The only caller (panfrost_ioctl_create_submitqueue) > doesn't actually want the object just the ID and we can ditch the 'id' > field from panfrost_submitqueue. Sure, I can do that.
Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 11:08:58 +0100 Steven Price wrote: > On 01/07/2021 10:12, Boris Brezillon wrote: > > Needed to keep VkQueues isolated from each other. > > One more comment I noticed when I tried this out: > > [...] > > +struct panfrost_submitqueue * > > +panfrost_submitqueue_create(struct panfrost_file_priv *ctx, > > + enum panfrost_submitqueue_priority priority, > > + u32 flags) > > +{ > > + struct panfrost_submitqueue *queue; > > + enum drm_sched_priority sched_prio; > > + int ret, i; > > + > > + if (flags || priority >= PANFROST_SUBMITQUEUE_PRIORITY_COUNT) > > + return ERR_PTR(-EINVAL); > > + > > + queue = kzalloc(sizeof(*queue), GFP_KERNEL); > > + if (!queue) > > + return ERR_PTR(-ENOMEM); > > + > > + queue->pfdev = ctx->pfdev; > > + sched_prio = to_sched_prio(priority); > > + for (i = 0; i < NUM_JOB_SLOTS; i++) { > > + struct drm_gpu_scheduler *sched; > > + > > + sched = panfrost_job_get_sched(ctx->pfdev, i); > > + ret = drm_sched_entity_init(&queue->sched_entity[i], > > + sched_prio, &sched, 1, NULL); > > + if (ret) > > + break; > > + } > > + > > + if (ret) { > > + for (i--; i >= 0; i--) > > + drm_sched_entity_destroy(&queue->sched_entity[i]); > > + > > + return ERR_PTR(ret); > > + } > > + > > + kref_init(&queue->refcount); > > + idr_lock(&ctx->queues); > > + ret = idr_alloc(&ctx->queues, queue, 0, INT_MAX, GFP_KERNEL); > > This makes lockdep complain. idr_lock() is a spinlock and GFP_KERNEL can > sleep. So either we need to bring our own mutex here or not use GFP_KERNEL. > Ouch! I wonder why I don't see that (I have lockdep enabled, and the igt tests should have exercised this path). > Steve
Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 11:58:34 +0100 Steven Price wrote: > On 02/07/2021 11:52, Boris Brezillon wrote: > > On Fri, 2 Jul 2021 11:08:58 +0100 > > Steven Price wrote: > > > >> On 01/07/2021 10:12, Boris Brezillon wrote: > >>> Needed to keep VkQueues isolated from each other. > >> > >> One more comment I noticed when I tried this out: > >> > >> [...] > >>> +struct panfrost_submitqueue * > >>> +panfrost_submitqueue_create(struct panfrost_file_priv *ctx, > >>> + enum panfrost_submitqueue_priority priority, > >>> + u32 flags) > >>> +{ > >>> + struct panfrost_submitqueue *queue; > >>> + enum drm_sched_priority sched_prio; > >>> + int ret, i; > >>> + > >>> + if (flags || priority >= PANFROST_SUBMITQUEUE_PRIORITY_COUNT) > >>> + return ERR_PTR(-EINVAL); > >>> + > >>> + queue = kzalloc(sizeof(*queue), GFP_KERNEL); > >>> + if (!queue) > >>> + return ERR_PTR(-ENOMEM); > >>> + > >>> + queue->pfdev = ctx->pfdev; > >>> + sched_prio = to_sched_prio(priority); > >>> + for (i = 0; i < NUM_JOB_SLOTS; i++) { > >>> + struct drm_gpu_scheduler *sched; > >>> + > >>> + sched = panfrost_job_get_sched(ctx->pfdev, i); > >>> + ret = drm_sched_entity_init(&queue->sched_entity[i], > >>> + sched_prio, &sched, 1, NULL); > >>> + if (ret) > >>> + break; > >>> + } > >>> + > >>> + if (ret) { > >>> + for (i--; i >= 0; i--) > >>> + drm_sched_entity_destroy(&queue->sched_entity[i]); > >>> + > >>> + return ERR_PTR(ret); > >>> + } > >>> + > >>> + kref_init(&queue->refcount); > >>> + idr_lock(&ctx->queues); > >>> + ret = idr_alloc(&ctx->queues, queue, 0, INT_MAX, GFP_KERNEL); > >> > >> This makes lockdep complain. idr_lock() is a spinlock and GFP_KERNEL can > >> sleep. So either we need to bring our own mutex here or not use GFP_KERNEL. > >> > > > > Ouch! I wonder why I don't see that (I have lockdep enabled, and the > > igt tests should have exercised this path). > > Actually I'm not sure it technically lockdep - have you got > CONFIG_DEBUG_ATOMIC_SLEEP set? Nope, I was missing that one :-/.
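[Editor's note] For reference, the two idiomatic ways around the sleep-under-spinlock problem spotted here (untested kernel-side sketch, not part of the series; `MAX_QUEUES` and the `queues_lock` mutex are hypothetical names):

```c
/* Option 1: keep idr_lock() (a spinlock), but move the sleeping
 * allocation out of the critical section with the idr preload API. */
idr_preload(GFP_KERNEL);
idr_lock(&ctx->queues);
ret = idr_alloc(&ctx->queues, queue, 0, MAX_QUEUES, GFP_NOWAIT);
idr_unlock(&ctx->queues);
idr_preload_end();

/* Option 2: protect the idr with a driver mutex instead, so allocating
 * with GFP_KERNEL (which may sleep) is allowed. */
mutex_lock(&ctx->queues_lock);
ret = idr_alloc(&ctx->queues, queue, 0, MAX_QUEUES, GFP_KERNEL);
mutex_unlock(&ctx->queues_lock);
```

CONFIG_DEBUG_ATOMIC_SLEEP (mentioned above) is what catches the GFP_KERNEL-under-spinlock case at runtime.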
Re: [PATCH v2 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 09:58:06 -0400 Alyssa Rosenzweig wrote: > > > My Vulkan knowledge is limited so I'm not sure whether this is the right > > > approach or not. In particular is it correct that an application can > > > create a high priority queue which could affect other (normal priority) > > > applications? > > > > That's what msm does (with no extra CAPS check AFAICT), and the > > freedreno driver can already create high priority queues if > > PIPE_CONTEXT_HIGH_PRIORITY is passed. Not saying that's okay to allow > > userspace to tweak the priority, but if that's a problem, other drivers > > are in trouble too ;-). > > Speaking of, how will PIPE_CONTEXT_HIGH_PRIORITY be implemented with the > new ioctl()? I envisioned something much simpler (for the old ioctl), > just adding a "high priority?" flag to the submit and internally > creating the two queues of normal/high priority for drm_sched to work > out. Is this juggling now moved to userspace? That's what freedreno does. I guess we could create 2 default queues (one normal and one high prio) and extend the old submit ioctl() to do what you suggest if you see a good reason to not switch to the new ioctl() directly. I mean, we'll have to keep support for both anyway, but switching to the new ioctl()) shouldn't be that hard (I can prepare a MR transitioning the gallium driver to BATCH_SUBMIT if you want).
[PATCH v3 0/7] drm/panfrost: Add a new submit ioctl
Hello,

This is an attempt at providing a new submit ioctl that's more Vulkan-friendly than the existing one. This ioctl

1/ allows passing several out syncobjs, so we can easily update several fences/semaphores in a single ioctl() call

2/ allows passing several jobs, so we don't need one ioctl per job chain recorded in the command buffer

3/ supports disabling implicit dependencies as well as non-exclusive access to BOs, thus removing unnecessary synchronization

I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing one to pass at most one sync_file object as input and/or creating a sync_file FD embedding the render out fence), but it's not entirely clear to me when that's useful. Indeed, we can already do the sync_file <-> syncobj conversion using the SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to. Note that, unlike Turnip, PanVk uses syncobjs to implement vkQueueWaitIdle(), so the syncobj -> sync_file conversion doesn't have to happen for each submission, but maybe there's a good reason to use sync_files for that too. Any feedback on that aspect would be useful.

Any feedback on this new ioctl is welcome. In particular, do you think other things are missing or would be nice to have for Vulkan?
Regards,

Boris

P.S.: basic igt tests for these new ioctls are available here [1]

[1] https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit

Changes in v3:
* Fix a deadlock in the submitqueue logic
* Limit the number of submitqueues per context to 16

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach}_object_fences()
  drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add the ability to create submit queues
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/Makefile               |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h      |   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c         | 463 ++
 drivers/gpu/drm/panfrost/panfrost_job.c         |  89 ++--
 drivers/gpu/drm/panfrost/panfrost_job.h         |  10 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c     | 136 +
 .../gpu/drm/panfrost/panfrost_submitqueue.h     |  27 +
 include/uapi/drm/panfrost_drm.h                 | 103
 8 files changed, 689 insertions(+), 144 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

--
2.31.1
[PATCH v3 1/7] drm/panfrost: Pass a job to panfrost_{acquire, attach}_object_fences()
So we don't have to change the prototype if we extend the function.

v3:
* Fix subject

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..fdc1bd7ecf12 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -240,15 +240,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 	spin_unlock(&pfdev->js->job_lock);
 }
 
-static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
-					  int bo_count,
-					  struct xarray *deps)
+static int panfrost_acquire_object_fences(struct panfrost_job *job)
 {
 	int i, ret;
 
-	for (i = 0; i < bo_count; i++) {
+	for (i = 0; i < job->bo_count; i++) {
 		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
 		if (ret)
 			return ret;
 	}
@@ -256,14 +254,12 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 	return 0;
 }
 
-static void panfrost_attach_object_fences(struct drm_gem_object **bos,
-					  int bo_count,
-					  struct dma_fence *fence)
+static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < bo_count; i++)
-		dma_resv_add_excl_fence(bos[i]->resv, fence);
+	for (i = 0; i < job->bo_count; i++)
+		dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -290,8 +286,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
-	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-					     &job->deps);
+	ret = panfrost_acquire_object_fences(job);
 	if (ret) {
 		mutex_unlock(&pfdev->sched_lock);
 		goto unlock;
@@ -303,8 +298,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	mutex_unlock(&pfdev->sched_lock);
 
-	panfrost_attach_object_fences(job->bos, job->bo_count,
-				      job->render_done_fence);
+	panfrost_attach_object_fences(job);
 
 unlock:
 	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
--
2.31.1
[PATCH v3 2/7] drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
So we can re-use it from elsewhere.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 52 ++---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9bbc9e78cc85 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -109,6 +109,34 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
+{
+	struct panfrost_file_priv *priv = file_priv->driver_priv;
+	unsigned int i;
+
+	job->mappings = kvmalloc_array(job->bo_count,
+				       sizeof(*job->mappings),
+				       GFP_KERNEL | __GFP_ZERO);
+	if (!job->mappings)
+		return -ENOMEM;
+
+	for (i = 0; i < job->bo_count; i++) {
+		struct panfrost_gem_mapping *mapping;
+		struct panfrost_gem_object *bo;
+
+		bo = to_panfrost_bo(job->bos[i]);
+		mapping = panfrost_gem_mapping_get(bo, priv);
+		if (!mapping)
+			return -EINVAL;
+
+		atomic_inc(&bo->gpu_usecount);
+		job->mappings[i] = mapping;
+	}
+
+	return 0;
+}
+
 /**
  * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
  * referenced by the job.
@@ -128,8 +156,6 @@ panfrost_lookup_bos(struct drm_device *dev,
 		    struct drm_panfrost_submit *args,
 		    struct panfrost_job *job)
 {
-	struct panfrost_file_priv *priv = file_priv->driver_priv;
-	struct panfrost_gem_object *bo;
 	unsigned int i;
 	int ret;
 
@@ -144,27 +170,7 @@ panfrost_lookup_bos(struct drm_device *dev,
 	if (ret)
 		return ret;
 
-	job->mappings = kvmalloc_array(job->bo_count,
-				       sizeof(struct panfrost_gem_mapping *),
-				       GFP_KERNEL | __GFP_ZERO);
-	if (!job->mappings)
-		return -ENOMEM;
-
-	for (i = 0; i < job->bo_count; i++) {
-		struct panfrost_gem_mapping *mapping;
-
-		bo = to_panfrost_bo(job->bos[i]);
-		mapping = panfrost_gem_mapping_get(bo, priv);
-		if (!mapping) {
-			ret = -EINVAL;
-			break;
-		}
-
-		atomic_inc(&bo->gpu_usecount);
-		job->mappings[i] = mapping;
-	}
-
-	return ret;
+	return panfrost_get_job_mappings(file_priv, job);
 }
 
 /**
--
2.31.1
[PATCH v3 3/7] drm/panfrost: Add BO access flags to relax dependencies between jobs
Jobs reading from the same BO should not be serialized. Add access
flags so we can relax the implicit dependencies in that case. We force
exclusive access for now to keep the behavior unchanged, but a new
SUBMIT ioctl taking explicit access flags will be introduced.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 23 +++
 drivers/gpu/drm/panfrost/panfrost_job.h |  1 +
 include/uapi/drm/panfrost_drm.h         |  2 ++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 9bbc9e78cc85..b6b5997c9366 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -164,6 +164,15 @@ panfrost_lookup_bos(struct drm_device *dev,
 	if (!job->bo_count)
 		return 0;
 
+	job->bo_flags = kvmalloc_array(job->bo_count,
+				       sizeof(*job->bo_flags),
+				       GFP_KERNEL | __GFP_ZERO);
+	if (!job->bo_flags)
+		return -ENOMEM;
+
+	for (i = 0; i < job->bo_count; i++)
+		job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+
 	ret = drm_gem_objects_lookup(file_priv,
 				     (void __user *)(uintptr_t)args->bo_handles,
 				     job->bo_count, &job->bos);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index fdc1bd7ecf12..152245b122be 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -245,8 +245,16 @@ static int panfrost_acquire_object_fences(struct panfrost_job *job)
 	int i, ret;
 
 	for (i = 0; i < job->bo_count; i++) {
-		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i], true);
+		bool exclusive = job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE;
+
+		if (!exclusive) {
+			ret = dma_resv_reserve_shared(job->bos[i]->resv, 1);
+			if (ret)
+				return ret;
+		}
+
+		ret = drm_gem_fence_array_add_implicit(&job->deps, job->bos[i],
+						       exclusive);
 		if (ret)
 			return ret;
 	}
@@ -258,8 +266,14 @@ static void panfrost_attach_object_fences(struct panfrost_job *job)
 {
 	int i;
 
-	for (i = 0; i < job->bo_count; i++)
-		dma_resv_add_excl_fence(job->bos[i]->resv, job->render_done_fence);
+	for (i = 0; i < job->bo_count; i++) {
+		struct dma_resv *robj = job->bos[i]->resv;
+
+		if (job->bo_flags[i] & PANFROST_BO_REF_EXCLUSIVE)
+			dma_resv_add_excl_fence(robj, job->render_done_fence);
+		else
+			dma_resv_add_shared_fence(robj, job->render_done_fence);
+	}
 }
 
 int panfrost_job_push(struct panfrost_job *job)
@@ -340,6 +354,7 @@ static void panfrost_job_cleanup(struct kref *ref)
 		kvfree(job->bos);
 	}
 
+	kvfree(job->bo_flags);
 	kfree(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..1cbc3621b663 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,7 @@ struct panfrost_job {
 
 	struct panfrost_gem_mapping **mappings;
 	struct drm_gem_object **bos;
+	u32 *bo_flags;
 	u32 bo_count;
 
 	/* Fence to be signaled by drm-sched once its done with the job */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 061e700dd06c..45d6c600475c 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -224,6 +224,8 @@ struct drm_panfrost_madvise {
 	__u32 retained;	/* out, whether backing store still exists */
 };
 
+#define PANFROST_BO_REF_EXCLUSIVE	0x1
+
 #if defined(__cplusplus)
 }
 #endif
--
2.31.1
[PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches
This should help limit the number of ioctls when submitting multiple
jobs. The new ioctl also supports syncobj timelines and BO access flags.

v3:
* Re-use panfrost_get_job_bos() and panfrost_get_job_in_syncs() in the
  old submit path

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 366 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c |   3 +
 include/uapi/drm/panfrost_drm.h         |  84 ++
 3 files changed, 375 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 6529e5972b47..e2897de6e77d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -138,111 +138,95 @@ panfrost_get_job_mappings(struct drm_file *file_priv, struct panfrost_job *job)
 	return 0;
 }
 
-/**
- * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
- * referenced by the job.
- * @dev: DRM device
- * @file_priv: DRM file for this fd
- * @args: IOCTL args
- * @job: job being set up
- *
- * Resolve handles from userspace to BOs and attach them to job.
- *
- * Note that this function doesn't need to unreference the BOs on
- * failure, because that will happen at panfrost_job_cleanup() time.
- */
+#define PANFROST_BO_REF_ALLOWED_FLAGS \
+	(PANFROST_BO_REF_EXCLUSIVE | PANFROST_BO_REF_NO_IMPLICIT_DEP)
+
 static int
-panfrost_lookup_bos(struct drm_device *dev,
-		    struct drm_file *file_priv,
-		    struct drm_panfrost_submit *args,
-		    struct panfrost_job *job)
+panfrost_get_job_bos(struct drm_file *file_priv,
+		     u64 refs, u32 ref_stride, u32 count,
+		     struct panfrost_job *job)
 {
+	void __user *in = u64_to_user_ptr(refs);
 	unsigned int i;
-	int ret;
 
-	job->bo_count = args->bo_handle_count;
+	job->bo_count = count;
 
-	if (!job->bo_count)
+	if (!count)
 		return 0;
 
+	job->bos = kvmalloc_array(job->bo_count, sizeof(*job->bos),
+				  GFP_KERNEL | __GFP_ZERO);
 	job->bo_flags = kvmalloc_array(job->bo_count,
 				       sizeof(*job->bo_flags),
 				       GFP_KERNEL | __GFP_ZERO);
-	if (!job->bo_flags)
+	if (!job->bos || !job->bo_flags)
 		return -ENOMEM;
 
-	for (i = 0; i < job->bo_count; i++)
-		job->bo_flags[i] = PANFROST_BO_REF_EXCLUSIVE;
+	for (i = 0; i < count; i++) {
+		struct drm_panfrost_bo_ref ref = { };
+		int ret;
 
-	ret = drm_gem_objects_lookup(file_priv,
-				     (void __user *)(uintptr_t)args->bo_handles,
-				     job->bo_count, &job->bos);
-	if (ret)
-		return ret;
+		ret = copy_struct_from_user(&ref, sizeof(ref),
+					    in + (i * ref_stride),
+					    ref_stride);
+		if (ret)
+			return ret;
 
-	return panfrost_get_job_mappings(file_priv, job);
+		/* Prior to the BATCH_SUBMIT ioctl all accessed BOs were
+		 * treated as exclusive.
+		 */
+		if (ref_stride == sizeof(u32))
+			ref.flags = PANFROST_BO_REF_EXCLUSIVE;
+
+		if ((ref.flags & ~PANFROST_BO_REF_ALLOWED_FLAGS))
+			return -EINVAL;
+
+		job->bos[i] = drm_gem_object_lookup(file_priv, ref.handle);
+		if (!job->bos[i])
+			return -EINVAL;
+
+		job->bo_flags[i] = ref.flags;
+	}
+
+	return 0;
 }
 
-/**
- * panfrost_copy_in_sync() - Sets up job->deps with the sync objects
- * referenced by the job.
- * @dev: DRM device
- * @file_priv: DRM file for this fd
- * @args: IOCTL args
- * @job: job being set up
- *
- * Resolve syncobjs from userspace to fences and attach them to job.
- *
- * Note that this function doesn't need to unreference the fences on
- * failure, because that will happen at panfrost_job_cleanup() time.
- */
 static int
-panfrost_copy_in_sync(struct drm_device *dev,
-		      struct drm_file *file_priv,
-		      struct drm_panfrost_submit *args,
-		      struct panfrost_job *job)
+panfrost_get_job_in_syncs(struct drm_file *file_priv,
+			  u64 refs, u32 ref_stride,
+			  u32 count, struct panfrost_job *job)
 {
-	u32 *handles;
-	int ret = 0;
-	int i, in_fence_count;
+	const void __user *in = u64_to_user_ptr(refs);
+	unsigned int i;
+	int ret;
 
-	in_fence_count = args->in_sync_count;
-
-	if (!in_fence_count)
+	if (!count)
 		return 0;
 
-	handles =
[PATCH v3 7/7] drm/panfrost: Bump minor version to reflect the feature additions
We now have a new ioctl that allows submitting multiple jobs at once
(among other things) and we support timelined syncobjs. Bump the minor
version number to reflect those changes.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 242a16246d79..33cd34a1213c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -789,6 +789,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.0 - initial interface
  * - 1.1 - adds HEAP and NOEXEC flags for CREATE_BO
  * - 1.2 - adds AFBC_FEATURES query
+ * - 1.3 - adds the BATCH_SUBMIT, CREATE_SUBMITQUEUE, DESTROY_SUBMITQUEUE
+ *         ioctls and advertises the SYNCOBJ_TIMELINE feature
  */
 static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -802,7 +804,7 @@ static const struct drm_driver panfrost_drm_driver = {
 	.desc			= "panfrost DRM",
 	.date			= "20180908",
 	.major			= 1,
-	.minor			= 2,
+	.minor			= 3,
 
 	.gem_create_object	= panfrost_gem_create_object,
 	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
--
2.31.1
[PATCH v3 6/7] drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
Now that we have a new SUBMIT ioctl dealing with timelined syncobjs we
can advertise the feature.

Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index e2897de6e77d..242a16246d79 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -791,7 +791,8 @@ DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
  * - 1.2 - adds AFBC_FEATURES query
  */
 static const struct drm_driver panfrost_drm_driver = {
-	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
+	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
+				  DRIVER_SYNCOBJ_TIMELINE,
 	.open			= panfrost_open,
 	.postclose		= panfrost_postclose,
 	.ioctls			= panfrost_drm_driver_ioctls,
--
2.31.1
[PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues
Needed to keep VkQueues isolated from each other.

v3:
* Limit the number of submitqueue per context to 16
* Fix a deadlock

Signed-off-by: Boris Brezillon
---
 drivers/gpu/drm/panfrost/Makefile             |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h    |   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c       |  69 +++--
 drivers/gpu/drm/panfrost/panfrost_job.c       |  47 ++
 drivers/gpu/drm/panfrost/panfrost_job.h       |   9 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 136 ++
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  27 
 include/uapi/drm/panfrost_drm.h               |  17 +++
 8 files changed, 264 insertions(+), 46 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile
index b71935862417..e99192b66ec9 100644
--- a/drivers/gpu/drm/panfrost/Makefile
+++ b/drivers/gpu/drm/panfrost/Makefile
@@ -9,6 +9,7 @@ panfrost-y := \
 	panfrost_gpu.o \
 	panfrost_job.o \
 	panfrost_mmu.o \
-	panfrost_perfcnt.o
+	panfrost_perfcnt.o \
+	panfrost_submitqueue.o
 
 obj-$(CONFIG_DRM_PANFROST) += panfrost.o
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..51c0ba4e50f5 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -137,7 +137,7 @@ struct panfrost_mmu {
 struct panfrost_file_priv {
 	struct panfrost_device *pfdev;
 
-	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
+	struct idr queues;
 
 	struct panfrost_mmu *mmu;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b6b5997c9366..6529e5972b47 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -19,6 +19,7 @@
 #include "panfrost_job.h"
 #include "panfrost_gpu.h"
 #include "panfrost_perfcnt.h"
+#include "panfrost_submitqueue.h"
 
 static bool unstable_ioctls;
 module_param_unsafe(unstable_ioctls, bool, 0600);
@@ -250,6 +251,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	struct panfrost_device *pfdev = dev->dev_private;
 	struct drm_panfrost_submit *args = data;
 	struct drm_syncobj *sync_out = NULL;
+	struct panfrost_submitqueue *queue;
 	struct panfrost_job *job;
 	int ret = 0;
 
@@ -259,10 +261,16 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	if (args->requirements && args->requirements != PANFROST_JD_REQ_FS)
 		return -EINVAL;
 
+	queue = panfrost_submitqueue_get(file->driver_priv, 0);
+	if (IS_ERR(queue))
+		return PTR_ERR(queue);
+
 	if (args->out_sync > 0) {
 		sync_out = drm_syncobj_find(file, args->out_sync);
-		if (!sync_out)
-			return -ENODEV;
+		if (!sync_out) {
+			ret = -ENODEV;
+			goto fail_put_queue;
+		}
 	}
 
 	job = kzalloc(sizeof(*job), GFP_KERNEL);
@@ -289,7 +297,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail_job;
 
-	ret = panfrost_job_push(job);
+	ret = panfrost_job_push(queue, job);
 	if (ret)
 		goto fail_job;
 
@@ -302,6 +310,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 fail_out_sync:
 	if (sync_out)
 		drm_syncobj_put(sync_out);
+fail_put_queue:
+	panfrost_submitqueue_put(queue);
 
 	return ret;
 }
@@ -451,6 +461,36 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int
+panfrost_ioctl_create_submitqueue(struct drm_device *dev, void *data,
+				  struct drm_file *file_priv)
+{
+	struct panfrost_file_priv *priv = file_priv->driver_priv;
+	struct drm_panfrost_create_submitqueue *args = data;
+	struct panfrost_submitqueue *queue;
+
+	queue = panfrost_submitqueue_create(priv, args->priority, args->flags);
+	if (IS_ERR(queue))
+		return PTR_ERR(queue);
+
+	args->id = queue->id;
+	return 0;
+}
+
+static int
+panfrost_ioctl_destroy_submitqueue(struct drm_device *dev, void *data,
+				   struct drm_file *file_priv)
+{
+	struct panfrost_file_priv *priv = file_priv->driver_priv;
+	u32 id = *((u32 *)data);
+
+	/* Default queue can't be destroyed. */
+	if (!id)
+		return -ENOENT;
+
+	return panfrost_submitqueue_destroy(priv, id);
+}
+
 int panfrost_unstable_ioctl_check(void)
 {
 	if (!unstable_ioctls)
 		return -
Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches
On Fri, 2 Jul 2021 11:13:16 -0400
Alyssa Rosenzweig wrote:

> ```
> > +/* Syncobj reference passed at job submission time to encode explicit
> > + * input/output fences.
> > + */
> > +struct drm_panfrost_syncobj_ref {
> > +	__u32 handle;
> > +	__u32 pad;
> > +	__u64 point;
> > +};
> ```
>
> What is handle? What is point?

Handle is a syncobj handle, point is the point in a syncobj timeline.
I'll document those fields.

> Why is there padding instead of putting point first?

We can move the point field first, but we need to keep the explicit
padding: the struct has to be 64bit aligned because of the __u64 field
(which the compiler takes care of), but if we don't have an explicit
padding, the unused 32bits are undefined, which might cause trouble if
we extend the struct at some point, since we sort of expect that old
userspace keeps this unused 32bit slot to 0, while new users set
non-zero values if they have to.

> ```
> > #define PANFROST_BO_REF_EXCLUSIVE	0x1
> > +#define PANFROST_BO_REF_NO_IMPLICIT_DEP	0x2
> ```
>
> This seems logically backwards. NO_IMPLICIT_DEP makes sense if we're
> trying to keep backwards compatibility, but here you're crafting a new
> interface totally from scratch. If anything, isn't BO_REF_IMPLICIT_DEP
> the flag you'd want?

AFAICT, all other drivers make the no-implicit-dep an opt-in, and I
didn't want to do things differently in panfrost. But if that's really
an issue, I can make it an opt-out.

> ```
> > +	/**
> > +	 * Stride of the jobs array (needed to ease extension of the
> > +	 * BATCH_SUBMIT ioctl). Should be set to
> > +	 * sizeof(struct drm_panfrost_job).
> > +	 */
> > +	__u32 job_stride;
> ...
> > +	/**
> > +	 * Stride of the BO and syncobj reference arrays (needed to ease
> > +	 * extension of the BATCH_SUBMIT ioctl). Should be set to
> > +	 * sizeof(struct drm_panfrost_bo_ref).
> > +	 */
> > +	__u32 bo_ref_stride;
> > +	__u32 syncobj_ref_stride;
> ```
>
> Hmm. I'm not /opposed/ and I know kbase uses strides but it seems like
> somewhat unwarranted complexity, and there is a combinatoric explosion
> here (if jobs, bo refs, and syncobj refs use 3 different versions, as
> this encoding permits... as opposed to just specifying a UABI version
> or something like that)

Sounds like a good idea. I'll add a version field and map that to a
tuple.

> ```
> > +	/**
> > +	 * If the submission fails, this encodes the index of the job
> > +	 * failed.
> > +	 */
> > +	__u32 fail_idx;
> ```
>
> What if multiple jobs fail?

We stop at the first failure. Note that it's not an execution failure,
but a submission failure (AKA, userspace passed wrong params, like
invalid BO or syncobj handles).
Re: [PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 16:05:30 +0100
Steven Price wrote:

> On 02/07/2021 15:32, Boris Brezillon wrote:
> > Needed to keep VkQueues isolated from each other.
> >
> > v3:
> > * Limit the number of submitqueue per context to 16
> > * Fix a deadlock
> >
> > Signed-off-by: Boris Brezillon
>
> 16 ought to be enough for anyone ;)
>
> Reviewed-by: Steven Price

Oops, forgot to change the submitqueue_get() prototype. Will address
that in v4.
Re: [PATCH v3 4/7] drm/panfrost: Add the ability to create submit queues
On Fri, 2 Jul 2021 17:49:10 +0200
Boris Brezillon wrote:

> On Fri, 2 Jul 2021 16:05:30 +0100
> Steven Price wrote:
>
> > On 02/07/2021 15:32, Boris Brezillon wrote:
> > > Needed to keep VkQueues isolated from each other.
> > >
> > > v3:
> > > * Limit the number of submitqueue per context to 16
> > > * Fix a deadlock
> > >
> > > Signed-off-by: Boris Brezillon
> >
> > 16 ought to be enough for anyone ;)
> >
> > Reviewed-by: Steven Price
>
> Oops, forgot to change the submitqueue_get() prototype. Will address
> that in v4.

I meant submitqueue_create().
Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches
On Fri, 2 Jul 2021 12:49:55 -0400
Alyssa Rosenzweig wrote:

> > > Why is there padding instead of putting point first?
> >
> > We can move the point field first, but we need to keep the explicit
> > padding: the struct has to be 64bit aligned because of the __u64 field
> > (which the compiler takes care of) but if we don't have an explicit
> > padding, the unused 32bits are undefined, which might cause trouble if
> > we extend the struct at some point, since we sort of expect that old
> > userspace keep this unused 32bit slot to 0, while new users set
> > non-zero values if they have to.
>
> Makes sense. Reordering still probably makes sense.

Actually, I can't re-order if I want the new in_syncs parser to work
with the old ioctl(), which you and Steven asked me to do :-).
Re: [PATCH v3 5/7] drm/panfrost: Add a new ioctl to submit batches
On Fri, 2 Jul 2021 12:49:55 -0400
Alyssa Rosenzweig wrote:

> > > ```
> > > > #define PANFROST_BO_REF_EXCLUSIVE	0x1
> > > > +#define PANFROST_BO_REF_NO_IMPLICIT_DEP	0x2
> > > ```
> > >
> > > This seems logically backwards. NO_IMPLICIT_DEP makes sense if we're
> > > trying to keep backwards compatibility, but here you're crafting a new
> > > interface totally from scratch. If anything, isn't BO_REF_IMPLICIT_DEP
> > > the flag you'd want?
> >
> > AFAICT, all other drivers make the no-implicit-dep an opt-in, and I
> > didn't want to do things differently in panfrost. But if that's really
> > an issue, I can make it an opt-out.
>
> I don't have strong feelings either way. I was just under the
> impression other drivers did this for b/w compat reasons which don't
> apply here.

Okay, I think I'll keep it like that unless there's a strong reason to
make no-implicit-dep the default. It's safer to oversync than to skip
the synchronization, so it does feel like something the user should
explicitly enable.

> > > Hmm. I'm not /opposed/ and I know kbase uses strides but it seems like
> > > somewhat unwarranted complexity, and there is a combinatoric explosion
> > > here (if jobs, bo refs, and syncobj refs use 3 different versions, as
> > > this encoding permits... as opposed to just specifying a UABI version
> > > or something like that)
> >
> > Sounds like a good idea. I'll add a version field and map that to a
> > tuple.
>
> Cc Steven, does this make sense?

I have this approach working, and I must admit I prefer it to the
per-object stride fields passed to the submit struct.
[PATCH v4 0/7] drm/panfrost: Add a new submit ioctl
Hello,

This is an attempt at providing a new submit ioctl that's more
Vulkan-friendly than the existing one. This ioctl:

1/ allows passing several out syncobjs, so we can easily update several
   fences/semaphores in a single ioctl() call
2/ allows passing several jobs, so we don't have to issue one ioctl per
   job-chain recorded in the command buffer
3/ supports disabling implicit dependencies as well as non-exclusive
   access to BOs, thus removing unnecessary synchronization

I've also been looking at adding {IN,OUT}_FENCE_FD support (allowing
one to pass at most one sync_file object in input and/or creating a
sync_file FD embedding the render out fence), but it's not entirely
clear to me when that's useful. Indeed, we can already do the
sync_file <-> syncobj conversion using the
SYNCOBJ_{FD_TO_HANDLE,HANDLE_TO_FD} ioctls if we have to. Note that,
unlike Turnip, PanVk is using syncobjs to implement vkQueueWaitIdle(),
so the syncobj -> sync_file conversion doesn't have to happen for each
submission, but maybe there's a good reason to use sync_files for that
too. Any feedback on that aspect would be useful, I guess.

Any feedback on this new ioctl is welcome; in particular, do you think
other things are missing/would be nice to have for Vulkan?

Regards,

Boris

P.S.: basic igt tests for these new ioctls are available there [1]

[1] https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/tree/panfrost-batch-submit

Changes in v4:
* Replace the desc strides by a version field
* Change the submitqueue_create() prototype to return the queue id
  directly
* Implement the old submit ioctl() as a simple wrapper around
  panfrost_submit_job()

Changes in v3:
* Fix a deadlock in the submitqueue logic
* Limit the number of submitqueue per context to 16

Boris Brezillon (7):
  drm/panfrost: Pass a job to panfrost_{acquire,attach}_object_fences()
  drm/panfrost: Move the mappings collection out of panfrost_lookup_bos()
  drm/panfrost: Add BO access flags to relax dependencies between jobs
  drm/panfrost: Add the ability to create submit queues
  drm/panfrost: Add a new ioctl to submit batches
  drm/panfrost: Advertise the SYNCOBJ_TIMELINE feature
  drm/panfrost: Bump minor version to reflect the feature additions

 drivers/gpu/drm/panfrost/Makefile             |   3 +-
 drivers/gpu/drm/panfrost/panfrost_device.h    |   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c       | 611 +-
 drivers/gpu/drm/panfrost/panfrost_job.c       |  89 ++-
 drivers/gpu/drm/panfrost/panfrost_job.h       |  10 +-
 .../gpu/drm/panfrost/panfrost_submitqueue.c   | 132 
 .../gpu/drm/panfrost/panfrost_submitqueue.h   |  26 +
 include/uapi/drm/panfrost_drm.h               | 112 
 8 files changed, 766 insertions(+), 219 deletions(-)
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_submitqueue.h

--
2.31.1