Re: [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3

2024-04-29 Thread Dave Airlie
> Currently, this can result in runtime PM issues on systems where memory
> is fragmented.
> Luckily, we don't actually need to allocate coherent memory for the page
> table thanks to being able to pass the GPU a radix3 page table for
> suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
> allocator for level 2. We continue using coherent allocations for lvl0 and
> 1, since they only take a single page.
>
> Signed-off-by: Lyude Paul 
> Cc: sta...@vger.kernel.org
> ---
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 71 ---
>  2 files changed, 47 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h 
> b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index 6f5d376d8fcc1..a11d16a16c3b2 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
>  };
>
>  struct nvkm_gsp_radix3 {
> -   struct nvkm_gsp_mem mem[3];
> +   struct nvkm_gsp_mem lvl0;
> +   struct nvkm_gsp_mem lvl1;
> +   struct sg_table lvl2;

This looks great. I thought we could go a step further and combine lvl0
and lvl1 into a single 2-page allocation, but that actually might be a
bad idea under memory pressure.

Dave.
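
For context on the structure being discussed: radix3 is a three-level
page table that GSP-RM walks to locate the suspend/resume data. As a
rough sketch (entry layout simplified, not the driver's actual
definitions): with 4 KiB pages and 8-byte entries, one page holds 512
entries, so a single-page lvl0 and lvl1 already cover 512 * 512 payload
pages (1 GiB), which is why only lvl2 ever needs a multi-page, sg-backed
allocation.

    /* sketch of how the three levels chain together */
    u64 *lvl0;  /* 1 page; entry 0 holds the DMA address of lvl1        */
    u64 *lvl1;  /* 1 page; entries hold DMA addresses of the lvl2 pages */
    u64 *lvl2;  /* N pages; entries hold DMA addresses of the data      */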


[PATCH] nouveau: rip out busy fence waits

2024-04-16 Thread Dave Airlie
From: Dave Airlie 

I'm pretty sure this optimisation is actually not a great idea,
and is racy with other things waiting for fences.

Just nuke it, there should be no need to do fence waits in a
busy CPU loop.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c|  2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c | 30 +
 drivers/gpu/drm/nouveau/nouveau_fence.h |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   |  2 +-
 6 files changed, 6 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 8a30f5a0525b..a4e8f625fce6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -902,7 +902,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
evict,
 * Without this the operation can timeout and we'll fallback to a
 * software copy, which might take several minutes to finish.
 */
-   nouveau_fence_wait(fence, false, false);
+   nouveau_fence_wait(fence, false);
	ret = ttm_bo_move_accel_cleanup(bo, &fence->base, evict, false,
	new_reg);
	nouveau_fence_unref(&fence);
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 7c97b2886807..66fca95c10c7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -72,7 +72,7 @@ nouveau_channel_idle(struct nouveau_channel *chan)
 
	ret = nouveau_fence_new(&fence, chan);
if (!ret) {
-   ret = nouveau_fence_wait(fence, false, false);
+   ret = nouveau_fence_wait(fence, false);
	nouveau_fence_unref(&fence);
}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 12feecf71e75..033a09cd3c8f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -128,7 +128,7 @@ static void nouveau_dmem_page_free(struct page *page)
 static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
 {
if (fence) {
-   nouveau_fence_wait(*fence, true, false);
+   nouveau_fence_wait(*fence, false);
nouveau_fence_unref(fence);
} else {
/*
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index c3ea3cd933cd..8de941379324 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -312,39 +312,11 @@ nouveau_fence_wait_legacy(struct dma_fence *f, bool intr, 
long wait)
return timeout - t;
 }
 
-static int
-nouveau_fence_wait_busy(struct nouveau_fence *fence, bool intr)
-{
-   int ret = 0;
-
-   while (!nouveau_fence_done(fence)) {
-   if (time_after_eq(jiffies, fence->timeout)) {
-   ret = -EBUSY;
-   break;
-   }
-
-   __set_current_state(intr ?
-   TASK_INTERRUPTIBLE :
-   TASK_UNINTERRUPTIBLE);
-
-   if (intr && signal_pending(current)) {
-   ret = -ERESTARTSYS;
-   break;
-   }
-   }
-
-   __set_current_state(TASK_RUNNING);
-   return ret;
-}
-
 int
-nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+nouveau_fence_wait(struct nouveau_fence *fence, bool intr)
 {
long ret;
 
-   if (!lazy)
-   return nouveau_fence_wait_busy(fence, intr);
-
	ret = dma_fence_wait_timeout(&fence->base, intr, 15 * HZ);
if (ret < 0)
return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h 
b/drivers/gpu/drm/nouveau/nouveau_fence.h
index bc13110bdfa4..88213014b675 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -23,7 +23,7 @@ void nouveau_fence_unref(struct nouveau_fence **);
 
 int  nouveau_fence_emit(struct nouveau_fence *);
 bool nouveau_fence_done(struct nouveau_fence *);
-int  nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
+int  nouveau_fence_wait(struct nouveau_fence *, bool intr);
 int  nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *, bool 
exclusive, bool intr);
 
 struct nouveau_fence_chan {
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 49c2bcbef129..f715e381da69 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -928,7 +928,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void 
*data,
}
 
if (sync) {
-   if (!(ret = nouveau_fence_wait(fence, false, false))) {
+   if (!(ret = nouveau_fence_wait(fence, false))) {

Re: [PATCH 000/156] drm/nouveau: replace "ioctl" interface between drm/nvkm

2024-04-16 Thread Dave Airlie
On Wed, 17 Apr 2024 at 10:57, Ben Skeggs  wrote:
>
> This is a series of cleanups that I intended on doing after posting
> the initial GSP-RM support several months ago, and have now had the
> opportunity to work on again.
>
> The main intention here is to replace the ioctl-like interface that
> sits between NVKM and the nouveau DRM driver with more direct calls,
> to reduce the call-chain complexity (and overhead).
>
> This is achieved by having NVKM return structs of info and function
> pointers specific to each class, along with an opaque pointer to its
> private data.  These are stored in the NVIF structs and used to call
> directly into an implementation without handle lookups and multiple
> layers of indirection.
>
> There's room for further cleanups and API changes from here, but for
> now most of the API usage is exactly as it was before, as the series
> has gotten quite large enough already.
>
> The first part of the series starts out by cleaning up some of the
> interfaces within the DRM driver, and reimplementing the subset of
> "ioctl" interfaces needed by userspace directly.
>
> A number of unused interfaces/function args are then removed so that
> they don't need to be ported, and some initial renames/preparations
> are made to the NVKM's user object implementations so that diffs of
> the next set of patches are more straightforward to read.
>
> I then go through each class in turn, starting from the root of the
> object tree (client), and working up from there.  The object ctors/
> dtors are ported first, followed by sclass/map/etc, followed by the
> class's methods, and then repeating the process with each of their
> children.
>
> Objects remain accessible with the "ioctl" interface throughout the
> changes (until their last use, after which they're removed from the
> object rb entirely) to allow each change to be done independently.
>
> After all classes have been ported, some final cleanups are made to
> the DRM driver to complete the series.

Welcome back!

Do you have a git tree with this in it? I think at least patch 25 got
stuck in moderation.

Have you tested nouveau GL and nvk on top of this?

Dave.
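
For readers unfamiliar with the design being described, the general
shape of the pattern is sketched below (names are hypothetical; the
real NVIF/NVKM definitions differ):

    /* Per-class function table returned by NVKM ... */
    struct example_impl {
            void (*del)(void *priv);
            int (*map)(void *priv, u64 *handle, u64 *length);
    };

    /* ... stored in the NVIF object next to NVKM's opaque state. */
    struct example_object {
            const struct example_impl *impl;
            void *priv;
    };

    /* Methods become direct calls, with no handle lookup or
     * ioctl-argument marshalling in between: */
    ret = obj->impl->map(obj->priv, &handle, &length);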


[PATCH] nouveau: fix instmem race condition around ptr stores

2024-04-10 Thread Dave Airlie
From: Dave Airlie 

Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0008
PGD 800114e6e067 P4D 800114e6e067 PUD 109046067 PMD 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS 
PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 
08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 
00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: :ac20c5857838 EFLAGS: 00010202
RAX:  RBX: 004d8001 RCX: 0001
RDX: 004d8001 RSI: 06d8 RDI: a07afe332180
RBP: 06d8 R08: ac20c5857ad0 R09: 0010
R10: 0001 R11: a07af27e2de0 R12: 001c
R13: ac20c5857ad0 R14: a07a96fe9040 R15: 001c
FS:  7fe395eed7c0() GS:a07e2c98() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 00011febe001 CR4: 003706f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: linux-stable
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
index a7f3fc342d87..dd5b5a17ece0 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -222,8 +222,11 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
void __iomem *map = NULL;
 
/* Already mapped? */
-   if (refcount_inc_not_zero(&iobj->maps))
+   if (refcount_inc_not_zero(&iobj->maps)) {
+   /* read barrier match the wmb on refcount set */
+   smp_rmb();
return iobj->map;
+   }
 
/* Take the lock, and re-check that another thread hasn't
 * already mapped the object in the meantime.
@@ -250,6 +253,8 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+   /* barrier to ensure the ptrs are written before refcount is set */
+   smp_wmb();
refcount_set(&iobj->maps, 1);
}
 
-- 
2.43.2
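
The ordering contract the two barriers establish, reduced to a minimal
model (a sketch only, not the driver code):

    #include <linux/refcount.h>

    struct obj {
            refcount_t maps;
            void *ptrs;                     /* the data being published */
    };

    /* slow path, runs under the lock */
    static void publish(struct obj *o, void *p)
    {
            o->ptrs = p;
            smp_wmb();                      /* order the ptrs store ...        */
            refcount_set(&o->maps, 1);      /* ... before maps becomes nonzero */
    }

    /* lockless fast path */
    static void *acquire(struct obj *o)
    {
            if (!refcount_inc_not_zero(&o->maps))
                    return NULL;            /* not mapped; take the slow path */
            smp_rmb();                      /* pairs with smp_wmb() above     */
            return o->ptrs;                 /* sees the published value       */
    }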



Re: [PATCH] nouveau: fix instmem race condition around ptr stores

2024-04-09 Thread Dave Airlie
On Tue, 9 Apr 2024 at 21:33, Danilo Krummrich  wrote:
>
> On 4/9/24 10:27, Lucas Stach wrote:
> > Am Dienstag, dem 09.04.2024 um 10:34 +1000 schrieb Dave Airlie:
> >> From: Dave Airlie 
> >>
> >> Running a lot of VK CTS in parallel against nouveau, once every
> >> few hours you might see something like this crash.
> >>
> >> BUG: kernel NULL pointer dereference, address: 0008
> >> PGD 800114e6e067 P4D 800114e6e067 PUD 109046067 PMD 0
> >> Oops:  [#1] PREEMPT SMP PTI
> >> CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
> >> Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I 
> >> AORUS PRO WIFI-CF, BIOS F8 11/05/2021
> >> RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
> >> Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 
> >> 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff 
> >> d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
> >> RSP: :ac20c5857838 EFLAGS: 00010202
> >> RAX:  RBX: 004d8001 RCX: 0001
> >> RDX: 004d8001 RSI: 06d8 RDI: a07afe332180
> >> RBP: 06d8 R08: ac20c5857ad0 R09: 0010
> >> R10: 0001 R11: a07af27e2de0 R12: 001c
> >> R13: ac20c5857ad0 R14: a07a96fe9040 R15: 001c
> >> FS:  7fe395eed7c0() GS:a07e2c98() 
> >> knlGS:
> >> CS:  0010 DS:  ES:  CR0: 80050033
> >> CR2: 0008 CR3: 00011febe001 CR4: 003706f0
> >> DR0:  DR1:  DR2: 
> >> DR3:  DR6: fffe0ff0 DR7: 0400
> >> Call Trace:
> >>
> >> ...
> >>
> >>   ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
> >>   ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
> >>   nvkm_vmm_iter+0x351/0xa20 [nouveau]
> >>   ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
> >>   ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
> >>   ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
> >>   ? __lock_acquire+0x3ed/0x2170
> >>   ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
> >>   nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
> >>   ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
> >>   ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
> >>   nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
> >>
> >> Adding any sort of useful debug usually makes it go away, so I hand
> >> wrote the function in a line, and debugged the asm.
> >>
> >> Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
> >> the nv50_instobj_acquire called from nvkm_kmap.
> >>
> >> If Thread A and Thread B both get to nv50_instobj_acquire around
> >> the same time, and Thread A hits the refcount_set line, and in
> >> lockstep thread B succeeds at refcount_inc_not_zero, there is a
> >> chance the ptrs value won't have been stored since refcount_set
> >> is unordered. Force a memory barrier here, I picked smp_mb, since
> >> we want it on all CPUs and it's write followed by a read.
>
> Good catch!
>
> >>
> >> Cc: linux-stable
> >> Signed-off-by: Dave Airlie 
> >> ---
> >>   drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c 
> >> b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
> >> index a7f3fc342d87..cbacc7b11f8c 100644
> >> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
> >> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
> >> @@ -250,6 +250,9 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
> >>  iobj->base.memory.ptrs = &nv50_instobj_fast;
> >>  else
> >>  iobj->base.memory.ptrs = &nv50_instobj_slow;
> >> +/* barrier to ensure ptrs is written before another thread
> >> +   does refcount_inc_not_zero successfully. */
> >> +smp_mb();
> >
> > Doesn't this miss the corresponding smp_rmb after
> > refcount_inc_not_zero? Without it a sufficiently speculating CPU might
> > still hoist the NULL ptr load across the refcount increase.
>
> Agree, also think this one could be smp_wmb() only.
>
> I also think it's reasonable to keep "the fast path refcount_inc_not_zero
> that doesn't take the lock", since the scope for this being potentially racy
> is limited to this function only.

I've been retesting with just barrier() here, since this seems at
least to be compiler related, but probably the smp_rmb/smp_wmb combo
is the safest answer across arches.

Dave.


Re: [PATCH] nouveau: fix devinit paths to only handle display on GSP.

2024-04-08 Thread Dave Airlie
On Mon, 8 Apr 2024 at 23:05, Timur Tabi  wrote:
>
> On Mon, 2024-04-08 at 16:42 +1000, Dave Airlie wrote:
> > This reverts:
> > nouveau/gsp: don't check devinit disable on GSP.
> > and applies a further fix.
> >
> > It turns out the open gpu driver, checks this register, but only for
> > display.
> >
> > Match that behaviour and only disable displays on turings.
>
> Why only on Turings?

The "only" is for disabling displays, not for the Turings. The Ampere
devinit path only handles displays; it never tries to disable other
engines, so it should be fine.

Dave.


[PATCH] nouveau: fix instmem race condition around ptr stores

2024-04-08 Thread Dave Airlie
From: Dave Airlie 

Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0008
PGD 800114e6e067 P4D 800114e6e067 PUD 109046067 PMD 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS 
PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 
08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 
00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: :ac20c5857838 EFLAGS: 00010202
RAX:  RBX: 004d8001 RCX: 0001
RDX: 004d8001 RSI: 06d8 RDI: a07afe332180
RBP: 06d8 R08: ac20c5857ad0 R09: 0010
R10: 0001 R11: a07af27e2de0 R12: 001c
R13: ac20c5857ad0 R14: a07a96fe9040 R15: 001c
FS:  7fe395eed7c0() GS:a07e2c98() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 00011febe001 CR4: 003706f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.

Cc: linux-stable
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
index a7f3fc342d87..cbacc7b11f8c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -250,6 +250,9 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+   /* barrier to ensure ptrs is written before another thread
+      does refcount_inc_not_zero successfully. */
+   smp_mb();
refcount_set(&iobj->maps, 1);
}
 
-- 
2.43.2



Re: [PATCH 1/2] drm/nouveau/kms/nv50-: Disable AUX bus for disconnected DP ports

2024-04-08 Thread Dave Airlie
On Fri, 5 Apr 2024 at 09:37, Lyude Paul  wrote:
>
> GSP has its own state for keeping track of whether or not a given display
> connector is plugged in or not, and enforces this state on the driver. In
> particular, AUX transactions on a DisplayPort connector which GSP says is
> disconnected can never succeed - and can in some cases even cause
> unexpected timeouts, which can trickle up to cause other problems. A good
> example of this is runtime power management: where we can actually get
> stuck trying to resume the GPU if a userspace application like fwupd tries
> accessing a drm_aux_dev for a disconnected port. This was an issue I hit a
> few times with my Slimbook Executive 16 - where trying to offload something
> to the discrete GPU would wake it up, and then potentially cause it to
> timeout as fwupd tried to immediately access the dp_aux_dev nodes for
> nouveau.
>
> Likewise: we don't really have any cases I know of where we'd want to
> ignore this state and try an aux transaction anyway - and failing pointless
> aux transactions immediately can even speed things up. So - let's start
> enabling/disabling the aux bus in nouveau_dp_detect() to fix this. We
> enable the aux bus during connector probing, and leave it enabled if we
> discover something is actually on the connector. Otherwise, we just shut it
> off.
>
> This should fix some people's runtime PM issues (like myself), and also get
> rid of quite of a lot of GSP error spam in dmesg.
>
> Signed-off-by: Lyude Paul 


For the two patches,

Reviewed-by: Dave Airlie 

> ---
>  drivers/gpu/drm/nouveau/nouveau_dp.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dp.c 
> b/drivers/gpu/drm/nouveau/nouveau_dp.c
> index fb06ee17d9e54..8b1be7dd64ebe 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dp.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dp.c
> @@ -232,6 +232,9 @@ nouveau_dp_detect(struct nouveau_connector *nv_connector,
> dpcd[DP_DPCD_REV] != 0)
> return NOUVEAU_DP_SST;
>
> +   // Ensure that the aux bus is enabled for probing
> +   drm_dp_dpcd_set_powered(&nv_connector->aux, true);
> +
> mutex_lock(&nv_encoder->dp.hpd_irq_lock);
> if (mstm) {
> /* If we're not ready to handle MST state changes yet, just
> @@ -293,6 +296,13 @@ nouveau_dp_detect(struct nouveau_connector *nv_connector,
> if (mstm && !mstm->suspended && ret != NOUVEAU_DP_MST)
> nv50_mstm_remove(mstm);
>
> +   /* GSP doesn't like when we try to do aux transactions on a port it 
> considers disconnected,
> +* and since we don't really have a usecase for that anyway - just 
> disable the aux bus here
> +* if we've decided the connector is disconnected
> +*/
> +   if (ret == NOUVEAU_DP_NONE)
> +   drm_dp_dpcd_set_powered(&nv_connector->aux, false);
> +
> mutex_unlock(&nv_encoder->dp.hpd_irq_lock);
> return ret;
>  }
> --
> 2.44.0
>


[PATCH] nouveau: fix devinit paths to only handle display on GSP.

2024-04-08 Thread Dave Airlie
This reverts:
nouveau/gsp: don't check devinit disable on GSP.
and applies a further fix.

It turns out the open gpu driver checks this register, but only for display.

Match that behaviour and only disable displays on turings.

Fixes: 5d4e8ae6e57b ("nouveau/gsp: don't check devinit disable on GSP.")
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c | 12 
 drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c  |  1 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c
index 7bcbc4895ec2..271bfa038f5b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c
@@ -25,6 +25,7 @@
 
 #include <subdev/bios.h>
 #include <subdev/bios/init.h>
+#include <subdev/gsp.h>
 
 void
 gm107_devinit_disable(struct nvkm_devinit *init)
@@ -33,10 +34,13 @@ gm107_devinit_disable(struct nvkm_devinit *init)
u32 r021c00 = nvkm_rd32(device, 0x021c00);
u32 r021c04 = nvkm_rd32(device, 0x021c04);
 
-   if (r021c00 & 0x00000001)
-   nvkm_subdev_disable(device, NVKM_ENGINE_CE, 0);
-   if (r021c00 & 0x00000004)
-   nvkm_subdev_disable(device, NVKM_ENGINE_CE, 2);
+   /* gsp only wants to enable/disable display */
+   if (!nvkm_gsp_rm(device->gsp)) {
+   if (r021c00 & 0x00000001)
+   nvkm_subdev_disable(device, NVKM_ENGINE_CE, 0);
+   if (r021c00 & 0x00000004)
+   nvkm_subdev_disable(device, NVKM_ENGINE_CE, 2);
+   }
	if (r021c04 & 0x00000001)
nvkm_subdev_disable(device, NVKM_ENGINE_DISP, 0);
 }
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
index 11b4c9c274a1..666eb93b1742 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
@@ -41,6 +41,7 @@ r535_devinit_new(const struct nvkm_devinit_func *hw,
 
rm->dtor = r535_devinit_dtor;
rm->post = hw->post;
+   rm->disable = hw->disable;
 
ret = nv50_devinit_new_(rm, device, type, inst, pdevinit);
if (ret)
-- 
2.44.0



[PATCH] nouveau/uvmm: fix addr/range calcs for remap operations

2024-03-27 Thread Dave Airlie
From: Dave Airlie 

dEQP-VK.sparse_resources.image_rebind.2d_array.r64i.128_128_8
was causing a remap operation like the below.

op_remap: prev: 003fffed 000f a5abd18a 

op_remap: next:
op_remap: unmap: 003fffed 0010 0
op_map: map: 003c 0001 5b1ba33c 000e

This was resulting in an unmap operation from 0x3fffed+0xf, 0x10
which was corrupting the pagetables and oopsing the kernel.

Fixes the prev + unmap range calcs to use start/end and map back to addr/range.
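
For illustration, with hypothetical round numbers: if the original VA
spans [0x1000, 0x2000) and prev keeps [0x1000, 0x1800), the unmap must
cover exactly [0x1800, 0x2000), i.e. addr = 0x1800 and range =
end - addr = 0x800. The old code left range at the original va.range
(0x1000), so it unmapped [0x1800, 0x2800), walking 0x800 bytes past the
end of the mapping.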

Signed-off-by: Dave Airlie 
Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI")
Cc: Danilo Krummrich 
---
 drivers/gpu/drm/nouveau/nouveau_uvmm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c 
b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 9675ef25b16d..87bce1a9d073 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -813,15 +813,15 @@ op_remap(struct drm_gpuva_op_remap *r,
struct drm_gpuva_op_unmap *u = r->unmap;
struct nouveau_uvma *uvma = uvma_from_va(u->va);
u64 addr = uvma->va.va.addr;
-   u64 range = uvma->va.va.range;
+   u64 end = uvma->va.va.addr + uvma->va.va.range;
 
if (r->prev)
addr = r->prev->va.addr + r->prev->va.range;
 
if (r->next)
-   range = r->next->va.addr - addr;
+   end = r->next->va.addr;
 
-   op_unmap_range(u, addr, range);
+   op_unmap_range(u, addr, end - addr);
 }
 
 static int
-- 
2.43.2



[PATCH] nouveau/gsp: don't check devinit disable on GSP.

2024-03-13 Thread Dave Airlie
From: Dave Airlie 

GSP should be handling this and I can see no evidence in opengpu
driver that this register should be touched.

Fixed acceleration on 2080 Ti GPUs.

Fixes: 15740541e8f0 ("drm/nouveau/devinit/tu102-: prepare for GSP-RM")

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
index 666eb93b1742..11b4c9c274a1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
@@ -41,7 +41,6 @@ r535_devinit_new(const struct nvkm_devinit_func *hw,
 
rm->dtor = r535_devinit_dtor;
rm->post = hw->post;
-   rm->disable = hw->disable;
 
ret = nv50_devinit_new_(rm, device, type, inst, pdevinit);
if (ret)
-- 
2.43.2



[PATCH] nouveau: reset the bo resource bus info after an eviction

2024-03-11 Thread Dave Airlie
From: Dave Airlie 

Later attempts to refault the bo won't happen and the whole
GPU goes to lunch. I think Christian's refactoring of this
code out to the driver broke this not-very-well-tested path.

Fixes: 141b15e59175 ("drm/nouveau: move io_reserve_lru handling into the driver 
v5")
Cc: Christian König 
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c6c544d7c911..a4e8f625fce6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1271,6 +1271,8 @@ nouveau_ttm_io_mem_reserve(struct ttm_device *bdev, 
struct ttm_resource *reg)
drm_vma_node_unmap(&nvbo->bo.base.vma_node,
   bdev->dev_mapping);
nouveau_ttm_io_mem_free_locked(drm, nvbo->bo.resource);
+   nvbo->bo.resource->bus.offset = 0;
+   nvbo->bo.resource->bus.addr = NULL;
goto retry;
}
 
-- 
2.43.2



[PATCH 2/2] nouveau/umem: rename nvkm client lock to umem_lock

2024-02-29 Thread Dave Airlie
From: Dave Airlie 

This lock is just protecting the umem list so name it as such.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/include/nvkm/core/client.h |  2 +-
 drivers/gpu/drm/nouveau/nvkm/core/client.c |  2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/umem.c | 12 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/client.h 
b/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
index 932c9fd0b2d8..6ca50e864e86 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
@@ -17,7 +17,7 @@ struct nvkm_client {
int (*event)(u64 token, void *argv, u32 argc);
 
struct list_head umem;
-   spinlock_t lock;
+   spinlock_t umem_lock;
 };
 
 int  nvkm_client_new(const char *name, u64 device, const char *cfg, const char 
*dbg,
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/client.c 
b/drivers/gpu/drm/nouveau/nvkm/core/client.c
index c55662937ab2..2885795f71d4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/client.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/client.c
@@ -183,6 +183,6 @@ nvkm_client_new(const char *name, u64 device, const char 
*cfg, const char *dbg,
	spin_lock_init(&client->obj_lock);
	client->event = event;
	INIT_LIST_HEAD(&client->umem);
-   spin_lock_init(&client->lock);
+   spin_lock_init(&client->umem_lock);
return 0;
 }
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/umem.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/umem.c
index e530bb8b3b17..a16cc20b835f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/umem.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/umem.c
@@ -42,14 +42,14 @@ nvkm_umem_search(struct nvkm_client *client, u64 handle)
	object = nvkm_object_search(client, handle, &nvkm_umem);
	if (IS_ERR(object)) {
	if (client != master) {
-   spin_lock(&master->lock);
+   spin_lock(&master->umem_lock);
	list_for_each_entry(umem, &master->umem, head) {
	if (umem->object.object == handle) {
	memory = nvkm_memory_ref(umem->memory);
	break;
	}
	}
-   spin_unlock(&master->lock);
+   spin_unlock(&master->umem_lock);
}
} else {
umem = nvkm_umem(object);
@@ -124,9 +124,9 @@ static void *
 nvkm_umem_dtor(struct nvkm_object *object)
 {
struct nvkm_umem *umem = nvkm_umem(object);
-   spin_lock(&umem->object.client->lock);
+   spin_lock(&umem->object.client->umem_lock);
	list_del_init(&umem->head);
-   spin_unlock(&umem->object.client->lock);
+   spin_unlock(&umem->object.client->umem_lock);
	nvkm_memory_unref(&umem->memory);
return umem;
 }
@@ -179,9 +179,9 @@ nvkm_umem_new(const struct nvkm_oclass *oclass, void *argv, 
u32 argc,
if (ret)
return ret;
 
-   spin_lock(&umem->object.client->lock);
+   spin_lock(&umem->object.client->umem_lock);
	list_add(&umem->head, &umem->object.client->umem);
-   spin_unlock(&umem->object.client->lock);
+   spin_unlock(&umem->object.client->umem_lock);
 
args->v0.page = nvkm_memory_page(umem->memory);
args->v0.addr = nvkm_memory_addr(umem->memory);
-- 
2.43.2



[PATCH 1/2] nouveau: lock the client object tree.

2024-02-29 Thread Dave Airlie
From: Dave Airlie 

It appears the client object tree has no locking unless I've missed
something else. Fix races around adding/removing client objects,
mostly vram bar mappings.

[ 4562.099306] general protection fault, probably for non-canonical address 
0x6677ed422bceb80c:  [#1] PREEMPT SMP PTI
[ 4562.099314] CPU: 2 PID: 23171 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
[ 4562.099324] Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO 
WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
[ 4562.099330] RIP: 0010:nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099503] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 
44 00 00 48 89 f8 48 85 f6 74 39 48 8b 87 a0 00 00 00 48 85 c0 74 12 <48> 8b 48 
f8 48 39 ce 73 15 48 8b 40 10 48 85 c0 75 ee 48 c7 c0 fe
[ 4562.099506] RSP: :a94cc420bbf8 EFLAGS: 00010206
[ 4562.099512] RAX: 6677ed422bceb814 RBX: 98108791f400 RCX: 9810f26b8f58
[ 4562.099517] RDX:  RSI: 9810f26b9158 RDI: 98108791f400
[ 4562.099519] RBP: 9810f26b9158 R08:  R09: 
[ 4562.099521] R10: a94cc420bc48 R11: 0001 R12: 9810f02a7cc0
[ 4562.099526] R13:  R14: 00ff R15: 0007
[ 4562.099528] FS:  7f629c5017c0() GS:98142c70() 
knlGS:
[ 4562.099534] CS:  0010 DS:  ES:  CR0: 80050033
[ 4562.099536] CR2: 7f629a882000 CR3: 00017019e004 CR4: 003706f0
[ 4562.099541] DR0:  DR1:  DR2: 
[ 4562.099542] DR3:  DR6: fffe0ff0 DR7: 0400
[ 4562.099544] Call Trace:
[ 4562.099555]  <TASK>
[ 4562.099573]  ? die_addr+0x36/0x90
[ 4562.099583]  ? exc_general_protection+0x246/0x4a0
[ 4562.099593]  ? asm_exc_general_protection+0x26/0x30
[ 4562.099600]  ? nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099730]  nvkm_ioctl+0xa1/0x250 [nouveau]
[ 4562.099861]  nvif_object_map_handle+0xc8/0x180 [nouveau]
[ 4562.099986]  nouveau_ttm_io_mem_reserve+0x122/0x270 [nouveau]
[ 4562.100156]  ? dma_resv_test_signaled+0x26/0xb0
[ 4562.100163]  ttm_bo_vm_fault_reserved+0x97/0x3c0 [ttm]
[ 4562.100182]  ? __mutex_unlock_slowpath+0x2a/0x270
[ 4562.100189]  nouveau_ttm_fault+0x69/0xb0 [nouveau]
[ 4562.100356]  __do_fault+0x32/0x150
[ 4562.100362]  do_fault+0x7c/0x560
[ 4562.100369]  __handle_mm_fault+0x800/0xc10
[ 4562.100382]  handle_mm_fault+0x17c/0x3e0
[ 4562.100388]  do_user_addr_fault+0x208/0x860
[ 4562.100395]  exc_page_fault+0x7f/0x200
[ 4562.100402]  asm_exc_page_fault+0x26/0x30
[ 4562.100412] RIP: 0033:0x9b9870
[ 4562.100419] Code: 85 a8 f7 ff ff 8b 8d 80 f7 ff ff 89 08 e9 18 f2 ff ff 0f 
1f 84 00 00 00 00 00 44 89 32 e9 90 fa ff ff 0f 1f 84 00 00 00 00 00 <44> 89 32 
e9 f8 f1 ff ff 0f 1f 84 00 00 00 00 00 66 44 89 32 e9 e7
[ 4562.100422] RSP: 002b:7fff9ba2dc70 EFLAGS: 00010246
[ 4562.100426] RAX: 0004 RBX: 0dd65e10 RCX: 00fff000
[ 4562.100428] RDX: 7f629a882000 RSI: 7f629a882000 RDI: 0066
[ 4562.100432] RBP: 7fff9ba2e570 R08:  R09: 000123ddf000
[ 4562.100434] R10: 0001 R11: 0246 R12: 7fff
[ 4562.100436] R13:  R14:  R15: 
[ 4562.100446]  </TASK>
[ 4562.100448] Modules linked in: nf_conntrack_netbios_ns 
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib 
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat 
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c 
nfnetlink cmac bnep sunrpc iwlmvm intel_rapl_msr intel_rapl_common 
snd_sof_pci_intel_cnl x86_pkg_temp_thermal intel_powerclamp 
snd_sof_intel_hda_common mac80211 coretemp snd_soc_acpi_intel_match kvm_intel 
snd_soc_acpi snd_soc_hdac_hda snd_sof_pci snd_sof_xtensa_dsp 
snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof kvm snd_sof_utils 
snd_soc_core snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_compress 
snd_hda_ext_core vfat fat snd_hda_intel snd_intel_dspcfg irqbypass iwlwifi 
snd_hda_codec snd_hwdep snd_hda_core btusb btrtl mei_hdcp iTCO_wdt rapl mei_pxp 
btintel snd_seq iTCO_vendor_support btbcm snd_seq_device intel_cstate bluetooth 
snd_pcm cfg80211 intel_wmi_thunderbolt wmi_bmof intel_uncore snd_timer mei_me 
snd ecdh_generic i2c_i801
[ 4562.100541]  ecc mei i2c_smbus soundcore rfkill intel_pch_thermal acpi_pad 
zram nouveau drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_gpuvm drm_exec 
mxm_wmi drm_display_helper drm_kms_helper drm crct10dif_pclmul crc32_pclmul 
nvme e1000e crc32c_intel nvme_core ghash_clmulni_intel video wmi 
pinctrl_cannonlake ip6_tables ip_tables fuse
[ 4562.100616] ---[ end trace  ]---

Signed-off-by: Dave Airlie 
Cc: sta...@vger.kernel.org
---
 .../drm/nouveau/include/nvkm/core/client.h|  1 +
 drivers/gpu/drm/nouveau/nvkm/core/client.c|  1 +
 drivers/gpu/drm/nouveau/nvkm/core/obje

[PATCH] nouveau: report byte usage in VRAM usage.

2024-02-25 Thread Dave Airlie
From: Dave Airlie 

Turns out usage is always in bytes not shifted.

Fixes: 72fa02fdf833 ("nouveau: add an ioctl to report vram usage")
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_abi16.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index cd14f993bdd1..80f74ee0fc78 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -269,7 +269,7 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
break;
case NOUVEAU_GETPARAM_VRAM_USED: {
	struct ttm_resource_manager *vram_mgr = ttm_manager_type(&drm->ttm.bdev, TTM_PL_VRAM);
-   getparam->value = (u64)ttm_resource_manager_usage(vram_mgr) << PAGE_SHIFT;
+   getparam->value = (u64)ttm_resource_manager_usage(vram_mgr);
break;
}
default:
-- 
2.43.2



Re: [PATCH] drm/nouveau: use dedicated wq for fence uevents work

2024-02-25 Thread Dave Airlie
On Sat, 24 Feb 2024 at 03:01, Danilo Krummrich  wrote:
>
> On Fri, Feb 23, 2024 at 10:14:53AM +1000, Dave Airlie wrote:
> > On Fri, 23 Feb 2024 at 00:45, Danilo Krummrich  wrote:
> > >
> > > Using the kernel global workqueue to signal fences can lead to
> > > unexpected deadlocks. Some other work (e.g. from a different driver)
> > > could directly or indirectly depend on this fence to be signaled.
> > > However, if the WQ_MAX_ACTIVE limit is reached by waiters, this can
> > > prevent the work signaling the fence from running.
> > >
> > > While this seems fairly unlikely, it's potentially exploitable.
> >
> > LGTM
> >
> > Reviewed-by: Dave Airlie 
> >
> > probably should go into drm-misc-fixes?
>
> Yes, however, 39126abc5e20 went in through drm-fixes directly it seems, since
> it's not in drm-misc-fixes.
>
> Guess you want me to cherry-pick 39126abc5e20 to drm-misc-fixes rather than 
> take
> this one through drm-fixes as well?

Nah don't ever cherry-pick, backmerge would be the correct thing, but
I'll just take it in through drm-fixes.

Dave.

>
> >
> > >
> > > Fixes: 39126abc5e20 ("nouveau: offload fence uevents work to workqueue")
> > > Signed-off-by: Danilo Krummrich 
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_drm.c   | 13 +++--
> > >  drivers/gpu/drm/nouveau/nouveau_drv.h   |  3 +++
> > >  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
> > >  drivers/gpu/drm/nouveau/nouveau_fence.h |  2 ++
> > >  4 files changed, 18 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> > > b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > index 6f6c31a9937b..6be202081077 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > @@ -598,9 +598,15 @@ nouveau_drm_device_init(struct drm_device *dev)
> > > goto fail_alloc;
> > > }
> > >
> > > +   drm->fence_wq = alloc_workqueue("nouveau_fence_wq", 0, 
> > > WQ_MAX_ACTIVE);
> > > +   if (!drm->fence_wq) {
> > > +   ret = -ENOMEM;
> > > +   goto fail_sched_wq;
> > > +   }
> > > +
> > > ret = nouveau_cli_init(drm, "DRM-master", &drm->master);
> > > if (ret)
> > > -   goto fail_wq;
> > > +   goto fail_fence_wq;
> > >
> > > ret = nouveau_cli_init(drm, "DRM", &drm->client);
> > > if (ret)
> > > @@ -670,7 +676,9 @@ nouveau_drm_device_init(struct drm_device *dev)
> > > nouveau_cli_fini(&drm->client);
> > >  fail_master:
> > > nouveau_cli_fini(&drm->master);
> > > -fail_wq:
> > > +fail_fence_wq:
> > > +   destroy_workqueue(drm->fence_wq);
> > > +fail_sched_wq:
> > > destroy_workqueue(drm->sched_wq);
> > >  fail_alloc:
> > > nvif_parent_dtor(&drm->parent);
> > > @@ -725,6 +733,7 @@ nouveau_drm_device_fini(struct drm_device *dev)
> > >
> > > nouveau_cli_fini(&drm->client);
> > > nouveau_cli_fini(&drm->master);
> > > +   destroy_workqueue(drm->fence_wq);
> > > destroy_workqueue(drm->sched_wq);
> > > nvif_parent_dtor(&drm->parent);
> > > mutex_destroy(&drm->clients_lock);
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
> > > b/drivers/gpu/drm/nouveau/nouveau_drv.h
> > > index 8a6d94c8b163..b43619a213a4 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> > > @@ -261,6 +261,9 @@ struct nouveau_drm {
> > > /* Workqueue used for channel schedulers. */
> > > struct workqueue_struct *sched_wq;
> > >
> > > +   /* Workqueue used to signal fences. */
> > > +   struct workqueue_struct *fence_wq;
> > > +
> > > /* context for accelerated drm-internal operations */
> > > struct nouveau_channel *cechan;
> > > struct nouveau_channel *channel;
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> > > b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > index 93f08f9479d8..c3ea3cd933cd 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > @@ -174,7 +174,7 @@ static int
> > >  nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc)

Re: [PATCH] drm/nouveau: use dedicated wq for fence uevents work

2024-02-22 Thread Dave Airlie
On Fri, 23 Feb 2024 at 00:45, Danilo Krummrich  wrote:
>
> Using the kernel global workqueue to signal fences can lead to
> unexpected deadlocks. Some other work (e.g. from a different driver)
> could directly or indirectly depend on this fence to be signaled.
> However, if the WQ_MAX_ACTIVE limit is reached by waiters, this can
> prevent the work signaling the fence from running.
>
> While this seems fairly unlikely, it's potentially exploitable.

LGTM

Reviewed-by: Dave Airlie 

probably should go into drm-misc-fixes?

>
> Fixes: 39126abc5e20 ("nouveau: offload fence uevents work to workqueue")
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c   | 13 +++--
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |  3 +++
>  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
>  drivers/gpu/drm/nouveau/nouveau_fence.h |  2 ++
>  4 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 6f6c31a9937b..6be202081077 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -598,9 +598,15 @@ nouveau_drm_device_init(struct drm_device *dev)
> goto fail_alloc;
> }
>
> +   drm->fence_wq = alloc_workqueue("nouveau_fence_wq", 0, WQ_MAX_ACTIVE);
> +   if (!drm->fence_wq) {
> +   ret = -ENOMEM;
> +   goto fail_sched_wq;
> +   }
> +
> ret = nouveau_cli_init(drm, "DRM-master", &drm->master);
> if (ret)
> -   goto fail_wq;
> +   goto fail_fence_wq;
>
> ret = nouveau_cli_init(drm, "DRM", &drm->client);
> if (ret)
> @@ -670,7 +676,9 @@ nouveau_drm_device_init(struct drm_device *dev)
> nouveau_cli_fini(&drm->client);
>  fail_master:
> nouveau_cli_fini(&drm->master);
> -fail_wq:
> +fail_fence_wq:
> +   destroy_workqueue(drm->fence_wq);
> +fail_sched_wq:
> destroy_workqueue(drm->sched_wq);
>  fail_alloc:
> nvif_parent_dtor(&drm->parent);
> @@ -725,6 +733,7 @@ nouveau_drm_device_fini(struct drm_device *dev)
>
> nouveau_cli_fini(&drm->client);
> nouveau_cli_fini(&drm->master);
> +   destroy_workqueue(drm->fence_wq);
> destroy_workqueue(drm->sched_wq);
> nvif_parent_dtor(&drm->parent);
> mutex_destroy(&drm->clients_lock);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
> b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index 8a6d94c8b163..b43619a213a4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> @@ -261,6 +261,9 @@ struct nouveau_drm {
> /* Workqueue used for channel schedulers. */
> struct workqueue_struct *sched_wq;
>
> +   /* Workqueue used to signal fences. */
> +   struct workqueue_struct *fence_wq;
> +
> /* context for accelerated drm-internal operations */
> struct nouveau_channel *cechan;
> struct nouveau_channel *channel;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 93f08f9479d8..c3ea3cd933cd 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -174,7 +174,7 @@ static int
>  nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 
> repc)
>  {
> struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), 
> event);
> -   schedule_work(&fctx->uevent_work);
> +   queue_work(fctx->wq, &fctx->uevent_work);
> return NVIF_EVENT_KEEP;
>  }
>
> @@ -194,6 +194,7 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
> struct nouveau_fence_cha
> INIT_LIST_HEAD(&fctx->pending);
> spin_lock_init(&fctx->lock);
> fctx->context = chan->drm->runl[chan->runlist].context_base + 
> chan->chid;
> +   fctx->wq = chan->drm->fence_wq;
>
> if (chan == chan->drm->cechan)
> strcpy(fctx->name, "copy engine channel");
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h 
> b/drivers/gpu/drm/nouveau/nouveau_fence.h
> index 8bc065acfe35..bc13110bdfa4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
> @@ -44,7 +44,9 @@ struct nouveau_fence_chan {
> u32 context;
> char name[32];
>
> +   struct workqueue_struct *wq;
> struct work_struct uevent_work;
> +
> struct nvif_event event;
> int notify_ref, dead, killed;
>  };
>
> base-commit: 1f4c6f11a557642505e5f403e0dfabbaff9c529a
> --
> 2.43.0
>


Re: [PATCH] nouveau/gsp: add kconfig option to enable GSP paths by default

2024-02-13 Thread Dave Airlie
(ignore this one, sent another just after)

On Wed, 14 Feb 2024 at 13:40, Dave Airlie  wrote:
>
> From: Dave Airlie 
>
> Turing and Ampere will continue to use the old paths by default,
> but we should allow distros to decide what the policy is.
>
> Signed-off-by: Dave Airlie 
> ---
>  drivers/gpu/drm/nouveau/Kconfig| 8 
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 3 +++
>  2 files changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
> index 1e6aaf95ff7c..ceef470c9fbf 100644
> --- a/drivers/gpu/drm/nouveau/Kconfig
> +++ b/drivers/gpu/drm/nouveau/Kconfig
> @@ -100,3 +100,11 @@ config DRM_NOUVEAU_SVM
> help
>   Say Y here if you want to enable experimental support for
>   Shared Virtual Memory (SVM).
> +
> +config DRM_NOUVEAU_GSP_DEFAULT
> +   bool "Use GSP firmware for Turing/Ampere (needs firmware installed)"
> +   depends on DRM_NOUVEAU
> +   default n
> +   help
> + Say Y here if you want to use the GSP codepaths by default on
> + Turing and Ampere GPUs.
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a41735ab6068..fadbb892 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -2313,6 +2313,9 @@ r535_gsp_load(struct nvkm_gsp *gsp, int ver, const 
> struct nvkm_gsp_fwif *fwif)
> struct nvkm_subdev *subdev = &gsp->subdev;
> int ret;
>
> +#if IS_ENABLED(CONFIG_DRM_NOUVEAU_GSP_DEFAULT)
> +   fwif->enable = true;
> +#endif
> if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", fwif->enable))
> return -EINVAL;
>
> --
> 2.43.0
>


[PATCH] nouveau/gsp: add kconfig option to enable GSP paths by default

2024-02-13 Thread Dave Airlie
From: Dave Airlie 

Turing and Ampere will continue to use the old paths by default,
but we should allow distros to decide what the policy is.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/Kconfig| 8 
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 6 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index 1e6aaf95ff7c..ceef470c9fbf 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -100,3 +100,11 @@ config DRM_NOUVEAU_SVM
help
  Say Y here if you want to enable experimental support for
  Shared Virtual Memory (SVM).
+
+config DRM_NOUVEAU_GSP_DEFAULT
+   bool "Use GSP firmware for Turing/Ampere (needs firmware installed)"
+   depends on DRM_NOUVEAU
+   default n
+   help
+ Say Y here if you want to use the GSP codepaths by default on
+ Turing and Ampere GPUs.
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index a41735ab6068..a64c81385682 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -2312,8 +2312,12 @@ r535_gsp_load(struct nvkm_gsp *gsp, int ver, const 
struct nvkm_gsp_fwif *fwif)
 {
	struct nvkm_subdev *subdev = &gsp->subdev;
int ret;
+   bool enable_gsp = fwif->enable;
 
-   if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", fwif->enable))
+#if IS_ENABLED(CONFIG_DRM_NOUVEAU_GSP_DEFAULT)
+   enable_gsp = true;
+#endif
+   if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", enable_gsp))
return -EINVAL;
 
	if ((ret = r535_gsp_load_fw(gsp, "gsp", fwif->ver, &gsp->fws.rm)) ||
-- 
2.43.0



[PATCH] nouveau/gsp: add kconfig option to enable GSP paths by default

2024-02-13 Thread Dave Airlie
From: Dave Airlie 

Turing and Ampere will continue to use the old paths by default,
but we should allow distros to decide what the policy is.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/Kconfig| 8 
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index 1e6aaf95ff7c..ceef470c9fbf 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -100,3 +100,11 @@ config DRM_NOUVEAU_SVM
help
  Say Y here if you want to enable experimental support for
  Shared Virtual Memory (SVM).
+
+config DRM_NOUVEAU_GSP_DEFAULT
+   bool "Use GSP firmware for Turing/Ampere (needs firmware installed)"
+   depends on DRM_NOUVEAU
+   default n
+   help
+ Say Y here if you want to use the GSP codepaths by default on
+ Turing and Ampere GPUs.
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index a41735ab6068..fadbb892 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -2313,6 +2313,9 @@ r535_gsp_load(struct nvkm_gsp *gsp, int ver, const struct 
nvkm_gsp_fwif *fwif)
	struct nvkm_subdev *subdev = &gsp->subdev;
int ret;
 
+#if IS_ENABLED(CONFIG_DRM_NOUVEAU_GSP_DEFAULT)
+   fwif->enable = true;
+#endif
if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", fwif->enable))
return -EINVAL;
 
-- 
2.43.0



Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-09 Thread Dave Airlie
On Fri, 2 Feb 2024 at 10:06, Danilo Krummrich  wrote:
>
> nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
> their corresponding *_fini() counterpart. This can lead to
> nouveau_sched_fini() being called without struct nouveau_sched ever
> being initialized in the first place.
>
> Instead of embedding struct nouveau_sched into struct nouveau_cli and
> struct nouveau_chan_abi16, allocate struct nouveau_sched separately,
> such that we can check for the corresponding pointer to be NULL in the
> particular *_fini() functions.
>
> It makes sense to allocate struct nouveau_sched separately anyway, since
> in a subsequent commit we can also avoid to allocate a struct
> nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the
> VM_BIND uAPI has been disabled due to the legacy uAPI being used.

Looks good,

for the series
Reviewed-by: Dave Airlie 

>
> Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity 
> relationship")
> Reported-by: Timur Tabi 
> Closes: 
> https://lore.kernel.org/nouveau/20240131213917.1545604-1-tt...@nvidia.com/
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_abi16.c | 10 ---
>  drivers/gpu/drm/nouveau/nouveau_abi16.h |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  7 +++--
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 38 +++--
>  drivers/gpu/drm/nouveau/nouveau_sched.h |  6 ++--
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
>  8 files changed, 53 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index a04156ca8390..ca4b5ab3e59e 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
> struct nouveau_abi16_ntfy *ntfy, *temp;
>
> /* Cancel all jobs from the entity's queue. */
> -   drm_sched_entity_fini(&chan->sched.entity);
> +   if (chan->sched)
> +   drm_sched_entity_fini(&chan->sched->entity);
>
> if (chan->chan)
> nouveau_channel_idle(chan->chan);
>
> -   nouveau_sched_fini(&chan->sched);
> +   if (chan->sched)
> +   nouveau_sched_destroy(&chan->sched);
>
> /* cleanup notifier state */
> list_for_each_entry_safe(ntfy, temp, &chan->notifiers, head) {
> @@ -337,8 +339,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
> if (ret)
> goto done;
>
> -   ret = nouveau_sched_init(&chan->sched, drm, drm->sched_wq,
> -chan->chan->dma.ib_max);
> +   ret = nouveau_sched_create(&chan->sched, drm, drm->sched_wq,
> +  chan->chan->dma.ib_max);
> if (ret)
> goto done;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> index 1f5e243c0c75..11c8c4a80079 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> @@ -26,7 +26,7 @@ struct nouveau_abi16_chan {
> struct nouveau_bo *ntfy;
> struct nouveau_vma *ntfy_vma;
> struct nvkm_mm  heap;
> -   struct nouveau_sched sched;
> +   struct nouveau_sched *sched;
>  };
>
>  struct nouveau_abi16 {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 6f6c31a9937b..a947e1d5f309 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
> WARN_ON(!list_empty(&cli->worker));
>
> usif_client_fini(cli);
> -   nouveau_sched_fini(&cli->sched);
> +   if (cli->sched)
> +   nouveau_sched_destroy(&cli->sched);
> if (uvmm)
> nouveau_uvmm_fini(uvmm);
> nouveau_vmm_fini(&cli->svm);
> @@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char 
> *sname,
> cli->mem = &mems[ret];
>
> /* Don't pass in the (shared) sched_wq in order to let
> -* nouveau_sched_init() create a dedicated one for VM_BIND jobs.
> +* nouveau_sched_create() create a dedicated one for VM_BIND jobs.
>  *
>  * This is required to ensure that for VM_BIND jobs free_job() work 
> and
>  * run_job() work can always run concurrently and hence, free_job() 
> work
> @@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char 

Re: [PATCH] nouveau: offload fence uevents work to workqueue

2024-02-05 Thread Dave Airlie
On Tue, 6 Feb 2024 at 02:22, Danilo Krummrich  wrote:
>
> On 1/29/24 02:50, Dave Airlie wrote:
> > From: Dave Airlie 
> >
> > This should break the deadlock between the fctx lock and the irq lock.
> >
> > This offloads the fence processing work from the irq handler into a workqueue.
> >
> > Signed-off-by: Dave Airlie 
>
> Nouveau's scheduler uses a dedicated wq, hence from this perspective it's
> safe deferring fence signalling to the kernel global wq. However, I wonder
> if we could create deadlocks by building dependency chains into other
> drivers / kernel code that, by chance, makes use of the kernel global wq as
> well.
>
> Admittedly, even if, it's gonna be extremely unlikely given that
> WQ_MAX_ACTIVE == 512. But maybe it'd be safer to use a dedicated wq.
>
> Also, do we need to CC stable?

I pushed this to Linus at the end of last week, since the hangs in 6.7
take out the complete system and I wanted it in stable.

It might be safer to use a dedicated wq. Is the concern that someone is
waiting on a fence in a workqueue somewhere else, so we will never
signal it?

Dave.
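
(A concrete, hypothetical shape of that concern: another driver queues
WQ_MAX_ACTIVE work items on the system workqueue, each blocking in
dma_fence_wait() on a nouveau fence. The uevent work that would signal
those fences sits on the same system workqueue behind the blocked
items, so it never runs and nothing makes progress. A dedicated wq
gives the signalling path its own forward-progress guarantee.)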


Re: [PATCH] drm/nouveau: fix several DMA buffer leaks

2024-02-01 Thread Dave Airlie
On Fri, 2 Feb 2024 at 07:33, Timur Tabi  wrote:
>
> On Thu, 2024-02-01 at 13:55 -0600, Timur Tabi wrote:
> > +static void
> > +nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
> > +{
> > +   if (mem->data) {
> > +   dma_free_coherent(gsp->subdev.device->dev, mem->size, mem-
> > >data, mem->addr);
> > +   mem->data = NULL;
> > +   mem->addr = 0;
> > +   }
> > +}
>
> Dave, what do you think about doing this:
>
> if (mem->data) {
>
> memset(mem->data, 0, mem->size);   <---
>
> dma_free_coherent(gsp->subdev.device->dev, mem->size,
> mem->data, mem->addr);
> mem->data = NULL;
> mem->addr = 0;
> }
>
> This would help situations where a buffer is access by GSP-RM after we think
> it's safe to free it.

I'd prefer to fill it with a poison value rather than 0

Dave.
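
A minimal sketch of the poison-on-free idea (the 0xaa fill value is
illustrative; the thread doesn't settle on one):

    static void
    nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
    {
            if (mem->data) {
                    /* Poison rather than zero, so a late GSP-RM access
                     * reads an obviously-bogus pattern instead of
                     * plausible-looking zeroes. */
                    memset(mem->data, 0xaa, mem->size);
                    dma_free_coherent(gsp->subdev.device->dev, mem->size,
                                      mem->data, mem->addr);
                    mem->data = NULL;
                    mem->addr = 0;
            }
    }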


[PATCH] nouveau/gsp: use correct size for registry rpc.

2024-01-29 Thread Dave Airlie
From: Dave Airlie 

Timur pointed this out before, and it just slipped my mind: the RPC's
size field needs to cover the string data appended after the entry
array, not just the header. Fixing it might help some things work
better around PCIe power management.

Fixes: 8d55b0a940bb ("nouveau/gsp: add some basic registry entries.")
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 9ee58e2a0eb2..5e1fa176aac4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1078,7 +1078,6 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
if (IS_ERR(rpc))
return PTR_ERR(rpc);
 
-   rpc->size = sizeof(*rpc);
rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
 
str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]);
@@ -1094,6 +1093,7 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
strings += name_len;
str_offset += name_len;
}
+   rpc->size = str_offset;
 
return nvkm_gsp_rpc_wr(gsp, rpc, false);
 }
-- 
2.43.0



[PATCH] nouveau: offload fence uevents work to workqueue

2024-01-28 Thread Dave Airlie
From: Dave Airlie 

This should break the deadlock between the fctx lock and the irq lock.

This offloads the processing work from the irq handler into a workqueue.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 24 ++--
 drivers/gpu/drm/nouveau/nouveau_fence.h |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ca762ea55413..93f08f9479d8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -103,6 +103,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, 
int error)
 void
 nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
 {
+   cancel_work_sync(>uevent_work);
nouveau_fence_context_kill(fctx, 0);
nvif_event_dtor(>event);
fctx->dead = 1;
@@ -145,12 +146,13 @@ nouveau_fence_update(struct nouveau_channel *chan, struct 
nouveau_fence_chan *fc
return drop;
 }
 
-static int
-nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 
repc)
+static void
+nouveau_fence_uevent_work(struct work_struct *work)
 {
-   struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), 
event);
+   struct nouveau_fence_chan *fctx = container_of(work, struct 
nouveau_fence_chan,
+  uevent_work);
unsigned long flags;
-   int ret = NVIF_EVENT_KEEP;
+   int drop = 0;
 
spin_lock_irqsave(>lock, flags);
if (!list_empty(>pending)) {
@@ -160,11 +162,20 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
*event, void *repv, u32 repc
fence = list_entry(fctx->pending.next, typeof(*fence), head);
chan = rcu_dereference_protected(fence->channel, 
lockdep_is_held(>lock));
if (nouveau_fence_update(chan, fctx))
-   ret = NVIF_EVENT_DROP;
+   drop = 1;
}
+   if (drop)
+   nvif_event_block(>event);
+
spin_unlock_irqrestore(>lock, flags);
+}
 
-   return ret;
+static int
+nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 
repc)
+{
+   struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), 
event);
+   schedule_work(>uevent_work);
+   return NVIF_EVENT_KEEP;
 }
 
 void
@@ -178,6 +189,7 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
struct nouveau_fence_cha
} args;
int ret;
 
+   INIT_WORK(>uevent_work, nouveau_fence_uevent_work);
INIT_LIST_HEAD(>flip);
INIT_LIST_HEAD(>pending);
spin_lock_init(>lock);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h 
b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 64d33ae7f356..8bc065acfe35 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -44,6 +44,7 @@ struct nouveau_fence_chan {
u32 context;
char name[32];
 
+   struct work_struct uevent_work;
struct nvif_event event;
int notify_ref, dead, killed;
 };
-- 
2.43.0



Re: [PATCH] nouveau: rip out fence irq allow/block sequences.

2024-01-25 Thread Dave Airlie
On Fri, 26 Jan 2024 at 04:28, Daniel Vetter  wrote:
>
> On Tue, Jan 23, 2024 at 05:25:38PM +1000, Dave Airlie wrote:
> > From: Dave Airlie 
> >
> > fences are signalled on nvidia hw using non-stall interrupts.
> >
> > non-stall interrupts are not latched from my reading.
> >
> > When nouveau emits a fence, it requests a NON_STALL signalling,
> > but it only calls the interface to allow the non-stall irq to happen
> > after it has already emitted the fence. A recent change
> > eacabb546271 ("nouveau: push event block/allowing out of the fence context")
> > made this worse by pushing out the fence allow/block to a workqueue.
> >
> > However, I can't see how this could ever work reliably, since by the
> > time enable signalling is called, the semaphore has already been
> > emitted to the ring, and the hw could already have tried to set the
> > bits, but they've been masked off. Changing the allowed mask later
> > won't make the interrupt fire again.
> >
> > For now rip all of this out.
> >
> > This fixes a bunch of stalls seen running VK CTS sync tests.
> >
> > Signed-off-by: Dave Airlie 
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_fence.c | 77 +
> >  drivers/gpu/drm/nouveau/nouveau_fence.h |  2 -
> >  2 files changed, 16 insertions(+), 63 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> > b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > index 5057d976fa57..d6d50cdccf75 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > @@ -50,24 +50,14 @@ nouveau_fctx(struct nouveau_fence *fence)
> >   return container_of(fence->base.lock, struct nouveau_fence_chan, 
> > lock);
> >  }
> >
> > -static int
> > +static void
> >  nouveau_fence_signal(struct nouveau_fence *fence)
> >  {
> > - int drop = 0;
> > -
> >   dma_fence_signal_locked(>base);
> >   list_del(>head);
> >   rcu_assign_pointer(fence->channel, NULL);
> >
> > - if (test_bit(DMA_FENCE_FLAG_USER_BITS, >base.flags)) {
> > - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> > -
> > - if (atomic_dec_and_test(>notify_ref))
> > - drop = 1;
> > - }
> > -
> >   dma_fence_put(>base);
> > - return drop;
> >  }
> >
> >  static struct nouveau_fence *
> > @@ -93,8 +83,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan 
> > *fctx, int error)
> >   if (error)
> >   dma_fence_set_error(>base, error);
> >
> > - if (nouveau_fence_signal(fence))
> > - nvif_event_block(>event);
> > + nouveau_fence_signal(fence);
> >   }
> >   fctx->killed = 1;
> >   spin_unlock_irqrestore(>lock, flags);
> > @@ -103,8 +92,8 @@ nouveau_fence_context_kill(struct nouveau_fence_chan 
> > *fctx, int error)
> >  void
> >  nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
> >  {
> > - cancel_work_sync(>allow_block_work);
> >   nouveau_fence_context_kill(fctx, 0);
> > + nvif_event_block(>event);
> >   nvif_event_dtor(>event);
> >   fctx->dead = 1;
> >
> > @@ -127,11 +116,10 @@ nouveau_fence_context_free(struct nouveau_fence_chan 
> > *fctx)
> >   kref_put(>fence_ref, nouveau_fence_context_put);
> >  }
> >
> > -static int
> > +static void
> >  nouveau_fence_update(struct nouveau_channel *chan, struct 
> > nouveau_fence_chan *fctx)
> >  {
> >   struct nouveau_fence *fence;
> > - int drop = 0;
> >   u32 seq = fctx->read(chan);
> >
> >   while (!list_empty(>pending)) {
> > @@ -140,10 +128,8 @@ nouveau_fence_update(struct nouveau_channel *chan, 
> > struct nouveau_fence_chan *fc
> >   if ((int)(seq - fence->base.seqno) < 0)
> >   break;
> >
> > - drop |= nouveau_fence_signal(fence);
> > + nouveau_fence_signal(fence);
> >   }
> > -
> > - return drop;
> >  }
> >
> >  static int
> > @@ -160,26 +146,13 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
> > *event, void *repv, u32 repc
> >
> >   fence = list_entry(fctx->pending.next, typeof(*fence), head);
> >   chan = rcu_dereference_protected(fence->channel, 
> > lo

[PATCH] nouveau: rip out fence irq allow/block sequences.

2024-01-22 Thread Dave Airlie
From: Dave Airlie 

fences are signalled on nvidia hw using non-stall interrupts.

non-stall interrupts are not latched from my reading.

When nouveau emits a fence, it requests a NON_STALL signalling,
but it only calls the interface to allow the non-stall irq to happen
after it has already emitted the fence. A recent change
eacabb546271 ("nouveau: push event block/allowing out of the fence context")
made this worse by pushing out the fence allow/block to a workqueue.

However, I can't see how this could ever work reliably, since by the
time enable signalling is called, the semaphore has already been
emitted to the ring, and the hw could already have tried to set the
bits, but they've been masked off. Changing the allowed mask later
won't make the interrupt fire again.

For now rip all of this out.

This fixes a bunch of stalls seen running VK CTS sync tests.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 77 +
 drivers/gpu/drm/nouveau/nouveau_fence.h |  2 -
 2 files changed, 16 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 5057d976fa57..d6d50cdccf75 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -50,24 +50,14 @@ nouveau_fctx(struct nouveau_fence *fence)
return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
 }
 
-static int
+static void
 nouveau_fence_signal(struct nouveau_fence *fence)
 {
-   int drop = 0;
-
dma_fence_signal_locked(>base);
list_del(>head);
rcu_assign_pointer(fence->channel, NULL);
 
-   if (test_bit(DMA_FENCE_FLAG_USER_BITS, >base.flags)) {
-   struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
-
-   if (atomic_dec_and_test(>notify_ref))
-   drop = 1;
-   }
-
dma_fence_put(>base);
-   return drop;
 }
 
 static struct nouveau_fence *
@@ -93,8 +83,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, 
int error)
if (error)
dma_fence_set_error(>base, error);
 
-   if (nouveau_fence_signal(fence))
-   nvif_event_block(>event);
+   nouveau_fence_signal(fence);
}
fctx->killed = 1;
spin_unlock_irqrestore(>lock, flags);
@@ -103,8 +92,8 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, 
int error)
 void
 nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
 {
-   cancel_work_sync(>allow_block_work);
nouveau_fence_context_kill(fctx, 0);
+   nvif_event_block(>event);
nvif_event_dtor(>event);
fctx->dead = 1;
 
@@ -127,11 +116,10 @@ nouveau_fence_context_free(struct nouveau_fence_chan 
*fctx)
kref_put(>fence_ref, nouveau_fence_context_put);
 }
 
-static int
+static void
 nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan 
*fctx)
 {
struct nouveau_fence *fence;
-   int drop = 0;
u32 seq = fctx->read(chan);
 
while (!list_empty(>pending)) {
@@ -140,10 +128,8 @@ nouveau_fence_update(struct nouveau_channel *chan, struct 
nouveau_fence_chan *fc
if ((int)(seq - fence->base.seqno) < 0)
break;
 
-   drop |= nouveau_fence_signal(fence);
+   nouveau_fence_signal(fence);
}
-
-   return drop;
 }
 
 static int
@@ -160,26 +146,13 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
*event, void *repv, u32 repc
 
fence = list_entry(fctx->pending.next, typeof(*fence), head);
chan = rcu_dereference_protected(fence->channel, 
lockdep_is_held(>lock));
-   if (nouveau_fence_update(chan, fctx))
-   ret = NVIF_EVENT_DROP;
+   nouveau_fence_update(chan, fctx);
}
spin_unlock_irqrestore(>lock, flags);
 
return ret;
 }
 
-static void
-nouveau_fence_work_allow_block(struct work_struct *work)
-{
-   struct nouveau_fence_chan *fctx = container_of(work, struct 
nouveau_fence_chan,
-  allow_block_work);
-
-   if (atomic_read(>notify_ref) == 0)
-   nvif_event_block(>event);
-   else
-   nvif_event_allow(>event);
-}
-
 void
 nouveau_fence_context_new(struct nouveau_channel *chan, struct 
nouveau_fence_chan *fctx)
 {
@@ -191,7 +164,6 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
struct nouveau_fence_cha
} args;
int ret;
 
-   INIT_WORK(>allow_block_work, nouveau_fence_work_allow_block);
INIT_LIST_HEAD(>flip);
INIT_LIST_HEAD(>pending);
spin_lock_init(>lock);
@@ -216,6 +188,12 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
struct nouveau_fence_cha
  

[PATCH] nouveau/vmm: don't set addr on the fail path to avoid warning

2024-01-17 Thread Dave Airlie
From: Dave Airlie 

nvif_vmm_put gets called if addr is set, but if the allocation
fails we don't need to call put; otherwise we get a warning like:

[523232.435671] [ cut here ]
[523232.435674] WARNING: CPU: 8 PID: 1505697 at 
drivers/gpu/drm/nouveau/nvif/vmm.c:68 nvif_vmm_put+0x72/0x80 [nouveau]
[523232.435795] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer 
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set 
nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr 
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common 
isst_if_common iwlmvm nfit libnvdimm vfat fat x86_pkg_temp_thermal 
intel_powerclamp mac80211 snd_soc_avs snd_soc_hda_codec coretemp 
snd_hda_ext_core snd_soc_core snd_hda_codec_realtek kvm_intel 
snd_hda_codec_hdmi snd_compress snd_hda_codec_generic ac97_bus 
snd_pcm_dmaengine snd_hda_intel libarc4 snd_intel_dspcfg snd_intel_sdw_acpi 
snd_hda_codec kvm iwlwifi snd_hda_core btusb snd_hwdep btrtl snd_seq btintel 
irqbypass btbcm rapl snd_seq_device eeepc_wmi btmtk intel_cstate iTCO_wdt 
cfg80211 snd_pcm asus_wmi bluetooth intel_pmc_bxt iTCO_vendor_support snd_timer 
ledtrig_audio pktcdvd snd mei_me
[523232.435828]  sparse_keymap intel_uncore i2c_i801 platform_profile wmi_bmof 
mei pcspkr ioatdma soundcore i2c_smbus rfkill idma64 dca joydev acpi_tad loop 
zram nouveau drm_ttm_helper ttm video drm_exec drm_gpuvm gpu_sched 
crct10dif_pclmul i2c_algo_bit nvme crc32_pclmul crc32c_intel drm_display_helper 
polyval_clmulni nvme_core polyval_generic e1000e mxm_wmi cec 
ghash_clmulni_intel r8169 sha512_ssse3 nvme_common wmi pinctrl_sunrisepoint uas 
usb_storage ip6_tables ip_tables fuse
[523232.435849] CPU: 8 PID: 1505697 Comm: gnome-shell Tainted: GW   
   6.6.0-rc7-nvk-uapi+ #12
[523232.435851] Hardware name: System manufacturer System Product Name/ROG 
STRIX X299-E GAMING II, BIOS 1301 09/24/2021
[523232.435852] RIP: 0010:nvif_vmm_put+0x72/0x80 [nouveau]
[523232.435934] Code: 00 00 48 89 e2 be 02 00 00 00 48 c7 04 24 00 00 00 00 48 
89 44 24 08 e8 fc bf ff ff 85
c0 75 0a 48 c7 43 08 00 00 00 00 eb b3 <0f> 0b eb f2 e8 f5 c9 b2 e6 0f 1f 44 00 
00 90 90 90 90 90 90 90 90
[523232.435936] RSP: 0018:c900077ffbd8 EFLAGS: 00010282
[523232.435937] RAX: fffe RBX: c900077ffc00 RCX: 
0010
[523232.435938] RDX: 0010 RSI: c900077ffb38 RDI: 
c900077ffbd8
[523232.435940] RBP: 888e1c4f2140 R08:  R09: 

[523232.435940] R10:  R11:  R12: 
888503811800
[523232.435941] R13: c900077ffca0 R14: 888e1c4f2140 R15: 
88810317e1e0
[523232.435942] FS:  7f933a769640() GS:88905fa0() 
knlGS:
[523232.435943] CS:  0010 DS:  ES:  CR0: 80050033
[523232.435944] CR2: 7f930bef7000 CR3: 0005d0322001 CR4: 
003706e0
[523232.435945] DR0:  DR1:  DR2: 

[523232.435946] DR3:  DR6: fffe0ff0 DR7: 
0400
[523232.435964] Call Trace:
[523232.435965]  
[523232.435966]  ? nvif_vmm_put+0x72/0x80 [nouveau]
[523232.436051]  ? __warn+0x81/0x130
[523232.436055]  ? nvif_vmm_put+0x72/0x80 [nouveau]
[523232.436138]  ? report_bug+0x171/0x1a0
[523232.436142]  ? handle_bug+0x3c/0x80
[523232.436144]  ? exc_invalid_op+0x17/0x70
[523232.436145]  ? asm_exc_invalid_op+0x1a/0x20
[523232.436149]  ? nvif_vmm_put+0x72/0x80 [nouveau]
[523232.436230]  ? nvif_vmm_put+0x64/0x80 [nouveau]
[523232.436342]  nouveau_vma_del+0x80/0xd0 [nouveau]
[523232.436506]  nouveau_vma_new+0x1a0/0x210 [nouveau]
[523232.436671]  nouveau_gem_object_open+0x1d0/0x1f0 [nouveau]
[523232.436835]  drm_gem_handle_create_tail+0xd1/0x180
[523232.436840]  drm_prime_fd_to_handle_ioctl+0x12e/0x200
[523232.436844]  ? __pfx_drm_prime_fd_to_handle_ioctl+0x10/0x10
[523232.436847]  drm_ioctl_kernel+0xd3/0x180
[523232.436849]  drm_ioctl+0x26d/0x4b0
[523232.436851]  ? __pfx_drm_prime_fd_to_handle_ioctl+0x10/0x10
[523232.436855]  nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
[523232.437032]  __x64_sys_ioctl+0x94/0xd0
[523232.437036]  do_syscall_64+0x5d/0x90
[523232.437040]  ? syscall_exit_to_user_mode+0x2b/0x40
[523232.437044]  ? do_syscall_64+0x6c/0x90
[523232.437046]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_vmm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_vmm.c 
b/drivers/gpu/drm/nouveau/nouveau_vmm.c
index a6602c012671..3dda885df5b2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vmm.c
@@ -108,6 +108,9 @@ nouveau_vma_new(struct nouveau_bo *nvbo, struct nouveau_vmm 
*vmm,
} else {
ret = nvif_vmm_get(>vmm, PT
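
(The hunk above is truncated; the shape of the fix is roughly the
following sketch: only record the address once the allocation has
actually succeeded, so the error path never sees a stale addr.)

        ret = nvif_vmm_get(&vmm->vmm, PTES, false, mem->mem.page, 0,
                           mem->mem.size, &tmp);
        if (ret)
                goto done;

        /* Assign only on success, so nouveau_vma_del() won't call
         * nvif_vmm_put() on an address we never actually got.
         */
        vma->addr = tmp.addr;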

[PATCH] nouveau/gsp: handle engines in runl without nonstall interrupts.

2024-01-09 Thread Dave Airlie
From: Dave Airlie 

It appears that on TU106 GPUs (2070), some of the nvdec engines
are in the runlist but have no valid nonstall interrupt; nouveau
didn't handle that too well.

This should let nouveau/gsp work on those.

Cc: sta...@vger.kernel.org # v6.7+
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.c | 4 
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c  | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/base.c   | 8 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.c
index c8ce7ff18713..e74493a4569e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga100.c
@@ -550,6 +550,10 @@ ga100_fifo_nonstall_ctor(struct nvkm_fifo *fifo)
struct nvkm_engn *engn = list_first_entry(>engns, 
typeof(*engn), head);
 
runl->nonstall.vector = engn->func->nonstall(engn);
+
+   /* if no nonstall vector just keep going */
+   if (runl->nonstall.vector == -1)
+   continue;
if (runl->nonstall.vector < 0) {
RUNL_ERROR(runl, "nonstall %d", runl->nonstall.vector);
return runl->nonstall.vector;
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index b903785056b5..3454c7d29502 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -351,7 +351,7 @@ r535_engn_nonstall(struct nvkm_engn *engn)
int ret;
 
ret = nvkm_gsp_intr_nonstall(subdev->device->gsp, subdev->type, 
subdev->inst);
-   WARN_ON(ret < 0);
+   WARN_ON(ret == -ENOENT);
return ret;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/base.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/base.c
index 04bceaa28a19..da1bebb896f7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/base.c
@@ -25,12 +25,8 @@ int
 nvkm_gsp_intr_nonstall(struct nvkm_gsp *gsp, enum nvkm_subdev_type type, int 
inst)
 {
for (int i = 0; i < gsp->intr_nr; i++) {
-   if (gsp->intr[i].type == type && gsp->intr[i].inst == inst) {
-   if (gsp->intr[i].nonstall != ~0)
-   return gsp->intr[i].nonstall;
-
-   return -EINVAL;
-   }
+   if (gsp->intr[i].type == type && gsp->intr[i].inst == inst)
+   return gsp->intr[i].nonstall;
}
 
return -ENOENT;
-- 
2.43.0



Re: [PATCH 08/11] nouveau/gsp: don't free ctrl messages on errors

2024-01-03 Thread Dave Airlie
On Thu, 4 Jan 2024 at 00:47, Dan Carpenter  wrote:
>
> Hi Dave,
>
> kernel test robot noticed the following build warnings:
>
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:
> https://github.com/intel-lab-lkp/linux/commits/Dave-Airlie/nouveau-gsp-drop-some-acpi-related-debug/20231222-180432
> base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
> patch link:
> https://lore.kernel.org/r/20231222043308.3090089-9-airlied%40gmail.com
> patch subject: [PATCH 08/11] nouveau/gsp: don't free ctrl messages on errors
> config: powerpc-randconfig-r071-20231226 
> (https://download.01.org/0day-ci/archive/20231227/202312271917.55xudmdc-...@intel.com/config)
> compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 
> d3ef86708241a3bee902615c190dead1638c4e09)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version 
> of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot 
> | Reported-by: Dan Carpenter 
> | Closes: https://lore.kernel.org/r/202312271917.55xudmdc-...@intel.com/

This is a false positive; I think the code is operating as I'd
expect. Maybe we could restructure it to avoid this warning?

The idea is that you send an rpc msg: if there's a reply you get the
reply, if there's no reply you get NULL, and if there's an error you get
an error pointer.

So in the case where you get an error or NULL, you just want to return 0
for the NULL, as it's successful, and the error otherwise.

Would using PTR_ERR_OR_ZERO make smatch happy? (even if it's not
really what we want).
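
For what it's worth, PTR_ERR_OR_ZERO() maps NULL to 0 and an ERR_PTR()
to its errno, so the early-out could be restructured as something like
this (sketch only):

        rpc = nvkm_gsp_rpc_push(gsp, rpc, true, repc);
        if (IS_ERR_OR_NULL(rpc)) {
                *argv = NULL;
                /* NULL means "no reply, success" -> 0; ERR_PTR -> errno */
                return PTR_ERR_OR_ZERO(rpc);
        }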

Dave.
>
> New smatch warnings:
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:659 
> r535_gsp_rpc_rm_ctrl_push() warn: passing zero to 'PTR_ERR'
> drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c:1063 r535_dp_aux_xfer() warn: 
> passing a valid pointer to 'PTR_ERR'
>
> Old smatch warnings:
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1887 nvkm_gsp_radix3_sg() 
> error: uninitialized symbol 'addr'.
>
> vim +/PTR_ERR +659 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
>
> af265ee961627a Dave Airlie 2023-12-22  649  static int
> af265ee961627a Dave Airlie 2023-12-22  650  r535_gsp_rpc_rm_ctrl_push(struct 
> nvkm_gsp_object *object, void **argv, u32 repc)
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  651  {
> af265ee961627a Dave Airlie 2023-12-22  652  rpc_gsp_rm_control_v03_00 
> *rpc = container_of((*argv), typeof(*rpc), params);
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  653  struct nvkm_gsp *gsp = 
> object->client->gsp;
> af265ee961627a Dave Airlie 2023-12-22  654  int ret = 0;
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  655
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  656  rpc = nvkm_gsp_rpc_push(gsp, 
> rpc, true, repc);
> af265ee961627a Dave Airlie 2023-12-22  657  if (IS_ERR_OR_NULL(rpc)) {
> af265ee961627a Dave Airlie 2023-12-22  658  *argv = NULL;
> af265ee961627a Dave Airlie 2023-12-22 @659  return PTR_ERR(rpc);
>
> If nvkm_gsp_rpc_push() returns NULL (probably a failure) then this
> returns PTR_ERR(NULL) which is zero/success.
>
> af265ee961627a Dave Airlie 2023-12-22  660  }
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  661
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  662  if (rpc->status) {
> af265ee961627a Dave Airlie 2023-12-22  663  ret = 
> r535_rpc_status_to_errno(rpc->status);
> 555bb9c29a45be Dave Airlie 2023-12-22  664  if (ret != -EAGAIN)
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  665  
> nvkm_error(>subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x failed: 
> 0x%08x\n",
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  666     
> object->client->object.handle, object->handle, rpc->cmd, rpc->status);
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  667  }
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  668
> af265ee961627a Dave Airlie 2023-12-22  669  if (repc)
> af265ee961627a Dave Airlie 2023-12-22  670  *argv = rpc->params;
> af265ee961627a Dave Airlie 2023-12-22  671  else
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  672  
> nvkm_gsp_rpc_done(gsp, rpc);
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  673
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  674  return ret;
> 4cf2c83eb3a4c4 Ben Skeggs  2023-09-19  675  }
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>


[PATCH 11/11] drm/nouveau/dp: Honor GSP link training retry timeouts

2023-12-21 Thread Dave Airlie
From: Lyude Paul 

Turns out that one of the ways that Nvidia's driver handles the pre-LT
timeout for eDP panels is by providing a retry timeout in their link
training callbacks that we're expected to wait for. Up until now we didn't
pay any attention to this parameter.

So, start honoring the timeout if link training fails - and retry up to 3
times. The "3 times" bit comes from OpenRM's link training code.

[airlied: this fixes the panel on one of my laptops]
Signed-off-by: Lyude Paul 
Signed-off-by: Dave Airlie 
---
 .../gpu/drm/nouveau/nvkm/engine/disp/r535.c   | 62 ---
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
index 1b4f988df7ed..b287ab19a51f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
@@ -957,40 +957,58 @@ r535_dp_train_target(struct nvkm_outp *outp, u8 target, 
bool mst, u8 link_nr, u8
 {
struct nvkm_disp *disp = outp->disp;
NV0073_CTRL_DP_CTRL_PARAMS *ctrl;
-   int ret;
-
-   ctrl = nvkm_gsp_rm_ctrl_get(>rm.objcom, NV0073_CTRL_CMD_DP_CTRL, 
sizeof(*ctrl));
-   if (IS_ERR(ctrl))
-   return PTR_ERR(ctrl);
+   int ret, retries;
+   u32 cmd, data;
 
-   ctrl->subDeviceInstance = 0;
-   ctrl->displayId = BIT(outp->index);
-   ctrl->cmd = NVDEF(NV0073_CTRL, DP_CMD, SET_LANE_COUNT, TRUE) |
-   NVDEF(NV0073_CTRL, DP_CMD, SET_LINK_BW, TRUE) |
-   NVDEF(NV0073_CTRL, DP_CMD, TRAIN_PHY_REPEATER, YES);
-   ctrl->data = NVVAL(NV0073_CTRL, DP_DATA, SET_LANE_COUNT, link_nr) |
-NVVAL(NV0073_CTRL, DP_DATA, SET_LINK_BW, link_bw) |
-NVVAL(NV0073_CTRL, DP_DATA, TARGET, target);
+   cmd = NVDEF(NV0073_CTRL, DP_CMD, SET_LANE_COUNT, TRUE) |
+ NVDEF(NV0073_CTRL, DP_CMD, SET_LINK_BW, TRUE) |
+ NVDEF(NV0073_CTRL, DP_CMD, TRAIN_PHY_REPEATER, YES);
+   data = NVVAL(NV0073_CTRL, DP_DATA, SET_LANE_COUNT, link_nr) |
+  NVVAL(NV0073_CTRL, DP_DATA, SET_LINK_BW, link_bw) |
+  NVVAL(NV0073_CTRL, DP_DATA, TARGET, target);
 
if (mst)
-   ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_FORMAT_MODE, 
MULTI_STREAM);
+   cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_FORMAT_MODE, 
MULTI_STREAM);
 
if (outp->dp.dpcd[DPCD_RC02] & DPCD_RC02_ENHANCED_FRAME_CAP)
-   ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_ENHANCED_FRAMING, 
TRUE);
+   cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_ENHANCED_FRAMING, TRUE);
 
if (target == 0 &&
 (outp->dp.dpcd[DPCD_RC02] & 0x20) &&
!(outp->dp.dpcd[DPCD_RC03] & DPCD_RC03_TPS4_SUPPORTED))
-   ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, POST_LT_ADJ_REQ_GRANTED, 
YES);
+   cmd |= NVDEF(NV0073_CTRL, DP_CMD, POST_LT_ADJ_REQ_GRANTED, YES);
 
-   ret = nvkm_gsp_rm_ctrl_push(>rm.objcom, , sizeof(*ctrl));
-   if (ret) {
-   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
-   return ret;
+   /* We should retry up to 3 times, but only if GSP asks politely */
+   for (retries = 0; retries < 3; ++retries) {
+   ctrl = nvkm_gsp_rm_ctrl_get(>rm.objcom, 
NV0073_CTRL_CMD_DP_CTRL,
+   sizeof(*ctrl));
+   if (IS_ERR(ctrl))
+   return PTR_ERR(ctrl);
+
+   ctrl->subDeviceInstance = 0;
+   ctrl->displayId = BIT(outp->index);
+   ctrl->retryTimeMs = 0;
+   ctrl->cmd = cmd;
+   ctrl->data = data;
+
+   ret = nvkm_gsp_rm_ctrl_push(>rm.objcom, , 
sizeof(*ctrl));
+   if (ret == -EAGAIN && ctrl->retryTimeMs) {
+   /* Device (likely an eDP panel) isn't ready yet, wait 
for the time specified
+* by GSP before retrying again */
+   nvkm_debug(>engine.subdev,
+  "Waiting %dms for GSP LT panel delay before 
retrying\n",
+  ctrl->retryTimeMs);
+   msleep(ctrl->retryTimeMs);
+   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
+   } else {
+   /* GSP didn't say to retry, or we were successful */
+   if (ctrl->err)
+   ret = -EIO;
+   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
+   break;
+   }
}
 
-   ret = ctrl->err ? -EIO : 0;
-   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
return ret;
 }
 
-- 
2.43.0



[PATCH 10/11] nouveau: push event block/allowing out of the fence context

2023-12-21 Thread Dave Airlie
There is a deadlock between the irq and fctx locks:
the irq handling takes the irq lock then the fctx lock, while
fence signalling takes the fctx lock then the irq lock.

This splits the fence signalling path so the code that hits
the irq lock is done in a separate work queue.

This seems to fix crashes/hangs when using nouveau gsp with
i915 primary GPU.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 28 -
 drivers/gpu/drm/nouveau/nouveau_fence.h |  5 -
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ca762ea55413..5057d976fa57 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -62,7 +62,7 @@ nouveau_fence_signal(struct nouveau_fence *fence)
if (test_bit(DMA_FENCE_FLAG_USER_BITS, >base.flags)) {
struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
 
-   if (!--fctx->notify_ref)
+   if (atomic_dec_and_test(>notify_ref))
drop = 1;
}
 
@@ -103,6 +103,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, 
int error)
 void
 nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
 {
+   cancel_work_sync(>allow_block_work);
nouveau_fence_context_kill(fctx, 0);
nvif_event_dtor(>event);
fctx->dead = 1;
@@ -167,6 +168,18 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
*event, void *repv, u32 repc
return ret;
 }
 
+static void
+nouveau_fence_work_allow_block(struct work_struct *work)
+{
+   struct nouveau_fence_chan *fctx = container_of(work, struct 
nouveau_fence_chan,
+  allow_block_work);
+
+   if (atomic_read(>notify_ref) == 0)
+   nvif_event_block(>event);
+   else
+   nvif_event_allow(>event);
+}
+
 void
 nouveau_fence_context_new(struct nouveau_channel *chan, struct 
nouveau_fence_chan *fctx)
 {
@@ -178,6 +191,7 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
struct nouveau_fence_cha
} args;
int ret;
 
+   INIT_WORK(>allow_block_work, nouveau_fence_work_allow_block);
INIT_LIST_HEAD(>flip);
INIT_LIST_HEAD(>pending);
spin_lock_init(>lock);
@@ -521,15 +535,19 @@ static bool nouveau_fence_enable_signaling(struct 
dma_fence *f)
struct nouveau_fence *fence = from_fence(f);
struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
bool ret;
+   bool do_work;
 
-   if (!fctx->notify_ref++)
-   nvif_event_allow(>event);
+   if (atomic_inc_return(>notify_ref) == 0)
+   do_work = true;
 
ret = nouveau_fence_no_signaling(f);
if (ret)
set_bit(DMA_FENCE_FLAG_USER_BITS, >base.flags);
-   else if (!--fctx->notify_ref)
-   nvif_event_block(>event);
+   else if (atomic_dec_and_test(>notify_ref))
+   do_work = true;
+
+   if (do_work)
+   schedule_work(>allow_block_work);
 
return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h 
b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 64d33ae7f356..28f5cf013b89 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -3,6 +3,7 @@
 #define __NOUVEAU_FENCE_H__
 
 #include 
+#include 
 #include 
 
 struct nouveau_drm;
@@ -45,7 +46,9 @@ struct nouveau_fence_chan {
char name[32];
 
struct nvif_event event;
-   int notify_ref, dead, killed;
+   struct work_struct allow_block_work;
+   atomic_t notify_ref;
+   int dead, killed;
 };
 
 struct nouveau_fence_priv {
-- 
2.43.0



[PATCH 08/11] nouveau/gsp: don't free ctrl messages on errors

2023-12-21 Thread Dave Airlie
It looks like for some messages the upper layers need to get access to the
results of the message so we can interpret them.

Rework the ctrl push interface to not free things, and clean up properly
wherever it errors out.

Requested-by: Lyude
Signed-off-by: Dave Airlie 
---
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  17 +--
 .../gpu/drm/nouveau/nvkm/engine/disp/r535.c   | 108 +++---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c|  36 +++---
 3 files changed, 100 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h 
b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index 2fa0445d8928..d1437c08645f 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -187,7 +187,7 @@ struct nvkm_gsp {
void (*rpc_done)(struct nvkm_gsp *gsp, void *repv);
 
void *(*rm_ctrl_get)(struct nvkm_gsp_object *, u32 cmd, u32 
argc);
-   void *(*rm_ctrl_push)(struct nvkm_gsp_object *, void *argv, u32 
repc);
+   int (*rm_ctrl_push)(struct nvkm_gsp_object *, void **argv, u32 
repc);
void (*rm_ctrl_done)(struct nvkm_gsp_object *, void *repv);
 
void *(*rm_alloc_get)(struct nvkm_gsp_object *, u32 oclass, u32 
argc);
@@ -265,7 +265,7 @@ nvkm_gsp_rm_ctrl_get(struct nvkm_gsp_object *object, u32 
cmd, u32 argc)
return object->client->gsp->rm->rm_ctrl_get(object, cmd, argc);
 }
 
-static inline void *
+static inline int
 nvkm_gsp_rm_ctrl_push(struct nvkm_gsp_object *object, void *argv, u32 repc)
 {
return object->client->gsp->rm->rm_ctrl_push(object, argv, repc);
@@ -275,21 +275,24 @@ static inline void *
 nvkm_gsp_rm_ctrl_rd(struct nvkm_gsp_object *object, u32 cmd, u32 repc)
 {
void *argv = nvkm_gsp_rm_ctrl_get(object, cmd, repc);
+   int ret;
 
if (IS_ERR(argv))
return argv;
 
-   return nvkm_gsp_rm_ctrl_push(object, argv, repc);
+   ret = nvkm_gsp_rm_ctrl_push(object, , repc);
+   if (ret)
+   return ERR_PTR(ret);
+   return argv;
 }
 
 static inline int
 nvkm_gsp_rm_ctrl_wr(struct nvkm_gsp_object *object, void *argv)
 {
-   void *repv = nvkm_gsp_rm_ctrl_push(object, argv, 0);
-
-   if (IS_ERR(repv))
-   return PTR_ERR(repv);
+   int ret = nvkm_gsp_rm_ctrl_push(object, , 0);
 
+   if (ret)
+   return ret;
return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
index 1c8c4cca0957..1b4f988df7ed 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
@@ -282,7 +282,7 @@ r535_sor_bl_get(struct nvkm_ior *sor)
 {
struct nvkm_disp *disp = sor->disp;
NV0073_CTRL_SPECIFIC_BACKLIGHT_BRIGHTNESS_PARAMS *ctrl;
-   int lvl;
+   int ret, lvl;
 
ctrl = nvkm_gsp_rm_ctrl_get(>rm.objcom,

NV0073_CTRL_CMD_SPECIFIC_GET_BACKLIGHT_BRIGHTNESS,
@@ -292,9 +292,11 @@ r535_sor_bl_get(struct nvkm_ior *sor)
 
ctrl->displayId = BIT(sor->asy.outp->index);
 
-   ctrl = nvkm_gsp_rm_ctrl_push(>rm.objcom, ctrl, sizeof(*ctrl));
-   if (IS_ERR(ctrl))
-   return PTR_ERR(ctrl);
+   ret = nvkm_gsp_rm_ctrl_push(>rm.objcom, , sizeof(*ctrl));
+   if (ret) {
+   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
+   return ret;
+   }
 
lvl = ctrl->brightness;
nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
@@ -649,9 +651,11 @@ r535_conn_new(struct nvkm_disp *disp, u32 id)
ctrl->subDeviceInstance = 0;
ctrl->displayId = BIT(id);
 
-   ctrl = nvkm_gsp_rm_ctrl_push(>rm.objcom, ctrl, sizeof(*ctrl));
-   if (IS_ERR(ctrl))
-   return (void *)ctrl;
+   ret = nvkm_gsp_rm_ctrl_push(>rm.objcom, , sizeof(*ctrl));
+   if (ret) {
+   nvkm_gsp_rm_ctrl_done(>rm.objcom, ctrl);
+   return ERR_PTR(ret);
+   }
 
list_for_each_entry(conn, >conns, head) {
if (conn->index == ctrl->data[0].index) {
@@ -686,7 +690,7 @@ r535_outp_acquire(struct nvkm_outp *outp, bool hda)
struct nvkm_disp *disp = outp->disp;
struct nvkm_ior *ior;
NV0073_CTRL_DFP_ASSIGN_SOR_PARAMS *ctrl;
-   int or;
+   int ret, or;
 
ctrl = nvkm_gsp_rm_ctrl_get(>rm.objcom,
NV0073_CTRL_CMD_DFP_ASSIGN_SOR, 
sizeof(*ctrl));
@@ -699,9 +703,11 @@ r535_outp_acquire(struct nvkm_outp *outp, bool hda)
if (hda)
ctrl->flags |= NVDEF(NV0073_CTRL, DFP_ASSIGN_SOR_FLAGS, AUDIO, 
OPTIMAL);
 
-   ctrl = nvkm_gsp_rm_ctrl_push(>rm.objcom, ctrl, sizeof(*ctrl));
-   if (IS_ERR(ctrl))
-   return PTR_ERR(ctrl);
+   ret = nvkm_gsp_rm_ctrl_push(>rm.objcom

[PATCH 07/11] nouveau/gsp: convert gsp errors to generic errors

2023-12-21 Thread Dave Airlie
This should let the upper layers retry as needed on EAGAIN.

There may be other values we will care about in the future, but
this covers our present needs.

Signed-off-by: Dave Airlie 
---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 26 +++
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 774ca47b019f..54c1fbccc013 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -70,6 +70,20 @@ struct r535_gsp_msg {
 
 #define GSP_MSG_HDR_SIZE offsetof(struct r535_gsp_msg, data)
 
+static int
+r535_rpc_status_to_errno(uint32_t rpc_status)
+{
+   switch (rpc_status) {
+   case 0x55: /* NV_ERR_NOT_READY */
+   case 0x66: /* NV_ERR_TIMEOUT_RETRY */
+  return -EAGAIN;
+   case 0x51: /* NV_ERR_NO_MEMORY */
+   return -ENOMEM;
+   default:
+   return -EINVAL;
+   }
+}
+
 static void *
 r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
 {
@@ -584,8 +598,9 @@ r535_gsp_rpc_rm_alloc_push(struct nvkm_gsp_object *object, 
void *argv, u32 repc)
return rpc;
 
if (rpc->status) {
-   nvkm_error(>subdev, "RM_ALLOC: 0x%x\n", rpc->status);
-   ret = ERR_PTR(-EINVAL);
+   ret = ERR_PTR(r535_rpc_status_to_errno(rpc->status));
+   if (ret != -EAGAIN)
+   nvkm_error(>subdev, "RM_ALLOC: 0x%x\n", 
rpc->status);
} else {
ret = repc ? rpc->params : NULL;
}
@@ -639,9 +654,10 @@ r535_gsp_rpc_rm_ctrl_push(struct nvkm_gsp_object *object, 
void *argv, u32 repc)
return rpc;
 
if (rpc->status) {
-   nvkm_error(>subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x 
failed: 0x%08x\n",
-  object->client->object.handle, object->handle, 
rpc->cmd, rpc->status);
-   ret = ERR_PTR(-EINVAL);
+   ret = ERR_PTR(r535_rpc_status_to_errno(rpc->status));
+   if (ret != -EAGAIN)
+   nvkm_error(>subdev, "cli:0x%08x obj:0x%08x ctrl 
cmd:0x%08x failed: 0x%08x\n",
+  object->client->object.handle, 
object->handle, rpc->cmd, rpc->status);
} else {
ret = repc ? rpc->params : NULL;
}
-- 
2.43.0



[PATCH 09/11] nouveau/gsp: always free the alloc messages on r535

2023-12-21 Thread Dave Airlie
Fixes a memory leak seen with kmemleak.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index e2810fd1a36f..cafb82826473 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -605,8 +605,7 @@ r535_gsp_rpc_rm_alloc_push(struct nvkm_gsp_object *object, 
void *argv, u32 repc)
ret = repc ? rpc->params : NULL;
}
 
-   if (ret)
-   nvkm_gsp_rpc_done(gsp, rpc);
+   nvkm_gsp_rpc_done(gsp, rpc);
 
return ret;
 }
-- 
2.43.0



[PATCH 04/11] nouveau/gsp: free acpi object after use

2023-12-21 Thread Dave Airlie
This fixes a memory leak for the acpi dod object.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 365dda6c002a..1a6d7c89660d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1226,6 +1226,7 @@ r535_gsp_acpi_dod(acpi_handle handle, DOD_METHOD_DATA 
*dod)
}
 
dod->status = 0;
+   kfree(output.pointer);
 }
 #endif
 
-- 
2.43.0



[PATCH 06/11] drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations

2023-12-21 Thread Dave Airlie
From: Lyude Paul 

Currently we get an error from ACPI because both of these methods expect
a single argument, and we don't provide one. I'm not totally clear on what
that argument does, but we're able to find the missing value from
_acpiCacheMethodData() in src/kernel/platform/acpi_common.c in nvidia's
driver. So, let's add that - which doesn't get eDP displays to power on
quite yet, but gets rid of the argument warning at least.

Signed-off-by: Lyude Paul 
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 1a6d7c89660d..774ca47b019f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1150,6 +1150,8 @@ static void
 r535_gsp_acpi_mux_id(acpi_handle handle, u32 id, MUX_METHOD_DATA_ELEMENT *mode,
 MUX_METHOD_DATA_ELEMENT *part)
 {
+   union acpi_object mux_arg = { ACPI_TYPE_INTEGER };
+   struct acpi_object_list input = { 1, _arg };
acpi_handle iter = NULL, handle_mux = NULL;
acpi_status status;
unsigned long long value;
@@ -1172,14 +1174,18 @@ r535_gsp_acpi_mux_id(acpi_handle handle, u32 id, 
MUX_METHOD_DATA_ELEMENT *mode,
if (!handle_mux)
return;
 
-   status = acpi_evaluate_integer(handle_mux, "MXDM", NULL, );
+   /* I -think- 0 means "acquire" according to nvidia's driver source */
+   input.pointer->integer.type = ACPI_TYPE_INTEGER;
+   input.pointer->integer.value = 0;
+
+   status = acpi_evaluate_integer(handle_mux, "MXDM", , );
if (ACPI_SUCCESS(status)) {
mode->acpiId = id;
mode->mode   = value;
mode->status = 0;
}
 
-   status = acpi_evaluate_integer(handle_mux, "MXDS", NULL, );
+   status = acpi_evaluate_integer(handle_mux, "MXDS", , );
if (ACPI_SUCCESS(status)) {
part->acpiId = id;
part->mode   = value;
-- 
2.43.0



[PATCH 03/11] nouveau: fix disp disabling with GSP

2023-12-21 Thread Dave Airlie
This func ptr here is normally a static allocation, but gsp r535
uses a dynamically allocated one, so we need to handle that better.

This fixes a crash with GSP when you use config=disp=0 to avoid
disp problems.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
index 457ec5db794d..b24eb1e560bc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
@@ -209,7 +209,7 @@ nvkm_disp_dtor(struct nvkm_engine *engine)
nvkm_head_del();
}
 
-   if (disp->func->dtor)
+   if (disp->func && disp->func->dtor)
disp->func->dtor(disp);
 
return data;
@@ -243,8 +243,10 @@ nvkm_disp_new_(const struct nvkm_disp_func *func, struct 
nvkm_device *device,
spin_lock_init(>client.lock);
 
ret = nvkm_engine_ctor(_disp, device, type, inst, true, 
>engine);
-   if (ret)
+   if (ret) {
+   disp->func = NULL;
return ret;
+   }
 
if (func->super) {
disp->super.wq = create_singlethread_workqueue("nvkm-disp");
-- 
2.43.0



[PATCH 05/11] nouveau/gsp: free userd allocation.

2023-12-21 Thread Dave Airlie
This was being leaked.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index d088e636edc3..b903785056b5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -242,6 +242,7 @@ r535_chan_id_put(struct nvkm_chan *chan)
nvkm_memory_unref(>mem);
nvkm_chid_put(runl->chid, userd->chid, 
>cgrp->lock);
list_del(>head);
+   kfree(userd);
}
 
break;
-- 
2.43.0



[PATCH 01/11] nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)

2023-12-21 Thread Dave Airlie
Add NULL callbacks for some things GSP calls that we don't handle, but
know about, so we avoid the logging.

v2: Timur suggested allowing null fn.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 44fb86841c05..7f831f41b598 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -298,7 +298,8 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 repc)
struct nvkm_gsp_msgq_ntfy *ntfy = >msgq.ntfy[i];
 
if (ntfy->fn == msg->function) {
-   ntfy->func(ntfy->priv, ntfy->fn, msg->data, msg->length 
- sizeof(*msg));
+   if (ntfy->func)
+   ntfy->func(ntfy->priv, ntfy->fn, msg->data, 
msg->length - sizeof(*msg));
break;
}
}
@@ -2186,7 +2187,9 @@ r535_gsp_oneinit(struct nvkm_gsp *gsp)
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
  r535_gsp_msg_mmu_fault_queued, gsp);
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, 
r535_gsp_msg_os_error_log, gsp);
-
+   r535_gsp_msg_ntfy_add(gsp, 
NV_VGPU_MSG_EVENT_PERF_BRIDGELESS_INFO_UPDATE, NULL, NULL);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NULL, 
NULL);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_GSP_SEND_USER_SHARED_DATA, 
NULL, NULL);
ret = r535_gsp_rm_boot_ctor(gsp);
if (ret)
return ret;
-- 
2.43.0



[PATCH 02/11] nouveau/gsp: drop some acpi related debug

2023-12-21 Thread Dave Airlie
These were leftover debug prints; if we need them again, bring them
back for debugging later.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c | 7 ---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c  | 9 -
 2 files changed, 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
index 298035070b3a..1c8c4cca0957 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
@@ -1465,8 +1465,6 @@ r535_disp_oneinit(struct nvkm_disp *disp)
bool nvhg = acpi_check_dsm(handle, 
_DSM_GUID, NVHG_DSM_REV,
   1ULL << 0x0014);
 
-   printk(KERN_ERR "bl: nbci:%d nvhg:%d\n", nbci, 
nvhg);
-
if (nbci || nvhg) {
union acpi_object argv4 = {
.buffer.type= 
ACPI_TYPE_BUFFER,
@@ -1479,9 +1477,6 @@ r535_disp_oneinit(struct nvkm_disp *disp)
if (!obj) {
acpi_handle_info(handle, 
"failed to evaluate _DSM\n");
} else {
-   printk(KERN_ERR "bl: obj type 
%d\n", obj->type);
-   printk(KERN_ERR "bl: obj len 
%d\n", obj->package.count);
-
for (int i = 0; i < 
obj->package.count; i++) {
union acpi_object *elt 
= >package.elements[i];
u32 size;
@@ -1491,12 +1486,10 @@ r535_disp_oneinit(struct nvkm_disp *disp)
else
size = 4;
 
-   printk(KERN_ERR "elt 
%03d: type %d size %d\n", i, elt->type, size);

memcpy(>backLightData[ctrl->backLightDataSize], >integer.value, 
size);
ctrl->backLightDataSize 
+= size;
}
 
-   printk(KERN_ERR "bl: data size 
%d\n", ctrl->backLightDataSize);
ctrl->status = 0;
ACPI_FREE(obj);
}
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 7f831f41b598..365dda6c002a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1100,16 +1100,12 @@ r535_gsp_acpi_caps(acpi_handle handle, CAPS_METHOD_DATA 
*caps)
if (!obj)
return;
 
-   printk(KERN_ERR "nvop: obj type %d\n", obj->type);
-   printk(KERN_ERR "nvop: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
 
caps->status = 0;
caps->optimusCaps = *(u32 *)obj->buffer.pointer;
-   printk(KERN_ERR "nvop: caps %08x\n", caps->optimusCaps);
 
ACPI_FREE(obj);
 
@@ -1136,9 +1132,6 @@ r535_gsp_acpi_jt(acpi_handle handle, JT_METHOD_DATA *jt)
if (!obj)
return;
 
-   printk(KERN_ERR "jt: obj type %d\n", obj->type);
-   printk(KERN_ERR "jt: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
@@ -1147,7 +1140,6 @@ r535_gsp_acpi_jt(acpi_handle handle, JT_METHOD_DATA *jt)
jt->jtCaps = *(u32 *)obj->buffer.pointer;
jt->jtRevId = (jt->jtCaps & 0xfff0) >> 20;
jt->bSBIOSCaps = 0;
-   printk(KERN_ERR "jt: caps %08x rev:%04x\n", jt->jtCaps, jt->jtRevId);
 
ACPI_FREE(obj);
 
@@ -1233,7 +1225,6 @@ r535_gsp_acpi_dod(acpi_handle handle, DOD_METHOD_DATA 
*dod)
dod->acpiIdListLen += sizeof(dod->acpiIdList[0]);
}
 
-   printk(KERN_ERR "_DOD: ok! len:%d\n", dod->acpiIdListLen);
dod->status = 0;
 }
 #endif
-- 
2.43.0



nouveau GSP fixes

2023-12-21 Thread Dave Airlie
This is a collection of nouveau fixes: debug print removals, memory
leaks, a very annoying race condition causing system hangs in prime
scenarios, and a fix from Lyude to get the panel on my laptop working.

I'd like to get these into 6.7,

Dave.



Re: [PATCH] drm/nouveau: Fixup gk20a instobj hierarchy

2023-12-14 Thread Dave Airlie
On Thu, 14 Dec 2023 at 19:26, Jon Hunter  wrote:
>
>
>
> On 08/12/2023 10:46, Thierry Reding wrote:
> > From: Thierry Reding 
> >
> > Commit 12c9b05da918 ("drm/nouveau/imem: support allocations not
> > preserved across suspend") uses container_of() to cast from struct
> > nvkm_memory to struct nvkm_instobj, assuming that all instance objects
> > are derived from struct nvkm_instobj. For the gk20a family that's not
> > the case and they are derived from struct nvkm_memory instead. This
> > causes some subtle data corruption (nvkm_instobj.preserve ends up
> > mapping to gk20a_instobj.vaddr) that causes a NULL pointer dereference
> > in gk20a_instobj_acquire_iommu() (and possibly elsewhere) and also
> > prevents suspend/resume from working.
> >
> > Fix this by making struct gk20a_instobj derive from struct nvkm_instobj
> > instead.
> >
> > Fixes: 12c9b05da918 ("drm/nouveau/imem: support allocations not preserved 
> > across suspend")
> > Reported-by: Jonathan Hunter 
> > Signed-off-by: Thierry Reding 

I've applied this to drm-fixes.

Dave.


Re: [Nouveau] Meaning of the engines in paramaters of nouveau module

2023-12-04 Thread Dave Airlie
On Mon, 4 Dec 2023 at 05:04, Paul Dufresne  wrote:
>
> In https://nouveau.freedesktop.org/KernelModuleParameters.html, there is:
> Here is a list of engines:
> DEVICE
> DMAOBJ
> PBSP
> PCE0
> PCE1
> PCE2
> PCRYPT
> PDISP
> PFIFO
> PGRAPH
> PMPEG
> PPM
> 
> PVP
> SW
> Also, in debug:
>CLIENT
>
> I have tried to find a description of those.
> https://envytools.readthedocs.io/en/latest/
> help a bit, but I don't find a precise correlation.
>
> https://nouveau.freedesktop.org/NouveauTerms.html
> does not seems to have most of these terms.
>
> I am particularly curious about PCE[0-3].
> But also about CLIENT, that seems different and mysterious.
>
> I wonder if it is possible to write:
> nouveau.debug=info,PDISP=debug
> to have the info debug level as the default, but for PDISP have the debug 
> level
>
> Also, my interest is linked to the state of GPU graph given after a context 
> switch timeout that looks like:
> [ 1696.780305] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> [ 1696.780361] nouveau :01:00.0: fifo:00:00[  gr]: 8006e005: busy 
> 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> [ 1696.780422] nouveau :01:00.0: fifo:00:07[ ce2]: 00050005: busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> [ 1696.780476] nouveau :01:00.0: fifo:04:04[ ce0]: : busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> [ 1696.780529] nouveau :01:00.0: fifo:01:01[  mspdec]: : busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> [ 1696.780581] nouveau :01:00.0: fifo:02:02[   msppp]: : busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> [ 1696.780633] nouveau :01:00.0: fifo:03:03[   msvld]: : busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> [ 1696.780689] nouveau :01:00.0: fifo:00:00[  gr]: 8006e005: busy 
> 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> [ 1696.780744] nouveau :01:00.0: fifo:00:00[  gr]: 8006e005: busy 
> 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> [ 1696.780795] nouveau :01:00.0: fifo:00:00[  gr]: triggering mmu 
> fault on 0x00
> [ 1696.780835] nouveau :01:00.0: fifo:00:07[ ce2]: 00050005: busy 
> 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> [ 1696.780942] nouveau :01:00.0: fifo:00:00[  gr]: 0100: mmu 
> fault triggered
> [ 1696.780987] nouveau :01:00.0: fifo:00:00[  gr]: c006e005: busy 
> 1 faulted 1 chsw 1 save 1 load 1 chid 5*-> chid 6
> [ 1696.781040] nouveau :01:00.0: fifo:00:0005:[Renderer[13701]] rc 
> scheduled
>
> where I suspect ce2, is linked to PCE2.
>
> Is there a documentation that describes those "engines"?

CE is the copy engine.
But this looks like an mmu fault on the GPU side, so most likely some
shader is doing something wrong.

Dave.


Re: [Nouveau] nouveau-next stalled to august 30?

2023-12-04 Thread Dave Airlie
On Mon, 4 Dec 2023 at 02:51, Paul Dufresne  wrote:
>
> According to:
> https://nouveau.freedesktop.org/InstallNouveau.html
> the project use the kernel at:
> https://gitlab.freedesktop.org/drm/nouveau
> but history shows that it stalled at August 30:
> https://gitlab.freedesktop.org/drm/nouveau/-/commits/nouveau-next
> because of this patch that cannot be push:
> https://gitlab.freedesktop.org/drm/nouveau/-/commit/775b8212e839213b335594cd5cef084d6a88a3af/pipelines?ref=nouveau-next
>
> So... I just don't understand where all the patches on this mailing list goes 
> into which repository.
>
> BTW: https://nouveau.freedesktop.org/ shows in Software section to use:
> https://github.com/skeggsb/linux/
> but this seems to be 8 years behind.

We are just using drm-misc to stage things now; there might be
branches outside that, but upstreaming is done via drm-misc-next and
drm-misc-fixes.

Dave.


[Nouveau] [PATCH] nouveau/gsp: drop some acpi related debug

2023-12-04 Thread Dave Airlie
These were leftover debug prints; if we need them again, bring them
back for debugging later.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 6c0a8fbf0061..1dba7c49bd9a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1100,16 +1100,12 @@ r535_gsp_acpi_caps(acpi_handle handle, CAPS_METHOD_DATA 
*caps)
if (!obj)
return;
 
-   printk(KERN_ERR "nvop: obj type %d\n", obj->type);
-   printk(KERN_ERR "nvop: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
 
caps->status = 0;
caps->optimusCaps = *(u32 *)obj->buffer.pointer;
-   printk(KERN_ERR "nvop: caps %08x\n", caps->optimusCaps);
 
ACPI_FREE(obj);
 
@@ -1136,9 +1132,6 @@ r535_gsp_acpi_jt(acpi_handle handle, JT_METHOD_DATA *jt)
if (!obj)
return;
 
-   printk(KERN_ERR "jt: obj type %d\n", obj->type);
-   printk(KERN_ERR "jt: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
@@ -1147,7 +1140,6 @@ r535_gsp_acpi_jt(acpi_handle handle, JT_METHOD_DATA *jt)
jt->jtCaps = *(u32 *)obj->buffer.pointer;
jt->jtRevId = (jt->jtCaps & 0xfff0) >> 20;
jt->bSBIOSCaps = 0;
-   printk(KERN_ERR "jt: caps %08x rev:%04x\n", jt->jtCaps, jt->jtRevId);
 
ACPI_FREE(obj);
 
@@ -1233,7 +1225,6 @@ r535_gsp_acpi_dod(acpi_handle handle, DOD_METHOD_DATA 
*dod)
dod->acpiIdListLen += sizeof(dod->acpiIdList[0]);
}
 
-   printk(KERN_ERR "_DOD: ok! len:%d\n", dod->acpiIdListLen);
dod->status = 0;
 }
 #endif
-- 
2.43.0



[Nouveau] [PATCH] nouveau/gsp: drop the gsp failure message to a debug.

2023-12-04 Thread Dave Airlie
From: Dave Airlie 

These can happen in normal operation, especially with auxch transactions.

Gets rid of
nouveau :01:00.0: gsp: cli:0xc1d2 obj:0x0073 ctrl cmd:0x00731341 
failed: 0x
in logs.

Cc: Lyude 
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 72c14e7f6566..e72fe1ed0e8f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -641,7 +641,7 @@ r535_gsp_rpc_rm_ctrl_push(struct nvkm_gsp_object *object, 
void *argv, u32 repc)
return rpc;
 
if (rpc->status) {
-   nvkm_error(>subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x 
failed: 0x%08x\n",
+   nvkm_debug(>subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x 
failed: 0x%08x\n",
   object->client->object.handle, object->handle, 
rpc->cmd, rpc->status);
ret = ERR_PTR(-EINVAL);
} else {
-- 
2.43.0



[Nouveau] [PATCH] nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)

2023-12-04 Thread Dave Airlie
Add NULL callbacks for some things GSP calls that we don't handle, but
know about, so we avoid the logging.

v2: Timur suggested allowing null fn.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index f6725a5f5bfb..6c0a8fbf0061 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -298,7 +298,8 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 repc)
struct nvkm_gsp_msgq_ntfy *ntfy = >msgq.ntfy[i];
 
if (ntfy->fn == msg->function) {
-   ntfy->func(ntfy->priv, ntfy->fn, msg->data, msg->length 
- sizeof(*msg));
+   if (ntfy->func)
+   ntfy->func(ntfy->priv, ntfy->fn, msg->data, 
msg->length - sizeof(*msg));
break;
}
}
@@ -2104,7 +2105,9 @@ r535_gsp_oneinit(struct nvkm_gsp *gsp)
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
  r535_gsp_msg_mmu_fault_queued, gsp);
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, 
r535_gsp_msg_os_error_log, gsp);
-
+   r535_gsp_msg_ntfy_add(gsp, 
NV_VGPU_MSG_EVENT_PERF_BRIDGELESS_INFO_UPDATE, NULL, NULL);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NULL, 
NULL);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_GSP_SEND_USER_SHARED_DATA, 
NULL, NULL);
ret = r535_gsp_rm_boot_ctor(gsp);
if (ret)
return ret;
-- 
2.43.0



Re: [Nouveau] [PATCH] nouveau/gsp: add three notifier callbacks that we see in normal operation

2023-12-04 Thread Dave Airlie
On Tue, 5 Dec 2023 at 09:07, Timur Tabi  wrote:
>
> On Tue, 2023-12-05 at 08:55 +1000, Dave Airlie wrote:
> > +static int
> > +r535_gsp_msg_ucode_libos_print(void *priv, u32 fn, void *repv, u32 repc)
> > +{
> > +   /* work out what we should do here. */
> > +   return 0;
> > +}
>
> This is part of my logrm debugfs patch.  It contains the printf log from a
> PMU exception.
>
> Do you want me to research the other two RPCs and tell you exactly what they
> do?
>
> But if you're not planning on actually doing anything with these RPCs, why
> add callbacks?  Doesn't the driver already ignore RPCs it doesn't recognize?

The current code prints a message when we get a callback we don't handle;
this silences those, but maybe I should just silence the messages instead.

Dave.


[Nouveau] [PATCH] nouveau/gsp: add three notifier callbacks that we see in normal operation

2023-12-04 Thread Dave Airlie
These seem to get called, but it doesn't look like we have to care too much
at this point.

Signed-off-by: Dave Airlie 
---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 24 ++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index f6725a5f5bfb..8b368df2e798 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1505,6 +1505,26 @@ r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void 
*repv, u32 repc)
return 0;
 }
 
+static int
+r535_gsp_msg_perf_bridgeless_info_update(void *priv, u32 fn, void *repv, u32 
repc)
+{
+   return 0;
+}
+
+static int
+r535_gsp_msg_ucode_libos_print(void *priv, u32 fn, void *repv, u32 repc)
+{
+   /* work out what we should do here. */
+   return 0;
+}
+
+static int
+r535_gsp_msg_gsp_send_user_shared_data(void *priv, u32 fn, void *repv, u32 
repc)
+{
+   /* this seems to send some sort of assert counts from gsp */
+   return 0;
+}
+
 static void
 nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
 {
@@ -2104,7 +2124,9 @@ r535_gsp_oneinit(struct nvkm_gsp *gsp)
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
  r535_gsp_msg_mmu_fault_queued, gsp);
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, 
r535_gsp_msg_os_error_log, gsp);
-
+   r535_gsp_msg_ntfy_add(gsp, 
NV_VGPU_MSG_EVENT_PERF_BRIDGELESS_INFO_UPDATE, 
r535_gsp_msg_perf_bridgeless_info_update, gsp);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, 
r535_gsp_msg_ucode_libos_print, gsp);
+   r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_GSP_SEND_USER_SHARED_DATA, 
r535_gsp_msg_gsp_send_user_shared_data, gsp);
ret = r535_gsp_rm_boot_ctor(gsp);
if (ret)
return ret;
-- 
2.43.0



[Nouveau] [PATCH] nouveau/tu102: flush all pdbs on vmm flush

2023-11-29 Thread Dave Airlie
From: Dave Airlie 

This is a hack around a bug exposed with the GSP code. I'm not sure
what is happening exactly, but it appears some of our flushes don't
result in proper TLB invalidation for our BAR2, and we get a BAR2
fault from GSP and it all dies.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c
index e34bc6076401..8379e72d77ab 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c
@@ -31,7 +31,7 @@ tu102_vmm_flush(struct nvkm_vmm *vmm, int depth)
 
type |= 0x0001; /* PAGE_ALL */
if (atomic_read(>engref[NVKM_SUBDEV_BAR]))
-   type |= 0x0004; /* HUB_ONLY */
+   type |= 0x0006; /* HUB_ONLY | ALL PDB (hack) */
 
mutex_lock(>mmu->mutex);
 
-- 
2.42.0



Re: [Nouveau] [PATCH 3/3] nouveau/gsp: add some basic registry entries.

2023-11-27 Thread Dave Airlie
On Tue, 28 Nov 2023 at 06:48, Timur Tabi  wrote:
>
> On Tue, Oct 31, 2023 at 12:20 AM Dave Airlie  wrote:
> > rpc->size = sizeof(*rpc);
> > -   rpc->numEntries = 1;
> > -   rpc->entries[0].nameOffset = offsetof(typeof(*rpc), entries[1]);
> > -   rpc->entries[0].type = 1;
> > -   rpc->entries[0].data = 0;
> > -   rpc->entries[0].length = 4;
> > -
> > -   strings = (char *)>entries[1];
> > -   strings[0] = '\0';
> > +   rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
> > +
> > +   str_offset = offsetof(typeof(*rpc), 
> > entries[NV_GSP_REG_NUM_ENTRIES]);
> > +   strings = (char *)>entries[NV_GSP_REG_NUM_ENTRIES];
> > +   for (i = 0; i < NV_GSP_REG_NUM_ENTRIES; i++) {
> > +   int name_len = strlen(r535_registry_entries[i].name) + 1;
> > +   rpc->entries[i].nameOffset = str_offset;
> > +   rpc->entries[i].type = 1;
> > +   rpc->entries[i].data = r535_registry_entries[i].value;
> > +   rpc->entries[i].length = 4;
> > +   memcpy(strings, r535_registry_entries[i].name, name_len);
> > +   strings += name_len;
> > +   str_offset += name_len;
> > +   }
>
> I'm working on a patch that replaces this code with a
> dynamically-generated registry so that we can set registry keys via
> the driver's command-line (like the Nvidia driver).

I'm not sure we'd want that, except maybe as a debugging aid. I'd
really like to have nouveau just pick the correct set of registry
entries, but I suppose there might be some cases where setting them
from the command line would be good for testing.

> a bug here.  rpc->size must be the total size of the RPC, including
> all the PACKED_REGISTRY_ENTRY structs and the strings that follow
> them.  You can see this by looking at RmPackageRegistry() and
> regCountEntriesAndSize() in OpenRM.

Oh interesting, I'll take a look today.

Dave.
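
For readers following the thread, the fix Timur describes would make the
size cover the whole payload, roughly like this inside
r535_gsp_rpc_set_registry() (a sketch, not the final patch; "str_total" is
a made-up name for the summed string lengths):

	size_t str_total = 0;
	int i;

	for (i = 0; i < NV_GSP_REG_NUM_ENTRIES; i++)
		str_total += strlen(r535_registry_entries[i].name) + 1;

	/* header + entry array + string data, not just sizeof(*rpc) */
	rpc->size = sizeof(*rpc) +
		    sizeof(rpc->entries[0]) * NV_GSP_REG_NUM_ENTRIES +
		    str_total;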


Re: [Nouveau] [PATCH drm-misc-next 2/2] drm/nouveau: enable dynamic job-flow control

2023-11-21 Thread Dave Airlie
On Tue, 14 Nov 2023 at 10:27, Danilo Krummrich  wrote:
>
> Make use of the scheduler's credit limit and scheduler job's credit
> count to account for the actual size of a job, such that we fill up the
> ring efficiently.

For the two:

Reviewed-by: Dave Airlie 

>
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_abi16.c | 3 ++-
>  drivers/gpu/drm/nouveau/nouveau_drm.c   | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c  | 4 +++-
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 9 -
>  drivers/gpu/drm/nouveau/nouveau_sched.h | 3 ++-
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c  | 4 +++-
>  6 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index f8e59cfb1d34..207945700c94 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -316,7 +316,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
> if (ret)
> goto done;
>
> -   ret = nouveau_sched_init(>sched, drm, drm->sched_wq);
> +   ret = nouveau_sched_init(>sched, drm, drm->sched_wq,
> +chan->chan->dma.ib_max);
> if (ret)
> goto done;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 7e5f19153829..6f6c31a9937b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -320,7 +320,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char 
> *sname,
>  * locks which indirectly or directly are held for allocations
>  * elsewhere.
>  */
> -   ret = nouveau_sched_init(>sched, drm, NULL);
> +   ret = nouveau_sched_init(>sched, drm, NULL, 1);
> if (ret)
> goto done;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index 388831c53551..63c344f4f78e 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -258,10 +258,12 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
> }
> }
>
> +   args.file_priv = __args->file_priv;
> job->chan = __args->chan;
>
> args.sched = __args->sched;
> -   args.file_priv = __args->file_priv;
> +   /* Plus one to account for the HW fence. */
> +   args.credits = job->push.count + 1;
>
> args.in_sync.count = __args->in_sync.count;
> args.in_sync.s = __args->in_sync.s;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c 
> b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index faabd662b165..6406d6659361 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -12,7 +12,6 @@
>  #include "nouveau_abi16.h"
>  #include "nouveau_sched.h"
>
> -#define NOUVEAU_SCHED_HW_SUBMISSIONS   1
>  #define NOUVEAU_SCHED_JOB_TIMEOUT_MS   1
>
>  /* Starts at 0, since the DRM scheduler interprets those parameters as 
> (initial)
> @@ -85,10 +84,10 @@ nouveau_job_init(struct nouveau_job *job,
> ret = -ENOMEM;
> goto err_free_objs;
> }
> -
> }
>
> -   ret = drm_sched_job_init(>base, >entity, 1, NULL);
> +   ret = drm_sched_job_init(>base, >entity,
> +args->credits, NULL);
> if (ret)
> goto err_free_chains;
>
> @@ -396,7 +395,7 @@ static const struct drm_sched_backend_ops 
> nouveau_sched_ops = {
>
>  int
>  nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
> -  struct workqueue_struct *wq)
> +  struct workqueue_struct *wq, u32 credit_limit)
>  {
> struct drm_gpu_scheduler *drm_sched = >base;
> struct drm_sched_entity *entity = >entity;
> @@ -414,7 +413,7 @@ nouveau_sched_init(struct nouveau_sched *sched, struct 
> nouveau_drm *drm,
>
> ret = drm_sched_init(drm_sched, _sched_ops, wq,
>  NOUVEAU_SCHED_PRIORITY_COUNT,
> -NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> +credit_limit, 0, job_hang_limit,
>  NULL, NULL, "nouveau_sched", drm->dev->dev);
> if (ret)
> goto fail_wq;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h 
> b/drivers/gpu/drm/nouveau/nouveau_sched.h
> index 026f33d9b70c..7ba8ffec135a 1

Re: [Nouveau] [PATCH drm-misc-next] drm/nouveau: use GPUVM common infrastructure

2023-11-21 Thread Dave Airlie
On Tue, 14 Nov 2023 at 08:12, Danilo Krummrich  wrote:
>
> GPUVM provides common infrastructure to track external and evicted GEM
> objects as well as locking and validation helpers.
>
> Especially external and evicted object tracking is a huge improvement
> compared to the current brute force approach of iterating all mappings
> in order to lock and validate the GPUVM's GEM objects. Hence, make us of
> it.
>
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 

> ---
> Originally, this patch was part of [1]. However, while applying the series,
> I noticed that this patch needs another iteration, hence I held it back to
> rework it and send it separately.
>
> Changes since detached from [1]:
> 
> - Don't use drm_gpuvm_exec_lock() since it would, unnecessarily, lock all the
>   backing buffer's dma-resv locks. Instead, lock only the GEMs affected by the
>   requested mapping operations, directly or indirectly through map, remap or
>   unmap.
> - Validate backing buffers in drm_exec loop directly.
>
> [1] 
> https://lore.kernel.org/dri-devel/20231108001259.15123-1-d...@redhat.com/T/#u
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c|   4 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c  |  57 +++---
>  drivers/gpu/drm/nouveau/nouveau_exec.h  |   4 -
>  drivers/gpu/drm/nouveau/nouveau_sched.c |   9 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.h |   7 +-
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c  | 134 +---
>  6 files changed, 100 insertions(+), 115 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 7afad86da64b..b7dda486a7ea 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -1061,17 +1061,18 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool 
> evict,
>  {
> struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
> struct nouveau_bo *nvbo = nouveau_bo(bo);
> +   struct drm_gem_object *obj = >base;
> struct ttm_resource *old_reg = bo->resource;
> struct nouveau_drm_tile *new_tile = NULL;
> int ret = 0;
>
> -
> if (new_reg->mem_type == TTM_PL_TT) {
> ret = nouveau_ttm_tt_bind(bo->bdev, bo->ttm, new_reg);
> if (ret)
> return ret;
> }
>
> +   drm_gpuvm_bo_gem_evict(obj, evict);
> nouveau_bo_move_ntfy(bo, new_reg);
> ret = ttm_bo_wait_ctx(bo, ctx);
> if (ret)
> @@ -1136,6 +1137,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool 
> evict,
>  out_ntfy:
> if (ret) {
> nouveau_bo_move_ntfy(bo, bo->resource);
> +   drm_gpuvm_bo_gem_evict(obj, !evict);
> }
> return ret;
>  }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index bf6c12f4342a..9d9835fb5970 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -1,7 +1,5 @@
>  // SPDX-License-Identifier: MIT
>
> -#include 
> -
>  #include "nouveau_drv.h"
>  #include "nouveau_gem.h"
>  #include "nouveau_mem.h"
> @@ -86,14 +84,12 @@
>   */
>
>  static int
> -nouveau_exec_job_submit(struct nouveau_job *job)
> +nouveau_exec_job_submit(struct nouveau_job *job,
> +   struct drm_gpuvm_exec *vme)
>  {
> struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> struct nouveau_cli *cli = job->cli;
> struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
> -   struct drm_exec *exec = >exec;
> -   struct drm_gem_object *obj;
> -   unsigned long index;
> int ret;
>
> /* Create a new fence, but do not emit yet. */
> @@ -102,52 +98,29 @@ nouveau_exec_job_submit(struct nouveau_job *job)
> return ret;
>
> nouveau_uvmm_lock(uvmm);
> -   drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> -   DRM_EXEC_IGNORE_DUPLICATES);
> -   drm_exec_until_all_locked(exec) {
> -   struct drm_gpuva *va;
> -
> -   drm_gpuvm_for_each_va(va, >base) {
> -   if (unlikely(va == >base.kernel_alloc_node))
> -   continue;
> -
> -   ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
> -   drm_exec_retry_on_contention(exec);
> -   if (ret)
> -   goto err_uvmm_unlock;
> -   }
> +   ret = drm_gpuvm_exec_lock(vme);
> +  

Re: [Nouveau] [PATCH] nouveau/gsp: fix getting max channel id for GSP

2023-11-21 Thread Dave Airlie
Self NAK, this isn't sufficient to fix events.

On Mon, 20 Nov 2023 at 12:07, Dave Airlie  wrote:
>
> From: Dave Airlie 
>
> The fence code uses the total number of channel ids to allocate a
> bunch of memory for fencing. This is probably not the best way to
> do this, but it's hard to fix right now.
>
> The GSP code realises it can fit 8 channels into a USERD
> page, so it claims it can support 256 channels max, but it then
> allocates channel ids sparsely (0, 8, 16 etc).
>
> This just exposes the multiplier to userspace so the fence code
> gets things right, however I think this might all need more thought.
>
> Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/274
> Signed-off-by: Dave Airlie 
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c | 7 ++-
>  drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h | 2 ++
>  drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c | 7 +++
>  3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
> index 22443fe4a39f..8e36cdd0e5fb 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
> @@ -178,7 +178,12 @@ nvkm_fifo_info(struct nvkm_engine *engine, u64 mthd, u64 
> *data)
> return ret;
>
> switch (mthd) {
> -   case NV_DEVICE_HOST_CHANNELS: *data = fifo->chid ? fifo->chid->nr : 
> 0; return 0;
> +   case NV_DEVICE_HOST_CHANNELS:
> +   if (fifo->func->chid_total)
> +   *data = fifo->func->chid_total(fifo);
> +   else
> +   *data = fifo->chid ? fifo->chid->nr : 0;
> +   return 0;
> case NV_DEVICE_HOST_RUNLISTS:
> *data = 0;
> nvkm_runl_foreach(runl, fifo)
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h 
> b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
> index a0f3277605a5..c21e982b03a5 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
> @@ -17,6 +17,8 @@ struct nvkm_fifo_func {
>
> int (*chid_nr)(struct nvkm_fifo *);
> int (*chid_ctor)(struct nvkm_fifo *, int nr);
> +
> +   int (*chid_total)(struct nvkm_fifo *);
> int (*runq_nr)(struct nvkm_fifo *);
> int (*runl_ctor)(struct nvkm_fifo *);
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
> index b374d72fd1c1..1e9c0b113cb5 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
> @@ -641,6 +641,12 @@ r535_fifo_dtor(struct nvkm_fifo *fifo)
> kfree(fifo->func);
>  }
>
> +static int
> +r535_fifo_chid_total(struct nvkm_fifo *fifo)
> +{
> +   return fifo->chid->nr * CHID_PER_USERD;
> +}
> +
>  int
>  r535_fifo_new(const struct nvkm_fifo_func *hw, struct nvkm_device *device,
>   enum nvkm_subdev_type type, int inst, struct nvkm_fifo **pfifo)
> @@ -652,6 +658,7 @@ r535_fifo_new(const struct nvkm_fifo_func *hw, struct 
> nvkm_device *device,
>
> rm->dtor = r535_fifo_dtor;
> rm->runl_ctor = r535_fifo_runl_ctor;
> +   rm->chid_total = r535_fifo_chid_total;
> rm->runl = _runl;
> rm->cgrp = hw->cgrp;
> rm->cgrp.func = _cgrp;
> --
> 2.42.0
>


[Nouveau] [PATCH] nouveau/gsp: allocate enough space for all channel ids.

2023-11-21 Thread Dave Airlie
From: Dave Airlie 

This probably isn't the ideal fix, but we ended up using chids
sparsely, and lots of things rely on indexing into the full range,
so just allocate the full range up front.

The GSP code fits 8 channels into a userd page, but we end up using
a single userd page per channel, so we end up sparsely using the range.

Fixes a few crashes seen with multiple channels.

Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/277
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index 3adbb05ff587..d088e636edc3 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -539,7 +539,7 @@ r535_fifo_runl_ctor(struct nvkm_fifo *fifo)
struct nvkm_runl *runl;
struct nvkm_engn *engn;
u32 cgids = 2048;
-   u32 chids = 2048 / CHID_PER_USERD;
+   u32 chids = 2048;
int ret;
NV2080_CTRL_FIFO_GET_DEVICE_INFO_TABLE_PARAMS *ctrl;
 
-- 
2.42.0



[Nouveau] [PATCH] nouveau/gsp: fix getting max channel id for GSP

2023-11-19 Thread Dave Airlie
From: Dave Airlie 

The fence code uses the total number of channel ids to allocate a
bunch of memory for fencing. This is probably not the best way to
do this, but it's hard to fix right now.

The GSP code realises it can fit 8 channels into a USERD
page, so it claims it can support 256 channels max, but it then
allocates channel ids sparsely (0, 8, 16 etc).

This just exposes the multiplier to userspace so the fence code
gets things right, however I think this might all need more thought.

Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/274
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c | 7 ++-
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h | 2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c | 7 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
index 22443fe4a39f..8e36cdd0e5fb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/base.c
@@ -178,7 +178,12 @@ nvkm_fifo_info(struct nvkm_engine *engine, u64 mthd, u64 
*data)
return ret;
 
switch (mthd) {
-   case NV_DEVICE_HOST_CHANNELS: *data = fifo->chid ? fifo->chid->nr : 0; 
return 0;
+   case NV_DEVICE_HOST_CHANNELS:
+   if (fifo->func->chid_total)
+   *data = fifo->func->chid_total(fifo);
+   else
+   *data = fifo->chid ? fifo->chid->nr : 0;
+   return 0;
case NV_DEVICE_HOST_RUNLISTS:
*data = 0;
nvkm_runl_foreach(runl, fifo)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
index a0f3277605a5..c21e982b03a5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/priv.h
@@ -17,6 +17,8 @@ struct nvkm_fifo_func {
 
int (*chid_nr)(struct nvkm_fifo *);
int (*chid_ctor)(struct nvkm_fifo *, int nr);
+
+   int (*chid_total)(struct nvkm_fifo *);
int (*runq_nr)(struct nvkm_fifo *);
int (*runl_ctor)(struct nvkm_fifo *);
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index b374d72fd1c1..1e9c0b113cb5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -641,6 +641,12 @@ r535_fifo_dtor(struct nvkm_fifo *fifo)
kfree(fifo->func);
 }
 
+static int
+r535_fifo_chid_total(struct nvkm_fifo *fifo)
+{
+   return fifo->chid->nr * CHID_PER_USERD;
+}
+
 int
 r535_fifo_new(const struct nvkm_fifo_func *hw, struct nvkm_device *device,
  enum nvkm_subdev_type type, int inst, struct nvkm_fifo **pfifo)
@@ -652,6 +658,7 @@ r535_fifo_new(const struct nvkm_fifo_func *hw, struct 
nvkm_device *device,
 
rm->dtor = r535_fifo_dtor;
rm->runl_ctor = r535_fifo_runl_ctor;
+   rm->chid_total = r535_fifo_chid_total;
rm->runl = _runl;
rm->cgrp = hw->cgrp;
rm->cgrp.func = _cgrp;
-- 
2.42.0



Re: [Nouveau] [PATCH] nouveau: don't fail driver load if no display hw present.

2023-11-15 Thread Dave Airlie
On Wed, 15 Nov 2023 at 05:54, Danilo Krummrich  wrote:
>
> On 11/5/23 21:37, Dave Airlie wrote:
> > From: Dave Airlie 
> >
> > If we get back ENODEV don't fail load.
>
> Maybe worth to note why this is OK in this case, might not be obvious
> to future readers of the code.

Sent an updated version with that fixed.
>
> >
> > Fixes: 15740541e8f0 ("drm/nouveau/devinit/tu102-: prepare for GSP-RM")
>
> Maybe I'm missing something subtle here, but did you maybe pick the wrong
> commit here? At a first glance it looks like commit 073bde453635
> ("drm/nouveau/kms/nv50-: disable dcb parsing") introduced the issue.
>

Nope, this commit causes the regression, as it powers off the display
core in devinit, which means we don't detect it later because GSP
doesn't power it back on.

Dave.


Re: [Nouveau] [PATCH 3/3] nouveau/gsp: add some basic registry entries.

2023-11-07 Thread Dave Airlie
On Wed, 8 Nov 2023 at 04:51, Timur Tabi  wrote:
>
> On Tue, 2023-10-31 at 15:18 +1000, Dave Airlie wrote:
>
> +   strings = (char *)&rpc->entries[NV_GSP_REG_NUM_ENTRIES];
>
>
> I get a UBSAN index-out-of-bounds error on boot at this line.
>
> [ 17.765746] nouveau :65:00.0: gsp: cmdq: wptr 1
> [ 17.765748] 
> 
> [ 17.774170] UBSAN: array-index-out-of-bounds in 
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1065:33
> [ 17.783449] index 2 is out of range for type 'PACKED_REGISTRY_ENTRY [*]'
> [ 17.790132] CPU: 0 PID: 234 Comm: kworker/0:4 Not tainted 6.6.0-rc5+ #1
> [ 17.790135] Hardware name: ASUS X299-A/PRIME X299-A, BIOS 2002 09/25/2019
> [ 17.790136] Workqueue: events work_for_cpu_fn
> [ 17.790143] Call Trace:
> [ 17.790145] 
> [ 17.790148] dump_stack_lvl+0x48/0x70
> [ 17.790155] dump_stack+0x10/0x20
> [ 17.790156] __ubsan_handle_out_of_bounds+0xc6/0x110
> [ 17.790160] r535_gsp_oneinit+0xf81/0x1530 [nouveau]
> [ 17.790292] ? __dev_printk+0x39/0xa0
> [ 17.790295] ? _dev_info+0x75/0xa0
> [ 17.790298] tu102_gsp_oneinit+0x9b/0xd0 [nouveau]
>
> I'm not sure what the fix is.  Do we need 
> __attribute__((no_sanitize("array-bounds"))) on PACKED_REGISTRY_TABLE?

Yes, that is probably the right answer for this, if we want to reuse
the structs that we get from the nvidia driver.

Dave.
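
For context, there are two usual ways to quiet UBSAN here, sketched below
with illustrative declarations (the real headers come from NVIDIA, so treat
the exact field layout as an assumption). Either give the trailing array its
true flexible-array type, which the bounds sanitizer understands:

	typedef struct {
		NvU32 size;
		NvU32 numEntries;
		PACKED_REGISTRY_ENTRY entries[];	/* instead of entries[1] */
	} PACKED_REGISTRY_TABLE;

or keep the headers byte-identical to the vendor copies and suppress the
check on the one accessor that indexes past the declared bound:

	__attribute__((no_sanitize("array-bounds")))
	static char *registry_strings(PACKED_REGISTRY_TABLE *rpc)
	{
		return (char *)&rpc->entries[NV_GSP_REG_NUM_ENTRIES];
	}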


[Nouveau] [PATCH] nouveau: use an rwlock for the event lock.

2023-11-06 Thread Dave Airlie
From: Dave Airlie 

This allows it to break the following circular locking dependency.

Aug 10 07:01:29 dg1test kernel: 
==
Aug 10 07:01:29 dg1test kernel: WARNING: possible circular locking dependency 
detected
Aug 10 07:01:29 dg1test kernel: 6.4.0-rc7+ #10 Not tainted
Aug 10 07:01:29 dg1test kernel: 
--
Aug 10 07:01:29 dg1test kernel: wireplumber/2236 is trying to acquire lock:
Aug 10 07:01:29 dg1test kernel: 8fca5320da18 (>lock){-...}-{2:2}, at: 
nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:
but task is already holding lock:
Aug 10 07:01:29 dg1test kernel: 8fca41208610 
(>list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
which lock already depends on the new lock.
Aug 10 07:01:29 dg1test kernel:
the existing dependency chain (in reverse 
order) is:
Aug 10 07:01:29 dg1test kernel:
-> #3 (>list_lock#2){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel:_raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:ga100_fifo_nonstall_intr+0x24/0x30 
[nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel:__handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel:handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel:handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel:__common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel:common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel:asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
-> #2 (>intr.lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel:_raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:nvkm_inth_allow+0x2c/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_event_ntfy_state+0x181/0x250 
[nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nouveau_fence_enable_signaling+0x78/0x80 
[nouveau]
Aug 10 07:01:29 dg1test kernel:__dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel:dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel:nouveau_cli_work_queue+0xae/0x110 
[nouveau]
Aug 10 07:01:29 dg1test kernel:nouveau_gem_object_close+0x1d1/0x2a0 
[nouveau]
Aug 10 07:01:29 dg1test kernel:drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel:drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel:drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel:nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel:__x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel:do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel:entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
-> #1 (>refs_lock#4){}-{2:2}:
Aug 10 07:01:29 dg1test kernel:_raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:nvkm_event_ntfy_state+0x37/0x250 
[nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel:nouveau_fence_enable_signaling+0x78/0x80 
[nouveau]
Aug 10 07:01:29 dg1test kernel:__dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel:dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel:nouveau_cli_work_queue+0xae/0x110 
[nouveau]
Aug 10 07:01:29 dg1test kernel:nouveau_gem_object_close+0x1d1/0x2a0 
[nouveau]
Aug 10 07:01:29 dg1test kernel:drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel:drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel:drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel:nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel:__x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel:do_syscall_64+0x3c/0x90
Aug 10
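
For context, the shape of the conversion is roughly the following, with
field names assumed from the trace above (a sketch; the full patch touches
more paths than shown):

	rwlock_t list_lock;	/* was spinlock_t */

	/* the interrupt path only walks the notify list, so it can take
	 * the lock shared and no longer excludes the fence uevent handler */
	read_lock_irqsave(&event->list_lock, flags);
	list_for_each_entry(ntfy, &event->ntfy, head) {
		/* deliver the notification */
	}
	read_unlock_irqrestore(&event->list_lock, flags);

	/* add/remove paths still take it exclusive */
	write_lock_irqsave(&event->list_lock, flags);
	list_add_tail(&ntfy->head, &event->ntfy);
	write_unlock_irqrestore(&event->list_lock, flags);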

[Nouveau] [PATCH] nouveau: don't fail driver load if no display hw present.

2023-11-05 Thread Dave Airlie
From: Dave Airlie 

If we get back ENODEV don't fail load.

Fixes: 15740541e8f0 ("drm/nouveau/devinit/tu102-: prepare for GSP-RM")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/270
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index d8c92521226d..f28f9a857458 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -726,6 +726,11 @@ nouveau_display_create(struct drm_device *dev)
 
if (nouveau_modeset != 2) {
ret = nvif_disp_ctor(>client.device, "kmsDisp", 0, 
>disp);
+   /* no display hw */
+   if (ret == -ENODEV) {
+   ret = 0;
+   goto disp_create_err;
+   }
 
if (!ret && (disp->disp.outp_mask || drm->vbios.dcb.entries)) {
nouveau_display_create_properties(dev);
-- 
2.41.0



Re: [Nouveau] [PATCH] drm/nouveau/gr/gf100-: unlock mutex failing to create golden context

2023-11-02 Thread Dave Airlie
On Fri, 3 Nov 2023 at 12:41, Danilo Krummrich  wrote:
>
> Do not return from gf100_gr_chan_new() with fecs mutex held when failing
> to create the golden context image.

Reviewed-by: Dave Airlie 
>
> Cc:  # v6.2+
> Fixes: ca081fff6ecc ("drm/nouveau/gr/gf100-: generate golden context during 
> first object alloc")
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
> index c494a1ff2d57..f72d3aa33442 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
> @@ -442,6 +442,7 @@ gf100_gr_chan_new(struct nvkm_gr *base, struct nvkm_chan 
> *fifoch,
> if (gr->data == NULL) {
> ret = gf100_grctx_generate(gr, chan, fifoch->inst);
> if (ret) {
> +   mutex_unlock(>fecs.mutex);
> nvkm_error(>engine.subdev, "failed to construct 
> context\n");
> return ret;
> }
> --
> 2.41.0
>


Re: [Nouveau] [PATCH drm-misc-next v8 12/12] drm/nouveau: use GPUVM common infrastructure

2023-11-01 Thread Dave Airlie
On Thu, 2 Nov 2023 at 09:32, Danilo Krummrich  wrote:
>
> GPUVM provides common infrastructure to track external and evicted GEM
> objects as well as locking and validation helpers.
>
> Especially external and evicted object tracking is a huge improvement
> compared to the current brute force approach of iterating all mappings
> in order to lock and validate the GPUVM's GEM objects. Hence, make us of
> it.
>
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 

Dave.


Re: [Nouveau] [PATCH drm-misc-next v8 04/12] drm/nouveau: make use of drm_gpuvm_range_valid()

2023-11-01 Thread Dave Airlie
On Thu, 2 Nov 2023 at 09:31, Danilo Krummrich  wrote:
>
> Use drm_gpuvm_range_valid() in order to validate userspace requests.
>
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 


Re: [Nouveau] [PATCH drm-misc-next v8 08/12] drm/nouveau: separately allocate struct nouveau_uvmm

2023-11-01 Thread Dave Airlie
On Thu, 2 Nov 2023 at 09:31, Danilo Krummrich  wrote:
>
> Allocate struct nouveau_uvmm separately in preparation for subsequent
> commits introducing reference counting for struct drm_gpuvm.
>
> While at it, get rid of nouveau_uvmm_init() as indirection of
> nouveau_uvmm_ioctl_vm_init() and perform some minor cleanups.
>
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 


Re: [Nouveau] [PATCH] drm/sched: Convert the GPU scheduler to variable number of run-queues

2023-10-31 Thread Dave Airlie
On Wed, 1 Nov 2023 at 11:46, Luben Tuikov  wrote:
>
> On 2023-10-31 09:33, Danilo Krummrich wrote:
> >
> > On 10/26/23 19:25, Luben Tuikov wrote:
> >> On 2023-10-26 12:39, Danilo Krummrich wrote:
> >>> On 10/23/23 05:22, Luben Tuikov wrote:
>  The GPU scheduler has now a variable number of run-queues, which are set 
>  up at
>  drm_sched_init() time. This way, each driver announces how many 
>  run-queues it
>  requires (supports) per each GPU scheduler it creates. Note, that 
>  run-queues
>  correspond to scheduler "priorities", thus if the number of run-queues 
>  is set
>  to 1 at drm_sched_init(), then that scheduler supports a single 
>  run-queue,
>  i.e. single "priority". If a driver further sets a single entity per
>  run-queue, then this creates a 1-to-1 correspondence between a scheduler 
>  and
>  a scheduled entity.
> >>>
> >>> Generally, I'm fine with this patch and how it replaces / generalizes the 
> >>> single
> >>> entity approach.
> >>
> >> Great!
> >>
> >>> However, I'm not quite sure how to properly use this. What is a driver 
> >>> supposed to
> >>> do, which previously took advantage of DRM_SCHED_POLICY_SINGLE_ENTITY?
> >>>
> >>> Is it supposed to call drm_sched_init() with num_rqs=1? If so, what's the 
> >>> correct way
> >>
> >> Yes, you call drm_sched_init() with num_rqs set to 1.
> >>
> >>> to initialize the drm_sched_entity then? Calling drm_sched_entity_init() 
> >>> with priority=0?
> >>
> >> Yes, with priority set to 0.
> >>
> >> One unfortunate fact I noticed when doing this patch is that the numerical 
> >> values
> >> assigned to enum drm_sched_priority is that the names to values are upside 
> >> down.
> >> Instead of min being 0, normal:1, high:2, kernel:3, it should've been 
> >> kernel:0 (highest),
> >> high:1, normal:2, low:4, and so on.
> >>
> >> The reason for this is absolutely clear: if you had a single priority, it 
> >> would be
> >> 0, the kernel, one, highest one. This is similar to how lanes in a highway 
> >> are counted:
> >> you always have lane 1. Similarly to nice(1) and kernel priorities...
> >>
> >>> Any other priority consequently faults in drm_sched_job_arm().
> >>
> >> drm_sched_job_arm() faults on !ENTITY, but the "priority" is just
> >> assigned to s_priority:
> >>  job->s_priority = entity->priority;
> >>
> >>
> >>> While I might sound like a broken record (sorry for that), I really think 
> >>> everything
> >>> related to Matt's series needs documentation, as in:
> >>
> >> Yes, I agree.
> >
> > Great! Do you plan to send a subsequent patch adding some documentation for 
> > this one? I
> > think it'd be good to get all the above documented.
>
> A lot of this would be the magic sauce of drivers and hardware--as we've seen 
> with Xe,
> and it would be presumptuous of me to write down to the detail of what and 
> how this
> and that should be used.

Nope, it wouldn't be. Please feel free to presume how drivers might
work, in the form of documentation.

At some point the scheduler needs to be documented and so far two
maintainers have avoided doing so, and it's causing no end of
problems.

Write documentation; this goes for the Xe scheduler patches and for Danilo's work.

When someone asks you for docs, consider it a blocker on getting stuff
merged, because this stuff isn't obvious.

Dave.


Re: [Nouveau] [PATCH 3/3] nouveau/gsp: add some basic registry entries.

2023-10-31 Thread Dave Airlie
On Wed, 1 Nov 2023 at 01:53, Timur Tabi  wrote:
>
> On Tue, Oct 31, 2023 at 12:20 AM Dave Airlie  wrote:
> > +#define NV_GSP_REG_NUM_ENTRIES 2
> > +
> > +static const struct nv_gsp_registry_entries 
> > r535_registry_entries[NV_GSP_REG_NUM_ENTRIES] = {
> > +   { "RMSecBusResetEnable", 1 },
> > +   { "RMForcePcieConfigSave", 1 },
> > +};
>
> How about :
>
> static const struct nv_gsp_registry_entries r535_registry_entries[] = {
>{ "RMSecBusResetEnable", 1 },
>{ "RMForcePcieConfigSave", 1 },
> };
>
> #define NV_GSP_REG_NUM_ENTRIES ARRAY_SIZE(r535_registry_entries)

Good plan. I'll change that now.

Dave.


[Nouveau] [PATCH 2/3] nouveau/gsp: fix message signature.

2023-10-30 Thread Dave Airlie
From: Dave Airlie 

The original one was backwards compared to traces from the nvidia driver.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 47138d797748..b6f6b5e747d4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -693,7 +693,7 @@ r535_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 argc)
return NULL;
 
rpc->header_version = 0x0300;
-   rpc->signature = ('V' << 24) | ('R' << 16) | ('P' << 8) | 'C';
+   rpc->signature = ('C' << 24) | ('P' << 16) | ('R' << 8) | 'V';
rpc->function = fn;
rpc->rpc_result = 0x;
rpc->rpc_result_private = 0x;
-- 
2.41.0
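
A quick way to convince yourself the swap is right, assuming a little-endian
host (a standalone sketch, not driver code): the low byte of the u32 lands
first in memory, so the reordered constant spells out the ASCII signature
GSP expects.

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t sig = ('C' << 24) | ('P' << 16) | ('R' << 8) | 'V';

		/* bytes in memory are 'V','R','P','C' on little-endian */
		printf("%.4s\n", (const char *)&sig);	/* prints: VRPC */
		return 0;
	}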



[Nouveau] [PATCH 3/3] nouveau/gsp: add some basic registry entries.

2023-10-30 Thread Dave Airlie
From: Dave Airlie 

The nvidia driver sets these two basic registry entries always,
so copy it.

Signed-off-by: Dave Airlie 
---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 45 ++-
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index b6f6b5e747d4..5bd38b1de226 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1029,26 +1029,51 @@ r535_gsp_rpc_unloading_guest_driver(struct nvkm_gsp 
*gsp, bool suspend)
return nvkm_gsp_rpc_wr(gsp, rpc, true);
 }
 
+/* dword only */
+struct nv_gsp_registry_entries {
+   const char *name;
+   uint32_t value;
+};
+
+#define NV_GSP_REG_NUM_ENTRIES 2
+
+static const struct nv_gsp_registry_entries 
r535_registry_entries[NV_GSP_REG_NUM_ENTRIES] = {
+   { "RMSecBusResetEnable", 1 },
+   { "RMForcePcieConfigSave", 1 },
+};
+
 static int
 r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
 {
PACKED_REGISTRY_TABLE *rpc;
char *strings;
+   int str_offset;
+   int i;
+   size_t rpc_size = sizeof(*rpc) + sizeof(rpc->entries[0]) * 
NV_GSP_REG_NUM_ENTRIES;
 
-   rpc = nvkm_gsp_rpc_get(gsp, NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
-  sizeof(*rpc) + sizeof(rpc->entries[0]) + 1);
+   /* add strings + null terminator */
+   for (i = 0; i < NV_GSP_REG_NUM_ENTRIES; i++)
+   rpc_size += strlen(r535_registry_entries[i].name) + 1;
+
+   rpc = nvkm_gsp_rpc_get(gsp, NV_VGPU_MSG_FUNCTION_SET_REGISTRY, 
rpc_size);
if (IS_ERR(rpc))
return PTR_ERR(rpc);
 
rpc->size = sizeof(*rpc);
-   rpc->numEntries = 1;
-   rpc->entries[0].nameOffset = offsetof(typeof(*rpc), entries[1]);
-   rpc->entries[0].type = 1;
-   rpc->entries[0].data = 0;
-   rpc->entries[0].length = 4;
-
-   strings = (char *)&rpc->entries[1];
-   strings[0] = '\0';
+   rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
+
+   str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]);
+   strings = (char *)&rpc->entries[NV_GSP_REG_NUM_ENTRIES];
+   for (i = 0; i < NV_GSP_REG_NUM_ENTRIES; i++) {
+   int name_len = strlen(r535_registry_entries[i].name) + 1;
+   rpc->entries[i].nameOffset = str_offset;
+   rpc->entries[i].type = 1;
+   rpc->entries[i].data = r535_registry_entries[i].value;
+   rpc->entries[i].length = 4;
+   memcpy(strings, r535_registry_entries[i].name, name_len);
+   strings += name_len;
+   str_offset += name_len;
+   }
 
return nvkm_gsp_rpc_wr(gsp, rpc, false);
 }
-- 
2.41.0



[Nouveau] [PATCH 1/3] nouveau/gsp: move to 535.113.01

2023-10-30 Thread Dave Airlie
From: Dave Airlie 

This moves the initial effort to the latest 535 firmware.

The GSP message structs have changed, as has the message passing.
The WPR also seems to have some struct changes.

This version of the firmware will be what we are stuck on for a while,
until we can refactor the driver and work out a better path forward.

Signed-off-by: Dave Airlie 
---
 .../sdk/nvidia/inc/alloc/alloc_channel.h  | 13 +++-
 .../common/sdk/nvidia/inc/class/cl.h  |  4 +-
 .../common/sdk/nvidia/inc/class/cl0005.h  |  2 +-
 .../common/sdk/nvidia/inc/class/cl0080.h  |  2 +-
 .../common/sdk/nvidia/inc/class/cl2080.h  |  2 +-
 .../nvidia/inc/class/cl2080_notification.h|  2 +-
 .../common/sdk/nvidia/inc/class/cl84a0.h  |  2 +-
 .../common/sdk/nvidia/inc/class/cl90f1.h  |  2 +-
 .../common/sdk/nvidia/inc/class/clc0b5sw.h|  2 +-
 .../nvidia/inc/ctrl/ctrl0073/ctrl0073common.h |  2 +-
 .../nvidia/inc/ctrl/ctrl0073/ctrl0073dfp.h|  2 +-
 .../sdk/nvidia/inc/ctrl/ctrl0073/ctrl0073dp.h |  4 +-
 .../inc/ctrl/ctrl0073/ctrl0073specific.h  |  2 +-
 .../nvidia/inc/ctrl/ctrl0073/ctrl0073system.h |  2 +-
 .../nvidia/inc/ctrl/ctrl0080/ctrl0080fifo.h   |  2 +-
 .../nvidia/inc/ctrl/ctrl0080/ctrl0080gpu.h|  2 +-
 .../sdk/nvidia/inc/ctrl/ctrl0080/ctrl0080gr.h |  2 +-
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080bios.h   |  2 +-
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080ce.h |  2 +-
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080event.h  |  2 +-
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080fb.h |  2 +-
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080fifo.h   |  2 +-
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h|  2 +-
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h |  2 +-
 .../inc/ctrl/ctrl2080/ctrl2080internal.h  |  2 +-
 .../common/sdk/nvidia/inc/ctrl/ctrl90f1.h |  2 +-
 .../nvidia/inc/ctrl/ctrla06f/ctrla06fgpfifo.h |  2 +-
 .../common/sdk/nvidia/inc/nvlimits.h  |  2 +-
 .../common/sdk/nvidia/inc/nvos.h  |  2 +-
 .../common/shared/msgq/inc/msgq/msgq_priv.h   |  2 +-
 .../uproc/os/common/include/libos_init_args.h |  2 +-
 .../nvalloc/common/inc/gsp/gsp_fw_sr_meta.h   |  2 +-
 .../nvalloc/common/inc/gsp/gsp_fw_wpr_meta.h  | 47 
 .../arch/nvalloc/common/inc/rmRiscvUcode.h|  2 +-
 .../nvidia/arch/nvalloc/common/inc/rmgspseq.h |  2 +-
 .../nvidia/generated/g_allclasses.h   |  2 +-
 .../nvidia/generated/g_chipset_nvoc.h |  2 +-
 .../nvidia/generated/g_fbsr_nvoc.h|  2 +-
 .../nvidia/generated/g_gpu_nvoc.h |  2 +-
 .../nvidia/generated/g_kernel_channel_nvoc.h  |  2 +-
 .../nvidia/generated/g_kernel_fifo_nvoc.h |  2 +-
 .../nvidia/generated/g_mem_desc_nvoc.h|  2 +-
 .../nvidia/generated/g_os_nvoc.h  |  2 +-
 .../nvidia/generated/g_rpc-structures.h   |  6 +-
 .../nvidia/generated/g_sdk-structures.h   |  4 +-
 .../nvidia/inc/kernel/gpu/gpu_acpi_data.h |  4 +-
 .../nvidia/inc/kernel/gpu/gpu_engine_type.h   |  2 +-
 .../nvidia/inc/kernel/gpu/gsp/gsp_fw_heap.h   |  2 +-
 .../nvidia/inc/kernel/gpu/gsp/gsp_init_args.h |  2 +-
 .../inc/kernel/gpu/gsp/gsp_static_config.h| 26 +++
 .../nvidia/inc/kernel/gpu/intr/engine_idx.h   |  2 +-
 .../nvidia/inc/kernel/gpu/nvbitmask.h |  4 +-
 .../nvidia/inc/kernel/os/nv_memory_type.h |  2 +-
 .../nvidia/kernel/inc/vgpu/rpc_global_enums.h |  2 +-
 .../nvidia/kernel/inc/vgpu/rpc_headers.h  |  2 +-
 .../nvidia/kernel/inc/vgpu/sdk-structures.h   |  2 +-
 drivers/gpu/drm/nouveau/nvkm/engine/ce/r535.c |  4 +-
 .../gpu/drm/nouveau/nvkm/engine/disp/r535.c   | 20 ++---
 .../gpu/drm/nouveau/nvkm/engine/fifo/r535.c   | 20 ++---
 drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c | 10 +--
 .../gpu/drm/nouveau/nvkm/engine/nvdec/r535.c  |  2 +-
 .../gpu/drm/nouveau/nvkm/engine/nvenc/r535.c  |  2 +-
 .../gpu/drm/nouveau/nvkm/engine/nvjpg/r535.c  |  2 +-
 .../gpu/drm/nouveau/nvkm/engine/ofa/r535.c|  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c|  6 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ad102.c   |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ga100.c   |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c   |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 76 +++
 .../gpu/drm/nouveau/nvkm/subdev/gsp/tu102.c   |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/tu116.c   |  4 +-
 .../drm/nouveau/nvkm/subdev/instmem/r535.c| 12 +--
 .../gpu/drm/nouveau/nvkm/subdev/mmu/r535.c|  6 +-
 73 files changed, 217 insertions(+), 171 deletions(-)
 rename drivers/gpu/drm/nouveau/include/nvrm/{535.54.03 => 
535.113.01}/common/sdk/nvidia/inc/alloc/alloc_channel.h (93%)
 rename drivers/gpu/drm/nouveau/include/nvrm/{535.54.03 => 
535.113.01}/common/sdk/nvidia/inc/class/cl.h (94%)
 rename drivers/gpu/drm/nouveau/include/nvrm/{535.54.03 => 
535.113.01}/common/sdk/nvidia/inc/class/cl0005.h (97%)
 rename drivers/gpu/drm/nouveau/include/nvrm/{535.54.03 => 
535.113.01}/common/sdk/nvidia/inc/class/cl0080.h (98%)
 rename drivers/gpu/drm/nouveau/

[Nouveau] nouveau/gsp: move to latest fw and small fixes

2023-10-30 Thread Dave Airlie
This moves Ben's work to the latest GSP stable firmware 535.113.01.
We will be stuck on this for a while.

There is also a fix for a message signature, and the addition of two
registry entries, which seem to help recover from crashes.

Dave.




[Nouveau] [PATCH 1/2] nouveau: fix r535 build on 32-bit arm.

2023-10-29 Thread Dave Airlie
From: Dave Airlie 

This needs the proper division macros.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index 14a67cf96204..0f9b8087d5e6 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -267,7 +267,7 @@ r535_chan_id_get_locked(struct nvkm_chan *chan, struct 
nvkm_memory *muserd, u64
return -EINVAL;
}
 
-   chid = ouserd / chan->func->userd->size;
+   chid = div_u64(ouserd, chan->func->userd->size);
 
list_for_each_entry(userd, >userd.list, head) {
if (userd->mem == muserd) {
-- 
2.41.0
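
Some background on why the macro is needed (a sketch; the helper name is
made up): 32-bit kernels don't link the compiler's 64-bit division runtime
(e.g. __aeabi_uldivmod on ARM), so a plain u64 division fails at link time.
div_u64() from <linux/math64.h> is the portable spelling:

	#include <linux/math64.h>

	/* illustrative only: mirrors the fixed line in r535_chan_id_get_locked() */
	static u32 userd_chid(u64 ouserd, u32 userd_size)
	{
		/* not: ouserd / userd_size, which needs libgcc on 32-bit */
		return div_u64(ouserd, userd_size);
	}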



[Nouveau] [PATCH 2/2] nouveau/disp: fix post-gsp build on 32-bit arm.

2023-10-29 Thread Dave Airlie
From: Dave Airlie 

This converts a bunch of divides into the proper macros.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index d2be40337b92..7840b6428afb 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1644,7 +1644,7 @@ nv50_sor_dp_watermark_sst(struct nouveau_encoder *outp,
// 0 active symbols. This may cause HW hang. Bug 200379426
//
if ((bEnableDsc) &&
-   ((pixelClockHz * depth) < ((8 * minRate * outp->dp.link_nr * 
DSC_FACTOR) / 64)))
+   ((pixelClockHz * depth) < div_u64(8 * minRate * outp->dp.link_nr * 
DSC_FACTOR, 64)))
{
return false;
}
@@ -1654,20 +1654,20 @@ nv50_sor_dp_watermark_sst(struct nouveau_encoder *outp,
//  For auto mode the watermark calculation does not need to track 
accumulated error the
//  formulas for manual mode will not work.  So below calculation 
was extracted from the DTB.
//
-   ratioF = ((u64)pixelClockHz * depth * PrecisionFactor) / DSC_FACTOR;
+   ratioF = div_u64((u64)pixelClockHz * depth * PrecisionFactor, 
DSC_FACTOR);
 
-   ratioF /= 8 * (u64) minRate * outp->dp.link_nr;
+   ratioF = div_u64(ratioF, 8 * (u64) minRate * outp->dp.link_nr);
 
if (PrecisionFactor < ratioF) // Assert if we will end up with a 
negative number in below
return false;
 
-   watermarkF = ratioF * tuSize * (PrecisionFactor - ratioF)  / 
PrecisionFactor;
-   waterMark = (unsigned)(watermarkAdjust + ((2 * (depth * PrecisionFactor 
/ (8 * numLanesPerLink * DSC_FACTOR)) + watermarkF) / PrecisionFactor));
+   watermarkF = div_u64(ratioF * tuSize * (PrecisionFactor - ratioF), 
PrecisionFactor);
+   waterMark = (unsigned)(watermarkAdjust + (div_u64(2 * div_u64(depth * 
PrecisionFactor, 8 * numLanesPerLink * DSC_FACTOR) + watermarkF, 
PrecisionFactor)));
 
//
//  Bounds check the watermark
//
-   numSymbolsPerLine = (surfaceWidth * depth) / (8 * outp->dp.link_nr * 
DSC_FACTOR);
+   numSymbolsPerLine = div_u64(surfaceWidth * depth, 8 * outp->dp.link_nr 
* DSC_FACTOR);
 
if (WARN_ON(waterMark > 39 || waterMark > numSymbolsPerLine))
return false;
@@ -1688,11 +1688,13 @@ nv50_sor_dp_watermark_sst(struct nouveau_encoder *outp,
surfaceWidthPerLink = surfaceWidth;
 
//Extra bits sent due to pixel steering
-   PixelSteeringBits = (surfaceWidthPerLink % numLanesPerLink) ? 
(((numLanesPerLink - surfaceWidthPerLink % numLanesPerLink) * depth) / 
DSC_FACTOR) : 0;
+   u32 remain;
+   div_u64_rem(surfaceWidthPerLink, numLanesPerLink, );
+   PixelSteeringBits = remain ? div_u64((numLanesPerLink - remain) * 
depth, DSC_FACTOR) : 0;
 
BlankingBits += PixelSteeringBits;
-   NumBlankingLinkClocks = (u64)BlankingBits * PrecisionFactor / (8 * 
numLanesPerLink);
-   MinHBlank = (u32)(NumBlankingLinkClocks * pixelClockHz/ minRate / 
PrecisionFactor);
+   NumBlankingLinkClocks = div_u64((u64)BlankingBits * PrecisionFactor, (8 
* numLanesPerLink));
+   MinHBlank = (u32)(div_u64(div_u64(NumBlankingLinkClocks * pixelClockHz, 
minRate), PrecisionFactor));
MinHBlank += 12;
 
if (WARN_ON(MinHBlank > rasterWidth - surfaceWidth))
@@ -1703,7 +1705,7 @@ nv50_sor_dp_watermark_sst(struct nouveau_encoder *outp,
return false;
 
 
-   hblank_symbols = (s32)(((u64)(rasterWidth - surfaceWidth - MinHBlank) * 
minRate) / pixelClockHz);
+   hblank_symbols = (s32)(div_u64((u64)(rasterWidth - surfaceWidth - 
MinHBlank) * minRate, pixelClockHz));
 
//reduce HBlank Symbols to account for secondary data packet
hblank_symbols -= 1; //Stuffer latency to send BS
@@ -1722,7 +1724,7 @@ nv50_sor_dp_watermark_sst(struct nouveau_encoder *outp,
}
else
{
-   vblank_symbols = (s32)(((u64)(surfaceWidth - 40) * minRate) /  
pixelClockHz) - 1;
+   vblank_symbols = (s32)((div_u64((u64)(surfaceWidth - 40) * 
minRate, pixelClockHz))) - 1;
 
vblank_symbols -= numLanesPerLink == 1 ? 39  : numLanesPerLink 
== 2 ? 21 : 12;
}
-- 
2.41.0



Re: [Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

2023-10-12 Thread Dave Airlie
On Wed, 11 Oct 2023 at 17:07, Christian König  wrote:
>
> On 10.10.23 at 22:23, Dave Airlie wrote:
> >> I think we're then optimizing for different scenarios. Our compute
> >> driver will use mostly external objects only, and if shared, I don't
> >> forsee them bound to many VMs. What saves us currently here is that in
> >> compute mode we only really traverse the extobj list after a preempt
> >> fence wait, or when a vm is using a new context for the first time. So
> >> vm's extobj list is pretty large. Each bo's vma list will typically be
> >> pretty small.
> > Can I ask why we are optimising for this userspace, this seems
> > incredibly broken.
> >
> > We've has this sort of problem in the past with Intel letting the tail
> > wag the horse, does anyone remember optimising relocations for a
> > userspace that didn't actually need to use relocations?
> >
> > We need to ask why this userspace is doing this, can we get some
> > pointers to it? compute driver should have no reason to use mostly
> > external objects, the OpenCL and level0 APIs should be good enough to
> > figure this out.
>
> Well that is pretty normal use case, AMD works the same way.
>
> In a multi GPU compute stack you have mostly all the data shared between
> different hardware devices.
>
> As I said before looking at just the Vulcan use case is not a good idea
> at all.
>

It's okay, I don't think anyone is doing that; some of these
use-cases are buried in server land and you guys don't communicate
them very well.

Multi-GPU compute would, I'd hope, be moving towards HMM/SVM-type
solutions though?

I'm also not into looking at use-cases that used to be important but
might not be as important going forward.

Dave.


> Christian.
>
> >
> > Dave.
>


Re: [Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation

2023-10-10 Thread Dave Airlie
> I think we're then optimizing for different scenarios. Our compute
> driver will use mostly external objects only, and if shared, I don't
> forsee them bound to many VMs. What saves us currently here is that in
> compute mode we only really traverse the extobj list after a preempt
> fence wait, or when a vm is using a new context for the first time. So
> vm's extobj list is pretty large. Each bo's vma list will typically be
> pretty small.

Can I ask why we are optimising for this userspace? This seems
incredibly broken.

We've had this sort of problem in the past with Intel letting the tail
wag the horse; does anyone remember optimising relocations for a
userspace that didn't actually need to use relocations?

We need to ask why this userspace is doing this; can we get some
pointers to it? A compute driver should have no reason to use mostly
external objects; the OpenCL and level0 APIs should be good enough to
figure this out.

Dave.


Re: [Nouveau] [PATCH 3/3] drm/nouveau: exec: report max pushs through getparam

2023-09-27 Thread Dave Airlie
On Thu, 28 Sept 2023 at 07:55, Faith Ekstrand
 wrote:
>
> On Wed, 2023-09-27 at 03:22 +0200, Danilo Krummrich wrote:
> > Report the maximum number of IBs that can be pushed with a single
> > DRM_IOCTL_NOUVEAU_EXEC through DRM_IOCTL_NOUVEAU_GETPARAM.
> >
> > While the maximum number of IBs per ring might vary between chipsets,
> > the kernel will make sure that userspace can only push a fraction of
> > the
> > maximum number of IBs per ring per job, such that we avoid a
> > situation
> > where there's only a single job occupying the ring, which could
> > potentially lead to the ring run dry.
> >
> > Using DRM_IOCTL_NOUVEAU_GETPARAM to report the maximum number of IBs
> > that can be pushed with a single DRM_IOCTL_NOUVEAU_EXEC implies that
> > all channels of a given device have the same ring size.
>
> There's a bunch of nouveau kernel details I don't know here but the
> interface looks good and I prefer it to a #define in the header.
>
> Acked-by: Faith Ekstrand 

For the series

Reviewed-by: Dave Airlie 

We should probably land this in drm-misc-fixes, since it would be good
to have it in 6.6.

Dave.
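
For userspace authors, querying the new parameter would look roughly like
this (a sketch using libdrm's command helpers; error handling elided):

	#include <xf86drm.h>
	#include <nouveau_drm.h>	/* uapi header with the GETPARAM ABI */

	uint64_t max_push = 1;
	struct drm_nouveau_getparam gp = {
		.param = NOUVEAU_GETPARAM_EXEC_PUSH_MAX,
	};

	if (drmCommandWriteRead(fd, DRM_NOUVEAU_GETPARAM, &gp, sizeof(gp)) == 0)
		max_push = gp.value;	/* cap pushes per DRM_IOCTL_NOUVEAU_EXEC call */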

>
>
> > Signed-off-by: Danilo Krummrich 
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_abi16.c | 19 +++
> >  drivers/gpu/drm/nouveau/nouveau_chan.c  |  2 +-
> >  drivers/gpu/drm/nouveau/nouveau_dma.h   |  3 +++
> >  drivers/gpu/drm/nouveau/nouveau_exec.c  |  7 ---
> >  drivers/gpu/drm/nouveau/nouveau_exec.h  |  5 +
> >  include/uapi/drm/nouveau_drm.h  | 10 ++
> >  6 files changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > index 30afbec9e3b1..1a198689b391 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > @@ -31,6 +31,7 @@
> >
> >  #include "nouveau_drv.h"
> >  #include "nouveau_dma.h"
> > +#include "nouveau_exec.h"
> >  #include "nouveau_gem.h"
> >  #include "nouveau_chan.h"
> >  #include "nouveau_abi16.h"
> > @@ -183,6 +184,20 @@ nouveau_abi16_fini(struct nouveau_abi16 *abi16)
> > cli->abi16 = NULL;
> >  }
> >
> > +static inline unsigned int
> > +getparam_dma_ib_max(struct nvif_device *device)
> > +{
> > +   const struct nvif_mclass dmas[] = {
> > +   { NV03_CHANNEL_DMA, 0 },
> > +   { NV10_CHANNEL_DMA, 0 },
> > +   { NV17_CHANNEL_DMA, 0 },
> > +   { NV40_CHANNEL_DMA, 0 },
> > +   {}
> > +   };
> > +
> > +   return nvif_mclass(>object, dmas) < 0 ?
> > NV50_DMA_IB_MAX : 0;
> > +}
> > +
> >  int
> >  nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
> >  {
> > @@ -247,6 +262,10 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
> > case NOUVEAU_GETPARAM_GRAPH_UNITS:
> > getparam->value = nvkm_gr_units(gr);
> > break;
> > +   case NOUVEAU_GETPARAM_EXEC_PUSH_MAX:
> > +   getparam->value = getparam_dma_ib_max(device) /
> > + NOUVEAU_EXEC_PUSH_MAX_DIV;
> > +   break;
> > default:
> > NV_PRINTK(dbg, cli, "unknown parameter %lld\n",
> > getparam->param);
> > return -EINVAL;
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c
> > b/drivers/gpu/drm/nouveau/nouveau_chan.c
> > index ac56f4689ee3..c3c2ab887978 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_chan.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
> > @@ -456,7 +456,7 @@ nouveau_channel_init(struct nouveau_channel
> > *chan, u32 vram, u32 gart)
> > chan->user_get = 0x44;
> > chan->user_get_hi = 0x60;
> > chan->dma.ib_base =  0x1 / 4;
> > -   chan->dma.ib_max  = (0x02000 / 8) - 1;
> > +   chan->dma.ib_max  = NV50_DMA_IB_MAX;
> > chan->dma.ib_put  = 0;
> > chan->dma.ib_free = chan->dma.ib_max - chan-
> > >dma.ib_put;
> > chan->dma.max = chan->dma.ib_base;
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h
> > b/drivers/gpu/drm/nouveau/nouveau_dma.h
> > index 1744d95b233e..c52cda82353e 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_dma.h
> > +++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
> > @@ -49,6 +49,9 @@ void nv

Re: [Nouveau] [PATCH drm-misc-next v4 3/8] drm/nouveau: uvmm: rename 'umgr' to 'base'

2023-09-24 Thread Dave Airlie
On Thu, 21 Sept 2023 at 00:44, Danilo Krummrich  wrote:
>
> Rename struct drm_gpuvm within struct nouveau_uvmm from 'umgr' to base.
>
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 


> ---
>  drivers/gpu/drm/nouveau/nouveau_debugfs.c |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c|  4 +--
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c| 32 +++
>  drivers/gpu/drm/nouveau/nouveau_uvmm.h|  6 ++---
>  4 files changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c 
> b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> index 053f703f2f68..e83db051e851 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> @@ -231,7 +231,7 @@ nouveau_debugfs_gpuva(struct seq_file *m, void *data)
> continue;
>
> nouveau_uvmm_lock(uvmm);
> -   drm_debugfs_gpuva_info(m, &uvmm->umgr);
> +   drm_debugfs_gpuva_info(m, &uvmm->base);
> seq_puts(m, "\n");
> nouveau_debugfs_gpuva_regions(m, uvmm);
> nouveau_uvmm_unlock(uvmm);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index c001952cd678..b4239af29e5a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -106,8 +106,8 @@ nouveau_exec_job_submit(struct nouveau_job *job)
> drm_exec_until_all_locked(exec) {
> struct drm_gpuva *va;
>
> -   drm_gpuvm_for_each_va(va, &uvmm->umgr) {
> -   if (unlikely(va == &uvmm->umgr.kernel_alloc_node))
> +   drm_gpuvm_for_each_va(va, &uvmm->base) {
> +   if (unlikely(va == &uvmm->base.kernel_alloc_node))
> continue;
>
> ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c 
> b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> index c750072cb268..6c86b64273c3 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> @@ -329,7 +329,7 @@ nouveau_uvma_region_create(struct nouveau_uvmm *uvmm,
> struct nouveau_uvma_region *reg;
> int ret;
>
> -   if (!drm_gpuva_interval_empty(&uvmm->umgr, addr, range))
> +   if (!drm_gpuva_interval_empty(&uvmm->base, addr, range))
> return -ENOSPC;
>
> ret = nouveau_uvma_region_alloc();
> @@ -384,7 +384,7 @@ nouveau_uvma_region_empty(struct nouveau_uvma_region *reg)
>  {
> struct nouveau_uvmm *uvmm = reg->uvmm;
>
> -   return drm_gpuva_interval_empty(&uvmm->umgr,
> +   return drm_gpuva_interval_empty(&uvmm->base,
> reg->va.addr,
> reg->va.range);
>  }
> @@ -589,7 +589,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
> uvma->region = args->region;
> uvma->kind = args->kind;
>
> -   drm_gpuva_map(&uvmm->umgr, &uvma->va, op);
> +   drm_gpuva_map(&uvmm->base, &uvma->va, op);
>
> /* Keep a reference until this uvma is destroyed. */
> nouveau_uvma_gem_get(uvma);
> @@ -1194,7 +1194,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
> goto unwind_continue;
> }
>
> -   op->ops = drm_gpuvm_sm_unmap_ops_create(&uvmm->umgr,
> +   op->ops = drm_gpuvm_sm_unmap_ops_create(&uvmm->base,
> op->va.addr,
> op->va.range);
> if (IS_ERR(op->ops)) {
> @@ -1205,7 +1205,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
> ret = nouveau_uvmm_sm_unmap_prepare(uvmm, >new,
> op->ops);
> if (ret) {
> -   drm_gpuva_ops_free(&uvmm->umgr, op->ops);
> +   drm_gpuva_ops_free(&uvmm->base, op->ops);
> op->ops = NULL;
> op->reg = NULL;
> goto unwind_continue;
> @@ -1240,7 +1240,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
> }
> }
>
> -   op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->umgr,
> +   op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->base,
>

Re: [Nouveau] [PATCH drm-misc-next v4 2/8] drm/gpuvm: allow building as module

2023-09-24 Thread Dave Airlie
On Thu, 21 Sept 2023 at 00:43, Danilo Krummrich  wrote:
>
> Currently, the DRM GPUVM does not have any core dependencies preventing
> a module build.
>
> Also, new features from subsequent patches require helpers (namely
> drm_exec) which can be built as module.
>
> Reviewed-by: Christian König 
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 

> ---
>  drivers/gpu/drm/Kconfig | 7 +++
>  drivers/gpu/drm/Makefile| 2 +-
>  drivers/gpu/drm/drm_gpuvm.c | 3 +++
>  drivers/gpu/drm/nouveau/Kconfig | 1 +
>  4 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index ab9ef1c20349..0f78a03e4e84 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -216,6 +216,13 @@ config DRM_EXEC
> help
>   Execution context for command submissions
>
> +config DRM_GPUVM
> +   tristate
> +   depends on DRM && DRM_EXEC
> +   help
> + GPU-VM representation providing helpers to manage a GPUs virtual
> + address space
> +
>  config DRM_BUDDY
> tristate
> depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 7a84b3cddeab..8e1bde059170 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -45,7 +45,6 @@ drm-y := \
> drm_vblank.o \
> drm_vblank_work.o \
> drm_vma_manager.o \
> -   drm_gpuvm.o \
> drm_writeback.o
>  drm-$(CONFIG_DRM_LEGACY) += \
> drm_agpsupport.o \
> @@ -81,6 +80,7 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
> drm_panel_orientation_quirks.o
>  #
>  #
>  obj-$(CONFIG_DRM_EXEC) += drm_exec.o
> +obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
>
>  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 7074bcad5b28..bfea4a8a19ec 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -1723,3 +1723,6 @@ drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
> kfree(ops);
>  }
>  EXPORT_SYMBOL_GPL(drm_gpuva_ops_free);
> +
> +MODULE_DESCRIPTION("DRM GPUVM");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
> index c52e8096cca4..1e6aaf95ff7c 100644
> --- a/drivers/gpu/drm/nouveau/Kconfig
> +++ b/drivers/gpu/drm/nouveau/Kconfig
> @@ -11,6 +11,7 @@ config DRM_NOUVEAU
> select DRM_TTM
> select DRM_TTM_HELPER
> select DRM_EXEC
> +   select DRM_GPUVM
> select DRM_SCHED
> select I2C
> select I2C_ALGOBIT
> --
> 2.41.0
>


Re: [Nouveau] [PATCH drm-misc-next v4 1/8] drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm

2023-09-24 Thread Dave Airlie
On Thu, 21 Sept 2023 at 16:49, Christian König  wrote:
>
> Am 20.09.23 um 16:42 schrieb Danilo Krummrich:
> > Rename struct drm_gpuva_manager to struct drm_gpuvm including
> > corresponding functions. This way the GPUVA manager's structures align
> > very well with the documentation of VM_BIND [1] and VM_BIND locking [2].
> >
> > It also provides a better foundation for the naming of data structures
> > and functions introduced for implementing a common dma-resv per GPU-VM
> > including tracking of external and evicted objects in subsequent
> > patches.
> >
> > [1] Documentation/gpu/drm-vm-bind-async.rst
> > [2] Documentation/gpu/drm-vm-bind-locking.rst
> >
> > Cc: Thomas Hellström 
> > Cc: Matthew Brost 
> > Signed-off-by: Danilo Krummrich 
>
> Not sure if that name is better or worse, but from the handling I
> suggest to have this patch separately pushed to drm-misc-next.
>
> Feel free to add my Acked-by for pushing this.
>

Acked-by: Dave Airlie 


> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/Makefile  |   2 +-
> >   drivers/gpu/drm/drm_debugfs.c |  16 +-
> >   .../gpu/drm/{drm_gpuva_mgr.c => drm_gpuvm.c}  | 400 +-
> >   drivers/gpu/drm/nouveau/nouveau_exec.c|   2 +-
> >   drivers/gpu/drm/nouveau/nouveau_uvmm.c|  24 +-
> >   drivers/gpu/drm/nouveau/nouveau_uvmm.h|   6 +-
> >   include/drm/drm_debugfs.h |   6 +-
> >   include/drm/{drm_gpuva_mgr.h => drm_gpuvm.h}  | 153 ---
> >   8 files changed, 304 insertions(+), 305 deletions(-)
> >   rename drivers/gpu/drm/{drm_gpuva_mgr.c => drm_gpuvm.c} (78%)
> >   rename include/drm/{drm_gpuva_mgr.h => drm_gpuvm.h} (78%)
> >
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index 215e78e79125..7a84b3cddeab 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -45,7 +45,7 @@ drm-y := \
> >   drm_vblank.o \
> >   drm_vblank_work.o \
> >   drm_vma_manager.o \
> > - drm_gpuva_mgr.o \
> > + drm_gpuvm.o \
> >   drm_writeback.o
> >   drm-$(CONFIG_DRM_LEGACY) += \
> >   drm_agpsupport.o \
> > diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
> > index 44ecd7d0daac..f291fb4b359f 100644
> > --- a/drivers/gpu/drm/drm_debugfs.c
> > +++ b/drivers/gpu/drm/drm_debugfs.c
> > @@ -40,7 +40,7 @@
> >   #include 
> >   #include 
> >   #include 
> > -#include 
> > +#include 
> >
> >   #include "drm_crtc_internal.h"
> >   #include "drm_internal.h"
> > @@ -189,31 +189,31 @@ static const struct file_operations drm_debugfs_fops 
> > = {
> >   /**
> >* drm_debugfs_gpuva_info - dump the given DRM GPU VA space
> >* @m: pointer to the _file to write
> > - * @mgr: the _gpuva_manager representing the GPU VA space
> > + * @gpuvm: the _gpuvm representing the GPU VA space
> >*
> >* Dumps the GPU VA mappings of a given DRM GPU VA manager.
> >*
> >* For each DRM GPU VA space drivers should call this function from their
> >* _info_list's show callback.
> >*
> > - * Returns: 0 on success, -ENODEV if the  is not initialized
> > + * Returns: 0 on success, -ENODEV if the  is not initialized
> >*/
> >   int drm_debugfs_gpuva_info(struct seq_file *m,
> > -struct drm_gpuva_manager *mgr)
> > +struct drm_gpuvm *gpuvm)
> >   {
> > - struct drm_gpuva *va, *kva = &mgr->kernel_alloc_node;
> > + struct drm_gpuva *va, *kva = &gpuvm->kernel_alloc_node;
> >
> > - if (!mgr->name)
> > + if (!gpuvm->name)
> >   return -ENODEV;
> >
> >   seq_printf(m, "DRM GPU VA space (%s) [0x%016llx;0x%016llx]\n",
> > -mgr->name, mgr->mm_start, mgr->mm_start + mgr->mm_range);
> > +gpuvm->name, gpuvm->mm_start, gpuvm->mm_start + 
> > gpuvm->mm_range);
> >   seq_printf(m, "Kernel reserved node [0x%016llx;0x%016llx]\n",
> >  kva->va.addr, kva->va.addr + kva->va.range);
> >   seq_puts(m, "\n");
> >   seq_puts(m, " VAs | start  | range  | end 
> >| object | object offset\n");
> >   seq_puts(m, 
> > "-\n&quo

Re: [Nouveau] [PATCH] drm/nouveau: sched: fix leaking memory of timedout job

2023-09-18 Thread Dave Airlie
On Sun, 17 Sept 2023 at 02:28, Danilo Krummrich  wrote:
>
> Always stop and re-start the scheduler in order to let the scheduler
> free up the timedout job in case it got signaled. In case of exec jobs
> the job type specific callback will take care to signal all fences and
> tear down the channel.

Reviewed-by: Dave Airlie 
>
> Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI")
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 12 +---
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index 9c031d15fe0b..49d83ac9e036 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -185,7 +185,7 @@ nouveau_exec_job_timeout(struct nouveau_job *job)
>
> nouveau_sched_entity_fini(job->entity);
>
> -   return DRM_GPU_SCHED_STAT_ENODEV;
> +   return DRM_GPU_SCHED_STAT_NOMINAL;
>  }
>
>  static struct nouveau_job_ops nouveau_exec_job_ops = {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c 
> b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index 88217185e0f3..3b7ea5221226 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -375,14 +375,20 @@ nouveau_sched_run_job(struct drm_sched_job *sched_job)
>  static enum drm_gpu_sched_stat
>  nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
>  {
> +   struct drm_gpu_scheduler *sched = sched_job->sched;
> struct nouveau_job *job = to_nouveau_job(sched_job);
> +   enum drm_gpu_sched_stat stat = DRM_GPU_SCHED_STAT_NOMINAL;
>
> -   NV_PRINTK(warn, job->cli, "Job timed out.\n");
> +   drm_sched_stop(sched, sched_job);
>
> if (job->ops->timeout)
> -   return job->ops->timeout(job);
> +   stat = job->ops->timeout(job);
> +   else
> +   NV_PRINTK(warn, job->cli, "Generic job timeout.\n");
> +
> +   drm_sched_start(sched, true);
>
> -   return DRM_GPU_SCHED_STAT_ENODEV;
> +   return stat;
>  }
>
>  static void
> --
> 2.41.0
>


Re: [Nouveau] [PATCH] drm/nouveau: fence: fix type cast warning in nouveau_fence_emit()

2023-09-18 Thread Dave Airlie
On Sat, 16 Sept 2023 at 11:15, Danilo Krummrich  wrote:
>
> Fix the following warning.
>
>   drivers/gpu/drm/nouveau/nouveau_fence.c:210:45: sparse: sparse:
>   incorrect type in initializer (different address spaces)
>   @@ expected struct nouveau_channel *chan
>   @@ got struct nouveau_channel [noderef] __rcu *channel

Reviewed-by: Dave Airlie 
>
> We're just about to emit the fence; there is nothing to protect against
> yet, hence it is safe to just cast __rcu away.
>
> Reported-by: kernel test robot 
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202309140340.bwkxzadx-...@intel.com/
> Fixes: 978474dc8278 ("drm/nouveau: fence: fix undefined fence state after 
> emit")
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 61d9e70da9fd..ca762ea55413 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -207,7 +207,7 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
> struct nouveau_fence_cha
>  int
>  nouveau_fence_emit(struct nouveau_fence *fence)
>  {
> -   struct nouveau_channel *chan = fence->channel;
> +   struct nouveau_channel *chan = unrcu_pointer(fence->channel);
> struct nouveau_fence_chan *fctx = chan->fence;
> struct nouveau_fence_priv *priv = (void*)chan->drm->fence;
> int ret;
> --
> 2.41.0
>
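
For readers unfamiliar with the helper, the distinction being relied on
here, in sketch form (illustrative only, not part of the patch):

	/* Reader side: the pointer may be published and concurrently
	 * updated, so the RCU read lock and a proper dereference are
	 * required. */
	rcu_read_lock();
	chan = rcu_dereference(fence->channel);
	/* ... use chan ... */
	rcu_read_unlock();

	/* Emit side: the fence is not visible to anyone else yet, so
	 * unrcu_pointer() merely strips the __rcu annotation for sparse
	 * and adds no barrier or locking. */
	chan = unrcu_pointer(fence->channel);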


Re: [Nouveau] Stepping away.

2023-09-18 Thread Dave Airlie
>
> As you may have gathered from the MAINTAINERS patch I just sent out, I
> have resigned from my position at Red Hat, and will be stepping back
> from nouveau development.
>
> This is a personal decision that I've been mulling over for a number
> of years now, and I feel that with GSP-RM greatly simplifying support
> of future HW, and the community being built around NVK, things
> are in good hands and this is the right time for me to take some time
> away to explore other avenues.
>
> I still have a personal system with an RTX 4070, which I've been using
> the nouveau GSP-RM code on for the past couple of weeks, so chances
> are I'll be poking my nose in every so often :)
>
> I wish everyone the best, and look forward to seeing the progress you
> all make on nouveau in the future.

With upstream maintainer hat,

Thanks for all the work over the years. Cambridge XDS 2007 and Sydney
LCA 2007, where you started down this road, were quite an era ago, and
you will probably remain the greatest single authority on NVIDIA
hardware for many years to come.

I'm cc'ing dri-devel for wider coverage. We do have some contingencies
in place, which we will roll out over the next few days; nouveau will
continue with the head start Ben has given us with the GSP work.

Dave.


Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

2023-09-13 Thread Dave Airlie
On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
 wrote:
>
> On Tue, 12 Sep 2023 18:20:32 +0200
> Thomas Hellström  wrote:
>
> > > +/**
> > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > + * @__gpuvm: The GPU VM
> > > + * @__list_name: The name of the list we're iterating on
> > > + * @__local_list: A pointer to the local list used to store already 
> > > iterated items
> > > + * @__prev_vm_bo: The previous element we got from 
> > > drm_gpuvm_get_next_cached_vm_bo()
> > > + *
> > > + * This helper is here to provide lockless list iteration. Lockless as 
> > > in, the
> > > + * iterator releases the lock immediately after picking the first 
> > > element from
> > > + * the list, so list insertion deletion can happen concurrently.
> >
> > Are the list spinlocks needed for that async state update from within
> > the dma-fence critical section we've discussed previously?
>
> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> get that Xe and Nouveau don't need that because they update the VM
> state early (in the ioctl path), but I keep thinking this will hurt us
> if we don't think it through from the beginning, because once you've
> set this logic to depend only on resv locks, it will be pretty hard to
> get back to a solution which lets synchronous VM_BINDs take precedence
> on asynchronous request, and, with vkQueueBindSparse() passing external
> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> take a long time to get your synchronous VM_BIND executed...

btw what is the use case for this? do we have actual vulkan
applications we know will have problems here?

it feels like a bit of premature optimisation, but maybe we have use cases.

Dave.
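
For those following along, the iteration shape under discussion looks
roughly like this (field names hypothetical, condensed to show the "pop
one element under the lock, use it outside" idea):

	spin_lock(&gpuvm->list_lock);
	vm_bo = list_first_entry_or_null(&gpuvm->evict_list,
					 struct drm_gpuvm_bo, list_entry);
	if (vm_bo)
		/* park it on a local list so it is never visited twice */
		list_move_tail(&vm_bo->list_entry, &local_list);
	spin_unlock(&gpuvm->list_lock);
	/* vm_bo can now be used without the lock held, while concurrent
	 * insertion/deletion proceeds on the original list */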


Re: [Nouveau] [PATCH drm-misc-next] drm/nouveau: fence: fix undefined fence state after emit

2023-08-29 Thread Dave Airlie
On Wed, 30 Aug 2023 at 08:38, Danilo Krummrich  wrote:
>
> nouveau_fence_emit() can fail before and after initializing the
> dma-fence and hence before and after initializing the dma-fence' kref.
>
> In order to avoid nouveau_fence_emit() potentially failing before
> dma-fence initialization pass the channel to nouveau_fence_new() already
> and perform the required check before even allocating the fence.
>
> While at it, restore the original behavior of nouveau_fence_new() and
> add nouveau_fence_create() for separate (pre-)allocation instead. Always
> splitting up allocation and emit wasn't a good idea in the first place.
> Hence, limit it to the places where we actually need to pre-allocate.
>
> Fixes: 7f2a0b50b2b2 ("drm/nouveau: fence: separate fence alloc and emit")
> Signed-off-by: Danilo Krummrich 

nice find,

Reviewed-by: Dave Airlie 


> ---
>  drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 +--
>  drivers/gpu/drm/nouveau/nouveau_bo.c|  8 +--
>  drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++
>  drivers/gpu/drm/nouveau/nouveau_exec.c  | 11 ++---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 32 -
>  drivers/gpu/drm/nouveau/nouveau_fence.h |  5 ++--
>  drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 +---
>  8 files changed, 45 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index a34924523133..a34917b048f9 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -1122,18 +1122,11 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
> PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x);
> PUSH_KICK(push);
>
> -   ret = nouveau_fence_new(pfence);
> +   ret = nouveau_fence_new(pfence, chan);
> if (ret)
> goto fail;
>
> -   ret = nouveau_fence_emit(*pfence, chan);
> -   if (ret)
> -   goto fail_fence_unref;
> -
> return 0;
> -
> -fail_fence_unref:
> -   nouveau_fence_unref(pfence);
>  fail:
> spin_lock_irqsave(&dev->event_lock, flags);
> list_del(&s->head);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 64f50adb2856..56427b6a00a4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -875,16 +875,10 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
> evict,
> if (ret)
> goto out_unlock;
>
> -   ret = nouveau_fence_new(&fence);
> +   ret = nouveau_fence_new(&fence, chan);
> if (ret)
> goto out_unlock;
>
> -   ret = nouveau_fence_emit(fence, chan);
> -   if (ret) {
> -   nouveau_fence_unref(&fence);
> -   goto out_unlock;
> -   }
> -
> /* TODO: figure out a better solution here
>  *
>  * wait on the fence here explicitly as going through
> diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
> b/drivers/gpu/drm/nouveau/nouveau_chan.c
> index 1fd5ccf41128..bb3d6e5c122f 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_chan.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
> @@ -70,11 +70,9 @@ nouveau_channel_idle(struct nouveau_channel *chan)
> struct nouveau_fence *fence = NULL;
> int ret;
>
> -   ret = nouveau_fence_new(&fence);
> +   ret = nouveau_fence_new(&fence, chan);
> if (!ret) {
> -   ret = nouveau_fence_emit(fence, chan);
> -   if (!ret)
> -   ret = nouveau_fence_wait(fence, false, false);
> +   ret = nouveau_fence_wait(fence, false, false);
> nouveau_fence_unref(&fence);
> }
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 61e84562094a..12feecf71e75 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -209,8 +209,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
> vm_fault *vmf)
> goto done;
> }
>
> -   if (!nouveau_fence_new(&fence))
> -   nouveau_fence_emit(fence, dmem->migrate.chan);
> +   nouveau_fence_new(&fence, dmem->migrate.chan);
> migrate_vma_pages(&args);
> nouveau_dmem_fence_done(&fence);
> dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> @@ -403,8 +402,7 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
> }
>
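
After this change the common call pattern collapses to something like
the following (sketch, mirroring the nouveau_channel_idle() hunk above):

	struct nouveau_fence *fence = NULL;
	int ret;

	ret = nouveau_fence_new(&fence, chan); /* allocates and emits */
	if (ret)
		return ret;

	ret = nouveau_fence_wait(fence, false, false);
	nouveau_fence_unref(&fence);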

Re: [Nouveau] [PATCH drm-misc-next] drm/nouveau: uvmm: fix unset region pointer on remap

2023-08-20 Thread Dave Airlie
Reviewed-by: Dave Airlie 

On Mon, 21 Aug 2023 at 08:29, Danilo Krummrich  wrote:
>
> Transfer the region pointer of a uvma to the new uvma(s) on re-map to
> prevent potential shader faults when the re-mapped uvma(s) are unmapped.
>
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c 
> b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> index 3a1e8538f205..aae780e4a4aa 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> @@ -639,6 +639,7 @@ nouveau_uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
> struct drm_gpuva *va = r->unmap->va;
> struct uvmm_map_args remap_args = {
> .kind = uvma_from_va(va)->kind,
> +   .region = uvma_from_va(va)->region,
> };
> u64 ustart = va->va.addr;
> u64 urange = va->va.range;
>
> base-commit: 25205087df1ffe06ccea9302944ed1f77dc68c6f
> --
> 2.41.0
>


[Nouveau] [PATCH] nouveau: find the smallest page allocation to cover a buffer alloc.

2023-08-10 Thread Dave Airlie
From: Dave Airlie 

With the new uapi we don't have the comp flags on the allocation,
so we shouldn't be using the first size that works; we should keep
iterating until we get the correct one.

This reduces allocations from 2MB to 64k in lots of places.

Fixes dEQP-VK.memory.allocation.basic.size_8KiB.forward.count_4000
on my ampere/gsp system.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 949195d5d782..a6993c7807b6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -318,8 +318,9 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int 
*align, u32 domain,
(!vmm->page[i].host || vmm->page[i].shift > 
PAGE_SHIFT))
continue;
 
-   if (pi < 0)
-   pi = i;
+   /* pick the last one as it will be smallest. */
+   pi = i;
+
/* Stop once the buffer is larger than the current page 
size. */
if (*size >= 1ULL << vmm->page[i].shift)
break;
-- 
2.41.0
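
In sketch form, the selection loop after this change (condensed;
selectable() stands in for the host/VRAM checks elided from the hunk,
and vmm->page[] is ordered from largest to smallest shift):

	int pi = -1, i;

	for (i = 0; i < vmm->page_nr; i++) {
		if (!selectable(&vmm->page[i], domain))
			continue;
		/* keep overwriting: the last acceptable entry is the
		 * smallest page size */
		pi = i;
		/* stop once the buffer is at least one page large */
		if (*size >= 1ULL << vmm->page[i].shift)
			break;
	}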



[Nouveau] [PATCH] nouveau/u_memcpya: use vmemdup_user

2023-08-10 Thread Dave Airlie
From: Dave Airlie 

I think there are limit checks in place for most things, but the
new api wants to avoid them.

Add a limit check and use the vmemdup_user helper instead.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 54063b094a69..8a7357688aff 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -189,21 +189,12 @@ u_free(void *addr)
 static inline void *
 u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 {
-   void *mem;
-   void __user *userptr = (void __force __user *)(uintptr_t)user;
+   void __user *userptr = u64_to_user_ptr(user);
+   size_t bytes;
 
-   size *= nmemb;
-
-   mem = kvmalloc(size, GFP_KERNEL);
-   if (!mem)
-   return ERR_PTR(-ENOMEM);
-
-   if (copy_from_user(mem, userptr, size)) {
-   u_free(mem);
-   return ERR_PTR(-EFAULT);
-   }
-
-   return mem;
+   if (unlikely(check_mul_overflow(nmemb, size, &bytes)))
+   return ERR_PTR(-EOVERFLOW);
+   return vmemdup_user(userptr, bytes);
 }
 
 #include 
-- 
2.41.0
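
A sketch of the calling convention this preserves (the real call sites
in the pushbuf/exec ioctl paths look roughly like this):

	struct drm_nouveau_gem_pushbuf_bo *bo;

	bo = u_memcpya(req->buffers, req->nr_buffers, sizeof(*bo));
	if (IS_ERR(bo))
		return PTR_ERR(bo);
	/* ... validate and use the copied array ... */
	u_free(bo);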



[Nouveau] [PATCH] nouveau/u_memcpya: use kvmalloc_array.

2023-08-10 Thread Dave Airlie
From: Dave Airlie 

I think there are limit checks in place for most things, but the
new api wants to avoid them.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 54063b094a69..6661f3057b72 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -192,9 +192,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
void *mem;
void __user *userptr = (void __force __user *)(uintptr_t)user;
 
-   size *= nmemb;
-
-   mem = kvmalloc(size, GFP_KERNEL);
+   mem = kvmalloc_array(nmemb, size, GFP_KERNEL);
if (!mem)
return ERR_PTR(-ENOMEM);
 
-- 
2.41.0



Re: [Nouveau] [PATCH drm-misc-next 0/5] Nouveau VM_BIND uAPI Fixes

2023-08-07 Thread Dave Airlie
For the series:

Reviewed-by: Dave Airlie 

On Tue, 8 Aug 2023 at 02:32, Danilo Krummrich  wrote:
>
> The patch series provides a few fixes for the recently merged VM_BIND uAPI
> mostly addressing a couple of warnings.
>
> It also contains one patch to slightly reduce the memory footprint of
> struct nouveau_uvma.
>
> Danilo Krummrich (5):
>   nouveau/dmem: fix copy-paste error in nouveau_dmem_migrate_chunk()
>   drm/nouveau: nvkm: vmm: silence warning from cast
>   drm/nouveau: remove incorrect __user annotations
>   drm/nouveau: uvmm: remove incorrect calls to mas_unlock()
>   drm/nouveau: uvmm: remove dedicated VM pointer from VMAs
>
>  drivers/gpu/drm/nouveau/nouveau_dmem.c|  2 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c|  6 ++---
>  drivers/gpu/drm/nouveau/nouveau_exec.h|  2 +-
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c| 23 ---
>  drivers/gpu/drm/nouveau/nouveau_uvmm.h| 14 +--
>  .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c|  5 ++--
>  6 files changed, 24 insertions(+), 28 deletions(-)
>
>
> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781
> --
> 2.41.0
>


Re: [Nouveau] [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged)

2023-08-03 Thread Dave Airlie
On Fri, 4 Aug 2023 at 02:52, Danilo Krummrich  wrote:
>
> This patch series provides a new UAPI for the Nouveau driver in order to
> support Vulkan features, such as sparse bindings and sparse residency.
>

Now that Faith has reviewed the uAPI and userspace work, I think we
should try and steer this in.

I think the only thing I see is the SPDX + MIT header in some places,
I think we can drop the MIT bits where SPDX is there, and leave
copyright and authorship (if you like), personally I've been leaving
authorship up to git, as it saves trouble with people randomly
emailing you about things you wrote 10 years ago.

Otherwise for the series:
Reviewed-by: Dave Airlie 

Dave.


[Nouveau] [PATCH] drm/nouveau: fixup the uapi header file.

2023-08-03 Thread Dave Airlie
From: Dave Airlie 

More than 10 years ago nouveau had a plan for a new multiplexer inside a
multiplexer API using nvif. It never fully reached fruition; fast forward
10 years, and the new vulkan driver is avoiding libdrm and calling ioctls
directly, and these 3 ioctls (getparam, channel alloc + free) don't seem
to be things we'd want to use nvif for.

Undeprecate and put them into the uapi header so we can just copy it
into mesa later.

v2: use uapi types.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_abi16.h | 41 -
 include/uapi/drm/nouveau_drm.h  | 48 +++--
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h 
b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 27eae85f33e6..d5d80d0d9011 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -43,28 +43,6 @@ int  nouveau_abi16_usif(struct drm_file *, void *data, u32 
size);
 #define NOUVEAU_GEM_DOMAIN_VRAM  (1 << 1)
 #define NOUVEAU_GEM_DOMAIN_GART  (1 << 2)
 
-struct drm_nouveau_channel_alloc {
-   uint32_t fb_ctxdma_handle;
-   uint32_t tt_ctxdma_handle;
-
-   int  channel;
-   uint32_t pushbuf_domains;
-
-   /* Notifier memory */
-   uint32_t notifier_handle;
-
-   /* DRM-enforced subchannel assignments */
-   struct {
-   uint32_t handle;
-   uint32_t grclass;
-   } subchan[8];
-   uint32_t nr_subchan;
-};
-
-struct drm_nouveau_channel_free {
-   int channel;
-};
-
 struct drm_nouveau_grobj_alloc {
int  channel;
uint32_t handle;
@@ -83,31 +61,12 @@ struct drm_nouveau_gpuobj_free {
uint32_t handle;
 };
 
-#define NOUVEAU_GETPARAM_PCI_VENDOR  3
-#define NOUVEAU_GETPARAM_PCI_DEVICE  4
-#define NOUVEAU_GETPARAM_BUS_TYPE 5
-#define NOUVEAU_GETPARAM_FB_SIZE 8
-#define NOUVEAU_GETPARAM_AGP_SIZE 9
-#define NOUVEAU_GETPARAM_CHIPSET_ID  11
-#define NOUVEAU_GETPARAM_VM_VRAM_BASE 12
-#define NOUVEAU_GETPARAM_GRAPH_UNITS 13
-#define NOUVEAU_GETPARAM_PTIMER_TIME 14
-#define NOUVEAU_GETPARAM_HAS_BO_USAGE 15
-#define NOUVEAU_GETPARAM_HAS_PAGEFLIP 16
-struct drm_nouveau_getparam {
-   uint64_t param;
-   uint64_t value;
-};
-
 struct drm_nouveau_setparam {
uint64_t param;
uint64_t value;
 };
 
-#define DRM_IOCTL_NOUVEAU_GETPARAM   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_GETPARAM, struct drm_nouveau_getparam)
 #define DRM_IOCTL_NOUVEAU_SETPARAM   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_SETPARAM, struct drm_nouveau_setparam)
-#define DRM_IOCTL_NOUVEAU_CHANNEL_ALLOC  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_CHANNEL_ALLOC, struct drm_nouveau_channel_alloc)
-#define DRM_IOCTL_NOUVEAU_CHANNEL_FREE   DRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_CHANNEL_FREE, struct drm_nouveau_channel_free)
 #define DRM_IOCTL_NOUVEAU_GROBJ_ALLOCDRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_GROBJ_ALLOC, struct drm_nouveau_grobj_alloc)
 #define DRM_IOCTL_NOUVEAU_NOTIFIEROBJ_ALLOC  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_NOTIFIEROBJ_ALLOC, struct drm_nouveau_notifierobj_alloc)
 #define DRM_IOCTL_NOUVEAU_GPUOBJ_FREEDRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_GPUOBJ_FREE, struct drm_nouveau_gpuobj_free)
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 853a327433d3..ca917e55b38f 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -33,6 +33,44 @@
 extern "C" {
 #endif
 
+#define NOUVEAU_GETPARAM_PCI_VENDOR  3
+#define NOUVEAU_GETPARAM_PCI_DEVICE  4
+#define NOUVEAU_GETPARAM_BUS_TYPE 5
+#define NOUVEAU_GETPARAM_FB_SIZE 8
+#define NOUVEAU_GETPARAM_AGP_SIZE 9
+#define NOUVEAU_GETPARAM_CHIPSET_ID  11
+#define NOUVEAU_GETPARAM_VM_VRAM_BASE 12
+#define NOUVEAU_GETPARAM_GRAPH_UNITS 13
+#define NOUVEAU_GETPARAM_PTIMER_TIME 14
+#define NOUVEAU_GETPARAM_HAS_BO_USAGE 15
+#define NOUVEAU_GETPARAM_HAS_PAGEFLIP 16
+struct drm_nouveau_getparam {
+   __u64 param;
+   __u64 value;
+};
+
+struct drm_nouveau_channel_alloc {
+   __u32 fb_ctxdma_handle;
+   __u32 tt_ctxdma_handle;
+
+   __s32 channel;
+   __u32 pushbuf_domains;
+
+   /* Notifier memory */
+   __u32 notifier_handle;
+
+   /* DRM-enforced subchannel assignments */
+   struct {
+   __u32 handle;
+   __u32 grclass;
+   } subchan[8];
+   __u32 nr_subchan;
+};
+
+struct drm_nouveau_channel_free {
+   __s32 channel;
+};
+
 #define NOUVEAU_GEM_DOMAIN_CPU   (1 << 0)
 #define NOUVEAU_GEM_DOMAIN_VRAM  (1 << 1)
 #define NOUVEAU_GEM_DOMAIN_GART  (1 << 2)
@@ -126,10 +164,10 @@ struct drm_nouveau_gem_cpu_fini {
__u32 handle;
 };
 
-#define DRM_NOUVEAU_GETPARAM   0x00 /* deprecated */
+#define DRM_NOUVEAU_GETPARAM  
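
Since the motivation is a userspace driver calling ioctls without
libdrm, usage boils down to plain ioctl() on the render node (sketch;
chipset query and channel teardown as examples, chan_id hypothetical):

	struct drm_nouveau_getparam gp = {
		.param = NOUVEAU_GETPARAM_CHIPSET_ID,
	};
	struct drm_nouveau_channel_free cf = { .channel = chan_id };

	if (ioctl(fd, DRM_IOCTL_NOUVEAU_GETPARAM, &gp) == 0)
		printf("chipset 0x%llx\n", (unsigned long long)gp.value);
	ioctl(fd, DRM_IOCTL_NOUVEAU_CHANNEL_FREE, &cf);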

Re: [Nouveau] [PATCH v2] drm/nouveau/gr: enable memory loads on helper invocation on all channels

2023-08-02 Thread Dave Airlie
On Fri, 23 Jun 2023 at 01:20, Karol Herbst  wrote:
>
> We have a lurking bug where Fragment Shader Helper Invocations can't load
> from memory. But this is actually required in OpenGL and is causing random
> hangs or failures in random shaders.
>
> It is unknown how widespread this issue is, but shaders hitting this can
> end up with infinite loops.
>
> We enable those loads only on Kepler and newer GPUs where we use our
> own firmware.
>
> Nvidia's firmware provides a way to set a kernelspace controlled list of
> mmio registers in the gr space from push buffers via MME macros.

seems sane,

Reviewed-by: Dave Airlie 
>
> v2: drop code for gm200 and newer.
>
> Cc: Ben Skeggs 
> Cc: David Airlie 
> Cc: nouveau@lists.freedesktop.org
> Cc: sta...@vger.kernel.org
> Signed-off-by: Karol Herbst 
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h  |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c  |  4 +++-
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c  | 10 ++
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c  |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c  |  1 +
>  6 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h
> index 00dbeda7e346..de161e7a04aa 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h
> @@ -117,6 +117,7 @@ void gk104_grctx_generate_r418800(struct gf100_gr *);
>
>  extern const struct gf100_grctx_func gk110_grctx;
>  void gk110_grctx_generate_r419eb0(struct gf100_gr *);
> +void gk110_grctx_generate_r419f78(struct gf100_gr *);
>
>  extern const struct gf100_grctx_func gk110b_grctx;
>  extern const struct gf100_grctx_func gk208_grctx;
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c
> index 94233d0119df..52a234b1ef01 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c
> @@ -906,7 +906,9 @@ static void
>  gk104_grctx_generate_r419f78(struct gf100_gr *gr)
>  {
> struct nvkm_device *device = gr->base.engine.subdev.device;
> -   nvkm_mask(device, 0x419f78, 0x00000001, 0x00000000);
> +
> +   /* bit 3 set disables loads in fp helper invocations, we need it 
> enabled */
> +   nvkm_mask(device, 0x419f78, 0x00000009, 0x00000000);
>  }
>
>  void
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c
> index 4391458e1fb2..3acdd9eeb74a 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c
> @@ -820,6 +820,15 @@ gk110_grctx_generate_r419eb0(struct gf100_gr *gr)
> nvkm_mask(device, 0x419eb0, 0x00001000, 0x00001000);
>  }
>
> +void
> +gk110_grctx_generate_r419f78(struct gf100_gr *gr)
> +{
> +   struct nvkm_device *device = gr->base.engine.subdev.device;
> +
> +   /* bit 3 set disables loads in fp helper invocations, we need it 
> enabled */
> +   nvkm_mask(device, 0x419f78, 0x00000008, 0x00000000);
> +}
> +
>  const struct gf100_grctx_func
>  gk110_grctx = {
> .main  = gf100_grctx_generate_main,
> @@ -854,4 +863,5 @@ gk110_grctx = {
> .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr,
> .r418800 = gk104_grctx_generate_r418800,
> .r419eb0 = gk110_grctx_generate_r419eb0,
> +   .r419f78 = gk110_grctx_generate_r419f78,
>  };
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c
> index 7b9a34f9ec3c..5597e87624ac 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c
> @@ -103,4 +103,5 @@ gk110b_grctx = {
> .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr,
> .r418800 = gk104_grctx_generate_r418800,
> .r419eb0 = gk110_grctx_generate_r419eb0,
> +   .r419f78 = gk110_grctx_generate_r419f78,
>  };
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c
> index c78d07a8bb7d..612656496541 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c
> @@ -568,4 +568,5 @@ gk208_grctx = {
> .dist_skip_table = gf117_grctx_generate_dist_skip_table,
> .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr,
> .r418800 = gk104_grctx_generate_r418800,
> +   .r419f78 =
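
nvkm_mask() is nouveau's read-modify-write register helper, so the
gk110 hunk above is equivalent to (sketch):

	u32 tmp = nvkm_rd32(device, 0x419f78);

	tmp &= ~0x00000008; /* clear bit 3: allow loads in fp helper
			     * invocations */
	nvkm_wr32(device, 0x419f78, tmp);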

[Nouveau] [PATCH] drm/nouveau: fixup the uapi header file.

2023-07-31 Thread Dave Airlie
From: Dave Airlie 

More than 10 years ago nouveau had a plan for a new multiplexer inside a
multiplexer API using nvif. It never fully reached fruition; fast forward
10 years, and the new vulkan driver is avoiding libdrm and calling ioctls
directly, and these 3 ioctls (getparam, channel alloc + free) don't seem
to be things we'd want to use nvif for.

Undeprecate and put them into the uapi header so we can just copy it
into mesa later.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nouveau_abi16.h | 41 -
 include/uapi/drm/nouveau_drm.h  | 48 +++--
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h 
b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 27eae85f33e6..d5d80d0d9011 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -43,28 +43,6 @@ int  nouveau_abi16_usif(struct drm_file *, void *data, u32 
size);
 #define NOUVEAU_GEM_DOMAIN_VRAM  (1 << 1)
 #define NOUVEAU_GEM_DOMAIN_GART  (1 << 2)
 
-struct drm_nouveau_channel_alloc {
-   uint32_t fb_ctxdma_handle;
-   uint32_t tt_ctxdma_handle;
-
-   int  channel;
-   uint32_t pushbuf_domains;
-
-   /* Notifier memory */
-   uint32_t notifier_handle;
-
-   /* DRM-enforced subchannel assignments */
-   struct {
-   uint32_t handle;
-   uint32_t grclass;
-   } subchan[8];
-   uint32_t nr_subchan;
-};
-
-struct drm_nouveau_channel_free {
-   int channel;
-};
-
 struct drm_nouveau_grobj_alloc {
int  channel;
uint32_t handle;
@@ -83,31 +61,12 @@ struct drm_nouveau_gpuobj_free {
uint32_t handle;
 };
 
-#define NOUVEAU_GETPARAM_PCI_VENDOR  3
-#define NOUVEAU_GETPARAM_PCI_DEVICE  4
-#define NOUVEAU_GETPARAM_BUS_TYPE 5
-#define NOUVEAU_GETPARAM_FB_SIZE 8
-#define NOUVEAU_GETPARAM_AGP_SIZE 9
-#define NOUVEAU_GETPARAM_CHIPSET_ID  11
-#define NOUVEAU_GETPARAM_VM_VRAM_BASE 12
-#define NOUVEAU_GETPARAM_GRAPH_UNITS 13
-#define NOUVEAU_GETPARAM_PTIMER_TIME 14
-#define NOUVEAU_GETPARAM_HAS_BO_USAGE 15
-#define NOUVEAU_GETPARAM_HAS_PAGEFLIP 16
-struct drm_nouveau_getparam {
-   uint64_t param;
-   uint64_t value;
-};
-
 struct drm_nouveau_setparam {
uint64_t param;
uint64_t value;
 };
 
-#define DRM_IOCTL_NOUVEAU_GETPARAM   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_GETPARAM, struct drm_nouveau_getparam)
 #define DRM_IOCTL_NOUVEAU_SETPARAM   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_SETPARAM, struct drm_nouveau_setparam)
-#define DRM_IOCTL_NOUVEAU_CHANNEL_ALLOC  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_CHANNEL_ALLOC, struct drm_nouveau_channel_alloc)
-#define DRM_IOCTL_NOUVEAU_CHANNEL_FREE   DRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_CHANNEL_FREE, struct drm_nouveau_channel_free)
 #define DRM_IOCTL_NOUVEAU_GROBJ_ALLOCDRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_GROBJ_ALLOC, struct drm_nouveau_grobj_alloc)
 #define DRM_IOCTL_NOUVEAU_NOTIFIEROBJ_ALLOC  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_NOUVEAU_NOTIFIEROBJ_ALLOC, struct drm_nouveau_notifierobj_alloc)
 #define DRM_IOCTL_NOUVEAU_GPUOBJ_FREEDRM_IOW (DRM_COMMAND_BASE + 
DRM_NOUVEAU_GPUOBJ_FREE, struct drm_nouveau_gpuobj_free)
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 853a327433d3..1cd97b3d8eda 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -33,6 +33,44 @@
 extern "C" {
 #endif
 
+#define NOUVEAU_GETPARAM_PCI_VENDOR  3
+#define NOUVEAU_GETPARAM_PCI_DEVICE  4
+#define NOUVEAU_GETPARAM_BUS_TYPE 5
+#define NOUVEAU_GETPARAM_FB_SIZE 8
+#define NOUVEAU_GETPARAM_AGP_SIZE 9
+#define NOUVEAU_GETPARAM_CHIPSET_ID  11
+#define NOUVEAU_GETPARAM_VM_VRAM_BASE 12
+#define NOUVEAU_GETPARAM_GRAPH_UNITS 13
+#define NOUVEAU_GETPARAM_PTIMER_TIME 14
+#define NOUVEAU_GETPARAM_HAS_BO_USAGE 15
+#define NOUVEAU_GETPARAM_HAS_PAGEFLIP 16
+struct drm_nouveau_getparam {
+   uint64_t param;
+   uint64_t value;
+};
+
+struct drm_nouveau_channel_alloc {
+   uint32_t fb_ctxdma_handle;
+   uint32_t tt_ctxdma_handle;
+
+   int  channel;
+   uint32_t pushbuf_domains;
+
+   /* Notifier memory */
+   uint32_t notifier_handle;
+
+   /* DRM-enforced subchannel assignments */
+   struct {
+   uint32_t handle;
+   uint32_t grclass;
+   } subchan[8];
+   uint32_t nr_subchan;
+};
+
+struct drm_nouveau_channel_free {
+   int channel;
+};
+
 #define NOUVEAU_GEM_DOMAIN_CPU   (1 << 0)
 #define NOUVEAU_GEM_DOMAIN_VRAM  (1 << 1)
 #define NOUVEAU_GEM_DOMAIN_GART  (1 << 2)
@@ -126,10 +164,10 @@ struct drm_nouveau_gem_cpu_fini {
__u32 handle;
 };
 
-#define DRM_NOUVEAU_GETPARAM   0x00 /* deprecated */
+#define D

Re: [Nouveau] [PATCH drm-misc-next v8 11/12] drm/nouveau: implement new VM_BIND uAPI

2023-07-23 Thread Dave Airlie
On Sun, 23 Jul 2023 at 01:12, Faith Ekstrand  wrote:
>
> On Wed, Jul 19, 2023 at 7:15 PM Danilo Krummrich  wrote:
>>
>> This commit provides the implementation for the new uapi motivated by the
>> Vulkan API. It allows user mode drivers (UMDs) to:
>>
>> 1) Initialize a GPU virtual address (VA) space via the new
>>DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
>>space managed by the kernel and userspace, respectively.
>>
>> 2) Allocate and free a VA space region as well as bind and unbind memory
>>to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>UMDs can request the named operations to be processed either
>>synchronously or asynchronously. It supports DRM syncobjs
>>(incl. timelines) as synchronization mechanism. The management of the
>>GPU VA mappings is implemented with the DRM GPU VA manager.
>>
>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
>>execution happens asynchronously. It supports DRM syncobj (incl.
>>timelines) as synchronization mechanism. DRM GEM object locking is
>>handled with drm_exec.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
>> GPU scheduler for the asynchronous paths.
>
>
> IDK where the best place to talk about this is but this seems as good as any.
>
> I've been looking into why the Vulkan CTS runs about 2x slower for me on the 
> new UAPI and I created a little benchmark to facilitate testing:
>
> https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/141
>
> The test, roughly, does the following:
>  1. Allocates and binds 1000 BOs
>  2. Constructs a pushbuf that executes a no-op compute shader.
>  3. Does a single EXEC/wait combo to warm up the kernel
>  4. Loops 10,000 times, doing SYNCOBJ_RESET (fast), EXEC, and then 
> SYNCOBJ_WAIT and times the loop
>
> Of course, there's a bit of userspace driver overhead but that's negledgable.
>
> If you drop the top patch which allocates 1k buffers, the submit time on the 
> old uAPI is 54 us/exec vs. 66 us/exec on the new UAPI. This includes the time 
> to do a SYNCOBJ_RESET (fast), EXEC, and SYNCOBJ_WAIT. The Intel driver, by 
> comparison, is 33us/exec so it's not syncobj overhead. This is a bit 
> concerning (you'd think the new thing would be faster) but what really has me 
> concerned is the 1k buffer case.
>
> If you include the top patch in the crucible MR, it allocates 1000 BOs and 
> VM_BINDs them. All the binding is done before the warmup EXEC. Suddenly, the 
> submit time jumps to 257 us/exec with the new UAPI. The old UAPI is much 
> worse (1134 us/exec) but that's not the point. Once we've done the first EXEC 
> and created our VM bindings, the cost per EXEC shouldn't change at all based 
> on the number of BOs bound.  Part of the point of VM_BIND is to get all that 
> binding logic and BO walking off the EXEC path.
>
> Normally, I wouldn't be too worried about a little performance problem like 
> this. This is the first implementation and we can improve it later. I get 
> that. However, I suspect the solution to this problem involves more UAPI and 
> I want to make sure we have it all before we call this all done and dusted 
> and land it.
>
> The way AMD solves this problem, as does the new Xe driver for Intel, is to 
> have a concept of internal vs. external BOs. Basically, there's an INTERNAL 
> bit specified somewhere in BO creation that has a few userspace implications:
>  1. In the Xe world where VMs are objects, INTERNAL BOs are assigned a VM on 
> creation and can never be bound to any other VM.
>  2. Any attempt to export an INTERNAL BO via prime or a similar mechanism 
> will fail with -EINVAL (I think?).

Okay I've done a first pass at implementing this,

https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/005a0005ff80ec85f3e9a314cae0576b7441076c

Danilo, we should probably pick that up for the next iteration. With
corresponding mesa work it gets the latency for me from 180us down to
70us.

Dave.
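
The timing loop Faith describes amounts to roughly the following
(sketch; submit_noop() is a hypothetical stand-in for an EXEC ioctl
that pushes the no-op compute shader and signals 'syncobj'):

	struct timespec t0, t1;
	int i, iters = 10000;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		drmSyncobjReset(fd, &syncobj, 1);
		submit_noop(fd, syncobj);
		drmSyncobjWait(fd, &syncobj, 1, INT64_MAX, 0, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	/* (t1 - t0) / iters gives the us/exec figures quoted above */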


Re: [Nouveau] [PATCH] drm/nouveau/acr: Abort loading ACR if no firmware was found

2023-07-12 Thread Dave Airlie
On Tue, 23 May 2023 at 19:37, Karol Herbst  wrote:
>
> On Mon, May 22, 2023 at 10:18 PM Karol Herbst  wrote:
> >
> > This fixes a NULL pointer access inside nvkm_acr_oneinit in case necessary
> > firmware files couldn't be loaded.
> >
> > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/212
> > Fixes: 4b569ded09fd ("drm/nouveau/acr/ga102: initial support")
> > Signed-off-by: Karol Herbst 
> > ---
> >  drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.c 
> > b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.c
> > index 795f3a649b12..6388234c352c 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/base.c
> > @@ -224,7 +224,7 @@ nvkm_acr_oneinit(struct nvkm_subdev *subdev)
> > u64 falcons;
> > int ret, i;
> >
> > -   if (list_empty(&acr->hsfw)) {
> > +   if (list_empty(&acr->hsfw) || !acr->func->wpr_layout) {
>
> Now thinking about this, it should probably also check acr->func...

with that fixed if you think you need it,

Reviewed-by: Dave Airlie 

>
> > nvkm_debug(subdev, "No HSFW(s)\n");
> > nvkm_acr_cleanup(acr);
> > return 0;
> > --
> > 2.40.1
> >
>


Re: [Nouveau] [PATCH drm-next v5 03/14] drm: manager to keep track of GPUs VA mappings

2023-06-26 Thread Dave Airlie
> > As pointed out by Christian, this would optimize the "get all mappings
> > backed by a specific BO from a given VM" use case.
> >
> > The question for me is, do other drivers than amdgpu commonly need this?
>
> I have no idea.
>
> >
> > And what does amdgpu need this for? Maybe amdgpu does something we're
> > not doing (yet)?
>
> Basically when we do a CS we need to make sure that the VM used by this
> CS is up to date. For this we walk over the relocation list of BOs and
> check the status of each BO+VM structure.
>
> This is done because we don't want to update all VMs at the same time,
> but rather just those that need the update.

This seems like a legacy from GL and possibly older vulkan. Going
forward, vulkan can't rely on passing lists of objects into the kernel
due to things like buffer_device_address, so I'm not sure we should
optimise for this situation. Moving the tracking list into the
drivers is going to mean a bunch of drivers all carrying the same
boilerplate to do the same thing, just so amdgpu can avoid doing it.

Now there might be some benchmark somewhere that shows a benefit
in this, and if there is then we should consider going this way for
all drivers rather than just allowing drivers to choose their own path
here.

> >
> > Christian - I know you didn't ask for "do it the way amdgpu does",
> > instead you voted for keeping it entirely driver specific. But I think
> > everyone is pretty close and I'm still optimistic that we could just
> > generalize this.
>
> Well, you should *not* necessarily do it like amdgpu does! Basically the
> implementation in amdgpu was driven by requirements, e.g. we need that,
> let's do it like this.
>
> It's perfectly possible that other requirements (e.g. focus on Vulkan)
> lead to a completely different implementation.
>
> It's just that ideally I would like to have an implementation where I
> can apply at least the basics to amdgpu as well.
>

I think we can still do that: either have an option to disable
using the list internally in the gpuva manager, or have the driver keep
its own tracking alongside; there may still be use cases where it can
use the gpuva tracking instead of its own.

I don't think we should forklift what is pretty likely to be common
code across every driver that uses this going forward just to benefit
an amdgpu design decision for older stacks.

Dave.


Re: [Nouveau] [PATCH drm-next v5 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-06-20 Thread Dave Airlie
On Tue, 20 Jun 2023 at 17:06, Oded Gabbay  wrote:
>
> On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie  wrote:
> >
> > Since this is feature is nouveau only currently and doesn't disturb
> > the current nouveau code paths, I'd like to try and get this work in
> > tree so other drivers can work from it.
> >
> > If there are any major objections to this, I'm happy to pull it back
> > out again, but I'd like to get some acks/rb in the next couple of days
> > in order to land some of it.
> >
> > Dave.
> >
> >
> > >
> > > forgot to add your email address to the patch series - sorry about that.
> > >
> > > This series (v5) contains the Documentation changes you requested.
> > >
> > > - Danilo
> > >
> > > On 6/20/23 02:42, Danilo Krummrich wrote:
> > > > This patch series provides a new UAPI for the Nouveau driver in order to
> > > > support Vulkan features, such as sparse bindings and sparse residency.
> > > >
> > > > Furthermore, with the DRM GPUVA manager it provides a new DRM core 
> > > > feature to
> > > > keep track of GPU virtual address (VA) mappings in a more generic way.
> > > >
> > > > The DRM GPUVA manager is intended to help drivers implement 
> > > > userspace-manageable
> > > > GPU VA spaces in reference to the Vulkan API. In order to achieve this 
> > > > goal it
> > > > serves the following purposes in this context.
> > > >
> > > >  1) Provide infrastructure to track GPU VA allocations and mappings,
> > > > making use of the maple_tree.
> > > >
> > > >  2) Generically connect GPU VA mappings to their backing buffers, in
> > > > particular DRM GEM objects.
> Will this manager be able to connect GPU VA mappings to host memory
> allocations (aka user pointers)?
>
> I only skimmed over the uapi definitions, but from that quick glance I
> saw you can only pass a (gem) handle to the vm bind uapi.
>
> I think it is an important feature because you don't want to have two
> GPU VA managers running in your driver (if that's even possible).
> Maybe we should at least try to make sure the uapi is/will be
> compatible with such an extension.
>

I think that would have to be a new uAPI entry point anyways, since
managing user ptrs is extra, but the uAPI is nouveau specific and
nouveau has no hostptr support as of now.

The gpuva manager is kernel internal; I think adding host ptr tracking
is useful, but I don't think it's a blocker right now.

One of the reasons I'd like to get this in the tree is to add things
like that instead of overloading this initial patchset with feature
creep.

Dave.


Re: [Nouveau] [PATCH drm-next v5 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-06-19 Thread Dave Airlie
Since this feature is nouveau only currently and doesn't disturb
the current nouveau code paths, I'd like to try and get this work in
tree so other drivers can work from it.

If there are any major objections to this, I'm happy to pull it back
out again, but I'd like to get some acks/rb in the next couple of days
in order to land some of it.

Dave.


>
> forgot to add your email address to the patch series - sorry about that.
>
> This series (v5) contains the Documentation changes you requested.
>
> - Danilo
>
> On 6/20/23 02:42, Danilo Krummrich wrote:
> > This patch series provides a new UAPI for the Nouveau driver in order to
> > support Vulkan features, such as sparse bindings and sparse residency.
> >
> > Furthermore, with the DRM GPUVA manager it provides a new DRM core feature 
> > to
> > keep track of GPU virtual address (VA) mappings in a more generic way.
> >
> > The DRM GPUVA manager is intended to help drivers implement 
> > userspace-manageable
> > GPU VA spaces in reference to the Vulkan API. In order to achieve this goal 
> > it
> > serves the following purposes in this context.
> >
> >  1) Provide infrastructure to track GPU VA allocations and mappings,
> > making use of the maple_tree.
> >
> >  2) Generically connect GPU VA mappings to their backing buffers, in
> > particular DRM GEM objects.
> >
> >  3) Provide a common implementation to perform more complex mapping
> > operations on the GPU VA space. In particular splitting and merging
> > of GPU VA mappings, e.g. for intersecting mapping requests or 
> > partial
> > unmap requests.
> >
> > The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, itself
> > providing the following new interfaces.
> >
> >  1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT 
> > ioctl
> > for UMDs to specify the portion of VA space managed by the kernel 
> > and
> > userspace, respectively.
> >
> >  2) Allocate and free a VA space region as well as bind and unbind 
> > memory
> > to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> >
> >  3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> >
> > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use of the 
> > DRM
> > scheduler to queue jobs and support asynchronous processing with DRM 
> > syncobjs
> > as synchronization mechanism.
> >
> > By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> > DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> >
> > The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution 
> > context
> > for GEM buffers) by Christian König. Since the patch implementing drm_exec 
> > was
> > not yet merged into drm-next it is part of this series, as well as a small 
> > fix
> > for this patch, which was found while testing this series.
> >
> > This patch series is also available at [1].
> >
> > There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> > corresponding userspace parts for this series.
> >
> > The Vulkan CTS test suite passes the sparse binding and sparse residency 
> > test
> > cases for the new UAPI together with Dave's Mesa work.
> >
> > There are also some test cases in the igt-gpu-tools project [3] for the new 
> > UAPI
> > and hence the DRM GPU VA manager. However, most of them are testing the DRM 
> > GPU
> > VA manager's logic through Nouveau's new UAPI and should be considered just 
> > as
> > helpers for implementation.
> >
> > However, I absolutely intend to change those test cases to proper kunit test
> > cases for the DRM GPUVA manager, once and if we agree on it's usefulness and
> > design.
> >
> > [1] 
> > https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
> >  https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> > [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> > [3] 
> > https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> >
> > Changes in V2:
> > ==
> >Nouveau:
> >  - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in 
> > fence
> >signalling critical sections. Updates to the VA space are split up 
> > in three
> >separate stages, where only the 2. stage executes in a fence 
> > signalling
> >critical

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-12 Thread Dave Airlie
On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>
>
>
> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 10.02.23 20:01, Karol Herbst wrote:
> >> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> >> Leemhuis)  wrote:
> >>>
> >>> On 08.02.23 09:48, Chris Clayton wrote:
> 
>  I'm assuming that we are not going to see a fix for this regression 
>  before 6.2 is released.
> >>>
> >>> Yeah, looks like it. That's unfortunate, but happens. But there is still
> >>> time to fix it and there is one thing I wonder:
> >>>
> >>> Did any of the nouveau developers look at the netconsole captures Chris
> >>> posted more than a week ago to check if they somehow help to track down
> >>> the root of this problem?
> >>
> >> I did now and I can't spot anything. I think at this point it would
> >> make sense to dump the active tasks/threads via the sysrq keys to see if
> >> any are in a weird state preventing the machine from shutting down.
> >
> > Many thx for looking into it!
>
> Yes, thanks Karol.
>
> Attached is the output from dmesg when this block of code:
>
> /bin/mount /dev/sda7 /mnt/sda7
> /bin/mountpoint /proc || /bin/mount /proc
> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &  # stream dmesg to disk in the background
> /bin/echo t > /proc/sysrq-trigger            # sysrq-t: dump the state of all tasks
> /bin/sleep 1
> /bin/sync
> /bin/sleep 1
> kill $(pidof dmesg)
> /bin/umount /mnt/sda7
>
> is executed immediately before /sbin/reboot is called as the final step of 
> rebooting my system.
>
> I hope this is what you were looking for, but if not, please let me know
> what you need.

Another shot in the dark, but does nouveau.runpm=0 help at all?
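(That is, nouveau.runpm=0 on the kernel command line, or modprobe nouveau
runpm=0 when the driver is built as a module.)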

Dave.


Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

2023-02-01 Thread Dave Airlie
On Mon, 30 Jan 2023 at 23:02, Christian König  wrote:
>
> On 29.01.23 at 19:46, Danilo Krummrich wrote:
> > On 1/27/23 22:09, Danilo Krummrich wrote:
> >> On 1/27/23 16:17, Christian König wrote:
> >>> On 27.01.23 at 15:44, Danilo Krummrich wrote:
>  [SNIP]
> >>>
> >>> What you want is one component for tracking the VA allocations
> >>> (drm_mm based) and a different component/interface for tracking
> >>> the VA mappings (probably rb tree based).
> >>
> >> That's what the GPUVA manager is doing. There are gpuva_regions
> >> which correspond to VA allocations and gpuvas which represent the
> >> mappings. Both are tracked separately (currently both with a
> >> separate drm_mm, though). However, the GPUVA manager needs to
> >> take regions into account when dealing with mappings, to make sure
> >> the GPUVA manager doesn't propose that drivers merge over region
> >> boundaries. Speaking from a userspace PoV, the kernel wouldn't
> >> merge mappings from different VKBuffer objects even if they're
> >> virtually and physically contiguous.
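
In other words, as a rough data-structure sketch; the names here are
hypothetical, not the actual manager structs:

/* Hypothetical sketch: regions and mappings tracked separately. */
#include <linux/types.h>

struct drm_gem_object;

struct gpuva_region {	/* one userspace VA allocation, e.g. a VKBuffer */
	u64 addr, range;
	bool sparse;
};

struct gpuva {		/* one memory backed mapping inside a region */
	u64 addr, range;
	struct drm_gem_object *obj;	/* backing GEM object */
	u64 offset;			/* offset into that object */
};

/* Two adjacent gpuvas are only merge candidates when they fall into
 * the same gpuva_region, i.e. mappings never merge across region
 * boundaries, matching the VKBuffer behaviour described above. */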
> >
> > Those are two completely different things and shouldn't be handled
> > in a single component.
> 
>  They are different things, but they're related in such a way that,
>  for handling the mappings (in particular merging and sparse), the
>  GPUVA manager needs to know the VA allocation (or region) boundaries.
> 
>  I have the feeling there might be a misunderstanding. Userspace is
>  in charge of actually allocating a portion of VA space and managing it.
>  The GPUVA manager just needs to know about those VA space
>  allocations and hence keeps track of them.
> 
>  The GPUVA manager is not meant to be an allocator in the sense of
>  finding and providing a hole for a given request.
> 
>  Maybe the non-ideal choice of using drm_mm was implying something
>  else.
> >>>
> >>> Uff, well long story short that doesn't even remotely match the
> >>> requirements. This way the GPUVA manager won't be usable for a whole
> >>> bunch of use cases.
> >>>
> >>> What we have are mappings which say X needs to point to Y with this
> >>> and hw dependent flags.
> >>>
> >>> The whole idea of having ranges is not going to fly. Not with
> >>> AMD GPUs, and I strongly think not with Intel's Xe either.
> >>
> >> A range in the sense of the GPUVA manager simply represents a VA
> >> space allocation (which in the case of Nouveau is made in userspace).
> >> Userspace allocates the portion of VA space and lets the kernel know
> >> about it. The current implementation needs that for the named
> >> reasons. So, I think there is no reason why this would work with one
> >> GPU, but not with another. It's just part of the design choice of the
> >> manager.
> >>
> >> And I'm absolutely happy to discuss the details of the manager
> >> implementation though.
> >>
> >>>
> > We should probably talk about the design of the GPUVA manager once
> > more if it is meant to be applicable to all GPU drivers.
> 
>  That's what I'm trying to figure out with this RFC: how to make it
>  applicable to all GPU drivers. So I'm happy to discuss this. :-)
> >>>
> >>> Yeah, that was a really good idea :) The proposal here is really far
> >>> away from the actual requirements.
> >>>
> >>
> >> And those are the ones I'm looking for. Do you mind sharing the
> >> requirements for amdgpu in particular?
> >>
> >> For sparse residency the kernel also needs to know the region
> >> boundaries to make sure that it keeps sparse mappings around.
> >
> > What?
> 
>  When userspace creates a new VKBuffer with the
>  VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create
>  sparse mappings in order to ensure that using this buffer without
>  any memory backed mappings doesn't fault the GPU.
> 
>  Currently, the implementation does this in the following way:
> 
>  1. Userspace creates a new VKBuffer and hence allocates a portion
>  of the VA space for it. It calls into the kernel indicating the new
>  VA space region and the fact that the region is sparse.
> 
>  2. The kernel picks up the region and stores it in the GPUVA
>  manager, the driver creates the corresponding sparse mappings /
>  page table entries.
> 
>  3. Userspace might ask the driver to create a couple of memory
>  backed mappings for this particular VA region. The GPUVA manager
>  stores the mapping parameters, the driver creates the corresponding
>  page table entries.
> 
>  4. Userspace might ask to unmap all the memory backed mappings from
>  this particular VA region. The GPUVA manager removes the mapping
>  parameters, the driver cleans up the corresponding page table
>  entries. However, the driver also needs to re-create the sparse
>  mappings, since it's a sparse buffer; hence it needs to know the
>  region boundaries (a sketch of this step follows below).
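
A sketch of what step 4 boils down to; the names are hypothetical (the
structs mirror the sketch further up, plus a list of mappings):

/* Hypothetical sketch of step 4: tearing down memory backed mappings
 * in a sparse region must leave sparse entries behind, not faults. */
#include <linux/list.h>
#include <linux/types.h>

struct gpuva_region {
	u64 addr, range;
	bool sparse;
	struct list_head mappings;	/* list of struct gpuva */
};

struct gpuva {
	u64 addr, range;
	struct list_head entry;
};

extern void driver_unmap(struct gpuva *va);
extern void driver_map_sparse(u64 addr, u64 range);
extern void gpuva_remove(struct gpuva *va);

static void region_unmap_all(struct gpuva_region *reg)
{
	struct gpuva *va, *next;

	/* Remove all memory backed mappings and their page tables. */
	list_for_each_entry_safe(va, next, &reg->mappings, entry) {
		driver_unmap(va);	/* clean up page table entries */
		gpuva_remove(va);	/* drop the mapping bookkeeping */
	}

	/* Re-create sparse mappings across the whole region; this is
	 * why the region boundaries must be known at this point. */
	if (reg->sparse)
		driver_map_sparse(reg->addr, reg->range);
}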
