Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit

2023-08-08 Thread Christian König




Am 07.08.23 um 20:54 schrieb Danilo Krummrich:

Hi Christian,

On 8/7/23 20:07, Christian König wrote:

Am 03.08.23 um 18:52 schrieb Danilo Krummrich:

The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
in order to emit fences within DMA fence signalling critical sections
(e.g. as typically done in the DRM GPU schedulers run_job() 
callback) we

need to separate fence allocation and fence emitting.


At least from the description that sounds like it might be illegal. 
Daniel can you take a look as well.


What exactly are you doing here?


I'm basically doing exactly the same as amdgpu_fence_emit() does in 
amdgpu_ib_schedule() called by amdgpu_job_run().


The difference - and this is what this patch is for - is that I 
separate the fence allocation from emitting the fence, such that the 
fence structure is allocated before the job is submitted to the GPU 
scheduler. amdgpu solves this with GFP_ATOMIC within 
amdgpu_fence_emit() to allocate the fence structure in this case.


Yeah, that use case is perfectly valid. Maybe update the commit message 
a bit to better describe that.


Something like "Separate fence allocation and emitting to avoid 
allocation within DMA fence signalling critical sections inside the DRM 
scheduler. This helps implementing the new UAPI".


Regards,
Christian.



- Danilo



Regards,
Christian.



Signed-off-by: Danilo Krummrich 
---
  drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 -
  drivers/gpu/drm/nouveau/nouveau_bo.c    | 52 
+++--

  drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
  drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-
  drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
  drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
  7 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
b/drivers/gpu/drm/nouveau/dispnv04/crtc.c

index a6f2e681bde9..a34924523133 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel 
*chan,

  PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x);
  PUSH_KICK(push);
-    ret = nouveau_fence_new(chan, false, pfence);
+    ret = nouveau_fence_new(pfence);
  if (ret)
  goto fail;
+    ret = nouveau_fence_emit(*pfence, chan);
+    if (ret)
+    goto fail_fence_unref;
+
  return 0;
+
+fail_fence_unref:
+    nouveau_fence_unref(pfence);
  fail:
  spin_lock_irqsave(>event_lock, flags);
  list_del(>head);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c

index 057bc995f19b..e9cbbf594e6f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object 
*bo, int evict,

  mutex_lock(>mutex);
  else
  mutex_lock_nested(>mutex, SINGLE_DEPTH_NESTING);
+
  ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, 
ctx->interruptible);

-    if (ret == 0) {
-    ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
-    if (ret == 0) {
-    ret = nouveau_fence_new(chan, false, );
-    if (ret == 0) {
-    /* TODO: figure out a better solution here
- *
- * wait on the fence here explicitly as going through
- * ttm_bo_move_accel_cleanup somehow doesn't seem 
to do it.

- *
- * Without this the operation can timeout and we'll 
fallback to a
- * software copy, which might take several minutes 
to finish.

- */
-    nouveau_fence_wait(fence, false, false);
-    ret = ttm_bo_move_accel_cleanup(bo,
-    >base,
-    evict, false,
-    new_reg);
-    nouveau_fence_unref();
-    }
-    }
+    if (ret)
+    goto out_unlock;
+
+    ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
+    if (ret)
+    goto out_unlock;
+
+    ret = nouveau_fence_new();
+    if (ret)
+    goto out_unlock;
+
+    ret = nouveau_fence_emit(fence, chan);
+    if (ret) {
+    nouveau_fence_unref();
+    goto out_unlock;
  }
+
+    /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+    nouveau_fence_wait(fence, false, false);
+    ret = ttm_bo_move_accel_cleanup(bo, >base, evict, false,
+    new_reg);
+    nouveau_fence_unref();
+
+out_unlock:
  mutex_unlock(>mutex);
  return ret;
  }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 

Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit

2023-08-07 Thread Danilo Krummrich

Hi Christian,

On 8/7/23 20:07, Christian König wrote:

Am 03.08.23 um 18:52 schrieb Danilo Krummrich:

The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
in order to emit fences within DMA fence signalling critical sections
(e.g. as typically done in the DRM GPU schedulers run_job() callback) we
need to separate fence allocation and fence emitting.


At least from the description that sounds like it might be illegal. 
Daniel can you take a look as well.


What exactly are you doing here?


I'm basically doing exactly the same as amdgpu_fence_emit() does in 
amdgpu_ib_schedule() called by amdgpu_job_run().


The difference - and this is what this patch is for - is that I separate 
the fence allocation from emitting the fence, such that the fence 
structure is allocated before the job is submitted to the GPU scheduler. 
amdgpu solves this with GFP_ATOMIC within amdgpu_fence_emit() to 
allocate the fence structure in this case.


- Danilo



Regards,
Christian.



Signed-off-by: Danilo Krummrich 
---
  drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 -
  drivers/gpu/drm/nouveau/nouveau_bo.c    | 52 +++--
  drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
  drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-
  drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
  drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
  7 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
b/drivers/gpu/drm/nouveau/dispnv04/crtc.c

index a6f2e681bde9..a34924523133 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
  PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x);
  PUSH_KICK(push);
-    ret = nouveau_fence_new(chan, false, pfence);
+    ret = nouveau_fence_new(pfence);
  if (ret)
  goto fail;
+    ret = nouveau_fence_emit(*pfence, chan);
+    if (ret)
+    goto fail_fence_unref;
+
  return 0;
+
+fail_fence_unref:
+    nouveau_fence_unref(pfence);
  fail:
  spin_lock_irqsave(>event_lock, flags);
  list_del(>head);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c

index 057bc995f19b..e9cbbf594e6f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object 
*bo, int evict,

  mutex_lock(>mutex);
  else
  mutex_lock_nested(>mutex, SINGLE_DEPTH_NESTING);
+
  ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, 
ctx->interruptible);

-    if (ret == 0) {
-    ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
-    if (ret == 0) {
-    ret = nouveau_fence_new(chan, false, );
-    if (ret == 0) {
-    /* TODO: figure out a better solution here
- *
- * wait on the fence here explicitly as going through
- * ttm_bo_move_accel_cleanup somehow doesn't seem to 
do it.

- *
- * Without this the operation can timeout and we'll 
fallback to a
- * software copy, which might take several minutes to 
finish.

- */
-    nouveau_fence_wait(fence, false, false);
-    ret = ttm_bo_move_accel_cleanup(bo,
-    >base,
-    evict, false,
-    new_reg);
-    nouveau_fence_unref();
-    }
-    }
+    if (ret)
+    goto out_unlock;
+
+    ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
+    if (ret)
+    goto out_unlock;
+
+    ret = nouveau_fence_new();
+    if (ret)
+    goto out_unlock;
+
+    ret = nouveau_fence_emit(fence, chan);
+    if (ret) {
+    nouveau_fence_unref();
+    goto out_unlock;
  }
+
+    /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+    nouveau_fence_wait(fence, false, false);
+    ret = ttm_bo_move_accel_cleanup(bo, >base, evict, false,
+    new_reg);
+    nouveau_fence_unref();
+
+out_unlock:
  mutex_unlock(>mutex);
  return ret;
  }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c

index 6d639314250a..f69be4c8f9f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -62,9 +62,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
  struct nouveau_fence *fence = NULL;
  int ret;
-    ret = nouveau_fence_new(chan, false, );
+    ret = 

Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit

2023-08-07 Thread Christian König

Am 03.08.23 um 18:52 schrieb Danilo Krummrich:

The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
in order to emit fences within DMA fence signalling critical sections
(e.g. as typically done in the DRM GPU schedulers run_job() callback) we
need to separate fence allocation and fence emitting.


At least from the description that sounds like it might be illegal. 
Daniel can you take a look as well.


What exactly are you doing here?

Regards,
Christian.



Signed-off-by: Danilo Krummrich 
---
  drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 -
  drivers/gpu/drm/nouveau/nouveau_bo.c| 52 +++--
  drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
  drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
  drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-
  drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
  drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
  7 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index a6f2e681bde9..a34924523133 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x);
PUSH_KICK(push);
  
-	ret = nouveau_fence_new(chan, false, pfence);

+   ret = nouveau_fence_new(pfence);
if (ret)
goto fail;
  
+	ret = nouveau_fence_emit(*pfence, chan);

+   if (ret)
+   goto fail_fence_unref;
+
return 0;
+
+fail_fence_unref:
+   nouveau_fence_unref(pfence);
  fail:
spin_lock_irqsave(>event_lock, flags);
list_del(>head);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 057bc995f19b..e9cbbf594e6f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
evict,
mutex_lock(>mutex);
else
mutex_lock_nested(>mutex, SINGLE_DEPTH_NESTING);
+
ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, 
ctx->interruptible);
-   if (ret == 0) {
-   ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
-   if (ret == 0) {
-   ret = nouveau_fence_new(chan, false, );
-   if (ret == 0) {
-   /* TODO: figure out a better solution here
-*
-* wait on the fence here explicitly as going 
through
-* ttm_bo_move_accel_cleanup somehow doesn't 
seem to do it.
-*
-* Without this the operation can timeout and 
we'll fallback to a
-* software copy, which might take several 
minutes to finish.
-*/
-   nouveau_fence_wait(fence, false, false);
-   ret = ttm_bo_move_accel_cleanup(bo,
-   >base,
-   evict, false,
-   new_reg);
-   nouveau_fence_unref();
-   }
-   }
+   if (ret)
+   goto out_unlock;
+
+   ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
+   if (ret)
+   goto out_unlock;
+
+   ret = nouveau_fence_new();
+   if (ret)
+   goto out_unlock;
+
+   ret = nouveau_fence_emit(fence, chan);
+   if (ret) {
+   nouveau_fence_unref();
+   goto out_unlock;
}
+
+   /* TODO: figure out a better solution here
+*
+* wait on the fence here explicitly as going through
+* ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+*
+* Without this the operation can timeout and we'll fallback to a
+* software copy, which might take several minutes to finish.
+*/
+   nouveau_fence_wait(fence, false, false);
+   ret = ttm_bo_move_accel_cleanup(bo, >base, evict, false,
+   new_reg);
+   nouveau_fence_unref();
+
+out_unlock:
mutex_unlock(>mutex);
return ret;
  }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 6d639314250a..f69be4c8f9f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -62,9 +62,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
struct nouveau_fence *fence = NULL;
int ret;
  
-		ret = nouveau_fence_new(chan, false, );

+   ret = nouveau_fence_new();

[PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit

2023-08-03 Thread Danilo Krummrich
The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
in order to emit fences within DMA fence signalling critical sections
(e.g. as typically done in the DRM GPU schedulers run_job() callback) we
need to separate fence allocation and fence emitting.

Signed-off-by: Danilo Krummrich 
---
 drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 -
 drivers/gpu/drm/nouveau/nouveau_bo.c| 52 +++--
 drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
 drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
 drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-
 drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
 7 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index a6f2e681bde9..a34924523133 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x);
PUSH_KICK(push);
 
-   ret = nouveau_fence_new(chan, false, pfence);
+   ret = nouveau_fence_new(pfence);
if (ret)
goto fail;
 
+   ret = nouveau_fence_emit(*pfence, chan);
+   if (ret)
+   goto fail_fence_unref;
+
return 0;
+
+fail_fence_unref:
+   nouveau_fence_unref(pfence);
 fail:
spin_lock_irqsave(>event_lock, flags);
list_del(>head);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 057bc995f19b..e9cbbf594e6f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
evict,
mutex_lock(>mutex);
else
mutex_lock_nested(>mutex, SINGLE_DEPTH_NESTING);
+
ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, 
ctx->interruptible);
-   if (ret == 0) {
-   ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
-   if (ret == 0) {
-   ret = nouveau_fence_new(chan, false, );
-   if (ret == 0) {
-   /* TODO: figure out a better solution here
-*
-* wait on the fence here explicitly as going 
through
-* ttm_bo_move_accel_cleanup somehow doesn't 
seem to do it.
-*
-* Without this the operation can timeout and 
we'll fallback to a
-* software copy, which might take several 
minutes to finish.
-*/
-   nouveau_fence_wait(fence, false, false);
-   ret = ttm_bo_move_accel_cleanup(bo,
-   >base,
-   evict, false,
-   new_reg);
-   nouveau_fence_unref();
-   }
-   }
+   if (ret)
+   goto out_unlock;
+
+   ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
+   if (ret)
+   goto out_unlock;
+
+   ret = nouveau_fence_new();
+   if (ret)
+   goto out_unlock;
+
+   ret = nouveau_fence_emit(fence, chan);
+   if (ret) {
+   nouveau_fence_unref();
+   goto out_unlock;
}
+
+   /* TODO: figure out a better solution here
+*
+* wait on the fence here explicitly as going through
+* ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+*
+* Without this the operation can timeout and we'll fallback to a
+* software copy, which might take several minutes to finish.
+*/
+   nouveau_fence_wait(fence, false, false);
+   ret = ttm_bo_move_accel_cleanup(bo, >base, evict, false,
+   new_reg);
+   nouveau_fence_unref();
+
+out_unlock:
mutex_unlock(>mutex);
return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 6d639314250a..f69be4c8f9f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -62,9 +62,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
struct nouveau_fence *fence = NULL;
int ret;
 
-   ret = nouveau_fence_new(chan, false, );
+   ret = nouveau_fence_new();
if (!ret) {
-   ret = nouveau_fence_wait(fence, false, false);
+   ret = nouveau_fence_emit(fence, chan);
+   if (!ret)
+