Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-20 Thread Danilo Krummrich

On 2/20/24 20:45, Timur Tabi wrote:

On Mon, 2024-02-19 at 10:32 +0100, Danilo Krummrich wrote:

Looks like I spoke too soon, I just hit the problem with the drm-next tree.


Did you apply the patch to drm-next?


Ugh, you're right.  I don't how I got confused, but I could have sworn that I 
saw your two patches in drm-next, but they are not there.

Sorry for the noise.


No worries, thanks for testing! :-)



Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-20 Thread Timur Tabi
On Mon, 2024-02-19 at 10:32 +0100, Danilo Krummrich wrote:

Looks like I spoke too soon, I just hit the problem with the drm-next tree.


Did you apply the patch to drm-next?

Ugh, you're right.  I don't how I got confused, but I could have sworn that I 
saw your two patches in drm-next, but they are not there.

Sorry for the noise.


Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-19 Thread Danilo Krummrich

On 2/15/24 00:48, Timur Tabi wrote:

On Fri, 2024-02-02 at 17:14 +, Timur Tabi wrote:

On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote:

nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
their corresponding *_fini() counterpart. This can lead to
nouveau_sched_fini() being called without struct nouveau_sched ever
being initialized in the first place.


Thanks, I've confirmed that these patches do fix the problem.


Looks like I spoke too soon, I just hit the problem with the drm-next tree.


Did you apply the patch to drm-next?



I'm able to repro the problem by having r535_gsp_init() return an error.
r535_gsp_rpc_poll return -EINVAL (I'm testing my own GSP-RM build) and
nouveau_sched_fini() is called even though nouveau_sched_init() was never
called.




Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-14 Thread Timur Tabi
On Fri, 2024-02-02 at 17:14 +, Timur Tabi wrote:
> On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote:
> > nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
> > their corresponding *_fini() counterpart. This can lead to
> > nouveau_sched_fini() being called without struct nouveau_sched ever
> > being initialized in the first place.
> 
> Thanks, I've confirmed that these patches do fix the problem.  

Looks like I spoke too soon, I just hit the problem with the drm-next tree.

I'm able to repro the problem by having r535_gsp_init() return an error. 
r535_gsp_rpc_poll return -EINVAL (I'm testing my own GSP-RM build) and
nouveau_sched_fini() is called even though nouveau_sched_init() was never
called.


Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-09 Thread Dave Airlie
On Fri, 2 Feb 2024 at 10:06, Danilo Krummrich  wrote:
>
> nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
> their corresponding *_fini() counterpart. This can lead to
> nouveau_sched_fini() being called without struct nouveau_sched ever
> being initialized in the first place.
>
> Instead of embedding struct nouveau_sched into struct nouveau_cli and
> struct nouveau_chan_abi16, allocate struct nouveau_sched separately,
> such that we can check for the corresponding pointer to be NULL in the
> particular *_fini() functions.
>
> It makes sense to allocate struct nouveau_sched separately anyway, since
> in a subsequent commit we can also avoid to allocate a struct
> nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the
> VM_BIND uAPI has been disabled due to the legacy uAPI being used.

Looks good,

for the series
Reviewed-by: Dave Airlie 

>
> Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity 
> relationship")
> Reported-by: Timur Tabi 
> Closes: 
> https://lore.kernel.org/nouveau/20240131213917.1545604-1-tt...@nvidia.com/
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_abi16.c | 10 ---
>  drivers/gpu/drm/nouveau/nouveau_abi16.h |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  7 +++--
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 38 +++--
>  drivers/gpu/drm/nouveau/nouveau_sched.h |  6 ++--
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
>  8 files changed, 53 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index a04156ca8390..ca4b5ab3e59e 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
> struct nouveau_abi16_ntfy *ntfy, *temp;
>
> /* Cancel all jobs from the entity's queue. */
> -   drm_sched_entity_fini(>sched.entity);
> +   if (chan->sched)
> +   drm_sched_entity_fini(>sched->entity);
>
> if (chan->chan)
> nouveau_channel_idle(chan->chan);
>
> -   nouveau_sched_fini(>sched);
> +   if (chan->sched)
> +   nouveau_sched_destroy(>sched);
>
> /* cleanup notifier state */
> list_for_each_entry_safe(ntfy, temp, >notifiers, head) {
> @@ -337,8 +339,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
> if (ret)
> goto done;
>
> -   ret = nouveau_sched_init(>sched, drm, drm->sched_wq,
> -chan->chan->dma.ib_max);
> +   ret = nouveau_sched_create(>sched, drm, drm->sched_wq,
> +  chan->chan->dma.ib_max);
> if (ret)
> goto done;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> index 1f5e243c0c75..11c8c4a80079 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> @@ -26,7 +26,7 @@ struct nouveau_abi16_chan {
> struct nouveau_bo *ntfy;
> struct nouveau_vma *ntfy_vma;
> struct nvkm_mm  heap;
> -   struct nouveau_sched sched;
> +   struct nouveau_sched *sched;
>  };
>
>  struct nouveau_abi16 {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 6f6c31a9937b..a947e1d5f309 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
> WARN_ON(!list_empty(>worker));
>
> usif_client_fini(cli);
> -   nouveau_sched_fini(>sched);
> +   if (cli->sched)
> +   nouveau_sched_destroy(>sched);
> if (uvmm)
> nouveau_uvmm_fini(uvmm);
> nouveau_vmm_fini(>svm);
> @@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char 
> *sname,
> cli->mem = [ret];
>
> /* Don't pass in the (shared) sched_wq in order to let
> -* nouveau_sched_init() create a dedicated one for VM_BIND jobs.
> +* nouveau_sched_create() create a dedicated one for VM_BIND jobs.
>  *
>  * This is required to ensure that for VM_BIND jobs free_job() work 
> and
>  * run_job() work can always run concurrently and hence, free_job() 
> work
> @@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char 
> *sname,
>  * locks which indirectly or directly are held for allocations
>  * elsewhere.
>  */
> -   ret = nouveau_sched_init(>sched, drm, NULL, 1);
> +   ret = nouveau_sched_create(>sched, drm, NULL, 1);
> if (ret)
> goto done;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
> b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index 

Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-02 Thread Danilo Krummrich

On 2/2/24 18:14, Timur Tabi wrote:

On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote:

nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
their corresponding *_fini() counterpart. This can lead to
nouveau_sched_fini() being called without struct nouveau_sched ever
being initialized in the first place.


Thanks, I've confirmed that these patches do fix the problem


Cool, gonna add your 'Tested-by' then.

- Danilo
 



Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-02 Thread Timur Tabi
On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote:
> nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
> their corresponding *_fini() counterpart. This can lead to
> nouveau_sched_fini() being called without struct nouveau_sched ever
> being initialized in the first place.

Thanks, I've confirmed that these patches do fix the problem.  



[PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized

2024-02-01 Thread Danilo Krummrich
nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
their corresponding *_fini() counterpart. This can lead to
nouveau_sched_fini() being called without struct nouveau_sched ever
being initialized in the first place.

Instead of embedding struct nouveau_sched into struct nouveau_cli and
struct nouveau_chan_abi16, allocate struct nouveau_sched separately,
such that we can check for the corresponding pointer to be NULL in the
particular *_fini() functions.

It makes sense to allocate struct nouveau_sched separately anyway, since
in a subsequent commit we can also avoid to allocate a struct
nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the
VM_BIND uAPI has been disabled due to the legacy uAPI being used.

Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity 
relationship")
Reported-by: Timur Tabi 
Closes: 
https://lore.kernel.org/nouveau/20240131213917.1545604-1-tt...@nvidia.com/
Signed-off-by: Danilo Krummrich 
---
 drivers/gpu/drm/nouveau/nouveau_abi16.c | 10 ---
 drivers/gpu/drm/nouveau/nouveau_abi16.h |  2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  7 +++--
 drivers/gpu/drm/nouveau/nouveau_drv.h   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c | 38 +++--
 drivers/gpu/drm/nouveau/nouveau_sched.h |  6 ++--
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
 8 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index a04156ca8390..ca4b5ab3e59e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
struct nouveau_abi16_ntfy *ntfy, *temp;
 
/* Cancel all jobs from the entity's queue. */
-   drm_sched_entity_fini(>sched.entity);
+   if (chan->sched)
+   drm_sched_entity_fini(>sched->entity);
 
if (chan->chan)
nouveau_channel_idle(chan->chan);
 
-   nouveau_sched_fini(>sched);
+   if (chan->sched)
+   nouveau_sched_destroy(>sched);
 
/* cleanup notifier state */
list_for_each_entry_safe(ntfy, temp, >notifiers, head) {
@@ -337,8 +339,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
if (ret)
goto done;
 
-   ret = nouveau_sched_init(>sched, drm, drm->sched_wq,
-chan->chan->dma.ib_max);
+   ret = nouveau_sched_create(>sched, drm, drm->sched_wq,
+  chan->chan->dma.ib_max);
if (ret)
goto done;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h 
b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 1f5e243c0c75..11c8c4a80079 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -26,7 +26,7 @@ struct nouveau_abi16_chan {
struct nouveau_bo *ntfy;
struct nouveau_vma *ntfy_vma;
struct nvkm_mm  heap;
-   struct nouveau_sched sched;
+   struct nouveau_sched *sched;
 };
 
 struct nouveau_abi16 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 6f6c31a9937b..a947e1d5f309 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
WARN_ON(!list_empty(>worker));
 
usif_client_fini(cli);
-   nouveau_sched_fini(>sched);
+   if (cli->sched)
+   nouveau_sched_destroy(>sched);
if (uvmm)
nouveau_uvmm_fini(uvmm);
nouveau_vmm_fini(>svm);
@@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
cli->mem = [ret];
 
/* Don't pass in the (shared) sched_wq in order to let
-* nouveau_sched_init() create a dedicated one for VM_BIND jobs.
+* nouveau_sched_create() create a dedicated one for VM_BIND jobs.
 *
 * This is required to ensure that for VM_BIND jobs free_job() work and
 * run_job() work can always run concurrently and hence, free_job() work
@@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
 * locks which indirectly or directly are held for allocations
 * elsewhere.
 */
-   ret = nouveau_sched_init(>sched, drm, NULL, 1);
+   ret = nouveau_sched_create(>sched, drm, NULL, 1);
if (ret)
goto done;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h 
b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 8a6d94c8b163..e239c6bf4afa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -98,7 +98,7 @@ struct nouveau_cli {
bool disabled;
} uvmm;
 
-   struct nouveau_sched sched;
+   struct nouveau_sched *sched;
 
const struct nvif_mclass *mem;
 
diff --git