Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On 2/20/24 20:45, Timur Tabi wrote: On Mon, 2024-02-19 at 10:32 +0100, Danilo Krummrich wrote: Looks like I spoke too soon, I just hit the problem with the drm-next tree. Did you apply the patch to drm-next? Ugh, you're right. I don't how I got confused, but I could have sworn that I saw your two patches in drm-next, but they are not there. Sorry for the noise. No worries, thanks for testing! :-)
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On Mon, 2024-02-19 at 10:32 +0100, Danilo Krummrich wrote: Looks like I spoke too soon, I just hit the problem with the drm-next tree. Did you apply the patch to drm-next? Ugh, you're right. I don't how I got confused, but I could have sworn that I saw your two patches in drm-next, but they are not there. Sorry for the noise.
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On 2/15/24 00:48, Timur Tabi wrote: On Fri, 2024-02-02 at 17:14 +, Timur Tabi wrote: On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote: nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call their corresponding *_fini() counterpart. This can lead to nouveau_sched_fini() being called without struct nouveau_sched ever being initialized in the first place. Thanks, I've confirmed that these patches do fix the problem. Looks like I spoke too soon, I just hit the problem with the drm-next tree. Did you apply the patch to drm-next? I'm able to repro the problem by having r535_gsp_init() return an error. r535_gsp_rpc_poll return -EINVAL (I'm testing my own GSP-RM build) and nouveau_sched_fini() is called even though nouveau_sched_init() was never called.
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On Fri, 2024-02-02 at 17:14 +, Timur Tabi wrote: > On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote: > > nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call > > their corresponding *_fini() counterpart. This can lead to > > nouveau_sched_fini() being called without struct nouveau_sched ever > > being initialized in the first place. > > Thanks, I've confirmed that these patches do fix the problem. Looks like I spoke too soon, I just hit the problem with the drm-next tree. I'm able to repro the problem by having r535_gsp_init() return an error. r535_gsp_rpc_poll return -EINVAL (I'm testing my own GSP-RM build) and nouveau_sched_fini() is called even though nouveau_sched_init() was never called.
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On Fri, 2 Feb 2024 at 10:06, Danilo Krummrich wrote: > > nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call > their corresponding *_fini() counterpart. This can lead to > nouveau_sched_fini() being called without struct nouveau_sched ever > being initialized in the first place. > > Instead of embedding struct nouveau_sched into struct nouveau_cli and > struct nouveau_chan_abi16, allocate struct nouveau_sched separately, > such that we can check for the corresponding pointer to be NULL in the > particular *_fini() functions. > > It makes sense to allocate struct nouveau_sched separately anyway, since > in a subsequent commit we can also avoid to allocate a struct > nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the > VM_BIND uAPI has been disabled due to the legacy uAPI being used. Looks good, for the series Reviewed-by: Dave Airlie > > Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity > relationship") > Reported-by: Timur Tabi > Closes: > https://lore.kernel.org/nouveau/20240131213917.1545604-1-tt...@nvidia.com/ > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/nouveau/nouveau_abi16.c | 10 --- > drivers/gpu/drm/nouveau/nouveau_abi16.h | 2 +- > drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++-- > drivers/gpu/drm/nouveau/nouveau_drv.h | 2 +- > drivers/gpu/drm/nouveau/nouveau_exec.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_sched.c | 38 +++-- > drivers/gpu/drm/nouveau/nouveau_sched.h | 6 ++-- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +- > 8 files changed, 53 insertions(+), 16 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c > b/drivers/gpu/drm/nouveau/nouveau_abi16.c > index a04156ca8390..ca4b5ab3e59e 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c > @@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16, > struct nouveau_abi16_ntfy *ntfy, *temp; > > /* Cancel all jobs from the entity's queue. */ > - drm_sched_entity_fini(>sched.entity); > + if (chan->sched) > + drm_sched_entity_fini(>sched->entity); > > if (chan->chan) > nouveau_channel_idle(chan->chan); > > - nouveau_sched_fini(>sched); > + if (chan->sched) > + nouveau_sched_destroy(>sched); > > /* cleanup notifier state */ > list_for_each_entry_safe(ntfy, temp, >notifiers, head) { > @@ -337,8 +339,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS) > if (ret) > goto done; > > - ret = nouveau_sched_init(>sched, drm, drm->sched_wq, > -chan->chan->dma.ib_max); > + ret = nouveau_sched_create(>sched, drm, drm->sched_wq, > + chan->chan->dma.ib_max); > if (ret) > goto done; > > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h > b/drivers/gpu/drm/nouveau/nouveau_abi16.h > index 1f5e243c0c75..11c8c4a80079 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h > @@ -26,7 +26,7 @@ struct nouveau_abi16_chan { > struct nouveau_bo *ntfy; > struct nouveau_vma *ntfy_vma; > struct nvkm_mm heap; > - struct nouveau_sched sched; > + struct nouveau_sched *sched; > }; > > struct nouveau_abi16 { > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c > b/drivers/gpu/drm/nouveau/nouveau_drm.c > index 6f6c31a9937b..a947e1d5f309 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > @@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli) > WARN_ON(!list_empty(>worker)); > > usif_client_fini(cli); > - nouveau_sched_fini(>sched); > + if (cli->sched) > + nouveau_sched_destroy(>sched); > if (uvmm) > nouveau_uvmm_fini(uvmm); > nouveau_vmm_fini(>svm); > @@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char > *sname, > cli->mem = [ret]; > > /* Don't pass in the (shared) sched_wq in order to let > -* nouveau_sched_init() create a dedicated one for VM_BIND jobs. > +* nouveau_sched_create() create a dedicated one for VM_BIND jobs. > * > * This is required to ensure that for VM_BIND jobs free_job() work > and > * run_job() work can always run concurrently and hence, free_job() > work > @@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char > *sname, > * locks which indirectly or directly are held for allocations > * elsewhere. > */ > - ret = nouveau_sched_init(>sched, drm, NULL, 1); > + ret = nouveau_sched_create(>sched, drm, NULL, 1); > if (ret) > goto done; > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h > b/drivers/gpu/drm/nouveau/nouveau_drv.h > index
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On 2/2/24 18:14, Timur Tabi wrote: On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote: nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call their corresponding *_fini() counterpart. This can lead to nouveau_sched_fini() being called without struct nouveau_sched ever being initialized in the first place. Thanks, I've confirmed that these patches do fix the problem Cool, gonna add your 'Tested-by' then. - Danilo
Re: [PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
On Fri, 2024-02-02 at 01:05 +0100, Danilo Krummrich wrote: > nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call > their corresponding *_fini() counterpart. This can lead to > nouveau_sched_fini() being called without struct nouveau_sched ever > being initialized in the first place. Thanks, I've confirmed that these patches do fix the problem.
[PATCH 1/2] drm/nouveau: don't fini scheduler if not initialized
nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call their corresponding *_fini() counterpart. This can lead to nouveau_sched_fini() being called without struct nouveau_sched ever being initialized in the first place. Instead of embedding struct nouveau_sched into struct nouveau_cli and struct nouveau_chan_abi16, allocate struct nouveau_sched separately, such that we can check for the corresponding pointer to be NULL in the particular *_fini() functions. It makes sense to allocate struct nouveau_sched separately anyway, since in a subsequent commit we can also avoid to allocate a struct nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the VM_BIND uAPI has been disabled due to the legacy uAPI being used. Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity relationship") Reported-by: Timur Tabi Closes: https://lore.kernel.org/nouveau/20240131213917.1545604-1-tt...@nvidia.com/ Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/nouveau/nouveau_abi16.c | 10 --- drivers/gpu/drm/nouveau/nouveau_abi16.h | 2 +- drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++-- drivers/gpu/drm/nouveau/nouveau_drv.h | 2 +- drivers/gpu/drm/nouveau/nouveau_exec.c | 2 +- drivers/gpu/drm/nouveau/nouveau_sched.c | 38 +++-- drivers/gpu/drm/nouveau/nouveau_sched.h | 6 ++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +- 8 files changed, 53 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c index a04156ca8390..ca4b5ab3e59e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c @@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16, struct nouveau_abi16_ntfy *ntfy, *temp; /* Cancel all jobs from the entity's queue. */ - drm_sched_entity_fini(>sched.entity); + if (chan->sched) + drm_sched_entity_fini(>sched->entity); if (chan->chan) nouveau_channel_idle(chan->chan); - nouveau_sched_fini(>sched); + if (chan->sched) + nouveau_sched_destroy(>sched); /* cleanup notifier state */ list_for_each_entry_safe(ntfy, temp, >notifiers, head) { @@ -337,8 +339,8 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS) if (ret) goto done; - ret = nouveau_sched_init(>sched, drm, drm->sched_wq, -chan->chan->dma.ib_max); + ret = nouveau_sched_create(>sched, drm, drm->sched_wq, + chan->chan->dma.ib_max); if (ret) goto done; diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h index 1f5e243c0c75..11c8c4a80079 100644 --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h @@ -26,7 +26,7 @@ struct nouveau_abi16_chan { struct nouveau_bo *ntfy; struct nouveau_vma *ntfy_vma; struct nvkm_mm heap; - struct nouveau_sched sched; + struct nouveau_sched *sched; }; struct nouveau_abi16 { diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index 6f6c31a9937b..a947e1d5f309 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli) WARN_ON(!list_empty(>worker)); usif_client_fini(cli); - nouveau_sched_fini(>sched); + if (cli->sched) + nouveau_sched_destroy(>sched); if (uvmm) nouveau_uvmm_fini(uvmm); nouveau_vmm_fini(>svm); @@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname, cli->mem = [ret]; /* Don't pass in the (shared) sched_wq in order to let -* nouveau_sched_init() create a dedicated one for VM_BIND jobs. +* nouveau_sched_create() create a dedicated one for VM_BIND jobs. * * This is required to ensure that for VM_BIND jobs free_job() work and * run_job() work can always run concurrently and hence, free_job() work @@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname, * locks which indirectly or directly are held for allocations * elsewhere. */ - ret = nouveau_sched_init(>sched, drm, NULL, 1); + ret = nouveau_sched_create(>sched, drm, NULL, 1); if (ret) goto done; diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 8a6d94c8b163..e239c6bf4afa 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -98,7 +98,7 @@ struct nouveau_cli { bool disabled; } uvmm; - struct nouveau_sched sched; + struct nouveau_sched *sched; const struct nvif_mclass *mem; diff --git