Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Fri, Nov 22, 2024 at 05:24:03PM +0100, Philipp Stanner wrote: > On Thu, 2024-11-21 at 10:05 -0800, Matthew Brost wrote: > > On Tue, Oct 29, 2024 at 08:22:22AM +0100, Philipp Stanner wrote: > > > Christian, Sima? > > > > > > Matthew? (+CC) > > > > > > > Sorry catching up here. Internally in Xe we ref count the scheduler > > as > > the scheduler is embedded into our user queue structure > > (xe_exec_queue). > > Jobs ref the queue once they are armed. Upon queue killing we set TDR > > to > > zero which will flush out all jobs / signal all the job's fences. > > Once > > the ref count of queue is zero we do hardware / firmware teardown of > > the > > queue and then finally call drm_sched_fini before freeing the queues > > memory (and thus the scheduler). > > > > This flow seems to work pretty well but largely bypasses the > > scheduler > > functions to implement this. Not sure if this flow could be > > normalized > > at all but I would expect usage models of a 1 to 1 relationship > > between > > queue and scheduler to something similar to Xe's flow. > > One of the solution I proposed basically does the same: jobs added to > gpu_scheduler.pending_list refcount the scheduler. > > If we'd go for that solution I assume the separate refcounting could be > removed from Xe. For exmaple. > That should work. We likely need a vfunc for when the refcount goes to zero which could either free the scheduler or initiate hw / hw teardown process. In Xe the queue and scheduler actually lives for a while after the refcount is zero due to several ping / pongs with the fw. Also long as we can still support that, moving the refcount from our queues to the scheduler seems reasonable. Ofc we need buy in from the other scheduler use cases too. > > > Maybe we could > > write this done as design guideline so other drivers don't have to > > figure this out - this took me a bit to land on this. > > I already improved the documentation a while ago [1], but didn't detail > how drivers are supposed to solve this. The reason is that I don't want > to encourage more separate solutions while we're working on solving it > centrally. Let me read this and add comments there if needed. Matt > > > With that, in general I agree with Christian's patch here complaining > > about pending jobs if drm_sched_fini is called. > > Yes, firing a WARN_ON there is also fine from my POV. > > P. > > > [1] > https://lore.kernel.org/dri-devel/20241105143137.71893-2-pstan...@redhat.com/ > > > > > > > Opinions on the below? > > > > > > tl;dr: > > > I still think it's a good thing to detectably block in > > > drm_sched_fini(), or at the very least drm_sched_flush(), because > > > then > > > > See above. I don't think drm_sched_fini should block rather just > > complain this patch does which says 'go fix your driver'. > > > > Matt > > > > > you'll find out that the driver is broken and can repair it. > > > > > > P. 
> > > > > > > > > On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: > > > > On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > > > > > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König > > > > > wrote: > > > > > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > > > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > > > > > wrote: > > > > > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König > > > > > > > > > > wrote: > > > > > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian > > > > > > > > > > > > König > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Tearing down the scheduler with jobs still on > > > > > > > > > > > > > the > > > > > > > > > > > > > pending > > > > > > > > > > > > > list > > > > > > > > > > > > > can > > > > > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > > > > > drivers try > > > > > > > > > > > > > to > > > > > > > > > > > > > destroy a scheduler which still has work pushed > > > > > > > > > > > > > to > > > > > > > > > > > > > the HW. > > > > > > > > > > > > Did you have time yet to look into my proposed > > > > > > > > > > > > waitque- > > > > > > > > > > > > solution? > > > > > > > > > > > I don't remember seeing anything. What have I > > > > > > > > > > > missed? > > > > > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > > > > > > > > > Interesting approach, I'm just not sure if we can or > > > > > > > > > should > > > > > > > > > wait in > > > > > > > > > drm_sched_fini(). > > > > > > > We do agree that jobs still pending when drm_sched_fini() > > > > > > > starts > > > > > > > is > > > > > > > always a bug, right? > > > > > > > > > > > > Correct, the question is how
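To make the refcount idea discussed at the top of this message concrete: a minimal sketch of jobs on the pending list holding a reference on the scheduler, with a driver-provided callback run once the count drops to zero (Matthew's "vfunc at refcount zero"). All of the example_* names, the kref field and the release hook are hypothetical; nothing like this exists in struct drm_gpu_scheduler today.

/*
 * Sketch only: assumes a new refcount around the scheduler and a driver
 * "release" vfunc invoked when the last pending job drops its reference.
 */
#include <linux/kref.h>
#include <drm/gpu_scheduler.h>

struct example_sched {
        struct drm_gpu_scheduler base;
        struct kref refcount;                           /* one ref per pending job + 1 initial */
        void (*release)(struct example_sched *sched);   /* called at refcount == 0 */
};

static void example_sched_free(struct kref *kref)
{
        struct example_sched *s = container_of(kref, struct example_sched, refcount);

        /* driver decides: free immediately or start hw/fw teardown first */
        s->release(s);
}

/* take a reference when a job is armed / put on the pending list */
static void example_sched_job_armed(struct example_sched *s)
{
        kref_get(&s->refcount);
}

/* drop it again from the free_job path, after the hw fence signaled */
static void example_sched_job_freed(struct example_sched *s)
{
        kref_put(&s->refcount, example_sched_free);
}

With something along these lines, tearing down would merely drop the initial reference; the release hook is where a driver like Xe could start its hardware/firmware teardown before the memory embedding the scheduler is finally freed.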
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Thu, 2024-11-21 at 10:05 -0800, Matthew Brost wrote: > On Tue, Oct 29, 2024 at 08:22:22AM +0100, Philipp Stanner wrote: > > Christian, Sima? > > > > Matthew? (+CC) > > > > Sorry catching up here. Internally in Xe we ref count the scheduler > as > the scheduler is embedded into our user queue structure > (xe_exec_queue). > Jobs ref the queue once they are armed. Upon queue killing we set TDR > to > zero which will flush out all jobs / signal all the job's fences. > Once > the ref count of queue is zero we do hardware / firmware teardown of > the > queue and then finally call drm_sched_fini before freeing the queues > memory (and thus the scheduler). > > This flow seems to work pretty well but largely bypasses the > scheduler > functions to implement this. Not sure if this flow could be > normalized > at all but I would expect usage models of a 1 to 1 relationship > between > queue and scheduler to something similar to Xe's flow. One of the solution I proposed basically does the same: jobs added to gpu_scheduler.pending_list refcount the scheduler. If we'd go for that solution I assume the separate refcounting could be removed from Xe. For exmaple. > Maybe we could > write this done as design guideline so other drivers don't have to > figure this out - this took me a bit to land on this. I already improved the documentation a while ago [1], but didn't detail how drivers are supposed to solve this. The reason is that I don't want to encourage more separate solutions while we're working on solving it centrally. > With that, in general I agree with Christian's patch here complaining > about pending jobs if drm_sched_fini is called. Yes, firing a WARN_ON there is also fine from my POV. P. [1] https://lore.kernel.org/dri-devel/20241105143137.71893-2-pstan...@redhat.com/ > > > Opinions on the below? > > > > tl;dr: > > I still think it's a good thing to detectably block in > > drm_sched_fini(), or at the very least drm_sched_flush(), because > > then > > See above. I don't think drm_sched_fini should block rather just > complain this patch does which says 'go fix your driver'. > > Matt > > > you'll find out that the driver is broken and can repair it. > > > > P. > > > > > > On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: > > > On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > > > > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König > > > > wrote: > > > > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > > > > wrote: > > > > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König > > > > > > > > > wrote: > > > > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian > > > > > > > > > > > König > > > > > > > > > > > wrote: > > > > > > > > > > > > Tearing down the scheduler with jobs still on > > > > > > > > > > > > the > > > > > > > > > > > > pending > > > > > > > > > > > > list > > > > > > > > > > > > can > > > > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > > > > drivers try > > > > > > > > > > > > to > > > > > > > > > > > > destroy a scheduler which still has work pushed > > > > > > > > > > > > to > > > > > > > > > > > > the HW. 
> > > > > > > > > > > Did you have time yet to look into my proposed > > > > > > > > > > > waitque- > > > > > > > > > > > solution? > > > > > > > > > > I don't remember seeing anything. What have I > > > > > > > > > > missed? > > > > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > > > > > > > Interesting approach, I'm just not sure if we can or > > > > > > > > should > > > > > > > > wait in > > > > > > > > drm_sched_fini(). > > > > > > We do agree that jobs still pending when drm_sched_fini() > > > > > > starts > > > > > > is > > > > > > always a bug, right? > > > > > > > > > > Correct, the question is how to avoid that. > > > > > > > > > > > If so, what are the disadvantages of waiting in > > > > > > drm_sched_fini()? > > > > > > We > > > > > > could block buggy drivers as I see it. Which wouldn't be > > > > > > good, > > > > > > but > > > > > > could then be fixed on drivers' site. > > > > > > > > > > Sima explained that pretty well: Don't block in fops->close, > > > > > do > > > > > that in > > > > > fops->flush instead. > > > > > > > > I agree that we shouldn't block in close(), but this > > > > effectively > > > > means that we > > > > need to reference count the scheduler, right? > > > > > > > > Otherwise, if we allow to just skip / interrupt the teardown, > > > > we > > > > can > > > > still leak > > > > memory. > > > > > > Having thought about it, I agree with Danilo. Having something >
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Thu, Nov 21, 2024 at 10:06:54AM +0100, Philipp Stanner wrote: > +CC Thomas Hellström > > > On Mon, 2024-11-18 at 12:48 +0100, Christian König wrote: > > Am 15.11.24 um 17:07 schrieb Philipp Stanner: > > > > > > > > On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote: > > > > > > > > > > > Am 15.11.24 um 14:55 schrieb Philipp Stanner: > > > > > > > > > > > > > > > > > > [SNIP] > > > > > > > > > > > > > > > > > > > > > > > > > > But you wouldn't want to aim for getting rid of struct > > > > > drm_sched_job > > > > > completely, or would you? > > > > > > > > > > > > > > > > > > > > > > No, the drm_sched_job structure was added to aid the single > > > > producer > > > > single consumer queue and so made it easier to put the jobs into > > > > a > > > > container. > > > > > > > > > > > > In my mind the full solution for running jobs looks like this: > > > > > > > > 1. Driver enqueues with drm_sched_job_arm() > > > > 2. job ends up in pending_list > > > > 3. Sooner or later scheduler calls run_job() > > > > 4. In return scheduler gets a dma_fence representing the > > > > resulting > > > > hardware operation. > > > > 5, This fence is put into a container to keep around what the hw > > > > is > > > > actually executing. > > > > 6. At some point the fence gets signaled informing the scheduler > > > > that the next job can be pushed if enough credits are now > > > > available. > > > > > > > > There is no free_job callback any more because after run_job is > > > > called the scheduler is done with the job and only the dma_fence > > > > which represents the actually HW operation is the object the > > > > scheduler now works with. > > > > > > > > We would still need something like a kill_job callback which is > > > > used > > > > when an entity is released without flushing all jobs (see > > > > drm_sched_entity_kill()), but that is then only used for a > > > > specific > > > > corner case. > > > > > > > > > > Can't we just limit the scheduler's responsibility to telling the > > > driver that it has to flush, and if it doesn't it's a bug? > > > > > > > We still need to remove the pending jobs from the scheduler if > > flushing times out. > > > > Without timing out flushing and/or aborting when the process was > > killed we run into the same problem again that we don't want ti block > > on _fini(). > > > > > > > > > > > > > Blocking for the cleanup in drm_sched_fini() then becomes a > > > > trivial > > > > dma_fence_wait() on the remaining unsignaled HW fences and > > > > eventually > > > > on the latest killed fence. > > > > > > > > > > But that results in exactly the same situation as my waitque > > > solution, > > > doesn't it? > > > > > > > The key part is that dma_fence's have a documented requirement to > > never block infinitely. > > > > > > > > > > Blocking infinitely in drm_sched_fini(): > > > > > > If the driver doesn't guarantee that all fences will get signaled, > > > then > > > wait_event() in the waitque solution will block forever. The same > > > with > > > dma_fence_wait(). > > > > > > Or are you aiming at an interruptible dma_fence_wait(), thereby not > > > blocking SIGKILL? > > > > > > > That is basically what drm_sched_entity_flush() is already doing. > > > > > > > > > > That then would still result in leaks. So basically the same > > > situation > > > as with an interruptible drm_sched_flush() function. > > > > > > (Although I agree that would probably be more elegant) > > > > > > > If the wait_event really would just waiting for dma_fences then yes. 
> > > > The problem with the waitqueue approach is that we need to wait for > > the free_job callback and that callback is normally called from a > > work item without holding any additional locks. > > > > When drm_sched_fini starts to block for that we create a rat-tail of > > new dependencies since that one is most likely called from a file > > descriptor destruction. > > > > > > > > > > > > > > > > > > > > > > > > > The problem with this approach is that the idea of re-submitting > > > > jobs in a GPU reset by the scheduler is then basically dead. But > > > > to > > > > be honest that never worked correctly. > > > > > > > > See the deprecated warning I already put on > > > > drm_sched_resubmit_jobs(). The job lifetime is not really well > > > > defined because of this, see the free_guilty hack as well. > > > > > > > > > > drm_sched_resubmit_jobs() is being used by 5 drivers currently. So > > > if > > > we go for this approach we have to port them, first. That doesn't > > > sound > > > trivial to me > > > > > > > Yeah, completely agree. I think the scheduler should provide > > something like drm_sched_for_each_pending_fence() which helps to > > iterate over all the pending HW fences. > > > > The individual drivers could then device by themself what to do, > > e.g. upcast the HW fence into the job and
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Tue, Oct 29, 2024 at 08:22:22AM +0100, Philipp Stanner wrote: > Christian, Sima? > > Matthew? (+CC) > Sorry catching up here. Internally in Xe we ref count the scheduler as the scheduler is embedded into our user queue structure (xe_exec_queue). Jobs ref the queue once they are armed. Upon queue killing we set TDR to zero which will flush out all jobs / signal all the job's fences. Once the ref count of queue is zero we do hardware / firmware teardown of the queue and then finally call drm_sched_fini before freeing the queues memory (and thus the scheduler). This flow seems to work pretty well but largely bypasses the scheduler functions to implement this. Not sure if this flow could be normalized at all but I would expect usage models of a 1 to 1 relationship between queue and scheduler to something similar to Xe's flow. Maybe we could write this done as design guideline so other drivers don't have to figure this out - this took me a bit to land on this. With that, in general I agree with Christian's patch here complaining about pending jobs if drm_sched_fini is called. > Opinions on the below? > > tl;dr: > I still think it's a good thing to detectably block in > drm_sched_fini(), or at the very least drm_sched_flush(), because then See above. I don't think drm_sched_fini should block rather just complain this patch does which says 'go fix your driver'. Matt > you'll find out that the driver is broken and can repair it. > > P. > > > On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: > > On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > > > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König wrote: > > > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > > > wrote: > > > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König > > > > > > > > > > wrote: > > > > > > > > > > > Tearing down the scheduler with jobs still on the > > > > > > > > > > > pending > > > > > > > > > > > list > > > > > > > > > > > can > > > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > > > drivers try > > > > > > > > > > > to > > > > > > > > > > > destroy a scheduler which still has work pushed to > > > > > > > > > > > the HW. > > > > > > > > > > Did you have time yet to look into my proposed > > > > > > > > > > waitque- > > > > > > > > > > solution? > > > > > > > > > I don't remember seeing anything. What have I missed? > > > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > > > > > Interesting approach, I'm just not sure if we can or should > > > > > > > wait in > > > > > > > drm_sched_fini(). > > > > > We do agree that jobs still pending when drm_sched_fini() > > > > > starts > > > > > is > > > > > always a bug, right? > > > > > > > > Correct, the question is how to avoid that. > > > > > > > > > If so, what are the disadvantages of waiting in > > > > > drm_sched_fini()? > > > > > We > > > > > could block buggy drivers as I see it. Which wouldn't be good, > > > > > but > > > > > could then be fixed on drivers' site. 
> > > > > > > > Sima explained that pretty well: Don't block in fops->close, do > > > > that in > > > > fops->flush instead. > > > > > > I agree that we shouldn't block in close(), but this effectively > > > means that we > > > need to reference count the scheduler, right? > > > > > > Otherwise, if we allow to just skip / interrupt the teardown, we > > > can > > > still leak > > > memory. > > > > Having thought about it, I agree with Danilo. Having something that > > shall wait on green light, but can be interrupted, is no guarantee > > and > > therefore not a feasible solution. > > > > To break down the solution space, these seem to be our options: > > 1. We have something (either drm_sched_fini() or a helper, e.g., > > drm_sched_flush()) that definitely blocks until the pending > > list > > has become empty. > > 2. We have jobs reference-count the scheduler, so the latter can > > outlive the driver and will be freed some time later. > > > > Can anyone think of a third solution? > > > > > > Solution #1 has the problem of obviously blocking unconditionally if > > the driver didn't make sure that all fences will be signaled within > > reasonable time. In my opinion, this would actually be an advantage, > > because it will be *very* noticable and force users to repair their > > driver. The driver *has* to guarantee that all fences will be > > signaled. > > If the driver has to do fishy things, having the blocking outsourced > > to > > the helper drm_sched_flu
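For readers who do not know Xe: the teardown ordering Matthew describes at the top of this message roughly translates to the sketch below. The example_* structures and helpers are invented for illustration (this is not the actual Xe code); what matters is only the ordering: last queue reference, then hardware/firmware teardown, then drm_sched_fini(), then freeing the memory that embeds the scheduler.

#include <linux/kref.h>
#include <linux/slab.h>
#include <drm/gpu_scheduler.h>

struct example_exec_queue {
        struct kref refcount;                   /* jobs take a reference when armed */
        struct drm_gpu_scheduler sched;         /* scheduler embedded in the queue */
        /* firmware context handles, fences, ... */
};

/* driver-specific pieces, stubbed out here */
static void example_queue_set_job_timeout(struct example_exec_queue *q, long timeout) { }
static void example_queue_fw_teardown(struct example_exec_queue *q) { }

/*
 * "Kill" the queue: set the job timeout to zero so the TDR flushes every
 * job and signals all of their fences; each freed job drops a queue ref.
 */
static void example_queue_kill(struct example_exec_queue *q)
{
        example_queue_set_job_timeout(q, 0);
}

/* runs once the last reference is gone, i.e. no pending jobs remain */
static void example_queue_release(struct kref *kref)
{
        struct example_exec_queue *q =
                container_of(kref, struct example_exec_queue, refcount);

        example_queue_fw_teardown(q);   /* hardware / firmware teardown */
        drm_sched_fini(&q->sched);      /* pending_list is empty, no warning */
        kfree(q);                       /* frees the embedded scheduler last */
}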
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
+CC Thomas Hellström On Mon, 2024-11-18 at 12:48 +0100, Christian König wrote: > Am 15.11.24 um 17:07 schrieb Philipp Stanner: > > > > > On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote: > > > > > > > > Am 15.11.24 um 14:55 schrieb Philipp Stanner: > > > > > > > > > > > > > > [SNIP] > > > > > > > > > > > > > > > > > > > > But you wouldn't want to aim for getting rid of struct > > > > drm_sched_job > > > > completely, or would you? > > > > > > > > > > > > > > > > > No, the drm_sched_job structure was added to aid the single > > > producer > > > single consumer queue and so made it easier to put the jobs into > > > a > > > container. > > > > > > > > > In my mind the full solution for running jobs looks like this: > > > > > > 1. Driver enqueues with drm_sched_job_arm() > > > 2. job ends up in pending_list > > > 3. Sooner or later scheduler calls run_job() > > > 4. In return scheduler gets a dma_fence representing the > > > resulting > > > hardware operation. > > > 5, This fence is put into a container to keep around what the hw > > > is > > > actually executing. > > > 6. At some point the fence gets signaled informing the scheduler > > > that the next job can be pushed if enough credits are now > > > available. > > > > > > There is no free_job callback any more because after run_job is > > > called the scheduler is done with the job and only the dma_fence > > > which represents the actually HW operation is the object the > > > scheduler now works with. > > > > > > We would still need something like a kill_job callback which is > > > used > > > when an entity is released without flushing all jobs (see > > > drm_sched_entity_kill()), but that is then only used for a > > > specific > > > corner case. > > > > > > > Can't we just limit the scheduler's responsibility to telling the > > driver that it has to flush, and if it doesn't it's a bug? > > > > We still need to remove the pending jobs from the scheduler if > flushing times out. > > Without timing out flushing and/or aborting when the process was > killed we run into the same problem again that we don't want ti block > on _fini(). > > > > > > > > > Blocking for the cleanup in drm_sched_fini() then becomes a > > > trivial > > > dma_fence_wait() on the remaining unsignaled HW fences and > > > eventually > > > on the latest killed fence. > > > > > > > But that results in exactly the same situation as my waitque > > solution, > > doesn't it? > > > > The key part is that dma_fence's have a documented requirement to > never block infinitely. > > > > > > Blocking infinitely in drm_sched_fini(): > > > > If the driver doesn't guarantee that all fences will get signaled, > > then > > wait_event() in the waitque solution will block forever. The same > > with > > dma_fence_wait(). > > > > Or are you aiming at an interruptible dma_fence_wait(), thereby not > > blocking SIGKILL? > > > > That is basically what drm_sched_entity_flush() is already doing. > > > > > > That then would still result in leaks. So basically the same > > situation > > as with an interruptible drm_sched_flush() function. > > > > (Although I agree that would probably be more elegant) > > > > If the wait_event really would just waiting for dma_fences then yes. > > The problem with the waitqueue approach is that we need to wait for > the free_job callback and that callback is normally called from a > work item without holding any additional locks. 
> > When drm_sched_fini starts to block for that we create a rat-tail of > new dependencies since that one is most likely called from a file > descriptor destruction. > > > > > > > > > > > > > > > > > The problem with this approach is that the idea of re-submitting > > > jobs in a GPU reset by the scheduler is then basically dead. But > > > to > > > be honest that never worked correctly. > > > > > > See the deprecated warning I already put on > > > drm_sched_resubmit_jobs(). The job lifetime is not really well > > > defined because of this, see the free_guilty hack as well. > > > > > > > drm_sched_resubmit_jobs() is being used by 5 drivers currently. So > > if > > we go for this approach we have to port them, first. That doesn't > > sound > > trivial to me > > > > Yeah, completely agree. I think the scheduler should provide > something like drm_sched_for_each_pending_fence() which helps to > iterate over all the pending HW fences. > > The individual drivers could then device by themself what to do, > e.g. upcast the HW fence into the job and push it again or similar. I have talked with Dave and we would be interested in exploring the direction of getting rid of backend_ops.free_job() and doing the modifications this implies. Matthew, Thomas, any hard NACKs on principle, or can we look into this in a future patch series? P. > > But what we really need to get away from are those fence pre- > requis
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 15.11.24 um 17:07 schrieb Philipp Stanner: On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote: Am 15.11.24 um 14:55 schrieb Philipp Stanner: [SNIP] But you wouldn't want to aim for getting rid of struct drm_sched_job completely, or would you? No, the drm_sched_job structure was added to aid the single producer single consumer queue and so made it easier to put the jobs into a container. In my mind the full solution for running jobs looks like this: 1. Driver enqueues with drm_sched_job_arm() 2. job ends up in pending_list 3. Sooner or later scheduler calls run_job() 4. In return scheduler gets a dma_fence representing the resulting hardware operation. 5, This fence is put into a container to keep around what the hw is actually executing. 6. At some point the fence gets signaled informing the scheduler that the next job can be pushed if enough credits are now available. There is no free_job callback any more because after run_job is called the scheduler is done with the job and only the dma_fence which represents the actually HW operation is the object the scheduler now works with. We would still need something like a kill_job callback which is used when an entity is released without flushing all jobs (see drm_sched_entity_kill()), but that is then only used for a specific corner case. Can't we just limit the scheduler's responsibility to telling the driver that it has to flush, and if it doesn't it's a bug? We still need to remove the pending jobs from the scheduler if flushing times out. Without timing out flushing and/or aborting when the process was killed we run into the same problem again that we don't want ti block on _fini(). Blocking for the cleanup in drm_sched_fini() then becomes a trivial dma_fence_wait() on the remaining unsignaled HW fences and eventually on the latest killed fence. But that results in exactly the same situation as my waitque solution, doesn't it? The key part is that dma_fence's have a documented requirement to never block infinitely. Blocking infinitely in drm_sched_fini(): If the driver doesn't guarantee that all fences will get signaled, then wait_event() in the waitque solution will block forever. The same with dma_fence_wait(). Or are you aiming at an interruptible dma_fence_wait(), thereby not blocking SIGKILL? That is basically what drm_sched_entity_flush() is already doing. That then would still result in leaks. So basically the same situation as with an interruptible drm_sched_flush() function. (Although I agree that would probably be more elegant) If the wait_event really would just waiting for dma_fences then yes. The problem with the waitqueue approach is that we need to wait for the free_job callback and that callback is normally called from a work item without holding any additional locks. When drm_sched_fini starts to block for that we create a rat-tail of new dependencies since that one is most likely called from a file descriptor destruction. The problem with this approach is that the idea of re-submitting jobs in a GPU reset by the scheduler is then basically dead. But to be honest that never worked correctly. See the deprecated warning I already put on drm_sched_resubmit_jobs(). The job lifetime is not really well defined because of this, see the free_guilty hack as well. drm_sched_resubmit_jobs() is being used by 5 drivers currently. So if we go for this approach we have to port them, first. That doesn't sound trivial to me Yeah, completely agree. 
I think the scheduler should provide something like drm_sched_for_each_pending_fence() which helps to iterate over all the pending HW fences. The individual drivers could then decide by themselves what to do, e.g. upcast the HW fence into the job and push it again or similar. But what we really need to get away from are those fence pre-requisite violations Sima pointed out. For example we can't allocate memory for run_job. Regards, Christian. P. Regards, Christian. Grüße, P.
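To picture the "container of pending hardware fences" and the trivial blocking teardown described above: a rough sketch, under the assumption that the scheduler keeps a list of the fences run_job() returned instead of the jobs themselves. The list, its locking and all example_* names are hypothetical, and drm_sched_for_each_pending_fence() does not exist yet either.

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct example_pending_fence {
        struct dma_fence *fence;        /* hw fence returned by run_job() */
        struct list_head node;
};

struct example_fence_list {
        struct list_head pending;       /* everything the hw is still executing */
        spinlock_t lock;
};

/* step 5 above: remember what the hardware is actually executing */
static void example_track_hw_fence(struct example_fence_list *l,
                                   struct example_pending_fence *pf,
                                   struct dma_fence *fence)
{
        pf->fence = dma_fence_get(fence);
        spin_lock(&l->lock);
        list_add_tail(&pf->node, &l->pending);
        spin_unlock(&l->lock);
}

/*
 * Teardown then becomes a plain dma_fence_wait() over the leftovers.
 * dma_fences are documented to signal in bounded time, so unlike waiting
 * for a driver's free_job work this cannot block forever for a correct
 * driver. No locking here: by the time we tear down, nothing inserts.
 */
static void example_drain_hw_fences(struct example_fence_list *l)
{
        struct example_pending_fence *pf, *tmp;

        list_for_each_entry_safe(pf, tmp, &l->pending, node) {
                dma_fence_wait(pf->fence, false);
                list_del(&pf->node);
                dma_fence_put(pf->fence);
                kfree(pf);
        }
}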
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote: > Am 15.11.24 um 14:55 schrieb Philipp Stanner: > > > [SNIP] > > > > > > A good bunch of the problems we have here are caused by abusing > > > the > > > job > > > as state for executing the submission on the hardware. > > > > > > The original design idea of the scheduler instead was to only > > > have > > > the > > > job as intermediate state between submission and picking what to > > > execute > > > next. E.g. the scheduler in that view is just a pipeline were you > > > push > > > jobs in on one end and jobs come out on the other end. > > > > > > > So let's get a bit more precise about this: > > 1. Driver enqueues with drm_sched_job_arm() > > 2. job ends up in pending_list > > 3. Sooner or later scheduler calls run_job() > > 4. Job is pushed to hardware > > 5. Fence gets signaled > > 6. ??? > > > > What would the "come out on the other end" part you describe look > > like? > > > > How would jobs get removed from pending_list and, accordingly, how > > would we avoid leaks? > > > > Let me describe the full solution a bit further down. > > > > > > > > > > > In that approach the object which represents the hardware > > > execution > > > is > > > the dma_fence instead of the job. And the dma_fence has a well > > > defined > > > lifetime, error handling, etc... > > > > > > So when we go back to this original approach it would mean that > > > we > > > don't > > > need to wait for any cleanup because the scheduler wouldn't be > > > involved > > > in the execution of the jobs on the hardware any more. > > > > > > > It would be involved (to speak precisely) in the sense of the > > scheduler > > still being the one who pushes the job onto the hardware, agreed? > > > > Yes, exactly. > > See in the original design the "job" wasn't even a defined > structure, but rather just a void*. > > So basically what the driver did was telling the scheduler here I > have a bunch of different void* please tell me which one to run > first. > > And apart from being this identifier this void* had absolutely no > meaning for the scheduler. Interesting.. > > > > > > > The worst thing that could happen is that the driver messes > > > things up > > > and still has not executed job in an entity, > > > > > > > I can't fully follow. > > > > So in your mind, the driver would personally set the dma_fence > > callback > > and hereby be informed about the job being completed, right? > > > > No. The run_job callback would still return the hw fence so that the > scheduler can install the callback and so gets informed when the next > job could be run. > > > > > > But you wouldn't want to aim for getting rid of struct > > drm_sched_job > > completely, or would you? > > > > No, the drm_sched_job structure was added to aid the single producer > single consumer queue and so made it easier to put the jobs into a > container. > > > In my mind the full solution for running jobs looks like this: > > 1. Driver enqueues with drm_sched_job_arm() > 2. job ends up in pending_list > 3. Sooner or later scheduler calls run_job() > 4. In return scheduler gets a dma_fence representing the resulting > hardware operation. > 5, This fence is put into a container to keep around what the hw is > actually executing. > 6. At some point the fence gets signaled informing the scheduler > that the next job can be pushed if enough credits are now available. 
> > There is no free_job callback any more because after run_job is > called the scheduler is done with the job and only the dma_fence > which represents the actually HW operation is the object the > scheduler now works with. > > We would still need something like a kill_job callback which is used > when an entity is released without flushing all jobs (see > drm_sched_entity_kill()), but that is then only used for a specific > corner case. Can't we just limit the scheduler's responsibility to telling the driver that it has to flush, and if it doesn't it's a bug? > Blocking for the cleanup in drm_sched_fini() then becomes a trivial > dma_fence_wait() on the remaining unsignaled HW fences and eventually > on the latest killed fence. But that results in exactly the same situation as my waitque solution, doesn't it? Blocking infinitely in drm_sched_fini(): If the driver doesn't guarantee that all fences will get signaled, then wait_event() in the waitque solution will block forever. The same with dma_fence_wait(). Or are you aiming at an interruptible dma_fence_wait(), thereby not blocking SIGKILL? That then would still result in leaks. So basically the same situation as with an interruptible drm_sched_flush() function. (Although I agree that would probably be more elegant) > > The problem with this approach is that the idea of re-submitting > jobs in a GPU reset by the scheduler is then basically dead. But to > be honest that never worked c
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 15.11.24 um 14:55 schrieb Philipp Stanner: [SNIP] A good bunch of the problems we have here are caused by abusing the job as state for executing the submission on the hardware. The original design idea of the scheduler instead was to only have the job as intermediate state between submission and picking what to execute next. E.g. the scheduler in that view is just a pipeline were you push jobs in on one end and jobs come out on the other end. So let's get a bit more precise about this: 1. Driver enqueues with drm_sched_job_arm() 2. job ends up in pending_list 3. Sooner or later scheduler calls run_job() 4. Job is pushed to hardware 5. Fence gets signaled 6. ??? What would the "come out on the other end" part you describe look like? How would jobs get removed from pending_list and, accordingly, how would we avoid leaks? Let me describe the full solution a bit further down. In that approach the object which represents the hardware execution is the dma_fence instead of the job. And the dma_fence has a well defined lifetime, error handling, etc... So when we go back to this original approach it would mean that we don't need to wait for any cleanup because the scheduler wouldn't be involved in the execution of the jobs on the hardware any more. It would be involved (to speak precisely) in the sense of the scheduler still being the one who pushes the job onto the hardware, agreed? Yes, exactly. See in the original design the "job" wasn't even a defined structure, but rather just a void*. So basically what the driver did was telling the scheduler here I have a bunch of different void* please tell me which one to run first. And apart from being this identifier this void* had absolutely no meaning for the scheduler. The worst thing that could happen is that the driver messes things up and still has not executed job in an entity, I can't fully follow. So in your mind, the driver would personally set the dma_fence callback and hereby be informed about the job being completed, right? No. The run_job callback would still return the hw fence so that the scheduler can install the callback and so gets informed when the next job could be run. But you wouldn't want to aim for getting rid of struct drm_sched_job completely, or would you? No, the drm_sched_job structure was added to aid the single producer single consumer queue and so made it easier to put the jobs into a container. In my mind the full solution for running jobs looks like this: 1. Driver enqueues with drm_sched_job_arm() 2. job ends up in pending_list 3. Sooner or later scheduler calls run_job() 4. In return scheduler gets a dma_fence representing the resulting hardware operation. 5, This fence is put into a container to keep around what the hw is actually executing. 6. At some point the fence gets signaled informing the scheduler that the next job can be pushed if enough credits are now available. There is no free_job callback any more because after run_job is called the scheduler is done with the job and only the dma_fence which represents the actually HW operation is the object the scheduler now works with. We would still need something like a kill_job callback which is used when an entity is released without flushing all jobs (see drm_sched_entity_kill()), but that is then only used for a specific corner case. Blocking for the cleanup in drm_sched_fini() then becomes a trivial dma_fence_wait() on the remaining unsignaled HW fences and eventually on the latest killed fence. 
The problem with this approach is that the idea of re-submitting jobs in a GPU reset by the scheduler is then basically dead. But to be honest that never worked correctly. See the deprecated warning I already put on drm_sched_resubmit_jobs(). The job lifetime is not really well defined because of this, see the free_guilty hack as well. Regards, Christian. Grüße, P.
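Steps 3 to 6 of the numbered flow above, compressed into code from the scheduler's point of view: call run_job(), keep only the returned hardware fence, and use a dma_fence callback to learn when credits become available again. The example_* types, the credit counter and the wait queue are stand-ins for scheduler internals and do not reflect the current implementation.

#include <linux/atomic.h>
#include <linux/dma-fence.h>
#include <linux/err.h>
#include <linux/wait.h>
#include <drm/gpu_scheduler.h>

struct example_inflight {
        struct dma_fence *hw_fence;     /* what the hw is executing (step 5) */
        struct dma_fence_cb cb;
        atomic_t *credits;              /* hypothetical credit counter */
        wait_queue_head_t *push_wq;     /* submission side sleeps on this */
};

/* step 6: the hw fence signaled, the next job may be pushed */
static void example_hw_done(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        struct example_inflight *in = container_of(cb, struct example_inflight, cb);

        atomic_inc(in->credits);
        wake_up(in->push_wq);
}

/* steps 3-5: hand one job to the hardware and keep only its fence */
static int example_run_one(struct drm_gpu_scheduler *sched,
                           struct drm_sched_job *job,
                           struct example_inflight *in)
{
        struct dma_fence *fence = sched->ops->run_job(job);    /* step 3/4 */

        if (IS_ERR_OR_NULL(fence))
                return fence ? PTR_ERR(fence) : -ENODEV;

        in->hw_fence = dma_fence_get(fence);                    /* step 5 */
        if (dma_fence_add_callback(fence, &in->cb, example_hw_done))
                example_hw_done(fence, &in->cb);                /* already signaled */

        return 0;
}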
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Fri, 2024-11-15 at 12:26 +0100, Christian König wrote: > Interesting, those mails just showed up in my inbox as unread. More > than > 14 days after you send it. > > So sorry for the late reply. I smell google mail :p > > Am 29.10.24 um 08:22 schrieb Philipp Stanner: > > Christian, Sima? > > > > Matthew? (+CC) > > > > Opinions on the below? > > > > tl;dr: > > I still think it's a good thing to detectably block in > > drm_sched_fini(), or at the very least drm_sched_flush(), because > > then > > you'll find out that the driver is broken and can repair it. > > As discussed in private mail as well, the third option not mentioned > so > far is to completely drop the free_job callback. If it's really possible to do that I think this sounds like a great solution at first glance because we could decrease complexity quite substantially. > > A good bunch of the problems we have here are caused by abusing the > job > as state for executing the submission on the hardware. > > The original design idea of the scheduler instead was to only have > the > job as intermediate state between submission and picking what to > execute > next. E.g. the scheduler in that view is just a pipeline were you > push > jobs in on one end and jobs come out on the other end. So let's get a bit more precise about this: 1. Driver enqueues with drm_sched_job_arm() 2. job ends up in pending_list 3. Sooner or later scheduler calls run_job() 4. Job is pushed to hardware 5. Fence gets signaled 6. ??? What would the "come out on the other end" part you describe look like? How would jobs get removed from pending_list and, accordingly, how would we avoid leaks? > > In that approach the object which represents the hardware execution > is > the dma_fence instead of the job. And the dma_fence has a well > defined > lifetime, error handling, etc... > > So when we go back to this original approach it would mean that we > don't > need to wait for any cleanup because the scheduler wouldn't be > involved > in the execution of the jobs on the hardware any more. It would be involved (to speak precisely) in the sense of the scheduler still being the one who pushes the job onto the hardware, agreed? > > The worst thing that could happen is that the driver messes things up > and still has not executed job in an entity, I can't fully follow. So in your mind, the driver would personally set the dma_fence callback and hereby be informed about the job being completed, right? But you wouldn't want to aim for getting rid of struct drm_sched_job completely, or would you? Grüße, P. > but in that case we could > just warn, block for the hardware fence and then kill all pending > submissions with an error.- And this blocking on the hardware fence > can't deadlock because we know that it is a hardware fence with > certain > properties and not some random free_job callback where we don't have > any > idea what locks the driver need. > > Regards, > Christian. > > > > > P. 
> > > > > > On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: > > > On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > > > > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König > > > > wrote: > > > > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > > > > wrote: > > > > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König > > > > > > > > > wrote: > > > > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian > > > > > > > > > > > König > > > > > > > > > > > wrote: > > > > > > > > > > > > Tearing down the scheduler with jobs still on > > > > > > > > > > > > the > > > > > > > > > > > > pending > > > > > > > > > > > > list > > > > > > > > > > > > can > > > > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > > > > drivers try > > > > > > > > > > > > to > > > > > > > > > > > > destroy a scheduler which still has work pushed > > > > > > > > > > > > to > > > > > > > > > > > > the HW. > > > > > > > > > > > Did you have time yet to look into my proposed > > > > > > > > > > > waitque- > > > > > > > > > > > solution? > > > > > > > > > > I don't remember seeing anything. What have I > > > > > > > > > > missed? > > > > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > > > > > > > Interesting approach, I'm just not sure if we can or > > > > > > > > should > > > > > > > > wait in > > > > > > > > drm_sched_fini(). > > > > > > We do agree that jobs still pending when drm_sched_fini() > > > > > > starts > > > > > > is > > > > > > always a bug, right? > > > > > Correct, the question is how to avoid that. >
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Interesting, those mails just showed up in my inbox as unread. More than 14 days after you send it. So sorry for the late reply. Am 29.10.24 um 08:22 schrieb Philipp Stanner: Christian, Sima? Matthew? (+CC) Opinions on the below? tl;dr: I still think it's a good thing to detectably block in drm_sched_fini(), or at the very least drm_sched_flush(), because then you'll find out that the driver is broken and can repair it. As discussed in private mail as well, the third option not mentioned so far is to completely drop the free_job callback. A good bunch of the problems we have here are caused by abusing the job as state for executing the submission on the hardware. The original design idea of the scheduler instead was to only have the job as intermediate state between submission and picking what to execute next. E.g. the scheduler in that view is just a pipeline were you push jobs in on one end and jobs come out on the other end. In that approach the object which represents the hardware execution is the dma_fence instead of the job. And the dma_fence has a well defined lifetime, error handling, etc... So when we go back to this original approach it would mean that we don't need to wait for any cleanup because the scheduler wouldn't be involved in the execution of the jobs on the hardware any more. The worst thing that could happen is that the driver messes things up and still has not executed job in an entity, but in that case we could just warn, block for the hardware fence and then kill all pending submissions with an error.- And this blocking on the hardware fence can't deadlock because we know that it is a hardware fence with certain properties and not some random free_job callback where we don't have any idea what locks the driver need. Regards, Christian. P. On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König wrote: Am 25.09.24 um 16:53 schrieb Philipp Stanner: On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: Am 20.09.24 um 15:26 schrieb Philipp Stanner: On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: Am 20.09.24 um 10:57 schrieb Philipp Stanner: On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. Did you have time yet to look into my proposed waitque- solution? I don't remember seeing anything. What have I missed? https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ Mhm, I didn't got that in my inbox for some reason. Interesting approach, I'm just not sure if we can or should wait in drm_sched_fini(). We do agree that jobs still pending when drm_sched_fini() starts is always a bug, right? Correct, the question is how to avoid that. If so, what are the disadvantages of waiting in drm_sched_fini()? We could block buggy drivers as I see it. Which wouldn't be good, but could then be fixed on drivers' site. Sima explained that pretty well: Don't block in fops->close, do that in fops->flush instead. I agree that we shouldn't block in close(), but this effectively means that we need to reference count the scheduler, right? Otherwise, if we allow to just skip / interrupt the teardown, we can still leak memory. Having thought about it, I agree with Danilo. 
Having something that shall wait on green light, but can be interrupted, is no guarantee and therefore not a feasible solution. To break down the solution space, these seem to be our options: 1. We have something (either drm_sched_fini() or a helper, e.g., drm_sched_flush()) that definitely blocks until the pending list has become empty. 2. We have jobs reference-count the scheduler, so the latter can outlive the driver and will be freed some time later. Can anyone think of a third solution? Solution #1 has the problem of obviously blocking unconditionally if the driver didn't make sure that all fences will be signaled within reasonable time. In my opinion, this would actually be an advantage, because it will be *very* noticable and force users to repair their driver. The driver *has* to guarantee that all fences will be signaled. If the driver has to do fishy things, having the blocking outsourced to the helper drm_sched_flush() would allow them to circumvent that. Solution #2 has the problem of backend_ops.free_job() potentially using driver-data after the driver is gone, causing UAF. So with this solutions all drivers would have to be aware of the issue and handle it through one of DRMs primitives dedicated to such problems. Currently, all drivers either work around the problem internally or simply ignore it, it seems. So I'd argue that both so
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 08.11.24 um 05:00 schrieb Dave Airlie:
> On Wed, 18 Sept 2024 at 23:48, Christian König wrote:
> > Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW.
> >
> > When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal; we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well, let them run into a BUG_ON().
>
> I've been talking a bit to Phillip about this offline. I'm not sure we should ever be BUG_ON here, I think it might be an extreme answer, considering we are saying blocking userspace to let things finish is bad, I think hitting this would be much worse.

Yeah, completely agree that this is the most extreme response. The problem is that the scheduler doesn't have much of a choice at this moment any more. What we can do is the following:

1. Try to block for the queued-up jobs in the entities to be processed and signal their hardware fences. There is no guarantee that this won't deadlock and we are potentially blocking a SIGKILL here. We already tried it before and that turned out to be quite unstable.

2. Signal all pending fences without actually executing anything. Because of the dma_fence design, pipelining work and only keeping the latest fence for each context in the containers, that can potentially lead to random memory corruptions.

3. Don't signal anything, just free up all jobs. That can trivially result in the core memory management waiting forever for things to make progress, leading to complete system stops.

So the choices you have are all really bad for the system as a whole.

> I'd rather we WARN_ON, and consider just saying screw it and block userspace close.

Going to make that a WARN_ON() if you insist, but IIRC Linus or Greg once summarized it as "A BUG() is only justified if you prevent even worse things from happening". If you ask me that's exactly the case here.

> If we really want to avoid the hang on close possibility (though I'm mostly fine with that) then I think Sima's option to just keep a reference on the driver, let userspace close and try and clean things up on fences in the driver later. I think this should be at least good for rust lifetimes.

The problem with the rust lifetime is that it got the direction of the dependency wrong. A dma_fence represents an operation on the hardware, e.g. a direct access to some memory. This means that the dma_fence can only signal if the hardware tells the driver that the operation is completed.

If a dma_fence should signal because some object is not referenced any more, for example because userspace closed it, then that clearly points to a fundamentally broken design.

> Having an explicit memory leak is bad, having a managed memory leak is less bad, because at least all the memory is still pointed to by something and managed; at a guess we'd have to keep the whole driver and scheduler around, to avoid having to make special free functions. Unless there is some concept like TTM ghost objects we could get away with, but I think having to signal the fence means we should keep all the pieces.

Well, I think memory leaks are more or less harmless. What we really, really need to prevent are random memory corruptions. Those can run undetected under the radar for a very long time and cause massive damage without anybody being able to pinpoint where the corruptions came from.
Regards, Christian. Dave. Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f093616fe53c..8a46fab5cdc8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) drm_sched_wqueue_stop(sched); + /* +* Tearing down the scheduler wile there are still unprocessed jobs can +* lead to use after free issues in the scheduler fence. +*/ + WARN_ON(!list_empty(&sched->pending_list)); + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) + list_for_each_entry(s_entity, &rq->entities, list) { + /* +* The justification for this BUG_ON() is that tearing +* down the scheduler while jobs are pending leaves +* dma_fences unsignaled. Since we have dependencies +* from the core memory manageme
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Wed, 18 Sept 2024 at 23:48, Christian König wrote:
>
> Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW.
>
> When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well let them run into a BUG_ON().

I've been talking a bit to Phillip about this offline. I'm not sure we should ever be BUG_ON here, I think it might be an extreme answer, considering we are saying blocking userspace to let things finish is bad, I think hitting this would be much worse.

I'd rather we WARN_ON, and consider just saying screw it and block userspace close.

If we really want to avoid the hang on close possibility (though I'm mostly fine with that) then I think Sima's option to just keep a reference on the driver, let userspace close and try and clean things up on fences in the driver later. I think this should be at least good for rust lifetimes.

Having an explicit memory leak is bad, having a managed memory leak is less bad, because at least all the memory is still pointed to by something and managed, at a guess we'd have to keep the whole driver and scheduler around, to avoid having to make special free functions. Unless there is some concept like TTM ghost objects we could get away with, but I think having to signal the fence means we should keep all the pieces.

Dave.

> Signed-off-by: Christian König
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index f093616fe53c..8a46fab5cdc8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
>
>         drm_sched_wqueue_stop(sched);
>
> +       /*
> +        * Tearing down the scheduler wile there are still unprocessed jobs can
> +        * lead to use after free issues in the scheduler fence.
> +        */
> +       WARN_ON(!list_empty(&sched->pending_list));
> +
>         for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
>                 struct drm_sched_rq *rq = sched->sched_rq[i];
>
>                 spin_lock(&rq->lock);
> -               list_for_each_entry(s_entity, &rq->entities, list)
> +               list_for_each_entry(s_entity, &rq->entities, list) {
> +                       /*
> +                        * The justification for this BUG_ON() is that tearing
> +                        * down the scheduler while jobs are pending leaves
> +                        * dma_fences unsignaled. Since we have dependencies
> +                        * from the core memory management to eventually signal
> +                        * dma_fences this can trivially lead to a system wide
> +                        * stop because of a locked up memory management.
> +                        */
> +                       BUG_ON(spsc_queue_count(&s_entity->job_queue));
> +
>                         /*
>                          * Prevents reinsertion and marks job_queue as idle,
>                          * it will removed from rq in drm_sched_entity_fini
>                          * eventually
>                          */
>                         s_entity->stopped = true;
> +               }
>                 spin_unlock(&rq->lock);
>                 kfree(sched->sched_rq[i]);
>         }
> --
> 2.34.1
>
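For driver authors, the practical meaning of the WARN_ON/BUG_ON above is an ordering requirement at teardown: stop the entities, make sure every hardware fence has signaled so the pending list can drain, and only then call drm_sched_fini(). A minimal sketch with a made-up driver context; my_wait_idle() stands in for whatever fence tracking the driver already has.

#include <linux/slab.h>
#include <drm/gpu_scheduler.h>

struct my_ctx {                         /* hypothetical driver object */
        struct drm_sched_entity entity;
        struct drm_gpu_scheduler sched;
};

/* driver specific: wait until all hw fences of this context signaled */
static void my_wait_idle(struct my_ctx *ctx) { }

static void my_ctx_destroy(struct my_ctx *ctx)
{
        /* 1. flush and tear down the entity, no new jobs after this */
        drm_sched_entity_destroy(&ctx->entity);

        /*
         * 2. let the hardware finish so the scheduler's pending_list and
         *    the entity job queues are guaranteed to be empty
         */
        my_wait_idle(ctx);

        /* 3. only now is drm_sched_fini() free of the new WARN_ON/BUG_ON */
        drm_sched_fini(&ctx->sched);

        kfree(ctx);
}

If step 2 cannot be guaranteed, that is exactly the driver bug the warning is meant to surface.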
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Christian, Sima? Matthew? (+CC) Opinions on the below? tl;dr: I still think it's a good thing to detectably block in drm_sched_fini(), or at the very least drm_sched_flush(), because then you'll find out that the driver is broken and can repair it. P. On Fri, 2024-10-18 at 14:07 +0200, Philipp Stanner wrote: > On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König wrote: > > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > > wrote: > > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König > > > > > > > > > wrote: > > > > > > > > > > Tearing down the scheduler with jobs still on the > > > > > > > > > > pending > > > > > > > > > > list > > > > > > > > > > can > > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > > drivers try > > > > > > > > > > to > > > > > > > > > > destroy a scheduler which still has work pushed to > > > > > > > > > > the HW. > > > > > > > > > Did you have time yet to look into my proposed > > > > > > > > > waitque- > > > > > > > > > solution? > > > > > > > > I don't remember seeing anything. What have I missed? > > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > > > Interesting approach, I'm just not sure if we can or should > > > > > > wait in > > > > > > drm_sched_fini(). > > > > We do agree that jobs still pending when drm_sched_fini() > > > > starts > > > > is > > > > always a bug, right? > > > > > > Correct, the question is how to avoid that. > > > > > > > If so, what are the disadvantages of waiting in > > > > drm_sched_fini()? > > > > We > > > > could block buggy drivers as I see it. Which wouldn't be good, > > > > but > > > > could then be fixed on drivers' site. > > > > > > Sima explained that pretty well: Don't block in fops->close, do > > > that in > > > fops->flush instead. > > > > I agree that we shouldn't block in close(), but this effectively > > means that we > > need to reference count the scheduler, right? > > > > Otherwise, if we allow to just skip / interrupt the teardown, we > > can > > still leak > > memory. > > Having thought about it, I agree with Danilo. Having something that > shall wait on green light, but can be interrupted, is no guarantee > and > therefore not a feasible solution. > > To break down the solution space, these seem to be our options: > 1. We have something (either drm_sched_fini() or a helper, e.g., > drm_sched_flush()) that definitely blocks until the pending > list > has become empty. > 2. We have jobs reference-count the scheduler, so the latter can > outlive the driver and will be freed some time later. > > Can anyone think of a third solution? > > > Solution #1 has the problem of obviously blocking unconditionally if > the driver didn't make sure that all fences will be signaled within > reasonable time. In my opinion, this would actually be an advantage, > because it will be *very* noticable and force users to repair their > driver. The driver *has* to guarantee that all fences will be > signaled. 
> If the driver has to do fishy things, having the blocking outsourced > to > the helper drm_sched_flush() would allow them to circumvent that. > > Solution #2 has the problem of backend_ops.free_job() potentially > using > driver-data after the driver is gone, causing UAF. So with this > solutions all drivers would have to be aware of the issue and handle > it > through one of DRMs primitives dedicated to such problems. > > > Currently, all drivers either work around the problem internally or > simply ignore it, it seems. > > So I'd argue that both solutions are an improvement over the existing > situation. My preference would be #1. > > > Opinions? > > P. > > > > > > > > > One issue this solves is that when you send a SIGTERM the tear > > > down > > > handling > > > first flushes all the FDs and then closes them. > > > > > > So if flushing the FDs blocks because the process initiated > > > sending > > > a > > > terabyte of data over a 300bps line (for example) you can still > > > throw a > > > SIGKILL and abort that as well. > > > > > > If you would block in fops-close() that SIGKILL won't have any > > > effect any > > > more because by the time close() is called the process is gone > > > and > > > signals > > > are already blocked. > > > > > > And yes when I learned about that issue I was also buffed that > > > handling like > > > this in the UNIX design is nearly 50 years old and still applies > > > to > > > today. > > > > > > Probably bette
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Mon, 2024-10-14 at 16:56 +0200, Danilo Krummrich wrote: > On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König wrote: > > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König > > > > wrote: > > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König > > > > > > > > wrote: > > > > > > > > > Tearing down the scheduler with jobs still on the > > > > > > > > > pending > > > > > > > > > list > > > > > > > > > can > > > > > > > > > lead to use after free issues. Add a warning if > > > > > > > > > drivers try > > > > > > > > > to > > > > > > > > > destroy a scheduler which still has work pushed to > > > > > > > > > the HW. > > > > > > > > Did you have time yet to look into my proposed waitque- > > > > > > > > solution? > > > > > > > I don't remember seeing anything. What have I missed? > > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > > > Interesting approach, I'm just not sure if we can or should > > > > > wait in > > > > > drm_sched_fini(). > > > We do agree that jobs still pending when drm_sched_fini() starts > > > is > > > always a bug, right? > > > > Correct, the question is how to avoid that. > > > > > If so, what are the disadvantages of waiting in drm_sched_fini()? > > > We > > > could block buggy drivers as I see it. Which wouldn't be good, > > > but > > > could then be fixed on drivers' site. > > > > Sima explained that pretty well: Don't block in fops->close, do > > that in > > fops->flush instead. > > I agree that we shouldn't block in close(), but this effectively > means that we > need to reference count the scheduler, right? > > Otherwise, if we allow to just skip / interrupt the teardown, we can > still leak > memory. Having thought about it, I agree with Danilo. Having something that shall wait on green light, but can be interrupted, is no guarantee and therefore not a feasible solution. To break down the solution space, these seem to be our options: 1. We have something (either drm_sched_fini() or a helper, e.g., drm_sched_flush()) that definitely blocks until the pending list has become empty. 2. We have jobs reference-count the scheduler, so the latter can outlive the driver and will be freed some time later. Can anyone think of a third solution? Solution #1 has the problem of obviously blocking unconditionally if the driver didn't make sure that all fences will be signaled within reasonable time. In my opinion, this would actually be an advantage, because it will be *very* noticable and force users to repair their driver. The driver *has* to guarantee that all fences will be signaled. If the driver has to do fishy things, having the blocking outsourced to the helper drm_sched_flush() would allow them to circumvent that. Solution #2 has the problem of backend_ops.free_job() potentially using driver-data after the driver is gone, causing UAF. So with this solutions all drivers would have to be aware of the issue and handle it through one of DRMs primitives dedicated to such problems. Currently, all drivers either work around the problem internally or simply ignore it, it seems. 
So I'd argue that both solutions are an improvement over the existing situation. My preference would be #1. Opinions? P. > > > > > One issue this solves is that when you send a SIGTERM the tear down > > handling > > first flushes all the FDs and then closes them. > > > > So if flushing the FDs blocks because the process initiated sending > > a > > terabyte of data over a 300bps line (for example) you can still > > throw a > > SIGKILL and abort that as well. > > > > If you would block in fops-close() that SIGKILL won't have any > > effect any > > more because by the time close() is called the process is gone and > > signals > > are already blocked. > > > > And yes when I learned about that issue I was also buffed that > > handling like > > this in the UNIX design is nearly 50 years old and still applies to > > today. > > > > > Probably better to make that a separate function, something > > > > > like > > > > > drm_sched_flush() or similar. > > > We could do that. Such a function could then be called by drivers > > > which > > > are not sure whether all jobs are done before they start tearing > > > down. > > > > Yes exactly that's the idea. And give that flush function a return > > code so > > that it can return -EINTR. > > > > > > Yeah I don't think we should smash this into drm_sched_fini > > > > unconditionally. I think conceptually there's about three > > > > cases: > > > > > > > > - Ringbuffer schedules. Probably want everything as-is, becaus
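For illustration, solution #1 could look roughly like the sketch below. It assumes a wait_queue_head_t (called job_list_waitque here, a made-up name) were added to struct drm_gpu_scheduler and woken whenever free_job() removes a job from pending_list; neither that field nor drm_sched_flush() exists upstream at this point, so treat it purely as a sketch of the blocking-but-interruptible idea (locking around the list check is elided for brevity, the wake-up after removal provides the ordering).

#include <linux/list.h>
#include <linux/wait.h>
#include <drm/gpu_scheduler.h>

/*
 * Block until the pending list has drained. Interruptible, so a task
 * stuck on a buggy driver can still be killed; wait_event_interruptible()
 * returns -ERESTARTSYS in that case, which the caller could translate to
 * -EINTR as suggested above.
 */
int drm_sched_flush(struct drm_gpu_scheduler *sched)
{
	return wait_event_interruptible(sched->job_list_waitque,
					list_empty(&sched->pending_list));
}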
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Fri, Sep 27, 2024 at 11:04:48AM +0200, Christian König wrote: > Am 25.09.24 um 16:53 schrieb Philipp Stanner: > > On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > > > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: > > > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > > > > > > > > Tearing down the scheduler with jobs still on the pending > > > > > > > > list > > > > > > > > can > > > > > > > > lead to use after free issues. Add a warning if drivers try > > > > > > > > to > > > > > > > > destroy a scheduler which still has work pushed to the HW. > > > > > > > Did you have time yet to look into my proposed waitque- > > > > > > > solution? > > > > > > I don't remember seeing anything. What have I missed? > > > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > Mhm, I didn't got that in my inbox for some reason. > > > > > > > > Interesting approach, I'm just not sure if we can or should wait in > > > > drm_sched_fini(). > > We do agree that jobs still pending when drm_sched_fini() starts is > > always a bug, right? > > Correct, the question is how to avoid that. > > > If so, what are the disadvantages of waiting in drm_sched_fini()? We > > could block buggy drivers as I see it. Which wouldn't be good, but > > could then be fixed on drivers' site. > > Sima explained that pretty well: Don't block in fops->close, do that in > fops->flush instead. I agree that we shouldn't block in close(), but this effectively means that we need to reference count the scheduler, right? Otherwise, if we allow to just skip / interrupt the teardown, we can still leak memory. > > One issue this solves is that when you send a SIGTERM the tear down handling > first flushes all the FDs and then closes them. > > So if flushing the FDs blocks because the process initiated sending a > terabyte of data over a 300bps line (for example) you can still throw a > SIGKILL and abort that as well. > > If you would block in fops-close() that SIGKILL won't have any effect any > more because by the time close() is called the process is gone and signals > are already blocked. > > And yes when I learned about that issue I was also buffed that handling like > this in the UNIX design is nearly 50 years old and still applies to today. > > > > Probably better to make that a separate function, something like > > > > drm_sched_flush() or similar. > > We could do that. Such a function could then be called by drivers which > > are not sure whether all jobs are done before they start tearing down. > > Yes exactly that's the idea. And give that flush function a return code so > that it can return -EINTR. > > > > Yeah I don't think we should smash this into drm_sched_fini > > > unconditionally. I think conceptually there's about three cases: > > > > > > - Ringbuffer schedules. Probably want everything as-is, because > > > drm_sched_fini is called long after all the entities are gone in > > > drm_device cleanup. > > > > > > - fw scheduler hardware with preemption support. There we probably > > > want to > > > nuke the context by setting the tdr timeout to zero (or maybe just > > > as > > > long as context preemption takes to be efficient), and relying on > > > the > > > normal gpu reset flow to handle things. 
drm_sched_entity_flush > > > kinda > > > does this, except not really and it's a lot more focused on the > > > ringbuffer context. So maybe we want a new drm_sched_entity_kill. > > > > > > For this case calling drm_sched_fini() after the 1:1 entity is gone > > > should not find any linger jobs, it would actually be a bug > > > somewhere if > > > there's a job lingering. Maybe a sanity check that there's not just > > > no > > > jobs lingering, but also no entity left would be good here? > > The check for lingering ones is in Christian's patch here IISC. > > At which position would you imagine the check for the entity being > > performed? > > > > > - fw scheduler without preemption support. There we kinda need the > > > drm_sched_flush, except blocking in fops->close is not great. So > > > instead > > > I think the following is better: > > > 1. drm_sched_entity_stopped, which only stops new submissions (for > > > paranoia) but doesn't tear down the entity > > Who would call that function? > > Drivers using it voluntarily could just as well stop accepting new jobs > > from userspace to their entities, couldn't they? > > > > > 2. drm_dev_get > > > 3. launch a worker which does a) drm_sched_flush (or > > > drm_sched_entity_flush or whatever we call it) b) > > > drm_sched_entity_fini > > > + drm_sched_fini c) drm_dev_put > > > > > > Note that semantically this implements the refcount in the other > > > path > > > from Phillip:
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Tue, Sep 24, 2024 at 01:18:25PM +0200, Simona Vetter wrote: > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > > > > > > Tearing down the scheduler with jobs still on the pending list > > > > > > can > > > > > > lead to use after free issues. Add a warning if drivers try to > > > > > > destroy a scheduler which still has work pushed to the HW. > > > > > Did you have time yet to look into my proposed waitque-solution? > > > > I don't remember seeing anything. What have I missed? > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > Mhm, I didn't got that in my inbox for some reason. > > > > Interesting approach, I'm just not sure if we can or should wait in > > drm_sched_fini(). > > > > Probably better to make that a separate function, something like > > drm_sched_flush() or similar. > > Yeah I don't think we should smash this into drm_sched_fini > unconditionally. I think conceptually there's about three cases: > > - Ringbuffer schedules. Probably want everything as-is, because > drm_sched_fini is called long after all the entities are gone in > drm_device cleanup. Even if entities are long gone we can still have jobs on the pending list of the scheduler. Calling drm_sched_fini() would still leak those jobs. It's just that for this case it's unlikely, but I don't think we want to rely on "unlikely". Do I miss anything? > > - fw scheduler hardware with preemption support. There we probably want to > nuke the context by setting the tdr timeout to zero (or maybe just as > long as context preemption takes to be efficient), and relying on the > normal gpu reset flow to handle things. drm_sched_entity_flush kinda > does this, except not really and it's a lot more focused on the > ringbuffer context. So maybe we want a new drm_sched_entity_kill. > > For this case calling drm_sched_fini() after the 1:1 entity is gone > should not find any linger jobs, it would actually be a bug somewhere if > there's a job lingering. Maybe a sanity check that there's not just no > jobs lingering, but also no entity left would be good here? This is a timing assumption too, isn't it? What if we reach drm_sched_fini() quicker than all the free() callbacks from the jobs on the pending list are processed? I know Xe uses the "tdr to zero trick", so I'm not aware of all implications -- in Nouveau I had to do it differently. > > - fw scheduler without preemption support. There we kinda need the > drm_sched_flush, except blocking in fops->close is not great. So instead > I think the following is better: > 1. drm_sched_entity_stopped, which only stops new submissions (for > paranoia) but doesn't tear down the entity > 2. drm_dev_get > 3. launch a worker which does a) drm_sched_flush (or > drm_sched_entity_flush or whatever we call it) b) drm_sched_entity_fini I think it would be drm_sched_flush(); the jobs on the pending are not associated with any entity any more. See also 96c7c2f4d5bd ("drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()"). 
> + drm_sched_fini c) drm_dev_put > > Note that semantically this implements the refcount in the other path > from Phillip: > > https://lore.kernel.org/all/20240903094531.29893-2-pstan...@redhat.com/ > > Except it doesn't impose refcount on everyone else who doesn't need it, > and it doesn't even impose refcounting on drivers that do need it > because we use drm_sched_flush and a worker to achieve the same. > > Essentially helper functions for the common use-cases instead of trying to > solve them all by putting drm_sched_flush as a potentially very blocking > function into drm_sched_fini. > > > > > > > > When there are still entities with jobs the situation is even > > > > > > worse > > > > > > since the dma_fences for those jobs can never signal we can just > > > > > > choose between potentially locking up core memory management and > > > > > > random memory corruption. When drivers really mess it up that > > > > > > well > > > > > > let them run into a BUG_ON(). > > > > > > > > > > > > Signed-off-by: Christian König > > > > > > --- > > > > > > drivers/gpu/drm/scheduler/sched_main.c | 19 ++- > > > > > > 1 file changed, 18 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > > > > index f093616fe53c..8a46fab5cdc8 100644 > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > > > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct > > > > > > drm_gpu_scheduler > > > > > > *sched) > > > > > I agree with Sima that it should first be docume
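A rough sketch of what the worker-based teardown (steps 2 and 3 above) could look like for a 1:1 entity/scheduler setup. The my_ctx structure and my_ctx_* functions are made-up driver names, drm_sched_flush() is the hypothetical helper discussed in this thread, and step 1 (stopping new submissions) is left out; this is only meant to show how the blocking part moves out of fops->close().

#include <linux/slab.h>
#include <linux/workqueue.h>
#include <drm/drm_drv.h>
#include <drm/gpu_scheduler.h>

struct my_ctx {
	struct drm_device *drm;
	struct drm_gpu_scheduler sched;
	struct drm_sched_entity entity;
	struct work_struct teardown_work;	/* INIT_WORK()ed at context creation */
};

static void my_ctx_teardown_work(struct work_struct *w)
{
	struct my_ctx *ctx = container_of(w, struct my_ctx, teardown_work);

	drm_sched_flush(&ctx->sched);		/* a) wait for pending jobs (hypothetical helper) */
	drm_sched_entity_fini(&ctx->entity);	/* b) tear down the entity ... */
	drm_sched_fini(&ctx->sched);		/*    ... and then the scheduler */
	drm_dev_put(ctx->drm);			/* c) drop the reference taken below */
	kfree(ctx);
}

/* Called from the driver's postclose path; must not block there. */
static void my_ctx_destroy(struct my_ctx *ctx)
{
	drm_dev_get(ctx->drm);			/* 2. keep the drm_device alive past close() */
	queue_work(system_unbound_wq, &ctx->teardown_work);	/* 3. defer the blocking part */
}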
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 25.09.24 um 16:53 schrieb Philipp Stanner: On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: Am 20.09.24 um 15:26 schrieb Philipp Stanner: On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: Am 20.09.24 um 10:57 schrieb Philipp Stanner: On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. Did you have time yet to look into my proposed waitque- solution? I don't remember seeing anything. What have I missed? https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ Mhm, I didn't got that in my inbox for some reason. Interesting approach, I'm just not sure if we can or should wait in drm_sched_fini(). We do agree that jobs still pending when drm_sched_fini() starts is always a bug, right? Correct, the question is how to avoid that. If so, what are the disadvantages of waiting in drm_sched_fini()? We could block buggy drivers as I see it. Which wouldn't be good, but could then be fixed on drivers' site. Sima explained that pretty well: Don't block in fops->close, do that in fops->flush instead. One issue this solves is that when you send a SIGTERM the tear down handling first flushes all the FDs and then closes them. So if flushing the FDs blocks because the process initiated sending a terabyte of data over a 300bps line (for example) you can still throw a SIGKILL and abort that as well. If you would block in fops-close() that SIGKILL won't have any effect any more because by the time close() is called the process is gone and signals are already blocked. And yes when I learned about that issue I was also buffed that handling like this in the UNIX design is nearly 50 years old and still applies to today. Probably better to make that a separate function, something like drm_sched_flush() or similar. We could do that. Such a function could then be called by drivers which are not sure whether all jobs are done before they start tearing down. Yes exactly that's the idea. And give that flush function a return code so that it can return -EINTR. Yeah I don't think we should smash this into drm_sched_fini unconditionally. I think conceptually there's about three cases: - Ringbuffer schedules. Probably want everything as-is, because drm_sched_fini is called long after all the entities are gone in drm_device cleanup. - fw scheduler hardware with preemption support. There we probably want to nuke the context by setting the tdr timeout to zero (or maybe just as long as context preemption takes to be efficient), and relying on the normal gpu reset flow to handle things. drm_sched_entity_flush kinda does this, except not really and it's a lot more focused on the ringbuffer context. So maybe we want a new drm_sched_entity_kill. For this case calling drm_sched_fini() after the 1:1 entity is gone should not find any linger jobs, it would actually be a bug somewhere if there's a job lingering. Maybe a sanity check that there's not just no jobs lingering, but also no entity left would be good here? The check for lingering ones is in Christian's patch here IISC. At which position would you imagine the check for the entity being performed? - fw scheduler without preemption support. There we kinda need the drm_sched_flush, except blocking in fops->close is not great. So instead I think the following is better: 1. 
drm_sched_entity_stopped, which only stops new submissions (for paranoia) but doesn't tear down the entity Who would call that function? Drivers using it voluntarily could just as well stop accepting new jobs from userspace to their entities, couldn't they? 2. drm_dev_get 3. launch a worker which does a) drm_sched_flush (or drm_sched_entity_flush or whatever we call it) b) drm_sched_entity_fini + drm_sched_fini c) drm_dev_put Note that semantically this implements the refcount in the other path from Phillip: https://lore.kernel.org/all/20240903094531.29893-2-pstan...@redhat.com/ Except it doesn't impose refcount on everyone else who doesn't need it, and it doesn't even impose refcounting on drivers that do need it because we use drm_sched_flush and a worker to achieve the same. I indeed wasn't happy with the refcount approach for that reason, agreed. Essentially helper functions for the common use-cases instead of trying to solve them all by putting drm_sched_flush as a potentially very blocking function into drm_sched_fini. I'm still not able to see why it blocking would be undesired – as far as I can see, it is only invoked on driver teardown, so not during active operation. Teardown doesn't happen that often, and it can (if implemented correctly) only block until the driver's code has signaled the last
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote: > On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: > > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > > > > > > Tearing down the scheduler with jobs still on the pending > > > > > > list > > > > > > can > > > > > > lead to use after free issues. Add a warning if drivers try > > > > > > to > > > > > > destroy a scheduler which still has work pushed to the HW. > > > > > Did you have time yet to look into my proposed waitque- > > > > > solution? > > > > I don't remember seeing anything. What have I missed? > > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > Mhm, I didn't got that in my inbox for some reason. > > > > Interesting approach, I'm just not sure if we can or should wait in > > drm_sched_fini(). We do agree that jobs still pending when drm_sched_fini() starts is always a bug, right? If so, what are the disadvantages of waiting in drm_sched_fini()? We could block buggy drivers as I see it. Which wouldn't be good, but could then be fixed on drivers' site. > > > > Probably better to make that a separate function, something like > > drm_sched_flush() or similar. We could do that. Such a function could then be called by drivers which are not sure whether all jobs are done before they start tearing down. > > Yeah I don't think we should smash this into drm_sched_fini > unconditionally. I think conceptually there's about three cases: > > - Ringbuffer schedules. Probably want everything as-is, because > drm_sched_fini is called long after all the entities are gone in > drm_device cleanup. > > - fw scheduler hardware with preemption support. There we probably > want to > nuke the context by setting the tdr timeout to zero (or maybe just > as > long as context preemption takes to be efficient), and relying on > the > normal gpu reset flow to handle things. drm_sched_entity_flush > kinda > does this, except not really and it's a lot more focused on the > ringbuffer context. So maybe we want a new drm_sched_entity_kill. > > For this case calling drm_sched_fini() after the 1:1 entity is gone > should not find any linger jobs, it would actually be a bug > somewhere if > there's a job lingering. Maybe a sanity check that there's not just > no > jobs lingering, but also no entity left would be good here? The check for lingering ones is in Christian's patch here IISC. At which position would you imagine the check for the entity being performed? > > - fw scheduler without preemption support. There we kinda need the > drm_sched_flush, except blocking in fops->close is not great. So > instead > I think the following is better: > 1. drm_sched_entity_stopped, which only stops new submissions (for > paranoia) but doesn't tear down the entity Who would call that function? Drivers using it voluntarily could just as well stop accepting new jobs from userspace to their entities, couldn't they? > 2. drm_dev_get > 3. 
launch a worker which does a) drm_sched_flush (or > drm_sched_entity_flush or whatever we call it) b) > drm_sched_entity_fini > + drm_sched_fini c) drm_dev_put > > Note that semantically this implements the refcount in the other > path > from Phillip: > > > https://lore.kernel.org/all/20240903094531.29893-2-pstan...@redhat.com/ > > Except it doesn't impose refcount on everyone else who doesn't need > it, > and it doesn't even impose refcounting on drivers that do need it > because we use drm_sched_flush and a worker to achieve the same. I indeed wasn't happy with the refcount approach for that reason, agreed. > > Essentially helper functions for the common use-cases instead of > trying to > solve them all by putting drm_sched_flush as a potentially very > blocking > function into drm_sched_fini. I'm still not able to see why it blocking would be undesired – as far as I can see, it is only invoked on driver teardown, so not during active operation. Teardown doesn't happen that often, and it can (if implemented correctly) only block until the driver's code has signaled the last fences. If that doesn't happen, the block would reveal a bug. But don't get me wrong: I don't want to *push* this solution. I just want to understand when it could become a problem. Wouldn't an explicitly blocking, separate function like drm_sched_flush() or drm_sched_fini_flush() be a small, doable step towards the right direction? > > > > > > > > When there are still entities with jobs the situation is > > > > > > even > > > > > > worse > > > > > > since the dma_fences for those jobs can never signal we can > > > > > > just > > > > > > choose between potentially locking up core memory > > > > > > management and > > > > > > random memory corruption. When drivers really mess
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote: > Am 20.09.24 um 15:26 schrieb Philipp Stanner: > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > > > > > Tearing down the scheduler with jobs still on the pending list > > > > > can > > > > > lead to use after free issues. Add a warning if drivers try to > > > > > destroy a scheduler which still has work pushed to the HW. > > > > Did you have time yet to look into my proposed waitque-solution? > > > I don't remember seeing anything. What have I missed? > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > Mhm, I didn't got that in my inbox for some reason. > > Interesting approach, I'm just not sure if we can or should wait in > drm_sched_fini(). > > Probably better to make that a separate function, something like > drm_sched_flush() or similar. Yeah I don't think we should smash this into drm_sched_fini unconditionally. I think conceptually there's about three cases: - Ringbuffer schedules. Probably want everything as-is, because drm_sched_fini is called long after all the entities are gone in drm_device cleanup. - fw scheduler hardware with preemption support. There we probably want to nuke the context by setting the tdr timeout to zero (or maybe just as long as context preemption takes to be efficient), and relying on the normal gpu reset flow to handle things. drm_sched_entity_flush kinda does this, except not really and it's a lot more focused on the ringbuffer context. So maybe we want a new drm_sched_entity_kill. For this case calling drm_sched_fini() after the 1:1 entity is gone should not find any linger jobs, it would actually be a bug somewhere if there's a job lingering. Maybe a sanity check that there's not just no jobs lingering, but also no entity left would be good here? - fw scheduler without preemption support. There we kinda need the drm_sched_flush, except blocking in fops->close is not great. So instead I think the following is better: 1. drm_sched_entity_stopped, which only stops new submissions (for paranoia) but doesn't tear down the entity 2. drm_dev_get 3. launch a worker which does a) drm_sched_flush (or drm_sched_entity_flush or whatever we call it) b) drm_sched_entity_fini + drm_sched_fini c) drm_dev_put Note that semantically this implements the refcount in the other path from Phillip: https://lore.kernel.org/all/20240903094531.29893-2-pstan...@redhat.com/ Except it doesn't impose refcount on everyone else who doesn't need it, and it doesn't even impose refcounting on drivers that do need it because we use drm_sched_flush and a worker to achieve the same. Essentially helper functions for the common use-cases instead of trying to solve them all by putting drm_sched_flush as a potentially very blocking function into drm_sched_fini. > > > > > When there are still entities with jobs the situation is even > > > > > worse > > > > > since the dma_fences for those jobs can never signal we can just > > > > > choose between potentially locking up core memory management and > > > > > random memory corruption. When drivers really mess it up that > > > > > well > > > > > let them run into a BUG_ON(). 
> > > > > > > > > > Signed-off-by: Christian König > > > > > --- > > > > > drivers/gpu/drm/scheduler/sched_main.c | 19 ++- > > > > > 1 file changed, 18 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > > > index f093616fe53c..8a46fab5cdc8 100644 > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct > > > > > drm_gpu_scheduler > > > > > *sched) > > > > I agree with Sima that it should first be documented in the > > > > function's > > > > docstring what the user is expected to have done before calling the > > > > function. > > > Good point, going to update the documentation as well. > > Cool thing, thx. > > Would be great if everything (not totally trivial) necessary to be done > > before _fini() is mentioned. > > > > One could also think about providing a hint at how the driver can do > > that. AFAICS the only way for the driver to ensure that is to maintain > > its own, separate list of submitted jobs. > > Even with a duplicated pending list it's actually currently impossible to do > this fully cleanly. > > The problem is that the dma_fence object gives no guarantee when callbacks > are processed, e.g. they can be both processed from interrupt context as > well as from a CPU which called dma_fence_is_signaled(). > > So when a driver (or drm_sched_fini) waits for the last submitted fence it > actually can be that the drm_sched object still needs to do some proce
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 20.09.24 um 15:26 schrieb Philipp Stanner: On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: Am 20.09.24 um 10:57 schrieb Philipp Stanner: On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. Did you have time yet to look into my proposed waitque-solution? I don't remember seeing anything. What have I missed? https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ Mhm, I didn't got that in my inbox for some reason. Interesting approach, I'm just not sure if we can or should wait in drm_sched_fini(). Probably better to make that a separate function, something like drm_sched_flush() or similar. When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well let them run into a BUG_ON(). Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f093616fe53c..8a46fab5cdc8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) I agree with Sima that it should first be documented in the function's docstring what the user is expected to have done before calling the function. Good point, going to update the documentation as well. Cool thing, thx. Would be great if everything (not totally trivial) necessary to be done before _fini() is mentioned. One could also think about providing a hint at how the driver can do that. AFAICS the only way for the driver to ensure that is to maintain its own, separate list of submitted jobs. Even with a duplicated pending list it's actually currently impossible to do this fully cleanly. The problem is that the dma_fence object gives no guarantee when callbacks are processed, e.g. they can be both processed from interrupt context as well as from a CPU which called dma_fence_is_signaled(). So when a driver (or drm_sched_fini) waits for the last submitted fence it actually can be that the drm_sched object still needs to do some processing. See the hack in amdgpu_vm_tlb_seq() for more background on the problem. Regards, Christian. P. Thanks, Christian. P. drm_sched_wqueue_stop(sched); + /* +* Tearing down the scheduler wile there are still unprocessed jobs can +* lead to use after free issues in the scheduler fence. +*/ + WARN_ON(!list_empty(&sched->pending_list)); + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) + list_for_each_entry(s_entity, &rq->entities, list) { + /* +* The justification for this BUG_ON() is that tearing +* down the scheduler while jobs are pending leaves +* dma_fences unsignaled. Since we have dependencies +* from the core memory management to eventually signal +* dma_fences this can trivially lead to a system wide +* stop because of a locked up memory management. 
+*/ + BUG_ON(spsc_queue_count(&s_entity->job_queue)); + /* * Prevents reinsertion and marks job_queue as idle, * it will removed from rq in drm_sched_entity_fini * eventually */ s_entity->stopped = true; + } spin_unlock(&rq->lock); kfree(sched->sched_rq[i]); }
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote: > Am 20.09.24 um 10:57 schrieb Philipp Stanner: > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > > > Tearing down the scheduler with jobs still on the pending list > > > can > > > lead to use after free issues. Add a warning if drivers try to > > > destroy a scheduler which still has work pushed to the HW. > > Did you have time yet to look into my proposed waitque-solution? > > I don't remember seeing anything. What have I missed? https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/ > > > > > > When there are still entities with jobs the situation is even > > > worse > > > since the dma_fences for those jobs can never signal we can just > > > choose between potentially locking up core memory management and > > > random memory corruption. When drivers really mess it up that > > > well > > > let them run into a BUG_ON(). > > > > > > Signed-off-by: Christian König > > > --- > > > drivers/gpu/drm/scheduler/sched_main.c | 19 ++- > > > 1 file changed, 18 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > index f093616fe53c..8a46fab5cdc8 100644 > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct > > > drm_gpu_scheduler > > > *sched) > > I agree with Sima that it should first be documented in the > > function's > > docstring what the user is expected to have done before calling the > > function. > > Good point, going to update the documentation as well. Cool thing, thx. Would be great if everything (not totally trivial) necessary to be done before _fini() is mentioned. One could also think about providing a hint at how the driver can do that. AFAICS the only way for the driver to ensure that is to maintain its own, separate list of submitted jobs. P. > > Thanks, > Christian. > > > > > P. > > > > > > > > drm_sched_wqueue_stop(sched); > > > > > > + /* > > > + * Tearing down the scheduler wile there are still > > > unprocessed jobs can > > > + * lead to use after free issues in the scheduler fence. > > > + */ > > > + WARN_ON(!list_empty(&sched->pending_list)); > > > + > > > for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; > > > i++) > > > { > > > struct drm_sched_rq *rq = sched->sched_rq[i]; > > > > > > spin_lock(&rq->lock); > > > - list_for_each_entry(s_entity, &rq->entities, > > > list) > > > + list_for_each_entry(s_entity, &rq->entities, > > > list) { > > > + /* > > > + * The justification for this BUG_ON() > > > is > > > that tearing > > > + * down the scheduler while jobs are > > > pending > > > leaves > > > + * dma_fences unsignaled. Since we have > > > dependencies > > > + * from the core memory management to > > > eventually signal > > > + * dma_fences this can trivially lead to > > > a > > > system wide > > > + * stop because of a locked up memory > > > management. > > > + */ > > > + BUG_ON(spsc_queue_count(&s_entity- > > > > job_queue)); > > > + > > > /* > > > * Prevents reinsertion and marks > > > job_queue > > > as idle, > > > * it will removed from rq in > > > drm_sched_entity_fini > > > * eventually > > > */ > > > s_entity->stopped = true; > > > + } > > > spin_unlock(&rq->lock); > > > kfree(sched->sched_rq[i]); > > > } >
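To make the hint concrete, such a driver-side list could be kept roughly as sketched below. The my_dev/my_job names and the inflight fields are made up for this illustration, and it comes with the caveat from Christian's reply above: the scheduler may still be processing fence callbacks even after the last job has been removed from the driver's own list, so this alone is not a complete answer.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <drm/gpu_scheduler.h>

struct my_dev {
	struct list_head inflight;	/* driver's own list of submitted jobs */
	spinlock_t inflight_lock;
	wait_queue_head_t inflight_wq;
};

struct my_job {
	struct drm_sched_job base;
	struct list_head node;		/* entry on my_dev.inflight */
};

/* Submission path: remember the job before handing it to the scheduler. */
static void my_job_submit(struct my_dev *mydev, struct my_job *job)
{
	spin_lock(&mydev->inflight_lock);
	list_add_tail(&job->node, &mydev->inflight);
	spin_unlock(&mydev->inflight_lock);
	drm_sched_entity_push_job(&job->base);
}

/* ->free_job() path: forget the job again and wake up anyone flushing. */
static void my_job_retire(struct my_dev *mydev, struct my_job *job)
{
	spin_lock(&mydev->inflight_lock);
	list_del(&job->node);
	spin_unlock(&mydev->inflight_lock);
	wake_up_all(&mydev->inflight_wq);
}

/* Called by the driver before drm_sched_fini(). */
static void my_dev_flush_jobs(struct my_dev *mydev)
{
	wait_event(mydev->inflight_wq, list_empty(&mydev->inflight));
}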
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 20.09.24 um 10:57 schrieb Philipp Stanner: On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. Did you have time yet to look into my proposed waitque-solution? I don't remember seeing anything. What have I missed? When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well let them run into a BUG_ON(). Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f093616fe53c..8a46fab5cdc8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) I agree with Sima that it should first be documented in the function's docstring what the user is expected to have done before calling the function. Good point, going to update the documentation as well. Thanks, Christian. P. drm_sched_wqueue_stop(sched); + /* +* Tearing down the scheduler wile there are still unprocessed jobs can +* lead to use after free issues in the scheduler fence. +*/ + WARN_ON(!list_empty(&sched->pending_list)); + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) + list_for_each_entry(s_entity, &rq->entities, list) { + /* +* The justification for this BUG_ON() is that tearing +* down the scheduler while jobs are pending leaves +* dma_fences unsignaled. Since we have dependencies +* from the core memory management to eventually signal +* dma_fences this can trivially lead to a system wide +* stop because of a locked up memory management. +*/ + BUG_ON(spsc_queue_count(&s_entity- job_queue)); + /* * Prevents reinsertion and marks job_queue as idle, * it will removed from rq in drm_sched_entity_fini * eventually */ s_entity->stopped = true; + } spin_unlock(&rq->lock); kfree(sched->sched_rq[i]); }
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote: > Tearing down the scheduler with jobs still on the pending list can > lead to use after free issues. Add a warning if drivers try to > destroy a scheduler which still has work pushed to the HW. Did you have time yet to look into my proposed waitque-solution? > > When there are still entities with jobs the situation is even worse > since the dma_fences for those jobs can never signal we can just > choose between potentially locking up core memory management and > random memory corruption. When drivers really mess it up that well > let them run into a BUG_ON(). > > Signed-off-by: Christian König > --- > drivers/gpu/drm/scheduler/sched_main.c | 19 ++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index f093616fe53c..8a46fab5cdc8 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler > *sched) I agree with Sima that it should first be documented in the function's docstring what the user is expected to have done before calling the function. P. > > drm_sched_wqueue_stop(sched); > > + /* > + * Tearing down the scheduler wile there are still > unprocessed jobs can > + * lead to use after free issues in the scheduler fence. > + */ > + WARN_ON(!list_empty(&sched->pending_list)); > + > for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) > { > struct drm_sched_rq *rq = sched->sched_rq[i]; > > spin_lock(&rq->lock); > - list_for_each_entry(s_entity, &rq->entities, list) > + list_for_each_entry(s_entity, &rq->entities, list) { > + /* > + * The justification for this BUG_ON() is > that tearing > + * down the scheduler while jobs are pending > leaves > + * dma_fences unsignaled. Since we have > dependencies > + * from the core memory management to > eventually signal > + * dma_fences this can trivially lead to a > system wide > + * stop because of a locked up memory > management. > + */ > + BUG_ON(spsc_queue_count(&s_entity- > >job_queue)); > + > /* > * Prevents reinsertion and marks job_queue > as idle, > * it will removed from rq in > drm_sched_entity_fini > * eventually > */ > s_entity->stopped = true; > + } > spin_unlock(&rq->lock); > kfree(sched->sched_rq[i]); > }
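Purely as an illustration of what such a docstring could spell out (suggested wording only, not the actual kerneldoc):

/**
 * drm_sched_fini - Destroy a gpu scheduler
 * @sched: scheduler instance
 *
 * Tears down and cleans up the scheduler.
 *
 * The caller must ensure that all jobs ever pushed to this scheduler have
 * completed: their hardware fences must have signaled and their free_job()
 * callbacks must have run, and all entities using the scheduler must have
 * been torn down beforehand. Otherwise jobs are leaked and, worse, their
 * dma_fences are left unsignaled, which can lock up core memory management.
 */
void drm_sched_fini(struct drm_gpu_scheduler *sched);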
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Am 18.09.24 um 16:41 schrieb Jani Nikula: On Wed, 18 Sep 2024, "Christian König" wrote: Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well let them run into a BUG_ON(). Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f093616fe53c..8a46fab5cdc8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) drm_sched_wqueue_stop(sched); + /* +* Tearing down the scheduler wile there are still unprocessed jobs can +* lead to use after free issues in the scheduler fence. +*/ + WARN_ON(!list_empty(&sched->pending_list)); drm_WARN_ON(sched->dev, ...) would identify the device, which I presume would be helpful in multi-GPU systems. Good point, going to fix that. Regards, Christian. BR, Jani. + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) + list_for_each_entry(s_entity, &rq->entities, list) { + /* +* The justification for this BUG_ON() is that tearing +* down the scheduler while jobs are pending leaves +* dma_fences unsignaled. Since we have dependencies +* from the core memory management to eventually signal +* dma_fences this can trivially lead to a system wide +* stop because of a locked up memory management. +*/ + BUG_ON(spsc_queue_count(&s_entity->job_queue)); + /* * Prevents reinsertion and marks job_queue as idle, * it will removed from rq in drm_sched_entity_fini * eventually */ s_entity->stopped = true; + } spin_unlock(&rq->lock); kfree(sched->sched_rq[i]); }
Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
On Wed, 18 Sep 2024, "Christian König" wrote: > Tearing down the scheduler with jobs still on the pending list can > lead to use after free issues. Add a warning if drivers try to > destroy a scheduler which still has work pushed to the HW. > > When there are still entities with jobs the situation is even worse > since the dma_fences for those jobs can never signal we can just > choose between potentially locking up core memory management and > random memory corruption. When drivers really mess it up that well > let them run into a BUG_ON(). > > Signed-off-by: Christian König > --- > drivers/gpu/drm/scheduler/sched_main.c | 19 ++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index f093616fe53c..8a46fab5cdc8 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) > > drm_sched_wqueue_stop(sched); > > + /* > + * Tearing down the scheduler wile there are still unprocessed jobs can > + * lead to use after free issues in the scheduler fence. > + */ > + WARN_ON(!list_empty(&sched->pending_list)); drm_WARN_ON(sched->dev, ...) would identify the device, which I presume would be helpful in multi-GPU systems. BR, Jani. > + > for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { > struct drm_sched_rq *rq = sched->sched_rq[i]; > > spin_lock(&rq->lock); > - list_for_each_entry(s_entity, &rq->entities, list) > + list_for_each_entry(s_entity, &rq->entities, list) { > + /* > + * The justification for this BUG_ON() is that tearing > + * down the scheduler while jobs are pending leaves > + * dma_fences unsignaled. Since we have dependencies > + * from the core memory management to eventually signal > + * dma_fences this can trivially lead to a system wide > + * stop because of a locked up memory management. > + */ > + BUG_ON(spsc_queue_count(&s_entity->job_queue)); > + > /* >* Prevents reinsertion and marks job_queue as idle, >* it will removed from rq in drm_sched_entity_fini >* eventually >*/ > s_entity->stopped = true; > + } > spin_unlock(&rq->lock); > kfree(sched->sched_rq[i]); > } -- Jani Nikula, Intel