On Fri, Nov 12, 2021 at 7:09 PM Corey Minyard <miny...@acm.org> wrote:
>
> On Fri, Nov 12, 2021 at 06:54:13PM +0200, Ioanna Alifieraki wrote:
> > Currently when removing an ipmi_user the removal is deferred as a work on
> > the system's workqueue. Although this guarantees the free operation will
> > occur in non atomic context, it can race with the ipmi_msghandler module
> > removal (see [1]). In case a remove_user work is scheduled for removal
> > and shortly after ipmi_msghandler module is removed we can end up in a
> > situation where the module is removed first and when the work is executed
> > the system crashes with:
> > BUG: unable to handle page fault for address: ffffffffc05c3450
> > PF: supervisor instruction fetch in kernel mode
> > PF: error_code(0x0010) - not-present page
> > because the pages of the module are gone. In cleanup_ipmi() there is no
> > easy way to detect if there are any pending works to flush them before
> > removing the module. This patch creates a separate workqueue and schedules
> > the remove_work works on it. When removing the module the workqueue is
> > flushed to avoid the race.
>
> Yeah, this is an issue. One comment below...
>
> >
> > [1] https://bugs.launchpad.net/bugs/1950666
> >
> > Cc: sta...@vger.kernel.org
> > Fixes: 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
> > Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifier...@canonical.com>
> > ---
> >  drivers/char/ipmi/ipmi_msghandler.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> > index deed355422f4..9e0ad2ccd3e0 100644
> > --- a/drivers/char/ipmi/ipmi_msghandler.c
> > +++ b/drivers/char/ipmi/ipmi_msghandler.c
> > @@ -191,6 +191,8 @@ struct ipmi_user {
> >  	struct work_struct remove_work;
> >  };
> >
> > +struct workqueue_struct *remove_work_wq;
> > +
> >  static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
> >  	__acquires(user->release_barrier)
> >  {
> > @@ -1297,7 +1299,7 @@ static void free_user(struct kref *ref)
> >  	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
> >
> >  	/* SRCU cleanup must happen in task context. */
> > -	schedule_work(&user->remove_work);
> > +	queue_work(remove_work_wq, &user->remove_work);
> >  }
> >
> >  static void _ipmi_destroy_user(struct ipmi_user *user)
> > @@ -5383,6 +5385,8 @@ static int ipmi_init_msghandler(void)
> >
> >  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> >
> > +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> > +
>
> Shouldn't you check the return value here?
>
Yes, you're right, my bad.
I'll incorporate Christophe's feedback too and send a v2 next week.
Thanks all for the feedback!

> -corey
>
> >  	initialized = true;
> >
> >  out:
> > @@ -5408,6 +5412,9 @@ static void __exit cleanup_ipmi(void)
> >  	int count;
> >
> >  	if (initialized) {
> > +		flush_workqueue(remove_work_wq);
> > +		destroy_workqueue(remove_work_wq);
> > +
> >  		atomic_notifier_chain_unregister(&panic_notifier_list,
> >  						 &panic_block);
> >
> > --
> > 2.17.1
> >

_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer