Re: [PATCH] workqueue: Fix spurious sanity check failures in destroy_workqueue()
Hello, On Thu, Sep 19, 2019 at 10:49:04AM +0800, Lai Jiangshan wrote: > Looks good to me. > > There is one test in show_pwq() > """ > worker == pwq->wq->rescuer ? "(RESCUER)" : "", > """ > I'm wondering if it needs to be updated to > """ > worker->rescue_wq ? "(RESCUER)" : "", > """ Hmm... yeah, good point. Let's do that. > And document "/* MD: rescue worker */" might be better > than current "/* I: rescue worker */", although ->rescuer can > be accessed without wq_mayday_lock lock in some code. Will apply this one too. Thanks. -- tejun
Re: [PATCH] workqueue: Fix spurious sanity check failures in destroy_workqueue()
Looks good to me. There is one test in show_pwq() """ worker == pwq->wq->rescuer ? "(RESCUER)" : "", """ I'm wondering if it needs to be updated to """ worker->rescue_wq ? "(RESCUER)" : "", """ And document "/* MD: rescue worker */" might be better than current "/* I: rescue worker */", although ->rescuer can be accessed without wq_mayday_lock lock in some code. Reviewed-by: Lai Jiangshan On Thu, Sep 19, 2019 at 9:43 AM Tejun Heo wrote: > > Before actually destrying a workqueue, destroy_workqueue() checks > whether it's actually idle. If it isn't, it prints out a bunch of > warning messages and leaves the workqueue dangling. It unfortunately > has a couple issues. > > * Mayday list queueing increments pwq's refcnts which gets detected as > busy and fails the sanity checks. However, because mayday list > queueing is asynchronous, this condition can happen without any > actual work items left in the workqueue. > > * Sanity check failure leaves the sysfs interface behind too which can > lead to init failure of newer instances of the workqueue. > > This patch fixes the above two by > > * If a workqueue has a rescuer, disable and kill the rescuer before > sanity checks. Disabling and killing is guaranteed to flush the > existing mayday list. > > * Remove sysfs interface before sanity checks. > > Signed-off-by: Tejun Heo > Reported-by: Marcin Pawlowski > Reported-by: "Williams, Gerald S" > Cc: sta...@vger.kernel.org > --- > Applying to wq/for-5.4. > > Thanks. > > kernel/workqueue.c | 24 +++- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 95aea04ff722..73e3ea9e31d3 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -4318,9 +4318,28 @@ void destroy_workqueue(struct workqueue_struct *wq) > struct pool_workqueue *pwq; > int node; > > + /* > +* Remove it from sysfs first so that sanity check failure doesn't > +* lead to sysfs name conflicts. > +*/ > + workqueue_sysfs_unregister(wq); > + > /* drain it before proceeding with destruction */ > drain_workqueue(wq); > > + /* kill rescuer, if sanity checks fail, leave it w/o rescuer */ > + if (wq->rescuer) { > + struct worker *rescuer = wq->rescuer; > + > + /* this prevents new queueing */ > + spin_lock_irq(_mayday_lock); > + wq->rescuer = NULL; > + spin_unlock_irq(_mayday_lock); > + > + /* rescuer will empty maydays list before exiting */ > + kthread_stop(rescuer->task); > + } > + > /* sanity checks */ > mutex_lock(>mutex); > for_each_pwq(pwq, wq) { > @@ -4352,11 +4371,6 @@ void destroy_workqueue(struct workqueue_struct *wq) > list_del_rcu(>list); > mutex_unlock(_pool_mutex); > > - workqueue_sysfs_unregister(wq); > - > - if (wq->rescuer) > - kthread_stop(wq->rescuer->task); > - > if (!(wq->flags & WQ_UNBOUND)) { > wq_unregister_lockdep(wq); > /* > -- > 2.17.1 >
[PATCH] workqueue: Fix spurious sanity check failures in destroy_workqueue()
Before actually destrying a workqueue, destroy_workqueue() checks whether it's actually idle. If it isn't, it prints out a bunch of warning messages and leaves the workqueue dangling. It unfortunately has a couple issues. * Mayday list queueing increments pwq's refcnts which gets detected as busy and fails the sanity checks. However, because mayday list queueing is asynchronous, this condition can happen without any actual work items left in the workqueue. * Sanity check failure leaves the sysfs interface behind too which can lead to init failure of newer instances of the workqueue. This patch fixes the above two by * If a workqueue has a rescuer, disable and kill the rescuer before sanity checks. Disabling and killing is guaranteed to flush the existing mayday list. * Remove sysfs interface before sanity checks. Signed-off-by: Tejun Heo Reported-by: Marcin Pawlowski Reported-by: "Williams, Gerald S" Cc: sta...@vger.kernel.org --- Applying to wq/for-5.4. Thanks. kernel/workqueue.c | 24 +++- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 95aea04ff722..73e3ea9e31d3 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4318,9 +4318,28 @@ void destroy_workqueue(struct workqueue_struct *wq) struct pool_workqueue *pwq; int node; + /* +* Remove it from sysfs first so that sanity check failure doesn't +* lead to sysfs name conflicts. +*/ + workqueue_sysfs_unregister(wq); + /* drain it before proceeding with destruction */ drain_workqueue(wq); + /* kill rescuer, if sanity checks fail, leave it w/o rescuer */ + if (wq->rescuer) { + struct worker *rescuer = wq->rescuer; + + /* this prevents new queueing */ + spin_lock_irq(_mayday_lock); + wq->rescuer = NULL; + spin_unlock_irq(_mayday_lock); + + /* rescuer will empty maydays list before exiting */ + kthread_stop(rescuer->task); + } + /* sanity checks */ mutex_lock(>mutex); for_each_pwq(pwq, wq) { @@ -4352,11 +4371,6 @@ void destroy_workqueue(struct workqueue_struct *wq) list_del_rcu(>list); mutex_unlock(_pool_mutex); - workqueue_sysfs_unregister(wq); - - if (wq->rescuer) - kthread_stop(wq->rescuer->task); - if (!(wq->flags & WQ_UNBOUND)) { wq_unregister_lockdep(wq); /* -- 2.17.1