Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 2:13 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
>> I see...
>> How about only check those workqueue structure not on stack?
>> For current onstack usage is rare, and should be easier to check with.
>
> No, kzalloc is not required. The memory area can come from any source.
> If you're interested in improving the debug situation, please stop
> trying to do something impossible (you can't track lifetime of random
> memory area without extra bookkeeping) and try to find out why debugobj
> didn't detect it (did you even enable it? either via
> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or "debug_objects" boot param).

I see... Thanks for the detailed explanation! :)

Thanks,
Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
> I see...
> How about only check those workqueue structure not on stack?
> For current onstack usage is rare, and should be easier to check with.

No, kzalloc is not required. The memory area can come from any source.
If you're interested in improving the debug situation, please stop
trying to do something impossible (you can't track lifetime of random
memory area without extra bookkeeping) and try to find out why debugobj
didn't detect it (did you even enable it? either via
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or "debug_objects" boot param).

--
tejun
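[Editor's note: for readers following along, debugobjects has to be both built in and activated before it can flag a double INIT_WORK. A rough sketch of checking the relevant kernel configuration and boot parameter (option names are real for this era's kernels; the shown output lines are only what a typical enabled configuration would look like):]

```shell
# Build-time: the debugobjects core plus the workqueue hooks.
# (CONFIG_DEBUG_OBJECTS_WORK depends on CONFIG_DEBUG_OBJECTS.)
grep -E '^CONFIG_DEBUG_OBJECTS(_WORK|_ENABLE_DEFAULT)?=' .config
# CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_WORK=y
# CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1

# Run-time: if ENABLE_DEFAULT is 0, debugobjects stays dormant unless
# the "debug_objects" parameter is on the kernel command line.
grep -o 'debug_objects' /proc/cmdline
```

If `ENABLE_DEFAULT` is 0 and `debug_objects` is absent from the command line, the CONFIG options alone do nothing, which is one plausible reason an enabled-looking build reports no issue.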
Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 1:40 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
>> > Memory areas aren't always zero on allocation.
>>
>> Shouldn't work structure be allocated with kzalloc?
>
> It's not required to. work_struct can also be on stack. It's "init"
> after all. Also, if you require clearing the memory before initing,
> you would be just shifting problem from INIT to memory clearing.

I see...
How about only check those workqueue structure not on stack?
For current onstack usage is rare, and should be easier to check with.

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
> > Memory areas aren't always zero on allocation.
>
> Shouldn't work structure be allocated with kzalloc?

It's not required to. work_struct can also be on stack. It's "init"
after all. Also, if you require clearing the memory before initing,
you would be just shifting problem from INIT to memory clearing.

--
tejun
Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 1:24 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
>> > You're initializing random piece of memory which may contain any
>> > garbage and triggering BUG if some bit is set on it. No, you can't do
>> > that. debugobj is the right tool for debugging object lifetime issues
>> > and is already supported.
>>
>> The debugobj is not helping on this issue, I have enabled both
>> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS.
>> But find they didn't report any issue at all.
>
> It should. No idea why it didn't. Would be interesting to find out
> why.

No idea about it also...

>> And I am not init random memory, original issue is call init multi-times
>> for one structure and that piece of memory already been allocated.
>> And __INIT_WORK shouldn't call over random memory, right?
>
> Memory areas aren't always zero on allocation.

Shouldn't work structure be allocated with kzalloc?

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
> > You're initializing random piece of memory which may contain any
> > garbage and triggering BUG if some bit is set on it. No, you can't do
> > that. debugobj is the right tool for debugging object lifetime issues
> > and is already supported.
>
> The debugobj is not helping on this issue, I have enabled both
> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS.
> But find they didn't report any issue at all.

It should. No idea why it didn't. Would be interesting to find out
why.

> And I am not init random memory, original issue is call init multi-times
> for one structure and that piece of memory already been allocated.
> And __INIT_WORK shouldn't call over random memory, right?

Memory areas aren't always zero on allocation.

--
tejun
Re: workqueue panic in 3.4 kernel
Tejun,

On Tue, Mar 12, 2013 at 1:12 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>> index 8afab27..425d5a2 100644
>> --- a/include/linux/workqueue.h
>> +++ b/include/linux/workqueue.h
>> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>>   * NOTE! No point in using "atomic_long_set()": using a direct
>>   * assignment of the work data initializer allows the compiler
>>   * to generate better code.
>> + *
>> + * We take the assumption that work should not be inited if it already
>> + * hold the pending bit, or bug would be reported.
>>   */
>>  #ifdef CONFIG_LOCKDEP
>>  #define __INIT_WORK(_work, _func, _onstack)			\
>>  	do {							\
>>  		static struct lock_class_key __key;		\
>>  								\
>> +		BUG_ON(work_pending(_work));			\
>
> You're initializing random piece of memory which may contain any
> garbage and triggering BUG if some bit is set on it. No, you can't do
> that. debugobj is the right tool for debugging object lifetime issues
> and is already supported.

The debugobj is not helping on this issue, I have enabled both
CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS.
But find they didn't report any issue at all.

And I am not init random memory, original issue is call init multi-times
for one structure and that piece of memory already been allocated.
And __INIT_WORK shouldn't call over random memory, right?
All this patch is adding a check here.

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
Hello,

On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 8afab27..425d5a2 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>   * NOTE! No point in using "atomic_long_set()": using a direct
>   * assignment of the work data initializer allows the compiler
>   * to generate better code.
> + *
> + * We take the assumption that work should not be inited if it already
> + * hold the pending bit, or bug would be reported.
>   */
>  #ifdef CONFIG_LOCKDEP
>  #define __INIT_WORK(_work, _func, _onstack)			\
>  	do {							\
>  		static struct lock_class_key __key;		\
>  								\
> +		BUG_ON(work_pending(_work));			\

You're initializing random piece of memory which may contain any
garbage and triggering BUG if some bit is set on it. No, you can't do
that. debugobj is the right tool for debugging object lifetime issues
and is already supported.

Thanks.

--
tejun
Re: workqueue panic in 3.4 kernel
On Thu, 7 Mar 2013, Tejun Heo wrote:
> I can't see how something like that would happen and still find it
> quite unlikely this would be a generic problem in either timer or
> workqueue given how widely those are used and your case is the only
> similar case that came up till now (and 3.4 is a long time ago).
> Thomas, any ideas?

debugobjects might give a hint.

Thanks,

	tglx
Re: workqueue panic in 3.4 kernel
(cc'ing Thomas, hi!)

Hello,

Lei is seeing a problem where a delayed_work item gets corrupted (its
work->data gets cleared while still queued on the timer). He thinks
what's going on is that del_timer() is returning 1 but the timer
function still gets executed.

On Thu, Mar 07, 2013 at 11:22:40PM +0800, Lei Wen wrote:
> >> If del_timer() happens after the timer starts running, del_timer()
> >> would return NULL and try_to_grab_pending() will be called which will
> >> return >=0 iff it successfully steals the PENDING bit (ie. it's the
> >> sole owner of the work item). If del_timer() happens before the timer
> >> starts running, the timer function would never run.
> >
> > If del_timer() happen before __run_timers() is called, while timer irq
> > already happen, would it return 1 for the timer is still not detached
> > in __run_timers()? If it is possible, then we would call
> > try_to_grab_pending(), so that work->data would be cleared in this way.
> >
> >> clear_work_data() happens iff the work item is confirmed to be idle.
> >> At this point, I'm pretty skeptical this is a bug in workqueue itself
> >> and strongly suggest looking at the crashing workqueue user.
> >
> > Also I am not very familiar with workqueue mechanism, how many place
> > in kernel would clear the work->data beside the clear_work_data()?

Work item initialization and clear_work_data() are the only places and
from the looks of it you definitely seem to be hitting clear_work_data().

> > From the memory, I cannot find any hint for work structure being
> > destroyed. So the only possible seems to me is the work->data be set
> > by someone on purpose.
> >
> > crash> struct delayed_work 0xbf03d544 -x
> > struct delayed_work {
> >   work = {
> >     data = {
> >       counter = 0x300
> >     },
> >     entry = {
> >       next = 0xbf03d548,
> >       prev = 0xbf03d548
> >     },
> >     func = 0xbf014b00
> >   },
> >   timer = {
> >     entry = {
> >       next = 0x0,
> >       prev = 0x200200
> >     },
> >     expires = 0x12b638b,
> >     base = 0xc0844e01,
> >     function = 0xc014c7a0 <delayed_work_timer_fn>,
> >     data = 0xbf03d544,
> >     slack = 0x,
> >     start_pid = 0x,
> >     start_site = 0x0,
> >     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> >   }
> > }
>
> I captured a trace log, which shows my previous suspicion is true:
> __cancel_work_timer is called before delayed_work_timer_fn, so that
> work->data is cleared.
>
> And when __cancel_work_timer is called, the timer is still pending,
> so del_timer would return 1, thus no try_to_grab_pending would be called.
>
> But it is very strange that in __run_timers, it still get the same timer.
> Then its callback, delayed_work_timer_fn, would be called, which cause
> the issue.
>
> The detach_timer in __cancel_work_timer should already move the timer
> from all list, am I right?

Yes.

> Could it happen for the timer_list be queued twice, like queue over two cpu?
> If not, how could it happen?

I can't see how something like that would happen and still find it
quite unlikely this would be a generic problem in either timer or
workqueue given how widely those are used and your case is the only
similar case that came up till now (and 3.4 is a long time ago).

Thomas, any ideas?

Thanks.

--
tejun
Re: workqueue panic in 3.4 kernel
Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We find a race condition as below:
>>>
>>> CPU0                                      CPU1
>>> timer interrupt happen
>>> __run_timers
>>>   __run_timers::spin_lock_irq(&base->lock)
>>>   __run_timers::spin_unlock_irq(&base->lock)
>>>                                           __cancel_work_timer
>>>                                           __cancel_work_timer::del_timer
>>>                                           __cancel_work_timer::wait_on_work
>>>                                           __cancel_work_timer::clear_work_data
>>>   __run_timers::call_timer_fn(timer, fn, data);
>>>     delayed_work_timer_fn::get_work_cwq
>>>   __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to be run over cpu1 __BEFORE__
>>> cpu0 is ready to run the timer callback, which is delayed_work_timer_fn
>>> in our case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return NULL and try_to_grab_pending() will be called which will
>> return >=0 iff it successfully steals the PENDING bit (ie. it's the
>> sole owner of the work item). If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happen before __run_timers() is called, while timer irq
> already happen, would it return 1 for the timer is still not detached
> in __run_timers()? If it is possible, then we would call
> try_to_grab_pending(), so that work->data would be cleared in this way.
>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also I am not very familiar with workqueue mechanism, how many place
> in kernel would clear the work->data beside the clear_work_data()?
>
> From the memory, I cannot find any hint for work structure being destroyed.
> So the only possible seems to me is the work->data be set by someone on
> purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
>     data = {
>       counter = 0x300
>     },
>     entry = {
>       next = 0xbf03d548,
>       prev = 0xbf03d548
>     },
>     func = 0xbf014b00
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0x200200
>     },
>     expires = 0x12b638b,
>     base = 0xc0844e01,
>     function = 0xc014c7a0 <delayed_work_timer_fn>,
>     data = 0xbf03d544,
>     slack = 0x,
>     start_pid = 0x,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }

I captured a trace log, which shows my previous suspicion is true:
__cancel_work_timer is called before delayed_work_timer_fn, so that
work->data is cleared.

And when __cancel_work_timer is called, the timer is still pending,
so del_timer would return 1, thus no try_to_grab_pending would be called.

But it is very strange that in __run_timers, it still get the same timer.
Then its callback, delayed_work_timer_fn, would be called, which cause
the issue.

The detach_timer in __cancel_work_timer should already move the timer
from all list, am I right?

Could it happen for the timer_list be queued twice, like queue over two cpu?
If not, how could it happen?

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
Hi Tejun,

On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
> Hello, Lei.
>
> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>> We find a race condition as below:
>> CPU0                                            CPU1
>> timer interrupt happen
>> __run_timers
>>   __run_timers::spin_lock_irq(&base->lock)
>>   __run_timers::spin_unlock_irq(&base->lock)
>>                                                 __cancel_work_timer
>>                                                 __cancel_work_timer::del_timer
>>                                                 __cancel_work_timer::wait_on_work
>>                                                 __cancel_work_timer::clear_work_data
>>   __run_timers::call_timer_fn(timer, fn, data);
>>     delayed_work_timer_fn::get_work_cwq
>>   __run_timers::spin_lock_irq(&base->lock)
>>
>> It is possible for __cancel_work_timer to be run over cpu1 __BEFORE__
>> cpu0 is ready to run the timer callback, which is delayed_work_timer_fn
>> in our case.
>
> If del_timer() happens after the timer starts running, del_timer()
> would return 0 and try_to_grab_pending() will be called, which will
> return >=0 iff it successfully steals the PENDING bit (ie. it's the
> sole owner of the work item). If del_timer() happens before the timer
> starts running, the timer function would never run.

If del_timer() happens before __run_timers() is called, while the timer
irq has already happened, would it return 1, since the timer is still
not detached in __run_timers()? If that is possible, then we would call
try_to_grab_pending(), so that work->data would be cleared in this way.

> clear_work_data() happens iff the work item is confirmed to be idle.
> At this point, I'm pretty skeptical this is a bug in workqueue itself
> and strongly suggest looking at the crashing workqueue user.

Also, I am not very familiar with the workqueue mechanism; how many
places in the kernel would clear work->data besides clear_work_data()?

From the memory, I cannot find any hint of the work structure being
destroyed. So the only possibility, it seems to me, is that work->data
is set by someone on purpose.
crash> struct delayed_work 0xbf03d544 -x
struct delayed_work {
  work = {
    data = {
      counter = 0x300
    },
    entry = {
      next = 0xbf03d548,
      prev = 0xbf03d548
    },
    func = 0xbf014b00
  },
  timer = {
    entry = {
      next = 0x0,
      prev = 0x200200
    },
    expires = 0x12b638b,
    base = 0xc0844e01,
    function = 0xc014c7a0 <delayed_work_timer_fn>,
    data = 0xbf03d544,
    slack = 0x,
    start_pid = 0x,
    start_site = 0x0,
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }
}

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
Hello, Lei.

On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
> We find a race condition as below:
> CPU0                                            CPU1
> timer interrupt happen
> __run_timers
>   __run_timers::spin_lock_irq(&base->lock)
>   __run_timers::spin_unlock_irq(&base->lock)
>                                                 __cancel_work_timer
>                                                 __cancel_work_timer::del_timer
>                                                 __cancel_work_timer::wait_on_work
>                                                 __cancel_work_timer::clear_work_data
>   __run_timers::call_timer_fn(timer, fn, data);
>     delayed_work_timer_fn::get_work_cwq
>   __run_timers::spin_lock_irq(&base->lock)
>
> It is possible for __cancel_work_timer to be run over cpu1 __BEFORE__
> cpu0 is ready to run the timer callback, which is delayed_work_timer_fn
> in our case.

If del_timer() happens after the timer starts running, del_timer()
would return 0 and try_to_grab_pending() will be called, which will
return >=0 iff it successfully steals the PENDING bit (ie. it's the
sole owner of the work item). If del_timer() happens before the timer
starts running, the timer function would never run.

clear_work_data() happens iff the work item is confirmed to be idle.
At this point, I'm pretty skeptical this is a bug in workqueue itself
and strongly suggest looking at the crashing workqueue user.

Thanks.

--
tejun
Re: workqueue panic in 3.4 kernel
Hi Tejun,

On Wed, Mar 6, 2013 at 12:32 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
>> With checking memory, we find work->data becomes 0x300 when it tries
>> to call get_work_cwq
>
> Why would that become 0x300? Who's writing to that memory? Nobody
> should be.

We find a race condition as below:
CPU0                                            CPU1
timer interrupt happen
__run_timers
  __run_timers::spin_lock_irq(&base->lock)
  __run_timers::spin_unlock_irq(&base->lock)
                                                __cancel_work_timer
                                                __cancel_work_timer::del_timer
                                                __cancel_work_timer::wait_on_work
                                                __cancel_work_timer::clear_work_data
  __run_timers::call_timer_fn(timer, fn, data);
    delayed_work_timer_fn::get_work_cwq
  __run_timers::spin_lock_irq(&base->lock)

It is possible for __cancel_work_timer to be run over cpu1 __BEFORE__
cpu0 is ready to run the timer callback, which is delayed_work_timer_fn
in our case.

Although __cancel_work_timer would call wait_on_work to wait for the
already inserted work to complete, the work is not queued yet since its
callback has not been executed, so wait_on_work returns directly and
clear_work_data clears work->data. Thus when delayed_work_timer_fn is
called, it sees work->data as 0x300.

Do you think this kind of sequence is possible?

>> in delayed_work_timer_fn. Thus cwq becomes NULL before it calls
>> __queue_work. So it is reasonable that the kernel panics when it
>> tries to access wq with cwq->wq.
>>
>> To fix it, we try to backport the patches below:
>>
>> commit 60c057bca22285efefbba033624763a778f243bf
>> Author: Lai Jiangshan
>> Date:   Wed Feb 6 18:04:53 2013 -0800
>>
>>     workqueue: add delayed_work->wq to simplify reentrancy handling
>>
>> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
>> Author: Tejun Heo
>> Date:   Wed Aug 8 09:38:42 2012 -0700
>>
>>     workqueue: fix CPU binding of flush_delayed_work[_sync]()
>
> Neither should affect the problem you described above.
> It *could* make the problem go away just because it would stop using
> wq->data to record cwq if the corruption was contained to that field,
> but that isn't a proper fix and the underlying problem could easily
> cause other issues.
>
>> And add below change to make sure __cancel_work_timer cannot preempt
>> between run_timer_softirq and delayed_work_timer_fn.
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index bf4888c..0e9f77c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
>>                 ret = (timer && likely(del_timer(timer)));
>>                 if (!ret)
>>                         ret = try_to_grab_pending(work);
>> -               wait_on_work(work);
>> +               flush_work(work);
>>         } while (unlikely(ret < 0));
>>
>>         clear_work_data(work);
>>
>> Do you think this fix is enough? And is adding flush_work directly in
>> __cancel_work_timer ok for the fix?
>
> Maybe I'm missing something but it looks like the root cause hasn't
> been diagnosed and you're piling up band-aids in workqueue code. You
> might get away with it but could also be making the problem just more
> obscure and difficult to debug (reproducible bugs are happy bugs).
>
> I'd suggest finding out who owns the delayed_work item and examining
> why the delayed_work is getting corrupted in the first place.
>
> Thanks.
>
> --
> tejun

Thanks,
Lei
Re: workqueue panic in 3.4 kernel
Hello,

On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
> With checking memory, we find work->data becomes 0x300 when it tries
> to call get_work_cwq

Why would that become 0x300? Who's writing to that memory? Nobody
should be.

> in delayed_work_timer_fn. Thus cwq becomes NULL before it calls
> __queue_work. So it is reasonable that the kernel panics when it
> tries to access wq with cwq->wq.
>
> To fix it, we try to backport the patches below:
>
> commit 60c057bca22285efefbba033624763a778f243bf
> Author: Lai Jiangshan
> Date:   Wed Feb 6 18:04:53 2013 -0800
>
>     workqueue: add delayed_work->wq to simplify reentrancy handling
>
> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
> Author: Tejun Heo
> Date:   Wed Aug 8 09:38:42 2012 -0700
>
>     workqueue: fix CPU binding of flush_delayed_work[_sync]()

Neither should affect the problem you described above. It *could*
make the problem go away just because it would stop using wq->data to
record cwq if the corruption was contained to that field, but that
isn't a proper fix and the underlying problem could easily cause other
issues.

> And add below change to make sure __cancel_work_timer cannot preempt
> between run_timer_softirq and delayed_work_timer_fn.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bf4888c..0e9f77c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
>                 ret = (timer && likely(del_timer(timer)));
>                 if (!ret)
>                         ret = try_to_grab_pending(work);
> -               wait_on_work(work);
> +               flush_work(work);
>         } while (unlikely(ret < 0));
>
>         clear_work_data(work);
>
> Do you think this fix is enough? And is adding flush_work directly in
> __cancel_work_timer ok for the fix?

Maybe I'm missing something but it looks like the root cause hasn't
been diagnosed and you're piling up band-aids in workqueue code.
You might get away with it but could also be making the problem just
more obscure and difficult to debug (reproducible bugs are happy bugs).

I'd suggest finding out who owns the delayed_work item and examining
why the delayed_work is getting corrupted in the first place.

Thanks.

--
tejun