Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Lei Wen
On Tue, Mar 12, 2013 at 2:13 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
>> I see...
>> How about only checking those workqueue structures that are not on the stack?
>> Current on-stack usage is rare, so they should be easier to check.
>
> No, kzalloc is not required. The memory area can come from any source.
> If you're interested in improving the debug situation, please stop
> trying to do something impossible (you can't track the lifetime of a
> random memory area without extra bookkeeping) and try to find out why
> debugobj didn't detect it (did you even enable it? either via
> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or the "debug_objects" boot param).
>


I see...
Thanks for the detailed explanation! :)

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Tejun Heo
On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
> I see...
> How about only checking those workqueue structures that are not on the stack?
> Current on-stack usage is rare, so they should be easier to check.

No, kzalloc is not required. The memory area can come from any source.
If you're interested in improving the debug situation, please stop
trying to do something impossible (you can't track the lifetime of a
random memory area without extra bookkeeping) and try to find out why
debugobj didn't detect it (did you even enable it? either via
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or the "debug_objects" boot param).
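
For reference, a minimal sketch of both ways to turn debugobjects on,
assuming a 3.4-era config (these are the stock Kconfig and boot-param
spellings):

    # Build time: compile in debugobjects plus the work-item tracker,
    # and make it default-on so no boot parameter is needed.
    CONFIG_DEBUG_OBJECTS=y
    CONFIG_DEBUG_OBJECTS_WORK=y
    CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1

    # Boot time alternative: leave the default at 0 and append
    # "debug_objects" to the kernel command line for the debug boot.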

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Lei Wen
On Tue, Mar 12, 2013 at 1:40 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
>> > Memory areas aren't always zero on allocation.
>>
>> Shouldn't the work structure be allocated with kzalloc?
>
> It's not required to. A work_struct can also be on the stack. It's "init"
> after all. Also, if you require clearing the memory before initing,
> you would just be shifting the problem from INIT to memory clearing.
>

I see...
How about only checking those workqueue structures that are not on the stack?
Current on-stack usage is rare, so they should be easier to check.

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
> > Memory areas aren't always zero on allocation.
> 
> Shouldn't work structure be allocated with kzalloc?

It's not required to. A work_struct can also be on the stack. It's "init"
after all. Also, if you require clearing the memory before initing,
you would just be shifting the problem from INIT to memory clearing.
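To illustrate the on-stack case, a minimal sketch using the stock
INIT_WORK_ONSTACK/destroy_work_on_stack helpers (the work function and
its caller below are hypothetical):

    #include <linux/workqueue.h>

    static void my_work_fn(struct work_struct *work)
    {
            /* runs in worker context; 'work' points into the caller's stack */
    }

    static void do_work_synchronously(void)
    {
            struct work_struct work;        /* no kzalloc anywhere */

            INIT_WORK_ONSTACK(&work, my_work_fn);
            schedule_work(&work);
            flush_work(&work);              /* must complete before we return */
            destroy_work_on_stack(&work);   /* lets debugobjects retire it */
    }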

-- 
tejun


Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Lei Wen
On Tue, Mar 12, 2013 at 1:24 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
>> > You're initializing a random piece of memory which may contain any
>> > garbage, and triggering a BUG if some bit is set in it. No, you can't do
>> > that. debugobj is the right tool for debugging object lifetime issues
>> > and is already supported.
>>
>> debugobj is not helping on this issue; I have enabled both
>> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS,
>> but they didn't report any issue at all.
>
> It should. No idea why it didn't. Would be interesting to find out
> why.

No idea here either...

>
>> And I am not initializing random memory; the original issue is that init is
>> called multiple times on one structure whose memory has already been
>> allocated. And __INIT_WORK shouldn't be called on random memory, right?
>
> Memory areas aren't always zero on allocation.


Shouldn't the work structure be allocated with kzalloc?

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
> > You're initializing a random piece of memory which may contain any
> > garbage, and triggering a BUG if some bit is set in it. No, you can't do
> > that. debugobj is the right tool for debugging object lifetime issues
> > and is already supported.
> 
> debugobj is not helping on this issue; I have enabled both
> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS,
> but they didn't report any issue at all.

It should. No idea why it didn't. Would be interesting to find out
why.

> And I am not initializing random memory; the original issue is that init is
> called multiple times on one structure whose memory has already been
> allocated. And __INIT_WORK shouldn't be called on random memory, right?

Memory areas aren't always zero on allocation.
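
As a minimal sketch of the point, with a hypothetical driver structure
(kmalloc, unlike kzalloc, hands back whatever the previous slab user
left behind):

    #include <linux/slab.h>
    #include <linux/workqueue.h>

    struct my_dev {
            struct work_struct work;
    };

    static void my_work_fn(struct work_struct *work) { }

    static struct my_dev *my_dev_alloc(void)
    {
            /* Not zeroed: dev->work.data starts out as stale garbage, so
             * a leftover PENDING bit here means nothing -- which is why
             * INIT_WORK cannot legitimately BUG on it. */
            struct my_dev *dev = kmalloc(sizeof(*dev), GFP_KERNEL);

            if (dev)
                    INIT_WORK(&dev->work, my_work_fn);
            return dev;
    }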

-- 
tejun


Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Lei Wen
Tejun,

On Tue, Mar 12, 2013 at 1:12 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>> index 8afab27..425d5a2 100644
>> --- a/include/linux/workqueue.h
>> +++ b/include/linux/workqueue.h
>> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>>   * NOTE! No point in using "atomic_long_set()": using a direct
>>   * assignment of the work data initializer allows the compiler
>>   * to generate better code.
>> + *
>> + * We assume that a work item must not be inited while it already
>> + * holds the pending bit; otherwise a bug is reported.
>>   */
>>  #ifdef CONFIG_LOCKDEP
>>  #define __INIT_WORK(_work, _func, _onstack)\
>> do {\
>> static struct lock_class_key __key; \
>> \
>> +   BUG_ON(work_pending(_work));\
>
> You're initializing random piece of memory which may contain any
> garbage and triggering BUG if some bit is set on it. No, you can't do
> that. debugobj is the right tool for debugging object lifetime issues
> and is already supported.

debugobj is not helping on this issue; I have enabled both
CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS,
but they didn't report any issue at all.

And I am not initializing random memory; the original issue is that init is
called multiple times on one structure whose memory has already been
allocated. And __INIT_WORK shouldn't be called on random memory, right?

All this patch does is add a check there.
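
For context, the check builds on the stock work_pending() test, which in
the 3.x workqueue.h is essentially (paraphrased):

    #define work_data_bits(work) ((unsigned long *)(&(work)->data))

    #define work_pending(work) \
            test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))

so BUG_ON(work_pending(_work)) fires whenever WORK_STRUCT_PENDING_BIT
happens to be set in the (possibly never-initialized) data word -- which
is exactly the objection quoted above.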

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
Hello,

On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 8afab27..425d5a2 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>   * NOTE! No point in using "atomic_long_set()": using a direct
>   * assignment of the work data initializer allows the compiler
>   * to generate better code.
> + *
> + * We assume that a work item must not be inited while it already
> + * holds the pending bit; otherwise a bug is reported.
>   */
>  #ifdef CONFIG_LOCKDEP
>  #define __INIT_WORK(_work, _func, _onstack)\
> do {\
> static struct lock_class_key __key; \
> \
> +   BUG_ON(work_pending(_work));\

You're initializing a random piece of memory which may contain any
garbage, and triggering a BUG if some bit is set in it. No, you can't do
that. debugobj is the right tool for debugging object lifetime issues
and is already supported.
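
For context, a paraphrased sketch of the bookkeeping debugobj already
does for work items (kernel/workqueue.c under CONFIG_DEBUG_OBJECTS_WORK;
the object state lives in an external hash table rather than in the
object itself, which is what makes a double init detectable):

    void __init_work(struct work_struct *work, int onstack)
    {
            if (onstack)
                    debug_object_init_on_stack(work, &work_debug_descr);
            else
                    debug_object_init(work, &work_debug_descr);
    }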

Thanks.

-- 
tejun


Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Thomas Gleixner
On Thu, 7 Mar 2013, Tejun Heo wrote:
> I can't see how something like that would happen, and I still find it
> quite unlikely that this is a generic problem in either timer or
> workqueue, given how widely those are used and that your case is the
> only similar one that has come up till now (and 3.4 is a long time ago).
> Thomas, any ideas?

debugobjects might give a hint.

Thanks,

tglx


Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Tejun Heo
(cc'ing Thomas, hi!)

Hello,

Lei is seeing a problem where a delayed_work item gets corrupted (its
work->data gets cleared while still queued on the timer).  He thinks
what's going on is that del_timer() is returning 1 but the timer
function still gets executed.

On Thu, Mar 07, 2013 at 11:22:40PM +0800, Lei Wen wrote:
> >> If del_timer() happens after the timer starts running, del_timer()
> >> would return 0 and try_to_grab_pending() will be called, which will
> >> return >= 0 iff it successfully steals the PENDING bit (i.e. it's the
> >> sole owner of the work item).  If del_timer() happens before the timer
> >> starts running, the timer function would never run.
> >
> > If del_timer() happens before __run_timers() is called, but after the
> > timer irq has already fired, would it return 1 since the timer is still
> > not detached in __run_timers()? If that is possible, then we would call
> > try_to_grab_pending(), so that work->data would be cleared this way.
> >
> >>
> >> clear_work_data() happens iff the work item is confirmed to be idle.
> >> At this point, I'm pretty skeptical this is a bug in workqueue itself
> >> and strongly suggest looking at the crashing workqueue user.
> >
> > Also, I am not very familiar with the workqueue mechanism; how many
> > places in the kernel clear work->data besides clear_work_data()?

Work item initialization and clear_work_data() are the only places, and
from the looks of it you definitely seem to be hitting
clear_work_data().
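
For context, a paraphrased sketch of those helpers from the 3.4-era
kernel/workqueue.c (the 0x300 in the dump below is consistent with the
WORK_STRUCT_NO_CPU value clear_work_data() stores on this config):

    static void set_work_data(struct work_struct *work, unsigned long data,
                              unsigned long flags)
    {
            BUG_ON(!work_pending(work));
            atomic_long_set(&work->data, data | flags | work_static(work));
    }

    static void clear_work_data(struct work_struct *work)
    {
            smp_wmb();      /* see set_work_cwq() */
            set_work_data(work, WORK_STRUCT_NO_CPU, 0);
    }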

> > From the memory dump, I cannot find any hint of the work structure being
> > destroyed. So the only possibility, it seems to me, is that work->data is
> > being set by someone on purpose.
> >
> > crash> struct delayed_work 0xbf03d544 -x
> > struct delayed_work {
> >   work = {
> > data = {
> >   counter = 0x300
> > },
> > entry = {
> >   next = 0xbf03d548,
> >   prev = 0xbf03d548
> > },
> > func = 0xbf014b00
> >   },
> >   timer = {
> > entry = {
> >   next = 0x0,
> >   prev = 0x200200
> > },
> > expires = 0x12b638b,
> > base = 0xc0844e01,
> > function = 0xc014c7a0 <delayed_work_timer_fn>,
> > data = 0xbf03d544,
> > slack = 0x,
> > start_pid = 0x,
> > start_site = 0x0,
> > start_comm = 
> > "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> >   }
> > }
> 
> I captured a trace log, which shows my previous suspicion is true:
> __cancel_work_timer is called before delayed_work_timer_fn, so
> work->data is cleared.
> 
> And when __cancel_work_timer is called, the timer is still pending,
> so del_timer returns 1 and thus try_to_grab_pending is never called.
> 
> But it is very strange that __run_timers still gets the same timer.
> Its callback, delayed_work_timer_fn, is then called, which causes
> the issue.
> 
> The detach_timer in __cancel_work_timer should already have removed the
> timer from all lists, am I right?

Yes.

> Could the timer_list have been queued twice, e.g. once on each of two CPUs?
> If not, how could this happen?

I can't see how something like that would happen, and I still find it
quite unlikely that this is a generic problem in either timer or
workqueue, given how widely those are used and that your case is the
only similar one that has come up till now (and 3.4 is a long time ago).
Thomas, any ideas?

Thanks.

-- 
tejun


Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Lei Wen
Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We find a race condition as below:
>>> CPU0                                      CPU1
>>> timer interrupt happens
>>> __run_timers
>>>   __run_timers::spin_lock_irq(&base->lock)
>>>   __run_timers::spin_unlock_irq(&base->lock)
>>>                                           __cancel_work_timer
>>>                                           __cancel_work_timer::del_timer
>>>                                           __cancel_work_timer::wait_on_work
>>>                                           __cancel_work_timer::clear_work_data
>>>   __run_timers::call_timer_fn(timer, fn, data);
>>>     delayed_work_timer_fn::get_work_cwq
>>>   __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0 is
>>> ready to run the timer callback, which is delayed_work_timer_fn in our case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return 0 and try_to_grab_pending() will be called, which will
>> return >= 0 iff it successfully steals the PENDING bit (i.e. it's the
>> sole owner of the work item).  If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happens before __run_timers() is called, but after the
> timer irq has already fired, would it return 1 since the timer is still
> not detached in __run_timers()? If that is possible, then we would call
> try_to_grab_pending(), so that work->data would be cleared this way.
>
>>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also, I am not very familiar with the workqueue mechanism; how many
> places in the kernel clear work->data besides clear_work_data()?
>
> From the memory dump, I cannot find any hint of the work structure being
> destroyed. So the only possibility, it seems to me, is that work->data is
> being set by someone on purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
> data = {
>   counter = 0x300
> },
> entry = {
>   next = 0xbf03d548,
>   prev = 0xbf03d548
> },
> func = 0xbf014b00
>   },
>   timer = {
> entry = {
>   next = 0x0,
>   prev = 0x200200
> },
> expires = 0x12b638b,
> base = 0xc0844e01,
> function = 0xc014c7a0 <delayed_work_timer_fn>,
> data = 0xbf03d544,
> slack = 0x,
> start_pid = 0x,
> start_site = 0x0,
> start_comm = 
> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }

I captured a trace log, which shows my previous suspicion is true:
__cancel_work_timer is called before delayed_work_timer_fn, so
work->data is cleared.

And when __cancel_work_timer is called, the timer is still pending,
so del_timer returns 1 and thus try_to_grab_pending is never called.

But it is very strange that __run_timers still gets the same timer.
Its callback, delayed_work_timer_fn, is then called, which causes
the issue.

The detach_timer in __cancel_work_timer should already have removed the
timer from all lists, am I right?
Could the timer_list have been queued twice, e.g. once on each of two CPUs?
If not, how could this happen?

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Lei Wen
Hi Tejun,

On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
> Hello, Lei.
>
> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>> We find a race condition as below:
>> CPU0                                      CPU1
>> timer interrupt happens
>> __run_timers
>>   __run_timers::spin_lock_irq(&base->lock)
>>   __run_timers::spin_unlock_irq(&base->lock)
>>                                           __cancel_work_timer
>>                                           __cancel_work_timer::del_timer
>>                                           __cancel_work_timer::wait_on_work
>>                                           __cancel_work_timer::clear_work_data
>>   __run_timers::call_timer_fn(timer, fn, data);
>>     delayed_work_timer_fn::get_work_cwq
>>   __run_timers::spin_lock_irq(&base->lock)
>>
>> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0 is
>> ready to run the timer callback, which is delayed_work_timer_fn in our case.
>
> If del_timer() happens after the timer starts running, del_timer()
> would return 0 and try_to_grab_pending() will be called, which will
> return >= 0 iff it successfully steals the PENDING bit (i.e. it's the
> sole owner of the work item).  If del_timer() happens before the timer
> starts running, the timer function would never run.

If del_timer() happens before __run_timers() is called, but after the
timer irq has already fired, would it return 1 since the timer is still
not detached in __run_timers()? If that is possible, then we would call
try_to_grab_pending(), so that work->data would be cleared this way.

>
> clear_work_data() happens iff the work item is confirmed to be idle.
> At this point, I'm pretty skeptical this is a bug in workqueue itself
> and strongly suggest looking at the crashing workqueue user.

Also, I am not very familiar with the workqueue mechanism; how many
places in the kernel clear work->data besides clear_work_data()?

From the memory dump, I cannot find any hint of the work structure being
destroyed. So the only possibility, it seems to me, is that work->data is
being set by someone on purpose.

crash> struct delayed_work 0xbf03d544 -x
struct delayed_work {
  work = {
data = {
  counter = 0x300
},
entry = {
  next = 0xbf03d548,
  prev = 0xbf03d548
},
func = 0xbf014b00
  },
  timer = {
entry = {
  next = 0x0,
  prev = 0x200200
},
expires = 0x12b638b,
base = 0xc0844e01,
function = 0xc014c7a0 <delayed_work_timer_fn>,
data = 0xbf03d544,
slack = 0x,
start_pid = 0x,
start_site = 0x0,
start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }
}

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Tejun Heo
Hello, Lei.

On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
> We find a race condition as below:
> CPU0                                      CPU1
> timer interrupt happens
> __run_timers
>   __run_timers::spin_lock_irq(&base->lock)
>   __run_timers::spin_unlock_irq(&base->lock)
>                                           __cancel_work_timer
>                                           __cancel_work_timer::del_timer
>                                           __cancel_work_timer::wait_on_work
>                                           __cancel_work_timer::clear_work_data
>   __run_timers::call_timer_fn(timer, fn, data);
>     delayed_work_timer_fn::get_work_cwq
>   __run_timers::spin_lock_irq(&base->lock)
> 
> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0 is
> ready to run the timer callback, which is delayed_work_timer_fn in our case.

If del_timer() happens after the timer starts running, del_timer()
would return 0 and try_to_grab_pending() will be called, which will
return >= 0 iff it successfully steals the PENDING bit (i.e. it's the
sole owner of the work item).  If del_timer() happens before the timer
starts running, the timer function would never run.

clear_work_data() happens iff the work item is confirmed to be idle.
At this point, I'm pretty skeptical this is a bug in workqueue itself
and strongly suggest looking at the crashing workqueue user.
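
For reference, the cancel path described above is, in 3.4's
kernel/workqueue.c, essentially the following loop (paraphrased; it is
the same loop the patch quoted further down modifies):

    static bool __cancel_work_timer(struct work_struct *work,
                                    struct timer_list *timer)
    {
            int ret;

            do {
                    /* 1 iff a pending timer was removed from its list */
                    ret = (timer && likely(del_timer(timer)));
                    /* otherwise try to steal PENDING; < 0 means retry */
                    if (!ret)
                            ret = try_to_grab_pending(work);
                    wait_on_work(work);
            } while (unlikely(ret < 0));

            clear_work_data(work);
            return ret;
    }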

Thanks.

-- 
tejun


Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Lei Wen
Hi Tejun,

On Wed, Mar 6, 2013 at 12:32 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
>> By checking memory, we find work->data becomes 0x300 when it tries
>> to call get_work_cwq
>
> Why would that become 0x300?  Who's writing to that memory?  Nobody
> should be.

We find a race condition as below:
CPU0                                      CPU1
timer interrupt happens
__run_timers
  __run_timers::spin_lock_irq(&base->lock)
  __run_timers::spin_unlock_irq(&base->lock)
                                          __cancel_work_timer
                                          __cancel_work_timer::del_timer
                                          __cancel_work_timer::wait_on_work
                                          __cancel_work_timer::clear_work_data
  __run_timers::call_timer_fn(timer, fn, data);
    delayed_work_timer_fn::get_work_cwq
  __run_timers::spin_lock_irq(&base->lock)

It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0 is
ready to run the timer callback, which is delayed_work_timer_fn in our case.

Although __cancel_work_timer calls wait_on_work to wait for
already-inserted work to complete, here the work is not queued yet
because its callback has not started executing, so wait_on_work returns
immediately and clear_work_data clears work->data.
Thus when delayed_work_timer_fn finally runs, it sees work->data as 0x300.

Do you think this sequence is possible?

>
>> in delayed_work_timer_fn. Thus cwq is NULL before __queue_work is called,
>> so it is reasonable that the kernel panics when it tries to access wq via
>> cwq->wq.
>>
>> To fix it, we try to backport below patches:
>> commit 60c057bca22285efefbba033624763a778f243bf
>> Author: Lai Jiangshan 
>> Date:   Wed Feb 6 18:04:53 2013 -0800
>>
>> workqueue: add delayed_work->wq to simplify reentrancy handling
>>
>> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
>> Author: Tejun Heo 
>> Date:   Wed Aug 8 09:38:42 2012 -0700
>>
>> workqueue: fix CPU binding of flush_delayed_work[_sync]()
>
> Neither should affect the problem you described above.  It *could*
> make the problem go away just because it would stop using wq->data to
> record cwq if the corruption was contained to that field but that
> isn't a proper fix and the underlying problem could easily cause other
> issues.
>
>> And we add the change below to make sure __cancel_work_timer cannot
>> slip in between run_timer_softirq and delayed_work_timer_fn.
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index bf4888c..0e9f77c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
>> ret = (timer && likely(del_timer(timer)));
>> if (!ret)
>> ret = try_to_grab_pending(work);
>> -   wait_on_work(work);
>> +   flush_work(work);
>> } while (unlikely(ret < 0));
>>
>> clear_work_data(work);
>>
>> Do you think this fix is enough? And is calling flush_work directly in
>> __cancel_work_timer OK as the fix?
>
> Maybe I'm missing something but it looks like the root cause hasn't
> been diagnosed and you're piling up band-aids in workqueue code.  You
> might get away with it but could also be making the problem just more
> obscure and difficult to debug (reproducible bugs are happy bugs).
>
> I'd suggest finding out who owns the delayed_work item and examining
> why the delayed_work is getting corrupted in the first place.
>
> Thanks.
>
> --
> tejun

Thanks,
Lei


Re: workqueue panic in 3.4 kernel

2013-03-05 Thread Tejun Heo
Hello,

On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
> By checking memory, we find work->data becomes 0x300 when it tries
> to call get_work_cwq

Why would that become 0x300?  Who's writing to that memory?  Nobody
should be.

> in delayed_work_timer_fn. Thus cwq is NULL before __queue_work is called,
> so it is reasonable that the kernel panics when it tries to access wq via
> cwq->wq.
> 
> To fix it, we try to backport below patches:
> commit 60c057bca22285efefbba033624763a778f243bf
> Author: Lai Jiangshan 
> Date:   Wed Feb 6 18:04:53 2013 -0800
> 
> workqueue: add delayed_work->wq to simplify reentrancy handling
> 
> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
> Author: Tejun Heo 
> Date:   Wed Aug 8 09:38:42 2012 -0700
> 
> workqueue: fix CPU binding of flush_delayed_work[_sync]()

Neither should affect the problem you described above.  It *could*
make the problem go away just because it would stop using work->data to
record cwq if the corruption was contained to that field, but that
isn't a proper fix and the underlying problem could easily cause other
issues.

> And we add the change below to make sure __cancel_work_timer cannot
> slip in between run_timer_softirq and delayed_work_timer_fn.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bf4888c..0e9f77c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
> ret = (timer && likely(del_timer(timer)));
> if (!ret)
> ret = try_to_grab_pending(work);
> -   wait_on_work(work);
> +   flush_work(work);
> } while (unlikely(ret < 0));
> 
> clear_work_data(work);
> 
> Do you think this fix is enough? And is calling flush_work directly in
> __cancel_work_timer OK as the fix?

Maybe I'm missing something but it looks like the root cause hasn't
been diagnosed and you're piling up band-aids in workqueue code.  You
might get away with it but could also be making the problem just more
obscure and difficult to debug (reproducible bugs are happy bugs).

I'd suggest finding out who owns the delayed_work item and examining
why the delayed_work is getting corrupted in the first place.

Thanks.

-- 
tejun