Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0

2020-10-08 Thread Daniel Jordan
On Wed, Oct 07, 2020 at 09:41:17PM +0200, Corentin Labbe wrote:
> I have added CONFIG_FTRACE=y and your second patch.
> The boot log can be seen at http://kernel.montjoie.ovh/108789.log
> 
> But it seems the latest dump_stack addition flood a bit.

Heh, sorry for making it spew, there wasn't such a flood when I tried.  Your
output is sufficiently incriminating, so I'll go post the fix now.

> I have started to read ftrace documentation, but if you have a quick what to 
> do in /sys/kernel/debug/tracing, it will be helpfull.

Sure, you can view the trace in /sys/kernel/debug/tracing/trace and
kernel-parameters.txt has the boot options documented.


Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0

2020-10-07 Thread Corentin Labbe
On Mon, Oct 05, 2020 at 01:09:10PM -0400, Daniel Jordan wrote:
> On Thu, Oct 01, 2020 at 07:50:22PM +0200, Corentin Labbe wrote:
> > On Tue, Mar 03, 2020 at 04:30:17PM -0500, Daniel Jordan wrote:
> > > Barring other ideas, Corentin, would you be willing to boot with
> > > 
> > > trace_event=initcall:*,module:* trace_options=stacktrace
> > > 
> > > and
> > > 
> > > diff --git a/kernel/module.c b/kernel/module.c
> > > index 33569a01d6e1..393be6979a27 100644
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module 
> > > *mod)
> > >* be cleaned up needs to sync with the queued work - ie
> > >* rcu_barrier()
> > >*/
> > > - if (llist_add(>node, _free_list))
> > > + if (llist_add(>node, _free_list)) {
> > > + pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
> > > + dump_stack();
> > >   schedule_work(_free_wq);
> > > + }
> > >  
> > >   mutex_unlock(_mutex);
> > >   wake_up_all(_wq);
> > > 
> > > but not my earlier fix and share the dmesg and ftrace output to see if the
> > > theory holds?
> > > 
> > > Also, could you attach your config?  Curious now what your crypto options 
> > > look
> > > like after fiddling with some of them today while trying and failing to 
> > > see
> > > this on x86.
> > > 
> > > thanks,
> > > Daniel
> > 
> > Hello
> > 
> > Sorry for the very delayed answer.
> > 
> > I fail to reproduce it on x86 (qemu and  real hw) and arm.
> > It seems to only happen on arm64.
> 
> Thanks for the config and dmesg, but there's no ftrace.  I see it's not
> configured in your kernel, so could you boot with my earlier debug patch plus
> this one and the kernel argument initcall_debug instead?
> 
> I'm trying to see whether it really is a request module call from the crypto
> tests that's triggering this warning.  Preeetty likely that's what's 
> happening,
> but want to be sure since I can't reproduce this.  Then I can post the fix.
> 

I have added CONFIG_FTRACE=y and your second patch.
The boot log can be seen at http://kernel.montjoie.ovh/108789.log

But it seems the latest dump_stack addition flood a bit.
I have started to read ftrace documentation, but if you have a quick what to do 
in /sys/kernel/debug/tracing, it will be helpfull.



Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0

2020-10-05 Thread Daniel Jordan
On Thu, Oct 01, 2020 at 07:50:22PM +0200, Corentin Labbe wrote:
> On Tue, Mar 03, 2020 at 04:30:17PM -0500, Daniel Jordan wrote:
> > Barring other ideas, Corentin, would you be willing to boot with
> > 
> > trace_event=initcall:*,module:* trace_options=stacktrace
> > 
> > and
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 33569a01d6e1..393be6979a27 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module 
> > *mod)
> >  * be cleaned up needs to sync with the queued work - ie
> >  * rcu_barrier()
> >  */
> > -   if (llist_add(>node, _free_list))
> > +   if (llist_add(>node, _free_list)) {
> > +   pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
> > +   dump_stack();
> > schedule_work(_free_wq);
> > +   }
> >  
> > mutex_unlock(_mutex);
> > wake_up_all(_wq);
> > 
> > but not my earlier fix and share the dmesg and ftrace output to see if the
> > theory holds?
> > 
> > Also, could you attach your config?  Curious now what your crypto options 
> > look
> > like after fiddling with some of them today while trying and failing to see
> > this on x86.
> > 
> > thanks,
> > Daniel
> 
> Hello
> 
> Sorry for the very delayed answer.
> 
> I fail to reproduce it on x86 (qemu and  real hw) and arm.
> It seems to only happen on arm64.

Thanks for the config and dmesg, but there's no ftrace.  I see it's not
configured in your kernel, so could you boot with my earlier debug patch plus
this one and the kernel argument initcall_debug instead?

I'm trying to see whether it really is a request module call from the crypto
tests that's triggering this warning.  Preeetty likely that's what's happening,
but want to be sure since I can't reproduce this.  Then I can post the fix.

diff --git a/crypto/algapi.c b/crypto/algapi.c
index fdabf2675b63..0667c6b4588e 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -393,6 +393,10 @@ static void crypto_wait_for_test(struct crypto_larval 
*larval)
 {
int err;
 
+   pr_warn("%s: cra_name %s cra_driver_name %s\n", __func__,
+   larval->adult->cra_name, larval->adult->cra_driver_name);
+   dump_stack();
+
err = crypto_probing_notify(CRYPTO_MSG_ALG_REGISTER, larval->adult);
if (err != NOTIFY_STOP) {
if (WARN_ON(err != NOTIFY_DONE))
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 3cd075ce2a1e..46c4645be763 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -166,6 +166,8 @@ int __request_module(bool wait, const char *fmt, ...)
}
 
trace_module_request(module_name, wait, _RET_IP_);
+   pr_warn("%s: %s\n", __func__, module_name);
+   dump_stack();
 
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
 


Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0

2020-09-30 Thread Daniel Jordan
On Fri, Sep 25, 2020 at 08:12:03PM +0200, Corentin Labbe wrote:
> On Tue, Mar 03, 2020 at 04:31:11PM -0500, Daniel Jordan wrote:
> > On Tue, Mar 03, 2020 at 08:48:19AM +0100, Corentin Labbe wrote:
> > > The patch fix the issue. Thanks!
> > 
> > Thanks for trying it!
> > 
> > > So you could add:
> > > Reported-by: Corentin Labbe 
> > > Tested-by: Corentin Labbe 
> > > Tested-on: sun50i-h6-pine-h64
> > > Tested-on: imx8mn-ddr4-evk
> > > Tested-on: sun50i-a64-bananapi-m64
> > 
> > I definitely will if the patch turns out to be the right fix.
> > 
> > thanks,
> > Daniel
> 
> Hello
> 
> I forgot about this problem since the patch is in my branch since.
> But a co-worker hit this problem recently and without this patch my CI still 
> have it.

Hi,

Sure, I'm happy to help get a fix merged, but let's nail down what the problem
is first.  It'd be useful to have the things requested here:

https://lore.kernel.org/linux-crypto/20200303213017.tanczhqd3nhpe...@ca-dmjordan1.us.oracle.com/

thanks,
Daniel


Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0

2020-09-25 Thread Corentin Labbe
On Tue, Mar 03, 2020 at 04:31:11PM -0500, Daniel Jordan wrote:
> On Tue, Mar 03, 2020 at 08:48:19AM +0100, Corentin Labbe wrote:
> > The patch fix the issue. Thanks!
> 
> Thanks for trying it!
> 
> > So you could add:
> > Reported-by: Corentin Labbe 
> > Tested-by: Corentin Labbe 
> > Tested-on: sun50i-h6-pine-h64
> > Tested-on: imx8mn-ddr4-evk
> > Tested-on: sun50i-a64-bananapi-m64
> 
> I definitely will if the patch turns out to be the right fix.
> 
> thanks,
> Daniel

Hello

I forgot about this problem since the patch is in my branch since.
But a co-worker hit this problem recently and without this patch my CI still 
have it.

Regards