Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-27 Thread Thomas Gleixner
On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Michael Ellerman wrote:
> > Thomas Gleixner writes:
> > 
> > > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> > >
> > >> Hi Thomas,
> > >> 
> > >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> > >> 
> > >> This seems to be caused by our smp_cpus_done(), which does:
> > >> 
> > >>   void __init smp_cpus_done(unsigned int max_cpus)
> > >>   {
> > >>           /*
> > >>            * We want the setup_cpu() here to be called on the boot CPU, but
> > >>            * init might run on any CPU, so make sure it's invoked on the boot
> > >>            * CPU.
> > >>            */
> > >>           if (smp_ops && smp_ops->setup_cpu)
> > >>                   work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> > >> 
> > >> 
> > >> I don't think CPU hotplug can happen at this point, so I don't think
> > >> there's really a bug.
> > >> 
> > >> But it looks like the work_on_cpu_safe() call could just go away, since
> > >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> > >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> > >> maybe we do still need to do it?
> > >
> > > It's undone in sched_init_smp(). So it looks safe. The call order is:
> > >
> > >  smp_init()
> > >   ...
> > >   smp_cpus_done()
> > >
> > >  sched_init_smp()
> > 
> > Great, thanks.
> > 
> > Patch on the way.
> 
> Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
> boot CPU can't go away at that point, but the debug stuff does not know
> about it. Maybe I'm missing something.

Ok. Now seeing your patch I know what I was missing :)


Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-27 Thread Thomas Gleixner
On Thu, 27 Jul 2017, Michael Ellerman wrote:
> Thomas Gleixner writes:
> 
> > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> >
> >> Hi Thomas,
> >> 
> >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> >> 
> >> This seems to be caused by our smp_cpus_done(), which does:
> >> 
> >>   void __init smp_cpus_done(unsigned int max_cpus)
> >>   {
> >>           /*
> >>            * We want the setup_cpu() here to be called on the boot CPU, but
> >>            * init might run on any CPU, so make sure it's invoked on the boot
> >>            * CPU.
> >>            */
> >>           if (smp_ops && smp_ops->setup_cpu)
> >>                   work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> >> 
> >> 
> >> I don't think CPU hotplug can happen at this point, so I don't think
> >> there's really a bug.
> >> 
> >> But it looks like the work_on_cpu_safe() call could just go away, since
> >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> >> maybe we do still need to do it?
> >
> > It's undone in sched_init_smp(). So it looks safe. The call order is:
> >
> >  smp_init()
> >   ...
> >   smp_cpus_done()
> >
> >  sched_init_smp()
> 
> Great, thanks.
> 
> Patch on the way.

Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
boot CPU can't go away at that point, but the debug stuff does not know
about it. Maybe I'm missing something.
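
To spell out what lockdep records as chain #1: work_on_cpu_safe() holds the
hotplug lock across the flush. Roughly (paraphrasing the 4.12-era
kernel/workqueue.c, not verbatim):

  long work_on_cpu_safe(int cpu, long (*fn)(void *), void *arg)
  {
          long ret = -ENODEV;

          get_online_cpus();              /* read-acquires cpu_hotplug_lock */
          if (cpu_online(cpu))
                  /*
                   * work_on_cpu() ends in flush_work(), which acquires the
                   * work's lockdep map, so lockdep learns the order
                   * cpu_hotplug_lock -> wfc.work.
                   */
                  ret = work_on_cpu(cpu, fn, arg);
          put_online_cpus();
          return ret;
  }

The ipr probe below runs the opposite order: the worker holds wfc.work's map
while apply_workqueue_attrs() does cpus_read_lock(), hence the report.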

Thanks,

tglx




Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-27 Thread Michael Ellerman
Thomas Gleixner writes:

> On Wed, 26 Jul 2017, Michael Ellerman wrote:
>
>> Hi Thomas,
>> 
>> I'm seeing the lockdep barf below on some bare metal Power8 machines.
>> 
>> This seems to be caused by our smp_cpus_done(), which does:
>> 
>>   void __init smp_cpus_done(unsigned int max_cpus)
>>   {
>>           /*
>>            * We want the setup_cpu() here to be called on the boot CPU, but
>>            * init might run on any CPU, so make sure it's invoked on the boot
>>            * CPU.
>>            */
>>           if (smp_ops && smp_ops->setup_cpu)
>>                   work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
>> 
>> 
>> I don't think CPU hotplug can happen at this point, so I don't think
>> there's really a bug.
>> 
>> But it looks like the work_on_cpu_safe() call could just go away, since
>> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
>> the boot CPU, initially"). Though I can't see where init is unpinned, so
>> maybe we do still need to do it?
>
> It's undone in sched_init_smp(). So it looks safe. The call order is:
>
>  smp_init()
>   ...
>   smp_cpus_done()
>
>  sched_init_smp()

Great, thanks.

Patch on the way.

cheers


Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-26 Thread Thomas Gleixner
On Wed, 26 Jul 2017, Michael Ellerman wrote:

> Hi Thomas,
> 
> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> 
> This seems to be caused by our smp_cpus_done(), which does:
> 
>   void __init smp_cpus_done(unsigned int max_cpus)
>   {
>           /*
>            * We want the setup_cpu() here to be called on the boot CPU, but
>            * init might run on any CPU, so make sure it's invoked on the boot
>            * CPU.
>            */
>           if (smp_ops && smp_ops->setup_cpu)
>                   work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> 
> 
> I don't think CPU hotplug can happen at this point, so I don't think
> there's really a bug.
> 
> But it looks like the work_on_cpu_safe() call could just go away, since
> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> the boot CPU, initially"). Though I can't see where init is unpinned, so
> maybe we do still need to do it?

It's undone in sched_init_smp(). So it looks safe. The call order is:

 smp_init()
  ...
  smp_cpus_done()

 sched_init_smp()
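
For completeness, the pinning and its undo, paraphrased (not verbatim
source):

  static noinline void __init kernel_init_freeable(void)
  {
          /* 8fb12156b8db: keep init on the boot CPU through early boot */
          set_cpus_allowed_ptr(current, cpumask_of(smp_processor_id()));
          ...
          smp_init();             /* runs smp_cpus_done() at the end */
          sched_init_smp();       /* undoes the pinning, see below */
  }

  void __init sched_init_smp(void)
  {
          ...
          /* from here on init may migrate off the boot CPU */
          if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
                  BUG();
  }

So between those two points the boot CPU cannot go away under init.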

Thanks,

tglx


Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-26 Thread Michael Ellerman
Hi Thomas,

I'm seeing the lockdep barf below on some bare metal Power8 machines.

This seems to be caused by our smp_cpus_done(), which does:

  void __init smp_cpus_done(unsigned int max_cpus)
  {
          /*
           * We want the setup_cpu() here to be called on the boot CPU, but
           * init might run on any CPU, so make sure it's invoked on the boot
           * CPU.
           */
          if (smp_ops && smp_ops->setup_cpu)
                  work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);


I don't think CPU hotplug can happen at this point, so I don't think
there's really a bug.

But it looks like the work_on_cpu_safe() call could just go away, since
you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
the boot CPU, initially"). Though I can't see where init is unpinned, so
maybe we do still need to do it?
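
Concretely I think it could become a direct call, something like this
(sketch only, assuming smp_setup_cpu_workfn() is just a trivial wrapper
around smp_ops->setup_cpu()):

  void __init smp_cpus_done(unsigned int max_cpus)
  {
          /*
           * init is pinned to the boot CPU until sched_init_smp(), so
           * setup_cpu() can be called directly on the boot CPU.
           */
          if (smp_ops && smp_ops->setup_cpu)
                  smp_ops->setup_cpu(boot_cpuid);
  }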

cheers


[1.468911] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[1.469032] ipr 0001:08:00.0: Found IOA with IRQ: 0
[1.469167] 
[1.469193] ==
[1.469250] WARNING: possible circular locking dependency detected
[1.469309] 4.12.0-gcc6-gb40b238 #1 Not tainted
[1.469355] --
[1.469413] kworker/0:1/971 is trying to acquire lock:
[1.469459]  (cpu_hotplug_lock.rw_sem){++}, at: [] apply_workqueue_attrs+0x34/0xa0
[1.469544] 
[1.469544] but task is already holding lock:
[1.469602]  (()){+.+.+.}, at: [] process_one_work+0x25c/0x800
[1.469674] 
[1.469674] which lock already depends on the new lock.
[1.469674] 
[1.469744] 
[1.469744] the existing dependency chain (in reverse order) is:
[1.469813] 
[1.469813] -> #1 (()){+.+.+.}:
[1.469862]        flush_work+0x74/0x340
[1.469898]        work_on_cpu+0xb8/0xe0
[1.469934]        work_on_cpu_safe+0x80/0xd0
[1.469982]        smp_cpus_done+0x54/0xb0
[1.470017]        smp_init+0x1a0/0x1cc
[1.470053]        kernel_init_freeable+0x1f4/0x3b0
[1.470100]        kernel_init+0x24/0x160
[1.470137]        ret_from_kernel_thread+0x5c/0x74
[1.470183] 
[1.470183] -> #0 (cpu_hotplug_lock.rw_sem){++}:
[1.470244]        lock_acquire+0xec/0x2e0
[1.470279]        cpus_read_lock+0x4c/0xd0
[1.470315]        apply_workqueue_attrs+0x34/0xa0
[1.470362]        __alloc_workqueue_key+0x1b4/0x5b4
[1.470410]        scsi_host_alloc+0x328/0x4b0
[1.470457]        ipr_probe+0xb4/0x1f00
[1.470493]        local_pci_probe+0x6c/0x140
[1.470541]        work_for_cpu_fn+0x38/0x60
[1.470588]        process_one_work+0x310/0x800
[1.470635]        worker_thread+0x2b0/0x520
[1.470682]        kthread+0x164/0x1b0
[1.470718]        ret_from_kernel_thread+0x5c/0x74
[1.470764] 
[1.470764] other info that might help us debug this:
[1.470764] 
[1.470834]  Possible unsafe locking scenario:
[1.470834] 
[1.470891]        CPU0                    CPU1
[1.470937]        ----                    ----
[1.470984]   lock(());
[1.471020]                                lock(cpu_hotplug_lock.rw_sem);
[1.471078]                                lock(());
[1.471136]   lock(cpu_hotplug_lock.rw_sem);
[1.471183] 
[1.471183]  *** DEADLOCK ***
[1.471183] 
[1.471242] 2 locks held by kworker/0:1/971:
[1.471289]  #0:  ("events"){.+.+.+}, at: [] process_one_work+0x25c/0x800
[1.471360]  #1:  (()){+.+.+.}, at: [] process_one_work+0x25c/0x800
[1.471488] 
[1.471488] stack backtrace:
[1.471582] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 4.12.0-gcc6-gb40b238 #1
[1.471721] Workqueue: events work_for_cpu_fn
[1.471813] Call Trace:
[1.471859] [c007f877f560] [c0b491b8] dump_stack+0xe8/0x160 (unreliable)
[1.471996] [c007f877f5a0] [c0153e68] print_circular_bug+0x288/0x3d0
[1.472133] [c007f877f640] [c0157ec8] __lock_acquire+0x1858/0x1a20
[1.472271] [c007f877f7b0] [c0158bfc] lock_acquire+0xec/0x2e0
[1.472386] [c007f877f880] [c00d4f8c] cpus_read_lock+0x4c/0xd0
[1.472501] [c007f877f8b0] [c0100974] apply_workqueue_attrs+0x34/0xa0
[1.472639] [c007f877f8f0] [c0102ef4] __alloc_workqueue_key+0x1b4/0x5b4
[1.472776] [c007f877f9b0] [c078cd58] scsi_host_alloc+0x328/0x4b0
[1.472913] [c007f877fa50] [c07bd414] ipr_probe+0xb4/0x1f00
[1.473028] [c007f877fbb0] [c06157fc] local_pci_probe+0x6c/0x140
[1.473166] [c007f877fc40] [c00f8298] work_for_cpu_fn+0x38/0x60
[1.473281] [c007f877fc70] [c00fdbe0] process_one_work+0x310/0x800
[1.473418] [c007f877fd30] [c00fe380] worker_thread+0x2b0/0x520
[1.473534] [c007f877fdc0] [c0107c84] kthread+0x164/0x1b0
[1.473649] [c007f877fe30] [c000b6e8] ret_from_kernel_thread+0x5c/0x74