Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Michael Ellerman wrote:
> > Thomas Gleixner writes:
> > > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> > >> Hi Thomas,
> > >>
> > >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> > >>
> > >> This seems to be caused by our smp_cpus_done(), which does:
> > >>
> > >> void __init smp_cpus_done(unsigned int max_cpus)
> > >> {
> > >> 	/*
> > >> 	 * We want the setup_cpu() here to be called on the boot CPU, but
> > >> 	 * init might run on any CPU, so make sure it's invoked on the boot
> > >> 	 * CPU.
> > >> 	 */
> > >> 	if (smp_ops && smp_ops->setup_cpu)
> > >> 		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> > >>
> > >> I don't think CPU hotplug can happen at this point, so I don't think
> > >> there's really a bug.
> > >>
> > >> But it looks like the work_on_cpu_safe() call could just go away, since
> > >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> > >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> > >> maybe we do still need to do it?
> > >
> > > It's undone in sched_init_smp(). So it looks safe. The call order is:
> > >
> > >     smp_init()
> > >       ...
> > >       smp_cpus_done()
> > >
> > >     sched_init_smp()
> >
> > Great thanks.
> >
> > Patch on the way.
>
> Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
> boot CPU can't go away at that point, but the debug stuff does not know
> about it. Maybe I'm missing something.

Ok. Now seeing your patch I know what I was missing :)
Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
On Thu, 27 Jul 2017, Michael Ellerman wrote:
> Thomas Gleixner writes:
> > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> >> Hi Thomas,
> >>
> >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> >>
> >> This seems to be caused by our smp_cpus_done(), which does:
> >>
> >> void __init smp_cpus_done(unsigned int max_cpus)
> >> {
> >> 	/*
> >> 	 * We want the setup_cpu() here to be called on the boot CPU, but
> >> 	 * init might run on any CPU, so make sure it's invoked on the boot
> >> 	 * CPU.
> >> 	 */
> >> 	if (smp_ops && smp_ops->setup_cpu)
> >> 		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> >>
> >> I don't think CPU hotplug can happen at this point, so I don't think
> >> there's really a bug.
> >>
> >> But it looks like the work_on_cpu_safe() call could just go away, since
> >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> >> maybe we do still need to do it?
> >
> > It's undone in sched_init_smp(). So it looks safe. The call order is:
> >
> >     smp_init()
> >       ...
> >       smp_cpus_done()
> >
> >     sched_init_smp()
>
> Great thanks.
>
> Patch on the way.

Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
boot CPU can't go away at that point, but the debug stuff does not know
about it. Maybe I'm missing something.

Thanks,

	tglx
Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
Thomas Gleixner writes:
> On Wed, 26 Jul 2017, Michael Ellerman wrote:
>> Hi Thomas,
>>
>> I'm seeing the lockdep barf below on some bare metal Power8 machines.
>>
>> This seems to be caused by our smp_cpus_done(), which does:
>>
>> void __init smp_cpus_done(unsigned int max_cpus)
>> {
>> 	/*
>> 	 * We want the setup_cpu() here to be called on the boot CPU, but
>> 	 * init might run on any CPU, so make sure it's invoked on the boot
>> 	 * CPU.
>> 	 */
>> 	if (smp_ops && smp_ops->setup_cpu)
>> 		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
>>
>> I don't think CPU hotplug can happen at this point, so I don't think
>> there's really a bug.
>>
>> But it looks like the work_on_cpu_safe() call could just go away, since
>> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
>> the boot CPU, initially"). Though I can't see where init is unpinned, so
>> maybe we do still need to do it?
>
> It's undone in sched_init_smp(). So it looks safe. The call order is:
>
>     smp_init()
>       ...
>       smp_cpus_done()
>
>     sched_init_smp()

Great thanks.

Patch on the way.

cheers
Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
On Wed, 26 Jul 2017, Michael Ellerman wrote:
> Hi Thomas,
>
> I'm seeing the lockdep barf below on some bare metal Power8 machines.
>
> This seems to be caused by our smp_cpus_done(), which does:
>
> void __init smp_cpus_done(unsigned int max_cpus)
> {
> 	/*
> 	 * We want the setup_cpu() here to be called on the boot CPU, but
> 	 * init might run on any CPU, so make sure it's invoked on the boot
> 	 * CPU.
> 	 */
> 	if (smp_ops && smp_ops->setup_cpu)
> 		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
>
> I don't think CPU hotplug can happen at this point, so I don't think
> there's really a bug.
>
> But it looks like the work_on_cpu_safe() call could just go away, since
> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> the boot CPU, initially"). Though I can't see where init is unpinned, so
> maybe we do still need to do it?

It's undone in sched_init_smp(). So it looks safe. The call order is:

    smp_init()
      ...
      smp_cpus_done()

    sched_init_smp()

Thanks,

	tglx
Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
Hi Thomas,

I'm seeing the lockdep barf below on some bare metal Power8 machines.

This seems to be caused by our smp_cpus_done(), which does:

void __init smp_cpus_done(unsigned int max_cpus)
{
	/*
	 * We want the setup_cpu() here to be called on the boot CPU, but
	 * init might run on any CPU, so make sure it's invoked on the boot
	 * CPU.
	 */
	if (smp_ops && smp_ops->setup_cpu)
		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);

I don't think CPU hotplug can happen at this point, so I don't think
there's really a bug.

But it looks like the work_on_cpu_safe() call could just go away, since
you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
the boot CPU, initially"). Though I can't see where init is unpinned, so
maybe we do still need to do it?

cheers

[    1.468911] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[    1.469032] ipr 0001:08:00.0: Found IOA with IRQ: 0
[    1.469167]
[    1.469193] ======================================================
[    1.469250] WARNING: possible circular locking dependency detected
[    1.469309] 4.12.0-gcc6-gb40b238 #1 Not tainted
[    1.469355] ------------------------------------------------------
[    1.469413] kworker/0:1/971 is trying to acquire lock:
[    1.469459]  (cpu_hotplug_lock.rw_sem){++}, at: [] apply_workqueue_attrs+0x34/0xa0
[    1.469544]
[    1.469544] but task is already holding lock:
[    1.469602]  (()){+.+.+.}, at: [] process_one_work+0x25c/0x800
[    1.469674]
[    1.469674] which lock already depends on the new lock.
[    1.469674]
[    1.469744]
[    1.469744] the existing dependency chain (in reverse order) is:
[    1.469813]
[    1.469813] -> #1 (()){+.+.+.}:
[    1.469862]        flush_work+0x74/0x340
[    1.469898]        work_on_cpu+0xb8/0xe0
[    1.469934]        work_on_cpu_safe+0x80/0xd0
[    1.469982]        smp_cpus_done+0x54/0xb0
[    1.470017]        smp_init+0x1a0/0x1cc
[    1.470053]        kernel_init_freeable+0x1f4/0x3b0
[    1.470100]        kernel_init+0x24/0x160
[    1.470137]        ret_from_kernel_thread+0x5c/0x74
[    1.470183]
[    1.470183] -> #0 (cpu_hotplug_lock.rw_sem){++}:
[    1.470244]        lock_acquire+0xec/0x2e0
[    1.470279]        cpus_read_lock+0x4c/0xd0
[    1.470315]        apply_workqueue_attrs+0x34/0xa0
[    1.470362]        __alloc_workqueue_key+0x1b4/0x5b4
[    1.470410]        scsi_host_alloc+0x328/0x4b0
[    1.470457]        ipr_probe+0xb4/0x1f00
[    1.470493]        local_pci_probe+0x6c/0x140
[    1.470541]        work_for_cpu_fn+0x38/0x60
[    1.470588]        process_one_work+0x310/0x800
[    1.470635]        worker_thread+0x2b0/0x520
[    1.470682]        kthread+0x164/0x1b0
[    1.470718]        ret_from_kernel_thread+0x5c/0x74
[    1.470764]
[    1.470764] other info that might help us debug this:
[    1.470764]
[    1.470834]  Possible unsafe locking scenario:
[    1.470834]
[    1.470891]        CPU0                    CPU1
[    1.470937]        ----                    ----
[    1.470984]   lock(());
[    1.471020]                                lock(cpu_hotplug_lock.rw_sem);
[    1.471078]                                lock(());
[    1.471136]   lock(cpu_hotplug_lock.rw_sem);
[    1.471183]
[    1.471183]  *** DEADLOCK ***
[    1.471183]
[    1.471242] 2 locks held by kworker/0:1/971:
[    1.471289]  #0:  ("events"){.+.+.+}, at: [] process_one_work+0x25c/0x800
[    1.471360]  #1:  (()){+.+.+.}, at: [] process_one_work+0x25c/0x800
[    1.471488]
[    1.471488] stack backtrace:
[    1.471582] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 4.12.0-gcc6-gb40b238 #1
[    1.471721] Workqueue: events work_for_cpu_fn
[    1.471813] Call Trace:
[    1.471859] [c007f877f560] [c0b491b8] dump_stack+0xe8/0x160 (unreliable)
[    1.471996] [c007f877f5a0] [c0153e68] print_circular_bug+0x288/0x3d0
[    1.472133] [c007f877f640] [c0157ec8] __lock_acquire+0x1858/0x1a20
[    1.472271] [c007f877f7b0] [c0158bfc] lock_acquire+0xec/0x2e0
[    1.472386] [c007f877f880] [c00d4f8c] cpus_read_lock+0x4c/0xd0
[    1.472501] [c007f877f8b0] [c0100974] apply_workqueue_attrs+0x34/0xa0
[    1.472639] [c007f877f8f0] [c0102ef4] __alloc_workqueue_key+0x1b4/0x5b4
[    1.472776] [c007f877f9b0] [c078cd58] scsi_host_alloc+0x328/0x4b0
[    1.472913] [c007f877fa50] [c07bd414] ipr_probe+0xb4/0x1f00
[    1.473028] [c007f877fbb0] [c06157fc] local_pci_probe+0x6c/0x140
[    1.473166] [c007f877fc40] [c00f8298] work_for_cpu_fn+0x38/0x60
[    1.473281] [c007f877fc70] [c00fdbe0] process_one_work+0x310/0x800
[    1.473418] [c007f877fd30] [c00fe380] worker_thread+0x2b0/0x520
[    1.473534] [c007f877fdc0] [c0107c84] kthread+0x164/0x1b0
[    1.473649] [c007f877fe30] [c000b6e8] ret_from_kernel_thread+0x5c/0x74