Re: blk-mq breaks suspend even with runtime PM patch
On 08/08/2017 12:33 PM, Mike Galbraith wrote:
> On Tue, 2017-08-08 at 18:50 +0200, Mike Galbraith wrote:
>> On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote:
>>>
>>> Should these go back farther than 4.12? Looks like they apply cleanly
>>> to 4.9, didn't look older than that...
>>
>> I met prerequisites at 4.11...
>
> FWIW, I took/modified 2d0364c8c1a9. Dunno if the suspend regression
> exists in 4.11 though, without which you'll likely want nothing.

It does exist, the only change here is that we default people to scsi-mq
in 4.12+. Honestly, nobody complained since we've had scsi-mq, so I could
pivot both ways on whether we really need the changes in earlier versions
or not.

--
Jens Axboe
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, 2017-08-08 at 18:50 +0200, Mike Galbraith wrote:
> On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote:
> >
> > Should these go back farther than 4.12? Looks like they apply cleanly
> > to 4.9, didn't look older than that...
>
> I met prerequisites at 4.11...

FWIW, I took/modified 2d0364c8c1a9. Dunno if the suspend regression
exists in 4.11 though, without which you'll likely want nothing.

-Mike
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote:
>
> Should these go back farther than 4.12? Looks like they apply cleanly
> to 4.9, didn't look older than that...

I met prerequisites at 4.11, but I wasn't patching anything remotely
resembling virgin source.

-Mike
Re: blk-mq breaks suspend even with runtime PM patch
Greg, this is 765e40b675a9566459ddcb8358ad16f3b8344bbe.

On Tuesday, 8 August 2017 18:43:33 CEST Greg KH wrote:
> On Tue, Aug 08, 2017 at 06:36:01PM +0200, Oleksandr Natalenko wrote:
> > Could you queue "block: disable runtime-pm for blk-mq" too please? It is
> > also related to suspend-resume freezes that were observed by multiple
> > users.
> What is the git commit id of that patch?
>
> thanks,
>
> greg k-h
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, Aug 08, 2017 at 06:34:01PM +0200, Mike Galbraith wrote:
> On Tue, 2017-08-08 at 09:22 -0700, Greg KH wrote:
> > On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
> > > Hello Mike et al.
> > >
> > > On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
> > > > FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> > > > stable fixed it.
> > >
> > > My build already includes v4.12.4.
> > >
> > > > If not, I'd find these two commits irresistible.
> > > >
> > > > 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
> > > > 4b855ad37194f blk-mq: Create hctx for each present CPU
> > >
> > > I've applied these 2 commits, and cannot reproduce the issue anymore.
> > > Looks like a perfect hit, thanks!
> > >
> > > > 'course applying random upstream bits does come with some risk, trying
> > > > a kernel already containing them has less "entertainment" potential.
> > >
> > > Should you consider applying them to v4.12.x stable series? CC'ing Greg
> > > just in case.
> >
> > I can queue these up if I get an ack from the developers/maintainers
> > that it is ok to do so...
> >
> > {hint}
>
> {hint++}
>
> Those commits take Steven Rostedt's hotplug stress script runtime down
> from 4 _minutes_ down to 7 seconds for my RT tree, so I'm rather hoping
> you hear an "ACK" too.

Oh, nice!

Should these go back farther than 4.12? Looks like they apply cleanly to
4.9, didn't look older than that...

thanks,

greg k-h
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, Aug 08, 2017 at 06:36:01PM +0200, Oleksandr Natalenko wrote:
> Could you queue "block: disable runtime-pm for blk-mq" too please? It is also
> related to suspend-resume freezes that were observed by multiple users.

What is the git commit id of that patch?

thanks,

greg k-h
Re: blk-mq breaks suspend even with runtime PM patch
Could you queue "block: disable runtime-pm for blk-mq" too please? It is also
related to suspend-resume freezes that were observed by multiple users.

Thanks.

On Tuesday, 8 August 2017 18:33:29 CEST Jens Axboe wrote:
> On 08/08/2017 10:22 AM, Greg KH wrote:
> > On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
> >> Hello Mike et al.
> >>
> >> On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
> >>> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> >>> stable fixed it.
> >>
> >> My build already includes v4.12.4.
> >>
> >>> If not, I'd find these two commits irresistible.
> >>>
> >>> 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
> >>> 4b855ad37194f blk-mq: Create hctx for each present CPU
> >>
> >> I've applied these 2 commits, and cannot reproduce the issue anymore.
> >> Looks like a perfect hit, thanks!
> >>
> >>> 'course applying random upstream bits does come with some risk, trying
> >>> a kernel already containing them has less "entertainment" potential.
> >>
> >> Should you consider applying them to v4.12.x stable series? CC'ing Greg
> >> just in case.
> >
> > I can queue these up if I get an ack from the developers/maintainers
> > that it is ok to do so...
>
> You can add those two commits to stable.
Re: blk-mq breaks suspend even with runtime PM patch
On Tue, 2017-08-08 at 09:22 -0700, Greg KH wrote:
> On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
> > Hello Mike et al.
> >
> > On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
> > > FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> > > stable fixed it.
> >
> > My build already includes v4.12.4.
> >
> > > If not, I'd find these two commits irresistible.
> > >
> > > 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
> > > 4b855ad37194f blk-mq: Create hctx for each present CPU
> >
> > I've applied these 2 commits, and cannot reproduce the issue anymore. Looks
> > like a perfect hit, thanks!
> >
> > > 'course applying random upstream bits does come with some risk, trying
> > > a kernel already containing them has less "entertainment" potential.
> >
> > Should you consider applying them to v4.12.x stable series? CC'ing Greg
> > just in case.
>
> I can queue these up if I get an ack from the developers/maintainers
> that it is ok to do so...
>
> {hint}

{hint++}

Those commits take Steven Rostedt's hotplug stress script runtime down
from 4 _minutes_ down to 7 seconds for my RT tree, so I'm rather hoping
you hear an "ACK" too.

-Mike
Re: blk-mq breaks suspend even with runtime PM patch
On 08/08/2017 10:22 AM, Greg KH wrote:
> On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
>> Hello Mike et al.
>>
>> On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
>>> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
>>> stable fixed it.
>>
>> My build already includes v4.12.4.
>>
>>> If not, I'd find these two commits irresistible.
>>>
>>> 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
>>> 4b855ad37194f blk-mq: Create hctx for each present CPU
>>
>> I've applied these 2 commits, and cannot reproduce the issue anymore. Looks
>> like a perfect hit, thanks!
>>
>>> 'course applying random upstream bits does come with some risk, trying
>>> a kernel already containing them has less "entertainment" potential.
>>
>> Should you consider applying them to v4.12.x stable series? CC'ing Greg just
>> in case.
>
> I can queue these up if I get an ack from the developers/maintainers
> that it is ok to do so...

You can add those two commits to stable.

--
Jens Axboe
Re: blk-mq breaks suspend even with runtime PM patch
On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
> Hello Mike et al.
>
> On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
> > FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> > stable fixed it.
>
> My build already includes v4.12.4.
>
> > If not, I'd find these two commits irresistible.
> >
> > 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
> > 4b855ad37194f blk-mq: Create hctx for each present CPU
>
> I've applied these 2 commits, and cannot reproduce the issue anymore. Looks
> like a perfect hit, thanks!
>
> > 'course applying random upstream bits does come with some risk, trying
> > a kernel already containing them has less "entertainment" potential.
>
> Should you consider applying them to v4.12.x stable series? CC'ing Greg just
> in case.

I can queue these up if I get an ack from the developers/maintainers
that it is ok to do so...

{hint}

thanks,

greg k-h
Re: blk-mq breaks suspend even with runtime PM patch
Hello Mike et al.

On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote:
> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> stable fixed it.

My build already includes v4.12.4.

> If not, I'd find these two commits irresistible.
>
> 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
> 4b855ad37194f blk-mq: Create hctx for each present CPU

I've applied these 2 commits, and cannot reproduce the issue anymore. Looks
like a perfect hit, thanks!

> 'course applying random upstream bits does come with some risk, trying
> a kernel already containing them has less "entertainment" potential.

Should you consider applying them to v4.12.x stable series? CC'ing Greg just
in case.
Re: blk-mq breaks suspend even with runtime PM patch
On Sat, 2017-07-29 at 17:27 +0200, Oleksandr Natalenko wrote:
> Hello Jens, Christoph.
>
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
> as in a VM.
>
> I use complex disk layout involving MD, LUKS and LVM, and managed to get
> these warnings from VM via serial console when suspend fails:
>
> ===
> [ 245.516573] INFO: task kworker/0:1:49 blocked for more than 120 seconds.
> [ 245.520025] Not tainted 4.12.0-pf4 #1

FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
stable fixed it.

If not, I'd find these two commits irresistible.

5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
4b855ad37194f blk-mq: Create hctx for each present CPU

'course applying random upstream bits does come with some risk, trying
a kernel already containing them has less "entertainment" potential.

-Mike
Re: blk-mq breaks suspend even with runtime PM patch
Recompiled kernel with lockdep enabled gives me this:

===
[ 368.655051] Showing all locks held in the system:
[ 368.656387] 1 lock held by khungtaskd/37:
[ 368.657171] #0: (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x3d/0x1a0
[ 368.658725] 1 lock held by md0_raid10/458:
[ 368.659455] #0: (>reconfig_mutex){+.+.+.}, at: [] md_check_recovery+0xaf/0x4d0 [md_mod]
[ 368.661403] 3 locks held by btrfs-transacti/550:
[ 368.662754] #0: (_info->transaction_kthread_mutex){+.+...}, at: [] transaction_kthread+0x69/0x1c0 [btrfs]
[ 368.664797] #1: (_info->reloc_mutex){+.+...}, at: [] btrfs_commit_transaction+0x2e1/0x9b0 [btrfs]
[ 368.69] #2: (_info->tree_log_mutex){+.+...}, at: [] btrfs_commit_transaction+0x351/0x9b0 [btrfs]
[ 368.668644] 4 locks held by kworker/0:2/888:
[ 368.669384] #0: ("events"){.+.+.+}, at: [] process_one_work+0x1fb/0x6e0
[ 368.670916] #1: ((shepherd).work){+.+...}, at: [] process_one_work+0x1fb/0x6e0
[ 368.672592] #2: (cpu_hotplug.dep_map){++}, at: [] get_online_cpus.part.14+0x5/0x50
[ 368.674742] #3: (cpu_hotplug.lock){+.+.+.}, at: [] get_online_cpus.part.14+0x3a/0x50
[ 368.677494] 10 locks held by systemd-sleep/889:
[ 368.678650] #0: (sb_writers#5){.+.+.+}, at: [] vfs_write+0x17b/0x1a0
[ 368.680483] #1: (>mutex){+.+.+.}, at: [] kernfs_fop_write+0x123/0x1e0
[ 368.682412] #2: (s_active#257){.+.+.+}, at: [] kernfs_fop_write+0x12c/0x1e0
[ 368.684440] #3: (autosleep_lock){+.+.+.}, at: [] pm_autosleep_lock+0x17/0x20
[ 368.686707] #4: (pm_mutex){+.+.+.}, at: [] pm_suspend+0x88/0x490
[ 368.688086] #5: (acpi_scan_lock){+.+.+.}, at: [] acpi_scan_lock_acquire+0x17/0x20
[ 368.690213] #6: (cpu_add_remove_lock){+.+.+.}, at: [] freeze_secondary_cpus+0x30/0x3c0
[ 368.692016] #7: (cpu_hotplug.dep_map){++}, at: [] cpu_hotplug_begin+0x5/0xe0
[ 368.694347] #8: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x83/0xe0
[ 368.696010] #9: (all_q_mutex){+.+...}, at: [] blk_mq_queue_reinit_work+0x1a/0x110
[ 368.698624]
[ 368.698990] =
[ 368.698990]
===

Deadlock with CPU hotplug?

On Saturday, 29 July 2017 17:27:41 CEST Oleksandr Natalenko wrote:
> Hello Jens, Christoph.
>
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch
> applied blk-mq breaks suspend to RAM for me. It is reproducible on my
> laptop as well as in a VM.
>
> I use complex disk layout involving MD, LUKS and LVM, and managed to get
> these warnings from VM via serial console when suspend fails:
>
> ===
> [ 245.516573] INFO: task kworker/0:1:49 blocked for more than 120 seconds.
> [ 245.520025] Not tainted 4.12.0-pf4 #1
> [ 245.521836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 245.525612] kworker/0:1 D049 2 0x
> [ 245.527515] Workqueue: events vmstat_shepherd
> [ 245.528685] Call Trace:
> [ 245.529296] __schedule+0x459/0xe40
> [ 245.530115] ? kvm_clock_read+0x25/0x40
> [ 245.531003] ? ktime_get+0x40/0xa0
> [ 245.531819] schedule+0x3d/0xb0
> [ 245.532542] ? schedule+0x3d/0xb0
> [ 245.533299] schedule_preempt_disabled+0x15/0x20
> [ 245.534367] __mutex_lock.isra.5+0x295/0x530
> [ 245.535351] __mutex_lock_slowpath+0x13/0x20
> [ 245.536362] ? __mutex_lock_slowpath+0x13/0x20
> [ 245.537334] mutex_lock+0x25/0x30
> [ 245.538118] get_online_cpus.part.14+0x15/0x30
> [ 245.539588] get_online_cpus+0x20/0x30
> [ 245.540560] vmstat_shepherd+0x21/0xc0
> [ 245.541538] process_one_work+0x1de/0x430
> [ 245.542364] worker_thread+0x47/0x3f0
> [ 245.543042] kthread+0x125/0x140
> [ 245.543649] ? process_one_work+0x430/0x430
> [ 245.544417] ? kthread_create_on_node+0x70/0x70
> [ 245.545737] ret_from_fork+0x25/0x30
> [ 245.546490] INFO: task md0_raid10:459 blocked for more than 120 seconds.
> [ 245.547668] Not tainted 4.12.0-pf4 #1
> [ 245.548769] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 245.550133] md0_raid10 D0 459 2 0x
> [ 245.551092] Call Trace:
> [ 245.551539] __schedule+0x459/0xe40
> [ 245.552163] schedule+0x3d/0xb0
> [ 245.552728] ? schedule+0x3d/0xb0
> [ 245.553344] md_super_wait+0x6e/0xa0 [md_mod]
> [ 245.554118] ? wake_bit_function+0x60/0x60
> [ 245.554854] md_update_sb.part.60+0x3df/0x840 [md_mod]
> [ 245.555771] md_check_recovery+0x215/0x4b0 [md_mod]
> [ 245.556732] raid10d+0x62/0x13c0 [raid10]
> [ 245.557456] ? schedule+0x3d/0xb0
> [ 245.558169] ? schedule+0x3d/0xb0
> [ 245.558803] ? schedule_timeout+0x21f/0x330
> [ 245.559593] md_thread+0x120/0x160 [md_mod]
> [ 245.560380] ? md_thread+0x120/0x160 [md_mod]
> [ 245.561202] ? wake_bit_function+0x60/0x60
> [ 245.561975] kthread+0x125/0x140
> [ 245.562601] ? find_pers+0x70/0x70 [md_mod]
> [ 245.563394] ? kthread_create_on_node+0x70/0x70
> [ 245.564516] ret_from_fork+0x25/0x30
> [ 245.565669] INFO: task dmcrypt_write:487