Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 15:50:01, Linus Torvalds wrote:
> On Wed, Apr 19, 2017 at 1:17 AM, Michal Hocko wrote:
> >
> > Thanks for the testing. Linus, will you take the patch from this thread
> > or do you prefer a resend?
>
> I'll take it from this branch since I'm looking at it now, but in
> general I prefer resends just because finding patches deep in some
> discussion is very iffy.

Yeah, I perfectly understand this and that's why I've asked. Thanks for
taking the patch!
--
Michal Hocko
SUSE Labs
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed, Apr 19, 2017 at 1:17 AM, Michal Hocko wrote:
>
> Thanks for the testing. Linus, will you take the patch from this thread
> or do you prefer a resend?

I'll take it from this branch since I'm looking at it now, but in
general I prefer resends just because finding patches deep in some
discussion is very iffy.

I get too much email, so it really helps to make the patches more
explicit than this...

            Linus
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 10:09:12, Geert Uytterhoeven wrote:
> Hi Michal, Tetsuo,
>
> On Wed, Apr 19, 2017 at 9:57 AM, Michal Hocko wrote:
> > From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
> > From: Michal Hocko
> > Date: Wed, 19 Apr 2017 09:52:46 +0200
> > Subject: [PATCH] mm: make mm_percpu_wq non freezable
> >
> > Geert has reported a freeze during PM resume and some additional
> > debugging has shown that the device_resume worker cannot make forward
> > progress because it waits for an event which is stuck waiting in
> > drain_all_pages:
> >
> > [  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> > [...]
> > [  249.441198] bash            D    0  1703   1694 0x
> > [...]
> > [  254.753928] Showing busy workqueues and worker pools:
> > [...]
> > [  254.854225] workqueue mm_percpu_wq: flags=0xc
> > [  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
> > [  254.864428]     delayed: drain_local_pages_wq, vmstat_update
> > [  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
> > [  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update
> >
> > Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
> > so it is frozen this early during resume so we are effectively
> > deadlocked. Fix this by dropping WQ_FREEZABLE when creating
> > mm_percpu_wq. We really want to have it operational all the time.
> >
> > Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
> > Reported-by: Geert Uytterhoeven
> > Debugged-by: Tetsuo Handa
> > Signed-off-by: Michal Hocko
> > Tested-by: Geert Uytterhoeven
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Hi Michal, Tetsuo,

On Wed, Apr 19, 2017 at 9:57 AM, Michal Hocko wrote:
> From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Wed, 19 Apr 2017 09:52:46 +0200
> Subject: [PATCH] mm: make mm_percpu_wq non freezable
>
> Geert has reported a freeze during PM resume and some additional
> debugging has shown that the device_resume worker cannot make forward
> progress because it waits for an event which is stuck waiting in
> drain_all_pages:
>
> [  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> [...]
> [  249.441198] bash            D    0  1703   1694 0x
> [...]
> [  254.753928] Showing busy workqueues and worker pools:
> [...]
> [  254.854225] workqueue mm_percpu_wq: flags=0xc
> [  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
> [  254.864428]     delayed: drain_local_pages_wq, vmstat_update
> [  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
> [  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update
>
> Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
> so it is frozen this early during resume so we are effectively
> deadlocked. Fix this by dropping WQ_FREEZABLE when creating
> mm_percpu_wq. We really want to have it operational all the time.
>
> Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
> Reported-by: Geert Uytterhoeven
> Debugged-by: Tetsuo Handa
> Signed-off-by: Michal Hocko

Tested-by: Geert Uytterhoeven

Thanks a lot to both of you!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                        -- Linus Torvalds
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 16:26:10, Tetsuo Handa wrote:
> > On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> > [...]
> > > Somebody is waiting forever with cpu_hotplug.lock held?
> >
> > Why would that matter for drain_all_pages? It doesn't use
> > get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> > cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> > pcp and lru-pcp draining into single wq") was merged later.
>
> Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
> wq"), we merged "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM)
> workqueue and "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into
> "mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue.
>
> -	lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
> -	vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> +	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
> +				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
>
> That means "lru-add-drain" became freezable, doesn't it? And this problem
> occurs around the resume operation, where all freezable threads are
> frozen? Then lru_add_drain_per_cpu() cannot be performed due to
> mm_percpu_wq being frozen?

Ohh, right you are! Very well spotted. I have completely missed
WQ_FREEZABLE there. The following should work:
---
From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Wed, 19 Apr 2017 09:52:46 +0200
Subject: [PATCH] mm: make mm_percpu_wq non freezable

Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:

[  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
[  243.698684]       Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
[  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.714276] kworker/u4:0    D    0     5      2 0x
[  243.719780] Workqueue: events_unbound async_run_entry_fn
[  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
[  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
[  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
[  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
[  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
[  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
[  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
[  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[...]
[  249.441198] bash            D    0  1703   1694 0x
[  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
[  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
[  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
[  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
[  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
[  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
[  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
[  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
[  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
[  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
[  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
[  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
[  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
[  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
[  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
[  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
[  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
[  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
[  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
[  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
[  249.633116] [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108)
[  249.641043] [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
[  249.648373] [] (vfs_write) from [] (SyS_write+0x40/0x80)
[  249.655437] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)
[...]
[  254.753928] Showing busy workqueues and worker pools:
[...]
[  254.854225] workqueue mm_percpu_wq: flags=0xc
[  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
[  254.864428]     delayed: drain_local_pages_wq, vmstat_update
[  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
[  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update

Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
so it is frozen this early during resume so we are effectively
deadlocked. Fix this by dropping WQ_FREEZABLE when creating mm_percpu_wq.
We really want to have it operational all the time.

Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
Reported-by: Geert Uytterhoeven
Debugged-by: Tetsuo Handa
Signed-off-by: Michal Hocko
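[Editor's note: the patch body itself is not quoted anywhere in this archived thread. Based on the changelog ("dropping WQ_FREEZABLE when creating mm_percpu_wq") and on the alloc_workqueue() hunk Tetsuo quoted, the fix presumably boils down to a one-liner along these lines; the exact file and hunk placement are assumptions, not taken from the thread:]

```diff
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ init_mm_internals() (placement assumed)
-	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
-				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+	/* must stay operational across suspend/resume: no WQ_FREEZABLE */
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
```

WQ_MEM_RECLAIM is kept, so the drain and vmstat work items retain a rescuer thread and can make forward progress under memory pressure, per the ce612879ddc7 changelog.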
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 09:16:42, Geert Uytterhoeven wrote:
> Hi Tetsuo,
>
> On Wed, Apr 19, 2017 at 7:41 AM, Tetsuo Handa wrote:
[...]
> > Somebody is waiting forever with cpu_hotplug.lock held?
> > I think that full dmesg with SysRq-t output is appreciated.
>
> As SysRq doesn't work with my serial console, I added calls to
> show_state() and show_workqueue_state() to check_hung_task().
>
> Result with current linus/master attached.

[   47.165412] Enabling non-boot CPUs ...
[   47.205615] CPU1 is up
[   47.382002] PM: noirq resume of devices complete after 174.017 msecs
[   47.390181] PM: early resume of devices complete after 1.468 msecs

OK, so this is still the early resume path AFAIU, which means that the
userspace is still in the fridge... Is it possible that new workers
cannot be spawned?

[  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
[  243.698684]       Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
[  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.714276] kworker/u4:0    D    0     5      2 0x
[  243.719780] Workqueue: events_unbound async_run_entry_fn
[  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
[  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
[  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
[  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
[  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
[  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
[  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
[  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[...]
[  249.441198] bash            D    0  1703   1694 0x
[  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
[  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
[  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
[  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
[  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
[  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
[  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
[  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
[  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
[  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
[  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
[  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
[  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
[  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
[  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
[  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
[  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
[  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
[  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
[  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
[  249.633116] [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108)
[  249.641043] [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
[  249.648373] [] (vfs_write) from [] (SyS_write+0x40/0x80)
[  249.655437] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)
[...]
[  254.753928] Showing busy workqueues and worker pools:
[...]
[  254.854225] workqueue mm_percpu_wq: flags=0xc
[  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
[  254.864428]     delayed: drain_local_pages_wq, vmstat_update
[  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
[  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update

I got lost in the indirection here. But is it possible that the
allocating context will wake up the workqueue context? Anyway, the patch
you have bisected to doesn't change a lot in this scenario, as I've said
before. If anything, the change to using a WQ for the draining rather
than smp_function_call would change the behavior. Does the below help by
any chance?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5fca73c7881a..a9a1ab7ea4c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2417,6 +2417,14 @@ void drain_all_pages(struct zone *zone)
 	static cpumask_t cpus_with_pcps;
 
 	/*
+	 * This is an uggl
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
> On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> [...]
> > Somebody is waiting forever with cpu_hotplug.lock held?
>
> Why would that matter for drain_all_pages? It doesn't use
> get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> pcp and lru-pcp draining into single wq") was merged later.

Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
wq"), we merged "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM)
workqueue and "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into
"mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue.

-	lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
-	vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);

That means "lru-add-drain" became freezable, doesn't it? And this problem
occurs around the resume operation, where all freezable threads are
frozen? Then lru_add_drain_per_cpu() cannot be performed due to
mm_percpu_wq being frozen?
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Hi Tetsuo,

On Wed, Apr 19, 2017 at 7:41 AM, Tetsuo Handa wrote:
> Geert Uytterhoeven wrote:
> > 8 locks held by s2ram/1899:
> >  #0:  (sb_writers#7){.+.+.+}, at: [] vfs_write+0xa8/0x15c
> >  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xf0/0x194
> >  #2:  (s_active#48){.+.+.+}, at: [] kernfs_fop_write+0xf8/0x194
> >  #3:  (pm_mutex){+.+.+.}, at: [] pm_suspend+0x16c/0xabc
> >  #4:  (&dev->mutex){..}, at: [] device_resume+0x58/0x190
> >  #5:  (cma_mutex){+.+...}, at: [] cma_alloc+0x150/0x374
> >  #6:  (lock){+.+...}, at: [] lru_add_drain_all+0x4c/0x1b4
> >  #7:  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x3c/0x9c
>
> I think this situation suggests that
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?
> I think that full dmesg with SysRq-t output is appreciated.

As SysRq doesn't work with my serial console, I added calls to
show_state() and show_workqueue_state() to check_hung_task().

Result with current linus/master attached.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                        -- Linus Torvalds

[Attachment: hang.gz -- GNU Zip compressed data]
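[Editor's note: the debug hack Geert describes, dumping all task backtraces and the workqueue state from the hung-task detector when SysRq-t is unavailable, would look roughly like the sketch below. This is reconstructed from his one-line description, not his actual patch; the exact placement inside check_hung_task() in kernel/hung_task.c is assumed.]

```c
/* kernel/hung_task.c (sketch, placement assumed) */
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
	...
	/* existing code prints the "blocked for more than 120 seconds"
	 * report for the hung task */

	/* debug hack: emit the equivalent of SysRq-t plus the
	 * workqueue dump on every hung-task report */
	show_state();            /* backtraces of all tasks */
	show_workqueue_state();  /* busy workqueues and worker pools */
}
```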
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?

Why would that matter for drain_all_pages? It doesn't use
get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
pcp and lru-pcp draining into single wq") was merged later.
--
Michal Hocko
SUSE Labs
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Geert Uytterhoeven wrote:
> 8 locks held by s2ram/1899:
>  #0:  (sb_writers#7){.+.+.+}, at: [] vfs_write+0xa8/0x15c
>  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xf0/0x194
>  #2:  (s_active#48){.+.+.+}, at: [] kernfs_fop_write+0xf8/0x194
>  #3:  (pm_mutex){+.+.+.}, at: [] pm_suspend+0x16c/0xabc
>  #4:  (&dev->mutex){..}, at: [] device_resume+0x58/0x190
>  #5:  (cma_mutex){+.+...}, at: [] cma_alloc+0x150/0x374
>  #6:  (lock){+.+...}, at: [] lru_add_drain_all+0x4c/0x1b4
>  #7:  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x3c/0x9c

I think this situation suggests that

int pm_suspend(suspend_state_t state) {
  error = enter_state(state) {
    if (!mutex_trylock(&pm_mutex)) /* #3 */
      return -EBUSY;
    error = suspend_devices_and_enter(state) {
      error = suspend_enter(state, &wakeup) {
        enable_nonboot_cpus() {
          cpu_maps_update_begin() {
            mutex_lock(&cpu_add_remove_lock);
          }
          pr_info("Enabling non-boot CPUs ...\n");
          for_each_cpu(cpu, frozen_cpus) {
            error = _cpu_up(cpu, 1, CPUHP_ONLINE) {
              cpu_hotplug_begin() {
                mutex_lock(&cpu_hotplug.lock);
              }
              cpu_hotplug_done() {
                mutex_unlock(&cpu_hotplug.lock);
              }
            }
            if (!error) {
              pr_info("CPU%d is up\n", cpu);
              continue;
            }
          }
          cpu_maps_update_done() {
            mutex_unlock(&cpu_add_remove_lock);
          }
        }
      }
      dpm_resume_end(PMSG_RESUME) {
        dpm_resume(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_suspended_list)) {
            mutex_unlock(&dpm_list_mtx);
            error = device_resume(dev, state, false) {
              dpm_wait_for_superior(dev, async);
              dpm_watchdog_set(&wd, dev);
              device_lock(dev) {
                mutex_lock(&dev->mutex); /* #4 */
              }
              error = dpm_run_callback(callback, dev, state, info) {
                cma_alloc() {
                  mutex_lock(&cma_mutex); /* #5 */
                  alloc_contig_range() {
                    lru_add_drain_all() {
                      mutex_lock(&lock); /* #6 */
                      get_online_cpus() {
                        mutex_lock(&cpu_hotplug.lock); /* #7 hang? */
                        mutex_unlock(&cpu_hotplug.lock);
                      }
                      put_online_cpus();
                      mutex_unlock(&lock); /* #6 */
                    }
                  }
                  mutex_unlock(&cma_mutex); /* #5 */
                }
              }
              device_unlock(dev) {
                mutex_unlock(&dev->mutex); /* #4 */
              }
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
        dpm_complete(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_prepared_list)) {
            mutex_unlock(&dpm_list_mtx);
            device_complete(dev, state) {
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
      }
    }
    mutex_unlock(&pm_mutex); /* #3 */
  }
}

Somebody is waiting forever with cpu_hotplug.lock held?
I think that full dmesg with SysRq-t output is appreciated.
Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Tue 18-04-17 21:56:56, Geert Uytterhoeven wrote:
> Hi all,
>
> On Sat, Apr 8, 2017 at 10:48 AM, Linux Kernel Mailing List wrote:
> > Web:        https://git.kernel.org/torvalds/c/ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Commit:     ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Parent:     cdcf4330d5660998d06fcd899b443693ab3d652f
> > Refname:    refs/heads/master
> > Author:     Michal Hocko
> > AuthorDate: Fri Apr 7 16:05:05 2017 -0700
> > Committer:  Linus Torvalds
> > CommitDate: Sat Apr 8 00:47:49 2017 -0700
> >
> >     mm: move pcp and lru-pcp draining into single wq
> >
> >     We currently have 2 specific WQ_RECLAIM workqueues in the mm code:
> >     vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to
> >     draining per cpu lru caches.  This seems more than necessary because
> >     both can run on a single WQ.  Both do not block on locks requiring a
> >     memory allocation nor perform any allocations themselves.  We will
> >     save one rescuer thread this way.
> >
> >     On the other hand drain_all_pages() queues work on the system wq
> >     which doesn't have a rescuer and so this depends on memory allocation
> >     (when all workers are stuck allocating and new ones cannot be
> >     created).
> >
> >     Initially we thought this would be more of a theoretical problem but
> >     Hugh Dickins has reported:
> >
> >     : 4.11-rc has been giving me hangs after hours of swapping load.  At
> >     : first they looked like memory leaks ("fork: Cannot allocate memory");
> >     : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> >     : before looking at /proc/meminfo one time, and the stat_refresh stuck
> >     : in D state, waiting for completion of flush_work like many kworkers.
> >     : kthreadd waiting for completion of flush_work in drain_all_pages().
> >
> >     This worker should be using WQ_RECLAIM as well in order to guarantee
> >     forward progress.  We can reuse the same one as for lru draining and
> >     vmstat.
> >
> >     Link: http://lkml.kernel.org/r/20170307131751.24936-1-mho...@kernel.org
> >     Signed-off-by: Michal Hocko
> >     Suggested-by: Tetsuo Handa
> >     Acked-by: Vlastimil Babka
> >     Acked-by: Mel Gorman
> >     Tested-by: Yang Li
> >     Tested-by: Hugh Dickins
> >     Signed-off-by: Andrew Morton
> >     Signed-off-by: Linus Torvalds
>
> This commit broke resume from s2ram on some of my Renesas ARM boards.
> On some boards the hang is 100% reproducible, on others it's intermittent
> (which was a PITA, as I had to bisect another independent s2ram bug as
> well).

Hmm, I am rather confused how the above commit could change anything
here. Your lockup detector is hitting

	dpm_wait_for_superior
	  dpm_wait(dev->parent, async);
	  dpm_wait_for_suppliers(dev, async);

which in turn waits in

	wait_for_completion(&dev->power.completion)

The above commit has reduced the load on the system WQ. It also removed
one WQ and reused the existing one. The work done on the mm_percpu_wq
doesn't block, so I suspect that what you are seeing is just showing a
real bug somewhere else. I will have a look tomorrow. Let's add Tejun;
maybe I have introduced some subtle dependency which is not clear to me.

> On r8a7791/koelsch:
>
> --- /tmp/good	2017-04-18 21:47:04.457156167 +0200
> +++ /tmp/bad	2017-04-18 21:43:26.215240325 +0200
> @@ -13,11 +13,178 @@ Enabling non-boot CPUs ...
>  CPU1 is up
>  PM: noirq resume of devices complete after N.N msecs
>  PM: early resume of devices complete after N.N msecs
> -Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> -PM: resume of devices complete after N.N msecs
> -PM: resume devices took N.N seconds
> -PM: Finishing wakeup.
> -Restarting tasks ... done.
> -ata1: link resume succeeded after 1 retries
> -ata1: SATA link down (SStatus 0 SControl 300)
> -sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> +INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> +      Not tainted 4.11.0-rc7-koelsch-00426-g70412b99f7936b37 #3470
> +"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> +kworker/u4:0    D    0     5      2 0x
> +Workqueue: events_unbound async_run_entry_fn
> +[] (__schedule) from [] (schedule+0xb0/0xcc)
> +[] (schedule) from [] (schedule_timeout+0x18/0x1f4)
> +[] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
> +[] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
> +[] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
> +[] (device_resume) from [] (async_resume+0x18/0x44)
> +[] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
> +[] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
> +[] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
> +[] (worker_thread) from [] (kthread+0x120/0x140)
> +[] (kthread) from [] (ret_from_fork+0x14/0x2c