Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 15:50:01, Linus Torvalds wrote:
> On Wed, Apr 19, 2017 at 1:17 AM, Michal Hocko wrote:
> >
> > Thanks for the testing. Linus, will you take the patch from this thread
> > or do you prefer a resend?
>
> I'll take it from this branch since I'm looking at it now, but in
> general I prefer resends just because finding patches deep in some
> discussion is very iffy.

Yeah, I perfectly understand this and that's why I've asked. Thanks for
taking the patch!
--
Michal Hocko
SUSE Labs
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed, Apr 19, 2017 at 1:17 AM, Michal Hocko wrote:
>
> Thanks for the testing. Linus, will you take the patch from this thread
> or do you prefer a resend?

I'll take it from this branch since I'm looking at it now, but in
general I prefer resends just because finding patches deep in some
discussion is very iffy.

I get too much email, so it really helps to make the patches more
explicit than this...

            Linus
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 10:09:12, Geert Uytterhoeven wrote:
> Hi Michal, Tetsuo,
>
> On Wed, Apr 19, 2017 at 9:57 AM, Michal Hocko wrote:
> > From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
> > From: Michal Hocko
> > Date: Wed, 19 Apr 2017 09:52:46 +0200
> > Subject: [PATCH] mm: make mm_percpu_wq non freezable
> >
> > Geert has reported a freeze during PM resume and some additional
> > debugging has shown that the device_resume worker cannot make forward
> > progress because it waits for an event which is stuck waiting in
> > drain_all_pages:
> >
> > [  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> > [...]
> > [  249.441198] bash            D    0  1703   1694 0x
> > [...]
> > [  254.753928] Showing busy workqueues and worker pools:
> > [...]
> > [  254.854225] workqueue mm_percpu_wq: flags=0xc
> > [  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
> > [  254.864428]     delayed: drain_local_pages_wq, vmstat_update
> > [  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
> > [  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update
> >
> > Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
> > so it is frozen this early during resume so we are effectively
> > deadlocked. Fix this by dropping WQ_FREEZABLE when creating
> > mm_percpu_wq. We really want to have it operational all the time.
> >
> > Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
> > Reported-by: Geert Uytterhoeven
> > Debugged-by: Tetsuo Handa
> > Signed-off-by: Michal Hocko
> > Tested-by: Geert Uytterhoeven
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Hi Michal, Tetsuo,

On Wed, Apr 19, 2017 at 9:57 AM, Michal Hocko wrote:
> From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Wed, 19 Apr 2017 09:52:46 +0200
> Subject: [PATCH] mm: make mm_percpu_wq non freezable
>
> Geert has reported a freeze during PM resume and some additional
> debugging has shown that the device_resume worker cannot make forward
> progress because it waits for an event which is stuck waiting in
> drain_all_pages:
>
> [  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> [...]
> [  249.441198] bash            D    0  1703   1694 0x
> [...]
> [  254.753928] Showing busy workqueues and worker pools:
> [...]
> [  254.854225] workqueue mm_percpu_wq: flags=0xc
> [  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
> [  254.864428]     delayed: drain_local_pages_wq, vmstat_update
> [  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
> [  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update
>
> Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
> so it is frozen this early during resume so we are effectively
> deadlocked. Fix this by dropping WQ_FREEZABLE when creating
> mm_percpu_wq. We really want to have it operational all the time.
>
> Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
> Reported-by: Geert Uytterhoeven
> Debugged-by: Tetsuo Handa
> Signed-off-by: Michal Hocko

Tested-by: Geert Uytterhoeven

Thanks a lot to both of you!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                        -- Linus Torvalds
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 16:26:10, Tetsuo Handa wrote:
> > On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> > [...]
> > > Somebody is waiting forever with cpu_hotplug.lock held?
> >
> > Why would that matter for drain_all_pages? It doesn't use
> > get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> > cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> > pcp and lru-pcp draining into single wq") was merged later.
>
> Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
> wq"), we merged "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM)
> workqueue and "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into
> "mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue.
>
> -	lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
> -	vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> +	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
> +				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
>
> That means "lru-add-drain" became freezable, doesn't it? And this problem
> occurs around the resume operation, where all freezable threads are
> frozen? Then lru_add_drain_per_cpu() cannot be performed due to
> mm_percpu_wq being frozen?

Ohh, right you are! Very well spotted. I have completely missed
WQ_FREEZABLE there. The following should work:
---
From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Wed, 19 Apr 2017 09:52:46 +0200
Subject: [PATCH] mm: make mm_percpu_wq non freezable

Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:

[  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
[  243.698684]       Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
[  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.714276] kworker/u4:0    D    0     5      2 0x
[  243.719780] Workqueue: events_unbound async_run_entry_fn
[  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
[  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
[  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
[  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
[  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
[  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
[  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
[  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[...]
[  249.441198] bash            D    0  1703   1694 0x
[  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
[  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
[  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
[  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
[  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
[  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
[  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
[  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
[  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
[  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
[  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
[  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
[  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
[  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
[  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
[  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
[  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
[  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
[  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
[  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
[  249.633116] [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108)
[  249.641043] [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
[  249.648373] [] (vfs_write) from [] (SyS_write+0x40/0x80)
[  249.655437] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)
[...]
[  254.753928] Showing busy workqueues and worker pools:
[...]
[  254.854225] workqueue mm_percpu_wq: flags=0xc
[  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
[  254.864428]     delayed: drain_local_pages_wq, vmstat_update
[  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
[  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update

Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
so it is frozen this early during resume so we are effectively
deadlocked. Fix this by dropping WQ_FREEZABLE when creating mm_percpu_wq.
We really want to have it operational all the time.

Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
Reported-by: Geert Uytterhoeven
Debugged-by: Tetsuo Handa
Signed-off-by: Michal Hocko
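[Editor's note: the patch body itself is not quoted anywhere in this archived thread. Based on the changelog ("dropping WQ_FREEZABLE when creating mm_percpu_wq") and on the alloc_workqueue() hunk Tetsuo quoted, the fix presumably boils down to a one-liner along these lines; the exact file and hunk placement are assumptions, not taken from the thread:]

```diff
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ init_mm_internals() (placement assumed)
-	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
-				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+	/* must stay operational across suspend/resume: no WQ_FREEZABLE */
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
```

WQ_MEM_RECLAIM is kept, so the drain and vmstat work items retain a rescuer thread and can make forward progress under memory pressure, per the ce612879ddc7 changelog.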
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 09:16:42, Geert Uytterhoeven wrote:
> Hi Tetsuo,
>
> On Wed, Apr 19, 2017 at 7:41 AM, Tetsuo Handa wrote:
[...]
> > Somebody is waiting forever with cpu_hotplug.lock held?
> > I think that full dmesg with SysRq-t output is appreciated.
>
> As SysRq doesn't work with my serial console, I added calls to
> show_state() and show_workqueue_state() to check_hung_task().
>
> Result with current linus/master attached.

[   47.165412] Enabling non-boot CPUs ...
[   47.205615] CPU1 is up
[   47.382002] PM: noirq resume of devices complete after 174.017 msecs
[   47.390181] PM: early resume of devices complete after 1.468 msecs

OK, so this is still the early resume path AFAIU, which means that the
userspace is still in the fridge... Is it possible that new workers
cannot be spawned?

[  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
[  243.698684]       Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
[  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.714276] kworker/u4:0    D    0     5      2 0x
[  243.719780] Workqueue: events_unbound async_run_entry_fn
[  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
[  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
[  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
[  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
[  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
[  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
[  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
[  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[...]
[  249.441198] bash            D    0  1703   1694 0x
[  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
[  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
[  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
[  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
[  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
[  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
[  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
[  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
[  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
[  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
[  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
[  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
[  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
[  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
[  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
[  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
[  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
[  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
[  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
[  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
[  249.633116] [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108)
[  249.641043] [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
[  249.648373] [] (vfs_write) from [] (SyS_write+0x40/0x80)
[  249.655437] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)
[...]
[  254.753928] Showing busy workqueues and worker pools:
[...]
[  254.854225] workqueue mm_percpu_wq: flags=0xc
[  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
[  254.864428]     delayed: drain_local_pages_wq, vmstat_update
[  254.870111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
[  254.875957]     delayed: drain_local_pages_wq BAR(1703), vmstat_update

I got lost in the indirection here. But is it possible that the
allocating context will wake up the workqueue context? Anyway, the patch
you have bisected to doesn't change a lot in this scenario, as I've said
before. If anything, the change to using a WQ for the draining rather
than smp_function_call would change the behavior. Does the below help by
any chance?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5fca73c7881a..a9a1ab7ea4c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2417,6 +2417,14 @@ void drain_all_pages(struct zone *zone)
 	static cpumask_t cpus_with_pcps;
 
 	/*
+	 * This is an uggl
Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
> On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> [...]
> > Somebody is waiting forever with cpu_hotplug.lock held?
>
> Why would that matter for drain_all_pages? It doesn't use
> get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> pcp and lru-pcp draining into single wq") was merged later.

Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
wq"), we merged "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM)
workqueue and "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into
"mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue.

-	lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
-	vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);

That means "lru-add-drain" became freezable, doesn't it? And this problem
occurs around the resume operation, where all freezable threads are
frozen? Then lru_add_drain_per_cpu() cannot be performed due to
mm_percpu_wq being frozen?
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Hi Tetsuo,

On Wed, Apr 19, 2017 at 7:41 AM, Tetsuo Handa wrote:
> Geert Uytterhoeven wrote:
> > 8 locks held by s2ram/1899:
> >  #0:  (sb_writers#7){.+.+.+}, at: [] vfs_write+0xa8/0x15c
> >  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xf0/0x194
> >  #2:  (s_active#48){.+.+.+}, at: [] kernfs_fop_write+0xf8/0x194
> >  #3:  (pm_mutex){+.+.+.}, at: [] pm_suspend+0x16c/0xabc
> >  #4:  (&dev->mutex){..}, at: [] device_resume+0x58/0x190
> >  #5:  (cma_mutex){+.+...}, at: [] cma_alloc+0x150/0x374
> >  #6:  (lock){+.+...}, at: [] lru_add_drain_all+0x4c/0x1b4
> >  #7:  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x3c/0x9c
>
> I think this situation suggests that
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?
> I think that full dmesg with SysRq-t output is appreciated.

As SysRq doesn't work with my serial console, I added calls to
show_state() and show_workqueue_state() to check_hung_task().

Result with current linus/master attached.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                        -- Linus Torvalds

[Attachment: hang.gz -- GNU Zip compressed data]
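[Editor's note: the debug hack Geert describes, dumping all task backtraces and the workqueue state from the hung-task detector when SysRq-t is unavailable, would look roughly like the sketch below. This is reconstructed from his one-line description, not his actual patch; the exact placement inside check_hung_task() in kernel/hung_task.c is assumed.]

```c
/* kernel/hung_task.c (sketch, placement assumed) */
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
	...
	/* existing code prints the "blocked for more than 120 seconds"
	 * report for the hung task */

	/* debug hack: emit the equivalent of SysRq-t plus the
	 * workqueue dump on every hung-task report */
	show_state();            /* backtraces of all tasks */
	show_workqueue_state();  /* busy workqueues and worker pools */
}
```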
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?

Why would that matter for drain_all_pages? It doesn't use
get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
pcp and lru-pcp draining into single wq") was merged later.
--
Michal Hocko
SUSE Labs
Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
Geert Uytterhoeven wrote:
> 8 locks held by s2ram/1899:
>  #0:  (sb_writers#7){.+.+.+}, at: [] vfs_write+0xa8/0x15c
>  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xf0/0x194
>  #2:  (s_active#48){.+.+.+}, at: [] kernfs_fop_write+0xf8/0x194
>  #3:  (pm_mutex){+.+.+.}, at: [] pm_suspend+0x16c/0xabc
>  #4:  (&dev->mutex){..}, at: [] device_resume+0x58/0x190
>  #5:  (cma_mutex){+.+...}, at: [] cma_alloc+0x150/0x374
>  #6:  (lock){+.+...}, at: [] lru_add_drain_all+0x4c/0x1b4
>  #7:  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x3c/0x9c

I think this situation suggests that

int pm_suspend(suspend_state_t state) {
  error = enter_state(state) {
    if (!mutex_trylock(&pm_mutex)) /* #3 */
      return -EBUSY;
    error = suspend_devices_and_enter(state) {
      error = suspend_enter(state, &wakeup) {
        enable_nonboot_cpus() {
          cpu_maps_update_begin() {
            mutex_lock(&cpu_add_remove_lock);
          }
          pr_info("Enabling non-boot CPUs ...\n");
          for_each_cpu(cpu, frozen_cpus) {
            error = _cpu_up(cpu, 1, CPUHP_ONLINE) {
              cpu_hotplug_begin() {
                mutex_lock(&cpu_hotplug.lock);
              }
              cpu_hotplug_done() {
                mutex_unlock(&cpu_hotplug.lock);
              }
            }
            if (!error) {
              pr_info("CPU%d is up\n", cpu);
              continue;
            }
          }
          cpu_maps_update_done() {
            mutex_unlock(&cpu_add_remove_lock);
          }
        }
      }
      dpm_resume_end(PMSG_RESUME) {
        dpm_resume(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_suspended_list)) {
            mutex_unlock(&dpm_list_mtx);
            error = device_resume(dev, state, false) {
              dpm_wait_for_superior(dev, async);
              dpm_watchdog_set(&wd, dev);
              device_lock(dev) {
                mutex_lock(&dev->mutex); /* #4 */
              }
              error = dpm_run_callback(callback, dev, state, info) {
                cma_alloc() {
                  mutex_lock(&cma_mutex); /* #5 */
                  alloc_contig_range() {
                    lru_add_drain_all() {
                      mutex_lock(&lock); /* #6 */
                      get_online_cpus() {
                        mutex_lock(&cpu_hotplug.lock); /* #7 hang? */
                        mutex_unlock(&cpu_hotplug.lock);
                      }
                      put_online_cpus();
                      mutex_unlock(&lock); /* #6 */
                    }
                  }
                  mutex_unlock(&cma_mutex); /* #5 */
                }
              }
              device_unlock(dev) {
                mutex_unlock(&dev->mutex); /* #4 */
              }
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
        dpm_complete(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_prepared_list)) {
            mutex_unlock(&dpm_list_mtx);
            device_complete(dev, state) {
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
      }
    }
    mutex_unlock(&pm_mutex); /* #3 */
  }
}

Somebody is waiting forever with cpu_hotplug.lock held?
I think that full dmesg with SysRq-t output is appreciated.
Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram
On Tue 18-04-17 21:56:56, Geert Uytterhoeven wrote:
> Hi all,
>
> On Sat, Apr 8, 2017 at 10:48 AM, Linux Kernel Mailing List wrote:
> > Web:        https://git.kernel.org/torvalds/c/ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Commit:     ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Parent:     cdcf4330d5660998d06fcd899b443693ab3d652f
> > Refname:    refs/heads/master
> > Author:     Michal Hocko
> > AuthorDate: Fri Apr 7 16:05:05 2017 -0700
> > Committer:  Linus Torvalds
> > CommitDate: Sat Apr 8 00:47:49 2017 -0700
> >
> >     mm: move pcp and lru-pcp draining into single wq
> >
> >     We currently have 2 specific WQ_RECLAIM workqueues in the mm code:
> >     vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to
> >     draining per cpu lru caches.  This seems more than necessary because
> >     both can run on a single WQ.  Both do not block on locks requiring a
> >     memory allocation nor perform any allocations themselves.  We will
> >     save one rescuer thread this way.
> >
> >     On the other hand drain_all_pages() queues work on the system wq
> >     which doesn't have a rescuer and so this depends on memory allocation
> >     (when all workers are stuck allocating and new ones cannot be
> >     created).
> >
> >     Initially we thought this would be more of a theoretical problem but
> >     Hugh Dickins has reported:
> >
> >     : 4.11-rc has been giving me hangs after hours of swapping load.  At
> >     : first they looked like memory leaks ("fork: Cannot allocate memory");
> >     : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> >     : before looking at /proc/meminfo one time, and the stat_refresh stuck
> >     : in D state, waiting for completion of flush_work like many kworkers.
> >     : kthreadd waiting for completion of flush_work in drain_all_pages().
> >
> >     This worker should be using WQ_RECLAIM as well in order to guarantee
> >     forward progress.  We can reuse the same one as for lru draining and
> >     vmstat.
> >
> >     Link: http://lkml.kernel.org/r/20170307131751.24936-1-mho...@kernel.org
> >     Signed-off-by: Michal Hocko
> >     Suggested-by: Tetsuo Handa
> >     Acked-by: Vlastimil Babka
> >     Acked-by: Mel Gorman
> >     Tested-by: Yang Li
> >     Tested-by: Hugh Dickins
> >     Signed-off-by: Andrew Morton
> >     Signed-off-by: Linus Torvalds
>
> This commit broke resume from s2ram on some of my Renesas ARM boards.
> On some boards the hang is 100% reproducible, on others it's intermittent
> (which was a PITA, as I had to bisect another independent s2ram bug as
> well).

Hmm, I am rather confused how the above commit could change anything
here. Your lockup detector is hitting

	dpm_wait_for_superior
	  dpm_wait(dev->parent, async);
	  dpm_wait_for_suppliers(dev, async);

which in turn waits in

	wait_for_completion(&dev->power.completion)

The above commit has reduced the load on the system WQ. It also removed
one WQ and reused the existing one. The work done on the mm_percpu_wq
doesn't block, so I suspect that what you are seeing is just showing a
real bug somewhere else. I will have a look tomorrow. Let's add Tejun;
maybe I have introduced some subtle dependency which is not clear to me.

> On r8a7791/koelsch:
>
> --- /tmp/good	2017-04-18 21:47:04.457156167 +0200
> +++ /tmp/bad	2017-04-18 21:43:26.215240325 +0200
> @@ -13,11 +13,178 @@ Enabling non-boot CPUs ...
>  CPU1 is up
>  PM: noirq resume of devices complete after N.N msecs
>  PM: early resume of devices complete after N.N msecs
> -Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> -PM: resume of devices complete after N.N msecs
> -PM: resume devices took N.N seconds
> -PM: Finishing wakeup.
> -Restarting tasks ... done.
> -ata1: link resume succeeded after 1 retries
> -ata1: SATA link down (SStatus 0 SControl 300)
> -sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> +INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> +      Not tainted 4.11.0-rc7-koelsch-00426-g70412b99f7936b37 #3470
> +"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> +kworker/u4:0    D    0     5      2 0x
> +Workqueue: events_unbound async_run_entry_fn
> +[] (__schedule) from [] (schedule+0xb0/0xcc)
> +[] (schedule) from [] (schedule_timeout+0x18/0x1f4)
> +[] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
> +[] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
> +[] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
> +[] (device_resume) from [] (async_resume+0x18/0x44)
> +[] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
> +[] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
> +[] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
> +[] (worker_thread) from [] (kthread+0x120/0x140)
> +[] (kthread) from [] (ret_from_fork+0x14/0x2c