Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-20 Thread Michal Hocko
On Wed 19-04-17 15:50:01, Linus Torvalds wrote:
> On Wed, Apr 19, 2017 at 1:17 AM, Michal Hocko <mho...@kernel.org> wrote:
> >
> > Thanks for the testing. Linus, will you take the patch from this thread
> > or do you prefer a resend?
> 
> I'll take it from this branch since I'm looking at it now, but in
> general I prefer resends just because finding patches deep in some
> discussion is very iffy.

Yeah, I perfectly understand this and that's why I've asked. Thanks for
taking the patch!

-- 
Michal Hocko
SUSE Labs


Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-19 Thread Michal Hocko
On Wed 19-04-17 10:09:12, Geert Uytterhoeven wrote:
> Hi Michal, Tetsuo,
> 
> On Wed, Apr 19, 2017 at 9:57 AM, Michal Hocko <mho...@kernel.org> wrote:
> > From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mho...@suse.com>
> > Date: Wed, 19 Apr 2017 09:52:46 +0200
> > Subject: [PATCH] mm: make mm_percpu_wq non freezable
> >
> > Geert has reported a freeze during PM resume and some additional
> > debugging has shown that the device_resume worker cannot make forward
> > progress because it waits for an event which is stuck waiting in
> > drain_all_pages:
> > [  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> > [  243.698684]   Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
> > [  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  243.714276] kworker/u4:0    D 0     5      2 0x
> > [  243.719780] Workqueue: events_unbound async_run_entry_fn
> > [  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
> > [  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
> > [  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
> > [  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
> > [  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
> > [  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
> > [  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
> > [  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
> > [  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
> > [  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
> > [  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
> > [...]
> > [  249.441198] bash            D 0  1703   1694 0x
> > [  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
> > [  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
> > [  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
> > [  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
> > [  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
> > [  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
> > [  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
> > [  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
> > [  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
> > [  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
> > [  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
> > [  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
> > [  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
> > [  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
> > [  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
> > [  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
> > [  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
> > [  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
> > [  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
> > [  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
> > [  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
> > [  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
> > [  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
> > [  249.633116] [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108)
> > [  249.641043] [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
> > [  249.648373] [] (vfs_write) from [] (SyS_write+0x40/0x80)
> > [  249.655437] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)
> > [...]
> > [  254.753928] Showing busy workqueues and worker pools:
> > [...]
> > [  254.854225] workqueue mm_percpu_wq: flags=0xc
> > [  254.858583]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
> > [  254.864428] delayed: drai
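
(The workqueue dump above is truncated by the archive, but the key detail is visible: mm_percpu_wq has its drain work in the "delayed" list, i.e. queued but never run. Combining the two stack traces gives roughly the following picture; this is a schematic illustration, not kernel code.)

	/*
	 * bash (suspend/resume path)        kworker/u4:0 (async resume)
	 * --------------------------        ---------------------------
	 * pm_suspend()                      async_run_entry_fn()
	 *   dpm_resume()                      device_resume()
	 *     sh_eth_resume()                   dpm_wait_for_superior()
	 *       cma_alloc()                       waits for other devices,
	 *         drain_all_pages()               including sh_eth, to
	 *           flush_work() <- stuck         finish resuming
	 *
	 * The flushed work item is queued on mm_percpu_wq; if that queue is
	 * WQ_FREEZABLE it is only thawed after all devices have resumed and
	 * tasks are restarted, which never happens here because resume
	 * itself is blocked in flush_work(). Hence the hang.
	 */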

Re: Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-19 Thread Michal Hocko
On Wed 19-04-17 16:26:10, Tetsuo Handa wrote:
> > On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
> > [...]
> > > Somebody is waiting forever with cpu_hotplug.lock held?
> > 
> > Why would that matter for drain_all_pages? It doesn't use
> > get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
> > cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
> > pcp and lru-pcp draining into single wq") was merged later.
> > 
> 
> Looking at ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq"),
> we merged the "lru-add-drain" (!WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue and
> the "vmstat" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue into the
> "mm_percpu_wq" (WQ_FREEZABLE && WQ_MEM_RECLAIM) workqueue:
> 
> -   lru_add_drain_wq = alloc_workqueue("lru-add-drain", WQ_MEM_RECLAIM, 0);
> -   vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> +   mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
> +  WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> 
> That means "lru-add-drain" became freezable, doesn't it? And this problem
> occurs around the resume operation, where all freezable threads are frozen?
> Then lru_add_drain_per_cpu() cannot run because mm_percpu_wq is frozen?

Ohh, right you are! Very well spotted. I have completely missed
WQ_FREEZABLE there. The following should work
---
From f3c6e287042259d6ae9916f1ff66392c46ce2a3c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mho...@suse.com>
Date: Wed, 19 Apr 2017 09:52:46 +0200
Subject: [PATCH] mm: make mm_percpu_wq non freezable

Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:
[  243.691979] INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
[  243.698684]   Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
[  243.706439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.714276] kworker/u4:0    D 0     5      2 0x
[  243.719780] Workqueue: events_unbound async_run_entry_fn
[  243.725118] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  243.732181] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  243.739840] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  243.748207] [] (wait_for_common) from [] (dpm_wait_for_superior+0x14/0x5c)
[  243.756836] [] (dpm_wait_for_superior) from [] (device_resume+0x40/0x1a0)
[  243.765380] [] (device_resume) from [] (async_resume+0x18/0x44)
[  243.773055] [] (async_resume) from [] (async_run_entry_fn+0x44/0x114)
[  243.781245] [] (async_run_entry_fn) from [] (process_one_work+0x1cc/0x31c)
[  243.789876] [] (process_one_work) from [] (worker_thread+0x2b8/0x3f0)
[  243.798080] [] (worker_thread) from [] (kthread+0x120/0x140)
[  243.805500] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[...]
[  249.441198] bash            D 0  1703   1694 0x
[  249.446702] [] (__schedule) from [] (schedule+0xb0/0xcc)
[  249.453764] [] (schedule) from [] (schedule_timeout+0x18/0x1f4)
[  249.461427] [] (schedule_timeout) from [] (wait_for_common+0x100/0x19c)
[  249.469797] [] (wait_for_common) from [] (flush_work+0x128/0x158)
[  249.477650] [] (flush_work) from [] (drain_all_pages+0x198/0x1f0)
[  249.485503] [] (drain_all_pages) from [] (start_isolate_page_range+0xd8/0x1ac)
[  249.494484] [] (start_isolate_page_range) from [] (alloc_contig_range+0xc4/0x304)
[  249.503724] [] (alloc_contig_range) from [] (cma_alloc+0x134/0x1bc)
[  249.511739] [] (cma_alloc) from [] (__alloc_from_contiguous+0x30/0xa0)
[  249.520023] [] (__alloc_from_contiguous) from [] (cma_allocator_alloc+0x40/0x48)
[  249.529173] [] (cma_allocator_alloc) from [] (__dma_alloc+0x1d4/0x2e8)
[  249.537455] [] (__dma_alloc) from [] (arm_dma_alloc+0x40/0x4c)
[  249.545047] [] (arm_dma_alloc) from [] (sh_eth_ring_init+0xec/0x1b8)
[  249.553160] [] (sh_eth_ring_init) from [] (sh_eth_open+0x88/0x1e0)
[  249.561086] [] (sh_eth_open) from [] (sh_eth_resume+0x7c/0xc0)
[  249.568678] [] (sh_eth_resume) from [] (dpm_run_callback+0x48/0xc8)
[  249.576702] [] (dpm_run_callback) from [] (device_resume+0x15c/0x1a0)
[  249.584898] [] (device_resume) from [] (dpm_resume+0xe4/0x244)
[  249.592485] [] (dpm_resume) from [] (dpm_resume_end+0xc/0x18)
[  249.599977] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x3c8/0x490)
[  249.609042] [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280)
[  249.617759] [] (pm_suspend) from [] (state_store+0xac/0xcc)
[  249.625089] [] (state_store) from [] (kernfs_fop_write+0x164/0x1a0)
[  249.633116] [] (kernfs_fop_write) from
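
(The patch body is cut off by the archive at this point. Judging from the subject line and the quoted analysis above, the fix presumably just removes WQ_FREEZABLE where ce612879ddc7 allocates the queue, believed to be init_mm_internals() in mm/vmstat.c. A reconstructed sketch of the likely diff, not the verbatim patch:)

	--- a/mm/vmstat.c
	+++ b/mm/vmstat.c
	@@ void __init init_mm_internals(void)
	-	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
	-				       WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
	+	/* must not be freezable: the resume path drains pages while
	+	 * freezable kworkers are still frozen */
	+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);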

Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-19 Thread Michal Hocko
struct zone *zone)
	static cpumask_t cpus_with_pcps;

	/*
+	 * This is an ugly hack but let's back off in the early PM suspend/resume
+	 * paths because the whole infrastructure might not be available yet for
+	 * us - namely kworkers might be still frozen
+	 */
+	if (pm_suspended_storage())
+		return;
+
+	/*
	 * Make sure nobody triggers this path before mm_percpu_wq is fully
	 * initialized.
	 */
-- 
Michal Hocko
SUSE Labs
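
(For reference, pm_suspended_storage() in the hack above is an existing helper from include/linux/gfp.h: it reports whether the PM core has restricted gfp_allowed_mask, which is what makes it usable as an "are we inside the suspend/resume window" test. Its definition around 4.11 is roughly the following; this is believed to match the tree at the time:)

	/* gfp_allowed_mask loses __GFP_IO/__GFP_FS between
	 * pm_restrict_gfp_mask() at suspend entry and pm_restore_gfp_mask()
	 * late in resume, i.e. exactly the window in which freezable
	 * kworkers may still be frozen. */
	static inline bool pm_suspended_storage(void)
	{
		if ((gfp_allowed_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
			return false;
		return true;
	}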


Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-19 Thread Michal Hocko
On Wed 19-04-17 14:41:30, Tetsuo Handa wrote:
[...]
> Somebody is waiting forever with cpu_hotplug.lock held?

Why would that matter for drain_all_pages? It doesn't use
get_online_cpus since a459eeb7b852 ("mm, page_alloc: do not depend on
cpu hotplug locks inside the allocator") while ce612879ddc7 ("mm: move
pcp and lru-pcp draining into single wq") was merged later.

-- 
Michal Hocko
SUSE Labs


Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-18 Thread Michal Hocko
On Tue 18-04-17 21:56:56, Geert Uytterhoeven wrote:
> Hi all,
> 
> On Sat, Apr 8, 2017 at 10:48 AM, Linux Kernel Mailing List
> <linux-ker...@vger.kernel.org> wrote:
> > Web:
> > https://git.kernel.org/torvalds/c/ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Commit: ce612879ddc78ea7e4de4be80cba4ebf9caa07ee
> > Parent: cdcf4330d5660998d06fcd899b443693ab3d652f
> > Refname:    refs/heads/master
> > Author: Michal Hocko <mho...@suse.com>
> > AuthorDate: Fri Apr 7 16:05:05 2017 -0700
> > Committer:  Linus Torvalds <torva...@linux-foundation.org>
> > CommitDate: Sat Apr 8 00:47:49 2017 -0700
> >
> > mm: move pcp and lru-pcp draining into single wq
> >
> > We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
> > vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
> > per cpu lru caches.  This seems more than necessary because both can run
> > on a single WQ.  Both do not block on locks requiring a memory
> > allocation nor perform any allocations themselves.  We will save one
> > rescuer thread this way.
> >
> > On the other hand drain_all_pages() queues work on the system wq, which
> > doesn't have a rescuer, and so this depends on memory allocation (when all
> > workers are stuck allocating and new ones cannot be created).
> >
> > Initially we thought this would be more of a theoretical problem but
> > Hugh Dickins has reported:
> >
> > : 4.11-rc has been giving me hangs after hours of swapping load.  At
> > : first they looked like memory leaks ("fork: Cannot allocate memory");
> > : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> > : before looking at /proc/meminfo one time, and the stat_refresh stuck
> > : in D state, waiting for completion of flush_work like many kworkers.
> > : kthreadd waiting for completion of flush_work in drain_all_pages().
> >
> > This worker should be using WQ_RECLAIM as well in order to guarantee a
> > forward progress.  We can reuse the same one as for lru draining and
> > vmstat.
> >
> > Link: http://lkml.kernel.org/r/20170307131751.24936-1-mho...@kernel.org
> > Signed-off-by: Michal Hocko <mho...@suse.com>
> > Suggested-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>
> > Acked-by: Vlastimil Babka <vba...@suse.cz>
> > Acked-by: Mel Gorman <mgor...@suse.de>
> > Tested-by: Yang Li <pku@gmail.com>
> > Tested-by: Hugh Dickins <hu...@google.com>
> > Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
> 
> This commit broke resume from s2ram on some of my Renesas ARM boards.
> On some boards the hang is 100% reproducible, on others it's intermittent
> (which was a PITA, as I had to bisect another independent s2ram bug as well).

Hmm, I am rather confused how the above commit could change anything
here. Your lockup detector is hitting
    dpm_wait_for_superior
      dpm_wait(dev->parent, async);
      dpm_wait_for_suppliers(dev, async);

which in turn waits in wait_for_completion(&dev->power.completion).

The above commit has reduced the load on the system WQ. It also removed
one WQ and reused the existing one. The work done on the mm_percpu_wq
doesn't block, so I suspect that what you are seeing is just showing a
real bug somewhere else. I will have a look tomorrow. Let's add Tejun;
maybe I have introduced some subtle dependency which is not clear to
me.
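
(For context on the rescuer mentioned in the quoted commit message: a WQ_MEM_RECLAIM workqueue pre-allocates a dedicated rescuer kthread, so its work items can still make progress when memory pressure prevents spawning new kworkers; the system wq gives no such guarantee. A minimal usage sketch, with the example_* names made up for illustration:)

	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *example_wq;
	static struct work_struct example_work;

	static void example_fn(struct work_struct *work)
	{
		/* can make progress even under memory pressure, because the
		 * WQ_MEM_RECLAIM rescuer thread already exists */
	}

	static int __init example_init(void)
	{
		example_wq = alloc_workqueue("example_wq", WQ_MEM_RECLAIM, 0);
		if (!example_wq)
			return -ENOMEM;
		INIT_WORK(&example_work, example_fn);
		queue_work(example_wq, &example_work);
		return 0;
	}
	module_init(example_init);
	MODULE_LICENSE("GPL");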

> 
> On r8a7791/koelsch:
> 
> --- /tmp/good   2017-04-18 21:47:04.457156167 +0200
> +++ /tmp/bad    2017-04-18 21:43:26.215240325 +0200
> @@ -13,11 +13,178 @@ Enabling non-boot CPUs ...
>  CPU1 is up
>  PM: noirq resume of devices complete after N.N msecs
>  PM: early resume of devices complete after N.N msecs
> -Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> -PM: resume of devices complete after N.N msecs
> -PM: resume devices took N.N seconds
> -PM: Finishing wakeup.
> -Restarting tasks ... done.
> -ata1: link resume succeeded after 1 retries
> -ata1: SATA link down (SStatus 0 SControl 300)
> -sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> +INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
> +  Not tainted 4.11.0-rc7-koelsch-00426-g70412b99f7936b37 #3470
> +"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>