Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-25 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 22-02-17 11:02:21, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote: > > > > Michal Hocko wrote: > > > > > OK, so it seems that all the distractions are handled now and > > > > > linux-next > > > > > should provide a reason

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-21 Thread Michal Hocko
On Wed 22-02-17 11:02:21, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > > > OK, so it seems that all the distractions are handled now and linux-next > > > > should provide a reasonable base for testing. You said you weren't

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-21 Thread Tetsuo Handa
Michal Hocko wrote: > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > OK, so it seems that all the distractions are handled now and linux-next > > > should provide a reasonable base for testing. You said you weren't able > > > to reproduce the original long stalls on too

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-21 Thread Michal Hocko
On Tue 21-02-17 23:35:07, Tetsuo Handa wrote: > Michal Hocko wrote: > > OK, so it seems that all the distractions are handled now and linux-next > > should provide a reasonable base for testing. You said you weren't able > > to reproduce the original long stalls on too_many_isolated(). I would be >

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-21 Thread Tetsuo Handa
Michal Hocko wrote: > OK, so it seems that all the distractions are handled now and linux-next > should provide a reasonable base for testing. You said you weren't able > to reproduce the original long stalls on too_many_isolated(). I would be > still interested to see those oom reports and potenti

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-21 Thread Michal Hocko
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Mon 30-01-17 09:55:46, Michal Hocko wrote: > > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote: > > [...] > > > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't > > > > tell whether it has any negati

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-08 Thread Peter Zijlstra
On Tue, Feb 07, 2017 at 10:12:12PM +0100, Michal Hocko wrote: > This is moot - > http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org Thanks! I was just about to go stare at it in more detail.

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-07 Thread Michal Hocko
On Mon 06-02-17 11:39:18, Michal Hocko wrote: > On Sun 05-02-17 19:43:07, Tetsuo Handa wrote: > > Michal Hocko wrote: > > I got same warning with ext4. Maybe we need to check carefully. > > > > [ 511.215743] = > > [ 511.218003] WARNING: RECLAIM

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-07 Thread Brian Foster
On Tue, Feb 07, 2017 at 07:30:54PM +0900, Tetsuo Handa wrote: > Brian Foster wrote: > > > The workload is to write to a single file on XFS from 10 processes > > > demonstrated at > > > http://lkml.kernel.org/r/201512052133.iae00551.lsoqftmffvo...@i-love.sakura.ne.jp > > > using "while :; do ./oom-

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-07 Thread Tetsuo Handa
Brian Foster wrote: > > The workload is to write to a single file on XFS from 10 processes > > demonstrated at > > http://lkml.kernel.org/r/201512052133.iae00551.lsoqftmffvo...@i-love.sakura.ne.jp > > using "while :; do ./oom-write; done" loop on a VM with 4CPUs / 2048MB RAM. > > With this XFS_FIL

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-06 Thread Brian Foster
On Mon, Feb 06, 2017 at 03:42:22PM +0100, Michal Hocko wrote: > On Mon 06-02-17 09:35:33, Brian Foster wrote: > > On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote: > > > Brian Foster wrote: > > > > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote: > > > > > [Let's CC more xf

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-06 Thread Michal Hocko
On Mon 06-02-17 09:35:33, Brian Foster wrote: > On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote: > > Brian Foster wrote: > > > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote: > > > > [Let's CC more xfs people] > > > > > > > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-06 Thread Brian Foster
On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote: > Brian Foster wrote: > > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote: > > > [Let's CC more xfs people] > > > > > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > > > [...] > > > > (1) I got an assertion failure. > > >

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-06 Thread Michal Hocko
On Sun 05-02-17 19:43:07, Tetsuo Handa wrote: > Michal Hocko wrote: > I got same warning with ext4. Maybe we need to check carefully. > > [ 511.215743] = > [ 511.218003] WARNING: RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order > detected > [

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-06 Thread Michal Hocko
On Sun 05-02-17 19:43:07, Tetsuo Handa wrote: [...] > Below one is also a loop. Maybe we can add __GFP_NOMEMALLOC to GFP_NOWAIT ? No, GFP_NOWAIT is just too generic to use this flag. > [ 257.781715] Out of memory: Kill process 5171 (a.out) score 842 or > sacrifice child > [ 257.784726] Killed

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-05 Thread Tetsuo Handa
Brian Foster wrote: > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote: > > [Let's CC more xfs people] > > > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > > [...] > > > (1) I got an assertion failure. > > > > I suspect this is a result of > > http://lkml.kernel.org/r/2017020109270

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-05 Thread Tetsuo Handa
Michal Hocko wrote: > [CC Petr] > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > [...] > > (2) I got a lockdep warning. (A new false positive?) > > Yes, I suspect this is a false positive. I do not see how we can > deadlock. __alloc_pages_direct_reclaim calls drain_all_pages(NULL) which > mea

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-03 Thread Brian Foster
On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote: > [Let's CC more xfs people] > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > [...] > > (1) I got an assertion failure. > > I suspect this is a result of > http://lkml.kernel.org/r/20170201092706.9966-2-mho...@kernel.org > I have no

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-03 Thread Michal Hocko
[CC Petr] On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: [...] > (2) I got a lockdep warning. (A new false positive?) Yes, I suspect this is a false positive. I do not see how we can deadlock. __alloc_pages_direct_reclaim calls drain_all_pages(NULL) which means that a potential recursion to the p

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-03 Thread Michal Hocko
[Let's CC more xfs people] On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: [...] > (1) I got an assertion failure. I suspect this is a result of http://lkml.kernel.org/r/20170201092706.9966-2-mho...@kernel.org I have no idea what the assert means though. > > [ 969.626518] Killed process 6262 (oo

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-03 Thread Michal Hocko
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Mon 30-01-17 09:55:46, Michal Hocko wrote: > > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote: > > [...] > > > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't > > > > tell whether it has any negati

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-03 Thread Tetsuo Handa
Michal Hocko wrote: > On Mon 30-01-17 09:55:46, Michal Hocko wrote: > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote: > [...] > > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't > > > tell whether it has any negative effect, but I got on the first trial that > > > all all

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-02-02 Thread Michal Hocko
On Mon 30-01-17 09:55:46, Michal Hocko wrote: > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote: [...] > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't > > tell whether it has any negative effect, but I got on the first trial that > > all allocating threads are blocked on w

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-31 Thread Michal Hocko
On Tue 31-01-17 13:51:40, Christoph Hellwig wrote: > On Tue, Jan 31, 2017 at 12:58:46PM +0100, Michal Hocko wrote: > > What do you think Christoph? I have an additional patch to handle > > do_generic_file_read and a similar one to back off in > > __vmalloc_area_node. I would like to post them all i

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-31 Thread Christoph Hellwig
On Tue, Jan 31, 2017 at 12:58:46PM +0100, Michal Hocko wrote: > What do you think Christoph? I have an additional patch to handle > do_generic_file_read and a similar one to back off in > __vmalloc_area_node. I would like to post them all in one series but I > would like to know that this one is OK

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-31 Thread Michal Hocko
On Wed 25-01-17 14:00:14, Michal Hocko wrote: > On Wed 25-01-17 20:09:31, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > On Wed 25-01-17 11:19:57, Christoph Hellwig wrote: > > > > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote: > > > > > I think we are missing a check for fatal_sig

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-30 Thread Michal Hocko
On Sun 29-01-17 00:27:27, Tetsuo Handa wrote: > Michal Hocko wrote: > > Tetsuo, > > before we settle on the proper fix for this issue, could you give the > > patch a try and try to reproduce the too_many_isolated() issue or > > just see whether patch [1] has any negative effect on your oom stress >

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-28 Thread Tetsuo Handa
Michal Hocko wrote: > Tetsuo, > before we settle on the proper fix for this issue, could you give the > patch a try and try to reproduce the too_many_isolated() issue or > just see whether patch [1] has any negative effect on your oom stress > testing? > > [1] http://lkml.kernel.org/r/201701191123

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-27 Thread Michal Hocko
Tetsuo, before we settle on the proper fix for this issue, could you give the patch a try and try to reproduce the too_many_isolated() issue or just see whether patch [1] has any negative effect on your oom stress testing? [1] http://lkml.kernel.org/r/20170119112336.gn30...@dhcp22.suse.cz On Wed

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 25-01-17 19:33:59, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > I think we are missing a check for fatal_signal_pending in > > > iomap_file_buffered_write. This means that an oom victim can consume the > > > full memory reserves. What do you think about the followi

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Michal Hocko
On Wed 25-01-17 20:09:31, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 25-01-17 11:19:57, Christoph Hellwig wrote: > > > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote: > > > > I think we are missing a check for fatal_signal_pending in > > > > iomap_file_buffered_write. This m

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 25-01-17 11:19:57, Christoph Hellwig wrote: > > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote: > > > I think we are missing a check for fatal_signal_pending in > > > iomap_file_buffered_write. This means that an oom victim can consume the > > > full memor

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Michal Hocko
On Wed 25-01-17 11:19:57, Christoph Hellwig wrote: > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote: > > I think we are missing a check for fatal_signal_pending in > > iomap_file_buffered_write. This means that an oom victim can consume the > > full memory reserves. What do you think

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Christoph Hellwig
On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote: > I think we are missing a check for fatal_signal_pending in > iomap_file_buffered_write. This means that an oom victim can consume the > full memory reserves. What do you think about the following? I haven't > tested this but it mimics
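
The point of the proposed check is that a buffered write loop which keeps allocating page-cache pages should notice that its task has been killed (for example as an OOM victim with access to memory reserves) and stop, the way generic_perform_write() already does with fatal_signal_pending(). The actual iomap patch is cut off in the snippet, so the following is only a userspace sketch of that pattern under stated assumptions: model_buffered_write(), fatal_pending_fn and the kill_after helper are hypothetical names, and the callback stands in for fatal_signal_pending(current).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for fatal_signal_pending(current). */
typedef bool (*fatal_pending_fn)(void *ctx);

/* Write `len` bytes in `chunk`-sized pieces, as a buffered write that
 * allocates one page-cache page per piece would. Before each piece, bail
 * out if the task has been killed so it stops eating memory reserves.
 * Like generic_perform_write(), report partial progress if any was made,
 * otherwise -EINTR. */
static ssize_t model_buffered_write(size_t len, size_t chunk,
				    fatal_pending_fn fatal, void *ctx)
{
	size_t written = 0;

	assert(chunk > 0);
	while (written < len) {
		size_t n = len - written < chunk ? len - written : chunk;

		if (fatal(ctx))
			return written ? (ssize_t)written : -EINTR;
		/* ... allocate page, copy user data, mark page dirty ... */
		written += n;
	}
	return (ssize_t)written;
}

/* Hypothetical test helper: "kill" the writer once `budget` chunks
 * have been written. */
struct kill_after { int budget; };

static bool kill_after_pending(void *ctx)
{
	struct kill_after *k = ctx;
	return k->budget-- <= 0;
}
```

With a budget of 1 and 4096-byte chunks, a 10000-byte write stops after the first chunk and returns 4096; with a budget of 0 it returns -EINTR without writing anything.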

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Michal Hocko
[Let's add Christoph] The below insane^Wstress test should exercise the OOM killer behavior. On Sat 21-01-17 16:42:42, Tetsuo Handa wrote: > Tetsuo Handa wrote: > > And I think that there is a different problem if I tune a reproducer > > like below (i.e. increased the buffer size to write()/fsync

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-25 Thread Michal Hocko
On Fri 20-01-17 22:27:27, Tetsuo Handa wrote: > Mel Gorman wrote: > > On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote: > > > So what do you think about the following? Tetsuo, would you be willing > > > to run this patch through your torture testing please? > > > > I'm fine with treati

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-20 Thread Tetsuo Handa
Tetsuo Handa wrote: > And I think that there is a different problem if I tune a reproducer > like below (i.e. increased the buffer size to write()/fsync() from 4096). > > -- > #include > #include > #include > #include > #include > #include > #include > > int main(int argc, char *ar

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-20 Thread Tetsuo Handa
Mel Gorman wrote: > On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote: > > So what do you think about the following? Tetsuo, would you be willing > > to run this patch through your torture testing please? > > I'm fine with treating this as a starting point. OK. So I tried to test this

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-20 Thread Mel Gorman
On Fri, Jan 20, 2017 at 02:42:24PM +0800, Hillf Danton wrote: > > @@ -1603,16 +1603,16 @@ int isolate_lru_page(struct page *page) > > * the LRU list will go small and be scanned faster than necessary, > > leading to > > * unnecessary swapping, thrashing and OOM. > > */ > > -static int too_ma
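
The quoted hunk touches too_many_isolated() in mm/vmscan.c. As a rough userspace sketch of the heuristic that function implemented in mainline at the time (to the best of my understanding; the memcg special case is omitted), kswapd is never throttled, and callers allowed to do both IO and FS work are throttled once isolated pages exceed 1/8 of the inactive list, so that GFP_NOIO/GFP_NOFS callers can keep going and are not blocked behind ordinary direct reclaimers. The MODEL_GFP_* bits and model_too_many_isolated() are hypothetical stand-ins, not the real gfp.h values or signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits; the real __GFP_IO/__GFP_FS values differ. */
#define MODEL_GFP_IO (1u << 0)
#define MODEL_GFP_FS (1u << 1)

/* Returns true when the caller should be throttled because too many
 * pages have already been taken off the inactive LRU. */
static bool model_too_many_isolated(unsigned long inactive,
				    unsigned long isolated,
				    unsigned int gfp_mask, bool is_kswapd)
{
	/* kswapd must keep making progress; never throttle it here. */
	if (is_kswapd)
		return false;

	/* Full IO+FS callers hit the limit earlier (1/8 of inactive), leaving
	 * headroom for NOIO/NOFS callers and avoiding a circular wait. */
	if ((gfp_mask & (MODEL_GFP_IO | MODEL_GFP_FS)) ==
	    (MODEL_GFP_IO | MODEL_GFP_FS))
		inactive >>= 3;

	return isolated > inactive;
}
```

For example, with 800 inactive and 150 isolated pages a full IO+FS caller is throttled (150 > 100) while a caller lacking __GFP_FS is not (150 < 800).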

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-19 Thread Hillf Danton
On Thursday, January 19, 2017 6:08 PM Mel Gorman wrote: > > If it's definitely required and is proven to fix the > infinite-loop-without-oom workload then I'll back off and withdraw my > objections. However, I'd at least like the following untested patch to > be considered as an alternative. It

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-19 Thread Mel Gorman
On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote: > On Thu 19-01-17 10:07:55, Mel Gorman wrote: > [...] > > mm, vmscan: Wait on a waitqueue when too many pages are isolated > > > > When too many pages are isolated, direct reclaim waits on congestion to > > clear > > for up to a tenth

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-19 Thread Michal Hocko
On Thu 19-01-17 10:07:55, Mel Gorman wrote: [...] > mm, vmscan: Wait on a waitqueue when too many pages are isolated > > When too many pages are isolated, direct reclaim waits on congestion to clear > for up to a tenth of a second. There is no reason to believe that too many > pages are isolated d
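
Mel's proposal replaces the fixed sleep of up to a tenth of a second with waiting on a waitqueue that is woken when isolated pages go back to the LRU. The snippet does not show his implementation, so this is a hypothetical userspace model of the idea, with a pthread condition variable standing in for the kernel waitqueue and a timeout as a safety net. struct isolation_model, wait_isolated_below_limit() and putback_pages() are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model: counters protected by a mutex, plus a condition
 * variable standing in for a per-node waitqueue. */
struct isolation_model {
	pthread_mutex_t lock;
	pthread_cond_t  putback;   /* signalled when pages return to the LRU */
	unsigned long   inactive;
	unsigned long   isolated;
};

#define ISOLATION_MODEL_INIT(inact) \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (inact), 0 }

/* Reclaimer side: sleep until a putback wakes us and the limit is met,
 * or until timeout_ms expires. Returns true if below the limit. */
static bool wait_isolated_below_limit(struct isolation_model *m, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool ok;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->lock);
	/* Loop: condition variables allow spurious wakeups. */
	while (m->isolated > m->inactive && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&m->putback, &m->lock, &deadline);
	ok = m->isolated <= m->inactive;
	pthread_mutex_unlock(&m->lock);
	return ok;
}

/* Putback side: move pages back to the inactive list, wake all waiters. */
static void putback_pages(struct isolation_model *m, unsigned long nr)
{
	pthread_mutex_lock(&m->lock);
	assert(nr <= m->isolated);
	m->isolated -= nr;
	m->inactive += nr;
	pthread_cond_broadcast(&m->putback);
	pthread_mutex_unlock(&m->lock);
}
```

The design difference from a fixed sleep is that a waiter returns as soon as a putback brings the count under the limit, rather than always sleeping for the full interval. On older glibc this needs linking with -pthread.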

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-19 Thread Mel Gorman
On Wed, Jan 18, 2017 at 06:29:46PM +0100, Michal Hocko wrote: > On Wed 18-01-17 17:00:10, Mel Gorman wrote: > > > > You don't appear to directly use that information in patch 2. > > > > > > It is used via zone_reclaimable_pages in should_reclaim_retry > > > > > > > Which is still not directly re
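
Michal's point is that the per-zone isolated counts reach should_reclaim_retry() through zone_reclaimable_pages(). Patch 2 itself is not shown here, so this is only a simplified sketch, assuming the reclaimable estimate simply includes isolated pages. The arithmetic (trust less of the reclaimable pool after each round without progress, add free pages, compare against the min watermark) follows should_reclaim_retry() as I understand it, with order and lowmem reserves ignored; struct zone_counts and model_should_retry() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors MAX_RECLAIM_RETRIES in mm/page_alloc.c of that period. */
#define MODEL_MAX_RECLAIM_RETRIES 16

struct zone_counts {
	unsigned long free;
	unsigned long reclaimable;  /* LRU pages reclaim could still free */
	unsigned long isolated;     /* pages currently taken off the LRU */
	unsigned long min_wmark;
};

/* Decide whether the allocator should retry reclaim for this zone. */
static bool model_should_retry(const struct zone_counts *z,
			       int no_progress_loops, bool count_isolated)
{
	unsigned long available = z->reclaimable;

	if (no_progress_loops > MODEL_MAX_RECLAIM_RETRIES)
		return false;

	/* Assumption: isolated pages may still return to the LRU or be freed. */
	if (count_isolated)
		available += z->isolated;

	/* Back off: DIV_ROUND_UP(loops * available, MAX_RECLAIM_RETRIES). */
	available -= ((unsigned long)no_progress_loops * available +
		      MODEL_MAX_RECLAIM_RETRIES - 1) / MODEL_MAX_RECLAIM_RETRIES;
	available += z->free;

	return available > z->min_wmark;
}
```

With 10 free pages, 100 isolated pages and a min watermark of 50, the zone is worth retrying only when isolated pages are counted, and the retry estimate drops to zero reclaimable pages after 16 rounds without progress.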

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Michal Hocko
On Wed 18-01-17 17:00:10, Mel Gorman wrote: > On Wed, Jan 18, 2017 at 05:17:31PM +0100, Michal Hocko wrote: > > On Wed 18-01-17 15:54:30, Mel Gorman wrote: > > > On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote: > > > > On Wed 18-01-17 14:46:55, Mel Gorman wrote: > > > > > On Wed, Jan 1

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Mel Gorman
On Wed, Jan 18, 2017 at 05:17:31PM +0100, Michal Hocko wrote: > On Wed 18-01-17 15:54:30, Mel Gorman wrote: > > On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote: > > > On Wed 18-01-17 14:46:55, Mel Gorman wrote: > > > > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote: > > >

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Michal Hocko
On Wed 18-01-17 15:54:30, Mel Gorman wrote: > On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote: > > On Wed 18-01-17 14:46:55, Mel Gorman wrote: > > > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote: > > > > From: Michal Hocko > > > > > > > > 599d0c954f91 ("mm, vmscan: mov

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Mel Gorman
On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote: > On Wed 18-01-17 14:46:55, Mel Gorman wrote: > > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved > > > NR_ISOLATED* count

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Michal Hocko
On Wed 18-01-17 14:46:55, Mel Gorman wrote: > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote: > > From: Michal Hocko > > > > 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved > > NR_ISOLATED* counters from zones to nodes. This is not the best fit > > especially for syste

Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Mel Gorman
On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote: > From: Michal Hocko > > 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved > NR_ISOLATED* counters from zones to nodes. This is not the best fit > especially for systems with high/lowmem because a heavy memory pressure > on

[RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

2017-01-18 Thread Michal Hocko
From: Michal Hocko 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved NR_ISOLATED* counters from zones to nodes. This is not the best fit especially for systems with high/lowmem because a heavy memory pressure on the highmem zone might block lowmem requests from making progress. Or we
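
To make the highmem/lowmem problem concrete, here is a hypothetical, heavily simplified model of one node with a lowmem and a highmem zone. It is not the patch's code; struct node_model and both helpers are illustrative. A node-wide check sums isolated pages over all zones, so heavy isolation in highmem throttles a request that can only be satisfied from lowmem, while a per-zone check restricted to the eligible zones does not.

```c
#include <assert.h>
#include <stdbool.h>

enum zone_idx { ZONE_LOW = 0, ZONE_HIGH = 1, NR_ZONES = 2 };

struct zone_model {
	unsigned long inactive;   /* pages on the zone's inactive LRU */
	unsigned long isolated;   /* pages currently taken off the LRU */
};

struct node_model {
	struct zone_model zones[NR_ZONES];
};

/* Node-wide accounting: highmem isolation counts against everyone. */
static bool too_many_isolated_node(const struct node_model *n)
{
	unsigned long inactive = 0, isolated = 0;

	for (int i = 0; i < NR_ZONES; i++) {
		inactive += n->zones[i].inactive;
		isolated += n->zones[i].isolated;
	}
	return isolated > inactive;
}

/* Per-zone accounting: only zones 0..highest_zoneidx, i.e. the zones
 * the allocation request can actually use, are considered. */
static bool too_many_isolated_zones(const struct node_model *n,
				    int highest_zoneidx)
{
	unsigned long inactive = 0, isolated = 0;

	assert(highest_zoneidx >= 0 && highest_zoneidx < NR_ZONES);
	for (int i = 0; i <= highest_zoneidx; i++) {
		inactive += n->zones[i].inactive;
		isolated += n->zones[i].isolated;
	}
	return isolated > inactive;
}
```

With 100 inactive and 0 isolated lowmem pages but 50 inactive and 200 isolated highmem pages, the node-wide check throttles a lowmem-only request (200 > 150) while the per-zone check lets it proceed (0 < 100).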