Michal Hocko wrote:
> On Wed 22-02-17 11:02:21, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > OK, so it seems that all the distractions are handled now and linux-next
> > > > > should provide a reason
On Wed 22-02-17 11:02:21, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > OK, so it seems that all the distractions are handled now and linux-next
> > > > should provide a reasonable base for testing. You said you weren't
Michal Hocko wrote:
> On Tue 21-02-17 23:35:07, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > OK, so it seems that all the distractions are handled now and linux-next
> > > should provide a reasonable base for testing. You said you weren't able
> > > to reproduce the original long stalls on too
On Tue 21-02-17 23:35:07, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > OK, so it seems that all the distractions are handled now and linux-next
> > should provide a reasonable base for testing. You said you weren't able
> > to reproduce the original long stalls on too_many_isolated(). I would be
>
Michal Hocko wrote:
> OK, so it seems that all the distractions are handled now and linux-next
> should provide a reasonable base for testing. You said you weren't able
> to reproduce the original long stalls on too_many_isolated(). I would be
> still interested to see those oom reports and potenti
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 30-01-17 09:55:46, Michal Hocko wrote:
> > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
> > [...]
> > > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't
> > > > tell whether it has any negati
On Tue, Feb 07, 2017 at 10:12:12PM +0100, Michal Hocko wrote:
> This is moot -
> http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
Thanks! I was just about to go stare at it in more detail.
On Mon 06-02-17 11:39:18, Michal Hocko wrote:
> On Sun 05-02-17 19:43:07, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > I got the same warning with ext4. Maybe we need to check carefully.
> >
> > [ 511.215743] =
> > [ 511.218003] WARNING: RECLAIM
On Tue, Feb 07, 2017 at 07:30:54PM +0900, Tetsuo Handa wrote:
> Brian Foster wrote:
> > > The workload is to write to a single file on XFS from 10 processes
> > > demonstrated at
> > > http://lkml.kernel.org/r/201512052133.iae00551.lsoqftmffvo...@i-love.sakura.ne.jp
> > > using "while :; do ./oom-
Brian Foster wrote:
> > The workload is to write to a single file on XFS from 10 processes
> > demonstrated at
> > http://lkml.kernel.org/r/201512052133.iae00551.lsoqftmffvo...@i-love.sakura.ne.jp
> > using "while :; do ./oom-write; done" loop on a VM with 4CPUs / 2048MB RAM.
> > With this XFS_FIL
On Mon, Feb 06, 2017 at 03:42:22PM +0100, Michal Hocko wrote:
> On Mon 06-02-17 09:35:33, Brian Foster wrote:
> > On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote:
> > > Brian Foster wrote:
> > > > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote:
> > > > > [Let's CC more xf
On Mon 06-02-17 09:35:33, Brian Foster wrote:
> On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote:
> > Brian Foster wrote:
> > > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote:
> > > > [Let's CC more xfs people]
> > > >
> > > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
On Mon, Feb 06, 2017 at 03:29:24PM +0900, Tetsuo Handa wrote:
> Brian Foster wrote:
> > On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote:
> > > [Let's CC more xfs people]
> > >
> > > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> > > [...]
> > > > (1) I got an assertion failure.
> > >
On Sun 05-02-17 19:43:07, Tetsuo Handa wrote:
> Michal Hocko wrote:
> I got the same warning with ext4. Maybe we need to check carefully.
>
> [ 511.215743] =
> [ 511.218003] WARNING: RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected
> [
On Sun 05-02-17 19:43:07, Tetsuo Handa wrote:
[...]
> The one below is also a loop. Maybe we can add __GFP_NOMEMALLOC to GFP_NOWAIT?
No, GFP_NOWAIT is just too generic to use this flag.
> [ 257.781715] Out of memory: Kill process 5171 (a.out) score 842 or sacrifice child
> [ 257.784726] Killed
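The point here is that GFP_NOWAIT is shared by a great many callers, so __GFP_NOMEMALLOC cannot be folded into it globally; a call site that must not dip into the emergency reserves can combine the flags itself. A minimal illustration (hypothetical call site, not taken from the thread):

----------
#include <linux/gfp.h>
#include <linux/slab.h>

/*
 * Hypothetical call site: an allocation that must not sleep and should not
 * consume the emergency memory reserves. Rather than changing GFP_NOWAIT
 * globally, the individual caller opts out of the reserves itself.
 */
static void *alloc_scratch_buffer(size_t len)
{
	return kmalloc(len, GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
}
----------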
Brian Foster wrote:
> On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote:
> > [Let's CC more xfs people]
> >
> > On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> > [...]
> > > (1) I got an assertion failure.
> >
> > I suspect this is a result of
> > http://lkml.kernel.org/r/2017020109270
Michal Hocko wrote:
> [CC Petr]
>
> On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> [...]
> > (2) I got a lockdep warning. (A new false positive?)
>
> Yes, I suspect this is a false positive. I do not see how we can
> deadlock. __alloc_pages_direct_reclaim calls drain_all_pages(NULL) which
> mea
On Fri, Feb 03, 2017 at 03:50:09PM +0100, Michal Hocko wrote:
> [Let's CC more xfs people]
>
> On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> [...]
> > (1) I got an assertion failure.
>
> I suspect this is a result of
> http://lkml.kernel.org/r/20170201092706.9966-2-mho...@kernel.org
> I have no
[CC Petr]
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
[...]
> (2) I got a lockdep warning. (A new false positive?)
Yes, I suspect this is a false positive. I do not see how we can
deadlock. __alloc_pages_direct_reclaim calls drain_all_pages(NULL) which
means that a potential recursion to the p
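For context, the code path Michal is referring to is the per-CPU page drain that direct reclaim performs after a failed retry. A slightly simplified sketch of __alloc_pages_direct_reclaim() as it looked around v4.10 (statistics and some bookkeeping trimmed; not a verbatim copy of the upstream source):

----------
/* Simplified sketch of mm/page_alloc.c::__alloc_pages_direct_reclaim()
 * (roughly v4.10-era; details trimmed).
 */
static struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
			     unsigned int alloc_flags,
			     const struct alloc_context *ac,
			     unsigned long *did_some_progress)
{
	struct page *page = NULL;
	bool drained = false;

	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
	if (unlikely(!*did_some_progress))
		return NULL;

retry:
	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

	/*
	 * If the allocation still fails, pages may be pinned on the per-cpu
	 * lists or in the high-order atomic reserves; release those and retry
	 * once. Passing NULL drains the per-cpu lists of all populated zones.
	 */
	if (!page && !drained) {
		unreserve_highatomic_pageblock(ac, false);
		drain_all_pages(NULL);
		drained = true;
		goto retry;
	}

	return page;
}
----------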
[Let's CC more xfs people]
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
[...]
> (1) I got an assertion failure.
I suspect this is a result of
http://lkml.kernel.org/r/20170201092706.9966-2-mho...@kernel.org
I have no idea what the assert means though.
>
> [ 969.626518] Killed process 6262 (oo
On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 30-01-17 09:55:46, Michal Hocko wrote:
> > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
> > [...]
> > > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't
> > > > tell whether it has any negati
Michal Hocko wrote:
> On Mon 30-01-17 09:55:46, Michal Hocko wrote:
> > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
> [...]
> > > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't
> > > tell whether it has any negative effect, but I got on the first trial that
> > > all all
On Mon 30-01-17 09:55:46, Michal Hocko wrote:
> On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
[...]
> > Regarding [1], it helped avoiding the too_many_isolated() issue. I can't
> > tell whether it has any negative effect, but I got on the first trial that
> > all allocating threads are blocked on w
On Tue 31-01-17 13:51:40, Christoph Hellwig wrote:
> On Tue, Jan 31, 2017 at 12:58:46PM +0100, Michal Hocko wrote:
> > What do you think Christoph? I have an additional patch to handle
> > do_generic_file_read and a similar one to back off in
> > __vmalloc_area_node. I would like to post them all i
On Tue, Jan 31, 2017 at 12:58:46PM +0100, Michal Hocko wrote:
> What do you think Christoph? I have an additional patch to handle
> do_generic_file_read and a similar one to back off in
> __vmalloc_area_node. I would like to post them all in one series but I
> would like to know that this one is OK
On Wed 25-01-17 14:00:14, Michal Hocko wrote:
> On Wed 25-01-17 20:09:31, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Wed 25-01-17 11:19:57, Christoph Hellwig wrote:
> > > > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> > > > > I think we are missing a check for fatal_sig
On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > Tetsuo,
> > before we settle on the proper fix for this issue, could you give the
> > patch a try and try to reproduce the too_many_isolated() issue or
> > just see whether patch [1] has any negative effect on your oom stress
>
Michal Hocko wrote:
> Tetsuo,
> before we settle on the proper fix for this issue, could you give the
> patch a try and try to reproduce the too_many_isolated() issue or
> just see whether patch [1] has any negative effect on your oom stress
> testing?
>
> [1] http://lkml.kernel.org/r/201701191123
Tetsuo,
before we settle on the proper fix for this issue, could you give the
patch a try and try to reproduce the too_many_isolated() issue or
just see whether patch [1] has any negative effect on your oom stress
testing?
[1] http://lkml.kernel.org/r/20170119112336.gn30...@dhcp22.suse.cz
On Wed
Michal Hocko wrote:
> On Wed 25-01-17 19:33:59, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > I think we are missing a check for fatal_signal_pending in
> > > iomap_file_buffered_write. This means that an oom victim can consume the
> > > full memory reserves. What do you think about the followi
On Wed 25-01-17 20:09:31, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 25-01-17 11:19:57, Christoph Hellwig wrote:
> > > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> > > > I think we are missing a check for fatal_signal_pending in
> > > > iomap_file_buffered_write. This m
Michal Hocko wrote:
> On Wed 25-01-17 11:19:57, Christoph Hellwig wrote:
> > On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> > > I think we are missing a check for fatal_signal_pending in
> > > iomap_file_buffered_write. This means that an oom victim can consume the
> > > full memor
On Wed 25-01-17 11:19:57, Christoph Hellwig wrote:
> On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> > I think we are missing a check for fatal_signal_pending in
> > iomap_file_buffered_write. This means that an oom victim can consume the
> > full memory reserves. What do you think
On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> I think we are missing a check for fatal_signal_pending in
> iomap_file_buffered_write. This means that an oom victim can consume the
> full memory reserves. What do you think about the following? I haven't
> tested this but it mimics
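The shape of the proposed fix is to break out of the buffered-write loop once a fatal signal (for example the OOM killer's SIGKILL) is pending, so that an OOM victim cannot keep consuming the memory reserves to finish its write. A sketch against the iomap_file_buffered_write() loop of that time (simplified; not necessarily the exact patch that was eventually merged):

----------
/* Sketch of the proposed check in fs/iomap.c::iomap_file_buffered_write()
 * (loop structure as of early 2017, simplified).
 */
ssize_t
iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
		struct iomap_ops *ops)
{
	struct inode *inode = iocb->ki_filp->f_mapping->host;
	loff_t pos = iocb->ki_pos, ret = 0, written = 0;

	while (iov_iter_count(iter)) {
		/* Proposed addition: stop copying once a fatal signal is
		 * pending instead of dipping further into the reserves.
		 */
		if (fatal_signal_pending(current)) {
			ret = -EINTR;
			break;
		}
		ret = iomap_apply(inode, pos, iov_iter_count(iter),
				IOMAP_WRITE, ops, iter, iomap_write_actor);
		if (ret <= 0)
			break;
		pos += ret;
		written += ret;
	}

	return written ? written : ret;
}
----------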
[Let's add Christoph]
The below insane^Wstress test should exercise the OOM killer behavior.
On Sat 21-01-17 16:42:42, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > And I think that there is a different problem if I tune a reproducer
> > like below (i.e. increased the buffer size to write()/fsync
On Fri 20-01-17 22:27:27, Tetsuo Handa wrote:
> Mel Gorman wrote:
> > On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> > > So what do you think about the following? Tetsuo, would you be willing
> > > to run this patch through your torture testing please?
> >
> > I'm fine with treati
Tetsuo Handa wrote:
> And I think that there is a different problem if I tune a reproducer
> like below (i.e. increased the buffer size to write()/fsync() from 4096).
>
> --
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> int main(int argc, char *ar
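Purely as an illustration of the tuning being described, a minimal write()/fsync() loop with the buffer size raised from the original 4096 bytes (a sketch, not Tetsuo's actual program, which is truncated above) could look like:

----------
/* Illustrative sketch only: append large write()+fsync() chunks to a
 * private file forever. BUF_SIZE is the knob referred to in the message
 * ("increased the buffer size to write()/fsync() from 4096").
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE	(1024 * 1024)	/* was 4096 in the original test */

int main(int argc, char *argv[])
{
	char path[64];
	char *buf;
	int fd;

	snprintf(path, sizeof(path), "/tmp/write-test.%d", (int)getpid());
	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	buf = calloc(1, BUF_SIZE);
	if (!buf) {
		perror("calloc");
		return 1;
	}
	for (;;) {
		if (write(fd, buf, BUF_SIZE) != BUF_SIZE)
			break;
		if (fsync(fd))
			break;
	}
	return 0;
}
----------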
Mel Gorman wrote:
> On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> > So what do you think about the following? Tetsuo, would you be willing
> > to run this patch through your torture testing please?
>
> I'm fine with treating this as a starting point.
OK. So I tried to test this
On Fri, Jan 20, 2017 at 02:42:24PM +0800, Hillf Danton wrote:
> > @@ -1603,16 +1603,16 @@ int isolate_lru_page(struct page *page)
> > * the LRU list will go small and be scanned faster than necessary, leading to
> > * unnecessary swapping, thrashing and OOM.
> > */
> > -static int too_ma
On Thursday, January 19, 2017 6:08 PM Mel Gorman wrote:
>
> If it's definitely required and is proven to fix the
> infinite-loop-without-oom workload then I'll back off and withdraw my
> objections. However, I'd at least like the following untested patch to
> be considered as an alternative. It
On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> On Thu 19-01-17 10:07:55, Mel Gorman wrote:
> [...]
> > mm, vmscan: Wait on a waitqueue when too many pages are isolated
> >
> > When too many pages are isolated, direct reclaim waits on congestion to clear
> > for up to a tenth
On Thu 19-01-17 10:07:55, Mel Gorman wrote:
[...]
> mm, vmscan: Wait on a waitqueue when too many pages are isolated
>
> When too many pages are isolated, direct reclaim waits on congestion to clear
> for up to a tenth of a second. There is no reason to believe that too many
> pages are isolated d
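The idea behind Mel's patch is to replace the blind congestion_wait() backoff with an explicit waitqueue: direct reclaimers sleep while too many pages are isolated and are woken when isolated pages go back onto the LRU. A rough sketch of that scheme follows; the isolated_wait field name and the exact wake-up placement are assumptions for illustration, not the actual patch:

----------
/*
 * Rough sketch (not the exact patch): a per-node waitqueue that direct
 * reclaimers sleep on while too many pages are isolated. "isolated_wait"
 * is an assumed field added to struct pglist_data.
 */

/* In shrink_inactive_list(), replacing the congestion_wait() backoff: */
	while (unlikely(too_many_isolated(pgdat, file, sc))) {
		/* Sleep until the putback path signals that isolated pages
		 * returned to the LRU; keep a timeout as a safety net.
		 */
		wait_event_timeout(pgdat->isolated_wait,
				   !too_many_isolated(pgdat, file, sc),
				   HZ / 10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;
	}

/* On the putback side, after decrementing the NR_ISOLATED_* counter: */
	if (waitqueue_active(&pgdat->isolated_wait))
		wake_up(&pgdat->isolated_wait);
----------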
On Wed, Jan 18, 2017 at 06:29:46PM +0100, Michal Hocko wrote:
> On Wed 18-01-17 17:00:10, Mel Gorman wrote:
> > > > You don't appear to directly use that information in patch 2.
> > >
> > > It is used via zone_reclaimable_pages in should_reclaim_retry
> > >
> >
> > Which is still not directly re
On Wed 18-01-17 17:00:10, Mel Gorman wrote:
> On Wed, Jan 18, 2017 at 05:17:31PM +0100, Michal Hocko wrote:
> > On Wed 18-01-17 15:54:30, Mel Gorman wrote:
> > > On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote:
> > > > On Wed 18-01-17 14:46:55, Mel Gorman wrote:
> > > > > On Wed, Jan 1
On Wed, Jan 18, 2017 at 05:17:31PM +0100, Michal Hocko wrote:
> On Wed 18-01-17 15:54:30, Mel Gorman wrote:
> > On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote:
> > > On Wed 18-01-17 14:46:55, Mel Gorman wrote:
> > > > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote:
> > >
On Wed 18-01-17 15:54:30, Mel Gorman wrote:
> On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote:
> > On Wed 18-01-17 14:46:55, Mel Gorman wrote:
> > > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote:
> > > > From: Michal Hocko
> > > >
> > > > 599d0c954f91 ("mm, vmscan: mov
On Wed, Jan 18, 2017 at 04:15:31PM +0100, Michal Hocko wrote:
> On Wed 18-01-17 14:46:55, Mel Gorman wrote:
> > On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved
> > > NR_ISOLATED* count
On Wed 18-01-17 14:46:55, Mel Gorman wrote:
> On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved
> > NR_ISOLATED* counters from zones to nodes. This is not the best fit
> > especially for syste
On Wed, Jan 18, 2017 at 02:44:52PM +0100, Michal Hocko wrote:
> From: Michal Hocko
>
> 599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved
> NR_ISOLATED* counters from zones to nodes. This is not the best fit
> especially for systems with high/lowmem because a heavy memory pressure
> on
From: Michal Hocko
599d0c954f91 ("mm, vmscan: move LRU lists to node") has moved
NR_ISOLATED* counters from zones to nodes. This is not the best fit
especially for systems with high/lowmem because a heavy memory pressure
on the highmem zone might block lowmem requests from making progress. Or
we
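For reference, too_many_isolated() on the direct reclaim path compares isolated pages against inactive pages, and after 599d0c954f91 both counters are node-wide, which is why heavy pressure on one zone (highmem) can throttle reclaimers allocating from another zone (lowmem) of the same node. A simplified sketch of the check as it looked around v4.9 (not a verbatim copy):

----------
/* Simplified sketch of mm/vmscan.c::too_many_isolated() after commit
 * 599d0c954f91 (roughly v4.9): both counters are per-node.
 */
static int too_many_isolated(struct pglist_data *pgdat, int file,
			     struct scan_control *sc)
{
	unsigned long inactive, isolated;

	if (current_is_kswapd())
		return 0;

	if (!sane_reclaim(sc))
		return 0;

	if (file) {
		inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
		isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
	} else {
		inactive = node_page_state(pgdat, NR_INACTIVE_ANON);
		isolated = node_page_state(pgdat, NR_ISOLATED_ANON);
	}

	/*
	 * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so
	 * they won't get blocked by normal direct reclaimers, forming a
	 * circular deadlock.
	 */
	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
		inactive >>= 3;

	return isolated > inactive;
}
----------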