Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Wed 10-05-17 16:03:11, Minchan Kim wrote:
> On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> > On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
[...]
> > > > +		scan++;
> > > >  		switch (__isolate_lru_page(page, mode)) {
> > > >  		case 0:
> > > >  			nr_pages = hpage_nr_pages(page);
> > >
> > > Confirmed.
> >
> > Hmm. I can clearly see how we could skip over too many pages and hit
> > small reclaim priorities too quickly but I am still scratching my head
> > about how we could hit the OOM killer as a result. The amount of pages
> > on the active anonymous list suggests that we are not able to rotate
> > pages quickly enough. I have to keep thinking about that.
>
> I explained it, but it seems that was not enough. Let me try again.
>
> The problem is that get_scan_count determines nr_to_scan with
> eligible zones.
>
> 	size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> 	size = size >> sc->priority;

Ohh, right. Who has done that ;) Now it is much clearer. We simply
reclaimed all the pages on the inactive LRU list, progressed only very
slowly over the active list, and hit the OOM before we could actually
reach anything. I completely forgot about the scan window not being the
full LRU list.

Thanks for bearing with me!
--
Michal Hocko
SUSE Labs
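To make the scan window concrete, here is a small C model of the two quoted lines. It assumes 4 KiB pages and uses the active_anon figures from the OOM report in this thread: for a GFP_KERNEL request only DMA (8064kB) and DMA32 (952176kB) are eligible, i.e. 240060 pages, while the node-wide active anon list holds 424716 pages (1698864kB). `scan_window()` and `DEF_PRIORITY_MODEL` are illustrative names, not kernel symbols, and the model ignores the anon/file balancing that get_scan_count() applies after computing this value.

```c
#include <assert.h>

/* Mirrors the kernel's DEF_PRIORITY (12): reclaim starts here and counts down to 0. */
#define DEF_PRIORITY_MODEL 12

/*
 * Scan window for one LRU list: its size counted over eligible zones only
 * (the role of lruvec_lru_size(lruvec, lru, sc->reclaim_idx)), shifted
 * right by the current reclaim priority. All sizes are in pages.
 */
static unsigned long scan_window(unsigned long eligible_pages, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY_MODEL);
	return eligible_pages >> priority;
}
```

With these numbers the window is 58 pages at priority 12 and only reaches 240060 pages at priority 0, still well short of the 424716 pages on the list. If ineligible pages near the tail are charged against that budget, the eligible pages further up may never be reached.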
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> [...]
> > > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > >  			continue;
> > >  		}
> > >
> > > +		/*
> > > +		 * Do not count skipped pages because we do want to isolate
> > > +		 * some pages even when the LRU mostly contains ineligible
> > > +		 * pages
> > > +		 */
> >
> > How about adding comment about "why"?
> >
> > /*
> >  * Do not count skipped pages because it makes the function to return with
> >  * none isolated pages if the LRU mostly contains ineligible pages so that
> >  * VM cannot reclaim any pages and trigger premature OOM.
> >  */
>
> I am not sure this is necessarily any better. Mentioning a premature
> OOM would require a much better explanation because a first immediate
> question would be "why don't we scan those pages at priority 0". Also
> decision about the OOM is at a different layer and it might change in
> future when this doesn't apply any more. But it is not like I would
> insist...
>
> > > +		scan++;
> > >  		switch (__isolate_lru_page(page, mode)) {
> > >  		case 0:
> > >  			nr_pages = hpage_nr_pages(page);
> >
> > Confirmed.
>
> Hmm. I can clearly see how we could skip over too many pages and hit
> small reclaim priorities too quickly but I am still scratching my head
> about how we could hit the OOM killer as a result. The amount of pages
> on the active anonymous list suggests that we are not able to rotate
> pages quickly enough. I have to keep thinking about that.

I explained it, but it seems that was not enough. Let me try again.

The problem is that get_scan_count determines nr_to_scan from the
eligible zones only:

	size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows:

N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(i.e., a few eligible pages (N) are at the head of the LRU, but almost all
of the others are ineligible pages (H).)

In that case, size becomes 4, so the VM wants to scan 4 pages, but the 4
pages at the tail of the LRU are not eligible. If isolate_lru_pages counts
skipped pages, it stops after scanning those 4 pages and never reaches the
remaining ones, so nothing is reclaimed.

If it helps to understand the problem, I will add this to the description.
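The scenario above can be replayed with a toy C model. The LRU is written head to tail as in the diagram ('N' for a page in an eligible zone, 'H' for an ineligible one) and pages are taken from the tail, i.e. the end of the string. `model_nr_to_scan()` and `model_isolate()` are hypothetical helpers, not kernel functions; the `count_skipped` flag switches between charging skipped pages to the scan budget and not charging them, and every eligible page is assumed to isolate successfully.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* get_scan_count() model: window = number of eligible pages >> priority. */
static size_t model_nr_to_scan(const char *lru, int priority)
{
	size_t eligible = 0;

	assert(lru != NULL);
	for (const char *p = lru; *p != '\0'; p++)
		if (*p == 'N')
			eligible++;
	return eligible >> priority;
}

/*
 * isolate_lru_pages() model: walks from the tail and returns how many
 * pages were isolated. If count_skipped is non-zero, an ineligible page
 * still consumes one unit of the nr_to_scan budget.
 */
static size_t model_isolate(const char *lru, size_t nr_to_scan, int count_skipped)
{
	size_t scan = 0, nr_taken = 0;
	size_t i;

	assert(lru != NULL);
	i = strlen(lru);
	while (i > 0 && scan < nr_to_scan && nr_taken < nr_to_scan) {
		char page = lru[--i];		/* take the page at the tail */

		if (page == 'H') {		/* zone above reclaim_idx: skip */
			if (count_skipped)
				scan++;
			continue;
		}
		scan++;				/* eligible page looked at */
		nr_taken++;			/* ... and isolated */
	}
	return nr_taken;
}
```

With `"NNNNHHHHHHHHHHHHHHHH"` (4 eligible pages at the head, 16 ineligible ones behind them) and priority 0, the window is 4. Charging skipped pages lets the four tail pages exhaust the budget, so nothing is isolated; not charging them walks past the skipped pages and isolates all four.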
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
[...]
> > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >  			continue;
> >  		}
> >
> > +		/*
> > +		 * Do not count skipped pages because we do want to isolate
> > +		 * some pages even when the LRU mostly contains ineligible
> > +		 * pages
> > +		 */
>
> How about adding comment about "why"?
>
> /*
>  * Do not count skipped pages because it makes the function to return with
>  * none isolated pages if the LRU mostly contains ineligible pages so that
>  * VM cannot reclaim any pages and trigger premature OOM.
>  */

I am not sure this is necessarily any better. Mentioning a premature OOM
would require a much better explanation because a first immediate question
would be "why don't we scan those pages at priority 0". Also decision about
the OOM is at a different layer and it might change in future when this
doesn't apply any more. But it is not like I would insist...

> > +		scan++;
> >  		switch (__isolate_lru_page(page, mode)) {
> >  		case 0:
> >  			nr_pages = hpage_nr_pages(page);
>
> Confirmed.

Hmm. I can clearly see how we could skip over too many pages and hit small
reclaim priorities too quickly but I am still scratching my head about how
we could hit the OOM killer as a result. The amount of pages on the active
anonymous list suggests that we are not able to rotate pages quickly
enough. I have to keep thinking about that.

> It works as expected but it changed scan counter's behavior. How
> about this?

OK, it looks good to me. I believe the main motivation of the original
patch from Johannes was to drop the magical total_skipped.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2314aca47d12..846922d7942e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1469,7 +1469,7 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
>   *
>   * Appropriate locks must be held before calling this function.
>   *
> - * @nr_to_scan:	The number of pages to look through on the list.
> + * @nr_to_scan:	The number of eligible pages to look through on the list.
>   * @lruvec:	The LRU vector to pull pages from.
>   * @dst:	The temp list to put pages on to.
>   * @nr_scanned:	The number of pages that were scanned.
> @@ -1489,11 +1489,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
>  	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>  	unsigned long skipped = 0;
> -	unsigned long scan, nr_pages;
> +	unsigned long scan, total_scan, nr_pages;
>  	LIST_HEAD(pages_skipped);
>
> -	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
> -					!list_empty(src); scan++) {
> +	for (total_scan = scan = 0; scan < nr_to_scan &&
> +					nr_taken < nr_to_scan &&
> +					!list_empty(src);
> +					total_scan++) {
>  		struct page *page;
>
>  		page = lru_to_page(src);
> @@ -1507,6 +1509,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			continue;
>  		}
>
> +		/*
> +		 * Do not count skipped pages because it makes the function to
> +		 * return with none isolated pages if the LRU mostly contains
> +		 * ineligible pages so that VM cannot reclaim any pages and
> +		 * trigger premature OOM.
> +		 */
> +		scan++;
>  		switch (__isolate_lru_page(page, mode)) {
>  		case 0:
>  			nr_pages = hpage_nr_pages(page);
> @@ -1544,9 +1553,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			skipped += nr_skipped[zid];
>  		}
>  	}
> -	*nr_scanned = scan;
> +	*nr_scanned = total_scan;
>  	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
> -				    scan, skipped, nr_taken, mode, lru);
> +				    total_scan, skipped, nr_taken, mode, lru);
>  	update_lru_sizes(lruvec, lru, nr_zone_taken);
>  	return nr_taken;
>  }

--
Michal Hocko
SUSE Labs
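The counter split in the diff above can be sketched as a self-contained C model. `struct model_page`, `struct model_result` and `model_isolate_patched()` are hypothetical stand-ins; the model assumes every eligible page is an order-0 page that isolates successfully (the real loop calls __isolate_lru_page() and adds hpage_nr_pages()), and it only counts skipped pages instead of moving them to a separate list. It shows the intended semantics: `scan` counts eligible pages and is bounded by nr_to_scan, while `total_scan`, the value reported through *nr_scanned and the tracepoint, counts every page looked at.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
	int zone;			/* zone index of the page */
};

struct model_result {
	size_t nr_taken;		/* pages isolated */
	size_t scan;			/* eligible pages looked at */
	size_t total_scan;		/* all pages looked at (-> *nr_scanned) */
	size_t skipped;			/* pages in zones above reclaim_idx */
};

/* lru[0] is the head and lru[n - 1] the tail; pages are taken from the tail. */
static struct model_result model_isolate_patched(const struct model_page *lru,
						 size_t n, size_t nr_to_scan,
						 int reclaim_idx)
{
	struct model_result r = { 0, 0, 0, 0 };
	size_t i = n;

	assert(lru != NULL || n == 0);
	for (r.total_scan = r.scan = 0; r.scan < nr_to_scan &&
					r.nr_taken < nr_to_scan &&
					i > 0;
					r.total_scan++) {
		const struct model_page *page = &lru[--i];

		if (page->zone > reclaim_idx) {
			r.skipped++;	/* skipped pages do not consume scan */
			continue;
		}
		r.scan++;
		r.nr_taken++;
	}
	return r;
}
```

With 4 DMA32 pages (zone 1) at the head, 16 Movable pages (zone 3) behind them and reclaim_idx = 2 (Normal, assuming this guest's DMA/DMA32/Normal/Movable layout), the model isolates 4 pages and reports scan = 4, total_scan = 20. Because skipped pages no longer consume `scan`, the walk is bounded by the list itself (`i > 0` here, `!list_empty(src)` in the diff).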
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote: > On Wed 03-05-17 13:48:09, Minchan Kim wrote: > > On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote: > > > On Tue 02-05-17 23:51:50, Minchan Kim wrote: > > > > Hi Michal, > > > > > > > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote: > > > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote: > > > > > > Oops, forgot to add lkml and linux-mm. > > > > > > Sorry for that. > > > > > > Send it again. > > > > > > > > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 > > > > > > >2001 > > > > > > From: Minchan Kim > > > > > > Date: Tue, 2 May 2017 12:34:05 +0900 > > > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages > > > > > > > > > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote: > > > > > > There are premature OOM happening. Although there are a ton of free > > > > > > swap and anonymous LRU list of elgible zones, OOM happened. > > > > > > > > > > > > With investigation, skipping page of isolate_lru_pages makes reclaim > > > > > > void because it returns zero nr_taken easily so LRU shrinking is > > > > > > effectively nothing and just increases priority aggressively. > > > > > > Finally, OOM happens. > > > > > > > > > > I am not really sure I understand the problem you are facing. Could > > > > > you > > > > > be more specific please? What is your configuration etc... > > > > > > > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured > > > > movablecore=1G to simulate highmem zone. > > > > Workload is a process consumes 2.2G memory and then random touch the > > > > address space so it makes lots of swap in/out. > > > > > > > > > > > > > > > balloon invoked oom-killer: > > > > > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > > > > > nodemask=(null), order=0, oom_score_adj=0 > > > > > [...] 
> > > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB > > > > > > active_file:208kB inactive_file:184kB unevictable:0kB > > > > > > isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB > > > > > > writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB > > > > > > all_unreclaimable? no > > > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB > > > > > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > > > > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > > > > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB > > > > > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > > > > > lowmem_reserve[]: 0 992 992 1952 > > > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB > > > > > > active_anon:952176kB inactive_anon:0kB active_file:36kB > > > > > > inactive_file:0kB unevictable:0kB writepending:88kB > > > > > > present:1032192kB managed:1019388kB mlocked:0kB > > > > > > slab_reclaimable:13532kB slab_unreclaimable:16460kB > > > > > > kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB > > > > > > local_pcp:24kB free_cma:0kB > > > > > > lowmem_reserve[]: 0 0 0 959 > > > > > > > > > > Hmm DMA32 has sufficient free memory to allow this order-0 request. > > > > > Inactive anon lru is basically empty. Why do not we rotate a really > > > > > large active anon list? Isn't this the primary problem? > > > > > > > > It's a side effect by skipping page logic in isolate_lru_pages > > > > I mentioned above in changelog. > > > > > > > > The problem is a lot of anonymous memory in movable zone(ie, highmem) > > > > and non-small memory in DMA32 zone. > > > > > > Such a configuration is questionable on its own. But let't keep this > > > part alone. > > > > It seems you are misunderstood. It's really common on 32bit. > > Yes, I am not arguing about 32b syst
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Wed 03-05-17 13:48:09, Minchan Kim wrote: > On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote: > > On Tue 02-05-17 23:51:50, Minchan Kim wrote: > > > Hi Michal, > > > > > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote: > > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote: > > > > > Oops, forgot to add lkml and linux-mm. > > > > > Sorry for that. > > > > > Send it again. > > > > > > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 > > > > > >2001 > > > > > From: Minchan Kim > > > > > Date: Tue, 2 May 2017 12:34:05 +0900 > > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages > > > > > > > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote: > > > > > There are premature OOM happening. Although there are a ton of free > > > > > swap and anonymous LRU list of elgible zones, OOM happened. > > > > > > > > > > With investigation, skipping page of isolate_lru_pages makes reclaim > > > > > void because it returns zero nr_taken easily so LRU shrinking is > > > > > effectively nothing and just increases priority aggressively. > > > > > Finally, OOM happens. > > > > > > > > I am not really sure I understand the problem you are facing. Could you > > > > be more specific please? What is your configuration etc... > > > > > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured > > > movablecore=1G to simulate highmem zone. > > > Workload is a process consumes 2.2G memory and then random touch the > > > address space so it makes lots of swap in/out. > > > > > > > > > > > > balloon invoked oom-killer: > > > > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > > > > nodemask=(null), order=0, oom_score_adj=0 > > > > [...] 
> > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB > > > > > inactive_file:184kB unevictable:0kB isolated(anon):0kB > > > > > isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB > > > > > writeback_tmp:0kB unstable:0kB all_unreclaimable? no > > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB > > > > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > > > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > > > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB > > > > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > > > > lowmem_reserve[]: 0 992 992 1952 > > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB > > > > > active_anon:952176kB inactive_anon:0kB active_file:36kB > > > > > inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB > > > > > managed:1019388kB mlocked:0kB slab_reclaimable:13532kB > > > > > slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB > > > > > bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB > > > > > lowmem_reserve[]: 0 0 0 959 > > > > > > > > Hmm DMA32 has sufficient free memory to allow this order-0 request. > > > > Inactive anon lru is basically empty. Why do not we rotate a really > > > > large active anon list? Isn't this the primary problem? > > > > > > It's a side effect by skipping page logic in isolate_lru_pages > > > I mentioned above in changelog. > > > > > > The problem is a lot of anonymous memory in movable zone(ie, highmem) > > > and non-small memory in DMA32 zone. > > > > Such a configuration is questionable on its own. But let't keep this > > part alone. > > It seems you are misunderstood. It's really common on 32bit. Yes, I am not arguing about 32b systems. It is quite common to see issues which are inherent to the highmem zone. > Think of 2G DRAM system on 32bit. Normally, it's 1G normal:1G highmem. 
> It's almost same with one I configured. > > > > > > In heavy memory pressure, > > > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list > > > is low so it tries to deactivate pages. For it, first of all, it tries > > > to isolate pages from active list but there are lots of
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote: > On Tue 02-05-17 23:51:50, Minchan Kim wrote: > > Hi Michal, > > > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote: > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote: > > > > Oops, forgot to add lkml and linux-mm. > > > > Sorry for that. > > > > Send it again. > > > > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001 > > > > From: Minchan Kim > > > > Date: Tue, 2 May 2017 12:34:05 +0900 > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages > > > > > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote: > > > > There are premature OOM happening. Although there are a ton of free > > > > swap and anonymous LRU list of elgible zones, OOM happened. > > > > > > > > With investigation, skipping page of isolate_lru_pages makes reclaim > > > > void because it returns zero nr_taken easily so LRU shrinking is > > > > effectively nothing and just increases priority aggressively. > > > > Finally, OOM happens. > > > > > > I am not really sure I understand the problem you are facing. Could you > > > be more specific please? What is your configuration etc... > > > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured > > movablecore=1G to simulate highmem zone. > > Workload is a process consumes 2.2G memory and then random touch the > > address space so it makes lots of swap in/out. > > > > > > > > > balloon invoked oom-killer: > > > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > > > nodemask=(null), order=0, oom_score_adj=0 > > > [...] > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB > > > > inactive_file:184kB unevictable:0kB isolated(anon):0kB > > > > isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB > > > > writeback_tmp:0kB unstable:0kB all_unreclaimable? 
no > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB > > > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB > > > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > > > lowmem_reserve[]: 0 992 992 1952 > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB > > > > active_anon:952176kB inactive_anon:0kB active_file:36kB > > > > inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB > > > > managed:1019388kB mlocked:0kB slab_reclaimable:13532kB > > > > slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB > > > > bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB > > > > lowmem_reserve[]: 0 0 0 959 > > > > > > Hmm DMA32 has sufficient free memory to allow this order-0 request. > > > Inactive anon lru is basically empty. Why do not we rotate a really > > > large active anon list? Isn't this the primary problem? > > > > It's a side effect by skipping page logic in isolate_lru_pages > > I mentioned above in changelog. > > > > The problem is a lot of anonymous memory in movable zone(ie, highmem) > > and non-small memory in DMA32 zone. > > Such a configuration is questionable on its own. But let't keep this > part alone. It seems you are misunderstood. It's really common on 32bit. Think of 2G DRAM system on 32bit. Normally, it's 1G normal:1G highmem. It's almost same with one I configured. > > > In heavy memory pressure, > > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list > > is low so it tries to deactivate pages. For it, first of all, it tries > > to isolate pages from active list but there are lots of anonymous pages > > from movable zone so skipping logic in isolate_lru_pages works. With > > the result, isolate_lru_pages cannot isolate any eligible pages so > > reclaim trial is effectively void. 
> > It continues to meet OOM.
>
> But skipped pages should be rotated and we should eventually hit pages
> from the right zone(s). Moreover we should scan the full LRU at priority
> 0 so why exactly do we hit the OOM killer?

Yes, it is a full scan at priority 0, but keep in mind that the number of
pages to scan in that "full LRU" scan is the number of eligible pages, no
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> Hi Michal,
>
> On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > Oops, forgot to add lkml and linux-mm.
> > > Sorry for that.
> > > Send it again.
> > >
> > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > From: Minchan Kim
> > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > >
> > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > > There are premature OOM happening. Although there are a ton of free
> > > swap and anonymous LRU list of eligible zones, OOM happened.
> > >
> > > With investigation, skipping page of isolate_lru_pages makes reclaim
> > > void because it returns zero nr_taken easily so LRU shrinking is
> > > effectively nothing and just increases priority aggressively.
> > > Finally, OOM happens.
> >
> > I am not really sure I understand the problem you are facing. Could you
> > be more specific please? What is your configuration etc...
>
> Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured
> movablecore=1G to simulate highmem zone.
> Workload is a process consumes 2.2G memory and then random touch the
> address space so it makes lots of swap in/out.
>
> >
> > > balloon invoked oom-killer:
> > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK),
> > > nodemask=(null), order=0, oom_score_adj=0
> > [...]
> > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB
> > > inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > > mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB
> > > unstable:0kB all_unreclaimable? no
> > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB
> > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB
> > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB
> > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > lowmem_reserve[]: 0 992 992 1952
> > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB
> > > inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB
> > > writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB
> > > slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB
> > > pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > lowmem_reserve[]: 0 0 0 959
> >
> > Hmm DMA32 has sufficient free memory to allow this order-0 request.
> > Inactive anon lru is basically empty. Why do not we rotate a really
> > large active anon list? Isn't this the primary problem?
>
> It's a side effect by skipping page logic in isolate_lru_pages
> I mentioned above in changelog.
>
> The problem is a lot of anonymous memory in movable zone(ie, highmem)
> and non-small memory in DMA32 zone.

Such a configuration is questionable on its own. But let's keep this
part alone.

> In heavy memory pressure,
> requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list
> is low so it tries to deactivate pages. For it, first of all, it tries
> to isolate pages from active list but there are lots of anonymous pages
> from movable zone so skipping logic in isolate_lru_pages works. With
> the result, isolate_lru_pages cannot isolate any eligible pages so
> reclaim trial is effectively void. It continues to meet OOM.

But skipped pages should be rotated and we should eventually hit pages
from the right zone(s). Moreover we should scan the full LRU at priority
0 so why exactly do we hit the OOM killer?

Anyway [1] has changed this behavior. Are you seeing the issue with this
patch dropped?

[1] http://www.ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmscan-account-for-skipped-pages-as-a-partial-scan.patch
--
Michal Hocko
SUSE Labs
Re: [PATCH] vmscan: scan pages until it founds eligible pages
Hi Michal,

On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > Oops, forgot to add lkml and linux-mm.
> > Sorry for that.
> > Send it again.
> >
> > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > From: Minchan Kim
> > Date: Tue, 2 May 2017 12:34:05 +0900
> > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> >
> > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > There are premature OOM happening. Although there are a ton of free
> > swap and anonymous LRU list of eligible zones, OOM happened.
> >
> > With investigation, skipping page of isolate_lru_pages makes reclaim
> > void because it returns zero nr_taken easily so LRU shrinking is
> > effectively nothing and just increases priority aggressively.
> > Finally, OOM happens.
>
> I am not really sure I understand the problem you are facing. Could you
> be more specific please? What is your configuration etc...

Sure. It is a KVM guest on x86_64 with 2G of memory and 1G of swap,
configured with movablecore=1G to simulate a highmem zone.
The workload is a process that consumes 2.2G of memory and then randomly
touches the address space, which causes lots of swap in/out.

>
> > balloon invoked oom-killer:
> > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK),
> > nodemask=(null), order=0, oom_score_adj=0
> [...]
> > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB
> > inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB
> > unstable:0kB all_unreclaimable? no
> > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB
> > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> > writepending:0kB present:15992kB managed:15908kB mlocked:0kB
> > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB
> > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > lowmem_reserve[]: 0 992 992 1952
> > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB
> > inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB
> > writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB
> > slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB
> > pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > lowmem_reserve[]: 0 0 0 959
>
> Hmm DMA32 has sufficient free memory to allow this order-0 request.
> Inactive anon lru is basically empty. Why do not we rotate a really
> large active anon list? Isn't this the primary problem?

It's a side effect of the page-skipping logic in isolate_lru_pages that I
mentioned above in the changelog.

The problem is that there is a lot of anonymous memory in the movable zone
(i.e., highmem) and a non-small amount in the DMA32 zone. Under heavy
memory pressure, requesting a page with GFP_KERNEL triggers reclaim. The VM
knows the inactive list is low, so it tries to deactivate pages. To do so,
it first tries to isolate pages from the active list, but there are lots of
anonymous pages from the movable zone, so the skipping logic in
isolate_lru_pages kicks in. As a result, isolate_lru_pages cannot isolate
any eligible pages, so the reclaim attempt is effectively void. It keeps
running into OOM.

I'm on a long vacation from today, so please understand if my responses are
slow.
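Why the Movable pages are invisible to this reclaim can be modelled roughly in C: a GFP_KERNEL request cannot be served from ZONE_MOVABLE, so its reclaim_idx stops at the Normal zone, and isolate_lru_pages() skips any page whose zone index is higher. The enum below assumes this guest's four-zone layout (the lowmem_reserve[] rows in the report have four entries) and treats the Normal zone as empty because it does not appear in the report; `enum model_zone`, `model_page_eligible()` and `model_eligible_kb()` are illustrative names, not kernel API.

```c
#include <assert.h>

/* Zone order assumed for this guest: DMA, DMA32, Normal, Movable. */
enum model_zone { MZ_DMA, MZ_DMA32, MZ_NORMAL, MZ_MOVABLE, MZ_NR };

/* A page may be isolated only if its zone is at or below reclaim_idx. */
static int model_page_eligible(enum model_zone page_zone, enum model_zone reclaim_idx)
{
	return page_zone <= reclaim_idx;
}

/* Sum an LRU's per-zone sizes (in kB) over the zones eligible for reclaim_idx. */
static unsigned long model_eligible_kb(const unsigned long per_zone_kb[MZ_NR],
				       enum model_zone reclaim_idx)
{
	unsigned long sum = 0;

	assert(reclaim_idx < MZ_NR);
	for (int z = 0; z < MZ_NR; z++)
		if (model_page_eligible((enum model_zone)z, reclaim_idx))
			sum += per_zone_kb[z];
	return sum;
}
```

With the per-zone active_anon values from the report (8064kB, 952176kB, 0, 738560kB), only 960240kB of the active anon list is eligible for a GFP_KERNEL reclaim; the 738560kB in Movable can only be skipped.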
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> Oops, forgot to add lkml and linux-mm.
> Sorry for that.
> Send it again.
>
> >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> From: Minchan Kim
> Date: Tue, 2 May 2017 12:34:05 +0900
> Subject: [PATCH] vmscan: scan pages until it founds eligible pages
>
> On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> There are premature OOM happening. Although there are a ton of free
> swap and anonymous LRU list of eligible zones, OOM happened.
>
> With investigation, skipping page of isolate_lru_pages makes reclaim
> void because it returns zero nr_taken easily so LRU shrinking is
> effectively nothing and just increases priority aggressively.
> Finally, OOM happens.

I am not really sure I understand the problem you are facing. Could you
be more specific please? What is your configuration etc...

> balloon invoked oom-killer:
> gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK),
> nodemask=(null), order=0, oom_score_adj=0
[...]
> Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB
> inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB
> unstable:0kB all_unreclaimable? no
> DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> writepending:0kB present:15992kB managed:15908kB mlocked:0kB
> slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB
> pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 992 992 1952
> DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB
> inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB
> writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB
> slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB
> pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 959

Hmm, DMA32 has sufficient free memory to allow this order-0 request.
The inactive anon LRU is basically empty. Why don't we rotate a really
large active anon list? Isn't this the primary problem?

I haven't really looked at the patch deeply yet. It looks quite scary at
first sight though. I would really like to understand what exactly is
going on here before we move to a patch to fix it.

Thanks!
--
Michal Hocko
SUSE Labs
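The observation that DMA32 could satisfy this order-0 request can be checked against a simplified model of the minimum-watermark test: a zone can serve the allocation while its free pages exceed `min` plus the lowmem_reserve entry for the requesting zone class. This C sketch ignores highatomic and CMA reserves and the ALLOC_HARDER/ALLOC_HIGH adjustments the kernel applies, and assumes 4 KiB pages; `model_order0_ok()` and `KB_PER_PAGE` are hypothetical names. For a GFP_KERNEL request on this guest the relevant lowmem_reserve entry is 0 pages for DMA32 and 992 pages for DMA (the DMA32 and Normal columns hold the same values in the report, so the exact class zone does not change the result).

```c
#include <assert.h>
#include <stdbool.h>

#define KB_PER_PAGE 4UL		/* 4 KiB pages assumed */

/*
 * Simplified order-0 check: the zone must keep more than `min` pages free,
 * plus the pages it holds back from requests that could have used a higher
 * zone (lowmem_reserve). free/min are in kB as printed in the report,
 * the reserve is in pages.
 */
static bool model_order0_ok(unsigned long free_kb, unsigned long min_kb,
			    unsigned long lowmem_reserve_pages)
{
	unsigned long free_pages, min_pages;

	assert(free_kb % KB_PER_PAGE == 0 && min_kb % KB_PER_PAGE == 0);
	free_pages = free_kb / KB_PER_PAGE;
	min_pages = min_kb / KB_PER_PAGE;
	return free_pages > min_pages + lowmem_reserve_pages;
}
```

Both eligible zones pass under this model (2272 > 512 pages for DMA32, 1829 > 8 + 992 pages for DMA), which is consistent with the remark that free memory in the eligible zones was not the immediate obstacle.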
Re: [PATCH] vmscan: scan pages until it founds eligible pages
Oops, forgot to add lkml and linux-mm.
Sorry for that.
Send it again.

>From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
From: Minchan Kim
Date: Tue, 2 May 2017 12:34:05 +0900
Subject: [PATCH] vmscan: scan pages until it founds eligible pages

On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
Premature OOMs are happening. Although there is plenty of free swap and
plenty of anonymous memory on the LRU lists of eligible zones, the OOM
killer was invoked.

On investigation, the page skipping in isolate_lru_pages makes reclaim
void: it easily returns zero nr_taken, so LRU shrinking effectively does
nothing and only escalates the reclaim priority aggressively. Finally,
OOM happens.

This patch makes isolate_lru_pages keep scanning until it encounters
pages from eligible zones or has scanned too many pages (i.e., the node's
LRU size).

balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null), order=0, oom_score_adj=0
CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x65/0x87
 dump_header.isra.19+0x8f/0x20f
 ? preempt_count_add+0x9e/0xb0
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 oom_kill_process+0x21d/0x3f0
 ? has_capability_noaudit+0x17/0x20
 out_of_memory+0xd8/0x390
 __alloc_pages_slowpath+0xbc1/0xc50
 ? anon_vma_interval_tree_insert+0x84/0x90
 __alloc_pages_nodemask+0x1a5/0x1c0
 pte_alloc_one+0x20/0x50
 __pte_alloc+0x1e/0x110
 __handle_mm_fault+0x919/0x960
 handle_mm_fault+0x77/0x120
 __do_page_fault+0x27a/0x550
 trace_do_page_fault+0x43/0x150
 do_async_page_fault+0x2c/0x90
 async_page_fault+0x28/0x30
RIP: 0033:0x7fc4636bacb8
RSP: 002b:7fff97c9c4c0 EFLAGS: 00010202
RAX: 7fc3e818d000 RBX: 7fc4639f8760 RCX: 7fc46372e9ca
RDX: 00101002 RSI: 00101000 RDI:
RBP: 00100010 R08: R09:
R10: 0022 R11: 000a3901 R12: 7fc3e818d010
R13: 00101000 R14: 7fc4639f87b8 R15: 7fc4639f87b8
Mem-Info:
active_anon:424716 inactive_anon:65314 isolated_anon:0
 active_file:52 inactive_file:46 isolated_file:0
 unevictable:0 dirty:27 writeback:0 unstable:0
 slab_reclaimable:3967 slab_unreclaimable:4125
 mapped:133 shmem:43 pagetables:1674 bounce:0
 free:4637 free_pcp:225 free_cma:0
Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 992 992 1952
DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 959
Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
378 total pagecache pages
17 pages in swap cache
Swap cache stats: add 17325, delete 17302, find 0/27
Free swap = 978940kB
Total swap = 1048572kB
524157 pages RAM
0 pages HighMem/MovableOnly
12629 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 433] 0 433 49045 14 3 82 0 upstart-udev-br
[ 438] 0 438123715 27 3 191 -1000 systemd-udevd
...

Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2314aca47d12..1fec21d155b3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.