Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Thu, 15 Nov 2012, Jiang Liu wrote:

> I feel that zone->present_pages has been abused. I guess it originally meant
> "physical pages present in this zone", but now zone->present_pages is
> sometimes used as "pages in this zone managed by the buddy system".

Its definition is all pages spanned by the zone except those that are reserved and therefore unavailable for the kernel to allocate from, and the bootmem implementation requires that its memory be considered "reserved" until it is freed. It is used throughout the kernel to determine how much memory in that zone is allocatable from the page allocator, since the allocator's reclaim heuristics and watermarks depend on this memory being allocatable.

> So I'm trying to add a new field "managed_pages" into zone, which accounts
> for pages managed by the buddy system. That's why I thought the clean
> solution is a little complex :(

You need to update the pgdat's node_present_pages to be consistent with all of its zones' present_pages.

-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
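To make the distinction in this exchange concrete, here is a small user-space model of the per-zone counters: spanned_pages (the PFN range, holes included), present_pages, and the managed_pages field Jiang Liu proposes, plus the node-level total David asks to keep consistent with the zones. The struct names, the ordering invariants checked, and the sample numbers are hypothetical illustrations, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ZONES_MODEL 3 /* hypothetical: DMA, NORMAL, HIGHMEM */

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	unsigned long spanned_pages; /* PFN range covered, holes included */
	unsigned long present_pages; /* pages actually backed by memory */
	unsigned long managed_pages; /* pages handed to the buddy allocator (proposed field) */
};

/* Hypothetical, simplified stand-in for pg_data_t. */
struct node_model {
	struct zone_model zones[NR_ZONES_MODEL];
	unsigned long node_present_pages;
};

/* Recompute the node total so it always equals the sum of its zones. */
static void sync_node_present_pages(struct node_model *node)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < NR_ZONES_MODEL; i++) {
		const struct zone_model *z = &node->zones[i];

		/* Invariants assumed by this model: managed <= present <= spanned. */
		assert(z->managed_pages <= z->present_pages);
		assert(z->present_pages <= z->spanned_pages);
		sum += z->present_pages;
	}
	node->node_present_pages = sum;
}
```

Any code path that changes a zone's present_pages (boot-time fixups, hotplug) would, under this model, call the sync helper afterwards so the node total cannot drift.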
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Thu, 15 Nov 2012, Andrew Morton wrote:

> From: Andrew Morton
> Subject: revert "mm: fix-up zone present pages"
>
> Revert
>
> commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> Author: Jianguo Wu
> AuthorDate: Mon Oct 8 16:33:06 2012 -0700
> Commit: Linus Torvalds
> CommitDate: Tue Oct 9 16:22:54 2012 +0900
>
>     mm: fix-up zone present pages
>
> That patch tried to fix an issue when calculating zone->present_pages, but
> it caused a regression on 32-bit systems with HIGHMEM. With that
> changeset, reset_zone_present_pages() resets all zone->present_pages to
> zero, and fixup_zone_present_pages() is called to recalculate
> zone->present_pages when the boot allocator frees core memory pages into
> the buddy allocator. Because highmem pages are not freed by the bootmem
> allocator, all highmem zones' present_pages become zero.
>
> Various options for improving the situation are being discussed but for
> now, let's return to the 3.6 code.
>
> Cc: Jianguo Wu
> Cc: Jiang Liu
> Cc: Petr Tesarik
> Cc: "Luck, Tony"
> Cc: Mel Gorman
> Cc: Yinghai Lu
> Cc: Minchan Kim
> Cc: Johannes Weiner
> Cc: David Rientjes
> Signed-off-by: Andrew Morton

Acked-by: David Rientjes
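The failure mode described in the commit message can be reproduced in a toy model: zero every zone, then rebuild each count only from the pages the bootmem allocator frees into it. Since bootmem never frees highmem, the highmem zone ends at zero. The struct, field names, and page counts below are hypothetical; real lowmem zones also keep some reserved pages, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone state for this model only. */
struct zone_model {
	unsigned long present_pages; /* count computed at boot */
	unsigned long bootmem_freed; /* pages bootmem releases into this zone */
	bool is_highmem;
};

/* Model of the behaviour introduced by 7f1290f2f2a4d: zero every zone,
 * then rebuild present_pages only from pages that bootmem frees. */
static void model_reset_then_fixup(struct zone_model *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		zones[i].present_pages = 0; /* like reset_zone_present_pages() */

	for (size_t i = 0; i < nr; i++) {
		/* Assumption from the thread: bootmem never frees highmem pages. */
		assert(!zones[i].is_highmem || zones[i].bootmem_freed == 0);
		/* like fixup_zone_present_pages() on each bootmem free */
		zones[i].present_pages += zones[i].bootmem_freed;
	}
}
```

With lowmem zones of 4096 and 225280 pages and a 294912-page highmem zone, the lowmem counts come back intact while the highmem count drops to 0, which is the regression being reverted.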
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/15/12 19:24, Andrew Morton wrote:
> On Wed, 14 Nov 2012 22:52:03 +0800 Jiang Liu wrote:
>
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
>> and I will post another version once I find a cleaner way?
>
> We do need to get this regression fixed and I guess that a straightforward
> revert is an acceptable way of doing that, for now.
>
> I queued the below, with a plan to send it to Linus next week.

I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it fixes the problem I had with my laptop failing to resume (by either freezing or rebooting) after a suspend to disk.

Tested-by: Chris Clayton

> From: Andrew Morton
> Subject: revert "mm: fix-up zone present pages"
>
> Revert
>
> commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> Author: Jianguo Wu
> AuthorDate: Mon Oct 8 16:33:06 2012 -0700
> Commit: Linus Torvalds
> CommitDate: Tue Oct 9 16:22:54 2012 +0900
>
>     mm: fix-up zone present pages
>
> That patch tried to fix an issue when calculating zone->present_pages, but
> it caused a regression on 32-bit systems with HIGHMEM. With that
> changeset, reset_zone_present_pages() resets all zone->present_pages to
> zero, and fixup_zone_present_pages() is called to recalculate
> zone->present_pages when the boot allocator frees core memory pages into
> the buddy allocator. Because highmem pages are not freed by the bootmem
> allocator, all highmem zones' present_pages become zero.
>
> Various options for improving the situation are being discussed but for
> now, let's return to the 3.6 code.
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Wed, 14 Nov 2012 22:52:03 +0800 Jiang Liu wrote:

> So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and I will post another version once I find a cleaner way?

We do need to get this regression fixed and I guess that a straightforward revert is an acceptable way of doing that, for now.

I queued the below, with a plan to send it to Linus next week.


From: Andrew Morton
Subject: revert "mm: fix-up zone present pages"

Revert

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu
AuthorDate: Mon Oct 8 16:33:06 2012 -0700
Commit: Linus Torvalds
CommitDate: Tue Oct 9 16:22:54 2012 +0900

    mm: fix-up zone present pages

That patch tried to fix an issue when calculating zone->present_pages, but it caused a regression on 32-bit systems with HIGHMEM. With that changeset, reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when the boot allocator frees core memory pages into the buddy allocator. Because highmem pages are not freed by the bootmem allocator, all highmem zones' present_pages become zero.

Various options for improving the situation are being discussed but for now, let's return to the 3.6 code.
Cc: Jianguo Wu
Cc: Jiang Liu
Cc: Petr Tesarik
Cc: "Luck, Tony"
Cc: Mel Gorman
Cc: Yinghai Lu
Cc: Minchan Kim
Cc: Johannes Weiner
Cc: David Rientjes
Signed-off-by: Andrew Morton
---

 arch/ia64/mm/init.c |    1 -
 include/linux/mm.h  |    4
 mm/bootmem.c        |   10 +-
 mm/memory_hotplug.c |    7 ---
 mm/nobootmem.c      |    3 ---
 mm/page_alloc.c     |   34 --
 6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)
 	high_memory = __va(max_low_pfn * PAGE_SIZE);
-	reset_zone_present_pages();
 	for_each_online_pgdat(pgdat)
 		if (pgdat->bdata->node_bootmem_map)
 			totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-				unsigned long end_pfn);
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
 			int order = ilog2(BITS_PER_LONG);
 			__free_pages_bootmem(pfn_to_page(start), order);
-			fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-					start, start + BITS_PER_LONG);
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
 				if (vec & 1) {
 					page = pfn_to_page(start + off);
 					__free_pages_bootmem(page, 0);
-					fixup_zone_present_pages(
-						page_to_nid(page),
-						start + off, start + off + 1);
 					count++;
 				}
 				vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
 	pages = bdata->node_low_pfn - bdata->node_min_pfn;
 	pages = bootmem_bootmap_pages(pages);
 	count += pages;
-	while (pages--) {
-		fixup_zone_present_pages(page_to_nid(page),
-				page_to_pfn(page), page_to_pfn(page) + 1);
+	while (pages--)
 		__free_pages_bootmem(page++, 0);
-	}
 	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~revert-1
+++ a/mm/memory_hotplug.c
@@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
 void __ref put_page_bootmem(struct page *page)
 {
 	unsigned long type;
-	struct zone *zone;
 	type = (unsigned long) page->lru.next;
 	BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
@@ -117,12 +116,6 @@ void __ref put_page_bootmem(struct page
 		set_page_private(page, 0);
 		INIT_LIST_HEAD(&page->lru);
 		__free_pages_bootmem(page, 0);
-
-		zone = page_zone(page);
-
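The mm/bootmem.c hunks in this revert touch both branches of the bitmap walk in free_all_bootmem_core(): one releases a whole word of pages at order ilog2(BITS_PER_LONG), the other releases pages one bit at a time at order 0, and before the revert each branch also called fixup_zone_present_pages(). A minimal user-space model of that walk follows. The bitmap encoding (set bit = free page), taking the block path only when every bit in the word is set, ignoring alignment, and all names here are assumptions for illustration, not the kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits per bitmap word, standing in for the kernel's BITS_PER_LONG. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical counters describing how a bitmap walk released pages. */
struct free_stats {
	unsigned long pages_freed;
	unsigned long block_frees;  /* whole-word frees (order ilog2(BITS_PER_LONG)) */
	unsigned long single_frees; /* order-0 frees, one per free bit */
};

/*
 * Walk a bitmap in which a set bit marks a free page (assumed encoding).
 * A word with every bit set is released as one block; otherwise each set
 * bit is released on its own. Before the revert, both paths also called
 * fixup_zone_present_pages() for the range they had just freed.
 */
static struct free_stats walk_free_bitmap(const unsigned long *free_words,
					  size_t nwords)
{
	struct free_stats st = { 0, 0, 0 };

	assert(free_words != NULL || nwords == 0);
	for (size_t i = 0; i < nwords; i++) {
		unsigned long vec = free_words[i];

		if (vec == ~0UL) {
			st.block_frees++;
			st.pages_freed += MODEL_BITS_PER_LONG;
			continue;
		}
		for (; vec != 0; vec >>= 1) {
			if (vec & 1UL) {
				st.single_frees++;
				st.pages_freed++;
			}
		}
	}
	return st;
}
```

Because the fixup hook sat inside this walk, only pages that pass through the bootmem bitmap were ever counted, which is why zones bootmem never covers (highmem) stayed at zero.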
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/15/2012 05:22 PM, Wen Congyang wrote:
> Hi, Liu Jiang
>
> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
>> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>>> On Tue, 6 Nov 2012 09:31:57 +0800
>>> Jiang Liu wrote:
>>>
>>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>>> is called to recalculate zone->present_pages when the boot allocator
>>>> frees core memory pages into the buddy allocator. Because highmem pages
>>>> are not freed by the bootmem allocator, all highmem zones' present_pages
>>>> become zero.
>>>>
>>>> Actually there's no need to recalculate present_pages for highmem zones
>>>> because the bootmem allocator never allocates pages from them. So fix the
>>>> regression by skipping highmem in reset_zone_present_pages() and
>>>> fixup_zone_present_pages().
>>>>
>>>> ...
>>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>  			z = NODE_DATA(nid)->node_zones + i;
>>>> -			z->present_pages = 0;
>>>> +			if (!is_highmem(z))
>>>> +				z->present_pages = 0;
>>>>  		}
>>>>  	}
>>>>  }
>>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>>
>>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>  		z = NODE_DATA(nid)->node_zones + i;
>>>> +		if (is_highmem(z))
>>>> +			continue;
>>>> +
>>>>  		zone_start_pfn = z->zone_start_pfn;
>>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>>> -
>>>> -		/* if the two regions intersect */
>>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>>  					    max(start_pfn, zone_start_pfn);
>>>
>>> This ... isn't very nice. It embeds within
>>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>>> about their caller's state. Or, more specifically, it is embedding
>>> knowledge about the overall state of the system when these functions
>>> are called.
>>>
>>> I mean, a function called "reset_zone_present_pages" should reset
>>> ->present_pages!
>>>
>>> The fact that fixup_zone_present_pages() has multiple call sites makes
>>> this all even more risky. And what are the interactions between this
>>> and memory hotplug?
>>>
>>> Can we find a cleaner fix?
>>>
>>> Please tell us more about what's happening here. Is it the case that
>>> reset_zone_present_pages() is being called *after* highmem has been
>>> populated? If so, then fixup_zone_present_pages() should work
>>> correctly for highmem? Or is it the case that highmem hasn't yet been
>>> set up? IOW, what is the sequence of operations here?
>>>
>>> Is the problem that we're *missing* a call to
>>> fixup_zone_present_pages(), perhaps? If we call
>>> fixup_zone_present_pages() after highmem has been populated,
>>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>>> ->present_pages?
>> Hi Andrew,
>> Sorry for the late response :(
>> I have done more investigation following your suggestions. Currently
>> we only call fixup_zone_present_pages() for memory freed by the bootmem
>> allocator and miss HIGHMEM pages. We could also call
>> fixup_zone_present_pages() for HIGHMEM pages, but that would require
>> changing arch-specific code for x86, powerpc, sparc, microblaze, arm,
>> mips, um, tile, etc. That seems like extra overhead.
>> And sadly, I found the quick fix is still incomplete. The original
>> patch has another issue: reset_zone_present_pages() is only called
>> for IA64, so it will cause trouble for other arches that make use of
>> "bootmem.c".
>> Then I felt a little guilty and tried to find a cleaner solution without
>> touching arch-specific code. But things are more complex than I expected,
>> and I'm still working on that.
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
>> and I will post another version once I find a cleaner way?
>
> I think fixup_zone_present_pages() is very useful for memory hotplug.
>
> We calculate zone->present_pages in free_area_init_core(), but its value is
> wrong. That is why we fix it in fixup_zone_present_pages().
>
> What about this:
> 1. init zone->present_pages to the present pages in this zone (including bootmem)
> 2. don't reset zone->present_pages for HIGHMEM pages
>
> We don't allocate bootmem from HIGHMEM. So its present pages are initialized
> in step 1 and there is no need to fix them in step 2.

Hi Congyang,
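The quick fix quoted above (the page_alloc.c hunk) can be modeled the same way: both the reset and the fixup skip highmem zones, and the fixup credits each remaining zone with the overlap between the freed PFN range and the zone's half-open range [zone_start_pfn, zone_start_pfn + spanned_pages). This is a user-space sketch with hypothetical names and numbers, not the kernel patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone state for this model only. */
struct zone_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
	bool is_highmem;
};

/* Quick fix, modeled: reset only non-highmem zones. */
static void reset_skip_highmem(struct zone_model *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!zones[i].is_highmem)
			zones[i].present_pages = 0;
}

/* Quick fix, modeled: credit the part of [start_pfn, end_pfn) that falls in
 * each non-highmem zone; highmem keeps its boot-time value. */
static void fixup_skip_highmem(struct zone_model *zones, size_t nr,
			       unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn <= end_pfn);
	for (size_t i = 0; i < nr; i++) {
		struct zone_model *z = &zones[i];
		unsigned long zs = z->start_pfn;
		unsigned long ze = zs + z->spanned_pages;

		if (z->is_highmem)
			continue;
		/* Half-open ranges intersect unless one ends before the other starts. */
		if (!(zs >= end_pfn || ze <= start_pfn))
			z->present_pages += (end_pfn < ze ? end_pfn : ze) -
					    (start_pfn > zs ? start_pfn : zs);
	}
}
```

Freeing PFNs [1000, 5000) across a DMA zone [0, 4096) and a NORMAL zone starting at 4096 credits 3096 and 904 pages respectively, while the highmem count is left untouched. As Andrew points out, the catch is that the functions now encode knowledge of which zones their callers will and will not feed.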
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
At 2012/11/15 19:28, Bob Liu Wrote:
> On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang wrote:
>> Hi, Liu Jiang
>>
>> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
>>> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>>>> On Tue, 6 Nov 2012 09:31:57 +0800
>>>> Jiang Liu wrote:
>>>>
>>>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>>>> is called to recalculate zone->present_pages when the boot allocator
>>>>> frees core memory pages into the buddy allocator. Because highmem pages
>>>>> are not freed by the bootmem allocator, all highmem zones' present_pages
>>>>> become zero.
>>>>>
>>>>> Actually there's no need to recalculate present_pages for highmem zones
>>>>> because the bootmem allocator never allocates pages from them. So fix the
>>>>> regression by skipping highmem in reset_zone_present_pages() and
>>>>> fixup_zone_present_pages().
>>>>>
>>>>> ...
>>>>>
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>>  			z = NODE_DATA(nid)->node_zones + i;
>>>>> -			z->present_pages = 0;
>>>>> +			if (!is_highmem(z))
>>>>> +				z->present_pages = 0;
>>>>>  		}
>>>>>  	}
>>>>>  }
>>>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>>>
>>>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>>  		z = NODE_DATA(nid)->node_zones + i;
>>>>> +		if (is_highmem(z))
>>>>> +			continue;
>>>>> +
>>>>>  		zone_start_pfn = z->zone_start_pfn;
>>>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>>>> -
>>>>> -		/* if the two regions intersect */
>>>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>>>  					    max(start_pfn, zone_start_pfn);
>>>>
>>>> This ... isn't very nice. It embeds within
>>>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>>>> about their caller's state. Or, more specifically, it is embedding
>>>> knowledge about the overall state of the system when these functions
>>>> are called.
>>>>
>>>> I mean, a function called "reset_zone_present_pages" should reset
>>>> ->present_pages!
>>>>
>>>> The fact that fixup_zone_present_pages() has multiple call sites makes
>>>> this all even more risky. And what are the interactions between this
>>>> and memory hotplug?
>>>>
>>>> Can we find a cleaner fix?
>>>>
>>>> Please tell us more about what's happening here. Is it the case that
>>>> reset_zone_present_pages() is being called *after* highmem has been
>>>> populated? If so, then fixup_zone_present_pages() should work
>>>> correctly for highmem? Or is it the case that highmem hasn't yet been
>>>> set up? IOW, what is the sequence of operations here?
>>>>
>>>> Is the problem that we're *missing* a call to
>>>> fixup_zone_present_pages(), perhaps? If we call
>>>> fixup_zone_present_pages() after highmem has been populated,
>>>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>>>> ->present_pages?
>>> Hi Andrew,
>>> Sorry for the late response :(
>>> I have done more investigation following your suggestions. Currently
>>> we only call fixup_zone_present_pages() for memory freed by the bootmem
>>> allocator and miss HIGHMEM pages. We could also call
>>> fixup_zone_present_pages() for HIGHMEM pages, but that would require
>>> changing arch-specific code for x86, powerpc, sparc, microblaze, arm,
>>> mips, um, tile, etc. That seems like extra overhead.
>>> And sadly, I found the quick fix is still incomplete. The original
>>> patch has another issue: reset_zone_present_pages() is only called
>>> for IA64, so it will cause trouble for other arches that make use of
>>> "bootmem.c".
>>> Then I felt a little guilty and tried to find a cleaner solution without
>>> touching arch-specific code. But things are more complex than I expected,
>>> and I'm still working on that.
>>> So how about totally reverting the changeset
>>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
>>> and I will post another version once I find a cleaner way?
>>
>> I think fixup_zone_present_pages() is very useful for memory hotplug.
>>
> I might be missing something, but it looks as if memory hotplug is the
> only user that depends on fixup_zone_present_pages().

IIRC, the watermarks depend on zone->present_pages. But I haven't hit any problem even when zone->present_pages is wrong.

> Why not revert the changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and add a check to offline_pages() like:
>
> 	if (zone->present_pages >= offlined_pages)
> 		zone->present_pages -= offlined_pages;
> 	else
> 		zone->present_pages = 0;
>
> It's simpler and can minimize the effect on other parts of the kernel.

Hmm, zone->present_pages may become 0 while there is still memory in this zone that is onlined and in use. If zone->present_pages becomes 0, we will free the pcp list for this zone, which will cause unexpected errors. We
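Wen's objection to the clamped subtraction can be seen in a small model: the clamp avoids unsigned wrap-around, but when present_pages was already undercounted it silently reports zero for a zone whose memory is still online and in use. The populated check below is modeled on the kernel's populated_zone(), which in kernels of this era simply tests present_pages for non-zero; the struct, its fields, and the numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone state for this model only. */
struct zone_model {
	unsigned long present_pages; /* may already be undercounted */
	unsigned long online_in_use; /* pages still online and allocated */
};

/* Bob Liu's suggested check in offline_pages(), modeled in isolation. */
static void offline_clamped(struct zone_model *z, unsigned long offlined_pages)
{
	assert(z != NULL);
	if (z->present_pages >= offlined_pages)
		z->present_pages -= offlined_pages;
	else
		z->present_pages = 0; /* clamp instead of wrapping around */
}

/* Modeled on populated_zone(): a zone counts as populated only while
 * present_pages is non-zero. */
static bool model_populated_zone(const struct zone_model *z)
{
	return z->present_pages != 0;
}
```

Without the clamp, 100 - 200 on an unsigned long would wrap to a huge value; with it, the zone reads as unpopulated even though 300 pages are still in use, which is the situation in which Wen expects the zone's pcp lists to be freed prematurely.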
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang wrote:
> Hi, Liu Jiang
>
> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
>> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>>> On Tue, 6 Nov 2012 09:31:57 +0800
>>> Jiang Liu wrote:
>>>
>>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>>> is called to recalculate zone->present_pages when the boot allocator
>>>> frees core memory pages into the buddy allocator. Because highmem pages
>>>> are not freed by the bootmem allocator, all highmem zones' present_pages
>>>> become zero.
>>>>
>>>> Actually there's no need to recalculate present_pages for highmem zones
>>>> because the bootmem allocator never allocates pages from them. So fix the
>>>> regression by skipping highmem in reset_zone_present_pages() and
>>>> fixup_zone_present_pages().
>>>>
>>>> ...
>>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>  			z = NODE_DATA(nid)->node_zones + i;
>>>> -			z->present_pages = 0;
>>>> +			if (!is_highmem(z))
>>>> +				z->present_pages = 0;
>>>>  		}
>>>>  	}
>>>>  }
>>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>>
>>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>  		z = NODE_DATA(nid)->node_zones + i;
>>>> +		if (is_highmem(z))
>>>> +			continue;
>>>> +
>>>>  		zone_start_pfn = z->zone_start_pfn;
>>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>>> -
>>>> -		/* if the two regions intersect */
>>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>>  					    max(start_pfn, zone_start_pfn);
>>>
>>> This ... isn't very nice. It embeds within
>>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>>> about their caller's state. Or, more specifically, it is embedding
>>> knowledge about the overall state of the system when these functions
>>> are called.
>>>
>>> I mean, a function called "reset_zone_present_pages" should reset
>>> ->present_pages!
>>>
>>> The fact that fixup_zone_present_pages() has multiple call sites makes
>>> this all even more risky. And what are the interactions between this
>>> and memory hotplug?
>>>
>>> Can we find a cleaner fix?
>>>
>>> Please tell us more about what's happening here. Is it the case that
>>> reset_zone_present_pages() is being called *after* highmem has been
>>> populated? If so, then fixup_zone_present_pages() should work
>>> correctly for highmem? Or is it the case that highmem hasn't yet been
>>> set up? IOW, what is the sequence of operations here?
>>>
>>> Is the problem that we're *missing* a call to
>>> fixup_zone_present_pages(), perhaps? If we call
>>> fixup_zone_present_pages() after highmem has been populated,
>>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>>> ->present_pages?
>> Hi Andrew,
>> Sorry for the late response :(
>> I have done more investigation following your suggestions. Currently
>> we only call fixup_zone_present_pages() for memory freed by the bootmem
>> allocator and miss HIGHMEM pages. We could also call
>> fixup_zone_present_pages() for HIGHMEM pages, but that would require
>> changing arch-specific code for x86, powerpc, sparc, microblaze, arm,
>> mips, um, tile, etc. That seems like extra overhead.
>> And sadly, I found the quick fix is still incomplete. The original
>> patch has another issue: reset_zone_present_pages() is only called
>> for IA64, so it will cause trouble for other arches that make use of
>> "bootmem.c".
>> Then I felt a little guilty and tried to find a cleaner solution without
>> touching arch-specific code. But things are more complex than I expected,
>> and I'm still working on that.
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
>> and I will post another version once I find a cleaner way?
>
> I think fixup_zone_present_pages() is very useful for memory hotplug.
>

I might be missing something, but it looks as if memory hotplug is the only user that depends on fixup_zone_present_pages(). Why not revert the changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and add a check to offline_pages() like:

	if (zone->present_pages >= offlined_pages)
		zone->present_pages -= offlined_pages;
	else
		zone->present_pages = 0;

It's simpler and can minimize the effect on other parts of
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:
> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>> On Tue, 6 Nov 2012 09:31:57 +0800
>> Jiang Liu wrote:
>>
>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>> is called to recalculate zone->present_pages when the boot allocator
>>> frees core memory pages into the buddy allocator. Because highmem pages
>>> are not freed by the bootmem allocator, all highmem zones' present_pages
>>> become zero.
>>>
>>> Actually there's no need to recalculate present_pages for highmem zones
>>> because the bootmem allocator never allocates pages from them. So fix the
>>> regression by skipping highmem in reset_zone_present_pages() and
>>> fixup_zone_present_pages().
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  			z = NODE_DATA(nid)->node_zones + i;
>>> -			z->present_pages = 0;
>>> +			if (!is_highmem(z))
>>> +				z->present_pages = 0;
>>>  		}
>>>  	}
>>>  }
>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>
>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  		z = NODE_DATA(nid)->node_zones + i;
>>> +		if (is_highmem(z))
>>> +			continue;
>>> +
>>>  		zone_start_pfn = z->zone_start_pfn;
>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>> -
>>> -		/* if the two regions intersect */
>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>  					    max(start_pfn, zone_start_pfn);
>>
>> This ... isn't very nice. It embeds within
>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>> about their caller's state. Or, more specifically, it is embedding
>> knowledge about the overall state of the system when these functions
>> are called.
>>
>> I mean, a function called "reset_zone_present_pages" should reset
>> ->present_pages!
>>
>> The fact that fixup_zone_present_pages() has multiple call sites makes
>> this all even more risky. And what are the interactions between this
>> and memory hotplug?
>>
>> Can we find a cleaner fix?
>>
>> Please tell us more about what's happening here. Is it the case that
>> reset_zone_present_pages() is being called *after* highmem has been
>> populated? If so, then fixup_zone_present_pages() should work
>> correctly for highmem? Or is it the case that highmem hasn't yet been
>> set up? IOW, what is the sequence of operations here?
>>
>> Is the problem that we're *missing* a call to
>> fixup_zone_present_pages(), perhaps? If we call
>> fixup_zone_present_pages() after highmem has been populated,
>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>> ->present_pages?
> Hi Andrew,
> Sorry for the late response :(
> I have done more investigation following your suggestions. Currently
> we only call fixup_zone_present_pages() for memory freed by the bootmem
> allocator and miss HIGHMEM pages. We could also call
> fixup_zone_present_pages() for HIGHMEM pages, but that would require
> changing arch-specific code for x86, powerpc, sparc, microblaze, arm,
> mips, um, tile, etc. That seems like extra overhead.
> And sadly, I found the quick fix is still incomplete. The original
> patch has another issue: reset_zone_present_pages() is only called
> for IA64, so it will cause trouble for other arches that make use of
> "bootmem.c".
> Then I felt a little guilty and tried to find a cleaner solution without
> touching arch-specific code. But things are more complex than I expected,
> and I'm still working on that.
> So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and I will post another version once I find a cleaner way?

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but its value is wrong. That is why we fix it in fixup_zone_present_pages().

What about this:
1. init zone->present_pages to the present pages in this zone (including bootmem)
2. don't reset zone->present_pages for HIGHMEM pages

We don't allocate bootmem from HIGHMEM. So its present pages are initialized in step 1 and there is no need to fix them in step 2.

Is it OK? If it is OK, I will resend the patch for step 1 (the patch is from laijs).

Thanks
Wen Congyang

> Thanks!
> Gerry
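Step 1 of Wen's proposal amounts to counting, at init time, every physically present page in the zone's PFN span, whether or not bootmem has reserved it, so holes are excluded and zones bootmem never touches (highmem) need no later fixup. A user-space sketch of that count follows, assuming installed memory is described as a list of half-open PFN ranges; the range type, helper names, and numbers are hypothetical and are not the kernel's memblock API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open PFN range [start, end) of installed memory. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Number of PFNs shared by [a0, a1) and [b0, b1); 0 if disjoint. */
static unsigned long overlap(unsigned long a0, unsigned long a1,
			     unsigned long b0, unsigned long b1)
{
	unsigned long lo = a0 > b0 ? a0 : b0;
	unsigned long hi = a1 < b1 ? a1 : b1;

	return hi > lo ? hi - lo : 0;
}

/* Step 1, modeled: present pages in [zone_start, zone_end) are the pages
 * covered by installed memory, bootmem-reserved or not; holes are skipped. */
static unsigned long init_present_pages(unsigned long zone_start,
					unsigned long zone_end,
					const struct pfn_range *mem, size_t nr)
{
	unsigned long present = 0;

	assert(zone_start <= zone_end);
	for (size_t i = 0; i < nr; i++)
		present += overlap(zone_start, zone_end, mem[i].start, mem[i].end);
	return present;
}
```

For a zone spanning [4096, 229376) with installed memory [0, 160), [256, 200000) and [210000, 600000), this counts 195904 + 19376 = 215280 present pages; the 10000-page hole is excluded, and step 2 would then leave highmem counts computed this way untouched.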
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:
> On 11/07/2012 04:43 AM, Andrew Morton wrote:
...
> So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another
> version once I found a cleaner way?

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but its value
is wrong. That is why we fix it in fixup_zone_present_pages().

What about this:
1. init zone->present_pages to the present pages in this zone (including bootmem)
2. don't reset zone->present_pages for HIGHMEM pages

We don't allocate bootmem from HIGHMEM. So its present pages are
initialized in step 1 and there is no need to fix them in step 2.

Is it OK? If it is OK, I will resend the patch for step 1 (the patch is
from laijs).

Thanks
Wen Congyang
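Step 1 of this proposal amounts to counting every memory-backed PFN inside the zone span, whether or not bootmem has reserved it. Below is a minimal userspace sketch of that count. The names struct pfn_range, pfn_overlap() and zone_present_from_ranges() are hypothetical. The input is assumed to be a list of non-overlapping, half-open PFN ranges, not any real kernel data structure. Step 2 would then only need an is_highmem() check in the reset path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a half-open range [start, end) of physical page frames. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Number of PFNs shared by [a_start, a_end) and [b_start, b_end). */
static unsigned long pfn_overlap(unsigned long a_start, unsigned long a_end,
				 unsigned long b_start, unsigned long b_end)
{
	unsigned long lo = a_start > b_start ? a_start : b_start;
	unsigned long hi = a_end < b_end ? a_end : b_end;

	return hi > lo ? hi - lo : 0;
}

/*
 * Step 1 sketch: count every PFN in the zone span that is backed by
 * memory, including pages bootmem has reserved. Holes are skipped.
 * Assumes the ranges do not overlap one another.
 */
static unsigned long zone_present_from_ranges(unsigned long zone_start_pfn,
					      unsigned long spanned_pages,
					      const struct pfn_range *ranges,
					      size_t nr_ranges)
{
	unsigned long zone_end_pfn = zone_start_pfn + spanned_pages;
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < nr_ranges; i++)
		present += pfn_overlap(zone_start_pfn, zone_end_pfn,
				       ranges[i].start, ranges[i].end);

	/* Non-overlapping ranges can never exceed the span. */
	assert(present <= spanned_pages);
	return present;
}
```

For example, with memory at PFNs [0,100), [150,300) and [400,500), a zone spanning [0,256) counts 100 + 106 present pages. The gap [100,150) is a hole and is not counted.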
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang <we...@cn.fujitsu.com> wrote:
> Hi, Liu Jiang
>
> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
...
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another
>> version once I found a cleaner way?
>
> I think fixup_zone_present_pages() is very useful for memory hotplug.
>

I might be missing something, but if memory hotplug is the only user that
depends on fixup_zone_present_pages(), why not revert changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and add a check to
offline_pages() like:

	if (zone->present_pages >= offlined_pages)
		zone->present_pages -= offlined_pages;
	else
		zone->present_pages = 0;

It's simpler and minimizes the effect on other parts of the kernel.

> We calculate zone->present_pages in free_area_init_core(), but its
> value is wrong. That is why we fix it in fixup_zone_present_pages().
>
> What about this:
> 1. init zone->present_pages to the present pages in this zone (including bootmem)
> 2. don't reset zone->present_pages for HIGHMEM pages
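The guard Bob proposes is a saturating subtraction on an unsigned counter. The standalone sketch below shows it in isolation. struct zone_model and zone_sub_present() are hypothetical stand-ins for struct zone and for the code that would sit inside offline_pages().

```c
#include <assert.h>

/* Hypothetical stand-in for the one struct zone field discussed here. */
struct zone_model {
	unsigned long present_pages;
};

/*
 * Sketch of the proposed guard: subtract the offlined pages, but clamp
 * at zero instead of letting the unsigned counter wrap around.
 */
static void zone_sub_present(struct zone_model *zone,
			     unsigned long offlined_pages)
{
	if (zone->present_pages >= offlined_pages)
		zone->present_pages -= offlined_pages;
	else
		zone->present_pages = 0;
}
```

Consider a zone at 744 present pages that then offlines 1000. Without the else branch, the unsigned long would wrap around to ULONG_MAX - 255 instead of stopping at zero.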
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
At 2012/11/15 19:28, Bob Liu Wrote:
> On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang <we...@cn.fujitsu.com> wrote:
>> Hi, Liu Jiang
>>
>> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
...
>> I think fixup_zone_present_pages() is very useful for memory hotplug.
>
> I might be missing something, but if memory hotplug is the only user that
> depends on fixup_zone_present_pages(),

IIRC, the watermarks depend on zone->present_pages. But I haven't hit any
problem even when zone->present_pages is wrong.

> why not revert changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and add
> a check to offline_pages() like:
>
>	if (zone->present_pages >= offlined_pages)
>		zone->present_pages -= offlined_pages;
>	else
>		zone->present_pages = 0;
>
> It's simpler and minimizes the effect on other parts of the kernel.

Hmm, zone->present_pages may become 0 while there is still memory in this
zone that is onlined and in use. If zone->present_pages becomes 0, we will
free the pcp lists for this zone, which will cause unexpected errors.
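Wen's objection can be reproduced with a toy model. Suppose present_pages has already drifted below the number of pages that are really online. The clamp then drives it to zero while memory is still in use. Any check that treats a zero count as an empty zone will then give the wrong answer. As far as we know, the kernel's populated_zone() tested present_pages in this era. The struct, its online_pages field and the helper names below are hypothetical. They exist only to make the mismatch visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the accounted counter next to the real state. */
struct zone_model {
	unsigned long present_pages;	/* accounted count, may have drifted low */
	unsigned long online_pages;	/* pages actually online and in use */
};

/* In the spirit of populated_zone(): zero present pages means "empty". */
static bool zone_populated(const struct zone_model *z)
{
	return z->present_pages != 0;
}

/* Offline nr pages using the clamped subtraction proposed above. */
static void offline_range(struct zone_model *z, unsigned long nr)
{
	assert(nr <= z->online_pages);
	z->online_pages -= nr;
	if (z->present_pages >= nr)
		z->present_pages -= nr;
	else
		z->present_pages = 0;
}

/* True when the counter reports an empty zone but memory is still online. */
static bool zone_accounting_broken(const struct zone_model *z)
{
	return !zone_populated(z) && z->online_pages > 0;
}
```

Take a zone whose counter reads 300 while 1024 pages are online. Offlining 512 pages clamps the counter to 0 while 512 pages remain in use. That is the state in which per-cpu lists could be torn down for a zone that still has live memory.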
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/15/2012 05:22 PM, Wen Congyang wrote:
> Hi, Liu Jiang
>
> At 11/14/2012 10:52 PM, Jiang Liu Wrote:
...
> I think fixup_zone_present_pages() is very useful for memory hotplug.
>
> We calculate zone->present_pages in free_area_init_core(), but its
> value is wrong. That is why we fix it in fixup_zone_present_pages().
>
> What about this:
> 1. init zone->present_pages to the present pages in this zone (including bootmem)
> 2. don't reset zone->present_pages for HIGHMEM pages
>
> We don't allocate bootmem from HIGHMEM. So its present pages are
> initialized in step 1 and there is no need to fix them in step 2.

Hi Congyang,
I feel that zone->present_pages has been abused. I guess it means
"physical pages present in this zone" originally, but now sometimes
zone->present_pages is used as "pages in this zone managed by the buddy
system". So I'm trying to add a new field "managed_pages" into zone, which
accounts for pages managed by buddy system. That's why I thought the clean
solution is a little complex:(
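The split Jiang describes can be sketched as a struct that keeps the physical count fixed and tracks buddy-managed pages separately. The model below is a hypothetical userspace illustration, not the field layout of any posted patch. It also sums per-zone present_pages into a node total. This follows David Rientjes' reply that the pgdat's node_present_pages must stay consistent with its zones. node_present_total() is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a zone with the proposed extra counter. */
struct zone_model {
	unsigned long spanned_pages;	/* PFNs covered, holes included */
	unsigned long present_pages;	/* physical pages present, fixed at init */
	unsigned long managed_pages;	/* pages handed over to the buddy allocator */
};

/*
 * Releasing pages to the buddy allocator (from bootmem, or highmem pages
 * freed by arch code) grows managed_pages only; present_pages is untouched.
 */
static void zone_free_to_buddy(struct zone_model *z, unsigned long nr)
{
	z->managed_pages += nr;
	assert(z->managed_pages <= z->present_pages);
}

/* Node-level total derived from the zones, so the two cannot disagree. */
static unsigned long node_present_total(const struct zone_model *zones,
					size_t nr_zones)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < nr_zones; i++)
		sum += zones[i].present_pages;
	return sum;
}
```

Under this split, code that cares about allocatable memory, such as watermark setup, would presumably read managed_pages. Code that wants the physical size would keep reading present_pages.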
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Wed, 14 Nov 2012 22:52:03 +0800
Jiang Liu <liu...@gmail.com> wrote:

> So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another
> version once I found a cleaner way?

We do need to get this regression fixed and I guess that a
straightforward revert is an acceptable way of doing that, for now.

I queued the below, with a plan to send it to Linus next week.


From: Andrew Morton <a...@linux-foundation.org>
Subject: revert "mm: fix-up zone present pages"

Revert

  commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
  Author:     Jianguo Wu <wujian...@huawei.com>
  AuthorDate: Mon Oct 8 16:33:06 2012 -0700
  Commit:     Linus Torvalds <torva...@linux-foundation.org>
  CommitDate: Tue Oct 9 16:22:54 2012 +0900

      mm: fix-up zone present pages

That patch tried to fix a issue when calculating zone->present_pages, but
it caused a regression on 32bit systems with HIGHMEM.  With that
changeset, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator.  Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages becomes zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu <wujian...@huawei.com>
Cc: Jiang Liu <jiang@huawei.com>
Cc: Petr Tesarik <ptesa...@suse.cz>
Cc: "Luck, Tony" <tony.l...@intel.com>
Cc: Mel Gorman <m...@csn.ul.ie>
Cc: Yinghai Lu <ying...@kernel.org>
Cc: Minchan Kim <minchan@gmail.com>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: David Rientjes <rient...@google.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 arch/ia64/mm/init.c |    1 -
 include/linux/mm.h  |    4
 mm/bootmem.c        |   10 +-
 mm/memory_hotplug.c |    7 ---
 mm/nobootmem.c      |    3 ---
 mm/page_alloc.c     |   34 --
 6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)
 
 	high_memory = __va(max_low_pfn * PAGE_SIZE);
 
-	reset_zone_present_pages();
 	for_each_online_pgdat(pgdat)
 		if (pgdat->bdata->node_bootmem_map)
 			totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-				unsigned long end_pfn);
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
 			int order = ilog2(BITS_PER_LONG);
 
 			__free_pages_bootmem(pfn_to_page(start), order);
-			fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-					start, start + BITS_PER_LONG);
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
 				if (vec & 1) {
 					page = pfn_to_page(start + off);
 					__free_pages_bootmem(page, 0);
-					fixup_zone_present_pages(
-						page_to_nid(page),
-						start + off, start + off + 1);
 					count++;
 				}
 				vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
 	pages = bdata->node_low_pfn - bdata->node_min_pfn;
 	pages = bootmem_bootmap_pages(pages);
 	count += pages;
-	while (pages--) {
-		fixup_zone_present_pages(page_to_nid(page),
-				page_to_pfn(page), page_to_pfn(page) + 1);
+	while (pages--)
 		__free_pages_bootmem(page++, 0);
-	}
 
 	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
 
diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~revert-1
+++ a/mm/memory_hotplug.c
@@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
 void __ref put_page_bootmem(struct page *page)
 {
 	unsigned long type;
-	struct zone *zone;
 
 	type = (unsigned long) page->lru.next;
 	BUG_ON(type
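The bootmem hunks above delete fixup_zone_present_pages() calls from the loop in free_all_bootmem_core(). That loop releases free pages either a whole bitmap word at a time or bit by bit (the vec & 1 and vec >>= 1 lines). Below is a simplified userspace model of that walk. It assumes the range starts on a BITS_PER_LONG boundary and omits the handling of unaligned starts. struct free_stats and walk_bootmem_map() are hypothetical names, and the counters stand in for __free_pages_bootmem() calls. The two branches correspond to the first two bootmem hunks, where the reverted patch had added fixup_zone_present_pages() calls.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Model-only definition; in the kernel BITS_PER_LONG is a fixed constant. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical counters standing in for __free_pages_bootmem() calls. */
struct free_stats {
	unsigned long pages;	/* total pages released */
	unsigned long calls;	/* number of free operations */
};

/*
 * Simplified model of the bootmem bitmap walk: a set bit in map means
 * "reserved", so vec = ~word has a 1 for every free page.
 * Assumes the range starts on a BITS_PER_LONG boundary.
 */
static void walk_bootmem_map(const unsigned long *map, size_t nr_words,
			     struct free_stats *st)
{
	size_t w;

	for (w = 0; w < nr_words; w++) {
		unsigned long vec = ~map[w];

		if (vec == ~0UL) {
			/* Whole word free: one higher-order free. */
			st->pages += BITS_PER_LONG;
			st->calls++;
			continue;
		}
		/* Otherwise release the free pages one at a time. */
		while (vec) {
			if (vec & 1) {
				st->pages++;
				st->calls++;
			}
			vec >>= 1;
		}
	}
}
```

Consider a word with no reserved bits. It is released in a single operation, while a partly reserved word costs one operation per free page. So any per-free hook, like the reverted fixup calls, runs at very different granularities depending on the bitmap contents.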
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/15/12 19:24, Andrew Morton wrote:
> On Wed, 14 Nov 2012 22:52:03 +0800
> Jiang Liu <liu...@gmail.com> wrote:
>
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another
>> version once I found a cleaner way?
>
> We do need to get this regression fixed and I guess that a
> straightforward revert is an acceptable way of doing that, for now.
>
> I queued the below, with a plan to send it to Linus next week.
>

I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it
fixes the problem I had with my laptop failing to resume (by either
freezing or rebooting) after a suspend to disk.

Tested-by: Chris Clayton <chris2...@googlemail.com>

> From: Andrew Morton <a...@linux-foundation.org>
> Subject: revert "mm: fix-up zone present pages"
...
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/07/2012 04:43 AM, Andrew Morton wrote:
> On Tue, 6 Nov 2012 09:31:57 +0800
> Jiang Liu wrote:
>
>> Changeset 7f1290f2f2 tries to fix a issue when calculating
>> zone->present_pages, but it causes a regression to 32bit systems with
>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>> is called to recalculate zone->present_pages when boot allocator frees
>> core memory pages into buddy allocator. Because highmem pages are not
>> freed by bootmem allocator, all highmem zones' present_pages becomes
>> zero.
>>
>> Actually there's no need to recalculate present_pages for highmem zone
>> because bootmem allocator never allocates pages from them. So fix the
>> regression by skipping highmem in function reset_zone_present_pages()
>> and fixup_zone_present_pages().
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>  			z = NODE_DATA(nid)->node_zones + i;
>> -			z->present_pages = 0;
>> +			if (!is_highmem(z))
>> +				z->present_pages = 0;
>>  		}
>>  	}
>>  }
>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>  
>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>  		z = NODE_DATA(nid)->node_zones + i;
>> +		if (is_highmem(z))
>> +			continue;
>> +
>>  		zone_start_pfn = z->zone_start_pfn;
>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>> -
>> -		/* if the two regions intersect */
>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>  					    max(start_pfn, zone_start_pfn);
>
> This ... isn't very nice.  It embeds within
> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
> about their caller's state.  Or, more specifically, it is embedding
> knowledge about the overall state of the system when these functions
> are called.
>
> I mean, a function called "reset_zone_present_pages" should reset
> ->present_pages!
>
> The fact that fixup_zone_present_pages() has multiple call sites makes
> this all even more risky.  And what are the interactions between this
> and memory hotplug?
>
> Can we find a cleaner fix?
>
> Please tell us more about what's happening here.  Is it the case that
> reset_zone_present_pages() is being called *after* highmem has been
> populated?  If so, then fixup_zone_present_pages() should work
> correctly for highmem?  Or is it the case that highmem hasn't yet been
> setup?  IOW, what is the sequence of operations here?
>
> Is the problem that we're *missing* a call to
> fixup_zone_present_pages(), perhaps?  If we call
> fixup_zone_present_pages() after highmem has been populated,
> fixup_zone_present_pages() should correctly fill in the highmem zone's
> ->present_pages?

Hi Andrew,
Sorry for the late response:(

I have done more investigation following your suggestions. Currently we
only call fixup_zone_present_pages() for memory freed by the bootmem
allocator and miss HIGHMEM pages. We could also call
fixup_zone_present_pages() for HIGHMEM pages, but that would require
changes to arch specific code for x86, powerpc, sparc, microblaze, arm,
mips, um, tile, etc. That seems like extra overhead.

And sadly enough, I found the quick fix is still incomplete. The original
patch has another issue: reset_zone_present_pages() is only called for
IA64, so it will cause trouble for other arches which make use of
"bootmem.c".

Then I felt a little guilty and tried to find a cleaner solution without
touching arch specific code. But things are more complex than I expected
and I'm still working on that.

So how about totally reverting the changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another
version once I found a cleaner way?

Thanks!
Gerry
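The mechanism Jiang describes, and the effect of his quick fix, can be reproduced with a small userspace model. The names struct zone_model, reset_present(), fixup_present() and the skip_highmem flag are hypothetical. Only the intersection test and the min/max arithmetic mirror the quoted fixup_zone_present_pages() hunk. The model assumes, as the thread states, that bootmem only ever frees lowmem ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the fields the patch touches. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
	bool highmem;
};

/* Model of reset_zone_present_pages(), optionally sparing highmem. */
static void reset_present(struct zone_model *z, size_t n, bool skip_highmem)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!(skip_highmem && z[i].highmem))
			z[i].present_pages = 0;
}

/* Add the part of [start_pfn, end_pfn) that falls inside each zone. */
static void fixup_present(struct zone_model *z, size_t n, bool skip_highmem,
			  unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long zs = z[i].zone_start_pfn;
		unsigned long ze = zs + z[i].spanned_pages;

		if (skip_highmem && z[i].highmem)
			continue;
		/* if the two regions intersect */
		if (!(zs >= end_pfn || ze <= start_pfn))
			z[i].present_pages += (end_pfn < ze ? end_pfn : ze) -
					      (start_pfn > zs ? start_pfn : zs);
	}
}
```

With skip_highmem false, a highmem zone that bootmem never frees into ends at zero present pages, which is the regression. With it true, the count survives only because the caller happens to free lowmem exclusively. That dependence on the caller is what Andrew objects to.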
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/07/2012 04:43 AM, Andrew Morton wrote: > On Tue, 6 Nov 2012 09:31:57 +0800 > Jiang Liu <jiang@huawei.com> wrote: >> Changeset 7f1290f2f2 tries to fix an issue when calculating >> zone->present_pages, but it causes a regression on 32-bit systems with >> HIGHMEM. With that changeset, reset_zone_present_pages() resets all >> zone->present_pages to zero, and fixup_zone_present_pages() is called >> to recalculate zone->present_pages when the boot allocator frees core >> memory pages into the buddy allocator. Because highmem pages are not >> freed by the bootmem allocator, all highmem zones' present_pages become >> zero. >> >> Actually there's no need to recalculate present_pages for highmem zones >> because the bootmem allocator never allocates pages from them. So fix >> the regression by skipping highmem in reset_zone_present_pages() and >> fixup_zone_present_pages(). >> >> ... >> >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void) >> for_each_node_state(nid, N_HIGH_MEMORY) { >> for (i = 0; i < MAX_NR_ZONES; i++) { >> z = NODE_DATA(nid)->node_zones + i; >> - z->present_pages = 0; >> + if (!is_highmem(z)) >> + z->present_pages = 0; >> } >> } >> } >> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn, >> >> for (i = 0; i < MAX_NR_ZONES; i++) { >> z = NODE_DATA(nid)->node_zones + i; >> + if (is_highmem(z)) >> + continue; >> + >> zone_start_pfn = z->zone_start_pfn; >> zone_end_pfn = zone_start_pfn + z->spanned_pages; >> - >> - /* if the two regions intersect */ >> if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn)) >> z->present_pages += min(end_pfn, zone_end_pfn) - >> max(start_pfn, zone_start_pfn); > > This ... isn't very nice. It embeds within reset_zone_present_pages() and > fixup_zone_present_pages() knowledge about their callers' state. Or, more > specifically, it embeds knowledge about the overall state of the system > when these functions are called. > > I mean, a function called "reset_zone_present_pages" should reset > ->present_pages!
> > The fact that fixup_zone_present_pages() has multiple call sites makes > this all even more risky. And what are the interactions between this > and memory hotplug? > > Can we find a cleaner fix? > > Please tell us more about what's happening here. Is it the case that > reset_zone_present_pages() is being called *after* highmem has been > populated? If so, then fixup_zone_present_pages() should work correctly > for highmem? Or is it the case that highmem hasn't yet been set up? > IOW, what is the sequence of operations here? > > Is the problem that we're *missing* a call to > fixup_zone_present_pages(), perhaps? If we call > fixup_zone_present_pages() after highmem has been populated, > fixup_zone_present_pages() should correctly fill in the highmem zone's > ->present_pages? Hi Andrew, Sorry for the late response :( I have investigated further following your suggestions. Currently fixup_zone_present_pages() is called only for memory freed by the bootmem allocator, so HIGHMEM pages are missed. We could also call fixup_zone_present_pages() for HIGHMEM pages, but that would require changing arch-specific code for x86, powerpc, sparc, microblaze, arm, mips, um, tile, etc., which seems like too much overhead. Sadly, I also found that the quick fix is still incomplete. The original patch has another issue: reset_zone_present_pages() is only called on IA64, so it will cause trouble for other arches that use bootmem.c. I felt a little guilty, so I tried to find a cleaner solution that doesn't touch arch-specific code, but things are more complex than I expected and I'm still working on it. So how about reverting changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 entirely? I will post another version once I find a cleaner way. Thanks!
Gerry
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On Tue, 6 Nov 2012 09:31:57 +0800 Jiang Liu wrote: > Changeset 7f1290f2f2 tries to fix an issue when calculating > zone->present_pages, but it causes a regression on 32-bit systems with > HIGHMEM. With that changeset, reset_zone_present_pages() resets all > zone->present_pages to zero, and fixup_zone_present_pages() is called > to recalculate zone->present_pages when the boot allocator frees core > memory pages into the buddy allocator. Because highmem pages are not > freed by the bootmem allocator, all highmem zones' present_pages become > zero. > > Actually there's no need to recalculate present_pages for highmem zones > because the bootmem allocator never allocates pages from them. So fix > the regression by skipping highmem in reset_zone_present_pages() and > fixup_zone_present_pages(). > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void) > for_each_node_state(nid, N_HIGH_MEMORY) { > for (i = 0; i < MAX_NR_ZONES; i++) { > z = NODE_DATA(nid)->node_zones + i; > - z->present_pages = 0; > + if (!is_highmem(z)) > + z->present_pages = 0; > } > } > } > @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long > start_pfn, > > for (i = 0; i < MAX_NR_ZONES; i++) { > z = NODE_DATA(nid)->node_zones + i; > + if (is_highmem(z)) > + continue; > + > zone_start_pfn = z->zone_start_pfn; > zone_end_pfn = zone_start_pfn + z->spanned_pages; > - > - /* if the two regions intersect */ > if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn)) > z->present_pages += min(end_pfn, zone_end_pfn) - > max(start_pfn, zone_start_pfn); This ... isn't very nice. It embeds within reset_zone_present_pages() and fixup_zone_present_pages() knowledge about their callers' state. Or, more specifically, it embeds knowledge about the overall state of the system when these functions are called. I mean, a function called "reset_zone_present_pages" should reset ->present_pages!
The fact that fixup_zone_present_pages() has multiple call sites makes this all even more risky. And what are the interactions between this and memory hotplug? Can we find a cleaner fix? Please tell us more about what's happening here. Is it the case that reset_zone_present_pages() is being called *after* highmem has been populated? If so, then fixup_zone_present_pages() should work correctly for highmem? Or is it the case that highmem hasn't yet been set up? IOW, what is the sequence of operations here? Is the problem that we're *missing* a call to fixup_zone_present_pages(), perhaps? If we call fixup_zone_present_pages() after highmem has been populated, fixup_zone_present_pages() should correctly fill in the highmem zone's ->present_pages?
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/06/12 01:31, Jiang Liu wrote: > Changeset 7f1290f2f2 tries to fix an issue when calculating > zone->present_pages, but it causes a regression on 32-bit systems with > HIGHMEM. With that changeset, reset_zone_present_pages() resets all > zone->present_pages to zero, and fixup_zone_present_pages() is called > to recalculate zone->present_pages when the boot allocator frees core > memory pages into the buddy allocator. Because highmem pages are not > freed by the bootmem allocator, all highmem zones' present_pages become > zero. > > Actually there's no need to recalculate present_pages for highmem zones > because the bootmem allocator never allocates pages from them. So fix > the regression by skipping highmem in reset_zone_present_pages() and > fixup_zone_present_pages(). > > Signed-off-by: Jiang Liu > Signed-off-by: Jianguo Wu > Reported-by: Maciej Rutecki > Tested-by: Maciej Rutecki > Cc: Chris Clayton > Cc: Rafael J. Wysocki > Cc: Andrew Morton > Cc: Mel Gorman > Cc: Minchan Kim > Cc: KAMEZAWA Hiroyuki > Cc: Michal Hocko > Cc: linux...@kvack.org > Cc: linux-kernel@vger.kernel.org > --- > Hi Maciej, > Thanks for reporting and bisecting. We have analyzed the regression > and worked out a patch for it. Could you please help verify whether it > fixes the regression? > Thanks! > Gerry Thanks Gerry. I've applied this patch to 3.7.0-rc4 and can confirm that it fixes the problem I had with my laptop failing to resume after a suspend to disk.
Tested-by: Chris Clayton > --- > mm/page_alloc.c | 8 +++++--- > 1 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5b74de6..2311f15 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void) > for_each_node_state(nid, N_HIGH_MEMORY) { > for (i = 0; i < MAX_NR_ZONES; i++) { > z = NODE_DATA(nid)->node_zones + i; > - z->present_pages = 0; > + if (!is_highmem(z)) > + z->present_pages = 0; > } > } > } > @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn, > > for (i = 0; i < MAX_NR_ZONES; i++) { > z = NODE_DATA(nid)->node_zones + i; > + if (is_highmem(z)) > + continue; > + > zone_start_pfn = z->zone_start_pfn; > zone_end_pfn = zone_start_pfn + z->spanned_pages; > - > - /* if the two regions intersect */ > if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn)) > z->present_pages += min(end_pfn, zone_end_pfn) - > max(start_pfn, zone_start_pfn);
[PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
Changeset 7f1290f2f2 tries to fix an issue when calculating zone->present_pages, but it causes a regression on 32-bit systems with HIGHMEM. With that changeset, reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when the boot allocator frees core memory pages into the buddy allocator. Because highmem pages are not freed by the bootmem allocator, all highmem zones' present_pages become zero.

Actually there's no need to recalculate present_pages for highmem zones because the bootmem allocator never allocates pages from them. So fix the regression by skipping highmem in reset_zone_present_pages() and fixup_zone_present_pages().

Signed-off-by: Jiang Liu
Signed-off-by: Jianguo Wu
Reported-by: Maciej Rutecki
Tested-by: Maciej Rutecki
Cc: Chris Clayton
Cc: Rafael J. Wysocki
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Minchan Kim
Cc: KAMEZAWA Hiroyuki
Cc: Michal Hocko
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
Hi Maciej,
Thanks for reporting and bisecting. We have analyzed the regression and worked out a patch for it. Could you please help verify whether it fixes the regression?
Thanks!
Gerry

---
 mm/page_alloc.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..2311f15 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
 	for_each_node_state(nid, N_HIGH_MEMORY) {
 		for (i = 0; i < MAX_NR_ZONES; i++) {
 			z = NODE_DATA(nid)->node_zones + i;
-			z->present_pages = 0;
+			if (!is_highmem(z))
+				z->present_pages = 0;
 		}
 	}
 }
@@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		z = NODE_DATA(nid)->node_zones + i;
+		if (is_highmem(z))
+			continue;
+
 		zone_start_pfn = z->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
-		/* if the two regions intersect */
 		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
 			z->present_pages += min(end_pfn, zone_end_pfn) -
 					    max(start_pfn, zone_start_pfn);
-- 
1.7.1