Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread David Rientjes
On Thu, 15 Nov 2012, Jiang Liu wrote:

> I feel that zone->present_pages has been abused. I guess it originally meant
> "physical pages present in this zone", but now zone->present_pages is
> sometimes used as "pages in this zone managed by the buddy system".

Its definition is all pages spanned by the zone, excluding those that are
reserved and therefore unavailable for the kernel to allocate from, and the
implementation of bootmem requires that its memory be considered "reserved"
until it is freed. It's used throughout the kernel to determine the amount
of memory in that zone that is allocatable from the page allocator, since
its reclaim heuristics and watermarks depend on this memory being
allocatable.

> So I'm trying to add a new field "managed_pages" to the zone, which
> accounts for pages managed by the buddy system. That's why I thought the
> clean solution is a little complex:(
> 

You need to update the pgdat's node_present_pages to be consistent with 
all of its zones' present_pages.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread David Rientjes
On Thu, 15 Nov 2012, Andrew Morton wrote:

> From: Andrew Morton 
> Subject: revert "mm: fix-up zone present pages"
> 
> Revert
> 
> commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> Author: Jianguo Wu 
> AuthorDate: Mon Oct 8 16:33:06 2012 -0700
> Commit: Linus Torvalds 
> CommitDate: Tue Oct 9 16:22:54 2012 +0900
> 
> mm: fix-up zone present pages
> 
> 
> That patch tried to fix an issue when calculating zone->present_pages, but
> it caused a regression on 32-bit systems with HIGHMEM.  With that
> changeset, reset_zone_present_pages() resets all zone->present_pages to
> zero, and fixup_zone_present_pages() is called to recalculate
> zone->present_pages when the boot allocator frees core memory pages into
> the buddy allocator.  Because highmem pages are not freed by the bootmem
> allocator, all highmem zones' present_pages become zero.
> 
> Various options for improving the situation are being discussed but for
> now, let's return to the 3.6 code.
> 
> Cc: Jianguo Wu 
> Cc: Jiang Liu 
> Cc: Petr Tesarik 
> Cc: "Luck, Tony" 
> Cc: Mel Gorman 
> Cc: Yinghai Lu 
> Cc: Minchan Kim 
> Cc: Johannes Weiner 
> Cc: David Rientjes 
> Signed-off-by: Andrew Morton 

Acked-by: David Rientjes 


Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Chris Clayton



On 11/15/12 19:24, Andrew Morton wrote:
> On Wed, 14 Nov 2012 22:52:03 +0800
> Jiang Liu  wrote:
>
>> So how about totally reverting the changeset
>> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
>> and I will post another version once I find a cleaner way?
>
> We do need to get this regression fixed and I guess that a
> straightforward revert is an acceptable way of doing that, for now.
>
> I queued the below, with a plan to send it to Linus next week.



I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it 
fixes the problem I had with my laptop failing to resume (by either 
freezing or rebooting) after a suspend to disk.


Tested-by: Chris Clayton 




Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Andrew Morton
On Wed, 14 Nov 2012 22:52:03 +0800
Jiang Liu  wrote:

>   So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and I will post another version once I find a cleaner way?

We do need to get this regression fixed and I guess that a
straightforward revert is an acceptable way of doing that, for now.


I queued the below, with a plan to send it to Linus next week.


From: Andrew Morton 
Subject: revert "mm: fix-up zone present pages"

Revert

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu 
AuthorDate: Mon Oct 8 16:33:06 2012 -0700
Commit: Linus Torvalds 
CommitDate: Tue Oct 9 16:22:54 2012 +0900

mm: fix-up zone present pages


That patch tried to fix an issue when calculating zone->present_pages, but
it caused a regression on 32-bit systems with HIGHMEM.  With that
changeset, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
the buddy allocator.  Because highmem pages are not freed by the bootmem
allocator, all highmem zones' present_pages become zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu 
Cc: Jiang Liu 
Cc: Petr Tesarik 
Cc: "Luck, Tony" 
Cc: Mel Gorman 
Cc: Yinghai Lu 
Cc: Minchan Kim 
Cc: Johannes Weiner 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
---

 arch/ia64/mm/init.c |    1 -
 include/linux/mm.h  |    4 ----
 mm/bootmem.c        |   10 +---------
 mm/memory_hotplug.c |    7 -------
 mm/nobootmem.c      |    3 ---
 mm/page_alloc.c     |   34 ----------------------------------
 6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)
 
 	high_memory = __va(max_low_pfn * PAGE_SIZE);
 
-	reset_zone_present_pages();
 	for_each_online_pgdat(pgdat)
 		if (pgdat->bdata->node_bootmem_map)
 			totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-				unsigned long end_pfn);
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
 			int order = ilog2(BITS_PER_LONG);
 
 			__free_pages_bootmem(pfn_to_page(start), order);
-			fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-					start, start + BITS_PER_LONG);
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
 				if (vec & 1) {
 					page = pfn_to_page(start + off);
 					__free_pages_bootmem(page, 0);
-					fixup_zone_present_pages(
-						page_to_nid(page),
-						start + off, start + off + 1);
 					count++;
 				}
 				vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
 	pages = bdata->node_low_pfn - bdata->node_min_pfn;
 	pages = bootmem_bootmap_pages(pages);
 	count += pages;
-	while (pages--) {
-		fixup_zone_present_pages(page_to_nid(page),
-			page_to_pfn(page), page_to_pfn(page) + 1);
+	while (pages--)
 		__free_pages_bootmem(page++, 0);
-	}
 
 	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
 
diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~revert-1
+++ a/mm/memory_hotplug.c
@@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
 void __ref put_page_bootmem(struct page *page)
 {
 	unsigned long type;
-	struct zone *zone;
 
 	type = (unsigned long) page->lru.next;
 	BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
@@ -117,12 +116,6 @@ void __ref put_page_bootmem(struct page 
 		set_page_private(page, 0);
 		INIT_LIST_HEAD(&page->lru);
 		__free_pages_bootmem(page, 0);
-
-		zone = page_zone(page);
-   

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Jiang Liu
On 11/15/2012 05:22 PM, Wen Congyang wrote:
> Hi, Liu Jiang
> 
> I think fixup_zone_present_pages() is very useful for memory hotplug.
> 
> We calculate zone->present_pages in free_area_init_core(), but its value
> is wrong. That is why we fix it in fixup_zone_present_pages().
> 
> What about this:
> 1. initialize zone->present_pages to the present pages in this zone
>    (including bootmem)
> 2. don't reset zone->present_pages for HIGHMEM pages
> 
> We don't allocate bootmem from HIGHMEM, so its present pages are
> initialized in step 1 and there is no need to fix them in step 2.
Hi Congyang,


Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Wen Congyang

At 2012/11/15 19:28, Bob Liu wrote:
> On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang  wrote:
>> Hi, Liu Jiang
>>
>> I think fixup_zone_present_pages() is very useful for memory hotplug.
>
> I might be missing something, but is memory hotplug the only user that
> depends on fixup_zone_present_pages()?
>
> IIRC, the watermarks depend on zone->present_pages, but I haven't hit any
> problem even when zone->present_pages is wrong.
>
> Why not revert changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and add a check to offline_pages() like:
> if (zone->present_pages >= offlined_pages)
> 	zone->present_pages -= offlined_pages;
> else
> 	zone->present_pages = 0;
>
> It's simpler and minimizes the effect on other parts of the kernel.

Hmm, zone->present_pages may become 0 while there is still memory in this
zone that is onlined and in use. If zone->present_pages becomes 0, we will
free the pcp list for this zone, which will cause unexpected errors.




We 

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Bob Liu
On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang  wrote:
> Hi, Liu Jiang
>
> I think fixup_zone_present_pages() is very useful for memory hotplug.
>

I might be missing something, but if memory hotplug is the only user that
depends on fixup_zone_present_pages(), why not revert changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and add a check to offline_pages()
like:
if (zone->present_pages >= offlined_pages)
	zone->present_pages -= offlined_pages;
else
	zone->present_pages = 0;

It's simpler and can minimize the effect on other parts of

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Wen Congyang
Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:
> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>> On Tue, 6 Nov 2012 09:31:57 +0800
>> Jiang Liu  wrote:
>>
>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>> is called to recalculate zone->present_pages when the boot allocator
>>> frees core memory pages into the buddy allocator. Because highmem pages
>>> are not freed by the bootmem allocator, all highmem zones' present_pages
>>> become zero.
>>>
>>> Actually there's no need to recalculate present_pages for highmem zones
>>> because the bootmem allocator never allocates pages from them. So fix the
>>> regression by skipping highmem zones in reset_zone_present_pages()
>>> and fixup_zone_present_pages().
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  			z = NODE_DATA(nid)->node_zones + i;
>>> -			z->present_pages = 0;
>>> +			if (!is_highmem(z))
>>> +				z->present_pages = 0;
>>>  		}
>>>  	}
>>>  }
>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>  
>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  		z = NODE_DATA(nid)->node_zones + i;
>>> +		if (is_highmem(z))
>>> +			continue;
>>> +
>>>  		zone_start_pfn = z->zone_start_pfn;
>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>> -
>>> -		/* if the two regions intersect */
>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>  					    max(start_pfn, zone_start_pfn);
>>
>> This ...  isn't very nice.  It embeds within
>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>> about their caller's state.  Or, more specifically, it is embedding
>> knowledge about the overall state of the system when these functions
>> are called.
>>
>> I mean, a function called "reset_zone_present_pages" should reset
>> ->present_pages!
>>
>> The fact that fixup_zone_present_pages() has multiple call sites makes
>> this all even more risky.  And what are the interactions between this
>> and memory hotplug?
>>
>> Can we find a cleaner fix?
>>
>> Please tell us more about what's happening here.  Is it the case that
>> reset_zone_present_pages() is being called *after* highmem has been
>> populated?  If so, then fixup_zone_present_pages() should work
>> correctly for highmem?  Or is it the case that highmem hasn't yet been
>> setup?  IOW, what is the sequence of operations here?
>>
>> Is the problem that we're *missing* a call to
>> fixup_zone_present_pages(), perhaps?  If we call
>> fixup_zone_present_pages() after highmem has been populated,
>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>> ->present_pages?
> Hi Andrew,
>   Sorry for the late response:(
>   I have done more investigation following your suggestions. Currently
> we only call fixup_zone_present_pages() for memory freed by the bootmem
> allocator and miss HIGHMEM pages. We could also call
> fixup_zone_present_pages() for HIGHMEM pages, but that would require
> changing arch-specific code for x86, powerpc, sparc, microblaze, arm, mips,
> um, tile, etc. That seems like too much overhead.
>   And sadly, I found the quick fix is still incomplete. The original
> patch has another issue: reset_zone_present_pages() is only called for
> IA64, so it will cause trouble for other arches which make use of
> "bootmem.c".
>   So I felt a little guilty and tried to find a cleaner solution without
> touching arch-specific code. But things are more complex than I expected
> and I'm still working on that.
>   So how about totally reverting the changeset
> 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> and I will post another version once I find a cleaner way?

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but its value is
wrong. That is why we fix it in fixup_zone_present_pages().

What about this:
1. initialize zone->present_pages to the present pages in this zone
   (including bootmem)
2. don't reset zone->present_pages for HIGHMEM pages

We don't allocate bootmem from HIGHMEM, so its present pages are initialized
in step 1 and there is no need to fix them in step 2.

Is it OK?

If it is OK, I will resend the patch for step 1 (the patch is from laijs).

Thanks
Wen Congyang

>   Thanks!
>   Gerry

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Wen Congyang
Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:
 On 11/07/2012 04:43 AM, Andrew Morton wrote:
 On Tue, 6 Nov 2012 09:31:57 +0800
 Jiang Liu jiang@huawei.com wrote:

 Changeset 7f1290f2f2 tries to fix a issue when calculating
 zone-present_pages, but it causes a regression to 32bit systems with
 HIGHMEM. With that changeset, function reset_zone_present_pages()
 resets all zone-present_pages to zero, and fixup_zone_present_pages()
 is called to recalculate zone-present_pages when boot allocator frees
 core memory pages into buddy allocator. Because highmem pages are not
 freed by bootmem allocator, all highmem zones' present_pages becomes
 zero.

 Actually there's no need to recalculate present_pages for highmem zone
 because bootmem allocator never allocates pages from them. So fix the
 regression by skipping highmem in function reset_zone_present_pages()
 and fixup_zone_present_pages().

 ...

 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
  	for_each_node_state(nid, N_HIGH_MEMORY) {
  		for (i = 0; i < MAX_NR_ZONES; i++) {
  			z = NODE_DATA(nid)->node_zones + i;
 -			z->present_pages = 0;
 +			if (!is_highmem(z))
 +				z->present_pages = 0;
  		}
  	}
  }
 @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
  
  	for (i = 0; i < MAX_NR_ZONES; i++) {
  		z = NODE_DATA(nid)->node_zones + i;
 +		if (is_highmem(z))
 +			continue;
 +
  		zone_start_pfn = z->zone_start_pfn;
  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
 -
 -		/* if the two regions intersect */
  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
  			z->present_pages += min(end_pfn, zone_end_pfn) -
  				max(start_pfn, zone_start_pfn);

 This ...  isn't very nice.  It embeds within
 reset_zone_present_pages() and fixup_zone_present_pages() knowledge
 about their caller's state.  Or, more specifically, it is embedding
 knowledge about the overall state of the system when these functions
 are called.

 I mean, a function called "reset_zone_present_pages" should reset
 ->present_pages!

 The fact that fixup_zone_present_pages() has multiple call sites makes
 this all even more risky.  And what are the interactions between this
 and memory hotplug?

 Can we find a cleaner fix?

 Please tell us more about what's happening here.  Is it the case that
 reset_zone_present_pages() is being called *after* highmem has been
 populated?  If so, then fixup_zone_present_pages() should work
 correctly for highmem?  Or is it the case that highmem hasn't yet been
 set up?  IOW, what is the sequence of operations here?

 Is the problem that we're *missing* a call to
 fixup_zone_present_pages(), perhaps?  If we call
 fixup_zone_present_pages() after highmem has been populated,
 fixup_zone_present_pages() should correctly fill in the highmem zone's
 ->present_pages?
 Hi Andrew,
   Sorry for the late response:(
   I have done more investigations according to your suggestions. Currently
 we have only called fixup_zone_present_pages() for memory freed by the bootmem
 allocator and missed HIGHMEM pages. We could also call fixup_zone_present_pages()
 for HIGHMEM pages, but that would need changes to arch-specific code for x86,
 powerpc, sparc, microblaze, arm, mips, um, tile, etc. That seems like a bit too
 much overhead.
   And sadly enough, I found the quick fix is still incomplete. The original
 patch has another issue: reset_zone_present_pages() is only called
 for IA64, so it will cause trouble for other arches that make use of
 "bootmem.c".
   Then I felt a little guilty and tried to find a cleaner solution without
 touching arch-specific code. But things are more complex than I expected, and
 I'm still working on that.
   So how about totally reverting changeset
 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125? I will post another version once
 I find a cleaner way.

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but its value is
wrong. That is why we fix it up in fixup_zone_present_pages().

What about this:
1. init zone->present_pages to the present pages in this zone (including bootmem)
2. don't reset zone->present_pages for HIGHMEM pages

We don't allocate bootmem from HIGHMEM, so the present pages of a HIGHMEM zone
are initialized in step 1 and there is no need to fix them up in step 2.

Is it OK?

If it is OK, I will resend the patch for step 1 (the patch is from laijs).

Thanks
Wen Congyang

   Thanks!
   Gerry
 



Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Bob Liu
On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang we...@cn.fujitsu.com wrote:
 Hi, Liu Jiang

 At 11/14/2012 10:52 PM, Jiang Liu Wrote:
 On 11/07/2012 04:43 AM, Andrew Morton wrote:
 On Tue, 6 Nov 2012 09:31:57 +0800
 Jiang Liu jiang@huawei.com wrote:

 Changeset 7f1290f2f2 tries to fix an issue when calculating
 zone->present_pages, but it causes a regression on 32bit systems with
 HIGHMEM. With that changeset, function reset_zone_present_pages()
 resets all zone->present_pages to zero, and fixup_zone_present_pages()
 is called to recalculate zone->present_pages when the boot allocator frees
 core memory pages into the buddy allocator. Because highmem pages are not
 freed by the bootmem allocator, all highmem zones' present_pages become
 zero.

 Actually there's no need to recalculate present_pages for highmem zones
 because the bootmem allocator never allocates pages from them. So fix the
 regression by skipping highmem in function reset_zone_present_pages()
 and fixup_zone_present_pages().

 ...

 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
  	for_each_node_state(nid, N_HIGH_MEMORY) {
  		for (i = 0; i < MAX_NR_ZONES; i++) {
  			z = NODE_DATA(nid)->node_zones + i;
 -			z->present_pages = 0;
 +			if (!is_highmem(z))
 +				z->present_pages = 0;
  		}
  	}
  }
 @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
  
  	for (i = 0; i < MAX_NR_ZONES; i++) {
  		z = NODE_DATA(nid)->node_zones + i;
 +		if (is_highmem(z))
 +			continue;
 +
  		zone_start_pfn = z->zone_start_pfn;
  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
 -
 -		/* if the two regions intersect */
  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
  			z->present_pages += min(end_pfn, zone_end_pfn) -
  				max(start_pfn, zone_start_pfn);

 This ...  isn't very nice.  It embeds within
 reset_zone_present_pages() and fixup_zone_present_pages() knowledge
 about their caller's state.  Or, more specifically, it is embedding
 knowledge about the overall state of the system when these functions
 are called.

 I mean, a function called "reset_zone_present_pages" should reset
 ->present_pages!

 The fact that fixup_zone_present_pages() has multiple call sites makes
 this all even more risky.  And what are the interactions between this
 and memory hotplug?

 Can we find a cleaner fix?

 Please tell us more about what's happening here.  Is it the case that
 reset_zone_present_pages() is being called *after* highmem has been
 populated?  If so, then fixup_zone_present_pages() should work
 correctly for highmem?  Or is it the case that highmem hasn't yet been
 set up?  IOW, what is the sequence of operations here?

 Is the problem that we're *missing* a call to
 fixup_zone_present_pages(), perhaps?  If we call
 fixup_zone_present_pages() after highmem has been populated,
 fixup_zone_present_pages() should correctly fill in the highmem zone's
 ->present_pages?
 Hi Andrew,
   Sorry for the late response:(
   I have done more investigations according to your suggestions. Currently
 we have only called fixup_zone_present_pages() for memory freed by the bootmem
 allocator and missed HIGHMEM pages. We could also call fixup_zone_present_pages()
 for HIGHMEM pages, but that would need changes to arch-specific code for x86,
 powerpc, sparc, microblaze, arm, mips, um, tile, etc. That seems like a bit too
 much overhead.
   And sadly enough, I found the quick fix is still incomplete. The original
 patch has another issue: reset_zone_present_pages() is only called
 for IA64, so it will cause trouble for other arches that make use of
 "bootmem.c".
   Then I felt a little guilty and tried to find a cleaner solution without
 touching arch-specific code. But things are more complex than I expected, and
 I'm still working on that.
   So how about totally reverting changeset
 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125? I will post another version once
 I find a cleaner way.

 I think fixup_zone_present_pages() is very useful for memory hotplug.


I might be missing something, but if memory hotplug is the only user that
depends on fixup_zone_present_pages(), why not revert changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and add a check to
offline_pages() like:
if (zone->present_pages >= offlined_pages)
	zone->present_pages -= offlined_pages;
else
	zone->present_pages = 0;

It's simpler and minimizes the effect on other parts of the kernel.

 We calculate zone->present_pages in free_area_init_core(), but its value is
 wrong. That is why we fix it up in fixup_zone_present_pages().

 What about this:
 1. init zone->present_pages to the present pages in this zone (including bootmem)
 2. don't reset zone->present_pages for HIGHMEM pages

 

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Wen Congyang

At 2012/11/15 19:28, Bob Liu Wrote:

On Thu, Nov 15, 2012 at 5:22 PM, Wen Congyang we...@cn.fujitsu.com wrote:

Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:

On 11/07/2012 04:43 AM, Andrew Morton wrote:

On Tue, 6 Nov 2012 09:31:57 +0800
Jiang Liu jiang@huawei.com wrote:


Changeset 7f1290f2f2 tries to fix an issue when calculating
zone->present_pages, but it causes a regression on 32bit systems with
HIGHMEM. With that changeset, function reset_zone_present_pages()
resets all zone->present_pages to zero, and fixup_zone_present_pages()
is called to recalculate zone->present_pages when the boot allocator frees
core memory pages into the buddy allocator. Because highmem pages are not
freed by the bootmem allocator, all highmem zones' present_pages become
zero.

Actually there's no need to recalculate present_pages for highmem zones
because the bootmem allocator never allocates pages from them. So fix the
regression by skipping highmem in function reset_zone_present_pages()
and fixup_zone_present_pages().

...

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
 	for_each_node_state(nid, N_HIGH_MEMORY) {
 		for (i = 0; i < MAX_NR_ZONES; i++) {
 			z = NODE_DATA(nid)->node_zones + i;
-			z->present_pages = 0;
+			if (!is_highmem(z))
+				z->present_pages = 0;
 		}
 	}
 }
@@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		z = NODE_DATA(nid)->node_zones + i;
+		if (is_highmem(z))
+			continue;
+
 		zone_start_pfn = z->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
-		/* if the two regions intersect */
 		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
 			z->present_pages += min(end_pfn, zone_end_pfn) -
 				max(start_pfn, zone_start_pfn);


This ...  isn't very nice.  It embeds within
reset_zone_present_pages() and fixup_zone_present_pages() knowledge
about their caller's state.  Or, more specifically, it is embedding
knowledge about the overall state of the system when these functions
are called.

I mean, a function called "reset_zone_present_pages" should reset
->present_pages!

The fact that fixup_zone_present_pages() has multiple call sites makes
this all even more risky.  And what are the interactions between this
and memory hotplug?

Can we find a cleaner fix?

Please tell us more about what's happening here.  Is it the case that
reset_zone_present_pages() is being called *after* highmem has been
populated?  If so, then fixup_zone_present_pages() should work
correctly for highmem?  Or is it the case that highmem hasn't yet been
set up?  IOW, what is the sequence of operations here?

Is the problem that we're *missing* a call to
fixup_zone_present_pages(), perhaps?  If we call
fixup_zone_present_pages() after highmem has been populated,
fixup_zone_present_pages() should correctly fill in the highmem zone's
->present_pages?

Hi Andrew,
   Sorry for the late response:(
   I have done more investigations according to your suggestions. Currently
we have only called fixup_zone_present_pages() for memory freed by the bootmem
allocator and missed HIGHMEM pages. We could also call fixup_zone_present_pages()
for HIGHMEM pages, but that would need changes to arch-specific code for x86,
powerpc, sparc, microblaze, arm, mips, um, tile, etc. That seems like a bit too
much overhead.
   And sadly enough, I found the quick fix is still incomplete. The original
patch has another issue: reset_zone_present_pages() is only called
for IA64, so it will cause trouble for other arches that make use of
"bootmem.c".
   Then I felt a little guilty and tried to find a cleaner solution without
touching arch-specific code. But things are more complex than I expected, and
I'm still working on that.
   So how about totally reverting changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125? I will post another version once
I find a cleaner way.


I think fixup_zone_present_pages() is very useful for memory hotplug.



I might be missing something, but if memory hotplug is the only user that
depends on fixup_zone_present_pages(),


IIRC, the watermarks depend on zone->present_pages. But I haven't run into
any problem even when zone->present_pages is wrong.


Why not revert changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
and add a check to offline_pages() like:
if (zone->present_pages >= offlined_pages)
	zone->present_pages -= offlined_pages;
else
	zone->present_pages = 0;

It's simpler and minimizes the effect on other parts of the kernel.


Hmm, zone->present_pages may become 0 while memory in this zone is still
onlined and in use. If zone->present_pages becomes 0, we will free the pcp
lists for this zone, which will cause unexpected errors.
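The clamped subtraction suggested above, and the concern raised in reply, can be made concrete with a hypothetical user-space sketch. This is not kernel code, and every name in it is invented for illustration. If the counter has already drifted below the number of pages actually online, the clamp hides the underflow, and the zone reads as empty while pages are still in use.

```c
#include <assert.h>
#include <stdbool.h>

/* The suggested offline accounting, applied to a plain counter:
 * subtract the offlined pages, but never go below zero. */
static unsigned long toy_offline_clamped(unsigned long present_pages,
					 unsigned long offlined_pages)
{
	if (present_pages >= offlined_pages)
		return present_pages - offlined_pages;
	return 0;
}

/* Hypothetical check: a zone whose counter reads zero looks empty,
 * even if some of its pages are still online. */
static bool toy_zone_looks_empty(unsigned long present_pages)
{
	return present_pages == 0;
}

/* Pages that really remain online after offlining, for comparison. */
static unsigned long toy_really_online(unsigned long online_pages,
				       unsigned long offlined_pages)
{
	assert(offlined_pages <= online_pages);
	return online_pages - offlined_pages;
}
```

Suppose 1000 pages are actually online but the counter was under-counted as 300. Offlining 400 pages then makes the clamp return 0, although 600 pages are still online.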

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Jiang Liu
On 11/15/2012 05:22 PM, Wen Congyang wrote:
 Hi, Liu Jiang
 
 At 11/14/2012 10:52 PM, Jiang Liu Wrote:
 On 11/07/2012 04:43 AM, Andrew Morton wrote:
 On Tue, 6 Nov 2012 09:31:57 +0800
 Jiang Liu jiang@huawei.com wrote:

 Changeset 7f1290f2f2 tries to fix an issue when calculating
 zone->present_pages, but it causes a regression on 32bit systems with
 HIGHMEM. With that changeset, function reset_zone_present_pages()
 resets all zone->present_pages to zero, and fixup_zone_present_pages()
 is called to recalculate zone->present_pages when the boot allocator frees
 core memory pages into the buddy allocator. Because highmem pages are not
 freed by the bootmem allocator, all highmem zones' present_pages become
 zero.

 Actually there's no need to recalculate present_pages for highmem zones
 because the bootmem allocator never allocates pages from them. So fix the
 regression by skipping highmem in function reset_zone_present_pages()
 and fixup_zone_present_pages().

 ...

 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
  	for_each_node_state(nid, N_HIGH_MEMORY) {
  		for (i = 0; i < MAX_NR_ZONES; i++) {
  			z = NODE_DATA(nid)->node_zones + i;
 -			z->present_pages = 0;
 +			if (!is_highmem(z))
 +				z->present_pages = 0;
  		}
  	}
  }
 @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
  
  	for (i = 0; i < MAX_NR_ZONES; i++) {
  		z = NODE_DATA(nid)->node_zones + i;
 +		if (is_highmem(z))
 +			continue;
 +
  		zone_start_pfn = z->zone_start_pfn;
  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
 -
 -		/* if the two regions intersect */
  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
  			z->present_pages += min(end_pfn, zone_end_pfn) -
  				max(start_pfn, zone_start_pfn);

 This ...  isn't very nice.  It embeds within
 reset_zone_present_pages() and fixup_zone_present_pages() knowledge
 about their caller's state.  Or, more specifically, it is embedding
 knowledge about the overall state of the system when these functions
 are called.

 I mean, a function called "reset_zone_present_pages" should reset
 ->present_pages!

 The fact that fixup_zone_present_pages() has multiple call sites makes
 this all even more risky.  And what are the interactions between this
 and memory hotplug?

 Can we find a cleaner fix?

 Please tell us more about what's happening here.  Is it the case that
 reset_zone_present_pages() is being called *after* highmem has been
 populated?  If so, then fixup_zone_present_pages() should work
 correctly for highmem?  Or is it the case that highmem hasn't yet been
 set up?  IOW, what is the sequence of operations here?

 Is the problem that we're *missing* a call to
 fixup_zone_present_pages(), perhaps?  If we call
 fixup_zone_present_pages() after highmem has been populated,
 fixup_zone_present_pages() should correctly fill in the highmem zone's
 ->present_pages?
 Hi Andrew,
  Sorry for the late response:(
  I have done more investigations according to your suggestions. Currently
 we have only called fixup_zone_present_pages() for memory freed by the bootmem
 allocator and missed HIGHMEM pages. We could also call fixup_zone_present_pages()
 for HIGHMEM pages, but that would need changes to arch-specific code for x86,
 powerpc, sparc, microblaze, arm, mips, um, tile, etc. That seems like a bit too
 much overhead.
  And sadly enough, I found the quick fix is still incomplete. The original
 patch has another issue: reset_zone_present_pages() is only called
 for IA64, so it will cause trouble for other arches that make use of
 "bootmem.c".
  Then I felt a little guilty and tried to find a cleaner solution without
 touching arch-specific code. But things are more complex than I expected, and
 I'm still working on that.
  So how about totally reverting changeset
 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125? I will post another version once
 I find a cleaner way.
 
 I think fixup_zone_present_pages() is very useful for memory hotplug.
 
 We calculate zone->present_pages in free_area_init_core(), but its value is
 wrong. That is why we fix it up in fixup_zone_present_pages().
 
 What about this:
 1. init zone->present_pages to the present pages in this zone (including bootmem)
 2. don't reset zone->present_pages for HIGHMEM pages
 
 We don't allocate bootmem from HIGHMEM, so the present pages of a HIGHMEM zone
 are initialized in step 1 and there is no need to fix them up in step 2.
Hi Congyang,

I feel that zone->present_pages has been abused. I guess it originally meant
"physical pages present in this zone", but now zone->present_pages is
sometimes used as "pages in this zone managed by the buddy system". So I'm
trying to add a new field "managed_pages" to the zone, which accounts for
pages managed by the buddy system.
That's 

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Andrew Morton
On Wed, 14 Nov 2012 22:52:03 +0800
Jiang Liu liu...@gmail.com wrote:

   So how about totally reverting changeset
 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125? I will post another version
 once I find a cleaner way.

We do need to get this regression fixed and I guess that a
straightforward revert is an acceptable way of doing that, for now.


I queued the below, with a plan to send it to Linus next week.


From: Andrew Morton a...@linux-foundation.org
Subject: revert "mm: fix-up zone present pages"

Revert

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu wujian...@huawei.com
AuthorDate: Mon Oct 8 16:33:06 2012 -0700
Commit: Linus Torvalds torva...@linux-foundation.org
CommitDate: Tue Oct 9 16:22:54 2012 +0900

mm: fix-up zone present pages


That patch tried to fix an issue when calculating zone->present_pages, but
it caused a regression on 32bit systems with HIGHMEM.  With that
changeset, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
the buddy allocator.  Because highmem pages are not freed by the bootmem
allocator, all highmem zones' present_pages become zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu wujian...@huawei.com
Cc: Jiang Liu jiang@huawei.com
Cc: Petr Tesarik ptesa...@suse.cz
Cc: Luck, Tony tony.l...@intel.com
Cc: Mel Gorman m...@csn.ul.ie
Cc: Yinghai Lu ying...@kernel.org
Cc: Minchan Kim minchan@gmail.com
Cc: Johannes Weiner han...@cmpxchg.org
Cc: David Rientjes rient...@google.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 arch/ia64/mm/init.c |    1 -
 include/linux/mm.h  |    4 ----
 mm/bootmem.c        |   10 +---------
 mm/memory_hotplug.c |    7 -------
 mm/nobootmem.c      |    3 ---
 mm/page_alloc.c     |   34 ----------------------------------
 6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)
 
high_memory = __va(max_low_pfn * PAGE_SIZE);
 
-   reset_zone_present_pages();
for_each_online_pgdat(pgdat)
if (pgdat->bdata->node_bootmem_map)
totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-   unsigned long end_pfn);
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
int order = ilog2(BITS_PER_LONG);
 
__free_pages_bootmem(pfn_to_page(start), order);
-   fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-   start, start + BITS_PER_LONG);
count += BITS_PER_LONG;
start += BITS_PER_LONG;
} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
if (vec & 1) {
page = pfn_to_page(start + off);
__free_pages_bootmem(page, 0);
-   fixup_zone_present_pages(
-   page_to_nid(page),
-   start + off, start + off + 1);
count++;
}
vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
pages = bdata->node_low_pfn - bdata->node_min_pfn;
pages = bootmem_bootmap_pages(pages);
count += pages;
-   while (pages--) {
-   fixup_zone_present_pages(page_to_nid(page),
-   page_to_pfn(page), page_to_pfn(page) + 1);
+   while (pages--)
__free_pages_bootmem(page++, 0);
-   }
 
bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
 
diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~revert-1
+++ a/mm/memory_hotplug.c
@@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
 void __ref put_page_bootmem(struct page *page)
 {
unsigned long type;
-   struct zone *zone;
 
type = (unsigned long) page->lru.next;
BUG_ON(type  

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Chris Clayton



On 11/15/12 19:24, Andrew Morton wrote:

On Wed, 14 Nov 2012 22:52:03 +0800
Jiang Liu liu...@gmail.com wrote:


So how about totally reverting the changeset 
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
and I will post another version once I found a cleaner way?


We do need to get this regression fixed and I guess that a
straightforward revert is an acceptable way of doing that, for now.


I queued the below, with a plan to send it to Linus next week.



I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it 
fixes the problem I had with my laptop failing to resume (by either 
freezing or rebooting) after a suspend to disk.


Tested-by: Chris Clayton chris2...@googlemail.com



From: Andrew Morton a...@linux-foundation.org
Subject: revert "mm: fix-up zone present pages"

Revert

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu wujian...@huawei.com
AuthorDate: Mon Oct 8 16:33:06 2012 -0700
Commit: Linus Torvalds torva...@linux-foundation.org
CommitDate: Tue Oct 9 16:22:54 2012 +0900

 mm: fix-up zone present pages


That patch tried to fix an issue when calculating zone->present_pages, but
it caused a regression on 32bit systems with HIGHMEM.  With that
changeset, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
the buddy allocator.  Because highmem pages are not freed by the bootmem
allocator, all highmem zones' present_pages become zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu wujian...@huawei.com
Cc: Jiang Liu jiang@huawei.com
Cc: Petr Tesarik ptesa...@suse.cz
Cc: Luck, Tony tony.l...@intel.com
Cc: Mel Gorman m...@csn.ul.ie
Cc: Yinghai Lu ying...@kernel.org
Cc: Minchan Kim minchan@gmail.com
Cc: Johannes Weiner han...@cmpxchg.org
Cc: David Rientjes rient...@google.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

  arch/ia64/mm/init.c |    1 -
  include/linux/mm.h  |    4 ----
  mm/bootmem.c        |   10 +---------
  mm/memory_hotplug.c |    7 -------
  mm/nobootmem.c      |    3 ---
  mm/page_alloc.c     |   34 ----------------------------------
  6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)

high_memory = __va(max_low_pfn * PAGE_SIZE);

-   reset_zone_present_pages();
for_each_online_pgdat(pgdat)
if (pgdat->bdata->node_bootmem_map)
totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
  static inline bool page_is_guard(struct page *page) { return false; }
  #endif /* CONFIG_DEBUG_PAGEALLOC */

-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-   unsigned long end_pfn);
-
  #endif /* __KERNEL__ */
  #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
int order = ilog2(BITS_PER_LONG);

__free_pages_bootmem(pfn_to_page(start), order);
-   fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-   start, start + BITS_PER_LONG);
count += BITS_PER_LONG;
start += BITS_PER_LONG;
} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
if (vec & 1) {
page = pfn_to_page(start + off);
__free_pages_bootmem(page, 0);
-   fixup_zone_present_pages(
-   page_to_nid(page),
-   start + off, start + off + 1);
count++;
}
vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
pages = bdata->node_low_pfn - bdata->node_min_pfn;
pages = bootmem_bootmap_pages(pages);
count += pages;
-   while (pages--) {
-   fixup_zone_present_pages(page_to_nid(page),
-   page_to_pfn(page), page_to_pfn(page) + 1);
+   while (pages--)
__free_pages_bootmem(page++, 0);
-   }

bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);

diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- 

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread David Rientjes
On Thu, 15 Nov 2012, Andrew Morton wrote:

 From: Andrew Morton a...@linux-foundation.org
 Subject: revert "mm: fix-up zone present pages"
 
 Revert
 
 commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
 Author: Jianguo Wu wujian...@huawei.com
 AuthorDate: Mon Oct 8 16:33:06 2012 -0700
 Commit: Linus Torvalds torva...@linux-foundation.org
 CommitDate: Tue Oct 9 16:22:54 2012 +0900
 
 mm: fix-up zone present pages
 
 
 That patch tried to fix an issue when calculating zone->present_pages, but
 it caused a regression on 32bit systems with HIGHMEM.  With that
 changeset, reset_zone_present_pages() resets all zone->present_pages to
 zero, and fixup_zone_present_pages() is called to recalculate
 zone->present_pages when the boot allocator frees core memory pages into
 the buddy allocator.  Because highmem pages are not freed by the bootmem
 allocator, all highmem zones' present_pages become zero.
 
 Various options for improving the situation are being discussed but for
 now, let's return to the 3.6 code.
 
 Cc: Jianguo Wu wujian...@huawei.com
 Cc: Jiang Liu jiang@huawei.com
 Cc: Petr Tesarik ptesa...@suse.cz
 Cc: Luck, Tony tony.l...@intel.com
 Cc: Mel Gorman m...@csn.ul.ie
 Cc: Yinghai Lu ying...@kernel.org
 Cc: Minchan Kim minchan@gmail.com
 Cc: Johannes Weiner han...@cmpxchg.org
 Cc: David Rientjes rient...@google.com
 Signed-off-by: Andrew Morton a...@linux-foundation.org

Acked-by: David Rientjes rient...@google.com


Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread David Rientjes
On Thu, 15 Nov 2012, Jiang Liu wrote:

 I feel that zone->present_pages has been abused. I guess it means "physical
 pages present in this zone" originally, but now sometimes zone->present_pages
 is used as "pages in this zone managed by the buddy system".

Its definition is all pages spanned by the zone that are not reserved and
unavailable to the kernel to allocate from, and the implementation of
bootmem requires that its memory be considered as "reserved" until freed.
It's used throughout the kernel to determine the amount of memory that is 
allocatable in that zone from the page allocator since its reclaim 
heuristics and watermarks depend on this memory being allocatable.
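A rough sketch of why a wrong present_pages matters for watermarks. It is loosely modelled on how kernels of this era split the global minimum, derived from min_free_kbytes, across lowmem zones in proportion to each zone's present_pages. The low and high marks sit a quarter and a half above that share. This is a simplified recollection, not the kernel code: the names are hypothetical, and the special handling of HIGHMEM zones is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified watermark triple for one lowmem zone. */
struct toy_wmarks {
	unsigned long min, low, high;
};

/* Give a lowmem zone a share of the global minimum proportional to its
 * present_pages, then set low/high a quarter and a half of that share
 * above min.  HIGHMEM zones are not modelled here. */
static struct toy_wmarks toy_zone_wmarks(unsigned long pages_min,
					 unsigned long zone_present,
					 unsigned long lowmem_present)
{
	struct toy_wmarks w;
	uint64_t share;

	assert(lowmem_present > 0);
	share = (uint64_t)pages_min * zone_present / lowmem_present;
	w.min = (unsigned long)share;
	w.low = w.min + (unsigned long)(share >> 2);
	w.high = w.min + (unsigned long)(share >> 1);
	return w;
}
```

Take pages_min = 1024 and two lowmem zones with 3000 and 1000 present pages. The larger zone gets min/low/high of 768/960/1152. If its present_pages were wrongly left at 0, all three of its marks would be 0.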

 So I'm trying to add a new
 field "managed_pages" into zone, which accounts for pages managed by buddy
 system.
 That's why I thought the clean solution is a little complex:(
 

You need to update the pgdat's node_present_pages to be consistent with 
all of its zones' present_pages.
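Keeping the node total in step with its zones means recomputing it as the sum over its zones whenever any zone's present_pages is adjusted. A minimal sketch, assuming a toy `struct toy_pgdat` (hypothetical; the kernel's pg_data_t carries node_present_pages among many other fields):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_ZONES 3	/* hypothetical; the real MAX_NR_ZONES depends on config */

/* Hypothetical toy node: per-zone present counts plus the node total. */
struct toy_pgdat {
	unsigned long zone_present_pages[TOY_NR_ZONES];
	unsigned long node_present_pages;
};

static unsigned long toy_sum_zone_present_pages(const struct toy_pgdat *pgdat)
{
	unsigned long sum = 0;

	assert(pgdat != NULL);
	for (size_t i = 0; i < TOY_NR_ZONES; i++)
		sum += pgdat->zone_present_pages[i];
	return sum;
}

/* Call after any zone's present_pages changes. */
static void toy_sync_node_present_pages(struct toy_pgdat *pgdat)
{
	pgdat->node_present_pages = toy_sum_zone_present_pages(pgdat);
}

/* The consistency requirement stated in the reply above. */
static bool toy_node_consistent(const struct toy_pgdat *pgdat)
{
	return pgdat->node_present_pages == toy_sum_zone_present_pages(pgdat);
}
```

Recomputing the whole sum, rather than applying deltas, keeps the sketch obviously correct. The cost is a loop over a handful of zones.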


Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-14 Thread Jiang Liu
On 11/07/2012 04:43 AM, Andrew Morton wrote:
> On Tue, 6 Nov 2012 09:31:57 +0800
> Jiang Liu  wrote:
> 
>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>> zone->present_pages, but it causes a regression on 32bit systems with
>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>> is called to recalculate zone->present_pages when the boot allocator frees
>> core memory pages into the buddy allocator. Because highmem pages are not
>> freed by the bootmem allocator, all highmem zones' present_pages become
>> zero.
>>
>> Actually there's no need to recalculate present_pages for highmem zones
>> because the bootmem allocator never allocates pages from them. So fix the
>> regression by skipping highmem in function reset_zone_present_pages()
>> and fixup_zone_present_pages().
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>  for_each_node_state(nid, N_HIGH_MEMORY) {
>>  for (i = 0; i < MAX_NR_ZONES; i++) {
>>  z = NODE_DATA(nid)->node_zones + i;
>> -z->present_pages = 0;
>> +if (!is_highmem(z))
>> +z->present_pages = 0;
>>  }
>>  }
>>  }
>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>  
>>  for (i = 0; i < MAX_NR_ZONES; i++) {
>>  z = NODE_DATA(nid)->node_zones + i;
>> +if (is_highmem(z))
>> +continue;
>> +
>>  zone_start_pfn = z->zone_start_pfn;
>>  zone_end_pfn = zone_start_pfn + z->spanned_pages;
>> -
>> -/* if the two regions intersect */
>>  if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>  z->present_pages += min(end_pfn, zone_end_pfn) -
>>  max(start_pfn, zone_start_pfn);
> 
> This ...  isn't very nice.  It embeds within
> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
> about their caller's state.  Or, more specifically, it is embedding
> knowledge about the overall state of the system when these functions
> are called.
> 
> I mean, a function called "reset_zone_present_pages" should reset
> ->present_pages!
> 
> The fact that fixup_zone_present_pages() has multiple call sites makes
> this all even more risky.  And what are the interactions between this
> and memory hotplug?
> 
> Can we find a cleaner fix?
> 
> Please tell us more about what's happening here.  Is it the case that
> reset_zone_present_pages() is being called *after* highmem has been
> populated?  If so, then fixup_zone_present_pages() should work
> correctly for highmem?  Or is it the case that highmem hasn't yet been
> set up?  IOW, what is the sequence of operations here?
> 
> Is the problem that we're *missing* a call to
> fixup_zone_present_pages(), perhaps?  If we call
> fixup_zone_present_pages() after highmem has been populated,
> fixup_zone_present_pages() should correctly fill in the highmem zone's
> ->present_pages?
Hi Andrew,
Sorry for the late response :(
I have done more investigation following your suggestions. Currently
we only call fixup_zone_present_pages() for memory freed by the bootmem
allocator, so HIGHMEM pages are missed. We could also call
fixup_zone_present_pages() for HIGHMEM pages, but that would require
changes to arch-specific code for x86, powerpc, sparc, microblaze, arm,
mips, um, tile, etc., which seems like a bit too much overhead.
Sadly, I also found that the quick fix is still incomplete. The original
patch has another issue: reset_zone_present_pages() is only called on
IA64, so it will cause trouble for other arches that make use of
"bootmem.c".
I feel a little guilty about that, so I have been trying to find a
cleaner solution without touching arch-specific code. But things are
more complex than I expected and I'm still working on it.
So how about totally reverting changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125, and I will post another version
once I find a cleaner way?
Thanks!
Gerry



Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-06 Thread Andrew Morton
On Tue, 6 Nov 2012 09:31:57 +0800
Jiang Liu  wrote:

> Changeset 7f1290f2f2 tries to fix an issue when calculating
> zone->present_pages, but it causes a regression on 32-bit systems with
> HIGHMEM. With that changeset, reset_zone_present_pages() resets all
> zone->present_pages to zero, and fixup_zone_present_pages() is called to
> recalculate zone->present_pages when the bootmem allocator frees core
> memory pages into the buddy allocator. Because highmem pages are not
> freed by the bootmem allocator, all highmem zones' present_pages become
> zero.
> 
> Actually there's no need to recalculate present_pages for highmem zones
> because the bootmem allocator never allocates pages from them. So fix the
> regression by skipping highmem in reset_zone_present_pages() and
> fixup_zone_present_pages().
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>  			z = NODE_DATA(nid)->node_zones + i;
> -			z->present_pages = 0;
> +			if (!is_highmem(z))
> +				z->present_pages = 0;
>  		}
>  	}
>  }
> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>  
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		z = NODE_DATA(nid)->node_zones + i;
> +		if (is_highmem(z))
> +			continue;
> +
>  		zone_start_pfn = z->zone_start_pfn;
>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
> -
> -		/* if the two regions intersect */
>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>  					    max(start_pfn, zone_start_pfn);

This ...  isn't very nice.  It embeds within
reset_zone_present_pages() and fixup_zone_present_pages() knowledge
about their caller's state.  Or, more specifically, it is embedding
knowledge about the overall state of the system when these functions
are called.

I mean, a function called "reset_zone_present_pages" should reset
->present_pages!

The fact that fixup_zone_present_pages() has multiple call sites makes
this all even more risky.  And what are the interactions between this
and memory hotplug?

Can we find a cleaner fix?

Please tell us more about what's happening here.  Is it the case that
reset_zone_present_pages() is being called *after* highmem has been
populated?  If so, then fixup_zone_present_pages() should work
correctly for highmem?  Or is it the case that highmem hasn't yet been
set up?  IOW, what is the sequence of operations here?

Is the problem that we're *missing* a call to
fixup_zone_present_pages(), perhaps?  If we call
fixup_zone_present_pages() after highmem has been populated,
fixup_zone_present_pages() should correctly fill in the highmem zone's
->present_pages?




Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-06 Thread Chris Clayton

On 11/06/12 01:31, Jiang Liu wrote:
> Changeset 7f1290f2f2 tries to fix an issue when calculating
> zone->present_pages, but it causes a regression on 32-bit systems with
> HIGHMEM. With that changeset, reset_zone_present_pages() resets all
> zone->present_pages to zero, and fixup_zone_present_pages() is called to
> recalculate zone->present_pages when the bootmem allocator frees core
> memory pages into the buddy allocator. Because highmem pages are not
> freed by the bootmem allocator, all highmem zones' present_pages become
> zero.
>
> Actually there's no need to recalculate present_pages for highmem zones
> because the bootmem allocator never allocates pages from them. So fix the
> regression by skipping highmem in reset_zone_present_pages() and
> fixup_zone_present_pages().
>
> Signed-off-by: Jiang Liu 
> Signed-off-by: Jianguo Wu 
> Reported-by: Maciej Rutecki 
> Tested-by: Maciej Rutecki 
> Cc: Chris Clayton 
> Cc: Rafael J. Wysocki 
> Cc: Andrew Morton 
> Cc: Mel Gorman 
> Cc: Minchan Kim 
> Cc: KAMEZAWA Hiroyuki 
> Cc: Michal Hocko 
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
>
> ---
>
> Hi Maciej,
> Thanks for reporting and bisecting. We have analyzed the regression
> and worked out a patch for it. Could you please help to verify whether it
> fixes the regression?
> Thanks!
> Gerry

Thanks Gerry.

I've applied this patch to 3.7.0-rc4 and can confirm that it fixes the 
problem I had with my laptop failing to resume after a suspend to disk.


Tested-by: Chris Clayton 

> ---
>  mm/page_alloc.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b74de6..2311f15 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>  			z = NODE_DATA(nid)->node_zones + i;
> -			z->present_pages = 0;
> +			if (!is_highmem(z))
> +				z->present_pages = 0;
>  		}
>  	}
>  }
> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>  
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		z = NODE_DATA(nid)->node_zones + i;
> +		if (is_highmem(z))
> +			continue;
> +
>  		zone_start_pfn = z->zone_start_pfn;
>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
> -
> -		/* if the two regions intersect */
>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>  					    max(start_pfn, zone_start_pfn);




[PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-05 Thread Jiang Liu
Changeset 7f1290f2f2 tries to fix an issue when calculating
zone->present_pages, but it causes a regression on 32-bit systems with
HIGHMEM. With that changeset, reset_zone_present_pages() resets all
zone->present_pages to zero, and fixup_zone_present_pages() is called to
recalculate zone->present_pages when the bootmem allocator frees core
memory pages into the buddy allocator. Because highmem pages are not
freed by the bootmem allocator, all highmem zones' present_pages become
zero.

Actually there's no need to recalculate present_pages for highmem zones
because the bootmem allocator never allocates pages from them. So fix the
regression by skipping highmem in reset_zone_present_pages() and
fixup_zone_present_pages().

Signed-off-by: Jiang Liu 
Signed-off-by: Jianguo Wu 
Reported-by: Maciej Rutecki 
Tested-by: Maciej Rutecki 
Cc: Chris Clayton 
Cc: Rafael J. Wysocki 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Minchan Kim 
Cc: KAMEZAWA Hiroyuki 
Cc: Michal Hocko 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

---

Hi Maciej,
Thanks for reporting and bisecting. We have analyzed the regression
and worked out a patch for it. Could you please help to verify whether it
fixes the regression?
Thanks!
Gerry

---
 mm/page_alloc.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..2311f15 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
 	for_each_node_state(nid, N_HIGH_MEMORY) {
 		for (i = 0; i < MAX_NR_ZONES; i++) {
 			z = NODE_DATA(nid)->node_zones + i;
-			z->present_pages = 0;
+			if (!is_highmem(z))
+				z->present_pages = 0;
 		}
 	}
 }
@@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		z = NODE_DATA(nid)->node_zones + i;
+		if (is_highmem(z))
+			continue;
+
 		zone_start_pfn = z->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
-		/* if the two regions intersect */
 		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
 			z->present_pages += min(end_pfn, zone_end_pfn) -
 					    max(start_pfn, zone_start_pfn);
-- 
1.7.1



