Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-13 Thread Xiao Guangrong
On 09/13/2012 02:27 PM, Hugh Dickins wrote:
> On Wed, 12 Sep 2012, Xiao Guangrong wrote:
>> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
>>
>>> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
>>> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
>>> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>>>
>>> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
>>
>>>
>>> So maybe 9/12 is just obscuring what was already a BUG, either earlier
>>> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
>>> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
>>> looking for it, maybe it's obvious - can you spot and fix it?
>>
>> Hugh,
>>
>> I think I have already found the reason,
> 
> Great, thank you.
> 
>> if I am correct, the bug existed before my patch.
> 
> Before your patchset?  Are you sure of that?

No. :)

I have told Andrew in 0/3 that the fix patch need not be backported.
Sorry again for my mistake.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-13 Thread Hugh Dickins
On Wed, 12 Sep 2012, Hugh Dickins wrote:
> > @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> > return false;
> > 
> > *wait = false;
> > +   *hpage = NULL;
> > khugepaged_alloc_sleep();
> > } else if (*hpage) {
> > put_page(*hpage);
> 
> The unshown line just below this is
> 
>   *hpage = NULL;
> 
> I do wish you would take the "*hpage = NULL;" out of if and else blocks
> and place it once below both.

Hold on, I'm being unreasonable: that's an "else if", and I've no good
reason to request you to set *hpage = NULL when it's already NULL.
It would be okay if you did, but there's no reason for me to prefer it.

Hugh


Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-13 Thread Hugh Dickins
On Wed, 12 Sep 2012, Xiao Guangrong wrote:
> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> 
> > What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> > running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> > forcing out to swap), while I happened to have CONFIG_NUMA=y.
> > 
> > That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
> 
> > 
> > So maybe 9/12 is just obscuring what was already a BUG, either earlier
> > in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> > earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> > looking for it, maybe it's obvious - can you spot and fix it?
> 
> Hugh,
> 
> I think I have already found the reason,

Great, thank you.

> if I am correct, the bug existed before my patch.

Before your patchset?  Are you sure of that?

> 
> Could you please try the patch below?

I put it on this morning, and ran load all day without a crash:
I think you indeed found the cause.

> And could you please allow me to fix the bug first,
> then post another patch to improve the things you dislike?

Good plan.

I've not yet glanced at your 2/3 and 3/3, I'm afraid.
I think akpm has helpfully included just 1/3 in the new mmotm.

> 
> Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator
> 
> If NUMA is enabled, the indicator is not reset when the previous page
> request fails, which then triggers the VM_BUG_ON in khugepaged_alloc_page().
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  mm/huge_memory.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e366ca5..66d2bc6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>   return false;
> 
>   *wait = false;
> + *hpage = NULL;
>   khugepaged_alloc_sleep();
>   } else if (*hpage) {
>   put_page(*hpage);

The unshown line just below this is

*hpage = NULL;

I do wish you would take the "*hpage = NULL;" out of if and else blocks
and place it once below both.

Thanks,
Hugh

Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Xiao Guangrong
On 09/12/2012 10:03 AM, Hugh Dickins wrote:

> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Hugh,

I think I have already found the reason; if I am correct, the bug existed
before my patch.

Could you please try the patch below? And could you please allow me to fix
the bug first, then post another patch to improve the things you dislike?


Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator

If NUMA is enabled, the indicator is not reset when the previous page
request fails, which then triggers the VM_BUG_ON in khugepaged_alloc_page().

Signed-off-by: Xiao Guangrong 
---
 mm/huge_memory.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e366ca5..66d2bc6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
return false;

*wait = false;
+   *hpage = NULL;
khugepaged_alloc_sleep();
} else if (*hpage) {
put_page(*hpage);
-- 
1.7.7.6



Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Xiao Guangrong
On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> On Mon, 13 Aug 2012, Xiao Guangrong wrote:
> 
>> They are used to abstract the difference between NUMA enabled and NUMA disabled
>> to make the code more readable
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  mm/huge_memory.c |  166 --
>>  1 files changed, 98 insertions(+), 68 deletions(-)
> 
> Hmm, that in itself is not necessarily an improvement.
> 
> I'm a bit sceptical about this patch,
> thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
> in last Thursday's mmotm 2012-09-06-16-46.
> 
> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

I will look into it, thanks for pointing it out.

> 
> (If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
> patches in on top, and so the line number was actually shifted to 1839:
> but I don't believe his patches were in any way involved here, and
> indeed I've not yet found a problem with them: they look very good.)
> 
> I expect the BUG could quite easily be fixed up by making another call
> to khugepaged_prealloc_page() from somewhere to free up the hpage;
> but forgive me if I dislike using "prealloc" to free.
> 
> I do agree with you that the several CONFIG_NUMA ifdefs dotted around
> mm/huge_memory.c are regrettable, but I'm not at all sure that you're
> improving the situation with this patch, which gives misleading names
> to functions and moves the mmap_sem upping out of line.
> 
> I think you need to revisit it: maybe not go so far (leaving a few
> CONFIG_NUMAs behind, if they're not too bad), or maybe go further
> (add a separate function for freeing in the NUMA case, instead of
> using "prealloc").  I don't know what's best: have a play and see.

Sorry about that, I will find a better way to do this.

> 
> That's what I was intending to write yesterday.  But overnight I
> was running with this 9/12 backed out (I think 10,11,12 should be
> independent), and found "BUG at mm/huge_memory.c:1835!" this morning.
> 
> That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
> when 9/12 is reverted.
> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Sure, I will fix it as soon as possible. Thanks!




Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Hugh Dickins
On Mon, 13 Aug 2012, Xiao Guangrong wrote:

> They are used to abstract the difference between NUMA enabled and NUMA disabled
> to make the code more readable
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  mm/huge_memory.c |  166 --
>  1 files changed, 98 insertions(+), 68 deletions(-)

Hmm, that in itself is not necessarily an improvement.

I'm a bit sceptical about this patch,
thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
in last Thursday's mmotm 2012-09-06-16-46.

What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
forcing out to swap), while I happened to have CONFIG_NUMA=y.

That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

(If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
patches in on top, and so the line number was actually shifted to 1839:
but I don't believe his patches were in any way involved here, and
indeed I've not yet found a problem with them: they look very good.)

I expect the BUG could quite easily be fixed up by making another call
to khugepaged_prealloc_page() from somewhere to free up the hpage;
but forgive me if I dislike using "prealloc" to free.

I do agree with you that the several CONFIG_NUMA ifdefs dotted around
mm/huge_memory.c are regrettable, but I'm not at all sure that you're
improving the situation with this patch, which gives misleading names
to functions and moves the mmap_sem upping out of line.

I think you need to revisit it: maybe not go so far (leaving a few
CONFIG_NUMAs behind, if they're not too bad), or maybe go further
(add a separate function for freeing in the NUMA case, instead of
using "prealloc").  I don't know what's best: have a play and see.

That's what I was intending to write yesterday.  But overnight I
was running with this 9/12 backed out (I think 10,11,12 should be
independent), and found "BUG at mm/huge_memory.c:1835!" this morning.

That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
when 9/12 is reverted.

So maybe 9/12 is just obscuring what was already a BUG, either earlier
in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
earlier releases, nor without CONFIG_NUMA).  I've not spent any time
looking for it, maybe it's obvious - can you spot and fix it?

Hugh


[PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-08-13 Thread Xiao Guangrong
They are used to abstract the difference between NUMA enabled and NUMA disabled
to make the code more readable

Signed-off-by: Xiao Guangrong 
---
 mm/huge_memory.c |  166 --
 1 files changed, 98 insertions(+), 68 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 050b8d0..82f6cce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1833,28 +1833,34 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
}
 }

-static void collapse_huge_page(struct mm_struct *mm,
-  unsigned long address,
-  struct page **hpage,
-  struct vm_area_struct *vma,
-  int node)
+static void khugepaged_alloc_sleep(void)
 {
-   pgd_t *pgd;
-   pud_t *pud;
-   pmd_t *pmd, _pmd;
-   pte_t *pte;
-   pgtable_t pgtable;
-   struct page *new_page;
-   spinlock_t *ptl;
-   int isolated;
-   unsigned long hstart, hend;
+   wait_event_freezable_timeout(khugepaged_wait, false,
+   msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+}

-   VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-#ifndef CONFIG_NUMA
-   up_read(&mm->mmap_sem);
-   VM_BUG_ON(!*hpage);
-   new_page = *hpage;
-#else
+#ifdef CONFIG_NUMA
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+   if (IS_ERR(*hpage)) {
+   if (!*wait)
+   return false;
+
+   *wait = false;
+   khugepaged_alloc_sleep();
+   } else if (*hpage) {
+   put_page(*hpage);
+   *hpage = NULL;
+   }
+
+   return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+  struct vm_area_struct *vma, unsigned long address,
+  int node)
+{
VM_BUG_ON(*hpage);
/*
 * Allocate the page while the vma is still valid and under
@@ -1866,7 +1872,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 * mmap_sem in read mode is good idea also to allow greater
 * scalability.
 */
-   new_page = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
+   *hpage  = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
  node, __GFP_OTHER_NODE);

/*
@@ -1874,15 +1880,81 @@ static void collapse_huge_page(struct mm_struct *mm,
 * preparation for taking it in write mode.
 */
up_read(&mm->mmap_sem);
-   if (unlikely(!new_page)) {
+   if (unlikely(!*hpage)) {
count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
*hpage = ERR_PTR(-ENOMEM);
-   return;
+   return NULL;
}
-   *hpage = new_page;
+
count_vm_event(THP_COLLAPSE_ALLOC);
+   return *hpage;
+}
+#else
+static struct page *khugepaged_alloc_hugepage(bool *wait)
+{
+   struct page *hpage;
+
+   do {
+   hpage = alloc_hugepage(khugepaged_defrag());
+   if (!hpage) {
+   count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+   if (!*wait)
+   return NULL;
+
+   *wait = false;
+   khugepaged_alloc_sleep();
+   } else
+   count_vm_event(THP_COLLAPSE_ALLOC);
+   } while (unlikely(!hpage) && likely(khugepaged_enabled()));
+
+   return hpage;
+}
+
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+   if (!*hpage)
+   *hpage = khugepaged_alloc_hugepage(wait);
+
+   if (unlikely(!*hpage))
+   return false;
+
+   return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+  struct vm_area_struct *vma, unsigned long address,
+  int node)
+{
+   up_read(&mm->mmap_sem);
+   VM_BUG_ON(!*hpage);
+   return  *hpage;
+}
 #endif

+static void collapse_huge_page(struct mm_struct *mm,
+  unsigned long address,
+  struct page **hpage,
+  struct vm_area_struct *vma,
+  int node)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd, _pmd;
+   pte_t *pte;
+   pgtable_t pgtable;
+   struct page *new_page;
+   spinlock_t *ptl;
+   int isolated;
+   unsigned long hstart, hend;
+
+   VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+   /* release the mmap_sem read lock. */
+   new_page = khugepaged_alloc_page(hpage, mm, vma, address, node);
+   if (!new_page)
+   return;
+
if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)))
return;

@@ -2230,34 +2302,6 @@ static int khugepaged_wait_event(void)
  
