Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On 09/13/2012 02:27 PM, Hugh Dickins wrote:
> On Wed, 12 Sep 2012, Xiao Guangrong wrote:
>> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
>>
>>> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
>>> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
>>> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>>>
>>> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
>>>
>>> So maybe 9/12 is just obscuring what was already a BUG, either earlier
>>> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
>>> earlier releases, nor without CONFIG_NUMA). I've not spent any time
>>> looking for it, maybe it's obvious - can you spot and fix it?
>>
>> Hugh,
>>
>> I think I have already found the reason;
>
> Great, thank you.
>
>> if I am correct, the bug existed before my patch.
>
> Before your patchset? Are you sure of that?

No. :) I have told Andrew in 0/3 that the fix patch need not be backported.
Sorry again for my mistake.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On Wed, 12 Sep 2012, Hugh Dickins wrote:
>> @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>>  			return false;
>>
>>  		*wait = false;
>> +		*hpage = NULL;
>>  		khugepaged_alloc_sleep();
>>  	} else if (*hpage) {
>>  		put_page(*hpage);
>
> The unshown line just below this is
> 		*hpage = NULL;
>
> I do wish you would take the "*hpage = NULL;" out of if and else blocks
> and place it once below both.

Hold on, I'm being unreasonable: that's an "else if", and I've no good
reason to request you to set *hpage = NULL when it's already NULL.
It would be okay if you did, but there's no reason for me to prefer it.

Hugh
Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On Wed, 12 Sep 2012, Xiao Guangrong wrote:
> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
>
>> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
>> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
>> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>>
>> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
>>
>> So maybe 9/12 is just obscuring what was already a BUG, either earlier
>> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
>> earlier releases, nor without CONFIG_NUMA). I've not spent any time
>> looking for it, maybe it's obvious - can you spot and fix it?
>
> Hugh,
>
> I think I have already found the reason;

Great, thank you.

> if I am correct, the bug existed before my patch.

Before your patchset? Are you sure of that?

> Could you please try the patch below?

I put it on this morning, and ran load all day without a crash:
I think you indeed found the cause.

> And, could you please allow me to fix the bug first, then post another
> patch to improve the things you dislike?

Good plan. I've not yet glanced at your 2/3 and 3/3, I'm afraid.
I think akpm has helpfully included just 1/3 in the new mmotm.

> Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator
>
> If NUMA is enabled, the indicator is not reset if the previous page
> request is failed, then it will trigger the BUG_ON in khugepaged_alloc_page
>
> Signed-off-by: Xiao Guangrong
> ---
>  mm/huge_memory.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e366ca5..66d2bc6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>  			return false;
>
>  		*wait = false;
> +		*hpage = NULL;
>  		khugepaged_alloc_sleep();
>  	} else if (*hpage) {
>  		put_page(*hpage);

The unshown line just below this is
		*hpage = NULL;

I do wish you would take the "*hpage = NULL;" out of if and else blocks
and place it once below both.

Thanks,
Hugh
Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On 09/12/2012 10:03 AM, Hugh Dickins wrote:

> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
>
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA). I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Hugh,

I think I have already found the reason; if I am correct, the bug
existed before my patch. Could you please try the patch below?

And, could you please allow me to fix the bug first, then post another
patch to improve the things you dislike?

Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator

If NUMA is enabled, the indicator is not reset if the previous page
request is failed, then it will trigger the BUG_ON in khugepaged_alloc_page

Signed-off-by: Xiao Guangrong
---
 mm/huge_memory.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e366ca5..66d2bc6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 			return false;
 
 		*wait = false;
+		*hpage = NULL;
 		khugepaged_alloc_sleep();
 	} else if (*hpage) {
 		put_page(*hpage);
-- 
1.7.7.6
Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> On Mon, 13 Aug 2012, Xiao Guangrong wrote:
>
>> They are used to abstract the difference between NUMA enabled and NUMA
>> disabled to make the code more readable
>>
>> Signed-off-by: Xiao Guangrong
>> ---
>>  mm/huge_memory.c |  166 ++++++++++++++++++++++++++++--------------------
>>  1 files changed, 98 insertions(+), 68 deletions(-)
>
> Hmm, that in itself is not necessarily an improvement.
>
> I'm a bit sceptical about this patch,
> thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
> in last Thursday's mmotm 2012-09-06-16-46.
>
> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

I will look into it, thanks for pointing it out.

> (If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
> patches in on top, and so the line number was actually shifted to 1839:
> but I don't believe his patches were in any way involved here, and
> indeed I've not yet found a problem with them: they look very good.)
>
> I expect the BUG could quite easily be fixed up by making another call
> to khugepaged_prealloc_page() from somewhere to free up the hpage;
> but forgive me if I dislike using "prealloc" to free.
>
> I do agree with you that the several CONFIG_NUMA ifdefs dotted around
> mm/huge_memory.c are regrettable, but I'm not at all sure that you're
> improving the situation with this patch, which gives misleading names
> to functions and moves the mmap_sem upping out of line.
>
> I think you need to revisit it: maybe not go so far (leaving a few
> CONFIG_NUMAs behind, if they're not too bad), or maybe go further
> (add a separate function for freeing in the NUMA case, instead of
> using "prealloc"). I don't know what's best: have a play and see.

Sorry for that, I will find a better way to do this.

> That's what I was intending to write yesterday. But overnight I
> was running with this 9/12 backed out (I think 10,11,12 should be
> independent), and found "BUG at mm/huge_memory.c:1835!" this morning.
>
> That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
> when 9/12 is reverted.
>
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA). I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Sure, I will fix it as soon as possible. Thanks!
Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
On Mon, 13 Aug 2012, Xiao Guangrong wrote:

> They are used to abstract the difference between NUMA enabled and NUMA
> disabled to make the code more readable
>
> Signed-off-by: Xiao Guangrong
> ---
>  mm/huge_memory.c |  166 ++++++++++++++++++++++++++++--------------------
>  1 files changed, 98 insertions(+), 68 deletions(-)

Hmm, that in itself is not necessarily an improvement.

I'm a bit sceptical about this patch,
thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
in last Thursday's mmotm 2012-09-06-16-46.

What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
forcing out to swap), while I happened to have CONFIG_NUMA=y.

That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

(If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
patches in on top, and so the line number was actually shifted to 1839:
but I don't believe his patches were in any way involved here, and
indeed I've not yet found a problem with them: they look very good.)

I expect the BUG could quite easily be fixed up by making another call
to khugepaged_prealloc_page() from somewhere to free up the hpage;
but forgive me if I dislike using "prealloc" to free.

I do agree with you that the several CONFIG_NUMA ifdefs dotted around
mm/huge_memory.c are regrettable, but I'm not at all sure that you're
improving the situation with this patch, which gives misleading names
to functions and moves the mmap_sem upping out of line.

I think you need to revisit it: maybe not go so far (leaving a few
CONFIG_NUMAs behind, if they're not too bad), or maybe go further
(add a separate function for freeing in the NUMA case, instead of
using "prealloc"). I don't know what's best: have a play and see.

That's what I was intending to write yesterday. But overnight I
was running with this 9/12 backed out (I think 10,11,12 should be
independent), and found "BUG at mm/huge_memory.c:1835!" this morning.

That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
when 9/12 is reverted.

So maybe 9/12 is just obscuring what was already a BUG, either earlier
in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
earlier releases, nor without CONFIG_NUMA). I've not spent any time
looking for it, maybe it's obvious - can you spot and fix it?

Hugh
[PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
They are used to abstract the difference between NUMA enabled and NUMA
disabled to make the code more readable

Signed-off-by: Xiao Guangrong
---
 mm/huge_memory.c |  166 ++++++++++++++++++++++++++++--------------------
 1 files changed, 98 insertions(+), 68 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 050b8d0..82f6cce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1833,28 +1833,34 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 	}
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       struct vm_area_struct *vma,
-			       int node)
+static void khugepaged_alloc_sleep(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd, _pmd;
-	pte_t *pte;
-	pgtable_t pgtable;
-	struct page *new_page;
-	spinlock_t *ptl;
-	int isolated;
-	unsigned long hstart, hend;
+	wait_event_freezable_timeout(khugepaged_wait, false,
+			msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+}
 
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-#ifndef CONFIG_NUMA
-	up_read(&mm->mmap_sem);
-	VM_BUG_ON(!*hpage);
-	new_page = *hpage;
-#else
+#ifdef CONFIG_NUMA
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+	if (IS_ERR(*hpage)) {
+		if (!*wait)
+			return false;
+
+		*wait = false;
+		khugepaged_alloc_sleep();
+	} else if (*hpage) {
+		put_page(*hpage);
+		*hpage = NULL;
+	}
+
+	return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+		       struct vm_area_struct *vma, unsigned long address,
+		       int node)
+{
 	VM_BUG_ON(*hpage);
 	/*
 	 * Allocate the page while the vma is still valid and under
@@ -1866,7 +1872,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * mmap_sem in read mode is good idea also to allow greater
 	 * scalability.
 	 */
-	new_page = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
+	*hpage = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
 				      node, __GFP_OTHER_NODE);
 
 	/*
@@ -1874,15 +1880,81 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * preparation for taking it in write mode.
 	 */
 	up_read(&mm->mmap_sem);
-	if (unlikely(!new_page)) {
+	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-		return;
+		return NULL;
 	}
 
-	*hpage = new_page;
+	count_vm_event(THP_COLLAPSE_ALLOC);
+	return *hpage;
+}
+#else
+static struct page *khugepaged_alloc_hugepage(bool *wait)
+{
+	struct page *hpage;
+
+	do {
+		hpage = alloc_hugepage(khugepaged_defrag());
+		if (!hpage) {
+			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+			if (!*wait)
+				return NULL;
+
+			*wait = false;
+			khugepaged_alloc_sleep();
+		} else
+			count_vm_event(THP_COLLAPSE_ALLOC);
+	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
+
+	return hpage;
+}
+
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+	if (!*hpage)
+		*hpage = khugepaged_alloc_hugepage(wait);
+
+	if (unlikely(!*hpage))
+		return false;
+
+	return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+		       struct vm_area_struct *vma, unsigned long address,
+		       int node)
+{
+	up_read(&mm->mmap_sem);
+	VM_BUG_ON(!*hpage);
+	return *hpage;
+}
 #endif
 
+static void collapse_huge_page(struct mm_struct *mm,
+			       unsigned long address,
+			       struct page **hpage,
+			       struct vm_area_struct *vma,
+			       int node)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd, _pmd;
+	pte_t *pte;
+	pgtable_t pgtable;
+	struct page *new_page;
+	spinlock_t *ptl;
+	int isolated;
+	unsigned long hstart, hend;
+
+	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+	/* release the mmap_sem read lock. */
+	new_page = khugepaged_alloc_page(hpage, mm, vma, address, node);
+	if (!new_page)
+		return;
+
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)))
 		return;
 
@@ -2230,34 +2302,6 @@ static int khugepaged_wait_event(void)