RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
> Transparent huge pages are not helpful for DB workload which there is a lot
> of shared memory

Hmm. Perhaps they should be. If a database allocates most[1] of the memory
on a machine to a shared memory segment - that *ought* to be a candidate for
using transparent huge pages. Now that we have them they seem a better choice
(much more flexibility) than hugetlbfs.

-Tony

[1] I've been told that it is normal to configure over 95% of physical memory
to the shared memory region to run a particular transaction based benchmark
with one commercial data base application.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
>>Sorry, I have no meaningful progress on this. Splitting hugepages is not
>>a trivial operation, and introduce more complexity on hugetlbfs code.
>>I don't hit on any usecase of it rather than memory failure, so I'm not
>>sure that it's worth doing now.
>
> Agreed. ;-)

Agreed that huge pages should be split - or that it is not worth splitting
them?

Actually I wonder how useful huge pages still are - transparent huge pages
may give most of the benefits without having to modify applications to use
them. Plus the kernel does know how to split them when an error occurs (which
I care about more than most people).

-Tony
Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
On Mon, Sep 16, 2013 at 09:50:06PM +, Luck, Tony wrote:
> This is good - but the real solution is to stop poisoning entire huge pages
> ... they should be broken into 4K pages and just one 4K page should be
> poisoned.
>
> Naoya Horiguchi: I thought that you were looking at this problem some months
> ago. Any progress?

Sorry, I have no meaningful progress on this. Splitting hugepages is not
a trivial operation, and introduce more complexity on hugetlbfs code.
I don't hit on any usecase of it rather than memory failure, so I'm not
sure that it's worth doing now.

Thanks,
Naoya Horiguchi
RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
This is good - but the real solution is to stop poisoning entire huge pages
... they should be broken into 4K pages and just one 4K page should be
poisoned.

Naoya Horiguchi: I thought that you were looking at this problem some months
ago. Any progress?

-Tony
Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
> Reviewed-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liw...@linux.vnet.ibm.com>

Acked-by: Andi Kleen <a...@linux.intel.com>
[RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
madvise_hwpoison doesn't check whether the page is a small page or a huge
page and unconditionally traverses the range at small-page granularity, which
results in a printk flood ("MCE xxx: already hardware poisoned") if the page
is a huge page. This patch fixes it by advancing the iterator by
PAGE_SIZE << compound_order(compound_head(page)) for huge pages.

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE (4096 * 512)

int main(void)
{
	char *mem;
	int i;

	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);

	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

	return 0;
}

Reviewed-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liw...@linux.vnet.ibm.com>
---
 mm/madvise.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6975bc8..539eeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
  */
 static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
 {
+	struct page *p;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	for (; start < end; start += PAGE_SIZE) {
-		struct page *p;
+	for (; start < end; start += PAGE_SIZE <<
+				compound_order(compound_head(p))) {
 		int ret;

 		ret = get_user_pages_fast(start, 1, 0, &p);
--
1.8.1.2