RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-17 Thread Luck, Tony
> Transparent huge pages are not helpful for DB workloads in which there is a lot
> of shared memory

Hmm. Perhaps they should be.  If a database allocates most[1] of the memory on a
machine to a shared memory segment - that *ought* to be a candidate for using
transparent huge pages.  Now that we have them they seem a better choice (much
more flexibility) than hugetlbfs.

-Tony

[1] I've been told that it is normal to configure over 95% of physical memory
to the shared memory region to run a particular transaction-based benchmark
with one commercial database application.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-16 Thread Luck, Tony
>>Sorry, I have no meaningful progress on this. Splitting hugepages is not
>>a trivial operation, and introduces more complexity into the hugetlbfs code.
>>I can't think of any use case for it other than memory failure, so I'm not
>>sure that it's worth doing now.
>
> Agreed. ;-)

Agreed that huge pages should be split - or that it is not worth splitting them?

Actually I wonder how useful huge pages still are - transparent huge pages may
give most of the benefits without having to modify applications to use them.
Plus the kernel does know how to split them when an error occurs (which I care
about more than most people).

-Tony


Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-16 Thread Naoya Horiguchi
On Mon, Sep 16, 2013 at 09:50:06PM +, Luck, Tony wrote:
> This is good - but the real solution is to stop poisoning entire huge pages ...
> they should be broken into 4K pages and just one 4K page should be poisoned.
>
> Naoya Horiguchi: I thought that you were looking at this problem some months
> ago. Any progress?

Sorry, I have no meaningful progress on this. Splitting hugepages is not
a trivial operation, and introduces more complexity into the hugetlbfs code.
I can't think of any use case for it other than memory failure, so I'm not
sure that it's worth doing now.

Thanks,
Naoya Horiguchi


RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-16 Thread Luck, Tony
This is good - but the real solution is to stop poisoning entire huge pages ...
they should be broken into 4K pages and just one 4K page should be poisoned.

Naoya Horiguchi: I thought that you were looking at this problem some months 
ago. Any progress?

-Tony


Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-14 Thread Andi Kleen
> Reviewed-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liw...@linux.vnet.ibm.com>

Acked-by: Andi Kleen <a...@linux.intel.com>


[RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-14 Thread Wanpeng Li
madvise_hwpoison does not check whether a page is a small page or a huge page,
and unconditionally traverses the range at small-page granularity. For a huge
page this results in a printk flood of "MCE xxx: already hardware poisoned".
This patch fixes it by advancing the iterator by
PAGE_SIZE << compound_order(compound_head(page)) for huge pages.

Testcase:

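#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE	(4096 * 512)	/* one 2MB huge page */

int main(void)
{
	char *mem;

	/* Map three huge pages; MAP_HUGETLB needs free pages in the hugetlb pool. */
	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* Poison the whole range; requires CAP_SYS_ADMIN. */
	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

	return 0;
}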

Reviewed-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liw...@linux.vnet.ibm.com>
---
 mm/madvise.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6975bc8..539eeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
  */
 static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
 {
+	struct page *p;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	for (; start < end; start += PAGE_SIZE) {
-		struct page *p;
+	for (; start < end; start += PAGE_SIZE <<
+				compound_order(compound_head(p))) {
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &p);
-- 
1.8.1.2



[PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

2013-09-02 Thread Wanpeng Li
madvise_hwpoison does not check whether a page is a small page or a huge page,
and unconditionally traverses the range at small-page granularity. For a huge
page this results in a printk flood of "MCE xxx: already hardware poisoned".
This patch fixes it by advancing the iterator by
PAGE_SIZE << compound_order(compound_head(page)) for huge pages.

Testcase:

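#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE	(4096 * 512)	/* one 2MB huge page */

int main(void)
{
	char *mem;

	/* Map three huge pages; MAP_HUGETLB needs free pages in the hugetlb pool. */
	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* Poison the whole range; requires CAP_SYS_ADMIN. */
	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

	return 0;
}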

Reviewed-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liw...@linux.vnet.ibm.com>
---
 mm/madvise.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6975bc8..539eeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
  */
 static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
 {
+	struct page *p;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	for (; start < end; start += PAGE_SIZE) {
-		struct page *p;
+	for (; start < end; start += PAGE_SIZE <<
+				compound_order(compound_head(p))) {
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &p);
-- 
1.8.1.2
