Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-12 Thread Naoya Horiguchi
Hi Tony, Thank you for the comment. On Sat, Aug 11, 2012 at 10:41:49PM +, Luck, Tony wrote: > > dirty pagecache error recoverable under some conditions. Consider that > > if there is a copy of the corrupted dirty pagecache on user buffer and > > you write() over the error page with the copy

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-12 Thread Naoya Horiguchi
Hi, On Sun, Aug 12, 2012 at 05:28:44AM +0200, Andi Kleen wrote: > > > That function uses a global lock. fdatawait is quite common. This will > > > likely cause performance problems in IO workloads. > > > > OK, I should avoid it. > > Maybe just RCU the hash table. OK. > > > You need to get

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-12 Thread Naoya Horiguchi
Hi, On Sun, Aug 12, 2012 at 05:28:44AM +0200, Andi Kleen wrote: That function uses a global lock. fdatawait is quite common. This will likely cause performance problems in IO workloads. OK, I should avoid it. Maybe just RCU the hash table. OK. You need to get that lock out of

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-12 Thread Naoya Horiguchi
Hi Tony, Thank you for the comment. On Sat, Aug 11, 2012 at 10:41:49PM +, Luck, Tony wrote: dirty pagecache error recoverable under some conditions. Consider that if there is a copy of the corrupted dirty pagecache on user buffer and you write() over the error page with the copy data,

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Andi Kleen
> > That function uses a global lock. fdatawait is quite common. This will > > likely cause performance problems in IO workloads. > > OK, I should avoid it. Maybe just RCU the hash table. > > You need to get that lock out of the hot path somehow. > > > > Probably better to try to put the data

RE: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Luck, Tony
> dirty pagecache error recoverable under some conditions. Consider that > if there is a copy of the corrupted dirty pagecache on user buffer and > you write() over the error page with the copy data, then we can ignore > the effect of the error because no one consumes the corrupted data. This

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Naoya Horiguchi
Hi, On Sat, Aug 11, 2012 at 04:15:06AM -0700, Andi Kleen wrote: > Naoya Horiguchi writes: > > I'm sceptical on the patch, but here's my review. Thank you for taking your time for the review. > > - return -EHWPOISON when we access to the error-affected address with > > read(),

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Andi Kleen
Naoya Horiguchi writes: I'm sceptical on the patch, but here's my review. > - return -EHWPOISON when we access to the error-affected address with > read(), partial-page write(), fsync(), Note that a lot of user space does not like new errnos (nothing in strerror etc.). It's probably

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Andi Kleen
Naoya Horiguchi n-horigu...@ah.jp.nec.com writes: I'm sceptical on the patch, but here's my review. - return -EHWPOISON when we access to the error-affected address with read(), partial-page write(), fsync(), Note that a lot of user space does not like new errnos (nothing in strerror

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Naoya Horiguchi
Hi, On Sat, Aug 11, 2012 at 04:15:06AM -0700, Andi Kleen wrote: Naoya Horiguchi n-horigu...@ah.jp.nec.com writes: I'm sceptical on the patch, but here's my review. Thank you for taking your time for the review. - return -EHWPOISON when we access to the error-affected address with

RE: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Luck, Tony
dirty pagecache error recoverable under some conditions. Consider that if there is a copy of the corrupted dirty pagecache on user buffer and you write() over the error page with the copy data, then we can ignore the effect of the error because no one consumes the corrupted data. This sounds

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-11 Thread Andi Kleen
That function uses a global lock. fdatawait is quite common. This will likely cause performance problems in IO workloads. OK, I should avoid it. Maybe just RCU the hash table. You need to get that lock out of the hot path somehow. Probably better to try to put the data into a

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Hello, On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote: > Naoya Horiguchi writes: > > > Current error reporting of memory errors on dirty pagecache has silent > > data lost problem because AS_EIO in struct address_space is cleared > > once checked. > > Seems very complicated. I

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Andi Kleen
Naoya Horiguchi writes: > Current error reporting of memory errors on dirty pagecache has silent > data lost problem because AS_EIO in struct address_space is cleared > once checked. Seems very complicated. I think I would prefer something simpler if possible, especially unless it's proven the

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
On Fri, Aug 10, 2012 at 05:41:53PM -0400, Naoya Horiguchi wrote: ... > +/* > * Dirty cache page page > * Issues: when the error hit a hole page the error is not properly > * propagated. > */ > static int me_pagecache_dirty(struct page *p, unsigned long pfn) > { > - /* > - * The

[PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Current error reporting of memory errors on dirty pagecache has silent data lost problem because AS_EIO in struct address_space is cleared once checked. A simple solution is to make AS_EIO sticky (as Wu Fengguang proposed in https://lkml.org/lkml/2009/6/11/294), but this patch does more to make

[PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Current error reporting of memory errors on dirty pagecache has silent data lost problem because AS_EIO in struct address_space is cleared once checked. A simple solution is to make AS_EIO sticky (as Wu Fengguang proposed in https://lkml.org/lkml/2009/6/11/294), but this patch does more to make

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
On Fri, Aug 10, 2012 at 05:41:53PM -0400, Naoya Horiguchi wrote: ... +/* * Dirty cache page page * Issues: when the error hit a hole page the error is not properly * propagated. */ static int me_pagecache_dirty(struct page *p, unsigned long pfn) { - /* - * The original

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Andi Kleen
Naoya Horiguchi n-horigu...@ah.jp.nec.com writes: Current error reporting of memory errors on dirty pagecache has silent data lost problem because AS_EIO in struct address_space is cleared once checked. Seems very complicated. I think I would prefer something simpler if possible, especially

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Hello, On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote: Naoya Horiguchi n-horigu...@ah.jp.nec.com writes: Current error reporting of memory errors on dirty pagecache has silent data lost problem because AS_EIO in struct address_space is cleared once checked. Seems very