Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
>>> Hi Simon,

>>>
>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page, the value of mce_bad_pages will be added. Then the page is marked
>>> HWPoison, but it is still managed by page buddy alocator.
>>>
>>> So if we offline it again, the value of mce_bad_pages will be added again.
>>> Assume the page is not allocated during this short time.
>>>
>>> soft_offline_page()
>>> get_any_page()
>>> "else if (is_free_buddy_page(p))" branch return 0
>>> "goto done";
>>> "atomic_long_add(1, &mce_bad_pages);"
>>>
>>> I think it would be better to move "if(PageHWPoison(page))" at the 
>>> beginning of
>>> soft_offline_page(). However I don't know what do these words mean,
>>> "Synchronized using the page lock with memory_failure()"
> 
> Hi Xishi,
> 
> Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() 
> and 
> soft_offline_page() take the lock to avoid unpoison clear the flag behind 
> them.
> 
> Regards,
> Wanpeng Li 
> 

Hi Wanpeng,

As you mean, it is the necessary to get the page lock first when we check the
HWPoison flag every time, this is in order to avoid conflict, right?

So why not use a globe lock here? For example lock_memory_hotplug() is used in
online_pages() and offline_pages()?

Thanks,
Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/11 11:48, Simon Jeons wrote:

> On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
>> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
>>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> Oh, it will be putback to lru list during migration. So does your "some
> time" mean before call check_new_page?

 Yes until the next check_new_page() whenever that is. If the migration
 works it will be earlier, otherwise later.
>>>
>>> But I can't figure out any page reclaim path check if the page is set
>>> PG_hwpoison, can poisoned pages be rclaimed?
>>
>> The only way to reclaim a page is to free and reallocate it.
> 
> Then why there doesn't have check in reclaim path to avoid relcaim
> poisoned page?
> 
>   -Simon

Hi Simon,

If the page is free, it will be set PG_hwpoison, and soft_offline_page() is 
done.
When the page is alocated later, check_new_page() will find the poisoned page 
and
isolate the whole buddy block(just drop the block).

If the page is not free, soft_offline_page() try to free it first, if this is
failed, it will migrate the page, but the page is still in LRU list after 
migration,
migrate_pages()
unmap_and_move()
if (rc != -EAGAIN) {
...
putback_lru_page(page);
}
We can use lru_add_drain_all() to drain lru pagevec, at last 
free_hot_cold_page()
will be called, and free_pages_prepare() check the poisoned pages.
free_pages_prepare()
free_pages_check()
bad_page()

Is this right, Andi?

Thanks
Xishi Qiu

>>
>> -Andi
> 
> 
> 
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > > Oh, it will be putback to lru list during migration. So does your "some
> > > > time" mean before call check_new_page?
> > > 
> > > Yes until the next check_new_page() whenever that is. If the migration
> > > works it will be earlier, otherwise later.
> > 
> > But I can't figure out any page reclaim path check if the page is set
> > PG_hwpoison, can poisoned pages be rclaimed?
> 
> The only way to reclaim a page is to free and reallocate it.

Then why there doesn't have check in reclaim path to avoid relcaim
poisoned page?

-Simon

> 
> -Andi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> "There are not so many free pages in a typical server system", sorry I don't
> quite understand it.

Linux tries to keep most memory in caches. As Linus says "free memory is
bad memory"

>
> buffered_rmqueue()
>   prep_new_page()
>   check_new_page()
>   bad_page()
> 
> If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M
> memory will be dropped.

prep_new_page() is only called on whatever is allocated.
MAX_ORDER is much smaller than 2^10

If you allocate a large order page then yes the complete page is
dropped. This is today generally true in hwpoison. It would be one
possible area of improvement (probably mostly if 1GB pages become
more common than they are today)

It's usually not a problem because usually most allocations are
small order and systems have generally very few memory errors,
and even the largest MAX_ORDER pages are a small fraction of the 
total memory.

If you lose larger amounts of memory usually you quickly hit something
that HWPoison cannot handle.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/11 10:58, Andi Kleen wrote:

>> That sounds like overkill. There are not so many free pages in a
>> typical server system.
> 
> As Fengguang said -- memory error handling is tricky. Lots of things
> could be done in theory, but they all have a cost in testing and 
> maintenance. 
> 
> In general they are only worth doing if the situation is common and
> represents a significant percentage of the total pages of a relevant server
> workload.
> 
> -Andi
> 

Hi Andi and Fengguang,

"There are not so many free pages in a typical server system", sorry I don't
quite understand it.

buffered_rmqueue()
prep_new_page()
check_new_page()
bad_page()

If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M
memory will be dropped.

Thanks,
Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > Oh, it will be putback to lru list during migration. So does your "some
> > > time" mean before call check_new_page?
> > 
> > Yes until the next check_new_page() whenever that is. If the migration
> > works it will be earlier, otherwise later.
> 
> But I can't figure out any page reclaim path check if the page is set
> PG_hwpoison, can poisoned pages be rclaimed?

The only way to reclaim a page is to free and reallocate it.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > Oh, it will be putback to lru list during migration. So does your "some
> > time" mean before call check_new_page?
> 
> Yes until the next check_new_page() whenever that is. If the migration
> works it will be earlier, otherwise later.

But I can't figure out any page reclaim path check if the page is set
PG_hwpoison, can poisoned pages be rclaimed?

-Simon

> 
> -andi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> Oh, it will be putback to lru list during migration. So does your "some
> time" mean before call check_new_page?

Yes until the next check_new_page() whenever that is. If the migration
works it will be earlier, otherwise later.

-andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> That sounds like overkill. There are not so many free pages in a
> typical server system.

As Fengguang said -- memory error handling is tricky. Lots of things
could be done in theory, but they all have a cost in testing and 
maintenance. 

In general they are only worth doing if the situation is common and
represents a significant percentage of the total pages of a relevant server
workload.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Fengguang Wu
On Tue, Dec 11, 2012 at 10:25:00AM +0800, Xishi Qiu wrote:
> On 2012/12/10 23:38, Andi Kleen wrote:
> 
> >> It is another topic, I mean since the page is poisoned, so why not isolate 
> >> it
> >> from page buddy alocator in soft_offline_page() rather than in 
> >> check_new_page().
> >> I find soft_offline_page() only migrate the page and mark HWPoison, the 
> >> poisoned
> >> page is still managed by page buddy alocator.
> > 
> > Doing it in check_new_page is the only way if the page is currently
> > allocated by someone. Since that's not uncommon it's simplest to always
> > do it this way.
> > 
> > -Andi
> > 
> 
> Hi Andi,
> 
> The poisoned page is isolated in check_new_page, however the whole buddy 
> block will
> be dropped, it seems to be a waste of memory.
> 
> Can we separate the poisoned page from the buddy block, then *only* drop the 
> poisoned
> page?

That sounds like overkill. There are not so many free pages in a
typical server system.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/10 23:38, Andi Kleen wrote:

>> It is another topic, I mean since the page is poisoned, so why not isolate it
>> from page buddy alocator in soft_offline_page() rather than in 
>> check_new_page().
>> I find soft_offline_page() only migrate the page and mark HWPoison, the 
>> poisoned
>> page is still managed by page buddy alocator.
> 
> Doing it in check_new_page is the only way if the page is currently
> allocated by someone. Since that's not uncommon it's simplest to always
> do it this way.
> 
> -Andi
> 

Hi Andi,

The poisoned page is isolated in check_new_page, however the whole buddy block 
will
be dropped, it seems to be a waste of memory.

Can we separate the poisoned page from the buddy block, then *only* drop the 
poisoned
page?

Thanks
Xishi Qiu


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Tue, 2012-12-11 at 03:03 +0100, Andi Kleen wrote:
> > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> > page will not be accessed by memory management subsystem until unpoison,
> > correct?
> 
> No, soft offlining can still allow accesses for some time. It'll never kill
> anything.

Oh, it will be putback to lru list during migration. So does your "some
time" mean before call check_new_page?

   -Simon 

> 
> Hard tries much harder and will kill.
> 
> In some cases (unshrinkable kernel allocation) they end up doing the same
> because there isn't any other alternative though. However these are
> expected to only apply to a small percentage of pages in a typical
> system.
> 
> -Andi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> page will not be accessed by memory management subsystem until unpoison,
> correct?

No, soft offlining can still allow accesses for some time. It'll never kill
anything.

Hard tries much harder and will kill.

In some cases (unshrinkable kernel allocation) they end up doing the same
because there isn't any other alternative though. However these are
expected to only apply to a small percentage of pages in a typical
system.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Mon, 2012-12-10 at 16:38 +0100, Andi Kleen wrote:
> > It is another topic, I mean since the page is poisoned, so why not isolate 
> > it
> > from page buddy alocator in soft_offline_page() rather than in 
> > check_new_page().
> > I find soft_offline_page() only migrate the page and mark HWPoison, the 
> > poisoned
> > page is still managed by page buddy alocator.
> 
> Doing it in check_new_page is the only way if the page is currently
> allocated by someone. Since that's not uncommon it's simplest to always
> do it this way.

Hi Andi,

IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
page will not be accessed by memory management subsystem until unpoison,
correct?

 -Simon

> 
> -Andi
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> HWPoison delays any action on buddy allocator pages, handling can be safely 
> postponed 
> until a later time when the page might be referenced. By delaying, some 
> transient errors 
> may not reoccur or may be irrelevant.

That's not true for soft offlining, only for hard.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Andi Kleen
> It is another topic, I mean since the page is poisoned, so why not isolate it
> from page buddy alocator in soft_offline_page() rather than in 
> check_new_page().
> I find soft_offline_page() only migrate the page and mark HWPoison, the 
> poisoned
> page is still managed by page buddy alocator.

Doing it in check_new_page is the only way if the page is currently
allocated by someone. Since that's not uncommon it's simplest to always
do it this way.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
Cc other guys.

On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote:
> On 2012/12/10 19:56, Simon Jeons wrote:
> 
> > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 18:47, Simon Jeons wrote:
> >>
> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
>  On 2012/12/10 16:33, Wanpeng Li wrote:
> 
> > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >> On Fri, 7 Dec 2012 16:48:45 +0800
> >> Xishi Qiu  wrote:
> >>
> >>> On x86 platform, if we use 
> >>> "/sys/devices/system/memory/soft_offline_page" to offline a
> >>> free page twice, the value of mce_bad_pages will be added twice. So 
> >>> this is an error,
> >>> since the page was already marked HWPoison, we should skip the page 
> >>> and don't add the
> >>> value of mce_bad_pages.
> >>>
> >>> $ cat /proc/meminfo | grep HardwareCorrupted
> >>>
> >>> soft_offline_page()
> >>>   get_any_page()
> >>>   atomic_long_add(1, &mce_bad_pages)
> >>>
> >>> ...
> >>>
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int 
> >>> flags)
> >>>   return ret;
> >>>
> >>>  done:
> >>> - atomic_long_add(1, &mce_bad_pages);
> >>> - SetPageHWPoison(page);
> >>>   /* keep elevated page count for bad page */
> >>> + if (!PageHWPoison(page)) {
> >>> + atomic_long_add(1, &mce_bad_pages);
> >>> + SetPageHWPoison(page);
> >>> + }
> >>> +
> >>>   return ret;
> >>>  }
> >>
> >> A few things:
> >>
> >> - soft_offline_page() already checks for this case:
> >>
> >>if (PageHWPoison(page)) {
> >>unlock_page(page);
> >>put_page(page);
> >>pr_info("soft offline: %#lx page already poisoned\n", 
> >> pfn);
> >>return -EBUSY;
> >>}
> >>
> >>  so why didn't this check work for you?
> >>
> >>  Presumably because one of the earlier "goto done" branches was
> >>  taken.  Which one, any why?
> >>
> >>  This function is an utter mess.  It contains six return points
> >>  randomly intermingled with three "goto done" return points.
> >>
> >>  This mess is probably the cause of the bug you have observed.  Can
> >>  we please fix it up somehow?  It *seems* that the design (lol) of
> >>  this function is "for errors, return immediately.  For success, goto
> >>  done".  In which case "done" should have been called "success".  But
> >>  if you just look at the function you'll see that this approach didn't
> >>  work.  I suggest it be converted to have two return points - one for
> >>  the success path, one for the failure path.  Or something.
> >>
> >> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
> >>  and might suffer the same bug.
> >>
> >> - A cleaner, shorter and possibly faster implementation is
> >>
> >>if (!TestSetPageHWPoison(page))
> >>atomic_long_add(1, &mce_bad_pages);
> >>
> >
> > Hi Andrew,
> >
> > Since hwpoison bit for free buddy page has already be set in 
> > get_any_page, 
> > !TestSetPageHWPoison(page) will not increase mce_bad_pages count even 
> > for 
> > the first time.
> >
> > Regards,
> > Wanpeng Li
> >
> 
>  The poisoned page is isolated in bad_page(), I wonder whether it could 
>  be isolated
>  immediately in soft_offline_page() and memory_failure()?
> 
>  buffered_rmqueue()
>   prep_new_page()
>   check_new_page()
>   bad_page()
> >>>
> >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> >>>
> >>
> >> Hi Simon,
> >>
> >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* 
> >> redundancy.
> >>
> >> It is another topic, I mean since the page is poisoned, so why not isolate 
> >> it
> > 
> > What topic? I still can't figure out when this branch can be executed
> > since hwpoison inject path can't poison free buddy pages. 
> > 
> 
> Hi Simon,
> 
> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
> free page, the value of mce_bad_pages will be added. Then the page is marked
> HWPoison, but it is still managed by page buddy alocator.
> 
> So if we offline it again, the value of mce_bad_pages will be added again.
> Assume the page is not allocated during this short time.
> 
> soft_offline_page()
>   get_any_page()
>   "else if (is_free_buddy_page(p))" branch return 0
>   "goto done";
>   "atomic_long_add(1, &mce_bad_pages);"
> 
> I think it would be better to move "if(PageHWPoison(page))" at the beginning 
> of
> soft_offline_page(). Howev

Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Borislav Petkov
On Mon, Dec 10, 2012 at 07:54:53PM +0800, Xishi Qiu wrote:
> One more question, can we add a list_head to manager the poisoned pages?

What would you need that list for? Also, a list is not the most optimal
data structure for when you need to traverse it often.

Thanks.

-- 
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> On 2012/12/10 18:47, Simon Jeons wrote:
> 
> > On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 16:33, Wanpeng Li wrote:
> >>
> >>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>  On Fri, 7 Dec 2012 16:48:45 +0800
>  Xishi Qiu  wrote:
> 
> > On x86 platform, if we use 
> > "/sys/devices/system/memory/soft_offline_page" to offline a
> > free page twice, the value of mce_bad_pages will be added twice. So 
> > this is an error,
> > since the page was already marked HWPoison, we should skip the page and 
> > don't add the
> > value of mce_bad_pages.
> >
> > $ cat /proc/meminfo | grep HardwareCorrupted
> >
> > soft_offline_page()
> > get_any_page()
> > atomic_long_add(1, &mce_bad_pages)
> >
> > ...
> >
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int 
> > flags)
> > return ret;
> >
> >  done:
> > -   atomic_long_add(1, &mce_bad_pages);
> > -   SetPageHWPoison(page);
> > /* keep elevated page count for bad page */
> > +   if (!PageHWPoison(page)) {
> > +   atomic_long_add(1, &mce_bad_pages);
> > +   SetPageHWPoison(page);
> > +   }
> > +
> > return ret;
> >  }
> 
>  A few things:
> 
>  - soft_offline_page() already checks for this case:
> 
>   if (PageHWPoison(page)) {
>   unlock_page(page);
>   put_page(page);
>   pr_info("soft offline: %#lx page already poisoned\n", pfn);
>   return -EBUSY;
>   }
> 
>   so why didn't this check work for you?
> 
>   Presumably because one of the earlier "goto done" branches was
>   taken.  Which one, any why?
> 
>   This function is an utter mess.  It contains six return points
>   randomly intermingled with three "goto done" return points.
> 
>   This mess is probably the cause of the bug you have observed.  Can
>   we please fix it up somehow?  It *seems* that the design (lol) of
>   this function is "for errors, return immediately.  For success, goto
>   done".  In which case "done" should have been called "success".  But
>   if you just look at the function you'll see that this approach didn't
>   work.  I suggest it be converted to have two return points - one for
>   the success path, one for the failure path.  Or something.
> 
>  - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>   and might suffer the same bug.
> 
>  - A cleaner, shorter and possibly faster implementation is
> 
>   if (!TestSetPageHWPoison(page))
>   atomic_long_add(1, &mce_bad_pages);
> 
> >>>
> >>> Hi Andrew,
> >>>
> >>> Since hwpoison bit for free buddy page has already be set in 
> >>> get_any_page, 
> >>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> >>> the first time.
> >>>
> >>> Regards,
> >>> Wanpeng Li
> >>>
> >>
> >> The poisoned page is isolated in bad_page(), I wonder whether it could be 
> >> isolated
> >> immediately in soft_offline_page() and memory_failure()?
> >>
> >> buffered_rmqueue()
> >>prep_new_page()
> >>check_new_page()
> >>bad_page()
> > 
> > Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> > 
> 
> Hi Simon,
> 
> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.
> 
> It is another topic, I mean since the page is poisoned, so why not isolate it

What topic? I still can't figure out when this branch can be executed
since hwpoison inject path can't poison free buddy pages.

> from page buddy alocator in soft_offline_page() rather than in 
> check_new_page().
> 
> I find soft_offline_page() only migrate the page and mark HWPoison, the 
> poisoned
> page is still managed by page buddy alocator.
> 
> >>
> >> Thanks
> >> Xishi Qiu
> >>
>  - We have atomic_long_inc().  Use it?
> 
>  - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>   concept, and this code is in mm/.  Lights are flashing, bells are
>   ringing and a loudspeaker is blaring "layering violation" at us!
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/10 19:39, Wanpeng Li wrote:

> On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote:
>> On 2012/12/10 18:47, Simon Jeons wrote:
>>
>>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
 On 2012/12/10 16:33, Wanpeng Li wrote:

> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>> On Fri, 7 Dec 2012 16:48:45 +0800
>> Xishi Qiu  wrote:
>>
>>> On x86 platform, if we use 
>>> "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page twice, the value of mce_bad_pages will be added twice. So 
>>> this is an error,
>>> since the page was already marked HWPoison, we should skip the page and 
>>> don't add the
>>> value of mce_bad_pages.
>>>
>>> $ cat /proc/meminfo | grep HardwareCorrupted
>>>
>>> soft_offline_page()
>>> get_any_page()
>>> atomic_long_add(1, &mce_bad_pages)
>>>
>>> ...
>>>
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int 
>>> flags)
>>> return ret;
>>>
>>>  done:
>>> -   atomic_long_add(1, &mce_bad_pages);
>>> -   SetPageHWPoison(page);
>>> /* keep elevated page count for bad page */
>>> +   if (!PageHWPoison(page)) {
>>> +   atomic_long_add(1, &mce_bad_pages);
>>> +   SetPageHWPoison(page);
>>> +   }
>>> +
>>> return ret;
>>>  }
>>
>> A few things:
>>
>> - soft_offline_page() already checks for this case:
>>
>>  if (PageHWPoison(page)) {
>>  unlock_page(page);
>>  put_page(page);
>>  pr_info("soft offline: %#lx page already poisoned\n", pfn);
>>  return -EBUSY;
>>  }
>>
>>  so why didn't this check work for you?
>>
>>  Presumably because one of the earlier "goto done" branches was
>>  taken.  Which one, any why?
>>
>>  This function is an utter mess.  It contains six return points
>>  randomly intermingled with three "goto done" return points.
>>
>>  This mess is probably the cause of the bug you have observed.  Can
>>  we please fix it up somehow?  It *seems* that the design (lol) of
>>  this function is "for errors, return immediately.  For success, goto
>>  done".  In which case "done" should have been called "success".  But
>>  if you just look at the function you'll see that this approach didn't
>>  work.  I suggest it be converted to have two return points - one for
>>  the success path, one for the failure path.  Or something.
>>
>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>>  and might suffer the same bug.
>>
>> - A cleaner, shorter and possibly faster implementation is
>>
>>  if (!TestSetPageHWPoison(page))
>>  atomic_long_add(1, &mce_bad_pages);
>>
>
> Hi Andrew,
>
> Since hwpoison bit for free buddy page has already be set in 
> get_any_page, 
> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> the first time.
>
> Regards,
> Wanpeng Li
>

 The poisoned page is isolated in bad_page(), I wonder whether it could be 
 isolated
 immediately in soft_offline_page() and memory_failure()?

 buffered_rmqueue()
prep_new_page()
check_new_page()
bad_page()
>>>
>>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
>>>
>>
>> Hi Simon,
>>
>> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* 
>> redundancy.
>>
>> It is another topic, I mean since the page is poisoned, so why not isolate it
>>from page buddy alocator in soft_offline_page() rather than in 
>>check_new_page().
>>
>> I find soft_offline_page() only migrate the page and mark HWPoison, the 
>> poisoned
>> page is still managed by page buddy alocator.
>>
> 
> Hi Xishi,
> 
> HWPoison delays any action on buddy allocator pages, handling can be safely 
> postponed 
> until a later time when the page might be referenced. By delaying, some 
> transient errors 
> may not reoccur or may be irrelevant.
> 
> Regards,
> Wanpeng Li 
> 

Hi Wanpeng, thanks for your explanation.

One more question, can we add a list_head to manager the poisoned pages? I find 
ia64
has the array which named "static struct page *page_isolate[MAX_PAGE_ISOLATE]".

Andrew, what do you think?

Thanks
Xishi Qiu


 Thanks
 Xishi Qiu

>> - We have atomic_long_inc().  Use it?
>>
>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>>  concept, and this code is in mm/.  Lights are flashing, bells are
>>  ringing and a loudspeaker is blaring "layering violation" at us!
>>
>>
> 
> 
> .
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the bo

Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/10 18:47, Simon Jeons wrote:

> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
>> On 2012/12/10 16:33, Wanpeng Li wrote:
>>
>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
 On Fri, 7 Dec 2012 16:48:45 +0800
 Xishi Qiu  wrote:

> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" 
> to offline a
> free page twice, the value of mce_bad_pages will be added twice. So this 
> is an error,
> since the page was already marked HWPoison, we should skip the page and 
> don't add the
> value of mce_bad_pages.
>
> $ cat /proc/meminfo | grep HardwareCorrupted
>
> soft_offline_page()
>   get_any_page()
>   atomic_long_add(1, &mce_bad_pages)
>
> ...
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>   return ret;
>
>  done:
> - atomic_long_add(1, &mce_bad_pages);
> - SetPageHWPoison(page);
>   /* keep elevated page count for bad page */
> + if (!PageHWPoison(page)) {
> + atomic_long_add(1, &mce_bad_pages);
> + SetPageHWPoison(page);
> + }
> +
>   return ret;
>  }

 A few things:

 - soft_offline_page() already checks for this case:

if (PageHWPoison(page)) {
unlock_page(page);
put_page(page);
pr_info("soft offline: %#lx page already poisoned\n", pfn);
return -EBUSY;
}

  so why didn't this check work for you?

  Presumably because one of the earlier "goto done" branches was
  taken.  Which one, any why?

  This function is an utter mess.  It contains six return points
  randomly intermingled with three "goto done" return points.

  This mess is probably the cause of the bug you have observed.  Can
  we please fix it up somehow?  It *seems* that the design (lol) of
  this function is "for errors, return immediately.  For success, goto
  done".  In which case "done" should have been called "success".  But
  if you just look at the function you'll see that this approach didn't
  work.  I suggest it be converted to have two return points - one for
  the success path, one for the failure path.  Or something.

 - soft_offline_huge_page() is a miniature copy of soft_offline_page()
  and might suffer the same bug.

 - A cleaner, shorter and possibly faster implementation is

if (!TestSetPageHWPoison(page))
atomic_long_add(1, &mce_bad_pages);

>>>
>>> Hi Andrew,
>>>
>>> Since hwpoison bit for free buddy page has already be set in get_any_page, 
>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
>>> the first time.
>>>
>>> Regards,
>>> Wanpeng Li
>>>
>>
>> The poisoned page is isolated in bad_page(), I wonder whether it could be 
>> isolated
>> immediately in soft_offline_page() and memory_failure()?
>>
>> buffered_rmqueue()
>>  prep_new_page()
>>  check_new_page()
>>  bad_page()
> 
> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> 

Hi Simon,

get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.

It is another topic, I mean since the page is poisoned, so why not isolate it
from page buddy alocator in soft_offline_page() rather than in check_new_page().

I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
page is still managed by page buddy alocator.

>>
>> Thanks
>> Xishi Qiu
>>
 - We have atomic_long_inc().  Use it?

 - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
  concept, and this code is in mm/.  Lights are flashing, bells are
  ringing and a loudspeaker is blaring "layering violation" at us!



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Simon Jeons
On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> On 2012/12/10 16:33, Wanpeng Li wrote:
> 
> > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >> On Fri, 7 Dec 2012 16:48:45 +0800
> >> Xishi Qiu  wrote:
> >>
> >>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" 
> >>> to offline a
> >>> free page twice, the value of mce_bad_pages will be added twice. So this 
> >>> is an error,
> >>> since the page was already marked HWPoison, we should skip the page and 
> >>> don't add the
> >>> value of mce_bad_pages.
> >>>
> >>> $ cat /proc/meminfo | grep HardwareCorrupted
> >>>
> >>> soft_offline_page()
> >>>   get_any_page()
> >>>   atomic_long_add(1, &mce_bad_pages)
> >>>
> >>> ...
> >>>
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
> >>>   return ret;
> >>>
> >>>  done:
> >>> - atomic_long_add(1, &mce_bad_pages);
> >>> - SetPageHWPoison(page);
> >>>   /* keep elevated page count for bad page */
> >>> + if (!PageHWPoison(page)) {
> >>> + atomic_long_add(1, &mce_bad_pages);
> >>> + SetPageHWPoison(page);
> >>> + }
> >>> +
> >>>   return ret;
> >>>  }
> >>
> >> A few things:
> >>
> >> - soft_offline_page() already checks for this case:
> >>
> >>if (PageHWPoison(page)) {
> >>unlock_page(page);
> >>put_page(page);
> >>pr_info("soft offline: %#lx page already poisoned\n", pfn);
> >>return -EBUSY;
> >>}
> >>
> >>  so why didn't this check work for you?
> >>
> >>  Presumably because one of the earlier "goto done" branches was
> >>  taken.  Which one, any why?
> >>
> >>  This function is an utter mess.  It contains six return points
> >>  randomly intermingled with three "goto done" return points.
> >>
> >>  This mess is probably the cause of the bug you have observed.  Can
> >>  we please fix it up somehow?  It *seems* that the design (lol) of
> >>  this function is "for errors, return immediately.  For success, goto
> >>  done".  In which case "done" should have been called "success".  But
> >>  if you just look at the function you'll see that this approach didn't
> >>  work.  I suggest it be converted to have two return points - one for
> >>  the success path, one for the failure path.  Or something.
> >>
> >> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
> >>  and might suffer the same bug.
> >>
> >> - A cleaner, shorter and possibly faster implementation is
> >>
> >>if (!TestSetPageHWPoison(page))
> >>atomic_long_add(1, &mce_bad_pages);
> >>
> > 
> > Hi Andrew,
> > 
> > Since hwpoison bit for free buddy page has already be set in get_any_page, 
> > !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> > the first time.
> > 
> > Regards,
> > Wanpeng Li
> > 
> 
> The poisoned page is isolated in bad_page(), I wonder whether it could be 
> isolated
> immediately in soft_offline_page() and memory_failure()?
> 
> buffered_rmqueue()
>   prep_new_page()
>   check_new_page()
>   bad_page()

Do you mean else if(is_free_buddy_page(p)) branch is redundancy?

> 
> Thanks
> Xishi Qiu
> 
> >> - We have atomic_long_inc().  Use it?
> >>
> >> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
> >>  concept, and this code is in mm/.  Lights are flashing, bells are
> >>  ringing and a loudspeaker is blaring "layering violation" at us!
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majord...@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 
> > 
> > 
> > .
> > 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-10 Thread Xishi Qiu
On 2012/12/10 16:33, Wanpeng Li wrote:

> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>> On Fri, 7 Dec 2012 16:48:45 +0800
>> Xishi Qiu  wrote:
>>
>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" 
>>> to offline a
>>> free page twice, the value of mce_bad_pages will be added twice. So this is 
>>> an error,
>>> since the page was already marked HWPoison, we should skip the page and 
>>> don't add the
>>> value of mce_bad_pages.
>>>
>>> $ cat /proc/meminfo | grep HardwareCorrupted
>>>
>>> soft_offline_page()
>>> get_any_page()
>>> atomic_long_add(1, &mce_bad_pages)
>>>
>>> ...
>>>
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>> return ret;
>>>
>>>  done:
>>> -   atomic_long_add(1, &mce_bad_pages);
>>> -   SetPageHWPoison(page);
>>> /* keep elevated page count for bad page */
>>> +   if (!PageHWPoison(page)) {
>>> +   atomic_long_add(1, &mce_bad_pages);
>>> +   SetPageHWPoison(page);
>>> +   }
>>> +
>>> return ret;
>>>  }
>>
>> A few things:
>>
>> - soft_offline_page() already checks for this case:
>>
>>  if (PageHWPoison(page)) {
>>  unlock_page(page);
>>  put_page(page);
>>  pr_info("soft offline: %#lx page already poisoned\n", pfn);
>>  return -EBUSY;
>>  }
>>
>>  so why didn't this check work for you?
>>
>>  Presumably because one of the earlier "goto done" branches was
>>  taken.  Which one, any why?
>>
>>  This function is an utter mess.  It contains six return points
>>  randomly intermingled with three "goto done" return points.
>>
>>  This mess is probably the cause of the bug you have observed.  Can
>>  we please fix it up somehow?  It *seems* that the design (lol) of
>>  this function is "for errors, return immediately.  For success, goto
>>  done".  In which case "done" should have been called "success".  But
>>  if you just look at the function you'll see that this approach didn't
>>  work.  I suggest it be converted to have two return points - one for
>>  the success path, one for the failure path.  Or something.
>>
>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>>  and might suffer the same bug.
>>
>> - A cleaner, shorter and possibly faster implementation is
>>
>>  if (!TestSetPageHWPoison(page))
>>  atomic_long_add(1, &mce_bad_pages);
>>
> 
> Hi Andrew,
> 
> Since hwpoison bit for free buddy page has already be set in get_any_page, 
> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> the first time.
> 
> Regards,
> Wanpeng Li
> 

The poisoned page is isolated in bad_page(), I wonder whether it could be 
isolated
immediately in soft_offline_page() and memory_failure()?

buffered_rmqueue()
prep_new_page()
check_new_page()
bad_page()

Thanks
Xishi Qiu

>> - We have atomic_long_inc().  Use it?
>>
>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>>  concept, and this code is in mm/.  Lights are flashing, bells are
>>  ringing and a loudspeaker is blaring "layering violation" at us!
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 
> 
> 
> .
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-09 Thread Xishi Qiu
On 2012/12/8 6:11, Andrew Morton wrote:

> On Fri, 7 Dec 2012 16:48:45 +0800
> Xishi Qiu  wrote:
> 
>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to 
>> offline a
>> free page twice, the value of mce_bad_pages will be added twice. So this is 
>> an error,
>> since the page was already marked HWPoison, we should skip the page and 
>> don't add the
>> value of mce_bad_pages.
>>
>> $ cat /proc/meminfo | grep HardwareCorrupted
>>
>> soft_offline_page()
>>  get_any_page()
>>  atomic_long_add(1, &mce_bad_pages)
>>
>> ...
>>
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>  return ret;
>>
>>  done:
>> -atomic_long_add(1, &mce_bad_pages);
>> -SetPageHWPoison(page);
>>  /* keep elevated page count for bad page */
>> +if (!PageHWPoison(page)) {
>> +atomic_long_add(1, &mce_bad_pages);
>> +SetPageHWPoison(page);
>> +}
>> +
>>  return ret;
>>  }
> 
> A few things:
> 
> - soft_offline_page() already checks for this case:
> 
>   if (PageHWPoison(page)) {
>   unlock_page(page);
>   put_page(page);
>   pr_info("soft offline: %#lx page already poisoned\n", pfn);
>   return -EBUSY;
>   }
> 
>   so why didn't this check work for you?
> 
>   Presumably because one of the earlier "goto done" branches was
>   taken.  Which one, any why?
> 
>   This function is an utter mess.  It contains six return points
>   randomly intermingled with three "goto done" return points.
> 
>   This mess is probably the cause of the bug you have observed.  Can
>   we please fix it up somehow?  It *seems* that the design (lol) of
>   this function is "for errors, return immediately.  For success, goto
>   done".  In which case "done" should have been called "success".  But
>   if you just look at the function you'll see that this approach didn't
>   work.  I suggest it be converted to have two return points - one for
>   the success path, one for the failure path.  Or something.
> 
> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>   and might suffer the same bug.
> 
> - A cleaner, shorter and possibly faster implementation is
> 
>   if (!TestSetPageHWPoison(page))
>   atomic_long_add(1, &mce_bad_pages);
> 
> - We have atomic_long_inc().  Use it?
> 
> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>   concept, and this code is in mm/.  Lights are flashing, bells are
>   ringing and a loudspeaker is blaring "layering violation" at us!
> 

Hi Andrew, thank you for your advice, I will send V3 soon.

Thanks
Xishi Qiu

> .
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-07 Thread Borislav Petkov
On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> A few things:
> 
> - soft_offline_page() already checks for this case:
> 
>   if (PageHWPoison(page)) {
>   unlock_page(page);
>   put_page(page);
>   pr_info("soft offline: %#lx page already poisoned\n", pfn);
>   return -EBUSY;
>   }

Oh, so we do this check after all. But later in the function. Why? Why
not at the beginning so that when a page is marked poisoned already we
can exit early?

Strange.

-- 
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-07 Thread Andrew Morton
On Fri, 7 Dec 2012 16:48:45 +0800
Xishi Qiu  wrote:

> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to 
> offline a
> free page twice, the value of mce_bad_pages will be added twice. So this is 
> an error,
> since the page was already marked HWPoison, we should skip the page and don't 
> add the
> value of mce_bad_pages.
> 
> $ cat /proc/meminfo | grep HardwareCorrupted
> 
> soft_offline_page()
>   get_any_page()
>   atomic_long_add(1, &mce_bad_pages)
> 
> ...
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>   return ret;
> 
>  done:
> - atomic_long_add(1, &mce_bad_pages);
> - SetPageHWPoison(page);
>   /* keep elevated page count for bad page */
> + if (!PageHWPoison(page)) {
> + atomic_long_add(1, &mce_bad_pages);
> + SetPageHWPoison(page);
> + }
> +
>   return ret;
>  }

A few things:

- soft_offline_page() already checks for this case:

if (PageHWPoison(page)) {
unlock_page(page);
put_page(page);
pr_info("soft offline: %#lx page already poisoned\n", pfn);
return -EBUSY;
}

  so why didn't this check work for you?

  Presumably because one of the earlier "goto done" branches was
  taken.  Which one, any why?

  This function is an utter mess.  It contains six return points
  randomly intermingled with three "goto done" return points.

  This mess is probably the cause of the bug you have observed.  Can
  we please fix it up somehow?  It *seems* that the design (lol) of
  this function is "for errors, return immediately.  For success, goto
  done".  In which case "done" should have been called "success".  But
  if you just look at the function you'll see that this approach didn't
  work.  I suggest it be converted to have two return points - one for
  the success path, one for the failure path.  Or something.

- soft_offline_huge_page() is a miniature copy of soft_offline_page()
  and might suffer the same bug.

- A cleaner, shorter and possibly faster implementation is

if (!TestSetPageHWPoison(page))
atomic_long_add(1, &mce_bad_pages);

- We have atomic_long_inc().  Use it?

- Why do we have a variable called "mce_bad_pages"?  MCE is an x86
  concept, and this code is in mm/.  Lights are flashing, bells are
  ringing and a loudspeaker is blaring "layering violation" at us!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-07 Thread Borislav Petkov
On Fri, Dec 07, 2012 at 04:48:45PM +0800, Xishi Qiu wrote:
> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to 
> offline a
> free page twice, the value of mce_bad_pages will be added twice. So this is 
> an error,
> since the page was already marked HWPoison, we should skip the page and don't 
> add the
> value of mce_bad_pages.
> 
> $ cat /proc/meminfo | grep HardwareCorrupted
> 
> soft_offline_page()
>   get_any_page()
>   atomic_long_add(1, &mce_bad_pages)
> 
> Signed-off-by: Xishi Qiu 
> Signed-off-by: Jiang Liu 
> ---
>  mm/memory-failure.c |7 +--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8b20278..de760ca 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>   return ret;
> 
>  done:
> - atomic_long_add(1, &mce_bad_pages);
> - SetPageHWPoison(page);
>   /* keep elevated page count for bad page */
> + if (!PageHWPoison(page)) {
> + atomic_long_add(1, &mce_bad_pages);
> + SetPageHWPoison(page);
> + }

Ok, I don't know the memory-failure code all that well but IMHO why
should we wade through the whole soft_offline_page function for a page
which has been marked as poisoned already?

IOW, why not do what you started with previously and exit this function
ASAP:

---
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278be6a6..a83baeca0644 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1494,6 +1494,12 @@ int soft_offline_page(struct page *page, int flags)
if (ret == 0)
goto done;
 
+   if (PageHWPoison(page)) {
+   put_page(page);
+   pr_info("soft offline: %#lx page already poisoned\n", pfn);
+   return -EBUSY;
+   }
+
/*
 * Page cache page we can handle?
 */
---

-- 
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] MCE: fix an error of mce_bad_pages statistics

2012-12-07 Thread Xishi Qiu
On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to 
offline a
free page twice, the value of mce_bad_pages will be added twice. So this is an 
error,
since the page was already marked HWPoison, we should skip the page and don't 
add the
value of mce_bad_pages.

$ cat /proc/meminfo | grep HardwareCorrupted

soft_offline_page()
get_any_page()
atomic_long_add(1, &mce_bad_pages)

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
---
 mm/memory-failure.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278..de760ca 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
return ret;

 done:
-   atomic_long_add(1, &mce_bad_pages);
-   SetPageHWPoison(page);
/* keep elevated page count for bad page */
+   if (!PageHWPoison(page)) {
+   atomic_long_add(1, &mce_bad_pages);
+   SetPageHWPoison(page);
+   }
+
return ret;
 }
-- 
1.7.6.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/