Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-06 Thread Vlastimil Babka
On 04/06/2016 02:54 AM, Naoya Horiguchi wrote:
> On Tue, Apr 05, 2016 at 10:20:50AM +0200, Vlastimil Babka wrote:
>>
>> So you agree that this race is a bug? It may turn a soft-offline attempt
>> into a killed process. In that case we should fix it the same as we are
>> fixing the failed migration case.
> 
> I agree, it's a bug, although rare and non-critical.
> 
>> Maybe it will be just enough to switch
>> the test_set_page_hwpoison() and put_page() calls?
> 
> Unfortunately that restores the other race with unpoison (described below.)
> Sorry for my bad/unclear statements, these races seems exclusive and a 
> compatible
> solution is not found, so I prioritized fixing the latter one by comparing
> severity (the latter causes kernel crash,) which led to the current code.

Ah, I see. However unpoison is a functionality just for stress-testing,
and not expected to be used in production, right? So it's somewhat
unfortunate trade-off with danger of soft-offlining killing an unrelated
process.

>>> And another practical thing is the race with unpoison_memory() as described
>>> in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
>>> poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
>>> That's why I'd like to avoid setting PageHWPoison for in-use pages if 
>>> possible.
>>>
 (Also, which part prevents pages with PageHWPoison to be allocated
 again, anyway? I can't find it and test_set_page_hwpoison() doesn't
 remove from buddy freelists).
>>>
>>> check_new_page() in mm/page_alloc.c should prevent reallocation of 
>>> PageHWPoison.
>>> As you pointed out, memory error handler doens't remove it from buddy 
>>> freelists.
>>
>> Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
>> searching. In any case that results in a bad_page() warning, right? Is
>> it desirable for a soft-offlined page?
> 
> That's right, and the bad_page warning might be too strong for soft offlining.
> We can't tell which of memory_failure/soft_offline_page a PageHWPoison came
> from, but users can find other lines in dmesg which should tell that.
> And memory error events can hit buddy pages directly, in that case we still
> need the check in check_new_page().

Ah, ok.

>> If we didn't free poisoned pages
>> to buddy system, they wouldn't trigger this warning.
> 
> Actually, we didn't free at commit add05cecef80 ("mm: soft-offline: don't free
> target page in successful page migration"), but that's was reverted in
> commit f4c18e6f7b5b ("mm: check __PG_HWPOISON separately from 
> PAGE_FLAGS_CHECK_AT_*").
> Now I start thinking the revert was a bad decision, so I'll dig this problem 
> again.

Good.

>>> BTW, it might be a bit off-topic, but recently I felt that check_new_page()
>>> might be improvable, because when check_new_page() returns 1, the whole 
>>> buddy
>>> block (not only the bad page) seems to be leaked from buddy freelist.
>>> For example, if thp (order 9) is requested, and PageHWPoison (or any other
>>> types of bad pages) is found in an order 9 block, all 512 page are 
>>> discarded.
>>> Unpoison can't bring it back to buddy.
>>> So, some code to split buddy block including bad page (and recovering code 
>>> from
>>> unpoison) might be helpful, although that's another story ...
>>
>> Hm sounds like another argument for not freeing the page to buddy lists
>> in the first place. Maybe a hook in free_pages_check()?
> 
> Sounds a good idea. I'll try it, too.

So what I think could hopefully work is to replace the put_page() after
migration with a hwpoison-specific construct that does something like:

if (put_page_testzero(page))
 if (test_set_page_hwpoison()) ...
 __put_page()

With some more thought about what other parts of put_page() apply - how
to handle compound pages and zone-device pages.

That should hopefully be the safest course. When put_page_testzero()
succeeds, there should be no other (current of near-future) users of the
page, and we can still do whatever we need before releasing to
__put_page(). I.e. set the HWPoison flag, and maybe combine this with
modification to free_pages_check() to divert it from becoming a buddy page.

It should be even safer than the current "put_page();
test_set_page_hwpoison();" approach in that we are currently not
guaranteed that the put_page() is indeed releasing the last pin, but we
set HWPoison in any case. Although we have just migrated the page away,
there might be a pfn scanner holding its pin and checking the page.
Hopefully no such scanner has a path that would break on HWPoison flag,
but I don't know. By not setting the HWpoison when we don't succeed
put_page_testzero(), we are safer. It's true the page might stay
unpoisoned due to a temporary pin, but the process data was migrated
away which is the important part, and userspace can retry anyway?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundati

Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-05 Thread Naoya Horiguchi
On Tue, Apr 05, 2016 at 10:20:50AM +0200, Vlastimil Babka wrote:
> On 04/05/2016 03:54 AM, Naoya Horiguchi wrote:
> > On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote:
> >> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
> >>> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
> > ...
> >
> > Also (but not your fault) the put_page() preceding
> > test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> > pin we are releasing and which one we still have (hopefully? if I
> > read description of da1b13ccfbebe right) otherwise it looks like
> > doing something with a page that we just potentially freed.
> 
>  Yes, while I read the code, I had same question. I think the releasing
>  refcount is for get_any_page.
> >>>
> >>> As the other callers of page migration do, soft_offline_page expects the
> >>> migration source page to be freed at this put_page() (no pin remains.)
> >>> The refcount released here is from isolate_lru_page() in 
> >>> __soft_offline_page().
> >>> (the pin by get_any_page is released by put_hwpoison_page just after it.)
> >>>
> >>> .. yes, doing something just after freeing page looks weird, but that's
> >>> how PageHWPoison flag works. IOW, many other page flags are maintained
> >>> only during one "allocate-free" life span, but PageHWPoison still does
> >>> its job beyond it.
> >>
> >> But what prevents the page from being allocated again between put_page()
> >> and test_set_page_hwpoison()? In that case we would be marking page
> >> poisoned while still in use, which is the same as marking it while still
> >> in use after a failed migration?
> > 
> > Actually nothing prevents that race. But I think that the result of the race
> > is that the error page can be reused for allocation, which results in 
> > killing
> > processes at page fault time. Soft offline is kind of mild/precautious thing
> > (for correctable errors that don't require immediate handling), so killing
> > processes looks to me an overkill. And marking hwpoison means that we can no
> > longer do retry from userspace.
> 
> So you agree that this race is a bug? It may turn a soft-offline attempt
> into a killed process. In that case we should fix it the same as we are
> fixing the failed migration case.

I agree, it's a bug, although rare and non-critical.

> Maybe it will be just enough to switch
> the test_set_page_hwpoison() and put_page() calls?

Unfortunately that restores the other race with unpoison (described below.)
Sorry for my bad/unclear statements, these races seems exclusive and a 
compatible
solution is not found, so I prioritized fixing the latter one by comparing
severity (the latter causes kernel crash,) which led to the current code.

> > And another practical thing is the race with unpoison_memory() as described
> > in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
> > poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
> > That's why I'd like to avoid setting PageHWPoison for in-use pages if 
> > possible.
> > 
> >> (Also, which part prevents pages with PageHWPoison to be allocated
> >> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
> >> remove from buddy freelists).
> > 
> > check_new_page() in mm/page_alloc.c should prevent reallocation of 
> > PageHWPoison.
> > As you pointed out, memory error handler doens't remove it from buddy 
> > freelists.
> 
> Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
> searching. In any case that results in a bad_page() warning, right? Is
> it desirable for a soft-offlined page?

That's right, and the bad_page warning might be too strong for soft offlining.
We can't tell which of memory_failure/soft_offline_page a PageHWPoison came
from, but users can find other lines in dmesg which should tell that.
And memory error events can hit buddy pages directly, in that case we still
need the check in check_new_page().

> If we didn't free poisoned pages
> to buddy system, they wouldn't trigger this warning.

Actually, we didn't free at commit add05cecef80 ("mm: soft-offline: don't free
target page in successful page migration"), but that's was reverted in
commit f4c18e6f7b5b ("mm: check __PG_HWPOISON separately from 
PAGE_FLAGS_CHECK_AT_*").
Now I start thinking the revert was a bad decision, so I'll dig this problem 
again.

> > BTW, it might be a bit off-topic, but recently I felt that check_new_page()
> > might be improvable, because when check_new_page() returns 1, the whole 
> > buddy
> > block (not only the bad page) seems to be leaked from buddy freelist.
> > For example, if thp (order 9) is requested, and PageHWPoison (or any other
> > types of bad pages) is found in an order 9 block, all 512 page are 
> > discarded.
> > Unpoison can't bring it back to buddy.
> > So, some code to split buddy block including bad page (and recovering code 
> > from
> > unpoison) might be helpful, although that's another story 

Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-05 Thread Vlastimil Babka
On 04/05/2016 03:54 AM, Naoya Horiguchi wrote:
> On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote:
>> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
>>> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
> ...
>
> Also (but not your fault) the put_page() preceding
> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> pin we are releasing and which one we still have (hopefully? if I
> read description of da1b13ccfbebe right) otherwise it looks like
> doing something with a page that we just potentially freed.

 Yes, while I read the code, I had same question. I think the releasing
 refcount is for get_any_page.
>>>
>>> As the other callers of page migration do, soft_offline_page expects the
>>> migration source page to be freed at this put_page() (no pin remains.)
>>> The refcount released here is from isolate_lru_page() in 
>>> __soft_offline_page().
>>> (the pin by get_any_page is released by put_hwpoison_page just after it.)
>>>
>>> .. yes, doing something just after freeing page looks weird, but that's
>>> how PageHWPoison flag works. IOW, many other page flags are maintained
>>> only during one "allocate-free" life span, but PageHWPoison still does
>>> its job beyond it.
>>
>> But what prevents the page from being allocated again between put_page()
>> and test_set_page_hwpoison()? In that case we would be marking page
>> poisoned while still in use, which is the same as marking it while still
>> in use after a failed migration?
> 
> Actually nothing prevents that race. But I think that the result of the race
> is that the error page can be reused for allocation, which results in killing
> processes at page fault time. Soft offline is kind of mild/precautious thing
> (for correctable errors that don't require immediate handling), so killing
> processes looks to me an overkill. And marking hwpoison means that we can no
> longer do retry from userspace.

So you agree that this race is a bug? It may turn a soft-offline attempt
into a killed process. In that case we should fix it the same as we are
fixing the failed migration case. Maybe it will be just enough to switch
the test_set_page_hwpoison() and put_page() calls?

> And another practical thing is the race with unpoison_memory() as described
> in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
> poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
> That's why I'd like to avoid setting PageHWPoison for in-use pages if 
> possible.
> 
>> (Also, which part prevents pages with PageHWPoison to be allocated
>> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
>> remove from buddy freelists).
> 
> check_new_page() in mm/page_alloc.c should prevent reallocation of 
> PageHWPoison.
> As you pointed out, memory error handler doens't remove it from buddy 
> freelists.

Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
searching. In any case that results in a bad_page() warning, right? Is
it desirable for a soft-offlined page? If we didn't free poisoned pages
to buddy system, they wouldn't trigger this warning.

> BTW, it might be a bit off-topic, but recently I felt that check_new_page()
> might be improvable, because when check_new_page() returns 1, the whole buddy
> block (not only the bad page) seems to be leaked from buddy freelist.
> For example, if thp (order 9) is requested, and PageHWPoison (or any other
> types of bad pages) is found in an order 9 block, all 512 page are discarded.
> Unpoison can't bring it back to buddy.
> So, some code to split buddy block including bad page (and recovering code 
> from
> unpoison) might be helpful, although that's another story ...

Hm sounds like another argument for not freeing the page to buddy lists
in the first place. Maybe a hook in free_pages_check()?

> Thanks,
> Naoya Horiguchi
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-04 Thread Balbir Singh


On 04/04/16 16:01, Minchan Kim wrote:
> On Mon, Apr 04, 2016 at 03:53:59PM +1000, Balbir Singh wrote:
>>
>> On 30/03/16 18:12, Minchan Kim wrote:
>>> Procedure of page migration is as follows:
>>>
>>> First of all, it should isolate a page from LRU and try to
>>> migrate the page. If it is successful, it releases the page
>>> for freeing. Otherwise, it should put the page back to LRU
>>> list.
>>>
>>> For LRU pages, we have used putback_lru_page for both freeing
>>> and putback to LRU list. It's okay because put_page is aware of
>>> LRU list so if it releases last refcount of the page, it removes
>>> the page from LRU list. However, It makes unnecessary operations
>>> (e.g., lru_cache_add, pagevec and flags operations. It would be
>>> not significant but no worth to do) and harder to support new
>>> non-lru page migration because put_page isn't aware of non-lru
>>> page's data structure.
>>>
>>> To solve the problem, we can add new hook in put_page with
>>> PageMovable flags check but it can increase overhead in
>>> hot path and needs new locking scheme to stabilize the flag check
>>> with put_page.
>>>
>>> So, this patch cleans it up to divide two semantic(ie, put and putback).
>>> If migration is successful, use put_page instead of putback_lru_page and
>>> use putback_lru_page only on failure. That makes code more readable
>>> and doesn't add overhead in put_page.
>> So effectively when we return from unmap_and_move() the page is either
>> put_page or putback_lru_page() and the page is gone from under us.
> I didn't get your point.
> Could you elaborate it more what you want to say about this patch?

I was just adding to my understanding of this change based on your changelog.
My understanding is that we take the extra reference in isolate_lru_page()
but by the time we return from unmap_and_move() we drop the extra reference

Balbir Singh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-04 Thread Naoya Horiguchi
On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote:
> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
> > On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
...
> >>>
> >>> Also (but not your fault) the put_page() preceding
> >>> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> >>> pin we are releasing and which one we still have (hopefully? if I
> >>> read description of da1b13ccfbebe right) otherwise it looks like
> >>> doing something with a page that we just potentially freed.
> >>
> >> Yes, while I read the code, I had same question. I think the releasing
> >> refcount is for get_any_page.
> > 
> > As the other callers of page migration do, soft_offline_page expects the
> > migration source page to be freed at this put_page() (no pin remains.)
> > The refcount released here is from isolate_lru_page() in 
> > __soft_offline_page().
> > (the pin by get_any_page is released by put_hwpoison_page just after it.)
> > 
> > .. yes, doing something just after freeing page looks weird, but that's
> > how PageHWPoison flag works. IOW, many other page flags are maintained
> > only during one "allocate-free" life span, but PageHWPoison still does
> > its job beyond it.
> 
> But what prevents the page from being allocated again between put_page()
> and test_set_page_hwpoison()? In that case we would be marking page
> poisoned while still in use, which is the same as marking it while still
> in use after a failed migration?

Actually nothing prevents that race. But I think that the result of the race
is that the error page can be reused for allocation, which results in killing
processes at page fault time. Soft offline is kind of mild/precautious thing
(for correctable errors that don't require immediate handling), so killing
processes looks to me an overkill. And marking hwpoison means that we can no
longer do retry from userspace.

And another practical thing is the race with unpoison_memory() as described
in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
That's why I'd like to avoid setting PageHWPoison for in-use pages if possible.

> (Also, which part prevents pages with PageHWPoison to be allocated
> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
> remove from buddy freelists).

check_new_page() in mm/page_alloc.c should prevent reallocation of PageHWPoison.
As you pointed out, memory error handler doens't remove it from buddy freelists.


BTW, it might be a bit off-topic, but recently I felt that check_new_page()
might be improvable, because when check_new_page() returns 1, the whole buddy
block (not only the bad page) seems to be leaked from buddy freelist.
For example, if thp (order 9) is requested, and PageHWPoison (or any other
types of bad pages) is found in an order 9 block, all 512 page are discarded.
Unpoison can't bring it back to buddy.
So, some code to split buddy block including bad page (and recovering code from
unpoison) might be helpful, although that's another story ...

Thanks,
Naoya Horiguchi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-04 Thread Naoya Horiguchi
On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
> On Fri, Apr 01, 2016 at 02:58:21PM +0200, Vlastimil Babka wrote:
> > On 03/30/2016 09:12 AM, Minchan Kim wrote:
> > >Procedure of page migration is as follows:
> > >
> > >First of all, it should isolate a page from LRU and try to
> > >migrate the page. If it is successful, it releases the page
> > >for freeing. Otherwise, it should put the page back to LRU
> > >list.
> > >
> > >For LRU pages, we have used putback_lru_page for both freeing
> > >and putback to LRU list. It's okay because put_page is aware of
> > >LRU list so if it releases last refcount of the page, it removes
> > >the page from LRU list. However, It makes unnecessary operations
> > >(e.g., lru_cache_add, pagevec and flags operations. It would be
> > >not significant but no worth to do) and harder to support new
> > >non-lru page migration because put_page isn't aware of non-lru
> > >page's data structure.
> > >
> > >To solve the problem, we can add new hook in put_page with
> > >PageMovable flags check but it can increase overhead in
> > >hot path and needs new locking scheme to stabilize the flag check
> > >with put_page.
> > >
> > >So, this patch cleans it up to divide two semantic(ie, put and putback).
> > >If migration is successful, use put_page instead of putback_lru_page and
> > >use putback_lru_page only on failure. That makes code more readable
> > >and doesn't add overhead in put_page.
> > >
> > >Comment from Vlastimil
> > >"Yeah, and compaction (perhaps also other migration users) has to drain
> > >the lru pvec... Getting rid of this stuff is worth even by itself."
> > >
> > >Cc: Mel Gorman 
> > >Cc: Hugh Dickins 
> > >Cc: Naoya Horiguchi 
> > >Acked-by: Vlastimil Babka 
> > >Signed-off-by: Minchan Kim 
> > 
> > [...]
> > 
> > >@@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
> > >get_new_page,
> > >   list_del(&page->lru);
> > >   dec_zone_page_state(page, NR_ISOLATED_ANON +
> > >   page_is_file_cache(page));
> > >-  /* Soft-offlined page shouldn't go through lru cache list */
> > >+  }
> > >+
> > >+  /*
> > >+   * If migration is successful, drop the reference grabbed during
> > >+   * isolation. Otherwise, restore the page to LRU list unless we
> > >+   * want to retry.
> > >+   */
> > >+  if (rc == MIGRATEPAGE_SUCCESS) {
> > >+  put_page(page);
> > >   if (reason == MR_MEMORY_FAILURE) {
> > >-  put_page(page);
> > >   if (!test_set_page_hwpoison(page))
> > >   num_poisoned_pages_inc();
> > >-  } else
> > >+  }
> > 
> > Hmm, I didn't notice it previously, or it's due to rebasing, but it
> > seems that you restricted the memory failure handling (i.e. setting
> > hwpoison) to MIGRATE_SUCCESS, while previously it was done for all
> > non-EAGAIN results. I think that goes against the intention of
> > hwpoison, which is IIRC to catch and kill the poor process that
> > still uses the page?
> 
> That's why I Cc'ed Naoya Horiguchi to catch things I might make
> mistake.
> 
> Thanks for catching it, Vlastimil.
> It was my mistake. But in this chance, I looked over hwpoison code and
> I saw other places which increases num_poisoned_pages are successful
> migration, already freed page and successful invalidated page.
> IOW, they are already successful isolated page so I guess it should
> increase the count when only successful migration is done?

Yes, that's right. When exiting with migration's failure, we shouldn't call
test_set_page_hwpoison or num_poisoned_pages_inc, so current code checking
(rc != -EAGAIN) is simply incorrect. Your change fixes the bug in memory
error handling. Great!

> And when I read memory_failure, it bails out without killing if it
> encounters HWPoisoned page so I think it's not for catching and
> kill the poor proces.
>
> > 
> > Also (but not your fault) the put_page() preceding
> > test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> > pin we are releasing and which one we still have (hopefully? if I
> > read description of da1b13ccfbebe right) otherwise it looks like
> > doing something with a page that we just potentially freed.
>
> Yes, while I read the code, I had same question. I think the releasing
> refcount is for get_any_page.

As the other callers of page migration do, soft_offline_page expects the
migration source page to be freed at this put_page() (no pin remains.)
The refcount released here is from isolate_lru_page() in __soft_offline_page().
(the pin by get_any_page is released by put_hwpoison_page just after it.)

.. yes, doing something just after freeing page looks weird, but that's
how PageHWPoison flag works. IOW, many other page flags are maintained
only during one "allocate-free" life span, but PageHWPoison still does
its job beyond it.

As for commenting, this put_page() is called in any MIGRATEPAGE_SUCCESS
case (regardless of callers), so what we can 

Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-04 Thread Vlastimil Babka
On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
>> Thanks for catching it, Vlastimil.
>> It was my mistake. But in this chance, I looked over hwpoison code and
>> I saw other places which increases num_poisoned_pages are successful
>> migration, already freed page and successful invalidated page.
>> IOW, they are already successful isolated page so I guess it should
>> increase the count when only successful migration is done?
> 
> Yes, that's right. When exiting with migration's failure, we shouldn't call
> test_set_page_hwpoison or num_poisoned_pages_inc, so current code checking
> (rc != -EAGAIN) is simply incorrect. Your change fixes the bug in memory
> error handling. Great!

Ah, I see, soft onlining works differently than I thought.

>> And when I read memory_failure, it bails out without killing if it
>> encounters HWPoisoned page so I think it's not for catching and
>> kill the poor proces.
>>
>>>
>>> Also (but not your fault) the put_page() preceding
>>> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
>>> pin we are releasing and which one we still have (hopefully? if I
>>> read description of da1b13ccfbebe right) otherwise it looks like
>>> doing something with a page that we just potentially freed.
>>
>> Yes, while I read the code, I had same question. I think the releasing
>> refcount is for get_any_page.
> 
> As the other callers of page migration do, soft_offline_page expects the
> migration source page to be freed at this put_page() (no pin remains.)
> The refcount released here is from isolate_lru_page() in 
> __soft_offline_page().
> (the pin by get_any_page is released by put_hwpoison_page just after it.)
> 
> .. yes, doing something just after freeing page looks weird, but that's
> how PageHWPoison flag works. IOW, many other page flags are maintained
> only during one "allocate-free" life span, but PageHWPoison still does
> its job beyond it.

But what prevents the page from being allocated again between put_page()
and test_set_page_hwpoison()? In that case we would be marking page
poisoned while still in use, which is the same as marking it while still
in use after a failed migration?

(Also, which part prevents pages with PageHWPoison to be allocated
again, anyway? I can't find it and test_set_page_hwpoison() doesn't
remove from buddy freelists).

Thanks.

> As for commenting, this put_page() is called in any MIGRATEPAGE_SUCCESS
> case (regardless of callers), so what we can say here is "we free the
> source page here, bypassing LRU list" or something?
> 
> Thanks,
> Naoya Horiguchi
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-03 Thread Minchan Kim
On Mon, Apr 04, 2016 at 03:53:59PM +1000, Balbir Singh wrote:
> 
> 
> On 30/03/16 18:12, Minchan Kim wrote:
> > Procedure of page migration is as follows:
> >
> > First of all, it should isolate a page from LRU and try to
> > migrate the page. If it is successful, it releases the page
> > for freeing. Otherwise, it should put the page back to LRU
> > list.
> >
> > For LRU pages, we have used putback_lru_page for both freeing
> > and putback to LRU list. It's okay because put_page is aware of
> > LRU list so if it releases last refcount of the page, it removes
> > the page from LRU list. However, It makes unnecessary operations
> > (e.g., lru_cache_add, pagevec and flags operations. It would be
> > not significant but no worth to do) and harder to support new
> > non-lru page migration because put_page isn't aware of non-lru
> > page's data structure.
> >
> > To solve the problem, we can add new hook in put_page with
> > PageMovable flags check but it can increase overhead in
> > hot path and needs new locking scheme to stabilize the flag check
> > with put_page.
> >
> > So, this patch cleans it up to divide two semantic(ie, put and putback).
> > If migration is successful, use put_page instead of putback_lru_page and
> > use putback_lru_page only on failure. That makes code more readable
> > and doesn't add overhead in put_page.
> So effectively when we return from unmap_and_move() the page is either
> put_page or putback_lru_page() and the page is gone from under us.

I didn't get your point.
Could you elaborate it more what you want to say about this patch?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-03 Thread Balbir Singh


On 30/03/16 18:12, Minchan Kim wrote:
> Procedure of page migration is as follows:
>
> First of all, it should isolate a page from LRU and try to
> migrate the page. If it is successful, it releases the page
> for freeing. Otherwise, it should put the page back to LRU
> list.
>
> For LRU pages, we have used putback_lru_page for both freeing
> and putback to LRU list. It's okay because put_page is aware of
> LRU list so if it releases last refcount of the page, it removes
> the page from LRU list. However, It makes unnecessary operations
> (e.g., lru_cache_add, pagevec and flags operations. It would be
> not significant but no worth to do) and harder to support new
> non-lru page migration because put_page isn't aware of non-lru
> page's data structure.
>
> To solve the problem, we can add new hook in put_page with
> PageMovable flags check but it can increase overhead in
> hot path and needs new locking scheme to stabilize the flag check
> with put_page.
>
> So, this patch cleans it up to divide two semantic(ie, put and putback).
> If migration is successful, use put_page instead of putback_lru_page and
> use putback_lru_page only on failure. That makes code more readable
> and doesn't add overhead in put_page.
So effectively when we return from unmap_and_move() the page is either
put_page or putback_lru_page() and the page is gone from under us.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-03 Thread Minchan Kim
On Fri, Apr 01, 2016 at 02:58:21PM +0200, Vlastimil Babka wrote:
> On 03/30/2016 09:12 AM, Minchan Kim wrote:
> >Procedure of page migration is as follows:
> >
> >First of all, it should isolate a page from LRU and try to
> >migrate the page. If it is successful, it releases the page
> >for freeing. Otherwise, it should put the page back to LRU
> >list.
> >
> >For LRU pages, we have used putback_lru_page for both freeing
> >and putback to LRU list. It's okay because put_page is aware of
> >LRU list so if it releases last refcount of the page, it removes
> >the page from LRU list. However, It makes unnecessary operations
> >(e.g., lru_cache_add, pagevec and flags operations. It would be
> >not significant but no worth to do) and harder to support new
> >non-lru page migration because put_page isn't aware of non-lru
> >page's data structure.
> >
> >To solve the problem, we can add new hook in put_page with
> >PageMovable flags check but it can increase overhead in
> >hot path and needs new locking scheme to stabilize the flag check
> >with put_page.
> >
> >So, this patch cleans it up to divide two semantic(ie, put and putback).
> >If migration is successful, use put_page instead of putback_lru_page and
> >use putback_lru_page only on failure. That makes code more readable
> >and doesn't add overhead in put_page.
> >
> >Comment from Vlastimil
> >"Yeah, and compaction (perhaps also other migration users) has to drain
> >the lru pvec... Getting rid of this stuff is worth even by itself."
> >
> >Cc: Mel Gorman 
> >Cc: Hugh Dickins 
> >Cc: Naoya Horiguchi 
> >Acked-by: Vlastimil Babka 
> >Signed-off-by: Minchan Kim 
> 
> [...]
> 
> >@@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
> >get_new_page,
> > list_del(&page->lru);
> > dec_zone_page_state(page, NR_ISOLATED_ANON +
> > page_is_file_cache(page));
> >-/* Soft-offlined page shouldn't go through lru cache list */
> >+}
> >+
> >+/*
> >+ * If migration is successful, drop the reference grabbed during
> >+ * isolation. Otherwise, restore the page to LRU list unless we
> >+ * want to retry.
> >+ */
> >+if (rc == MIGRATEPAGE_SUCCESS) {
> >+put_page(page);
> > if (reason == MR_MEMORY_FAILURE) {
> >-put_page(page);
> > if (!test_set_page_hwpoison(page))
> > num_poisoned_pages_inc();
> >-} else
> >+}
> 
> Hmm, I didn't notice it previously, or it's due to rebasing, but it
> seems that you restricted the memory failure handling (i.e. setting
> hwpoison) to MIGRATE_SUCCESS, while previously it was done for all
> non-EAGAIN results. I think that goes against the intention of
> hwpoison, which is IIRC to catch and kill the poor process that
> still uses the page?

That's why I Cc'ed Naoya Horiguchi to catch things I might make
mistake.

Thanks for catching it, Vlastimil.
It was my mistake. But in this chance, I looked over hwpoison code and
I saw other places which increases num_poisoned_pages are successful
migration, already freed page and successful invalidated page.
IOW, they are already successful isolated page so I guess it should
increase the count when only successful migration is done?
And when I read memory_failure, it bails out without killing if it
encounters HWPoisoned page so I think it's not for catching and
kill the poor proces.

> 
> Also (but not your fault) the put_page() preceding
> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> pin we are releasing and which one we still have (hopefully? if I
> read description of da1b13ccfbebe right) otherwise it looks like
> doing something with a page that we just potentially freed.

Yes, while I read the code, I had same question. I think the releasing
refcount is for get_any_page.

Naoya, could you answer above two questions?

Thanks.

> 
> >+} else {
> >+if (rc != -EAGAIN)
> > putback_lru_page(page);
> >+if (put_new_page)
> >+put_new_page(newpage, private);
> >+else
> >+put_page(newpage);
> > }
> >
> >-/*
> >- * If migration was not successful and there's a freeing callback, use
> >- * it.  Otherwise, putback_lru_page() will drop the reference grabbed
> >- * during isolation.
> >- */
> >-if (put_new_page)
> >-put_new_page(newpage, private);
> >-else if (unlikely(__is_movable_balloon_page(newpage))) {
> >-/* drop our reference, page already in the balloon */
> >-put_page(newpage);
> >-} else
> >-putback_lru_page(newpage);
> >-
> > if (result) {
> > if (rc)
> > *result = rc;
> >
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/vi

Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page

2016-04-01 Thread Vlastimil Babka

On 03/30/2016 09:12 AM, Minchan Kim wrote:

Procedure of page migration is as follows:

First of all, it should isolate a page from LRU and try to
migrate the page. If it is successful, it releases the page
for freeing. Otherwise, it should put the page back to LRU
list.

For LRU pages, we have used putback_lru_page for both freeing
and putback to LRU list. It's okay because put_page is aware of
LRU list so if it releases last refcount of the page, it removes
the page from LRU list. However, It makes unnecessary operations
(e.g., lru_cache_add, pagevec and flags operations. It would be
not significant but no worth to do) and harder to support new
non-lru page migration because put_page isn't aware of non-lru
page's data structure.

To solve the problem, we can add new hook in put_page with
PageMovable flags check but it can increase overhead in
hot path and needs new locking scheme to stabilize the flag check
with put_page.

So, this patch cleans it up to divide two semantic(ie, put and putback).
If migration is successful, use put_page instead of putback_lru_page and
use putback_lru_page only on failure. That makes code more readable
and doesn't add overhead in put_page.

Comment from Vlastimil
"Yeah, and compaction (perhaps also other migration users) has to drain
the lru pvec... Getting rid of this stuff is worth even by itself."

Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Acked-by: Vlastimil Babka 
Signed-off-by: Minchan Kim 


[...]


@@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   /* Soft-offlined page shouldn't go through lru cache list */
+   }
+
+   /*
+* If migration is successful, drop the reference grabbed during
+* isolation. Otherwise, restore the page to LRU list unless we
+* want to retry.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   put_page(page);
if (reason == MR_MEMORY_FAILURE) {
-   put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
-   } else
+   }


Hmm, I didn't notice it previously, or it's due to rebasing, but it seems that 
you restricted the memory failure handling (i.e. setting hwpoison) to 
MIGRATE_SUCCESS, while previously it was done for all non-EAGAIN results. I 
think that goes against the intention of hwpoison, which is IIRC to catch and 
kill the poor process that still uses the page?


Also (but not your fault) the put_page() preceding test_set_page_hwpoison(page)) 
IMHO deserves a comment saying which pin we are releasing and which one we still 
have (hopefully? if I read description of da1b13ccfbebe right) otherwise it 
looks like doing something with a page that we just potentially freed.



+   } else {
+   if (rc != -EAGAIN)
putback_lru_page(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
}

-   /*
-* If migration was not successful and there's a freeing callback, use
-* it.  Otherwise, putback_lru_page() will drop the reference grabbed
-* during isolation.
-*/
-   if (put_new_page)
-   put_new_page(newpage, private);
-   else if (unlikely(__is_movable_balloon_page(newpage))) {
-   /* drop our reference, page already in the balloon */
-   put_page(newpage);
-   } else
-   putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization