Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity

2022-02-04 Thread Oscar Salvador
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan 
> 
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
> 
> It is done by restoring pageblock types and splitting >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.

Hi Zi Yan,

Due to time constraints I only glanced over it, so some comments below
about stuff that caught my eye:

> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> + int order, struct zone *zone)
> +{
> + unsigned long pfn;
> +
> + spin_lock(&zone->lock);
> + del_page_from_free_list(free_page, zone, order);
> + for (pfn = page_to_pfn(free_page);
> +  pfn < page_to_pfn(free_page) + (1UL << order);

It might make sense to have an end_pfn variable so that the loop bound does
not have to be re-evaluated on every iteration. Or maybe the compiler is
clever enough to only evaluate it once.
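Something along these lines is what I had in mind (untested, just to
illustrate):

	unsigned long pfn = page_to_pfn(free_page);
	unsigned long end_pfn = pfn + (1UL << order);

	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
				mt, FPI_NONE);
	}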

> +  pfn += pageblock_nr_pages) {
> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> + mt, FPI_NONE);
> + }
> + spin_unlock(&zone->lock);

It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.
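E.g. something at the top of the function along the lines of (again untested,
and assuming the page already sits on the right freelist at that point):

	/* Nothing to split if the free page is already a single pageblock */
	if (order == pageblock_order)
		return;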

> + /* Save the migratetypes of the pageblocks before start and after end */
> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> +  + (isolate_end - alloc_end) / pageblock_nr_pages;
> + saved_mt =
> + kmalloc_array(num_pageblock_to_save,
> +   sizeof(unsigned char), GFP_KERNEL);
> + if (!saved_mt)
> + return -ENOMEM;
> +
> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);

I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.

Also, I kinda dislike the saved_mt thing. I thought about some other
approaches, but everything I came up with was too specific for this case, and
I guess we want that function to be as generic as possible.

> + /*
> +  * Split free page spanning [alloc_end, isolate_end) and put the
> +  * pageblocks in the right migratetype list
> +  */
> + for (outer_end = alloc_end; outer_end < isolate_end;) {
> + unsigned long begin_pfn = outer_end;
> +
> + order = 0;
> + while (!PageBuddy(pfn_to_page(outer_end))) {
> + if (++order >= MAX_ORDER) {
> + outer_end = begin_pfn;
> + break;
> + }
> + outer_end &= ~0UL << order;
> + }
> +
> + if (outer_end != begin_pfn) {
> + order = buddy_order(pfn_to_page(outer_end));
> +
> + /*
> +  * split the free page containing the start page and put the
> +  * pageblocks in the right migratetype list
> +  */
> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);

How could this possibly happen?

> + {
> + struct page *free_page = pfn_to_page(outer_end);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> + outer_end += 1UL << order;
> + } else
> + outer_end = begin_pfn + 1;
>   }

I think there are cases we could optimize for. If the page has already been
split into pageblocks by the outer_start loop, we could skip this outer_end
logic altogether; something like the check sketched below.

E.g: an order-10 page gets split into two pageblocks. There's nothing else
to be done, right? We could skip this.
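A rough, untested sketch of the kind of check I mean (purely illustrative; the
buddy state would have to be read under zone->lock, and the label is
hypothetical):

	unsigned long pfn;

	/*
	 * If everything in [alloc_end, isolate_end) already sits on
	 * pageblock-order free pages, there is nothing left to split.
	 */
	for (pfn = alloc_end; pfn < isolate_end; pfn += pageblock_nr_pages)
		if (!PageBuddy(pfn_to_page(pfn)) ||
		    buddy_order(pfn_to_page(pfn)) != pageblock_order)
			break;
	if (pfn >= isolate_end)
		goto outer_end_done;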


-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-02-02 Thread Oscar Salvador
On Wed, Feb 02, 2022 at 01:25:28PM +0100, David Hildenbrand wrote:
> That's the whole idea for being able to allocate parts of an unmovable
> pageblock that are movable.
> 
> If the first part is unmovable but the second part is movable, nothing
> should stop us from trying to allocate the second part.

Yeah, I was a bit slow there, but I see the point now.
 
Thanks David

-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-02-02 Thread Oscar Salvador
On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
> From: Zi Yan 
> 
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.

Another thing that came to my mind.
Prior to this patch, has_unmovable_pages() was checking on pageblock
granularity, starting at pfn#0 of the pageblock.
With this patch, you no longer check on pageblock granularity, which
means you might isolate a pageblock, but some pages that sneaked in
might actually be unmovable.

E.g:

Let's say you have a pageblock that spans (pfn#512,pfn#1024),
and you pass alloc_contig_range() (pfn#514,pfn#1024).
has_unmovable_pages() will start checking the pageblock at pfn#514,
and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
pfns turn out to be actually unmovable?
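In other words, if the whole pageblock ends up being marked MIGRATE_ISOLATE, I
would expect the unmovable check to look at the whole pageblock as well, i.e.
roughly something like (untested, just to illustrate the point):

	/*
	 * Check the whole pageblocks that will be marked MIGRATE_ISOLATE,
	 * not only the sub-range the caller asked for.
	 */
	unsigned long check_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
	unsigned long check_end = ALIGN(end_pfn, pageblock_nr_pages);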


-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-01-25 Thread Oscar Salvador
On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote:
> I know that this has been discussed previously, and the cover-letter already
> mentions it, but I think it would be great to have some sort of information
> about the problem in the commit message as well, so people do not have to go
> and find it somewhere else.

Sorry, the commit message already points it out, but I meant that it would be
good to elaborate on it some more.

-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-01-25 Thread Oscar Salvador
On Mon, Jan 24, 2022 at 12:17:23PM -0500, Zi Yan wrote:
> You are right. Sorry for the confusion. I think it should be
> “Page isolation is done at least on max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) granularity.”
> 
> memory_hotplug uses PAGES_PER_SECTION. It is greater than that.

Or just specify that the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity
only comes from alloc_contig_range() at the moment. Other callers might want
to work at a different granularity (e.g: memory-hotplug), although ultimately
the range has to be aligned to something.

> > True is that start_isolate_page_range() expects the range to be pageblock 
> > aligned and works in pageblock_nr_pages chunks, but I do not think that is 
> > what you meant to say here.
> 
> Actually, start_isolate_page_range() should expect max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) alignment instead of pageblock alignment. It seems to
> be an uncovered bug in the current code, since all callers use at least
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment.
> 
> The reason is that if start_isolate_page_range() is only pageblock aligned
> and a caller wants to isolate one pageblock from a MAX_ORDER-1
> (2 pageblocks on x86_64 systems) free page, this will lead to MIGRATE_ISOLATE
> accounting error. To avoid it, start_isolate_page_range() needs to isolate
> the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range.

So, let me see if I get this straight:

You are saying that, currently, alloc_contig_range() works on the biggest
alignment otherwise we might have this scenario:

[  MAX_ORDER-1   ]
[pageblock#0][pageblock#1]

We only want to isolate pageblock#1, so we pass a pageblock-aligned range to
start_isolate_page_range(), but the page belonging to pageblock#1 spans
pageblock#0 and pageblock#1 because it is a MAX_ORDER-1 page.

So when we call set_migratetype_isolate()->set_pageblock_migratetype(), this
will mark either pageblock#0 or pageblock#1 as isolated, but the whole page
will be put in the MIGRATE_ISOLATE freelist by
move_freepages_block()->move_freepages().
Meaning, we will effectively have two pageblocks isolated, but only one marked
as such?

Did I get it right or did I miss something?

I know that this has been discussed previously, and the cover-letter already
mentions it, but I think it would be great to have some sort of information
about the problem in the commit message as well, so people do not have to go
and find it somewhere else.


-- 
Oscar Salvador
SUSE Labs

Re: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

2022-01-24 Thread Oscar Salvador

On 2022-01-19 20:06, Zi Yan wrote:

From: Zi Yan 

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan 


Reviewed-by: Oscar Salvador 

--
Oscar Salvador
SUSE Labs


Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages

2022-01-24 Thread Oscar Salvador

On 2022-01-19 20:06, Zi Yan wrote:

From: Zi Yan 

Enable set_migratetype_isolate() to check specified sub-range for
unmovable pages during isolation. Page isolation is done
at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes unmovable page check only look for
interesting pages, so that page isolation can succeed for any
non-overlapping ranges.


Hi Zi Yan,

I had to re-read this several times as I found this a bit misleading.
I was mainly confused by the fact that memory_hotplug does isolation on 
PAGES_PER_SECTION granularity, and reading the above seems to indicate
that it is done either at MAX_ORDER_NR_PAGES or at pageblock_nr_pages
granularity.


True is that start_isolate_page_range() expects the range to be 
pageblock aligned and works in pageblock_nr_pages chunks, but I do not 
think that is what you meant to say here.


Now, to the change itself, below:



@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}

-   for (; iter < pageblock_nr_pages - offset; iter++) {
-   page = pfn_to_page(pfn + iter);
+   for (pfn = first_pfn; pfn < last_pfn; pfn++) {


You already did pfn = first_pfn before.


 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
- * @start_pfn: The lower PFN of the range to be isolated.
- * @end_pfn:   The upper PFN of the range to be isolated.
+ * @start_pfn: The lower PFN of the range to be checked for
+ * possibility of isolation.
+ * @end_pfn:   The upper PFN of the range to be checked for
+ * possibility of isolation.
+ * @isolate_start: The lower PFN of the range to be isolated.
+ * @isolate_end:   The upper PFN of the range to be isolated.


So, what does "possibility" means here. I think this need to be 
clarified a bit better.


From what you pointed out in the commit message I think what you are 
doing is:


- alloc_contig_range() gets a range to be isolated.
- then you pass two ranges to start_isolate_page_range()
  (start_pfn, end_pfn]: which is the unaligned range you got in
  alloc_contig_range()
  (isolate_start, isolate_end]: which got aligned, let's say, to
  MAX_ORDER_NR_PAGES


Now, most likely, (start_pfn, end_pfn] only covers a sub-range of
(isolate_start, isolate_end], and that sub-range is what you really want to
isolate (so (start_pfn, end_pfn])?

If so, should not the names be reversed?
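Just to be sure I am reading it right, I am assuming the relationship between
the two ranges is roughly the following (names and the alignment granularity
are my guess, not necessarily what the patch does):

	/*
	 * (start_pfn, end_pfn]: what the caller of alloc_contig_range() wants.
	 * (isolate_start, isolate_end]: the bigger, aligned range that gets
	 * isolated so no free page straddles its boundaries.
	 */
	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);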


--
Oscar Salvador
SUSE Labs