Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote: > From: Zi Yan > > alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging > pageblocks with different migratetypes. It might unnecessarily convert > extra pageblocks at the beginning and at the end of the range. Change > alloc_contig_range() to work at pageblock granularity. > > It is done by restoring pageblock types and split >pageblock_order free > pages after isolating at MAX_ORDER-1 granularity and migrating pages > away at pageblock granularity. The reason for this process is that > during isolation, some pages, either free or in-use, might have >pageblock > sizes and isolating part of them can cause free accounting issues. > Restoring the migratetypes of the pageblocks not in the interesting > range later is much easier. Hi Zi Yan, Due to time constraints I only glanced over, so some comments below about stuff that caught my eye: > +static inline void split_free_page_into_pageblocks(struct page *free_page, > + int order, struct zone *zone) > +{ > + unsigned long pfn; > + > + spin_lock(>lock); > + del_page_from_free_list(free_page, zone, order); > + for (pfn = page_to_pfn(free_page); > + pfn < page_to_pfn(free_page) + (1UL << order); It migt make sense to have a end_pfn variable so that does not have to be constantly evaluated. Or maybe the compiler is clever enough to only evualuate it once. > + pfn += pageblock_nr_pages) { > + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn); > + > + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order, > + mt, FPI_NONE); > + } > + spin_unlock(>lock); It is possible that free_page's order is already pageblock_order, so I would add a one-liner upfront to catch that case and return, otherwise we do the delete_from_freelist-and-free_it_again dance. > + /* Save the migratepages of the pageblocks before start and after end */ > + num_pageblock_to_save = (alloc_start - isolate_start) / > pageblock_nr_pages > + + (isolate_end - alloc_end) / > pageblock_nr_pages; > + saved_mt = > + kmalloc_array(num_pageblock_to_save, > + sizeof(unsigned char), GFP_KERNEL); > + if (!saved_mt) > + return -ENOMEM; > + > + num = save_migratetypes(saved_mt, isolate_start, alloc_start); > + > + num = save_migratetypes(_mt[num], alloc_end, isolate_end); I really hope we can put all this magic within start_isolate_page_range, and the counterparts in undo_isolate_page_range. Also, I kinda dislike the _mt thing. I thought about some other approaches but nothing that wasn't too specific for this case, and I guess we want that function to be as generic as possible. > + /* > + * Split free page spanning [alloc_end, isolate_end) and put the > + * pageblocks in the right migratetype list > + */ > + for (outer_end = alloc_end; outer_end < isolate_end;) { > + unsigned long begin_pfn = outer_end; > + > + order = 0; > + while (!PageBuddy(pfn_to_page(outer_end))) { > + if (++order >= MAX_ORDER) { > + outer_end = begin_pfn; > + break; > + } > + outer_end &= ~0UL << order; > + } > + > + if (outer_end != begin_pfn) { > + order = buddy_order(pfn_to_page(outer_end)); > + > + /* > + * split the free page has start page and put the > pageblocks > + * in the right migratetype list > + */ > + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn); How could this possibily happen? > + { > + struct page *free_page = pfn_to_page(outer_end); > + > + split_free_page_into_pageblocks(free_page, > order, cc.zone); > + } > + outer_end += 1UL << order; > + } else > + outer_end = begin_pfn + 1; > } I think there are cases could optimize for. If the page has already been split in pageblock by the outer_start loop, we could skip this outer_end logic altogether. E.g: An order-10 page is split in two pageblocks. There's nothing else to be done, right? We could skip this. -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
On Wed, Feb 02, 2022 at 01:25:28PM +0100, David Hildenbrand wrote: > That's the whole idea for being able to allocate parts of an unmovable > pageblock that are movable. > > If the first part is unmovable but the second part is movable, nothing > should stop us from trying to allocate the second part. Yeah, I see, I was a bit slow there, but I see the point now. Thanks David -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote: > From: Zi Yan > > Enable set_migratetype_isolate() to check specified sub-range for > unmovable pages during isolation. Page isolation is done > at max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) granularity, but not all > pages within that granularity are intended to be isolated. For example, > alloc_contig_range(), which uses page isolation, allows ranges without > alignment. This commit makes unmovable page check only look for > interesting pages, so that page isolation can succeed for any > non-overlapping ranges. Another thing that came to my mind. Prior to this patch, has_unmovable_pages() was checking on pageblock granularity, starting at pfn#0 of the pageblock. With this patch, you no longer check on pageblock granularity, which means you might isolate a pageblock, but some pages that sneaked in might actually be unmovable. E.g: Let's say you have a pageblock that spans (pfn#512,pfn#1024), and you pass alloc_contig_range() (pfn#514,pfn#1024). has_unmovable_pages() will start checking the pageblock at pfn#514, and so it will mis pfn#512 and pfn#513. Isn't that a problem, if those pfn turn out to be actually unmovable? -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote: > I know that this has been discussed previously, and the cover-letter already > mentions it, but I think it would be great to have some sort of information > about > the problem in the commit message as well, so people do not have to go and > find > it somewhere else. Sorry, the commit already points it out, but I meant to elaborate some more. -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
On Mon, Jan 24, 2022 at 12:17:23PM -0500, Zi Yan wrote: > You are right. Sorry for the confusion. I think it should be > “Page isolation is done at least on max(MAX_ORDER_NR_PAEGS, > pageblock_nr_pages) granularity.” > > memory_hotplug uses PAGES_PER_SECTION. It is greater than that. Or just specify that the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granurality only comes from alloc_contig_range at the moment. Other callers might want to work in other granularity (e.g: memory-hotplug) although ultimately the range has to be aligned to something. > > True is that start_isolate_page_range() expects the range to be pageblock > > aligned and works in pageblock_nr_pages chunks, but I do not think that is > > what you meant to say here. > > Actually, start_isolate_page_range() should expect max(MAX_ORDER_NR_PAEGS, > pageblock_nr_pages) alignment instead of pageblock alignment. It seems to > be an uncovered bug in the current code, since all callers uses at least > max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) alignment. > > The reason is that if start_isolate_page_range() is only pageblock aligned > and a caller wants to isolate one pageblock from a MAX_ORDER-1 > (2 pageblocks on x84_64 systems) free page, this will lead to MIGRATE_ISOLATE > accounting error. To avoid it, start_isolate_page_range() needs to isolate > the max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) aligned range. So, let me see if I get this straight: You are saying that, currently, alloc_contig_ranges() works on the biggest alignment otherwise we might have this scenario: [ MAX_ORDER-1 ] [pageblock#0][pageblock#1] We only want to isolate pageblock#1, so we pass a pageblock-aligned range to start_isolate_page_range(), but the page belonging to pageblock#1 spans pageblock#0 and pageblock#1 because it is a MAX_ORDER-1 page. So when we call set_migratetype_isolate()->set_pageblock_migratetype(), this will mark either pageblock#0 or pageblock#1 as isolated, but the whole page will be put in the MIGRATE_ISOLATE freelist by move_freepages_block()->move_freepages(). Meaning, we wil effectively have two pageblocks isolated, but only one marked as such? Did I get it right or did I miss something? I know that this has been discussed previously, and the cover-letter already mentions it, but I think it would be great to have some sort of information about the problem in the commit message as well, so people do not have to go and find it somewhere else. -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
On 2022-01-19 20:06, Zi Yan wrote: From: Zi Yan has_unmovable_pages() is only used in mm/page_isolation.c. Move it from mm/page_alloc.c and make it static. Signed-off-by: Zi Yan Reviewed-by: Oscar Salvador -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
On 2022-01-19 20:06, Zi Yan wrote: From: Zi Yan Enable set_migratetype_isolate() to check specified sub-range for unmovable pages during isolation. Page isolation is done at max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) granularity, but not all pages within that granularity are intended to be isolated. For example, alloc_contig_range(), which uses page isolation, allows ranges without alignment. This commit makes unmovable page check only look for interesting pages, so that page isolation can succeed for any non-overlapping ranges. Hi Zi Yan, I had to re-read this several times as I found this a bit misleading. I was mainly confused by the fact that memory_hotplug does isolation on PAGES_PER_SECTION granularity, and reading the above seems to indicate that either do it at MAX_ORDER_NR_PAGES or at pageblock_nr_pages granularity. True is that start_isolate_page_range() expects the range to be pageblock aligned and works in pageblock_nr_pages chunks, but I do not think that is what you meant to say here. Now, to the change itself, below: @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page, return page; } - for (; iter < pageblock_nr_pages - offset; iter++) { - page = pfn_to_page(pfn + iter); + for (pfn = first_pfn; pfn < last_pfn; pfn++) { You already did pfn = first_pfn before. /** * start_isolate_page_range() - make page-allocation-type of range of pages to * be MIGRATE_ISOLATE. - * @start_pfn: The lower PFN of the range to be isolated. - * @end_pfn: The upper PFN of the range to be isolated. + * @start_pfn: The lower PFN of the range to be checked for + * possibility of isolation. + * @end_pfn: The upper PFN of the range to be checked for + * possibility of isolation. + * @isolate_start: The lower PFN of the range to be isolated. + * @isolate_end: The upper PFN of the range to be isolated. So, what does "possibility" means here. I think this need to be clarified a bit better. From what you pointed out in the commit message I think what you are doing is: - alloc_contig_range() gets a range to be isolated. - then you pass two ranges to start_isolate_page_range() (start_pfn, end_pfn]: which is the unaligned range you got in alloc_contig_range() (isolate_start, isolate_end]: which got aligned to, let's say, to MAX_ORDER_NR_PAGES Now, most likely, (start_pfn, end_pfn] only covers a sub-range of (isolate_start, isolate_end], and that sub-range is what you really want to isolate (so (start_pfn, end_pfn])? If so, should not the names be reversed? -- Oscar Salvador SUSE Labs ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu