Re: [virtio-dev] Re: [RFC][Patch v12 1/2] mm: page_reporting: core infrastructure

2019-09-04 Thread David Hildenbrand
>>> For some reason, I am not seeing this work as I would have expected
>>> but I don't have solid reasoning to share yet. It could be simply
>>> because I am putting my hook at the wrong place. I will continue
>>> investigating this.
>>>
>>> In any case, I may be over complicating things here, so please let me
>>> if there is a better way to do this.
>> I have already been demonstrating the "better way" I think there is to
>> do this. I will push v7 of it early next week unless there is some
>> other feedback. By putting the bit in the page and controlling what
>> comes into and out of the lists it makes most of this quite a bit
>> easier. The only limitation is you have to modify where things get
>> placed in the lists so you don't create a "vapor lock" that would
>> stall the feed of pages into the reporting engine.
>>
>>> If this overhead is not significant we can probably live with it.
>> You have bigger issues you still have to overcome as I recall. Didn't
>> you still need to sort out hotplu
> 
> For memory hotplug, my impression is that it should
> not be a blocker for taking the first step (in case we do decide to
> go ahead with this approach). Another reason why I am considering
> this as future work is that memory hot(un)plug is still under
> development and requires fixing. (Specifically, issue such as zone
> shrinking which will directly impact the bitmap approach is still
> under discussion).

Memory hotplug is way more reliable than memory unplug - however, but
both are used in production. E.g. the zone shrinking you mention is
something that is not "optimal", but works in most scenarios. So both
features are rather in a "production/fixing" stage than a pure
"development" stage.

However, I do agree that memory hot(un)plug is not something
high-priority for free page hinting in the first shot. If it works out
of the box (Alexander's approach) - great - if not it is just one
additional scenario where free page hinting might not be optimal yet
(like vfio where we can't use it completely right now). After all, most
virtual environment (under KVM) I am aware of don't use DIMM-base memory
hot(un)plug at all.

The important part is that it will not break when memory is added/removed.

> 
>> g and a sparse map with a wide span
>> in a zone? Without those resolved the bitmap approach is still a no-go
>> regardless of performance.
> 
> For sparsity, the memory wastage should not be significant as I
> am tracking pages on the granularity of (MAX_ORDER - 2) and maintaining
> the bitmaps on a per-zone basis (which was not the case earlier).

To handle sparsity one would have to have separate bitmaps for each
section. Roughly 64 bit per 128MB section. Scanning the scattered bitmap
would get more expensive. Of course, one could have a hierarchical
bitmap to speed up the search etc.

But again, I think we should have a look how much of an issue sparse
zones/nodes actually are in virtual machines (KVM). The setups I am
aware of minimize sparsity (e.g., QEMU will place DIMMs consecutively in
memory, only aligning to e.g, 128MB on x86 if required). Usually all
memory in VMs is onlined to the NORMAL zone (except special cases of
course). The only "issue" would be if you start mixing DIMMs of
different nodes when hotplugging memory. The overhead for 1bit per 2MB
could be almost neglectable in most setups.

But I do agree that for the bitmap-based approach there might be more
scenarios where you'd rather not want to use free page hinting and
instead disable it.

> 
> However, if you do consider this as a block I will think about it and try to 
> fix it.
> In the worst case, if I don't find a solution I will add this as a known 
> limitation
> for this approach in my cover.
> 
>> - Alex


-- 

Thanks,

David / dhildenb


Re: [virtio-dev] Re: [RFC][Patch v12 1/2] mm: page_reporting: core infrastructure

2019-08-14 Thread Nitesh Narayan Lal


On 8/14/19 3:07 AM, David Hildenbrand wrote:
> On 14.08.19 01:14, Alexander Duyck wrote:
>> On Tue, Aug 13, 2019 at 3:34 AM David Hildenbrand  wrote:
>>> +static int process_free_page(struct page *page,
>>> +struct page_reporting_config *phconf, int 
>>> count)
>>> +{
>>> +   int mt, order, ret = 0;
>>> +
>>> +   mt = get_pageblock_migratetype(page);
>>> +   order = page_private(page);
>>> +   ret = __isolate_free_page(page, order);
>>> +
> I just started looking into the wonderful world of
> isolation/compaction/migration.
>
> I don't think saving/restoring the migratetype is correct here. AFAIK,
> MOVABLE/UNMOVABLE/RECLAIMABLE is just a hint, doesn't mean that e.g.,
> movable pages and up in UNMOVABLE or ordinary kernel allocations on
> MOVABLE. So that shouldn't be an issue - I guess.
>
> 1. You should never allocate something that is no
> MOVABLE/UNMOVABLE/RECLAIMABLE. Especially not, if you have ISOLATE or
> CMA here. There should at least be a !is_migrate_isolate_page() check
> somewhere
>
> 2. set_migratetype_isolate() takes the zone lock, so to avoid racing
> with isolation code, you have to hold the zone lock. Your code seems to
> do that, so at least you cannot race against isolation.
>
> 3. You could end up temporarily allocating something in the
> ZONE_MOVABLE. The pages you allocate are, however, not movable. There
> would have to be a way to make alloc_contig_range()/offlining code
> properly wait until the pages have been processed. Not sure about the
> real implications, though - too many details in the code (I wonder if
> Alex' series has a way of dealing with that)
>
> When you restore the migratetype, you could suddenly overwrite e.g.,
> ISOLATE, which feels wrong.

 I was triggering an occasional CPU stall bug earlier, with saving and 
 restoring
 the migratetype I was able to fix it.
 But I will further look into this to figure out if it is really required.

>>> You should especially look into handling isolated/cma pages. Maybe that
>>> was the original issue. Alex seems to have added that in his latest
>>> series (skipping isolated/cma pageblocks completely) as well.
>> So as far as skipping isolated pageblocks, I get the reason for
>> skipping isolated, but why would we need to skip CMA? I had made the
>> change I did based on comments you had made earlier. But while working
>> on some of the changes to address isolation better and looking over
>> several spots in the code it seems like CMA is already being used as
>> an allocation fallback for MIGRATE_MOVABLE. If that is the case
>> wouldn't it make sense to allow pulling pages and reporting them while
>> they are in the free_list?
> I was assuming that CMA is also to be skipped because "static int
> fallbacks[MIGRATE_TYPES][4]" in mm/page_alloc.c doesn't handle CMA at
> all, meaning we should never fallback to CMA or from CMA to another type
> - at least when stealing pages from another migratetype. So it smells
> like MIGRATE_CMA is static -> the area is marked once and will never be
> converted to something else (except MIGRATE_ISOLATE temporarily).
>
> I assume you are talking about gfp_to_alloc_flags()/prepare_alloc_pages():


I am also trying to look into this to get more understanding of it.
Another thing which I am looking into right now is the difference between
get/set_pcppage_migratetype() and ge/set_pageblock_migratetype().
To an extent, I do understand what is the benefit of using
get/set_pcppage_migratetype() by reading the comments. However, I am not sure
how it gets along with MIGRATE_CMA.
Hopefully, I will have an understanding of it soon.

> #ifdef CONFIG_CMA
>   if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>   alloc_flags |= ALLOC_CMA;
> #endif
>
> Yeah, this looks like MOVABLE allocations can fallback to CMA
> pageblocks. And from what I read, "CMA may use its own migratetype
> (MIGRATE_CMA) which behaves similarly to ZONE_MOVABLE but can be put in
> arbitrary places."
>
> So I think you are right, it could be that it is safe to temporarily
> pull out CMA pages (in contrast to isolated pages) - assuming it is fine
> to have temporary unmovable allocations on them (different discussion).
>
> (I am learning about the details as we discuss :) )
>
> The important part would then be to never allocate from the isolated
> pageblocks and to never overwrite MIGRATE_ISOLATE.


Agreed. I think I should just avoid isolating pages with migratetype
MIGRATE_ISOLATE.
Adding a check with is_migrate_isolate_page() before isolating the page should
do it.


>
-- 
Thanks
Nitesh