On 27/08/2025 6:23 pm, Andrew Cooper wrote:
> Subject wants to be at least tools/libxc, and probably "Use superpages
> where possible on migrate/resume"
>
>
> On 27/08/2025 1:33 pm, Frediano Ziglio wrote:
>> Try to allocate larger-order pages.
>> With a test program that stresses the TLB (many small random
>> memory accesses) this gives roughly a 15% performance improvement.
>> On the first memory iteration the sender currently sends
>> memory in 4MB-aligned chunks, which allows the receiver to
>> allocate most pages as 2MB superpages instead of individual 4KB pages.
> It's critical to say somewhere that this is applicable to HVM guests.
>
> You've alluded to it, but it's important to state clearly that, for HVM
> guests, PAGE_DATA records contain metadata about GFNs in aligned 4M
> blocks.  This is why we don't even need to buffer a second record.
>
> It's also worth stating that 1G superpages are left for later.
>
>
> CC-ing Oleksii as release manager.  This is a fix for a (mis)feature
> which has been known for a decade (since Xen 4.6), and for which two
> series have been posted but have not managed to get in.  Unlike those series,
> this is a very surgical fix that gets the majority of the perf win back,
> without the complexity of trying to guess at 1G pages.
>
> Therefore I'd like to request that it be considered for 4.21 at this
> juncture.
>
>> Signed-off-by: Frediano Ziglio <frediano.zig...@cloud.com>
>> ---
>>  tools/libs/guest/xg_sr_restore.c | 39 ++++++++++++++++++++++++++++----
>>  1 file changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
>> index 06231ca826..8dcb1b19c5 100644
>> --- a/tools/libs/guest/xg_sr_restore.c
>> +++ b/tools/libs/guest/xg_sr_restore.c
>> @@ -129,6 +129,8 @@ static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
>>      return 0;
>>  }
>>  
>> +#define IS_POWER_OF_2(n) (((n) & ((n) - 1)) == 0)
>> +
>>  /*
>>   * Given a set of pfns, obtain memory from Xen to fill the physmap for the
>>   * unpopulated subset.  If types is NULL, no page type checking is performed
>> @@ -141,6 +143,7 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>      xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
>>          *pfns = malloc(count * sizeof(*pfns));
>>      unsigned int i, nr_pfns = 0;
>> +    bool contiguous = true;
>>      int rc = -1;
>>  
>>      if ( !mfns || !pfns )
>> @@ -159,18 +162,46 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>              if ( rc )
>>                  goto err;
>>              pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
>> +            if ( pfns[nr_pfns] != pfns[0] + nr_pfns )
>> +                contiguous = false;
>>              ++nr_pfns;
>>          }
>>      }
>>  
>>      if ( nr_pfns )
>>      {
>> -        rc = xc_domain_populate_physmap_exact(
>> -            xch, ctx->domid, nr_pfns, 0, 0, mfns);
>> +        /* try optimizing using larger order */
>> +        rc = -1;
>> +        /*
>> +         * The "nr_pfns <= (1 << 18)" check is mainly for paranoia, it should
>> +         * never happen, the sender would have to send a really large packet.
>> +         */
>> +        if ( contiguous && nr_pfns <= (1 << 18) &&
> This is an arbitrary limit, and to contradict the prior feedback given
> in private, the domain's MAX_ORDER isn't relevant here.  It's the
> toolstack's choice how to lay out the guest in memory.
>
>> +             IS_POWER_OF_2(nr_pfns) && (pfns[0] & (nr_pfns - 1)) == 0 )
>> +        {
>> +            const unsigned int extent_order = __builtin_ffs(nr_pfns) - 1;
> This (non-)loop isn't great.  You'll e.g. use 4k pages for the second 2M
> page of an HVM guest simply because the VGA hole exists in the first.
>
> I think you probably want something like:
>
> int populate_physmap_4k(ctx, nr, mfns);
> int populate_physmap_2M(ctx, nr, mfns);
>
> as simple wrappers around the raw hypercalls including transforming back
> the mfns[] array, and:
>
> int populate_physmap(...);
>
> with logic of the form:
>
>     while ( nr )
>     {
>         if ( nr < 512 ) /* Can't be a superpage */
>         {
>             populate_physmap_4k(ctx, nr, mfns);
>             mfns += nr;
>             nr = 0;
>             continue;
>         }
>
>         if ( !ALIGNED_2M(mfn) ) /* Populate up until a point that could be a superpage */
>         {
>             while ( !ALIGNED_2M(mfn + i) )
>                 i++;
>             populate_physmap_4k(ctx, i, mfns);
>             mfns += i;
>             nr -= i;
>         }
>
>         if ( nr >= 512 )
>         {
>             for ( i = 1; i < 512; ++i )
>                 if ( mfns[i] != mfns[0] + i )
>                     break;
>             if ( i == 512 )
>                 populate_physmap_2M(ctx, i, mfns);
>             else
>                 populate_physmap_4k(...);
>
>             mfns += i;
>             nr -= i;
>         }
>     }
>
>
>
> Obviously with error handling, and whatever boundary conditions I've got
> wrong.
>
> 2M is the only size that matters (ignoring 1G which we've excluded for
> now), and this form will reconstruct more superpages for the guest than
> trying to do 4M allocations will.
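
For concreteness, the two wrappers named above might look something like
this.  It's only a sketch, assuming it sits in xg_sr_restore.c so that
ctx->xch and ctx->domid are reachable as usual; ORDER_2M is defined
locally, and expanding the allocated extents back into per-4k mfns[]
entries, which the PV-style paths would need, is only noted in a comment:

#define ORDER_2M 9      /* One 2M superpage covers 512 4k pages. */

/* Populate 'nr' individual 4k pages at the gfns in mfns[]. */
static int populate_physmap_4k(struct xc_sr_context *ctx,
                               unsigned int nr, xen_pfn_t *mfns)
{
    return xc_domain_populate_physmap_exact(ctx->xch, ctx->domid,
                                            nr, 0, 0, mfns);
}

/*
 * Populate 'nr' 4k-granularity gfns (a multiple of 512) as 2M superpages.
 * Only every 512th gfn is handed to Xen; transforming the allocated
 * extents back into 512 per-4k mfns[] entries is omitted here.
 */
static int populate_physmap_2M(struct xc_sr_context *ctx,
                               unsigned int nr, xen_pfn_t *mfns)
{
    unsigned int i, nr_extents = nr >> ORDER_2M;
    xen_pfn_t extents[nr_extents];

    for ( i = 0; i < nr_extents; ++i )
        extents[i] = mfns[i << ORDER_2M];

    return xc_domain_populate_physmap_exact(ctx->xch, ctx->domid,
                                            nr_extents, ORDER_2M, 0,
                                            extents);
}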

Actually, I think a simpler option is to alter the first loop, where
you've got contiguous= right now.

Something like:

@@ -158,6 +165,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
             rc = pfn_set_populated(ctx, original_pfns[i]);
             if ( rc )
                 goto err;
+
+            if ( ALIGNED_2M(original_pfns[i]) &&
+                 can_populate_2M(&original_pfns[i], count - i) )
+            {
+                xen_pfn_t mfn = original_pfns[i];
+
+                rc = xc_domain_populate_physmap_exact(xch, ctx->domid, 1, ORDER_2M, 0, &mfn);
+
+                ...
+
+                i += 511;
+                continue;
+            }
+
             pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
             ++nr_pfns;
         }


with error handling of course.  This way we don't potentially fall back
to many separate hypercalls for 4k pages.
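
ALIGNED_2M() and can_populate_2M() are only named in the sketch above.
Purely as illustration, and ignoring the pfn_is_populated() bookkeeping a
real helper would also want to consult, they could look something like:

#define ORDER_2M 9      /* As in the sketch further up. */
#define ALIGNED_2M(pfn) (((pfn) & ((1UL << ORDER_2M) - 1)) == 0)

/*
 * True if the next 512 entries are present and form one contiguous run,
 * so a single order-9 allocation can cover them.
 */
static bool can_populate_2M(const xen_pfn_t *pfns, unsigned int remaining)
{
    unsigned int i;

    if ( remaining < (1U << ORDER_2M) )
        return false;

    for ( i = 1; i < (1U << ORDER_2M); ++i )
        if ( pfns[i] != pfns[0] + i )
            return false;

    return true;
}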

~Andrew
