On 04.08.2025 15:08, Roger Pau Monné wrote:
> On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote:
>> On 24.07.2025 13:04, Roger Pau Monne wrote:
>>> --- a/xen/common/pdx.c
>>> +++ b/xen/common/pdx.c
>>> @@ -24,6 +24,7 @@
>>>  #include <xen/param.h>
>>>  #include <xen/pfn.h>
>>>  #include <xen/sections.h>
>>> +#include <xen/sort.h>
>>>  
>>>  /**
>>>   * Maximum (non-inclusive) usable pdx. Must be
>>> @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
>>>  
>>>  #ifdef CONFIG_PDX_MASK_COMPRESSION
>>>      invalid |= mfn & pfn_hole_mask;
>>> +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
>>> +{
>>> +    unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
>>> +
>>> +    invalid |= mfn < base || mfn >= base + pdx_region_size;
>>
>> Leveraging wrapping, this could be simplified to
>>
>>     invalid |= mfn - base >= pdx_region_size;
>>
>> I think. Considering it's a frequently used path, doing so may be worthwhile.
> 
> I don't think that would work for all cases, take the following
> example:
> 
> PFN compression using lookup table shift 18 and region size 0x40000
>  range   0 [0000000280000, 00000002bffff] PFN IDX  10 : 0000000280000
> 
> If you pass mfn 0 to mfn_valid() with your proposed adjustment, the
> result of the subtraction would be:
> 
> 0 - ~0UL == 1
> 
> Which wouldn't satisfy the >= condition, and hence pfn 0 would be
> reported as a valid mfn.  I think we need to keep both sides of the
> check.

Hmm, right - I keep forgetting that the start of a pfn_bases[x] isn't
necessarily a valid page itself.
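
For completeness, a tiny stand-alone sketch of that arithmetic, using the
values from your example (a pfn_bases[] slot left at ~0UL and a region size
of 0x40000; the local names are just for the snippet):

    unsigned long mfn = 0, base = ~0UL, region_size = 0x40000;

    /* One-sided, wrap-based check: 0 - ~0UL wraps to 1, and 1 < region_size,
     * so mfn 0 would wrongly pass as valid. */
    bool one_sided = mfn - base >= region_size;                /* false */

    /* Two-sided check: the "mfn < base" half rejects it. */
    bool two_sided = mfn < base || mfn >= base + region_size;  /* true */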

>>> +static void __init cf_check swp_node(void *a, void *b, size_t size)
>>> +{
>>> +    SWAP(a, b);
>>
>> This doesn't look right - you swap a and b, not what they point to.
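
Presumably what's wanted is to swap the pointed-to array elements. A minimal
sketch of one way to do that (assuming the sorted array holds the
struct pfn_range entries discussed further down):

    static void __init cf_check swp_node(void *a, void *b, size_t size)
    {
        /* Swap the ranges the pointers refer to, not the pointers themselves. */
        SWAP(*(struct pfn_range *)a, *(struct pfn_range *)b);
    }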
>>
>>> +static bool __init pfn_offset_sanitize_ranges(void)
>>> +{
>>> +    unsigned int i = 0;
>>> +
>>> +    if ( nr_ranges == 1 )
>>> +    {
>>> +        ASSERT(PFN_TBL_IDX(ranges[0].base) ==
>>> +               PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));
>>
>> I think this points out a naming issue in patch 2: "base" and "size" look
>> as if these were address / byte count, when they're PFN / page count.
>> Which caught my eye because of the values being passed to something that
>> actually wants a PFN (and hence looked wrong at the first glance).
> 
> The struct name is pfn_range, I could rename the fields to base_pfn
> and pages or similar, but my impression was that the struct name was
> enough of a pointer that those are PFN ranges.

Problem being that the struct name isn't anywhere in sight here.

>  Do you have any
> alternative proposal about how to name those?

Your suggested new naming looks good to me.
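
I.e. something along these lines (just a sketch, the exact names are up to
you):

    struct pfn_range {
        unsigned long base_pfn; /* first PFN covered by the range */
        unsigned long pages;    /* number of pages in the range */
    };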

>>> +        return true;
>>> +    }
>>> +
>>> +    while ( i + 1 < nr_ranges )
>>> +    {
>>> +        /*
>>> +         * Ensure ranges [start, end] use the same offset table index.  Should
>>> +         * be guaranteed by the logic that calculates the pfn shift.
>>> +         */
>>> +        if ( PFN_TBL_IDX(ranges[i].base) !=
>>> +             PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
>>> +             PFN_TBL_IDX(ranges[i + 1].base) !=
>>> +             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
>>
>> It feels a little odd to re-do the 2nd half of the checking here on the next
>> iteration, when the table wouldn't have changed when ...
>>
>>> +        {
>>> +            ASSERT_UNREACHABLE();
>>> +            return false;
>>> +        }
>>> +
>>> +        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
>>> +        {
>>> +            i++;
>>> +            continue;
>>
>> ... taking this path. Could I talk you into moving the latter half ...
>>
>>> +        }
>>
>> ... here? If that's needed at all, as ...
>>
>>> +        /* Merge ranges with the same table index. */
>>> +        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
>>> +                         ranges[i].base;
>>
>> ... the new range should also fulfill the requirement. Just that the last
>> such range then wouldn't be checked, unless ...
>>
>>> +        memmove(&ranges[i + 1], &ranges[i + 2],
>>> +                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
>>> +        nr_ranges--;
>>> +    }
>>
>> ... that checking was put past the loop. Which then would allow to remove
>> the special casing of nr_ranges == 1 at the top of the function: You'd
>> uniformly check the ranges[nr_ranges - 1] here.
> 
> What about doing it like:
> 
> static bool __init pfn_offset_sanitize_ranges(void)
> {
>     unsigned int i = 0;
> 
>     if ( PFN_TBL_IDX(ranges[0].base) !=
>          PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) )
>     {
>         ASSERT_UNREACHABLE();
>         return false;
>     }
> 
>     while ( i + 1 < nr_ranges )
>     {
>         /*
>          * Ensure ranges [start, end] use the same offset table index.  Should
>          * be guaranteed by the logic that calculates the pfn shift.
>          */
>         if ( PFN_TBL_IDX(ranges[i + 1].base) !=
>              PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
>         {
>             ASSERT_UNREACHABLE();
>             return false;
>         }
> 
>         if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
>         {
>             i++;
>             continue;
>         }
> 
>         /* Merge ranges with the same table index. */
>         ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
>                          ranges[i].base;
>         memmove(&ranges[i + 1], &ranges[i + 2],
>                 (nr_ranges - (i + 2)) * sizeof(ranges[0]));
>         nr_ranges--;
>     }
> 
>     return true;
> }
> 
> I've pulled the index 0 check ahead of the loop, which then covers for
> the case where nr_ranges == 1.  There's also no duplicate checking of
> the ranges, since the range at i + 1 will always be a non-checked one;
> either because the array has been shifted as a result of a range
> merging, or the index has been advanced.

Looks good, thanks.

>>> +        ranges[i].size = start + ranges[i].size - ranges[i].base;
>>> +        ranges[i - 1].size = ranges[i].base + ranges[i].size -
>>> +                             ranges[i - 1].base;
>>> +
>>> +        if ( i + 1 < nr_ranges )
>>> +            memmove(&ranges[i], &ranges[i + 1],
>>> +                    (nr_ranges - (i + 1)) * sizeof(ranges[0]));
>>> +        else /* last range */
>>> +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
>>> +        nr_ranges--;
>>> +        i--;
>>> +    }
>>> +
>>> +    /*
>>> +     * Populate a mask with the non-equal bits of the different ranges, do this
>>> +     * to calculate the maximum PFN shift to use as the lookup table index.
>>> +     */
>>> +    for ( i = 0; i < nr_ranges; i++ )
>>> +        for ( unsigned int j = 0; j < nr_ranges; j++ )
>>> +            idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
>>> +
>>> +    if ( !idx_mask )
>>> +        /* Single region case. */
>>> +        pfn_index_shift = flsl(mask);
>>> +    else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
>>> +        /* The changed mask fits in the table index width. */
>>> +        pfn_index_shift = ffsl(idx_mask) - 1;
>>> +    else
>>> +        /* Changed mask is wider than array size, use most significant bits. */
>>> +        pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;
>>
>> Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER
>> may yield better results?
> 
> What about:
> 
> printk(XENLOG_DEBUG
>        "PFN compression table index truncated, requires order %u\n",
>        flsl(idx_mask) - ffsl(idx_mask) + 1);

Again, fine with me.
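
Just to illustrate with made-up numbers: if idx_mask had bits 19-22 set,
flsl() would return 23 and ffsl() 20, so with CONFIG_PDX_OFFSET_TBL_ORDER at
3 the varying bits wouldn't fit, pfn_index_shift would become 23 - 3 = 20
(dropping bit 19 from the index), and the message would then report that
order 4 avoids the truncation.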

Jan
