On 05.08.2025 17:27, Roger Pau Monné wrote:
> On Tue, Aug 05, 2025 at 02:38:38PM +0200, Jan Beulich wrote:
>> On 05.08.2025 11:52, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info 
>>> *page)
>>>  
>>>  void __init arch_init_memory(void)
>>>  {
>>> -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
>>> +    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, 
>>> pdx;
>>>  
>>>      /*
>>>       * Basic guest-accessible flags:
>>> @@ -328,9 +328,20 @@ void __init arch_init_memory(void)
>>>              destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
>>>                                   (unsigned long)mfn_to_virt(ioend_pfn));
>>>  
>>> -        /* Mark as I/O up to next RAM region. */
>>> -        for ( ; pfn < rstart_pfn; pfn++ )
>>> +        /*
>>> +         * Mark as I/O up to next RAM region.  Iterate over the PDX space 
>>> to
>>> +         * skip holes which would always fail the mfn_valid() check.
>>> +         *
>>> +         * pfn_to_pdx() requires a valid (iow: RAM) PFN to convert to PDX,
>>> +         * hence provide pfn - 1, which is the tailing PFN from the last 
>>> RAM
>>> +         * range, or pdx 0 if the input pfn is 0.
>>> +         */
>>> +        for ( pdx = pfn ? pfn_to_pdx(pfn - 1) + 1 : 0;
>>> +              pdx < pfn_to_pdx(rstart_pfn);
>>> +              pdx++ )
>>>          {
>>> +            pfn = pdx_to_pfn(pdx);
>>> +
>>>              if ( !mfn_valid(_mfn(pfn)) )
>>>                  continue;
>>>  
>>
>> As much as I would have liked to ack this, I fear there's another caveat 
>> here:
>> At the top of the loop we check not only for RAM, but also for UNUSABLE. The
>> latter, like RAM, shouldn't be marked I/O, but we also can't use PFN <-> PDX
>> transformations on any such page.
> 
> Right you are.  I'm not sure however why we do this - won't we want
> the mappings of UNUSABLE regions also be removed from the Xen
> page-tables? (but not marked as IO)

Yes, I think this is a flaw in current code. Perhaps it was (wrongly) assumed
that no UNUSABLE regions would ever exist this low in a memory map? Imo we want
to deal with this in two steps - first sort the UNUSABLE issue, then improve
the dealing with what is passed to assign_io_page().

While there we may also want to find a way to tie together the 16Mb boundary
checks - the 16UL isn't properly connected to the BOOTSTRAP_MAP_BASE definition
in setup.c. Yet then: Am I overlooking something, or is the 16Mb boundary not
really special anymore? I.e. could e.g. BOOTSTRAP_MAP_BASE perhaps be moved (at
least in principle), either almost arbitrarily up (within the low 4Gb), or down
as much as to the 2Mb boundary? The relevant aspect here would be that the
comment saying "the statically-initialised 1-16MB mapping area" looks to be
stale, as of 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M
superpage mappings"). If there are excess mappings to worry about, those may
nowadays well live above the 16Mb boundary (because of it being 2Mb mappings
that head.S inserts into l2_directmap[]).

If I was to sensibly make changes to that code, I guess I first ought to find
a system which actually surfaces any (ideally "interesting") UNUSABLE ranges.
In any event, I think the cleanest thing we can do is split the loop in two,
one to deal with the removal of mappings between RAM regions up to the 4Gb
boundary, and the other to invoke assign_io_page() for holes between RAM ||
UNUSABLE ones. The latter would then be subject to your optimization. The
former could ignore the PDX aspect due to being limited to the low 4Gb, where
we never compress.

Jan

Reply via email to