mm: adjust loop in arch_init_memory() to iterate over the PDX space

Jan Beulich Wed, 06 Aug 2025 03:36:55 -0700

On 06.08.2025 10:11, Jan Beulich wrote:
> On 05.08.2025 17:27, Roger Pau Monné wrote:
>> On Tue, Aug 05, 2025 at 02:38:38PM +0200, Jan Beulich wrote:
>>> On 05.08.2025 11:52, Roger Pau Monne wrote:
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info 
>>>> *page)
>>>>  
>>>>  void __init arch_init_memory(void)
>>>>  {
>>>> -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
>>>> +    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, 
>>>> pdx;
>>>>  
>>>>      /*
>>>>       * Basic guest-accessible flags:
>>>> @@ -328,9 +328,20 @@ void __init arch_init_memory(void)
>>>>              destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
>>>>                                   (unsigned long)mfn_to_virt(ioend_pfn));
>>>>  
>>>> -        /* Mark as I/O up to next RAM region. */
>>>> -        for ( ; pfn < rstart_pfn; pfn++ )
>>>> +        /*
>>>> +         * Mark as I/O up to next RAM region.  Iterate over the PDX space 
>>>> to
>>>> +         * skip holes which would always fail the mfn_valid() check.
>>>> +         *
>>>> +         * pfn_to_pdx() requires a valid (iow: RAM) PFN to convert to PDX,
>>>> +         * hence provide pfn - 1, which is the tailing PFN from the last 
>>>> RAM
>>>> +         * range, or pdx 0 if the input pfn is 0.
>>>> +         */
>>>> +        for ( pdx = pfn ? pfn_to_pdx(pfn - 1) + 1 : 0;
>>>> +              pdx < pfn_to_pdx(rstart_pfn);
>>>> +              pdx++ )
>>>>          {
>>>> +            pfn = pdx_to_pfn(pdx);
>>>> +
>>>>              if ( !mfn_valid(_mfn(pfn)) )
>>>>                  continue;
>>>>  
>>>
>>> As much as I would have liked to ack this, I fear there's another caveat 
>>> here:
>>> At the top of the loop we check not only for RAM, but also for UNUSABLE. The
>>> latter, like RAM, shouldn't be marked I/O, but we also can't use PFN <-> PDX
>>> transformations on any such page.
>>
>> Right you are.  I'm not sure however why we do this - won't we want
>> the mappings of UNUSABLE regions also be removed from the Xen
>> page-tables? (but not marked as IO)
> 
> Yes, I think this is a flaw in current code. Perhaps it was (wrongly) assumed
> that no UNUSABLE regions would ever exist this low in a memory map? Imo we 
> want
> to deal with this in two steps - first sort the UNUSABLE issue, then improve
> the dealing with what is passed to assign_io_page().
> 
> While there we may also want to find a way to tie together the 16Mb boundary
> checks - the 16UL isn't properly connected to the BOOTSTRAP_MAP_BASE 
> definition
> in setup.c. Yet then: Am I overlooking something, or is the 16Mb boundary not
> really special anymore? I.e. could e.g. BOOTSTRAP_MAP_BASE perhaps be moved 
> (at
> least in principle), either almost arbitrarily up (within the low 4Gb), or 
> down
> as much as to the 2Mb boundary? The relevant aspect here would be that the
> comment saying "the statically-initialised 1-16MB mapping area" looks to be
> stale, as of 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M
> superpage mappings"). If there are excess mappings to worry about, those may
> nowadays well live above the 16Mb boundary (because of it being 2Mb mappings
> that head.S inserts into l2_directmap[]).


Hmm, extending this to beyond 16M collides with mappings done by 
acpi_dmar_init(),
erst_init(), and acpi_hest_init(). Luckily efi_init_memory() runs only 
afterwards.
While I guess I could limit this to the space 2M mappings were done for in 
head.S,
theoretically there could still be a collision afterwards. So I think we need to
either somehow exclude such mappings (might end up fragile), or stop (ab)using
the directmap there. Thoughts?

Jan

Re: [PATCH v4 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space

Reply via email to