x86: perform PV Dom0 mappings in batches

Roger Pau Monné Wed, 04 May 2022 06:56:25 -0700

On Wed, May 04, 2022 at 02:27:14PM +0200, Jan Beulich wrote:
> On 04.05.2022 13:20, Roger Pau Monné wrote:
> > On Wed, May 04, 2022 at 11:46:37AM +0200, Jan Beulich wrote:
> >> On 03.05.2022 16:49, Roger Pau Monné wrote:
> >>> On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
> >>> It would seem to me that doing it that way would also allow the
> >>> mappings to get established in blocks for domUs.
> >>
> >> ... then this would perhaps be possible.
> >>
> >>>> The installing of zero-ref writable types has in fact shown (observed
> >>>> while putting together the change) that despite the intention by the
> >>>> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
> >>>> sufficiently ordinary pages (at the very least initrd and P2M ones as
> >>>> well as pages that are part of the initial allocation but not part of
> >>>> the initial mapping) still have been starting out as PGT_none, meaning
> >>>> that they would have gained IOMMU mappings only the first time these
> >>>> pages would get mapped writably. Consequently an open question is
> >>>> whether iommu_memory_setup() should set the pages to PGT_writable_page
> >>>> independent of need_iommu_pt_sync().
> >>>
> >>> I think I'm confused, doesn't the setting of PGT_writable_page happen
> >>> as a result of need_iommu_pt_sync() and having those pages added to
> >>> the IOMMU page tables? (so they can be properly tracked and IOMMU
> >>> mappings are removed if thte page is also removed)
> >>
> >> In principle yes - in guest_physmap_add_page(). But this function isn't
> >> called for the pages I did enumerate in the remark. XSA-288 really only
> >> cared about getting this right for DomU-s.
> > 
> > Would it make sense to change guest_physmap_add_page() to be able to
> > pass the page_order parameter down to iommu_map(), and then use it for
> > dom0 build instead of introducing iommu_memory_setup()?
> 
> To be quite frank: This is something that I might have been willing to
> do months ago, when this series was still fresh. If I was to start
> re-doing all of this code now, it would take far more time than it
> would have taken back then. Hence I'd like to avoid a full re-work here
> unless entirely unacceptable in the way currently done (which largely
> fits with how we have been doing Dom0 setup).


Sorry, I would have really liked to be more on time with reviews of
this, but there's always something that comes up.

> Furthermore, guest_physmap_add_page() doesn't itself call iommu_map().
> What you're suggesting would require get_page_and_type() to be able to
> work on higher-order pages. I view adjustments like this as well out
> of scope for this series.

Well, my initial thinking was to do something similar to what you
currently have in iommu_memory_setup: a direct call to iommu_map and
adjust the page types manually, but I think this will only work for
dom0 because pages are fresh at that point.  For domUs we must use
get_page_and_type so any previous mapping is also removed.

> > I think guest_physmap_add_page() will need to be adjusted at some
> > point for domUs, and hence it could be unified with dom0 usage
> > also?
> 
> As an optimization - perhaps. I view it as more important to have HVM
> guests work reasonably well (which includes the performance aspect of
> setting them up).

OK, I'm fine with focusing on HVM.

> >>>> --- a/xen/drivers/passthrough/x86/iommu.c
> >>>> +++ b/xen/drivers/passthrough/x86/iommu.c
> >>>> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
> >>>>  
> >>>>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >>>>  {
> >>>> -    unsigned long i, top, max_pfn;
> >>>> -    unsigned int flush_flags = 0;
> >>>> +    unsigned long i, top, max_pfn, start, count;
> >>>> +    unsigned int flush_flags = 0, start_perms = 0;
> >>>>  
> >>>>      BUG_ON(!is_hardware_domain(d));
> >>>>  
> >>>> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>>>       * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
> >>>>       * setting up potentially conflicting mappings here.
> >>>>       */
> >>>> -    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>>> +    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>>>  
> >>>> -    for ( ; i < top; i++ )
> >>>> +    for ( i = start, count = 0; i < top; )
> >>>>      {
> >>>>          unsigned long pfn = pdx_to_pfn(i);
> >>>>          unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
> >>>> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>>>          if ( !perms )
> >>>>              rc = 0;
> >>>>          else if ( paging_mode_translate(d) )
> >>>> +        {
> >>>>              rc = p2m_add_identity_entry(d, pfn,
> >>>>                                          perms & IOMMUF_writable ? 
> >>>> p2m_access_rw
> >>>>                                                                  : 
> >>>> p2m_access_r,
> >>>>                                          0);
> >>>> +            if ( rc )
> >>>> +                printk(XENLOG_WARNING
> >>>> +                       "%pd: identity mapping of %lx failed: %d\n",
> >>>> +                       d, pfn, rc);
> >>>> +        }
> >>>> +        else if ( pfn != start + count || perms != start_perms )
> >>>> +        {
> >>>> +        commit:
> >>>> +            rc = iommu_map(d, _dfn(start), _mfn(start), count, 
> >>>> start_perms,
> >>>> +                           &flush_flags);
> >>>> +            if ( rc )
> >>>> +                printk(XENLOG_WARNING
> >>>> +                       "%pd: IOMMU identity mapping of [%lx,%lx) 
> >>>> failed: %d\n",
> >>>> +                       d, pfn, pfn + count, rc);
> >>>> +            SWAP(start, pfn);
> >>>> +            start_perms = perms;
> >>>> +            count = 1;
> >>>> +        }
> >>>>          else
> >>>> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << 
> >>>> PAGE_ORDER_4K,
> >>>> -                           perms, &flush_flags);
> >>>> +        {
> >>>> +            ++count;
> >>>> +            rc = 0;
> >>>
> >>> Seeing as we want to process this in blocks now, I wonder whether it
> >>> would make sense to take a different approach, and use a rangeset to
> >>> track which regions need to be mapped.  What gets added would be based
> >>> on the host e820 plus the options
> >>> iommu_hwdom_{strict,inclusive,reserved}.  We would then punch holes
> >>> based on the logic in hwdom_iommu_map() and finally we could iterate
> >>> over the regions afterwards using rangeset_consume_ranges().
> >>>
> >>> Not that you strictly need to do it here, just think the end result
> >>> would be clearer.
> >>
> >> The end result might indeed be, but it would be more of a change right
> >> here. Hence I'd prefer to leave that out of the series for now.
> > 
> > OK.  I think it might be nice to add a comment in that regard, mostly
> > because I tend to forget myself.
> 
> Sure, I've added another post-commit-message remark.

Sorry for being confused, but are those reflected in the final commit
message, or in the code itself?

Thanks, Roger.

Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches

Reply via email to