Re: [Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit MMIO mapping order to 4k
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Jan Beulich
> Sent: 25 October 2018 15:28
> To: Paul Durrant
> Cc: Andrew Cooper ; Wei Liu ; xen-devel
> Subject: Re: [Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit
> MMIO mapping order to 4k
>
> >>> On 17.10.18 at 16:24, wrote:
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -2081,14 +2081,11 @@ static unsigned int mmio_order(const struct domain *d,
> >                                 unsigned long start_fn, unsigned long nr)
> >  {
> >      /*
> > -     * Note that the !iommu_use_hap_pt() here has three effects:
> > -     * - cover iommu_{,un}map_page() not having an "order" input yet,
> > -     * - exclude shadow mode (which doesn't support large MMIO mappings),
> > -     * - exclude PV guests, should execution reach this code for such.
> > -     * So be careful when altering this.
> > +     * PV guests or shadow-mode HVM guests must be restricted to 4k
> > +     * mappings.
>
> Since you've already posted a patch to add order parameters to
> IOMMU map/unmap, I'd prefer the respective part of the comment
> to go away only when the order value really can be passed through.
> This then requires permitting non-zero values only when the IOMMUs
> also allow for the respective page sizes.

Ok. I can do that. I'll have to provide some way of querying the IOMMU
implementation for which orders it supports in that case.

> I am in particular not convinced that the time needed to carry out
> the hypercall is going to be low enough even for 512 4k pages: You
> need to take into account flushes, including those potentially
> needed for ATS. You don't provide any proof that flushes are
> always delayed and batched, nor do I think this is uniformly the
> case.

In testing, doing the 512 4k loop does not seem to cause issues with
either VT-d or the AMD IOMMU. The AMD IOMMU case is still significantly
slower than VT-d though, so I'm currently experimenting with eliding
the flushing, as well as removing the page merging code, to speed it
up.

> > @@ -2096,8 +2093,12 @@ static unsigned int mmio_order(const struct domain *d,
> >       * set_typed_p2m_entry() when it needs to zap M2P entries
> >       * for a RAM range.
> >       */ &&
> > -         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
> > -         hap_has_1gb )
> > +         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) &&
> > +         (nr >> PAGE_ORDER_1G) &&
> > +         hap_has_1gb &&
> > +         /* disable 1G mappings if we need to keep the IOMMU in sync */
> > +         !need_iommu_pt_sync(d)
>
> I have to admit that I don't understand this v2 addition. How is
> keeping-in-sync relevant for 1G but not 2M pages?

I was attempting to make it obvious that, if the IOMMU mappings needed
syncing (i.e. they are not shared), we would not hand back a 1G order
(and thus would end up iterating over the 1G range in 4k chunks).

> > +         )
>
> Misplaced paren.

True.

  Paul

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit MMIO mapping order to 4k
>>> On 25.10.18 at 16:36, wrote:
> On 25/10/18 15:28, Jan Beulich wrote:
>>>>> On 17.10.18 at 16:24, wrote:
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -2081,14 +2081,11 @@ static unsigned int mmio_order(const struct domain *d,
>>>                                 unsigned long start_fn, unsigned long nr)
>>>  {
>>>      /*
>>> -     * Note that the !iommu_use_hap_pt() here has three effects:
>>> -     * - cover iommu_{,un}map_page() not having an "order" input yet,
>>> -     * - exclude shadow mode (which doesn't support large MMIO mappings),
>>> -     * - exclude PV guests, should execution reach this code for such.
>>> -     * So be careful when altering this.
>>> +     * PV guests or shadow-mode HVM guests must be restricted to 4k
>>> +     * mappings.
>> Since you've already posted a patch to add order parameters to
>> IOMMU map/unmap, I'd prefer the respective part of the comment
>> to go away only when the order value really can be passed through.
>> This then requires permitting non-zero values only when the IOMMUs
>> also allow for the respective page sizes.
>>
>> I am in particular not convinced that the time needed to carry out
>> the hypercall is going to be low enough even for 512 4k pages: You
>> need to take into account flushes, including those potentially
>> needed for ATS. You don't provide any proof that flushes are
>> always delayed and batched, nor do I think this is uniformly the
>> case.
>
> I haven't had time to pick this back up since v1.
>
> The long and the short of it is that we allow order 1G loops for regular
> RAM, even in !shared_pt mode.

Do we? CONFIG_DOMU_MAX_ORDER is 9 for both ARM and x86. That still
allows 2M loops, which - as said - I'm worried about for the time
non-batched TLB flushes take.

> From an "interaction with the IOMMU" point of view, mappings over
> regular RAM are no different to mappings over MMIO, so they should
> behave consistently.

I agree in general.
Jan
Re: [Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit MMIO mapping order to 4k
On 25/10/18 15:28, Jan Beulich wrote:
>>>> On 17.10.18 at 16:24, wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -2081,14 +2081,11 @@ static unsigned int mmio_order(const struct domain *d,
>>                                unsigned long start_fn, unsigned long nr)
>>  {
>>      /*
>> -     * Note that the !iommu_use_hap_pt() here has three effects:
>> -     * - cover iommu_{,un}map_page() not having an "order" input yet,
>> -     * - exclude shadow mode (which doesn't support large MMIO mappings),
>> -     * - exclude PV guests, should execution reach this code for such.
>> -     * So be careful when altering this.
>> +     * PV guests or shadow-mode HVM guests must be restricted to 4k
>> +     * mappings.
> Since you've already posted a patch to add order parameters to
> IOMMU map/unmap, I'd prefer the respective part of the comment
> to go away only when the order value really can be passed through.
> This then requires permitting non-zero values only when the IOMMUs
> also allow for the respective page sizes.
>
> I am in particular not convinced that the time needed to carry out
> the hypercall is going to be low enough even for 512 4k pages: You
> need to take into account flushes, including those potentially
> needed for ATS. You don't provide any proof that flushes are
> always delayed and batched, nor do I think this is uniformly the
> case.

I haven't had time to pick this back up since v1.

The long and the short of it is that we allow order 1G loops for regular
RAM, even in !shared_pt mode.

From an "interaction with the IOMMU" point of view, mappings over
regular RAM are no different to mappings over MMIO, so they should
behave consistently.

~Andrew
Re: [Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit MMIO mapping order to 4k
>>> On 17.10.18 at 16:24, wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2081,14 +2081,11 @@ static unsigned int mmio_order(const struct domain *d,
>                                unsigned long start_fn, unsigned long nr)
>  {
>      /*
> -     * Note that the !iommu_use_hap_pt() here has three effects:
> -     * - cover iommu_{,un}map_page() not having an "order" input yet,
> -     * - exclude shadow mode (which doesn't support large MMIO mappings),
> -     * - exclude PV guests, should execution reach this code for such.
> -     * So be careful when altering this.
> +     * PV guests or shadow-mode HVM guests must be restricted to 4k
> +     * mappings.

Since you've already posted a patch to add order parameters to
IOMMU map/unmap, I'd prefer the respective part of the comment
to go away only when the order value really can be passed through.
This then requires permitting non-zero values only when the IOMMUs
also allow for the respective page sizes.

I am in particular not convinced that the time needed to carry out
the hypercall is going to be low enough even for 512 4k pages: You
need to take into account flushes, including those potentially
needed for ATS. You don't provide any proof that flushes are
always delayed and batched, nor do I think this is uniformly the
case.

> @@ -2096,8 +2093,12 @@ static unsigned int mmio_order(const struct domain *d,
>       * set_typed_p2m_entry() when it needs to zap M2P entries
>       * for a RAM range.
>       */ &&
> -         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
> -         hap_has_1gb )
> +         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) &&
> +         (nr >> PAGE_ORDER_1G) &&
> +         hap_has_1gb &&
> +         /* disable 1G mappings if we need to keep the IOMMU in sync */
> +         !need_iommu_pt_sync(d)

I have to admit that I don't understand this v2 addition. How is
keeping-in-sync relevant for 1G but not 2M pages?

> +         )

Misplaced paren.

Jan
[Xen-devel] [PATCH v2] x86/mm/p2m: don't needlessly limit MMIO mapping order to 4k
The P2M common code currently restricts the MMIO mapping order of any
domain with IOMMU mappings, but not using shared tables, to 4k. This
has been shown to have a huge performance cost when passing through a
PCI device with a very large BAR (e.g. an NVIDIA P40), increasing guest
boot time from ~20s to several minutes when iommu=no-sharept is
specified on the Xen command line.

The limitation was added by commit c3c756bd "x86/p2m: use large pages
for MMIO mappings". However, the underlying implementations of
p2m->set_entry for Intel and AMD are coded to cope with mapping orders
higher than 4k, even though the IOMMU mapping function is itself
currently limited to 4k, so there is no real need to limit the order
passed into the method, other than to avoid a potential DoS caused by
a long-running hypercall.

In practice, mmio_order() already strictly disallows 1G mappings since
the if clause in question starts with:

    if ( 0
         /*
          * Don't use 1Gb pages, to limit the iteration count in
          * set_typed_p2m_entry() when it needs to zap M2P entries
          * for a RAM range.
          */
         &&

With this patch applied (and hence 2M mappings in use) the VM boot time
is restored to something reasonable. Not as fast as without
iommu=no-sharept, but within a few seconds of it.

NOTE: This patch takes the opportunity to shorten a couple of > 80
      character lines in the code.

Signed-off-by: Paul Durrant
Acked-by: George Dunlap
---
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Wei Liu

v2:
 - Add an extra check to the if clause disallowing 1G mappings to make
   sure they are not used if need_iommu_pt_sync() is true, even though
   the check is currently moot (see main comment).
---
 xen/arch/x86/mm/p2m.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index a00a3c1bff..f972b4819d 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2081,14 +2081,11 @@ static unsigned int mmio_order(const struct domain *d,
                                unsigned long start_fn, unsigned long nr)
 {
     /*
-     * Note that the !iommu_use_hap_pt() here has three effects:
-     * - cover iommu_{,un}map_page() not having an "order" input yet,
-     * - exclude shadow mode (which doesn't support large MMIO mappings),
-     * - exclude PV guests, should execution reach this code for such.
-     * So be careful when altering this.
+     * PV guests or shadow-mode HVM guests must be restricted to 4k
+     * mappings.
      */
-    if ( !iommu_use_hap_pt(d) ||
-         (start_fn & ((1UL << PAGE_ORDER_2M) - 1)) || !(nr >> PAGE_ORDER_2M) )
+    if ( !hap_enabled(d) || (start_fn & ((1UL << PAGE_ORDER_2M) - 1)) ||
+         !(nr >> PAGE_ORDER_2M) )
         return PAGE_ORDER_4K;

     if ( 0 /*
@@ -2096,8 +2093,12 @@ static unsigned int mmio_order(const struct domain *d,
            * set_typed_p2m_entry() when it needs to zap M2P entries
            * for a RAM range.
            */ &&
-         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
-         hap_has_1gb )
+         !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) &&
+         (nr >> PAGE_ORDER_1G) &&
+         hap_has_1gb &&
+         /* disable 1G mappings if we need to keep the IOMMU in sync */
+         !need_iommu_pt_sync(d)
+       )
         return PAGE_ORDER_1G;

     if ( hap_has_2mb )
--
2.11.0