On 07/11/2019 07:36, Jan Beulich wrote: > On 06.11.2019 18:31, Andrew Cooper wrote: >> On 06/11/2019 15:16, Jan Beulich wrote: >>> update_paging_mode() in the AMD IOMMU code expects to be invoked with >>> the PCI devices lock held. The check occurring only when the mode >>> actually needs updating, the violation of this rule by the majority >>> of callers did go unnoticed until per-domain IOMMU setup was changed >>> to do away with on-demand creation of IOMMU page tables. >>> >>> Unfortunately the only half way reasonable fix to this that I could >>> come up with requires more re-work than would seem desirable at this >>> time of the release process, but addressing the issue seems >>> unavoidable to me as its manifestation is a regression from the >>> IOMMU page table setup re-work. The change also isn't without risk >>> of further regressions - if in patch 2 I've missed a code path that >>> would also need to invoke the new hook, then this might mean non- >>> working guests (with passed-through devices on AMD hardware). >>> >>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page >>> 2: introduce GFN notification for translated domains >>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode >> Having now looked at all three, why don't we just delete the dynamic >> height of AMD IOMMU pagetables? >> >> This series looks suspiciously like it is adding new common >> infrastructure to work around the fact we're doing something fairly dumb >> to being with. >> >> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain, >> and a substantial reduction in the complexity of the IOMMU code. > Yet an additional level of page walks hardware has to perform. Also > 4 levels won't cover all possible 52 address bits. And finally, the > more applicable your "substantial reduction", the less suitable such > a change may be at this point of the release (but I didn't look at > this side of things in any detail, so it may well not be an issue).
There is, in practice, no such thing as an HVM guest using 2 levels. The VRAM just below the 4G boundary will force a resize to 3 levels during domain construction, and as a 1-line fix for 4.13, this probably isn't the worst idea going. There are no AMD systems which support >48 bit PA space, so 4 levels is sufficient for now, but fundamentally details such as the size of GPA space should be specified in domain_create() and remain static for the lifetime of the domain. As far as I can tell, it is only AMD systems with IOMMUs which permit the PA space to be variable, and I still can't help but feeling that this series is attempting to work around a problem we shouldn't have in the first place. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel