On 07/11/2019 07:36, Jan Beulich wrote:
> On 06.11.2019 18:31, Andrew Cooper wrote:
>> On 06/11/2019 15:16, Jan Beulich wrote:
>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>> the PCI devices lock held. The check occurring only when the mode
>>> actually needs updating, the violation of this rule by the majority
>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>> to do away with on-demand creation of IOMMU page tables.
>>>
>>> Unfortunately the only half way reasonable fix to this that I could
>>> come up with requires more re-work than would seem desirable at this
>>> time of the release process, but addressing the issue seems
>>> unavoidable to me as its manifestation is a regression from the
>>> IOMMU page table setup re-work. The change also isn't without risk
>>> of further regressions - if in patch 2 I've missed a code path that
>>> would also need to invoke the new hook, then this might mean non-
>>> working guests (with passed-through devices on AMD hardware).
>>>
>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>> 2: introduce GFN notification for translated domains
>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>> Having now looked at all three, why don't we just delete the dynamic
>> height of AMD IOMMU pagetables?
>>
>> This series looks suspiciously like it is adding new common
>> infrastructure to work around the fact we're doing something fairly dumb
>> to being with.
>>
>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>> and a substantial reduction in the complexity of the IOMMU code.
> Yet an additional level of page walks hardware has to perform. Also
> 4 levels won't cover all possible 52 address bits. And finally, the
> more applicable your "substantial reduction", the less suitable such
> a change may be at this point of the release (but I didn't look at
> this side of things in any detail, so it may well not be an issue).

There is, in practice, no such thing as an HVM guest using 2 levels. 
The VRAM just below the 4G boundary will force a resize to 3 levels
during domain construction, and as a 1-line fix for 4.13, this probably
isn't the worst idea going.

There are no AMD systems which support >48 bit PA space, so 4 levels is
sufficient for now, but fundamentally details such as the size of GPA
space should be specified in domain_create() and remain static for the
lifetime of the domain.

As far as I can tell, it is only AMD systems with IOMMUs which permit
the PA space to be variable, and I still can't help but feeling that
this series is attempting to work around a problem we shouldn't have in
the first place.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to