Hi, On 05/18/2018 05:41 AM, Peter Xu wrote: > On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote: >> Hi Peter, >> >> On 05/04/2018 05:08 AM, Peter Xu wrote: >>> During IOVA page table walking, there is a special case when the PSI >>> covers one whole PDE (Page Directory Entry, which contains 512 Page >>> Table Entries) or more. In the past, we skip that entry and we don't >>> notify the IOMMU notifiers. This is not correct. We should send UNMAP >>> notification to registered UNMAP notifiers in this case. >>> >>> For UNMAP only notifiers, this might cause IOTLBs cached in the devices >>> even if they were already invalid. For MAP/UNMAP notifiers like >>> vfio-pci, this will cause stale page mappings. >>> >>> This special case doesn't trigger often, but it is very easy to be >>> triggered by nested device assignments, since in that case we'll >>> possibly map the whole L2 guest RAM region into the device's IOVA >>> address space (several GBs at least), which is far bigger than normal >>> kernel driver usages of the device (tens of MBs normally). >>> >>> Without this patch applied to L1 QEMU, nested device assignment to L2 >>> guests will dump some errors like: >>> >>> qemu-system-x86_64: VFIO_MAP_DMA: -17 >>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000, >>> 0x7f89a920d000) = -17 (File exists) >>> >>> Acked-by: Jason Wang <jasow...@redhat.com> >>> [peterx: rewrite the commit message] >>> Signed-off-by: Peter Xu <pet...@redhat.com> >>> --- >>> hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------ >>> 1 file changed, 30 insertions(+), 12 deletions(-) >>> >>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>> index fb31de9416..b359efd6f9 100644 >>> --- a/hw/i386/intel_iommu.c >>> +++ b/hw/i386/intel_iommu.c >>> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, >>> uint64_t iova, bool is_write, >>> >>> typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private); >>> >>> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level, >>> + vtd_page_walk_hook hook_fn, void *private) >> I find the function name a bit weird as it does not does a ptw but >> rather call a callback on an entry. vtd_callback_wrapper? > > It's a hook for the page walk process, and IMHO vtd_callback_wrapper > does not really provide any hint for the page walking. So even if you > prefer the "callback_wrapper" naming I would still more prefer: > > vtd_page_walk_callback[_wrapper] > > though if so I'd say I don't see much benefit comparing to use the old > vtd_page_walk_hook, which seems fine to me too... I preferred vtd_page_walk_hook too.
Thanks Eric > >>> +{ >>> + assert(hook_fn); >>> + trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr, >>> + entry->addr_mask, entry->perm); >>> + return hook_fn(entry, private); >>> +} >>> + >>> /** >>> * vtd_page_walk_level - walk over specific level for IOVA range >>> * >>> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, >>> uint64_t start, >>> */ >>> entry_valid = read_cur | write_cur; >>> >>> + entry.target_as = &address_space_memory; >>> + entry.iova = iova & subpage_mask; >>> + entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur); >>> + entry.addr_mask = ~subpage_mask; >>> + >>> if (vtd_is_last_slpte(slpte, level)) { >>> - entry.target_as = &address_space_memory; >>> - entry.iova = iova & subpage_mask; >>> /* NOTE: this is only meaningful if entry_valid == true */ >>> entry.translated_addr = vtd_get_slpte_addr(slpte, aw); >>> - entry.addr_mask = ~subpage_mask; >>> - entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur); >>> if (!entry_valid && !notify_unmap) { >>> trace_vtd_page_walk_skip_perm(iova, iova_next); >>> goto next; >>> } >>> - trace_vtd_page_walk_one(level, entry.iova, >>> entry.translated_addr, >>> - entry.addr_mask, entry.perm); >>> - if (hook_fn) { >>> - ret = hook_fn(&entry, private); >>> - if (ret < 0) { >>> - return ret; >>> - } >>> + ret = vtd_page_walk_one(&entry, level, hook_fn, private); >>> + if (ret < 0) { >>> + return ret; >>> } >>> } else { >>> if (!entry_valid) { >>> - trace_vtd_page_walk_skip_perm(iova, iova_next); >>> + if (notify_unmap) { >>> + /* >>> + * The whole entry is invalid; unmap it all. >>> + * Translated address is meaningless, zero it. >>> + */ >>> + entry.translated_addr = 0x0; >> do you really need to zero the translated_addr and the related comment. >> As soon as perm is NONE this should not be used? > > Yes here we can avoid setting it. However that'll make sure we don't > leak strange numbers to the below notifiers, so I would still slightly > prefer to zero it here. > >>> + ret = vtd_page_walk_one(&entry, level, hook_fn, >>> private); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + } else { >>> + trace_vtd_page_walk_skip_perm(iova, iova_next); >>> + } >>> goto next; >>> } >>> ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova, >>> >> >> Besides >> Reviewed-by: Eric Auger <eric.au...@redhat.com> > > Thanks for reviewing the series. > > Note that this v2 is obsolete, please feel free to read version 3 of > this series, which contains quite a lot of functional changes. > > Regards, >