Hi Alexey, On 22/03/18 08:45, Alexey Kardashevskiy wrote: > On 22/3/18 2:29 am, Alex Williamson wrote: >> On Mon, 19 Mar 2018 21:49:58 +0100 >> Auger Eric <eric.au...@redhat.com> wrote: >> >>> Hi Alexey, >>> >>> On 09/02/18 08:55, Alexey Kardashevskiy wrote: >>>> At the moment if vfio_memory_listener is registered in the system memory >>>> address space, it maps/unmaps every RAM memory region for DMA. >>>> It expects system page size aligned memory sections so vfio_dma_map >>>> would not fail and so far this has been the case. A mapping failure >>>> would be fatal. A side effect of such behavior is that some MMIO pages >>>> would not be mapped silently. >>>> >>>> However we are going to change MSIX BAR handling so we will end having >>>> non-aligned sections in vfio_memory_listener (more details is in >>>> the next patch) and vfio_dma_map will exit QEMU. >>>> >>>> In order to avoid fatal failures on what previously was not a failure and >>>> was just silently ignored, this checks the section alignment to >>>> the smallest supported IOMMU page size and prints an error if not aligned; >>>> it also prints an error if vfio_dma_map failed despite the page size check. >>>> Both errors are not fatal; only MMIO RAM regions are checked >>>> (aka "RAM device" regions). >>>> >>>> If the amount of errors printed is overwhelming, the MSIX relocation >>>> could be used to avoid excessive error output. >>>> >>>> This is unlikely to cause any behavioral change. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>> --- >>>> hw/vfio/common.c | 55 >>>> +++++++++++++++++++++++++++++++++++++++++++++++++------ >>>> 1 file changed, 49 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index f895e3c..736f271 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -544,18 +544,40 @@ static void vfio_listener_region_add(MemoryListener >>>> *listener, >>>> >>>> llsize = int128_sub(llend, int128_make64(iova)); >>>> >>>> + if (memory_region_is_ram_device(section->mr)) { >>>> + hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1; >>>> + >>>> + if ((iova & pgmask) || (llsize & pgmask)) { >>>> + error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx >>>> + " is not aligned to 0x%"HWADDR_PRIx >>>> + " and cannot be mapped for DMA", >>>> + section->offset_within_region, >>>> + int128_getlo(section->size), >>> Didn't you want to print the section->offset_within_address_space >>> instead of section->offset_within_region? > > > I do, my bad - this offset_within_region is a leftover from an earlier > version, iova is offset_within_address_space essentially. > > >>> >>> For an 1000e card*, with a 64kB page, it outputs: >>> "Region 0x50..0xffb0 is not aligned to 0x10000 and cannot be mapped for DMA" >>> >>> an absolute gpa range would be simpler I think. >> >> I agree, the offset within the address space seems like it'd be much >> easier to map back to the device. Since we're handling it as >> non-fatal, perhaps we can even add "MMIO Region" to give the user a bit >> more context where the issue occurs. Alexey, do you want to submit a >> follow-up patch, or Eric, you may have the best resources to tune the >> message. Thanks, > > > I can if somebody gives a hint about what needs to be tuned in the message. > I am also fine if Eric does this too :)
I propose to send a follow-up patch as I have the proper HW to test it now. Thanks Eric > > >> >> Alex >> >>> * where Region 3: Memory at 100e0000 (32-bit, non-prefetchable) >>> [size=16K] contains the MSIX table >>> >>> Capabilities: [a0] MSI-X: Enable+ Count=5 Masked- >>> Vector table: BAR=3 offset=00000000 >>> PBA: BAR=3 offset=00002000 >>> >>> Thanks >>> >>> Eric >>>> + pgmask + 1); >>>> + return; >>>> + } >>>> + } >>>> + >>>> ret = vfio_dma_map(container, iova, int128_get64(llsize), >>>> vaddr, section->readonly); >>>> if (ret) { >>>> error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", " >>>> "0x%"HWADDR_PRIx", %p) = %d (%m)", >>>> container, iova, int128_get64(llsize), vaddr, ret); >>>> + if (memory_region_is_ram_device(section->mr)) { >>>> + /* Allow unexpected mappings not to be fatal for RAM devices >>>> */ >>>> + return; >>>> + } >>>> goto fail; >>>> } >>>> >>>> return; >>>> >>>> fail: >>>> + if (memory_region_is_ram_device(section->mr)) { >>>> + error_report("failed to vfio_dma_map. pci p2p may not work"); >>>> + return; >>>> + } >>>> /* >>>> * On the initfn path, store the first error in the container so we >>>> * can gracefully fail. Runtime, there's not much we can do other >>>> @@ -577,6 +599,7 @@ static void vfio_listener_region_del(MemoryListener >>>> *listener, >>>> hwaddr iova, end; >>>> Int128 llend, llsize; >>>> int ret; >>>> + bool try_unmap = true; >>>> >>>> if (vfio_listener_skipped_section(section)) { >>>> trace_vfio_listener_region_del_skip( >>>> @@ -629,13 +652,33 @@ static void vfio_listener_region_del(MemoryListener >>>> *listener, >>>> >>>> trace_vfio_listener_region_del(iova, end); >>>> >>>> - ret = vfio_dma_unmap(container, iova, int128_get64(llsize)); >>>> + if (memory_region_is_ram_device(section->mr)) { >>>> + hwaddr pgmask; >>>> + VFIOHostDMAWindow *hostwin; >>>> + bool hostwin_found = false; >>>> + >>>> + QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) { >>>> + if (hostwin->min_iova <= iova && end <= hostwin->max_iova) { >>>> + hostwin_found = true; >>>> + break; >>>> + } >>>> + } >>>> + assert(hostwin_found); /* or region_add() would have failed */ >>>> + >>>> + pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1; >>>> + try_unmap = !((iova & pgmask) || (llsize & pgmask)); >>>> + } >>>> + >>>> + if (try_unmap) { >>>> + ret = vfio_dma_unmap(container, iova, int128_get64(llsize)); >>>> + if (ret) { >>>> + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", " >>>> + "0x%"HWADDR_PRIx") = %d (%m)", >>>> + container, iova, int128_get64(llsize), ret); >>>> + } >>>> + } >>>> + >>>> memory_region_unref(section->mr); >>>> - if (ret) { >>>> - error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", " >>>> - "0x%"HWADDR_PRIx") = %d (%m)", >>>> - container, iova, int128_get64(llsize), ret); >>>> - } >>>> >>>> if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { >>>> vfio_spapr_remove_window(container, >>>> >> > >