Re: [Qemu-devel] [RFC v2 PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Thursday, August 17, 2017 9:03 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 PATCH 2/2] virtio-iommu: vfio integration > with virtio-iommu > > Hi Bharat, > > On 14/07/2017 09:25, Bharat Bhushan wrote: > > This patch allows virtio-iommu protection for PCI device-passthrough. > > > > MSI region is mapped by current version of virtio-iommu driver. > > This MSI region mapping in not getting pushed on hw iommu > > vfio_get_vaddr() allows only ram-region. > Why is it an issue. As far as I understand this is not needed actually as the > guest MSI doorbell is not used by the host. > This RFC patch needed > > to be improved. > > > > Signed-off-by: Bharat Bhushan > > --- > > v1-v2: > > - Added trace events > > > > hw/virtio/trace-events | 5 ++ > > hw/virtio/virtio-iommu.c | 133 > +++ > > include/hw/virtio/virtio-iommu.h | 6 ++ > > 3 files changed, 144 insertions(+) > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index > > 9196b63..3a3968b 100644 > > --- a/hw/virtio/trace-events > > +++ b/hw/virtio/trace-events > > @@ -39,3 +39,8 @@ virtio_iommu_unmap_left_interval(uint64_t low, > > uint64_t high, uint64_t next_low, > virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t > next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], > new interval=[0x%"PRIx64",0x%"PRIx64"]" > > virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap > inc [0x%"PRIx64",0x%"PRIx64"]" > > virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, > uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d" > > +virtio_iommu_notify_flag_add(const char *iommu) "Add virtio-iommu > notifier node for memory region %s" > > +virtio_iommu_notify_flag_del(const char *iommu) "Del virtio-iommu > notifier node for memory region %s" > > +virtio_iommu_remap(hwaddr iova, hwaddr pa, hwaddr size) > "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > +virtio_iommu_map_region(hwaddr iova, hwaddr paddr, hwaddr > map_size) "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > +virtio_iommu_unmap_region(hwaddr iova, hwaddr paddr, hwaddr > map_size) "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index > > cd188fc..61f33cb 100644 > > --- a/hw/virtio/virtio-iommu.c > > +++ b/hw/virtio/virtio-iommu.c > > @@ -129,6 +129,48 @@ static gint interval_cmp(gconstpointer a, > gconstpointer b, gpointer user_data) > > } > > } > > > > +static void virtio_iommu_map_region(VirtIOIOMMU *s, hwaddr iova, > hwaddr paddr, > > +hwaddr size, int map) > bool map? > > the function name is a bit misleading to me and does not really explain what > the function does. It "notifies" so why not using something like > virtio_iommu_map_notify and virtio_iommu_unmap_notify. I tend to think > having separate proto is cleaner and more standard. > > Binding should happen on a specific IOMMUmemoryRegion (see next > comment). > > > +{ > > +VirtioIOMMUNotifierNode *node; > > +IOMMUTLBEntry entry; > > +uint64_t map_size = (1 << 12); > TODO: handle something else than 4K page. > > +int npages; > > +int i; > > + > > +npages = size / map_size; > > +entry.target_as = &address_space_memory; > > +entry.addr_mask = map_size - 1; > > + > > +for (i = 0; i < npages; i++) { > Although I understand we currently fail checking the consistency between > pIOMMU and vIOMMU page sizes, this will be very slow for guest DPDK use > case where hugepages are used. > > Why not directly using the full size? vfio_iommu_map_notify will report > errors if vfio_dma_map/unmap() fail. > > +entry.iova = iova + (i * map_size); > > +if (map) { > >
Re: [Qemu-devel] [RFC v2 PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu
> -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Thursday, August 17, 2017 9:03 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 PATCH 2/2] virtio-iommu: vfio integration > with virtio-iommu > > Hi Bharat, > > On 14/07/2017 09:25, Bharat Bhushan wrote: > > This patch allows virtio-iommu protection for PCI device-passthrough. > > > > MSI region is mapped by current version of virtio-iommu driver. > > This MSI region mapping in not getting pushed on hw iommu > > vfio_get_vaddr() allows only ram-region. > Why is it an issue. As far as I understand this is not needed actually as the > guest MSI doorbell is not used by the host. > This RFC patch needed > > to be improved. > > > > Signed-off-by: Bharat Bhushan > > --- > > v1-v2: > > - Added trace events > > > > hw/virtio/trace-events | 5 ++ > > hw/virtio/virtio-iommu.c | 133 > +++ > > include/hw/virtio/virtio-iommu.h | 6 ++ > > 3 files changed, 144 insertions(+) > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index > > 9196b63..3a3968b 100644 > > --- a/hw/virtio/trace-events > > +++ b/hw/virtio/trace-events > > @@ -39,3 +39,8 @@ virtio_iommu_unmap_left_interval(uint64_t low, > > uint64_t high, uint64_t next_low, > virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t > next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], > new interval=[0x%"PRIx64",0x%"PRIx64"]" > > virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap > inc [0x%"PRIx64",0x%"PRIx64"]" > > virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, > uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d" > > +virtio_iommu_notify_flag_add(const char *iommu) "Add virtio-iommu > notifier node for memory region %s" > > +virtio_iommu_notify_flag_del(const char *iommu) "Del virtio-iommu > notifier node for memory region %s" > > +virtio_iommu_remap(hwaddr iova, hwaddr pa, hwaddr size) > "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > +virtio_iommu_map_region(hwaddr iova, hwaddr paddr, hwaddr > map_size) "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > +virtio_iommu_unmap_region(hwaddr iova, hwaddr paddr, hwaddr > map_size) "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index > > cd188fc..61f33cb 100644 > > --- a/hw/virtio/virtio-iommu.c > > +++ b/hw/virtio/virtio-iommu.c > > @@ -129,6 +129,48 @@ static gint interval_cmp(gconstpointer a, > gconstpointer b, gpointer user_data) > > } > > } > > > > +static void virtio_iommu_map_region(VirtIOIOMMU *s, hwaddr iova, > hwaddr paddr, > > +hwaddr size, int map) > bool map? > > the function name is a bit misleading to me and does not really explain what > the function does. It "notifies" so why not using something like > virtio_iommu_map_notify and virtio_iommu_unmap_notify. I tend to think > having separate proto is cleaner and more standard. > > Binding should happen on a specific IOMMUmemoryRegion (see next > comment). > > > +{ > > +VirtioIOMMUNotifierNode *node; > > +IOMMUTLBEntry entry; > > +uint64_t map_size = (1 << 12); > TODO: handle something else than 4K page. > > +int npages; > > +int i; > > + > > +npages = size / map_size; > > +entry.target_as = &address_space_memory; > > +entry.addr_mask = map_size - 1; > > + > > +for (i = 0; i < npages; i++) { > Although I understand we currently fail checking the consistency between > pIOMMU and vIOMMU page sizes, this will be very slow for guest DPDK use > case where hugepages are used. > > Why not directly using the full size? vfio_iommu_map_notify will report > errors if vfio_dma_map/unmap() fail. Yes, just for understanding, VFIO/IOMMU will map at page granularity and QEMU rely on that? > >
Re: [Qemu-devel] [RFC v2 PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu
Hi Bharat, On 14/07/2017 09:25, Bharat Bhushan wrote: > This patch allows virtio-iommu protection for PCI > device-passthrough. > > MSI region is mapped by current version of virtio-iommu driver. > This MSI region mapping in not getting pushed on hw iommu > vfio_get_vaddr() allows only ram-region. Why is it an issue. As far as I understand this is not needed actually as the guest MSI doorbell is not used by the host. This RFC patch needed > to be improved. > > Signed-off-by: Bharat Bhushan > --- > v1-v2: > - Added trace events > > hw/virtio/trace-events | 5 ++ > hw/virtio/virtio-iommu.c | 133 > +++ > include/hw/virtio/virtio-iommu.h | 6 ++ > 3 files changed, 144 insertions(+) > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events > index 9196b63..3a3968b 100644 > --- a/hw/virtio/trace-events > +++ b/hw/virtio/trace-events > @@ -39,3 +39,8 @@ virtio_iommu_unmap_left_interval(uint64_t low, uint64_t > high, uint64_t next_low, > virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t > next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new > interval=[0x%"PRIx64",0x%"PRIx64"]" > virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc > [0x%"PRIx64",0x%"PRIx64"]" > virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, > uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d" > +virtio_iommu_notify_flag_add(const char *iommu) "Add virtio-iommu notifier > node for memory region %s" > +virtio_iommu_notify_flag_del(const char *iommu) "Del virtio-iommu notifier > node for memory region %s" > +virtio_iommu_remap(hwaddr iova, hwaddr pa, hwaddr size) "iova=0x%"PRIx64" > pa=0x%" PRIx64" size=0x%"PRIx64"" > +virtio_iommu_map_region(hwaddr iova, hwaddr paddr, hwaddr map_size) > "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > +virtio_iommu_unmap_region(hwaddr iova, hwaddr paddr, hwaddr map_size) > "iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64"" > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c > index cd188fc..61f33cb 100644 > --- a/hw/virtio/virtio-iommu.c > +++ b/hw/virtio/virtio-iommu.c > @@ -129,6 +129,48 @@ static gint interval_cmp(gconstpointer a, gconstpointer > b, gpointer user_data) > } > } > > +static void virtio_iommu_map_region(VirtIOIOMMU *s, hwaddr iova, hwaddr > paddr, > +hwaddr size, int map) bool map? the function name is a bit misleading to me and does not really explain what the function does. It "notifies" so why not using something like virtio_iommu_map_notify and virtio_iommu_unmap_notify. I tend to think having separate proto is cleaner and more standard. Binding should happen on a specific IOMMUmemoryRegion (see next comment). > +{ > +VirtioIOMMUNotifierNode *node; > +IOMMUTLBEntry entry; > +uint64_t map_size = (1 << 12); TODO: handle something else than 4K page. > +int npages; > +int i; > + > +npages = size / map_size; > +entry.target_as = &address_space_memory; > +entry.addr_mask = map_size - 1; > + > +for (i = 0; i < npages; i++) { Although I understand we currently fail checking the consistency between pIOMMU and vIOMMU page sizes, this will be very slow for guest DPDK use case where hugepages are used. Why not directly using the full size? vfio_iommu_map_notify will report errors if vfio_dma_map/unmap() fail. > +entry.iova = iova + (i * map_size); > +if (map) { > +trace_virtio_iommu_map_region(iova, paddr, map_size); > +entry.perm = IOMMU_RW; > +entry.translated_addr = paddr + (i * map_size); > +} else { > +trace_virtio_iommu_unmap_region(iova, paddr, map_size); > +entry.perm = IOMMU_NONE; > +entry.translated_addr = 0; > +} > + > +QLIST_FOREACH(node, &s->notifiers_list, next) { > +memory_region_notify_iommu(&node->iommu_dev->iommu_mr, entry); So as discussed this will notify *all* IOMMU memory regions and all their notifiers which is not what we want. You may have a look at vsmmuv3 v6 (or intel_iommu) where smmuv3_context_device_invalidate retrieves the mr from the sid. > +} > +} > +} > + > +static gboolean virtio_iommu_unmap_single(gpointer key, gpointer value, > + gpointer data) > +{ > +viommu_mapping *mapping = (viommu_mapping *) value; > +VirtIOIOMMU *s = (VirtIOIOMMU *) data; > + > +virtio_iommu_map_region(s, mapping->virt_addr, 0, mapping->size, 0); paddr=0? mapping->phys_addr as the trace() will be misleading. But as mentioned earlier better use unmap() separate function. > + > +return true; > +} > + > static int virtio_iommu_attach(VirtIOIOMMU *s, > struct virtio_iommu_req_attach *req) > { > @@ -170,10 +212,26 @@ static int virtio_iommu_detach(VirtIOIOMMU *s, > { >