Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints
On 2020/6/29 下午11:49, Michael S. Tsirkin wrote:
> On Mon, Jun 29, 2020 at 05:26:03PM +0800, Jason Wang wrote:
> > On 2020/6/28 下午5:58, Michael S. Tsirkin wrote:
> > > On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote:
> > > > This patch extends the vhost IOTLB API to accept batch updating hints
> > > > from userspace. When userspace wants to update the device IOTLB in a
> > > > batch, it may do:
> > > >
> > > > 1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag
> > > > 2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE
> > > > 3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag
> > >
> > > As long as we are extending the interface, is there some way we could
> > > cut down the number of system calls needed here?
> >
> > I'm not sure it's worth doing that, since usually we have less than 10
> > regions.
>
> Well with a guest iommu I'm guessing it can go up significantly, right?

The batching flag is not mandatory, so userspace can still choose to update a single IOTLB mapping through a single system call without the batch flag.

> > A possible method is to carry multiple vhost_iotlb_msg structures in one
> > system call.
>
> That's an option. Also, can the kernel handle a batch that is too large
> by applying parts of it iteratively?

I think so, we can iterate over the vhost IOTLB messages one by one through an iov iterator.

> Or must all changes take place atomically after BATCH_END?

For changes:

- if you mean the changes in the vhost IOTLB, they are not applied atomically
- if you mean the mapping seen by the vDPA device driver, it can be atomic for a driver that implements the set_map() method, since we pass an interval tree to the device

> If atomically, we might need to limit the batch size to make sure the
> kernel can keep a copy in memory.

It's the responsibility of the guest driver to make sure there won't be a race condition between DMA and map updating, so it looks to me we don't need to care about that.

> > > > Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when
> > > > the vDPA device supports the set_map() ops.
> > > > This is useful for vDPA devices that want to know all the mappings
> > > > to tweak their own DMA translation logic.
> > > >
> > > > For vDPA devices that don't require set_map(), there is no behavior
> > > > change.
> > > >
> > > > This capability is advertised via the VHOST_BACKEND_F_IOTLB_BATCH
> > > > capability.
> > > >
> > > > Signed-off-by: Jason Wang
> > > > ---
> > > >  drivers/vhost/vdpa.c             | 30 +++---
> > > >  include/uapi/linux/vhost.h       |  2 ++
> > > >  include/uapi/linux/vhost_types.h |  7 +++
> > > >  3 files changed, 32 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > index 453057421f80..8f624bbafee7 100644
> > > > --- a/drivers/vhost/vdpa.c
> > > > +++ b/drivers/vhost/vdpa.c
> > > > @@ -56,7 +56,9 @@ enum {
> > > >  };
> > > >
> > > >  enum {
> > > > -	VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
> > > > +	VHOST_VDPA_BACKEND_FEATURES =
> > > > +		(1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
> > > > +		(1ULL << VHOST_BACKEND_F_IOTLB_BATCH),
> > > >  };
> > > >
> > > >  /* Currently, only network backend w/o multiqueue is supported. */
> > > > @@ -77,6 +79,7 @@ struct vhost_vdpa {
> > > >  	int virtio_id;
> > > >  	int minor;
> > > >  	struct eventfd_ctx *config_ctx;
> > > > +	int in_batch;
> > > >  };
> > > >
> > > >  static DEFINE_IDA(vhost_vdpa_ida);
> > > >
> > > > @@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v)
> > > >  	const struct vdpa_config_ops *ops = vdpa->config;
> > > >
> > > >  	ops->set_status(vdpa, 0);
> > > > +	v->in_batch = 0;
> > > >  }
> > > >
> > > >  static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
> > > > @@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
> > > >
> > > >  	if (ops->dma_map)
> > > >  		r = ops->dma_map(vdpa, iova, size, pa, perm);
> > > > -	else if (ops->set_map)
> > > > -		r = ops->set_map(vdpa, dev->iotlb);
> > > > -	else
> > > > +	else if (ops->set_map) {
> > > > +		if (!v->in_batch)
> > > > +			r = ops->set_map(vdpa, dev->iotlb);
> > > > +	} else
> > > >  		r = iommu_map(v->domain, iova, pa, size,
> > > >  			      perm_to_iommu_flags(perm));
> > > >
> > > > @@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
> > > >
> > > >  	if (ops->dma_map)
> > > >  		ops->dma_unmap(vdpa, iova, size);
> > > > -	else if (ops->set_map)
> > > > -		ops->set_map(vdpa, dev->iotlb);
> > > > -	else
> > > > +	else if (ops->set_map) {
> > > > +		if (!v->in_batch)
> > > > +			ops->set_map(vdpa, dev->iotlb);
> > > > +	} else
> > > >  		iommu_unmap(v->domain, iova, size);
> > > >  }
> > > >
> > > > @@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
> > > >  					struct vhost_iotlb_msg *msg)
> > > >  {
> > > >  	struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
> > > > +	struct vdpa_device *vdpa = v->vdpa;
> > > > +	const struct vdpa_config_ops *ops = vdpa->config;
> > > >  	int r = 0;
> > > >
> > > >  	r = vhost_dev_check_owner(dev);
> > > > @@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
> > > >  	case VHOST_IOTLB_INVALIDATE:
> > > >  		vhost_vdpa_unmap(v, msg->iova, msg->size);
> > > >  		break;
> > > > +
RE: [PATCH] xen: introduce xen_vring_use_dma
> Subject: RE: [PATCH] xen: introduce xen_vring_use_dma
>
> On Mon, 29 Jun 2020, Peng Fan wrote:
> > > > If that is the case, how is it possible that virtio breaks on ARM
> > > > using the default dma_ops? The breakage is not Xen related (except
> > > > that Xen turns dma_ops on). The original message from Peng was:
> > > >
> > > > vring_map_one_sg -> vring_use_dma_api
> > > >    -> dma_map_page
> > > >       -> __swiotlb_map_page
> > > >          -> swiotlb_map_page
> > > >             -> __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
> > > >
> > > > However we are using a per-device DMA area for rpmsg; phys_to_virt
> > > > could not return a correct virtual address for an address in the
> > > > vmalloc area. Then the kernel panics.
> > > >
> > > > I must be missing something. Maybe it is because it has to do with
> > > > RPMsg?
> > >
> > > I think it's an RPMsg bug, yes
> >
> > The rpmsg bug is another issue: it should not use dma_alloc_coherent
> > for the reserved area, and should use vmalloc_to_page.
> >
> > Anyway, using the DMA API here will also trigger the issue.
>
> Is the stack trace above for the RPMsg issue or for the Trusty issue?

The stack trace you pasted is the rpmsg issue.

> If it is the stack trace for RPMsg, can you also post the Trusty stack
> trace? Or are they identical?

There is no stack dump here. It successfully uses swiotlb to do the map, but we actually do not need swiotlb in DomU to do the map.

Thanks,
Peng.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH] xen: introduce xen_vring_use_dma
On Mon, 29 Jun 2020, Peng Fan wrote:
> > > If that is the case, how is it possible that virtio breaks on ARM
> > > using the default dma_ops? The breakage is not Xen related (except
> > > that Xen turns dma_ops on). The original message from Peng was:
> > >
> > > vring_map_one_sg -> vring_use_dma_api
> > >    -> dma_map_page
> > >       -> __swiotlb_map_page
> > >          -> swiotlb_map_page
> > >             -> __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
> > >
> > > However we are using a per-device DMA area for rpmsg; phys_to_virt
> > > could not return a correct virtual address for an address in the
> > > vmalloc area. Then the kernel panics.
> > >
> > > I must be missing something. Maybe it is because it has to do with
> > > RPMsg?
> >
> > I think it's an RPMsg bug, yes
>
> The rpmsg bug is another issue: it should not use dma_alloc_coherent
> for the reserved area, and should use vmalloc_to_page.
>
> Anyway, using the DMA API here will also trigger the issue.

Is the stack trace above for the RPMsg issue or for the Trusty issue?

If it is the stack trace for RPMsg, can you also post the Trusty stack trace? Or are they identical?
Re: [PATCH] xen: introduce xen_vring_use_dma
On Fri, 26 Jun 2020, Michael S. Tsirkin wrote:
> On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > >
> > > > > > > > Use xen_swiotlb to determine when vring should use dma APIs
> > > > > > > > to map the ring: when xen_swiotlb is enabled the dma API is
> > > > > > > > required. When it is disabled, it is not required.
> > > > > > > >
> > > > > > > > Signed-off-by: Peng Fan
> > > > > > >
> > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > > Xen was there first, but everyone else is using that now.
> > > > > >
> > > > > > Unfortunately it is complicated and it is not related to
> > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > >
> > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to
> > > > > > translate foreign mappings (memory coming from other VMs) to
> > > > > > physical addresses. On x86, it also uses dma_ops to translate
> > > > > > Linux's idea of a physical address into a real physical address
> > > > > > (this is unneeded on ARM.)
> > > > > >
> > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used
> > > > > > on Xen/x86 always and on Xen/ARM if Linux is Dom0 (because it
> > > > > > has foreign mappings.) That is why we have the if (xen_domain)
> > > > > > return true; in vring_use_dma_api.
> > > > >
> > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > >
> > > > > The Xen hack predates VIRTIO_F_IOMMU_PLATFORM, so it *also*
> > > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > >
> > > > > Unfortunately as a result Xen never got around to properly
> > > > > setting VIRTIO_F_IOMMU_PLATFORM.
> > > >
> > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this
> > > > because the usage of swiotlb_xen is not a property of virtio,
> > >
> > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is its
> > > name in the latest virtio spec; VIRTIO_F_IOMMU_PLATFORM is what
> > > Linux calls it) is declared as "special, don't follow normal rules
> > > for access".
> > >
> > > So yes swiotlb_xen is not a property of virtio, but what *is* a
> > > property of virtio is that it's not special, just a regular device
> > > from a DMA POV.
> >
> > I am trying to understand what you meant but I think I am missing
> > something.
> >
> > Are you saying that modern virtio should always have
> > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other
> > device?
>
> I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you
> have some special needs, e.g. you are very sure it's ok to bypass DMA
> ops, or you need to support a legacy guest (produced in the window
> between virtio 1 support in 2014 and support for
> VIRTIO_F_ACCESS_PLATFORM in 2016).

Ok thanks. I understand and it makes sense to me.

> > > > > > You might have noticed that I missed one possible case above:
> > > > > > Xen/ARM DomU :-)
> > > > > >
> > > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even
> > > > > > initialized. So if (xen_domain) return true; would give the
> > > > > > wrong answer in that case. Linux would end up calling the
> > > > > > "normal" dma_ops, not swiotlb-xen, and the "normal" dma_ops
> > > > > > fail.
> > > > > >
> > > > > > The solution I suggested was to make the check in
> > > > > > vring_use_dma_api more flexible by returning true if
> > > > > > swiotlb_xen is supposed to be used, not in general for all Xen
> > > > > > domains, because that is what the check was really meant to do.
> > > > >
> > > > > Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is
> > > > > wrong with that?
> > > >
> > > > swiotlb-xen is not used on Xen/ARM DomU, the default dma_ops are
> > > > the ones that are used. So you are saying, why don't we fix the
> > > > default dma_ops to work with virtio?
> > > >
> > > > It is bad that the default dma_ops crash with virtio, so yes I
> > > > think it would be good to fix that. However, even if we fixed
> > > > that, the if (xen_domain()) check in vring_use_dma_api is still a
> > > > problem.
> > >
> > > Why is it a problem? It just makes virtio use DMA API. If that in
> > > turn works, problem solved.
> >
> > You are correct in the sense that it would work. However I do think it
> > is wrong for vring_use_dma_api to enable dma_ops/swiotlb-xen for
> > Xen/ARM DomUs that don't need it. There are many
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Mon, Jun 29, 2020 at 06:48:28PM +0200, Pierre Morel wrote:
> On 2020-06-29 18:09, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > > An architecture protecting the guest memory against unauthorized host
> > > access may want to enforce VIRTIO I/O device protection through the
> > > use of VIRTIO_F_IOMMU_PLATFORM.
> > >
> > > Let's give a chance to the architecture to accept or not devices
> > > without VIRTIO_F_IOMMU_PLATFORM.
> >
> > I agree it's a bit misleading. Protection is enforced by memory
> > encryption; you can't trust the hypervisor to report the bit correctly,
> > so using that as a security measure would be pointless. The real gain
> > here is that broken configs are easier to debug.
> >
> > Here's an attempt at a better description:
> >
> > On some architectures, the guest knows that VIRTIO_F_IOMMU_PLATFORM is
> > required for virtio to function: e.g. this is the case on s390 protected
> > virt guests, since otherwise the guest passes encrypted guest memory to
> > devices, which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM
> > the result is that affected memory (or even a whole page containing it
> > is corrupted). Detect and fail probe instead - that is easier to debug.
>
> Thanks, indeed better, aside from the "encrypted guest memory": the
> mechanism used by s390 to avoid host access to the guest memory is not
> encryption but a hardware feature denying general host access and
> allowing pieces of memory to be shared between guest and host.

s/encrypted/protected/

> As a consequence the data read from memory is not corrupted but not read
> at all, and the read error kills the hypervisor with a SIGSEGV.

s/(or even a whole page containing it is corrupted)/can not be read and the read error kills the hypervisor with a SIGSEGV/

As an aside, we could maybe handle that more gracefully on the hypervisor side.

> > However, now that we have described what it is (hypervisor
> > misconfiguration), I ask a question: can we be sure this will never
> > ever work? E.g. what if some future hypervisor gains the ability to
> > access the protected guest memory in some abstractly secure manner?
>
> The goal of the s390 PV feature is to avoid this possibility, so I don't
> think so; however, there is a possibility that some hardware VIRTIO
> device gains access to the guest's protected memory, even if such a
> device does not exist yet.
>
> The moment such a device exists we will need a driver for it, at least
> to enable the feature and apply policies; it is also one of the reasons
> why a hook into the architecture is interesting.

Not necessarily, it could also be fully transparent. See e.g. recent AMD advances allowing unmodified guests with SEV.

> > We are blocking this here, and it's hard to predict the future,
> > and a broken hypervisor can always find ways to crash the guest ...
>
> Yes, this is also something to fix on the hypervisor side; Halil is
> working on it.
>
> > IMHO it would be safer to just print a warning. What do you think?
>
> Sadly, putting a warning may not help, as qemu is killed if it accesses
> the protected memory. Also note that the crash occurs not only on start
> but also on hotplug.
>
> Thanks,
> Pierre

Well, that depends on where the warning goes. If it's on a serial port it might be reported host-side before the crash triggers.

But interesting point generally. How about a feature to send a warning code or string to the host then?

> --
> Pierre Morel
> IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-29 18:09, Michael S. Tsirkin wrote:
> On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > An architecture protecting the guest memory against unauthorized host
> > access may want to enforce VIRTIO I/O device protection through the
> > use of VIRTIO_F_IOMMU_PLATFORM.
> >
> > Let's give a chance to the architecture to accept or not devices
> > without VIRTIO_F_IOMMU_PLATFORM.
>
> I agree it's a bit misleading. Protection is enforced by memory
> encryption; you can't trust the hypervisor to report the bit correctly,
> so using that as a security measure would be pointless. The real gain
> here is that broken configs are easier to debug.
>
> Here's an attempt at a better description:
>
> On some architectures, the guest knows that VIRTIO_F_IOMMU_PLATFORM is
> required for virtio to function: e.g. this is the case on s390 protected
> virt guests, since otherwise the guest passes encrypted guest memory to
> devices, which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM
> the result is that affected memory (or even a whole page containing it
> is corrupted). Detect and fail probe instead - that is easier to debug.

Thanks, indeed better, aside from the "encrypted guest memory": the mechanism used by s390 to avoid host access to the guest memory is not encryption but a hardware feature denying general host access and allowing pieces of memory to be shared between guest and host.

As a consequence the data read from memory is not corrupted but not read at all, and the read error kills the hypervisor with a SIGSEGV.

> However, now that we have described what it is (hypervisor
> misconfiguration), I ask a question: can we be sure this will never
> ever work? E.g. what if some future hypervisor gains the ability to
> access the protected guest memory in some abstractly secure manner?

The goal of the s390 PV feature is to avoid this possibility, so I don't think so; however, there is a possibility that some hardware VIRTIO device gains access to the guest's protected memory, even if such a device does not exist yet.

The moment such a device exists we will need a driver for it, at least to enable the feature and apply policies; it is also one of the reasons why a hook into the architecture is interesting.

> We are blocking this here, and it's hard to predict the future,
> and a broken hypervisor can always find ways to crash the guest ...

Yes, this is also something to fix on the hypervisor side; Halil is working on it.

> IMHO it would be safer to just print a warning. What do you think?

Sadly, putting a warning may not help, as qemu is killed if it accesses the protected memory. Also note that the crash occurs not only on start but also on hotplug.

Thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-29 15:44, Cornelia Huck wrote:
> On Mon, 29 Jun 2020 15:14:04 +0200 Pierre Morel wrote:
> > On 2020-06-19 11:20, Cornelia Huck wrote:
> > > On Thu, 18 Jun 2020 00:29:56 +0200 Halil Pasic wrote:
> > > > On Wed, 17 Jun 2020 12:43:57 +0200 Pierre Morel wrote:
> > > > > @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
> > > > >  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
> > > > >  		return 0;
> > > > >
> > > > > +	if (arch_needs_virtio_iommu_platform(dev) &&
> > > > > +	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > > > > +		dev_warn(&dev->dev,
> > > > > +			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> > > >
> > > > [Side note: wasn't there a patch renaming this bit on the list? I
> > > > think this name is only kept for userspace compat.]
> >
> > Sorry, I do not understand what you expect from me. On which mailing
> > list do you think there is a patch?
> > > >
> > > > I'm not sure divulging the current Linux name of this feature bit
> > > > is a good idea, but if everybody else is fine with this, I don't
> > > > care that
> > >
> > > Not sure if that feature name will ever change, as it is exported in
> > > headers. At most, we might want to add the new ACCESS_PLATFORM
> > > define and keep the old one, but that would still mean some churn.
> > > > much. An alternative would be:
> > > >
> > > > "virtio: device falsely claims to have full access to the memory,
> > > > aborting the device"
> > > >
> > > > "virtio: device does not work with limited memory access" ?
> > >
> > > But no issue with keeping the current message.
> >
> > If it is OK, I would like to specify that the arch is responsible for
> > accepting or not the device. The reason why the device is not accepted
> > without IOMMU_PLATFORM is arch specific.
>
> Hm, I'd think the reason is always the same (the device cannot access
> the memory directly), just the way to figure out whether that is the
> case is arch-specific, as with so many other things. No real need to go
> into detail here, I think.

As you like, so I rename the subject to:

"virtio: device does not work with limited memory access"

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-29 17:57, Michael S. Tsirkin wrote:
> On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > An architecture protecting the guest memory against unauthorized host
> > access may want to enforce VIRTIO I/O device protection through the
> > use of VIRTIO_F_IOMMU_PLATFORM.
> >
> > Let's give a chance to the architecture to accept or not devices
> > without VIRTIO_F_IOMMU_PLATFORM.
> >
> > Signed-off-by: Pierre Morel
> > Acked-by: Jason Wang
> > Acked-by: Christian Borntraeger
> > ---
> >  arch/s390/mm/init.c     |  6 ++
> >  drivers/virtio/virtio.c | 22 ++
> >  include/linux/virtio.h  |  2 ++
> >  3 files changed, 30 insertions(+)
> >
> > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> > index 6dc7c3b60ef6..215070c03226 100644
> > --- a/arch/s390/mm/init.c
> > +++ b/arch/s390/mm/init.c
> > @@ -45,6 +45,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
> >
> > @@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev)
> >  	return is_prot_virt_guest();
> >  }
> >
> > +int arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> > +{
> > +	return is_prot_virt_guest();
> > +}
> > +
> >  /* protected virtualization */
> >  static void pv_init(void)
> >  {
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index a977e32a88f2..aa8e01104f86 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_add_status);
> >
> > +/*
> > + * arch_needs_virtio_iommu_platform - provide arch specific hook when finalizing
> > + * features for VIRTIO device dev
> > + * @dev: the VIRTIO device being added
> > + *
> > + * Permits the platform to provide architecture specific functionality when
> > + * device features are finalized. This is the default implementation.
> > + * Architecture implementations can override this.
> > + */
> > +
> > +int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> > +{
> > +	return 0;
> > +}
> > +
> >  int virtio_finalize_features(struct virtio_device *dev)
> >  {
> >  	int ret = dev->config->finalize_features(dev);
> > @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
> >  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
> >  		return 0;
> >
> > +	if (arch_needs_virtio_iommu_platform(dev) &&
> > +	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +		dev_warn(&dev->dev,
> > +			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> > +		return -ENODEV;
> > +	}
> > +
> >  	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> >  	status = dev->config->get_status(dev);
> >  	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>
> Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after?

Yes, you are right.

Thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> An architecture protecting the guest memory against unauthorized host
> access may want to enforce VIRTIO I/O device protection through the
> use of VIRTIO_F_IOMMU_PLATFORM.
>
> Let's give a chance to the architecture to accept or not devices
> without VIRTIO_F_IOMMU_PLATFORM.

I agree it's a bit misleading. Protection is enforced by memory encryption; you can't trust the hypervisor to report the bit correctly, so using that as a security measure would be pointless. The real gain here is that broken configs are easier to debug.

Here's an attempt at a better description:

On some architectures, the guest knows that VIRTIO_F_IOMMU_PLATFORM is required for virtio to function: e.g. this is the case on s390 protected virt guests, since otherwise the guest passes encrypted guest memory to devices, which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM the result is that affected memory (or even a whole page containing it is corrupted). Detect and fail probe instead - that is easier to debug.

However, now that we have described what it is (hypervisor misconfiguration), I ask a question: can we be sure this will never ever work? E.g. what if some future hypervisor gains the ability to access the protected guest memory in some abstractly secure manner?

We are blocking this here, and it's hard to predict the future, and a broken hypervisor can always find ways to crash the guest ...

IMHO it would be safer to just print a warning. What do you think?

> Signed-off-by: Pierre Morel
> Acked-by: Jason Wang
> Acked-by: Christian Borntraeger
> ---
>  arch/s390/mm/init.c     |  6 ++
>  drivers/virtio/virtio.c | 22 ++
>  include/linux/virtio.h  |  2 ++
>  3 files changed, 30 insertions(+)
>
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 6dc7c3b60ef6..215070c03226 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -45,6 +45,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
>
> @@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev)
>  	return is_prot_virt_guest();
>  }
>
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> +	return is_prot_virt_guest();
> +}
> +
>  /* protected virtualization */
>  static void pv_init(void)
>  {
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32a88f2..aa8e01104f86 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
>
> +/*
> + * arch_needs_virtio_iommu_platform - provide arch specific hook when finalizing
> + * features for VIRTIO device dev
> + * @dev: the VIRTIO device being added
> + *
> + * Permits the platform to provide architecture specific functionality when
> + * device features are finalized. This is the default implementation.
> + * Architecture implementations can override this.
> + */
> +
> +int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> +	return 0;
> +}
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>  		return 0;
>
> +	if (arch_needs_virtio_iommu_platform(dev) &&
> +	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> +		dev_warn(&dev->dev,
> +			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> +		return -ENODEV;
> +	}
> +
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>  	status = dev->config->get_status(dev);
>  	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index a493eac08393..e8526ae3463e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv);
>  #define module_virtio_driver(__virtio_driver) \
>  	module_driver(__virtio_driver, register_virtio_driver, \
>  		      unregister_virtio_driver)
> +
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev);
>  #endif /* _LINUX_VIRTIO_H */
> --
> 2.25.1
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Mon, 29 Jun 2020 11:57:14 -0400 "Michael S. Tsirkin" wrote:
> On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > An architecture protecting the guest memory against unauthorized host
> > access may want to enforce VIRTIO I/O device protection through the
> > use of VIRTIO_F_IOMMU_PLATFORM.
> >
> > Let's give a chance to the architecture to accept or not devices
> > without VIRTIO_F_IOMMU_PLATFORM.
> >
> > Signed-off-by: Pierre Morel
> > Acked-by: Jason Wang
> > Acked-by: Christian Borntraeger
> > ---
> >  arch/s390/mm/init.c     |  6 ++
> >  drivers/virtio/virtio.c | 22 ++
> >  include/linux/virtio.h  |  2 ++
> >  3 files changed, 30 insertions(+)
> >
> > @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
> >  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
> >  		return 0;
> >
> > +	if (arch_needs_virtio_iommu_platform(dev) &&
> > +	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +		dev_warn(&dev->dev,
> > +			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> > +		return -ENODEV;
> > +	}
> > +
> >  	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> >  	status = dev->config->get_status(dev);
> >  	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>
> Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after?

But it's only available with VERSION_1 anyway, isn't it? So it probably also needs to fail when this feature is needed if VERSION_1 has not been negotiated, I think.
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote: > An architecture protecting the guest memory against unauthorized host > access may want to enforce VIRTIO I/O device protection through the > use of VIRTIO_F_IOMMU_PLATFORM. > > Let's give a chance to the architecture to accept or not devices > without VIRTIO_F_IOMMU_PLATFORM. > > Signed-off-by: Pierre Morel > Acked-by: Jason Wang > Acked-by: Christian Borntraeger > --- > arch/s390/mm/init.c | 6 ++ > drivers/virtio/virtio.c | 22 ++ > include/linux/virtio.h | 2 ++ > 3 files changed, 30 insertions(+) > > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c > index 6dc7c3b60ef6..215070c03226 100644 > --- a/arch/s390/mm/init.c > +++ b/arch/s390/mm/init.c > @@ -45,6 +45,7 @@ > #include > #include > #include > +#include > > pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); > > @@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev) > return is_prot_virt_guest(); > } > > +int arch_needs_virtio_iommu_platform(struct virtio_device *dev) > +{ > + return is_prot_virt_guest(); > +} > + > /* protected virtualization */ > static void pv_init(void) > { > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index a977e32a88f2..aa8e01104f86 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, > unsigned int status) > } > EXPORT_SYMBOL_GPL(virtio_add_status); > > +/* > + * arch_needs_virtio_iommu_platform - provide arch specific hook when > finalizing > + * features for VIRTIO device dev > + * @dev: the VIRTIO device being added > + * > + * Permits the platform to provide architecture specific functionality when > + * devices features are finalized. This is the default implementation. > + * Architecture implementations can override this. 
> + */ > + > +int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev) > +{ > + return 0; > +} > + > int virtio_finalize_features(struct virtio_device *dev) > { > int ret = dev->config->finalize_features(dev); > @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev) > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > return 0; > > + if (arch_needs_virtio_iommu_platform(dev) && > + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) { > + dev_warn(&dev->dev, > + "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n"); > + return -ENODEV; > + } > + > virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); > status = dev->config->get_status(dev); > if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after? > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index a493eac08393..e8526ae3463e 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv); > #define module_virtio_driver(__virtio_driver) \ > module_driver(__virtio_driver, register_virtio_driver, \ > unregister_virtio_driver) > + > +int arch_needs_virtio_iommu_platform(struct virtio_device *dev); > #endif /* _LINUX_VIRTIO_H */ > -- > 2.25.1
Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints
On Mon, Jun 29, 2020 at 05:26:03PM +0800, Jason Wang wrote: > > On 2020/6/28 下午5:58, Michael S. Tsirkin wrote: > > On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote: > > > This patches extend the vhost IOTLB API to accept batch updating hints > > > form userspace. When userspace wants update the device IOTLB in a > > > batch, it may do: > > > > > > 1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag > > > 2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE > > > 3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag > > As long as we are extending the interface, > > is there some way we could cut down the number of system calls needed > > here? > > > I'm not sure it's worth to do that since usually we only have less than 10 > regions. Well with a guest iommu I'm guessing it can go up significantly, right? > A possible method is to carry multiple vhost_iotlb_message in one system > call. That's an option. Also, can kernel handle a batch that is too large by applying parts of it iteratively? Or must all changes take place atomically after BATCH_END? If atomically, we might need to limit batch size to make sure kernel can keep a copy in memory. > > > > > > > > Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when > > > vDPA device support set_map() ops. This is useful for the vDPA device > > > that want to know all the mappings to tweak their own DMA translation > > > logic. > > > > > > For vDPA device that doesn't require set_map(), no behavior changes. > > > > > > This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability. 
> > > > > > Signed-off-by: Jason Wang > > > --- > > > drivers/vhost/vdpa.c | 30 +++--- > > > include/uapi/linux/vhost.h | 2 ++ > > > include/uapi/linux/vhost_types.h | 7 +++ > > > 3 files changed, 32 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > > > index 453057421f80..8f624bbafee7 100644 > > > --- a/drivers/vhost/vdpa.c > > > +++ b/drivers/vhost/vdpa.c > > > @@ -56,7 +56,9 @@ enum { > > > }; > > > enum { > > > - VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) > > > + VHOST_VDPA_BACKEND_FEATURES = > > > + (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) | > > > + (1ULL << VHOST_BACKEND_F_IOTLB_BATCH), > > > }; > > > /* Currently, only network backend w/o multiqueue is supported. */ > > > @@ -77,6 +79,7 @@ struct vhost_vdpa { > > > int virtio_id; > > > int minor; > > > struct eventfd_ctx *config_ctx; > > > + int in_batch; > > > }; > > > static DEFINE_IDA(vhost_vdpa_ida); > > > @@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v) > > > const struct vdpa_config_ops *ops = vdpa->config; > > > ops->set_status(vdpa, 0); > > > + v->in_batch = 0; > > > } > > > static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user > > > *argp) > > > @@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, > > > if (ops->dma_map) > > > r = ops->dma_map(vdpa, iova, size, pa, perm); > > > - else if (ops->set_map) > > > - r = ops->set_map(vdpa, dev->iotlb); > > > - else > > > + else if (ops->set_map) { > > > + if (!v->in_batch) > > > + r = ops->set_map(vdpa, dev->iotlb); > > > + } else > > > r = iommu_map(v->domain, iova, pa, size, > > > perm_to_iommu_flags(perm)); > > > @@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, > > > u64 iova, u64 size) > > > if (ops->dma_map) > > > ops->dma_unmap(vdpa, iova, size); > > > - else if (ops->set_map) > > > - ops->set_map(vdpa, dev->iotlb); > > > - else > > > + else if (ops->set_map) { > > > + if (!v->in_batch) > > > + 
ops->set_map(vdpa, dev->iotlb); > > > + } else > > > iommu_unmap(v->domain, iova, size); > > > } > > > @@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct > > > vhost_dev *dev, > > > struct vhost_iotlb_msg *msg) > > > { > > > struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, > > > vdev); > > > + struct vdpa_device *vdpa = v->vdpa; > > > + const struct vdpa_config_ops *ops = vdpa->config; > > > int r = 0; > > > r = vhost_dev_check_owner(dev); > > > @@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct > > > vhost_dev *dev, > > > case VHOST_IOTLB_INVALIDATE: > > > vhost_vdpa_unmap(v, msg->iova, msg->size); > > > break; > > > + case VHOST_IOTLB_BATCH_BEGIN: > > > + v->in_batch = true; > > > + break; > > > + case VHOST_IOTLB_BATCH_END: > > > + if (v->in_batch && ops->set_map) > > > + ops->set_map(vdpa, dev->iotlb); > > > + v->in_batch =
Re: [RFC 0/3] virtio: NUMA-aware memory allocation
On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote: > On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote: > > > > On 2020/6/25 下午9:57, Stefan Hajnoczi wrote: > > > These patches are not ready to be merged because I was unable to measure a > > > performance improvement. I'm publishing them so they are archived in case > > > someone picks up this work again in the future. > > > > > > The goal of these patches is to allocate virtqueues and driver state from > > > the > > > device's NUMA node for optimal memory access latency. Only guests with a > > > vNUMA > > > topology and virtio devices spread across vNUMA nodes benefit from this. > > > In > > > other cases the memory placement is fine and we don't need to take NUMA > > > into > > > account inside the guest. > > > > > > These patches could be extended to virtio_net.ko and other devices in the > > > future. I only tested virtio_blk.ko. > > > > > > The benchmark configuration was designed to trigger worst-case NUMA > > > placement: > > > * Physical NVMe storage controller on host NUMA node 0 It's possible that numa is not such a big deal for NVMe. And it's possible that bios misconfigures ACPI reporting NUMA placement incorrectly. I think that the best thing to try is to use a ramdisk on a specific numa node. > > > * IOThread pinned to host NUMA node 0 > > > * virtio-blk-pci device in vNUMA node 1 > > > * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0 > > > * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1 > > > > > > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host > > > NUMA > > > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci > > > device. > > > Applying these patches fixes memory placement so that virtqueues and > > > driver > > > state is allocated in vNUMA node 1 where the virtio-blk-pci device is > > > located.
> > > > > > The fio 4KB randread benchmark results do not show a significant > > > improvement: > > > > > > Name IOPS Error > > > virtio-blk 42373.79 ± 0.54% > > > virtio-blk-numa 42517.07 ± 0.79% > > > > > > I remember I did something similar in vhost by using page_to_nid() for > > descriptor ring. And I get little improvement as shown here. > > > > Michael reminds that it was probably because all data were cached. So I > > doubt if the test lacks sufficient stress on the cache ... > > Yes, that sounds likely. If there's no real-world performance > improvement then I'm happy to leave these patches unmerged. > > Stefan Well that was for vhost though. This is virtio, which is different. Doesn't some benchmark put pressure on the CPU cache? I kind of feel there should be a difference, and the fact there isn't means there's some other bottleneck somewhere. Might be worth figuring out. -- MST
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Mon, 29 Jun 2020 15:14:04 +0200 Pierre Morel wrote: > On 2020-06-19 11:20, Cornelia Huck wrote: > > On Thu, 18 Jun 2020 00:29:56 +0200 > > Halil Pasic wrote: > > > >> On Wed, 17 Jun 2020 12:43:57 +0200 > >> Pierre Morel wrote: > >>> @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device > >>> *dev) > >>> if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > >>> return 0; > >>> > >>> + if (arch_needs_virtio_iommu_platform(dev) && > >>> + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) { > >>> + dev_warn(&dev->dev, > >>> + "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n"); [Side note: wasn't there a patch renaming this bit on the list? I think this name is only kept for userspace compat.] > >> > >> I'm not sure, divulging the current Linux name of this feature bit is a > >> good idea, but if everybody else is fine with this, I don't care that > > > > Not sure if that feature name will ever change, as it is exported in > > headers. At most, we might want to add the new ACCESS_PLATFORM define > > and keep the old one, but that would still mean some churn. > > > >> much. An alternative would be: > >> "virtio: device falsely claims to have full access to the memory, > >> aborting the device" > > > > "virtio: device does not work with limited memory access" ? > > > > But no issue with keeping the current message. > > > > If it is OK, I would like to specify that the arch is responsible to > accept or not the device. > The reason why the device is not accepted without IOMMU_PLATFORM is arch > specific. Hm, I'd think the reason is always the same (the device cannot access the memory directly), just the way to figure out whether that is the case or not is arch-specific, as with so many other things. No real need to go into detail here, I think.
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-18 00:29, Halil Pasic wrote: On Wed, 17 Jun 2020 12:43:57 +0200 Pierre Morel wrote: An architecture protecting the guest memory against unauthorized host access may want to enforce VIRTIO I/O device protection through the use of VIRTIO_F_IOMMU_PLATFORM. Let's give a chance to the architecture to accept or not devices without VIRTIO_F_IOMMU_PLATFORM. [..] I'm still not really satisfied with your commit message, furthermore I did some thinking about the abstraction you introduce here. I will give a short analysis of that, but first things first. Your patch does the job of preventing calamity, and the details can be changed any time, thus: Acked-by: Halil Pasic Thanks, Connie already answered the other points you raised and I have nothing to add on it. Regards, Pierre -- Pierre Morel IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-19 14:02, Halil Pasic wrote: On Fri, 19 Jun 2020 11:20:51 +0200 Cornelia Huck wrote: + if (arch_needs_virtio_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) { + dev_warn(&dev->dev, + "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n"); I'm not sure, divulging the current Linux name of this feature bit is a good idea, but if everybody else is fine with this, I don't care that Not sure if that feature name will ever change, as it is exported in headers. At most, we might want to add the new ACCESS_PLATFORM define and keep the old one, but that would still mean some churn. much. An alternative would be: "virtio: device falsely claims to have full access to the memory, aborting the device" "virtio: device does not work with limited memory access" ? But no issue with keeping the current message. I think I prefer Conny's version, but no strong feelings here. The reason why the device is not accepted without IOMMU_PLATFORM is arch specific, I think it should be clearly stated. If no strong opposition... Thanks, Pierre -- Pierre Morel IBM Lab Boeblingen
Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-19 11:20, Cornelia Huck wrote: On Thu, 18 Jun 2020 00:29:56 +0200 Halil Pasic wrote: On Wed, 17 Jun 2020 12:43:57 +0200 Pierre Morel wrote: ... But since this can be rewritten any time, let's go with the option people already agree with, instead of more discussion. Yes, there's nothing wrong with the patch as-is. Acked-by: Cornelia Huck Thanks, Which tree should this go through? Virtio? s390? Just another question. Do we want this backported? Do we need cc stable? It does change behaviour of virtio-ccw devices; but then, it only fences off configurations that would not have worked anyway. Distributions should probably pick this; but I do not consider it strictly a "fix" (more a mitigation for broken configurations), so I'm not sure whether stable applies. [..] int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev) if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) return 0; + if (arch_needs_virtio_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) { + dev_warn(&dev->dev, + "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n"); I'm not sure, divulging the current Linux name of this feature bit is a good idea, but if everybody else is fine with this, I don't care that Not sure if that feature name will ever change, as it is exported in headers. At most, we might want to add the new ACCESS_PLATFORM define and keep the old one, but that would still mean some churn. much. An alternative would be: "virtio: device falsely claims to have full access to the memory, aborting the device" "virtio: device does not work with limited memory access" ? But no issue with keeping the current message. If it is OK, I would like to specify that the arch is responsible to accept or not the device. The reason why the device is not accepted without IOMMU_PLATFORM is arch specific.
Regards, Pierre -- Pierre Morel IBM Lab Boeblingen
Re: [RFC 0/3] virtio: NUMA-aware memory allocation
On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote: > > On 2020/6/25 下午9:57, Stefan Hajnoczi wrote: > > These patches are not ready to be merged because I was unable to measure a > > performance improvement. I'm publishing them so they are archived in case > > someone picks up this work again in the future. > > > > The goal of these patches is to allocate virtqueues and driver state from > > the > > device's NUMA node for optimal memory access latency. Only guests with a > > vNUMA > > topology and virtio devices spread across vNUMA nodes benefit from this. In > > other cases the memory placement is fine and we don't need to take NUMA into > > account inside the guest. > > > > These patches could be extended to virtio_net.ko and other devices in the > > future. I only tested virtio_blk.ko. > > > > The benchmark configuration was designed to trigger worst-case NUMA > > placement: > > * Physical NVMe storage controller on host NUMA node 0 > > * IOThread pinned to host NUMA node 0 > > * virtio-blk-pci device in vNUMA node 1 > > * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0 > > * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1 > > > > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA > > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci > > device. > > Applying these patches fixes memory placement so that virtqueues and driver > > state is allocated in vNUMA node 1 where the virtio-blk-pci device is > > located. > > > > The fio 4KB randread benchmark results do not show a significant > > improvement: > > > > Name IOPS Error > > virtio-blk 42373.79 ± 0.54% > > virtio-blk-numa 42517.07 ± 0.79% > > > I remember I did something similar in vhost by using page_to_nid() for > descriptor ring. And I get little improvement as shown here. > > Michael reminds that it was probably because all data were cached. So I > doubt if the test lacks sufficient stress on the cache ...
Yes, that sounds likely. If there's no real-world performance improvement then I'm happy to leave these patches unmerged. Stefan
Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints
On 2020/6/28 下午5:58, Michael S. Tsirkin wrote: On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote: This patches extend the vhost IOTLB API to accept batch updating hints form userspace. When userspace wants update the device IOTLB in a batch, it may do: 1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag 2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE 3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag As long as we are extending the interface, is there some way we could cut down the number of system calls needed here? I'm not sure it's worth to do that since usually we only have less than 10 regions. A possible method is to carry multiple vhost_iotlb_message in one system call. Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when vDPA device support set_map() ops. This is useful for the vDPA device that want to know all the mappings to tweak their own DMA translation logic. For vDPA device that doesn't require set_map(), no behavior changes. This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability. Signed-off-by: Jason Wang --- drivers/vhost/vdpa.c | 30 +++--- include/uapi/linux/vhost.h | 2 ++ include/uapi/linux/vhost_types.h | 7 +++ 3 files changed, 32 insertions(+), 7 deletions(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 453057421f80..8f624bbafee7 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -56,7 +56,9 @@ enum { }; enum { - VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) + VHOST_VDPA_BACKEND_FEATURES = + (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) | + (1ULL << VHOST_BACKEND_F_IOTLB_BATCH), }; /* Currently, only network backend w/o multiqueue is supported. 
*/ @@ -77,6 +79,7 @@ struct vhost_vdpa { int virtio_id; int minor; struct eventfd_ctx *config_ctx; + int in_batch; }; static DEFINE_IDA(vhost_vdpa_ida); @@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v) const struct vdpa_config_ops *ops = vdpa->config; ops->set_status(vdpa, 0); + v->in_batch = 0; } static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp) @@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, if (ops->dma_map) r = ops->dma_map(vdpa, iova, size, pa, perm); - else if (ops->set_map) - r = ops->set_map(vdpa, dev->iotlb); - else + else if (ops->set_map) { + if (!v->in_batch) + r = ops->set_map(vdpa, dev->iotlb); + } else r = iommu_map(v->domain, iova, pa, size, perm_to_iommu_flags(perm)); @@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) if (ops->dma_map) ops->dma_unmap(vdpa, iova, size); - else if (ops->set_map) - ops->set_map(vdpa, dev->iotlb); - else + else if (ops->set_map) { + if (!v->in_batch) + ops->set_map(vdpa, dev->iotlb); + } else iommu_unmap(v->domain, iova, size); } @@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, struct vhost_iotlb_msg *msg) { struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev); + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; int r = 0; r = vhost_dev_check_owner(dev); @@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, case VHOST_IOTLB_INVALIDATE: vhost_vdpa_unmap(v, msg->iova, msg->size); break; + case VHOST_IOTLB_BATCH_BEGIN: + v->in_batch = true; + break; + case VHOST_IOTLB_BATCH_END: + if (v->in_batch && ops->set_map) + ops->set_map(vdpa, dev->iotlb); + v->in_batch = false; + break; default: r = -EINVAL; break; diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h index 0c2349612e77..565da96f55d5 100644 --- a/include/uapi/linux/vhost.h +++ b/include/uapi/linux/vhost.h @@ -91,6 +91,8 @@ /* 
Use message type V2 */ #define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1 +/* IOTLB can accpet batching hints */ typo Will fix. +#define VHOST_BACKEND_F_IOTLB_BATCH 0x2 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64) #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64) diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h index 669457ce5c48..5c12faffdde9 100644 --- a/include/uapi/linux/vhost_types.h +++ b/include/uapi/linux/vhost_types.h @@ -60,6 +60,13 @@ struct vhost_iotlb_msg {
RE: [PATCH] xen: introduce xen_vring_use_dma
> Subject: Re: [PATCH] xen: introduce xen_vring_use_dma > > On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote: > > > > > > Anyway, re-reading the last messages of the original thread > > > > > > [1], it looks like Peng had a clear idea on how to fix the general > > > > > > issue. > > > > > > Peng, what happened with that? > > > > > > > > We shrinked the rpmsg reserved area to workaround the issue. > > > > So still use the dma apis in rpmsg. > > > > > > > > But here I am going to address domu android trusty issue using virtio. > > > > > > My suggestion is to first of all fix DMA API so it works properly. > > > > Could you please elaborate more details? > > > > You mean the DMA API usage of rpmsg? Or xen domu dma_ops? > > > > Thanks, > > Peng. > > Not 100% sure but I think xen dma ops. Sorry to ask again, could you explain more details about what issues do you see of current xen ARM domu dma_ops? Or Dom0 swiotlb xen dma_ops? Thanks, Peng. > > -- > MST
Re: [PATCH] xen: introduce xen_vring_use_dma
On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote: > > > > > Anyway, re-reading the last messages of the original thread [1], > > > > > it looks like Peng had a clear idea on how to fix the general issue. > > > > > Peng, what happened with that? > > > > > > We shrinked the rpmsg reserved area to workaround the issue. > > > So still use the dma apis in rpmsg. > > > > > > But here I am going to address domu android trusty issue using virtio. > > > > My suggestion is to first of all fix DMA API so it works properly. > > Could you please elaborate more details? > > You mean the DMA API usage of rpmsg? Or xen domu dma_ops? > > Thanks, > Peng. Not 100% sure but I think xen dma ops. -- MST
RE: [PATCH] xen: introduce xen_vring_use_dma
> Subject: Re: [PATCH] xen: introduce xen_vring_use_dma > > On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote: > > > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma > > > > > > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote: > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote: > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini > wrote: > > > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote: > > > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb > > > > > > > > > > > > > > > > > > > > Use xen_swiotlb to determine when vring should use dma > > > > > > > > > > APIs to map the > > > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required. > > > > > > > > > > When it is disabled, it is not required. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Peng Fan > > > > > > > > > > > > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for > > > this? > > > > > > > > > Xen was there first, but everyone else is using that now. > > > > > > > > > > > > > > > > Unfortunately it is complicated and it is not related to > > > > > > > > VIRTIO_F_IOMMU_PLATFORM :-( > > > > > > > > > > > > > > > > > > > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to > > > > > > > > translate foreign mappings (memory coming from other VMs) > > > > > > > > to > > > physical addresses. > > > > > > > > On x86, it also uses dma_ops to translate Linux's idea of > > > > > > > > a physical address into a real physical address (this is > > > > > > > > unneeded on ARM.) 
> > > > > > > > > > > > > > > > > > > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should > > > > > > > > be used on Xen/x86 always and on Xen/ARM if Linux is Dom0 > > > > > > > > (because it has foreign > > > > > > > > mappings.) That is why we have the if (xen_domain) return > > > > > > > > true; in vring_use_dma_api. > > > > > > > > > > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops. > > > > > > > > > > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also* > > > > > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear. > > > > > > > > > > > > > > Unfortunately as a result Xen never got around to properly > > > > > > > setting VIRTIO_F_IOMMU_PLATFORM. > > > > > > > > > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for > > > > > > this because the usage of swiotlb_xen is not a property of > > > > > > virtio, > > > > > > > > > > > > > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is > > > > > it's name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what > > > > > linux calls it) is declared as "special, don't follow normal > > > > > rules for access". > > > > > > > > > > So yes swiotlb_xen is not a property of virtio, but what *is* a > > > > > property of virtio is that it's not special, just a regular > > > > > device from DMA > > > POV. > > > > > > > > I am trying to understand what you meant but I think I am missing > > > > something. > > > > > > > > Are you saying that modern virtio should always have > > > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other > > > devices? > > > > > > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if > > > you have some special needs e.g. you are very sure it's ok to bypass > > > DMA ops, or you need to support a legacy guest (produced in the > > > window between virtio 1 support in 2014 and support for > VIRTIO_F_ACCESS_PLATFORM in 2016). 
> > > > > > > > > > If that is the case, how is it possible that virtio breaks on ARM > > > > using the default dma_ops? The breakage is not Xen related (except > > > > that Xen turns dma_ops on). The original message from Peng was: > > > > > > > > vring_map_one_sg -> vring_use_dma_api > > > >-> dma_map_page > > > >-> __swiotlb_map_page > > > > ->swiotlb_map_page > > > > > > > > ->__dma_map_area(phys_to_virt(dma_to_phys(dev, > > > dev_addr)), size, dir); > > > > However we are using per device dma area for rpmsg, phys_to_virt > > > > could not return a correct virtual address for virtual address in > > > > vmalloc area. Then kernel panic. > > > > > > > > I must be missing something. Maybe it is because it has to do with > RPMesg? > > > > > > I think it's an RPMesg bug, yes > > > > rpmsg bug is another issue, it should not use dma_alloc_coherent for > > reserved area, and use vmalloc_to_page. > > > > Anyway here using dma api will also trigger issue. > > > > > > > > > > > > > > > > > You might have noticed that I missed one possible case above: > > > > > > > > Xen/ARM DomU :-) > > > > > > > > > > > > > > > > Xen/ARM domUs don't need
Re: [PATCH] xen: introduce xen_vring_use_dma
On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote: > > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma > > > > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote: > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote: > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote: > > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote: > > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote: > > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb > > > > > > > > > > > > > > > > > > Use xen_swiotlb to determine when vring should use dma > > > > > > > > > APIs to map the > > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required. > > > > > > > > > When it is disabled, it is not required. > > > > > > > > > > > > > > > > > > Signed-off-by: Peng Fan > > > > > > > > > > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for > > this? > > > > > > > > Xen was there first, but everyone else is using that now. > > > > > > > > > > > > > > Unfortunately it is complicated and it is not related to > > > > > > > VIRTIO_F_IOMMU_PLATFORM :-( > > > > > > > > > > > > > > > > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to > > > > > > > translate foreign mappings (memory coming from other VMs) to > > physical addresses. > > > > > > > On x86, it also uses dma_ops to translate Linux's idea of a > > > > > > > physical address into a real physical address (this is > > > > > > > unneeded on ARM.) > > > > > > > > > > > > > > > > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be > > > > > > > used on Xen/x86 always and on Xen/ARM if Linux is Dom0 > > > > > > > (because it has foreign > > > > > > > mappings.) That is why we have the if (xen_domain) return > > > > > > > true; in vring_use_dma_api. 
> > > > > > > > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops. > > > > > > > > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also* forces > > > > > > DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear. > > > > > > > > > > > > Unfortunately as a result Xen never got around to properly > > > > > > setting VIRTIO_F_IOMMU_PLATFORM. > > > > > > > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this > > > > > because the usage of swiotlb_xen is not a property of virtio, > > > > > > > > > > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is it's > > > > name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what linux > > > > calls it) is declared as "special, don't follow normal rules for > > > > access". > > > > > > > > So yes swiotlb_xen is not a property of virtio, but what *is* a > > > > property of virtio is that it's not special, just a regular device from > > > > DMA > > POV. > > > > > > I am trying to understand what you meant but I think I am missing > > > something. > > > > > > Are you saying that modern virtio should always have > > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other > > devices? > > > > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you have > > some special needs e.g. you are very sure it's ok to bypass DMA ops, or you > > need to support a legacy guest (produced in the window between virtio 1 > > support in 2014 and support for VIRTIO_F_ACCESS_PLATFORM in 2016). > > > > > > > If that is the case, how is it possible that virtio breaks on ARM > > > using the default dma_ops? The breakage is not Xen related (except > > > that Xen turns dma_ops on). 
The original message from Peng was: > > > > > > vring_map_one_sg -> vring_use_dma_api > > >-> dma_map_page > > > -> __swiotlb_map_page > > > ->swiotlb_map_page > > > > > > ->__dma_map_area(phys_to_virt(dma_to_phys(dev, > > dev_addr)), size, dir); > > > However we are using per device dma area for rpmsg, phys_to_virt > > > could not return a correct virtual address for virtual address in > > > vmalloc area. Then kernel panic. > > > > > > I must be missing something. Maybe it is because it has to do with RPMesg? > > > > I think it's an RPMesg bug, yes > > rpmsg bug is another issue, it should not use dma_alloc_coherent for reserved > area, > and use vmalloc_to_page. > > Anyway here using dma api will also trigger issue. > > > > > > > > > > > > > You might have noticed that I missed one possible case above: > > > > > > > Xen/ARM DomU :-) > > > > > > > > > > > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even > > > > > > > initialized. So if > > > > > > > (xen_domain) return true; would give the wrong answer in that > > > > > > > case. > > > > > > > Linux would end up calling the "normal" dma_ops, not > > > > > > > swiotlb-xen, and the "normal" dma_ops fail. > > > >