Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
On 2020/6/11 下午7:34, Michael S. Tsirkin wrote: static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) { kfree(vq->descs); @@ -394,6 +400,9 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; vq->max_descs = dev->iov_limit; + if (vhost_vq_num_batch_descs(vq) < 0) { + return -EINVAL; + } This check breaks vdpa which set iov_limit to zero. Consider iov_limit is meaningless to vDPA, I wonder we can skip the test when device doesn't use worker. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 4/4] vhost: vdpa: report iova range
This patch introduces a new ioctl for vhost-vdpa device that can report the iova range by the device. For device that depends on platform IOMMU, we fetch the iova range via DOMAIN_ATTR_GEOMETRY. For devices that has its own DMA translation unit, we fetch it directly from vDPA bus operation. Signed-off-by: Jason Wang --- drivers/vhost/vdpa.c | 27 +++ include/uapi/linux/vhost.h | 4 include/uapi/linux/vhost_types.h | 5 + 3 files changed, 36 insertions(+) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 77a0c9fb6cc3..ad23e66cbf57 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -332,6 +332,30 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp) return 0; } + +static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp) +{ + struct iommu_domain_geometry geo; + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + struct vhost_vdpa_iova_range range; + struct vdpa_iova_range vdpa_range; + + if (!ops->set_map && !ops->dma_map) { + iommu_domain_get_attr(v->domain, + DOMAIN_ATTR_GEOMETRY, ); + range.start = geo.aperture_start; + range.end = geo.aperture_end; + } else { + vdpa_range = ops->get_iova_range(vdpa); + range.start = vdpa_range.start; + range.end = vdpa_range.end; + } + + return copy_to_user(argp, , sizeof(range)); + +} + static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, void __user *argp) { @@ -442,6 +466,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, case VHOST_VDPA_SET_CONFIG_CALL: r = vhost_vdpa_set_config_call(v, argp); break; + case VHOST_VDPA_GET_IOVA_RANGE: + r = vhost_vdpa_get_iova_range(v, argp); + break; default: r = vhost_dev_ioctl(>vdev, cmd, argp); if (r == -ENOIOCTLCMD) diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h index 0c2349612e77..850956980e27 100644 --- a/include/uapi/linux/vhost.h +++ b/include/uapi/linux/vhost.h @@ -144,4 +144,8 @@ /* Set event fd for config interrupt*/ #define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int) + +/* Get the valid iova range */ +#define VHOST_VDPA_GET_IOVA_RANGE _IOW(VHOST_VIRTIO, 0x78, \ +struct vhost_vdpa_iova_range) #endif diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h index 669457ce5c48..4025b5a36177 100644 --- a/include/uapi/linux/vhost_types.h +++ b/include/uapi/linux/vhost_types.h @@ -127,6 +127,11 @@ struct vhost_vdpa_config { __u8 buf[0]; }; +struct vhost_vdpa_iova_range { + __u64 start; + __u64 end; +}; + /* Feature bits */ /* Log all write descriptors. Can be changed while device is active. */ #define VHOST_F_LOG_ALL 26 -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 3/4] vdpa: get_iova_range() is mandatory for device specific DMA translation
In order to let userspace work correctly, get_iova_range() is a must for the device that has its own DMA translation logic. Signed-off-by: Jason Wang --- drivers/vdpa/vdpa.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index de211ef3738c..ab7af978ef70 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -82,6 +82,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, if (!!config->dma_map != !!config->dma_unmap) goto err; + if ((config->dma_map || config->set_map) && + !config->get_iova_range) + goto err; + err = -ENOMEM; vdev = kzalloc(size, GFP_KERNEL); if (!vdev) -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 0/4] vDPA: API for reporting IOVA range
Hi All: This series introduces API for reporing IOVA range. This is a must for userspace to work correclty: - for the process that uses vhost-vDPA directly to properly allocate IOVA - for VM(qemu), when vIOMMU is not enabled, fail early if GPA is out of range - for VM(qemu), when vIOMMU is enabled, determine a valid guest address width Please review. Thanks Jason Wang (4): vdpa: introduce config op to get valid iova range vdpa_sim: implement get_iova_range bus operation vdpa: get_iova_range() is mandatory for device specific DMA translation vhost: vdpa: report iova range drivers/vdpa/vdpa.c | 4 drivers/vdpa/vdpa_sim/vdpa_sim.c | 11 +++ drivers/vhost/vdpa.c | 27 +++ include/linux/vdpa.h | 14 ++ include/uapi/linux/vhost.h | 4 include/uapi/linux/vhost_types.h | 5 + 6 files changed, 65 insertions(+) -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/4] vdpa: introduce config op to get valid iova range
This patch introduce a config op to get valid iova range from the vDPA device. Signed-off-by: Jason Wang --- include/linux/vdpa.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 239db794357c..b7633ed2500c 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -41,6 +41,16 @@ struct vdpa_device { unsigned int index; }; +/** + * vDPA IOVA range - the IOVA range support by the device + * @start: start of the IOVA range + * @end: end of the IOVA range + */ +struct vdpa_iova_range { + u64 start; + u64 end; +}; + /** * vDPA_config_ops - operations for configuring a vDPA device. * Note: vDPA device drivers are required to implement all of the @@ -134,6 +144,9 @@ struct vdpa_device { * @get_generation:Get device config generation (optional) * @vdev: vdpa device * Returns u32: device generation + * @get_iova_range:Get supported iova range (on-chip IOMMU) + * @vdev: vdpa device + * Returns the iova range supported by the device * @set_map: Set device memory mapping (optional) * Needed for device that using device * specific DMA translation (on-chip IOMMU) @@ -195,6 +208,7 @@ struct vdpa_config_ops { void (*set_config)(struct vdpa_device *vdev, unsigned int offset, const void *buf, unsigned int len); u32 (*get_generation)(struct vdpa_device *vdev); + struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev); /* DMA ops */ int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/4] vdpa_sim: implement get_iova_range bus operation
This patch implements get_iova_range method for vdpa_sim. Since vdpa_sim is a software device, simply advertise a [0ULL, ~0ULL] range. Signed-off-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index c7334cc65bb2..b3a6dc5b9984 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -524,6 +524,16 @@ static u32 vdpasim_get_generation(struct vdpa_device *vdpa) return vdpasim->generation; } +static struct vdpa_iova_range vdpasim_get_iova_range(struct vdpa_device *vdpa) +{ + struct vdpa_iova_range range; + + range.start = 0ULL; + range.end = ~0ULL; + + return range; +} + static int vdpasim_set_map(struct vdpa_device *vdpa, struct vhost_iotlb *iotlb) { @@ -597,6 +607,7 @@ static const struct vdpa_config_ops vdpasim_net_config_ops = { .get_config = vdpasim_get_config, .set_config = vdpasim_set_config, .get_generation = vdpasim_get_generation, + .get_iova_range = vdpasim_get_iova_range, .set_map= vdpasim_set_map, .dma_map= vdpasim_dma_map, .dma_unmap = vdpasim_dma_unmap, -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
On Mon 15-06-20 21:57:16, Waiman Long wrote: > The kzfree() function is normally used to clear some sensitive > information, like encryption keys, in the buffer before freeing it back > to the pool. Memset() is currently used for the buffer clearing. However, > it is entirely possible that the compiler may choose to optimize away the > memory clearing especially if LTO is being used. To make sure that this > optimization will not happen, memzero_explicit(), which is introduced > in v3.18, is now used in kzfree() to do the clearing. > > Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") > Cc: sta...@vger.kernel.org > Signed-off-by: Waiman Long Acked-by: Michal Hocko Although I am not really sure this is a stable material. Is there any known instance where the memset was optimized out from kzfree? > --- > mm/slab_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 9e72ba224175..37d48a56431d 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -1726,7 +1726,7 @@ void kzfree(const void *p) > if (unlikely(ZERO_OR_NULL_PTR(mem))) > return; > ks = ksize(mem); > - memset(mem, 0, ks); > + memzero_explicit(mem, ks); > kfree(mem); > } > EXPORT_SYMBOL(kzfree); > -- > 2.18.1 > -- Michal Hocko SUSE Labs ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 15.06.20 14:39, Pierre Morel wrote: > An architecture protecting the guest memory against unauthorized host > access may want to enforce VIRTIO I/O device protection through the > use of VIRTIO_F_IOMMU_PLATFORM. > > Let's give a chance to the architecture to accept or not devices > without VIRTIO_F_IOMMU_PLATFORM. > > Signed-off-by: Pierre Morel Acked-by: Christian Borntraeger Shouldnt we maybe add a pr_warn if that happens to help the admins to understand what is going on? > --- > arch/s390/mm/init.c | 6 ++ > drivers/virtio/virtio.c | 9 + > include/linux/virtio.h | 2 ++ > 3 files changed, 17 insertions(+) > > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c > index 87b2d024e75a..3f04ad09650f 100644 > --- a/arch/s390/mm/init.c > +++ b/arch/s390/mm/init.c > @@ -46,6 +46,7 @@ > #include > #include > #include > +#include > > pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); > > @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) > return is_prot_virt_guest(); > } > > +int arch_needs_iommu_platform(struct virtio_device *dev) > +{ > + return is_prot_virt_guest(); > +} > + > /* protected virtualization */ > static void pv_init(void) > { > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index a977e32a88f2..30091089bee8 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -167,6 +167,11 @@ void virtio_add_status(struct virtio_device *dev, > unsigned int status) > } > EXPORT_SYMBOL_GPL(virtio_add_status); > > +int __weak arch_needs_iommu_platform(struct virtio_device *dev) > +{ > + return 0; > +} > + > int virtio_finalize_features(struct virtio_device *dev) > { > int ret = dev->config->finalize_features(dev); > @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > return 0; > > + if (arch_needs_iommu_platform(dev) && > + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) > + return -EIO; > + > virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); > status = dev->config->get_status(dev); > if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index a493eac08393..2c46b310c38c 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv); > #define module_virtio_driver(__virtio_driver) \ > module_driver(__virtio_driver, register_virtio_driver, \ > unregister_virtio_driver) > + > +int arch_needs_iommu_platform(struct virtio_device *dev); > #endif /* _LINUX_VIRTIO_H */ > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020/6/15 下午8:39, Pierre Morel wrote: An architecture protecting the guest memory against unauthorized host access may want to enforce VIRTIO I/O device protection through the use of VIRTIO_F_IOMMU_PLATFORM. Let's give a chance to the architecture to accept or not devices without VIRTIO_F_IOMMU_PLATFORM. Signed-off-by: Pierre Morel Acked-by: Jason Wang --- arch/s390/mm/init.c | 6 ++ drivers/virtio/virtio.c | 9 + include/linux/virtio.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 87b2d024e75a..3f04ad09650f 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -46,6 +46,7 @@ #include #include #include +#include pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) return is_prot_virt_guest(); } +int arch_needs_iommu_platform(struct virtio_device *dev) +{ + return is_prot_virt_guest(); +} + /* protected virtualization */ static void pv_init(void) { diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index a977e32a88f2..30091089bee8 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -167,6 +167,11 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status) } EXPORT_SYMBOL_GPL(virtio_add_status); +int __weak arch_needs_iommu_platform(struct virtio_device *dev) +{ + return 0; +} + int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) return 0; + if (arch_needs_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) + return -EIO; + virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); status = dev->config->get_status(dev); if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { diff --git a/include/linux/virtio.h b/include/linux/virtio.h index a493eac08393..2c46b310c38c 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv); #define module_virtio_driver(__virtio_driver) \ module_driver(__virtio_driver, register_virtio_driver, \ unregister_virtio_driver) + +int arch_needs_iommu_platform(struct virtio_device *dev); #endif /* _LINUX_VIRTIO_H */ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-16 08:55, Christian Borntraeger wrote: On 15.06.20 14:39, Pierre Morel wrote: An architecture protecting the guest memory against unauthorized host access may want to enforce VIRTIO I/O device protection through the use of VIRTIO_F_IOMMU_PLATFORM. Let's give a chance to the architecture to accept or not devices without VIRTIO_F_IOMMU_PLATFORM. Signed-off-by: Pierre Morel Acked-by: Christian Borntraeger Thanks, Shouldnt we maybe add a pr_warn if that happens to help the admins to understand what is going on? Yes, Connie asked for it too, good that you remind it to me, I add it. Thanks, Pierre --- arch/s390/mm/init.c | 6 ++ drivers/virtio/virtio.c | 9 + include/linux/virtio.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 87b2d024e75a..3f04ad09650f 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -46,6 +46,7 @@ #include #include #include +#include pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) return is_prot_virt_guest(); } +int arch_needs_iommu_platform(struct virtio_device *dev) +{ + return is_prot_virt_guest(); +} + /* protected virtualization */ static void pv_init(void) { diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index a977e32a88f2..30091089bee8 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -167,6 +167,11 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status) } EXPORT_SYMBOL_GPL(virtio_add_status); +int __weak arch_needs_iommu_platform(struct virtio_device *dev) +{ + return 0; +} + int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) return 0; + if (arch_needs_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) + return -EIO; + virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); status = dev->config->get_status(dev); if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { diff --git a/include/linux/virtio.h b/include/linux/virtio.h index a493eac08393..2c46b310c38c 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv); #define module_virtio_driver(__virtio_driver) \ module_driver(__virtio_driver, register_virtio_driver, \ unregister_virtio_driver) + +int arch_needs_iommu_platform(struct virtio_device *dev); #endif /* _LINUX_VIRTIO_H */ -- Pierre Morel IBM Lab Boeblingen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-16 08:22, Jason Wang wrote: On 2020/6/15 下午8:39, Pierre Morel wrote: An architecture protecting the guest memory against unauthorized host access may want to enforce VIRTIO I/O device protection through the use of VIRTIO_F_IOMMU_PLATFORM. Let's give a chance to the architecture to accept or not devices without VIRTIO_F_IOMMU_PLATFORM. Signed-off-by: Pierre Morel Acked-by: Jason Wang Thanks, Pierre -- Pierre Morel IBM Lab Boeblingen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Tue, 16 Jun 2020 13:57:26 +0200 Halil Pasic wrote: > On Tue, 16 Jun 2020 12:52:50 +0200 > Pierre Morel wrote: > > > >> int virtio_finalize_features(struct virtio_device *dev) > > >> { > > >> int ret = dev->config->finalize_features(dev); > > >> @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device > > >> *dev) > > >> if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > > >> return 0; > > >> > > >> +if (arch_needs_iommu_platform(dev) && > > >> +!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) > > >> +return -EIO; > > >> + > > > > > > Why EIO? > > > > Because I/O can not occur correctly? > > I am open to suggestions. > > We use -ENODEV if feature when the device rejects the features we > tried to negotiate (see virtio_finalize_features()) and -EINVAL when > the F_VERSION_1 and the virtio-ccw revision ain't coherent (in > virtio_ccw_finalize_features()). Any of those seems more fitting > that EIO to me. BTW does the error code itself matter in any way, > or is it just OK vs some error? If I haven't lost my way, we end up in the driver core probe failure handling; we probably should do -ENODEV if we just want probing to fail and -EINVAL or -EIO if we want the code to moan. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Tue, 16 Jun 2020 12:52:50 +0200 Pierre Morel wrote: > >> int virtio_finalize_features(struct virtio_device *dev) > >> { > >>int ret = dev->config->finalize_features(dev); > >> @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device > >> *dev) > >>if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > >>return 0; > >> > >> + if (arch_needs_iommu_platform(dev) && > >> + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) > >> + return -EIO; > >> + > > > > Why EIO? > > Because I/O can not occur correctly? > I am open to suggestions. We use -ENODEV if feature when the device rejects the features we tried to negotiate (see virtio_finalize_features()) and -EINVAL when the F_VERSION_1 and the virtio-ccw revision ain't coherent (in virtio_ccw_finalize_features()). Any of those seems more fitting that EIO to me. BTW does the error code itself matter in any way, or is it just OK vs some error? Regards, Halil ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Tue, 16 Jun 2020 09:35:19 +0200 Pierre Morel wrote: > On 2020-06-16 08:55, Christian Borntraeger wrote: > > > > > > On 15.06.20 14:39, Pierre Morel wrote: > >> An architecture protecting the guest memory against unauthorized host > >> access may want to enforce VIRTIO I/O device protection through the > >> use of VIRTIO_F_IOMMU_PLATFORM. > >> > >> Let's give a chance to the architecture to accept or not devices > >> without VIRTIO_F_IOMMU_PLATFORM. > >> > >> Signed-off-by: Pierre Morel > > > > > > Acked-by: Christian Borntraeger > > Thanks, > > > > > Shouldnt we maybe add a pr_warn if that happens to help the admins to > > understand what is going on? > > > > > > Yes, Connie asked for it too, good that you remind it to me, I add it. Yes, please :) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Tue, 16 Jun 2020 12:52:50 +0200 Pierre Morel wrote: > On 2020-06-16 11:52, Halil Pasic wrote: > > On Mon, 15 Jun 2020 14:39:24 +0200 > > Pierre Morel wrote: > >> @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) > >>return is_prot_virt_guest(); > >> } > >> > >> +int arch_needs_iommu_platform(struct virtio_device *dev) > > > > Maybe prefixing the name with virtio_ would help provide the > > proper context. > > The virtio_dev makes it obvious and from the virtio side it should be > obvious that the arch is responsible for this. > > However if nobody has something against I change it. arch_needs_virtio_iommu_platform()? > > > > >> +{ > >> + return is_prot_virt_guest(); > >> +} > >> + > >> /* protected virtualization */ > >> static void pv_init(void) > >> { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Mon, 15 Jun 2020 14:39:24 +0200 Pierre Morel wrote: I find the subject (commit short) sub optimal. The 'arch' is already accepting devices 'without IOMMU feature'. What you are introducing is the ability to reject. > An architecture protecting the guest memory against unauthorized host > access may want to enforce VIRTIO I/O device protection through the > use of VIRTIO_F_IOMMU_PLATFORM. > > Let's give a chance to the architecture to accept or not devices > without VIRTIO_F_IOMMU_PLATFORM. > I don't particularly like the commit message. In general, I believe using access_platform instead of iommu_platform would really benefit us. > Signed-off-by: Pierre Morel > --- > arch/s390/mm/init.c | 6 ++ > drivers/virtio/virtio.c | 9 + > include/linux/virtio.h | 2 ++ > 3 files changed, 17 insertions(+) > > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c > index 87b2d024e75a..3f04ad09650f 100644 > --- a/arch/s390/mm/init.c > +++ b/arch/s390/mm/init.c > @@ -46,6 +46,7 @@ > #include > #include > #include > +#include arch/s390/mm/init.c including virtio.h looks a bit strange to me, but if Heiko and Vasily don't mind, neither do I. > > pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); > > @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) > return is_prot_virt_guest(); > } > > +int arch_needs_iommu_platform(struct virtio_device *dev) Maybe prefixing the name with virtio_ would help provide the proper context. > +{ > + return is_prot_virt_guest(); > +} > + > /* protected virtualization */ > static void pv_init(void) > { > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index a977e32a88f2..30091089bee8 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -167,6 +167,11 @@ void virtio_add_status(struct virtio_device *dev, > unsigned int status) > } > EXPORT_SYMBOL_GPL(virtio_add_status); > > +int __weak arch_needs_iommu_platform(struct virtio_device *dev) > +{ > + return 0; > +} > + Adding some people that could be interested in overriding this as well to the cc list. > int virtio_finalize_features(struct virtio_device *dev) > { > int ret = dev->config->finalize_features(dev); > @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > return 0; > > + if (arch_needs_iommu_platform(dev) && > + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) > + return -EIO; > + Why EIO? Overall, I think it is a good idea to have something that is going to protect us from this scenario. Regards, Halil > virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); > status = dev->config->get_status(dev); > if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index a493eac08393..2c46b310c38c 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv); > #define module_virtio_driver(__virtio_driver) \ > module_driver(__virtio_driver, register_virtio_driver, \ > unregister_virtio_driver) > + > +int arch_needs_iommu_platform(struct virtio_device *dev); > #endif /* _LINUX_VIRTIO_H */ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
On Tue, Jun 16, 2020 at 08:42:08AM +0200, Michal Hocko wrote: > On Mon 15-06-20 21:57:16, Waiman Long wrote: > > The kzfree() function is normally used to clear some sensitive > > information, like encryption keys, in the buffer before freeing it back > > to the pool. Memset() is currently used for the buffer clearing. However, > > it is entirely possible that the compiler may choose to optimize away the > > memory clearing especially if LTO is being used. To make sure that this > > optimization will not happen, memzero_explicit(), which is introduced > > in v3.18, is now used in kzfree() to do the clearing. > > > > Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") > > Cc: sta...@vger.kernel.org > > Signed-off-by: Waiman Long > > Acked-by: Michal Hocko > > Although I am not really sure this is a stable material. Is there any > known instance where the memset was optimized out from kzfree? I told him to add the stable. Otherwise it will just get reported to me again. It's a just safer to backport it before we forget. regards, dan carpenter ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-16 11:52, Halil Pasic wrote: On Mon, 15 Jun 2020 14:39:24 +0200 Pierre Morel wrote: I find the subject (commit short) sub optimal. The 'arch' is already accepting devices 'without IOMMU feature'. What you are introducing is the ability to reject. An architecture protecting the guest memory against unauthorized host access may want to enforce VIRTIO I/O device protection through the use of VIRTIO_F_IOMMU_PLATFORM. Let's give a chance to the architecture to accept or not devices without VIRTIO_F_IOMMU_PLATFORM. I don't particularly like the commit message. In general, I believe using access_platform instead of iommu_platform would really benefit us. IOMMU_PLATFORM is used overall in Linux, and I did not find any occurrence for ACCESS_PLATFORM. Signed-off-by: Pierre Morel --- arch/s390/mm/init.c | 6 ++ drivers/virtio/virtio.c | 9 + include/linux/virtio.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 87b2d024e75a..3f04ad09650f 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -46,6 +46,7 @@ #include #include #include +#include arch/s390/mm/init.c including virtio.h looks a bit strange to me, but if Heiko and Vasily don't mind, neither do I. Do we have a better place to install the hook? I though that since it is related to memory management and that, since force_dma_unencrypted already is there, it would be a good place. However, kvm-s390 is another candidate. pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) return is_prot_virt_guest(); } +int arch_needs_iommu_platform(struct virtio_device *dev) Maybe prefixing the name with virtio_ would help provide the proper context. The virtio_dev makes it obvious and from the virtio side it should be obvious that the arch is responsible for this. However if nobody has something against I change it. +{ + return is_prot_virt_guest(); +} + /* protected virtualization */ static void pv_init(void) { diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index a977e32a88f2..30091089bee8 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -167,6 +167,11 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status) } EXPORT_SYMBOL_GPL(virtio_add_status); +int __weak arch_needs_iommu_platform(struct virtio_device *dev) +{ + return 0; +} + Adding some people that could be interested in overriding this as well to the cc list. Thanks, int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) return 0; + if (arch_needs_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) + return -EIO; + Why EIO? Because I/O can not occur correctly? I am open to suggestions. Overall, I think it is a good idea to have something that is going to protect us from this scenario. It would clearly be a good thing that trusted hypervizors like QEMU forbid this scenario however should we let the door open? Thanks, Pierre -- Pierre Morel IBM Lab Boeblingen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-16 14:20, Cornelia Huck wrote: On Tue, 16 Jun 2020 12:52:50 +0200 Pierre Morel wrote: On 2020-06-16 11:52, Halil Pasic wrote: On Mon, 15 Jun 2020 14:39:24 +0200 Pierre Morel wrote: @@ -162,6 +163,11 @@ bool force_dma_unencrypted(struct device *dev) return is_prot_virt_guest(); } +int arch_needs_iommu_platform(struct virtio_device *dev) Maybe prefixing the name with virtio_ would help provide the proper context. The virtio_dev makes it obvious and from the virtio side it should be obvious that the arch is responsible for this. However if nobody has something against I change it. arch_needs_virtio_iommu_platform()? fine with me -- Pierre Morel IBM Lab Boeblingen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.4 086/134] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()
From: Longpeng(Mike) [ Upstream commit 8c855f0720ff006d75d0a2512c7f6c4f60ff60ee ] The system'll crash when the users insmod crypto/tcrypto.ko with mode=155 ( testing "authenc(hmac(sha1),cbc(aes))" ). It's caused by reuse the memory of request structure. In crypto_authenc_init_tfm(), the reqsize is set to: [PART 1] sizeof(authenc_request_ctx) + [PART 2] ictx->reqoff + [PART 3] MAX(ahash part, skcipher part) and the 'PART 3' is used by both ahash and skcipher in turn. When the virtio_crypto driver finish skcipher req, it'll call ->complete callback(in crypto_finalize_skcipher_request) and then free its resources whose pointers are recorded in 'skcipher parts'. However, the ->complete is 'crypto_authenc_encrypt_done' in this case, it will use the 'ahash part' of the request and change its content, so virtio_crypto driver will get the wrong pointer after ->complete finish and mistakenly free some other's memory. So the system will crash when these memory will be used again. The resources which need to be cleaned up are not used any more. But the pointers of these resources may be changed in the function "crypto_finalize_skcipher_request". Thus release specific resources before calling this function. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Acked-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-3-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin --- drivers/crypto/virtio/virtio_crypto_algs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c index 82b316b2f537..fea55b5da8b5 100644 --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -580,10 +580,11 @@ static void virtio_crypto_ablkcipher_finalize_req( scatterwalk_map_and_copy(req->info, req->dst, req->nbytes - AES_BLOCK_SIZE, AES_BLOCK_SIZE, 0); - crypto_finalize_ablkcipher_request(vc_sym_req->base.dataq->engine, - req, err); kzfree(vc_sym_req->iv); virtcrypto_clear_request(_sym_req->base); + + crypto_finalize_ablkcipher_request(vc_sym_req->base.dataq->engine, + req, err); } static struct virtio_crypto_algo virtio_crypto_algs[] = { { -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.4 087/134] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) [ Upstream commit b02989f37fc5e865c9070907e4493b3a21e2 ] The system will crash when the users insmod crypto/tcrypt.ko with mode=38 ( testing "cts(cbc(aes))" ). Usually the next entry of one sg will be @sg@ + 1, but if this sg element is part of a chained scatterlist, it could jump to the start of a new scatterlist array. Fix it by sg_next() on calculation of src/dst scatterlist. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Signed-off-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-2-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin --- drivers/crypto/virtio/virtio_crypto_algs.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c index fea55b5da8b5..3b37d0150814 100644 --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -353,13 +353,18 @@ __virtio_crypto_ablkcipher_do_req(struct virtio_crypto_sym_request *vc_sym_req, int err; unsigned long flags; struct scatterlist outhdr, iv_sg, status_sg, **sgs; - int i; u64 dst_len; unsigned int num_out = 0, num_in = 0; int sg_total; uint8_t *iv; + struct scatterlist *sg; src_nents = sg_nents_for_len(req->src, req->nbytes); + if (src_nents < 0) { + pr_err("Invalid number of src SG.\n"); + return src_nents; + } + dst_nents = sg_nents(req->dst); pr_debug("virtio_crypto: Number of sgs (src_nents: %d, dst_nents: %d)\n", @@ -445,12 +450,12 @@ __virtio_crypto_ablkcipher_do_req(struct virtio_crypto_sym_request *vc_sym_req, vc_sym_req->iv = iv; /* Source data */ - for (i = 0; i < src_nents; i++) - sgs[num_out++] = >src[i]; + for (sg = req->src; src_nents; sg = sg_next(sg), src_nents--) + sgs[num_out++] = sg; /* Destination data */ - for (i = 0; i < dst_nents; i++) - sgs[num_out + num_in++] = >dst[i]; + for (sg = req->dst; sg; sg = sg_next(sg)) + sgs[num_out + num_in++] = sg; /* Status */ sg_init_one(_sg, _req->status, sizeof(vc_req->status)); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.7 093/163] crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) commit d90ca42012db2863a9a30b564a2ace6016594bda upstream. The src/dst length is not aligned with AES_BLOCK_SIZE(which is 16) in some testcases in tcrypto.ko. For example, the src/dst length of one of cts(cbc(aes))'s testcase is 17, the crypto_virtio driver will set @src_data_len=16 but @dst_data_len=17 in this case and get a wrong at then end. SRC: pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp (17 bytes) EXP: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc pp (17 bytes) DST: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00 (pollute the last bytes) (pp: plaintext cc:ciphertext) Fix this issue by limit the length of dest buffer. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-4-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c |1 + 1 file changed, 1 insertion(+) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -402,6 +402,7 @@ __virtio_crypto_skcipher_do_req(struct v goto free; } + dst_len = min_t(unsigned int, req->cryptlen, dst_len); pr_debug("virtio_crypto: src_len: %u, dst_len: %llu\n", req->cryptlen, dst_len); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.7 095/163] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) commit b02989f37fc5e865c9070907e4493b3a21e2 upstream. The system will crash when the users insmod crypto/tcrypt.ko with mode=38 ( testing "cts(cbc(aes))" ). Usually the next entry of one sg will be @sg@ + 1, but if this sg element is part of a chained scatterlist, it could jump to the start of a new scatterlist array. Fix it by sg_next() on calculation of src/dst scatterlist. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Signed-off-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-2-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -350,13 +350,18 @@ __virtio_crypto_skcipher_do_req(struct v int err; unsigned long flags; struct scatterlist outhdr, iv_sg, status_sg, **sgs; - int i; u64 dst_len; unsigned int num_out = 0, num_in = 0; int sg_total; uint8_t *iv; + struct scatterlist *sg; src_nents = sg_nents_for_len(req->src, req->cryptlen); + if (src_nents < 0) { + pr_err("Invalid number of src SG.\n"); + return src_nents; + } + dst_nents = sg_nents(req->dst); pr_debug("virtio_crypto: Number of sgs (src_nents: %d, dst_nents: %d)\n", @@ -443,12 +448,12 @@ __virtio_crypto_skcipher_do_req(struct v vc_sym_req->iv = iv; /* Source data */ - for (i = 0; i < src_nents; i++) - sgs[num_out++] = >src[i]; + for (sg = req->src; src_nents; sg = sg_next(sg), src_nents--) + sgs[num_out++] = sg; /* Destination data */ - for (i = 0; i < dst_nents; i++) - sgs[num_out + num_in++] = >dst[i]; + for (sg = req->dst; sg; sg = sg_next(sg)) + sgs[num_out + num_in++] = sg; /* Status */ sg_init_one(_sg, _req->status, sizeof(vc_req->status)); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.7 094/163] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()
From: Longpeng(Mike) commit 8c855f0720ff006d75d0a2512c7f6c4f60ff60ee upstream. The system'll crash when the users insmod crypto/tcrypto.ko with mode=155 ( testing "authenc(hmac(sha1),cbc(aes))" ). It's caused by reuse the memory of request structure. In crypto_authenc_init_tfm(), the reqsize is set to: [PART 1] sizeof(authenc_request_ctx) + [PART 2] ictx->reqoff + [PART 3] MAX(ahash part, skcipher part) and the 'PART 3' is used by both ahash and skcipher in turn. When the virtio_crypto driver finish skcipher req, it'll call ->complete callback(in crypto_finalize_skcipher_request) and then free its resources whose pointers are recorded in 'skcipher parts'. However, the ->complete is 'crypto_authenc_encrypt_done' in this case, it will use the 'ahash part' of the request and change its content, so virtio_crypto driver will get the wrong pointer after ->complete finish and mistakenly free some other's memory. So the system will crash when these memory will be used again. The resources which need to be cleaned up are not used any more. But the pointers of these resources may be changed in the function "crypto_finalize_skcipher_request". Thus release specific resources before calling this function. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Acked-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-3-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -578,10 +578,11 @@ static void virtio_crypto_skcipher_final scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen - AES_BLOCK_SIZE, AES_BLOCK_SIZE, 0); - crypto_finalize_skcipher_request(vc_sym_req->base.dataq->engine, - req, err); kzfree(vc_sym_req->iv); virtcrypto_clear_request(_sym_req->base); + + crypto_finalize_skcipher_request(vc_sym_req->base.dataq->engine, + req, err); } static struct virtio_crypto_algo virtio_crypto_algs[] = { { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
Waiman Long wrote: > The kzfree() function is normally used to clear some sensitive > information, like encryption keys, in the buffer before freeing it back > to the pool. Memset() "memset()" is all lowercase. > is currently used for buffer clearing. However unlikely, there is still a > non-zero probability I'd say "a possibility". > that and I'd move "in [the] future" here. > the compiler may choose to optimize away the > memory clearing especially if LTO is being used in the future. To make sure > that this optimization will never happen "in these cases" > , memzero_explicit(), which is introduced in v3.18, is now used in "instead of"? > kzfree() to future-proof it. Davod ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()
Last time you sent this we couldn't decide which tree it should go through. Either the crypto tree or through Andrew seems like the right thing to me. Also the other issue is that it risks breaking things if people add new kzfree() instances while we are doing the transition. Could you just add a "#define kzfree kfree_sensitive" so that things continue to compile and we can remove it in the next kernel release? regards, dan carpenter ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()
As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/\bkzfree\b/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and adding a kzfree backward compatibility macro in slab.h. Suggested-by: Joe Perches Acked-by: David Howells Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Waiman Long --- arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c| 4 +-- crypto/api.c | 2 +- crypto/asymmetric_keys/verify_pefile.c| 4 +-- crypto/deflate.c | 2 +- crypto/drbg.c | 10 +++--- crypto/ecc.c | 8 ++--- crypto/ecdh.c | 2 +- crypto/gcm.c | 2 +- crypto/gf128mul.c | 4 +-- crypto/jitterentropy-kcapi.c | 2 +- crypto/rng.c | 2 +- crypto/rsa-pkcs1pad.c | 6 ++-- crypto/seqiv.c| 2 +- crypto/shash.c| 2 +- crypto/skcipher.c | 2 +- crypto/testmgr.c | 6 ++-- crypto/zstd.c | 2 +- .../allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 +- .../allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 +- drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 +-- drivers/crypto/atmel-ecc.c| 2 +- drivers/crypto/caam/caampkc.c | 28 +++ drivers/crypto/cavium/cpt/cptvf_main.c| 6 ++-- drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 +++ drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 +-- drivers/crypto/cavium/zip/zip_crypto.c| 6 ++-- drivers/crypto/ccp/ccp-crypto-rsa.c | 6 ++-- drivers/crypto/ccree/cc_aead.c| 4 +-- drivers/crypto/ccree/cc_buffer_mgr.c | 4 +-- drivers/crypto/ccree/cc_cipher.c | 6 ++-- drivers/crypto/ccree/cc_hash.c| 8 ++--- drivers/crypto/ccree/cc_request_mgr.c | 2 +- drivers/crypto/marvell/cesa/hash.c| 2 +- .../crypto/marvell/octeontx/otx_cptvf_main.c | 6 ++-- .../marvell/octeontx/otx_cptvf_reqmgr.h | 2 +- drivers/crypto/mediatek/mtk-aes.c | 2 +- drivers/crypto/nx/nx.c| 4 +-- drivers/crypto/virtio/virtio_crypto_algs.c| 12 +++ drivers/crypto/virtio/virtio_crypto_core.c| 2 +- drivers/md/dm-crypt.c | 32 - drivers/md/dm-integrity.c | 6 ++-- drivers/misc/ibmvmc.c | 6 ++-- .../hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 6 ++-- drivers/net/ppp/ppp_mppe.c| 6 ++-- drivers/net/wireguard/noise.c | 4 +-- drivers/net/wireguard/peer.c | 2 +- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +- .../net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 ++-- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 ++-- drivers/net/wireless/intersil/orinoco/wext.c | 4 +-- drivers/s390/crypto/ap_bus.h | 4 +-- drivers/staging/ks7010/ks_hostif.c| 2 +- drivers/staging/rtl8723bs/core/rtw_security.c | 2 +- drivers/staging/wlan-ng/p80211netdev.c| 2 +- drivers/target/iscsi/iscsi_target_auth.c | 2 +- fs/cifs/cifsencrypt.c | 2 +- fs/cifs/connect.c | 10 +++--- fs/cifs/dfs_cache.c | 2 +- fs/cifs/misc.c| 8 ++--- fs/crypto/keyring.c | 6 ++-- fs/crypto/keysetup_v1.c | 4 +-- fs/ecryptfs/keystore.c| 4 +-- fs/ecryptfs/messaging.c
[PATCH v5 0/2] mm, treewide: Rename kzfree() to kfree_sensitive()
v5: - Break the btrfs patch out as a separate patch to be processed independently. - Update the commit log of patch 1 to make it less scary. - Add a kzfree backward compatibility macro in patch 2. v4: - Break out the memzero_explicit() change as suggested by Dan Carpenter so that it can be backported to stable. - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for now as there can be a bit more discussion on what is best. It will be introduced as a separate patch later on after this one is merged. This patchset makes a global rename of the kzfree() to kfree_sensitive() to highlight the fact buffer clearing is only needed if the data objects contain sensitive information like encrpytion key. The fact that kzfree() uses memset() to do the clearing isn't totally safe either as compiler may compile out the clearing in their optimizer especially if LTO is used. Instead, the new kfree_sensitive() uses memzero_explicit() which won't get compiled out. Waiman Long (2): mm/slab: Use memzero_explicit() in kzfree() mm, treewide: Rename kzfree() to kfree_sensitive() arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c| 4 +-- crypto/api.c | 2 +- crypto/asymmetric_keys/verify_pefile.c| 4 +-- crypto/deflate.c | 2 +- crypto/drbg.c | 10 +++--- crypto/ecc.c | 8 ++--- crypto/ecdh.c | 2 +- crypto/gcm.c | 2 +- crypto/gf128mul.c | 4 +-- crypto/jitterentropy-kcapi.c | 2 +- crypto/rng.c | 2 +- crypto/rsa-pkcs1pad.c | 6 ++-- crypto/seqiv.c| 2 +- crypto/shash.c| 2 +- crypto/skcipher.c | 2 +- crypto/testmgr.c | 6 ++-- crypto/zstd.c | 2 +- .../allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 +- .../allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 +- drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 +-- drivers/crypto/atmel-ecc.c| 2 +- drivers/crypto/caam/caampkc.c | 28 +++ drivers/crypto/cavium/cpt/cptvf_main.c| 6 ++-- drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 +++ drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 +-- drivers/crypto/cavium/zip/zip_crypto.c| 6 ++-- drivers/crypto/ccp/ccp-crypto-rsa.c | 6 ++-- drivers/crypto/ccree/cc_aead.c| 4 +-- drivers/crypto/ccree/cc_buffer_mgr.c | 4 +-- drivers/crypto/ccree/cc_cipher.c | 6 ++-- drivers/crypto/ccree/cc_hash.c| 8 ++--- drivers/crypto/ccree/cc_request_mgr.c | 2 +- drivers/crypto/marvell/cesa/hash.c| 2 +- .../crypto/marvell/octeontx/otx_cptvf_main.c | 6 ++-- .../marvell/octeontx/otx_cptvf_reqmgr.h | 2 +- drivers/crypto/mediatek/mtk-aes.c | 2 +- drivers/crypto/nx/nx.c| 4 +-- drivers/crypto/virtio/virtio_crypto_algs.c| 12 +++ drivers/crypto/virtio/virtio_crypto_core.c| 2 +- drivers/md/dm-crypt.c | 32 - drivers/md/dm-integrity.c | 6 ++-- drivers/misc/ibmvmc.c | 6 ++-- .../hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 6 ++-- drivers/net/ppp/ppp_mppe.c| 6 ++-- drivers/net/wireguard/noise.c | 4 +-- drivers/net/wireguard/peer.c | 2 +- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +- .../net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 ++-- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 ++-- drivers/net/wireless/intersil/orinoco/wext.c | 4 +-- drivers/s390/crypto/ap_bus.h | 4 +-- drivers/staging/ks7010/ks_hostif.c| 2 +- drivers/staging/rtl8723bs/core/rtw_security.c | 2 +- drivers/staging/wlan-ng/p80211netdev.c| 2 +- drivers/target/iscsi/iscsi_target_auth.c | 2 +- fs/cifs/cifsencrypt.c | 2 +- fs/cifs/connect.c | 10 +++--- fs/cifs/dfs_cache.c | 2 +- fs/cifs/misc.c| 8 ++--- fs/crypto/keyring.c | 6 ++-- fs/crypto/keysetup_v1.c | 4 +-- fs/ecryptfs/keystore.c| 4 +-- fs/ecryptfs/messaging.c | 2 +- include/crypto/aead.h | 2 +- include/crypto/akcipher.h
[PATCH v5 1/2] mm/slab: Use memzero_explicit() in kzfree()
The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for buffer clearing. However unlikely, there is still a non-zero probability that the compiler may choose to optimize away the memory clearing especially if LTO is being used in the future. To make sure that this optimization will never happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to future-proof it. Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") Cc: sta...@vger.kernel.org Acked-by: Michal Hocko Signed-off-by: Waiman Long --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 9e72ba224175..37d48a56431d 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1726,7 +1726,7 @@ void kzfree(const void *p) if (unlikely(ZERO_OR_NULL_PTR(mem))) return; ks = ksize(mem); - memset(mem, 0, ks); + memzero_explicit(mem, ks); kfree(mem); } EXPORT_SYMBOL(kzfree); -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
On 6/15/20 11:30 PM, Eric Biggers wrote: On Mon, Jun 15, 2020 at 09:57:16PM -0400, Waiman Long wrote: The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for the buffer clearing. However, it is entirely possible that the compiler may choose to optimize away the memory clearing especially if LTO is being used. To make sure that this optimization will not happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to do the clearing. Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") Cc: sta...@vger.kernel.org Signed-off-by: Waiman Long --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 9e72ba224175..37d48a56431d 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1726,7 +1726,7 @@ void kzfree(const void *p) if (unlikely(ZERO_OR_NULL_PTR(mem))) return; ks = ksize(mem); - memset(mem, 0, ks); + memzero_explicit(mem, ks); kfree(mem); } EXPORT_SYMBOL(kzfree); This is a good change, but the commit message isn't really accurate. AFAIK, no one has found any case where this memset() gets optimized out. And even with LTO, it would be virtually impossible due to all the synchronization and global data structures that kfree() uses. (Remember that this isn't the C standard function "free()", so the compiler can't assign it any special meaning.) Not to mention that LTO support isn't actually upstream yet. I still agree with the change, but it might be helpful if the commit message were honest that this is really a hardening measure and about properly conveying the intent. As-is this sounds like a critical fix, which might confuse people. Yes, I agree that the commit log may look a bit scary. How about the following: The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for buffer clearing. However unlikely, there is still a non-zero probability that the compiler may choose to optimize away the memory clearing especially if LTO is being used in the future. To make sure that this optimization will never happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to future-proof it. Cheers, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On 2020-06-16 14:17, Cornelia Huck wrote: On Tue, 16 Jun 2020 13:57:26 +0200 Halil Pasic wrote: On Tue, 16 Jun 2020 12:52:50 +0200 Pierre Morel wrote: int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device *dev) if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) return 0; + if (arch_needs_iommu_platform(dev) && + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) + return -EIO; + Why EIO? Because I/O can not occur correctly? I am open to suggestions. We use -ENODEV if feature when the device rejects the features we tried to negotiate (see virtio_finalize_features()) and -EINVAL when the F_VERSION_1 and the virtio-ccw revision ain't coherent (in virtio_ccw_finalize_features()). Any of those seems more fitting that EIO to me. BTW does the error code itself matter in any way, or is it just OK vs some error? If I haven't lost my way, we end up in the driver core probe failure handling; we probably should do -ENODEV if we just want probing to fail and -EINVAL or -EIO if we want the code to moan. what about returning -ENODEV and add a dedicated warning here? -- Pierre Morel IBM Lab Boeblingen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 3/3] btrfs: Use kfree() in btrfs_ioctl_get_subvol_info()
On 6/16/20 10:48 AM, David Sterba wrote: On Mon, Jun 15, 2020 at 09:57:18PM -0400, Waiman Long wrote: In btrfs_ioctl_get_subvol_info(), there is a classic case where kzalloc() was incorrectly paired with kzfree(). According to David Sterba, there isn't any sensitive information in the subvol_info that needs to be cleared before freeing. So kfree_sensitive() isn't really needed, use kfree() instead. Reported-by: David Sterba Signed-off-by: Waiman Long --- fs/btrfs/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f1dd9e4271e9..e8f7c5f00894 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2692,7 +2692,7 @@ static int btrfs_ioctl_get_subvol_info(struct file *file, void __user *argp) btrfs_put_root(root); out_free: btrfs_free_path(path); - kfree_sensitive(subvol_info); + kfree(subvol_info); I would rather merge a patch doing to kzfree -> kfree instead of doing the middle step to switch it to kfree_sensitive. If it would help integration of your patchset I can push it to the next rc so there are no kzfree left in the btrfs code. Treewide change like that can take time so it would be one less problem to care about for you. Sure, I will move it forward in the patch series. Thanks, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On 6/16/20 10:26 AM, Dan Carpenter wrote: Last time you sent this we couldn't decide which tree it should go through. Either the crypto tree or through Andrew seems like the right thing to me. Also the other issue is that it risks breaking things if people add new kzfree() instances while we are doing the transition. Could you just add a "#define kzfree kfree_sensitive" so that things continue to compile and we can remove it in the next kernel release? regards, dan carpenter Yes, that make sure sense. Will send out v5 later today. Cheers, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.4 088/134] crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) [ Upstream commit d90ca42012db2863a9a30b564a2ace6016594bda ] The src/dst length is not aligned with AES_BLOCK_SIZE(which is 16) in some testcases in tcrypto.ko. For example, the src/dst length of one of cts(cbc(aes))'s testcase is 17, the crypto_virtio driver will set @src_data_len=16 but @dst_data_len=17 in this case and get a wrong at then end. SRC: pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp (17 bytes) EXP: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc pp (17 bytes) DST: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00 (pollute the last bytes) (pp: plaintext cc:ciphertext) Fix this issue by limit the length of dest buffer. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-4-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin --- drivers/crypto/virtio/virtio_crypto_algs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c index 3b37d0150814..ac420b201dd8 100644 --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -410,6 +410,7 @@ __virtio_crypto_ablkcipher_do_req(struct virtio_crypto_sym_request *vc_sym_req, goto free; } + dst_len = min_t(unsigned int, req->nbytes, dst_len); pr_debug("virtio_crypto: src_len: %u, dst_len: %llu\n", req->nbytes, dst_len); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/1] s390: virtio: let arch accept devices without IOMMU feature
On Tue, 16 Jun 2020 15:41:20 +0200 Pierre Morel wrote: > On 2020-06-16 14:17, Cornelia Huck wrote: > > On Tue, 16 Jun 2020 13:57:26 +0200 > > Halil Pasic wrote: > > > >> On Tue, 16 Jun 2020 12:52:50 +0200 > >> Pierre Morel wrote: > >> > >int virtio_finalize_features(struct virtio_device *dev) > >{ > > int ret = dev->config->finalize_features(dev); > > @@ -179,6 +184,10 @@ int virtio_finalize_features(struct virtio_device > > *dev) > > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) > > return 0; > > > > + if (arch_needs_iommu_platform(dev) && > > + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) > > + return -EIO; > > + > > Why EIO? > >>> > >>> Because I/O can not occur correctly? > >>> I am open to suggestions. > >> > >> We use -ENODEV if feature when the device rejects the features we > >> tried to negotiate (see virtio_finalize_features()) and -EINVAL when > >> the F_VERSION_1 and the virtio-ccw revision ain't coherent (in > >> virtio_ccw_finalize_features()). Any of those seems more fitting > >> that EIO to me. BTW does the error code itself matter in any way, > >> or is it just OK vs some error? > > > > If I haven't lost my way, we end up in the driver core probe failure > > handling; we probably should do -ENODEV if we just want probing to fail > > and -EINVAL or -EIO if we want the code to moan. > > > > what about returning -ENODEV and add a dedicated warning here? > Sounds good at least to me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 3/3] btrfs: Use kfree() in btrfs_ioctl_get_subvol_info()
On Mon, Jun 15, 2020 at 09:57:18PM -0400, Waiman Long wrote: > In btrfs_ioctl_get_subvol_info(), there is a classic case where kzalloc() > was incorrectly paired with kzfree(). According to David Sterba, there > isn't any sensitive information in the subvol_info that needs to be > cleared before freeing. So kfree_sensitive() isn't really needed, > use kfree() instead. > > Reported-by: David Sterba > Signed-off-by: Waiman Long > --- > fs/btrfs/ioctl.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index f1dd9e4271e9..e8f7c5f00894 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -2692,7 +2692,7 @@ static int btrfs_ioctl_get_subvol_info(struct file > *file, void __user *argp) > btrfs_put_root(root); > out_free: > btrfs_free_path(path); > - kfree_sensitive(subvol_info); > + kfree(subvol_info); I would rather merge a patch doing to kzfree -> kfree instead of doing the middle step to switch it to kfree_sensitive. If it would help integration of your patchset I can push it to the next rc so there are no kzfree left in the btrfs code. Treewide change like that can take time so it would be one less problem to care about for you. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.6 100/161] crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) commit b02989f37fc5e865c9070907e4493b3a21e2 upstream. The system will crash when the users insmod crypto/tcrypt.ko with mode=38 ( testing "cts(cbc(aes))" ). Usually the next entry of one sg will be @sg@ + 1, but if this sg element is part of a chained scatterlist, it could jump to the start of a new scatterlist array. Fix it by sg_next() on calculation of src/dst scatterlist. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Signed-off-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-2-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -350,13 +350,18 @@ __virtio_crypto_skcipher_do_req(struct v int err; unsigned long flags; struct scatterlist outhdr, iv_sg, status_sg, **sgs; - int i; u64 dst_len; unsigned int num_out = 0, num_in = 0; int sg_total; uint8_t *iv; + struct scatterlist *sg; src_nents = sg_nents_for_len(req->src, req->cryptlen); + if (src_nents < 0) { + pr_err("Invalid number of src SG.\n"); + return src_nents; + } + dst_nents = sg_nents(req->dst); pr_debug("virtio_crypto: Number of sgs (src_nents: %d, dst_nents: %d)\n", @@ -443,12 +448,12 @@ __virtio_crypto_skcipher_do_req(struct v vc_sym_req->iv = iv; /* Source data */ - for (i = 0; i < src_nents; i++) - sgs[num_out++] = >src[i]; + for (sg = req->src; src_nents; sg = sg_next(sg), src_nents--) + sgs[num_out++] = sg; /* Destination data */ - for (i = 0; i < dst_nents; i++) - sgs[num_out + num_in++] = >dst[i]; + for (sg = req->dst; sg; sg = sg_next(sg)) + sgs[num_out + num_in++] = sg; /* Status */ sg_init_one(_sg, _req->status, sizeof(vc_req->status)); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.6 098/161] crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req()
From: Longpeng(Mike) commit d90ca42012db2863a9a30b564a2ace6016594bda upstream. The src/dst length is not aligned with AES_BLOCK_SIZE(which is 16) in some testcases in tcrypto.ko. For example, the src/dst length of one of cts(cbc(aes))'s testcase is 17, the crypto_virtio driver will set @src_data_len=16 but @dst_data_len=17 in this case and get a wrong at then end. SRC: pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp (17 bytes) EXP: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc pp (17 bytes) DST: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00 (pollute the last bytes) (pp: plaintext cc:ciphertext) Fix this issue by limit the length of dest buffer. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-4-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c |1 + 1 file changed, 1 insertion(+) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -402,6 +402,7 @@ __virtio_crypto_skcipher_do_req(struct v goto free; } + dst_len = min_t(unsigned int, req->cryptlen, dst_len); pr_debug("virtio_crypto: src_len: %u, dst_len: %llu\n", req->cryptlen, dst_len); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5.6 099/161] crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req()
From: Longpeng(Mike) commit 8c855f0720ff006d75d0a2512c7f6c4f60ff60ee upstream. The system'll crash when the users insmod crypto/tcrypto.ko with mode=155 ( testing "authenc(hmac(sha1),cbc(aes))" ). It's caused by reuse the memory of request structure. In crypto_authenc_init_tfm(), the reqsize is set to: [PART 1] sizeof(authenc_request_ctx) + [PART 2] ictx->reqoff + [PART 3] MAX(ahash part, skcipher part) and the 'PART 3' is used by both ahash and skcipher in turn. When the virtio_crypto driver finish skcipher req, it'll call ->complete callback(in crypto_finalize_skcipher_request) and then free its resources whose pointers are recorded in 'skcipher parts'. However, the ->complete is 'crypto_authenc_encrypt_done' in this case, it will use the 'ahash part' of the request and change its content, so virtio_crypto driver will get the wrong pointer after ->complete finish and mistakenly free some other's memory. So the system will crash when these memory will be used again. The resources which need to be cleaned up are not used any more. But the pointers of these resources may be changed in the function "crypto_finalize_skcipher_request". Thus release specific resources before calling this function. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Reported-by: LABBE Corentin Cc: Gonglei Cc: Herbert Xu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "David S. Miller" Cc: virtualization@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Link: https://lore.kernel.org/r/20200123101000.GB24255@Red Acked-by: Gonglei Signed-off-by: Longpeng(Mike) Link: https://lore.kernel.org/r/20200602070501.2023-3-longpe...@huawei.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/virtio/virtio_crypto_algs.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/drivers/crypto/virtio/virtio_crypto_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_algs.c @@ -578,10 +578,11 @@ static void virtio_crypto_skcipher_final scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen - AES_BLOCK_SIZE, AES_BLOCK_SIZE, 0); - crypto_finalize_skcipher_request(vc_sym_req->base.dataq->engine, - req, err); kzfree(vc_sym_req->iv); virtcrypto_clear_request(_sym_req->base); + + crypto_finalize_skcipher_request(vc_sym_req->base.dataq->engine, + req, err); } static struct virtio_crypto_algo virtio_crypto_algs[] = { { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()
On Tue, 16 Jun 2020 11:43:11 -0400 Waiman Long wrote: > As said by Linus: > > A symmetric naming is only helpful if it implies symmetries in use. > Otherwise it's actively misleading. > > In "kzalloc()", the z is meaningful and an important part of what the > caller wants. > > In "kzfree()", the z is actively detrimental, because maybe in the > future we really _might_ want to use that "memfill(0xdeadbeef)" or > something. The "zero" part of the interface isn't even _relevant_. > > The main reason that kzfree() exists is to clear sensitive information > that should not be leaked to other future users of the same memory > objects. > > Rename kzfree() to kfree_sensitive() to follow the example of the > recently added kvfree_sensitive() and make the intention of the API > more explicit. In addition, memzero_explicit() is used to clear the > memory to make sure that it won't get optimized away by the compiler. > > The renaming is done by using the command sequence: > > git grep -w --name-only kzfree |\ > xargs sed -i 's/\bkzfree\b/kfree_sensitive/' > > followed by some editing of the kfree_sensitive() kerneldoc and adding > a kzfree backward compatibility macro in slab.h. > > ... > > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -186,10 +186,12 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *, > struct mem_cgroup *); > */ > void * __must_check krealloc(const void *, size_t, gfp_t); > void kfree(const void *); > -void kzfree(const void *); > +void kfree_sensitive(const void *); > size_t __ksize(const void *); > size_t ksize(const void *); > > +#define kzfree(x)kfree_sensitive(x) /* For backward compatibility */ > + What was the thinking here? Is this really necessary? I suppose we could keep this around for a while to ease migration. But not for too long, please. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()
On 6/16/20 2:09 PM, Andrew Morton wrote: On Tue, 16 Jun 2020 11:43:11 -0400 Waiman Long wrote: As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/\bkzfree\b/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and adding a kzfree backward compatibility macro in slab.h. ... --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -186,10 +186,12 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *); */ void * __must_check krealloc(const void *, size_t, gfp_t); void kfree(const void *); -void kzfree(const void *); +void kfree_sensitive(const void *); size_t __ksize(const void *); size_t ksize(const void *); +#define kzfree(x) kfree_sensitive(x) /* For backward compatibility */ + What was the thinking here? Is this really necessary? I suppose we could keep this around for a while to ease migration. But not for too long, please. It should be there just for 1 release cycle. I have broken out the btrfs patch to the btrfs list and I didn't make the kzfree to kfree_sensitive conversion there as that patch was in front in my patch list. So depending on which one lands first, there can be a window where the compilation may fail without this workaround. I am going to send out another patch in the next release cycle to remove it. Cheers, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On 6/16/20 2:53 PM, Joe Perches wrote: On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: v4: - Break out the memzero_explicit() change as suggested by Dan Carpenter so that it can be backported to stable. - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for now as there can be a bit more discussion on what is best. It will be introduced as a separate patch later on after this one is merged. To this larger audience and last week without reply: https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ Are there _any_ fastpath uses of kfree or vfree? Many patches have been posted recently to fix mispairings of specific types of alloc and free functions. To eliminate these mispairings at a runtime cost of four comparisons, should the kfree/vfree/kvfree/kfree_const functions be consolidated into a single kfree? Something like the below: void kfree(const void *addr) { if (is_kernel_rodata((unsigned long)addr)) return; if (is_vmalloc_addr(addr)) _vfree(addr); else _kfree(addr); } #define kvfree kfree #define vfree kfree #define kfree_const kfree How about adding CONFIG_DEBUG_VM code to check for invalid address ranges in kfree() and vfree()? By doing this, we can catch unmatched pairing in debug mode, but won't have the overhead when debug mode is off. Thought? Cheers, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On Tue, Jun 16, 2020 at 11:53:50AM -0700, Joe Perches wrote: > To this larger audience and last week without reply: > https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ > > Are there _any_ fastpath uses of kfree or vfree? I worked on adding a 'free' a couple of years ago. That was capable of freeing percpu, vmalloc, kmalloc and alloc_pages memory. I ran into trouble when I tried to free kmem_cache_alloc memory -- it works for slab and slub, but not slob (because slob needs the size from the kmem_cache). My motivation for this was to change kfree_rcu() to just free_rcu(). > To eliminate these mispairings at a runtime cost of four > comparisons, should the kfree/vfree/kvfree/kfree_const > functions be consolidated into a single kfree? I would say to leave kfree() alone and just introduce free() as a new default. There's some weird places in the kernel that have a 'free' symbol of their own, but those should be renamed anyway. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On 6/16/20 2:53 PM, Joe Perches wrote: On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: v4: - Break out the memzero_explicit() change as suggested by Dan Carpenter so that it can be backported to stable. - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for now as there can be a bit more discussion on what is best. It will be introduced as a separate patch later on after this one is merged. To this larger audience and last week without reply: https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ Are there _any_ fastpath uses of kfree or vfree? I am not sure about that, but both of them can be slow. Many patches have been posted recently to fix mispairings of specific types of alloc and free functions. To eliminate these mispairings at a runtime cost of four comparisons, should the kfree/vfree/kvfree/kfree_const functions be consolidated into a single kfree? Something like the below: void kfree(const void *addr) { if (is_kernel_rodata((unsigned long)addr)) return; if (is_vmalloc_addr(addr)) _vfree(addr); else _kfree(addr); } is_kernel_rodata() is inlined, but is_vmalloc_addr() isn't. So the overhead can be a bit bigger. Cheers, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On Wed, Jun 17, 2020 at 01:01:30AM +0200, David Sterba wrote: > On Tue, Jun 16, 2020 at 11:53:50AM -0700, Joe Perches wrote: > > On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: > > > v4: > > > - Break out the memzero_explicit() change as suggested by Dan Carpenter > > > so that it can be backported to stable. > > > - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for > > > now as there can be a bit more discussion on what is best. It will be > > > introduced as a separate patch later on after this one is merged. > > > > To this larger audience and last week without reply: > > https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ > > > > Are there _any_ fastpath uses of kfree or vfree? > > I'd consider kfree performance critical for cases where it is called > under locks. If possible the kfree is moved outside of the critical > section, but we have rbtrees or lists that get deleted under locks and > restructuring the code to do eg. splice and free it outside of the lock > is not always possible. Not just performance critical, but correctness critical. Since kvfree() may allocate from the vmalloc allocator, I really think that kvfree() should assert that it's !in_atomic(). Otherwise we can get into trouble if we end up calling vfree() and have to take the mutex. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: > v4: > - Break out the memzero_explicit() change as suggested by Dan Carpenter > so that it can be backported to stable. > - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for > now as there can be a bit more discussion on what is best. It will be > introduced as a separate patch later on after this one is merged. To this larger audience and last week without reply: https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ Are there _any_ fastpath uses of kfree or vfree? Many patches have been posted recently to fix mispairings of specific types of alloc and free functions. To eliminate these mispairings at a runtime cost of four comparisons, should the kfree/vfree/kvfree/kfree_const functions be consolidated into a single kfree? Something like the below: void kfree(const void *addr) { if (is_kernel_rodata((unsigned long)addr)) return; if (is_vmalloc_addr(addr)) _vfree(addr); else _kfree(addr); } #define kvfree kfree #define vfreekfree #define kfree_const kfree ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
On Tue, Jun 16, 2020 at 11:53:50AM -0700, Joe Perches wrote: > On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: > > v4: > > - Break out the memzero_explicit() change as suggested by Dan Carpenter > > so that it can be backported to stable. > > - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for > > now as there can be a bit more discussion on what is best. It will be > > introduced as a separate patch later on after this one is merged. > > To this larger audience and last week without reply: > https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/ > > Are there _any_ fastpath uses of kfree or vfree? I'd consider kfree performance critical for cases where it is called under locks. If possible the kfree is moved outside of the critical section, but we have rbtrees or lists that get deleted under locks and restructuring the code to do eg. splice and free it outside of the lock is not always possible. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v7 03/14] vhost: use batched get_vq_desc version
On Tue, Jun 16, 2020 at 05:23:43PM +0200, Eugenio Perez Martin wrote: > On Mon, Jun 15, 2020 at 6:05 PM Eugenio Pérez wrote: > > > > On Thu, 2020-06-11 at 07:30 -0400, Michael S. Tsirkin wrote: > > > On Wed, Jun 10, 2020 at 06:18:32PM +0200, Eugenio Perez Martin wrote: > > > > On Wed, Jun 10, 2020 at 5:13 PM Michael S. Tsirkin > > > > wrote: > > > > > On Wed, Jun 10, 2020 at 02:37:50PM +0200, Eugenio Perez Martin wrote: > > > > > > > +/* This function returns a value > 0 if a descriptor was found, > > > > > > > or 0 if none were found. > > > > > > > + * A negative code is returned on error. */ > > > > > > > +static int fetch_descs(struct vhost_virtqueue *vq) > > > > > > > +{ > > > > > > > + int ret; > > > > > > > + > > > > > > > + if (unlikely(vq->first_desc >= vq->ndescs)) { > > > > > > > + vq->first_desc = 0; > > > > > > > + vq->ndescs = 0; > > > > > > > + } > > > > > > > + > > > > > > > + if (vq->ndescs) > > > > > > > + return 1; > > > > > > > + > > > > > > > + for (ret = 1; > > > > > > > +ret > 0 && vq->ndescs <= > > > > > > > vhost_vq_num_batch_descs(vq); > > > > > > > +ret = fetch_buf(vq)) > > > > > > > + ; > > > > > > > > > > > > (Expanding comment in V6): > > > > > > > > > > > > We get an infinite loop this way: > > > > > > * vq->ndescs == 0, so we call fetch_buf() here > > > > > > * fetch_buf gets less than vhost_vq_num_batch_descs(vq); > > > > > > descriptors. ret = 1 > > > > > > * This loop calls again fetch_buf, but vq->ndescs > 0 (and avail_vq > > > > > > == > > > > > > last_avail_vq), so it just return 1 > > > > > > > > > > That's what > > > > > [PATCH RFC v7 08/14] fixup! vhost: use batched get_vq_desc > > > > > version > > > > > is supposed to fix. > > > > > > > > > > > > > Sorry, I forgot to include that fixup. > > > > > > > > With it I don't see CPU stalls, but with that version latency has > > > > increased a lot and I see packet lost: > > > > + ping -c 5 10.200.0.1 > > > > PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data. > > > > > From 10.200.0.2 icmp_seq=1 Destination Host Unreachable > > > > > From 10.200.0.2 icmp_seq=2 Destination Host Unreachable > > > > > From 10.200.0.2 icmp_seq=3 Destination Host Unreachable > > > > 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=6848 ms > > > > > > > > --- 10.200.0.1 ping statistics --- > > > > 5 packets transmitted, 1 received, +3 errors, 80% packet loss, time 76ms > > > > rtt min/avg/max/mdev = 6848.316/6848.316/6848.316/0.000 ms, pipe 4 > > > > -- > > > > > > > > I cannot even use netperf. > > > > > > OK so that's the bug to try to find and fix I think. > > > > > > > > > > If I modify with my proposed version: > > > > + ping -c 5 10.200.0.1 > > > > PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data. > > > > 64 bytes from 10.200.0.1: icmp_seq=1 ttl=64 time=7.07 ms > > > > 64 bytes from 10.200.0.1: icmp_seq=2 ttl=64 time=0.358 ms > > > > 64 bytes from 10.200.0.1: icmp_seq=3 ttl=64 time=5.35 ms > > > > 64 bytes from 10.200.0.1: icmp_seq=4 ttl=64 time=2.27 ms > > > > 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=0.426 ms > > > > > > Not sure which version this is. > > > > > > > [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t TCP_STREAM > > > > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > > > > 10.200.0.1 () port 0 AF_INET > > > > Recv SendSend > > > > Socket Socket Message Elapsed > > > > Size SizeSize Time Throughput > > > > bytes bytes bytessecs.10^6bits/sec > > > > > > > > 131072 16384 1638410.014742.36 > > > > [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t UDP_STREAM > > > > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > > > > 10.200.0.1 () port 0 AF_INET > > > > Socket Message Elapsed Messages > > > > SizeSize Time Okay Errors Throughput > > > > bytes bytessecs# # 10^6bits/sec > > > > > > > > 212992 65507 10.009214 0 482.83 > > > > 212992 10.009214482.83 > > > > > > > > I will compare with the non-batch version for reference, but the > > > > difference between the two is noticeable. Maybe it's worth finding a > > > > good value for the if() inside fetch_buf? > > > > > > > > Thanks! > > > > > > > > > > I don't think it's performance, I think it's a bug somewhere, > > > e.g. maybe we corrupt a packet, or stall the queue, or > > > something like this. > > > > > > Let's do this, I will squash the fixups and post v8 so you can bisect > > > and then debug cleanly. > > > > Ok, so if we apply the patch proposed in v7 08/14 (Or the version 8 of the > > patchset sent), this is what happens: > > > > 1. Userland (virtio_test in my case) introduces just one buffer in vq, and > > it kicks > > 2. vhost module reaches fetch_descs, called from vhost_get_vq_desc. From > > there we call fetch_buf in a for loop. > >