[PATCH v3 1/1] virtio: write back F_VERSION_1 before validate
The virtio specification virtio-v1.1-cs01 states: "Transitional devices MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not been acknowledged by the driver." This is exactly what QEMU as of 6.1 has done relying solely on VIRTIO_F_VERSION_1 for detecting that. However, the specification also says: "... the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device ..." before setting FEATURES_OK. In that case, any transitional device relying solely on VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in legacy format. In particular, this implies that it is in big endian format for big endian guests. This naturally confuses the driver which expects little endian in the modern mode. It is probably a good idea to amend the spec to clarify that VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation is complete. Before validate callback existed, config space was only read after FEATURES_OK. However, we already have two regressions, so let's address this here as well. The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when virtio 1.0 is used on both sides. The latter renders virtio-blk unusable with DASD backing, because things simply don't work with the default. See Fixes tags for relevant commits. For QEMU, we can work around the issue by writing out the feature bits with VIRTIO_F_VERSION_1 bit set. We (ab)use the finalize_features config op for this. This isn't enough to address all vhost devices since these do not get the features until FEATURES_OK, however it looks like the affected devices actually never handled the endianness for legacy mode correctly, so at least that's not a regression. No devices except virtio net and virtio blk seem to be affected. Long term the right thing to do is to fix the hypervisors. Cc: #v4.11 Signed-off-by: Halil Pasic Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space") Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") Reported-by: mark...@us.ibm.com Reviewed-by: Cornelia Huck --- @Connie: I made some more commit message changes to accommodate Michael's requests. I just assumed these will work or you as well and kept your r-b. Please shout at me if it needs to be dropped :) --- drivers/virtio/virtio.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 0a5b54034d4b..236081afe9a2 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d) driver_features_legacy = driver_features; } + /* +* Some devices detect legacy solely via F_VERSION_1. Write +* F_VERSION_1 to force LE config space accesses before FEATURES_OK for +* these when needed. +*/ + if (drv->validate && !virtio_legacy_is_little_endian() + && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) { + dev->features = BIT_ULL(VIRTIO_F_VERSION_1); + dev->config->finalize_features(dev); + } + if (device_features & (1ULL << VIRTIO_F_VERSION_1)) dev->features = driver_features & device_features; else base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51 -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] VDUSE: fix documentation underline warning
On Thu, Oct 7, 2021 at 4:29 AM Randy Dunlap wrote: > > Fix a VDUSE documentation build warning: > > Documentation/userspace-api/vduse.rst:21: WARNING: Title underline too short. > > Fixes: 7bc7f61897b6 ("Documentation: Add documentation for VDUSE") > Signed-off-by: Randy Dunlap > Cc: Xie Yongji > Cc: Jason Wang > Cc: Michael S. Tsirkin > Cc: virtualization@lists.linux-foundation.org > Cc: Jonathan Corbet > Cc: linux-...@vger.kernel.org > --- Acked-by: Jason Wang > Documentation/userspace-api/vduse.rst |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- lnx-515-rc4.orig/Documentation/userspace-api/vduse.rst > +++ lnx-515-rc4/Documentation/userspace-api/vduse.rst > @@ -18,7 +18,7 @@ types can be added after the security is > is clarified or fixed in the future. > > Create/Destroy VDUSE devices > - > + > > VDUSE devices are created as follows: > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 7/7] eni_vdpa: add vDPA driver for Alibaba ENI
在 2021/9/29 下午2:11, Wu Zongyong 写道: This patch adds a new vDPA driver for Alibaba ENI(Elastic Network Interface) which is build upon virtio 0.9.5 specification. And this driver doesn't support to run on BE host. If this is true, I think it's still better to exclude this driver via Kconfig. Signed-off-by: Wu Zongyong --- drivers/vdpa/Kconfig| 8 + drivers/vdpa/Makefile | 1 + drivers/vdpa/alibaba/Makefile | 3 + drivers/vdpa/alibaba/eni_vdpa.c | 553 4 files changed, 565 insertions(+) create mode 100644 drivers/vdpa/alibaba/Makefile create mode 100644 drivers/vdpa/alibaba/eni_vdpa.c diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 3d91982d8371..9587b9177b05 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -78,4 +78,12 @@ config VP_VDPA help This kernel module bridges virtio PCI device to vDPA bus. +config ALIBABA_ENI_VDPA + tristate "vDPA driver for Alibaba ENI" + select VIRTIO_PCI_LEGACY_LIB + depends on PCI_MSI + help + VDPA driver for Alibaba ENI(Elastic Network Interface) which is build upon + virtio 0.9.5 specification. + endif # VDPA diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index f02ebed33f19..15665563a7f4 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF)+= ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA)+= virtio_pci/ +obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/ diff --git a/drivers/vdpa/alibaba/Makefile b/drivers/vdpa/alibaba/Makefile new file mode 100644 index ..ef4aae69f87a --- /dev/null +++ b/drivers/vdpa/alibaba/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_ALIBABA_ENI_VDPA) += eni_vdpa.o + diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c new file mode 100644 index ..6a09f157d810 --- /dev/null +++ b/drivers/vdpa/alibaba/eni_vdpa.c @@ -0,0 +1,553 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * vDPA bridge driver for Alibaba ENI(Elastic Network Interface) + * + * Copyright (c) 2021, Alibaba Inc. All rights reserved. + * Author: Wu Zongyong + * + */ + +#include "linux/bits.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define ENI_MSIX_NAME_SIZE 256 + +#define ENI_ERR(pdev, fmt, ...)\ + dev_err(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__) +#define ENI_DBG(pdev, fmt, ...)\ + dev_dbg(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__) +#define ENI_INFO(pdev, fmt, ...) \ + dev_info(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__) + +struct eni_vring { + void __iomem *notify; + char msix_name[ENI_MSIX_NAME_SIZE]; + struct vdpa_callback cb; + int irq; +}; + +struct eni_vdpa { + struct vdpa_device vdpa; + struct virtio_pci_legacy_device ldev; + struct eni_vring *vring; + struct vdpa_callback config_cb; + char msix_name[ENI_MSIX_NAME_SIZE]; + int config_irq; + int queues; + int vectors; +}; + +static struct eni_vdpa *vdpa_to_eni(struct vdpa_device *vdpa) +{ + return container_of(vdpa, struct eni_vdpa, vdpa); +} + +static struct virtio_pci_legacy_device *vdpa_to_ldev(struct vdpa_device *vdpa) +{ + struct eni_vdpa *eni_vdpa = vdpa_to_eni(vdpa); + + return &eni_vdpa->ldev; +} + +static u64 eni_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); + u64 features = vp_legacy_get_features(ldev); + + features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM); + features |= BIT_ULL(VIRTIO_F_ORDER_PLATFORM); VERSION_1 is also needed? + + return features; +} + +static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); + + if (!(features & BIT_ULL(VIRTIO_NET_F_MRG_RXBUF)) && features) { + ENI_ERR(ldev->pci_dev, + "VIRTIO_NET_F_MRG_RXBUF is not negotiated\n"); + return -EINVAL; Do we need to make sure FEATURE_OK is not set in this case or the ENI can do this for us? Other looks good. Thanks + } + + vp_legacy_set_features(ldev, (u32)features); + + return 0; +} + +static u8 eni_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); + + return vp_legacy_get_status(ldev); +} + +static int eni_vdpa_get_vq_irq(struct vdpa_device *vdpa, u16 idx) +{ + struct eni_vdpa *eni_vdpa = vdpa_to_eni(vdpa); + int irq = eni_vdpa->vring[idx].irq; + + if (irq == VIRTIO_MSI_NO_VECTOR) + return -EINVAL; + + return irq; +} + +static void eni_vdpa_free_irq(struct eni_vdpa *eni_vdpa) +{ + struct virtio_pci_
Re: [PATCH v4 6/7] vdpa: add new attribute VDPA_ATTR_DEV_MIN_VQ_SIZE
在 2021/9/29 下午2:11, Wu Zongyong 写道: This attribute advertises the min value of virtqueue size. The value is 0 by default. I think 0 is not a correct value. If I understand the spec correctly, it should be 1. Thanks Signed-off-by: Wu Zongyong --- drivers/vdpa/vdpa.c | 5 + include/uapi/linux/vdpa.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 1dc121a07a93..6ed79fba33e4 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -492,6 +492,7 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq int flags, struct netlink_ext_ack *extack) { u16 max_vq_size; + u16 min_vq_size = 0; u32 device_id; u32 vendor_id; void *hdr; @@ -508,6 +509,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq device_id = vdev->config->get_device_id(vdev); vendor_id = vdev->config->get_vendor_id(vdev); max_vq_size = vdev->config->get_vq_num_max(vdev); + if (vdev->config->get_vq_num_min) + min_vq_size = vdev->config->get_vq_num_min(vdev); err = -EMSGSIZE; if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(&vdev->dev))) @@ -520,6 +523,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq goto msg_err; if (nla_put_u16(msg, VDPA_ATTR_DEV_MAX_VQ_SIZE, max_vq_size)) goto msg_err; + if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size)) + goto msg_err; genlmsg_end(msg, hdr); return 0; diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h index 66a41e4ec163..e3b87879514c 100644 --- a/include/uapi/linux/vdpa.h +++ b/include/uapi/linux/vdpa.h @@ -32,6 +32,7 @@ enum vdpa_attr { VDPA_ATTR_DEV_VENDOR_ID,/* u32 */ VDPA_ATTR_DEV_MAX_VQS, /* u32 */ VDPA_ATTR_DEV_MAX_VQ_SIZE, /* u16 */ + VDPA_ATTR_DEV_MIN_VQ_SIZE, /* u16 */ /* new attributes must be added above here */ VDPA_ATTR_MAX, ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 5/7] virtio_vdpa: setup correct vq size with callbacks get_vq_num_{max,min}
在 2021/9/29 下午2:11, Wu Zongyong 写道: Signed-off-by: Wu Zongyong Commit log please. --- drivers/virtio/virtio_vdpa.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c index 72eaef2caeb1..8aa4ebe2a2a2 100644 --- a/drivers/virtio/virtio_vdpa.c +++ b/drivers/virtio/virtio_vdpa.c @@ -145,7 +145,8 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index, /* Assume split virtqueue, switch to packed if necessary */ struct vdpa_vq_state state = {0}; unsigned long flags; - u32 align, num; + u32 align, max_num, min_num = 0; + bool may_reduce_num = true; int err; if (!name) @@ -163,22 +164,36 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index, if (!info) return ERR_PTR(-ENOMEM); - num = ops->get_vq_num_max(vdpa); - if (num == 0) { + max_num = ops->get_vq_num_max(vdpa); + if (max_num == 0) { err = -ENOENT; goto error_new_virtqueue; } + if (ops->get_vq_num_min) + min_num = ops->get_vq_num_min(vdpa); + if (min_num > max_num) { + err = -ENOENT; + goto error_new_virtqueue; + } If we really want to do this, let's move this to vdpa core during device probing. Or just leave it as is (device risk itself). Thanks + + may_reduce_num = (max_num == min_num) ? false : true; + /* Create the vring */ align = ops->get_vq_align(vdpa); - vq = vring_create_virtqueue(index, num, align, vdev, - true, true, ctx, + vq = vring_create_virtqueue(index, max_num, align, vdev, + true, may_reduce_num, ctx, virtio_vdpa_notify, callback, name); if (!vq) { err = -ENOMEM; goto error_new_virtqueue; } + if (virtqueue_get_vring_size(vq) < min_num) { + err = -EINVAL; + goto err_vq; + } + /* Setup virtqueue callback */ cb.callback = virtio_vdpa_virtqueue_cb; cb.private = info; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/7] vdpa: add new callback get_vq_num_min in vdpa_config_ops
在 2021/9/29 下午2:11, Wu Zongyong 写道: This callback is optional. For vdpa devices that not support to change virtqueue size, get_vq_num_min and get_vq_num_max will return the same value, so that users can choose a correct value for that device. Suggested-by: Jason Wang Signed-off-by: Wu Zongyong Acked-by: Jason Wang --- include/linux/vdpa.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index a896ee021e5f..30864848950b 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -171,6 +171,9 @@ struct vdpa_map_file { * @get_vq_num_max: Get the max size of virtqueue *@vdev: vdpa device *Returns u16: max size of virtqueue + * @get_vq_num_min:Get the min size of virtqueue (optional) + * @vdev: vdpa device + * Returns u16: min size of virtqueue * @get_device_id:Get virtio device id *@vdev: vdpa device *Returns u32: virtio device id @@ -266,6 +269,7 @@ struct vdpa_config_ops { void (*set_config_cb)(struct vdpa_device *vdev, struct vdpa_callback *cb); u16 (*get_vq_num_max)(struct vdpa_device *vdev); + u16 (*get_vq_num_min)(struct vdpa_device *vdev); u32 (*get_device_id)(struct vdpa_device *vdev); u32 (*get_vendor_id)(struct vdpa_device *vdev); u8 (*get_status)(struct vdpa_device *vdev); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/7] vdpa: fix typo
在 2021/9/29 下午2:11, Wu Zongyong 写道: Signed-off-by: Wu Zongyong Acked-by: Jason Wang --- include/linux/vdpa.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 3972ab765de1..a896ee021e5f 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -257,7 +257,7 @@ struct vdpa_config_ops { struct vdpa_notification_area (*get_vq_notification)(struct vdpa_device *vdev, u16 idx); /* vq irq is not expected to be changed once DRIVER_OK is set */ - int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx); + int (*get_vq_irq)(struct vdpa_device *vdev, u16 idx); /* Device ops */ u32 (*get_vq_align)(struct vdpa_device *vdev); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/7] virtio-pci: introduce legacy device module
在 2021/9/29 下午2:11, Wu Zongyong 写道: Split common codes from virtio-pci-legacy so vDPA driver can reuse it later. Signed-off-by: Wu Zongyong --- drivers/virtio/Kconfig | 10 ++ drivers/virtio/Makefile| 1 + drivers/virtio/virtio_pci_common.c | 10 +- drivers/virtio/virtio_pci_common.h | 9 +- drivers/virtio/virtio_pci_legacy.c | 101 +++- drivers/virtio/virtio_pci_legacy_dev.c | 220 + include/linux/virtio_pci_legacy.h | 44 + 7 files changed, 312 insertions(+), 83 deletions(-) create mode 100644 drivers/virtio/virtio_pci_legacy_dev.c create mode 100644 include/linux/virtio_pci_legacy.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index ce1b3f6ec325..8fcf94cd2c96 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -20,6 +20,15 @@ config VIRTIO_PCI_LIB PCI device with possible vendor specific extensions. Any module that selects this module must depend on PCI. +config VIRTIO_PCI_LIB_LEGACY + tristate + help + Legacy PCI device (Virtio PCI Card 0.9.x Draft and older device) + implementation. + This module implements the basic probe and control for devices + which are based on legacy PCI device. Any module that selects this + module must depend on PCI. + menuconfig VIRTIO_MENU bool "Virtio drivers" default y @@ -43,6 +52,7 @@ config VIRTIO_PCI_LEGACY bool "Support for legacy virtio draft 0.9.X and older devices" default y depends on VIRTIO_PCI + select VIRTIO_PCI_LIB_LEGACY help Virtio PCI Card 0.9.X Draft (circa 2014) and older device support. diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 699bbea0465f..0a82d0873248 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o +obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index b35bb2d57f62..d724f676608b 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -549,6 +549,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, pci_set_master(pci_dev); + vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false; + rc = register_virtio_device(&vp_dev->vdev); reg_dev = vp_dev; if (rc) @@ -557,10 +559,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, return 0; err_register: - if (vp_dev->ioaddr) -virtio_pci_legacy_remove(vp_dev); + if (vp_dev->is_legacy) + virtio_pci_legacy_remove(vp_dev); else -virtio_pci_modern_remove(vp_dev); + virtio_pci_modern_remove(vp_dev); err_probe: pci_disable_device(pci_dev); err_enable_device: @@ -587,7 +589,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) unregister_virtio_device(&vp_dev->vdev); - if (vp_dev->ioaddr) + if (vp_dev->is_legacy) virtio_pci_legacy_remove(vp_dev); else virtio_pci_modern_remove(vp_dev); diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index beec047a8f8d..eb17a29fc7ef 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -44,16 +45,14 @@ struct virtio_pci_vq_info { struct virtio_pci_device { struct virtio_device vdev; struct pci_dev *pci_dev; + struct virtio_pci_legacy_device ldev; struct virtio_pci_modern_device mdev; - /* In legacy mode, these two point to within ->legacy. */ + bool is_legacy; + /* Where to read and clear interrupt */ u8 __iomem *isr; - /* Legacy only field */ - /* the IO mapping for the PCI config space */ - void __iomem *ioaddr; - /* a list of queues so we can dispatch IRQs */ spinlock_t lock; struct list_head virtqueues; diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index d62e9835aeec..82eb437ad920 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -14,6 +14,7 @@ * Michael S. Tsirkin */ +#include "linux/virtio_pci_legacy.h" #include "virtio_pci_common.h" /* virtio config->get_features() implementation */ @@ -23,7 +24,7 @@ static u64 vp_get_features(struct virtio_device *vdev) /* When someone needs more than 32 feature bits, we'll need to * steal a bit t
Re: [PATCH v4 1/7] virtio-pci: introduce legacy device module
在 2021/9/29 下午2:11, Wu Zongyong 写道: Split common codes from virtio-pci-legacy so vDPA driver can reuse it later. Signed-off-by: Wu Zongyong Acked-by: Jason Wang --- drivers/virtio/Kconfig | 10 ++ drivers/virtio/Makefile| 1 + drivers/virtio/virtio_pci_common.c | 10 +- drivers/virtio/virtio_pci_common.h | 9 +- drivers/virtio/virtio_pci_legacy.c | 101 +++- drivers/virtio/virtio_pci_legacy_dev.c | 220 + include/linux/virtio_pci_legacy.h | 44 + 7 files changed, 312 insertions(+), 83 deletions(-) create mode 100644 drivers/virtio/virtio_pci_legacy_dev.c create mode 100644 include/linux/virtio_pci_legacy.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index ce1b3f6ec325..8fcf94cd2c96 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -20,6 +20,15 @@ config VIRTIO_PCI_LIB PCI device with possible vendor specific extensions. Any module that selects this module must depend on PCI. +config VIRTIO_PCI_LIB_LEGACY + tristate + help + Legacy PCI device (Virtio PCI Card 0.9.x Draft and older device) + implementation. + This module implements the basic probe and control for devices + which are based on legacy PCI device. Any module that selects this + module must depend on PCI. + menuconfig VIRTIO_MENU bool "Virtio drivers" default y @@ -43,6 +52,7 @@ config VIRTIO_PCI_LEGACY bool "Support for legacy virtio draft 0.9.X and older devices" default y depends on VIRTIO_PCI + select VIRTIO_PCI_LIB_LEGACY help Virtio PCI Card 0.9.X Draft (circa 2014) and older device support. diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 699bbea0465f..0a82d0873248 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o +obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index b35bb2d57f62..d724f676608b 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -549,6 +549,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, pci_set_master(pci_dev); + vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false; + rc = register_virtio_device(&vp_dev->vdev); reg_dev = vp_dev; if (rc) @@ -557,10 +559,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, return 0; err_register: - if (vp_dev->ioaddr) -virtio_pci_legacy_remove(vp_dev); + if (vp_dev->is_legacy) + virtio_pci_legacy_remove(vp_dev); else -virtio_pci_modern_remove(vp_dev); + virtio_pci_modern_remove(vp_dev); err_probe: pci_disable_device(pci_dev); err_enable_device: @@ -587,7 +589,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) unregister_virtio_device(&vp_dev->vdev); - if (vp_dev->ioaddr) + if (vp_dev->is_legacy) virtio_pci_legacy_remove(vp_dev); else virtio_pci_modern_remove(vp_dev); diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index beec047a8f8d..eb17a29fc7ef 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -44,16 +45,14 @@ struct virtio_pci_vq_info { struct virtio_pci_device { struct virtio_device vdev; struct pci_dev *pci_dev; + struct virtio_pci_legacy_device ldev; struct virtio_pci_modern_device mdev; - /* In legacy mode, these two point to within ->legacy. */ + bool is_legacy; + /* Where to read and clear interrupt */ u8 __iomem *isr; - /* Legacy only field */ - /* the IO mapping for the PCI config space */ - void __iomem *ioaddr; - /* a list of queues so we can dispatch IRQs */ spinlock_t lock; struct list_head virtqueues; diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index d62e9835aeec..82eb437ad920 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -14,6 +14,7 @@ * Michael S. Tsirkin */ +#include "linux/virtio_pci_legacy.h" #include "virtio_pci_common.h" /* virtio config->get_features() implementation */ @@ -23,7 +24,7 @@ static u64 vp_get_features(struct virtio_device *vdev) /* When someone needs more than 32 feature bits, we'll need
Re: [PATCH v5 16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared
The connection is quite unfortunate IMHO. Can't there be an option that unbreaks drivers *without* opening up security holes by making BIOS shared? That would require new low level APIs that distinguish both cases, and a tree sweep. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iova: Move fast alloc size roundup into alloc_iova_fast()
On Fri, Sep 24, 2021 at 6:07 PM John Garry wrote: > > It really is a property of the IOVA rcache code that we need to alloc a > power-of-2 size, so relocate the functionality to resize into > alloc_iova_fast(), rather than the callsites. > > Signed-off-by: John Garry Acked-by: Jason Wang > --- > drivers/iommu/dma-iommu.c| 8 > drivers/iommu/iova.c | 9 + > drivers/vdpa/vdpa_user/iova_domain.c | 8 > 3 files changed, 9 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 896bea04c347..a99b3445fef8 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -444,14 +444,6 @@ static dma_addr_t iommu_dma_alloc_iova(struct > iommu_domain *domain, > > shift = iova_shift(iovad); > iova_len = size >> shift; > - /* > -* Freeing non-power-of-two-sized allocations back into the IOVA > caches > -* will come back to bite us badly, so we have to waste a bit of space > -* rounding up anything cacheable to make sure that can't happen. The > -* order of the unadjusted size will still match upon freeing. > -*/ > - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) > - iova_len = roundup_pow_of_two(iova_len); > > dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit); > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index 9e8bc802ac05..ff567cbc42f7 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -497,6 +497,15 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long > size, > unsigned long iova_pfn; > struct iova *new_iova; > > + /* > +* Freeing non-power-of-two-sized allocations back into the IOVA > caches > +* will come back to bite us badly, so we have to waste a bit of space > +* rounding up anything cacheable to make sure that can't happen. The > +* order of the unadjusted size will still match upon freeing. > +*/ > + if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) > + size = roundup_pow_of_two(size); > + > iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1); > if (iova_pfn) > return iova_pfn; > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c > b/drivers/vdpa/vdpa_user/iova_domain.c > index 1daae2608860..2b1143f11d8f 100644 > --- a/drivers/vdpa/vdpa_user/iova_domain.c > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > @@ -292,14 +292,6 @@ vduse_domain_alloc_iova(struct iova_domain *iovad, > unsigned long iova_len = iova_align(iovad, size) >> shift; > unsigned long iova_pfn; > > - /* > -* Freeing non-power-of-two-sized allocations back into the IOVA > caches > -* will come back to bite us badly, so we have to waste a bit of space > -* rounding up anything cacheable to make sure that can't happen. The > -* order of the unadjusted size will still match upon freeing. > -*/ > - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) > - iova_len = roundup_pow_of_two(iova_len); > iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true); > > return iova_pfn << shift; > -- > 2.26.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix for skb_over_panic inside big mode
On Sat, Oct 9, 2021 at 5:18 PM Michael S. Tsirkin wrote: > > From: Xuan Zhuo > > commit 126285651b7f ("Merge > ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") > accidentally reverted the effect of > commit 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode") > on drivers/net/virtio_net.c > > As a result, users of crosvm (which is using large packet mode) > are experiencing crashes with 5.14-rc1 and above that do not > occur with 5.13. > > Crash trace: > > [ 61.346677] skbuff: skb_over_panic: text:881ae2c7 len:3762 > put:3762 head:8a5ec8c22000 data:8a5ec8c22010 tail:0xec2 end:0xec0 > dev: > [ 61.369192] kernel BUG at net/core/skbuff.c:111! > [ 61.372840] invalid opcode: [#1] SMP PTI > [ 61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 > linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1 > [ 61.376450] Hardware name: ChromiumOS crosvm, BIOS 0 > > .. > > [ 61.393635] Call Trace: > [ 61.394127] > [ 61.394488] skb_put.cold+0x10/0x10 > [ 61.395095] page_to_skb+0xf7/0x410 > [ 61.395689] receive_buf+0x81/0x1660 > [ 61.396228] ? netif_receive_skb_list_internal+0x1ad/0x2b0 > [ 61.397180] ? napi_gro_flush+0x97/0xe0 > [ 61.397896] ? detach_buf_split+0x67/0x120 > [ 61.398573] virtnet_poll+0x2cf/0x420 > [ 61.399197] __napi_poll+0x25/0x150 > [ 61.399764] net_rx_action+0x22f/0x280 > [ 61.400394] __do_softirq+0xba/0x257 > [ 61.401012] irq_exit_rcu+0x8e/0xb0 > [ 61.401618] common_interrupt+0x7b/0xa0 > [ 61.402270] > > See > https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com > for the report. > > Apply the original 1a8024239da ("virtio-net: fix for skb_over_panic inside > big mode") > again, the original logic still holds: > > In virtio-net's large packet mode, there is a hole in the space behind > buf. > > hdr_padded_len - hdr_len > > We must take this into account when calculating tailroom. > > Cc: Greg KH > Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's > sufficient tailroom") > Fixes: 126285651b7f ("Merge > ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") > Signed-off-by: Xuan Zhuo > Reported-by: Corentin Noël > Tested-by: Corentin Noël > Signed-off-by: Michael S. Tsirkin > --- Acked-by: Jason Wang > > David, I think we need this in stable, too. > > drivers/net/virtio_net.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 096c2ac6b7a6..6b0812f44bbf 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -406,7 +406,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info > *vi, > * add_recvbuf_mergeable() + get_mergeable_buf_len() > */ > truesize = headroom ? PAGE_SIZE : truesize; > - tailroom = truesize - len - headroom; > + tailroom = truesize - len - headroom - (hdr_padded_len - hdr_len); > buf = p - headroom; > > len -= hdr_len; > -- > MST > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
To which Andi replied One problem with removing the ioremap opt-in is that it's still possible for drivers to get at devices without going through probe. To which Greg replied: https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ If there are in-kernel PCI drivers that do not do this, they need to be fixed today. Can you guys resolve the differences here? I addressed this in my other mail, but we may need more discussion. And once they are resolved, mention this in the commit log so I don't get to re-read the series just to find out nothing changed in this respect? I frankly do not believe we are anywhere near being able to harden an arbitrary kernel config against attack. Why not? Device filter and the opt-ins together are a fairly strong mechanism. And it's not that they're a lot of code or super complicated either. You're essentially objecting to a single line change in your subsystem here. How about creating a defconfig that makes sense for TDX then? TDX can be used in many different ways, I don't think a defconfig is practical. In theory you could do some Kconfig dependency (at the pain point of having separate kernel binariees), but why not just do it at run time then if you maintain the list anyways. That's much easier and saner for everyone. In the past we usually always ended up with runtime mechanism for similar things anyways. Also it turns out that the filter mechanisms are needed for some arch drivers which are not even configurable, so alone it's probably not enough, Anyone deviating from that better know what they are doing, this API tweaking is just putting policy into the kernel ... Hardening drivers is kernel policy. It cannot be done anywhere else. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/9/2021 1:39 PM, Dan Williams wrote: On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin wrote: On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote: From: Andi Kleen For Confidential VM guests like TDX, the host is untrusted and hence the devices emulated by the host or any data coming from the host cannot be trusted. So the drivers that interact with the outside world have to be hardened by sharing memory with host on need basis with proper hardening fixes. For the PCI driver case, to share the memory with the host add pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. Signed-off-by: Andi Kleen Signed-off-by: Kuppuswamy Sathyanarayanan So I proposed to make all pci mappings shared, eliminating the need to patch drivers. To which Andi replied One problem with removing the ioremap opt-in is that it's still possible for drivers to get at devices without going through probe. To which Greg replied: https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ If there are in-kernel PCI drivers that do not do this, they need to be fixed today. Can you guys resolve the differences here? I agree with you and Greg here. If a driver is accessing hardware resources outside of the bind lifetime of one of the devices it supports, and in a way that neither modrobe-policy nor device-authorization -policy infrastructure can block, that sounds like a bug report. The 5.15 tree has something like ~2.4k IO accesses (including MMIO and others) in init functions that also register drivers (thanks Elena for the number) Some are probably old drivers that could be fixed, but it's quite a few legitimate cases. For example for platform or ISA drivers that's the only way they can be implemented because they often have no other enumeration mechanism. For PCI drivers it's rarer, but also still can happen. One example that comes to mind here is the x86 Intel uncore drivers, which support a mix of MSR, ioremap and PCI config space accesses all from the same driver. This particular example can (and should be) fixed in other ways, but similar things also happen in other drivers, and they're not all broken. Even for the broken ones they're usually for some crufty old devices that has very few users, so it's likely untestable in practice. My point is just that the ecosystem of devices that Linux supports is messy enough that there are legitimate exceptions from the "First IO only in probe call only" rule. And we can't just fix them all. Even if we could it would be hard to maintain. Using a "firewall model" hooking into a few strategic points like we're proposing here is much saner for everyone. Now we can argue about the details. Right now what we're proposing has some redundancies: it has both a device model filter and low level filter for ioremap (this patch and some others). The low level filter is for catching issues that don't clearly fit into the "enumeration<->probe" model. You could call that redundant, but I would call it defense in depth or better safe than sorry. In theory it would be enough to have the low level opt-in only, but that would have the drawback that is something gets enumerated after all you would have all kind of weird device driver failures and in some cases even killed guests. So I think it makes sense to have Fix those drivers instead of sprinkling ioremap_shared in select places and with unclear rules about when a driver is allowed to do "shared" mappings. Only add it when the driver has been audited and hardened. But I agree we need on a documented process for this. I will work on some documentation for a proposal. But essentially I think it should be some variant of what Elena has outlined in her talk at Security Summit. https://static.sched.com/hosted_files/lssna2021/b6/LSS-HardeningLinuxGuestForCCC.pdf That is using extra auditing/scrutiny at review time, supported with some static code analysis that points to the interaction points, and code needs to be fuzzed explicitly. However short term it's only three virtio drivers, so this is not a urgent problem. Let the new device-authorization mechanism (with policy in userspace) Default policy in user space just seems to be a bad idea here. Who should know if a driver is hardened other than the kernel? Maintaining the list somewhere else just doesn't make sense to me. Also there is the more practical problem that some devices are needed for booting. For example in TDX we can't print something to the console with this mechanism, so you would never get any output before the initrd. Just seems like a nightmare for debugging anything. There really needs to be an authorization mechanism that works reasonably early. I can see a point of having user space overrides though, but we need to have a sane kernel default that works early. -Andi ___ Virtualization mailing l