Re: [PATCH V2] virtio-mmio: harden interrupt
On Thu, Dec 9, 2021 at 2:55 PM Michael S. Tsirkin wrote: > > On Thu, Dec 09, 2021 at 10:06:34AM +0800, Jason Wang wrote: > > On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin wrote: > > > > > > On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote: > > > > This patch tries to make sure the virtio interrupt handler for MMIO > > > > won't be called after a reset and before virtio_device_ready(). We > > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt > > > > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a > > > > new intr_soft_enabled variable and toggle it during in > > > > vm_disable/enable_interrupts(). The MMIO interrupt handler will check > > > > intr_soft_enabled before processing the actual interrupt. > > > > > > > > Signed-off-by: Jason Wang > > > > --- > > > > Changes since V1: > > > > - Silent compling warnings > > > > drivers/virtio/virtio_mmio.c | 37 > > > > 1 file changed, 37 insertions(+) > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > > > > index 56128b9c46eb..c517afdd2cc5 100644 > > > > --- a/drivers/virtio/virtio_mmio.c > > > > +++ b/drivers/virtio/virtio_mmio.c > > > > @@ -90,6 +90,7 @@ struct virtio_mmio_device { > > > > /* a list of queues so we can dispatch IRQs */ > > > > spinlock_t lock; > > > > struct list_head virtqueues; > > > > + bool intr_soft_enabled; > > > > }; > > > > > > > > struct virtio_mmio_vq_info { > > > > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info { > > > > struct list_head node; > > > > }; > > > > > > > > +/* disable irq handlers */ > > > > +static void vm_disable_cbs(struct virtio_device *vdev) > > > > +{ > > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > > > > > > > + /* > > > > + * The below synchronize() guarantees that any > > > > + * interrupt for this line arriving after > > > > + * synchronize_irq() has completed is guaranteed to see > > > > + * 
intx_soft_enabled == false. > > > > + */ > > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, false); > > > > + synchronize_irq(irq); > > > > +} > > > > + > > > > +/* enable irq handlers */ > > > > +static void vm_enable_cbs(struct virtio_device *vdev) > > > > +{ > > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > > > + > > > > + disable_irq(irq); > > > > + /* > > > > + * The above disable_irq() provides TSO ordering and > > > > + * as such promotes the below store to store-release. > > > > + */ > > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, true); > > > > + enable_irq(irq); > > > > + return; > > > > +} > > > > > > > > /* Configuration interface */ > > > > > > > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev) > > > > > > > > /* 0 status means a reset. */ > > > > writel(0, vm_dev->base + VIRTIO_MMIO_STATUS); > > > > > > There was a discussion about reading status to make sure it is clear. > > > The spec says we should, this can't hurt as a further hardening measure. > > > In fact, let's do it in the core maybe? Spec says it applies to all > > > devices ... > > > > We can do that, but I'm not sure if we break some existing device. > > Hmm. Have anything specific in mind? No, I can send a patch to do that. > > > > > > > > + /* Disable VQ/configuration callbacks. */ > > > > + vm_disable_cbs(vdev); > > > > } > > > > > > > > > > > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void > > > > *opaque) > > > > unsigned long flags; > > > > irqreturn_t ret = IRQ_NONE; > > > > > > > > + if (!READ_ONCE(vm_dev->intr_soft_enabled)) > > > > + return IRQ_NONE; > > > > + > > > > > > So if the write is seen before reset happened (should not happen, but we > > > are talking a buggy device) then it won't be acknowledged and device > > > will keep pulling the interrupt. 
I think as long as we are hardening > > > this, let's go the full mile and try to avoid DoS if we can, do the > > > check before invoking the callback, but do not skip the read. > > > Whether to still return IRQ_NONE is a good question. > > > > Did you mean something like this: > > > > /* Read and acknowledge interrupts */ > > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS); > > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK); > > > > if (status) > > ret = IRQ_HANDLED; > > > >if (!READ_ONCE(vm_dev->intr_soft_enabled)) > >return ret; > > > > Thanks > > Maybe. Or is > > if (!READ_ONCE(vm_dev->intr_soft_enabled)) > return IRQ_NONE; > > better here? Yes, I just paste part of the code, the ret is initialized to IRQ_NONE. Thanks > > > > > > > > > > > > > > > > > >
Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features
On Thu, Dec 9, 2021 at 3:07 PM Eli Cohen wrote: > > On Thu, Dec 09, 2021 at 01:33:01PM +0800, Jason Wang wrote: > > On Thu, Dec 9, 2021 at 4:14 AM Eli Cohen wrote: > > > > > > Provide an interface to read the negotiated features. This is needed > > > when building the netlink message in vdpa_dev_net_config_fill(). > > > > > > Also fix the implementation of vdpa_dev_net_config_fill() to use the > > > negotiated features instead of the device features. > > > > > > To make APIs clearer, make the following name changes to struct > > > vdpa_config_ops so they better describe their operations: > > > > > > get_features -> get_device_features > > > set_features -> set_driver_features [...] > > > + * @get_driver_features: Get virtio features in action > > > > Maybe "Get virtio driver features .." is better. > > I hope the name does not become too long. > Which one would you favor? > > get_vio_driver_features > get_virtio_drv_features > get_virtio_driver_features The name is fine, I mean the comment might be "Get the virtio driver features in action" Thanks > > > > > Thanks > > > > > + * @vdev: vdpa device > > > + * Returns the virtio features accepted > > > * @set_config_cb: Set the config interrupt callback > > > * @vdev: vdpa device > > > * @cb: virtio-vdev interrupt callback > > > structure > > > @@ -276,8 +279,9 @@ struct vdpa_config_ops { > > > > > > /* Device ops */ > > > u32 (*get_vq_align)(struct vdpa_device *vdev); > > > - u64 (*get_features)(struct vdpa_device *vdev); > > > - int (*set_features)(struct vdpa_device *vdev, u64 features); > > > + u64 (*get_device_features)(struct vdpa_device *vdev); > > > + int (*set_driver_features)(struct vdpa_device *vdev, u64 > > > features); > > > + u64 (*get_driver_features)(struct vdpa_device *vdev); > > > void (*set_config_cb)(struct vdpa_device *vdev, > > > struct vdpa_callback *cb); > > > u16 (*get_vq_num_max)(struct vdpa_device *vdev); > > > @@ -395,7 +399,7 @@ static inline int vdpa_set_features(struct > > > vdpa_device 
*vdev, u64 features) > > > const struct vdpa_config_ops *ops = vdev->config; > > > > > > vdev->features_valid = true; > > > - return ops->set_features(vdev, features); > > > + return ops->set_driver_features(vdev, features); > > > } > > > > > > void vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, > > > -- > > > 2.33.1 > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index
On Thu, Dec 9, 2021 at 3:09 PM Eli Cohen wrote: > > On Thu, Dec 09, 2021 at 01:44:56PM +0800, Jason Wang wrote: > > On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen wrote: > > > > > > Add netlink attribute and callback function to query the control VQ > > > index of a device. > > > > > > Example: > > > > > > $ vdpa dev config show vdpa-a > > > vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs > > > 5 \ > > > mtu 9000 ctrl_vq_idx 10 > > > > > > I still wonder about the motivation for this. > To be able to show the stats for CVQ. Right. > > > And as discussed, the > > ctrl_vq_idx varies depending on whether MQ is negotiated. > > I handle this according to the spec and it depends on MQ. Yes, but there could be a chance that the cvq index changes after the vdpa dev config show. Thanks > > > > > Thanks > > > > > > > > Signed-off-by: Eli Cohen > > > --- > > > v0 -> v1: > > > 1. Use logic defined in the spec to deduce virtqueue index. > > > > > > drivers/vdpa/vdpa.c | 25 + > > > include/uapi/linux/vdpa.h | 1 + > > > 2 files changed, 26 insertions(+) > > > > > > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c > > > index 3bf016e03512..b4d4b8a7ca4e 100644 > > > --- a/drivers/vdpa/vdpa.c > > > +++ b/drivers/vdpa/vdpa.c > > > @@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff > > > *msg, struct netlink_callba > > > return msg->len; > > > } > > > > > > +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev, > > > +struct sk_buff *msg, > > > +struct virtio_net_config *config, > > > +u64 features) > > > +{ > > > + u16 N; > > > + > > > + /* control VQ index, if available, is deduced according to the > > > logic > > > +* described in the virtio spec in section 5.1.2 > > > +*/ > > > + if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) > > > + return 0; > > > + > > > + if (features & BIT_ULL(VIRTIO_NET_F_MQ)) > > > + N = le16_to_cpu(config->max_virtqueue_pairs); > > > + else > > > + N = 1; > > > + > > > + return nla_put_u16(msg, 
VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N); > > > +} > > > + > > > static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev, > > >struct sk_buff *msg, u64 features, > > >const struct virtio_net_config > > > *config) > > > @@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct > > > vdpa_device *vdev, struct sk_buff *ms > > > struct virtio_net_config config = {}; > > > u64 features; > > > u16 val_u16; > > > + int err; > > > > > > vdpa_get_config(vdev, 0, , sizeof(config)); > > > > > > @@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct > > > vdpa_device *vdev, struct sk_buff *ms > > > return -EMSGSIZE; > > > > > > features = vdev->config->get_driver_features(vdev); > > > + err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, , features); > > > + if (err) > > > + return err; > > > > > > return vdpa_dev_net_mq_config_fill(vdev, msg, features, ); > > > } > > > diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h > > > index a252f06f9dfd..2e3a7f89f42d 100644 > > > --- a/include/uapi/linux/vdpa.h > > > +++ b/include/uapi/linux/vdpa.h > > > @@ -34,6 +34,7 @@ enum vdpa_attr { > > > VDPA_ATTR_DEV_MAX_VQS, /* u32 */ > > > VDPA_ATTR_DEV_MAX_VQ_SIZE, /* u16 */ > > > VDPA_ATTR_DEV_MIN_VQ_SIZE, /* u16 */ > > > + VDPA_ATTR_DEV_CTRL_VQ_IDX, /* u16 */ > > > > > > VDPA_ATTR_DEV_NET_CFG_MACADDR, /* binary */ > > > VDPA_ATTR_DEV_NET_STATUS, /* u8 */ > > > -- > > > 2.33.1 > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2] virtio-mmio: harden interrupt
On Thu, Dec 09, 2021 at 10:06:34AM +0800, Jason Wang wrote: > On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin wrote: > > > > On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote: > > > This patch tries to make sure the virtio interrupt handler for MMIO > > > won't be called after a reset and before virtio_device_ready(). We > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt > > > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a > > > new intr_soft_enabled variable and toggle it during in > > > vm_disable/enable_interrupts(). The MMIO interrupt handler will check > > > intr_soft_enabled before processing the actual interrupt. > > > > > > Signed-off-by: Jason Wang > > > --- > > > Changes since V1: > > > - Silent compling warnings > > > drivers/virtio/virtio_mmio.c | 37 > > > 1 file changed, 37 insertions(+) > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > > > index 56128b9c46eb..c517afdd2cc5 100644 > > > --- a/drivers/virtio/virtio_mmio.c > > > +++ b/drivers/virtio/virtio_mmio.c > > > @@ -90,6 +90,7 @@ struct virtio_mmio_device { > > > /* a list of queues so we can dispatch IRQs */ > > > spinlock_t lock; > > > struct list_head virtqueues; > > > + bool intr_soft_enabled; > > > }; > > > > > > struct virtio_mmio_vq_info { > > > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info { > > > struct list_head node; > > > }; > > > > > > +/* disable irq handlers */ > > > +static void vm_disable_cbs(struct virtio_device *vdev) > > > +{ > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > > > > > + /* > > > + * The below synchronize() guarantees that any > > > + * interrupt for this line arriving after > > > + * synchronize_irq() has completed is guaranteed to see > > > + * intx_soft_enabled == false. 
> > > + */ > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, false); > > > + synchronize_irq(irq); > > > +} > > > + > > > +/* enable irq handlers */ > > > +static void vm_enable_cbs(struct virtio_device *vdev) > > > +{ > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > > + > > > + disable_irq(irq); > > > + /* > > > + * The above disable_irq() provides TSO ordering and > > > + * as such promotes the below store to store-release. > > > + */ > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, true); > > > + enable_irq(irq); > > > + return; > > > +} > > > > > > /* Configuration interface */ > > > > > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev) > > > > > > /* 0 status means a reset. */ > > > writel(0, vm_dev->base + VIRTIO_MMIO_STATUS); > > > > There was a discussion about reading status to make sure it is clear. > > The spec says we should, this can't hurt as a further hardening measure. > > In fact, let's do it in the core maybe? Spec says it applies to all > > devices ... > > We can do that, but I'm not sure if we break some existing device. Hmm. Have anything specific in mind? > > > > > + /* Disable VQ/configuration callbacks. */ > > > + vm_disable_cbs(vdev); > > > } > > > > > > > > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) > > > unsigned long flags; > > > irqreturn_t ret = IRQ_NONE; > > > > > > + if (!READ_ONCE(vm_dev->intr_soft_enabled)) > > > + return IRQ_NONE; > > > + > > > > So if the write is seen before reset happened (should not happen, but we > > are talking a buggy device) then it won't be acknowledged and device > > will keep pulling the interrupt. I think as long as we are hardening > > this, let's go the full mile and try to avoid DoS if we can, do the > > check before invoking the callback, but do not skip the read. > > Whether to still return IRQ_NONE is a good question. 
> > Did you mean something like this: > > /* Read and acknowledge interrupts */ > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS); > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK); > > if (status) > ret = IRQ_HANDLED; > >if (!READ_ONCE(vm_dev->intr_soft_enabled)) >return ret; > > Thanks Maybe. Or is if (!READ_ONCE(vm_dev->intr_soft_enabled)) return IRQ_NONE; better here? > > > > > > > > > > > /* Read and acknowledge interrupts */ > > > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS); > > > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK); > > > @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device > > > *vdev, > > > } > > > > > > static const struct virtio_config_ops virtio_mmio_config_ops = { > > > + .enable_cbs = vm_enable_cbs, > > > .get=
Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq
On Thu, Dec 09, 2021 at 11:00:55AM +0800, 王贇 wrote: > > > 在 2021/12/9 上午8:19, Michael S. Tsirkin 写道: > [snip] > > > > > > > > > > > > Hi, Michael > > > > > > > > > > Thanks for the comment, unfortunately modify device is not an option > > > > > for us > > > > > :-( > > > > > > > > > > Is there any idea on how to solve this issue properly? > > > > > > > > > > Regards, > > > > > Michael Wang > > > > > > > > By the way, there is a bug in the error message. Want to fix that? > > > > > > Could you please provide more detail about the bug? We'd like to help > > > fixing > > > it :-) > > > > > > Besides, I've checked that patch but it can't address our issue, we > > > actually > > > have this legacy pci device on arm platform, and the memory layout is > > > unfriendly since allocation rarely providing page-address below 44bit, we > > > understand the virtio-iommu case should not do force dma, while we don't > > > have that so it's just working fine. > > > > > > Regards, > > > Michael Wang > > > > BTW is it just the ring that's at issue? > > Yes, the dma address for ring allocated as page can't fit the requirement. > > > Figuring out we have this problematic config and then allocating just > > the ring from coherent memory seems more palatable. > > Agree, I'm also wondering why can't we force alloc 44bit-pfn page to fit the > requirement? I mean if there are such pages, we should use them firstly as > dma address for legacy devices, and only fail when there are no such pages > at all, but can't find existing API to alloc page with such requirement... > anyway. > > > > > But please note we still need to detect config with a virtual iommu (can > > be any kind not just virtio-iommu, smmu, vtd are all affected) and > > disable the hacks. This is what the new DMA API I suggested would do. > > Fair enough, any more details about the design of new API? > > Regards, > Michael Wang The idea was that on some systems any DMA address is also a physical address (as in the case of e.g. 
bounce buffers) and then it can be used without VIRTIO_F_ACCESS_PLATFORM. > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > drivers/virtio/virtio_pci_legacy.c | 10 ++ > > > > > > > drivers/virtio/virtio_ring.c | 3 +++ > > > > > > > include/linux/virtio.h | 1 + > > > > > > > 3 files changed, 14 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c > > > > > > > b/drivers/virtio/virtio_pci_legacy.c > > > > > > > index d62e983..11f2ebf 100644 > > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c > > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c > > > > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct > > > > > > > virtio_pci_device > > > > > > > *vp_dev) > > > > > > > vp_dev->setup_vq = setup_vq; > > > > > > > vp_dev->del_vq = del_vq; > > > > > > > > > > > > > > + /* > > > > > > > + * The legacy pci device requre 32bit-pfn vq, > > > > > > > + * or setup_vq() will failed. > > > > > > > + * > > > > > > > + * Thus we make sure vring_use_dma_api() will > > > > > > > + * return true during the allocation by marking > > > > > > > + * force_dma here. 
> > > > > > > + */ > > > > > > > + vp_dev->vdev.force_dma = true; > > > > > > > + > > > > > > > return 0; > > > > > > > > > > > > > > err_iomap: > > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > > > b/drivers/virtio/virtio_ring.c > > > > > > > index 3035bb6..6562e01 100644 > > > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > > > @@ -245,6 +245,9 @@ static inline bool > > > > > > > virtqueue_use_indirect(struct > > > > > > > virtqueue *_vq, > > > > > > > > > > > > > > static bool vring_use_dma_api(struct virtio_device *vdev) > > > > > > > { > > > > > > > + if (vdev->force_dma) > > > > > > > + return true; > > > > > > > + > > > > > > > if (!virtio_has_dma_quirk(vdev)) > > > > > > > return true; > > > > > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > > > > > > > index 41edbc0..a4eb29d 100644 > > > > > > > --- a/include/linux/virtio.h > > > > > > > +++ b/include/linux/virtio.h > > > > > > > @@ -109,6 +109,7 @@ struct virtio_device { > > > > > > > bool failed; > > > > > > > bool config_enabled; > > > > > > > bool config_change_pending; > > > > > > > + bool force_dma; > > > > > > > spinlock_t config_lock; > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */ > > > > > > > struct device dev; > > > > > > > -- > > > > > > > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq
On Thu, Dec 09, 2021 at 11:21:36AM +0800, 王贇 wrote: > > > 在 2021/12/8 下午7:08, Michael S. Tsirkin 写道: > [snip] > > > > > > > > > > Hi, Michael > > > > > > > > > > Thanks for the comment, unfortunately modify device is not an option > > > > > for us > > > > > :-( > > > > > > > > > > Is there any idea on how to solve this issue properly? > > > > > > > > > > Regards, > > > > > Michael Wang > > > > > > > > By the way, there is a bug in the error message. Want to fix that? > > > > > > Could you please provide more detail about the bug? We'd like to help > > > fixing > > > it :-) > > > > virtio-pci :14:00.0: platform bug: legacy virtio-mmio must ... > > > > should be virtio-pci not virtio-mmio > > Patch on the way~ > > > > > > > > > > Besides, I've checked that patch but it can't address our issue, we > > > actually > > > have this legacy pci device on arm platform, and the memory layout is > > > unfriendly since allocation rarely providing page-address below 44bit, we > > > understand the virtio-iommu case should not do force dma, while we don't > > > have that so it's just working fine. > > > > > > Regards, > > > Michael Wang > > > > Hmm wait a sec is it a physical device or a hypervisor? > > If a physical one then doesn't it need VIRTIO_F_ORDER_PLATFORM > > on ARM? > > The PCI device is virtual, I can't see how VIRTIO_F_ORDER_PLATFORM help > address this issue, legacy pci config is 32bit but it's 36, seems like will > never be included? > > Regards, > Michael Wang Oh, if the device is virtual then I think you should just update it please. virtio 0.X is architecturally limited to small VMs, if your hypervisor supports more it should emulate virtio 1.0. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > drivers/virtio/virtio_pci_legacy.c | 10 ++ > > > > > > > drivers/virtio/virtio_ring.c | 3 +++ > > > > > > > include/linux/virtio.h | 1 + > > > > > > > 3 files changed, 14 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c > > > > > > > b/drivers/virtio/virtio_pci_legacy.c > > > > > > > index d62e983..11f2ebf 100644 > > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c > > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c > > > > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct > > > > > > > virtio_pci_device > > > > > > > *vp_dev) > > > > > > > vp_dev->setup_vq = setup_vq; > > > > > > > vp_dev->del_vq = del_vq; > > > > > > > > > > > > > > + /* > > > > > > > + * The legacy pci device requre 32bit-pfn vq, > > > > > > > + * or setup_vq() will failed. > > > > > > > + * > > > > > > > + * Thus we make sure vring_use_dma_api() will > > > > > > > + * return true during the allocation by marking > > > > > > > + * force_dma here. 
> > > > > > > + */ > > > > > > > + vp_dev->vdev.force_dma = true; > > > > > > > + > > > > > > > return 0; > > > > > > > > > > > > > > err_iomap: > > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > > > b/drivers/virtio/virtio_ring.c > > > > > > > index 3035bb6..6562e01 100644 > > > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > > > @@ -245,6 +245,9 @@ static inline bool > > > > > > > virtqueue_use_indirect(struct > > > > > > > virtqueue *_vq, > > > > > > > > > > > > > > static bool vring_use_dma_api(struct virtio_device *vdev) > > > > > > > { > > > > > > > + if (vdev->force_dma) > > > > > > > + return true; > > > > > > > + > > > > > > > if (!virtio_has_dma_quirk(vdev)) > > > > > > > return true; > > > > > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > > > > > > > index 41edbc0..a4eb29d 100644 > > > > > > > --- a/include/linux/virtio.h > > > > > > > +++ b/include/linux/virtio.h > > > > > > > @@ -109,6 +109,7 @@ struct virtio_device { > > > > > > > bool failed; > > > > > > > bool config_enabled; > > > > > > > bool config_change_pending; > > > > > > > + bool force_dma; > > > > > > > spinlock_t config_lock; > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */ > > > > > > > struct device dev; > > > > > > > -- > > > > > > > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
These methods indirect the actual DAX read/write path. In the end pmem uses the magic flush and mc-safe variants and fuse and dcssblk use the plain ones, while device mapper redirects to the underlying device. Add set_dax_cached() and set_dax_nomcsafe() APIs for fuse to skip these special variants, then use them everywhere as they fall back to the plain ones on s390 anyway. This removes an indirect call from the read/write path as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig
---
 drivers/dax/super.c           | 36 ++--
 drivers/md/dm-linear.c        | 20 -
 drivers/md/dm-log-writes.c    | 80 ---
 drivers/md/dm-stripe.c        | 20 -
 drivers/md/dm.c               | 50 --
 drivers/nvdimm/pmem.c         | 20 -
 drivers/s390/block/dcssblk.c  | 14 --
 fs/dax.c                      |  5 ---
 fs/fuse/virtio_fs.c           | 19 +
 include/linux/dax.h           |  9 ++--
 include/linux/device-mapper.h |  4 --
 11 files changed, 37 insertions(+), 240 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e81d5ee57390f..ff676a07480c8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -105,6 +105,10 @@ enum dax_device_flags {
 	DAXDEV_WRITE_CACHE,
 	/* flag to check if device supports synchronous flush */
 	DAXDEV_SYNC,
+	/* do not use uncached operations to write data */
+	DAXDEV_CACHED,
+	/* do not use mcsafe operations to read data */
+	DAXDEV_NOMCSAFE,
 };
 
 /**
@@ -146,9 +150,15 @@ size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	/*
+	 * The userspace address for the memory copy has already been validated
+	 * via access_ok() in vfs_write, so use the 'no check' version to bypass
+	 * the HARDENED_USERCOPY overhead.
+	 */
+	if (test_bit(DAXDEV_CACHED, &dax_dev->flags))
+		return _copy_from_iter(addr, bytes, i);
+	return _copy_from_iter_flushcache(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_from_iter);
 
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 		size_t bytes, struct iov_iter *i)
@@ -156,9 +166,15 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	/*
+	 * The userspace address for the memory copy has already been validated
+	 * via access_ok() in vfs_read, so use the 'no check' version to bypass
+	 * the HARDENED_USERCOPY overhead.
+	 */
+	if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
+		return _copy_to_iter(addr, bytes, i);
+	return _copy_mc_to_iter(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
 int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 		size_t nr_pages)
@@ -220,6 +236,18 @@ void set_dax_synchronous(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
+void set_dax_cached(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_CACHED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_cached);
+
+void set_dax_nomcsafe(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 90de42f6743ac..1b97a11d71517 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -180,22 +180,6 @@ static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
-static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
-static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages)
 {
@@ -206,8 +190,6 @@ static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 
 #else
 #define linear_dax_direct_access NULL
-#define linear_dax_copy_from_iter NULL
-#define linear_dax_copy_to_iter NULL
 #define linear_dax_zero_page_range NULL
 #endif
 
@@ -225,8 +207,6 @@ static struct target_type linear_target = {
[PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
These two wrappers are never used.

Signed-off-by: Christoph Hellwig
---
 drivers/nvdimm/pmem.c |  4 ++--
 include/linux/uio.h   | 20 +---
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4190c8c46ca88..8294f1c701baa 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -302,8 +302,8 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 }
 
 /*
- * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
+ * Use the 'no check' versions of _copy_from_iter_flushcache() and
+ * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
 * checking, both file offset and device offset, is handled by
 * dax_iomap_actor()
 */
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6350354f97e90..494d552c1d663 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -196,7 +196,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /*
 * Note, users like pmem that depend on the stricter semantics of
- * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for
 * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
 * destination is flushed from the cache on return.
 */
@@ -211,24 +211,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #define _copy_mc_to_iter _copy_to_iter
 #endif
 
-static __always_inline __must_check
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
-		return _copy_from_iter_flushcache(addr, bytes, i);
-}
-
-static __always_inline __must_check
-size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(!check_copy_size(addr, bytes, true)))
-		return 0;
-	else
-		return _copy_mc_to_iter(addr, bytes, i);
-}
-
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
-- 
2.30.2
[PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly.

Signed-off-by: Christoph Hellwig
---
 drivers/dax/bus.c            | 3 ++-
 drivers/dax/super.c          | 6 +-
 drivers/md/dm.c              | 2 +-
 drivers/nvdimm/pmem.c        | 7 +++
 drivers/s390/block/dcssblk.c | 4 ++--
 fs/fuse/virtio_fs.c          | 2 +-
 include/linux/dax.h          | 8 ++--
 7 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6683d42c32c56..da2a14d096d29 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	 * No dax_operations since there is no access to this device outside of
 	 * mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
+	dax_dev = alloc_dax(dev_dax, NULL);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto err_alloc_dax;
 	}
+	set_dax_synchronous(dax_dev);
 
 	/* a device_dax instance is dead while the driver is not attached */
 	kill_dax(dax_dev);
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e18155f43a635..e81d5ee57390f 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
 	return dax_dev;
 }
 
-struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
-		unsigned long flags)
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
 {
 	struct dax_device *dax_dev;
 	dev_t devt;
@@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
 	dax_dev->ops = ops;
 	dax_dev->private = private;
 
-	if (flags & DAXDEV_F_SYNC)
-		set_dax_synchronous(dax_dev);
-
 	return dax_dev;
 
 err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e997c02bb0a0..f4b972af10928 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_FS_DAX)) {
-		md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
+		md->dax_dev = alloc_dax(md, &dm_dax_ops);
 		if (IS_ERR(md->dax_dev)) {
 			md->dax_dev = NULL;
 			goto bad;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 8294f1c701baa..85b3339286bd8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
 	struct gendisk *disk;
 	void *addr;
 	int rc;
-	unsigned long flags = 0UL;
 
 	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
 	if (!pmem)
@@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
 	disk->bb = &pmem->bb;
 
-	if (is_nvdimm_sync(nd_region))
-		flags = DAXDEV_F_SYNC;
-	dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
+	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto out;
 	}
+	if (is_nvdimm_sync(nd_region))
+		set_dax_synchronous(dax_dev);
 	rc = dax_add_host(dax_dev, disk);
 	if (rc)
 		goto out_cleanup_dax;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index e65e83764d1ce..10823debc09bd 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	if (rc)
 		goto put_dev;
 
-	dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
-			DAXDEV_F_SYNC);
+	dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
 	if (IS_ERR(dev_info->dax_dev)) {
 		rc = PTR_ERR(dev_info->dax_dev);
 		dev_info->dax_dev = NULL;
 		goto put_dev;
 	}
+	set_dax_synchronous(dev_info->dax_dev);
 	rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
 	if (rc)
 		goto out_dax;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 242cc1c0d7ed7..5c03a0364a9bb 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
 	dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
 		__func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
 
-	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
+	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
 	if (IS_ERR(fs->dax_dev))
 		return PTR_ERR(fs->dax_dev);
 
diff
[PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
While using the MC-safe copy routines is rather pointless on a virtual device like virtiofs, it also isn't harmful at all. So just use _copy_mc_to_iter unconditionally to simplify the code. Signed-off-by: Christoph Hellwig --- drivers/dax/super.c | 10 -- fs/fuse/virtio_fs.c | 1 - include/linux/dax.h | 1 - 3 files changed, 12 deletions(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index ff676a07480c8..fe783234ca669 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -107,8 +107,6 @@ enum dax_device_flags { DAXDEV_SYNC, /* do not use uncached operations to write data */ DAXDEV_CACHED, - /* do not use mcsafe operations to read data */ - DAXDEV_NOMCSAFE, }; /** @@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, * via access_ok() in vfs_red, so use the 'no check' version to bypass * the HARDENED_USERCOPY overhead. */ - if (test_bit(DAXDEV_NOMCSAFE, _dev->flags)) - return _copy_to_iter(addr, bytes, i); return _copy_mc_to_iter(addr, bytes, i); } @@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(set_dax_cached); -void set_dax_nomcsafe(struct dax_device *dax_dev) -{ - set_bit(DAXDEV_NOMCSAFE, _dev->flags); -} -EXPORT_SYMBOL_GPL(set_dax_nomcsafe); - bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(_srcu); diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 754319ce2a29b..d9c20b148ac19 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) if (IS_ERR(fs->dax_dev)) return PTR_ERR(fs->dax_dev); set_dax_cached(fs->dax_dev); - set_dax_nomcsafe(fs->dax_dev); return devm_add_action_or_reset(>dev, virtio_fs_cleanup_dax, fs->dax_dev); } diff --git a/include/linux/dax.h b/include/linux/dax.h index d22cbf03d37d2..d267331bc37e7 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -90,7 +90,6 @@ static inline bool 
daxdev_mapping_supported(struct vm_area_struct *vma, #endif void set_dax_cached(struct dax_device *dax_dev); -void set_dax_nomcsafe(struct dax_device *dax_dev); struct writeback_control; #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX) -- 2.30.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
devirtualize kernel access to DAX
Hi Dan,

this series cleans up a few loose ends and then removes the copy_from_iter
and copy_to_iter dax_operations methods in favor of straight calls.

Diffstat:
 drivers/dax/bus.c             |  3 +
 drivers/dax/super.c           | 40 ++---
 drivers/md/dm-linear.c        | 20 --
 drivers/md/dm-log-writes.c    | 80 --
 drivers/md/dm-stripe.c        | 20 --
 drivers/md/dm.c               | 52 ---
 drivers/nvdimm/pmem.c         | 27 +-
 drivers/s390/block/dcssblk.c  | 18 +
 fs/dax.c                      |  5 --
 fs/fuse/virtio_fs.c           | 20 +-
 include/linux/dax.h           | 28 +++---
 include/linux/device-mapper.h |  4 --
 include/linux/uio.h           | 20 --

 13 files changed, 44 insertions(+), 293 deletions(-)
[PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous
Remove the pointless wrappers. Signed-off-by: Christoph Hellwig --- drivers/dax/super.c | 8 include/linux/dax.h | 12 ++-- 2 files changed, 6 insertions(+), 14 deletions(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index e7152a6c4cc40..e18155f43a635 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(dax_write_cache_enabled); -bool __dax_synchronous(struct dax_device *dax_dev) +bool dax_synchronous(struct dax_device *dax_dev) { return test_bit(DAXDEV_SYNC, _dev->flags); } -EXPORT_SYMBOL_GPL(__dax_synchronous); +EXPORT_SYMBOL_GPL(dax_synchronous); -void __set_dax_synchronous(struct dax_device *dax_dev) +void set_dax_synchronous(struct dax_device *dax_dev) { set_bit(DAXDEV_SYNC, _dev->flags); } -EXPORT_SYMBOL_GPL(__set_dax_synchronous); +EXPORT_SYMBOL_GPL(set_dax_synchronous); bool dax_alive(struct dax_device *dax_dev) { diff --git a/include/linux/dax.h b/include/linux/dax.h index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); -bool __dax_synchronous(struct dax_device *dax_dev); -static inline bool dax_synchronous(struct dax_device *dax_dev) -{ - return __dax_synchronous(dax_dev); -} -void __set_dax_synchronous(struct dax_device *dax_dev); -static inline void set_dax_synchronous(struct dax_device *dax_dev) -{ - __set_dax_synchronous(dax_dev); -} +bool dax_synchronous(struct dax_device *dax_dev); +void set_dax_synchronous(struct dax_device *dax_dev); /* * Check if given mapping is supported by the file / underlying device. */ -- 2.30.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index
On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen wrote: > > Add netlink attribute and callback function to query the control VQ > index of a device. > > Example: > > $ vdpa dev config show vdpa-a > vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 5 \ > mtu 9000 ctrl_vq_idx 10 I still wonder about the motivation for this. And as discussed, the ctrl_vq_idx varies depending on whether MQ is negotiated. Thanks > > Signed-off-by: Eli Cohen > --- > v0 -> v1: > 1. Use logic defined in the spec to deduce virtqueue index. > > drivers/vdpa/vdpa.c | 25 + > include/uapi/linux/vdpa.h | 1 + > 2 files changed, 26 insertions(+) > > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c > index 3bf016e03512..b4d4b8a7ca4e 100644 > --- a/drivers/vdpa/vdpa.c > +++ b/drivers/vdpa/vdpa.c > @@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff > *msg, struct netlink_callba > return msg->len; > } > > +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev, > +struct sk_buff *msg, > +struct virtio_net_config *config, > +u64 features) > +{ > + u16 N; > + > + /* control VQ index, if available, is deduced according to the logic > +* described in the virtio spec in section 5.1.2 > +*/ > + if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) > + return 0; > + > + if (features & BIT_ULL(VIRTIO_NET_F_MQ)) > + N = le16_to_cpu(config->max_virtqueue_pairs); > + else > + N = 1; > + > + return nla_put_u16(msg, VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N); > +} > + > static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev, >struct sk_buff *msg, u64 features, >const struct virtio_net_config *config) > @@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device > *vdev, struct sk_buff *ms > struct virtio_net_config config = {}; > u64 features; > u16 val_u16; > + int err; > > vdpa_get_config(vdev, 0, , sizeof(config)); > > @@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device > *vdev, struct sk_buff *ms > return -EMSGSIZE; > > features 
= vdev->config->get_driver_features(vdev); > + err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, , features); > + if (err) > + return err; > > return vdpa_dev_net_mq_config_fill(vdev, msg, features, ); > } > diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h > index a252f06f9dfd..2e3a7f89f42d 100644 > --- a/include/uapi/linux/vdpa.h > +++ b/include/uapi/linux/vdpa.h > @@ -34,6 +34,7 @@ enum vdpa_attr { > VDPA_ATTR_DEV_MAX_VQS, /* u32 */ > VDPA_ATTR_DEV_MAX_VQ_SIZE, /* u16 */ > VDPA_ATTR_DEV_MIN_VQ_SIZE, /* u16 */ > + VDPA_ATTR_DEV_CTRL_VQ_IDX, /* u16 */ > > VDPA_ATTR_DEV_NET_CFG_MACADDR, /* binary */ > VDPA_ATTR_DEV_NET_STATUS, /* u8 */ > -- > 2.33.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 5/7] vdpa/mlx5: Support configuring max data virtqueue pairs
On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen wrote: > > Check whether the max number of data virtqueue pairs was provided when a > adding a new device and verify the new value does not exceed device > capabilities. > > In addition, change the arrays holding virtqueue and callback contexts > to be dynamically allocated. > > Signed-off-by: Eli Cohen Acked-by: Jason Wang > --- > v0->v1: > 1. Remove requirement to configure power of two virtqueue pairs > 2. Fix cleanup if kcalloc failed > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 43 +-- > 1 file changed, 30 insertions(+), 13 deletions(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index 1ec29879fc2c..4f0b8bba8b58 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -131,11 +131,6 @@ struct mlx5_vdpa_virtqueue { > struct mlx5_vq_restore_info ri; > }; > > -/* We will remove this limitation once mlx5_vdpa_alloc_resources() > - * provides for driver space allocation > - */ > -#define MLX5_MAX_SUPPORTED_VQS 16 > - > static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx) > { > if (unlikely(idx > mvdev->max_idx)) > @@ -148,8 +143,8 @@ struct mlx5_vdpa_net { > struct mlx5_vdpa_dev mvdev; > struct mlx5_vdpa_net_resources res; > struct virtio_net_config config; > - struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS]; > - struct vdpa_callback event_cbs[MLX5_MAX_SUPPORTED_VQS + 1]; > + struct mlx5_vdpa_virtqueue *vqs; > + struct vdpa_callback *event_cbs; > > /* Serialize vq resources creation and destruction. 
This is required > * since memory map might change and we need to destroy and create > @@ -1218,7 +1213,7 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev) > { > int i; > > - for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++) > + for (i = 0; i < ndev->mvdev.max_vqs; i++) > suspend_vq(ndev, >vqs[i]); > } > > @@ -1245,7 +1240,7 @@ static int create_rqt(struct mlx5_vdpa_net *ndev) > int i, j; > int err; > > - max_rqt = min_t(int, MLX5_MAX_SUPPORTED_VQS / 2, > + max_rqt = min_t(int, roundup_pow_of_two(ndev->mvdev.max_vqs / 2), > 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, > log_max_rqt_size)); > if (max_rqt < 1) > return -EOPNOTSUPP; > @@ -2219,7 +2214,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev) > clear_vqs_ready(ndev); > mlx5_vdpa_destroy_mr(>mvdev); > ndev->mvdev.status = 0; > - memset(ndev->event_cbs, 0, sizeof(ndev->event_cbs)); > + memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs > + 1)); > ndev->mvdev.actual_features = 0; > ++mvdev->generation; > if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { > @@ -2292,6 +2287,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev) > } > mlx5_vdpa_free_resources(>mvdev); > mutex_destroy(>reslock); > + kfree(ndev->event_cbs); > + kfree(ndev->vqs); > } > > static struct vdpa_notification_area mlx5_get_vq_notification(struct > vdpa_device *vdev, u16 idx) > @@ -2537,15 +2534,33 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev > *v_mdev, const char *name, > return -EOPNOTSUPP; > } > > - /* we save one virtqueue for control virtqueue should we require it */ > max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues); > - max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); > + if (max_vqs < 2) { > + dev_warn(mdev->device, > +"%d virtqueues are supported. 
At least 2 are > required\n", > +max_vqs); > + return -EAGAIN; > + } > + > + if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) { > + if (add_config->net.max_vq_pairs > max_vqs / 2) > + return -EINVAL; > + max_vqs = min_t(u32, max_vqs, 2 * > add_config->net.max_vq_pairs); > + } else { > + max_vqs = 2; > + } > > ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, > mdev->device, _vdpa_ops, > name, false); > if (IS_ERR(ndev)) > return PTR_ERR(ndev); > > + ndev->vqs = kcalloc(max_vqs, sizeof(*ndev->vqs), GFP_KERNEL); > + ndev->event_cbs = kcalloc(max_vqs + 1, sizeof(*ndev->event_cbs), > GFP_KERNEL); > + if (!ndev->vqs || !ndev->event_cbs) { > + err = -ENOMEM; > + goto err_alloc; > + } > ndev->mvdev.max_vqs = max_vqs; > mvdev = >mvdev; > mvdev->mdev = mdev; > @@ -2626,6 +2641,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev > *v_mdev, const char *name, > mlx5_mpfs_del_mac(pfmdev, config->mac); > err_mtu: >
Re: [PATCH v1 3/7] vdpa: Allow to configure max data virtqueues
On Thu, Dec 9, 2021 at 8:25 AM Si-Wei Liu wrote: > > > > On 12/8/2021 12:14 PM, Eli Cohen wrote: > > Add netlink support to configure the max virtqueue pairs for a device. > > At least one pair is required. The maximum is dictated by the device. > > > > Example: > > > > $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5 > Not this patch, but I think there should be a mega attribute defined > ahead to specify the virtio class/type to create vdpa instance with. > Like the existing ones e.g. mac_addr and mtu, max_vqp is also vdpa net > specific attribute. Probably, but this only works for the case when a single parent is allowed to create different types of devices. It looks to me the current model to have a per type parent. Thanks > > -Siwei > > > > > Signed-off-by: Eli Cohen > > --- > > v0 -> v1: > > 1. fix typo > > 2. move max_vq_pairs to net specific struct > > > > drivers/vdpa/vdpa.c | 14 +- > > include/linux/vdpa.h | 1 + > > 2 files changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c > > index c37d384c0f33..3bf016e03512 100644 > > --- a/drivers/vdpa/vdpa.c > > +++ b/drivers/vdpa/vdpa.c > > @@ -480,7 +480,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, > > struct netlink_callback *cb) > > } > > > > #define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \ > > - (1 << VDPA_ATTR_DEV_NET_CFG_MTU)) > > + (1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \ > > + (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) > > > > static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct > > genl_info *info) > > { > > @@ -506,6 +507,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff > > *skb, struct genl_info *i > > nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]); > > config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU); > > } > > + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) { > > + config.net.max_vq_pairs = > > + nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]); > > + if 
(!config.net.max_vq_pairs) { > > + NL_SET_ERR_MSG_MOD(info->extack, > > +"At least one pair of VQs is > > required"); > > + err = -EINVAL; > > + goto err; > > + } > > + config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP); > > + } > > > > /* Skip checking capability if user didn't prefer to configure any > >* device networking attributes. It is likely that user might have > > used > > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h > > index db24317d61b6..b62032573780 100644 > > --- a/include/linux/vdpa.h > > +++ b/include/linux/vdpa.h > > @@ -99,6 +99,7 @@ struct vdpa_dev_set_config { > > struct { > > u8 mac[ETH_ALEN]; > > u16 mtu; > > + u16 max_vq_pairs; > > } net; > > u64 mask; > > }; > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features
On Thu, Dec 9, 2021 at 4:14 AM Eli Cohen wrote: > > Provide an interface to read the negotiated features. This is needed > when building the netlink message in vdpa_dev_net_config_fill(). > > Also fix the implementation of vdpa_dev_net_config_fill() to use the > negotiated features instead of the device features. > > To make APIs clearer, make the following name changes to struct > vdpa_config_ops so they better describe their operations: > > get_features -> get_device_features > set_features -> set_driver_features > > Finally, add get_driver_features to return the negotiated features and > add implementation to all the upstream drivers. > > Signed-off-by: Eli Cohen > --- > drivers/vdpa/alibaba/eni_vdpa.c| 16 > drivers/vdpa/ifcvf/ifcvf_main.c| 16 > drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 > drivers/vdpa/vdpa.c| 2 +- > drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +++-- > drivers/vdpa/vdpa_user/vduse_dev.c | 16 > drivers/vdpa/virtio_pci/vp_vdpa.c | 16 > drivers/vhost/vdpa.c | 2 +- > drivers/virtio/virtio_vdpa.c | 2 +- > include/linux/vdpa.h | 14 +- > 10 files changed, 87 insertions(+), 34 deletions(-) > > diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c > index 3f788794571a..ac0975660f4d 100644 > --- a/drivers/vdpa/alibaba/eni_vdpa.c > +++ b/drivers/vdpa/alibaba/eni_vdpa.c > @@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device *vdpa_to_ldev(struct > vdpa_device *vdpa) > return _vdpa->ldev; > } > > -static u64 eni_vdpa_get_features(struct vdpa_device *vdpa) > +static u64 eni_vdpa_get_device_features(struct vdpa_device *vdpa) > { > struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); > u64 features = vp_legacy_get_features(ldev); > @@ -69,7 +69,7 @@ static u64 eni_vdpa_get_features(struct vdpa_device *vdpa) > return features; > } > > -static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features) > +static int eni_vdpa_set_driver_features(struct vdpa_device *vdpa, u64 > features) > { > struct virtio_pci_legacy_device *ldev = 
vdpa_to_ldev(vdpa); > > @@ -84,6 +84,13 @@ static int eni_vdpa_set_features(struct vdpa_device *vdpa, > u64 features) > return 0; > } > > +static u64 eni_vdpa_get_driver_features(struct vdpa_device *vdpa) > +{ > + struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); > + > + return vp_legacy_get_driver_features(ldev); > +} > + > static u8 eni_vdpa_get_status(struct vdpa_device *vdpa) > { > struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); > @@ -401,8 +408,9 @@ static void eni_vdpa_set_config_cb(struct vdpa_device > *vdpa, > } > > static const struct vdpa_config_ops eni_vdpa_ops = { > - .get_features = eni_vdpa_get_features, > - .set_features = eni_vdpa_set_features, > + .get_device_features = eni_vdpa_get_device_features, > + .set_driver_features = eni_vdpa_set_driver_features, > + .get_driver_features = eni_vdpa_get_driver_features, > .get_status = eni_vdpa_get_status, > .set_status = eni_vdpa_set_status, > .reset = eni_vdpa_reset, > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c > index 6dc75ca70b37..6a6a0a462392 100644 > --- a/drivers/vdpa/ifcvf/ifcvf_main.c > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c > @@ -169,7 +169,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device > *vdpa_dev) > return >vf; > } > > -static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev) > +static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev) > { > struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev); > struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); > @@ -187,7 +187,7 @@ static u64 ifcvf_vdpa_get_features(struct vdpa_device > *vdpa_dev) > return features; > } > > -static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 > features) > +static int ifcvf_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 > features) > { > struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); > int ret; > @@ -201,6 +201,13 @@ static int ifcvf_vdpa_set_features(struct vdpa_device > *vdpa_dev, u64 features) > return 0; > } 
> > +static u64 ifcvf_vdpa_get_driver_features(struct vdpa_device *vdpa_dev) > +{ > + struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); > + > + return vf->req_features; > +} > + > static u8 ifcvf_vdpa_get_status(struct vdpa_device *vdpa_dev) > { > struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); > @@ -443,8 +450,9 @@ static struct vdpa_notification_area > ifcvf_get_vq_notification(struct vdpa_devic > * implemented set_map()/dma_map()/dma_unmap() > */ > static const struct vdpa_config_ops ifc_vdpa_ops = { > - .get_features = ifcvf_vdpa_get_features, > -
Re: [PATCH 2/2 v4] vdpa: check that offsets are within bounds
On Wed, Dec 8, 2021 at 6:34 PM Dan Carpenter wrote: > > In this function "c->off" is a u32 and "size" is a long. On 64bit systems > if "c->off" is greater than "size" then "size - c->off" is a negative and > we always return -E2BIG. But on 32bit systems the subtraction is type > promoted to a high positive u32 value and basically any "c->len" is > accepted. > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > Reported-by: Xie Yongji > Signed-off-by: Dan Carpenter Acked-by: Jason Wang > --- > v4: split into a separate patch > > drivers/vhost/vdpa.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 29cced1cd277..e3c4f059b21a 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -197,7 +197,7 @@ static int vhost_vdpa_config_validate(struct vhost_vdpa > *v, > struct vdpa_device *vdpa = v->vdpa; > long size = vdpa->config->get_config_size(vdpa); > > - if (c->len == 0) > + if (c->len == 0 || c->off > size) > return -EINVAL; > > if (c->len > size - c->off) > -- > 2.20.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2 v4] vduse: fix memory corruption in vduse_dev_ioctl()
On Wed, Dec 8, 2021 at 6:33 PM Dan Carpenter wrote: > > The "config.offset" comes from the user. There needs to a check to > prevent it being out of bounds. The "config.offset" and > "dev->config_size" variables are both type u32. So if the offset if > out of bounds then the "dev->config_size - config.offset" subtraction > results in a very high u32 value. The out of bounds offset can result > in memory corruption. > > Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") > Signed-off-by: Dan Carpenter > --- > v2: fix reversed if statement > v3: fix vhost_vdpa_config_validate() as pointed out by Yongji Xie. > v4: split the vhost_vdpa_config_validate() change into a separate path Acked-by: Jason Wang > > drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c > b/drivers/vdpa/vdpa_user/vduse_dev.c > index c9204c62f339..1a206f95d73a 100644 > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > @@ -975,7 +975,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned > int cmd, > break; > > ret = -EINVAL; > - if (config.length == 0 || > + if (config.offset > dev->config_size || > + config.length == 0 || > config.length > dev->config_size - config.offset) > break; > > -- > 2.20.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2] virtio-mmio: harden interrupt
On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin wrote: > > On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote: > > This patch tries to make sure the virtio interrupt handler for MMIO > > won't be called after a reset and before virtio_device_ready(). We > > can't use IRQF_NO_AUTOEN since we're using shared interrupt > > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a > > new intr_soft_enabled variable and toggle it during in > > vm_disable/enable_interrupts(). The MMIO interrupt handler will check > > intr_soft_enabled before processing the actual interrupt. > > > > Signed-off-by: Jason Wang > > --- > > Changes since V1: > > - Silent compling warnings > > drivers/virtio/virtio_mmio.c | 37 > > 1 file changed, 37 insertions(+) > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > > index 56128b9c46eb..c517afdd2cc5 100644 > > --- a/drivers/virtio/virtio_mmio.c > > +++ b/drivers/virtio/virtio_mmio.c > > @@ -90,6 +90,7 @@ struct virtio_mmio_device { > > /* a list of queues so we can dispatch IRQs */ > > spinlock_t lock; > > struct list_head virtqueues; > > + bool intr_soft_enabled; > > }; > > > > struct virtio_mmio_vq_info { > > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info { > > struct list_head node; > > }; > > > > +/* disable irq handlers */ > > +static void vm_disable_cbs(struct virtio_device *vdev) > > +{ > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > > > + /* > > + * The below synchronize() guarantees that any > > + * interrupt for this line arriving after > > + * synchronize_irq() has completed is guaranteed to see > > + * intx_soft_enabled == false. 
> > + */ > > + WRITE_ONCE(vm_dev->intr_soft_enabled, false); > > + synchronize_irq(irq); > > +} > > + > > +/* enable irq handlers */ > > +static void vm_enable_cbs(struct virtio_device *vdev) > > +{ > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > > + int irq = platform_get_irq(vm_dev->pdev, 0); > > + > > + disable_irq(irq); > > + /* > > + * The above disable_irq() provides TSO ordering and > > + * as such promotes the below store to store-release. > > + */ > > + WRITE_ONCE(vm_dev->intr_soft_enabled, true); > > + enable_irq(irq); > > + return; > > +} > > > > /* Configuration interface */ > > > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev) > > > > /* 0 status means a reset. */ > > writel(0, vm_dev->base + VIRTIO_MMIO_STATUS); > > There was a discussion about reading status to make sure it is clear. > The spec says we should, this can't hurt as a further hardening measure. > In fact, let's do it in the core maybe? Spec says it applies to all > devices ... We can do that, but I'm not sure if we break some existing device. > > > + /* Disable VQ/configuration callbacks. */ > > + vm_disable_cbs(vdev); > > } > > > > > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) > > unsigned long flags; > > irqreturn_t ret = IRQ_NONE; > > > > + if (!READ_ONCE(vm_dev->intr_soft_enabled)) > > + return IRQ_NONE; > > + > > So if the write is seen before reset happened (should not happen, but we > are talking a buggy device) then it won't be acknowledged and device > will keep pulling the interrupt. I think as long as we are hardening > this, let's go the full mile and try to avoid DoS if we can, do the > check before invoking the callback, but do not skip the read. > Whether to still return IRQ_NONE is a good question. 
Did you mean something like this:

	/* Read and acknowledge interrupts */
	status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
	writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);

	if (status)
		ret = IRQ_HANDLED;

	if (!READ_ONCE(vm_dev->intr_soft_enabled))
		return ret;

Thanks

> > /* Read and acknowledge interrupts */
> > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> > @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device *vdev,
> > }
> >
> > static const struct virtio_config_ops virtio_mmio_config_ops = {
> > +	.enable_cbs = vm_enable_cbs,
> > 	.get = vm_get,
> > 	.set = vm_set,
> > 	.generation = vm_generation,
> > --
> > 2.25.1
>
Re: [PATCH 5/7] vhost_vsock: simplify vhost_vsock_flush()
On Thu, Dec 9, 2021 at 1:35 AM wrote: > > On 12/7/21 9:53 PM, Jason Wang wrote: > > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie > > wrote: > >> > >> From: Andrey Ryabinin > >> > >> vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev) > >> before vhost_work_dev_flush(>dev). This seems pointless > >> as vsock->vqs[i].poll.dev is the same as >dev and several flushes > >> in a row doesn't do anything useful, one is just enough. > >> > >> Signed-off-by: Andrey Ryabinin > >> Reviewed-by: Stefano Garzarella > >> Signed-off-by: Mike Christie > >> --- > >> drivers/vhost/vsock.c | 5 - > >> 1 file changed, 5 deletions(-) > >> > >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > >> index 2339587bcd31..1f38160b249d 100644 > >> --- a/drivers/vhost/vsock.c > >> +++ b/drivers/vhost/vsock.c > >> @@ -703,11 +703,6 @@ static int vhost_vsock_dev_open(struct inode *inode, > >> struct file *file) > >> > >> static void vhost_vsock_flush(struct vhost_vsock *vsock) > >> { > >> - int i; > >> - > >> - for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) > >> - if (vsock->vqs[i].handle_kick) > >> - vhost_work_dev_flush(vsock->vqs[i].poll.dev); > >> vhost_work_dev_flush(>dev); > >> } > > > > Is this better to be consistent with vhost-net so that we can simply > > remove the vhost_vsock_flush() wrapper here. > > > > I didn't understand that comment. > > Did you mean consistent as they both have vhost_vsock/net_flush functions > or as in they prefer to not have one line wrappers? > > Before and after this patchset, both net and vsock have a > vhost_vsock/net_flush > function, so maybe you didn't mean that. > > I think the wrapper is not very useful and could be removed. However, > I thought vsock preferred wrappers because we have vhost_vsock_free > which is just a wrapper around kfree. I also noticed test.c is a > fan of one line wrappers, but I see net and scsi do not do that. Ok, then I'm fine with this. 
Thanks > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 6/7] vhost-scsi: drop flush after vhost_dev_cleanup
On Thu, Dec 9, 2021 at 12:45 AM Mike Christie wrote: > > On 12/7/21 10:07 PM, Jason Wang wrote: > > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie > > wrote: > >> > >> The flush after vhost_dev_cleanup is not needed because: > >> > >> 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread > >> so the flush call will just return since the worker has no device. > >> > >> 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs > >> the mutex and if the backend is NULL will return without queueing a work. > >> vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex > >> then drops the mutex and does a flush. So we know when > >> vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend > >> no evt related work will be able to requeue. The flush would then make sure > >> any queued evts are run and return. > > > > What happens if a kick arrives after vhost_scsi_clear_endpoint() but before > > vhost_dev_stop()? > > vhost_dev_stop also does a flush, so: > > 1. The kick handler would see the backend is null and return immediately. > 2. The flush in vhost_dev_stop would wait for those kicks in #1 to complete. You are right. So Acked-by: Jason Wang > > > > > > Is this safe or the kthread_stop() in vhost_dev_cleanup() makes us safe? > > > > Thanks > > > >> > >> Signed-off-by: Mike Christie > >> --- > >> drivers/vhost/scsi.c | 2 -- > >> 1 file changed, 2 deletions(-) > >> > >> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > >> index 532e204f2b1b..94535c813ef7 100644 > >> --- a/drivers/vhost/scsi.c > >> +++ b/drivers/vhost/scsi.c > >> @@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, > >> struct file *f) > >> vhost_scsi_clear_endpoint(vs, &t); > >> vhost_dev_stop(&vs->dev); > >> vhost_dev_cleanup(&vs->dev); > >> - /* Jobs can re-queue themselves in evt kick handler. Do extra > >> flush.
*/ > >> - vhost_scsi_flush(vs); > >> kfree(vs->dev.vqs); > >> kvfree(vs); > >> return 0; > >> -- > >> 2.25.1
Re: [PATCH 1/7] vhost: get rid of vhost_poll_flush() wrapper
On Thu, Dec 9, 2021 at 12:41 AM Mike Christie wrote: > > On 12/7/21 9:49 PM, Jason Wang wrote: > > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie > > wrote: > >> > >> vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). > >> It gives the wrong impression that we are doing some work over vhost_poll, > >> while in fact it flushes vhost_poll->dev. > > > > This "problem" is a byproduct of 7235acdb1144 ("vhost: simplify work > > flushing"). > > > > Before that we indeed had a per-poll flush. > > > >> It only complicates understanding of the code and leads to mistakes > >> like flushing the same vhost_dev several times in a row. > >> > >> Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. > > > > Not a native speaker but since we don't have a per-work flush, is it > > better to rename this simply as vhost_flush()? > > > > What about vhost_dev_flush? > > For the existing naming when we have a function exported we tend to have > "vhost_" then the object/struct it works on then the action. > > For work we have: > > vhost_work_queue/init > > (we also have vhost_has_work which doesn't follow that pattern but > would sound strange as vhost_work_has so ignore that one). > > For dev operations we have: > > vhost_dev_reset_owner/set_owner/has_owner/cleanup/init > > For the flush operation I wanted it to reflect it flushed all work > on the device, so I mashed up the work and dev naming above and > I agree it's a little strange. It looks fine to me. Thanks
Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index
On 12/8/2021 12:14 PM, Eli Cohen wrote: Add netlink attribute and callback function to query the control VQ index of a device. Example: $ vdpa dev config show vdpa-a vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 5 \ mtu 9000 ctrl_vq_idx 10 First, I am not sure if there's value or a case showing that expose and trace the guest ctrl_vq_idx value (running state) to admin users turns out to be useful. Previously I thought you want to expose it to QEMU but it seems this is a bit redundant, which can be deduced from max_vqp passing up to QEMU. Second, I don't feel running states such as link_announce and ctrl_vq_idx are qualified to be configuration field that can be displayed in 'vdpa dev config show'. Could you please clarify the scope for what kind of info this command should cover? -Siwei Signed-off-by: Eli Cohen --- v0 -> v1: 1. Use logic defined in the spec to deduce virtqueue index. drivers/vdpa/vdpa.c | 25 + include/uapi/linux/vdpa.h | 1 + 2 files changed, 26 insertions(+) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 3bf016e03512..b4d4b8a7ca4e 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callba return msg->len; } +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev, +struct sk_buff *msg, +struct virtio_net_config *config, +u64 features) +{ + u16 N; + + /* control VQ index, if available, is deduced according to the logic +* described in the virtio spec in section 5.1.2 +*/ + if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) + return 0; + + if (features & BIT_ULL(VIRTIO_NET_F_MQ)) + N = le16_to_cpu(config->max_virtqueue_pairs); + else + N = 1; + + return nla_put_u16(msg, VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N); +} + static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u64 features, const struct virtio_net_config *config) @@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct 
vdpa_device *vdev, struct sk_buff *ms struct virtio_net_config config = {}; u64 features; u16 val_u16; + int err; vdpa_get_config(vdev, 0, &config, sizeof(config)); @@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms return -EMSGSIZE; features = vdev->config->get_driver_features(vdev); + err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features); + if (err) + return err; return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config); } diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h index a252f06f9dfd..2e3a7f89f42d 100644 --- a/include/uapi/linux/vdpa.h +++ b/include/uapi/linux/vdpa.h @@ -34,6 +34,7 @@ enum vdpa_attr { VDPA_ATTR_DEV_MAX_VQS, /* u32 */ VDPA_ATTR_DEV_MAX_VQ_SIZE, /* u16 */ VDPA_ATTR_DEV_MIN_VQ_SIZE, /* u16 */ + VDPA_ATTR_DEV_CTRL_VQ_IDX, /* u16 */ VDPA_ATTR_DEV_NET_CFG_MACADDR, /* binary */ VDPA_ATTR_DEV_NET_STATUS, /* u8 */
Re: [PATCH v1 3/7] vdpa: Allow to configure max data virtqueues
On 12/8/2021 12:14 PM, Eli Cohen wrote: Add netlink support to configure the max virtqueue pairs for a device. At least one pair is required. The maximum is dictated by the device. Example: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5 Not this patch, but I think there should be a mega attribute defined ahead to specify the virtio class/type to create vdpa instance with. Like the existing ones e.g. mac_addr and mtu, max_vqp is also vdpa net specific attribute. -Siwei Signed-off-by: Eli Cohen --- v0 -> v1: 1. fix typo 2. move max_vq_pairs to net specific struct drivers/vdpa/vdpa.c | 14 +- include/linux/vdpa.h | 1 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index c37d384c0f33..3bf016e03512 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -480,7 +480,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) } #define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \ -(1 << VDPA_ATTR_DEV_NET_CFG_MTU)) +(1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \ +(1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info) { @@ -506,6 +507,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]); config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU); } + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) { + config.net.max_vq_pairs = + nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]); + if (!config.net.max_vq_pairs) { + NL_SET_ERR_MSG_MOD(info->extack, + "At least one pair of VQs is required"); + err = -EINVAL; + goto err; + } + config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP); + } /* Skip checking capability if user didn't prefer to configure any * device networking attributes. 
It is likely that user might have used diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index db24317d61b6..b62032573780 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -99,6 +99,7 @@ struct vdpa_dev_set_config { struct { u8 mac[ETH_ALEN]; u16 mtu; + u16 max_vq_pairs; } net; u64 mask; };
Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq
On Wed, Dec 08, 2021 at 04:04:19PM +0800, 王贇 wrote: > > > On 2021/12/8 3:23 PM, Michael S. Tsirkin wrote: > > On Tue, Dec 07, 2021 at 05:09:56PM +0800, 王贇 wrote: > > > > > > > > > On 2021/12/7 4:13 PM, Michael S. Tsirkin wrote: > > > > On Tue, Dec 07, 2021 at 03:51:45PM +0800, 王贇 wrote: > > > > > We observed issues like: > > > > > virtio-pci :14:00.0: platform bug: legacy virtio-mmio must > > > > > not be used with RAM above 0x4000GB > > > > > > > > > > when we have a legacy pci device which desired 32bit-pfn vq > > > > > but gain 64bit-pfn instead, lead into the failure of probe. > > > > > > > > > > vring_use_dma_api() is playing the key role in here, to help the > > > > > allocation process understand which kind of vq it should alloc, > > > > > however, it failed to take care the legacy pci device, which only > > > > > have 32bit feature flag and can never have VIRTIO_F_ACCESS_PLATFORM > > > > > setted. > > > > > > > > > > This patch introduce force_dma flag to help vring_use_dma_api() > > > > > understanding the requirement better, to avoid the failing. > > > > > > > > > > Signed-off-by: Michael Wang > > > > > > > > This will break configs where the device appears behind > > > > a virtual iommu, so this won't fly. > > > > Just make your device support 1.0, eh? > > > > > > Hi, Michael > > > > > > Thanks for the comment, unfortunately modify device is not an option for > > > us > > > :-( > > > > > > Is there any idea on how to solve this issue properly? > > > > > > Regards, > > > Michael Wang > > > > By the way, there is a bug in the error message. Want to fix that? > > Could you please provide more detail about the bug?
We'd like to help fixing > it :-) > > Besides, I've checked that patch but it can't address our issue, we actually > have this legacy pci device on arm platform, and the memory layout is > unfriendly since allocation rarely providing page-address below 44bit, we > understand the virtio-iommu case should not do force dma, while we don't > have that so it's just working fine. > > Regards, > Michael Wang BTW is it just the ring that's at issue? Figuring out we have this problematic config and then allocating just the ring from coherent memory seems more palatable. But please note we still need to detect config with a virtual iommu (can be any kind not just virtio-iommu, smmu, vtd are all affected) and disable the hacks. This is what the new DMA API I suggested would do. > > > > > > > > > > > > > --- > > > > >drivers/virtio/virtio_pci_legacy.c | 10 ++ > > > > >drivers/virtio/virtio_ring.c | 3 +++ > > > > >include/linux/virtio.h | 1 + > > > > >3 files changed, 14 insertions(+) > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c > > > > > b/drivers/virtio/virtio_pci_legacy.c > > > > > index d62e983..11f2ebf 100644 > > > > > --- a/drivers/virtio/virtio_pci_legacy.c > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c > > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct > > > > > virtio_pci_device > > > > > *vp_dev) > > > > > vp_dev->setup_vq = setup_vq; > > > > > vp_dev->del_vq = del_vq; > > > > > > > > > > + /* > > > > > + * The legacy pci device requre 32bit-pfn vq, > > > > > + * or setup_vq() will failed. > > > > > + * > > > > > + * Thus we make sure vring_use_dma_api() will > > > > > + * return true during the allocation by marking > > > > > + * force_dma here. 
> > > > > + */ > > > > > + vp_dev->vdev.force_dma = true; > > > > > + > > > > > return 0; > > > > > > > > > >err_iomap: > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > b/drivers/virtio/virtio_ring.c > > > > > index 3035bb6..6562e01 100644 > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > @@ -245,6 +245,9 @@ static inline bool virtqueue_use_indirect(struct > > > > > virtqueue *_vq, > > > > > > > > > >static bool vring_use_dma_api(struct virtio_device *vdev) > > > > >{ > > > > > + if (vdev->force_dma) > > > > > + return true; > > > > > + > > > > > if (!virtio_has_dma_quirk(vdev)) > > > > > return true; > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > > > > > index 41edbc0..a4eb29d 100644 > > > > > --- a/include/linux/virtio.h > > > > > +++ b/include/linux/virtio.h > > > > > @@ -109,6 +109,7 @@ struct virtio_device { > > > > > bool failed; > > > > > bool config_enabled; > > > > > bool config_change_pending; > > > > > + bool force_dma; > > > > > spinlock_t config_lock; > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */ > > > > > struct device dev; > > > > > -- > > > > > 1.8.3.1
Re: [PATCH v1 2/7] vdpa/mlx5: Distribute RX virtqueues in RQT object
On 12/8/2021 12:14 PM, Eli Cohen wrote: Distribute the available rx virtqueues amongst the available RQT entries. RQTs are required to have a power-of-two number of entries. When creating or modifying the RQT, use the lowest power-of-two number of entries that is not less than the number of rx virtqueues. Distribute them in the available entries such that some virtqueues may be referenced twice. This allows configuring any number of virtqueue pairs when multiqueue is used. Signed-off-by: Eli Cohen --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 30 +++--- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index ce2e13135dd8..e1a8a790f213 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1261,17 +1261,10 @@ static int create_rqt(struct mlx5_vdpa_net *ndev) MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q); MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt); list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]); - for (i = 0, j = 0; j < max_rqt; j++) { - if (!ndev->vqs[j].initialized) - continue; Why is the !initialized check dropped from the new code?
- - if (!vq_is_tx(ndev->vqs[j].index)) { - list[i] = cpu_to_be32(ndev->vqs[j].virtq_id); - i++; - } - } - MLX5_SET(rqtc, rqtc, rqt_actual_size, i); + for (i = 0, j = 0; i < max_rqt; i++, j += 2) + list[i] = cpu_to_be32(ndev->vqs[j % ndev->mvdev.max_vqs].virtq_id); + MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt); err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn); kfree(in); if (err) @@ -1292,7 +1285,7 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int num) int i, j; int err; - max_rqt = min_t(int, ndev->cur_num_vqs / 2, + max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2), 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size)); if (max_rqt < 1) return -EOPNOTSUPP; @@ -1308,16 +1301,10 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int num) MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q); list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]); - for (i = 0, j = 0; j < num; j++) { - if (!ndev->vqs[j].initialized) - continue; Ditto. -Siwei + for (i = 0, j = 0; i < max_rqt; i++, j += 2) + list[i] = cpu_to_be32(ndev->vqs[j % num].virtq_id); - if (!vq_is_tx(ndev->vqs[j].index)) { - list[i] = cpu_to_be32(ndev->vqs[j].virtq_id); - i++; - } - } - MLX5_SET(rqtc, rqtc, rqt_actual_size, i); + MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt); err = mlx5_vdpa_modify_rqt(&ndev->mvdev, in, inlen, ndev->res.rqtn); kfree(in); if (err) @@ -1581,9 +1568,6 @@ static virtio_net_ctrl_ack handle_ctrl_mq(struct mlx5_vdpa_dev *mvdev, u8 cmd) break; } - if (newqps & (newqps - 1)) - break; - if (!change_num_qps(mvdev, newqps)) status = VIRTIO_NET_OK;
Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features
On 12/8/2021 12:14 PM, Eli Cohen wrote: Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. Are we sure the use of device feature is a bug rather than expected behavior? How do users determine the configured number of queue pairs at any point in case of a non-multiqueue supporting guest/driver or that the device is being reset (where actual_features == 0)? Maybe we should differentiate the static device config against driver/device running state here. I thought the change to vdpa_dev_net_config_fill() in this patch would break vdpa tool user's expectation that the value of max_vqp config is a fixed value since the vdpa creation time. Perhaps worth adding another attribute to represent the running state (cur_max_qps) based on the negotiated features. -Siwei To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. 
Signed-off-by: Eli Cohen --- drivers/vdpa/alibaba/eni_vdpa.c| 16 drivers/vdpa/ifcvf/ifcvf_main.c| 16 drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 drivers/vdpa/vdpa.c| 2 +- drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +++-- drivers/vdpa/vdpa_user/vduse_dev.c | 16 drivers/vdpa/virtio_pci/vp_vdpa.c | 16 drivers/vhost/vdpa.c | 2 +- drivers/virtio/virtio_vdpa.c | 2 +- include/linux/vdpa.h | 14 +- 10 files changed, 87 insertions(+), 34 deletions(-) diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c index 3f788794571a..ac0975660f4d 100644 --- a/drivers/vdpa/alibaba/eni_vdpa.c +++ b/drivers/vdpa/alibaba/eni_vdpa.c @@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device *vdpa_to_ldev(struct vdpa_device *vdpa) return _vdpa->ldev; } -static u64 eni_vdpa_get_features(struct vdpa_device *vdpa) +static u64 eni_vdpa_get_device_features(struct vdpa_device *vdpa) { struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); u64 features = vp_legacy_get_features(ldev); @@ -69,7 +69,7 @@ static u64 eni_vdpa_get_features(struct vdpa_device *vdpa) return features; } -static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +static int eni_vdpa_set_driver_features(struct vdpa_device *vdpa, u64 features) { struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); @@ -84,6 +84,13 @@ static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features) return 0; } +static u64 eni_vdpa_get_driver_features(struct vdpa_device *vdpa) +{ + struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); + + return vp_legacy_get_driver_features(ldev); +} + static u8 eni_vdpa_get_status(struct vdpa_device *vdpa) { struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa); @@ -401,8 +408,9 @@ static void eni_vdpa_set_config_cb(struct vdpa_device *vdpa, } static const struct vdpa_config_ops eni_vdpa_ops = { - .get_features = eni_vdpa_get_features, - .set_features = eni_vdpa_set_features, + .get_device_features = eni_vdpa_get_device_features, + .set_driver_features = 
eni_vdpa_set_driver_features, + .get_driver_features = eni_vdpa_get_driver_features, .get_status = eni_vdpa_get_status, .set_status = eni_vdpa_set_status, .reset = eni_vdpa_reset, diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index 6dc75ca70b37..6a6a0a462392 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -169,7 +169,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device *vdpa_dev) return >vf; } -static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev) +static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev) { struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev); struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); @@ -187,7 +187,7 @@ static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev) return features; } -static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 features) +static int ifcvf_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 features) { struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); int ret; @@ -201,6 +201,13 @@ static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 features) return 0; } +static u64 ifcvf_vdpa_get_driver_features(struct vdpa_device
Re: [PATCH] eni_vdpa: alibaba: select VIRTIO_PCI_LIB
On Wed, Dec 08, 2021 at 10:31:03PM +0100, Arnd Bergmann wrote: > On Wed, Dec 8, 2021 at 8:56 PM Michael S. Tsirkin wrote: > > > > On Fri, Dec 03, 2021 at 07:55:14PM +0100, Arnd Bergmann wrote: > > > From: Arnd Bergmann > > > > > > When VIRTIO_PCI_LIB is not built-in but the alibaba driver is, the > > > kernel runs into a link error: > > > > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_set_features': > > > eni_vdpa.c:(.text+0x23f): undefined reference to `vp_legacy_set_features' > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_set_vq_state': > > > eni_vdpa.c:(.text+0x2fe): undefined reference to > > > `vp_legacy_get_queue_enable' > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_set_vq_address': > > > eni_vdpa.c:(.text+0x376): undefined reference to > > > `vp_legacy_set_queue_address' > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_set_vq_ready': > > > eni_vdpa.c:(.text+0x3b4): undefined reference to > > > `vp_legacy_set_queue_address' > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_free_irq': > > > eni_vdpa.c:(.text+0x460): undefined reference to `vp_legacy_queue_vector' > > > x86_64-linux-ld: eni_vdpa.c:(.text+0x4b7): undefined reference to > > > `vp_legacy_config_vector' > > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function > > > `eni_vdpa_reset': > > > > > > Selecting VIRTIO_PCI_LIB_LEGACY is not sufficient here since that is > > > only part of the VIRTIO_PCI_LIB support. > > > > > > Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI") > > > Signed-off-by: Arnd Bergmann > > > > > > Confused. These are all part of > > drivers/virtio/virtio_pci_legacy_dev.c > > > > and > > obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o > > > > what gives? 
> > The patch was wrong, see > > https://lore.kernel.org/lkml/20211206085034.2836099-1-a...@kernel.org/ > > for the correct fix. > > Arnd Oh, ok then, I picked that one up, that explains why the issue does not reproduce for me. -- MST
Re: [PATCH V6 01/10] Use copy_process in vhost layer
On 12/8/21 2:34 PM, Michael S. Tsirkin wrote: > On Mon, Nov 29, 2021 at 01:46:57PM -0600, Mike Christie wrote: >> The following patches made over Linus's tree, allow the vhost layer to do >> a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how >> io_uring does a copy_process against its userspace app. This allows the >> vhost layer's worker threads to inherit cgroups, namespaces, address >> space, etc and this worker thread will also be accounted for against that >> owner/parent process's RLIMIT_NPROC limit. >> >> If you are not familiar with qemu and vhost here is more detailed >> problem description: >> >> Qemu will create vhost devices in the kernel which perform network, SCSI, >> etc IO and management operations from worker threads created by the >> kthread API. Because the kthread API does a copy_process on the kthreadd >> thread, the vhost layer has to use kthread_use_mm to access the Qemu >> thread's memory and cgroup_attach_task_all to add itself to the Qemu >> thread's cgroups. >> >> The problem with this approach is that we then have to add new functions/ >> args/functionality for every thing we want to inherit. I started doing >> that here: >> >> https://lkml.org/lkml/2021/6/23/1233 >> >> for the RLIMIT_NPROC check, but it seems it might be easier to just >> inherit everything from the beginning, because I'd need to do something >> like that patch several times. > > > So who's merging this? Me? Did all patches get acked by appropriate > maintainers? > Not yet. Jens, The last open review comment was from you for the name change and additional patch description info.
In this posting, I changed the name from: kernel_worker/kernel_worker_start to user_worker_create/user_worker_start I didn't do the start/create_user_worker format originally discussed because a while back Christoph had given me a review comment about trying to tie everything together into an API. Everything having the user_worker prefix felt nicer in that it was easy to tell the functions and flags went together, and so I thought it would handle his comment too. And patch: [PATCH V6 07/10] io_uring: switch to user_worker should better explain the reason for the switch.
Re: [PATCH V6 01/10] Use copy_process in vhost layer
On Mon, Nov 29, 2021 at 01:46:57PM -0600, Mike Christie wrote: > The following patches made over Linus's tree, allow the vhost layer to do > a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how > io_uring does a copy_process against its userspace app. This allows the > vhost layer's worker threads to inherit cgroups, namespaces, address > space, etc and this worker thread will also be accounted for against that > owner/parent process's RLIMIT_NPROC limit. > > If you are not familiar with qemu and vhost here is more detailed > problem description: > > Qemu will create vhost devices in the kernel which perform network, SCSI, > etc IO and management operations from worker threads created by the > kthread API. Because the kthread API does a copy_process on the kthreadd > thread, the vhost layer has to use kthread_use_mm to access the Qemu > thread's memory and cgroup_attach_task_all to add itself to the Qemu > thread's cgroups. > > The problem with this approach is that we then have to add new functions/ > args/functionality for every thing we want to inherit. I started doing > that here: > > https://lkml.org/lkml/2021/6/23/1233 > > for the RLIMIT_NPROC check, but it seems it might be easier to just > inherit everything from the beginning, becuase I'd need to do something > like that patch several times. So who's merging this? Me? Did all patches get acked by appropriate maintainers? > V6: > - Rename kernel_worker to user_worker and fix prefixes. > - Add better patch descriptions. > V5: > - Handle kbuild errors by building patchset against current kernel that > has all deps merged. Also add patch to remove create_io_thread code as > it's not used anymore. > - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER > case added in 5.16-rc1. > - Add PF_USER_WORKER flag so we can check it later after the initial > thread creation for the wake up, vm and singal cses. > - Added patch to auto reap the worker thread. 
> V4: > - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch. > - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that > added the new kernel worker functions. > - Fixed extra "i" issue. > - Added PF_USER_WORKER flag and added check that kernel_worker_start users > had that flag set. Also dropped patches that passed worker flags to > copy_thread and replaced with PF_USER_WORKER check. > V3: > - Add parentheses in p->flag and work_flags check in copy_thread. > - Fix check in arm/arm64 which was doing the reverse of other archs > where it did likely(!flags) instead of unlikely(flags). > V2: > - Rename kernel_copy_process to kernel_worker. > - Instead of exporting functions, make kernel_worker() a proper > function/API that does common work for the caller. > - Instead of adding new fields to kernel_clone_args for each option > make it flag based similar to CLONE_*. > - Drop unused completion struct in vhost. > - Fix compile warnings by merging vhost cgroup cleanup patch and > vhost conversion patch.
Re: [PATH v1 0/2] Support reading vendor statistics through iproute2/vdpa tool
On Wed, Nov 24, 2021 at 06:55:29PM +0200, Eli Cohen wrote: > The following patches add support for reading vdpa statistics > information from a vdpa instance using netlink messages. > > mlx5_vdpa is also implemented to provide that information. > > The supported statistics data are number of descriptors received by the > virtqueue and the number of descriptors completed by it. So I assume a v2 of this will be forthcoming, right? > v0 -> v1: > Emphasize that we're dealing with vendor specific counters. > Add mutex to synchronize query stats with changing of number of queues > > Eli Cohen (2): > vdpa: Add support for querying vendor statistics > vdpa/mlx5: Add support for reading descriptor statistics > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 143 > drivers/vdpa/vdpa.c| 144 + > include/linux/mlx5/mlx5_ifc.h | 1 + > include/linux/mlx5/mlx5_ifc_vdpa.h | 39 > include/linux/vdpa.h | 10 ++ > include/uapi/linux/vdpa.h | 9 ++ > 6 files changed, 346 insertions(+) > > -- > 2.33.1
Re: [PATCH V2] virtio-mmio: harden interrupt
On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote: > This patch tries to make sure the virtio interrupt handler for MMIO > won't be called after a reset and before virtio_device_ready(). We > can't use IRQF_NO_AUTOEN since we're using shared interrupt > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a > new intr_soft_enabled variable and toggle it during in > vm_disable/enable_interrupts(). The MMIO interrupt handler will check > intr_soft_enabled before processing the actual interrupt. > > Signed-off-by: Jason Wang > --- > Changes since V1: > - Silent compling warnings > drivers/virtio/virtio_mmio.c | 37 > 1 file changed, 37 insertions(+) > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > index 56128b9c46eb..c517afdd2cc5 100644 > --- a/drivers/virtio/virtio_mmio.c > +++ b/drivers/virtio/virtio_mmio.c > @@ -90,6 +90,7 @@ struct virtio_mmio_device { > /* a list of queues so we can dispatch IRQs */ > spinlock_t lock; > struct list_head virtqueues; > + bool intr_soft_enabled; > }; > > struct virtio_mmio_vq_info { > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info { > struct list_head node; > }; > > +/* disable irq handlers */ > +static void vm_disable_cbs(struct virtio_device *vdev) > +{ > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > + int irq = platform_get_irq(vm_dev->pdev, 0); > > + /* > + * The below synchronize() guarantees that any > + * interrupt for this line arriving after > + * synchronize_irq() has completed is guaranteed to see > + * intx_soft_enabled == false. 
> + */ > + WRITE_ONCE(vm_dev->intr_soft_enabled, false); > + synchronize_irq(irq); > +} > + > +/* enable irq handlers */ > +static void vm_enable_cbs(struct virtio_device *vdev) > +{ > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > + int irq = platform_get_irq(vm_dev->pdev, 0); > + > + disable_irq(irq); > + /* > + * The above disable_irq() provides TSO ordering and > + * as such promotes the below store to store-release. > + */ > + WRITE_ONCE(vm_dev->intr_soft_enabled, true); > + enable_irq(irq); > + return; > +} > > /* Configuration interface */ > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev) > > /* 0 status means a reset. */ > writel(0, vm_dev->base + VIRTIO_MMIO_STATUS); There was a discussion about reading status to make sure it is clear. The spec says we should, this can't hurt as a further hardening measure. In fact, let's do it in the core maybe? Spec says it applies to all devices ... > + /* Disable VQ/configuration callbacks. */ > + vm_disable_cbs(vdev); > } > > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) > unsigned long flags; > irqreturn_t ret = IRQ_NONE; > > + if (!READ_ONCE(vm_dev->intr_soft_enabled)) > + return IRQ_NONE; > + So if the write is seen before reset happened (should not happen, but we are talking a buggy device) then it won't be acknowledged and device will keep pulling the interrupt. I think as long as we are hardening this, let's go the full mile and try to avoid DoS if we can, do the check before invoking the callback, but do not skip the read. Whether to still return IRQ_NONE is a good question. 
>  	/* Read and acknowledge interrupts */
>  	status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
>  	writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device *vdev,
>  }
>
>  static const struct virtio_config_ops virtio_mmio_config_ops = {
> +	.enable_cbs = vm_enable_cbs,
>  	.get = vm_get,
>  	.set = vm_set,
>  	.generation = vm_generation,
> --
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] i2c: virtio: fix completion handling
On Thu, Dec 02, 2021 at 04:32:14PM +0100, Vincent Whitchurch wrote:
> The driver currently assumes that the notify callback is only received
> when the device is done with all the queued buffers.
>
> However, this is not true, since the notify callback could be called
> without any of the queued buffers being completed (for example, with
> virtio-pci and shared interrupts) or with only some of the buffers being
> completed (since the driver makes them available to the device in
> multiple separate virtqueue_add_sgs() calls).
>
> This can lead to incorrect data on the I2C bus or memory corruption in
> the guest if the device operates on buffers which have been freed by
> the driver. (The WARN_ON in the driver is also triggered.)
>
>  BUG kmalloc-128 (Tainted: G W): Poison overwritten
>  First byte 0x0 instead of 0x6b
>  Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28
>   memdup_user+0x2e/0xbd
>   i2cdev_ioctl_rdwr+0x9d/0x1de
>   i2cdev_ioctl+0x247/0x2ed
>   vfs_ioctl+0x21/0x30
>   sys_ioctl+0xb18/0xb41
>  Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28
>   kfree+0x1bd/0x1cc
>   i2cdev_ioctl_rdwr+0x1bb/0x1de
>   i2cdev_ioctl+0x247/0x2ed
>   vfs_ioctl+0x21/0x30
>   sys_ioctl+0xb18/0xb41
>
> Fix this by calling virtqueue_get_buf() from the notify handler like
> other virtio drivers and by actually waiting for all the buffers to be
> completed.
>
> Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver")
> Acked-by: Viresh Kumar
> Signed-off-by: Vincent Whitchurch

Acked-by: Michael S. Tsirkin

Who's queueing this?

> ---
>
> Notes:
>     v3: Wait for all completions instead of only the last one.
>     v2: Add Acked-by and Fixes tags.
>
>  drivers/i2c/busses/i2c-virtio.c | 32
>  1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
> index 95378780da6d..41eb0dcc3204 100644
> --- a/drivers/i2c/busses/i2c-virtio.c
> +++ b/drivers/i2c/busses/i2c-virtio.c
> @@ -22,24 +22,24 @@
>  /**
>   * struct virtio_i2c - virtio I2C data
>   * @vdev: virtio device for this controller
> - * @completion: completion of virtio I2C message
>   * @adap: I2C adapter for this controller
>   * @vq: the virtio virtqueue for communication
>   */
>  struct virtio_i2c {
>  	struct virtio_device *vdev;
> -	struct completion completion;
>  	struct i2c_adapter adap;
>  	struct virtqueue *vq;
>  };
>
>  /**
>   * struct virtio_i2c_req - the virtio I2C request structure
> + * @completion: completion of virtio I2C message
>   * @out_hdr: the OUT header of the virtio I2C message
>   * @buf: the buffer into which data is read, or from which it's written
>   * @in_hdr: the IN header of the virtio I2C message
>   */
>  struct virtio_i2c_req {
> +	struct completion completion;
>  	struct virtio_i2c_out_hdr out_hdr	____cacheline_aligned;
>  	uint8_t *buf				____cacheline_aligned;
>  	struct virtio_i2c_in_hdr in_hdr		____cacheline_aligned;
> @@ -47,9 +47,11 @@ struct virtio_i2c_req {
>
>  static void virtio_i2c_msg_done(struct virtqueue *vq)
>  {
> -	struct virtio_i2c *vi = vq->vdev->priv;
> +	struct virtio_i2c_req *req;
> +	unsigned int len;
>
> -	complete(&vi->completion);
> +	while ((req = virtqueue_get_buf(vq, &len)))
> +		complete(&req->completion);
>  }
>
>  static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
> @@ -62,6 +64,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
>  	for (i = 0; i < num; i++) {
>  		int outcnt = 0, incnt = 0;
>
> +		init_completion(&reqs[i].completion);
> +
>  		/*
>  		 * Only 7-bit mode supported for this moment. For the address
>  		 * format, Please check the Virtio I2C Specification.
> @@ -106,21 +110,15 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
>  				    struct virtio_i2c_req *reqs,
>  				    struct i2c_msg *msgs, int num)
>  {
> -	struct virtio_i2c_req *req;
>  	bool failed = false;
> -	unsigned int len;
>  	int i, j = 0;
>
>  	for (i = 0; i < num; i++) {
> -		/* Detach the ith request from the vq */
> -		req = virtqueue_get_buf(vq, &len);
> +		struct virtio_i2c_req *req = &reqs[i];
>
> -		/*
> -		 * Condition req == &reqs[i] should always meet since we have
> -		 * total num requests in the vq. reqs[i] can never be NULL here.
> -		 */
> -		if (!failed && (WARN_ON(req != &reqs[i]) ||
> -				req->in_hdr.status != VIRTIO_I2C_MSG_OK))
> +		wait_for_completion(&req->completion);
> +
> +		if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
>  			failed = true;
>
Re: [PATCH] eni_vdpa: alibaba: select VIRTIO_PCI_LIB
On Fri, Dec 03, 2021 at 07:55:14PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> When VIRTIO_PCI_LIB is not built-in but the alibaba driver is, the
> kernel runs into a link error:
>
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_features':
> eni_vdpa.c:(.text+0x23f): undefined reference to `vp_legacy_set_features'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_state':
> eni_vdpa.c:(.text+0x2fe): undefined reference to `vp_legacy_get_queue_enable'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_address':
> eni_vdpa.c:(.text+0x376): undefined reference to `vp_legacy_set_queue_address'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_ready':
> eni_vdpa.c:(.text+0x3b4): undefined reference to `vp_legacy_set_queue_address'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_free_irq':
> eni_vdpa.c:(.text+0x460): undefined reference to `vp_legacy_queue_vector'
> x86_64-linux-ld: eni_vdpa.c:(.text+0x4b7): undefined reference to `vp_legacy_config_vector'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_reset':
>
> Selecting VIRTIO_PCI_LIB_LEGACY is not sufficient here since that is
> only part of the VIRTIO_PCI_LIB support.
>
> Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI")
> Signed-off-by: Arnd Bergmann

Confused. These are all part of drivers/virtio/virtio_pci_legacy_dev.c, and

	obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o

so what gives?
> ---
>  drivers/vdpa/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index 50f45d037611..04466603021f 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -80,6 +80,7 @@ config VP_VDPA
>
>  config ALIBABA_ENI_VDPA
>  	tristate "vDPA driver for Alibaba ENI"
> +	select VIRTIO_PCI_LIB
>  	select VIRTIO_PCI_LIB_LEGACY
>  	depends on PCI_MSI && X86
>  	help
> --
> 2.29.2
Re: [PATCH 5/7] vhost_vsock: simplify vhost_vsock_flush()
On 12/7/21 9:53 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie wrote:
>>
>> From: Andrey Ryabinin
>>
>> vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
>> before vhost_work_dev_flush(&vsock->dev). This seems pointless
>> as vsock->vqs[i].poll.dev is the same as &vsock->dev, and several
>> flushes in a row don't do anything useful; one is just enough.
>>
>> Signed-off-by: Andrey Ryabinin
>> Reviewed-by: Stefano Garzarella
>> Signed-off-by: Mike Christie
>> ---
>>  drivers/vhost/vsock.c | 5 -----
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 2339587bcd31..1f38160b249d 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -703,11 +703,6 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>>
>>  static void vhost_vsock_flush(struct vhost_vsock *vsock)
>>  {
>> -	int i;
>> -
>> -	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
>> -		if (vsock->vqs[i].handle_kick)
>> -			vhost_work_dev_flush(vsock->vqs[i].poll.dev);
>>  	vhost_work_dev_flush(&vsock->dev);
>>  }
>
> Would it be better to be consistent with vhost-net, so that we can simply
> remove the vhost_vsock_flush() wrapper here?

I didn't understand that comment. Did you mean consistent as in they both
have vhost_vsock/net_flush functions, or as in they prefer to not have
one-line wrappers?

Before and after this patchset, both net and vsock have a
vhost_vsock/net_flush function, so maybe you didn't mean that.

I think the wrapper is not very useful and could be removed. However, I
thought vsock preferred wrappers because we have vhost_vsock_free which is
just a wrapper around kfree. I also noticed test.c is a fan of one-line
wrappers, but I see net and scsi do not do that.
Re: [PATCH 6/7] vhost-scsi: drop flush after vhost_dev_cleanup
On 12/7/21 10:07 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie wrote:
>>
>> The flush after vhost_dev_cleanup is not needed because:
>>
>> 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread,
>> so the flush call will just return since the worker has no device.
>>
>> 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs
>> the mutex and if the backend is NULL will return without queueing a work.
>> vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex,
>> then drops the mutex and does a flush. So we know that once
>> vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend,
>> no evt related work will be able to requeue. The flush would then make sure
>> any queued evts are run and return.
>
> What happens if a kick arrives after vhost_scsi_clear_endpoint() but
> before vhost_dev_stop()?

vhost_dev_stop also does a flush, so:

1. The kick handler would see the backend is null and return immediately.
2. The flush in vhost_dev_stop would wait for those kicks in #1 to complete.

> Is this safe, or does the kthread_stop() in vhost_dev_cleanup() make us
> safe?
>
> Thanks
>
>>
>> Signed-off-by: Mike Christie
>> ---
>>  drivers/vhost/scsi.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index 532e204f2b1b..94535c813ef7 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
>>  	vhost_scsi_clear_endpoint(vs, &t);
>>  	vhost_dev_stop(&vs->dev);
>>  	vhost_dev_cleanup(&vs->dev);
>> -	/* Jobs can re-queue themselves in evt kick handler. Do extra flush. */
>> -	vhost_scsi_flush(vs);
>>  	kfree(vs->dev.vqs);
>>  	kvfree(vs);
>>  	return 0;
>> --
>> 2.25.1
>>
Re: [PATCH 1/7] vhost: get rid of vhost_poll_flush() wrapper
On 12/7/21 9:49 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie wrote:
>>
>> vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
>> It gives the wrong impression that we are doing some work over
>> vhost_poll, while in fact it flushes vhost_poll->dev.
>
> This "problem" is a byproduct of 7235acdb1144 ("vhost: simplify work
> flushing").
>
> Before that we indeed had a per-poll flush.
>
>> It only complicates understanding of the code and leads to mistakes
>> like flushing the same vhost_dev several times in a row.
>>
>> Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
>
> Not a native speaker, but since we don't have a per-work flush, is it
> better to rename this simply as vhost_flush()?

What about vhost_dev_flush?

For the existing naming, when we have an exported function we tend to have
"vhost_", then the object/struct it works on, then the action.

For work we have: vhost_work_queue/init (we also have vhost_has_work, which
doesn't follow that pattern but would sound strange as vhost_work_has, so
ignore that one).

For dev operations we have:
vhost_dev_reset_owner/set_owner/has_owner/cleanup/init

For the flush operation I wanted it to reflect that it flushes all work on
the device, so I mashed up the work and dev naming above, and I agree it's
a little strange.
[PATCH] vduse: check that offset is within bounds in get_config()
This condition checks "len" but it does not check "offset", and that could
result in an out of bounds read if "offset > dev->config_size". The problem
is that since both variables are unsigned, the "dev->config_size - offset"
subtraction would result in a very high unsigned value.

I think these checks might not be necessary because "len" and "offset" are
supposed to already have been validated using the
vhost_vdpa_config_validate() function. But I do not know the code
perfectly, and I like to be safe.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c9204c62f339..6693a2c9e4a6 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -655,7 +655,8 @@ static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset,
 {
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 
-	if (len > dev->config_size - offset)
+	if (offset > dev->config_size ||
+	    len > dev->config_size - offset)
 		return;
 
 	memcpy(buf, dev->config + offset, len);
-- 
2.20.1
Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq
On Wed, Dec 08, 2021 at 04:04:19PM +0800, 王贇 wrote:
>
> On 2021/12/8 3:23 PM, Michael S. Tsirkin wrote:
> > On Tue, Dec 07, 2021 at 05:09:56PM +0800, 王贇 wrote:
> > >
> > > On 2021/12/7 4:13 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 07, 2021 at 03:51:45PM +0800, 王贇 wrote:
> > > > > We observed issues like:
> > > > >   virtio-pci :14:00.0: platform bug: legacy virtio-mmio must
> > > > >   not be used with RAM above 0x4000GB
> > > > >
> > > > > when we have a legacy pci device which desires a 32bit-pfn vq
> > > > > but gains a 64bit-pfn instead, leading to a probe failure.
> > > > >
> > > > > vring_use_dma_api() plays the key role here, helping the
> > > > > allocation process understand which kind of vq it should alloc.
> > > > > However, it fails to take care of the legacy pci device, which
> > > > > only has a 32bit feature flag and can never have
> > > > > VIRTIO_F_ACCESS_PLATFORM set.
> > > > >
> > > > > This patch introduces a force_dma flag to help vring_use_dma_api()
> > > > > understand the requirement better, to avoid the failure.
> > > > >
> > > > > Signed-off-by: Michael Wang
> > > >
> > > > This will break configs where the device appears behind
> > > > a virtual iommu, so this won't fly.
> > > > Just make your device support 1.0, eh?
> > >
> > > Hi, Michael
> > >
> > > Thanks for the comment; unfortunately modifying the device is not an
> > > option for us :-(
> > >
> > > Is there any idea on how to solve this issue properly?
> > >
> > > Regards,
> > > Michael Wang
> >
> > By the way, there is a bug in the error message. Want to fix that?
>
> Could you please provide more detail about the bug? We'd like to help fix
> it :-)

  virtio-pci :14:00.0: platform bug: legacy virtio-mmio must ...
should be virtio-pci, not virtio-mmio.

> Besides, I've checked that patch but it can't address our issue. We
> actually have this legacy pci device on an arm platform, and the memory
> layout is unfriendly since allocation rarely provides a page address
> below 44 bits. We understand the virtio-iommu case should not do force
> dma, but we don't have that, so it's just working fine.
>
> Regards,
> Michael Wang

Hmm, wait a sec, is it a physical device or a hypervisor? If a physical
one, then doesn't it need VIRTIO_F_ORDER_PLATFORM on ARM?

> > > > > ---
> > > > >  drivers/virtio/virtio_pci_legacy.c | 10 ++
> > > > >  drivers/virtio/virtio_ring.c | 3 +++
> > > > >  include/linux/virtio.h | 1 +
> > > > >  3 files changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > index d62e983..11f2ebf 100644
> > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
> > > > >  	vp_dev->setup_vq = setup_vq;
> > > > >  	vp_dev->del_vq = del_vq;
> > > > >
> > > > > +	/*
> > > > > +	 * The legacy pci device requires a 32bit-pfn vq,
> > > > > +	 * or setup_vq() will fail.
> > > > > +	 *
> > > > > +	 * Thus we make sure vring_use_dma_api() will
> > > > > +	 * return true during the allocation by marking
> > > > > +	 * force_dma here.
> > > > > +	 */
> > > > > +	vp_dev->vdev.force_dma = true;
> > > > > +
> > > > >  	return 0;
> > > > >
> > > > >  err_iomap:
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 3035bb6..6562e01 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -245,6 +245,9 @@ static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> > > > >
> > > > >  static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > >  {
> > > > > +	if (vdev->force_dma)
> > > > > +		return true;
> > > > > +
> > > > >  	if (!virtio_has_dma_quirk(vdev))
> > > > >  		return true;
> > > > >
> > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > index 41edbc0..a4eb29d 100644
> > > > > --- a/include/linux/virtio.h
> > > > > +++ b/include/linux/virtio.h
> > > > > @@ -109,6 +109,7 @@ struct virtio_device {
> > > > >  	bool failed;
> > > > >  	bool config_enabled;
> > > > >  	bool config_change_pending;
> > > > > +	bool force_dma;
> > > > >  	spinlock_t config_lock;
> > > > >  	spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > >  	struct device dev;
> > > > > --
> > > > > 1.8.3.1
[PATCH 2/2 v4] vdpa: check that offsets are within bounds
In this function "c->off" is a u32 and "size" is a long. On 64bit systems,
if "c->off" is greater than "size" then "size - c->off" is negative and we
always return -E2BIG. But on 32bit systems the subtraction is type
promoted to a high positive u32 value and basically any "c->len" is
accepted.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Reported-by: Xie Yongji
Signed-off-by: Dan Carpenter
---
v4: split into a separate patch

 drivers/vhost/vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 29cced1cd277..e3c4f059b21a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -197,7 +197,7 @@ static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
 	struct vdpa_device *vdpa = v->vdpa;
 	long size = vdpa->config->get_config_size(vdpa);
 
-	if (c->len == 0)
+	if (c->len == 0 || c->off > size)
 		return -EINVAL;
 
 	if (c->len > size - c->off)
-- 
2.20.1
[PATCH 1/2 v4] vduse: fix memory corruption in vduse_dev_ioctl()
The "config.offset" comes from the user. There needs to be a check to
prevent it being out of bounds. The "config.offset" and "dev->config_size"
variables are both type u32. So if the offset is out of bounds, then the
"dev->config_size - config.offset" subtraction results in a very high u32
value. The out of bounds offset can result in memory corruption.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter
---
v2: fix reversed if statement
v3: fix vhost_vdpa_config_validate() as pointed out by Yongji Xie.
v4: split the vhost_vdpa_config_validate() change into a separate patch

 drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c9204c62f339..1a206f95d73a 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -975,7 +975,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 		break;
 
 	ret = -EINVAL;
-	if (config.length == 0 ||
+	if (config.offset > dev->config_size ||
+	    config.length == 0 ||
 	    config.length > dev->config_size - config.offset)
 		break;
-- 
2.20.1
Re: [PATCH] drm: Return error codes from struct drm_driver.gem_create_object
Hi

On 07.12.21 10:24, Thomas Zimmermann wrote:
> Hi
>
> On 07.12.21 09:55, Dan Carpenter wrote:
>> I apologize. This thread has been really frustrating. I got mixed up
>> because I recently sent patches for ingenic and vc4. Also we are
>> working against different trees, so maybe that is part of the problem?
>> I'm looking at today's linux-next. Your patch has been applied.
>
> Yes. You updated all the drivers. But somehow the vc4 chunk from your
> patch was dropped. It was *NOT* dropped by Stephen Rothwell. It got
> dropped earlier. I am including the `git format-patch -1` output from
> the commit.
>
> My vc4 change is in drm-misc-next, [1] but not in drm-tip. [2]

Your vc4 patch went through drm-misc-fixes. drm-tip is built automatically
by our DRM tools from the various trees. The tools took my patch from
drm-misc-next and your patch from drm-misc-fixes, and the result was
broken.

Thanks for bringing this up. I'll see what I can do about it.

FYI I fixed drm-tip to return a pointer-encoded error in vc4. [1]

Best regards
Thomas

[1] https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/vc4/vc4_bo.c?id=cc3b9eb186e3c8dfb0bcc7d54fa4bfbe52c0b58b#n394

> Best regards
> Thomas
>
> [1] https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/vc4/vc4_bo.c#n394
> [2] https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/vc4/vc4_bo.c

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev