[bug report] vdpa: introduce virtio pci driver
Hello Jason Wang, The patch 010eee82c84e: "vdpa: introduce virtio pci driver" from Feb 23, 2021, leads to the following static checker warning: drivers/vdpa/virtio_pci/vp_vdpa.c:168 vp_vdpa_request_irq() warn: inconsistent indenting drivers/vdpa/virtio_pci/vp_vdpa.c 154 goto err; 155 } 156 vp_modern_queue_vector(mdev, i, i); 157 vp_vdpa->vring[i].irq = irq; 158 } 159 160 snprintf(vp_vdpa->msix_name, VP_VDPA_NAME_SIZE, "vp-vdpa[%s]-config\n", 161 pci_name(pdev)); 162 irq = pci_irq_vector(pdev, queues); 163 ret = devm_request_irq(>dev, irq, vp_vdpa_config_handler, 0, 164 vp_vdpa->msix_name, vp_vdpa); 165 if (ret) { 166 dev_err(>dev, 167 "vp_vdpa: fail to request irq for vq %d\n", i); Is this error message right? 168 goto err; indented too far 169 } 170 vp_modern_config_vector(mdev, queues); 171 vp_vdpa->config_irq = irq; 172 173 return 0; 174 err: 175 vp_vdpa_free_irq(vp_vdpa); 176 return ret; 177 } regards, dan carpenter ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/2/24 7:12 下午, Cornelia Huck wrote: On Wed, 24 Feb 2021 17:29:07 +0800 Jason Wang wrote: On 2021/2/23 6:58 下午, Cornelia Huck wrote: On Tue, 23 Feb 2021 18:31:07 +0800 Jason Wang wrote: On 2021/2/23 6:04 下午, Cornelia Huck wrote: On Tue, 23 Feb 2021 17:46:20 +0800 Jason Wang wrote: On 2021/2/23 下午5:25, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote: On 2/21/2021 8:14 PM, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Isn't the commit 452639a64ad8 itself is a workaround for legacy guest? I think the point is, since there's legacy guest we'd have to support, this host side workaround is unavoidable. Although I agree the violating driver should be fixed (yes, it's in today's upstream kernel which exists for a while now). Oh you are right: static int virtnet_validate(struct virtio_device *vdev) { if (!vdev->config->get) { dev_err(>dev, "%s failure: config access disabled\n", __func__); return -EINVAL; } if (!virtnet_validate_features(vdev)) return -EINVAL; if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { int mtu = virtio_cread16(vdev, offsetof(struct virtio_net_config, mtu)); if (mtu < MIN_MTU) __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); I wonder why not simply fail here? I think both failing or not accepting the feature can be argued to make sense: "the device presented us with a mtu size that does not make sense" would point to failing, "we cannot work with the mtu size that the device presented us" would point to not negotiating the feature. } return 0; } And the spec says: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. Item 4 on the list explicitly allows reading config space before FEATURES_OK. I conclude that VIRTIO_NET_F_MTU is set means "set in device features". So this probably need some clarification. "is set" is used many times in the spec that has different implications. Before FEATURES_OK is set by the driver, I guess it means "the device has offered the feature"; For me this part is ok since it clarify that it's the driver that set the bit. during normal usage, it means "the feature has been negotiated". /? It looks to me the feature negotiation is done only after device set FEATURES_OK, or FEATURES_OK could be read from device status? I'd consider feature negotiation done when the driver reads FEATURES_OK back from the status. I agree. (This is a bit fuzzy for legacy mode.) ...because legacy does not have FEATURES_OK. The problem is the MTU description for example: "The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set." It looks to me need to use "if VIRTIO_NET_F_MTU is set by device". "offered by the device"? I don't think it should 'disappear' from the config space if the driver won't use it. (Same for other config space fields that are tied to feature bits.)
Re: [PATCH v8 bpf-next 0/5] xsk: build skb by page (aka generic zerocopy xmit)
On 2/18/21 9:49 PM, Alexander Lobakin wrote: This series introduces XSK generic zerocopy xmit by adding XSK umem pages as skb frags instead of copying data to linear space. The only requirement for this for drivers is to be able to xmit skbs with skb_headlen(skb) == 0, i.e. all data including hard headers starts from frag 0. To indicate whether a particular driver supports this, a new netdev priv flag, IFF_TX_SKB_NO_LINEAR, is added (and declared in virtio_net as it's already capable of doing it). So consider implementing this in your drivers to greatly speed-up generic XSK xmit. [...] Applied, thanks! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] blk-core: remove blk_put_request()
On Wed, Feb 24, 2021 at 09:48:21AM -0700, Jens Axboe wrote: > Would make sense to rename blk_get_request() to blk_mq_alloc_request() > and then we have API symmetry. The get/put don't make sense when there > are no references involved. > > But it's a lot of churn for very little reward, which is always kind > of annoying. Especially for the person that has to carry the patches. Let's do the following: - move the initialize_rq_fn call from blk_get_request into blk_mq_alloc_request and make the former a trivial alias for the latter - migrate to the blk_mq_* versions on a per-driver/subsystem basis. The scsi migration depends on the first item above, so it will have to go with that or wait for the next merge window - don't migrate the legacy ide driver, as it is about to be removed and has a huge number of blk_get_request calls ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote: On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote: On 2021/2/23 9:12 上午, Si-Wei Liu wrote: On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Thanks And also: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. so accessing config space before FEATURES_OK is a spec violation, right? It is, but it's not relevant to what this commit tries to address. I thought the legacy guest still needs to be supported. Having said, a separate patch has to be posted to fix the guest driver issue where this discrepancy is introduced to virtnet_validate() (since commit fe36cbe067). But it's not technically related to this patch. -Siwei I think it's a bug to read config space in validate, we should move it to virtnet_probe(). Thanks I take it back, reading but not writing seems to be explicitly allowed by spec. So our way to detect a legacy guest is bogus, need to think what is the best way to handle this. Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy guest? Supposedly only config space write access needs to be guarded before setting FEATURES_OK. -Siwie Detecting it isn't enough though, we will need a new ioctl to notify the kernel that it's a legacy guest. Ugh :( Well, although I think adding an ioctl is doable, may I know what the use case there will be for kernel to leverage such info directly? Is there a case QEMU can't do with dedicate ioctls later if there's indeed differentiation (legacy v.s. modern) needed? One of the reason I asked is if this ioctl becomes a mandate for vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't see this ioctl coming? If it's optional, suppose the kernel may need it only when it becomes necessary? Thanks, -Siwei Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct
Re: [RFC PATCH] blk-core: remove blk_put_request()
On 2/24/21 4:53 AM, Stefan Hajnoczi wrote: > On Mon, Feb 22, 2021 at 01:11:15PM -0800, Chaitanya Kulkarni wrote: >> The function blk_put_request() is just a wrapper to >> blk_mq_free_request(), remove the unnecessary wrapper. >> >> Any feedback is welcome on this RFC. >> >> Signed-off-by: Chaitanya Kulkarni >> --- >> block/blk-core.c | 6 -- >> block/blk-merge.c | 2 +- >> block/bsg-lib.c| 4 ++-- >> block/bsg.c| 4 ++-- >> block/scsi_ioctl.c | 6 +++--- >> drivers/block/paride/pd.c | 2 +- >> drivers/block/pktcdvd.c| 2 +- >> drivers/block/virtio_blk.c | 2 +- >> drivers/cdrom/cdrom.c | 4 ++-- >> drivers/ide/ide-atapi.c| 2 +- >> drivers/ide/ide-cd.c | 4 ++-- >> drivers/ide/ide-cd_ioctl.c | 2 +- >> drivers/ide/ide-devsets.c | 2 +- >> drivers/ide/ide-disk.c | 2 +- >> drivers/ide/ide-ioctls.c | 4 ++-- >> drivers/ide/ide-park.c | 2 +- >> drivers/ide/ide-pm.c | 4 ++-- >> drivers/ide/ide-tape.c | 2 +- >> drivers/ide/ide-taskfile.c | 2 +- >> drivers/md/dm-mpath.c | 2 +- >> drivers/mmc/core/block.c | 10 +- >> drivers/scsi/scsi_error.c | 2 +- >> drivers/scsi/scsi_lib.c| 2 +- >> drivers/scsi/sg.c | 6 +++--- >> drivers/scsi/st.c | 4 ++-- >> drivers/scsi/ufs/ufshcd.c | 6 +++--- >> drivers/target/target_core_pscsi.c | 4 ++-- >> fs/nfsd/blocklayout.c | 4 ++-- >> include/linux/blkdev.h | 1 - >> 29 files changed, 46 insertions(+), 53 deletions(-) >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index fc60ff208497..1754f5e7cc80 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -642,12 +642,6 @@ struct request *blk_get_request(struct request_queue >> *q, unsigned int op, >> } >> EXPORT_SYMBOL(blk_get_request); >> >> -void blk_put_request(struct request *req) >> -{ >> -blk_mq_free_request(req); >> -} >> -EXPORT_SYMBOL(blk_put_request); > > blk_get_request() still exists after this patch. A "get" API usually has > a corresponding "put" API. I'm not sure this patch helps the consistency > and clarity of the code. > > If you do go ahead, please update the blk_get_request() doc comment > explicitly mentioning that blk_mq_free_request() needs to be called. Would make sense to rename blk_get_request() to blk_mq_alloc_request() and then we have API symmetry. The get/put don't make sense when there are no references involved. But it's a lot of churn for very little reward, which is always kind of annoying. Especially for the person that has to carry the patches. -- Jens Axboe ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 1/3] virtio: don't prompt CONFIG_VIRTIO_PCI_MODERN
On 23.02.21 07:19, Jason Wang wrote: > We used to prompt CONFIG_VIRTIO_PCI_MODERN to user which may bring a > lot of confusion. E.g it may break various default configs which want > virtio devices. > > So this patch fixes this by hiding the prompot and documenting the > dependency. While at it, rename the module to VIRTIO_PCI_LIB. > > Cc: Arnd Bergmann > Cc: Anders Roxell > Cc: Guenter Roeck > Reported-by: Naresh Kamboju > Fixes: 86b87c9d858b6 ("virtio-pci: introduce modern device module") > Signed-off-by: Jason Wang Michael, Can we please add this NOW to next? virtio-pci is broken without it on many defconfigs including s390. So linux-next is broken for more than 2 weeks now and it actively breaks several parts of our CI. I guess there are other CIs that will not run several testcases because of this. And Naresh reported that on Feb 9th. There IS value in CI tools on next. Not caring about regressions introduced by a tree in next is harmful. Especially when we are close or in the merge window. So please: either fix things quickly OR revert. Christian > --- > drivers/virtio/Kconfig | 11 ++- > drivers/virtio/Makefile | 2 +- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig > index 6b9b81f4b8c2..ce1b3f6ec325 100644 > --- a/drivers/virtio/Kconfig > +++ b/drivers/virtio/Kconfig > @@ -12,13 +12,13 @@ config ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS > This option is selected if the architecture may need to enforce > VIRTIO_F_ACCESS_PLATFORM > > -config VIRTIO_PCI_MODERN > - tristate "Modern Virtio PCI Device" > - depends on PCI > +config VIRTIO_PCI_LIB > + tristate > help > Modern PCI device implementation. This module implements the > basic probe and control for devices which are based on modern > - PCI device with possible vendor specific extensions. > + PCI device with possible vendor specific extensions. Any > + module that selects this module must depend on PCI. > > menuconfig VIRTIO_MENU > bool "Virtio drivers" > @@ -28,7 +28,8 @@ if VIRTIO_MENU > > config VIRTIO_PCI > tristate "PCI driver for virtio devices" > - depends on VIRTIO_PCI_MODERN > + depends on PCI > + select VIRTIO_PCI_LIB > select VIRTIO > help > This driver provides support for virtio based paravirtual device > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile > index f097578aaa8f..699bbea0465f 100644 > --- a/drivers/virtio/Makefile > +++ b/drivers/virtio/Makefile > @@ -1,6 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o > -obj-$(CONFIG_VIRTIO_PCI_MODERN) += virtio_pci_modern_dev.o > +obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o > obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o > obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o > virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] blk-core: remove blk_put_request()
On Mon, Feb 22, 2021 at 01:11:15PM -0800, Chaitanya Kulkarni wrote: > The function blk_put_request() is just a wrapper to > blk_mq_free_request(), remove the unnecessary wrapper. > > Any feedback is welcome on this RFC. > > Signed-off-by: Chaitanya Kulkarni > --- > block/blk-core.c | 6 -- > block/blk-merge.c | 2 +- > block/bsg-lib.c| 4 ++-- > block/bsg.c| 4 ++-- > block/scsi_ioctl.c | 6 +++--- > drivers/block/paride/pd.c | 2 +- > drivers/block/pktcdvd.c| 2 +- > drivers/block/virtio_blk.c | 2 +- > drivers/cdrom/cdrom.c | 4 ++-- > drivers/ide/ide-atapi.c| 2 +- > drivers/ide/ide-cd.c | 4 ++-- > drivers/ide/ide-cd_ioctl.c | 2 +- > drivers/ide/ide-devsets.c | 2 +- > drivers/ide/ide-disk.c | 2 +- > drivers/ide/ide-ioctls.c | 4 ++-- > drivers/ide/ide-park.c | 2 +- > drivers/ide/ide-pm.c | 4 ++-- > drivers/ide/ide-tape.c | 2 +- > drivers/ide/ide-taskfile.c | 2 +- > drivers/md/dm-mpath.c | 2 +- > drivers/mmc/core/block.c | 10 +- > drivers/scsi/scsi_error.c | 2 +- > drivers/scsi/scsi_lib.c| 2 +- > drivers/scsi/sg.c | 6 +++--- > drivers/scsi/st.c | 4 ++-- > drivers/scsi/ufs/ufshcd.c | 6 +++--- > drivers/target/target_core_pscsi.c | 4 ++-- > fs/nfsd/blocklayout.c | 4 ++-- > include/linux/blkdev.h | 1 - > 29 files changed, 46 insertions(+), 53 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index fc60ff208497..1754f5e7cc80 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -642,12 +642,6 @@ struct request *blk_get_request(struct request_queue *q, > unsigned int op, > } > EXPORT_SYMBOL(blk_get_request); > > -void blk_put_request(struct request *req) > -{ > - blk_mq_free_request(req); > -} > -EXPORT_SYMBOL(blk_put_request); blk_get_request() still exists after this patch. A "get" API usually has a corresponding "put" API. I'm not sure this patch helps the consistency and clarity of the code. If you do go ahead, please update the blk_get_request() doc comment explicitly mentioning that blk_mq_free_request() needs to be called. Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation
Hi Michael, I've submitted a fresh patch including the ack from Jason and after testing with git am. Regards, Gautam On Tue, 23 Feb 2021 at 18:20, Michael S. Tsirkin wrote: > > On Tue, Feb 23, 2021 at 06:15:07PM +0530, Gautam Dawar wrote: > > When qemu with vhost-vdpa netdevice is run for the first time, it works > > well. > > But after the VM is powered off, the next qemu run causes kernel panic due > > to a > > NULL pointer dereference in irq_bypass_register_producer(). > > > > When the VM is powered off, vhost_vdpa_clean_irq() misses on calling > > irq_bypass_unregister_producer() for irq 0 because of the existing check. > > > > This leaves stale producer nodes, which are reset in > > vhost_vring_call_reset() > > when vhost_dev_init() is invoked during the second qemu run. > > > > As the node member of struct irq_bypass_producer is also initialized > > to zero, traversal on the producers list causes crash due to NULL pointer > > dereference. > > > > Fixes: 2cf1ba9a4d15c ("vhost_vdpa: implement IRQ offloading in vhost_vdpa") > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211711 > > Signed-off-by: Gautam Dawar > > > A bit better but still: > Applying: vhost_vdpa: fix the missing irq_bypass_unregister_producer() > invocation > error: corrupt patch at line 6 > error: could not build fake ancestor > Patch failed at 0001 vhost_vdpa: fix the missing > irq_bypass_unregister_producer() invocation > hint: Use 'git am --show-current-patch=diff' to see the failed patch > > > you also want to include acks you got on the previous versions. > > > - > > changelog: > > v2->v3: > > - Re-submitting the patch in plain text format as suggested by Michael > > - Added the fixes tag > > > > v1->v2: > > - Addressed Jason's comment to remove the irq check and use > >vhost_vdpa_unsetup_vq_irq() to avoid local variable vq > > - > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > > index 62a9bb0efc55..e00573b87aba 100644 > > --- a/drivers/vhost/vdpa.c > > +++ b/drivers/vhost/vdpa.c > > @@ -844,14 +844,10 @@ static int vhost_vdpa_open(struct inode *inode, > > struct file *filep) > > > > static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) > > { > > - struct vhost_virtqueue *vq; > > int i; > > > > - for (i = 0; i < v->nvqs; i++) { > > - vq = >vqs[i]; > > - if (vq->call_ctx.producer.irq) > > - > > irq_bypass_unregister_producer(>call_ctx.producer); > > - } > > + for (i = 0; i < v->nvqs; i++) > > + vhost_vdpa_unsetup_vq_irq(v, i); > > } > > > > static int vhost_vdpa_release(struct inode *inode, struct file *filep) > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation
When qemu with vhost-vdpa netdevice is run for the first time, it works well. But after the VM is powered off, the next qemu run causes kernel panic due to a NULL pointer dereference in irq_bypass_register_producer(). When the VM is powered off, vhost_vdpa_clean_irq() misses on calling irq_bypass_unregister_producer() for irq 0 because of the existing check. This leaves stale producer nodes, which are reset in vhost_vring_call_reset() when vhost_dev_init() is invoked during the second qemu run. As the node member of struct irq_bypass_producer is also initialized to zero, traversal on the producers list causes crash due to NULL pointer dereference. Fixes: 2cf1ba9a4d15c ("vhost_vdpa: implement IRQ offloading in vhost_vdpa") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211711 Signed-off-by: Gautam Dawar Acked-by: Jason Wang --- drivers/vhost/vdpa.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 62a9bb0efc55..e00573b87aba 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -844,14 +844,10 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep) static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) { - struct vhost_virtqueue *vq; int i; - for (i = 0; i < v->nvqs; i++) { - vq = >vqs[i]; - if (vq->call_ctx.producer.irq) - irq_bypass_unregister_producer(>call_ctx.producer); - } + for (i = 0; i < v->nvqs; i++) + vhost_vdpa_unsetup_vq_irq(v, i); } static int vhost_vdpa_release(struct inode *inode, struct file *filep) -- 2.30.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH linux-next 2/9] vdpa: Introduce query of device config layout
> From: Michael S. Tsirkin > Sent: Wednesday, February 24, 2021 12:33 PM > > > + > > + features = vdev->config->get_features(vdev); > > + if ((features & (1ULL << VIRTIO_NET_F_MQ)) == 0) > > + return 0; > > + > > + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, > > + config->max_virtqueue_pairs)) > > + return -EMSGSIZE; > > + if (nla_put_u8(msg, > VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN, > > + config->rss_max_key_size)) > > Why is it ok to poke at RSS fields without checking the relevant feature bits? > It should be checked. I relied on _MQ feature bit but yes, even with MQ, RSS can be off. Will fix to read RSS feature bit. > > + return -EMSGSIZE; > > Did you check this with sparse? > max_virtqueue_pairs is __virtio16. I will check and go through the types. > > > + > > + rss_field = le16_to_cpu(config->rss_max_key_size); > > + if (nla_put_u16(msg, > VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, rss_field)) > > + return -EMSGSIZE; > > + > > + hash_types = le32_to_cpu(config->supported_hash_types); > > unused variable Will remove. > > > + if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES, > > + config->supported_hash_types)) > > + return -EMSGSIZE; > > + return 0; > > +} > > + > > +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct > > +sk_buff *msg) { > > + struct virtio_net_config config = {}; > > + > > + vdev->config->get_config(vdev, 0, , sizeof(config)); > > + if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, > sizeof(config.mac), config.mac)) > > + return -EMSGSIZE; > > + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, config.status)) > > + return -EMSGSIZE; > > + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, > config.mtu)) > > + return -EMSGSIZE; > > + if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_SPEED, > config.speed)) > > + return -EMSGSIZE; > > looks like a bunch of endian-ness/sparse errors to me, and a bunch of fields > checked without checking the feature bits. Yes, few are missed out. Fixing them in v2. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On Wed, 24 Feb 2021 17:29:07 +0800 Jason Wang wrote: > On 2021/2/23 6:58 下午, Cornelia Huck wrote: > > On Tue, 23 Feb 2021 18:31:07 +0800 > > Jason Wang wrote: > > > >> On 2021/2/23 6:04 下午, Cornelia Huck wrote: > >>> On Tue, 23 Feb 2021 17:46:20 +0800 > >>> Jason Wang wrote: > >>> > On 2021/2/23 下午5:25, Michael S. Tsirkin wrote: > > On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote: > >> On 2/21/2021 8:14 PM, Jason Wang wrote: > >>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote: > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked > for legacy") made an exception for legacy guests to reset > features to 0, when config space is accessed before features > are set. We should relieve the verify_min_features() check > and allow features reset to 0 for this case. > > It's worth noting that not just legacy guests could access > config space before features are set. For instance, when > feature VIRTIO_NET_F_MTU is advertised some modern driver > will try to access and validate the MTU present in the config > space before virtio features are set. > >>> This looks like a spec violation: > >>> > >>> " > >>> > >>> The following driver-read-only field, mtu only exists if > >>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the > >>> driver to use. > >>> " > >>> > >>> Do we really want to workaround this? > >> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest? > >> > >> I think the point is, since there's legacy guest we'd have to support, > >> this > >> host side workaround is unavoidable. Although I agree the violating > >> driver > >> should be fixed (yes, it's in today's upstream kernel which exists for > >> a > >> while now). > > Oh you are right: > > > > > > static int virtnet_validate(struct virtio_device *vdev) > > { > >if (!vdev->config->get) { > >dev_err(>dev, "%s failure: config access > > disabled\n", > >__func__); > >return -EINVAL; > >} > > > >if (!virtnet_validate_features(vdev)) > >return -EINVAL; > > > >if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { > >int mtu = virtio_cread16(vdev, > > offsetof(struct > > virtio_net_config, > > mtu)); > >if (mtu < MIN_MTU) > >__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); > I wonder why not simply fail here? > >>> I think both failing or not accepting the feature can be argued to make > >>> sense: "the device presented us with a mtu size that does not make > >>> sense" would point to failing, "we cannot work with the mtu size that > >>> the device presented us" would point to not negotiating the feature. > >>> > > >} > > > >return 0; > > } > > > > And the spec says: > > > > > > The driver MUST follow this sequence to initialize a device: > > 1. Reset the device. > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. > > 3. Set the DRIVER status bit: the guest OS knows how to drive the > > device. > > 4. Read device feature bits, and write the subset of feature bits > > understood by the OS and driver to the > > device. During this step the driver MAY read (but MUST NOT write) the > > device-specific configuration > > fields to check that it can support the device before accepting it. > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new > > feature bits after this step. > > 6. Re-read device status to ensure the FEATURES_OK bit is still set: > > otherwise, the device does not > > support our subset of features and the device is unusable. > > 7. Perform device-specific setup, including discovery of virtqueues for > > the device, optional per-bus setup, > > reading and possibly writing the device’s virtio configuration space, > > and population of virtqueues. > > 8. Set the DRIVER_OK status bit. At this point the device is “live”. > > > > > > Item 4 on the list explicitly allows reading config space before > > FEATURES_OK. > > > > I conclude that VIRTIO_NET_F_MTU is set means "set in device features". > > > So this probably need some clarification. "is set" is used many times in > the spec that has different implications. > >>> Before FEATURES_OK is set by the driver, I guess it means "the device > >>> has offered the feature"; > >> > >> For me this part is ok since it
Re: [PATCH 3/7] x86/boot/compressed/64: Setup IDT in startup_32 boot path
On Wed, Feb 10, 2021 at 11:21:31AM +0100, Joerg Roedel wrote: > From: Joerg Roedel > > This boot path needs exception handling when it is used with SEV-ES. For ? Let's explain pls. > Setup an IDT and provide a helper function to write IDT entries for > use in 32-bit protected mode. > > Signed-off-by: Joerg Roedel > --- > arch/x86/boot/compressed/head_64.S | 73 +- > 1 file changed, 72 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/boot/compressed/head_64.S > b/arch/x86/boot/compressed/head_64.S > index c59c80ca546d..8deeec78cdb4 100644 > --- a/arch/x86/boot/compressed/head_64.S > +++ b/arch/x86/boot/compressed/head_64.S > @@ -116,6 +116,11 @@ SYM_FUNC_START(startup_32) > lretl > 1: > > +#ifdef CONFIG_AMD_MEM_ENCRYPT > + /* Setup Exception handling for SEV-ES */ > + callstartup32_load_idt > +#endif > + You can push that ifdeffery out of the main path (diff ontop): --- diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 8deeec78cdb4..cb5a6849fb29 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -116,10 +116,7 @@ SYM_FUNC_START(startup_32) lretl 1: -#ifdef CONFIG_AMD_MEM_ENCRYPT - /* Setup Exception handling for SEV-ES */ callstartup32_load_idt -#endif /* Make sure cpu supports long mode. */ callverify_cpu @@ -854,16 +851,18 @@ SYM_FUNC_START(startup32_set_idt_entry) pop %ebx ret SYM_FUNC_END(startup32_set_idt_entry) +#endif +/* Setup Exception handling for SEV-ES */ SYM_FUNC_START(startup32_load_idt) +#ifdef CONFIG_AMD_MEM_ENCRYPT /* Load IDT */ lealrva(boot32_idt)(%ebp), %eax movl%eax, rva(boot32_idt_desc+2)(%ebp) lidtrva(boot32_idt_desc)(%ebp) - +#endif ret SYM_FUNC_END(startup32_load_idt) -#endif /* * Stack and heap for uncompression */ --- > +SYM_FUNC_START(startup32_set_idt_entry) > + push%ebx > + push%ecx > + > + /* IDT entry address to %ebx */ > + lealrva(boot32_idt)(%ebp), %ebx > + shl $3, %edx > + addl%edx, %ebx > + > + /* Build IDT entry, lower 4 bytes */ > + movl%eax, %edx Let's add some side comments here: + andl$0x, %edx # Target code segment offset [15:0] + movl$__KERNEL32_CS, %ecx# Target code segment selector > + shl $16, %ecx > + orl %ecx, %edx > + > + /* Store lower 4 bytes to IDT */ > + movl%edx, (%ebx) > + > + /* Build IDT entry, upper 4 bytes */ > + movl%eax, %edx + andl$0x, %edx # Target code segment offset [31:16] + orl $0x8e00, %edx # Present, Type 32-bit Interrupt Gate so that a reader like me can quickly find interrupt gates in the docs. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/2/23 6:17 下午, Jason Wang wrote: On 2021/2/23 6:01 下午, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 05:46:20PM +0800, Jason Wang wrote: On 2021/2/23 下午5:25, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote: On 2/21/2021 8:14 PM, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Isn't the commit 452639a64ad8 itself is a workaround for legacy guest? I think the point is, since there's legacy guest we'd have to support, this host side workaround is unavoidable. Although I agree the violating driver should be fixed (yes, it's in today's upstream kernel which exists for a while now). Oh you are right: static int virtnet_validate(struct virtio_device *vdev) { if (!vdev->config->get) { dev_err(>dev, "%s failure: config access disabled\n", __func__); return -EINVAL; } if (!virtnet_validate_features(vdev)) return -EINVAL; if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { int mtu = virtio_cread16(vdev, offsetof(struct virtio_net_config, mtu)); if (mtu < MIN_MTU) __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); I wonder why not simply fail here? Back in 2016 it went like this: On Thu, Jun 02, 2016 at 05:10:59PM -0400, Aaron Conole wrote: > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { > + dev->mtu = virtio_cread16(vdev, > + offsetof(struct virtio_net_config, > + mtu)); > + } > + > if (vi->any_header_sg) > dev->needed_headroom = vi->hdr_len; > One comment though: I think we should validate the mtu. If it's invalid, clear VIRTIO_NET_F_MTU and ignore. Too late at this point :) I guess it's a way to tell device "I can not live with this MTU", device can fail FEATURES_OK if it wants to. MIN_MTU is an internal linux thing and at the time I felt it's better to try to make progress. What if e.g the device advertise a large MTU. E.g 64K here? Ok, consider we use add_recvbuf_small() when neither GSO nor mrg_rxbuf. This means we should fail the probing if MTU is greater than 1500 in this case. Thanks In that case, the driver can not live either. Clearing MTU won't help here. Thanks } return 0; } And the spec says: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. Item 4 on the list explicitly allows reading config space before FEATURES_OK. I conclude that VIRTIO_NET_F_MTU is set means "set in device features". So this probably need some clarification. "is set" is used many times in the spec that has different implications. Thanks Generally it is worth going over feature dependent config fields and checking whether they should be present when device feature is set or when feature bit has been negotiated, and making this clear. ___ Virtualization
Re: [virtio-comment] [RFC PATCH v1 1/1] virtio-vsock: add SOCK_SEQPACKET description
On Thu, Feb 18, 2021 at 09:08:23AM +0300, Arseny Krasnov wrote: Signed-off-by: Arseny Krasnov --- virtio-vsock.tex | 40 +--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/virtio-vsock.tex b/virtio-vsock.tex index da7e641..1ee8f99 100644 --- a/virtio-vsock.tex +++ b/virtio-vsock.tex @@ -102,6 +102,11 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6, /* Request the peer to send the credit info to us */ VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7, + + /* Message begin for SOCK_SEQPACKET */ + VIRTIO_VSOCK_OP_SEQ_BEGIN = 8, + /* Message end for SOCK_SEQPACKET */ + VIRTIO_VSOCK_OP_SEQ_END = 9, }; \end{lstlisting} @@ -140,11 +145,11 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera consists of a (cid, port number) tuple. The header fields used for this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}. -Currently only stream sockets are supported. \field{type} is 1 for stream -socket types. +Currently stream and seqpacket sockets are supported. \field{type} is 1 for +stream socket types. \field{type} is 2 for seqpacket socket types. Stream sockets provide in-order, guaranteed, connection-oriented delivery -without message boundaries. +without message boundaries. Seqpacket sockets also provide message boundaries. \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management} \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of @@ -240,6 +245,35 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O destination) address tuple for a new connection while the other peer is still processing the old connection. +\subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Device Operation / Seqpacket Sockets} + +Seqpacket sockets differ from stream sockets only in data transmission way: in +stream sockets all data is sent using only VIRTIO_VSOCK_OP_RW packets. In +seqpacket sockets, to provide message boundaries, every sequence of +VIRTIO_VSOCK_OP_RW packets of each message is headed with ^ Since this is a spec, I think we should use MUST when something must be respected by the peer, for example here we can say "MUST be headed" +VIRTIO_VSOCK_OP_SEQ_BEGIN and tailed with VIRTIO_VSOCK_OP_SEQ_END packets. +Both VIRTIO_VSOCK_OP_SEQ_BEGIN and VIRTIO_VSOCK_OP_SEQ_END packets carry ^ Same here "MUST carry" and in the rest of the patch. +special header in payload: + +\begin{lstlisting} +struct virtio_vsock_seq_hdr { + __le32 msg_cnt; + __le32 msg_len; +}; +\end{lstlisting} + +\field{msg_cnt} is per socket and incremented by 1 on every send of +VIRTIO_VSOCK_OP_SEQ_BEGIN or VIRTIO_VSOCK_OP_SEQ_END. \field{msg_len} contains +message length. This header is used to check integrity of each message: message +is valid if length of data in VIRTIO_VSOCK_OP_RW packets is equal to +\field{msg_len} of prepending VIRTIO_VSOCK_OP_SEQ_BEGIN and \field{msg_cnt} of +VIRTIO_VSOCK_OP_SEQ_END differs from \field{msg_cnt} of +VIRTIO_VSOCK_OP_SEQ_BEGIN by 1. If we replace msg_cnt with msg_id, I think we should have the same 'msg_id' for VIRTIO_VSOCK_OP_SEQ_BEGIN or VIRTIO_VSOCK_OP_SEQ_END. If you want to use 'msg_cnt' and you increment it between BEGIN and END, we should consider the overflow case. + +POSIX MSG_EOR flag is supported by special value of \field{flags} in packet's +header. If this flag is set for message, then all VIRTIO_VSOCK_OP_RW packets +of this message have bit 0 is set to 1 in \field{flags}. Maybe we need to define what MSG_EOR means. Thanks, Stefano + \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events} Certain events are communicated by the device to the driver using the event -- 2.25.1 This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/2/24 4:43 下午, Michael S. Tsirkin wrote: On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote: Basically on first guest access QEMU would tell kernel whether guest is using the legacy or the modern interface. E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1) while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0) But this trick work only for PCI I think? Thanks ccw has a revision it can check. mmio does not have transitional devices at all. Ok, then we can do the workaround in the qemu, isn't it? Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/2/23 6:58 下午, Cornelia Huck wrote: On Tue, 23 Feb 2021 18:31:07 +0800 Jason Wang wrote: On 2021/2/23 6:04 下午, Cornelia Huck wrote: On Tue, 23 Feb 2021 17:46:20 +0800 Jason Wang wrote: On 2021/2/23 下午5:25, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote: On 2/21/2021 8:14 PM, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Isn't the commit 452639a64ad8 itself is a workaround for legacy guest? I think the point is, since there's legacy guest we'd have to support, this host side workaround is unavoidable. Although I agree the violating driver should be fixed (yes, it's in today's upstream kernel which exists for a while now). Oh you are right: static int virtnet_validate(struct virtio_device *vdev) { if (!vdev->config->get) { dev_err(>dev, "%s failure: config access disabled\n", __func__); return -EINVAL; } if (!virtnet_validate_features(vdev)) return -EINVAL; if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { int mtu = virtio_cread16(vdev, offsetof(struct virtio_net_config, mtu)); if (mtu < MIN_MTU) __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); I wonder why not simply fail here? I think both failing or not accepting the feature can be argued to make sense: "the device presented us with a mtu size that does not make sense" would point to failing, "we cannot work with the mtu size that the device presented us" would point to not negotiating the feature. } return 0; } And the spec says: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. Item 4 on the list explicitly allows reading config space before FEATURES_OK. I conclude that VIRTIO_NET_F_MTU is set means "set in device features". So this probably need some clarification. "is set" is used many times in the spec that has different implications. Before FEATURES_OK is set by the driver, I guess it means "the device has offered the feature"; For me this part is ok since it clarify that it's the driver that set the bit. during normal usage, it means "the feature has been negotiated". /? It looks to me the feature negotiation is done only after device set FEATURES_OK, or FEATURES_OK could be read from device status? I'd consider feature negotiation done when the driver reads FEATURES_OK back from the status. I agree. (This is a bit fuzzy for legacy mode.) ...because legacy does not have FEATURES_OK. The problem is the MTU description for example: "The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set." It looks to me need to use "if VIRTIO_NET_F_MTU is set by device". "offered by the device"? I don't think it should 'disappear' from the config space if the driver won't use it. (Same for other config space fields that are tied to feature bits.) But what happens if e.g device doesn't offer VIRTIO_NET_F_MTU? It looks to according to the spec there will be no mtu field.
Re: [PATCH linux-next 9/9] vdpa/mlx5: Forward only packets with allowed MAC address
On 2021/2/24 2:18 下午, Parav Pandit wrote: From: Eli Cohen Add rules to forward packets to the net device's TIR only if the destination MAC is equal to the configured MAC. This is required to prevent the netdevice from receiving traffic not destined to its configured MAC. Just to confirm, this will only apply to unicast packet and we will still let multicast/broadcast to go? Thanks Signed-off-by: Eli Cohen Reviewed-by: Parav Pandit --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 84 +++ 1 file changed, 64 insertions(+), 20 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 51a3fc4cde4d..9b580c67acda 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -147,7 +147,8 @@ struct mlx5_vdpa_net { struct mutex reslock; struct mlx5_flow_table *rxft; struct mlx5_fc *rx_counter; - struct mlx5_flow_handle *rx_rule; + struct mlx5_flow_handle *rx_rule_ucast; + struct mlx5_flow_handle *rx_rule_mcast; bool setup; u16 mtu; }; @@ -1294,21 +1295,34 @@ static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev) struct mlx5_flow_table_attr ft_attr = {}; struct mlx5_flow_act flow_act = {}; struct mlx5_flow_namespace *ns; + struct mlx5_flow_spec *spec; + void *headers_c; + void *headers_v; + u8 *dmac_c; + u8 *dmac_v; int err; - /* for now, one entry, match all, forward to tir */ - ft_attr.max_fte = 1; - ft_attr.autogroup.max_num_groups = 1; + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); + if (!spec) + return -ENOMEM; + + spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; + ft_attr.max_fte = 2; + ft_attr.autogroup.max_num_groups = 2; - ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS); + ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, +MLX5_FLOW_NAMESPACE_BYPASS); if (!ns) { - mlx5_vdpa_warn(>mvdev, "get flow namespace\n"); - return -EOPNOTSUPP; + mlx5_vdpa_warn(>mvdev, "failed to get flow namespace\n"); + err = -EOPNOTSUPP; + goto err_ns; } ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, _attr); - if (IS_ERR(ndev->rxft)) - return PTR_ERR(ndev->rxft); + if (IS_ERR(ndev->rxft)) { + err = PTR_ERR(ndev->rxft); + goto err_ns; + } ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false); if (IS_ERR(ndev->rx_counter)) { @@ -1316,37 +1330,67 @@ static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev) goto err_fc; } - flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT; + headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, outer_headers); + dmac_c = MLX5_ADDR_OF(fte_match_param, headers_c, outer_headers.dmac_47_16); + memset(dmac_c, 0xff, ETH_ALEN); + headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers); + dmac_v = MLX5_ADDR_OF(fte_match_param, headers_v, outer_headers.dmac_47_16); + ether_addr_copy(dmac_v, ndev->config.mac); + + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | + MLX5_FLOW_CONTEXT_ACTION_COUNT; dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR; dest[0].tir_num = ndev->res.tirn; dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; dest[1].counter_id = mlx5_fc_id(ndev->rx_counter); - ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, _act, dest, 2); - if (IS_ERR(ndev->rx_rule)) { - err = PTR_ERR(ndev->rx_rule); - ndev->rx_rule = NULL; - goto err_rule; + ndev->rx_rule_ucast = mlx5_add_flow_rules(ndev->rxft, spec, _act, + dest, 2); + + if (IS_ERR(ndev->rx_rule_ucast)) { + err = PTR_ERR(ndev->rx_rule_ucast); + ndev->rx_rule_ucast = NULL; + goto err_rule_ucast; + } + + memset(dmac_c, 0, ETH_ALEN); + memset(dmac_v, 0, ETH_ALEN); + dmac_c[0] = 1; + dmac_v[0] = 1; + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + ndev->rx_rule_mcast = mlx5_add_flow_rules(ndev->rxft, spec, _act, + dest, 1); + if (IS_ERR(ndev->rx_rule_mcast)) { + err = PTR_ERR(ndev->rx_rule_mcast); + ndev->rx_rule_mcast = NULL; + goto err_rule_mcast; } + kvfree(spec); return 0; -err_rule: +err_rule_mcast: + mlx5_del_flow_rules(ndev->rx_rule_ucast); + ndev->rx_rule_ucast = NULL; +err_rule_ucast: mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter); err_fc:
Re: [PATCH linux-next 7/9] vdpa/mlx5: Provide device generated random MAC address
On 2021/2/24 2:18 下午, Parav Pandit wrote: From: Eli Cohen Use a randomly generated MAC address to be applied in case it is not configured by management tool. The value queried through mlx5_query_nic_vport_mac_address() is not relelavnt to vdpa since it is the mac that should be used by the regular NIC driver. Signed-off-by: Eli Cohen Reviewed-by: Parav Pandit Acked-by: Jason Wang --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b67bba581dfd..ece2183e7b20 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2005,10 +2005,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) if (err) goto err_mtu; - err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac); - if (err) - goto err_mtu; - + eth_random_addr(config->mac); mvdev->vdev.dma_dev = mdev->device; err = mlx5_vdpa_alloc_resources(>mvdev); if (err) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH linux-next 8/9] vdpa/mlx5: Support configuration of MAC
On 2021/2/24 2:18 下午, Parav Pandit wrote: From: Eli Cohen Add code to accept MAC configuration through vdpa tool. The MAC is written into the config struct and later can be retrieved through get_config(). Examples: 1. Configure MAC: $ vdpa dev config set vdpa0 mac 00:11:22:33:44:55 2. Show configured params: $ vdpa dev config show vdpa0: mac 00:11:22:33:44:55 link down link_announce false mtu 0 speed 0 duplex 0 Signed-off-by: Eli Cohen Reviewed-by: Parav Pandit --- Acked-by: Jason Wang drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index ece2183e7b20..51a3fc4cde4d 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1972,6 +1972,22 @@ struct mlx5_vdpa_mgmtdev { struct mlx5_vdpa_net *ndev; }; +static int mlx5_vdpa_net_dev_config_set(struct vdpa_mgmt_dev *v_mdev, + struct vdpa_device *vdev, + const struct vdpa_dev_config_set_attr *attrs) +{ + struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev); + struct mlx5_vdpa_net *ndev = mgtdev->ndev; + + if (attrs->mask.mtu_valid) + return -EOPNOTSUPP; + + if (attrs->mask.mac_valid) + memcpy(ndev->config.mac, attrs->cfg.mac, ETH_ALEN); + + return 0; +} + static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) { struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev); @@ -2044,6 +2060,7 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device * static const struct vdpa_mgmtdev_ops mdev_ops = { .dev_add = mlx5_vdpa_dev_add, .dev_del = mlx5_vdpa_dev_del, + .dev_config_set = mlx5_vdpa_net_dev_config_set, }; static struct virtio_device_id id_table[] = { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH linux-next 2/9] vdpa: Introduce query of device config layout
On 2021/2/24 2:18 下午, Parav Pandit wrote: Introduce a command to query a device config layout. An example query of network vdpa device: $ vdpa dev add name bar mgmtdev vdpasim_net $ vdpa dev config show bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500 speed 0 duplex 0 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:35:09:19:48:05", "link ": "up", "link_announce ": false, "mtu": 1500, "speed": 0, "duplex": 0 } } } Signed-off-by: Parav Pandit Reviewed-by: Eli Cohen --- changelog: v1->v2: - read whole net config layout instead of individual fields - added error extack for unmanaged vdpa device --- drivers/vdpa/vdpa.c | 181 ++ include/uapi/linux/vdpa.h | 11 +++ 2 files changed, 192 insertions(+) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 3d997b389345..cebbba500638 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -14,6 +14,8 @@ #include #include #include +#include +#include static LIST_HEAD(mdev_head); /* A global mutex that protects vdpa management device and device level operations. */ @@ -603,6 +605,179 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callba return msg->len; } +static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev, + struct sk_buff *msg, + const struct virtio_net_config *config) +{ + u32 hash_types; + u16 rss_field; + u64 features; + + features = vdev->config->get_features(vdev); + if ((features & (1ULL << VIRTIO_NET_F_MQ)) == 0) + return 0; + + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, + config->max_virtqueue_pairs)) + return -EMSGSIZE; We probably need to check VIRTIO_NET_F_RSS here. + if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN, + config->rss_max_key_size)) + return -EMSGSIZE; + + rss_field = le16_to_cpu(config->rss_max_key_size); + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, rss_field)) + return -EMSGSIZE; + + hash_types = le32_to_cpu(config->supported_hash_types); + if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES, + config->supported_hash_types)) + return -EMSGSIZE; + return 0; +} + +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg) +{ + struct virtio_net_config config = {}; + + vdev->config->get_config(vdev, 0, , sizeof(config)); Do we need to sync with other possible get_config calls here? + if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, sizeof(config.mac), config.mac)) + return -EMSGSIZE; + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, config.status)) + return -EMSGSIZE; + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, config.mtu)) + return -EMSGSIZE; And check VIRTIO_NET_F_SPEED_DUPLEX. + if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_SPEED, config.speed)) + return -EMSGSIZE; + if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_DUPLEX, config.duplex)) + return -EMSGSIZE; + + return vdpa_dev_net_mq_config_fill(vdev, msg, ); +} + +static int +vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq, +int flags, struct netlink_ext_ack *extack) +{ + u32 device_id; + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, _nl_family, flags, + VDPA_CMD_DEV_CONFIG_GET); + if (!hdr) + return -EMSGSIZE; + + if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev))) { + err = -EMSGSIZE; + goto msg_err; + } + + device_id = vdev->config->get_device_id(vdev); + if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) { + err = -EMSGSIZE; + goto msg_err; + } + + switch (device_id) { + case VIRTIO_ID_NET: + err = vdpa_dev_net_config_fill(vdev, msg); + break; + default: + err = -EOPNOTSUPP; + break; + } + if (err) + goto msg_err; + + genlmsg_end(msg, hdr); + return 0; + +msg_err: + genlmsg_cancel(msg, hdr); + return err; +} + +static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct vdpa_device *vdev; + struct sk_buff *msg; + const char *devname; + struct device *dev; + int err; + + if (!info->attrs[VDPA_ATTR_DEV_NAME]) + return -EINVAL; + devname = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]); +
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote: > Basically on first guest access QEMU would tell kernel whether > guest is using the legacy or the modern interface. > E.g. virtio_pci_config_read/virtio_pci_config_write will call > ioctl(ENABLE_LEGACY, 1) > while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0) > > > But this trick work only for PCI I think? > > Thanks ccw has a revision it can check. mmio does not have transitional devices at all. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support
On Wed, Feb 24, 2021 at 11:28:50AM +0300, Arseny Krasnov wrote: On 24.02.2021 11:23, Stefano Garzarella wrote: On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote: On 23.02.2021 17:50, Stefano Garzarella wrote: On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote: Hi Arseny, On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote: This patchset impelements support of SOCK_SEQPACKET for virtio transport. As SOCK_SEQPACKET guarantees to save record boundaries, so to do it, two new packet operations were added: first for start of record and second to mark end of record(SEQ_BEGIN and SEQ_END later). Also, both operations carries metadata - to maintain boundaries and payload integrity. Metadata is introduced by adding special header with two fields - message count and message length: struct virtio_vsock_seq_hdr { __le32 msg_cnt; __le32 msg_len; } __attribute__((packed)); This header is transmitted as payload of SEQ_BEGIN and SEQ_END packets(buffer of second virtio descriptor in chain) in the same way as data transmitted in RW packets. Payload was chosen as buffer for this header to avoid touching first virtio buffer which carries header of packet, because someone could check that size of this buffer is equal to size of packet header. To send record, packet with start marker is sent first(it's header contains length of record and counter), then counter is incremented and all data is sent as usual 'RW' packets and finally SEQ_END is sent(it also carries counter of message, which is counter of SEQ_BEGIN + 1), also after sedning SEQ_END counter is incremented again. On receiver's side, length of record is known from packet with start record marker. To check that no packets were dropped by transport, counters of two sequential SEQ_BEGIN and SEQ_END are checked(counter of SEQ_END must be bigger that counter of SEQ_BEGIN by 1) and length of data between two markers is compared to length in SEQ_BEGIN header. Now as packets of one socket are not reordered neither on vsock nor on vhost transport layers, such markers allows to restore original record on receiver's side. If user's buffer is smaller that record length, when all out of size data is dropped. Maximum length of datagram is not limited as in stream socket, because same credit logic is used. Difference with stream socket is that user is not woken up until whole record is received or error occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags. Tests also implemented. I reviewed the first part (af_vsock.c changes), tomorrow I'll review the rest. That part looks great to me, only found a few minor issues. I revieiwed the rest of it as well, left a few minor comments, but I think we're well on track. I'll take a better look at the specification patch tomorrow. Great, Thank You Thanks, Stefano In the meantime, however, I'm getting a doubt, especially with regard to other transports besides virtio. Should we hide the begin/end marker sending in the transport? I mean, should the transport just provide a seqpacket_enqueue() callbacl? Inside it then the transport will send the markers. This is because some transports might not need to send markers. But thinking about it more, they could actually implement stubs for that calls, if they don't need to send markers. So I think for now it's fine since it allows us to reuse a lot of code, unless someone has some objection. I thought about that, I'll try to implement it in next version. Let's see... If you want to discuss it first, write down the idea you want to implement, I wouldn't want to make you do unnecessary work. :-) Idea is simple, in iov iterator of 'struct msghdr' which is passed to enqueue callback we have two fields: 'iov_offset' which is byte offset inside io vector where next data must be picked and 'count' which is rest of unprocessed bytes in io vector. So in seqpacket enqueue callback if 'iov_offset' is 0 i'll send SEQBEGIN, and if 'count' is 0 i'll send SEQEND. Got it, make sense and it's defently more transparent for the vsock core! Go head, maybe adding a comment in the vsock core explaining this, so other developers can understand better if they want to support SEPACKET in other transports. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v5 11/19] virtio/vsock: dequeue callback for SOCK_SEQPACKET
On Wed, Feb 24, 2021 at 01:41:56AM -0500, Michael S. Tsirkin wrote: On Wed, Feb 24, 2021 at 08:07:48AM +0300, Arseny Krasnov wrote: On 23.02.2021 17:17, Michael S. Tsirkin wrote: > On Thu, Feb 18, 2021 at 08:39:37AM +0300, Arseny Krasnov wrote: >> This adds transport callback and it's logic for SEQPACKET dequeue. >> Callback fetches RW packets from rx queue of socket until whole record >> is copied(if user's buffer is full, user is not woken up). This is done >> to not stall sender, because if we wake up user and it leaves syscall, >> nobody will send credit update for rest of record, and sender will wait >> for next enter of read syscall at receiver's side. So if user buffer is >> full, we just send credit update and drop data. If during copy SEQ_BEGIN >> was found(and not all data was copied), copying is restarted by reset >> user's iov iterator(previous unfinished data is dropped). >> >> Signed-off-by: Arseny Krasnov >> --- >> include/linux/virtio_vsock.h| 10 +++ >> include/uapi/linux/virtio_vsock.h | 16 >> net/vmw_vsock/virtio_transport_common.c | 114 >> 3 files changed, 140 insertions(+) >> >> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h >> index dc636b727179..003d06ae4a85 100644 >> --- a/include/linux/virtio_vsock.h >> +++ b/include/linux/virtio_vsock.h >> @@ -36,6 +36,11 @@ struct virtio_vsock_sock { >>u32 rx_bytes; >>u32 buf_alloc; >>struct list_head rx_queue; >> + >> + /* For SOCK_SEQPACKET */ >> + u32 user_read_seq_len; >> + u32 user_read_copied; >> + u32 curr_rx_msg_cnt; > > wrap these in a struct to make it's clearer they > are related? Ack > >> }; >> >> struct virtio_vsock_pkt { >> @@ -80,6 +85,11 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk, >> struct msghdr *msg, >> size_t len, int flags); >> >> +int >> +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk, >> + struct msghdr *msg, >> + int flags, >> + bool *msg_ready); >> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk); >> s64 virtio_transport_stream_has_space(struct vsock_sock *vsk); >> >> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h >> index 1d57ed3d84d2..cf9c165e5cca 100644 >> --- a/include/uapi/linux/virtio_vsock.h >> +++ b/include/uapi/linux/virtio_vsock.h >> @@ -63,8 +63,14 @@ struct virtio_vsock_hdr { >>__le32 fwd_cnt; >> } __attribute__((packed)); >> >> +struct virtio_vsock_seq_hdr { >> + __le32 msg_cnt; >> + __le32 msg_len; >> +} __attribute__((packed)); >> + >> enum virtio_vsock_type { >>VIRTIO_VSOCK_TYPE_STREAM = 1, >> + VIRTIO_VSOCK_TYPE_SEQPACKET = 2, >> }; >> >> enum virtio_vsock_op { >> @@ -83,6 +89,11 @@ enum virtio_vsock_op { >>VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6, >>/* Request the peer to send the credit info to us */ >>VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7, >> + >> + /* Record begin for SOCK_SEQPACKET */ >> + VIRTIO_VSOCK_OP_SEQ_BEGIN = 8, >> + /* Record end for SOCK_SEQPACKET */ >> + VIRTIO_VSOCK_OP_SEQ_END = 9, >> }; >> >> /* VIRTIO_VSOCK_OP_SHUTDOWN flags values */ >> @@ -91,4 +102,9 @@ enum virtio_vsock_shutdown { >>VIRTIO_VSOCK_SHUTDOWN_SEND = 2, >> }; >> >> +/* VIRTIO_VSOCK_OP_RW flags values */ >> +enum virtio_vsock_rw { >> + VIRTIO_VSOCK_RW_EOR = 1, >> +}; >> + >> #endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */ > Probably a good idea to also have a feature bit gating > this functionality. IIUC this also requires some qemu patch, because in current implementation of vsock device in qemu, there is no 'set_features' callback for such device. This callback will handle guest's write to feature register, by calling vhost kernel backend, where this bit will be processed by host. Well patching userspace to make use of a kernel feature is par for the course, isn't it? IMHO I'm not sure that SEQPACKET support needs feature bit - it is just two new ops for virtio vsock protocol, and from point of view of virtio device it is same as STREAM. May be it is needed for cases when client tries to connect to server which doesn't support SEQPACKET, so without bit result will be "Connection reset by peer", and with such bit client will know that server doesn't support it and 'socket(SOCK_SEQPACKET)' will return error? Yes, a better error handling would be one reason to do it like this. Agree, in this way we could implement a 'seqpacket_allow' callback (similar to 'stream_allow'), and we can return 'true' if the feature is negotiated. So instead of checking all the seqpacket callbacks, we can use only this callback to understand if the transport support it. We can implement it also for other transports (vmci, hyperv) and return always false for now. Thanks, Stefano
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/2/24 3:17 下午, Michael S. Tsirkin wrote: On Wed, Feb 24, 2021 at 02:53:08PM +0800, Jason Wang wrote: On 2021/2/24 2:46 下午, Michael S. Tsirkin wrote: On Wed, Feb 24, 2021 at 02:04:36PM +0800, Jason Wang wrote: On 2021/2/24 1:04 下午, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote: On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote: On 2021/2/23 9:12 上午, Si-Wei Liu wrote: On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Thanks And also: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. so accessing config space before FEATURES_OK is a spec violation, right? It is, but it's not relevant to what this commit tries to address. I thought the legacy guest still needs to be supported. Having said, a separate patch has to be posted to fix the guest driver issue where this discrepancy is introduced to virtnet_validate() (since commit fe36cbe067). But it's not technically related to this patch. -Siwei I think it's a bug to read config space in validate, we should move it to virtnet_probe(). Thanks I take it back, reading but not writing seems to be explicitly allowed by spec. So our way to detect a legacy guest is bogus, need to think what is the best way to handle this. Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy guest? Supposedly only config space write access needs to be guarded before setting FEATURES_OK. -Siwie Detecting it isn't enough though, we will need a new ioctl to notify the kernel that it's a legacy guest. Ugh:( I'm not sure I get this, how can we know if there's a legacy driver before set_features()? qemu knows for sure. It does not communicate this information to the kernel right now unfortunately. I may miss something, but I still don't get how the new ioctl is supposed to work. Thanks Basically on first guest access QEMU would tell kernel whether guest is using the legacy or the modern interface. E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1) while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0) But this trick work only for PCI I think? Thanks Or maybe we just add GET_CONFIG_MODERN and GET_CONFIG_LEGACY and call the correct ioctl ... there are many ways to build this API. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support
On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote: On 23.02.2021 17:50, Stefano Garzarella wrote: On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote: Hi Arseny, On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote: This patchset impelements support of SOCK_SEQPACKET for virtio transport. As SOCK_SEQPACKET guarantees to save record boundaries, so to do it, two new packet operations were added: first for start of record and second to mark end of record(SEQ_BEGIN and SEQ_END later). Also, both operations carries metadata - to maintain boundaries and payload integrity. Metadata is introduced by adding special header with two fields - message count and message length: struct virtio_vsock_seq_hdr { __le32 msg_cnt; __le32 msg_len; } __attribute__((packed)); This header is transmitted as payload of SEQ_BEGIN and SEQ_END packets(buffer of second virtio descriptor in chain) in the same way as data transmitted in RW packets. Payload was chosen as buffer for this header to avoid touching first virtio buffer which carries header of packet, because someone could check that size of this buffer is equal to size of packet header. To send record, packet with start marker is sent first(it's header contains length of record and counter), then counter is incremented and all data is sent as usual 'RW' packets and finally SEQ_END is sent(it also carries counter of message, which is counter of SEQ_BEGIN + 1), also after sedning SEQ_END counter is incremented again. On receiver's side, length of record is known from packet with start record marker. To check that no packets were dropped by transport, counters of two sequential SEQ_BEGIN and SEQ_END are checked(counter of SEQ_END must be bigger that counter of SEQ_BEGIN by 1) and length of data between two markers is compared to length in SEQ_BEGIN header. Now as packets of one socket are not reordered neither on vsock nor on vhost transport layers, such markers allows to restore original record on receiver's side. If user's buffer is smaller that record length, when all out of size data is dropped. Maximum length of datagram is not limited as in stream socket, because same credit logic is used. Difference with stream socket is that user is not woken up until whole record is received or error occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags. Tests also implemented. I reviewed the first part (af_vsock.c changes), tomorrow I'll review the rest. That part looks great to me, only found a few minor issues. I revieiwed the rest of it as well, left a few minor comments, but I think we're well on track. I'll take a better look at the specification patch tomorrow. Great, Thank You Thanks, Stefano In the meantime, however, I'm getting a doubt, especially with regard to other transports besides virtio. Should we hide the begin/end marker sending in the transport? I mean, should the transport just provide a seqpacket_enqueue() callbacl? Inside it then the transport will send the markers. This is because some transports might not need to send markers. But thinking about it more, they could actually implement stubs for that calls, if they don't need to send markers. So I think for now it's fine since it allows us to reuse a lot of code, unless someone has some objection. I thought about that, I'll try to implement it in next version. Let's see... If you want to discuss it first, write down the idea you want to implement, I wouldn't want to make you do unnecessary work. :-) Cheers, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH linux-next 0/9] vdpa: Enable user to set mac address,
> From: Michael S. Tsirkin > Sent: Wednesday, February 24, 2021 12:21 PM > > On Wed, Feb 24, 2021 at 08:18:35AM +0200, Parav Pandit wrote: > > Currently user cannot set the mac address and mtu of the vdpa device. > > This patchset enables users to set the mac address and mtu of the vdpa [..] > > > > Patch summary: > > Patch-1 uses read only features bit to detect endianness > > Patch-2 implements new config layout query command > > Patch-3 implements callback for setting vdpa net config fields > > Patch-4 extends vdpa_sim_net driver to implement mac, mtu setting > > Patch-5 removed redundant get_config callback > > Patch-6 mlx5 vdpa driver migrates to user created vdpa device > > Patch-7 mlx5 vdpa driver uses random mac address when not configured > > Patch-8 mlx5 vdpa driver supports user provided mac config > > Patch-9 mlx5 vdpa driver uses user provided mac during rx flow > > steering > > which tree is this for? does not seem to apply to linux-next branch of vhost > ... It is for linux-next branch of vhost. However I missed to rebase before sending. My bad. Preparing to send v2 after addressing other comments and after rebase. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization