Re: Kvm virtual machine uses virtio graphics card, the rotating screen is stuck
On Tue, Nov 23, 2021 at 3:00 PM 苟浩 wrote:
>
> Hello,
>
> I use `xrandr -o left` to rotate the screen in the kvm virtual machine.
> When configured as a Virtio graphics card, the screen will freeze after
> rotating the screen, and the keyboard and mouse will not respond.
> When configured as a VGA graphics card, it is normal after rotating the
> screen.
>
> Is the Virtio graphics card not supporting rotating?

Adding list and Gerd for the answer.

Thanks

> Thanks!

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
Hi, Eugenio,

Sorry for the super late response.

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:

[...]

> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

Any explanation of why zero is not welcomed?

It would be great if we can move this out of iova-tree.c, because that
doesn't look like a good place to, e.g. reference qemu_real_host_page_size,
anyways.  The caller can simply pass in qemu_real_host_page_size as
iova_begin when needed (and I highly doubt it'll be a must for all
iova-tree users..)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}

Re the algorithm - I totally agree Jason's version is much better.

Thanks,

--
Peter Xu
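The hole-search strategy under review — walk the existing mappings in address order and take the first gap within [iova_begin, iova_last] that is large enough for the request — can be sketched outside QEMU's GTree machinery. Everything below (the `Map` struct, `alloc_first_fit`) is an illustrative simplification, not QEMU's actual API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of the allocation discussed above:
 * mappings are kept sorted by start address and are non-overlapping. */
typedef struct {
    uint64_t iova;  /* first address of the mapping */
    uint64_t size;  /* size in bytes */
} Map;

/* Linearly scan for the first hole in [iova_begin, iova_last] that can
 * hold `size` bytes. Returns 0 and sets *out to the hole start, or -1
 * if no hole is big enough. Assumes iova_last < UINT64_MAX so that
 * iova_last + 1 does not wrap. */
static int alloc_first_fit(const Map *maps, size_t n, uint64_t size,
                           uint64_t iova_begin, uint64_t iova_last,
                           uint64_t *out)
{
    uint64_t hole_start = iova_begin;

    for (size_t i = 0; i <= n; i++) {
        /* The hole ends where the next mapping starts, or past iova_last
         * when we have run out of mappings (the "2nd try" case above). */
        uint64_t hole_end = (i < n) ? maps[i].iova : iova_last + 1;

        if (hole_end > hole_start && hole_end - hole_start >= size) {
            *out = hole_start;
            return 0;
        }
        if (i < n) {
            uint64_t map_end = maps[i].iova + maps[i].size;
            if (map_end > hole_start)
                hole_start = map_end;
        }
    }
    return -1;
}
```

As in the patch, the scan is O(n) over existing mappings; saving hole boundaries, as the TODO suggests, would let repeated allocations skip already-inspected ranges.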
Re: [PATCH 04/29] dax: simplify the dax_device <-> gendisk association
On Mon, Nov 22, 2021 at 07:33:06PM -0800, Dan Williams wrote:
> Is it time to add a "DAX" symbol namespace?

What would be the benefit?
Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX
On Mon, Nov 22, 2021 at 06:54:09PM -0800, Dan Williams wrote:
> On Thu, Nov 18, 2021 at 10:55 PM Christoph Hellwig wrote:
> >
> > On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> > > Applied, fixed the spelling of 'dependent' in the subject and picked
> > > up Mike's Ack from the previous send:
> > >
> > > https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> > >
> > > Christoph, any particular reason you did not pick up the tags from the
> > > last posting?
> >
> > I thought I did, but apparently I've missed some.
>
> I'll reply with the ones I see missing that need carrying over and add
> my own reviewed-by then you can send me a pull request when ready,
> deal?

Ok.
Re: [PATCH 12/29] fsdax: remove a pointless __force cast in copy_cow_page_dax
On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig wrote:
>
> Despite its name copy_user_page expected kernel addresses, which is what
> we already have.

Yup,

Reviewed-by: Dan Williams
Re: [PATCH 11/29] dm-stripe: add a stripe_dax_pgoff helper
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.
>
> Signed-off-by: Christoph Hellwig
> Acked-by: Mike Snitzer

Reviewed-by: Dan Williams
Re: [PATCH] vdpa_sim: avoid putting an uninitialized iova_domain
On Tue, Nov 23, 2021 at 11:12 AM Parav Pandit wrote:
>
> > From: Longpeng(Mike)
> > Sent: Monday, November 22, 2021 5:52 PM
> >
> > From: Longpeng
> >
> > The system will crash if we put an uninitialized iova_domain, this could
> > happen when an error occurs before initializing the iova_domain in
> > vdpasim_create().
> >
> > BUG: kernel NULL pointer dereference, address: ...
> > RIP: 0010:__cpuhp_state_remove_instance+0x96/0x1c0
> > ...
> > Call Trace:
> >
> >  put_iova_domain+0x29/0x220
> >  vdpasim_free+0xd1/0x120 [vdpa_sim]
> >  vdpa_release_dev+0x21/0x40 [vdpa]
> >  device_release+0x33/0x90
> >  kobject_release+0x63/0x160
> >  vdpasim_create+0x127/0x2a0 [vdpa_sim]
> >  vdpasim_net_dev_add+0x7d/0xfe [vdpa_sim_net]
> >  vdpa_nl_cmd_dev_add_set_doit+0xe1/0x1a0 [vdpa]
> >  genl_family_rcv_msg_doit+0x112/0x140
> >  genl_rcv_msg+0xdf/0x1d0
> >  ...
> >
> > So we must make sure the iova_domain is already initialized before put it.
> >
> > In addition, we may get the following warning in this case:
> > WARNING: ... drivers/iommu/iova.c:344 iova_cache_put+0x58/0x70
> >
> > So we must make sure the iova_cache_put() is invoked only if the
> > iova_cache_get() is already invoked. Let's fix it together.
> >
> > Signed-off-by: Longpeng
>
> Can you please add the fixes tag here so that older kernels can take this fix?

I guess it's 4080fc106750 ("vdpa_sim: use iova module to allocate IOVA
addresses")

Thanks
Re: [PATCH 10/29] dm-log-writes: add a log_writes_dax_pgoff helper
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.
>
> Signed-off-by: Christoph Hellwig
> Acked-by: Mike Snitzer

Reviewed-by: Dan Williams
Re: [PATCH 09/29] dm-linear: add a linear_dax_pgoff helper
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.
>
> Signed-off-by: Christoph Hellwig
> Acked-by: Mike Snitzer

Reviewed-by: Dan Williams
Re: [PATCH 08/29] dax: remove dax_capable
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Just open code the block size and dax_dev == NULL checks in the callers.
>
> Signed-off-by: Christoph Hellwig
> Acked-by: Mike Snitzer

You dropped Gao Xiang's reviewed-by:

https://lore.kernel.org/r/YW1nrxmaw5CfFXBb@B-P7TQMD6M-0146.local

...and Darrick's

https://lore.kernel.org/r/20211019154447.GL24282@magnolia

...which had a few more review comments below, otherwise you can also add:

Reviewed-by: Dan Williams

> ---
>  drivers/dax/super.c          | 36
>  drivers/md/dm-table.c        | 22 +++---
>  drivers/md/dm.c              | 21 -
>  drivers/md/dm.h              |  4 ----
>  drivers/nvdimm/pmem.c        |  1 -
>  drivers/s390/block/dcssblk.c |  1 -
>  fs/erofs/super.c             | 11 +++
>  fs/ext2/super.c              |  6 --
>  fs/ext4/super.c              |  9 ++---
>  fs/xfs/xfs_super.c           | 21 -
>  include/linux/dax.h          | 14 --
>  11 files changed, 36 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 482fe775324a4..803942586d1b6 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
>         return dax_dev;
>  }
>  EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> -
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> -               struct block_device *bdev, int blocksize, sector_t start,
> -               sector_t sectors)
> -{
> -       if (blocksize != PAGE_SIZE) {
> -               pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> -               return false;
> -       }
> -
> -       if (!dax_dev) {
> -               pr_debug("%pg: error: dax unsupported by block device\n", bdev);
> -               return false;
> -       }
> -
> -       return true;
> -}
> -EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> -               int blocksize, sector_t start, sector_t len)
> -{
> -       bool ret = false;
> -       int id;
> -
> -       if (!dax_dev)
> -               return false;
> -
> -       id = dax_read_lock();
> -       if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
> -               ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
> -                               start, len);
> -       dax_read_unlock(id);
> -       return ret;
> -}
> -EXPORT_SYMBOL_GPL(dax_supported);
>  #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
>  enum dax_device_flags {
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index bcddc5effd155..f4915a7d5dc84 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -806,12 +806,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
>  EXPORT_SYMBOL_GPL(dm_table_set_type);
>
>  /* validate the dax capability of the target device span */
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
>                 sector_t start, sector_t len, void *data)
>  {
> -       int blocksize = *(int *) data;
> +       if (dev->dax_dev)
> +               return false;
>
> -       return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> +       DMDEBUG("%pg: error: dax unsupported by block device", dev->bdev);
> +       return true;
>  }
>
>  /* Check devices support synchronous DAX */
> @@ -821,8 +823,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
>         return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
>  }
>
> -bool dm_table_supports_dax(struct dm_table *t,
> -               iterate_devices_callout_fn iterate_fn, int *blocksize)
> +static bool dm_table_supports_dax(struct dm_table *t,
> +               iterate_devices_callout_fn iterate_fn)
>  {
>         struct dm_target *ti;
>         unsigned i;
> @@ -835,7 +837,7 @@ bool dm_table_supports_dax(struct dm_table *t,
>                 return false;
>
>                 if (!ti->type->iterate_devices ||
> -                   ti->type->iterate_devices(ti, iterate_fn, blocksize))
> +                   ti->type->iterate_devices(ti, iterate_fn, NULL))
>                         return false;
>         }
>
> @@ -862,7 +864,6 @@ static int dm_table_determine_type(struct dm_table *t)
>         struct dm_target *tgt;
>         struct list_head *devices = dm_table_get_devices(t);
>         enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
> -       int page_size = PAGE_SIZE;
>
>         if (t->type != DM_TYPE_NONE) {
>                 /* target already set the table's type */
> @@ -906,7 +907,7 @@ static int dm_table_determine_type(struct dm_table *t)
>  verify_bio_based:
>         /* We must use this tab
Re: [PATCH 07/29] xfs: factor out a xfs_setup_dax_always helper
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Factor out another DAX setup helper to simplify future changes. Also
> move the experimental warning after the checks to not clutter the log
> too much if the setup failed.
>
> Signed-off-by: Christoph Hellwig

Reviewed-by: Dan Williams
Re: [PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> fs_dax_get_by_bdev is the primary interface to find a dax device for a
> block device, so move the partition alignment check there instead of
> wiring it up through ->dax_supported.

Reviewed-by: Dan Williams
Re: [PATCH 05/29] dax: remove the pgmap sanity checks in generic_fsdax_supported
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Drivers that register a dax_dev should make sure it works, no need
> to double check from the file system.

Reviewed-by: Dan Williams

...with a self-reminder to migrate this validation to a unit test to
backstop any future refactoring of the memmap reservation code.
Re: [PATCH 04/29] dax: simplify the dax_device <-> gendisk association
On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
>
> Replace the dax_host_hash with an xarray indexed by the pointer value
> of the gendisk, and require explicitly calls from the block drivers that
> want to associate their gendisk with a dax_device.
>
> Signed-off-by: Christoph Hellwig
> Acked-by: Mike Snitzer
> ---
>  drivers/dax/bus.c            |   6 +-
>  drivers/dax/super.c          | 106 +--
>  drivers/md/dm.c              |   6 +-
>  drivers/nvdimm/pmem.c        |   8 ++-
>  drivers/s390/block/dcssblk.c |  11 +++-
>  fs/fuse/virtio_fs.c          |   2 +-
>  include/linux/dax.h          |  19 +--
>  7 files changed, 62 insertions(+), 96 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 6cc4da4c713d9..bd7af2f7c5b0a 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1323,10 +1323,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>         }
>
>         /*
> -        * No 'host' or dax_operations since there is no access to this
> -        * device outside of mmap of the resulting character device.
> +        * No dax_operations since there is no access to this device outside of
> +        * mmap of the resulting character device.
>          */
> -       dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
> +       dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
>         if (IS_ERR(dax_dev)) {
>                 rc = PTR_ERR(dax_dev);
>                 goto err_alloc_dax;
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e20d0cef10a18..9383c11b21853 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -7,10 +7,8 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include
> @@ -26,10 +24,8 @@
>   * @flags: state and boolean properties
>   */
>  struct dax_device {
> -       struct hlist_node list;
>         struct inode inode;
>         struct cdev cdev;
> -       const char *host;
>         void *private;
>         unsigned long flags;
>         const struct dax_operations *ops;
> @@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
>  static struct kmem_cache *dax_cache __read_mostly;
>  static struct super_block *dax_superblock __read_mostly;
>
> -#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
> -static struct hlist_head dax_host_list[DAX_HASH_SIZE];
> -static DEFINE_SPINLOCK(dax_host_lock);
> -
>  int dax_read_lock(void)
>  {
>         return srcu_read_lock(&dax_srcu);
> @@ -58,13 +50,22 @@ void dax_read_unlock(int id)
>  }
>  EXPORT_SYMBOL_GPL(dax_read_unlock);
>
> -static int dax_host_hash(const char *host)
> +#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
> +#include
> +
> +static DEFINE_XARRAY(dax_hosts);
> +
> +int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
>  {
> -       return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
> +       return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
>  }
> +EXPORT_SYMBOL_GPL(dax_add_host);

Is it time to add a "DAX" symbol namespace?

> -#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
> -#include
> +void dax_remove_host(struct gendisk *disk)
> +{
> +       xa_erase(&dax_hosts, (unsigned long)disk);
> +}
> +EXPORT_SYMBOL_GPL(dax_remove_host);
>
>  int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
>                 pgoff_t *pgoff)
> @@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);
>
>  /**
>   * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
> - * @host: alternate name for the device registered by a dax driver
> + * @bdev: block device to find a dax_device for
>   */
> -static struct dax_device *dax_get_by_host(const char *host)
> +struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
>  {
> -       struct dax_device *dax_dev, *found = NULL;
> -       int hash, id;
> +       struct dax_device *dax_dev;
> +       int id;
>
> -       if (!host)
> +       if (!blk_queue_dax(bdev->bd_disk->queue))
>                 return NULL;
>
> -       hash = dax_host_hash(host);
> -
>         id = dax_read_lock();
> -       spin_lock(&dax_host_lock);
> -       hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
> -               if (!dax_alive(dax_dev)
> -                               || strcmp(host, dax_dev->host) != 0)
> -                       continue;
> -
> -               if (igrab(&dax_dev->inode))
> -                       found = dax_dev;
> -               break;
> -       }
> -       spin_unlock(&dax_host_lock);
> +       dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
> +       if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
> +               dax_dev = NULL;
>         dax_read_unlock(id);
>
> -       return found;
> -}
> -
> -struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
> -{
> -       if (!blk_queue_dax(bdev->bd_disk->queue))
> -               return
RE: [PATCH] vdpa_sim: avoid putting an uninitialized iova_domain
> From: Longpeng(Mike)
> Sent: Monday, November 22, 2021 5:52 PM
>
> From: Longpeng
>
> The system will crash if we put an uninitialized iova_domain, this could
> happen when an error occurs before initializing the iova_domain in
> vdpasim_create().
>
> BUG: kernel NULL pointer dereference, address: ...
> RIP: 0010:__cpuhp_state_remove_instance+0x96/0x1c0
> ...
> Call Trace:
>
>  put_iova_domain+0x29/0x220
>  vdpasim_free+0xd1/0x120 [vdpa_sim]
>  vdpa_release_dev+0x21/0x40 [vdpa]
>  device_release+0x33/0x90
>  kobject_release+0x63/0x160
>  vdpasim_create+0x127/0x2a0 [vdpa_sim]
>  vdpasim_net_dev_add+0x7d/0xfe [vdpa_sim_net]
>  vdpa_nl_cmd_dev_add_set_doit+0xe1/0x1a0 [vdpa]
>  genl_family_rcv_msg_doit+0x112/0x140
>  genl_rcv_msg+0xdf/0x1d0
>  ...
>
> So we must make sure the iova_domain is already initialized before put it.
>
> In addition, we may get the following warning in this case:
> WARNING: ... drivers/iommu/iova.c:344 iova_cache_put+0x58/0x70
>
> So we must make sure the iova_cache_put() is invoked only if the
> iova_cache_get() is already invoked. Let's fix it together.
>
> Signed-off-by: Longpeng

Can you please add the fixes tag here so that older kernels can take this fix?
Re: [PATCH 03/29] dax: remove CONFIG_DAX_DRIVER
On Wed, Nov 17, 2021 at 9:43 AM Dan Williams wrote:
>
> On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig wrote:
> >
> > CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.
>
> Applied.

Unapplied,

Reviewed-by: Dan Williams
Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX
On Thu, Nov 18, 2021 at 10:55 PM Christoph Hellwig wrote:
>
> On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> > Applied, fixed the spelling of 'dependent' in the subject and picked
> > up Mike's Ack from the previous send:
> >
> > https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> >
> > Christoph, any particular reason you did not pick up the tags from the
> > last posting?
>
> I thought I did, but apparently I've missed some.

I'll reply with the ones I see missing that need carrying over and add
my own reviewed-by then you can send me a pull request when ready,
deal?
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, Nov 22, 2021 at 9:50 PM Halil Pasic wrote:
>
> On Mon, 22 Nov 2021 14:25:26 +0800
> Jason Wang wrote:
>
> > On Mon, Nov 22, 2021 at 1:49 PM Halil Pasic wrote:
> > >
> > > On Mon, 22 Nov 2021 06:35:18 +0100
> > > Halil Pasic wrote:
> > >
> > > > > I think it should be a common issue, looking at
> > > > > vhost_vsock_handle_tx_kick(), it did:
> > > > >
> > > > > len += sizeof(pkt->hdr);
> > > > > vhost_add_used(vq, head, len);
> > > > >
> > > > > which looks like a violation of the spec since it's TX.
> > > >
> > > > I'm not sure the lines above look like a violation of the spec. If you
> > > > examine vhost_vsock_alloc_pkt() I believe that you will agree that:
> > > > len == pkt->len == pkt->hdr.len
> > > > which makes sense since according to the spec both tx and rx messages
> > > > are hdr+payload. And I believe hdr.len is the size of the payload,
> > > > although that does not seem to be properly documented by the spec.
> >
> > Sorry for being unclear, what I meant is that we probably should use
> > zero here. TX doesn't use in buffer actually.
> >
> > According to the spec, 0 should be the used length:
> >
> > "and len the total of bytes written into the buffer."
>
> Right, I was wrong. I somehow assumed this is the total length and not
> just the number of bytes written.
>
> > > > On the other hand tx messages are stated to be device read-only (in the
> > > > spec) so if the device writes stuff, that is certainly wrong.
> > > >
> >
> > Yes.
> >
> > > If that is what happens.
> > >
> > > > Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what
> > > > happens. My hypothesis is that we just a last descriptor is an 'in'
> > > > type descriptor (i.e. a device writable one). For tx that assumption
> > > > would be wrong.
> > > >
> > > > I will have another look at this today and send a fix patch if my
> > > > suspicion is confirmed.
>
> Yeah, I didn't remember the semantic of
> vq->split.vring.used->ring[last_used].len
> correctly, and in fact also how exactly the rings work. So your objection
> is correct.
>
> Maybe updating some stuff would make it easier to not make this mistake.
>
> For example the spec and also the linux header says:
>
> /* le32 is used here for ids for padding reasons. */
> struct virtq_used_elem {
>         /* Index of start of used descriptor chain. */
>         le32 id;
>         /* Total length of the descriptor chain which was used (written to) */
>         le32 len;
> };
>
> I think that comment isn't as clear as it could be. I would prefer:
> /* The number of bytes written into the device writable portion of the
> buffer described by the descriptor chain. */
>
> I believe "the descriptor chain which was used" includes both the
> descriptors that map the device read only and the device write
> only portions of the buffer described by the descriptor chain. And the
> total length of that descriptor chain may be defined either as a number
> of the descriptors that form the chain, or the length of the buffer.
>
> One has to use the descriptor chain even if the whole buffer is device
> read only. So "used" == "written to" does not make any sense to me.

Not a native speaker but if others are fine I'm ok with this tweak on
the comment.

>
> Also something like
> int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int bytes_written)
> instead of
> int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
> would make it easier to read the code correctly.

Or maybe a comment to explain the len.

Thanks

>
> > > If my suspicion is right something like:
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 00f64f2f8b72..efb57898920b 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -764,6 +764,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > >         struct vring_virtqueue *vq = to_vvq(_vq);
> > >         void *ret;
> > >         unsigned int i;
> > > +       bool has_in;
> > >         u16 last_used;
> > >
> > >         START_USE(vq);
> > > @@ -787,6 +788,9 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > >                         vq->split.vring.used->ring[last_used].id);
> > >         *len = virtio32_to_cpu(_vq->vdev,
> > >                         vq->split.vring.used->ring[last_used].len);
> > > +       has_in = virtio16_to_cpu(_vq->vdev,
> > > +                       vq->split.vring.used->ring[last_used].flags)
> > > +                       & VRING_DESC_F_WRITE;
> >
> > Did you mean vring.desc actually? If yes, it's better not depend on
> > the descriptor ring which can be modified by the device. We've stored
> > the flags in desc_extra[].
> >
> > >
> > >         if (unlikely(i >= vq->split.vring.num)) {
> > >                 BAD_RING(vq, "id %u out of range\n", i);
> > > @@ -796,7 +800,7 @@ static void *virtqueue_get_buf_ctx_split(struct vir
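The semantics being debated above — that the used element's `len` must count only bytes the device wrote into the device-writable portion of the buffer, so a pure TX (device read-only) buffer can only legally report 0 — can be captured in a toy driver-side check. The struct layout mirrors `virtq_used_elem` from the spec, but the helper and its parameters are illustrative, not the kernel's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors struct virtq_used_elem from the virtio spec. */
struct used_elem {
    uint32_t id;   /* index of the head of the used descriptor chain */
    uint32_t len;  /* bytes written into the device-writable part */
};

/* Hypothetical driver-side validation in the spirit of the discussion:
 * a well-behaved device never reports more written bytes than the
 * buffer had device-writable ("in") capacity. For a pure TX buffer,
 * in_capacity is 0, so the only valid len is 0. */
static bool used_len_valid(const struct used_elem *e, uint32_t in_capacity)
{
    return e->len <= in_capacity;
}
```

Under this check, the pre-fix vsock behavior of reporting `sizeof(pkt->hdr) + pkt->len` for a TX buffer would be flagged, while the fixed `vhost_add_used(vq, head, 0)` passes.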
Re: [PATCH 1/2] vdpa: Add support for querying statistics
On Mon, Nov 22, 2021 at 11:56 PM Parav Pandit wrote:
>
> > From: Eli Cohen
> > Sent: Monday, November 22, 2021 8:37 PM
> >
> > On Mon, Nov 22, 2021 at 12:15:36PM +0200, Parav Pandit wrote:
> > >
> > > > From: Jason Wang
> > > > Sent: Monday, November 22, 2021 3:02 PM
> > > >
> > > > > If we go with vendor stats, how can we communicate the information
> > > > > to userspace? Currenlty we use netlink attributes defined to pass
> > > > > this information.
> > > >
> > > > It can be done exactly as what have been done in the patch, we can
> > > > document it as vendor stats.
> > > >
> > > Yes, attribute to have VENDOR_ prefix in it.
> > >
> > > > Ok, I think I get you. So I wonder if it's more useful to use device
> > > > specific counters. For networking, it could be packets send/received
> > > > etc.
> > >
> > > Yes, I equally discussed this previously with Eli as its more meaningful
> > > for end users.
> > > We just return the device id of it along with queue number that helps to
> > > show tx and rx.
> > > For ctrl q, it is just ctrl commands and ctrl completions.
> >
> > I don't think we should mix send/receive packets for descriptors
> > statistics. The hardware could process a descriptor and still not
> > transmit any packet.
> >
> > We can add packets send/recv but descriptor statistics have their own value.
> >
> Oh right. I read Jason's comment of _packets_ to fast. I meant to say
> send/receive descriptors.
> I guess you already named them as tx and rx. Didn't review the patches in
> this series yet.
>
> > To summarize, I can add the VENDOR_ preifx to the attibutes and re-send or
> > is there anything else you think should change?
> VENDOR_ prefix and command as iproute2 command as "vstats" looks fine to me.

Ack, but we need to figure out:

1) use descriptors or buffers.
2) if we use descriptors, for indirect descriptors and descriptor chains
how are they counted?

Thanks
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Tue, Nov 23, 2021 at 4:24 AM Halil Pasic wrote:
>
> On Mon, 22 Nov 2021 14:25:26 +0800
> Jason Wang wrote:
>
> > I think the fixes are:
> >
> > 1) fixing the vhost vsock
> > 2) use suppress_used_validation=true to let vsock driver to validate
> > the in buffer length
> > 3) probably a new feature so the driver can only enable the validation
> > when the feature is enabled.
>
> I'm not sure, I would consider a F_DEV_Y_FIXED_BUG_X a perfectly good
> feature. Frankly the set of such bugs is device implementation
> specific and it makes little sense to specify a feature bit
> that says the device implementation claims to adhere to some
> aspect of the specification. Also what would be the semantic
> of not negotiating F_DEV_Y_FIXED_BUG_X?

Yes, I agree. Rethink of the feature bit, it seems unnecessary,
especially considering the driver should not care about the used length
for tx.

> On the other hand I see no other way to keep the validation
> permanently enabled for fixed implementations, and get around the problem
> with broken implementations. So we could have something like
> VHOST_USED_LEN_STRICT.

It's more about a choice of the driver's knowledge. For vsock TX it
should be fine. If we introduce a parameter and disable it by default,
it won't be very useful.

> Maybe, we can also think of 'warn and don't alter behavior' instead of
> 'warn' and alter behavior. Or maybe even not having such checks on in
> production, but only when testing.

I think there's an agreement that virtio drivers need more hardening,
that's why a lot of patches were merged. Especially considering the new
requirements came from confidential computing, smart NIC and VDUSE.

For virtio drivers, enabling the validation may help to

1) protect the driver from the buggy and malicious device
2) uncover the bugs of the devices (as vsock did, and probably rpmsg)
3) force the have a smart driver that can do the validation itself
then we can finally remove the validation in the core

So I'd like to keep it enabled.

Thanks

> Regards,
> Halil
Re: [PATCH] vdpa_sim: avoid putting an uninitialized iova_domain
On Mon, Nov 22, 2021 at 8:22 PM Longpeng(Mike) wrote:
>
> From: Longpeng
>
> The system will crash if we put an uninitialized iova_domain, this
> could happen when an error occurs before initializing the iova_domain
> in vdpasim_create().
>
> BUG: kernel NULL pointer dereference, address:
> ...
> RIP: 0010:__cpuhp_state_remove_instance+0x96/0x1c0
> ...
> Call Trace:
>
>  put_iova_domain+0x29/0x220
>  vdpasim_free+0xd1/0x120 [vdpa_sim]
>  vdpa_release_dev+0x21/0x40 [vdpa]
>  device_release+0x33/0x90
>  kobject_release+0x63/0x160
>  vdpasim_create+0x127/0x2a0 [vdpa_sim]
>  vdpasim_net_dev_add+0x7d/0xfe [vdpa_sim_net]
>  vdpa_nl_cmd_dev_add_set_doit+0xe1/0x1a0 [vdpa]
>  genl_family_rcv_msg_doit+0x112/0x140
>  genl_rcv_msg+0xdf/0x1d0
>  ...
>
> So we must make sure the iova_domain is already initialized before
> put it.
>
> In addition, we may get the following warning in this case:
> WARNING: ... drivers/iommu/iova.c:344 iova_cache_put+0x58/0x70
>
> So we must make sure the iova_cache_put() is invoked only if the
> iova_cache_get() is already invoked. Let's fix it together.
>
> Signed-off-by: Longpeng

Acked-by: Jason Wang

> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 5f484fff8dbe..41b0cd17fcba 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -591,8 +591,11 @@ static void vdpasim_free(struct vdpa_device *vdpa)
>                 vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
>         }
>
> -       put_iova_domain(&vdpasim->iova);
> -       iova_cache_put();
> +       if (vdpa_get_dma_dev(vdpa)) {
> +               put_iova_domain(&vdpasim->iova);
> +               iova_cache_put();
> +       }
> +
>         kvfree(vdpasim->buffer);
>         if (vdpasim->iommu)
>                 vhost_iotlb_free(vdpasim->iommu);
> --
> 2.27.0
>
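The shape of the fix above — teardown must only undo initialization steps that actually ran, keyed off a pointer that is set late in setup — can be reduced to a small sketch. `struct sim` and `sim_free()` are hypothetical stand-ins; `dma_dev` plays the role of `vdpa_get_dma_dev()`, which is non-NULL only once the iova machinery has been initialized:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the bug class: if setup fails before the iova domain is
 * initialized, the release path must not put/clean it up. */
struct sim {
    void *dma_dev;          /* set only after iova init succeeded */
    bool iova_cache_held;   /* stands in for iova_cache_get() */
    bool iova_domain_live;  /* stands in for init_iova_domain() */
};

static void sim_free(struct sim *s)
{
    /* Mirror of the patch: guard the puts on the late-set pointer,
     * so an early-failure object is never "put" uninitialized. */
    if (s->dma_dev) {
        assert(s->iova_domain_live);   /* only put what was initialized */
        s->iova_domain_live = false;   /* put_iova_domain() */
        s->iova_cache_held = false;    /* iova_cache_put() pairs with _get() */
    }
    /* ...buffers/iotlb allocated unconditionally are freed unconditionally */
}
```

The same pattern (pair every `put` with a provable `get`) is what the commit message calls out for `iova_cache_put()`.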
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, 22 Nov 2021 14:25:26 +0800
Jason Wang wrote:

> I think the fixes are:
>
> 1) fixing the vhost vsock
> 2) use suppress_used_validation=true to let vsock driver to validate
> the in buffer length
> 3) probably a new feature so the driver can only enable the validation
> when the feature is enabled.

I'm not sure, I would consider a F_DEV_Y_FIXED_BUG_X a perfectly good
feature. Frankly the set of such bugs is device implementation
specific and it makes little sense to specify a feature bit
that says the device implementation claims to adhere to some
aspect of the specification. Also what would be the semantic
of not negotiating F_DEV_Y_FIXED_BUG_X?

On the other hand I see no other way to keep the validation
permanently enabled for fixed implementations, and get around the problem
with broken implementations. So we could have something like
VHOST_USED_LEN_STRICT.

Maybe, we can also think of 'warn and don't alter behavior' instead of
'warn' and alter behavior. Or maybe even not having such checks on in
production, but only when testing.

Regards,
Halil
Re: [PATCH V5 07/10] io_uring: switch to kernel_worker
On 11/22/21 8:20 AM, Jens Axboe wrote:
> On 11/22/21 3:02 AM, Christian Brauner wrote:
>> On Sun, Nov 21, 2021 at 11:17:11AM -0700, Jens Axboe wrote:
>>> On 11/21/21 10:49 AM, Mike Christie wrote:
>>>> Convert io_uring and io-wq to use kernel_worker.
>>>
>>> I don't like the kernel_worker name, that implies it's always giving you
>>> a kernel thread or kthread. That's not the io_uring use case, it's
>>> really just a thread off the original task that just happens to never
>>> exit to userspace.
>>>
>>> Can we do a better name? At least io_thread doesn't imply that.
>>
>> Yeah, I had thought about that as well and at first had kernel_uworker()
>> locally but wasn't convinced. Maybe we should just make it
>> create_user_worker()?
>
> That's better, or maybe even create_user_inkernel_thread() or something?
> Pretty long, though... I'd be fine with create_user_worker().

Ok, I'll do:

create_user_worker()
start_user_worker()

since you guys agree. It will also match the PF flag naming. I'll also add more details to the commit message you requested.
[PATCH 2/2] vhost/vsock: cleanup removing `len` variable
We can increment `total_len` directly and remove `len` since it is no longer used for vhost_add_used().

Signed-off-by: Stefano Garzarella
---
 drivers/vhost/vsock.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 4e3b95af7ee4..d6ca1c7ad513 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -511,8 +511,6 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	vhost_disable_notify(&vsock->dev, vq);
 	do {
-		u32 len;
-
 		if (!vhost_vsock_more_replies(vsock)) {
 			/* Stop tx until the device processes already
 			 * pending replies. Leave tx virtqueue
@@ -540,7 +538,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}

-		len = pkt->len;
+		total_len += sizeof(pkt->hdr) + pkt->len;

 		/* Deliver to monitoring devices all received packets */
 		virtio_transport_deliver_tap_pkt(pkt);
@@ -553,9 +551,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		else
 			virtio_transport_free_pkt(pkt);

-		len += sizeof(pkt->hdr);
 		vhost_add_used(vq, head, 0);
-		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
--
2.31.1
[PATCH 1/2] vhost/vsock: fix incorrect used length reported to the guest
The "used length" reported by calling vhost_add_used() must be the number of bytes written by the device (using "in" buffers). In vhost_vsock_handle_tx_kick() the device only reads the guest buffers (they are all "out" buffers), without writing anything, so we must pass 0 as the "used length" to comply with the virtio spec.

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: sta...@vger.kernel.org
Reported-by: Halil Pasic
Suggested-by: Jason Wang
Signed-off-by: Stefano Garzarella
---
 drivers/vhost/vsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 938aefbc75ec..4e3b95af7ee4 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -554,7 +554,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			virtio_transport_free_pkt(pkt);

 		len += sizeof(pkt->hdr);
-		vhost_add_used(vq, head, len);
+		vhost_add_used(vq, head, 0);
 		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
--
2.31.1
[PATCH 0/2] vhost/vsock: fix used length and cleanup in vhost_vsock_handle_tx_kick()
This is a follow-up to Michael's patch [1] and the discussion with Halil and Jason [2]. I made two patches, one to fix the problem and one for cleanup. This should simplify the backport of the fix, because we've had the problem since vhost-vsock was introduced (v4.8) and that part has been touched a bit recently.

Thanks,
Stefano

[1] https://lore.kernel.org/virtualization/20211122105822.onarsa4sydzxqynu@steredhat/T/#t
[2] https://lore.kernel.org/virtualization/20211027022107.14357-1-jasow...@redhat.com/T/#t

Stefano Garzarella (2):
  vhost/vsock: fix incorrect used length reported to the guest
  vhost/vsock: cleanup removing `len` variable

 drivers/vhost/vsock.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

--
2.31.1
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, Nov 22, 2021 at 03:24:32PM +0100, Halil Pasic wrote: On Mon, 22 Nov 2021 12:08:22 +0100 Stefano Garzarella wrote: On Mon, Nov 22, 2021 at 08:55:24AM +0100, Stefano Garzarella wrote: >On Mon, Nov 22, 2021 at 02:25:26PM +0800, Jason Wang wrote: >>On Mon, Nov 22, 2021 at 1:49 PM Halil Pasic wrote: >>> >>>On Mon, 22 Nov 2021 06:35:18 +0100 >>>Halil Pasic wrote: >>> > I think it should be a common issue, looking at > vhost_vsock_handle_tx_kick(), it did: > > len += sizeof(pkt->hdr); > vhost_add_used(vq, head, len); > > which looks like a violation of the spec since it's TX. I'm not sure the lines above look like a violation of the spec. If you examine vhost_vsock_alloc_pkt() I believe that you will agree that: len == pkt->len == pkt->hdr.len which makes sense since according to the spec both tx and rx messages are hdr+payload. And I believe hdr.len is the size of the payload, although that does not seem to be properly documented by the spec. >> >>Sorry for being unclear, what I meant is that we probably should use >>zero here. TX doesn't use in buffer actually. >> >>According to the spec, 0 should be the used length: >> >>"and len the total of bytes written into the buffer." >> On the other hand tx messages are stated to be device read-only (in the spec) so if the device writes stuff, that is certainly wrong. >> >>Yes. >> If that is what happens. Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what happens. My hypothesis is that we just a last descriptor is an 'in' type descriptor (i.e. a device writable one). For tx that assumption would be wrong. I will have another look at this today and send a fix patch if my suspicion is confirmed. 
>>> >>>If my suspicion is right something like: >>> >>>diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c >>>index 00f64f2f8b72..efb57898920b 100644 >>>--- a/drivers/virtio/virtio_ring.c >>>+++ b/drivers/virtio/virtio_ring.c >>>@@ -764,6 +764,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, >>>struct vring_virtqueue *vq = to_vvq(_vq); >>>void *ret; >>>unsigned int i; >>>+ bool has_in; >>>u16 last_used; >>> >>>START_USE(vq); >>>@@ -787,6 +788,9 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, >>>vq->split.vring.used->ring[last_used].id); >>>*len = virtio32_to_cpu(_vq->vdev, >>>vq->split.vring.used->ring[last_used].len); >>>+ has_in = virtio16_to_cpu(_vq->vdev, >>>+ vq->split.vring.used->ring[last_used].flags) >>>+ & VRING_DESC_F_WRITE; >> >>Did you mean vring.desc actually? If yes, it's better not depend on >>the descriptor ring which can be modified by the device. We've stored >>the flags in desc_extra[]. >> >>> >>>if (unlikely(i >= vq->split.vring.num)) { >>>BAD_RING(vq, "id %u out of range\n", i); >>>@@ -796,7 +800,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, >>>BAD_RING(vq, "id %u is not a head!\n", i); >>>return NULL; >>>} >>>- if (vq->buflen && unlikely(*len > vq->buflen[i])) { >>>+ if (has_in && q->buflen && unlikely(*len > vq->buflen[i])) { >>>BAD_RING(vq, "used len %d is larger than in buflen %u\n", >>>*len, vq->buflen[i]); >>>return NULL; >>> >>>would fix the problem for split. I will try that out and let you know >>>later. >> >>I'm not sure I get this, in virtqueue_add_split, the buflen[i] only >>contains the in buffer length. >> >>I think the fixes are: >> >>1) fixing the vhost vsock > >Yep, in vhost_vsock_handle_tx_kick() we should have vhost_add_used(vq, >head, 0) since the device doesn't write anything. 
> >>2) use suppress_used_validation=true to let vsock driver to validate >>the in buffer length >>3) probably a new feature so the driver can only enable the validation >>when the feature is enabled. > >I fully agree with these steps. Michael sent a patch to suppress the validation, so I think we should just fix vhost-vsock. I mean something like this: diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 938aefbc75ec..4e3b95af7ee4 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -554,7 +554,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) virtio_transport_free_pkt(pkt); len += sizeof(pkt->hdr); - vhost_add_used(vq, head, len); + vhost_add_used(vq, head, 0); total_len += len; added = true; } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); I checked and the problem is there from the first commit, so we should add: Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost
RE: [PATCH 1/2] vdpa: Add support for querying statistics
> From: Eli Cohen
> Sent: Monday, November 22, 2021 8:37 PM
>
> On Mon, Nov 22, 2021 at 12:15:36PM +0200, Parav Pandit wrote:
> >
> > > From: Jason Wang
> > > Sent: Monday, November 22, 2021 3:02 PM
> > >
> > > > If we go with vendor stats, how can we communicate the information
> > > > to userspace? Currently we use netlink attributes defined to pass
> > > > this information.
> > >
> > > It can be done exactly as what have been done in the patch, we can
> > > document it as vendor stats.
> > >
> > Yes, attribute to have VENDOR_ prefix in it.
> >
> > > Ok, I think I get you. So I wonder if it's more useful to use device
> > > specific counters. For networking, it could be packets send/received etc.
> >
> > Yes, I equally discussed this previously with Eli as it's more meaningful for end users.
> > We just return the device id of it along with queue number that helps to show tx and rx.
> > For ctrl q, it is just ctrl commands and ctrl completions.
>
> I don't think we should mix send/receive packets with descriptor statistics. The
> hardware could process a descriptor and still not transmit any packet.
>
> We can add packets send/recv, but descriptor statistics have their own value.

Oh right, I read Jason's comment of _packets_ too fast. I meant to say send/receive descriptors. I guess you already named them as tx and rx. Didn't review the patches in this series yet.

> To summarize, I can add the VENDOR_ prefix to the attributes and re-send, or is
> there anything else you think should change?

The VENDOR_ prefix, with "vstats" as the iproute2 command, looks fine to me.
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, 22 Nov 2021 12:08:22 +0100 Stefano Garzarella wrote: > On Mon, Nov 22, 2021 at 08:55:24AM +0100, Stefano Garzarella wrote: > >On Mon, Nov 22, 2021 at 02:25:26PM +0800, Jason Wang wrote: > >>On Mon, Nov 22, 2021 at 1:49 PM Halil Pasic wrote: > >>> > >>>On Mon, 22 Nov 2021 06:35:18 +0100 > >>>Halil Pasic wrote: > >>> > > I think it should be a common issue, looking at > > vhost_vsock_handle_tx_kick(), it did: > > > > len += sizeof(pkt->hdr); > > vhost_add_used(vq, head, len); > > > > which looks like a violation of the spec since it's TX. > > I'm not sure the lines above look like a violation of the spec. If you > examine vhost_vsock_alloc_pkt() I believe that you will agree that: > len == pkt->len == pkt->hdr.len > which makes sense since according to the spec both tx and rx messages > are hdr+payload. And I believe hdr.len is the size of the payload, > although that does not seem to be properly documented by the spec. > >> > >>Sorry for being unclear, what I meant is that we probably should use > >>zero here. TX doesn't use in buffer actually. > >> > >>According to the spec, 0 should be the used length: > >> > >>"and len the total of bytes written into the buffer." > >> > > On the other hand tx messages are stated to be device read-only (in the > spec) so if the device writes stuff, that is certainly wrong. > > >> > >>Yes. > >> > If that is what happens. > > Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what > happens. My hypothesis is that we just a last descriptor is an 'in' > type descriptor (i.e. a device writable one). For tx that assumption > would be wrong. > > I will have another look at this today and send a fix patch if my > suspicion is confirmed. 
> >>> > >>>If my suspicion is right something like: > >>> > >>>diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > >>>index 00f64f2f8b72..efb57898920b 100644 > >>>--- a/drivers/virtio/virtio_ring.c > >>>+++ b/drivers/virtio/virtio_ring.c > >>>@@ -764,6 +764,7 @@ static void *virtqueue_get_buf_ctx_split(struct > >>>virtqueue *_vq, > >>>struct vring_virtqueue *vq = to_vvq(_vq); > >>>void *ret; > >>>unsigned int i; > >>>+ bool has_in; > >>>u16 last_used; > >>> > >>>START_USE(vq); > >>>@@ -787,6 +788,9 @@ static void *virtqueue_get_buf_ctx_split(struct > >>>virtqueue *_vq, > >>>vq->split.vring.used->ring[last_used].id); > >>>*len = virtio32_to_cpu(_vq->vdev, > >>>vq->split.vring.used->ring[last_used].len); > >>>+ has_in = virtio16_to_cpu(_vq->vdev, > >>>+ vq->split.vring.used->ring[last_used].flags) > >>>+ & VRING_DESC_F_WRITE; > >> > >>Did you mean vring.desc actually? If yes, it's better not depend on > >>the descriptor ring which can be modified by the device. We've stored > >>the flags in desc_extra[]. > >> > >>> > >>>if (unlikely(i >= vq->split.vring.num)) { > >>>BAD_RING(vq, "id %u out of range\n", i); > >>>@@ -796,7 +800,7 @@ static void *virtqueue_get_buf_ctx_split(struct > >>>virtqueue *_vq, > >>>BAD_RING(vq, "id %u is not a head!\n", i); > >>>return NULL; > >>>} > >>>- if (vq->buflen && unlikely(*len > vq->buflen[i])) { > >>>+ if (has_in && q->buflen && unlikely(*len > vq->buflen[i])) { > >>>BAD_RING(vq, "used len %d is larger than in buflen %u\n", > >>>*len, vq->buflen[i]); > >>>return NULL; > >>> > >>>would fix the problem for split. I will try that out and let you know > >>>later. > >> > >>I'm not sure I get this, in virtqueue_add_split, the buflen[i] only > >>contains the in buffer length. > >> > >>I think the fixes are: > >> > >>1) fixing the vhost vsock > > > >Yep, in vhost_vsock_handle_tx_kick() we should have vhost_add_used(vq, > >head, 0) since the device doesn't write anything. 
> > > >>2) use suppress_used_validation=true to let vsock driver to validate > >>the in buffer length > >>3) probably a new feature so the driver can only enable the validation > >>when the feature is enabled. > > > >I fully agree with these steps. > > Michael sent a patch to suppress the validation, so I think we should > just fix vhost-vsock. I mean something like this: > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > index 938aefbc75ec..4e3b95af7ee4 100644 > --- a/drivers/vhost/vsock.c > +++ b/drivers/vhost/vsock.c > @@ -554,7 +554,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work > *work) > virtio_transport_free_pkt(pkt); > > len += sizeof(pkt->hdr); > - vhost_add_used(vq, head, len); > + vhost_add_used(vq, head, 0); > total_len += len;
Re: [PATCH V5 07/10] io_uring: switch to kernel_worker
On 11/22/21 3:02 AM, Christian Brauner wrote:
> On Sun, Nov 21, 2021 at 11:17:11AM -0700, Jens Axboe wrote:
>> On 11/21/21 10:49 AM, Mike Christie wrote:
>>> Convert io_uring and io-wq to use kernel_worker.
>>
>> I don't like the kernel_worker name, that implies it's always giving you
>> a kernel thread or kthread. That's not the io_uring use case, it's
>> really just a thread off the original task that just happens to never
>> exit to userspace.
>>
>> Can we do a better name? At least io_thread doesn't imply that.
>
> Yeah, I had thought about that as well and at first had kernel_uworker()
> locally but wasn't convinced. Maybe we should just make it
> create_user_worker()?

That's better, or maybe even create_user_inkernel_thread() or something? Pretty long, though... I'd be fine with create_user_worker().

--
Jens Axboe
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, 22 Nov 2021 14:25:26 +0800 Jason Wang wrote:
> On Mon, Nov 22, 2021 at 1:49 PM Halil Pasic wrote:
> >
> > On Mon, 22 Nov 2021 06:35:18 +0100
> > Halil Pasic wrote:
> >
> > > > I think it should be a common issue, looking at
> > > > vhost_vsock_handle_tx_kick(), it did:
> > > >
> > > > len += sizeof(pkt->hdr);
> > > > vhost_add_used(vq, head, len);
> > > >
> > > > which looks like a violation of the spec since it's TX.
> > >
> > > I'm not sure the lines above look like a violation of the spec. If you
> > > examine vhost_vsock_alloc_pkt() I believe that you will agree that:
> > > len == pkt->len == pkt->hdr.len
> > > which makes sense since according to the spec both tx and rx messages
> > > are hdr+payload. And I believe hdr.len is the size of the payload,
> > > although that does not seem to be properly documented by the spec.
>
> Sorry for being unclear, what I meant is that we probably should use
> zero here. TX doesn't use in buffer actually.
>
> According to the spec, 0 should be the used length:
>
> "and len the total of bytes written into the buffer."

Right, I was wrong. I somehow assumed this is the total length and not just the number of bytes written.

> > > On the other hand tx messages are stated to be device read-only (in the
> > > spec) so if the device writes stuff, that is certainly wrong.
>
> Yes.
>
> > > If that is what happens.
> > >
> > > Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what
> > > happens. My hypothesis is that we just assume the last descriptor is an
> > > 'in' type descriptor (i.e. a device writable one). For tx that assumption
> > > would be wrong.
> > >
> > > I will have another look at this today and send a fix patch if my
> > > suspicion is confirmed.

Yeah, I didn't remember the semantics of vq->split.vring.used->ring[last_used].len correctly, and in fact also how exactly the rings work. So your objection is correct. Maybe updating some stuff would make it easier to not make this mistake.
For example the spec and also the linux header says:

/* le32 is used here for ids for padding reasons. */
struct virtq_used_elem {
        /* Index of start of used descriptor chain. */
        le32 id;
        /* Total length of the descriptor chain which was used (written to) */
        le32 len;
};

I think that comment isn't as clear as it could be. I would prefer:

/* The number of bytes written into the device writable portion of
 * the buffer described by the descriptor chain. */

I believe "the descriptor chain which was used" includes both the descriptors that map the device read-only and the device write-only portions of the buffer described by the descriptor chain. And the total length of that descriptor chain may be defined either as the number of descriptors that form the chain, or the length of the buffer. One has to use the descriptor chain even if the whole buffer is device read-only. So "used" == "written to" does not make any sense to me.

Also something like

int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int bytes_written)

instead of

int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)

would make it easier to read the code correctly.
> > > > If my suspicion is right something like: > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index 00f64f2f8b72..efb57898920b 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -764,6 +764,7 @@ static void *virtqueue_get_buf_ctx_split(struct > > virtqueue *_vq, > > struct vring_virtqueue *vq = to_vvq(_vq); > > void *ret; > > unsigned int i; > > + bool has_in; > > u16 last_used; > > > > START_USE(vq); > > @@ -787,6 +788,9 @@ static void *virtqueue_get_buf_ctx_split(struct > > virtqueue *_vq, > > vq->split.vring.used->ring[last_used].id); > > *len = virtio32_to_cpu(_vq->vdev, > > vq->split.vring.used->ring[last_used].len); > > + has_in = virtio16_to_cpu(_vq->vdev, > > + vq->split.vring.used->ring[last_used].flags) > > + & VRING_DESC_F_WRITE; > > Did you mean vring.desc actually? If yes, it's better not depend on > the descriptor ring which can be modified by the device. We've stored > the flags in desc_extra[]. > > > > > if (unlikely(i >= vq->split.vring.num)) { > > BAD_RING(vq, "id %u out of range\n", i); > > @@ -796,7 +800,7 @@ static void *virtqueue_get_buf_ctx_split(struct > > virtqueue *_vq, > > BAD_RING(vq, "id %u is not a head!\n", i); > > return NULL; > > } > > - if (vq->buflen && unlikely(*len > vq->buflen[i])) { > > + if (has_in && q->buflen && unlikely(*len > vq->buflen[i])) { > > BAD_RING(vq, "used len %d is larger than in buflen %u\n", > > *len, vq->buflen[i]); > >
Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, Nov 22, 2021 at 08:55:24AM +0100, Stefano Garzarella wrote: On Mon, Nov 22, 2021 at 02:25:26PM +0800, Jason Wang wrote: On Mon, Nov 22, 2021 at 1:49 PM Halil Pasic wrote: On Mon, 22 Nov 2021 06:35:18 +0100 Halil Pasic wrote: > I think it should be a common issue, looking at > vhost_vsock_handle_tx_kick(), it did: > > len += sizeof(pkt->hdr); > vhost_add_used(vq, head, len); > > which looks like a violation of the spec since it's TX. I'm not sure the lines above look like a violation of the spec. If you examine vhost_vsock_alloc_pkt() I believe that you will agree that: len == pkt->len == pkt->hdr.len which makes sense since according to the spec both tx and rx messages are hdr+payload. And I believe hdr.len is the size of the payload, although that does not seem to be properly documented by the spec. Sorry for being unclear, what I meant is that we probably should use zero here. TX doesn't use in buffer actually. According to the spec, 0 should be the used length: "and len the total of bytes written into the buffer." On the other hand tx messages are stated to be device read-only (in the spec) so if the device writes stuff, that is certainly wrong. Yes. If that is what happens. Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what happens. My hypothesis is that we just a last descriptor is an 'in' type descriptor (i.e. a device writable one). For tx that assumption would be wrong. I will have another look at this today and send a fix patch if my suspicion is confirmed. 
If my suspicion is right something like: diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 00f64f2f8b72..efb57898920b 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -764,6 +764,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, struct vring_virtqueue *vq = to_vvq(_vq); void *ret; unsigned int i; + bool has_in; u16 last_used; START_USE(vq); @@ -787,6 +788,9 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, vq->split.vring.used->ring[last_used].id); *len = virtio32_to_cpu(_vq->vdev, vq->split.vring.used->ring[last_used].len); + has_in = virtio16_to_cpu(_vq->vdev, + vq->split.vring.used->ring[last_used].flags) + & VRING_DESC_F_WRITE; Did you mean vring.desc actually? If yes, it's better not depend on the descriptor ring which can be modified by the device. We've stored the flags in desc_extra[]. if (unlikely(i >= vq->split.vring.num)) { BAD_RING(vq, "id %u out of range\n", i); @@ -796,7 +800,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, BAD_RING(vq, "id %u is not a head!\n", i); return NULL; } - if (vq->buflen && unlikely(*len > vq->buflen[i])) { + if (has_in && q->buflen && unlikely(*len > vq->buflen[i])) { BAD_RING(vq, "used len %d is larger than in buflen %u\n", *len, vq->buflen[i]); return NULL; would fix the problem for split. I will try that out and let you know later. I'm not sure I get this, in virtqueue_add_split, the buflen[i] only contains the in buffer length. I think the fixes are: 1) fixing the vhost vsock Yep, in vhost_vsock_handle_tx_kick() we should have vhost_add_used(vq, head, 0) since the device doesn't write anything. 2) use suppress_used_validation=true to let vsock driver to validate the in buffer length 3) probably a new feature so the driver can only enable the validation when the feature is enabled. I fully agree with these steps. Michael sent a patch to suppress the validation, so I think we should just fix vhost-vsock. 
I mean something like this: diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 938aefbc75ec..4e3b95af7ee4 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -554,7 +554,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) virtio_transport_free_pkt(pkt); len += sizeof(pkt->hdr); - vhost_add_used(vq, head, len); + vhost_add_used(vq, head, 0); total_len += len; added = true; } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); I checked and the problem is there from the first commit, so we should add: Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") I tested this patch and it works even without suppressing validation in the virtio core. But for backwards compatibility we have to suppress it for sure as Michael did. Maybe we can have a patch just with this change to backport it easily and one after to clean up a bit the code that was added after (len, total_len). @Halil Let me know if you want to do it, otherwise I can do it. Stefano __
Re: [PATCH] vsock/virtio: suppress used length validation
On Mon, Nov 22, 2021 at 04:32:01AM -0500, Michael S. Tsirkin wrote:
> It turns out that vhost vsock violates the virtio spec by supplying the
> out buffer length in the used length (it should just be the in length).
> As a result, attempts to validate the used length fail with:
>
> vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
>
> Since the vsock driver does not use the length for tx and validates the
> length before use for rx, it is safe to suppress the validation in the
> virtio core for this driver.
>
> Reported-by: Halil Pasic
> Fixes: 939779f5152d ("virtio_ring: validate used buffer length")
> Cc: "Jason Wang"
> Signed-off-by: Michael S. Tsirkin
> ---
>  net/vmw_vsock/virtio_transport.c | 1 +
>  1 file changed, 1 insertion(+)

Thanks for this fix.

Reviewed-by: Stefano Garzarella

I think we should also fix the vhost-vsock violation (in stable branches too).
@Halil do you plan to send a fix? Otherwise I can do it ;-)

Thanks,
Stefano
RE: [PATCH 1/2] vdpa: Add support for querying statistics
> From: Jason Wang
> Sent: Monday, November 22, 2021 3:02 PM
>
> > If we go with vendor stats, how can we communicate the information to
> > userspace? Currently we use netlink attributes defined to pass this
> > information.
>
> It can be done exactly as what have been done in the patch, we can document
> it as vendor stats.
>
Yes, attribute to have VENDOR_ prefix in it.

> Ok, I think I get you. So I wonder if it's more useful to use device specific
> counters. For networking, it could be packets send/received etc.

Yes, I equally discussed this previously with Eli as it's more meaningful for end users.
We just return the device id of it along with queue number that helps to show tx and rx.
For ctrl q, it is just ctrl commands and ctrl completions.
RE: [PATCH 1/2] vdpa: Add support for querying statistics
> From: Jason Wang
> Sent: Monday, November 22, 2021 8:00 AM
>
> On Fri, Nov 19, 2021 at 11:09 AM Parav Pandit wrote:
> >
> > > From: Jason Wang
> > > Sent: Friday, November 19, 2021 8:12 AM
> > >
> > > On Thu, Nov 18, 2021 at 1:58 PM Eli Cohen wrote:
> > > >
> > > > Add support for querying virtqueue statistics. Supported statistics are:
> > > >
> > > > Received_desc - number of descriptors received for the virtqueue
> > > > completed_desc - number of descriptors completed for the virtqueue
> > > >
> > > > A new callback was added to vdpa_config_ops which provides the
> > > > means for the vdpa driver to return statistics results.
> > > >
> > > > The interface allows for reading all the supported virtqueues,
> > > > including the control virtqueue if it exists, by returning the
> > > > next queue index to query.
> > > >
> > > > Examples:
> > > > 1. Read statistics for the virtqueue at index 1
> > > > $ vdpa dev stats show vdpa-a index 1
> > > > vdpa-a:
> > > > index 1 tx received_desc 21 completed_desc 21
> > > >
> > > > 2. Read statistics for all the virtqueues
> > > > $ vdpa dev stats show vdpa-a
> > > > vdpa-a:
> > > > index 0 rx received_desc 256 completed_desc 12
> > > > index 1 tx received_desc 21 completed_desc 21
> > > > index 2 ctrl received_desc 0 completed_desc 0
> > >
> > > Adding Adrian and Laurent.
> > >
> > > It's quite useful but I think it's vendor specific statistics.
> > These are vdpa device specific of Linux.
> > And are very generic of the VQ for all device types.
>
> The question is what happens if the parent doesn't implement those statistics.

Only those statistics that the vendor driver implements are to be filled up. If the parent doesn't implement any of them, the usual EOPNOTSUPP is returned.

> > > I wonder if it's better
> > > to use "vendor" prefix in the protocol, then we use this instead:
> > >
> > > vdpa dev vendor-stats show vdpa-a
> > >
> May be. Let's evaluate if the stats of this patch are generic enough or not.

> > > Or if we want to make it standard is exposing virtio index better?
> > >
> > > qid 0 last_avail_idx X avail_idx Y last_used_idx M used_idx N
> > >
> I did consider this option a while back. Showing indices is equally useful.
> I think we should show that as vq info, along with other VQ attributes (addr, len).
>
> That may work but it looks to me the received_desc/completed_desc is also per vq.

It is per VQ, but statistics of a VQ and VQ config are two different things, and they are also mostly consumed by different entities. For example, Prometheus and other similar entities will use VQ statistics. Prometheus-like statistics collectors often make frequent queries for them, for which a VQ address is of no use, while VQ addr, indices etc. will be used more by developers/debuggers. And it is better not to mix these different kinds of info under one netlink command.

> > $ vdpa dev show vq
> >
> > But showing indices are not less statistics and more the current state of the queue.
> > For example, rollover of the indices won't cover the absolute number of descriptors processed for the queue.
> > And even if we make them u64 (not good), a non-developer end user needs to keep using a side calculator to count the delta.
>
> How about exposing those raw indices via the protocol and letting the vdpa tool calculate for us?

With wrap arounds it won't be able to calculate it.
Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
On Fri, Nov 19, 2021 at 02:32:05PM +0300, Andrey Ryabinin wrote:
>
> On 11/16/21 8:00 AM, Jason Wang wrote:
> > On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin wrote:
> >>
> >> Currently vhost_net_release() uses synchronize_rcu() to synchronize
> >> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
> >> is quite a costly operation. It takes more than 10 seconds
> >> to shut down a qemu launched with a couple of net devices like this:
> >> -netdev tap,id=tap0,..,vhost=on,queues=80
> >> because we end up calling synchronize_rcu() netdev_count*queues times.
> >>
> >> Free vhost net structures in an rcu callback instead of using
> >> synchronize_rcu() to fix the problem.
> >
> > I admit the release code is somewhat hard to understand. But I wonder
> > if the following case can still happen with this:
> >
> > CPU 0 (vhost_dev_cleanup)   CPU 1 (vhost_net_zerocopy_callback()->vhost_work_queue())
> >                             if (!dev->worker)
> > dev->worker = NULL
> >                             wake_up_process(dev->worker)
> >
> > If this is true, it seems the fix is to move the RCU synchronization stuff
> > into vhost_net_ubuf_put_and_wait()?
>
> It all depends on whether vhost_zerocopy_callback() can be called outside of
> vhost thread context or not. If it can run after the vhost thread stopped, then
> the race you describe seems possible and the fix in commit b0c057ca7e83
> ("vhost: fix a theoretical race in device cleanup") wasn't complete. I would
> fix it by calling synchronize_rcu() after vhost_net_flush() and before
> vhost_dev_cleanup().
>
> As for the performance problem, it can be solved by replacing
> synchronize_rcu() with synchronize_rcu_expedited().

expedited causes a storm of IPIs though, so it's problematic to do it upon a userspace syscall.

> But now I'm not sure that this race actually exists and that
> synchronize_rcu() is needed at all.
> I did a bit of testing and I only see the callback being called from the vhost thread: > > vhost-3724 3733 [002] 2701.768731: probe:vhost_zerocopy_callback: > (81af8c10) > 81af8c11 vhost_zerocopy_callback+0x1 ([kernel.kallsyms]) > 81bb34f6 skb_copy_ubufs+0x256 ([kernel.kallsyms]) > 81bce621 __netif_receive_skb_core.constprop.0+0xac1 > ([kernel.kallsyms]) > 81bd062d __netif_receive_skb_one_core+0x3d ([kernel.kallsyms]) > 81bd0748 netif_receive_skb+0x38 ([kernel.kallsyms]) > 819a2a1e tun_get_user+0xdce ([kernel.kallsyms]) > 819a2cf4 tun_sendmsg+0xa4 ([kernel.kallsyms]) > 81af9229 handle_tx_zerocopy+0x149 ([kernel.kallsyms]) > 81afaf05 handle_tx+0xc5 ([kernel.kallsyms]) > 81afce86 vhost_worker+0x76 ([kernel.kallsyms]) > 811581e9 kthread+0x169 ([kernel.kallsyms]) > 810018cf ret_from_fork+0x1f ([kernel.kallsyms]) >0 [unknown] ([unknown]) > > This means that the callback can't run after kthread_stop() in > vhost_dev_cleanup() and no synchronize_rcu() is needed. > > I'm not confident that my quite limited testing covers all possible > vhost_zerocopy_callback() callstacks.
[PATCH] vsock/virtio: suppress used length validation
It turns out that vhost vsock violates the virtio spec by supplying the out buffer length in the used length (should just be the in length). As a result, attempts to validate the used length fail with: vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0 Since the vsock driver does not use the length for tx and validates the length before use for rx, it is safe to suppress the validation in virtio core for this driver. Reported-by: Halil Pasic Fixes: 939779f5152d ("virtio_ring: validate used buffer length") Cc: "Jason Wang" Signed-off-by: Michael S. Tsirkin --- net/vmw_vsock/virtio_transport.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 4f7c99dfd16c..3f82b2f1e6dd 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -731,6 +731,7 @@ static unsigned int features[] = { static struct virtio_driver virtio_vsock_driver = { .feature_table = features, .feature_table_size = ARRAY_SIZE(features), + .suppress_used_validation = true, .driver.name = KBUILD_MODNAME, .driver.owner = THIS_MODULE, .id_table = id_table, -- MST
Re: [PATCH 1/2] vdpa: Add support for querying statistics
On Mon, Nov 22, 2021 at 5:08 PM Eli Cohen wrote: > > On Mon, Nov 22, 2021 at 04:07:24PM +0800, Jason Wang wrote: > > On Mon, Nov 22, 2021 at 3:56 PM Eli Cohen wrote: > > > > > > On Mon, Nov 22, 2021 at 10:30:12AM +0800, Jason Wang wrote: > > > > On Fri, Nov 19, 2021 at 11:09 AM Parav Pandit wrote: > > > > > > > > > > > > > > > > > > > > > From: Jason Wang > > > > > > Sent: Friday, November 19, 2021 8:12 AM > > > > > > > > > > > > On Thu, Nov 18, 2021 at 1:58 PM Eli Cohen wrote: > > > > > > > > > > > > > > Add support for querying virtqueue statistics. Supported > > > > > > > statistics are: > > > > > > > > > > > > > > Received_desc - number of descriptors received for the virtqueue > > > > > > > completed_desc - number of descriptors completed for the virtqueue > > > > > > > > > > > > > > A new callback was added to vdpa_config_ops which provides the > > > > > > > means > > > > > > > for the vdpa driver to return statistics results. > > > > > > > > > > > > > > The interface allows for reading all the supported virtqueues, > > > > > > > including the control virtqueue if it exists, by returning the > > > > > > > next > > > > > > > queue index to query. > > > > > > > > > > > > > > Examples: > > > > > > > 1. Read statistics for the virtqueue at index 1 $ vdpa dev stats > > > > > > > show > > > > > > > vdpa-a index 1 > > > > > > > vdpa-a: > > > > > > > index 1 tx received_desc 21 completed_desc 21 > > > > > > > > > > > > > > 2. Read statistics for all the virtqueues vdpa dev stats show > > > > > > > vdpa-a > > > > > > > vdpa-a: > > > > > > > index 0 rx received_desc 256 completed_desc 12 index 1 tx > > > > > > > received_desc 21 completed_desc 21 index 2 ctrl received_desc 0 > > > > > > > completed_desc 0 > > > > > > > > > > > > Adding Adrian and Laurent. > > > > > > > > > > > > It's quite useful but I think it's vendor-specific statistics. > > > > > These are vdpa device specific to Linux. > > > > > And are very generic of the VQ for all device types. 
> > > > The question is what happens if the parent doesn't implement those > > > > statistics. > > > > > > > Are you suggesting that some parents may support reporting a subset of > > > the fields and we should maybe indicate what is actually reported? > > > > It's an open question. E.g. do we want a stable API for those > > statistics counters? If yes, it's better not to let the parents > > report a subset, but then it forces the exact same counters to be > > supported by other vendors. If not, it should be fine. For any case, I > > think it's simpler to start with "vendor-stats" and we can switch it > > to a standard way if it was agreed by every vendor. > > > > received and completed descriptors are very basic and I assume any > vendor would support those. At least it was not supported by virtio. > If we go with vendor stats, how can we > communicate the information to userspace? Currently we use netlink > attributes defined to pass this information. It can be done exactly as what has been done in the patch; we can document it as vendor stats. > > > > > > > > > > > > > > > I wonder if it's better > > > > > > to use "vendor" prefix in the protocol, then we use this instead: > > > > > > > > > > > > vdpa dev vendor-stats show vdpa-a > > > > > > > > > > > Maybe. Let's evaluate if the stats of this patch are generic enough or > > > > > not. > > > > > > > > > > > Or if we want to make it standard is exposing virtio index better? > > > > > > > > > > > > qid 0 last_avail_idx X avail_idx Y last_used_idx M used_idx N > > > > > > > > > > > I did consider this option a while back. Showing indices is equally > > > > > useful. > > > > > I think we should show that as vq info, along with other VQ > > > > > attributes (addr, len). > > > > > > > > That may work but it looks to me the received_desc/completed_desc is > > > > also per vq. > > > > > > > > Another question is whether it is more useful to use buffers instead of > > > > descriptors? 
E.g. how indirect descriptors are counted. > > > > > > > > > > I think it's descriptors so indirect descriptors are counted once but I > > > need to verify that. > > > > > > > > $ vdpa dev show vq > > > > > > > > > > But showing indices is not statistics so much as the current state of > > > > > the queue. > > > > > For example, rollover of the indices won't cover the absolute number of > > > > > descriptors processed for the queue. > > > > > And even if we make them u64 (not good), a non-developer end user needs > > > > > to keep using a side calculator to count the delta. > > > > > > > > How about exposing those raw indices via the protocol and letting the > > > > vdpa tool calculate for us? > > > > > > > > > > counters are 16 bit per the virtio spec so I don't know how you could > > > handle rollover without losing information. > > > > So at most 1 rollover I guess. So it's not hard by comparing the > > indices. E.g. if last_avail_idx (device avail idx) > avail_idx, we > > know there's one? > > > > I am no
Re: [PATCH V5 02/10] fork/vm: Move common PF_IO_WORKER behavior to new flag
On Sun, Nov 21, 2021 at 6:49 PM Mike Christie wrote: > This adds a new flag, PF_USER_WORKER, that's used for behavior common > to both PF_IO_WORKER and users like vhost which will use the new > kernel_worker helpers that will use the flag and are added later in this > patchset. > > The common behavior PF_USER_WORKER covers is the initial frame and fpu > setup and the vm reclaim handling. > > Signed-off-by: Mike Christie > arch/m68k/kernel/process.c | 2 +- Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 1/2] vdpa: Add support for querying statistics
On Mon, Nov 22, 2021 at 3:56 PM Eli Cohen wrote: > > On Mon, Nov 22, 2021 at 10:30:12AM +0800, Jason Wang wrote: > > On Fri, Nov 19, 2021 at 11:09 AM Parav Pandit wrote: > > > > > > > > > > > > > From: Jason Wang > > > > Sent: Friday, November 19, 2021 8:12 AM > > > > > > > > On Thu, Nov 18, 2021 at 1:58 PM Eli Cohen wrote: > > > > > > > > > > Add support for querying virtqueue statistics. Supported statistics > > > > > are: > > > > > > > > > > Received_desc - number of descriptors received for the virtqueue > > > > > completed_desc - number of descriptors completed for the virtqueue > > > > > > > > > > A new callback was added to vdpa_config_ops which provides the means > > > > > for the vdpa driver to return statistics results. > > > > > > > > > > The interface allows for reading all the supported virtqueues, > > > > > including the control virtqueue if it exists, by returning the next > > > > > queue index to query. > > > > > > > > > > Examples: > > > > > 1. Read statistics for the virtqueue at index 1 $ vdpa dev stats show > > > > > vdpa-a index 1 > > > > > vdpa-a: > > > > > index 1 tx received_desc 21 completed_desc 21 > > > > > > > > > > 2. Read statistics for all the virtqueues vdpa dev stats show vdpa-a > > > > > vdpa-a: > > > > > index 0 rx received_desc 256 completed_desc 12 index 1 tx > > > > > received_desc 21 completed_desc 21 index 2 ctrl received_desc 0 > > > > > completed_desc 0 > > > > > > > > Adding Adrian and Laurent. > > > > > > > > It's quite useful but I think it's vendor-specific statistics. > > > These are vdpa device specific to Linux. > > > And are very generic of the VQ for all device types. > > > > The question is what happens if the parent doesn't implement those > > statistics. > > > > Are you suggesting that some parents may support reporting a subset of > the fields and we should maybe indicate what is actually reported? It's an open question. E.g. do we want a stable API for those statistics counters? 
If yes, it's better not to let the parents report a subset, but then it forces the exact same counters to be supported by other vendors. If not, it should be fine. For any case, I think it's simpler to start with "vendor-stats" and we can switch it to a standard way if it was agreed by every vendor. > > > > > > > > I wonder if it's better > > > > to use "vendor" prefix in the protocol, then we use this instead: > > > > > > > > vdpa dev vendor-stats show vdpa-a > > > > > > > Maybe. Let's evaluate if the stats of this patch are generic enough or not. > > > > > > > Or if we want to make it standard is exposing virtio index better? > > > > > > > > qid 0 last_avail_idx X avail_idx Y last_used_idx M used_idx N > > > > > > > I did consider this option a while back. Showing indices is equally useful. > > > I think we should show that as vq info, along with other VQ attributes > > > (addr, len). > > > > That may work but it looks to me the received_desc/completed_desc is > > also per vq. > > > > Another question is whether it is more useful to use buffers instead of > > descriptors? E.g. how indirect descriptors are counted. > > > > I think it's descriptors so indirect descriptors are counted once but I > need to verify that. > > > > $ vdpa dev show vq > > > > > > But showing indices is not statistics so much as the current state of the > > > queue. > > > For example, rollover of the indices won't cover the absolute number of > > > descriptors processed for the queue. > > > And even if we make them u64 (not good), a non-developer end user needs to > > > keep using a side calculator to count the delta. > > > > How about exposing those raw indices via the protocol and letting the > > vdpa tool calculate for us? > > > > counters are 16 bit per the virtio spec so I don't know how you could > handle rollover without losing information. So at most 1 rollover I guess. So it's not hard by comparing the indices. E.g. if last_avail_idx (device avail idx) > avail_idx, we know there's one? 
Thanks > > > Thanks > > > > > > > > So I think useful q indices belong to q info. > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization