[PATCH] vdpa: register vhost-vdpa dev class
From: Zhang Min

Some applications like kata-containers need to acquire MAJOR/MINOR/DEVNAME
for devInfo [1], so register a vhost-vdpa dev class to expose uevents.

1. https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/device/config/config.go

Signed-off-by: Zhang Min
Signed-off-by: Yi Wang
---
 drivers/vhost/vdpa.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index fb41db3da611..90fbad93e7a2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1012,6 +1012,7 @@ static void vhost_vdpa_release_dev(struct device *device)
 	kfree(v);
 }
 
+static struct class *vhost_vdpa_class;
 static int vhost_vdpa_probe(struct vdpa_device *vdpa)
 {
 	const struct vdpa_config_ops *ops = vdpa->config;
@@ -1040,6 +1041,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
 	v->dev.release = vhost_vdpa_release_dev;
 	v->dev.parent = &vdpa->dev;
 	v->dev.devt = MKDEV(MAJOR(vhost_vdpa_major), minor);
+	v->dev.class = vhost_vdpa_class;
 	v->vqs = kmalloc_array(v->nvqs, sizeof(struct vhost_virtqueue),
 			       GFP_KERNEL);
 	if (!v->vqs) {
@@ -1097,6 +1099,14 @@ static int __init vhost_vdpa_init(void)
 {
 	int r;
 
+	vhost_vdpa_class = class_create(THIS_MODULE, "vhost-vdpa");
+	if (IS_ERR(vhost_vdpa_class)) {
+		r = PTR_ERR(vhost_vdpa_class);
+		pr_warn("vhost vdpa class create error %d, maybe mod reinserted\n", r);
+		vhost_vdpa_class = NULL;
+		return r;
+	}
+
 	r = alloc_chrdev_region(&vhost_vdpa_major, 0, VHOST_VDPA_DEV_MAX,
 				"vhost-vdpa");
 	if (r)
@@ -,6 +1121,7 @@ static int __init vhost_vdpa_init(void)
 err_vdpa_register_driver:
 	unregister_chrdev_region(vhost_vdpa_major, VHOST_VDPA_DEV_MAX);
 err_alloc_chrdev:
+	class_destroy(vhost_vdpa_class);
 	return r;
 }
 module_init(vhost_vdpa_init);
@@ -1118,6 +1129,7 @@ module_init(vhost_vdpa_init);
 static void __exit vhost_vdpa_exit(void)
 {
 	vdpa_unregister_driver(&vhost_vdpa_driver);
+	class_destroy(vhost_vdpa_class);
 	unregister_chrdev_region(vhost_vdpa_major, VHOST_VDPA_DEV_MAX);
 }
 module_exit(vhost_vdpa_exit);
-- 
2.27.0
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio/virtio_pci_legacy_dev: ensure the correct return value
On Wed, Dec 22, 2021 at 7:21 PM Peng Hao wrote:
>
> When pci_iomap() returns NULL, the return value is zero instead of an error.
>
> Signed-off-by: Peng Hao
> ---

Acked-by: Jason Wang

> drivers/virtio/virtio_pci_legacy_dev.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_legacy_dev.c b/drivers/virtio/virtio_pci_legacy_dev.c
> index 9b97680dd02b..677d1f68bc9b 100644
> --- a/drivers/virtio/virtio_pci_legacy_dev.c
> +++ b/drivers/virtio/virtio_pci_legacy_dev.c
> @@ -45,8 +45,10 @@ int vp_legacy_probe(struct virtio_pci_legacy_device *ldev)
> 		return rc;
>
> 	ldev->ioaddr = pci_iomap(pci_dev, 0, 0);
> -	if (!ldev->ioaddr)
> +	if (!ldev->ioaddr) {
> +		rc = -EIO;
> 		goto err_iomap;
> +	}
>
> 	ldev->isr = ldev->ioaddr + VIRTIO_PCI_ISR;
>
> --
> 2.27.0
[PATCH] kthread: Generalize pf_io_worker so it can point to struct kthread
The point of using set_child_tid to hold the kthread pointer was that it already did what was necessary. There are now restrictions on when set_child_tid can be initialized and when it can be used in schedule_tail, which indicates that continuing to use set_child_tid to hold the kthread pointer is a bad idea.

Instead of continuing to use the set_child_tid field of task_struct, generalize the pf_io_worker field of task_struct and use it to hold the kthread pointer.

Rename pf_io_worker (which is a void * pointer) to worker_private so it can be used to store a kthread's struct kthread pointer. Update the kthread code to store the kthread pointer in the worker_private field. Remove the places where set_child_tid had to be dealt with carefully because kthreads also used it.

Link: https://lkml.kernel.org/r/CAHk-=wgtfaa9sbvyg0gr1tqpmc17-nycs0gqkayg1bghh1u...@mail.gmail.com
Suggested-by: Linus Torvalds
Signed-off-by: "Eric W. Biederman"
---

I looked again and the vhost_worker changes do not generalize pf_io_worker, and as pf_io_worker is already a void * it is easy to generalize. So I just did that.

Unless someone spots a problem I will add this to my signal-for-v5.17 branch in linux-next, as this seems much less error prone than using set_child_tid.
 fs/io-wq.c            |  6 +++---
 fs/io-wq.h            |  2 +-
 include/linux/sched.h |  4 ++--
 kernel/fork.c         |  8 +---
 kernel/kthread.c      | 14 +-
 kernel/sched/core.c   |  2 +-
 6 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 88202de519f6..e4fc7384b40c 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -657,7 +657,7 @@ static int io_wqe_worker(void *data)
  */
 void io_wq_worker_running(struct task_struct *tsk)
 {
-	struct io_worker *worker = tsk->pf_io_worker;
+	struct io_worker *worker = tsk->worker_private;
 
 	if (!worker)
 		return;
@@ -675,7 +675,7 @@ void io_wq_worker_running(struct task_struct *tsk)
  */
 void io_wq_worker_sleeping(struct task_struct *tsk)
 {
-	struct io_worker *worker = tsk->pf_io_worker;
+	struct io_worker *worker = tsk->worker_private;
 
 	if (!worker)
 		return;
@@ -694,7 +694,7 @@ void io_wq_worker_sleeping(struct task_struct *tsk)
 static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
 			       struct task_struct *tsk)
 {
-	tsk->pf_io_worker = worker;
+	tsk->worker_private = worker;
 	worker->task = tsk;
 	set_cpus_allowed_ptr(tsk, wqe->cpu_mask);
 	tsk->flags |= PF_NO_SETAFFINITY;
diff --git a/fs/io-wq.h b/fs/io-wq.h
index 41bf37674a49..c7c23947cbcd 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -200,6 +200,6 @@ static inline void io_wq_worker_running(struct task_struct *tsk)
 static inline bool io_wq_current_is_worker(void)
 {
 	return in_task() && (current->flags & PF_IO_WORKER) &&
-		current->pf_io_worker;
+		current->worker_private;
 }
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..52f2fdffa3ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -987,8 +987,8 @@ struct task_struct {
 	/* CLONE_CHILD_CLEARTID: */
 	int __user		*clear_child_tid;
 
-	/* PF_IO_WORKER */
-	void			*pf_io_worker;
+	/* PF_KTHREAD | PF_IO_WORKER */
+	void			*worker_private;
 
 	u64			utime;
 	u64			stime;
diff --git a/kernel/fork.c b/kernel/fork.c
index 0816be1bb044..6f0293cb29c9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -950,7 +950,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->splice_pipe = NULL;
 	tsk->task_frag.page = NULL;
 	tsk->wake_q.next = NULL;
-	tsk->pf_io_worker = NULL;
+	tsk->worker_private = NULL;
 
 	account_kernel_stack(tsk, 1);
 
@@ -2032,12 +2032,6 @@ static __latent_entropy struct task_struct *copy_process(
 		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
 	}
 
-	/*
-	 * This _must_ happen before we call free_task(), i.e. before we jump
-	 * to any of the bad_fork_* labels. This is to avoid freeing
-	 * p->set_child_tid which is (ab)used as a kthread's data pointer for
-	 * kernel threads (PF_KTHREAD).
-	 */
 	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? args->child_tid : NULL;
 	/*
 	 * Clear TID on mm_release()?
diff --git a/kernel/kthread.c b/kernel/kthread.c
index c14707d15341..261a3c3b9c6c 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -72,7 +72,7 @@ enum KTHREAD_BITS {
 static inline struct kthread *to_kthread(struct task_struct *k)
 {
 	WARN_ON(!(k->flags & PF_KTHREAD));
-	return (__force void *)k->set_child_tid;
+	return k->worker_private;
 }
Re: [PATCH 09/10] kthread: Ensure struct kthread is present for all kthreads
Added a couple of people from the vhost thread.

Linus Torvalds writes:
> On Wed, Dec 22, 2021 at 3:25 PM Eric W. Biederman wrote:
>>
>> Solve this by skipping the put_user for all kthreads.
>
> Ugh.
>
> While this fixes the problem, could we please just not mis-use that
> 'set_child_tid' as that kthread pointer any more?
>
> It was always kind of hacky. I think a new pointer with the proper
> 'struct kthread *' type would be an improvement.
>
> One of the "arguments" in the comment for re-using that set_child_tid
> pointer was that 'fork()' used to not wrongly copy it, but your patch
> literally now does that "allocate new kthread struct" at fork-time, so
> that argument is actually bogus now.

I agree.

I think I saw in the recent vhost patches that were generalizing create_io_thread that the pf_io_worker field of struct task_struct was being generalized as well. If so I think it makes sense just to take that approach. Just build some basic infrastructure that can be used for io_workers, vhost_workers, and kthreads.

Eric
Re: [PATCH v5 10/13] vdpa: Support reporting max device virtqueues
On 12/22/2021 6:27 PM, Jason Wang wrote:

On Thu, Dec 23, 2021 at 3:25 AM Si-Wei Liu wrote:

On 12/21/2021 11:54 PM, Eli Cohen wrote:

On Tue, Dec 21, 2021 at 11:29:36PM -0800, Si-Wei Liu wrote:

On 12/21/2021 11:10 PM, Eli Cohen wrote:

On Wed, Dec 22, 2021 at 09:03:37AM +0200, Parav Pandit wrote:

From: Eli Cohen
Sent: Wednesday, December 22, 2021 12:17 PM

--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -507,6 +507,9 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
 		err = -EMSGSIZE;
 		goto msg_err;
 	}
+	if (nla_put_u16(msg, VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,
+			mdev->max_supported_vqs))

It still needs a default value when the field is not explicitly filled in by the driver.

Unlikely. This can be an optional field to help the user decide the device max limit. When max_supported_vqs is set to zero, vdpa should omit exposing it to user space.

This is not about what you expose to userspace. It's about the number of VQs you want to create for a specific instance of vdpa.

This value on mgmtdev indicates that a given mgmt device supports creating a vdpa device that can have a maximum of N VQs. The user will choose to create a device with VQs <= N depending on its vcpu count and other factors.

You're right. So each vendor needs to put their value there.

If I understand Parav correctly, he was suggesting not to expose VDPA_ATTR_DEV_MGMTDEV_MAX_VQS to userspace if seeing (max_supported_vqs == 0) from the driver.

I can see the reasoning, but maybe we should leave it as zero, which means it was not reported. The user will then need to guess. I believe other vendors will follow with an update of this to a real value.

Unless you place a check in the vdpa core to enforce it on vdpa creation, otherwise it's very likely to get ignored by other vendors. But meanwhile, I do wonder how users tell apart a multiqueue-supporting parent from a single-queue mgmtdev without the aid of this field. I hope the answer won't be to create a vdpa instance to try.

Do you see a scenario where an admin decides not to instantiate vdpa just because it does not support MQ?

Yes, there is. If the hardware doesn't support MQ, the provisioning tool in the mgmt software will need to fall back to a software vhost backend with mq=on. At the time the tool is checking, it doesn't run with root privilege.

And if the management device reports that it does support MQ, there's still no guarantee you'll end up with a MQ net device.

I'm not sure I follow. Do you mean it may be up to the guest feature negotiation? But the device itself is still MQ capable, isn't it?

I think we need to clarify the "device" here. For compatibility reasons, there could be a case where mgmt doesn't expect a MQ-capable vdpa device. So in this case, even if the parent is MQ capable, the vdpa device isn't.

Right. The mgmt software is not necessarily libvirt. Perhaps I should be explicit and say the mgmt software we're building would definitely create a MQ vdpa device on a MQ-capable parent.

-Siwei

Thanks

Thanks,
-Siwei

-Siwei

This is what is exposed to the user to decide the upper bound.

There has been some talk/patches of an rdma virtio device. I anticipate such a device to support more than 64K queues by the nature of rdma. It is better to keep max_supported_vqs as u32.

Why not add it when we have it?

Sure, with that approach we will end up adding two fields (the current u16 and later another u32) due to the smaller bit width of the current one. Either way is fine. Michael was suggesting a similar higher bit-width in other patches, so bringing it up here for this field to see how he sees it.

I can use u32 then.
Re: [PATCH v5 10/13] vdpa: Support reporting max device virtqueues
On Thu, Dec 23, 2021 at 3:25 AM Si-Wei Liu wrote:
>
>
> On 12/21/2021 11:54 PM, Eli Cohen wrote:
> > On Tue, Dec 21, 2021 at 11:29:36PM -0800, Si-Wei Liu wrote:
> >>
> >> On 12/21/2021 11:10 PM, Eli Cohen wrote:
> >>> On Wed, Dec 22, 2021 at 09:03:37AM +0200, Parav Pandit wrote:
> > From: Eli Cohen
> > Sent: Wednesday, December 22, 2021 12:17 PM
> >
> > --- a/drivers/vdpa/vdpa.c
> > +++ b/drivers/vdpa/vdpa.c
> > @@ -507,6 +507,9 @@ static int vdpa_mgmtdev_fill(const struct
> >>> vdpa_mgmt_dev *mdev, struct sk_buff *m
> > 	err = -EMSGSIZE;
> > 	goto msg_err;
> > 	}
> > +	if (nla_put_u16(msg, VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,
> > +			mdev->max_supported_vqs))
> >>> It still needs a default value when the field is not explicitly
> >>> filled in by the driver.
> >>>
> >> Unlikely. This can be optional field to help user decide device max
> >> limit.
> >> When max_supported_vqs is set to zero. Vdpa should omit exposing it to
> >> user
> > space.
> > This is not about what you expose to userspace. It's about the number
> > of VQs
> > you want to create for a specific instance of vdpa.
> This value on mgmtdev indicates that a given mgmt device supports
> creating a vdpa device who can have maximum VQs of N.
> User will choose to create VQ with VQs <= N depending on its vcpu and
> other factors.
> >>> You're right.
> >>> So each vendor needs to put there their value.
> >> If I understand Parav correctly, he was suggesting not to expose
> >> VDPA_ATTR_DEV_MGMTDEV_MAX_VQS to userspace if seeing (max_supported_vqs ==
> >> 0) from the driver.
> > I can see the reasoning, but maybe we should leave it as zero which
> > means it was not reported. The user will then need to guess. I believe
> > other vendors will follow with an update so this to a real value.
> Unless you place a check in the vdpa core to enforce it on vdpa
> creation, otherwise it's very likely to get ignored by other vendors.
>
> >> But meanwhile, I do wonder how users tell apart multiqueue supporting
> >> parent
> >> from the single queue mgmtdev without getting the aid from this field. I
> >> hope the answer won't be to create a vdpa instance to try.
> >>
> > Do you see a scenario that an admin decides to not instantiate vdpa just
> > because it does not support MQ?
> Yes, there is. If the hardware doesn't support MQ, the provisioning tool
> in the mgmt software will need to fallback to software vhost backend
> with mq=on. At the time the tool is checking out, it doesn't run with
> root privilege.
>
> And it the management device reports it does support, there's still no
> guarantee you'll end up with a MQ net device.
> I'm not sure I follow. Do you mean it may be up to the guest feature
> negotiation? But the device itself is still MQ capable, isn't it?

I think we need to clarify the "device" here. For compatibility
reasons, there could be a case that mgmt doesn't expect a mq capable
vdpa device. So in this case, even if the parent is MQ capable, the
vdpa isn't.

Thanks

>
> Thanks,
> -Siwei
>
> >> -Siwei
> >>
> This is what is exposed to the user to decide the upper bound.
> >> There has been some talk/patches of rdma virtio device.
> >> I anticipate such device to support more than 64K queues by nature of
> >> rdma.
> >> It is better to keep max_supported_vqs as u32.
> > Why not add it when we have it?
> Sure, with that approach we will end up adding two fields (current u16
> and later another u32) due to smaller bit width of current one.
> Either way is fine. Michael was suggesting similar higher bit-width in
> other patches, so bringing up here for this field on how he sees it.
> >>> I can use u32 then.
Re: [PATCH v6 10/13] vdpa: Support reporting max device virtqueues
On 12/22/2021 6:20 AM, Eli Cohen wrote:

Add max_supported_vqs field to struct vdpa_mgmt_dev. Upstream drivers need to fill this value according to the device capabilities.

This value is reported back in a netlink message when showing management devices.

Examples:

$ vdpa mgmtdev show
auxiliary/mlx5_core.sf.1:
  supported_classes net max_supported_vqs 257

$ vdpa -j mgmtdev show
{"mgmtdev":{"auxiliary/mlx5_core.sf.1":{"supported_classes":["net"],"max_supported_vqs":257}}}

$ vdpa -jp mgmtdev show
{
    "mgmtdev": {
        "auxiliary/mlx5_core.sf.1": {
            "supported_classes": [ "net" ],
            "max_supported_vqs": 257
        }
    }
}

Signed-off-by: Eli Cohen
---
 drivers/vdpa/vdpa.c       | 3 +++
 include/linux/vdpa.h      | 1 +
 include/uapi/linux/vdpa.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 462da4968270..ea22aa864dd5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -514,6 +514,9 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
 		err = -EMSGSIZE;
 		goto msg_err;
 	}
+	if (nla_put_u32(msg, VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,
+			mdev->max_supported_vqs))
+		goto msg_err;
 
 	genlmsg_end(msg, hdr);
 	return 0;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 47e2b780e4bc..bd75af4feaf7 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -448,6 +448,7 @@ struct vdpa_mgmt_dev {
 	const struct virtio_device_id *id_table;
 	u64 config_attr_mask;
 	struct list_head list;
+	u32 max_supported_vqs;
 };
 
 int vdpa_mgmtdev_register(struct vdpa_mgmt_dev *mdev);
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
index db3738ef3beb..995257c6bf2a 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -44,6 +44,7 @@ enum vdpa_attr {
 	VDPA_ATTR_DEV_NET_CFG_MTU,		/* u16 */
 
 	VDPA_ATTR_DEV_NEGOTIATED_FEATURES,	/* u64 */
+	VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,		/* u16 */

s/u16/u32/

-Siwei

 	/* new attributes must be added above here */
 	VDPA_ATTR_MAX,
};
Re: [PATCH v6 03/13] vdpa: Sync calls set/get config/status with cf_mutex
On 12/22/2021 6:20 AM, Eli Cohen wrote:

Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations.

Need to protect vdpa_reset(), which is essentially vdpa_set_status(0).

-Siwei

Signed-off-by: Eli Cohen
---
 drivers/vdpa/vdpa.c          | 19 +++
 drivers/vhost/vdpa.c         |  7 +++
 drivers/virtio/virtio_vdpa.c |  3 +--
 include/linux/vdpa.h         |  3 +++
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 42d71d60d5dc..5134c83c4a22 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -21,6 +21,25 @@ static LIST_HEAD(mdev_head);
 static DEFINE_MUTEX(vdpa_dev_mutex);
 static DEFINE_IDA(vdpa_index_ida);
 
+u8 vdpa_get_status(struct vdpa_device *vdev)
+{
+	u8 status;
+
+	mutex_lock(&vdev->cf_mutex);
+	status = vdev->config->get_status(vdev);
+	mutex_unlock(&vdev->cf_mutex);
+	return status;
+}
+EXPORT_SYMBOL(vdpa_get_status);
+
+void vdpa_set_status(struct vdpa_device *vdev, u8 status)
+{
+	mutex_lock(&vdev->cf_mutex);
+	vdev->config->set_status(vdev, status);
+	mutex_unlock(&vdev->cf_mutex);
+}
+EXPORT_SYMBOL(vdpa_set_status);
+
 static struct genl_family vdpa_nl_family;
 
 static int vdpa_dev_probe(struct device *d)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ebaa373e9b82..d9d499465e2e 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -142,10 +142,9 @@ static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
 static long vhost_vdpa_get_status(struct vhost_vdpa *v, u8 __user *statusp)
 {
 	struct vdpa_device *vdpa = v->vdpa;
-	const struct vdpa_config_ops *ops = vdpa->config;
 	u8 status;
 
-	status = ops->get_status(vdpa);
+	status = vdpa_get_status(vdpa);
 
 	if (copy_to_user(statusp, &status, sizeof(status)))
 		return -EFAULT;
@@ -164,7 +163,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 	if (copy_from_user(&status, statusp, sizeof(status)))
 		return -EFAULT;
 
-	status_old = ops->get_status(vdpa);
+	status_old = vdpa_get_status(vdpa);
 
 	/*
 	 * Userspace shouldn't remove status bits unless reset the
@@ -182,7 +181,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 		if (ret)
 			return ret;
 	} else
-		ops->set_status(vdpa, status);
+		vdpa_set_status(vdpa, status);
 
 	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
 		for (i = 0; i < nvqs; i++)
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index a84b04ba3195..76504559bc25 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -91,9 +91,8 @@ static u8 virtio_vdpa_get_status(struct virtio_device *vdev)
 static void virtio_vdpa_set_status(struct virtio_device *vdev, u8 status)
 {
 	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
-	const struct vdpa_config_ops *ops = vdpa->config;
 
-	return ops->set_status(vdpa, status);
+	return vdpa_set_status(vdpa, status);
 }
 
 static void virtio_vdpa_reset(struct virtio_device *vdev)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 9cc4291a79b3..ae047fae2603 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -408,6 +408,9 @@ void vdpa_get_config(struct vdpa_device *vdev, unsigned int offset,
 		     void *buf, unsigned int len);
 void vdpa_set_config(struct vdpa_device *dev, unsigned int offset,
 		     const void *buf, unsigned int length);
+u8 vdpa_get_status(struct vdpa_device *vdev);
+void vdpa_set_status(struct vdpa_device *vdev, u8 status);
+
 /**
  * struct vdpa_mgmtdev_ops - vdpa device ops
  * @dev_add: Add a vdpa device using alloc and register
Re: [PATCH V6 01/10] Use copy_process in vhost layer
On Wed, Dec 22, 2021 at 12:24:08PM -0600, Eric W. Biederman wrote:
> Mike Christie writes:
>
> > On 12/21/21 6:20 PM, Eric W. Biederman wrote:
> >> michael.chris...@oracle.com writes:
> >>
> >>> On 12/17/21 1:26 PM, Eric W. Biederman wrote:
> Mike Christie writes:
>
> > The following patches made over Linus's tree, allow the vhost layer to
> > do
> > a copy_process on the thread that does the VHOST_SET_OWNER ioctl like
> > how
> > io_uring does a copy_process against its userspace app. This allows the
> > vhost layer's worker threads to inherit cgroups, namespaces, address
> > space, etc and this worker thread will also be accounted for against
> > that
> > owner/parent process's RLIMIT_NPROC limit.
> >
> > If you are not familiar with qemu and vhost here is more detailed
> > problem description:
> >
> > Qemu will create vhost devices in the kernel which perform network,
> > SCSI,
> > etc IO and management operations from worker threads created by the
> > kthread API. Because the kthread API does a copy_process on the kthreadd
> > thread, the vhost layer has to use kthread_use_mm to access the Qemu
> > thread's memory and cgroup_attach_task_all to add itself to the Qemu
> > thread's cgroups.
> >
> > The problem with this approach is that we then have to add new
> > functions/
> > args/functionality for every thing we want to inherit. I started doing
> > that here:
> >
> > https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1233__;!!ACWV5N9M2RV99hQ!eIaEe9V8mCgGU6vyvaWTKGi3Zlnz0rgk5Y-0nsBXRbsuVZsM8lYfHr8I8GRuoLYPYrOB$
> >
> >
> > for the RLIMIT_NPROC check, but it seems it might be easier to just
> > inherit everything from the beginning, becuase I'd need to do something
> > like that patch several times.
> I read through the code and I don't see why you want to make these
> almost threads of a process not actually threads of that process
> (like the io_uring threads are).
>
> As a separate process there are many things that will continue to be
> disjoint. If the thread changes cgroups for example your new process
> won't follow.
>
> If you want them to be actually processes with an lifetime independent
> of the creating process I expect you want to reparent them to the local
> init process. Just so they don't confuse the process tree. Plus init
> processes know how to handle unexpected children.
>
> What are the semantics you are aiming for?
>
> >>>
> >>> Hi Eric,
> >>>
> >>> Right now, for vhost we need the thread being created:
> >>>
> >>> 1. added to the caller's cgroup.
> >>> 2. to share the mm struct with the caller.
> >>> 3. to be accounted for under the caller's nproc rlimit value.
> >>>
> >>> For 1 and 2, we have cgroup_attach_task_all and get_task_mm
> >>> already.
> >>>
> >>> This patchset started with me just trying to handle #3 by modifying
> >>> kthreads
> >>> like here:
> >>>
> >>> https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1234__;!!ACWV5N9M2RV99hQ!bvqZOWy7TxQyq18L4I_a5MxP2OX0V2imOYEJrWsc-LkyVTI_zpFzxyV2pM_dgYywwH2y$
> >>>
> >>>
> >>> So we can use kthreads and the existing helpers and add:
> >>>
> >>> A. a ucounts version of the above patches in the link
> >>>
> >>> or
> >>>
> >>> B. a helper that does something like copy_process's use of
> >>> is_ucounts_overlimit and vhost can call that.
> >>>
> >>> instead of this patchset.
> >>
> >> I don't fundamentally hate the patchset. I do have concerns about
> >> the completely broken patch.
> >>
> >> With respect this patchset my gut says decide. Are you a thread of the
> >> process (just use create_io_worker) are you a separate process forked
> >> from the caller (use a cousin of create_io_worker but don't touch
> >> create_io_worker). I think being a process vs being a thread is such a
> >> fundamental difference we don't want to mix the helpers.
> >>
> >>> Before we even get to the next section below, do you consider items 1 - 3
> >>> something we need an API based on copy_process for?
> >>
> >> I think 3 staying in the callers nproc strongly suggests you want to
> >> reuse copy_process. Which gets back to my question do you want
> >> a thread or do you want a process.
> >>
> >>
> >> For me a key detail is what is the lifetime of the vhost device?
> >>
> >> Does the vhost go away when the caller goes away?
> >
> > Yes. When the caller, normally qemu in our case, that created the worker
> > thread exits, then we free the vhost devices and stop and free the worker
> > threads we are creating in this patchset.
> >
> > However, I'm not sure if it makes a difference to you, but we also have
> > second
> > way to free a vhost device and its worker thread. The user can run a command
> > that instructs the the qemu process to free the vhost device and its worker
> > thread.
>
> I
Re: [PATCH v5 10/13] vdpa: Support reporting max device virtqueues
On 12/21/2021 11:54 PM, Eli Cohen wrote:

On Tue, Dec 21, 2021 at 11:29:36PM -0800, Si-Wei Liu wrote:

On 12/21/2021 11:10 PM, Eli Cohen wrote:

On Wed, Dec 22, 2021 at 09:03:37AM +0200, Parav Pandit wrote:

From: Eli Cohen
Sent: Wednesday, December 22, 2021 12:17 PM

--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -507,6 +507,9 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
 		err = -EMSGSIZE;
 		goto msg_err;
 	}
+	if (nla_put_u16(msg, VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,
+			mdev->max_supported_vqs))

It still needs a default value when the field is not explicitly filled in by the driver.

Unlikely. This can be an optional field to help the user decide the device max limit. When max_supported_vqs is set to zero, vdpa should omit exposing it to user space.

This is not about what you expose to userspace. It's about the number of VQs you want to create for a specific instance of vdpa.

This value on mgmtdev indicates that a given mgmt device supports creating a vdpa device that can have a maximum of N VQs. The user will choose to create a device with VQs <= N depending on its vcpu count and other factors.

You're right. So each vendor needs to put their value there.

If I understand Parav correctly, he was suggesting not to expose VDPA_ATTR_DEV_MGMTDEV_MAX_VQS to userspace if seeing (max_supported_vqs == 0) from the driver.

I can see the reasoning, but maybe we should leave it as zero, which means it was not reported. The user will then need to guess. I believe other vendors will follow with an update of this to a real value.

Unless you place a check in the vdpa core to enforce it on vdpa creation, otherwise it's very likely to get ignored by other vendors. But meanwhile, I do wonder how users tell apart a multiqueue-supporting parent from a single-queue mgmtdev without the aid of this field. I hope the answer won't be to create a vdpa instance to try.

Do you see a scenario where an admin decides not to instantiate vdpa just because it does not support MQ?

Yes, there is. If the hardware doesn't support MQ, the provisioning tool in the mgmt software will need to fall back to a software vhost backend with mq=on. At the time the tool is checking, it doesn't run with root privilege.

And if the management device reports that it does support MQ, there's still no guarantee you'll end up with a MQ net device.

I'm not sure I follow. Do you mean it may be up to the guest feature negotiation? But the device itself is still MQ capable, isn't it?

Thanks,
-Siwei

-Siwei

This is what is exposed to the user to decide the upper bound.

There has been some talk/patches of an rdma virtio device. I anticipate such a device to support more than 64K queues by the nature of rdma. It is better to keep max_supported_vqs as u32.

Why not add it when we have it?

Sure, with that approach we will end up adding two fields (the current u16 and later another u32) due to the smaller bit width of the current one. Either way is fine. Michael was suggesting a similar higher bit-width in other patches, so bringing it up here for this field to see how he sees it.

I can use u32 then.
Re: [PATCH V6 01/10] Use copy_process in vhost layer
Mike Christie writes: > On 12/21/21 6:20 PM, Eric W. Biederman wrote: >> michael.chris...@oracle.com writes: >> >>> On 12/17/21 1:26 PM, Eric W. Biederman wrote: Mike Christie writes: > The following patches made over Linus's tree, allow the vhost layer to do > a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how > io_uring does a copy_process against its userspace app. This allows the > vhost layer's worker threads to inherit cgroups, namespaces, address > space, etc and this worker thread will also be accounted for against that > owner/parent process's RLIMIT_NPROC limit. > > If you are not familiar with qemu and vhost here is more detailed > problem description: > > Qemu will create vhost devices in the kernel which perform network, SCSI, > etc IO and management operations from worker threads created by the > kthread API. Because the kthread API does a copy_process on the kthreadd > thread, the vhost layer has to use kthread_use_mm to access the Qemu > thread's memory and cgroup_attach_task_all to add itself to the Qemu > thread's cgroups. > > The problem with this approach is that we then have to add new functions/ > args/functionality for every thing we want to inherit. I started doing > that here: > > https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1233__;!!ACWV5N9M2RV99hQ!eIaEe9V8mCgGU6vyvaWTKGi3Zlnz0rgk5Y-0nsBXRbsuVZsM8lYfHr8I8GRuoLYPYrOB$ > > > for the RLIMIT_NPROC check, but it seems it might be easier to just > inherit everything from the beginning, becuase I'd need to do something > like that patch several times. I read through the code and I don't see why you want to make these almost threads of a process not actually threads of that process (like the io_uring threads are). As a separate process there are many things that will continue to be disjoint. If the thread changes cgroups for example your new process won't follow. 
If you want them to be actually processes with a lifetime independent of the creating process I expect you want to reparent them to the local init process. Just so they don't confuse the process tree. Plus init processes know how to handle unexpected children. What are the semantics you are aiming for? >>> >>> Hi Eric, >>> >>> Right now, for vhost we need the thread being created: >>> >>> 1. added to the caller's cgroup. >>> 2. to share the mm struct with the caller. >>> 3. to be accounted for under the caller's nproc rlimit value. >>> >>> For 1 and 2, we have cgroup_attach_task_all and get_task_mm >>> already. >>> >>> This patchset started with me just trying to handle #3 by modifying kthreads >>> like here: >>> >>> https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1234__;!!ACWV5N9M2RV99hQ!bvqZOWy7TxQyq18L4I_a5MxP2OX0V2imOYEJrWsc-LkyVTI_zpFzxyV2pM_dgYywwH2y$ >>> >>> >>> So we can use kthreads and the existing helpers and add: >>> >>> A. a ucounts version of the above patches in the link >>> >>> or >>> >>> B. a helper that does something like copy_process's use of >>> is_ucounts_overlimit and vhost can call that. >>> >>> instead of this patchset. >> >> I don't fundamentally hate the patchset. I do have concerns about >> the completely broken patch. >> >> With respect to this patchset my gut says decide. Are you a thread of the >> process (just use create_io_worker) or are you a separate process forked >> from the caller (use a cousin of create_io_worker but don't touch >> create_io_worker). I think being a process vs being a thread is such a >> fundamental difference we don't want to mix the helpers. >> >>> Before we even get to the next section below, do you consider items 1 - 3 >>> something we need an API based on copy_process for? >> >> I think 3 staying in the caller's nproc strongly suggests you want to >> reuse copy_process. Which gets back to my question: do you want >> a thread or do you want a process. 
>> >> >> For me a key detail is what is the lifetime of the vhost device? >> >> Does the vhost go away when the caller goes away? > Yes. When the caller, normally qemu in our case, that created the worker > thread exits, then we free the vhost devices and stop and free the worker > threads we are creating in this patchset. > > However, I'm not sure if it makes a difference to you, but we also have a second > way to free a vhost device and its worker thread. The user can run a command > that instructs the qemu process to free the vhost device and its worker > thread. I dug a little deeper to understand how this works, and it appears to be a standard file-descriptor-based API. The last close of the file descriptor is what causes the vhost_dev_cleanup to be called, which shuts down the thread. This means that in rare cases the file descriptor can be passed to another process and be held open there, even after
Re: [PATCH V6 01/10] Use copy_process in vhost layer
On 12/21/21 6:20 PM, Eric W. Biederman wrote: > michael.chris...@oracle.com writes: > >> On 12/17/21 1:26 PM, Eric W. Biederman wrote: >>> Mike Christie writes: >>> The following patches, made over Linus's tree, allow the vhost layer to do a copy_process on the thread that does the VHOST_SET_OWNER ioctl, like how io_uring does a copy_process against its userspace app. This allows the vhost layer's worker threads to inherit cgroups, namespaces, address space, etc, and this worker thread will also be accounted for against that owner/parent process's RLIMIT_NPROC limit. If you are not familiar with qemu and vhost, here is a more detailed problem description: Qemu will create vhost devices in the kernel which perform network, SCSI, etc IO and management operations from worker threads created by the kthread API. Because the kthread API does a copy_process on the kthreadd thread, the vhost layer has to use kthread_use_mm to access the Qemu thread's memory and cgroup_attach_task_all to add itself to the Qemu thread's cgroups. The problem with this approach is that we then have to add new functions/ args/functionality for everything we want to inherit. I started doing that here: https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1233__;!!ACWV5N9M2RV99hQ!eIaEe9V8mCgGU6vyvaWTKGi3Zlnz0rgk5Y-0nsBXRbsuVZsM8lYfHr8I8GRuoLYPYrOB$ for the RLIMIT_NPROC check, but it seems it might be easier to just inherit everything from the beginning, because I'd need to do something like that patch several times. >>> >>> I read through the code and I don't see why you want to make these >>> almost-threads of a process not actually threads of that process >>> (like the io_uring threads are). >>> >>> As a separate process there are many things that will continue to be >>> disjoint. If the thread changes cgroups, for example, your new process >>> won't follow. 
>>> >>> If you want them to be actually processes with a lifetime independent >>> of the creating process I expect you want to reparent them to the local >>> init process. Just so they don't confuse the process tree. Plus init >>> processes know how to handle unexpected children. >>> >>> What are the semantics you are aiming for? >>> >> >> Hi Eric, >> >> Right now, for vhost we need the thread being created: >> >> 1. added to the caller's cgroup. >> 2. to share the mm struct with the caller. >> 3. to be accounted for under the caller's nproc rlimit value. >> >> For 1 and 2, we have cgroup_attach_task_all and get_task_mm >> already. >> >> This patchset started with me just trying to handle #3 by modifying kthreads >> like here: >> >> https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1234__;!!ACWV5N9M2RV99hQ!bvqZOWy7TxQyq18L4I_a5MxP2OX0V2imOYEJrWsc-LkyVTI_zpFzxyV2pM_dgYywwH2y$ >> >> >> So we can use kthreads and the existing helpers and add: >> >> A. a ucounts version of the above patches in the link >> >> or >> >> B. a helper that does something like copy_process's use of >> is_ucounts_overlimit and vhost can call that. >> >> instead of this patchset. > > I don't fundamentally hate the patchset. I do have concerns about > the completely broken patch. > > With respect to this patchset my gut says decide. Are you a thread of the > process (just use create_io_worker) or are you a separate process forked > from the caller (use a cousin of create_io_worker but don't touch > create_io_worker). I think being a process vs being a thread is such a > fundamental difference we don't want to mix the helpers. > >> Before we even get to the next section below, do you consider items 1 - 3 >> something we need an API based on copy_process for? > > I think 3 staying in the caller's nproc strongly suggests you want to > reuse copy_process. Which gets back to my question: do you want > a thread or do you want a process. 
> > > For me a key detail is what is the lifetime of the vhost device? > > Does the vhost go away when the caller goes away? Yes. When the caller, normally qemu in our case, that created the worker thread exits, then we free the vhost devices and stop and free the worker threads we are creating in this patchset. However, I'm not sure if it makes a difference to you, but we also have a second way to free a vhost device and its worker thread. The user can run a command that instructs the qemu process to free the vhost device and its worker thread. > > If so you can create a thread in the caller's process that only performs > work in kernel space. At which point you are essentially > create_io_thread. > > If the vhost device can live after the caller goes away how is that managed? When the caller goes away we free the devices and their worker threads. Either before the caller exits it does an explicit close to release the device, which frees the device and its worker thread, or when the process exits and the kernel
Re: [PATCH 0/4] driver_core: Auxiliary drvdata helper cleanup
On Tue, Dec 21, 2021 at 04:48:17PM -0800, David E. Box wrote: > On Tue, 2021-12-21 at 20:09 -0400, Jason Gunthorpe wrote: > > On Tue, Dec 21, 2021 at 03:58:48PM -0800, David E. Box wrote: > > > Depends on "driver core: auxiliary bus: Add driver data helpers" patch > > > [1]. > > > Applies the helpers to all auxiliary device drivers using > > > dev_(get/set)_drvdata. Drivers were found using the following search: > > > > > > grep -lr "struct auxiliary_device" $(grep -lr "drvdata" .) > > > > > > Changes were build tested using the following configs: > > > > > > vdpa/mlx5: CONFIG_MLX5_VDPA_NET > > > net/mlx53: CONFIG_MLX5_CORE_EN > > > soundwire/intel: CONFIG_SOUNDWIRE_INTEL > > > RDMA/irdma: CONFIG_INFINIBAND_IRDMA > > > CONFIG_MLX5_INFINIBAND > > > > > > [1] https://www.spinics.net/lists/platform-driver-x86/msg29940.html > > > > I have to say I don't really find this to be a big readability > > improvement. > > I should have referenced the thread [1] discussing the benefit of this change > since the question was asked and answered already. The idea is that drivers > shouldn't have to touch the device API directly if they are already using a > higher level core API (auxiliary bus) that can do that on their behalf. Driver writers should rarely use the auxiliary device type directly; they should always immediately container_of it to their proper derived type. > > Also, what use is 'to_auxiliary_dev()' ? I didn't see any users added.. > > This was not added by that patch. It was added by the referenced patch, and seems totally pointless cut and paste, again because nothing should be using the auxiliary_device type for anything more than container_of'ing to their own type. We've been ripping out bus specific APIs in favour of generic ones (see the work on the DMA API for instance) so this whole concept seems regressive, particularly when applied to auxiliary bus which does not have an API of its own. 
Jason
Re: [PATCH 2/4] soundwire: intel: Use auxiliary_device driver data helpers
On Tue, Dec 21, 2021 at 03:58:50PM -0800, David E. Box wrote: > Use auxiliary_get_drvdata and auxiliary_set_drvdata helpers. Reviewed-by: Andy Shevchenko > Signed-off-by: David E. Box > --- > drivers/soundwire/intel.c | 8 ++++---- > drivers/soundwire/intel_init.c | 2 +- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c > index 78037ffdb09b..d082d18e41a9 100644 > --- a/drivers/soundwire/intel.c > +++ b/drivers/soundwire/intel.c > @@ -1293,7 +1293,7 @@ static int intel_link_probe(struct auxiliary_device > *auxdev, > bus->ops = &sdw_intel_ops; > > /* set driver data, accessed by snd_soc_dai_get_drvdata() */ > - dev_set_drvdata(dev, cdns); > + auxiliary_set_drvdata(auxdev, cdns); > > /* use generic bandwidth allocation algorithm */ > sdw->cdns.bus.compute_params = sdw_compute_params; > @@ -1321,7 +1321,7 @@ int intel_link_startup(struct auxiliary_device *auxdev) > { > struct sdw_cdns_stream_config config; > struct device *dev = &auxdev->dev; > - struct sdw_cdns *cdns = dev_get_drvdata(dev); > + struct sdw_cdns *cdns = auxiliary_get_drvdata(auxdev); > struct sdw_intel *sdw = cdns_to_intel(cdns); > struct sdw_bus *bus = &cdns->bus; > int link_flags; > @@ -1463,7 +1463,7 @@ int intel_link_startup(struct auxiliary_device *auxdev) > static void intel_link_remove(struct auxiliary_device *auxdev) > { > struct device *dev = &auxdev->dev; > - struct sdw_cdns *cdns = dev_get_drvdata(dev); > + struct sdw_cdns *cdns = auxiliary_get_drvdata(auxdev); > struct sdw_intel *sdw = cdns_to_intel(cdns); > struct sdw_bus *bus = &cdns->bus; > > @@ -1488,7 +1488,7 @@ int intel_link_process_wakeen_event(struct > auxiliary_device *auxdev) > void __iomem *shim; > u16 wake_sts; > > - sdw = dev_get_drvdata(dev); > + sdw = auxiliary_get_drvdata(auxdev); > bus = &sdw->cdns.bus; > > if (bus->prop.hw_disabled || !sdw->startup_done) { > diff --git a/drivers/soundwire/intel_init.c b/drivers/soundwire/intel_init.c > index e329022e1669..d99807765dfe 100644 > --- 
a/drivers/soundwire/intel_init.c > +++ b/drivers/soundwire/intel_init.c > @@ -244,7 +244,7 @@ static struct sdw_intel_ctx > goto err; > > link = &ldev->link_res; > - link->cdns = dev_get_drvdata(&ldev->auxdev.dev); > + link->cdns = auxiliary_get_drvdata(&ldev->auxdev); > > if (!link->cdns) { > dev_err(>dev, "failed to get link->cdns\n"); > -- > 2.25.1 > -- With Best Regards, Andy Shevchenko
Re: [PATCH v3 03/10] drm/bochs: Replace module-init boiler-plate code with DRM helpers
On Wed, Dec 22, 2021 at 09:28:24AM +0100, Javier Martinez Canillas wrote: > -static int __init bochs_init(void) > -{ > - if (drm_firmware_drivers_only() && bochs_modeset == -1) > - return -EINVAL; Also clean up bochs_modeset? I guess it's not used any more after this patch ... take care, Gerd