Re: [Qemu-devel] [RFC PATCH 0/2] virtio-mmio: add irqfd support for vhost-net based on virtio-mmio
On 25 October 2014 09:24, john.liuli john.li...@huawei.com wrote: To get the interrupt reason to support such VIRTIO_NET_F_STATUS features I add a new register offset VIRTIO_MMIO_ISRMEM which will help to establish a shared memory region between qemu and virtio-mmio device. Then the interrupt reason can be accessed by guest driver through this region. At the same time, the virtio-mmio dirver check this region to see irqfd is supported or not during the irq handler registration, and different handler will be assigned. If you want to add a new register you should probably propose an update to the virtio spec. However, it seems to me it would be better to get generic PCI/PCIe working on the ARM virt board instead; then we can let virtio-mmio quietly fade away. This has been on the todo list for ages (and there have been RFC patches posted for plain PCI), it's just nobody's had time to work on it. thanks -- PMM ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 0/2] virtio-mmio: add irqfd support for vhost-net based on virtio-mmio
On Mon, Oct 27, 2014 at 05:19:23PM +0800, Li Liu wrote: On 2014/10/26 19:52, Michael S. Tsirkin wrote: On Sat, Oct 25, 2014 at 04:24:52PM +0800, john.liuli wrote: From: Li Liu john.li...@huawei.com This set of patches try to implemet irqfd support of vhost-net based on virtio-mmio. I had posted a mail to talking about the status of vhost-net on kvm-arm refer to http://www.spinics.net/lists/kvm-arm/msg10804.html. Some dependent patches are listed in the mail too. Basically the vhost-net brings great performance improvements, almost 50%+. It's easy to implement irqfd support with PCI MSI-X. But till now arm32 do not provide equivalent mechanism to let a device allocate multiple interrupts. And even the aarch64 provid LPI but also not available in a short time. As Gauguey Remy said Vhost does not emulate a complete virtio adapter but only manage virtqueue operations. Vhost module don't update the ISR register, so if with only one irq then it's no way to get the interrupt reason even we can inject the irq correctly. Well guests don't read ISR in MSI-X mode so why does it help to set the ISR bit? Yeah, vhost don't need to set ISR under MSI-X mode. But for ARM without MSI-X kind mechanism guest can't get the interrupt reason through the only one irq hanlder (with one gsi resource). So I build a shared memory region to provide the interrupt reason instead of ISR regiser by qemu without bothering vhost. Then even there's only one irq with only one irq hanlder, it still can distinguish why irq occur. Li. OK so this might allow sharing IRQs between config and VQs which might be a handy optimization even for PCI (at the moment reading ISR causes an exit). Need to see how all this would work on MP systems though. To get the interrupt reason to support such VIRTIO_NET_F_STATUS features I add a new register offset VIRTIO_MMIO_ISRMEM which will help to establish a shared memory region between qemu and virtio-mmio device. Then the interrupt reason can be accessed by guest driver through this region. At the same time, the virtio-mmio dirver check this region to see irqfd is supported or not during the irq handler registration, and different handler will be assigned. I want to know it's the right direction? Does it comply with the virtio-mmio spec.? Or anyone have more good ideas to emulate mis-x based on virtio-mmio? I hope to get feedback and guidance. Thx for any help. Li Liu (2): Add a new register offset let interrupt reason available Assign a new irq handler while irqfd enabled drivers/virtio/virtio_mmio.c | 55 +++--- include/linux/virtio_mmio.h |3 +++ 2 files changed, 55 insertions(+), 3 deletions(-) -- 1.7.9.5 . ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On Sat, 2014-10-25 at 00:04 +0200, Peter Zijlstra wrote: On Fri, Oct 24, 2014 at 04:53:27PM -0400, Waiman Long wrote: The additional register pressure may just cause a few more register moves which should be negligible in the overall performance . The additional icache pressure, however, may have some impact on performance. I was trying to balance the performance of the pv and non-pv versions so that we won't penalize the pv code too much for a bit more performance in the non-pv code. Doing it your way will add a lot of function call and register saving/restoring to the pv code. If people care about performance they should not be using virt crap :-) I tried some benching recently.. where did they hide the fastpaths? :) -Mike ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 1/2] Add a new register offset let interrupt reason available
From: Li Liu john.li...@huawei.com Add a new register offset VIRTIO_MMIO_ISRMEM which help to estblish a shared memory region between virtio-mmio driver and qemu with two purposes: 1.Guest virtio-mmio driver can get the interrupt reason. 2.Check irqfd enabled or not to register different irq handler. Signed-off-by: Li Liu john.li...@huawei.com --- drivers/virtio/virtio_mmio.c | 21 - include/linux/virtio_mmio.h |3 +++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index ef9a165..28ddb55 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -122,6 +122,8 @@ struct virtio_mmio_device { /* a list of queues so we can dispatch IRQs */ spinlock_t lock; struct list_head virtqueues; + + uint8_t *isr_mem; }; struct virtio_mmio_vq_info { @@ -443,6 +445,7 @@ static int virtio_mmio_probe(struct platform_device *pdev) struct virtio_mmio_device *vm_dev; struct resource *mem; unsigned long magic; + int err; mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!mem) @@ -481,6 +484,15 @@ static int virtio_mmio_probe(struct platform_device *pdev) return -ENXIO; } + vm_dev-isr_mem = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL|__GFP_ZERO); + if (vm_dev-isr_mem == NULL) { + dev_err(pdev-dev, Allocate isr memory failed!\n); + return -ENOMEM; + } + + writel(virt_to_phys(vm_dev-isr_mem), + vm_dev-base + VIRTIO_MMIO_ISRMEM); + vm_dev-vdev.id.device = readl(vm_dev-base + VIRTIO_MMIO_DEVICE_ID); vm_dev-vdev.id.vendor = readl(vm_dev-base + VIRTIO_MMIO_VENDOR_ID); @@ -488,13 +500,20 @@ static int virtio_mmio_probe(struct platform_device *pdev) platform_set_drvdata(pdev, vm_dev); - return register_virtio_device(vm_dev-vdev); + err = register_virtio_device(vm_dev-vdev); + if (err) { + free_pages_exact(vm_dev-isr_mem, PAGE_SIZE); + vm_dev-isr_mem = NULL; + } + + return err; } static int virtio_mmio_remove(struct platform_device *pdev) { struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev); + free_pages_exact(vm_dev-isr_mem, PAGE_SIZE); unregister_virtio_device(vm_dev-vdev); return 0; diff --git a/include/linux/virtio_mmio.h b/include/linux/virtio_mmio.h index 5c7b6f0..b1e3ec7 100644 --- a/include/linux/virtio_mmio.h +++ b/include/linux/virtio_mmio.h @@ -95,6 +95,9 @@ /* Device status register - Read Write */ #define VIRTIO_MMIO_STATUS 0x070 +/* Allocate ISRMEM for interrupt reason - Write Only */ +#define VIRTIO_MMIO_ISRMEM 0x080 + /* The config space is defined by each driver as * the per-driver configuration space - Read Write */ #define VIRTIO_MMIO_CONFIG 0x100 -- 1.7.9.5 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 0/2] virtio-mmio: add irqfd support for vhost-net based on virtio-mmio
On 2014/10/26 19:52, Michael S. Tsirkin wrote: On Sat, Oct 25, 2014 at 04:24:52PM +0800, john.liuli wrote: From: Li Liu john.li...@huawei.com This set of patches try to implemet irqfd support of vhost-net based on virtio-mmio. I had posted a mail to talking about the status of vhost-net on kvm-arm refer to http://www.spinics.net/lists/kvm-arm/msg10804.html. Some dependent patches are listed in the mail too. Basically the vhost-net brings great performance improvements, almost 50%+. It's easy to implement irqfd support with PCI MSI-X. But till now arm32 do not provide equivalent mechanism to let a device allocate multiple interrupts. And even the aarch64 provid LPI but also not available in a short time. As Gauguey Remy said Vhost does not emulate a complete virtio adapter but only manage virtqueue operations. Vhost module don't update the ISR register, so if with only one irq then it's no way to get the interrupt reason even we can inject the irq correctly. Well guests don't read ISR in MSI-X mode so why does it help to set the ISR bit? Yeah, vhost don't need to set ISR under MSI-X mode. But for ARM without MSI-X kind mechanism guest can't get the interrupt reason through the only one irq hanlder (with one gsi resource). So I build a shared memory region to provide the interrupt reason instead of ISR regiser by qemu without bothering vhost. Then even there's only one irq with only one irq hanlder, it still can distinguish why irq occur. Li. To get the interrupt reason to support such VIRTIO_NET_F_STATUS features I add a new register offset VIRTIO_MMIO_ISRMEM which will help to establish a shared memory region between qemu and virtio-mmio device. Then the interrupt reason can be accessed by guest driver through this region. At the same time, the virtio-mmio dirver check this region to see irqfd is supported or not during the irq handler registration, and different handler will be assigned. I want to know it's the right direction? Does it comply with the virtio-mmio spec.? Or anyone have more good ideas to emulate mis-x based on virtio-mmio? I hope to get feedback and guidance. Thx for any help. Li Liu (2): Add a new register offset let interrupt reason available Assign a new irq handler while irqfd enabled drivers/virtio/virtio_mmio.c | 55 +++--- include/linux/virtio_mmio.h |3 +++ 2 files changed, 55 insertions(+), 3 deletions(-) -- 1.7.9.5 . ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 0/2] virtio-mmio: add irqfd support for vhost-net based on virtio-mmio
From: Li Liu john.li...@huawei.com This set of patches try to implemet irqfd support of vhost-net based on virtio-mmio. I had posted a mail to talking about the status of vhost-net on kvm-arm refer to http://www.spinics.net/lists/kvm-arm/msg10804.html. Some dependent patches are listed in the mail too. Basically the vhost-net brings great performance improvements, almost 50%+. It's easy to implement irqfd support with PCI MSI-X. But till now arm32 do not provide equivalent mechanism to let a device allocate multiple interrupts. And even the aarch64 provid LPI but also not available in a short time. As Gauguey Remy said Vhost does not emulate a complete virtio adapter but only manage virtqueue operations. Vhost module don't update the ISR register, so if with only one irq then it's no way to get the interrupt reason even we can inject the irq correctly. To get the interrupt reason to support such VIRTIO_NET_F_STATUS features I add a new register offset VIRTIO_MMIO_ISRMEM which will help to establish a shared memory region between qemu and virtio-mmio device. Then the interrupt reason can be accessed by guest driver through this region. At the same time, the virtio-mmio dirver check this region to see irqfd is supported or not during the irq handler registration, and different handler will be assigned. I want to know it's the right direction? Does it comply with the virtio-mmio spec.? Or anyone have more good ideas to emulate mis-x based on virtio-mmio? I hope to get feedback and guidance. Thx for any help. Li Liu (2): Add a new register offset let interrupt reason available Assign a new irq handler while irqfd enabled drivers/virtio/virtio_mmio.c | 55 +++--- include/linux/virtio_mmio.h |3 +++ 2 files changed, 55 insertions(+), 3 deletions(-) -- 1.7.9.5 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH 2/2] Assign a new irq handler while irqfd enabled
From: Li Liu john.li...@huawei.com This irq handler will get the interrupt reason from a shared memory. And will be assigned only while irqfd enabled. Signed-off-by: Li Liu john.li...@huawei.com --- drivers/virtio/virtio_mmio.c | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 28ddb55..7229605 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -259,7 +259,31 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) return ret; } +/* Notify all virtqueues on an interrupt. */ +static irqreturn_t vm_interrupt_irqfd(int irq, void *opaque) +{ + struct virtio_mmio_device *vm_dev = opaque; + struct virtio_mmio_vq_info *info; + unsigned long status; + unsigned long flags; + irqreturn_t ret = IRQ_NONE; + /* Read the interrupt reason and reset it */ + status = *vm_dev-isr_mem; + *vm_dev-isr_mem = 0x0; + + if (unlikely(status VIRTIO_MMIO_INT_CONFIG)) { + virtio_config_changed(vm_dev-vdev); + ret = IRQ_HANDLED; + } + + spin_lock_irqsave(vm_dev-lock, flags); + list_for_each_entry(info, vm_dev-virtqueues, node) + ret |= vring_interrupt(irq, info-vq); + spin_unlock_irqrestore(vm_dev-lock, flags); + + return ret; +} static void vm_del_vq(struct virtqueue *vq) { @@ -391,6 +415,7 @@ error_available: return ERR_PTR(err); } +#define VIRTIO_MMIO_F_IRQFD(1 7) static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], @@ -400,8 +425,13 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs, unsigned int irq = platform_get_irq(vm_dev-pdev, 0); int i, err; - err = request_irq(irq, vm_interrupt, IRQF_SHARED, - dev_name(vdev-dev), vm_dev); + if (*vm_dev-isr_mem VIRTIO_MMIO_F_IRQFD) { + err = request_irq(irq, vm_interrupt_irqfd, IRQF_SHARED, + dev_name(vdev-dev), vm_dev); + } else { + err = request_irq(irq, vm_interrupt, IRQF_SHARED, + dev_name(vdev-dev), vm_dev); + } if (err) return err; -- 1.7.9.5 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 2/2] Assign a new irq handler while irqfd enabled
On 2014/10/26 19:56, Michael S. Tsirkin wrote: On Sat, Oct 25, 2014 at 04:24:54PM +0800, john.liuli wrote: From: Li Liu john.li...@huawei.com This irq handler will get the interrupt reason from a shared memory. And will be assigned only while irqfd enabled. Signed-off-by: Li Liu john.li...@huawei.com --- drivers/virtio/virtio_mmio.c | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 28ddb55..7229605 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -259,7 +259,31 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) return ret; } +/* Notify all virtqueues on an interrupt. */ +static irqreturn_t vm_interrupt_irqfd(int irq, void *opaque) +{ +struct virtio_mmio_device *vm_dev = opaque; +struct virtio_mmio_vq_info *info; +unsigned long status; +unsigned long flags; +irqreturn_t ret = IRQ_NONE; +/* Read the interrupt reason and reset it */ +status = *vm_dev-isr_mem; +*vm_dev-isr_mem = 0x0; you are reading and modifying shared memory without atomics and any memory barriers. Why is this safe? good catch, a stupid mistake. + +if (unlikely(status VIRTIO_MMIO_INT_CONFIG)) { +virtio_config_changed(vm_dev-vdev); +ret = IRQ_HANDLED; +} + +spin_lock_irqsave(vm_dev-lock, flags); +list_for_each_entry(info, vm_dev-virtqueues, node) +ret |= vring_interrupt(irq, info-vq); +spin_unlock_irqrestore(vm_dev-lock, flags); + +return ret; +} static void vm_del_vq(struct virtqueue *vq) { So you invoke callbacks for all VQs. This won't scale well as the number of VQs grows, will it? @@ -391,6 +415,7 @@ error_available: return ERR_PTR(err); } +#define VIRTIO_MMIO_F_IRQFD(1 7) static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], @@ -400,8 +425,13 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs, unsigned int irq = platform_get_irq(vm_dev-pdev, 0); int i, err; -err = request_irq(irq, vm_interrupt, IRQF_SHARED, -dev_name(vdev-dev), vm_dev); +if (*vm_dev-isr_mem VIRTIO_MMIO_F_IRQFD) { +err = request_irq(irq, vm_interrupt_irqfd, IRQF_SHARED, + dev_name(vdev-dev), vm_dev); +} else { +err = request_irq(irq, vm_interrupt, IRQF_SHARED, + dev_name(vdev-dev), vm_dev); +} if (err) return err; So still a single interrupt for all VQs. Again this doesn't scale: a single CPU has to handle interrupts for all of them. I think you need to find a way to get per-VQ interrupts. Yeah, AFAIK it's impossible to distribute works to different CPUs with only one irq without MSI-X kind mechanism. Assign multiple gsis to one device, obviously it's consumptive and not scalable. Any ideas? Thx. -- 1.7.9.5 . ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Qemu-devel] [RFC PATCH 0/2] virtio-mmio: add irqfd support for vhost-net based on virtio-mmio
On 2014/10/27 17:37, Peter Maydell wrote: On 25 October 2014 09:24, john.liuli john.li...@huawei.com wrote: To get the interrupt reason to support such VIRTIO_NET_F_STATUS features I add a new register offset VIRTIO_MMIO_ISRMEM which will help to establish a shared memory region between qemu and virtio-mmio device. Then the interrupt reason can be accessed by guest driver through this region. At the same time, the virtio-mmio dirver check this region to see irqfd is supported or not during the irq handler registration, and different handler will be assigned. If you want to add a new register you should probably propose an update to the virtio spec. However, it seems to me it would be better to get generic PCI/PCIe working on the ARM virt board instead; then we can let virtio-mmio quietly fade away. This has been on the todo list for ages (and there have been RFC patches posted for plain PCI), it's just nobody's had time to work on it. thanks -- PMM So you mean virtio-mmio will be replaced by PCI/PCIe on ARM at last? If so, let this patch go with the wind:). Thx. Li. . ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On 10/24/2014 06:04 PM, Peter Zijlstra wrote: On Fri, Oct 24, 2014 at 04:53:27PM -0400, Waiman Long wrote: The additional register pressure may just cause a few more register moves which should be negligible in the overall performance . The additional icache pressure, however, may have some impact on performance. I was trying to balance the performance of the pv and non-pv versions so that we won't penalize the pv code too much for a bit more performance in the non-pv code. Doing it your way will add a lot of function call and register saving/restoring to the pv code. If people care about performance they should not be using virt crap :-) I only really care about bare metal. Yes, I am aware of that. However, the whole point of doing PV spinlock is to improve performance in a virtual guest. Anyway, I had done some measurements. In my test system, the queue_spin_lock_slowpath() function has a text size of about 400 bytes without PV, but 1120 bytes with PV. I made some changes to create separate versions of PV and non-PV slowpath functions as shown by the diff below. The text size is now about 430 bytes for the non-PV version and 925 bytes for the PV version. The overall object size increases by a bit more than 200 bytes, but the icache footprint should be reduced no matter which version is used. -Longman diff --git a/arch/x86/include/asm/pvqspinlock.h b/arch/x86/include/asm/pvqspinlo index d424252..241bf30 100644 --- a/arch/x86/include/asm/pvqspinlock.h +++ b/arch/x86/include/asm/pvqspinlock.h @@ -79,9 +79,6 @@ static inline void pv_init_node(struct mcs_spinlock *node) BUILD_BUG_ON(sizeof(struct pv_qnode) 5*sizeof(struct mcs_spinlock)); - if (!pv_enabled()) - return; - pn-cpustate = PV_CPU_ACTIVE; pn-mayhalt = false; pn-mycpu= smp_processor_id(); @@ -132,9 +129,6 @@ static inline bool pv_link_and_wait_node(u32 old, struct mcs struct pv_qnode *ppn, *pn = (struct pv_qnode *)node; unsigned int count; - if (!pv_enabled()) - return false; - if (!(old _Q_TAIL_MASK)) { node-locked = true;/* At queue head now */ goto ret; @@ -236,9 +230,6 @@ pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *no { struct pv_qnode *pn = (struct pv_qnode *)node; - if (!pv_enabled()) - return smp_load_acquire(lock-val.counter); - for (;;) { unsigned int count; s8 oldstate; @@ -328,8 +319,6 @@ static inline void pv_wait_check(struct qspinlock *lock, struct pv_qnode *pnxt = (struct pv_qnode *)next; struct pv_qnode *pcur = (struct pv_qnode *)node; - if (!pv_enabled()) - return; /* * Clear the locked and head values of lock holder */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 1662dbd..05aea57 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -16,6 +16,7 @@ * Authors: Waiman Long waiman.l...@hp.com * Peter Zijlstra pzijl...@redhat.com */ +#ifndef _GEN_PV_LOCK_SLOWPATH #include linux/smp.h #include linux/bug.h #include linux/cpumask.h @@ -271,19 +272,37 @@ void queue_spin_unlock_slowpath(struct qspinlock *lock) } EXPORT_SYMBOL(queue_spin_unlock_slowpath); -#else +static void pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +#else /* CONFIG_PARAVIRT_SPINLOCKS */ + +static inline void pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) + { } -static inline void pv_init_node(struct mcs_spinlock *node) { } -static inline void pv_wait_check(struct qspinlock *lock, -struct mcs_spinlock *node, -struct mcs_spinlock *next) { } -static inline bool pv_link_and_wait_node(u32 old, struct mcs_spinlock *node) +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ + +/* + * Dummy PV functions for bare-metal slowpath code + */ +static inline void nopv_init_node(struct mcs_spinlock *node) { } +static inline void nopv_wait_check(struct qspinlock *lock, + struct mcs_spinlock *node, + struct mcs_spinlock *next) { } +static inline bool nopv_link_and_wait_node(u32 old, struct mcs_spinlock *node) { return false; } -static inline int pv_wait_head(struct qspinlock *lock, +static inline int nopv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) { return smp_load_acquire(lock-val.counter); } +static inline bool return_true(void) { return true; } +static inline bool return_false(void) { return false; } -#endif /* CONFIG_PARAVIRT_SPINLOCKS */ +#define pv_init_node nopv_init_node +#define pv_wait_check nopv_wait_check +#define pv_link_and_wait_node nopv_link_and_wait_node
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On Mon, Oct 27, 2014 at 01:15:53PM -0400, Waiman Long wrote: On 10/24/2014 06:04 PM, Peter Zijlstra wrote: On Fri, Oct 24, 2014 at 04:53:27PM -0400, Waiman Long wrote: The additional register pressure may just cause a few more register moves which should be negligible in the overall performance . The additional icache pressure, however, may have some impact on performance. I was trying to balance the performance of the pv and non-pv versions so that we won't penalize the pv code too much for a bit more performance in the non-pv code. Doing it your way will add a lot of function call and register saving/restoring to the pv code. If people care about performance they should not be using virt crap :-) I only really care about bare metal. Yes, I am aware of that. However, the whole point of doing PV spinlock is to improve performance in a virtual guest. Anything that avoids the lock holder preemption nonsense is a _massive_ win for them, a few function calls should not even register on that scale. +#ifdef _GEN_PV_LOCK_SLOWPATH +static void pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) +#else void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) +#endif If you have two functions you might as well use the PV stuff to patch in the right function call at the usage sites and avoid: + if (pv_enabled()) { + pv_queue_spin_lock_slowpath(lock, val); + return; + } this alltogether. this_cpu_dec(mcs_nodes[0].count); } EXPORT_SYMBOL(queue_spin_lock_slowpath); + +#if !defined(_GEN_PV_LOCK_SLOWPATH) defined(CONFIG_PARAVIRT_SPINLOCKS) +/* + * Generate the PV version of the queue_spin_lock_slowpath function + */ +#undef pv_init_node +#undef pv_wait_check +#undef pv_link_and_wait_node +#undef pv_wait_head +#undef EXPORT_SYMBOL +#undef in_pv_code + +#define _GEN_PV_LOCK_SLOWPATH +#define EXPORT_SYMBOL(x) +#define in_pv_code return_true +#define pv_enabled return_false + +#include qspinlock.c + +#endif That's properly disgusting :-) But a lot better than actually duplicating everything I suppose. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On 10/24/2014 04:54 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote: Since enabling paravirt spinlock will disable unlock function inlining, a jump label can be added to the unlock function without adding patch sites all over the kernel. But you don't have to. My patches allowed for the inline to remain, again reducing the overhead of enabling PV spinlocks while running on a real machine. Look at: http://lkml.kernel.org/r/20140615130154.213923...@chello.nl In particular this hunk: Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c === --- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c +++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c @@ -22,6 +22,10 @@ DEF_NATIVE(pv_cpu_ops, swapgs, swapgs) DEF_NATIVE(, mov32, mov %edi, %eax); DEF_NATIVE(, mov64, mov %rdi, %rax); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) +DEF_NATIVE(pv_lock_ops, queue_unlock, movb $0, (%rdi)); +#endif + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -61,6 +65,9 @@ unsigned native_patch(u8 type, u16 clobb PATCH_SITE(pv_cpu_ops, clts); PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) + PATCH_SITE(pv_lock_ops, queue_unlock); +#endif patch_site: ret = paravirt_patch_insns(ibuf, len, start, end); That makes sure to overwrite the callee-saved call to the pv_lock_ops::queue_unlock with the immediate asm movb $0, (%rdi). Therefore you can retain the inlined unlock with hardly (there might be some NOP padding) any overhead at all. On PV it reverts to a callee saved function call. My concern is that spin_unlock() can be called in many places, including loadable kernel modules. Can the paravirt_patch_ident_32() function able to patch all of them in reasonable time? How about a kernel module loaded later at run time? So I think we may still need to disable unlock function inlining even if we used your way kernel site patching. Regards, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 00/11] qspinlock: a 4-byte queue spinlock with PV support
On 10/24/2014 04:57 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:29PM -0400, Waiman Long wrote: v11-v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. Thanks for taking the time to refresh this.. I would prefer you use a little more of the PV techniques I outlined in my latest PV patch to further reduce the overhead of PV enabled kernels on real hardware. This is an important use case, because distro kernels will have to enable PV support while their majority of installations will be on physical hardware. Other than that I see no reason not to move this forward. Thanks for reviewing the patch and agree to move forward. Currently, I am thinking of separating out a PV and non-PV versions of the lock slowpath functions as shown in my previous mail. That should also minimize the performance impact on bare metal even more than what can be done with the PV techniques used in your patch while not penalizing PV performance. As for the unlock function, if the site patching function can handle all the possible call sites of spin_unlock() without disabling function inlining, I will be glad to use your way of handing unlock function. Otherwise, I will prefer my current approach as it is simpler and more easy to understand as well as similar to what has been done in the pv ticket spinlock code. -Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote: On 10/24/2014 04:54 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote: Since enabling paravirt spinlock will disable unlock function inlining, a jump label can be added to the unlock function without adding patch sites all over the kernel. But you don't have to. My patches allowed for the inline to remain, again reducing the overhead of enabling PV spinlocks while running on a real machine. Look at: http://lkml.kernel.org/r/20140615130154.213923...@chello.nl In particular this hunk: Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c === --- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c +++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c @@ -22,6 +22,10 @@ DEF_NATIVE(pv_cpu_ops, swapgs, swapgs) DEF_NATIVE(, mov32, mov %edi, %eax); DEF_NATIVE(, mov64, mov %rdi, %rax); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) +DEF_NATIVE(pv_lock_ops, queue_unlock, movb $0, (%rdi)); +#endif + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -61,6 +65,9 @@ unsigned native_patch(u8 type, u16 clobb PATCH_SITE(pv_cpu_ops, clts); PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) + PATCH_SITE(pv_lock_ops, queue_unlock); +#endif patch_site: ret = paravirt_patch_insns(ibuf, len, start, end); That makes sure to overwrite the callee-saved call to the pv_lock_ops::queue_unlock with the immediate asm movb $0, (%rdi). Therefore you can retain the inlined unlock with hardly (there might be some NOP padding) any overhead at all. On PV it reverts to a callee saved function call. My concern is that spin_unlock() can be called in many places, including loadable kernel modules. Can the paravirt_patch_ident_32() function able to patch all of them in reasonable time? How about a kernel module loaded later at run time? It has too. When the modules are loaded the .paravirt symbols are exposed and the module loader patches that. And during bootup time (before modules are loaded) it also patches everything - when it only runs on one CPU. So I think we may still need to disable unlock function inlining even if we used your way kernel site patching. No need. Inline should (And is) working just fine. Regards, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote: On 10/24/2014 04:54 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote: Since enabling paravirt spinlock will disable unlock function inlining, a jump label can be added to the unlock function without adding patch sites all over the kernel. But you don't have to. My patches allowed for the inline to remain, again reducing the overhead of enabling PV spinlocks while running on a real machine. Look at: http://lkml.kernel.org/r/20140615130154.213923...@chello.nl In particular this hunk: Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c === --- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c +++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c @@ -22,6 +22,10 @@ DEF_NATIVE(pv_cpu_ops, swapgs, swapgs) DEF_NATIVE(, mov32, mov %edi, %eax); DEF_NATIVE(, mov64, mov %rdi, %rax); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) +DEF_NATIVE(pv_lock_ops, queue_unlock, movb $0, (%rdi)); +#endif + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -61,6 +65,9 @@ unsigned native_patch(u8 type, u16 clobb PATCH_SITE(pv_cpu_ops, clts); PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) + PATCH_SITE(pv_lock_ops, queue_unlock); +#endif patch_site: ret = paravirt_patch_insns(ibuf, len, start, end); That makes sure to overwrite the callee-saved call to the pv_lock_ops::queue_unlock with the immediate asm movb $0, (%rdi). Therefore you can retain the inlined unlock with hardly (there might be some NOP padding) any overhead at all. On PV it reverts to a callee saved function call. My concern is that spin_unlock() can be called in many places, including loadable kernel modules. Can the paravirt_patch_ident_32() function able to patch all of them in reasonable time? How about a kernel module loaded later at run time? modules should be fine, see arch/x86/kernel/module.c:module_finalize() - apply_paravirt(). Also note that the 'default' text is an indirect call into the paravirt ops table which routes to the 'right' function, so even if the text patching would be 'late' calls would 'work' as expected, just slower. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On 10/27/2014 01:27 PM, Peter Zijlstra wrote: On Mon, Oct 27, 2014 at 01:15:53PM -0400, Waiman Long wrote: On 10/24/2014 06:04 PM, Peter Zijlstra wrote: On Fri, Oct 24, 2014 at 04:53:27PM -0400, Waiman Long wrote: The additional register pressure may just cause a few more register moves which should be negligible in the overall performance . The additional icache pressure, however, may have some impact on performance. I was trying to balance the performance of the pv and non-pv versions so that we won't penalize the pv code too much for a bit more performance in the non-pv code. Doing it your way will add a lot of function call and register saving/restoring to the pv code. If people care about performance they should not be using virt crap :-) I only really care about bare metal. Yes, I am aware of that. However, the whole point of doing PV spinlock is to improve performance in a virtual guest. Anything that avoids the lock holder preemption nonsense is a _massive_ win for them, a few function calls should not even register on that scale. I would say all the PV stuffs are mostly useful for a over-committed guest where a single CPU is shared in more than one guest. When the guests are not overcommitted, the PV code seldom get triggered. In those cases, the overhead of the extra function call and register saving/restoring will show up. +#ifdef _GEN_PV_LOCK_SLOWPATH +static void pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) +#else void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) +#endif If you have two functions you might as well use the PV stuff to patch in the right function call at the usage sites and avoid: + if (pv_enabled()) { + pv_queue_spin_lock_slowpath(lock, val); + return; + } this alltogether. Good point! I will do some investigation on how to do this kind of function address patching and eliminate the extra function call overhead. this_cpu_dec(mcs_nodes[0].count); } EXPORT_SYMBOL(queue_spin_lock_slowpath); + +#if !defined(_GEN_PV_LOCK_SLOWPATH) defined(CONFIG_PARAVIRT_SPINLOCKS) +/* + * Generate the PV version of the queue_spin_lock_slowpath function + */ +#undef pv_init_node +#undef pv_wait_check +#undef pv_link_and_wait_node +#undef pv_wait_head +#undef EXPORT_SYMBOL +#undef in_pv_code + +#define _GEN_PV_LOCK_SLOWPATH +#define EXPORT_SYMBOL(x) +#define in_pv_code return_true +#define pv_enabled return_false + +#include qspinlock.c + +#endif That's properly disgusting :-) But a lot better than actually duplicating everything I suppose. I know you don't like this kind of preprocessor trick, but this is the easiest way that I can think of to generate two separate functions from the same source code. -Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On 10/27/2014 02:02 PM, Konrad Rzeszutek Wilk wrote: On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote: On 10/24/2014 04:54 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote: Since enabling paravirt spinlock will disable unlock function inlining, a jump label can be added to the unlock function without adding patch sites all over the kernel. But you don't have to. My patches allowed for the inline to remain, again reducing the overhead of enabling PV spinlocks while running on a real machine. Look at: http://lkml.kernel.org/r/20140615130154.213923...@chello.nl In particular this hunk: Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c === --- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c +++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c @@ -22,6 +22,10 @@ DEF_NATIVE(pv_cpu_ops, swapgs, swapgs) DEF_NATIVE(, mov32, mov %edi, %eax); DEF_NATIVE(, mov64, mov %rdi, %rax); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) +DEF_NATIVE(pv_lock_ops, queue_unlock, movb $0, (%rdi)); +#endif + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -61,6 +65,9 @@ unsigned native_patch(u8 type, u16 clobb PATCH_SITE(pv_cpu_ops, clts); PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) + PATCH_SITE(pv_lock_ops, queue_unlock); +#endif patch_site: ret = paravirt_patch_insns(ibuf, len, start, end); That makes sure to overwrite the callee-saved call to the pv_lock_ops::queue_unlock with the immediate asm movb $0, (%rdi). Therefore you can retain the inlined unlock with hardly (there might be some NOP padding) any overhead at all. On PV it reverts to a callee saved function call. My concern is that spin_unlock() can be called in many places, including loadable kernel modules. Can the paravirt_patch_ident_32() function able to patch all of them in reasonable time? How about a kernel module loaded later at run time? It has too. When the modules are loaded the .paravirt symbols are exposed and the module loader patches that. And during bootup time (before modules are loaded) it also patches everything - when it only runs on one CPU. So I think we may still need to disable unlock function inlining even if we used your way kernel site patching. No need. Inline should (And is) working just fine. Regards, Longman Thanks for letting me know about the paravirt patching capability available in the kernel. In this case, I would say we should use Peter's way of doing unlock without disabling unlock function inlining. That will further reduce the performance difference of kernels with and without PV. Cheer, Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support
On 10/27/2014 02:04 PM, Peter Zijlstra wrote: On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote: On 10/24/2014 04:54 AM, Peter Zijlstra wrote: On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote: Since enabling paravirt spinlock will disable unlock function inlining, a jump label can be added to the unlock function without adding patch sites all over the kernel. But you don't have to. My patches allowed for the inline to remain, again reducing the overhead of enabling PV spinlocks while running on a real machine. Look at: http://lkml.kernel.org/r/20140615130154.213923...@chello.nl In particular this hunk: Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c === --- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c +++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c @@ -22,6 +22,10 @@ DEF_NATIVE(pv_cpu_ops, swapgs, swapgs) DEF_NATIVE(, mov32, mov %edi, %eax); DEF_NATIVE(, mov64, mov %rdi, %rax); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) +DEF_NATIVE(pv_lock_ops, queue_unlock, movb $0, (%rdi)); +#endif + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -61,6 +65,9 @@ unsigned native_patch(u8 type, u16 clobb PATCH_SITE(pv_cpu_ops, clts); PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); +#if defined(CONFIG_PARAVIRT_SPINLOCKS) defined(CONFIG_QUEUE_SPINLOCK) + PATCH_SITE(pv_lock_ops, queue_unlock); +#endif patch_site: ret = paravirt_patch_insns(ibuf, len, start, end); That makes sure to overwrite the callee-saved call to the pv_lock_ops::queue_unlock with the immediate asm movb $0, (%rdi). Therefore you can retain the inlined unlock with hardly (there might be some NOP padding) any overhead at all. On PV it reverts to a callee saved function call. My concern is that spin_unlock() can be called in many places, including loadable kernel modules. Can the paravirt_patch_ident_32() function able to patch all of them in reasonable time? How about a kernel module loaded later at run time? modules should be fine, see arch/x86/kernel/module.c:module_finalize() - apply_paravirt(). Also note that the 'default' text is an indirect call into the paravirt ops table which routes to the 'right' function, so even if the text patching would be 'late' calls would 'work' as expected, just slower. Thanks for letting me know about that. I have this concern because your patch didn't change the current configuration of disabling unlock inlining when paravirt_spinlock is enabled. With that, I think it is worthwhile to reduce the performance delta between the PV and non-PV kernel on bare metal. -Longman ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 1/4] virtio_net: pass vi around
Michael S. Tsirkin m...@redhat.com writes: Too many places poke at [rs]q-vq-vdev-priv just to get the the vi structure. Let's just pass the pointer around: seems cleaner, and might even be faster. Agreed, it's neater. Acked-by: Rusty Russell ru...@rustcorp.com.au Thanks, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 57cbc7d..36f3dfc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -241,11 +241,11 @@ static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize) } /* Called from bottom half context */ -static struct sk_buff *page_to_skb(struct receive_queue *rq, +static struct sk_buff *page_to_skb(struct virtnet_info *vi, +struct receive_queue *rq, struct page *page, unsigned int offset, unsigned int len, unsigned int truesize) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct sk_buff *skb; struct skb_vnet_hdr *hdr; unsigned int copy, hdr_len, hdr_padded_len; @@ -328,12 +328,13 @@ static struct sk_buff *receive_small(void *buf, unsigned int len) } static struct sk_buff *receive_big(struct net_device *dev, +struct virtnet_info *vi, struct receive_queue *rq, void *buf, unsigned int len) { struct page *page = buf; - struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE); + struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE); if (unlikely(!skb)) goto err; @@ -359,7 +360,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, int offset = buf - page_address(page); unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx)); - struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, truesize); + struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len, +truesize); struct sk_buff *curr_skb = head_skb; if (unlikely(!curr_skb)) @@ -433,9 +435,9 @@ err_buf: return NULL; } -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, + void *buf, unsigned int len) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct net_device *dev = vi-dev; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); struct sk_buff *skb; @@ -459,9 +461,9 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) if (vi-mergeable_rx_bufs) skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len); else if (vi-big_packets) - skb = receive_big(dev, rq, buf, len); + skb = receive_big(dev, vi, rq, buf, len); else - skb = receive_small(buf, len); + skb = receive_small(vi, buf, len); if (unlikely(!skb)) return; @@ -530,9 +532,9 @@ frame_err: dev_kfree_skb(skb); } -static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp) +static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, + gfp_t gfp) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct sk_buff *skb; struct skb_vnet_hdr *hdr; int err; @@ -655,9 +657,9 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp) * before we're receiving packets, or from refill_work which is * careful to disable receiving (using napi_disable). */ -static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp) +static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq, + gfp_t gfp) { - struct virtnet_info *vi = rq-vq-vdev-priv; int err; bool oom; @@ -668,7 +670,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp) else if (vi-big_packets) err = add_recvbuf_big(rq, gfp); else - err = add_recvbuf_small(rq, gfp); + err = add_recvbuf_small(vi, rq, gfp); oom = err == -ENOMEM; if (err) @@ -717,7 +719,7 @@ static void refill_work(struct work_struct *work) struct receive_queue *rq = vi-rq[i]; napi_disable(rq-napi); - still_empty = !try_fill_recv(rq, GFP_KERNEL); + still_empty = !try_fill_recv(vi, rq, GFP_KERNEL); virtnet_napi_enable(rq); /* In theory, this can happen:
Re: [PATCH] virtio_blk: fix race at module removal
Ming Lei tom.leim...@gmail.com writes: On Fri, Oct 24, 2014 at 12:12 AM, Michael S. Tsirkin m...@redhat.com wrote: If a device appears while module is being removed, driver will get a callback after we've given up on the major number. In theory this means this major number can get reused by something else, resulting in a conflict. Yes, there is a tiny race window. Applied. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 2/4] virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr
Michael S. Tsirkin m...@redhat.com writes: virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not really useful since virtio_net_hdr_mrg_rxbuf includes that as the first field anyway. Let's drop it, precalculate header len and store within vi instead. This way we can also remove struct skb_vnet_hdr. Yes, this is definitely a win. Acked-by: Rusty Russell ru...@rustcorp.com.au Thanks, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 88 ++-- 1 file changed, 40 insertions(+), 48 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 36f3dfc..a795a23 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -123,6 +123,9 @@ struct virtnet_info { /* Host can handle any s/g split between our header and packet data */ bool any_header_sg; + /* Packet virtio header size */ + u8 hdr_len; + /* Active statistics */ struct virtnet_stats __percpu *stats; @@ -139,21 +142,14 @@ struct virtnet_info { struct notifier_block nb; }; -struct skb_vnet_hdr { - union { - struct virtio_net_hdr hdr; - struct virtio_net_hdr_mrg_rxbuf mhdr; - }; -}; - struct padded_vnet_hdr { - struct virtio_net_hdr hdr; + struct virtio_net_hdr_mrg_rxbuf hdr; /* - * virtio_net_hdr should be in a separated sg buffer because of a - * QEMU bug, and data sg buffer shares same page with this header sg. - * This padding makes next sg 16 byte aligned after virtio_net_hdr. + * hdr is in a separate sg buffer, and data sg buffer shares same page + * with this header sg. This padding makes next sg 16 byte aligned + * after the header. */ - char padding[6]; + char padding[4]; }; /* Converting between virtqueue no. and kernel tx/rx queue no. @@ -179,9 +175,9 @@ static int rxq2vq(int rxq) return rxq * 2; } -static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) +static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb) { - return (struct skb_vnet_hdr *)skb-cb; + return (struct virtio_net_hdr_mrg_rxbuf *)skb-cb; } /* @@ -247,7 +243,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int len, unsigned int truesize) { struct sk_buff *skb; - struct skb_vnet_hdr *hdr; + struct virtio_net_hdr_mrg_rxbuf *hdr; unsigned int copy, hdr_len, hdr_padded_len; char *p; @@ -260,13 +256,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, hdr = skb_vnet_hdr(skb); - if (vi-mergeable_rx_bufs) { - hdr_len = sizeof hdr-mhdr; - hdr_padded_len = sizeof hdr-mhdr; - } else { - hdr_len = sizeof hdr-hdr; + hdr_len = vi-hdr_len; + if (vi-mergeable_rx_bufs) + hdr_padded_len = sizeof *hdr; + else hdr_padded_len = sizeof(struct padded_vnet_hdr); - } memcpy(hdr, p, hdr_len); @@ -317,11 +311,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, return skb; } -static struct sk_buff *receive_small(void *buf, unsigned int len) +static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len) { struct sk_buff * skb = buf; - len -= sizeof(struct virtio_net_hdr); + len -= vi-hdr_len; skb_trim(skb, len); return skb; @@ -354,8 +348,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, unsigned int len) { void *buf = mergeable_ctx_to_buf_address(ctx); - struct skb_vnet_hdr *hdr = buf; - u16 num_buf = virtio16_to_cpu(rq-vq-vdev, hdr-mhdr.num_buffers); + struct virtio_net_hdr_mrg_rxbuf *hdr = buf; + u16 num_buf = virtio16_to_cpu(vi-vdev, hdr-num_buffers); struct page *page = virt_to_head_page(buf); int offset = buf - page_address(page); unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx)); @@ -373,8 +367,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (unlikely(!ctx)) { pr_debug(%s: rx error: %d buffers out of %d missing\n, dev-name, num_buf, - virtio16_to_cpu(rq-vq-vdev, - hdr-mhdr.num_buffers)); + virtio16_to_cpu(vi-vdev, + hdr-num_buffers)); dev-stats.rx_length_errors++; goto err_buf; } @@ -441,7 +435,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, struct net_device *dev = vi-dev; struct
Re: [PATCH RFC 4/4] virtio_net: bigger header when VERSION_1 is set
Michael S. Tsirkin m...@redhat.com writes: With VERSION_1 virtio_net uses same header size whether mergeable buffers are enabled or not. Signed-off-by: Michael S. Tsirkin m...@redhat.com These two are great too, thanks: Acked-by: Rusty Russell ru...@rustcorp.com.au Cheers, Rusty. --- drivers/net/virtio_net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9c6d50f..a2fe340 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1764,7 +1764,8 @@ static int virtnet_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) vi-mergeable_rx_bufs = true; - if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) + if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) || + virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) vi-hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); else vi-hdr_len = sizeof(struct virtio_net_hdr); -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support
On Fri, Oct 24, 2014 at 10:38:39AM +0200, Cornelia Huck wrote: On Fri, 24 Oct 2014 00:42:20 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Tue, Oct 07, 2014 at 04:39:56PM +0200, Cornelia Huck wrote: This patchset aims to get us some way to implement virtio-1 compliant and transitional devices in qemu. Branch available at git://github.com/cohuck/qemu virtio-1 I've mainly focused on: - endianness handling - extended feature bits - virtio-ccw new/changed commands So issues identified so far: Thanks for taking a look. - devices not converted yet should not advertize 1.0 Neither should an uncoverted transport. So we either can - have transport set the bit and rely on devices -get_features callback to mask it out (virtio-ccw has to change the calling order for get_features, btw.) - have device set the bit and the transport mask it out later. Feels a bit weird, as virtio-1 is a transport feature bit. I thought more about it, I think the right thing would be for unconverted transports to clear high bits on ack and get features. So bit 32 is set, but not exposed to guests. In fact at least for PCI, we have a 32 bit field for features in 0.9 so it's automatic. Didn't check mmio yet. I'm tending towards the first option; smth like this (on top of my branch): diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index c29c8c8..f6501ea 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -24,6 +24,9 @@ static uint32_t virtio_9p_get_features(VirtIODevice *vdev, unsigned int index, uint32_t features) { +if (index == 1) { +features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} if (index 0) { return features; } diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 0d843fe..07a7a6f 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -472,6 +472,9 @@ static uint32_t get_features(VirtIODevice *vdev, unsigned int index, { VirtIOSerial *vser; +if (index == 1) { +features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} if (index 0) { return features; } diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 67f91c0..4b75105 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -447,6 +447,9 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, unsigned int index, VirtIONet *n = VIRTIO_NET(vdev); NetClientState *nc = qemu_get_queue(n-nic); +if (index == 1 get_vhost_net(nc-peer)) { +features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} if (index 0) { return features; } diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 80efe88..07fbf40 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -839,16 +839,16 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev) dev-revision = -1; /* Set default feature bits that are offered by the host. */ +dev-host_features[0] = 0x1 VIRTIO_F_NOTIFY_ON_EMPTY; +dev-host_features[0] |= 0x1 VIRTIO_F_BAD_FEATURE; + +dev-host_features[1] = 1 (VIRTIO_F_VERSION_1 - 32); + for (i = 0; i ARRAY_SIZE(dev-host_features); i++) { dev-host_features[i] = virtio_bus_get_vdev_features(dev-bus, i, dev-host_features[i]); } -dev-host_features[0] |= 0x1 VIRTIO_F_NOTIFY_ON_EMPTY; -dev-host_features[0] |= 0x1 VIRTIO_F_BAD_FEATURE; - -dev-host_features[1] = 1 (VIRTIO_F_VERSION_1 - 32); - css_generate_sch_crws(sch-cssid, sch-ssid, sch-schid, parent-hotplugged, 1); return 0; diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 8e1afa0..8a8fdb9 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -155,6 +155,9 @@ static uint32_t vhost_scsi_get_features(VirtIODevice *vdev, unsigned int index, { VHostSCSI *s = VHOST_SCSI(vdev); +if (index == 1) { +features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} if (index 0) { return features; } diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 088e688..378783f 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -611,6 +611,9 @@ static void virtio_scsi_set_config(VirtIODevice *vdev, static uint32_t virtio_scsi_get_features(VirtIODevice *vdev, unsigned int index, uint32_t requested_features) { +if (index == 1) { +requested_features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} return requested_features; } diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 5af17e2..bd52845 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -306,6 +306,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,