Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path
On Fri, May 30, 2014 at 11:43:52AM -0400, Waiman Long wrote:
> ---
>  kernel/locking/qspinlock.c | 18 ++++++++++++++++--
>  1 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index fc7fd8c..7f10758 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	 */
>  	for (;;) {
>  		/*
> -		 * If we observe any contention; queue.
> +		 * If we observe that the queue is not empty or both
> +		 * the pending and lock bits are set, queue
>  		 */
> -		if (val & ~_Q_LOCKED_MASK)
> +		if ((val & _Q_TAIL_MASK) ||
> +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
>  			goto queue;
>
> +		if (val == _Q_PENDING_VAL) {
> +			/*
> +			 * Pending bit is set, but not the lock bit.
> +			 * Assuming that the pending bit holder is going to
> +			 * set the lock bit and clear the pending bit soon,
> +			 * it is better to wait than to exit at this point.
> +			 */
> +			cpu_relax();
> +			val = atomic_read(&lock->val);
> +			continue;
> +		}
> +
>  		new = _Q_LOCKED_VAL;
>  		if (val == new)
>  			new |= _Q_PENDING_VAL;

So, again, you just posted a new version without replying to the previous
discussion; so let me try again, what's wrong with the proposal here:

  lkml.kernel.org/r/20140417163640.gt11...@twins.programming.kicks-ass.net

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote:
> Enabling this configuration feature causes a slight decrease in the
> performance of an uncontended lock-unlock operation by about 1-2%,
> mainly due to the use of a static key. However, uncontended lock-unlock
> operations are really just a tiny percentage of a real workload. So
> there should be no noticeable change in application performance.

No, entirely unacceptable.

> +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> +/**
> + * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly
> + * @lock : Pointer to queue spinlock structure
> + * Return: 1 if lock acquired, 0 if failed
> + */
> +static __always_inline int queue_spin_trylock_unfair(struct qspinlock *lock)
> +{
> +	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
> +
> +	if (!qlock->locked && (cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0))
> +		return 1;
> +	return 0;
> +}
> +
> +/**
> + * queue_spin_lock_unfair - acquire a queue spinlock unfairly
> + * @lock: Pointer to queue spinlock structure
> + */
> +static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock)
> +{
> +	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
> +
> +	if (likely(cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0))
> +		return;
> +	/*
> +	 * Since the lock is now unfair, we should not activate the 2-task
> +	 * pending bit spinning code path which disallows lock stealing.
> +	 */
> +	queue_spin_lock_slowpath(lock, -1);
> +}

Why is this needed?

> +/*
> + * Redefine arch_spin_lock and arch_spin_trylock as inline functions that will
> + * jump to the unfair versions if the static key virt_unfairlocks_enabled
> + * is true.
> + */
> +#undef arch_spin_lock
> +#undef arch_spin_trylock
> +#undef arch_spin_lock_flags
> +
> +/**
> + * arch_spin_lock - acquire a queue spinlock
> + * @lock: Pointer to queue spinlock structure
> + */
> +static inline void arch_spin_lock(struct qspinlock *lock)
> +{
> +	if (static_key_false(&virt_unfairlocks_enabled))
> +		queue_spin_lock_unfair(lock);
> +	else
> +		queue_spin_lock(lock);
> +}
> +
> +/**
> + * arch_spin_trylock - try to acquire the queue spinlock
> + * @lock : Pointer to queue spinlock structure
> + * Return: 1 if lock acquired, 0 if failed
> + */
> +static inline int arch_spin_trylock(struct qspinlock *lock)
> +{
> +	if (static_key_false(&virt_unfairlocks_enabled))
> +		return queue_spin_trylock_unfair(lock);
> +	else
> +		return queue_spin_trylock(lock);
> +}

So I really don't see the point of all this. Why do you need special
{try,}lock paths for this case? Are you worried about the upper 24 bits?

> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index ae1b19d..3723c83 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -217,6 +217,14 @@ static __always_inline int try_set_locked(struct qspinlock *lock)
>  {
>  	struct __qspinlock *l = (void *)lock;
>
> +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> +	/*
> +	 * Need to use atomic operation to grab the lock when lock stealing
> +	 * can happen.
> +	 */
> +	if (static_key_false(&virt_unfairlocks_enabled))
> +		return cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0;
> +#endif
>  	barrier();
>  	ACCESS_ONCE(l->locked) = _Q_LOCKED_VAL;
>  	barrier();

Why? If we have a simple test-and-set lock like below, we'll never get
here at all.

> @@ -252,6 +260,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
>
> +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> +	/*
> +	 * A simple test and set unfair lock
> +	 */
> +	if (static_key_false(&virt_unfairlocks_enabled)) {
> +		cpu_relax();	/* Relax after a failed lock attempt */

Meh, I don't think anybody can tell the difference if you put that in or
not, therefore don't.

> +		while (!queue_spin_trylock(lock))
> +			cpu_relax();
> +		return;
> +	}
> +#endif /* CONFIG_VIRT_UNFAIR_LOCKS */

If you're really worried about those upper 24 bits, you can always clear
them when you get here.
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On Wed, Jun 11, 2014 at 12:54:02PM +0200, Peter Zijlstra wrote:
> > @@ -252,6 +260,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >  	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
> >
> > +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> > +	/*
> > +	 * A simple test and set unfair lock
> > +	 */
> > +	if (static_key_false(&virt_unfairlocks_enabled)) {
> > +		cpu_relax();	/* Relax after a failed lock attempt */
>
> Meh, I don't think anybody can tell the difference if you put that in
> or not, therefore don't.
>
> > +		while (!queue_spin_trylock(lock))
> > +			cpu_relax();
> > +		return;
> > +	}
> > +#endif /* CONFIG_VIRT_UNFAIR_LOCKS */
>
> If you're really worried about those upper 24 bits, you can always
> clear them when you get here.

I don't think it's a problem at all; flipping the static_key requires
stop_machine, which guarantees us that there are no spinlocks held. So I
think you can actually BUG_ON() the upper 24 bits.
[PULL] vhost: infrastructure changes for 3.16
Hi Linus,

Please pull the following. Please note this needs to be merged before
merging the target-pending PULL which Nicholas will be sending out
shortly. Thanks!

The following changes since commit 1860e379875dfe7271c649058aeddffe5afd9d0d:

  Linux 3.15 (2014-06-08 11:19:54 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 47283bef7ed356629467d1fac61687756e48f254:

  vhost: move memory pointer to VQs (2014-06-09 16:21:07 +0300)

----------------------------------------------------------------
vhost: infrastructure changes for 3.16

This reworks the vhost core, dropping unnecessary RCU uses in favor of
the VQ mutexes which are used on the fast path anyway. This fixes
worst-case latency for users which change the memory mappings a lot.
Memory allocation for vhost-net now supports fallback on vmalloc (same
as for vhost-scsi); this makes it possible to create the device on
systems where memory is very fragmented, with slightly lower
performance.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

----------------------------------------------------------------
Michael S. Tsirkin (4):
      vhost-net: extend device allocation to vmalloc
      vhost: replace rcu with mutex
      vhost: move acked_features to VQs
      vhost: move memory pointer to VQs

 drivers/vhost/vhost.h | 19 --
 drivers/vhost/net.c   | 35 ---
 drivers/vhost/scsi.c  | 26 --
 drivers/vhost/test.c  | 11 +++---
 drivers/vhost/vhost.c | 97 ++-
 5 files changed, 101 insertions(+), 87 deletions(-)
Re: [PATCH] block: virtio_blk: don't hold spin lock during world switch
On 02/06/2014 15:06, Ming Lei wrote:
> > If you're running SMP under an emulator where exits are expensive,
> > then this wins. Under KVM it's marginal at best.
>
> Both my tests on arm64 and x86 are under KVM, and it looks like the
> patch can improve performance a lot.
>
> IMO, even under KVM, virtio-blk performance still depends on how well
> the hypervisor (qemu, ...) emulates the device, and basically speaking,
> it is expensive to switch from guest to host and let the host handle
> the notification.

The difference is that virtio-pci supports ioeventfd and virtio-mmio
doesn't.

With ioeventfd you can tell KVM "I don't care about the value that is
written to a memory location, only that it is accessed". Then when the
write happens, KVM doesn't do an expensive userspace exit; it just
writes 1 to an eventfd. It then returns to the guest, and userspace
picks up the eventfd via its poll() loop and services the device.

This is already useful for throughput on UP, and the small latency cost
(because of the cost of the event loop in the I/O thread, and possibly
the cost of waking up the thread) is usually offset by the benefit.

But on SMP you get a double benefit. Obviously, the kernel doesn't have
to spin while userspace does its stuff. On top of this, there is also a
latency improvement from ioeventfd, because QEMU processes
virtqueue_notify under its big QEMU lock. With ioeventfd, serialized
virtqueue processing can be a bottleneck, but it doesn't affect latency.
Without ioeventfd it affects the VCPUs' latency and negates a lot of the
benefit of Ming Lei's patch.

You can try disabling ioeventfd with "-global
virtio-blk-pci.ioeventfd=off" on the QEMU command line. Performance will
plummet. :)

Paolo
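The kick path Paolo describes can be mimicked entirely in userspace, since an ioeventfd is just an eventfd that KVM writes to on the guest's behalf. Below is a minimal Linux-only sketch (illustrative names, not QEMU's code) showing why the written value doesn't matter and how multiple kicks coalesce into one pass of the I/O thread's poll() loop:

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Returns the number of coalesced kicks the poll() loop observed,
 * or -1 on error. The "guest" side here stands in for the write KVM
 * performs on a registered ioeventfd.
 */
static int demo_kick(void)
{
	int kickfd = eventfd(0, EFD_NONBLOCK);
	uint64_t one = 1, n = 0;

	if (kickfd < 0)
		return -1;

	/* "Guest" side: the value written is irrelevant, only the event
	 * counts -- which is why KVM can skip the userspace exit. */
	write(kickfd, &one, sizeof(one));
	write(kickfd, &one, sizeof(one));	/* second kick coalesces */

	/* I/O-thread side: poll() wakes up; one read clears the counter
	 * and reports how many kicks accumulated since the last read. */
	struct pollfd pfd = { .fd = kickfd, .events = POLLIN };
	if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN))
		read(kickfd, &n, sizeof(n));

	close(kickfd);
	return (int)n;
}
```

Because reads drain the counter, a slow I/O thread services many guest kicks with a single wakeup — the throughput benefit mentioned above comes largely from this coalescing.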
Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path
On 6/11/2014 6:26 AM, Peter Zijlstra wrote:
> On Fri, May 30, 2014 at 11:43:52AM -0400, Waiman Long wrote:
> > @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >  	 */
> >  	for (;;) {
> >  		/*
> > -		 * If we observe any contention; queue.
> > +		 * If we observe that the queue is not empty or both
> > +		 * the pending and lock bits are set, queue
> >  		 */
> > -		if (val & ~_Q_LOCKED_MASK)
> > +		if ((val & _Q_TAIL_MASK) ||
> > +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
> >  			goto queue;
> >
> > +		if (val == _Q_PENDING_VAL) {
> > +			/*
> > +			 * Pending bit is set, but not the lock bit.
> > +			 * Assuming that the pending bit holder is going to
> > +			 * set the lock bit and clear the pending bit soon,
> > +			 * it is better to wait than to exit at this point.
> > +			 */
> > +			cpu_relax();
> > +			val = atomic_read(&lock->val);
> > +			continue;
> > +		}
> > +
> >  		new = _Q_LOCKED_VAL;
> >  		if (val == new)
> >  			new |= _Q_PENDING_VAL;
>
> So, again, you just posted a new version without replying to the
> previous discussion; so let me try again, what's wrong with the
> proposal here:
>
>   lkml.kernel.org/r/20140417163640.gt11...@twins.programming.kicks-ass.net

I thought I had answered you before; maybe the message was lost or the
answer was not complete. Anyway, I will try to respond to your question
again here.

> Wouldn't something like:
>
>	while (atomic_read(&lock->val) == _Q_PENDING_VAL)
>		cpu_relax();
>
> before the cmpxchg loop have gotten you all this?

That is not exactly the same. The loop will exit if other bits are set
or the pending bit is cleared. In that case, we would need to do the
same check at the beginning of the for loop in order to avoid doing an
extra cmpxchg that is not necessary.

> I just tried this on my code and I cannot see a difference.

As I said before, I did see a difference with that change. I think it
depends on the CPU chip that we used for testing. I ran my test on a
10-core Westmere-EX chip, running my microbenchmark on different pairs
of cores within the same chip. It produces results that vary from
779.5ms up to 1192ms. Without that patch, the lowest value I can get is
still close to 800ms, but the highest can be up to 1800ms or so. So I
believe it is just a matter of timing that you did not observe on your
test machine.

-Longman
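The branch logic being debated — queue when the tail is occupied or both bits are set, spin while only the pending bit is set, otherwise try the fast path — can be isolated as a pure decision function. A minimal userspace sketch follows; the bit positions and macro names mirror the patch, but the constant values here are illustrative stand-ins, not the kernel's qspinlock_types.h definitions:

```c
#include <stdint.h>

/* Illustrative encodings mirroring the qspinlock word layout discussed
 * above: bit 0 = locked, bit 8 = pending, bits 16+ = wait-queue tail.
 * The real values live in kernel/locking/qspinlock_types.h. */
#define _Q_LOCKED_VAL	(1U << 0)
#define _Q_PENDING_VAL	(1U << 8)
#define _Q_TAIL_MASK	(~0U << 16)

enum next_step { TAKE_FAST_PATH, SPIN_ON_PENDING, QUEUE };

/* The decision made at the top of the slowpath's for (;;) loop. */
static enum next_step classify(uint32_t val)
{
	/* Queue is non-empty, or both pending and lock bits are set. */
	if ((val & _Q_TAIL_MASK) ||
	    (val == (_Q_LOCKED_VAL | _Q_PENDING_VAL)))
		return QUEUE;

	/* Pending set but lock clear: the pending-bit holder will set
	 * the lock bit and clear pending soon, so wait rather than queue. */
	if (val == _Q_PENDING_VAL)
		return SPIN_ON_PENDING;

	/* Uncontended, or only the lock bit set: attempt the cmpxchg
	 * fast path (grab the lock, or set the pending bit). */
	return TAKE_FAST_PATH;
}
```

Writing it this way makes Waiman's objection concrete: a bare `while (val == _Q_PENDING_VAL)` wait loop placed before the cmpxchg loop exits as soon as any other bit appears, at which point this same three-way classification still has to run to avoid a doomed cmpxchg.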
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On 6/11/2014 6:54 AM, Peter Zijlstra wrote:
> On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote:
> > Enabling this configuration feature causes a slight decrease in the
> > performance of an uncontended lock-unlock operation by about 1-2%,
> > mainly due to the use of a static key. However, uncontended
> > lock-unlock operations are really just a tiny percentage of a real
> > workload. So there should be no noticeable change in application
> > performance.
>
> No, entirely unacceptable.
>
> > +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> > +/**
> > + * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly
> > + * @lock : Pointer to queue spinlock structure
> > + * Return: 1 if lock acquired, 0 if failed
> > + */
> > +static __always_inline int queue_spin_trylock_unfair(struct qspinlock *lock)
> > +{
> > +	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
> > +
> > +	if (!qlock->locked && (cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0))
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * queue_spin_lock_unfair - acquire a queue spinlock unfairly
> > + * @lock: Pointer to queue spinlock structure
> > + */
> > +static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock)
> > +{
> > +	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
> > +
> > +	if (likely(cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0))
> > +		return;
> > +	/*
> > +	 * Since the lock is now unfair, we should not activate the 2-task
> > +	 * pending bit spinning code path which disallows lock stealing.
> > +	 */
> > +	queue_spin_lock_slowpath(lock, -1);
> > +}
>
> Why is this needed?

I added the unfair versions of lock and trylock because my original
version wasn't a simple test-and-set lock. Now I have changed the core
part to use the simple test-and-set lock. However, I still think that an
unfair version in the fast path can be helpful to performance when both
the unfair lock and the paravirt spinlock are enabled. In this case, the
paravirt spinlock code will disable the unfair lock code in the
slowpath, but still allow the unfair version in the fast path to get the
best possible performance in a virtual guest.

Yes, I could take that out to allow either the unfair or the paravirt
spinlock, but not both. I do think that a little bit of unfairness will
help in the virtual environment.

> > +/*
> > + * Redefine arch_spin_lock and arch_spin_trylock as inline functions that will
> > + * jump to the unfair versions if the static key virt_unfairlocks_enabled
> > + * is true.
> > + */
> > [...]
>
> So I really don't see the point of all this. Why do you need special
> {try,}lock paths for this case? Are you worried about the upper 24 bits?

No, as I said above. I was planning for the coexistence of the unfair
lock in the fast path and the paravirt spinlock in the slowpath.

> > @@ -217,6 +217,14 @@ static __always_inline int try_set_locked(struct qspinlock *lock)
> >  {
> >  	struct __qspinlock *l = (void *)lock;
> >
> > +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> > +	/*
> > +	 * Need to use atomic operation to grab the lock when lock stealing
> > +	 * can happen.
> > +	 */
> > +	if (static_key_false(&virt_unfairlocks_enabled))
> > +		return cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0;
> > +#endif
> >  	barrier();
> >  	ACCESS_ONCE(l->locked) = _Q_LOCKED_VAL;
> >  	barrier();
>
> Why? If we have a simple test-and-set lock like below, we'll never get
> here at all.

Again, it is due to the coexistence of the unfair lock in the fast path
and the paravirt spinlock in the slowpath.

> > +#ifdef CONFIG_VIRT_UNFAIR_LOCKS
> > +	/*
> > +	 * A simple test and set unfair lock
> > +	 */
> > +	if (static_key_false(&virt_unfairlocks_enabled)) {
> > +		cpu_relax();	/* Relax after a failed lock attempt */
>
> Meh, I don't think anybody can tell the difference if you put that in
> or not, therefore don't.

Yes, I can take out the cpu_relax() here.

-Longman
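For reference, the "simple test and set unfair lock" under discussion can be sketched with portable C11 atomics. This is a userspace stand-in, not the kernel code: atomic_flag replaces the cmpxchg on the qspinlock's locked byte, and sched_yield() stands in for cpu_relax():

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Userspace sketch of an unfair test-and-set lock: trylock first
 * (which is what permits lock stealing), then spin until it succeeds. */
struct tas_lock { atomic_flag locked; };

static int tas_trylock(struct tas_lock *l)
{
	/* Returns 1 if we flipped the flag from clear to set. */
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static void tas_lock(struct tas_lock *l)
{
	while (!tas_trylock(l))
		sched_yield();	/* stand-in for cpu_relax() */
}

static void tas_unlock(struct tas_lock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Contended-increment demo to exercise the lock. */
static struct tas_lock demo = { ATOMIC_FLAG_INIT };
static long counter;

static void *worker(void *arg)
{
	for (int i = 0; i < 100000; i++) {
		tas_lock(&demo);
		counter++;		/* protected by the lock */
		tas_unlock(&demo);
	}
	return arg;
}

static long run_demo(void)
{
	pthread_t t[4];

	counter = 0;
	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], 0, worker, 0);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], 0);
	return counter;
}
```

Such a lock allows stealing by construction — whichever CPU's test-and-set lands first wins regardless of arrival order — which is precisely the unfairness being argued about for virtual guests, and why it needs no queueing state at all.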
Re: Using virtio for inter-VM communication
Henning Schild henning.sch...@siemens.com writes:
> Hi,
>
> i am working on the jailhouse[1] project and am currently looking at
> inter-VM communication. We want to connect guests directly with
> virtual consoles based on shared memory. The code complexity in the
> hypervisor should be minimal, it should just make the shared memory
> discoverable and provide a signaling mechanism.

Hi Henning,

The virtio assumption was that the host can see all of guest memory.
This simplifies things significantly, and makes it efficient.

If you don't have this, *someone* needs to do a copy. Usually the guest
OS does a bounce buffer into your shared region. Goodbye performance.
Or you can play remapping tricks. Goodbye performance again.

My preferred model is to have a trusted helper (ie. host) which
understands how to copy between virtio rings. The backend guest (to
steal Xen vocab) R/O maps the descriptor, avail ring and used rings in
the guest. It then asks the trusted helper to do various operations
(copy into writable descriptor, copy out of readable descriptor, mark
used). The virtio ring itself acts as a grant table.

Note: that helper mechanism is completely protocol agnostic. It was
also explicitly designed into the virtio mechanism (with its 4k
boundaries for data structures and its 'len' field to indicate how much
was written into the descriptor).

It was also never implemented, and remains a thought experiment.
However, implementing it in lguest should be fairly easy.

Cheers,
Rusty.
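As a companion to the thought experiment, the helper's "copy into writable descriptor" operation reduces to a bounds-checked copy plus filling in the used element's 'len' field. The structures below are hypothetical stand-ins for illustration, not the real virtio_ring.h layouts:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a trusted helper validating a guest-supplied descriptor
 * before copying data into the guest on the backend's behalf. */
struct toy_desc { uint64_t addr; uint32_t len; };
struct toy_used { uint32_t id;  uint32_t len; };

/*
 * Copy up to n bytes from src into the guest buffer named by descriptor d,
 * recording the amount written in the used element u (the 'len' field
 * Rusty mentions). Returns bytes copied, or -1 if the descriptor points
 * outside the guest's memory extent -- the check that keeps an
 * untrusted ring from steering writes anywhere it likes.
 */
static long helper_copy_in(uint8_t *guest_ram, size_t ram_size,
			   const struct toy_desc *d, uint32_t id,
			   const void *src, size_t n, struct toy_used *u)
{
	if (d->addr > ram_size || d->len > ram_size - d->addr)
		return -1;		/* descriptor escapes guest RAM */
	if (n > d->len)
		n = d->len;		/* clamp to the descriptor size */

	memcpy(guest_ram + d->addr, src, n);
	u->id = id;
	u->len = (uint32_t)n;		/* how much was actually written */
	return (long)n;
}
```

The point of the model: all policy (which descriptor, how much, when to mark used) stays with the guests; the trusted helper only validates and copies, which is why the mechanism is protocol agnostic.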
Re: Using virtio for inter-VM communication
On 2014-06-12 04:27, Rusty Russell wrote:
> Henning Schild henning.sch...@siemens.com writes:
> > Hi,
> >
> > i am working on the jailhouse[1] project and am currently looking at
> > inter-VM communication. We want to connect guests directly with
> > virtual consoles based on shared memory. The code complexity in the
> > hypervisor should be minimal, it should just make the shared memory
> > discoverable and provide a signaling mechanism.
>
> Hi Henning,
>
> The virtio assumption was that the host can see all of guest memory.
> This simplifies things significantly, and makes it efficient.
>
> If you don't have this, *someone* needs to do a copy. Usually the
> guest OS does a bounce buffer into your shared region. Goodbye
> performance. Or you can play remapping tricks. Goodbye performance
> again.
>
> My preferred model is to have a trusted helper (ie. host) which
> understands how to copy between virtio rings. The backend guest (to
> steal Xen vocab) R/O maps the descriptor, avail ring and used rings in
> the guest. It then asks the trusted helper to do various operations
> (copy into writable descriptor, copy out of readable descriptor, mark
> used). The virtio ring itself acts as a grant table.
>
> Note: that helper mechanism is completely protocol agnostic. It was
> also explicitly designed into the virtio mechanism (with its 4k
> boundaries for data structures and its 'len' field to indicate how
> much was written into the descriptor).
>
> It was also never implemented, and remains a thought experiment.
> However, implementing it in lguest should be fairly easy.

The reason why a trusted helper, i.e. additional logic in the
hypervisor, is not our favorite solution is that we'd like to keep the
hypervisor as small as possible. I wouldn't exclude such an approach
categorically, but we have to weigh the costs (lines of code, additional
hypervisor interface) carefully against the gain (existing
specifications and guest driver infrastructure).

Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA
working group discussion): what speaks against introducing an
alternative encoding of addresses inside virtio data structures? The
idea of this flag was to replace guest-physical addresses with offsets
into a shared memory region associated with or part of a virtio device.
That would preserve zero-copy capabilities (as long as you can work
against the shared mem directly, e.g. doing DMA from a physical NIC or
storage device into it) and keep the hypervisor out of the loop. Is it
too invasive to existing infrastructure or does it have some other
pitfalls?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
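The offset-based encoding Jan proposes boils down to a bounds-checked translation from ring offsets to pointers inside the shared window. A sketch under the assumption of a single flat shared region per device — the feature-bit name and all identifiers here are hypothetical, as nothing like this was ever standardized:

```c
#include <stddef.h>
#include <stdint.h>

/* Per-device shared-memory window that both guests have mapped. */
struct shmem_win {
	uint8_t *base;	/* local mapping of the shared region */
	size_t   size;	/* region size in bytes */
};

/*
 * With a hypothetical VIRTIO_F_RING_SHMEM_ADDR negotiated, descriptor
 * 'addr' fields hold offsets into the window instead of guest-physical
 * addresses. Either side turns (offset, len) into a usable pointer with
 * one range check; NULL means the peer tried to reach outside the
 * shared region. No copy, no remapping, no hypervisor involvement.
 */
static void *shmem_deref(const struct shmem_win *w,
			 uint64_t off, uint32_t len)
{
	if (off > w->size || len > w->size - off)
		return NULL;	/* range escapes the shared window */
	return w->base + off;
}
```

Note the two-part check is written to avoid integer overflow in `off + len`; that detail is what keeps a malicious peer from wrapping the sum back into the window.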
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote:
> On 6/11/2014 6:54 AM, Peter Zijlstra wrote:
> > On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote:
> > > Enabling this configuration feature causes a slight decrease in the
> > > performance of an uncontended lock-unlock operation by about 1-2%,
> > > mainly due to the use of a static key. However, uncontended
> > > lock-unlock operations are really just a tiny percentage of a real
> > > workload. So there should be no noticeable change in application
> > > performance.
> >
> > No, entirely unacceptable.
> >
> > > [...]
> >
> > Why is this needed?
>
> I added the unfair versions of lock and trylock because my original
> version wasn't a simple test-and-set lock. Now I have changed the core
> part to use the simple test-and-set lock. However, I still think that
> an unfair version in the fast path can be helpful to performance when
> both the unfair lock and the paravirt spinlock are enabled. In this
> case, the paravirt spinlock code will disable the unfair lock code in
> the slowpath, but still allow the unfair version in the fast path to
> get the best possible performance in a virtual guest.
>
> Yes, I could take that out to allow either the unfair or the paravirt
> spinlock, but not both. I do think that a little bit of unfairness
> will help in the virtual environment.

When will you learn to like simplicity and stop this massive
over-engineering effort?

There's no sane reason to have the test-and-set virt and paravirt locks
enabled at the same bloody time. There are 3 distinct cases:

 - native
 - virt
 - paravirt

And they do not overlap. Furthermore, if there is any possibility at all
of not polluting the native code, grab it with both hands.

Native performance is king; try your very utmost bestest to preserve
that. Paravirt is a distant second, and nobody sane should care about
the virt case at all.

If you want extra lock stealing in the paravirt case, put it in the
slowpath code before you start queueing.