This is a note to let you know that I've just added the patch titled
xen: Send spinlock IPI to all waiters
to the 3.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
xen-send-spinlock-ipi-to-all-waiters.patch
and it can be found in the queue-3.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <[email protected]> know about it.
>From 76eaca031f0af2bb303e405986f637811956a422 Mon Sep 17 00:00:00 2001
From: Stefan Bader <[email protected]>
Date: Fri, 15 Feb 2013 09:48:52 +0100
Subject: xen: Send spinlock IPI to all waiters
From: Stefan Bader <[email protected]>
commit 76eaca031f0af2bb303e405986f637811956a422 upstream.
There is a loophole between Xen's current implementation of
pv-spinlocks and the scheduler. This was triggerable through
a testcase until v3.6 changed the TLB flushing code. The
problem potentially is still there just not observable in the
same way.
What could happen was (is):
1. CPU n tries to schedule task x away and goes into a slow
wait for the runq lock of CPU n-# (must be one with a lower
number).
2. CPU n-#, while processing softirqs, tries to balance domains
and goes into a slow wait for its own runq lock (for updating
some records). Since this is a spin_lock_irqsave in softirq
context, interrupts will be re-enabled for the duration of
the poll_irq hypercall used by Xen.
3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
an interrupt (e.g. endio) and when processing the interrupt,
tries to wake up task x. But that is in schedule and still
on_cpu, so try_to_wake_up goes into a tight loop.
4. The runq lock of CPU n-# gets unlocked, but the message only
gets sent to the first waiter, which is CPU n-# and that is
busily stuck.
5. CPU n-# never returns from the nested interruption to take and
release the lock because the scheduler uses a busy wait.
And CPU n never finishes the task migration because the unlock
notification only went to CPU n-#.
To avoid this and since the unlocking code has no real sense of
which waiter is best suited to grab the lock, just send the IPI
to all of them. This causes the waiters to return from the hyper-
call (those not interrupted at least) and do active spinlocking.
BugLink: http://bugs.launchpad.net/bugs/1011792
Acked-by: Jan Beulich <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/xen/spinlock.c | 1 -
1 file changed, 1 deletion(-)
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slo
if (per_cpu(lock_spinners, cpu) == xl) {
ADD_STATS(released_slow_kicked, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
- break;
}
}
}
Patches currently in stable-queue which might be from
[email protected] are
queue-3.4/xen-send-spinlock-ipi-to-all-waiters.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html