On Tue, Mar 17, 2015 at 04:06:14PM +0000, Jan Beulich wrote:
> >>> On 17.03.15 at 16:38, <konrad.w...@oracle.com> wrote:
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -804,7 +804,17 @@ static void dpci_softirq(void)
> >          d = pirq_dpci->dom;
> >          smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> >          if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> > -            BUG();
> > +        {
> > +            unsigned long flags;
> > +
> > +            /* Put back on the list and retry. */
> > +            local_irq_save(flags);
> > +            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> > +            local_irq_restore(flags);
> > +
> > +            raise_softirq(HVM_DPCI_SOFTIRQ);
> > +            continue;
> > +        }
> 
> As just said in another mail - unless there are convincing new
> arguments in favor of this (more of a hack than a real fix), I'm
> not going to accept it and instead consider reverting the
> offending commit. Iirc the latest we had come to looked quite a
> bit better than this one.

The latest one (please see attached) would cause an dead-lock iff
on the CPU we are running the softirq and an do_IRQ comes for the
exact dpci we are in process of executing.

> 
> Jan
> 
>From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Date: Tue, 17 Mar 2015 09:46:09 -0400
Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has
 been cleared.

There is race when we clear the STATE_SCHED in the softirq
- which allows the 'raise_softirq_for' (on another CPU)
to schedule the dpci.

Specifically this can happen whenthe other CPU receives
an interrupt, calls 'raise_softirq_for', and puts the dpci
on its per-cpu list (same dpci structure).

There would be two 'dpci_softirq' running at the same time
(on different CPUs) where on one CPU it would be executing
hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN)
and on the other CPU it is trying to call:

   if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
        BUG();

Since STATE_RUN is already set it would end badly.

The reason we can get his with this is when an interrupt
affinity is set over multiple CPUs.

Potential solutions:

a) Instead of the BUG() we can put the dpci back on the per-cpu
list to deal with later (when the softirq are activated again).
This putting the 'dpci' back on the per-cpu list is an spin
until the bad condition clears.

b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
to detect for 'STATE_RUN' bit being set and schedule the dpci
in a more safe manner (delay it). The dpci would stil not
be scheduled when STATE_SCHED bit was set.

c) This patch explores a third option - we will only schedule
the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN).

We will spin if STATE_RUN is set (as it is in progress and will
finish). If the STATE_SCHED is set (so hasn't run yet) we won't
try to spin and just exit. This can cause an dead-lock if the interrupt
comes when we are processing the dpci in the softirq.

Interestingly the old ('tasklet') code used an a) mechanism.
If the function assigned to the tasklet was running  - the softirq
that ran said function (hvm_dirq_assist) would be responsible for
putting the tasklet back on the per-cpu list. This would allow
to have an running tasklet and an 'to-be-scheduled' tasklet
at the same time. This solution moves this 'to-be-scheduled'
job to be done in 'raise_softirq_for' (instead of the
'softirq').

Reported-by: Sander Eikelenboom <li...@eikelenboom.it>
Reported-by: Malcolm Crossley <malcolm.cross...@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 xen/drivers/passthrough/io.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..9c30ebb 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -63,10 +63,32 @@ enum {
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
+    unsigned long old;
 
-    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
-        return;
-
+    /*
+     * This cmpxchg spins until the state is zero (unused).
+     */
+    for ( ;; )
+    {
+        old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
+        switch ( old )
+        {
+        case (1 << STATE_SCHED):
+            /*
+             * Whenever STATE_SCHED is set we MUST not schedule it.
+             */
+            return;
+        case (1 << STATE_RUN) | (1 << STATE_SCHED):
+        case (1 << STATE_RUN):
+            /* Getting close to finish. Spin. */
+            continue;
+        }
+        /*
+         * If the 'state' is 0 (not in use) we can schedule it.
+         */
+        if ( old == 0 )
+            break;
+    }
     get_knownalive_domain(pirq_dpci->dom);
 
     local_irq_save(flags);
-- 
2.1.0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to