Jan Kiszka wrote:
Am 04.11.2010 01:13, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 04.11.2010 00:56, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 04.11.2010 00:44, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 04.11.2010 00:18, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 04.11.2010 00:11, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 03.11.2010 23:11, Jan Kiszka wrote:
Am 03.11.2010 23:03, Jan Kiszka wrote:
But we do not always use atomic ops for manipulating status bits (though
we do in other cases where there is no need - a different story). This
may fix the race:
Err, nonsense. As we manipulate xnsched::status also outside of nklock
protection, we must _always_ use atomic ops.
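
For illustration only (generic names, not the actual nucleus code), the
hazard with a plain read-modify-write on a shared status word:

unsigned long status;	/* word shared between CPUs */

/* Racy: load, OR, store - a bit set concurrently by another CPU
 * can be lost when this store overwrites it. */
void set_flag_racy(unsigned long bit)
{
	status |= bit;
}

/* Safe: atomic RMW; the kernel would use set_bit() and friends,
 * a gcc builtin stands in here. */
void set_flag_atomic(unsigned long bit)
{
	__sync_or_and_fetch(&status, bit);
}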

This screams for a cleanup: local-only bits like XNHTICK or XNINIRQ
should be pushed in a separate status word that can then be safely
modified non-atomically.
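
A minimal sketch of such a split (hypothetical struct and helper names,
this is not the patch below):

struct sched_sketch {
	unsigned long status;	/* shared bits, e.g. XNRESCHED: atomic ops only */
	unsigned long lflags;	/* local-only bits, e.g. XNHTICK, XNINIRQ */
};

/* Local-only bits are written exclusively by the owning CPU, so a
 * plain, non-atomic read-modify-write is safe here. */
static inline void sched_set_local(struct sched_sketch *sched,
				   unsigned long bit)
{
	sched->lflags |= bit;
}

/* Shared bits may be set from any CPU, so they need an atomic op. */
static inline void sched_set_shared(struct sched_sketch *sched,
				    unsigned long bit)
{
	__sync_or_and_fetch(&sched->status, bit);
}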
Second try to fix and clean up the sched status bits. Anders, please test.


diff --git a/include/nucleus/pod.h b/include/nucleus/pod.h
index 01ff0a7..5987a1f 100644
--- a/include/nucleus/pod.h
+++ b/include/nucleus/pod.h
@@ -277,12 +277,10 @@ static inline void xnpod_schedule(void)
         * context is active, or if we are caught in the middle of a
         * unlocked context switch.
         */
 #if XENO_DEBUG(NUCLEUS)
        if (testbits(sched->status, XNKCOUT|XNINIRQ|XNSWLOCK))
                return;
-#else /* !XENO_DEBUG(NUCLEUS) */
-       if (testbits(sched->status,
-                    XNKCOUT|XNINIRQ|XNSWLOCK|XNRESCHED) != XNRESCHED)
-               return;
 #endif /* !XENO_DEBUG(NUCLEUS) */
+       if (!sched->resched)
+               return;
 
Having only one test was really nice here; maybe we could simply issue a
barrier before reading the status?

I agree - but the alternative is letting all modifications of
xnsched::status use atomic bitops (that's required when folding all bits
into a single word). And that would be much more costly, especially
on SMP.
What about issuing a barrier before testing the status?

The problem is not about reading but writing the status concurrently,
thus it's not about the code you see above.
The bits are modified under nklock, which implies a barrier when the
lock is released. Furthermore, an IPI is guaranteed to be received on
the remote CPU after that barrier, so a read barrier on the receiving
side should be enough to see the modifications made remotely.
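
The claimed ordering, sketched with the sched_sketch struct from above
(send_ipi() and do_schedule() are placeholders, not real nucleus calls):

/* Requesting CPU: publish the bit under nklock, then kick the target.
 * Dropping the lock acts as a release barrier, and the IPI cannot
 * overtake it. */
void request_resched(struct sched_sketch *target, int cpu)
{
	spl_t s;

	xnlock_get_irqsave(&nklock, s);
	target->status |= XNRESCHED;
	xnlock_put_irqrestore(&nklock, s);
	send_ipi(cpu);			/* placeholder for the resched IPI */
}

/* Target CPU, in the IPI handler: the read barrier pairs with the
 * release above, so the new bit is guaranteed to be visible. */
void resched_ipi_handler(struct sched_sketch *sched)
{
	smp_rmb();
	if (sched->status & XNRESCHED)
		do_schedule();		/* placeholder */
}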
Check nucleus/intr.c for tons of unprotected status modifications.
Ok. Then maybe, we should reconsider the original decision to start
fiddling with the XNRESCHED bit remotely.
...which removed complexity and fixed a race? Let's rather review the
checks done in xnpod_schedule vs. its callers; I bet there is more to
save (IOW: remove the need to test sched->resched).
Not that much complexity... and the race was a false positive in debug
code, no big deal. At least it worked, and it has done so for a long
time. No atomics needed, no barriers, only one test in xnpod_schedule. And
a nice invariant: sched->status is always accessed on the local CPU.
What else?
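
That older protocol, roughly (hypothetical names again; resched_mask
stands in for the per-sched mask of CPUs to kick, an assumed extra field
on the sketch struct above):

/* Requesting CPU: never writes the remote status word; it only notes
 * the target in its own resched mask and sends the IPI. */
void mark_remote_resched(struct sched_sketch *me, int target_cpu)
{
	cpu_set(target_cpu, me->resched_mask);	/* local write only */
	send_ipi(target_cpu);
}

/* Target CPU, in the IPI handler: the sole writer of its own status,
 * hence no atomics and no barriers are needed. */
void resched_ipi(struct sched_sketch *me)
{
	me->status |= XNRESCHED;
	xnpod_schedule();
}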

Take a step back and look at the root cause of this issue again. An unlocked

        if need-resched

is inherently racy and always will be (not only for the remote
reschedule case, BTW). So we either have to accept this and remove the
debugging check from the scheduler, or push the check back to
__xnpod_schedule where it once came from. Once this is cleaned up, we
can look into the remote resched protocol again.
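
Pushing the check back down would make the unlocked test a mere hint and
the locked one authoritative, along these lines (sketch, reusing the
names from above):

void __pod_schedule_sketch(struct sched_sketch *sched)
{
	spl_t s;

	xnlock_get_irqsave(&nklock, s);
	/* Authoritative check under nklock: a debug assertion placed
	 * here cannot race with remote setters. */
	if (sched->status & XNRESCHED)
		switch_threads(sched);	/* placeholder */
	xnlock_put_irqrestore(&nklock, s);
}

static inline void pod_schedule_sketch(struct sched_sketch *sched)
{
	/* Unlocked fast path: a stale read here only costs a useless
	 * call into the slow path, or a wakeup deferred to the IPI. */
	if (!(sched->status & XNRESCHED))
		return;

	__pod_schedule_sketch(sched);
}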
Probably being daft here; why not stop fiddling with remote CPU status bits and always do a reschedule on IPI IRQs?

