Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> bad news, everyone :(. According to the result of some lengthy debug
>>>>>> session with a customer and several ad-hoc lttng instrumentations, we
>>>>>> have a fatal bug in the nucleus' implementation of the lock stealing
>>>>>> algorithm. Consider this scenario:
>>>>>>
>>>>>> 1. Thread A acquires Mutex X successfully, ie. it leaves the (in this
>>>>>>    case) rt_mutex_acquire service, and its XNWAKEN flag is therefore
>>>>>>    cleared.
>>>>>>
>>>>>> 2. Thread A blocks on some further Mutex Y (in our case it was a
>>>>>>    semaphore, but that doesn't matter).
>>>>>>
>>>>>> 3. Thread B signals the availability of Mutex Y to Thread A, thus it
>>>>>>    also set XNWAKEN in Thread A. But Thread A is not yet scheduled on
>>>>>>    its CPU.
>>>>>>
>>>>>> 4. Thread C tries to acquire Mutex X, finds it assigned to Thread A, but
>>>>>>    also notices that the XNWAKEN flag of Thread A is set. Thus it steals
>>>>>>    the mutex although Thread A already entered the critical section -
>>>>>>    and hell breaks loose...
>>>>>>
>>>>> See commit #3795, and change log entry from 2008-05-15. Unless I 
>>>>> misunderstood
>>>>> your description, this bug was fixed in 2.4.4.
>>>> Oh, fatally missed that fix.
>>>>
>>>> Anyway, the patch looks a bit unclean to me. Either you are lacking
>>>> wwake = NULL in xnpod_suspend_thread, or the whole information encoded
>>>> in XNWAKEN can already be covered by wwake directly.
>>>>
>>> Clearing wwake has to be done when returning from xnsynch_sleep_on, only 
>>> when
>>> the code knows that ownership is eventually granted to the caller; making 
>>> such a
>>> decision in xnpod_suspend_thread() would be wrong.
>> What about
>>
>> http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/pod.c#1411
>>
>> then?
>>
> 
> That portion of code applies _before_ the thread enters suspended state. We
> bother for the other side, i.e. when it resumes from actual suspension, until 
> it
> has been decided whether it should be allowed to keep running, or redo the 
> wait
> due to the resource being stolen away.

Then clearing XNWAKEN here is useless - it comes for free, but it has no
practical effect. For code clarity reasons, this should be remove IMHO.

> 
>>> The awake bit has been kept mainly because the nucleus commonly uses 
>>> bitmasks to
>>> get fast access to thread status & information. It's not mandatory to have 
>>> this
>>> one in, it's just conforming to the rest of the implementation.
>> I see, but redundancy come with some ugliness as well. And we add more
>> code to hot paths.
>>
> 
> The hot path barely involves facing a stolen resource situation. Most of the
> time, the XNWAKEN test will be false, or this would mean that the application
> exhibits a high priority thread that routinely and frequently competes with a
> low priority one for gaining access to such resource. I would rather fix the
> latter code first.

Besides that the lock-stealing path (with two instead of only one
conditional jumps) is relevant for the WCET in some cases (but that is
nitpicking), I was more referring to setting the bit + setting wwake in
the wakeup paths.

> 
> Regarding the perceived ugliness, I guess this is a matter of taste here. I 
> like
> the idea of always testing a given information the same way, like:
> 
> if (test_bits(thread, foobar))
> 
> and avoid things like:
> 
> if (test_bits(thread, foo) || thread->field == bar)

The latter is what we see now with XNWAKEN (while plain "thread->field
== bar" would be possible in this case). We don't gain anything here by
using a bit (no combined tests with other bits e.g.).

Jan

---
 include/nucleus/thread.h |    1 -
 ksrc/nucleus/pod.c       |    2 +-
 ksrc/nucleus/synch.c     |    5 +----
 3 files changed, 2 insertions(+), 6 deletions(-)

Index: b/include/nucleus/thread.h
===================================================================
--- a/include/nucleus/thread.h
+++ b/include/nucleus/thread.h
@@ -112,7 +112,6 @@
 #define XNRMID    0x00000002 /**< Pending on a removed resource */
 #define XNBREAK   0x00000004 /**< Forcibly awaken from a wait state */
 #define XNKICKED  0x00000008 /**< Kicked upon Linux signal (shadow only) */
-#define XNWAKEN   0x00000010 /**< Thread waken up upon resource availability */
 #define XNROBBED  0x00000020 /**< Robbed from resource ownership */
 #define XNATOMIC  0x00000040 /**< In atomic switch from secondary to primary 
mode */
 #define XNAFFSET  0x00000080 /**< CPU affinity changed from primary mode */
Index: b/ksrc/nucleus/pod.c
===================================================================
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1406,7 +1406,7 @@ void xnpod_suspend_thread(xnthread_t *th
                }
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
-               xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK | 
XNWAKEN | XNROBBED);
+               xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK | 
XNROBBED);
        }
 
        /* Don't start the timer for a thread indefinitely delayed by
Index: b/ksrc/nucleus/synch.c
===================================================================
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -199,7 +199,7 @@ redo:
        }
 
        if (thread->cprio > owner->cprio) {
-               if (xnthread_test_info(owner, XNWAKEN) && owner->wwake == 
synch) {
+               if (owner->wwake == synch) {
                        /* Ownership is still pending, steal the resource. */
                        synch->owner = thread;
                        xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK);
@@ -243,7 +243,6 @@ redo:
       unlock_and_exit:
 
        thread->wwake = NULL;
-       xnthread_clear_info(thread, XNWAKEN);
 
        xnlock_put_irqrestore(&nklock, s);
 }
@@ -389,7 +388,6 @@ xnthread_t *xnsynch_wakeup_one_sleeper(x
                thread->wchan = NULL;
                thread->wwake = synch;
                synch->owner = thread;
-               xnthread_set_info(thread, XNWAKEN);
                trace_mark(xn_nucleus_synch_wakeup_one,
                           "thread %p thread_name %s synch %p",
                           thread, xnthread_name(thread), synch);
@@ -503,7 +501,6 @@ xnpholder_t *xnsynch_wakeup_this_sleeper
        thread->wchan = NULL;
        thread->wwake = synch;
        synch->owner = thread;
-       xnthread_set_info(thread, XNWAKEN);
        trace_mark(xn_nucleus_synch_wakeup_all,
                   "thread %p thread_name %s synch %p",
                   thread, xnthread_name(thread), synch);

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core

Reply via email to