>> running the attached test case for the native skin, you will get an ugly
>> lock-up on probably all Xenomai versions. Granted, this code is a bit
>> synthetic. I originally thought I could trigger the bug also via
>> timeouts when waiting on mutexes, but this scenario is safe (the timeout
>> is cleared before being able to cause harm).
> just in order to educate me as probably I might have got something
> wrong at the first glance :)
> if we take this one:
> --- mutex.c    2006-02-27 15:34:58.000000000 +0100
> +++ mutex-NEW.c    2006-05-10 11:55:25.000000000 +0200
> @@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>     err = -EIDRM; /* Mutex deleted while pending. */
>     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>     err = -ETIMEDOUT; /* Timeout.*/
> -    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
> +    else if (xnthread_test_flags(&task->thread_base,XNBREAK) && mutex->owner != task)
>     err = -EINTR; /* Unblocked.*/
>  unlock_and_exit:
> As I understand task2 has a lower prio and that's why
> [task1] rt_mutex_unlock
> [task 1] rt_task_unblock(task1)
> are called in a row.
> ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
> rt_mutex_lock()) and finds XNBREAK flag but,
> [doc] -EINTR is returned if rt_task_unblock() has been called for the
> waiting task (1) before the mutex has become available (2).
> (1) it's true, task2 was still waiting at that time;
> (2) it's wrong, task2 was already the owner.
> So why not just ignore XNBREAK and let task2 continue, since it already
> holds the mutex (as shown above)?

Indeed, this solves the issue more gracefully.
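To make the effect of the extra ownership check explicit, here is a minimal standalone model of the decision taken after the waiter resumes. It is only a sketch: the MODEL_* bits, struct model_task, struct model_mutex and model_lock_result() are hypothetical stand-ins for the nucleus flags and the native skin structures, not the real definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-ins for the nucleus status bits (values are made up). */
#define MODEL_RMID   0x1  /* object deleted while pending */
#define MODEL_TIMEO  0x2  /* wait timed out */
#define MODEL_BREAK  0x4  /* waiter was forcibly unblocked */

struct model_task  { unsigned flags; };
struct model_mutex { struct model_task *owner; };

/* Decide what a lock attempt returns once the waiting task resumes. */
int model_lock_result(const struct model_mutex *m, const struct model_task *t)
{
    if (t->flags & MODEL_RMID)
        return -EIDRM;      /* mutex deleted while pending */
    if (t->flags & MODEL_TIMEO)
        return -ETIMEDOUT;  /* timeout */
    /* Patched check: a break only counts if ownership was not handed over. */
    if ((t->flags & MODEL_BREAK) && m->owner != t)
        return -EINTR;      /* unblocked before the mutex became available */
    return 0;               /* the task owns the mutex */
}
```

With the break bit set and the task already recorded as owner, the model returns 0 instead of -EINTR, which is the case your patch addresses.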

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this:

void task2_fnc(void *arg)
{
        printf("started task2\n");
        if (rt_mutex_lock(&mtx, 0) < 0) {
                printf("lock failed in task2\n");
        }
//        rt_mutex_unlock(&mtx);

        printf("done task2\n");
}

I also get a crash. So the problem seems to be in releasing mutex
ownership when the owning task terminates. This needs further examination.

It looks like the issue is limited to cleanup problems and does not
extend to the other skins as widely as I thought. RTDM is not involved,
as rtdm_mutex_lock does not know EINTR. The POSIX skin runs in a loop on
interruption and should recover from this.
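
For illustration, the retry-on-interruption pattern looks roughly like the sketch below. This is not the POSIX skin's actual code: lock_fn, lock_retry_on_eintr() and fake_lock() are hypothetical names, and fake_lock() is only a test double that reports -EINTR a given number of times.

```c
#include <errno.h>

/* Hypothetical lock primitive: returns 0 on success or a negative errno. */
typedef int (*lock_fn)(void *ctx);

/* Keep retrying for as long as the lock call reports an interruption. */
int lock_retry_on_eintr(lock_fn lock, void *ctx)
{
    int err;

    do {
        err = lock(ctx);
    } while (err == -EINTR);

    return err;
}

/* Test double: returns -EINTR until the counter behind ctx reaches zero. */
int fake_lock(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EINTR;
    }
    return 0;
}
```

A caller using such a loop never sees -EINTR, so a spurious break is absorbed and the caller simply ends up holding the lock.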

Besides this, we may want to consider whether introducing a pending
ownership state for synch objects is worthwhile to improve efficiency for
PIP users. Not critical, but if it comes at a reasonable price... I will
try to draft something.


