Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Jan Kiszka
Dmitry Adamushko wrote:
 Hi Jan,
 

 running the attached test case for the native skin, you will get an ugly
 lock-up on probably all Xenomai versions. Granted, this code is a bit
 synthetic. I originally thought I could trigger the bug also via
 timeouts when waiting on mutexes, but this scenario is safe (the timeout
 is cleared before being able to cause harm).

 
just in order to educate myself, as I probably got something wrong at first glance :)
 
 if we take this one:
 
--- mutex.c	2006-02-27 15:34:58.0 +0100
+++ mutex-NEW.c	2006-05-10 11:55:25.0 +0200
@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
         err = -EIDRM;       /* Mutex deleted while pending. */
     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
         err = -ETIMEDOUT;   /* Timeout. */
-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) && mutex->owner != task)
         err = -EINTR;       /* Unblocked. */
 
   unlock_and_exit:
 
 As I understand task2 has a lower prio and that's why
 
[task1] rt_mutex_unlock()
[task1] rt_task_unblock(&task2)
 
 are called in a row.
 
ok, task2 wakes up in rt_mutex_unlock() (while task1 is blocked on
rt_mutex_lock()) and finds the XNBREAK flag set, but:
 
 [doc] -EINTR is returned if rt_task_unblock() has been called for the
 waiting task (1) before the mutex has become available (2).
 
 (1) it's true, task2 was still waiting at that time;
 (2) it's wrong, task2 was already the owner.
 
So why not just clear XNBREAK and let task2 continue, since it already
owns the mutex (as shown above)?

Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    //rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be in releasing mutex
ownership on task termination. Well, this needs further examination.

Looks like the issue is limited to cleanup problems and does not spread
to the other skins as widely as I thought. RTDM is not affected, as it
does not know EINTR for rtdm_mutex_lock. The POSIX skin loops on
interruption and should recover from this.
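
For reference, that recovery amounts to something like the fragment
below - a rough sketch only, with illustrative names, not the actual
POSIX skin code:

    /* Rough sketch of a retry-on-break lock path; the helper and field
       names are illustrative, not the real pse51 implementation. */
    for (;;) {
        if (try_acquire_fastpath(mutex, cur))   /* hypothetical helper */
            return 0;

        xnsynch_sleep_on(&mutex->synchbase, XN_INFINITE);

        if (xnthread_test_flags(&cur->threadbase, XNRMID))
            return -EIDRM;  /* mutex deleted while pending */

        /* XNBREAK: the wait was merely interrupted - loop and try
           again, so the caller never sees -EINTR. */
    }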

Besides this, we may then want to consider whether introducing pending
ownership of synch objects is worthwhile to improve the efficiency of
PIP users. Not critical, but if it comes at a reasonable price... I
will try to draft something.

Jan





[Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Jan Kiszka
Hi,

running the attached test case for the native skin, you will get an ugly
lock-up on probably all Xenomai versions. Granted, this code is a bit
synthetic. I originally thought I could trigger the bug also via
timeouts when waiting on mutexes, but this scenario is safe (the timeout
is cleared before being able to cause harm).

What we see here is that task1 forwards the ownership of the mutex to
task2 on its first unlock invocation. Then we interrupt task2, making it
drop its wish to acquire the lock - but it already has it! Now weird
things happen on cleanup of task2 (likely also when trying to
re-acquire the lock via task1 beforehand).

The attached fix solves at least the crash but still gives an
unsatisfying result:

: fn  -158+  __rt_mutex_lock (hisyscall_event)
: fn  -156+  __copy_from_user_ll (__rt_mutex_lock)
: fn  -154+  xnregistry_fetch (__rt_mutex_lock)
:|fn  -151+  __ipipe_restore_pipeline_head (xnregistry_fetch)
: fn  -148+  rt_mutex_lock (__rt_mutex_lock)
:|fn  -144+  xnsynch_sleep_on (rt_mutex_lock)
:|fn  -134+  xnpod_resume_thread (xnsynch_sleep_on)
:|fn  -130+  xnpod_suspend_thread (xnsynch_sleep_on)
:|fn  -125+  xnpod_schedule (xnpod_suspend_thread)
:|fn  -116!  __switch_to (xnpod_schedule)
:|fn  -103+  rt_mutex_unlock (rt_mutex_lock)
:|fn  -100+  xnsynch_wakeup_one_sleeper (rt_mutex_unlock)
:|fn   -98+  xnpod_resume_thread (xnsynch_wakeup_one_sleeper)
:|fn   -95+  xnsynch_clear_boost (xnsynch_wakeup_one_sleeper)
:|fn   -89+  xnpod_schedule (rt_mutex_unlock)
:|fn   -85+  __switch_to (xnpod_schedule)
:|fn   -79!  __ipipe_restore_pipeline_head (rt_mutex_lock)

This means that task2 needs to be woken up just to let task1 re-acquire
the mutex. It would be more efficient for task1 to steal the granted
lock back (that's what the preempt-rt people do in their rtmutex code -
and this is where I stumbled over our issue).
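
Conceptually, such a stealing path on the lock side would look
something like this (pure sketch, hypothetical helper names, not
nucleus code):

    /* Sketch only: if the designated owner has been granted the lock
       but has not resumed yet, a higher priority claimant takes it back
       instead of sleeping. All helper names below are made up. */
    if (synch->owner != NULL &&
        owner_not_resumed_yet(synch->owner) &&       /* wakee still pending */
        claimant_prio(cur) > claimant_prio(synch->owner)) {
        requeue_as_sleeper(synch, synch->owner);     /* push the wakee back */
        synch->owner = cur;                          /* steal the grant */
    } else
        xnsynch_sleep_on(synch, timeout);            /* block as usual */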

I haven't tried to construct test cases for other skins yet, but in
theory at least POSIX and RTDM should suffer from the same issue. This
raises the question of whether there might be some generic solution at
the nucleus level, one that also improves the re-acquire path. But I
have nothing at hand so far.

Jan
Index: ksrc/skins/native/mutex.c
===================================================================
--- ksrc/skins/native/mutex.c	(Revision 1058)
+++ ksrc/skins/native/mutex.c	(Arbeitskopie)
@@ -375,12 +375,17 @@ int rt_mutex_lock(RT_MUTEX *mutex, RTIME
 
     xnsynch_sleep_on(&mutex->synch_base, timeout);
 
-    if (xnthread_test_flags(&task->thread_base, XNRMID))
-        err = -EIDRM;       /* Mutex deleted while pending. */
-    else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
-        err = -ETIMEDOUT;   /* Timeout. */
-    else if (xnthread_test_flags(&task->thread_base, XNBREAK))
-        err = -EINTR;       /* Unblocked. */
+    if (xnthread_test_flags(&task->thread_base, XNRMID | XNTIMEO | XNBREAK)) {
+        if (xnthread_test_flags(&task->thread_base, XNRMID))
+            err = -EIDRM;       /* Mutex deleted while pending. */
+        else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
+            err = -ETIMEDOUT;   /* Timeout. */
+        else
+            err = -EINTR;       /* Unblocked. */
+
+        if (mutex->owner == task)
+            rt_mutex_unlock(mutex);
+    }
 
   unlock_and_exit:
 
#include <stdio.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/timer.h>
#include <rtdm/rtbenchmark.h>

RT_TASK task1, task2;
RT_MUTEX mtx;
int fd;

void task1_fnc(void *arg)
{
    printf("started task1\n");
    rt_mutex_lock(&mtx, 0);
    rt_task_sleep(10LL);
    rt_mutex_unlock(&mtx);

    rt_task_unblock(&task2);

    rt_mutex_lock(&mtx, 0);
    rt_dev_ioctl(fd, RTBNCH_RTIOC_REFREEZE_TRACE, 0);
    rt_dev_close(fd);
    rt_mutex_unlock(&mtx);
    printf("done task1\n");
}

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

int main()
{
    mlockall(MCL_CURRENT | MCL_FUTURE);

    fd = rt_dev_open("rtbenchmark0", 0);
    rt_mutex_create(&mtx, NULL);

    rt_task_spawn(&task1, "task1", 0, 20, T_JOINABLE, task1_fnc, 0);
    rt_task_spawn(&task2, "task2", 0, 10, T_JOINABLE, task2_fnc, 0);

    rt_task_join(&task1);
    rt_task_join(&task2);

    rt_mutex_delete(&mtx);

    return 0;
}




Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Philippe Gerum

Jan Kiszka wrote:

Dmitry Adamushko wrote:


Hi Jan,



running the attached test case for the native skin, you will get an ugly
lock-up on probably all Xenomai versions. Granted, this code is a bit
synthetic. I originally thought I could trigger the bug also via
timeouts when waiting on mutexes, but this scenario is safe (the timeout
is cleared before being able to cause harm).



just in order to educate myself, as I probably got something wrong at first glance :)

if we take this one:

--- mutex.c	2006-02-27 15:34:58.0 +0100
+++ mutex-NEW.c	2006-05-10 11:55:25.0 +0200
@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
         err = -EIDRM;       /* Mutex deleted while pending. */
     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
         err = -ETIMEDOUT;   /* Timeout. */
-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) && mutex->owner != task)
         err = -EINTR;       /* Unblocked. */
 
   unlock_and_exit:

As I understand task2 has a lower prio and that's why

[task1] rt_mutex_unlock()
[task1] rt_task_unblock(&task2)

are called in a row.

ok, task2 wakes up in rt_mutex_unlock() (while task1 is blocked on
rt_mutex_lock()) and finds the XNBREAK flag set, but:

[doc] -EINTR is returned if rt_task_unblock() has been called for the
waiting task (1) before the mutex has become available (2).

(1) it's true, task2 was still waiting at that time;
(2) it's wrong, task2 was already the owner.

So why not just clear XNBREAK and let task2 continue, since it already
owns the mutex (as shown above)?



Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    //rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be in releasing mutex
ownership on task termination. Well, this needs further examination.

Looks like the issue is limited to cleanup problems and does not spread
to the other skins as widely as I thought. RTDM is not affected, as it
does not know EINTR for rtdm_mutex_lock. The POSIX skin loops on
interruption and should recover from this.

Besides this, we may then want to consider whether introducing pending
ownership of synch objects is worthwhile to improve the efficiency of
PIP users. Not critical, but if it comes at a reasonable price... I
will try to draft something.



I've planned to work with the simulator ASAP to implement the stealing 
of ownership at the nucleus level, so that this kind of issue will 
become history.


--

Philippe.



Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Philippe Gerum

Jan Kiszka wrote:

Dmitry Adamushko wrote:

Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    //rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be in releasing mutex
ownership on task termination. Well, this needs further examination.



The native skin does not implement robust mutexes, indeed.


Looks like the issue is limited to cleanup problems and does not spread
to the other skins as widely as I thought. RTDM is not affected, as it
does not know EINTR for rtdm_mutex_lock. The POSIX skin loops on
interruption and should recover from this.



I can confirm with the help of the simulator that there's no issue 
regarding the way the ownership is transferred back and forth between 
both tasks; this works. However, I agree that this should not preclude 
us from improving the priority inversion guard by allowing the lock to 
be stolen if the new owner has not resumed execution yet after being 
granted the mutex ownership.



Besides this, we may then want to consider whether introducing pending
ownership of synch objects is worthwhile to improve the efficiency of
PIP users. Not critical, but if it comes at a reasonable price... I
will try to draft something.

Jan








--

Philippe.



Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Jan Kiszka
Philippe Gerum wrote:
 Jan Kiszka wrote:
 Dmitry Adamushko wrote:

 Indeed, this solves the issue more gracefully.

 Looking at this again from a different perspective and running the test
 case with your patch in a slightly different way, I think I
 misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    //rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be in releasing mutex
ownership on task termination. Well, this needs further examination.

 
The native skin does not implement robust mutexes, indeed.

Yeah, lunch opened my eyes: the skin data structure (RT_MUTEX) is not
updated appropriately on task cleanup. What about some callback hook in
xnsynch_t to invoke a per-skin cleanup handler when running
xnsynch_release_all_ownerships?
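
Roughly what I have in mind (just a sketch; the field and handler names
are made up):

    /* Sketch of the proposed per-skin cleanup hook; names are made up. */
    struct xnsynch {
        /* ... existing fields ... */
        void (*cleanup)(struct xnsynch *synch); /* set by the owning skin, may be NULL */
    };

    /* In xnsynch_release_all_ownerships(), for each object still owned
       by the exiting thread: */
    if (synch->cleanup)
        synch->cleanup(synch);  /* let the skin fix up RT_MUTEX & co. */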

Jan





Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Philippe Gerum

Jan Kiszka wrote:

Dmitry Adamushko wrote:


Hi Jan,



running the attached test case for the native skin, you will get an ugly
lock-up on probably all Xenomai versions. Granted, this code is a bit
synthetic. I originally thought I could trigger the bug also via
timeouts when waiting on mutexes, but this scenario is safe (the timeout
is cleared before being able to cause harm).



just in order to educate myself, as I probably got something wrong at first glance :)

if we take this one:

--- mutex.c	2006-02-27 15:34:58.0 +0100
+++ mutex-NEW.c	2006-05-10 11:55:25.0 +0200
@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
         err = -EIDRM;       /* Mutex deleted while pending. */
     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
         err = -ETIMEDOUT;   /* Timeout. */
-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) && mutex->owner != task)
         err = -EINTR;       /* Unblocked. */
 
   unlock_and_exit:

As I understand task2 has a lower prio and that's why

[task1] rt_mutex_unlock()
[task1] rt_task_unblock(&task2)

are called in a row.

ok, task2 wakes up in rt_mutex_unlock() (while task1 is blocked on
rt_mutex_lock()) and finds the XNBREAK flag set, but:

[doc] -EINTR is returned if rt_task_unblock() has been called for the
waiting task (1) before the mutex has become available (2).

(1) it's true, task2 was still waiting at that time;
(2) it's wrong, task2 was already the owner.

So why not just clear XNBREAK and let task2 continue, since it already
owns the mutex (as shown above)?



Indeed, this solves the issue more gracefully.



The real issue with the XNBREAK bit is that xnpod_unblock_thread() 
should only raise it for threads which it actually awakens, and not 
for those which have already been resumed, e.g. by a call to 
xnsynch_wakeup_one_sleeper().


This explains why task2 gets -EINTR from rt_mutex_lock() although the 
syscall has actually succeeded, so the next rt_mutex_unlock() was ok; 
still, the downstream code might clearly be confused by such behaviour. 
Task2 should get a success code instead, since it has been resumed by 
xnsynch_wakeup_one_sleeper() _before_ xnpod_unblock_thread() has been 
called against it. Fixed in the repository.
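
In other words, the intended semantics are roughly the following - a
sketch of the idea only, reusing the flag helpers named earlier in this
thread, not the committed change:

    /* Sketched xnpod_unblock_thread() logic: only flag a break for a
       thread which is actually woken up here. */
    if (xnthread_test_flags(thread, XNDELAY | XNPEND)) {
        xnthread_set_flags(thread, XNBREAK);
        xnpod_resume_thread(thread, XNDELAY | XNPEND);
    }
    /* else: the thread has already been resumed, e.g. by
       xnsynch_wakeup_one_sleeper(), so its wait has in fact succeeded
       and XNBREAK must not be raised. */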


--

Philippe.



Re: [Xenomai-core] [bug] zombie mutex owners

2006-05-10 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Dmitry Adamushko wrote:

Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }
    //rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be in releasing mutex
ownership on task termination. Well, this needs further examination.



The native skin does not implement robust mutexes, indeed.



Yeah, lunch opened my eyes: the skin data structure (RT_MUTEX) is not
updated appropriately on task cleanup. What about some callback hook in
xnsynch_t to invoke a per-skin cleanup handler when running
xnsynch_release_all_ownerships?



Yep, we need this. Added by commit #1067 (i.e. 
xnsynch_register_cleanup()) and also backported to the maintenance 
branch, so that new code may happily rely on it without too much 
portability burden.
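
For a skin, the usage would then be along these lines (sketch only; I'm
assuming the handler simply receives the synch object, which may differ
from the committed interface):

    /* Sketch of a skin-side cleanup handler; the exact signature of
       xnsynch_register_cleanup() is assumed here, not taken from the
       commit. */
    static void __mutex_cleanup(xnsynch_t *synch)
    {
        RT_MUTEX *mutex = container_of(synch, RT_MUTEX, synch_base);

        /* Drop the skin-level bookkeeping of the vanished owner. */
        mutex->owner = NULL;
        mutex->lockcnt = 0;
    }

    /* Registered once, e.g. from rt_mutex_create(): */
    xnsynch_register_cleanup(&mutex->synch_base, __mutex_cleanup);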


--

Philippe.
