Re: [Xenomai-help] Issue with Auto relax and nested mutexes

Makarand Pradhan Mon, 23 Jan 2012 07:01:57 -0800

Hi Philippe,

Just wanted to send a gentle reminder regarding the issue related to task priority and mutexes.


The scenario under discussion was:

Start task prio 0
<Task is running>
Set priority 85
<Do something>
Set priority 0
    get_mutex <-- rescnt not incremented as XNOTHER is not set.
<Do something> <-- XNOTHER gets set somewhere
    Release mutex <-- Sends a SIGDEBUG/SIGXCPU
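
For illustration, the sequence above can be replayed in a toy model. This
is only a sketch of the accounting as described in this thread, not the
Xenomai implementation: the struct, the function names, and the rule that
a release with XNOTHER set and a zero rescnt raises the signal are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names mirror the thread, not the Xenomai sources. */
struct toy_thread {
    bool xnother;   /* models the XNOTHER state bit (non-rt thread) */
    int  rescnt;    /* models the count of resources held */
    bool sigdebug;  /* set when the model would raise SIGDEBUG/SIGXCPU */
};

/* Acquire: the resource is counted only if XNOTHER is set. */
static void toy_acquire(struct toy_thread *t)
{
    if (t->xnother)
        t->rescnt++;
}

/* Release: with XNOTHER set and nothing counted, flag the error (assumed rule). */
static void toy_release(struct toy_thread *t)
{
    if (!t->xnother)
        return;
    if (t->rescnt == 0)
        t->sigdebug = true;
    else
        t->rescnt--;
}

/* Replays the reported sequence; returns true if the signal would fire. */
static bool toy_replay_report(void)
{
    struct toy_thread t = { .xnother = false, .rescnt = 0, .sigdebug = false };

    toy_acquire(&t);      /* XNOTHER clear: rescnt stays 0 */
    t.xnother = true;     /* "XNOTHER gets set somewhere" */
    toy_release(&t);      /* XNOTHER set, but nothing was counted */
    assert(t.rescnt == 0);
    return t.sigdebug;
}
```

In this model a balanced acquire/release pair with XNOTHER set throughout
leaves rescnt at 0 and never flags the error; only the mismatch between
the two calls does.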

Would highly appreciate your opinion. Do you feel that there is a genuine issue here?


Rgds,
Mak.

On 19/01/12 11:39 AM, Makarand Pradhan wrote:

The attached c file should get you a SIGXCPU/SIGDEBUG revealing the bug
that I am talking about.

Rgds,
Mak.



On 19/01/12 11:22 AM, Makarand Pradhan wrote:

Hi,

The scenario is:

Start task prio 0
<Task is running>
Set priority 85
<Do something>
Set priority 0
get_mutex <-- rescnt not incremented as XNOTHER is not set.
<Do something> <-- XNOTHER gets set somewhere
Release mutex <-- Sends a SIGDEBUG/SIGXCPU

Am trying to write a simple app that will reveal the issue. Will send it
out shortly.

Rgds,
Mak

On 19/01/12 10:49 AM, Philippe Gerum wrote:

On 01/19/2012 04:22 PM, Makarand Pradhan wrote:

Hi Philippe,

I think I may have not communicated the scenario properly. I am not
trying to control the priorities from user space during resource
contention. That is left to the kernel. Let me try again.

At some point, my application which was relaxed has to run with a real
time priority. That's when I invoke rt_task_set_priority to change the
base priority. After the critical section is past, the thread has to
relax again where the priority is set to 0 again.

The rt_task_set_priority API allows me to change the task priority on
the fly, so I think that the operation is supported and legal. Please
feel free to correct me if that is not true.

What is not supported is:

get_mutex
set_priority(current)
release_mutex

This can't be a valid operation, because of the reason I mentioned
earlier. So, regardless of the reason why you call
rt_task_set_priority(), you may not call it while holding a mutex.

What happens afterward, e.g. not getting the auto-relax feature back is
irrelevant in this case.

So, if your app does:

set_priority(current, > 0)
get_mutex
release_mutex
set_priority(current, 0)

then fine, and not getting back the auto-relax after switching to
priority 0 would indeed reveal a kernel issue. But the former scenario
is wrong.
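
One way an application could enforce the ordering rule above is to track
how many mutexes the calling thread holds and refuse priority changes
while that count is non-zero. The sketch below is hypothetical
bookkeeping only: prio_guard and its helpers are invented for the
example and are not part of any Xenomai API. A real program would call
the actual mutex and priority services next to these helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread bookkeeping, not part of any Xenomai API. */
struct prio_guard {
    int held;   /* number of mutexes currently held by this thread */
    int prio;   /* last priority accepted through the guard */
};

/* Record that a mutex was taken. */
static void guard_lock(struct prio_guard *g)
{
    g->held++;
}

/* Record that a mutex was released. */
static void guard_unlock(struct prio_guard *g)
{
    assert(g->held > 0);
    g->held--;
}

/* Returns false, leaving the priority untouched, if any mutex is held. */
static bool guard_set_priority(struct prio_guard *g, int prio)
{
    if (g->held > 0)
        return false;
    g->prio = prio;
    return true;
}
```

With this guard, the second sequence above (set priority, lock, unlock,
set priority) goes through, while the first one (lock, set priority,
unlock) is rejected at the set_priority step.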

This change of priorities does introduce the race condition that was
encountered, which can be handled properly in the kernel using either of
the two approaches that were mentioned.

Your comments are highly valued and I look forward to your opinions.

Rgds,
Mak.

On 19/01/12 06:25 AM, Philippe Gerum wrote:

On 01/18/2012 11:41 PM, Makarand Pradhan wrote:

Hi,

Another problem was encountered with rescnt related to nested mutexes.

This time the rescnt is not incrementing because the XNOTHER bit is not
set, causing a SIGDEBUG or SIGXCPU to be delivered to the thread causing
my application to crash.

The scenario is as follows:

1. Thread started with priority 0. (Relaxed)
2. This thread uses mutexes which causes Priority Inversions.
3. At some point, a rt_task_set_priority is done to change the priority.
(RT 85).
4. Some time later the priority is set back to 0.

If I understand it properly, your runtime scenario is badly broken I'm
afraid. By contrast to priority ceiling, priority inheritance is about
leaving the responsibility to the _kernel_ to pick the best dynamic
priority for your thread to solve a priority inversion.

Therefore, by changing your dynamic priority while holding a mutex, your
application is preventing the kernel from doing the job you previously
assigned to it. Worse, you could be causing unexpected latencies to
other threads your application has no clue about, or just can't tell
whether they compete with your thread for accessing the resource at that
specific time.

After all, it is your application that defined the contended mutex,
and as such the fact that priority inheritance might be involved at some
point. If you don't trust the kernel and want to deal with priorities
manually during resource contention, then maybe you should use a
different mutual exclusion mechanism not implementing priority
inheritance, e.g. a plain binary semaphore.
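
As a rough illustration of what "leaving the responsibility to the
kernel" means, priority inheritance can be modelled as recomputing the
mutex owner's current priority from its base priority and the priorities
of its waiters. This is a generic sketch of the technique, not Xenomai
code. The names pi_task and pi_update_owner are invented, and reading
the test program's "bP"/"cp" fields as base and current priority is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: bprio/cprio presumably match "bP"/"cp" in the test output. */
struct pi_task {
    int bprio;  /* base priority chosen by the application */
    int cprio;  /* current priority after any inheritance boost */
};

/* Recompute the owner's current priority from its base priority and the
 * current priorities of the tasks waiting on a mutex it holds. */
static void pi_update_owner(struct pi_task *owner,
                            const struct pi_task *waiters, size_t nwaiters)
{
    assert(owner != NULL);
    int prio = owner->bprio;

    for (size_t i = 0; i < nwaiters; i++)
        if (waiters[i].cprio > prio)
            prio = waiters[i].cprio;
    owner->cprio = prio;
}
```

If the application rewrites the priority itself while the owner is
boosted, the value the kernel computed this way no longer reflects the
waiters, which is the hazard described above.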

The problem again revolves around setting XNOTHER. In the problem
scenario, the XNOTHER bit is not set in xnsynch_acquire. Hence the
rescnt is not incremented.

The reason for that is, while doing a rt_task_set_priority,
__xnsched_rt_setparam is invoked before the thread is reniced.

To resolve this issue, I had to set the XNOTHER bit in
__xnpod_set_thread_schedparam after the thread was reniced or in
rt_task_set_priority. Both the code changes are given below:


rt_task_set_priority(....

+	if (0 == prio)
+	{
+		xnthread_set_state(&task->thread_base, XNOTHER);
+	}


xnpod_set_thread_schedparam(...

#ifdef CONFIG_XENO_OPT_PERVASIVE
	if (propagate) {
		if (xnthread_test_state(thread, XNRELAX))
			xnshadow_renice(thread);
		else if (xnthread_test_state(thread, XNSHADOW))
			xnthread_set_info(thread, XNPRIOSET);
	}

+	if (xnthread_test_state(thread, XNSHADOW)) {
+		// if (thread->bprio || !xnthread_test_state(thread, XNBOOST))
+		if (thread->bprio)
+			xnthread_clear_state(thread, XNOTHER);
+		else
+			xnthread_set_state(thread, XNOTHER);
+	}


Setting XNOTHER in rt_task_set_priority does not look appropriate. I
believe the right place is in the xnpod_set_thread_schedparam.
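
The effect of the proposed recheck can be sketched with a toy model. It
rests on two assumptions: that the kernel under test carries the
condition from the sched-rt.h patch quoted further down this thread, and
that the base priority is updated only after the scheduler hook has run,
as described above. All names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: names follow the thread, not the Xenomai sources. */
struct toy_thread {
    int  bprio;    /* base priority */
    int  cprio;    /* current priority */
    bool boosted;  /* models XNBOOST */
    bool xnother;  /* models XNOTHER */
};

/* Models the scheduler hook with the condition from the quoted patch. */
static void toy_setparam(struct toy_thread *t, int prio)
{
    t->cprio = prio;
    t->xnother = !(t->bprio || !t->boosted);
}

/* Models a priority change; recheck=true adds the proposed re-evaluation. */
static void toy_set_priority(struct toy_thread *t, int prio, bool recheck)
{
    assert(prio >= 0);
    toy_setparam(t, prio);  /* runs before bprio is updated (assumed ordering) */
    t->bprio = prio;
    if (recheck)
        t->xnother = (t->bprio == 0);  /* keyed on the new base priority */
}
```

In this model a thread that goes 0 -> 85 -> 0 without the recheck ends
up with XNOTHER clear, which matches the reported symptom. With the
recheck it ends up with XNOTHER set.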

Would highly appreciate your views.

Rgds,
Mak


On 10/01/12 02:10 PM, Makarand Pradhan wrote:

The patch does work. Thanks.

Will it be available in the next release of xenomai?

Rgds,
Mak

root@ruggedcom:~# ./relax 0 1
Spawning: tasks
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Grabbing mux in HP
Mux held by Task2
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
^C
root@ruggedcom:~#


On 10/01/12 01:39 PM, Makarand Pradhan wrote:

Hi Philippe,

A bit surprised to see a change in sched-rt.h. I had another problem
earlier where the XNOTHER was not getting set after a priority change. I
had to look at the code that you have modified, although I had
temporarily worked around it by setting the XNOTHER in
rt_task_set_priority. I think this would fix that problem as well.

Will test the patch and get back with the results.

Thanks and Rgds,
Mak.

On 10/01/12 01:08 PM, Philippe Gerum wrote:

On 01/10/2012 04:51 PM, Makarand Pradhan wrote:

Based on my testing, it is noted that the rescnt is not released when
task1 gets a priority boost and starts running with priority 1. That's
when the rescnt is not decremented.

It would imply that we may be checking the current priority while
testing if we want to invoke rt_mutex_release in kernel. Will try to
check it out.

Does this help in your case?

diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
index cc1cefa..6ac8fd7 100644
--- a/include/nucleus/sched-rt.h
+++ b/include/nucleus/sched-rt.h
@@ -87,7 +87,7 @@ static inline void __xnsched_rt_setparam(struct xnthread *thread,
 {
 	thread->cprio = p->rt.prio;
 	if (xnthread_test_state(thread, XNSHADOW)) {
-		if (thread->cprio)
+		if (thread->bprio || !xnthread_test_state(thread, XNBOOST))
 			xnthread_clear_state(thread, XNOTHER);
 		else
 			xnthread_set_state(thread, XNOTHER);
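
To see what the changed condition does for the reported case (task1 with
base priority 0, boosted to priority 1), the two tests can be evaluated
side by side. This is only an evaluation of the expressions in the diff
under the assumption that the hook runs while the boost is in effect.
The function names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Value XNOTHER ends up with under the original test on cprio. */
static bool xnother_before_patch(int cprio)
{
    assert(cprio >= 0);
    return cprio == 0;
}

/* Value XNOTHER ends up with under the patched test on bprio and XNBOOST. */
static bool xnother_after_patch(int bprio, bool boosted)
{
    assert(bprio >= 0);
    return !(bprio || !boosted);
}
```

For a boosted thread with bprio 0 and cprio 1, the original test yields
false (XNOTHER cleared, so the release is no longer counted), while the
patched test yields true.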

Rgds,
Mak.

On 10/01/12 10:42 AM, Philippe Gerum wrote:

On 01/10/2012 04:40 PM, Philippe Gerum wrote:

On 01/10/2012 04:40 PM, Makarand Pradhan wrote:

Another point:

"These are fast mutexes, the thread does not have to jump to kernel
space unless the released mutex was actually contended."

When the first task is started with prio 0, I always see that
rt_mutex_release is invoked in the kernel, even when there is no
contention.

I should have added: "unless there is no contention ... or the caller is
a non-rt thread". This is because we have to jump to kernel space to
track rescnt.

Ok, next try: "unless the mutex was contended ... or the caller is
a non-rt thread".
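
The rule just stated can be sketched as a generic fast-path release
built on a compare-and-swap. This is not the Xenomai implementation: the
lock layout, the TOY_CLAIMED bit, and all names are assumptions made for
the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLAIMED 0x80000000u  /* hypothetical "waiters present" bit */

/* Lock word holds the owner's handle; 0 means unlocked. */
struct toy_fastlock {
    _Atomic uint32_t owner;
};

/* Returns true if the release completed in user space; false means the
 * caller would have to enter the kernel to finish the release. */
static bool toy_fast_release(struct toy_fastlock *l, uint32_t self, bool nonrt)
{
    assert(self != 0 && !(self & TOY_CLAIMED));
    if (nonrt)
        return false;  /* kernel entry needed so rescnt can be tracked */

    uint32_t expected = self;
    /* Succeeds only if we own the lock and the claimed bit is clear. */
    return atomic_compare_exchange_strong(&l->owner, &expected, 0u);
}
```

An uncontended release by a real-time caller stays in user space; a
contended lock or a non-rt caller falls through to the slow path.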

I have an instrumented kernel. The kernel trace is given below. In this
trace only task1 is running at prio 0. It should be easy to follow:

Jan 10 10:36:59 ruggedcom kernel: lo: rescnt: 0, switched: 0
Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 0, switched: 0
Jan 10 10:36:59 ruggedcom kernel: lo: rescnt: 1, switched: 1
Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 2, switched: 0
Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 3, switched: 0
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 3, switched: 0
Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:01 ruggedcom kernel: RML
Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 2, switched: 0
Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:01 ruggedcom kernel: RML
Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 1, switched: 0
Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:01 ruggedcom kernel: RML
Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 0, switched: 0
Jan 10 10:37:01 ruggedcom kernel: lo: rescnt: 1, switched: 1
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 2, switched: 0
Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 3, switched: 0
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 3, switched: 0
Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:03 ruggedcom kernel: RML
Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 2, switched: 0
Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:03 ruggedcom kernel: RML
Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 1, switched: 0
Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
Jan 10 10:37:03 ruggedcom kernel: RML
Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 0, switched: 0
Jan 10 10:37:03 ruggedcom kernel: lo: rescnt: 1, switched: 1
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 2, switched: 0
Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 3, switched: 0
Jan 10 10:37:04 ruggedcom kernel: hi: rescnt: 3, switched: 0


root@ruggedcom:~# ./a.out 0 1
Spawning: tasks
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
^C


Rgds,
Mak.



On 10/01/12 10:26 AM, Makarand Pradhan wrote:

Hi Philippe,

You are right. Task 1 needs to be started with prio 0. I start seeing
the problem after task2 grabs the mutexes and releases them. The first
task never jumps back to secondary. Here is my output. The mode never
goes back to 0 after "Grabbing mux in HP" and the rescnt stays stuck at
1 in the kernel.

root@ruggedcom:~# ./relax 0 1
Spawning: tasks
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Grabbing mux in HP
Mux held by Task2
Release complete
bP: 0, cp: 0, mode: 1
Acquire complete
Release complete
bP: 0, cp: 0, mode: 1
Acquire complete

Rgds,
Mak.


On 10/01/12 10:11 AM, Philippe Gerum wrote:

On 01/09/2012 09:50 PM, Makarand Pradhan wrote:

Hi,

I am running kernel 3.0.0, xenomai: 2.6, powerpc 8360.

I am noticing an issue while using the auto relax feature related to
mutexes. I am using nested mutexes. The code is attached to this email.

The problem is that I am not relaxing after a RT thread grabs and
releases a mutex. On further investigation, it was noted that the rescnt
is not going down to 0.

From your code, task1 would auto-relax only if started with priority 0,
which is what I get here:

-bash-3.2# ./relax 0 1
Spawning: tasks
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
bP: 0, cp: 0, mode: 0
Acquire complete
Release complete
...

Conversely, I get the right behavior if setting a non-zero priority to
task1:

-bash-3.2# ./relax 1 0
Spawning: tasks
bP: 1, cp: 1, mode: 1
Acquire complete
Release complete
bP: 1, cp: 1, mode: 1
Acquire complete
Release complete
bP: 1, cp: 1, mode: 1
Acquire complete
...

In any case, the priority of task2 should have no impact on the result.

I'm running current 2.6 HEAD commit (168da46de), kernel 3.1.5/powerpc32
(52xx), pipeline 2.13-06.

Which priority arguments are you passing to your test program?

Another observation is that I do not hit rt_mutex_release in the kernel
in the problem scenario, I believe when the thread undergoes a priority
inversion. This may be a problem as the rescnt would not get
decremented. Not sure how the mutex is releasing without hitting
rt_mutex_release, or am I missing anything?

These are fast mutexes, the thread does not have to jump to kernel space
unless the released mutex was actually contended.

If I have both the tasks running at priority 0, I stay in the secondary
domain, rt_mutex_release is invoked as expected, and the rescnt goes
down to 0 when all the mutexes are released.

Has anyone faced this problem?

I'm unsure there is any yet. Auto-relax applies to non-rt Xenomai
threads only (i.e. prio == 0).
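
As a sketch of the behavior discussed in this thread, auto-relax with
nested mutexes can be modelled as a counter that must return to zero
before a prio-0 task drops back to secondary mode. This is a toy model
under that assumption, with invented names. The "mode" mapping (0 =
relaxed, 1 = primary) follows the test output quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: field names follow the thread, not the Xenomai sources. */
struct toy_task {
    int  prio;     /* 0 means a non-rt thread, eligible for auto-relax */
    int  rescnt;   /* mutexes currently held (nesting depth) */
    bool primary;  /* true = primary mode ("mode: 1"), false = relaxed */
};

/* Taking a mutex switches the task to primary mode and counts the resource. */
static void toy_lock(struct toy_task *t)
{
    t->primary = true;
    if (t->prio == 0)
        t->rescnt++;
}

/* Dropping the last counted mutex lets a prio-0 task relax again. */
static void toy_unlock(struct toy_task *t)
{
    if (t->prio != 0)
        return;
    assert(t->rescnt > 0);
    if (--t->rescnt == 0)
        t->primary = false;
}
```

Three nested locks followed by three unlocks bring rescnt back to 0 and
the task back to relaxed mode. A count stuck at 1 would keep the task in
primary mode.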

Rgds,
Makarand







_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help

--
___________________________________________________________________________


NOTICE OF CONFIDENTIALITY:
This e-mail and any attachments may contain confidential and
privileged information. If you are
not the intended recipient, please notify the sender immediately by
return e-mail and delete this
e-mail and any copies. Any dissemination or use of this information
by a person other than the
intended recipient is unauthorized and may be illegal.
_____________________________________________________________________





--
Philippe.
