Hi,
On Wed, 2010-04-14 at 20:19 -0400, Andreas Glatz wrote:
> Hi,
>
> On Tue, Apr 06, 2010 at 02:45:20AM -0400, Jan Kiszka wrote:
> > Andreas Glatz wrote:
> > >>> Actually that is what I thought in the first place, however Jan's
> > >>> comment "That's not true, Xenomai threads can run in non-RT scheduling
> > >>> classes as well. They may just gain RT priority while holding some
> > >>> lock that is requested by a RT thread as well." made me think I was
> > >>> wrong...
> > >>>
> > >>> So we would really need a SCHED_IDLE for Xenomai then to solve this
> > >>> problem?
> > >> I don't think so. But we do need to solve the issue that a non-RT thread
> > >> stays too long in primary mode and is thus scheduled by Xenomai with the
> > >> wrong priority with respect to other Linux tasks at its level.
> > >>
> > >> For the time being, you can work around this by issuing a Linux syscall
> > >> before entering long processing loops - unless your task already does
> > >> this anyway, e.g. to perform some Linux I/O.
> > >>
> > >
> > > I think that's neat. Currently the statistics task takes a mutex and
> > > waits on a message queue for messages. That's the only time it should
> > > potentially run in primary mode. After it releases the mutex it should
> > > continue running with a policy similar to SCHED_IDLE to give other tasks
> > > a chance to run. I see how switching back to secondary mode could be
> > > achieved by issuing a Linux syscall. Is there another way which doesn't
> > > involve changing the source code of our application? (The proper way?)
> >
> > The proper way would be not having to change the application code.
> > But this workaround (Linux syscall or *_set_mode()) is required until we
> > improve the nucleus.
>
> I generated a patch against 2.4.10.1 to get this behaviour (see further
> down). Instead of having to review the application code and insert a
> Linux syscall or *_set_mode() call, I just call
> rt_task_set_mode(0, T_IDLE, NULL) at the beginning of the body of the
> task which should mostly run in secondary mode under SCHED_IDLE (see
> example further down). The task marked with T_IDLE will switch to
> primary mode at every Xenomai skin call and immediately switch back to
> secondary mode once the Xenomai skin call is done.
>
> We identified just one case where this task has to stay in primary
> mode: between rt_mutex_acquire() and rt_mutex_release(), since it may
> be priority-boosted to avoid a priority inversion while it holds the
> mutex. If the task stayed in secondary mode during that time, it would
> either potentially delay the execution of a high-priority task or kill
> the system.
>
> The patch seems to work for us. Our statistics task, which blocked the
> system for a long time (and made the UI running under Linux
> unresponsive), is now running with T_IDLE. If Linux is heavily loaded,
> the statistics will get out of sync but the UI will still be
> responsive.
>
The logic of this patch looks OK for the native skin, given that 2.4.x
does not provide a centralized implementation for dealing with exclusive
resources (unlike 2.5.x with xnsynch_acquire/release) and always emits a
syscall to manage those resources.

That said, you could spare the T_IDLE tag by assuming that any non-RT
shadow thread has to switch back to secondary mode after a syscall,
unless its owned resource count is non-zero. This is where we are
heading in 2.5.x, since the preferred mode of operation for such a
thread has to be fundamentally "relaxed" (otherwise, one would have
created an RT thread, right?).

I'm also unsure you should force SCHED_IDLE instead of picking
SCHED_OTHER for a more general approach to this issue. You can't assume
that userland wants to be reniced that way, at least not from the
nucleus. But I guess this fits your problem.

To sum up: since we can't really provide a true SCHED_IDLE policy on
Linux (i.e. not a nice-level hack), and implementing a sched class in
Xenomai with a lower priority than the existing xnsched_class_idle (in
2.5.x) is not feasible (we could not run any userland task in it
anyway), we'd better stick with SCHED_OTHER.

> One thing I've noticed though, and this is not related to the patch (I
> verified it on a vanilla Xenomai system): Consider the example I
> included. It prints the average cycle time and the cycle time variance
> of the high priority task ("T2"). I noticed a big difference in the
> cycle time variance when switching the first task ("T1") to secondary
> mode with rt_task_set_mode() and setting the scheduler policy to either
> SCHED_FIFO, SCHED_IDLE or SCHED_NORMAL. I'm assuming someone asked this
> before and I didn't pay attention :)
> Can someone give me a short explanation or point me somewhere to get an
> explanation for this behaviour? I didn't expect such a difference in
> variance:
>
> SCHED_FIFO:
> task=2 count=1000 average=10505us variance=851(us)^2
> task=2 count=1000 average=10504us variance=176(us)^2
> task=2 count=1000 average=10505us variance=716(us)^2
> task=2 count=1000 average=10504us variance=148(us)^2
> task=2 count=1000 average=10504us variance=143(us)^2
> task=2 count=1000 average=10504us variance=141(us)^2
> task=2 count=1000 average=10504us variance=138(us)^2
>
> SCHED_NORMAL:
> task=2 count=1000 average=10501us variance=2115(us)^2
> task=2 count=1000 average=10504us variance=3121(us)^2
> task=2 count=1000 average=10500us variance=161(us)^2
> task=2 count=1000 average=10501us variance=1136(us)^2
> task=2 count=1000 average=10500us variance=194(us)^2
> task=2 count=1000 average=10501us variance=1971(us)^2
> task=2 count=1000 average=10500us variance=132(us)^2
> task=2 count=1000 average=10501us variance=1173(us)^2
>
> SCHED_IDLE:
> task=2 count=1000 average=10504us variance=3413(us)^2
> task=2 count=1000 average=10503us variance=3567(us)^2
> task=2 count=1000 average=10504us variance=3409(us)^2
> task=2 count=1000 average=10504us variance=1743(us)^2
> task=2 count=1000 average=10504us variance=2710(us)^2
> task=2 count=1000 average=10504us variance=2548(us)^2
> task=2 count=1000 average=10504us variance=2364(us)^2
> task=2 count=1000 average=10504us variance=2867(us)^2
> task=2 count=1000 average=10504us variance=2755(us)^2
>
>
> Regards, Andreas
>
> (Xenomai 2.4.10.1, Linux 2.6.32, Ipipe 2.8)
>
>
> EXAMPLE APPLICATION:
>
> #include <stdio.h>
> #include <unistd.h>
> #include <native/task.h>
> #include <native/sem.h>
> #include <native/mutex.h>
> #include <sys/mman.h>
> #include <rtdk.h>
> #include <sched.h>
> #include <linux/sched.h>
>
> typedef struct test {
>     RTIME timestamp;
>     RTIME sum;
>     RTIME sumsq;
>     int count;
> } test_t;
>
> static test_t test1 = {0, 0, 0, 0}, test2 = {0, 0, 0, 0};
> static RT_TASK task1, task2;
> static RT_MUTEX mutex1, mutex2;
>
> static void task_body( void* cookie )
> {
>     RTIME timestamp, delta;
>     test_t* ptr = (test_t*)cookie;
>     int num = ( ptr == &test1) ? 1 : 2;
>
>     /* 0x00080000 is T_IDLE (XNIDLE) as defined by the patch below */
>     if( num == 1 ) rt_task_set_mode(0, 0x00080000, NULL);
>
>     while(1)
>     {
>         rt_mutex_acquire(&mutex1, TM_INFINITE);
>
>         timestamp = __xn_rdtsc();
>         delta = (timestamp - ptr->timestamp) >> 6 /* in us */;
>         ptr->sum += delta;
>         ptr->sumsq += delta*delta;
>         ptr->timestamp = timestamp;
>         if( ++ptr->count >= 1000 ) {
>             RTIME avg, var;
>             avg = ptr->sum/ptr->count;
>             var = (ptr->sumsq -
>                    (ptr->sum*ptr->sum)/ptr->count)/(ptr->count-1);
>
>             if( num == 2 )
>                 rt_printf("task=%d count=%d average=%lluus "
>                           "variance=%llu(us)^2\n",
>                           num, ptr->count, avg, var);
>
>             ptr->sum = 0;
>             ptr->sumsq = 0;
>             ptr->count = 0;
>         }
>
>         // If commented T1 basically runs in a while(1) {}
>         // loop without any sleeps. UI should be responsive
>         // since T1 runs with SCHED_IDLE.
>         // If uncommented the sleep causes T2 to boost the
>         // priority of T1.
>         if( num == 1 ) rt_task_sleep(10000000);
>
>         // T1 automatically switches to secondary mode after
>         // this call.
>         rt_mutex_release(&mutex1);
>
>         // Give T1 time to run
>         if( num == 2 ) rt_task_sleep(10000000);
>     }
> }
>
> int main(int argc, char* argv[])
> {
>     int err;
>
>     mlockall(MCL_CURRENT|MCL_FUTURE);
>
>     rt_print_auto_init(1);
>
>     test1.timestamp = test2.timestamp = __xn_rdtsc();
>     err = rt_mutex_create(&mutex1, "M1");
>     err += rt_mutex_create(&mutex2, "M2");
>     err += rt_task_spawn(&task1, "T1", 0, 33, 0, task_body, (void*)&test1);
>     err += rt_task_spawn(&task2, "T2", 0, 66, 0, task_body, (void*)&test2);
>     if( !err )
>     {
>         pause();
>     }
>
>     return err;
> }
>
> IDLE PATCH:
>
> diff -ruN linux-2.6.32-5RR9/include/asm-generic/xenomai/syscall.h
> linux-2.6.32-5RR9-new/include/asm-generic/xenomai/syscall.h
> --- linux-2.6.32-5RR9/include/asm-generic/xenomai/syscall.h 2010-04-13
> 20:02:21.000000000 -0400
> +++ linux-2.6.32-5RR9-new/include/asm-generic/xenomai/syscall.h
> 2010-04-14 10:38:09.000000000 -0400
> @@ -89,6 +89,8 @@
> #define __xn_exec_adaptive 0x40
> /* Do not restart syscall upon signal receipt. */
> #define __xn_exec_norestart 0x80
> +/* Do not switch to secondary mode after syscall if thread has XNIDLE flag
> set (see #XNIDLE) */
> +#define __xn_exec_norelax 0x100
> /* Context-agnostic syscall. Will actually run in Xenomai domain. */
> #define __xn_exec_any 0x0
> /* Short-hand for shadow init syscall. */
> diff -ruN linux-2.6.32-5RR9/include/xenomai/native/task.h
> linux-2.6.32-5RR9-new/include/xenomai/native/task.h
> --- linux-2.6.32-5RR9/include/xenomai/native/task.h 2010-04-13
> 20:02:21.000000000 -0400
> +++ linux-2.6.32-5RR9-new/include/xenomai/native/task.h 2010-04-14
> 10:38:09.000000000 -0400
> @@ -52,6 +52,7 @@
> #define T_SHIELD XNSHIELD /**< See #XNSHIELD */
> #define T_WARNSW XNTRAPSW /**< See #XNTRAPSW */
> #define T_RPIOFF XNRPIOFF /**< See #XNRPIOFF */
> +#define T_IDLE XNIDLE /**< See #XNIDLE */
> #define T_PRIMARY 0x00000200 /* Recycle internal bits status which */
> #define T_JOINABLE 0x00000400 /* won't be passed to the nucleus. */
> /*! @} */ /* Ends doxygen-group native_task_status */
> diff -ruN linux-2.6.32-5RR9/include/xenomai/nucleus/thread.h
> linux-2.6.32-5RR9-new/include/xenomai/nucleus/thread.h
> --- linux-2.6.32-5RR9/include/xenomai/nucleus/thread.h 2010-04-13
> 20:02:21.000000000 -0400
> +++ linux-2.6.32-5RR9-new/include/xenomai/nucleus/thread.h 2010-04-14
> 10:38:09.000000000 -0400
> @@ -55,6 +55,7 @@
> #define XNSHIELD 0x00010000 /**< IRQ shield is enabled (shadow only) */
> #define XNTRAPSW 0x00020000 /**< Trap execution mode switches */
> #define XNRPIOFF 0x00040000 /**< Stop priority coupling (shadow only) */
> +#define XNIDLE 0x00080000 /**< Switches to secondary mode after syscalls
> if not holding mutexes */
>
> #define XNFPU 0x00100000 /**< Thread uses FPU */
> #define XNSHADOW 0x00200000 /**< Shadow thread */
> @@ -90,7 +91,7 @@
> }
>
> #define XNTHREAD_BLOCK_BITS
> (XNSUSP|XNPEND|XNDELAY|XNDORMANT|XNRELAX|XNHELD)
> -#define XNTHREAD_MODE_BITS
> (XNLOCK|XNRRB|XNASDI|XNSHIELD|XNTRAPSW|XNRPIOFF)
> +#define XNTHREAD_MODE_BITS
> (XNLOCK|XNRRB|XNASDI|XNSHIELD|XNTRAPSW|XNRPIOFF|XNIDLE)
>
> /* These state flags are available to the real-time interfaces */
> #define XNTHREAD_STATE_SPARE0 0x10000000
> @@ -186,6 +187,8 @@
>
> xnpqueue_t claimq; /* Owned resources claimed by others
> (PIP) */
>
> + int lockcnt; /* Mutexes which are currently locked
> by this thread */
> +
> struct xnsynch *wchan; /* Resource the thread pends on */
>
> struct xnsynch *wwake; /* Wait channel the thread was resumed from */
> diff -ruN linux-2.6.32-5RR9/kernel/xenomai/nucleus/shadow.c
> linux-2.6.32-5RR9-new/kernel/xenomai/nucleus/shadow.c
> --- linux-2.6.32-5RR9/kernel/xenomai/nucleus/shadow.c 2010-04-13
> 20:02:22.000000000 -0400
> +++ linux-2.6.32-5RR9-new/kernel/xenomai/nucleus/shadow.c 2010-04-14
> 18:04:20.000000000 -0400
> @@ -1187,7 +1187,7 @@
> void xnshadow_relax(int notify)
> {
> xnthread_t *thread = xnpod_current_thread();
> - int prio;
> + int prio, policy;
> spl_t s;
>
> XENO_BUGON(NUCLEUS, xnthread_test_state(thread, XNROOT));
> @@ -1217,9 +1217,9 @@
> xnpod_fatal("xnshadow_relax() failed for thread %s[%d]",
> thread->name, xnthread_user_pid(thread));
>
> - prio = normalize_priority(xnthread_current_priority(thread));
> - rthal_reenter_root(get_switch_lock_owner(),
> - prio ? SCHED_FIFO : SCHED_NORMAL, prio);
> + prio = xnthread_test_state(thread, XNIDLE) ? 0 :
> normalize_priority(xnthread_current_priority(thread));
> + policy = xnthread_test_state(thread, XNIDLE) ? SCHED_IDLE : (prio ?
> SCHED_FIFO : SCHED_NORMAL);
> + rthal_reenter_root(get_switch_lock_owner(), policy, prio);
>
> xnstat_counter_inc(&thread->stat.ssw); /* Account for secondary mode
> switch. */
>
> @@ -2001,8 +2001,13 @@
>
> if (xnpod_shadow_p() && signal_pending(p))
> request_syscall_restart(thread, regs, sysflags);
> - else if ((sysflags & __xn_exec_switchback) != 0 && switched)
> - xnshadow_harden(); /* -EPERM will be trapped later if
> needed. */
> + else if ((sysflags & __xn_exec_switchback) != 0 && switched) {
> + if (!xnthread_test_state(thread, XNIDLE) ||
> + (xnthread_test_state(thread, XNIDLE) && (sysflags &
> __xn_exec_norelax) != 0))
> + xnshadow_harden(); /* -EPERM will be trapped later if needed.
> */
> + } else if ((sysflags & __xn_exec_norelax) == 0 && xnpod_primary_p() &&
> + xnpod_current_thread()->lockcnt == 0 &&
> xnthread_test_state(thread, XNIDLE))
> + xnshadow_relax(0);
>
> return RTHAL_EVENT_STOP;
>
> @@ -2137,6 +2142,9 @@
> request_syscall_restart(xnshadow_thread(current), regs,
> sysflags);
> else if ((sysflags & __xn_exec_switchback) != 0 && switched)
> xnshadow_relax(0);
> + else if ((sysflags & __xn_exec_norelax) == 0 && xnpod_primary_p() &&
> + xnpod_current_thread()->lockcnt == 0 &&
> xnthread_test_state(thread, XNIDLE))
> + xnshadow_relax(0);
>
> return RTHAL_EVENT_STOP;
> }
> diff -ruN linux-2.6.32-5RR9/kernel/xenomai/nucleus/thread.c
> linux-2.6.32-5RR9-new/kernel/xenomai/nucleus/thread.c
> --- linux-2.6.32-5RR9/kernel/xenomai/nucleus/thread.c 2010-04-13
> 20:02:22.000000000 -0400
> +++ linux-2.6.32-5RR9-new/kernel/xenomai/nucleus/thread.c 2010-04-14
> 10:38:09.000000000 -0400
> @@ -124,6 +124,7 @@
> thread->rpi = NULL;
> #endif /* CONFIG_XENO_OPT_PRIOCPL */
> initpq(&thread->claimq);
> + thread->lockcnt = 0;
>
> xnarch_init_display_context(thread);
>
> diff -ruN linux-2.6.32-5RR9/kernel/xenomai/skins/native/mutex.c
> linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/mutex.c
> --- linux-2.6.32-5RR9/kernel/xenomai/skins/native/mutex.c 2010-04-13
> 20:02:22.000000000 -0400
> +++ linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/mutex.c 2010-04-14
> 10:38:09.000000000 -0400
> @@ -396,6 +396,8 @@
> /* xnsynch_sleep_on() might have stolen the resource,
> so we need to put our internal data in sync. */
> mutex->lockcnt = 1;
> +
> + thread->lockcnt++;
> }
>
> unlock_and_exit:
> @@ -462,6 +464,8 @@
> if (--mutex->lockcnt > 0)
> goto unlock_and_exit;
>
> + xnpod_current_thread()->lockcnt--;
> +
> if (xnsynch_wakeup_one_sleeper(&mutex->synch_base)) {
> mutex->lockcnt = 1;
> xnpod_schedule();
> diff -ruN linux-2.6.32-5RR9/kernel/xenomai/skins/native/syscall.c
> linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/syscall.c
> --- linux-2.6.32-5RR9/kernel/xenomai/skins/native/syscall.c 2010-04-13
> 20:02:22.000000000 -0400
> +++ linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/syscall.c
> 2010-04-14 10:38:09.000000000 -0400
> @@ -3932,7 +3932,7 @@
> [__native_mutex_create] = {&__rt_mutex_create, __xn_exec_any},
> [__native_mutex_bind] = {&__rt_mutex_bind, __xn_exec_conforming},
> [__native_mutex_delete] = {&__rt_mutex_delete, __xn_exec_any},
> - [__native_mutex_acquire] = {&__rt_mutex_acquire, __xn_exec_primary},
> + [__native_mutex_acquire] = {&__rt_mutex_acquire,
> __xn_exec_primary|__xn_exec_norelax},
> [__native_mutex_release] = {&__rt_mutex_release, __xn_exec_primary},
> [__native_mutex_inquire] = {&__rt_mutex_inquire, __xn_exec_any},
> [__native_cond_create] = {&__rt_cond_create, __xn_exec_any},
> diff -ruN linux-2.6.32-5RR9/kernel/xenomai/skins/native/task.c
> linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/task.c
> --- linux-2.6.32-5RR9/kernel/xenomai/skins/native/task.c 2010-04-13
> 20:02:22.000000000 -0400
> +++ linux-2.6.32-5RR9-new/kernel/xenomai/skins/native/task.c 2010-04-14
> 10:38:09.000000000 -0400
> @@ -1498,7 +1498,7 @@
> }
>
> if (((clrmask | setmask) &
> - ~(T_LOCK | T_RRB | T_NOSIG | T_SHIELD | T_WARNSW)) != 0)
> + ~(T_LOCK | T_RRB | T_NOSIG | T_SHIELD | T_WARNSW | T_IDLE)) != 0)
> return -EINVAL;
>
> if (!xnpod_primary_p())
>
>
> _______________________________________________
> Xenomai-help mailing list
> [email protected]
> https://mail.gna.org/listinfo/xenomai-help
--
Philippe.