[Xenomai-core] [REMINDER] Migrating Xenomai mailing lists
We will soon be moving all our mailing lists out of gna.org to host them on xenomai.org instead. On this occasion, xenomai-h...@gna.org, xenomai-core@gna.org and adeos-m...@gna.org will be merged into a single list named xeno...@xenomai.org. These are low-traffic lists, so we want to group all Xenomai-related discussions in one place. Commits to the development trees will be sent to xenomai-...@xenomai.org.

The migration is scheduled for May 19; all current subscribers of the former lists will be automatically subscribed to xeno...@xenomai.org. You will receive an automated mail from our Mailman when this happens. The Mailman interface to the new lists is available at: http://www.xenomai.org/mailman/listinfo/xenomai. Please drop a mail to mail...@xenomai.org in case of issues.

Thanks,
-- Philippe.

___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] rt_task_create and rt_task delete re-scheduling calling task
On 05/14/2012 09:55 AM, Roberto Bielli wrote:
> Hi, I saw in the documentation that rt_task_create and rt_task_delete should reschedule the calling task. So I lose the CPU if a task calls rt_task_create or rt_task_delete. Do I understand correctly? Is there a way to avoid this behaviour? Or what are all the cases of rescheduling when calling rt_task_create/rt_task_delete?

There is no way to avoid rescheduling (assuming you are currently using the user-space API). Creating and deleting tasks involves switching to secondary mode to get/release linux resources which are impossible to access from a primary context.

> Thanks for all. P.S. the i.MX25 is now perfect. It was only the reentrant interrupt.

-- Philippe.
Re: [Xenomai-core] Scheduler extensions
On 05/08/2012 09:23 AM, Jonas Flodin wrote:
> Hi! I'm a PhD student who is currently doing research on multicore real-time scheduling. I'm considering using Xenomai as a base for my research experiments, but this would require me to replace or extend the current scheduler. So far I have found no documents detailing how the scheduler is implemented or how to extend it (if possible). Could you point me to information regarding the scheduler?

There is no documentation on the scheduling core. You should probably start with ksrc/nucleus/sched*.c and include/nucleus/sched*.h, having a look at the files implementing the plain FIFO policy in sched-rt*.

Hint: the scheduling core is meant to be extensible; adding a new policy entails providing an implementation of a new struct xnsched_class object. Make sure to read the comments in the files implementing the existing policies (-rt, -sporadic, -tp); they usually mention details on the calling context and requirements for the handlers defined by the xnsched_class type.

> Thank you in advance. BR Jonas Flodin

-- Philippe.
Re: [Xenomai-core] xenomai-forge: round-robin scheduling in pSOS skin
On 03/08/2012 03:30 PM, Ronny Meeus wrote:
> Hello, I am using the xenomai-forge pSOS skin (Mercury). My application is running on a P4040 (Freescale PPC with 4 cores). Some code snippets are put in this mail, but the complete test code is also attached. I have a test task that just consumes the CPU:
>
> int run_test = 1;
>
> static void perform_work(u_long counter, u_long b, u_long c, u_long d)
> {
>     int i;
>     while (run_test) {
>         for (i = 0; i < 10; i++);
>         (*(unsigned long *)counter)++;
>     }
>     while (1)
>         tm_wkafter(1000);
> }
>
> If I create 2 instances of this task with the T_TSLICE option set:
>
>     t_create("WORK", 10, 0, 0, 0, &tid);
>     t_start(tid, T_TSLICE, perform_work, args);
>
> I see that only 1 task is consuming CPU:
>
> # taskset 1 ./roundrobin.exe
> .543| [main] SCHED_RT priorities = [1 .. 99]
> .656| [main] SCHED_RT.99 reserved for IRQ emulation
> .692| [main] SCHED_RT.98 reserved for scheduler-lock emulation
> 0 - 6602
> 1 - 0
>
> If I adapt the code so that I call the threadobj_start_rr function in my init, I see that the load is equally distributed over the 2 threads:
>
> # taskset 1 ./roundrobin.exe
> .557| [main] SCHED_RT priorities = [1 .. 99]
> .672| [main] SCHED_RT.99 reserved for IRQ emulation
> .708| [main] SCHED_RT.98 reserved for scheduler-lock emulation
> 0 - 3290
> 1 - 3291
>
> Here are the questions:
> - Why is the threadobj_start_rr function not called from the context of the init of the pSOS layer?

Because threadobj_start_rr() was originally designed to activate round-robin for all threads (some RTOSes like VxWorks expose that kind of API), not on a per-thread basis. This is not what pSOS wants. The round-robin API is in a state of flux for mercury; only the cobalt one is stable. This is why RR is not yet activated although T_TSLICE is recognized.

> - Why is the round-robin implemented in this way? If the tasks were mapped on SCHED_RR instead of SCHED_FIFO, the Linux scheduler would take care of this.

Nope. We need per-thread RR intervals to manage multiple priority groups concurrently, and we also want to define that interval as we see fit for proper RTOS emulation. POSIX does not define anything like sched_set_rr_interval(), and the linux kernel applies a default fixed interval to all threads of the SCHED_RR class (100ms IIRC). So we have to emulate SCHED_RR over SCHED_FIFO plus a per-thread virtual timer.

> - On the other hand, once the threadobj_start_rr function is called from my init and I create the tasks in T_NOTSLICE mode, the time-slicing is still done.

Because you called threadobj_start_rr().

> Thanks. --- Ronny

-- Philippe.
Re: [Xenomai-core] [PATCH forge] Fix build for relative invocations of configure
On 02/07/2012 04:43 PM, Jan Kiszka wrote:
> This fixes build setups like '../configure'.

Merged, thanks.

> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> ---
>  configure.in | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/configure.in b/configure.in
> index c0a7d17..0bdced8 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -547,7 +547,7 @@ LD_FILE_OPTION=$ac_cv_ld_file_option
>  AC_SUBST(LD_FILE_OPTION)
>  
>  if test x$rtcore_type = xcobalt; then
> -   XENO_USER_CFLAGS=-I$srcdir/include/cobalt $XENO_USER_CFLAGS
> +   XENO_USER_CFLAGS=-I`cd $srcdir && pwd`/include/cobalt $XENO_USER_CFLAGS
>     if [[ $ac_cv_ld_file_option = yes ]]; then
>         XENO_POSIX_WRAPPERS=-Wl,@`cd $srcdir && pwd`/lib/cobalt/posix.wrappers
>     else

-- Philippe.
Re: [Xenomai-core] [PATCH] Add sigdebug unit test
On 01/26/2012 11:36 AM, Jan Kiszka wrote: On 2012-01-25 19:05, Jan Kiszka wrote: On 2012-01-25 18:44, Gilles Chanteperdrix wrote: On 01/25/2012 06:10 PM, Jan Kiszka wrote: On 2012-01-25 18:02, Gilles Chanteperdrix wrote: On 01/25/2012 05:52 PM, Jan Kiszka wrote: On 2012-01-25 17:47, Jan Kiszka wrote: On 2012-01-25 17:35, Gilles Chanteperdrix wrote: On 01/25/2012 05:21 PM, Jan Kiszka wrote:

We had two regressions in this code recently. So test all 6 possible SIGDEBUG reasons, or 5 if the watchdog is not available.

OK for this test, with a few remarks:

- This is a regression test, so it should go to src/testsuite/regression(/native), and should be added to the xeno-regression-test.

What are unit tests for (as they are defined here)? Looks a bit inconsistent.

I put under regression all the tests I have which corresponded to things that failed at one time or another in Xenomai's past. Maybe we could move the unit tests under regression.

- We already have a regression test for the watchdog called mayday.c, which tests the second watchdog action; please merge mayday.c with sigdebug.c (mayday.c also allows checking the disassembly of the code in the mayday page, a nice feature).

It seems to have failed in that important last discipline. Need to check why.

Because it didn't check the page content for correctness. But that's now done via the new watchdog test. I can keep the debug output, but the watchdog test of mayday looks obsolete to me. Am I missing something?

The watchdog does two things: it first sends a SIGDEBUG, then, if the application is still spinning, it sends a SIGSEGV. As far as I understood, your test tests the first case and mayday tests the second case, so I agree that mayday should be removed, but whatever it tests should be integrated in the sigdebug test.

Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the test case actually specified a bug as correct behavior. The fallback case is in fact killing the RT task as before.
But I'm unsure right now: will this always leave the system in a clean state behind?

The test case being a test case and doing nothing particular, I do not see what could go wrong. And if something goes wrong, then it needs fixing.

Well, if you kill an RT task while it's running in the kernel, you risk inconsistent system states (held mutexes etc.).

In this case the task is supposed to spin in user space. If that is always safe, let's implement the test.

Had a closer look: these days the two-stage killing is only useful to catch endless loops in the kernel. User-space tasks can't get around being migrated on watchdog events, even when SIGDEBUG is ignored. To trigger the enforced task termination without leaving any broken states behind, there is one option: rt_task_spin. Surprisingly for me, it actually spins in the kernel, thus triggers the second level if waiting long enough. I wonder, though, if that behavior shouldn't be improved, i.e. the spinning loop be closed in user space - which would take away that option again. Thoughts?

Tick-based timing is going to be the problem for determining the spinning delay, unless we expose it in the vdso on a per-skin basis, which won't be pretty.

Jan

-- Philippe.
Re: [Xenomai-core] realtime pipes
On 01/16/2012 03:25 PM, Makarand Pradhan wrote:
> Hi,
>> Real-time pipes are deprecated.
> We use a lot of rt pipes, so can you please elaborate on this? I would highly appreciate it if you could comment on the following.
>
> 1. When will the rt pipe interface be removed? Any time frame?

Xenomai 3. Xenomai 2.x will keep them forever.

> 2. I would like to understand the reason for deprecating the interface.

- Because there is a better socket-based API implemented by the RTIPC driver w/ the XDDP protocol, which does not require running application-level code in kernel space (RT_PIPE is definitely an application-level API). This new interface has been available since Xenomai 2.5.x. It is functionally 100% equivalent to the legacy RT_PIPE API.

- Because no support will be provided in Xenomai 3 for running application-level code in kernel space, so RT_PIPE has to go from kernel space. However, RT_PIPE is still part of the user-space API of Xenomai 3, interfacing with XDDP endpoints in kernel space. I'm really referring to application-level code, by contrast to RTDM driver-level code, which will obviously remain a first-class citizen in kernel space.

See:
o http://www.xenomai.org/index.php/Xenomai:Roadmap
o http://www.xenomai.org/documentation/xenomai-2.6/html/api/group__rtipc.html
o examples/rtdm/profiles/ipc in the Xenomai distro

> Thanks and Rgds, Mak.
> On 15/01/12 12:37 PM, Gilles Chanteperdrix wrote:
>> Real-time pipes are deprecated.

-- Philippe.
Re: [Xenomai-core] realtime pipes
On 01/16/2012 04:09 PM, Makarand Pradhan wrote:
> Thanks Philippe. To ensure that I understand correctly, let me rephrase my understanding: in 3.0, rt_pipe_create and friends will cease to exist. We have to start using sockets with domain AF_RTIPC and protocol IPCPROTO_XDDP instead. Is that a correct statement?

Basically, yes. In addition, X3 will keep the RT_PIPE interface for the -rt endpoint available on the application side, by wrapping an XDDP socket to a RT_PIPE descriptor under the hood. In kernel space, however, the RT_PIPE API to create -rt endpoints won't be available anymore; one will have to create them via the rtdm_socket/rt_dev_socket calls. In any case, the API for the non-rt side does not change, i.e. POSIX file I/O calls will still be the way to interface with the -rt endpoint.

> Rgds, Mak.

-- Philippe.
Re: [Xenomai-core] Synchronization of shared memory with mutexes
On 01/10/2012 04:04 PM, Jan-Erik Lange wrote:
> Hello, I have a question about the basics of synchronizing shared memory with mutexes. The situation: the sender is an RT task (primary domain) and the recipient is a non-RT task (usually in the secondary domain). Namely, the receiver is used to interact with a web server; it issues syscalls and the like, and because of that it usually runs in secondary mode. Suppose the sender has written something to the shared memory. It uses a mutex for synchronization, so it calls the rt_mutex_release() function. The receiver now gets time to work from the scheduler. It calls the rt_mutex_acquire() function to lock the shared memory; a context switch then occurs from secondary mode to primary mode, and it has the resource for itself. Now the scheduler lets the sender task work, and it wants to write something, so it calls rt_mutex_acquire(). And here comes my question: does rt_mutex_acquire() provide a mechanism to signal the scheduler to immediately continue with the recipient task? If so, how does rt_mutex_acquire() tell the scheduler that?

There are two tasks controlled by the same (Xenomai) scheduler. One is trying to grab a mutex the other one holds, so it is put to sleep on that mutex. The scheduler will simply switch to the next ready-to-run task since the sender task cannot run anymore, and that next task may be the receiver task. There is no special signaling magic required.

> I ask because in the documentation I read the term "Rescheduling: always". The documentation for rt_mutex_acquire says "Rescheduling: always unless the request is immediately satisfied or timeout specifies a non-blocking operation."
>
> Best regards, Jan

-- Philippe.
Re: [Xenomai-core] Ipipe breaks my MPC8541 board boot
On 01/03/2012 06:58 PM, Gilles Chanteperdrix wrote: On 01/03/2012 06:49 PM, Jean-Michel Hautbois wrote:
> cpm2_cascade is dedicated to my board, but has nothing impressive:
>
> static void cpm2_cascade(unsigned int irq, struct irq_desc *desc)
> {
>     int cascade_irq;
>
>     while ((cascade_irq = cpm2_get_irq()) >= 0)
>         generic_handle_irq(cascade_irq);
> }

Replace generic_handle_irq with ipipe_handle_chained_irq. You have to fix up the eoi handling as well; check how this is done in arch/powerpc/platforms/85xx/sbc8560.c.

-- Philippe.
Re: [Xenomai-core] general questions
On 12/29/2011 01:04 PM, Jan-Erik Lange wrote:
> Hello, I'm new to the topic of the Xenomai co-kernel approach and I have some questions about primary mode and secondary mode. I have trouble imagining, in general, how one task (process or thread) can be processed by two kernels (the Xenomai nucleus and the standard kernel), and be treated by one in real time and by the other in non-real-time.
>
> 1. As far as I understood this approach, the primary and secondary modes are an abstract description of the fact that threads or processes can be scheduled by the Xenomai nucleus or by the standard Linux kernel scheduler. Is this correct?

Yes. A Xenomai thread in user space has a shadow control area attached in addition to the regular linux context data, which enables both linux and the nucleus to schedule it in a mutually exclusive manner.

> 2. Now suppose that I have chosen the VxWorks skin and I started a task in primary mode. Is it correct that when this task calls a non-VxWorks-API function, there will be a change of context from primary to secondary mode? Or what is the exact condition for the switch of context?

- invoking a regular linux syscall
- receiving a linux signal (e.g. kill(2) and GDB)
- causing a CPU trap (e.g. invalid memory access), hitting a breakpoint (e.g. GDB)

All these situations cause the switch from primary to secondary mode. We say that such a thread "relaxes" in Xenomai parlance. A common caveat is to call a glibc routine which eventually issues a linux syscall under the hood. Think of malloc() detecting a process memory shortage, which then calls mmap or sbrk to extend the process data. Or running into a mutex contention once in a while, forcing the calling thread to issue a syscall for sleeping on the mutex. Fortunately, we have a tool to detect these situations.

> It would be very nice if you could tell me a little bit about these questions.
>
> Best regards, Jan

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo could possibly segfault, I finally discovered that the new feature in xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible context switches is what is corrupting the state of random linux processes once in a while. After turning the option off, I haven't seen a single crash, just like on 2.4.10. So something subtle is wrong with this option. It appears to be most likely to occur (possibly only likely) when xenomai is handling interrupts. It seems that getting an interrupt in the middle of a context switch at the wrong time corrupts the process that is being switched to or from (no idea which it is). Unless someone can think of a way to track down and fix this, I would certainly suggest making the option off by default instead of on.

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Which kernel version, what ppc hardware?

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 07:32 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
> Long term it certainly isn't.
>> Which kernel version, what ppc hardware?
> 3.0.13, 3.0.9, 3.0.8. mpc8360e. xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04.

Do you have a typical test scenario which triggers this bug?

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 09:25 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
> It can take a couple of hours under pretty heavy load to get one occurrence. But with preemptible context switches off we haven't seen any in a week. For sure xenomai tasks are handling interrupts quite a lot at the time. I wish we had a simple test case to show it, but it seems to require triggering an interrupt in the middle of a context switch at exactly the wrong place.

Is it reproducible with the basic latency or cyclic tests if waiting for long enough? Running ltp in parallel would trigger a decent load, but sometimes two shell loops forking commands in the background are enough to trigger a variety of issues when something fragile exists in the mmu layer as modified by the I-pipe.

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 10:55 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if waiting for long enough?
> Well, we can try after I come back from vacation in a couple of weeks.

OK. I will try to reproduce on my side as well.

-- Philippe.
Re: [Xenomai-core] Usage of Xenomai name
On Mon, 2011-10-17 at 11:15 +0100, Jorge Amado Azevedo wrote:
> Hello, I'm currently finishing a small application that allows users to draw block diagrams of control systems and execute them in real time using Xenomai. Technically, each block is a Xenomai task, and users can easily make their own as long as they adhere to a specific interface. I'm a student at the University of Aveiro (Portugal) and this work is part of my master's thesis. My original idea was to call my application Xenomai Lab, but I'm not sure if I can use the Xenomai name like that. Am I violating any trademarks, copyrights or other legal restrictions by using the Xenomai name for my application?

There would be no objection from the Xenomai project, provided this is and remains LGPL/GPL software.

> Regards, Jorge Azevedo

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc4
On Wed, 2011-09-28 at 20:34 +0200, Gilles Chanteperdrix wrote:
> Hi, here is the 4th release candidate for Xenomai 2.6.0:
> http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc4.tar.bz2
>
> Novelties since -rc3 include:
> - a fix for the long names issue on psos+
> - a fix for the build issue of mscan on mpc52xx (please Wolfgang, have a look at the patch, to see if you like it):
>   http://git.xenomai.org/?p=xenomai-head.git;a=commitdiff;h=d22fd231db7eb0af8e77ec570efb89e578e13781;hp=4a2188f049e96fc59aa7c4a7a9d058075f3d79e8
> - a new version of the I-pipe patch for linux 3.0 on ppc. People running 2.13-02/powerpc over linux 3.0.4 should definitely upgrade to 2.13-03, or apply this:
>   http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=7c28eb2dea86366bf721663bb8d28ce89cf2806c
>
> This should be the last release candidate.
> Regards.

-- Philippe.
Re: [Xenomai-core] Policy switching and XNOTHER maintenance
On Sun, 2011-09-18 at 16:34 +0200, Jan Kiszka wrote: On 2011-09-18 16:02, Philippe Gerum wrote: On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote: On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote: On 09/11/2011 04:29 PM, Jan Kiszka wrote: On 2011-09-11 16:24, Gilles Chanteperdrix wrote: On 09/11/2011 12:50 PM, Jan Kiszka wrote:

Hi all, I just looked into the hrescnt issue again, specifically the corner case of a shadow thread switching from a real-time policy to SCHED_OTHER. Doing this while holding a mutex looks invalid.

Looking at POSIX e.g., is there anything in the spec that makes this invalid? If the kernel preserves or establishes proper priority boosting, I do not see what could break in principle. It is nothing I would design into some app, but we should somehow handle it (doc update or code adjustments).

If we do not do it, the current code is valid. Except for its dependency on XNOTHER, which is not updated on RT-to-NORMAL transitions.

The fact that this update did not take place made the code work. No negative rescnt could happen with that code.

Anyway, here is a patch to allow switching back from RT to NORMAL, but send a SIGDEBUG to a thread attempting to release a mutex while its counter is already 0. We end up avoiding a big chunk of code that would have been useful for a really strange corner case.
Here comes version 2:

diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
index 6399a17..417170f 100644
--- a/include/nucleus/sched-idle.h
+++ b/include/nucleus/sched-idle.h
@@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
 static inline void __xnsched_idle_setparam(struct xnthread *thread,
 					   const union xnsched_policy_param *p)
 {
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->idle.prio;
 }
diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
index 71f655c..cc1cefa 100644
--- a/include/nucleus/sched-rt.h
+++ b/include/nucleus/sched-rt.h
@@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct xnthread *thread,
 					 const union xnsched_policy_param *p)
 {
 	thread->cprio = p->rt.prio;
+	if (xnthread_test_state(thread, XNSHADOW)) {
+		if (thread->cprio)
+			xnthread_clear_state(thread, XNOTHER);
+		else
+			xnthread_set_state(thread, XNOTHER);
+	}
 }
 
 static inline void __xnsched_rt_getparam(struct xnthread *thread,
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9a02e80..d1f 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct xnthread *thread,
 		xnsched_putback(thread);
 
 #ifdef CONFIG_XENO_OPT_PERVASIVE
-	/*
-	 * A non-real-time shadow may upgrade to real-time FIFO
-	 * scheduling, but the latter may never downgrade to
-	 * SCHED_NORMAL Xenomai-wise. In the valid case, we clear
-	 * XNOTHER to reflect the change. Note that we keep handling
-	 * non real-time shadow specifics in higher code layers, not
-	 * to pollute the core scheduler with peculiarities.
-	 */
-	if (sched_class == &xnsched_class_rt && sched_param->rt.prio > 0)
-		xnthread_clear_state(thread, XNOTHER);
 	if (propagate) {
 		if (xnthread_test_state(thread, XNRELAX))
 			xnshadow_renice(thread);
diff --git a/ksrc/nucleus/sched-sporadic.c b/ksrc/nucleus/sched-sporadic.c
index fd37c21..ffc9bab 100644
--- a/ksrc/nucleus/sched-sporadic.c
+++ b/ksrc/nucleus/sched-sporadic.c
@@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct xnthread *thread,
 		}
 	}
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->pss.current_prio;
 }
diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
index 43a548e..a2af1d3 100644
--- a/ksrc/nucleus/sched-tp.c
+++ b/ksrc/nucleus/sched-tp.c
@@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread *thread,
 {
 	struct xnsched *sched = thread->sched;
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->tps = &sched->tp.partitions[p->tp.ptid];
 	thread->cprio = p->tp.prio;
 }
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index b956e46..47bc0c5 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch, struct xnthread *lastowner)
 
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
-	if (xnthread_test_state(lastowner, XNOTHER))
-		xnthread_dec_rescnt(lastowner);
-	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote:
> On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote:
>> Hi, the first release candidate for the 2.6.0 version may be downloaded here:
>> http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2
> Currently 2.6.0-rc1 fails to build on 2.4 kernels, with errors related to vfile support. Do we really want to still support 2.4 kernels?

That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without an upgrade path to xenomai 2.6, since this will be the last maintained version of the Xenomai 2.x architecture.

That stuff likely does not compile because the Config.in bits are not up to date; blame it on me. I'll make this build over linux 2.4 and commit the result today.

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote:
> No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file.

OK, I'll check this.

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Mmmfff... -- Philippe. 
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.35
# Tue Sep 6 16:49:25 2011
#

#
# Linux/NiosII Configuration
#
CONFIG_NIOS2=y
CONFIG_MMU=y
# CONFIG_FPU is not set
# CONFIG_SWAP is not set
CONFIG_RWSEM_GENERIC_SPINLOCK=y

#
# NiosII board configuration
#
# CONFIG_3C120 is not set
CONFIG_NEEK=y
CONFIG_NIOS2_CUSTOM_FPGA=y
# CONFIG_NIOS2_NEEK_OCM is not set

#
# NiosII specific compiler options
#
CONFIG_NIOS2_HW_MUL_SUPPORT=y
# CONFIG_NIOS2_HW_MULX_SUPPORT is not set
# CONFIG_NIOS2_HW_DIV_SUPPORT is not set
# CONFIG_OF is not set
CONFIG_ALIGNMENT_TRAP=y
CONFIG_RAMKERNEL=y

#
# Boot options
#
CONFIG_CMDLINE=""
CONFIG_PASS_CMDLINE=y
CONFIG_BOOT_LINK_OFFSET=0x0100

#
# Platform driver options
#
# CONFIG_AVALON_DMA is not set

#
# Additional NiosII Device Drivers
#
# CONFIG_PCI_ALTPCI is not set
# CONFIG_ALTERA_REMOTE_UPDATE is not set
# CONFIG_PIO_DEVICES is not set
# CONFIG_NIOS2_GPIO is not set
# CONFIG_ALTERA_PIO_GPIO is not set
CONFIG_UID16=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_NO_IOPORT=y
CONFIG_ZONE_DMA=y
CONFIG_BINFMT_ELF=y
# CONFIG_NOT_COHERENT_CACHE is not set
CONFIG_HZ=100
# CONFIG_TRACE_IRQFLAGS_SUPPORT is not set
CONFIG_IPIPE=y
CONFIG_IPIPE_DOMAINS=4
CONFIG_IPIPE_DELAYED_ATOMICSW=y
# CONFIG_IPIPE_UNMASKED_CONTEXT_SWITCH is not set
CONFIG_IPIPE_HAVE_PREEMPTIBLE_SWITCH=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
# CONFIG_TINY_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
CONFIG_AIO=y

#
# Kernel Performance Events And Counters
#
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
# CONFIG_PROFILING
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 21:42 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 08:19 PM, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? More build failures for kernel 3.0 and ppc... 
http://sisyphus.hd.free.fr/~gilles/bx/index.html#powerpc I've fixed most of these, however the platform driver interface changed once again circa 2.6.39, and AFAICT, picking the right approach to cope with this never ending mess for the mscan driver requires some thoughts from educated people. Since I don't qualify for the job, I'm shamelessly passing the buck to Wolfgang: http://sisyphus.hd.free.fr/~gilles/bx/lite5200/3.0.4-ppc_6xx-gcc-4.2.2/log.html#1 PS: I guess this fix can wait until 2.6.0 final, this is not critical for -rc2. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 20:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? 
The commit # is correct, but I suspect that your kernel tree no longer has the files normally created by the SOPC builder; these can't (may not, actually) be included in the pipeline patch. In short, your tree might be missing the bits corresponding to the fpga design you build for, so basic symbols like HRCLOCK* and HRTIMER* are undefined. I'm building for a cyclone 3c25 from the NEEK kit, with SOPC files available from arch/nios2/boards/neek. Any valuable files in there on your side? (typically, include/asm/custom_fpga.h should contain definitions for our real-time clocks and timers) -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On Fri, 2011-08-26 at 14:34 +0200, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? Thanks in advance for your input. Nothing pending for 2.6, I'm focusing on 3.x now. However let's go for -rc1 first, this is a major release anyway. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] heap: rename sys_sem_heap syscall to sys_heap_info
On Tue, 2011-08-02 at 21:16 +0200, Gilles Chanteperdrix wrote: On 08/01/2011 10:20 PM, Gilles Chanteperdrix wrote: And add the count of used bytes to the xnheap_desc structure. This allows for checking for leaks in unit tests. --- No comments? Fine with me. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Exception #14
On Wed, 2011-07-27 at 20:52 +0200, Gilles Chanteperdrix wrote: On 07/26/2011 09:36 AM, zenati wrote:

Dear, I'm developing the Arinc 653 skin for Xenomai. I'm trying to run a process with my skin, but I get an exception: Xenomai: suspending kernel thread d8824c40 ('�') at 0xb76dbdfc after exception #14 What is exception 14? Do you have an idea how I can solve it? Thank you for your attention and for your help. Sincerely,

The meaning of the fault number depends on the platform you are using; see /proc/xenomai/faults for human-readable messages for your platform. I guess this is PF on x86, and this thread's TCB in kernel space looks badly trashed. You should probably check the behavior of your Xenomai kernel threads wrt memory writes, and possibly for stack overflows as well. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] HOW CAN I KNOW WHICH LINUX SYSTEM CALLS SWITCH TASK IN SECONDARY MODE ?
On Thu, 2011-07-21 at 12:38 +0200, Roberto Bielli wrote:

Hi, how can I know with assurance which Linux system calls switch to secondary mode and which do not?

All do.

Thanks for all

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] Waitqueue-free gatekeeper wakeup
On Mon, 2011-07-18 at 13:52 +0200, Jan Kiszka wrote:

Hi Philippe, trying to decouple the PREEMPT-RT gatekeeper wakeup path from XNATOMIC (to fix the remaining races there), I wondered why we need a waitqueue here at all. What about an approach like below, i.e. waking up the gatekeeper directly via wake_up_process? That could even be called from interrupt context. We should be able to avoid missing a wakeup by setting the task state to INTERRUPTIBLE before signaling the semaphore. Am I missing something?

No, I think this should work. IIRC, the wait queue dates back to when we did not have a strong synchro between the hardening code and the gk via the request token, i.e. the initial implementation over 2.4 kernels. So it is about time to question this.

Jan

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index e251329..df8853b 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -111,7 +111,6 @@ typedef struct xnsched {
 #ifdef CONFIG_XENO_OPT_PERVASIVE
 	struct task_struct *gatekeeper;
-	wait_queue_head_t gkwaitq;
 	struct semaphore gksync;
 	struct xnthread *gktarget;
 #endif
diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index f6b1e16..238317a 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -92,7 +92,6 @@ static struct __lostagerq {
 #define LO_SIGGRP_REQ 2
 #define LO_SIGTHR_REQ 3
 #define LO_UNMAP_REQ 4
-#define LO_GKWAKE_REQ 5
 	int type;
 	struct task_struct *task;
 	int arg;
@@ -759,9 +758,6 @@ static void lostage_handler(void *cookie)
 	int cpu, reqnum, type, arg, sig, sigarg;
 	struct __lostagerq *rq;
 	struct task_struct *p;
-#ifdef CONFIG_PREEMPT_RT
-	struct xnsched *sched;
-#endif
 	cpu = smp_processor_id();
 	rq = lostagerq[cpu];
@@ -819,13 +815,6 @@
 		case LO_SIGGRP_REQ:
 			kill_proc(p->pid, arg, 1);
 			break;
-
-#ifdef CONFIG_PREEMPT_RT
-		case LO_GKWAKE_REQ:
-			sched = xnpod_sched_slot(cpu);
-			wake_up_interruptible_sync(&sched->gkwaitq);
-			break;
-#endif
 		}
 	}
 }
@@ -873,7 +862,6 @@ static inline int normalize_priority(int prio)
 static int gatekeeper_thread(void *data)
 {
 	struct task_struct *this_task = current;
-	DECLARE_WAITQUEUE(wait, this_task);
 	int cpu = (long)data;
 	struct xnsched *sched = xnpod_sched_slot(cpu);
 	struct xnthread *target;
@@ -886,12 +874,10 @@ static int gatekeeper_thread(void *data)
 	set_cpus_allowed(this_task, cpumask);
 	set_linux_task_priority(this_task, MAX_RT_PRIO - 1);
-	init_waitqueue_head(&sched->gkwaitq);
-	add_wait_queue_exclusive(&sched->gkwaitq, &wait);
+	set_current_state(TASK_INTERRUPTIBLE);
 	up(&sched->gksync); /* Sync with xnshadow_mount(). */
 	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
 		up(&sched->gksync); /* Make the request token available. */
 		schedule();
@@ -937,6 +923,7 @@ static int gatekeeper_thread(void *data)
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
+		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	return 0;
@@ -1014,23 +1001,9 @@ redo:
 	thread->gksched = sched;
 	xnthread_set_info(thread, XNATOMIC);
 	set_current_state(TASK_INTERRUPTIBLE | TASK_ATOMICSWITCH);
-#ifndef CONFIG_PREEMPT_RT
-	/*
-	 * We may not hold the preemption lock across calls to
-	 * wake_up_*() services over fully preemptible kernels, since
-	 * tasks might sleep when contending for spinlocks. The wake
-	 * up call for the gatekeeper will happen later, over an APC
-	 * we kick in do_schedule_event() on the way out for the
-	 * hardening task.
-	 *
-	 * We could delay the wake up call over non-RT 2.6 kernels as
-	 * well, but not when running over 2.4 (scheduler innards
-	 * would not allow this, causing weirdnesses when hardening
-	 * tasks). So we always do the early wake up when running
-	 * non-RT, which includes 2.4.
-	 */
-	wake_up_interruptible_sync(&sched->gkwaitq);
-#endif
+
+	wake_up_process(sched->gatekeeper);
+
 	schedule();
 	/*

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote: On 2011-07-16 10:52, Philippe Gerum wrote: On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote: On 2011-07-15 15:10, Jan Kiszka wrote:

But... right now it looks like we found our primary regression: "nucleus/shadow: shorten the uninterruptible path to secondary mode". It opens a short window during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more detail.

Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: we reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks; the point is Xenomai's design wrongly assumes there are no such things.

Fortunately, no, this is not a design issue, no such assumption was ever made, but the Xenomai core expects this to be handled on a per-arch basis with the interrupt pipeline.

And that's already the problem: if Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one, unless you increase the per-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads.

Don't assume too much. What was done for ppc64 was not meant as a general policy. Again, this is a per-arch decision. As you pointed out, there is no way to handle this via some generic Xenomai-only support. ppc64 now has separate interrupt stacks, which is why I disabled IRQSTACKS, which became the builtin default at some point. Blackfin goes through a Xenomai-defined irq tail handler as well, because it may not reschedule over nested interrupt stacks.

How does this arch prevent xnpod_schedule() in the generic interrupt handler tail from doing its normal work?
It polls some hw status to know whether a rescheduling would be safe. See xnarch_escalate(). Fact is that this pending problem with x86_64 was overlooked since day #1 by /me. We were lucky so far that the values saved on this shared stack were apparently compatible, meaning we were overwriting them with identical or harmless values. But that's no longer true when interrupts are hitting us in the xnpod_suspend_thread path of a relaxing shadow.

Makes sense. It would be better to find a solution that does not make the relax path uninterruptible again for a significant amount of time. On low-end platforms we support (i.e. non-x86* mainly), this causes obvious latency spots.

I agree. Conceptually, the interruptible relaxation should be safe now after recent fixes.

Likely the only possible fix is establishing a reschedule hook for Xenomai in the interrupt exit path after the original stack is restored -- just like Linux works. Requires changes to both ipipe and Xenomai, unfortunately.

__ipipe_run_irqtail() is in the I-pipe core for such purpose. If instantiated properly for x86_64, and paired with xnarch_escalate() for that arch as well, it could be an option for running the rescheduling procedure when safe.

Nope, that doesn't work. The stack is switched later in the return path in entry_64.S. We need a hook there, ideally a conditional one, controlled by some per-cpu variable that is set by Xenomai on return from its interrupt handlers to signal the rescheduling need.

Yes, makes sense. The way to make it conditional without dragging bits of Xenomai logic into the kernel innards is not obvious, though. It is probably time to officially introduce exo-kernel oriented bits into the Linux thread info. PTDs have too loose semantics to be practical if we want to avoid trashing the I-cache by calling probe hooks within the dual kernel each time we want to check some basic condition (e.g. resched needed). A backlink to a foreign TCB there would help too.
Which leads us to killing the ad hoc kernel threads (and stacks) at some point, which are an absolute pain. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Wed, 2011-07-13 at 20:39 +0200, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? 
It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. What worries me is the comment in xnshadow_harden:

 * gatekeeper sent us to primary mode. Since
 * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
 * the runqueue's count of uninterruptible tasks, we just
 * notice the issue and gracefully fail; the caller will have
 * to process this signal anyway.
 */

Does this mean that we cannot switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden?

The second interpretation is correct.

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] ESA SOCIS initiative
On Tue, 2011-07-12 at 09:20 +0200, julien.dela...@esa.int wrote: Dear all, The European Space Agency started a program called SOCIS. It aims at supporting free-software projects by providing funds to students that are willing to contribute to free-software projects. It works like the summer of code : mentoring organization subscribe to the program and propose projects. Then, the SOCIS committee selects the projects that are accepted. Finally, students apply to the selected projects, the mentoring organization choose the students for each project and the student has to complete the project in some weeks. Finally, if the projects is successfully finished, the student receives money. It is a nice way to improve free software and help some student ! As a Xenomai user, I was wondering if the project would like to apply to SOCIS. I think Xenomai developers may have several ideas of projects for students. I contacted Gilles Chanteperdrix to inform him about the initiative, he told me to post on this list because more people could be interested. You can have more information about the program on http://sophia.estec.esa.int/socis2011/?q=about . Subscription deadline is next Friday so that if you want to apply, you have to do that quickly. If you have any question regarding the program, do not hesitate to post on the SOCIS mailing list or to contact me. This is interesting, and the ESA running this program makes the latter even more attractive. However, the tasks of a mentoring organization proposing a project described here http://sophia.estec.esa.int/socis2011/faq seem way too heavy for us, especially in the short term. Things we commit to do should be done right, and unless I'm mistaken, I'm unsure anyone from the core team would be able to dedicate the required workload to handle this task properly. I would welcome any suggestion to make this possible nevertheless, because there is no shortage of interesting stuff that remains to be done on the Xenomai code base. 
Thanks for suggesting this anyway. Best regards, ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Tue, 2011-07-12 at 14:57 +0200, Jan Kiszka wrote: On 2011-07-12 14:13, Jan Kiszka wrote: On 2011-07-12 14:06, Gilles Chanteperdrix wrote: On 07/12/2011 01:58 PM, Jan Kiszka wrote: On 2011-07-12 13:56, Jan Kiszka wrote:

However, this parallel unsynchronized execution of the gatekeeper and its target thread leaves an increasingly bad feeling on my side. Did we really catch all corner cases now? I wouldn't guarantee that yet. Specifically as I still have an obscure crash of a Xenomai thread on Linux schedule() on my table. What if the target thread woke up due to a signal, continued much further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the gatekeeper continued? I wish we could already eliminate this complexity and do the migration directly inside schedule()...

BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in the gatekeeper? What would happen if we included it (state == (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

I would tend to think that what we should check is xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible state and the XNATOMIC info bit.

Actually, neither the info bits nor the task state is sufficiently synchronized against the gatekeeper yet. We need to hold a shared lock when testing and resetting the state. I'm not sure yet if that is fixable given the gatekeeper architecture.

This may work (on top of the exit-race fix):

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 50dcf43..90feb16 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -913,20 +913,27 @@ static int gatekeeper_thread(void *data)
 		if ((xnthread_user_task(target)->state & ~TASK_ATOMICSWITCH)
 		    == TASK_INTERRUPTIBLE) {
 			rpi_pop(target);
 			xnlock_get_irqsave(&nklock, s);
-#ifdef CONFIG_SMP
+
 			/*
-			 * If the task changed its CPU while in
-			 * secondary mode, change the CPU of the
-			 * underlying Xenomai shadow too. We do not
-			 * migrate the thread timers here, it would
-			 * not work. For a full migration comprising
-			 * timers, using xnpod_migrate_thread is
-			 * required.
+			 * Recheck XNATOMIC to avoid waking the shadow if the
+			 * Linux task received a signal meanwhile.
 			 */
-			if (target->sched != sched)
-				xnsched_migrate_passive(target, sched);
+			if (xnthread_test_info(target, XNATOMIC)) {
+#ifdef CONFIG_SMP
+				/*
+				 * If the task changed its CPU while in
+				 * secondary mode, change the CPU of the
+				 * underlying Xenomai shadow too. We do not
+				 * migrate the thread timers here, it would
+				 * not work. For a full migration comprising
+				 * timers, using xnpod_migrate_thread is
+				 * required.
+				 */
+				if (target->sched != sched)
+					xnsched_migrate_passive(target, sched);
 #endif /* CONFIG_SMP */
-			xnpod_resume_thread(target, XNRELAX);
+				xnpod_resume_thread(target, XNRELAX);
+			}
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
@@ -1036,6 +1043,7 @@ redo:
 	 * to process this signal anyway.
 	 */
 	if (rthal_current_domain == rthal_root_domain) {
+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
 		if (XENO_DEBUG(NUCLEUS) &&
 		    (!signal_pending(this_task)
 		     || this_task->state != TASK_RUNNING))
 			xnpod_fatal
@@ -1044,6 +1052,8 @@ redo:
 		return -ERESTARTSYS;
 	}

+	xnthread_clear_info(thread, XNATOMIC);
+
 	/* current is now running into the Xenomai domain. */
 	thread->gksched = NULL;
 	sched = xnsched_finish_unlocked_switch(thread->sched);
@@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct *p)

 	xnlock_get_irqsave(&nklock, s);

+	xnthread_clear_info(thread, XNATOMIC);
+
 	if ((p->ptrace & PT_PTRACED) &&
 	    !xnthread_test_state(thread, XNDEBUG)) {
 		sigset_t pending;

It totally ignores RPI and PREEMPT_RT for now.

RPI is broken anyway, I want to drop RPI in v3 for sure because it is misleading people. I'm still pondering whether we should do that earlier during the 2.6 timeframe. Ripping it out would allow using solely XNATOMIC as condition in the gatekeeper.

/me is now looking to get
Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
On Thu, 2011-06-23 at 19:32 +0200, Gilles Chanteperdrix wrote: On 06/23/2011 01:15 PM, Jan Kiszka wrote: On 2011-06-23 13:11, Gilles Chanteperdrix wrote: On 06/23/2011 11:37 AM, Jan Kiszka wrote: On 2011-06-20 19:07, Jan Kiszka wrote: On 2011-06-19 15:00, Gilles Chanteperdrix wrote: On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote: On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote: I am working on this ppd cleanup issue again, and I am asking for help to find a fix in -head for all cases where the sys_ppd is needed during some cleanup. The problem is that when the ppd cleanup is invoked: - we have no guarantee that current is a thread from the Xenomai application; - if it is, current->mm is NULL. So, associating the sys_ppd with either current or current->mm does not work. What we could do is pass the sys_ppd to all the other ppd cleanup handlers; this would fix cases such as freeing mutex fastlocks, but it does not help when the sys_ppd is needed during a thread deletion hook. I would like to find a solution where simply calling xnsys_ppd_get() will work, rather than having an xnsys_ppd_get for each context, such as xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct, because that would be too error-prone. Any idea anyone? The best I could come up with: use a ptd to store the mm currently being cleaned up, so that xnshadow_ppd_get continues to work, even in the middle of a cleanup. In order to also get xnshadow_ppd_get to work in task deletion hooks (which is needed to avoid the issue at the origin of this thread), we also need to set this ptd upon shadow mapping, so it is still there when reaching the task deletion hook (where current->mm may be NULL). Hence the patch:

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index b243600..6bc4210 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -65,6 +65,11 @@ int nkthrptd;
 EXPORT_SYMBOL_GPL(nkthrptd);
 int nkerrptd;
 EXPORT_SYMBOL_GPL(nkerrptd);
+int nkmmptd;
+EXPORT_SYMBOL_GPL(nkmmptd);
+
+#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
+#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))

xnshadow_mm() can now return a no longer existing mm. So no user of xnshadow_mm() should ever dereference that pointer. Thus we had better change all those users to treat the return value as a void pointer, e.g.

 struct xnskin_slot {
 	struct xnskin_props *props;
@@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
 	 * friends.
 	 */
 	xnshadow_thrptd(current) = thread;
+	xnshadow_mmptd(current) = current->mm;
+
 	rthal_enable_notifier(current);

 	if (xnthread_base_priority(thread) == 0
@@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)

 static inline void do_cleanup_event(struct mm_struct *mm)
 {
+	struct task_struct *p = current;
+	struct mm_struct *old;
+
+	old = xnshadow_mm(p);
+	xnshadow_mmptd(p) = mm;
+
 	ppd_remove_mm(mm, &detach_ppd);
+
+	xnshadow_mmptd(p) = old;

I don't have the full picture yet, but that feels racy: if the context over which we clean up that foreign mm is also using xnshadow_mmptd, other threads in that process may dislike this temporary change.

 }

 RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
@@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
 xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
 {
 	if (xnpod_userspace_p())
-		return ppd_lookup(muxid, current->mm);
+		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);

 	return NULL;
 }
@@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
 	sema_init(&completion_mutex, 1);
 	nkthrptd = rthal_alloc_ptdkey();
 	nkerrptd = rthal_alloc_ptdkey();
+	nkmmptd = rthal_alloc_ptdkey();

-	if (nkthrptd < 0 || nkerrptd < 0) {
+	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
 		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
 		return -ENOMEM;
 	}
diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
index 6ce75e5..cc86852 100644
--- a/ksrc/skins/posix/mutex.c
+++ b/ksrc/skins/posix/mutex.c
@@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
 	xnlock_put_irqrestore(&nklock, s);

 #ifdef CONFIG_XENO_FASTSYNCH
-	/* We call xnheap_free even if the mutex is not pshared; when
-	   this function is called from pse51_mutexq_cleanup, the
-	   sem_heap is destroyed, or not the one to which the fastlock
-	   belongs, xnheap will simply return an error. */

I think this comment is not completely obsolete. It still applies /wrt shared/non-shared.
Re: [Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
On Wed, 2011-06-22 at 19:16 +0200, Gilles Chanteperdrix wrote: On 05/19/2011 10:29 PM, Philippe Gerum wrote: On Thu, 2011-05-19 at 20:36 +0200, Jan Kiszka wrote: On 2011-05-19 20:15, Gilles Chanteperdrix wrote: On 05/19/2011 03:58 PM, Philippe Gerum wrote: For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. I am in the same case as you: I no longer use Xeno's NMI watchdog, so I agree to get rid of it. Yeah. The last time we wanted to use it to get more information about a hard hang, the CPU we used was not supported. Philippe, did you already test whether the Linux watchdog generates proper results on artificial Xenomai lockups on a single core? This works provided we tell the pipeline to enter printk-sync mode when the watchdog kicks. So I'd say that we could probably do a better job making the pipeline core smarter wrt NMI watchdog context handling than asking Xenomai to dup the mainline code for having its own NMI handling. If nobody disagrees, I am removing this code from -head. Now. Ack. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Number of arguments
On Tue, 2011-06-21 at 17:33 +0200, zenati wrote: Dear, As you know, the ARINC 653 standard is very strict, and the API it specifies must be respected. Some functions of that API need more than five arguments. However, skin calls are limited to 5. If I want to increase that limit, I have to modify the following files: - ./xenomai-2.5.6/include/asm-arm/syscall.h - ./xenomai-2.5.6/include/asm-blackfin/syscall.h - ./xenomai-2.5.6/include/asm-x86/syscall.h - ./xenomai-2.5.6/include/asm-powerpc/syscall.h - ./xenomai-2.5.6/include/asm-nios2/syscall.h Is it possible? Is it a good idea? No, dead end. You should group arguments in a struct and pass the address of such a struct to kernel land for decoding. Thank you for your attention and your help. Sincerely, Omar ZENATI ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC][PATCH] nucleus: Prevent rescheduling while in xntbase_tick
On Fri, 2011-06-17 at 13:03 +0200, Jan Kiszka wrote: On 2011-06-17 12:58, Gilles Chanteperdrix wrote: On 06/17/2011 11:27 AM, Jan Kiszka wrote: Based on code inspection, it looks like a timer handler triggering a reschedule in the path xntbase_tick -> xntimer_tick_aperiodic / xntimer_tick_periodic_inner -> handler can cause problems, e.g. a reschedule before all expired timers were processed. The timer core is usually run atomically from an interrupt handler, so better emulate an IRQ context inside xntbase_tick by setting XNINIRQ. I do not understand this one either: if we are inside xntimer_tick_aperiodic, XNINIRQ is already set. Not if you come via xntbase_tick, which is called by the mentioned skins also outside a timer IRQ (at least based on my understanding of those skin APIs). But I might be wrong; I just came across this while checking for potentially invalid cached xnpod_current_sched values. That is ok: ui_timer(), tickAnnounce() and tm_tick() are designed by the respective RTOS to be called from IRQ context. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC][PATCH] nucleus: Prevent rescheduling while in xntbase_tick
On Fri, 2011-06-17 at 14:11 +0200, Jan Kiszka wrote: On 2011-06-17 13:58, Philippe Gerum wrote: On Fri, 2011-06-17 at 13:03 +0200, Jan Kiszka wrote: On 2011-06-17 12:58, Gilles Chanteperdrix wrote: On 06/17/2011 11:27 AM, Jan Kiszka wrote: Based on code inspection, it looks like a timer handler triggering a reschedule in the path xntbase_tick -> xntimer_tick_aperiodic / xntimer_tick_periodic_inner -> handler can cause problems, e.g. a reschedule before all expired timers were processed. The timer core is usually run atomically from an interrupt handler, so better emulate an IRQ context inside xntbase_tick by setting XNINIRQ. I do not understand this one either: if we are inside xntimer_tick_aperiodic, XNINIRQ is already set. Not if you come via xntbase_tick, which is called by the mentioned skins also outside a timer IRQ (at least based on my understanding of those skin APIs). But I might be wrong; I just came across this while checking for potentially invalid cached xnpod_current_sched values. That is ok: ui_timer(), tickAnnounce() and tm_tick() are designed by the respective RTOS to be called from IRQ context. Fine. Should we add a XENO_ASSERT to set this in stone and for documentation purposes? I think so. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Oops with synchronous message passing support
On Thu, 2011-06-09 at 14:42 +0200, Wolfgang Grandegger wrote: Hello, I just realized a problem with synchronous message passing support. When rt_task_send() times out, I get the oops below from line: Does this help?

diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index b822fd0..b0e99a7 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -1988,21 +1988,28 @@ int rt_task_receive(RT_TASK_MCB *mcb_r, RTIME timeout)
 	}

 	/*
-	 * Wait on our receive slot for some client to enqueue itself
-	 * in our send queue.
+	 * We loop to care for spurious wakeups, in case the
+	 * client times out before we unblock.
 	 */
-	info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
-	/*
-	 * XNRMID cannot happen, since well, the current task would be the
-	 * deleted object, so...
-	 */
-	if (info & XNTIMEO) {
-		err = -ETIMEDOUT;	/* Timeout. */
-		goto unlock_and_exit;
-	} else if (info & XNBREAK) {
-		err = -EINTR;	/* Unblocked. */
-		goto unlock_and_exit;
-	}
+	do {
+		/*
+		 * Wait on our receive slot for some client to enqueue
+		 * itself in our send queue.
+		 */
+		info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
+		/*
+		 * XNRMID cannot happen, since well, the current task
+		 * would be the deleted object, so...
+		 */
+		if (info & XNTIMEO) {
+			err = -ETIMEDOUT;	/* Timeout. */
+			goto unlock_and_exit;
+		}
+		if (info & XNBREAK) {
+			err = -EINTR;	/* Unblocked. */
+			goto unlock_and_exit;
+		}
+	} while (!xnsynch_pended_p(&server->mrecv));

 	holder = getheadpq(xnsynch_wait_queue(&server->msendq));
 	/* There must be a valid holder since we waited for it.
*/ http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/skins/native/task.c#1976 -bash-3.2# ./oops_sender pre-rt_task_receive() [ 662.423571] Unable to handle kernel paging request for data at address 0x024c [ 662.515607] Faulting instruction address: 0xc0070124 [ 662.576614] Oops: Kernel access of bad area, sig: 11 [#2] [ 662.642806] mpc5200-simple-platform [ 662.685493] last sysfs file: [ 662.721775] Modules linked in: [ 662.759127] NIP: c0070124 LR: c00701c8 CTR: [ 662.819974] REGS: c7b8bd40 TRAP: 0300 Tainted: G D (2.6.36.4-3-g1af23a4-dirty) [ 662.925684] MSR: 3032 FP,ME,IR,DR CR: 24008482 XER: 2000 [ 663.003613] DAR: 024c, DSISR: 2000 [ 663.053780] TASK = c7b923f0[1227] 'oops_test_main' THREAD: c7b8a000 [ 663.128525] GPR00: c7b8bdf0 c7b923f0 c902a69c c042ea60 36291b28 c042ea60 [ 663.231015] GPR08: c902a210 fe0c c902a210 c9029df8 24008422 1004a118 [ 663.333503] GPR16: c040dc54 c0425ba0 fff0 c7b8bf50 c0425ba0 c042f058 0010 c902a210 [ 663.435993] GPR24: c03f9af8 c0425ba0 c03fb678 fdfc c902a200 c7b8be20 fffc [ 663.540611] NIP [c0070124] rt_task_receive+0xc8/0x1ac [ 663.602534] LR [c00701c8] rt_task_receive+0x16c/0x1ac [ 663.664436] Call Trace: [ 663.694322] [c7b8bdf0] [c00701c8] rt_task_receive+0x16c/0x1ac (unreliable) [ 663.778687] [c7b8be10] [c0072de4] __rt_task_receive+0xd0/0x1b0 [ 663.850245] [c7b8be90] [c0068cd0] losyscall_event+0xc8/0x328 [ 663.919654] [c7b8bed0] [c00587c8] __ipipe_dispatch_event+0xa4/0x200 [ 663.996532] [c7b8bf20] [c000ae78] __ipipe_syscall_root+0x58/0x164 [ 664.071289] [c7b8bf40] [c00104b8] DoSyscall+0x20/0x5c [ 664.133213] --- Exception: c01 at 0xffaca7c [ 664.133222] LR = 0xffaca14 [ 664.221809] Instruction dump: [ 664.258092] 557b07fe 90091b48 813d049c 7f9c4800 419e007c 2f89 3929fe0c 419e0070 [ 664.353104] 2f89 3b80 419e0008 3b89fff0 83bc0450 3be0ff97 801e000c 7f9d0040 [ 664.452834] ---[ end trace 07ae98a3f6576a96 ]--- Message from syslogd@ at Sat Feb 21 08:02:43 1970 ... 
CPUP0 kernel: [ 662.685493] last sysfs file: rt_task_send() failed: -110 (Connection timed out) Killing child The oops is *not* triggered if the timeout is long enough and rt_task_send() returns successfully. I'm using a PowerPC MPC5200-based system: -bash-3.2# cat /proc/ipipe/version 2.12-03 -bash-3.2# cat /proc/xenomai/version 2.5.6 -bash-3.2# uname -a Linux CPUP0 2.6.36.4-3-g1af23a4-dirty #9 Thu Jun 9 11:56:54 CEST 2011 ppc ppc ppc GNU/Linux Any idea what could go wrong? I have attached my little test programs, including a Makefile. Just start it with ./oops_sender. Thanks, Wolfgang. ___ Xenomai-core mailing list Xenomai-core@gna.org
Re: [Xenomai-core] Oops with synchronous message passing support
On Thu, 2011-06-09 at 15:34 +0200, Wolfgang Grandegger wrote: Hi Philippe, On 06/09/2011 03:05 PM, Philippe Gerum wrote: On Thu, 2011-06-09 at 14:42 +0200, Wolfgang Grandegger wrote: Hello, I just realized a problem with synchronous message passing support. When rt_task_send() times out, I get the oops below from line: Does this help?

diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index b822fd0..b0e99a7 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -1988,21 +1988,28 @@ int rt_task_receive(RT_TASK_MCB *mcb_r, RTIME timeout)
 	}

 	/*
-	 * Wait on our receive slot for some client to enqueue itself
-	 * in our send queue.
+	 * We loop to care for spurious wakeups, in case the
+	 * client times out before we unblock.
 	 */
-	info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
-	/*
-	 * XNRMID cannot happen, since well, the current task would be the
-	 * deleted object, so...
-	 */
-	if (info & XNTIMEO) {
-		err = -ETIMEDOUT;	/* Timeout. */
-		goto unlock_and_exit;
-	} else if (info & XNBREAK) {
-		err = -EINTR;	/* Unblocked. */
-		goto unlock_and_exit;
-	}
+	do {
+		/*
+		 * Wait on our receive slot for some client to enqueue
+		 * itself in our send queue.
+		 */
+		info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
+		/*
+		 * XNRMID cannot happen, since well, the current task
+		 * would be the deleted object, so...
+		 */
+		if (info & XNTIMEO) {
+			err = -ETIMEDOUT;	/* Timeout. */
+			goto unlock_and_exit;
+		}
+		if (info & XNBREAK) {
+			err = -EINTR;	/* Unblocked. */
+			goto unlock_and_exit;
+		}
+	} while (!xnsynch_pended_p(&server->mrecv));

 	holder = getheadpq(xnsynch_wait_queue(&server->msendq));
 	/* There must be a valid holder since we waited for it. */

Yes, it does help: -bash-3.2# ./oops_sender pre-rt_task_receive() rt_task_send() failed: -110 (Connection timed out) Killing child No more oops, thanks for your quick help. Ok, thanks for reporting. Patch queued. Wolfgang. -- Philippe.
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. Totally, genuinely, 100% untested.
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 3a53527..0785533 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 			      XN_NO_HANDLE, threadh);

 	if (likely(fastlock == XN_NO_HANDLE)) {
+		xnsynch_set_owner(synch, thread);
 		xnthread_inc_rescnt(thread);
 		xnthread_clear_info(thread,
 				    XNRMID | XNTIMEO | XNBREAK);
@@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)

 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));

-	lastowner = xnpod_current_thread();
+	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 	lastownerh = xnthread_handle(lastowner);

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 18:38 +0200, Gilles Chanteperdrix wrote: On 05/31/2011 06:29 PM, Philippe Gerum wrote: On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. It looks to me like xnsynch_fast_release uses cmpxchg, so it will not set the owner to NULL if the current owner is not the thread releasing the mutex. Is that not sufficient?
Yes, we need to move that swap to the irq off section to clear the owner there as well. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote: On 2011-05-31 18:29, Philippe Gerum wrote: On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. Totally, genuinely, 100% untested.
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 3a53527..0785533 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 			      XN_NO_HANDLE, threadh);

 	if (likely(fastlock == XN_NO_HANDLE)) {
+		xnsynch_set_owner(synch, thread);
 		xnthread_inc_rescnt(thread);
 		xnthread_clear_info(thread,
 				    XNRMID | XNTIMEO | XNBREAK);
@@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)

 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));

-	lastowner = xnpod_current_thread();
+	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 	lastownerh = xnthread_handle(lastowner);

That's maybe another problem, need to check. Back to the original issue: with fastlock, kernel space has absolutely no clue about how many locks user space may hold - unless someone is contending for all those locks. IOW, you can't reliably track resource ownership at kernel level without user-space help. The current way it helps (enforced syscalls of XNOTHER threads) is insufficient. The thing is: we don't care about knowing how many locks some non-current thread owns. What the nucleus wants to know is whether the _current user-space_ thread owns a lock, which is enough for the autorelax management. This restricted scope makes the logic fine. The existing resource counter is by no means a resource-tracking tool that could be used from whatever context to query the number of locks an arbitrary thread holds; it has never been intended that way at all. It only answers the simple question: do I hold any lock, as an XNOTHER thread? Alternatively to plain counting of ownership in user space, we could adopt mainline's robust mutex mechanism (a user-space-maintained list) that solves the release-all-ownerships issue. But I haven't looked into the details yet.
Would be nice, but still overkill for the purpose of autorelax management. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug.
On Thu, 2011-05-19 at 15:37 +0800, arethe.rtai wrote: I solved the problem of the shared memory that could not be allocated in user space. The shared memory is allocated from the kheap, but the kheap is not a mapped heap, i.e. its pages are not reserved. I now init the kheap with xnheap_init_mapped rather than xnheap_init, and the problem is solved. You have just turned the global system heap into a shared heap, which is badly wrong. If anywhere, the issue is in create_new_heap(), or in the _compat_shm_alloc() interface in userland, or a combination of both. As Gilles told you already, such a 100% reproducible allocation/mapping issue cannot be a generic one involving the core heap system, otherwise no skin would ever work. We do depend on the system heap internally for almost everything in the system. It is much more likely a local RTAI skin bug, because this code has bit-rotted over time, due to lack of interest and users. PS: Please keep the list CCed. Qin Chenggang 2011-05-19 __ arethe.rtai __ From: Philippe Gerum Sent: 2011-05-13 14:50:06 To: arethe rtai Cc: Xenomai-core Subject: Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug. On Fri, 2011-05-13 at 09:25 +0800, arethe rtai wrote: Hi all: I always got null when I requested shared memory through the RTAI skin in user space. Some bugs may exist in the sub-system. I traced the execution stream of rt_shm_alloc, and found the mmap() operation always returns -22. I suspect the problem would be the same with other skins, because the mmap operation is implemented in /ksrc/nucleus/heap.c. Has anyone encountered this problem? No. The fact is that the RTAI skin has not been actively maintained for years now, so a local bug there is possible. Due to the lack of users and interest, this skin was removed from the upcoming 2.6.x series. Regards Arethe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. -- Philippe.
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
The NMI latency watchdog is a feature Xenomai supports when proper hardware is available, which triggers a stack backtrace dump, then panics, when a real-time timer tick is late by a given amount of time. We used it in the early times to chase pathological latencies, particularly when debugging the original SMP port. We currently have two architectures supporting that watchdog, namely x86 and blackfin. x86-wise, the rebasing of the NMI support in mainline over the perf sub-system just obsoleted our NMI hijacking badly, making it unusable since 2.6.38. As I was diving into our NMI support code to adapt it once again for 2.6.38 - with a vague feeling of seasickness coming - I felt maybe the time has come to question the very presence of that feature in our code base: - the NMI watchdog predated the latency tracer. AFAIC, I stopped using the former long ago, preferring the latter for debugging latency issues. - the non-maskable nature of the interrupt trigger does not help us nowadays compared to using the I-pipe tracer: the mainline NMI support would catch hard lockups with irqs off and panic the same way, and the tracer would help spot the issue with a much finer level of detail in case the latency spot leaves the machine in a sane state, i.e. when the board remains usable and allows for inspection of /proc/ipipe/trace files. - hijacking the mainline NMI code the way we do has always been a massive pain on x86, prone to triggering conflicts with later kernel releases. For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
On Thu, 2011-05-19 at 20:36 +0200, Jan Kiszka wrote: On 2011-05-19 20:15, Gilles Chanteperdrix wrote: On 05/19/2011 03:58 PM, Philippe Gerum wrote: For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. I am in the same case as you: I no longer use Xeno's NMI watchdog, so I agree to get rid of it. Yeah. The last time we wanted to use it to get more information about a hard hang, the CPU we used was not supported. Philippe, did you already test whether the Linux watchdog generates proper results on artificial Xenomai lockups on a single core? This works provided we tell the pipeline to enter printk-sync mode when the watchdog kicks in. So I'd say that we could probably do a better job making the pipeline core smarter wrt NMI watchdog context handling than asking Xenomai to duplicate the mainline code just to have its own NMI handling. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug.
On Fri, 2011-05-13 at 09:25 +0800, arethe rtai wrote: Hi all: I always get NULL when I request shared memory through the RTAI skin in user space. Some bug may exist in the sub-system. I traced the execution path of rt_shm_alloc() and found that the mmap() operation always returns -22. I suspect the problem would be the same with other skins, because the mmap operation is implemented in ksrc/nucleus/heap.c. Has anyone encountered this problem? No. The fact is that the RTAI skin has not been actively maintained for years now, so a local bug there is possible. Due to the lack of users and interest, this skin was removed from the upcoming 2.6.x series. Regards Arethe. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Integration of new drivers in xenomai
On Tue, 2011-05-10 at 10:42 +0200, julien.dela...@esa.int wrote: Dear all, For my work, I had to develop drivers for the 6052 and 6701 boards from National Instruments. These boards are already supported by the Comedi drivers but were not supported by Xenomai yet. So, I took the code from Comedi and adapted it to Xenomai with the Analogy layer (a4l* functions and so on). This introduces two new drivers: analogy_ni_670x and analogy_ni_660x. At this time, the drivers compile and load correctly. I will have the hardware in ten days to test the code and make sure it works from a functional point of view. In order to contribute to Xenomai, I would like to know whether this code could be integrated into the Xenomai repository, and in particular, what the conditions are for integrating third-party code, especially driver code. Simple and straightforward: - free software license compatible with the linux kernel licensing terms - proper credits and copyrights retained from the original code - the code should solve a problem, instead of introducing one - standard linux kernel coding style Then, if it seems interesting to you, I will submit a patch as soon as I have checked that it works correctly with the physical boards. Generally speaking, any sound contribution is welcome. Technically, Alex has the final cut for Analogy stuff. Thanks for any suggestion, Best regards, -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] SWITCH TASK TO SECONDARY MODE DURING GDB SESSION DOESN'T WORK
On Tue, 2011-05-10 at 18:01 +0200, Roberto Bielli wrote: Hi, I tried the C code below in a gdb session on an ARM target with breakpoints, and I found a strange behaviour. I expected that task 'tsk1' would never execute, but if I put a breakpoint on the instruction 'varInt += 1;' I see that it is executed. I suspect the following is happening: 1. I start gdb with the application. The application is not running. 2. I enable a breakpoint on the instruction 'err = rt_task_start(&tsk1, test_tsk1, NULL);'. 3. I enable a breakpoint on the instruction 'varInt += 1;' in tsk1. 4. I run the application, which stops on the instruction 'err = rt_task_start(&tsk1, test_tsk1, NULL);', so the task is in secondary mode because of the breakpoint hit, if I understand correctly. 5. I single-step over the rt_task_start in main, and I see that I break in the task on the instruction 'varInt += 1;' in tsk1. This is strange, because main has a shadow with priority 99, whereas tsk1 has priority 49. 6. The program executes the instruction 'varInt += 1;' and then returns to main. 7. Then main always has control. The question is: why is the priority not respected? My guess is this: main is in secondary mode because of the breakpoint, and when it calls rt_task_create the new task starts in primary mode for its initial instructions. Then it receives the signal (verified with rt_task_set_mode(0, T_WARNSW, NULL);) and switches to secondary mode, but initially tsk1 is in primary mode and executes before main. Is this a known problem? This is a known restriction imposed on us by the dual kernel design. - gdb means ptrace(), and ptrace() means lots of linux signals - receiving a linux signal in primary mode causes a switch to secondary mode, so that we can handle it safely from a sane linux context - receiving a linux signal in secondary mode prevents the root priority from being boosted (no PIP), so that lengthy kernel code handling lethal signals does not steal the CPU away from lively real-time tasks.
In short, gdb will surely break the expected priority order because it depends on ptrace(), and ptrace() makes heavy use of linux signals. Only explicit synchronization between threads (sems, mutexes, whatever) can still guarantee proper serialization in this context. Best Regards CODE-----

int varInt = 0;
RT_TASK tsk1, tskMain;

void test_tsk1(void *args)
{
	for (;;) {
		varInt += 1;
		rt_task_sleep(1);
	}
}

int main(int argc, char *argv[])
{
	int err;

	mlockall(MCL_CURRENT|MCL_FUTURE);
	rt_task_shadow(&tskMain, "main", 99, 0);
	err = rt_task_create(&tsk1, "tsk1", 0, 49, T_FPU);
	err = rt_task_start(&tsk1, test_tsk1, NULL);
	for (;;) {
		if (varInt >= 1)	/* comparison operator lost in the archive; '>=' restored */
			break;
	}
	printf("Task started\n");
	return 0;
}

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Fri, 2011-04-29 at 18:08 +0200, Jean-Michel Hautbois wrote: 2011/4/29 Philippe Gerum r...@xenomai.org: On Thu, 2011-04-28 at 10:33 +0200, Jean-Michel Hautbois wrote: 2011/4/27 Philippe Gerum r...@xenomai.org: On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. OK, good to know that it is a known issue. If there is a thread with some thoughts about it, I am interested ;). It is happening quite randomly... :). 
Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Well, yes and no. It starts well, but when booting the kernel I get: The mm switch issue was specifically addressed by this patch, which is part of 2.12-01: http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=c14a47630d62d0328de1957636dceb1d498f7048 However, the last 2.6.35 patch issued was based on 2.6.35.7, not 2.6.35.11, so there is still the possibility that something went wrong while you forward-ported this code. - Please check that mmu_context_nohash.c does contain the fix above as it should It is ok, I have the fix. Does 2.6.35.7-2.12-02 exhibit the issue as well? - Please try Richard's suggestion, i.e. moving to 2.6.36, which may give us more hints. It is better. I don't have the badness on mmu context anymore. This gives some hints ;). Yes and no. The mmu management code involved was untouched between 2.6.35 and 2.6.36, so I still don't get why this activity counter gets trashed. Badness at kernel/lockdep.c:2327 NIP: c006e554 LR: c006e53c CTR: 000186a0 Adeos sometimes conflicts with the vanilla IRQ state tracer. I'll have a look at this. Disable CONFIG_TRACE_IRQFLAGS. Yes, but I *want* to have CONFIG_TRACE_IRQFLAGS on. I just wanted to mention that I had the problem, in order to be sure it is known ;). Sure, but one issue at a time. JM -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Thu, 2011-04-28 at 10:33 +0200, Jean-Michel Hautbois wrote: 2011/4/27 Philippe Gerum r...@xenomai.org: On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. OK, good to know that it is a known issue. If there is a thread with some thoughts about it, I am interested ;). It is happening quite randomly... :). Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Well, yes and no. 
It starts well, but when booting the kernel I get: The mm switch issue was specifically addressed by this patch, which is part of 2.12-01: http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=c14a47630d62d0328de1957636dceb1d498f7048 However, the last 2.6.35 patch issued was based on 2.6.35.7, not 2.6.35.11, so there is still the possibility that something went wrong while you forward-ported this code. - Please check that mmu_context_nohash.c does contain the fix above as it should - Please try Richard's suggestion, i.e. moving to 2.6.36, which may give us more hints. Badness at kernel/lockdep.c:2327 NIP: c006e554 LR: c006e53c CTR: 000186a0 Adeos sometimes conflicts with the vanilla IRQ state tracer. I'll have a look at this. Disable CONFIG_TRACE_IRQFLAGS. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. It is happening quite randomly... :). Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Thanks in advance ! JM ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 09:26 +0200, Jesper Christensen wrote: If i run switchtest i get the following output: If still talking about the cpci6200, this patch should apply: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3d6fa118ef282c60dfeb0e690a579e8357bb7d13 [root@slot6 /bin]# switchtest == Testing FPU check routines... r0: 1 != 2 r1: 1 != 2 r2: 1 != 2 r3: 1 != 2 r4: 1 != 2 r5: 1 != 2 r6: 1 != 2 r7: 1 != 2 r8: 1 != 2 r9: 1 != 2 r10: 1 != 2 r11: 1 != 2 r12: 1 != 2 r13: 1 != 2 r14: 1 != 2 r15: 1 != 2 r16: 1 != 2 r17: 1 != 2 r18: 1 != 2 r19: 1 != 2 r20: 1 != 2 r21: 1 != 2 r22: 1 != 2 r23: 1 != 2 r24: 1 != 2 r25: 1 != 2 r26: 1 != 2 r27: 1 != 2 r28: 1 != 2 r29: 1 != 2 r30: 1 != 2 r31: 1 != 2 == FPU check routines: OK. == Threads: sleeper_ufps0-0 rtk0-1 rtk0-2 rtk_fp0-3 rtk_fp0-4 rtk_fp_ufpp0-5 rtk_fp_ufpp0-6 rtup0-7 rtup0-8 rtup_ufpp0-9 rtup_ufpp0-10 rtus0-11 rtus0-12 rtus_ufps0-13 rtus_ufps0-14 rtuo0-15 rtuo0-16 rtuo_ufpp0-17 rtuo_ufpp0-18 rtuo_ufps0-19 rtuo_ufps0-20 rtuo_ufpp_ufps0-21 rtuo_ufpp_ufps0-22 And then it halts. dmesg shows: Xenomai: suspending kernel thread ae819678 ('rtk5/0') at nip=0x80319aa0, lr=0x80319a70, r1=0xafa90510 after exception #1792 switchtest -n runs normally, should i use some sort of soft float flag in my compilations? /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but i can't help wondering if the problems i'm seeing are related to some of my userspace programs using fp. I don't think so. The switchtest programs exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test an inexistent feature. /Jesper On 2011-04-19 09:39, Philippe Gerum wrote: On Tue, 2011-04-19 at 09:26 +0200, Jesper Christensen wrote: If i run switchtest i get the following output: If still talking about the cpci6200, this patch should apply: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3d6fa118ef282c60dfeb0e690a579e8357bb7d13 [root@slot6 /bin]# switchtest == Testing FPU check routines... r0: 1 != 2 r1: 1 != 2 r2: 1 != 2 r3: 1 != 2 r4: 1 != 2 r5: 1 != 2 r6: 1 != 2 r7: 1 != 2 r8: 1 != 2 r9: 1 != 2 r10: 1 != 2 r11: 1 != 2 r12: 1 != 2 r13: 1 != 2 r14: 1 != 2 r15: 1 != 2 r16: 1 != 2 r17: 1 != 2 r18: 1 != 2 r19: 1 != 2 r20: 1 != 2 r21: 1 != 2 r22: 1 != 2 r23: 1 != 2 r24: 1 != 2 r25: 1 != 2 r26: 1 != 2 r27: 1 != 2 r28: 1 != 2 r29: 1 != 2 r30: 1 != 2 r31: 1 != 2 == FPU check routines: OK. == Threads: sleeper_ufps0-0 rtk0-1 rtk0-2 rtk_fp0-3 rtk_fp0-4 rtk_fp_ufpp0-5 rtk_fp_ufpp0-6 rtup0-7 rtup0-8 rtup_ufpp0-9 rtup_ufpp0-10 rtus0-11 rtus0-12 rtus_ufps0-13 rtus_ufps0-14 rtuo0-15 rtuo0-16 rtuo_ufpp0-17 rtuo_ufpp0-18 rtuo_ufps0-19 rtuo_ufps0-20 rtuo_ufpp_ufps0-21 rtuo_ufpp_ufps0-22 And then it halts. dmesg shows: Xenomai: suspending kernel thread ae819678 ('rtk5/0') at nip=0x80319aa0, lr=0x80319a70, r1=0xafa90510 after exception #1792 switchtest -n runs normally, should i use some sort of soft float flag in my compilations? /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. 
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 10:42 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but I can't help wondering if the problems I'm seeing are related to some of my userspace programs using FP. I don't think so. The switchtest program exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test a nonexistent feature. In fact, switchtest checks whether the Xenomai FPU switch routines work when the Linux kernel itself uses the FPU in kernel space. Currently, the only place where this happens is in the RAID code: x86 uses mmx/sse, and some PowerPCs use AltiVec. Some PowerPC kernels also fix up unaligned accesses to floating point data in kernel space; I do not know if this may interfere, which is why the powerpc code is compiled even without RAID. AFAICS, fp_regs_set() on ppc issues a load float instruction in kernel space which could be unaligned, and therefore trap. Looking at the .config for the target system, hw FPU support is disabled in the alignment code, so basically, this would beget a nop. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 11:29 +0200, Philippe Gerum wrote: On Tue, 2011-04-19 at 10:42 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but i can't help wondering if the problems i'm seeing are related to some of my userspace programs using fp. I don't think so. The switchtest programs exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test an inexistent feature. In fact, switchtest whether Xenomai FPU switch routines work when the Linux kernel itself uses FPU in kernel-space. Currently, the only place when this happens is in the RAID code: x86 uses mmx/sse, and some power pcs use altivec. Some powerpc also fix unaligned accesses to floating point data in kernel-space, I do not know if this may interfere, which is why the powerpc code is compiled even without RAID. AFAICS, fp_regs_set() on ppc is issuing a load float instruction in kernel space which could be unaligned, and therefore trap. Looking at the .config for the target system, hw FPU support is disabled in the alignment code, so basically, this would beget a nop. A nop in fixing the issue, I mean. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash - possible race condition?
On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote: Actually I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the whole time You mean enabled? and I also raised the stack size from 4k to 8k. I do however think there could be some fishiness in entry_32.S. In transfer_to_handler, SPRN_SPRG3 is used to check for stack overflow (at least in my kernel, 2.6.29.6), but I must admit I haven't seen any of that in the kernel log. Mmm, you are right. In any case, what we want with the unmasked switch feature is to allow interrupts while we flush the tlb and set the new mm context, which may be lengthy on some low-end platforms. Allowing the switch code to be preempted during the register swap is of no use wrt latency. Do you have a patch at hand which you could post that flips MSR_EE in rthal_thread_switch already? /Jesper On 2011-04-14 15:31, Philippe Gerum wrote: On Thu, 2011-04-14 at 15:04 +0200, Jesper Christensen wrote: I wrote about some problems concerning stack corruption when running xenomai on ppc. I have found out that if I disable hardware interrupts while running rthal_thread_switch the problem seems to disappear somewhat. I saw a crash yesterday after running for 3 hours, and I'm currently running a test (it has been running for 3 hours). Usually it would fail after 30-40 minutes. My question is: could there be a problem if we receive an interrupt between updating the stack pointer and the sprg3 register with the new thread pointer? Normally, there should not be any issue (famous last words), since we would run Xenomai-only code over the preempted context, and we don't depend on SPRG3 to fetch the current phys address. In fact, at this stage we simply don't care about the linux context, only referring to the current Xenomai thread, which is obtained differently. Try switching off CONFIG_XENO_HW_UNLOCKED_SWITCH in the machine config area; if this ends up being rock-solid, then this would be a hint that something may be fishy in this area.
Raising your k-thread stack sizes in a separate test may be interesting to check too, if not already done. /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:18 +0200, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. It's a program check exception, not a machine check, but the rest remains applicable. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. 
The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:20 +0200, Jesper Christensen wrote: Problem is the NIP in question is the address of the thread structure as seen in the error message. LR? /Jesper On 2011-04-11 16:18, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. 
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:20 +0200, Jesper Christensen wrote: Problem is the NIP in question is the address of the thread structure as seen in the error message. Is your code spawning -rt kernel threads frequently/periodically, or only when the application initializes? /Jesper On 2011-04-11 16:18, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. 
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi, I'm trying to implement some gateway functionality in the kernel on an Emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that, when loaded, the threads get suspended due to exceptions:

Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792
or
Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025
or
Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792

I have ported the gianfar driver from Linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but I would like to ask if there are any known issues with the versions or hardware I'm using. I would also like to ask if there are any ways of further debugging the errors, as I am not getting very far with the above messages. A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads:

http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47

You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6.

System info:
Linux kernel: 2.6.29.6
i-pipe version: 2.7-04
processor: powerpc mpc8572
xenomai version: 2.5.3
rtnet version: 0.9.12

-- Philippe.
Re: [Xenomai-core] kernel threads crash
On Fri, 2011-04-08 at 15:20 +0200, Jesper Christensen wrote: Thanks, I'll give 2.5.6 a shot. Also it has come to my attention that there are some source files (arch/powerpc/platforms/85xx/cpci6200.c, arch/powerpc/platforms/85xx/cpci6200.h, arch/powerpc/platforms/85xx/cpci6200_timer.c) that are probably not covered by the adeos patch. Am I correct in assuming these need some work to support i-pipe? I can't tell since I have no access to them; this is probably not a mainline port. In any case, if any of those files implements the support for the programmable interrupt controller, hw timer, gpios and/or any form of cascaded interrupt handling, this is correct: they should be made I-pipe aware. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi, I'm trying to implement some gateway functionality in the kernel on an Emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that, when loaded, the threads get suspended due to exceptions:

Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792
or
Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025
or
Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792

I have ported the gianfar driver from Linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but I would like to ask if there are any known issues with the versions or hardware I'm using. I would also like to ask if there are any ways of further debugging the errors, as I am not getting very far with the above messages.
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Would Xenomai adopt RTL after the patent is expired?
On Wed, 2011-04-06 at 09:19 +0800, arethe.rtai wrote: HI: The patents that cover RT-Linux are set to expire in a few years, then, would Xenomai adopt the RTL technology? As known, the RTL idea is clean and minimalistic, it may improve the determinism of Xenomai. The trend is rather to blur the distinction between native real-time and dual kernel approaches these days, not to downgrade to a kernel-only interrupt handler with co-routines on top. So no, there would be no rational reason to do that, not to mention the fact that if we can assess the typical latency of Xenomai over the seven architectures it runs on with the latest mainline kernels, we would be unable to compare this to anything else than x86 over a legacy kernel AFAIK. And no, I don't think that I'm going to send an inquiry to WRS for information regarding how RTL performs on other architectures. Incidentally, maybe you should ask yourself why they ship WR-Linux with PREEMPT_RT. Regards arethe 2011-04-06 __ arethe.rtai ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Backfire: User - Kernel latancy mesurement tool on Xenomai
On Wed, 2011-04-06 at 19:31 +0100, krishna murthy j s wrote: Thanks for the reply. Can you please tell me why my questions are useless? Such an attitude will take the Xenomai user group nowhere. It is often better to go nowhere than to go to the wrong place. Anyway, some reasons for the flak you received could be:

- no specification of your target hardware. I understand there is a long-standing trend to throw results into the latency debate without a single bit of information regarding the hardware configuration under test, the actual code being used (and not a vague description of what it eventually does), how it has been changed and how it has been used, but well, we are old-fashioned folks: we do prefer facts. Besides, we do think there is life beyond x86, so it is always better to be specific in this area when sending us inquiries.

- backfire comes from the PREEMPT_RT test suite. As such, it does not care about any dual kernel issues. We do, when writing an application. So, unless you also wrote an RTDM driver to replace the original backfire driver, what you are testing is actually plain vanilla Linux, with the additional overhead of moving your task back and forth between the Xenomai scheduler and the Linux scheduler at a high rate. If so, no wonder you get some extra latency with Xenomai. It's a bit like driving on a racetrack paved with speed bumps.

- measuring the latency of Linux signal delivery like backfire does is interestingly totally off-base wrt Xenomai, because Linux signals are delivered to Xenomai tasks ... in Linux mode (yes, we have runtime modes like dual kernel systems may have). So, no real-time here either. Since Xenomai does not implement signal delivery in real-time mode yet, what you are testing still remains a mystery. But maybe you could explain better?

To sum up, each RT enabler comes with a test suite which has been written carefully to illustrate a particular behavior or performance aspect, and Xenomai follows this common rule.
Before issuing any claims, maybe you could have posted your code, a detailed description of your setup, and your test scenario. Asking people to reverse-engineer what you might have done, based on a couple of loose details placed side-by-side with strong claims and conclusions, is not the best way to draw attention. So, don't take what was said earlier personally. It is just that sometimes, people may have tuned their bullshit deflector a bit eagerly. Mine is totally busted btw, so you never know. On Wed, Apr 6, 2011 at 6:56 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote: krishna m wrote: I ported the backfire tool on the OSADL site [https://www.osadl.org/backfire-4.backfire.0.html] to measure the user to/from kernel latency. I wanted to measure the difference between the RT_PREEMPT kernel and the Xenomai kernel. Surprisingly, I see RT_PREEMPT performing better than Xenomai. Here are a few points to note: 1. The thread priority of the sendme tool of backfire in RT_PREEMPT is 99 [highest] 2. I have made the thread priority 99 for the rt_task that I spawn [part of the ported sendme]: ret = rt_task_shadow(rt_task_desc, NULL, 99, 0); My questions: * I wanted to know if anyone has done such measurements using backfire, and how does Xenomai fare against RT_PREEMPT? * Is there any similar tool like backfire in the Xenomai tool set that does similar measurements? * Do I need to do more Xenomai-specific optimization in the sendme and backfire code to get better performance? Useless notes, useless questions. Show us the ported code. -- Gilles. -- Philippe.
Re: [Xenomai-core] Bug in Linux kernel 2.6.37 blocks xenomai threads
On Mon, 2011-04-04 at 16:41 +0200, Sebastian Smolorz wrote: Hi, there is a bug in kernel 2.6.37 (fixed in 2.6.37.1, see commit 1cdc65e1400d863f28af868ee1e645485b04f5ed) which blocks RT threads during creation. They stick to a certain CPU core for a certain amount of time (sometimes minutes ...) before they are migrated to the proper core and run as expected. Philippe, Gilles, maybe you could generate a new i-pipe patch based on the newest 2.6.37-series kernel. I patched a 2.6.37.6 kernel with adeos- ipipe-2.6.37-x86-2.9-00.patch and the problem was gone. Ok, I'll handle this. I have some patches from Jan which have been pending for too long in my tree to add to this one. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Bug in Linux kernel 2.6.37 blocks xenomai threads
On Mon, 2011-04-04 at 16:56 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Mon, 2011-04-04 at 16:41 +0200, Sebastian Smolorz wrote: Hi, there is a bug in kernel 2.6.37 (fixed in 2.6.37.1, see commit 1cdc65e1400d863f28af868ee1e645485b04f5ed) which blocks RT threads during creation. They stick to a certain CPU core for a certain amount of time (sometimes minutes ...) before they are migrated to the proper core and run as expected. Philippe, Gilles, maybe you could generate a new i-pipe patch based on the newest 2.6.37-series kernel. I patched a 2.6.37.6 kernel with adeos- ipipe-2.6.37-x86-2.9-00.patch and the problem was gone. Ok, I'll handle this. I have some patches from Jan which have been pending for too long in my tree to add to this one. Since you are at it, could you have a look at: https://mail.gna.org/public/adeos-main/2011-04/msg1.html Ok, queued. Thanks. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Whether can Xenomai bypass the cache?
On Sun, 2011-03-06 at 14:05 +0800, arethe rtai wrote: Hello: As known, cache can accelerate the memory access, but unfortunately, it would decrease the predictability of real-time tasks' temporal behaviour. Many tasks of our application prefer predictability to the speed of execution. Intel's processors after P6 include MTRR and PAT, both the two units can be used to bypass the cache. I wonder whether Xenomai can bypass the cache, and whether Xenomai can manage the MTRR or PAT? No. If the answers are true, how to use the function? 3x arethe ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Is anybody using the pSOS skin in userland?
On Wed, 2010-11-03 at 22:42 +0100, ronny meeus wrote: Hello, we are investigating the usage of the pSOS+ skin to port a large legacy pSOS application to Linux. The application model consists of several processes in which the application lives. All processes will make use of the pSOS library. After playing around with the library for some time we have observed several missing service calls, bugs and differences in behaviour compared to a real pSOS implementation:

- missing sm_ident
- missing t_getreg / t_setreg in userland (patch already included in 2.5.5)
- not possible to use the skin from the context of different processes (patch already included in 2.5.5)
- added support for identical task/queue/semaphore/region names by making names unique.
- strange behaviour in the pSOS message queue (see post Possible memory leak in psos skin message queue handling).

I can (and will) deliver patches for all issues I have found, but I'm wondering whether there are other people using the pSOS skin (in userland) in a real-life application. The target for my project would be an embedded system with strong reliability requirements (very stable / long running etc). Any feedback is welcome and appreciated. It is not clear to me either which tests are executed before a new version is released. T-e-s-t? What's this? We are proud to deliver the greatest uncertainty, where the deepest fears about upgrading may turn into the highest hopes. And vice-versa. Is there any test-suite available for the pSOS skin? This is a good start, used to validate the Xenomai SOLO implementation:

http://git.denx.de/?p=xenomai-solo.git;a=tree;f=psos/testsuite;h=54411570e19dec40e14a1226084024c05c0f3e53;hb=ee9c11895ac7cf2d72b1158a4836a4465f478a0b

This needs to be slightly adapted to run over the current Xenomai 2.x architecture, but the test logic of course is the same. Best regards, Ronny -- Philippe.
Re: [Xenomai-core] Is anybody using the pSOS skin in userland?
On Wed, 2010-11-03 at 22:42 +0100, ronny meeus wrote: Hello, we are investigating the usage of the pSOS+ skin to port a large legacy pSOS application to Linux. The application model consists of several processes in which the application lives. All processes will make use of the pSOS library. After playing around with the library for some time we have observed several missing service calls, bugs and differences in behaviour compared to a real pSOS implementation:

- missing sm_ident
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=26e916ecc3f8b71cd8ce4c4194555ee0cc4aa018
- missing t_getreg / t_setreg in userland (patch already included in 2.5.5)
- not possible to use the skin from the context of different processes (patch already included in 2.5.5)
- added support for identical task/queue/semaphore/region names by making names unique.
- strange behaviour in the pSOS message queue (see post Possible memory leak in psos skin message queue handling).

I can (and will) deliver patches for all issues I have found, but I'm wondering whether there are other people using the pSOS skin (in userland) in a real-life application. The target for my project would be an embedded system with strong reliability requirements (very stable / long running etc). Any feedback is welcome and appreciated. It is not clear to me either which tests are executed before a new version is released. Is there any test-suite available for the pSOS skin? Best regards, Ronny -- Philippe.
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 02:00 +0100, Jan Kiszka wrote: Am 06.11.2010 23:49, Philippe Gerum wrote: On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote: At first sight, here you are more breaking things than cleaning them. Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes). My version was indeed still buggy, I'm reworking it ATM. If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)-status, XNRESCHED); xnpod_schedule_handler: +xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in laymans language, that I'll be able to understand)? That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule reliably (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints. The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit. But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs). Yes, but it happens that we modify the nucleus from time to time. Getting it to work is relatively simple: we add a scheduler change set remotely bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a but if this new bit is not set. 
All this is compiled out if the debug is not enabled. I still see no benefit in this check. Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today? Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled. But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future. Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug. You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of. The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;

 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
 	xnsched_t *current_sched = xnpod_current_sched();	\
 	__setbits(current_sched->status, XNRESCHED);		\
 	if (current_sched != (__sched__)) {			\
+		if (XENO_DEBUG(NUCLEUS))			\
+			__sched__->debug_resched_from_remote = 1; \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 	}							\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 09:31 +0100, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Anyway, after some thoughts, I think we are going to try and make the current situation work instead of going back to the old way. You can find the patch which attempts to do so here: http://sisyphus.hd.free.fr/~gilles/sched_status.txt Ack. At last, this addresses the real issues without asking for regression funkiness: fix the lack of barrier before testing XNSCHED in the xnpod_schedule pre-test, and stop sched->status trashing due to XNINIRQ/XNHTICK/XNRPICK ops done un-synced on nklock. In short, this patch looks like moving the local-only flags where they belong, i.e. anywhere you want but *outside* of the status with remotely accessed bits. Check the kernel, we actually need it on both sides. Wherever the final barriers will be, we should leave a comment behind why they are there. Could be picked up from kernel/smp.c. We have it on both sides: the non-local flags are modified while holding the nklock. Unlocking the nklock implies a barrier. I think we may have an issue with this kind of construct:

xnlock_get_irq*(nklock)
    xnpod_resume/suspend/whatever_thread()
        xnlock_get_irq*(nklock)
        ...
        xnlock_put_irq*(nklock)
    xnpod_schedule()
        xnlock_get_irq*(nklock)
        send_ipi = xnpod_schedule_handler on dest CPU
        xnlock_put_irq*(nklock)
xnlock_put_irq*(nklock)

The issue would be triggered by the use of recursive locking. In that case, the source CPU would only sync its cache when the lock is actually dropped by the outer xnlock_put_irq* call and the inner xnlock_get/put_irq* would not act as barriers, so the remote rescheduling handler won't always see the XNSCHED update done remotely, and may lead to a no-op. So we need a barrier before sending the IPI in __xnpod_test_resched(). This could not happen if all schedule state changes were clearly isolated from rescheduling calls in different critical sections, but it's sometimes not an option not to group them for consistency reasons.
XNRPICK seems to be handled differently, but it makes sense to group it with other RPI data as you did, so fine with me. I just hope we finally converge on a solution. Looks like all possibilities have been explored now. A few more comments on this one: It probably makes sense to group the status bits accordingly (both their values and definitions) and briefly document on which status field they are supposed to be applied. Ok, but I wanted them to not use the same values, so that we can use the sched->status | sched->lstatus trick in xnpod_schedule. Something is lacking too: we probably need to use sched->status | sched->lstatus for display in /proc. I do not understand the split logic - or some bits are simply not yet migrated. XNHDEFER, XNSWLOCK, XNKCOUT are all local-only as well, no? Then better put them in the _local_ status field, that's more consistent (and would help if we once wanted to optimize their cache line usage). Maybe the naming is not good then. ->status is everything which is modified under nklock, ->lstatus is for XNINIRQ and XNHTICK, which are modified without holding the nklock. The naming is unfortunate: status vs. lstatus. This is asking for confusion and typos. They must be better distinguishable, e.g. local_status. Or we need accessors that have debug checks built in, catching wrong bits for their target fields. I agree. Good catch of the RPI breakage, Gilles! -- Philippe.
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 11:14 +0100, Jan Kiszka wrote: Am 07.11.2010 11:12, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 07.11.2010 11:03, Philippe Gerum wrote: On Sun, 2010-11-07 at 09:31 +0100, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Anyway, after some thoughts, I think we are going to try and make the current situation work instead of going back to the old way. You can find the patch which attempts to do so here: http://sisyphus.hd.free.fr/~gilles/sched_status.txt Ack. At last, this addresses the real issues without asking for regression funkiness: fix the lack of barrier before testing XNSCHED in Check the kernel, we actually need it on both sides. Wherever the final barriers will be, we should leave a comment behind why they are there. Could be picked up from kernel/smp.c. We have it on both sides: the non-local flags are modified while holding the nklock. Unlocking the nklock implies a barrier. I think we may have an issue with this kind of construct: xnlock_get_irq*(nklock) xnpod_resume/suspend/whatever_thread() xnlock_get_irq*(nklock) ... xnlock_put_irq*(nklock) xnpod_schedule() xnlock_get_irq*(nklock) send_ipi = xnpod_schedule_handler on dest CPU xnlock_put_irq*(nklock) xnlock_put_irq*(nklock) The issue would be triggered by the use of recursive locking. In that case, the source CPU would only sync its cache when the lock is actually dropped by the outer xnlock_put_irq* call and the inner xnlock_get/put_irq* would not act as barriers, so the remote rescheduling handler won't always see the XNSCHED update done remotely, and may lead to a no-op. So we need a barrier before sending the IPI in __xnpod_test_resched(). That's what I said. And we need it on the reader side as an rmb(). This one we have, in xnpod_schedule_handler. Right, with your patch (the above sounded like we only need it on writer side). C'mon... -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote: At first sight, here you are more breaking things than cleaning them. Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes). My version was indeed still buggy, I'm reworking it ATM. If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)-status, XNRESCHED); xnpod_schedule_handler: + xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in laymans language, that I'll be able to understand)? That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule reliably (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints. The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit. But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs). Yes, but it happens that we modify the nucleus from time to time. Getting it to work is relatively simple: we add a scheduler change set remotely bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a but if this new bit is not set. All this is compiled out if the debug is not enabled. I still see no benefit in this check. 
Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today? Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled. But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future. Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug. You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of. The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;

 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
 	xnsched_t *current_sched = xnpod_current_sched();	\
 	__setbits(current_sched->status, XNRESCHED);		\
 	if (current_sched != (__sched__)) {			\
+		if (XENO_DEBUG(NUCLEUS))			\
+			__sched__->debug_resched_from_remote = 1; \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 	}							\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 		xnarch_cpus_clear(sched->resched);
 	}
 #endif
+	if
Re: [Xenomai-core] Potential problem with rt_eepro100
On Wed, 2010-11-03 at 20:38 +0100, Anders Blomdell wrote: Jan Kiszka wrote: Am 03.11.2010 17:46, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote: Jan Kiszka wrote: additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 	if (current_sched != (__sched__)) {			\
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 		setbits((__sched__)->status, XNRESCHED);	\
+		xnarch_memory_barrier();			\
 	}							\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning. It still breaks (in approximately the same way). I'm currently putting a barrier in the other macro doing a RESCHED, also adding some tracing to see if a read barrier is needed. Nope, no luck there either. Will start interesting tracepoint adding/conversion :-( Strange. But it was too easy anyway... Any reason why xn_nucleus_sched_remote should ever report status = 0? Really don't know yet. You could trigger on this state and call ftrace_stop() then. Provided you had the function tracer enabled, that should give a nice picture of what happened before. Isn't there a race between these two (still waiting for compilation to be finished)? We always hold the nklock in both contexts.

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);
#ifdef CONFIG_SMP
	/* Send resched IPI to remote CPU(s). */
	if (unlikely(xnsched_resched_p(sched))) {
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
	}
#endif
	clrbits(sched->status, XNRESCHED);
	return resched;
}

#define xnsched_set_resched(__sched__) do {				\
	xnsched_t *current_sched = xnpod_current_sched();		\
	setbits(current_sched->status, XNRESCHED);			\
	if (current_sched != (__sched__)) {				\
		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
		setbits((__sched__)->status, XNRESCHED);		\
		xnarch_memory_barrier();				\
	}								\
} while (0)

I would suggest (if I have got all the macros right):

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);

	if (unlikely(resched)) {
#ifdef CONFIG_SMP
		/* Send resched IPI to remote CPU(s). */
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
#endif
		clrbits(sched->status, XNRESCHED);
	}
	return resched;
}

/Anders -- Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:

Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:

Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 14:09 +0200, Jan Kiszka wrote:

Am 29.10.2010 14:00, Philippe Gerum wrote:
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:
Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

The propagation triggers the delivery of this IRQ to the Linux domain, thus at some point there will be Linux accessing the descriptor while there might be xnintr_irq_enable/disable running on some other CPU (or it was preempted at the wrong point on the very same CPU).

The point is that XN_ISR_PROPAGATE, as a means to force sharing of an IRQ between both domains, is plain wrong. Remove this, and no conflict remains; this is what needs to be addressed. The potential issue between xnintr_enable/disable and the hal routines does not exist if those callers handle locking properly.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Do we need to keep the status in sync with the hardware state for the case Linux may take over the descriptor again? Or will Linux test the state when processing a forwarded IRQ? These are the two potential scenarios that come to my mind. The former could be deferred, but the latter would be critical again.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 14:46 +0200, Philippe Gerum wrote:

On Fri, 2010-10-29 at 14:09 +0200, Jan Kiszka wrote:
Am 29.10.2010 14:00, Philippe Gerum wrote:
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:
Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

The propagation triggers the delivery of this IRQ to the Linux domain, thus at some point there will be Linux accessing the descriptor while there might be xnintr_irq_enable/disable running on some other CPU (or it was preempted at the wrong point on the very same CPU).

The point is that XN_ISR_PROPAGATE, as a means to force sharing of an IRQ between both domains, is plain wrong. Remove this, and no conflict remains; this is what needs to be addressed. The potential issue between xnintr_enable/disable and the hal routines does not exist if those callers handle locking properly.

In any case, I don't think we could accept that sharing, so flipping the bits in the hal is in fact pointless. To match the linux locking, we should hold the irq_desc::lock, which we won't, since this would cause massive jitter. We should stick to the basic logic: no sharing, therefore no need to track the irqflags. I'll kill XN_ISR_PROPAGATE in forge at some point, for sure.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Do we need to keep the status in sync with the hardware state for the case Linux may take over the descriptor again? Or will Linux test the state when processing a forwarded IRQ? These are the two potential scenarios that come to my mind. The former could be deferred, but the latter would be critical again.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

--
Philippe.
Re: [Xenomai-core] hanging in Xenomai 2.5.5
On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:

Hi everybody, here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. Up to version 2.5.4, this worked fine. With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required. The problem happens on BOTH our i386 machine (Dell 8-core, ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the xenomai release 2.5.5 and higher. No dmesg print-outs when this error occurs. We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched

when the threads hang would help. Additionally, please clone the -stable repo from there:

git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.

Best wishes,
-Stefan

--
Philippe.
Re: [Xenomai-core] [forge] irqbench removal
On Sat, 2010-10-09 at 15:23 +0200, Jan Kiszka wrote:

Philippe, irqbench does not inherently depend on a third I-pipe domain. It is a useful testcase, the only one in our portfolio that targets a peripheral device use case. In fact, it was one of the first test cases for Native RTDM IIRC. Please revert the removal and then cut out only the few parts that actually instantiate an additional domain (i.e. mode 3).

So, what do we do with this? Any chance we move to arch-neutral code for this test?

Thanks, Jan

--
Philippe.
[Xenomai-core] Xenomai forge
We need a playground for experimenting with the 3.x architecture. I have set up a GIT tree for this purpose, which currently contains legacy removal and preliminary cleanup work I've been doing lazily during the past months, periodically rebasing on -head.

This tree is there for Xenomai hackers to work on radical changes toward Xenomai 3.x; this is NOT for production use. It is expected to be in a severe state of flux for several months from now on, until the updates on the infrastructure calm down. The plan is to work on this tree until it makes sense to turn it into the official xenomai-3.0 tree eventually.

Some CPU architectures currently supported in Xenomai 2.5.x may not be supported in this tree yet, until the dust settles at some point (we do plan to support all of them eventually, though). The bottom line is to have powerpc (32/64), arm and x86 (32/64) available early; blackfin may be there early too, since their reference kernel tracks mainline closely as well. So this may leave us with nios2 lagging behind for a while.

The same goes for RTOS emulators such as VxWorks, pSOS and friends. They have to be rebased on a new emulation core fully running in user-space, which we experimented with in Xenomai/SOLO, so their legacy 2.x incarnations have been removed from the tree. This tree only features the POSIX, native and RTDM skins for now.

The 3.x roadmap was published many moons ago on our web site [1], so I won't rehash the final goals for this architecture. However, the major development milestones can be outlined here:

* legacy support removal (mainly: kernel 2.4 support and in-kernel skin APIs are being phased out, except the RTDM driver development API).

* introduction of a new RTOS emulation core, which can run on top of the POSIX skin, or over the regular nptl.

* port of the existing Xenomai/SOLO emulators (VxWorks, pSOS) over the new core. At some point, we shall decide whether it still makes sense to provide VRTX and uITRON emulators on this new core, given the lack of useful feedback we got for those for the past eight years. It seems that nobody actually cares for them.

* integration of the missing bits to fully support our current dual kernel software stack over -rt kernels as well (i.e. no I-pipe), typically RTDM native.

For sure, all these tasks will entail various cleanup, streamlining, and sanitization activities all over the place, over time.

The forge can be found at: git://git.xenomai.org/xenomai-forge.git

Ok, just go wild now.

[1] http://www.xenomai.org/index.php/Xenomai:Roadmap#Toward_Xenomai_3

--
Philippe.
Re: [Xenomai-core] Overcoming the foreign stack
On Wed, 2010-10-06 at 11:20 +0200, Jan Kiszka wrote:

Am 05.10.2010 16:21, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:50, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:42, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:15, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:

Hi, quite a few limitations and complications of using Linux services over non-Linux domains relate to potentially invalid current and thread_info. The non-Linux domains could maintain their own kernel stacks while Linux tends to derive current and thread_info from the stack pointer. This is not an issue anymore on x86-64 (both states are stored in per-cpu variables) but other archs (e.g. x86-32 or ARM) still use the stack and may continue to do so.

I just looked into this thing again as I'm evaluating ways to exploit the kernel's tracing framework also under Xenomai. Unfortunately, it does a lot of fiddling with preempt_count and need_resched, so patching it for Xenomai use would become a maintenance nightmare. An alternative, also for other use cases like kgdb and probably perf, is to get rid of our dependency on home-grown stacks.

I think we are on that way already as in-kernel skins have been deprecated. The only remaining user after them will be RTDM driver tasks.

But I think those could simply become in-kernel shadows of kthreads, which would bind their stacks to what Linux provides. Moreover, Xenomai could start updating current and thread_info on context switches (unless this already happens implicitly). That would give us proper contexts for system-level tracing and profiling.

My key question is currently if and how much of this could be realized in 2.6. Could we drop in-kernel skins in that version? If not, what about disabling them by default, converting RTDM tasks to a kthread-based approach, and enabling tracing etc. only in that case? However, this might be a bit fragile unless we can establish compile-time or run-time requirements negotiation between Adeos and its users (Xenomai) about the stack model.

A stupid question: why not make things the other way around: patch the current and current_thread_info functions to be made I-pipe aware, and use an ipipe_current pointer to the current thread's task_struct. Of course, there are places where the current or current_thread_info macros are implemented in assembly, so it may not be as simple as it sounds, but it would allow us to keep 128 Kb stacks if we want. This also means that we would have to put a task_struct at the bottom of every Xenomai task.

First of all, overhead vs. maintenance. Either every access to preempt_count() would require a check for the current domain and its foreign stack flag, or I would have to patch dozens (if that is enough) of code sites in the tracer framework.

No. I mean we would dereference a pointer named ipipe_current. That is all, no other check. This pointer would be maintained elsewhere. And we modify the current macro, like:

#ifdef CONFIG_IPIPE
extern struct task_struct *ipipe_current;
#define current ipipe_current
#endif

Any call site gets modified automatically. Or current_thread_info, if it is current_thread_info which is obtained using the stack pointer mask trick.

The stack pointer mask trick only works with fixed-sized stacks, not a guaranteed property of in-kernel Xenomai threads.

Precisely the reason why I propose to replace it with a global variable reference, or a per-cpu variable for SMP systems.

Then why is Linux not using this in favor of the stack pointer approach on, say, ARM? For sure, we can patch all Adeos-supported archs away from stack-based to per-cpu current thread_info, but I don't feel comfortable with this in some way invasive approach as well. Well, maybe it's just my personal misperception.

It is as invasive as modifying local_irq_save/local_irq_restore.
The real question about the global pointer approach is: if it is so much less efficient, how does Xenomai, which uses this scheme, manage to have good performance on ARM?

Xenomai has no heavily-used preempt_disable/enable that is built on top of thread_info. But I also have no numbers on this.

I looked closer at the kernel dependencies on a fixed stack size. Besides current and thread_info, further features that make use of this are stack unwinding (boundary checks) and overflow checking. So while we can work around the dependency for some tracing requirements, I really see no point in heading for this long-term. It just creates more subtle patching needs in Adeos, and it also requires work on the Xenomai side. I really think it's better to provide a compatible context to reduce maintenance efforts.

So I played a bit with converting RTDM tasks to in-kernel shadows. It works but needs more fine-tuning. My proposal for
Re: [Xenomai-core] [Adeos-main] enable_kernel_fp broken with IPIPE on PowerPC
On Fri, 2010-10-01 at 19:51 +, Steve Deiters wrote:

I'm getting a thread crash where an unaligned floating point access occurs. I tracked the cause down to enable_kernel_fp within the fix_alignment routine. The enable_kernel_fp routine is as follows:

void enable_kernel_fp(void)
{
	unsigned long flags;

	WARN_ON(preemptible());

	local_irq_save_hw_cond(flags);
#ifdef CONFIG_SMP
	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
		giveup_fpu(current);
	else
		giveup_fpu(NULL);	/* just enables FP for kernel */
#else
	giveup_fpu(last_task_used_math);
#endif /* CONFIG_SMP */
	local_irq_restore_hw_cond(flags);
}

The local_irq_save_hw_cond saves the old MSR value in flags. When this value is restored with local_irq_restore_hw_cond, it loses the MSR[FP] bit that was set in giveup_fpu. If the MSR[FP] was not previously set before it saved the flags, I get an FPU exception a bit later in the alignment handling. As a quick fix I changed the restore line to:

local_irq_restore_hw_cond(flags | MSR_FP);

I'm not sure this is a correct fix. I don't know where else there might be code that is modifying the MSR in a similar fashion. It seems any such case would be broken. I'm using the ipipe version 2.10-03 patch that was bundled with Xenomai 2.5.4 on Linux 2.6.33.5. I noticed that this is still the same in the 2.11-00 ipipe patch.

Actually, giveup_fpu already handles the interrupt state properly, so the protection code in enable_kernel_fp is buggy and useless as well. I did not see any other spot where calling assembly code which may touch the MSR would conflict with interrupt protection in the caller. Could you try this patch instead?

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e4eaca4..3743b27 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -98,12 +98,8 @@ void flush_fp_to_thread(struct task_struct *tsk)
 void enable_kernel_fp(void)
 {
-	unsigned long flags;
-
 	WARN_ON(preemptible());
-	local_irq_save_hw_cond(flags);
-
 #ifdef CONFIG_SMP
 	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
 		giveup_fpu(current);
@@ -112,7 +108,6 @@ void enable_kernel_fp(void)
 #else
 	giveup_fpu(last_task_used_math);
 #endif /* CONFIG_SMP */
-	local_irq_restore_hw_cond(flags);
 }
 EXPORT_SYMBOL(enable_kernel_fp);

___ Adeos-main mailing list adeos-m...@gna.org https://mail.gna.org/listinfo/adeos-main

--
Philippe.
Re: [Xenomai-core] RFC: /proc/xenomai/latency change
On Sat, 2010-09-25 at 19:27 +0200, Gilles Chanteperdrix wrote:

Gilles Chanteperdrix wrote:
Hi, I have been working on omap3 performance, and during this, I noticed one flaw in /proc/xenomai/latency: it displays the whole timer subsystem anticipation whereas it should probably only allow setting the scheduler latency. The reason is that when issuing the customary:

echo 0 > /proc/xenomai/latency

we were in fact also disabling any account of the timer programming latency. This is probably almost invisible on systems with low timer programming latencies, but this turned out to account for around 5us error on timer programming on omap. Now, the timer programming latency is back to a more reasonable 1us on omap, but I still think we should change this. However, since it may break some users' settings, I wonder if we should apply it now or only in the 2.6 branch. Here is the patch I am talking about:

Better:

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 7db0ccf..2297b74 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -3164,7 +3164,7 @@ static int latency_read_proc(char *page,
 	int len;

-	len = sprintf(page, "%Lu\n", xnarch_tsc_to_ns(nklatency));
+	len = sprintf(page, "%Lu\n", xnarch_tsc_to_ns(nklatency - nktimerlat));
 	len -= off;
 	if (len <= off + count)
 		*eof = 1;
@@ -3196,7 +3196,7 @@ static int latency_write_proc(struct file *file,
 	if ((*end != '\0' && !isspace(*end)) || ns < 0)
 		return -EINVAL;

-	nklatency = xnarch_ns_to_tsc(ns);
+	nklatency = xnarch_ns_to_tsc(ns) + nktimerlat;

 	return count;
 }

Fine with me. The nucleus should always know better regarding the timer setup latency, so leaving it untouched by the /proc knob makes sense.

--
Philippe.
Re: [Xenomai-core] RFC: /proc/xenomai/latency change
On Mon, 2010-09-27 at 14:37 +0200, Gilles Chanteperdrix wrote:

Philippe Gerum wrote:
Fine with me. The nucleus should always know better regarding the timer setup latency, so leaving it untouched by the /proc knob makes sense.

Ok. My concern was about user settings, but guaranteeing an ABI never meant we had to maintain the latency over Xenomai revisions, that was kind of silly.

It is even recommended to make it shorter over time.

--
Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Wed, 2010-09-01 at 10:39 +0200, Gilles Chanteperdrix wrote:

Philippe Gerum wrote:
On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
Philippe Gerum wrote:
Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbo. Nah, just kidding, your shiny SMP box should be bricked even before that:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index f75c6f6..6ad66ba 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 #define xnsched_set_resched(__sched__) do {				\
 	xnsched_t *current_sched = xnpod_current_sched();		\
 	xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

To increase the probability of regressions: What about moving the above line...

-	if (unlikely(current_sched != (__sched__)))			\
-		xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
 	setbits(current_sched->status, XNRESCHED);			\
-	/* remote will set XNRESCHED locally in the IPI handler */	\
+	if (current_sched != (__sched__))				\
+		setbits((__sched__)->status, XNRESCHED);		\

...into this conditional block? Then you should be able to...

 } while (0)

 void xnsched_zombie_hooks(struct xnthread *thread);

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 623bdff..cff76c2 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
 		xnshadow_rpi_check();
 	}
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
-	/*
-	 * xnsched_set_resched() did set the resched mask remotely. We
-	 * just need to make sure that our rescheduling request won't
-	 * be filtered out locally when testing for XNRESCHED
-	 * presence.
-	 */
-	setbits(sched->status, XNRESCHED);
 	xnpod_schedule();
 }

@@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 {
 	int cpu = xnsched_cpu(sched), resched;

-	resched = xnarch_cpu_isset(cpu, sched->resched);
-	xnarch_cpu_clear(cpu, sched->resched);
+	resched = testbits(sched->status, XNRESCHED);
 #ifdef CONFIG_SMP
 	/* Send resched IPI to remote CPU(s). */
+	xnarch_cpu_clear(cpu, sched->resched);

...drop the line above as well.

 	if (unlikely(xnsched_resched_p(sched))) {
 		xnarch_send_ipi(sched->resched);
 		xnarch_cpus_clear(sched->resched);

Yes, I do think that we are way too stable on SMP boxes these days. Let's merge this as well to bring the fun back.

The current cpu bit in the resched cpu mask allowed us to know whether the local cpu actually needed rescheduling, at least on SMP. It may happen that only remote cpus were set; in that case, we were only sending the IPI, then exiting __xnpod_schedule. So the choice here is, in SMP non-debug mode only, between:

- setting and clearing a bit at each local rescheduling unconditionally
- peeking at the runqueue uselessly at each rescheduling only involving remote threads

The answer does not seem obvious.

--
Philippe.
Re: [Xenomai-core] [PULL-REQUEST] assorted fixes and updates for 2.5.x
On Wed, 2010-09-01 at 07:14 +0200, Philippe Gerum wrote: On Tue, 2010-08-31 at 17:17 +0200, Philippe Gerum wrote:

The following changes since commit 004f652d31d2e3b9b995850dbefcf12bc6dbd96d:

  Gilles Chanteperdrix (1):
        Fix typo in edaf1e2e54343b6e4bf5cf6ece9175ec0ab21cad

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (16):
      powerpc: upgrade I-pipe support to 2.6.34.4-powerpc-2.10-04
      nucleus: demote RPI boost upon linux-originated signal
      blackfin: upgrade I-pipe support to 2.6.35.2-blackfin-1.15-00
      nucleus: requeue blocked non-periodic timers properly
      x86: upgrade I-pipe support to 2.6.32.20-x86-2.7-02, 2.6.34.5-x86-2.7-03
      arm: force enable preemptible switch support in SMP mode
      arm: enable VFP support in SMP
      arm: use rthal_processor_id() over non-linux contexts
      powerpc: resync thread switch code with mainline >= 2.6.32
      x86: increase SMP calibration value
      nucleus/sched: move locking to resume_rpi/suspend_rpi
      hal/generic: inline APC scheduling code
      nucleus, posix: use fast APC scheduling call
      nucleus/shadow: shorten the uninterruptible path to secondary mode

This one causes the now famous need_resched debug assertion to trigger on UP. I'll have a look at this asap. It does not depend on 56ff4329f though.

Fixed by the following commit:
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=47dac49c71e89b684203e854d1b0172ecacbc555

      nucleus/sched: prevent remote wakeup from triggering a debug assertion
      powerpc: upgrade I-pipe support to 2.6.35.4-powerpc-2.11-00

-- Philippe.
[Xenomai-core] [PULL-REQUEST] urgent scheduler fix for 2.5.x head
The following changes since commit afc0eac7e4989f4134b18a256b5c5e1ca1c56a39:

  Gilles Chanteperdrix (1):
        posix: add a magic to internal structures.

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (2):
      nucleus/sched: fix race in non-atomic suspend path
      nucleus/sched: raise self-resched condition when unlocking scheduler

 ksrc/nucleus/pod.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbos. Nah, just kidding, your shiny SMP box should be bricked even before that:

[patch with Jan's inline comments snipped; quoted in full in the first message above]

Yes, I do think that we are way too stable on SMP boxes these days. Let's merge this as well to bring the fun back.

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Tue, 2010-08-31 at 09:09 +0200, Philippe Gerum wrote: On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

[quoted patch snipped; see the first message above]

All worked according to plan, this introduced a nice lockup under switchtest load. Unfortunately, a solution exists to fix it:

--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -176,17 +176,17 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)

 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do {		\
-	xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched);	\
 	setbits((__sched__)->status, XNRESCHED);		\
 } while (0)

-- Philippe.
[Xenomai-core] [PULL-REQUEST] assorted fixes and updates for 2.5.x
The following changes since commit 004f652d31d2e3b9b995850dbefcf12bc6dbd96d:

  Gilles Chanteperdrix (1):
        Fix typo in edaf1e2e54343b6e4bf5cf6ece9175ec0ab21cad

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (16):
      powerpc: upgrade I-pipe support to 2.6.34.4-powerpc-2.10-04
      nucleus: demote RPI boost upon linux-originated signal
      blackfin: upgrade I-pipe support to 2.6.35.2-blackfin-1.15-00
      nucleus: requeue blocked non-periodic timers properly
      x86: upgrade I-pipe support to 2.6.32.20-x86-2.7-02, 2.6.34.5-x86-2.7-03
      arm: force enable preemptible switch support in SMP mode
      arm: enable VFP support in SMP
      arm: use rthal_processor_id() over non-linux contexts
      powerpc: resync thread switch code with mainline >= 2.6.32
      x86: increase SMP calibration value
      nucleus/sched: move locking to resume_rpi/suspend_rpi
      hal/generic: inline APC scheduling code
      nucleus, posix: use fast APC scheduling call
      nucleus/shadow: shorten the uninterruptible path to secondary mode
      nucleus/sched: prevent remote wakeup from triggering a debug assertion
      powerpc: upgrade I-pipe support to 2.6.35.4-powerpc-2.11-00

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Mon, 2010-08-30 at 10:51 +0200, Jan Kiszka wrote: Philippe Gerum wrote: On Fri, 2010-08-27 at 20:09 +0200, Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote:

Hi, I'm hitting that bug check in __xnpod_schedule after xnintr_clock_handler issued a xnpod_schedule like this:

	if (--sched->inesting == 0) {
		__clrbits(sched->status, XNINIRQ);
		xnpod_schedule();
	}

Either the assumption behind the bug check is no longer correct (no call to xnpod_schedule() without a real need), or we should check for __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then). Comments?

You probably have a real bug. This BUG_ON means that the scheduler is about to switch context for real, whereas the resched bit is not set, which is wrong.

This happened over my 2.6.35 port - maybe some spurious IRQ enabling. Debugging further...

You should look for something which changes the scheduler state without setting the resched bit, or for something which clears the bit without taking the scheduler changes into account.

It looks like a generic Xenomai issue on SMP boxes, though a mostly harmless one: The task that was scheduled in without XNRESCHED set locally has been woken up by a remote CPU. The waker requeued the task and set the resched condition for itself and in the resched proxy mask for the remote CPU. But there is at least one place in the Xenomai code where we drop the nklock between xnsched_set_resched and xnpod_schedule: do_taskexit_event (I bet there are even more). Now the resched target CPU runs into a timer handler, issues xnpod_schedule unconditionally, and happens to find the woken-up task before it is actually informed via an IPI. I think this is a harmless race, but it ruins the debug assertion need_resched != 0.
Not that harmless, since without the debugging code, we would miss the reschedule too... Ok. But we would finally reschedule when handling the IPI. So, the effect we see is a useless delay in the rescheduling. Depends on the POV: The interrupt or context switch between set_resched and xnpod_reschedule that may defer rescheduling may also hit us before we were able to wake up the thread at all. The worst case should not differ significantly. Yes, and whether we set the bit and call xnpod_schedule atomically does not really matter either: the IPI takes time to propagate, and since xnarch_send_ipi does not wait for the IPI to have been received on the remote CPU, there is no guarantee that xnpod_schedule could not have been called in the mean time. Indeed. More importantly, since in order to do an action on a remote xnsched_t, we need to hold the nklock, is there any point in not setting the XNRESCHED bit on that distant structure, at the same time as when we set the cpu bit on the local sched structure mask and send the IPI? This way, setting the XNRESCHED bit in the IPI handler would no longer be necessary, and we would avoid the race. I guess so. The IPI isn't more than a hint that something /may/ have changed in the schedule anyway. This makes sense. I'm currently testing the patch below which implements a close variant of Gilles's proposal. Could you try it as well, to see if things improve? http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3200660065146915976c193387bf0851be10d0cc Will test ASAP. The logic makes sure that we can keep calling xnsched_set_resched() then xnpod_schedule() outside of the same critical section, which is something we need. Otherwise this requirement would extend to xnpod_suspend/resume_thread(), which is not acceptable. I still wonder if things can't be even simpler. What is the purpose of xnsched_t::resched? 
I first thought it's just there to coalesce multiple remote reschedule requests, thus IPIs triggered by one CPU over successive wakeups etc. If that is true, why go through resched for local changes, why not set XNRESCHED directly? And why not set the remote XNRESCHED instead of the remote's xnsched_t::resched?

Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbos. Nah, just kidding, your shiny SMP box should be bricked even before that:

[patch snipped; quoted in full earlier in this thread]
Re: [Xenomai-core] xenomai 2.5.3/native, kernel 2.6.31.8 and fork()
On Sat, 2010-08-21 at 19:36 +0200, Gilles Chanteperdrix wrote: Gilles Chanteperdrix wrote:

There are other issues to consider, such as detecting that a private mutex created in the father continues to be used in the child. A simple fix for this would be to keep a list of mutexes in the native and posix skins, and nullify their magic/opaque pointer at fork. The problem is that there is no more room in pthread_mutex_t, so we will have to malloc at pthread_mutex_init time.

Please simply issue a warning, as you suggested, once when a potentially dangerous situation arises upon fork regarding mutexes. Piling up non-trivial code to prevent an obviously broken application from misbehaving even more is way too expensive if such code could introduce more overhead, and potentially secondary mode switches.

IIUC, we are discussing apps using, in a child context, some private mutexes which were initially created in the parent context, right? If so, then a warning upon detection should suffice to have the author go back to the drawing board, and optionally run man pthread_mutex_init as well.

-- Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote: Jan Kiszka wrote: Philippe Gerum wrote:

I've toyed a bit to find a generic approach for the nucleus to regain complete control over a userland application running in a syscall-less loop. The original issue was about recovering gracefully from a runaway situation detected by the nucleus watchdog, where a thread would spin in primary mode without issuing any syscall, but this would also apply to real-time signals pending for such a thread. Currently, Xenomai rt signals cannot preempt syscall-less code running in primary mode either.

The major difference between the previous approaches we discussed and this one is the fact that we now force the runaway thread to run a piece of valid code that calls into the nucleus. We do not force the thread to run faulty code or at a faulty address anymore. Therefore, we can reuse this feature to improve the rt signal management, without having to forge yet-another signal stack frame for this. The code introduced only fixes the watchdog related issue, but also does some groundwork for enhancing the rt signal support later.

The implementation details can be found here:
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c

The current mayday support is only available for powerpc and x86 for now, more will come in the next days. To have it enabled, you have to upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86, 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a new interface available from those latest patches. The current implementation does not break the 2.5.x ABI on purpose, so we could merge it into the stable branch.

We definitely need user feedback on this. Typically, does arming the nucleus watchdog with that patch support in properly recover from your favorite "get me out of here" situation? TIA,

You can pull this stuff from git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.

I've retested the feature as it's now in master, and it has one remaining problem: If you run the cpu hog under gdb control and try to break out of the while(1) loop, this doesn't work before the watchdog expired - of course. But if you send the break before the expiry (or hit a breakpoint), something goes wrong. The Xenomai task continues to spin, and there is no chance to kill its process (only gdb).

# cat /proc/xenomai/sched
CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
  0  0     idle   -1   -        master    RR    ROOT/0

Eeek, we really need to have a look at this funky STAT output.

  1  0     idle   -1   -        master    R     ROOT/1
  0  6120  rt     99   -        master    Tt    cpu-hog

# cat /proc/xenomai/stat
CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
  0  0     0    0      0   00500088    0.0  ROOT/0
  1  0     0    0      0   00500080   99.7  ROOT/1
  0  6120  0    1      0   00342180  100.0  cpu-hog
  0  0     0    21005  0               0.0  IRQ3340: [timer]
  1  0     0    35887  0               0.3  IRQ3340: [timer]

Fixable by this tiny change:

diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
index 5242d9f..04a344e 100644
--- a/ksrc/nucleus/sched.c
+++ b/ksrc/nucleus/sched.c
@@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
 		     xnthread_name(&sched->rootcb));
 #ifdef CONFIG_XENO_OPT_WATCHDOG
-	xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
+	xntimer_init_noblock(&sched->wdtimer, &nktbase,
+			     xnsched_watchdog_handler);
 	xntimer_set_name(&sched->wdtimer, "[watchdog]");
 	xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
 	xntimer_set_sched(&sched->wdtimer, sched);

I.e. the watchdog timer should not be stopped by any ongoing debug session of a Xenomai app. Will queue this for upstream.

Yes, that makes a lot of sense now. The watchdog would not fire if the task was single-stepped anyway, since the latter would have been moved to secondary mode first. Did you see this bug happening in a uniprocessor context as well?

Jan

-- Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 16:06 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

[full quote of the previous message snipped]

> Eeek, we really need to have a look at this funky STAT output.

I've a patch for this queued as well. Was only a cosmetic thing.

> Yes, that makes a lot of sense now. The watchdog would not fire if the task was single-stepped anyway, since the latter would have been moved to secondary mode first.

Yep.

> Did you see this bug happening in a uniprocessor context as well?

No, as it is impossible on a uniprocessor to interact with gdb while a cpu hog runs - the only existing CPU is simply not available. :)

I was rather thinking of your hit-a-breakpoint-or-^C-early scenario... I thought you did see this on UP as well, and scratched my head to understand how this would have been possible. Ok, so let's merge this.

Jan

-- Philippe.
Re: [Xenomai-core] rt timer jitter
On Fri, 2010-08-20 at 18:20 +0200, Krzysztof Błaszkowski wrote: On Fri, 2010-08-20 at 18:06 +0200, Philippe Gerum wrote: On Fri, 2010-08-20 at 17:55 +0200, Krzysztof Błaszkowski wrote:

Do you have any idea about reducing rt timer jitter? I experience annoyingly big jitter in a thread which is supposed to run at 400us (I reckon this is nothing extra demanding for an Atom @ 1.6GHz). The thread's loop looks like:

	{ function1() ..2() ..3() ..4() rt_task_wait_period() }

(^yet another simplified model^)

This is the typical pattern of the latency test. What figures do you get with:

# /usr/xenomai/bin/latency -t0
...
# /usr/xenomai/bin/latency -t1

t0: RTS| -1.337| -0.039| 13.285| 0| 0| 00:02:13/00:02:13

Those are common figures for user-space latency on the kind of hw you run this test on.

I can't run t1 because of the missing xeno_timerbench.ko (I have no idea how to find a config option which would build it).

Did you consider using the Search feature from xconfig/gconfig/whatever, looking for "timerbench"?

config XENO_DRIVERS_TIMERBENCH
	depends on XENO_SKIN_RTDM
	tristate "Timer benchmark driver"
	default y
	help
	  Kernel-based benchmark driver for timer latency evaluation.
	  See testsuite/latency for a possible front-end.

If you run your app in kernel space, then -t1 is what you want to run.

-- Philippe.