Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : Add regression test for mprotect on pinned memory
On 2012-04-02 16:35, Gilles Chanteperdrix wrote:

On 04/02/2012 04:09 PM, GIT version control wrote:

Module: xenomai-jki
Branch: for-upstream
Commit: 410e90d085d21dc913f8724efafe6ae75bd3c952
URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=410e90d085d21dc913f8724efafe6ae75bd3c952
Author: Jan Kiszka jan.kis...@siemens.com
Date: Fri Mar 30 18:06:27 2012 +0200

Add regression test for mprotect on pinned memory

This tests both the original issue of mprotect reintroducing COW pages to Xenomai processes and the recently fixed zero page corruption.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

+static void check_inner(const char *fn, int line, const char *msg,
+                        int status, int expected)
+{
+	if (status == expected)
+		return;
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();
(...)
+static void check_value_inner(const char *fn, int line, const char *msg,
+                              int value, int expected)
+{
+	if (value == expected)
+		return;
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();
(...)
+void sigdebug_handler(int sig, siginfo_t *si, void *context)
+{
+	unsigned int reason = si->si_value.sival_int;
+
+	rt_print_flush_buffers();
(...)
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();

Maybe you could use posix skin's printf instead of putting calls to rt_print_flush_buffers all over the place? I did not mean for this call to be exported, I only added it for internal use by the posix skin.

Could be done, likely together with a complete switch to posix. I could also start to use the check_* wrappers that I just discovered.

BTW, the native version lacks that flush unless it's used in a native+posix context. I will write a fix.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
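[Editor's sketch] The scenario named in the commit message is compact enough to illustrate in isolation. The snippet below is not the actual test from commit 410e90d0, just a hedged sketch of the mprotect-on-pinned-memory pattern it describes: pin a page, dirty it, flip its protection, and verify the contents survive. The real test additionally watches for SIGDEBUG notifications (see the sigdebug_handler hunk above), which is omitted here.

/* Illustration only: not the test from commit 410e90d0. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t sz = (size_t)sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED || mlock(buf, sz))	/* pin the page */
		return EXIT_FAILURE;

	memset(buf, 0xa5, sz);		/* dirty it while pinned */

	/* Flipping protections must not silently replace the pinned
	 * page with a COW or zero page. */
	if (mprotect(buf, sz, PROT_READ) ||
	    mprotect(buf, sz, PROT_READ | PROT_WRITE))
		return EXIT_FAILURE;

	buf[0] = 0x5a;			/* write again after mprotect */
	puts(buf[0] == 0x5a && buf[1] == (char)0xa5 ? "ok" : "corrupted");
	return EXIT_SUCCESS;
}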
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : Add regression test for mprotect on pinned memory
On 04/02/2012 04:09 PM, GIT version control wrote:

Module: xenomai-jki
Branch: for-upstream
Commit: 410e90d085d21dc913f8724efafe6ae75bd3c952
URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=410e90d085d21dc913f8724efafe6ae75bd3c952
Author: Jan Kiszka jan.kis...@siemens.com
Date: Fri Mar 30 18:06:27 2012 +0200

Add regression test for mprotect on pinned memory

This tests both the original issue of mprotect reintroducing COW pages to Xenomai processes and the recently fixed zero page corruption.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

+static void check_inner(const char *fn, int line, const char *msg,
+                        int status, int expected)
+{
+	if (status == expected)
+		return;
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();
(...)
+static void check_value_inner(const char *fn, int line, const char *msg,
+                              int value, int expected)
+{
+	if (value == expected)
+		return;
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();
(...)
+void sigdebug_handler(int sig, siginfo_t *si, void *context)
+{
+	unsigned int reason = si->si_value.sival_int;
+
+	rt_print_flush_buffers();
(...)
+
+	rt_task_set_mode(T_WARNSW, 0, NULL);
+	rt_print_flush_buffers();

Maybe you could use posix skin's printf instead of putting calls to rt_print_flush_buffers all over the place? I did not mean for this call to be exported, I only added it for internal use by the posix skin.

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
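[Editor's sketch] The check_*() helpers quoted above follow the usual call-site-capturing assertion pattern. A self-contained sketch of that pattern is shown below; the macro name and plain fprintf() reporting are illustrative stand-ins, while the real test first leaves primary mode and flushes the rt_printf buffers, as the quoted hunks show.

#include <stdio.h>

/* Illustrative stand-in for the test's check_*() helpers. */
static void check_inner(const char *fn, int line, const char *msg,
			int status, int expected)
{
	if (status == expected)
		return;

	/* The quoted test would call rt_task_set_mode(T_WARNSW, 0, NULL)
	 * and rt_print_flush_buffers() here before reporting. */
	fprintf(stderr, "FAILED %s:%d: %s = %d, expected %d\n",
		fn, line, msg, status, expected);
}

/* The macro records the call site so failure reports point at the
 * offending line. */
#define check(msg, status, expected) \
	check_inner(__func__, __LINE__, (msg), (status), (expected))

int main(void)
{
	check("mlock", 0, 0);		/* passes silently */
	check("mprotect", -1, 0);	/* reports the mismatch */
	return 0;
}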
Re: [Xenomai-core] xenomai-forge: round-robin scheduling in pSOS skin
On 03/08/2012 03:30 PM, Ronny Meeus wrote:

Hello,

I'm using the xenomai-forge pSOS skin (Mercury). My application is running on a P4040 (Freescale PPC with 4 cores). Some code snippets are put in this mail but the complete testcode is also attached.

I have a test task that just consumes the CPU:

int run_test = 1;

static void perform_work(u_long counter, u_long b, u_long c, u_long d)
{
	int i;

	while (run_test) {
		for (i = 0; i < 10; i++);
		(*(unsigned long *)counter)++;
	}
	while (1)
		tm_wkafter(1000);
}

If I create 2 instances of this task with the T_TSLICE option set:

t_create(WORK, 10, 0, 0, 0, &tid);
t_start(tid, T_TSLICE, perform_work, args);

I see that only 1 task is consuming CPU:

# taskset 1 ./roundrobin.exe &
#
.543| [main] SCHED_RT priorities = [1 .. 99]
.656| [main] SCHED_RT.99 reserved for IRQ emulation
.692| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 6602
1 - 0

If I adapt the code so that I call the threadobj_start_rr function in my init, I see that the load is equally distributed over the 2 threads:

# taskset 1 ./roundrobin.exe &
#
.557| [main] SCHED_RT priorities = [1 .. 99]
.672| [main] SCHED_RT.99 reserved for IRQ emulation
.708| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 3290
1 - 3291

Here are the questions:

- Why is the threadobj_start_rr function not called from the context of the init of the psos layer?

Because threadobj_start_rr() was originally designed to activate round-robin for all threads (some RTOS like VxWorks expose that kind of API), not on a per-thread basis. This is not what pSOS wants. The round-robin API is in a state of flux for mercury, only the cobalt one is stable. This is why RR is not yet activated even though T_TSLICE is recognized.

- Why is the round-robin implemented in this way? If the tasks were mapped onto SCHED_RR instead of SCHED_FIFO, the Linux scheduler would take care of this.

Nope. We need per-thread RR intervals, to manage multiple priority groups concurrently, and we also want to define that interval as we see fit for proper RTOS emulation. POSIX does not define anything like sched_set_rr_interval(), and the linux kernel applies a default fixed interval to all threads from the SCHED_RR class (100ms IIRC). So we have to emulate SCHED_RR over SCHED_FIFO plus a per-thread virtual timer.

On the other hand, once the threadobj_start_rr function is called from my init, and I create the tasks in T_NOTSLICE mode, the time-slicing is still done.

Because you called threadobj_start_rr().

Thanks.

---
Ronny

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
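[Editor's sketch] The emulation Philippe describes, SCHED_FIFO plus a per-thread virtual timer, can be sketched in plain POSIX. This is only an illustration of the idea, not how mercury implements threadobj_start_rr(): the function name, SIGURG choice and quantum are invented, and strict per-thread signal delivery would actually need the Linux-specific SIGEV_THREAD_ID extension. Link with -lrt on older glibc.

#include <sched.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static void rr_tick(int sig)
{
	(void)sig;
	sched_yield();	/* move to the tail of our own priority level */
}

/* Call this from the thread that should be time-sliced. */
static int start_rr(long quantum_ns)
{
	struct sigevent sev;
	struct itimerspec its;
	timer_t tid;

	signal(SIGURG, rr_tick);

	/* CLOCK_THREAD_CPUTIME_ID counts CPU time consumed by this
	 * thread only, so the timer models a consumed quantum rather
	 * than wall time. */
	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGURG;
	if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &tid))
		return -1;

	memset(&its, 0, sizeof(its));
	its.it_value.tv_nsec = quantum_ns;
	its.it_interval.tv_nsec = quantum_ns;	/* rearm every quantum */
	return timer_settime(tid, 0, &its, NULL);
}

An emulator would arm such a timer per time-sliced task, which is why the interval can differ per thread, unlike the kernel's fixed SCHED_RR quantum.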
Re: [Xenomai-core] xenomai-forge: round-robin scheduling in pSOS skin
On Thu, Mar 8, 2012 at 3:30 PM, Ronny Meeus ronny.me...@gmail.com wrote:

Hello,

I'm using the xenomai-forge pSOS skin (Mercury). My application is running on a P4040 (Freescale PPC with 4 cores). Some code snippets are put in this mail but the complete testcode is also attached.

I have a test task that just consumes the CPU:

int run_test = 1;

static void perform_work(u_long counter, u_long b, u_long c, u_long d)
{
	int i;

	while (run_test) {
		for (i = 0; i < 10; i++);
		(*(unsigned long *)counter)++;
	}
	while (1)
		tm_wkafter(1000);
}

If I create 2 instances of this task with the T_TSLICE option set:

t_create(WORK, 10, 0, 0, 0, &tid);
t_start(tid, T_TSLICE, perform_work, args);

I see that only 1 task is consuming CPU:

# taskset 1 ./roundrobin.exe &
#
.543| [main] SCHED_RT priorities = [1 .. 99]
.656| [main] SCHED_RT.99 reserved for IRQ emulation
.692| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 6602
1 - 0

If I adapt the code so that I call the threadobj_start_rr function in my init, I see that the load is equally distributed over the 2 threads:

# taskset 1 ./roundrobin.exe &
#
.557| [main] SCHED_RT priorities = [1 .. 99]
.672| [main] SCHED_RT.99 reserved for IRQ emulation
.708| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 3290
1 - 3291

Here are the questions:

- Why is the threadobj_start_rr function not called from the context of the init of the psos layer?
- Why is the round-robin implemented in this way? If the tasks were mapped onto SCHED_RR instead of SCHED_FIFO, the Linux scheduler would take care of this.

On the other hand, once the threadobj_start_rr function is called from my init, and I create the tasks in T_NOTSLICE mode, the time-slicing is still done.

Thanks.

---
Ronny

Any comments on this?

---
Ronny

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-forge: round-robin scheduling in pSOS skin
On 03/15/2012 08:49 PM, Ronny Meeus wrote:

On Thu, Mar 8, 2012 at 3:30 PM, Ronny Meeus ronny.me...@gmail.com wrote:

Hello,

I'm using the xenomai-forge pSOS skin (Mercury). My application is running on a P4040 (Freescale PPC with 4 cores). Some code snippets are put in this mail but the complete testcode is also attached.

I have a test task that just consumes the CPU:

int run_test = 1;

static void perform_work(u_long counter, u_long b, u_long c, u_long d)
{
	int i;

	while (run_test) {
		for (i = 0; i < 10; i++);
		(*(unsigned long *)counter)++;
	}
	while (1)
		tm_wkafter(1000);
}

If I create 2 instances of this task with the T_TSLICE option set:

t_create(WORK, 10, 0, 0, 0, &tid);
t_start(tid, T_TSLICE, perform_work, args);

I see that only 1 task is consuming CPU:

# taskset 1 ./roundrobin.exe &
#
.543| [main] SCHED_RT priorities = [1 .. 99]
.656| [main] SCHED_RT.99 reserved for IRQ emulation
.692| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 6602
1 - 0

If I adapt the code so that I call the threadobj_start_rr function in my init, I see that the load is equally distributed over the 2 threads:

# taskset 1 ./roundrobin.exe &
#
.557| [main] SCHED_RT priorities = [1 .. 99]
.672| [main] SCHED_RT.99 reserved for IRQ emulation
.708| [main] SCHED_RT.98 reserved for scheduler-lock emulation
0 - 3290
1 - 3291

Here are the questions:

- Why is the threadobj_start_rr function not called from the context of the init of the psos layer?
- Why is the round-robin implemented in this way? If the tasks were mapped onto SCHED_RR instead of SCHED_FIFO, the Linux scheduler would take care of this.

On the other hand, once the threadobj_start_rr function is called from my init, and I create the tasks in T_NOTSLICE mode, the time-slicing is still done.

Thanks.

---
Ronny

Any comments on this?

I am afraid you will have to wait for Philippe to have time to answer you. I am a bit ignorant about the psos API, but most importantly completely ignorant of the mercury core. I am working on forge, but mostly with cobalt.

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0 in Debian
On 11/06/2011 11:11 PM, Roland Stigge wrote: Hi, thanks for Xenomai 2.6.0! I'm attaching a patch that's helpful for the integration of Xenomai in Debian (and FHS compliant systems in general), moving the architecture dependent test programs from /usr/share to /usr/lib. Applied, thanks. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc4
On Wed, 2011-09-28 at 20:34 +0200, Gilles Chanteperdrix wrote:

Hi,

here is the 4th release candidate for Xenomai 2.6.0:
http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc4.tar.bz2

Novelties since -rc3 include:

- a fix for the long names issue on psos+
- a fix for the build issue of mscan on mpc52xx (please Wolfgang, have a look at the patch, to see if you like it:)
  http://git.xenomai.org/?p=xenomai-head.git;a=commitdiff;h=d22fd231db7eb0af8e77ec570efb89e578e13781;hp=4a2188f049e96fc59aa7c4a7a9d058075f3d79e8
- a new version of the I-pipe patch for linux 3.0 on ppc. People running 2.13-02/powerpc over linux 3.0.4 should definitely upgrade to 2.13-03, or apply this:
  http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=7c28eb2dea86366bf721663bb8d28ce89cf2806c

This should be the last release candidate.

Regards.

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Mon, Sep 5, 2011 at 7:31 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote:

On 09/05/2011 07:14 PM, Henri Roosen wrote:

Hi Gilles,

Unfortunately I didn't find the time to test this release yet. I'm just wondering if there is a fix for this problem in the 2.6.0 release: https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html

This one is fixed, a bit differently, since we fixed the ppd handling so that the ppd is valid up to the end of a process.

We have been using the auto-relax patches on top of 2.5.6 for a long time now. We found issues with it regarding auto-relax tasks that were not being auto-relaxed anymore. Philippe made patches for that, see https://mail.gna.org/public/xenomai-help/2011-03/msg00161.html.

Philippe's patches for rt_task_send/receive/reply should have been merged too.

However, locally I reverted those two patches because these introduced a memory leak in xnheap; I could only do rt_task_create()/rt_task_delete() 1024 times ;-). I thought that was the discussion of https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html at that time and I don't recall a proper fix for it was provided. But I might have missed it...

This looks related to the ppd issue as well, in which case, it should have been fixed too. It would be nice if you could test the release and tell us whether you still have these issues.

I've now tested the issues on the Xenomai 2.6.0-rc1 release. Both issues no longer occur.

Thanks,
Henri

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/06/2011 11:15 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 20:19 +0200, Gilles Chanteperdrix wrote:

On 09/06/2011 05:10 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote:

On 09/06/2011 03:27 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote:

On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote:

Hi,

The first release candidate for the 2.6.0 version may be downloaded here:
http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2

Hi,

currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels?

That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today.

No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file.

HEAD builds fine based on the attached .config.

Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there:

url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git
branch = nios2mmu

nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x.

Ok, still not building, maybe the commit number mentioned in the README is not up-to-date?

The commit # is correct, but I suspect that your kernel tree does not have the files normally created by the SOPC builder anymore; these can't (may not, actually) be included in the pipeline patch. In short, your tree might be missing the bits corresponding to the fpga design you build for, so basic symbols like HRCLOCK* and HRTIMER* are undefined.

I'm building for a cyclone 3c25 from the NEEK kit, with SOPC files available from arch/nios2/boards/neek. Any valuable files in there on your side? (typically, include/asm/custom_fpga.h should contain definitions for our real-time clocks and timers)

I created a file arch/nios2/hardware.mk, which contains:

SYSPTF = /path/to/std_1s10.ptf
CPU = cpu
EXEMEM = sdram

Then I run the kernel compilation as for any other platform. Is that not sufficient? Perhaps my .ptf file is outdated?

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? Regards. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
Hi, On 09/06/2011 01:31 PM, Gilles Chanteperdrix wrote: currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? No worries here from the Debian (and derivatives) perspective. bye, Roland ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. Ok, I'll check this. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote:

On 09/06/2011 03:27 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote:

On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote:

Hi,

The first release candidate for the 2.6.0 version may be downloaded here:
http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2

Hi,

currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels?

That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today.

No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file.

HEAD builds fine based on the attached .config.

Mmmfff...

--
Philippe.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.35
# Tue Sep 6 16:49:25 2011
#

#
# Linux/NiosII Configuration
#
CONFIG_NIOS2=y
CONFIG_MMU=y
# CONFIG_FPU is not set
# CONFIG_SWAP is not set
CONFIG_RWSEM_GENERIC_SPINLOCK=y

#
# NiosII board configuration
#
# CONFIG_3C120 is not set
CONFIG_NEEK=y
CONFIG_NIOS2_CUSTOM_FPGA=y
# CONFIG_NIOS2_NEEK_OCM is not set

#
# NiosII specific compiler options
#
CONFIG_NIOS2_HW_MUL_SUPPORT=y
# CONFIG_NIOS2_HW_MULX_SUPPORT is not set
# CONFIG_NIOS2_HW_DIV_SUPPORT is not set
# CONFIG_OF is not set
CONFIG_ALIGNMENT_TRAP=y
CONFIG_RAMKERNEL=y

#
# Boot options
#
CONFIG_CMDLINE=
CONFIG_PASS_CMDLINE=y
CONFIG_BOOT_LINK_OFFSET=0x0100

#
# Platform driver options
#
# CONFIG_AVALON_DMA is not set

#
# Additional NiosII Device Drivers
#
# CONFIG_PCI_ALTPCI is not set
# CONFIG_ALTERA_REMOTE_UPDATE is not set
# CONFIG_PIO_DEVICES is not set
# CONFIG_NIOS2_GPIO is not set
# CONFIG_ALTERA_PIO_GPIO is not set
CONFIG_UID16=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_NO_IOPORT=y
CONFIG_ZONE_DMA=y
CONFIG_BINFMT_ELF=y
# CONFIG_NOT_COHERENT_CACHE is not set
CONFIG_HZ=100
# CONFIG_TRACE_IRQFLAGS_SUPPORT is not set
CONFIG_IPIPE=y
CONFIG_IPIPE_DOMAINS=4
CONFIG_IPIPE_DELAYED_ATOMICSW=y
# CONFIG_IPIPE_UNMASKED_CONTEXT_SWITCH is not set
CONFIG_IPIPE_HAVE_PREEMPTIBLE_SWITCH=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_CONSTRUCTORS=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
# CONFIG_TINY_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
CONFIG_AIO=y

#
# Kernel Performance Events And Counters
#
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/06/2011 08:19 PM, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? More build failures for kernel 3.0 and ppc... http://sisyphus.hd.free.fr/~gilles/bx/index.html#powerpc -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 21:42 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 08:19 PM, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? More build failures for kernel 3.0 and ppc... http://sisyphus.hd.free.fr/~gilles/bx/index.html#powerpc I've fixed most of these, however the platform driver interface changed once again circa 2.6.39, and AFAICT, picking the right approach to cope with this never ending mess for the mscan driver requires some thoughts from educated people. Since I don't qualify for the job, I'm shamelessly passing the buck to Wolfgang: http://sisyphus.hd.free.fr/~gilles/bx/lite5200/3.0.4-ppc_6xx-gcc-4.2.2/log.html#1 PS: I guess this fix can wait until 2.6.0 final, this is not critical for -rc2. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 20:19 +0200, Gilles Chanteperdrix wrote:

On 09/06/2011 05:10 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote:

On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote:

On 09/06/2011 03:27 PM, Philippe Gerum wrote:

On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote:

On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote:

Hi,

The first release candidate for the 2.6.0 version may be downloaded here:
http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2

Hi,

currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels?

That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today.

No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file.

HEAD builds fine based on the attached .config.

Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there:

url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git
branch = nios2mmu

nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x.

Ok, still not building, maybe the commit number mentioned in the README is not up-to-date?

The commit # is correct, but I suspect that your kernel tree does not have the files normally created by the SOPC builder anymore; these can't (may not, actually) be included in the pipeline patch. In short, your tree might be missing the bits corresponding to the fpga design you build for, so basic symbols like HRCLOCK* and HRTIMER* are undefined.

I'm building for a cyclone 3c25 from the NEEK kit, with SOPC files available from arch/nios2/boards/neek. Any valuable files in there on your side? (typically, include/asm/custom_fpga.h should contain definitions for our real-time clocks and timers)

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
Hi Gilles,

Unfortunately I didn't find the time to test this release yet. I'm just wondering if there is a fix for this problem in the 2.6.0 release: https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html

We have been using the auto-relax patches on top of 2.5.6 for a long time now. We found issues with it regarding auto-relax tasks that were not being auto-relaxed anymore. Philippe made patches for that, see https://mail.gna.org/public/xenomai-help/2011-03/msg00161.html. However, locally I reverted those two patches because these introduced a memory leak in xnheap; I could only do rt_task_create()/rt_task_delete() 1024 times ;-). I thought that was the discussion of https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html at that time and I don't recall a proper fix for it was provided. But I might have missed it...

Thanks,
Henri

On Sun, Sep 4, 2011 at 10:52 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote:

Hi,

The first release candidate for the 2.6.0 version may be downloaded here:
http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2

This version fixes a few issues in the 2.5.x branch which required breaking the ABI:

- user-space heap mapping;
- user-space access to thread mode;
- get threads running with SCHED_OTHER scheduling policy to automatically return to secondary mode after each primary mode only system call (except when holding a mutex);
- fix both native and posix condition variables signal handling.

It contains a few improvements as well:

- add support for CLOCK_HOST_REALTIME, a real-time clock synchronized with the Linux clock;
- factor proc filesystem handling;
- the xeno-test script has been simplified and rebased on xeno-test-run, which will allow writing custom test scripts;
- add support for the sh4 architecture;
- simplify the arm user-space configure script;
- move rtdk to the libxenomai library, printf is now rt-safe when using the posix skin;
- add support for pkg-config, the xenomai skin libraries are available each as a libxenomai_skin pkg-config package.

Regards.

--
Gilles.

___
Xenomai-help mailing list
xenomai-h...@gna.org
https://mail.gna.org/listinfo/xenomai-help

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On 09/05/2011 07:14 PM, Henri Roosen wrote: Hi Gilles, Unfortunately I didn't find the time to test this release yet. I'm just wondering if there is a fix for this problem in the 2.6.0 release: https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html This one is fixed, a bit differently, since we fixed the ppd handling so that the ppd is valid up to the end of a process. We are using the auto-relax patches on top of 2.5.6 for a long time now. We found issues with it regarding auto-relax tasks that were not being auto-relaxed anymore. Philippe made patches for that, see https://mail.gna.org/public/xenomai-help/2011-03/msg00161.html. Philippe's patches for rt_task_send/receive/reply should have been merged too. However, locally I reverted those two patches because these introduced a memory leak in xnheap; I could only do rt_task_create() rt_task_delete() for 1024 times ;-). I thought that was the discussion of https://mail.gna.org/public/xenomai-core/2011-05/msg00028.html at that time and I don't recall a proper fix for it was provided. But I might have missed it... This looks related to the ppd issue as well, in which case, it should have been fixed too. It would be nice if you could test the release and tell us whether you still have these issues. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
On 2011-09-04 07:10, rainbow wrote:

Sorry to reply so late; I did a test installing ftrace on xenomai. The following is my procedure:

#git://git.xenomai.org/xenomai-jki.git queues/ftrace
#git://git.kiszka.org/ipipe-2.6 queues/2.6.35-x86-trace
#cd queues/ftrace
#git checkout -b remotes/origin/queues/ftrace origin/queues/2.6.35-x86-trace //change to the ftrace xenomai branch
#cd ../2.6.35-x86-trace
#git checkout -b origin/queues/2.6.35-x86-trace origin/queues/2.6.35-x86-trace
#cd ../ftrace
#./scripts/prepare-kernel.sh --arch=i386 --adeos=ksrc/arch/x86/patches/adeos-ipipe-2.6.35.9-x86-2.8-04.patch --linux=../2.6.35-x86-trace/
#cd /2.6.35-x86-trace/

Then I compile the kernel, but I get the following error message:

arch/x86/kernel/ipipe.c:851: error: conflicting types for ‘update_vsyscall’
include/linux/clocksource.h:316: note: previous declaration of ‘update_vsyscall’ was here
make[2]: *** [arch/x86/kernel/ipipe.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2

That's a build issue of the underlying old ipipe patch. However, it's x86-32 only, and as the documentation states, only x86-64 is supported by the ftrace patches. So build for 64 bit instead.

Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
You mean I should use the remotes/origin/queues/2.6.37-x86 branch and the ipipe patch for 2.6.37, then install them on x86_64, and ftrace will work? I will have a try, thank you!

2011/9/4 Jan Kiszka jan.kis...@web.de

On 2011-09-04 07:10, rainbow wrote:

Sorry to reply so late; I did a test installing ftrace on xenomai. The following is my procedure:

#git://git.xenomai.org/xenomai-jki.git queues/ftrace
#git://git.kiszka.org/ipipe-2.6 queues/2.6.35-x86-trace
#cd queues/ftrace
#git checkout -b remotes/origin/queues/ftrace origin/queues/2.6.35-x86-trace //change to the ftrace xenomai branch
#cd ../2.6.35-x86-trace
#git checkout -b origin/queues/2.6.35-x86-trace origin/queues/2.6.35-x86-trace
#cd ../ftrace
#./scripts/prepare-kernel.sh --arch=i386 --adeos=ksrc/arch/x86/patches/adeos-ipipe-2.6.35.9-x86-2.8-04.patch --linux=../2.6.35-x86-trace/
#cd /2.6.35-x86-trace/

Then I compile the kernel, but I get the following error message:

arch/x86/kernel/ipipe.c:851: error: conflicting types for ‘update_vsyscall’
include/linux/clocksource.h:316: note: previous declaration of ‘update_vsyscall’ was here
make[2]: *** [arch/x86/kernel/ipipe.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2

That's a build issue of the underlying old ipipe patch. However, it's x86-32 only, and as the documentation states, only x86-64 is supported by the ftrace patches. So build for 64 bit instead.

Jan

--
Qingquan Lv
School of Information Science Engineering, Lanzhou University.
mail: lvq...@gmail.com
Do what you like, Enjoy your life.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
On 2011-09-04 13:49, rainbow wrote:

You mean I should use the remotes/origin/queues/2.6.37-x86 branch and the ipipe patch for 2.6.37, then install them on x86_64, and ftrace will work? I will have a try, thank you!

Use the 2.6.35-x86-trace, it already contains the ipipe patch, and build it for x86-64.

Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
Is the ipipe patch the same as a patch like adeos-ipipe-2.6.37.6-x86-2.9-02.patch? I know the latter is the xenomai patch, and after I apply it, I can see the "Real-time sub-system --->" option. But if I use 2.6.35-x86-trace, which contains the ipipe patch, there is no such option.

Another problem is that there are so many xenomai gits; how can I download the correct git? I am a newbie to xenomai and I am sorry to ask so many questions, but I want to do something on xenomai :). Thank you for your detailed answers.

2011/9/4 Jan Kiszka jan.kis...@web.de

On 2011-09-04 13:49, rainbow wrote:

You mean I should use the remotes/origin/queues/2.6.37-x86 branch and the ipipe patch for 2.6.37, then install them on x86_64, and ftrace will work? I will have a try, thank you!

Use the 2.6.35-x86-trace, it already contains the ipipe patch, and build it for x86-64.

Jan

--
Qingquan Lv
School of Information Science Engineering, Lanzhou University.
mail: lvq...@gmail.com
Do what you like, Enjoy your life.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
On 2011-09-04 14:21, rainbow wrote:

Is the ipipe patch the same as a patch like adeos-ipipe-2.6.37.6-x86-2.9-02.patch?

Except that the trace branch is for 2.6.35, yes. More precisely, it is now the same; I just pushed the latest version, which includes two more backported ipipe fixes.

I know the latter is the xenomai patch, and after I apply it, I can see the "Real-time sub-system --->" option. But if I use 2.6.35-x86-trace, which contains the ipipe patch, there is no such option.

That menu option is introduced by Xenomai, i.e. after running prepare-kernel.sh. You likely forgot that step. Note again that you have to use a Xenomai tree with the required ftrace patches on top if you want Xenomai to generate ftrace events as well.

Another problem is that there are so many xenomai gits; how can I download the correct git?

By cloning the git repository you obtain all available branches. You just need to checkout the desired one afterward.

I am a newbie to xenomai and I am sorry to ask so many questions, but I want to do something on xenomai :). Thank you for your detailed answers.

Setting up ftrace for Xenomai is not necessarily a newbie task, but I think I know the background of this. :)

Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
2011/9/4 Jan Kiszka jan.kis...@web.de

On 2011-09-04 14:21, rainbow wrote:

Is the ipipe patch the same as a patch like adeos-ipipe-2.6.37.6-x86-2.9-02.patch?

Except that the trace branch is for 2.6.35, yes. More precisely, it is now the same; I just pushed the latest version, which includes two more backported ipipe fixes.

I know the latter is the xenomai patch, and after I apply it, I can see the "Real-time sub-system --->" option. But if I use 2.6.35-x86-trace, which contains the ipipe patch, there is no such option.

That menu option is introduced by Xenomai, i.e. after running prepare-kernel.sh. You likely forgot that step.

Yes, I forgot the step. So I think I only have to run prepare-kernel.sh --arch=x86_64 --linux=2.6.35-x86-trace; I do not need the --adeos option because the 2.6.35-x86-trace branch contains the ipipe patch.

Note again that you have to use a Xenomai tree with the required ftrace patches on top if you want Xenomai to generate ftrace events as well.

By "Xenomai tree with the required ftrace patches on top", you mean the branch remotes/origin/queues/ftrace?

Another problem is that there are so many xenomai gits; how can I download the correct git?

By cloning the git repository you obtain all available branches. You just need to checkout the desired one afterward.

I am a newbie to xenomai and I am sorry to ask so many questions, but I want to do something on xenomai :). Thank you for your detailed answers.

Setting up ftrace for Xenomai is not necessarily a newbie task, but I think I know the background of this. :)

I think you really know the background :).

Jan

--
Qingquan Lv
School of Information Science Engineering, Lanzhou University.
mail: lvq...@gmail.com
Do what you like, Enjoy your life.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
On 2011-09-04 15:16, rainbow wrote:

2011/9/4 Jan Kiszka jan.kis...@web.de

On 2011-09-04 14:21, rainbow wrote:

Is the ipipe patch the same as a patch like adeos-ipipe-2.6.37.6-x86-2.9-02.patch?

Except that the trace branch is for 2.6.35, yes. More precisely, it is now the same; I just pushed the latest version, which includes two more backported ipipe fixes.

I know the latter is the xenomai patch, and after I apply it, I can see the "Real-time sub-system --->" option. But if I use 2.6.35-x86-trace, which contains the ipipe patch, there is no such option.

That menu option is introduced by Xenomai, i.e. after running prepare-kernel.sh. You likely forgot that step.

Yes, I forgot the step. So I think I only have to run prepare-kernel.sh --arch=x86_64 --linux=2.6.35-x86-trace; I do not need the --adeos option because the 2.6.35-x86-trace branch contains the ipipe patch.

Note again that you have to use a Xenomai tree with the required ftrace patches on top if you want Xenomai to generate ftrace events as well.

By "Xenomai tree with the required ftrace patches on top", you mean the branch remotes/origin/queues/ftrace?

Yep. I just pushed a rebased version of current git master.

Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
On 2011-09-03 04:52, rainbow wrote:

Hi all, I want to use ftrace in xenomai-2.5.6, but when I use git://git.kiszka.org/ipipe.git queues/2.6.35-x86-trace to get the linux kernel, there is no option about xenomai or ipipe. If I want to apply the xenomai patch, there are some problems. How should I use ftrace on xenomai? Thanks!

First of all, make sure to read README.INSTALL in the Xenomai tree for the basic installation procedure. That git branch above replaces the installation step of picking a vanilla Linux source tree and applying the ipipe patch to it (if there is no ipipe option in the kernel config, you probably haven't checked out the right branch yet).

The next step would be running Xenomai's prepare-kernel.sh, in this case using a Xenomai tree that has the required ftrace patches, see http://permalink.gmane.org/gmane.linux.real-time.xenomai.devel/7966

Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] xenomai-core ftrace
Sorry to reply so late; I did a test installing ftrace on xenomai. The following is my procedure:

#git://git.xenomai.org/xenomai-jki.git queues/ftrace
#git://git.kiszka.org/ipipe-2.6 queues/2.6.35-x86-trace
#cd queues/ftrace
#git checkout -b remotes/origin/queues/ftrace origin/queues/2.6.35-x86-trace //change to the ftrace xenomai branch
#cd ../2.6.35-x86-trace
#git checkout -b origin/queues/2.6.35-x86-trace origin/queues/2.6.35-x86-trace
#cd ../ftrace
#./scripts/prepare-kernel.sh --arch=i386 --adeos=ksrc/arch/x86/patches/adeos-ipipe-2.6.35.9-x86-2.8-04.patch --linux=../2.6.35-x86-trace/
#cd /2.6.35-x86-trace/

Then I compile the kernel, but I get the following error message:

arch/x86/kernel/ipipe.c:851: error: conflicting types for ‘update_vsyscall’
include/linux/clocksource.h:316: note: previous declaration of ‘update_vsyscall’ was here
make[2]: *** [arch/x86/kernel/ipipe.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2

I am not sure whether the reason is that I got the wrong patch or that the kernel configuration is wrong. Is the procedure above right? Thanks!

2011/9/3 Jan Kiszka jan.kis...@web.de

On 2011-09-03 04:52, rainbow wrote:

Hi all, I want to use ftrace in xenomai-2.5.6, but when I use git://git.kiszka.org/ipipe.git queues/2.6.35-x86-trace to get the linux kernel, there is no option about xenomai or ipipe. If I want to apply the xenomai patch, there are some problems. How should I use ftrace on xenomai? Thanks!

First of all, make sure to read README.INSTALL in the Xenomai tree for the basic installation procedure. That git branch above replaces the installation step of picking a vanilla Linux source tree and applying the ipipe patch to it (if there is no ipipe option in the kernel config, you probably haven't checked out the right branch yet).

The next step would be running Xenomai's prepare-kernel.sh, in this case using a Xenomai tree that has the required ftrace patches, see http://permalink.gmane.org/gmane.linux.real-time.xenomai.devel/7966

Jan

--
Qingquan Lv
School of Information Science Engineering, Lanzhou University.
mail: lvq...@gmail.com
Do what you like, Enjoy your life.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On 08/30/2011 01:00 AM, Alexis Berlemont wrote:

Hi,

On Fri, Aug 26, 2011 at 2:34 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote:

Hi,

I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first?

Yes. In my experimental branch, I have a few things which are not that experimental. I would like to push:

- a first version of Julien Delange's ni_660x driver
- Anders Blomdell's fix for duplicate symbols with comedi
- Anders Blomdell's fix in the pcimio driver (wrong IRQ number after reboot)
- some waveform generation tools (fully generic)
- an overhaul of the testing drivers (fake + loop = fake)

I will integrate them in my analogy branch and send a pull request if you are OK with that.

Ok for me.

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On Tue, Aug 30, 2011 at 1:00 AM, Alexis Berlemont alexis.berlem...@gmail.com wrote:

- a first version of Julien Delange's ni_660x driver

And also the one for the 670x board, no?

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
Hi,

On Fri, Aug 26, 2011 at 2:34 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote:

Hi,

I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first?

Yes. In my experimental branch, I have a few things which are not that experimental. I would like to push:

- a first version of Julien Delange's ni_660x driver
- Anders Blomdell's fix for duplicate symbols with comedi
- Anders Blomdell's fix in the pcimio driver (wrong IRQ number after reboot)
- some waveform generation tools (fully generic)
- an overhaul of the testing drivers (fake + loop = fake)

I will integrate them in my analogy branch and send a pull request if you are OK with that.

Alexis.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On 2011-08-26 14:34, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? No patches ATM, but [1] is still an open bug - a bug that affects the ABI. Jan [1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343 -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On Fri, 2011-08-26 at 14:34 +0200, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? Thanks in advance for your input. Nothing pending for 2.6, I'm focusing on 3.x now. However let's go for -rc1 first, this is a major release anyway. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On 08/26/2011 03:05 PM, Jan Kiszka wrote:
 On 2011-08-26 14:34, Gilles Chanteperdrix wrote:
 Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first?
 No patches ATM, but [1] is still an open bug - a bug that affects the ABI.
 Jan
 [1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343

I had forgotten about this one. So, the only real problem is when a SCHED_NOTOTHER thread switches to SCHED_OTHER. This appears to be a corner case, so I wonder if you should not simply add special treatment, only for this corner case. What I have in mind is keeping a list of xnsynch objects in kernel-space (this basically means having one more xnholder_t in the xnsynch structure), and when we trip the corner case (a thread with SCHED_FIFO switches to SCHED_OTHER), walking the list to find how many xnsynch objects the thread owns - we have that info in kernel-space - and setting the refcnt accordingly. Or does it still sound overkill?

-- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
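[A minimal sketch of the idea above, assuming one extra holder in xnsynch plus a global queue of claimed objects; the claimq and link names are made up for illustration, while getheadq()/nextq() are the nucleus queue accessors:]

struct xnsynch {
	xnholder_t link;		/* NEW: links this synch into claimq */
	struct xnthread *owner;		/* existing: current owner thread */
	/* ... other existing fields ... */
};

static xnqueue_t claimq;		/* all claimed synch objects */

/* Called under nklock when a SCHED_FIFO thread switches to
 * SCHED_OTHER: recount the synchs it still owns so the
 * refcnt can be resynchronized. */
static int recount_owned_synchs(struct xnthread *thread)
{
	xnholder_t *holder;
	int count = 0;

	for (holder = getheadq(&claimq); holder;
	     holder = nextq(&claimq, holder)) {
		struct xnsynch *synch =
			container_of(holder, struct xnsynch, link);
		if (synch->owner == thread)
			count++;
	}
	return count;
}

[Walking the list only in this corner case would keep the common lock/unlock paths untouched, which is what makes the special treatment cheap.]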
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On 2011-08-26 20:07, Gilles Chanteperdrix wrote: On 08/26/2011 03:05 PM, Jan Kiszka wrote: On 2011-08-26 14:34, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? No patches ATM, but [1] is still an open bug - a bug that affects the ABI. Jan [1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343 I had forgotten about this one. So, the only real problem is if a SCHED_NOTOTHER thread switches to SCHED_OTHER, this appears to be a corner case, so, I wonder if you should not simply add a special treatment, only for this corner case. What I have in mind is keeping a list of xnsynch in kernel-space (this basically means having an xnholder_t more in the xnsynch structure), and when we trip the corner case (thread with SCHED_FIFO switches to SCHED_OTHER), walk the list to find how many xnsynch the thread is the owner, we have that info in kernel-space, and set the refcnt accordingly. Or does it still sound overkill? Mmh, need to think about it. Yeah, we do not support PTHREAD_MUTEX_INITIALIZER, so we do not share that part of the problem with futexes. If we have all objects and can explore ownership, we can also implement robust mutexes this way, i.e. waiter signaling when the owner dies. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On 08/26/2011 08:19 PM, Jan Kiszka wrote: On 2011-08-26 20:07, Gilles Chanteperdrix wrote: On 08/26/2011 03:05 PM, Jan Kiszka wrote: On 2011-08-26 14:34, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? No patches ATM, but [1] is still an open bug - a bug that affects the ABI. Jan [1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343 I had forgotten about this one. So, the only real problem is if a SCHED_NOTOTHER thread switches to SCHED_OTHER, this appears to be a corner case, so, I wonder if you should not simply add a special treatment, only for this corner case. What I have in mind is keeping a list of xnsynch in kernel-space (this basically means having an xnholder_t more in the xnsynch structure), and when we trip the corner case (thread with SCHED_FIFO switches to SCHED_OTHER), walk the list to find how many xnsynch the thread is the owner, we have that info in kernel-space, and set the refcnt accordingly. Or does it still sound overkill? Mmh, need to think about it. Yeah, we do not support PTHREAD_MUTEX_INITIALIZER, so we do not share that part of the problem with futexes. Actually, we could implement PTHREAD_MUTEX_INITIALIZER: when the magic is wrong, just issue a pthread_mutex_init syscall, and try locking again. But the problem is that this particular call to pthread_mutex_lock would be much heavier than locking an initialized mutex for reasons which are not obvious (besides, we would have to handle concurrency by some way, like having a pthread_once_t in pthread_mutex_t). I find not having PTHREAD_MUTEX_INITIALIZER more clear, even if this makes us not really posix compliant. If we have all objects and can explore ownership, we can also implement robust mutexes this way, i.e. waiter signaling when the owner dies. Jan -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
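[For illustration only, the lazy-initialization scheme discussed above could take roughly the following shape in user space; the magic values, the field layout and the CAS-based publication are assumptions, not the POSIX skin's actual code:]

#include <stdint.h>

#define MUTEX_MAGIC	0x4d757478u	/* arbitrary "initialized" marker */
#define MUTEX_BUSY	0x00000001u	/* lazy init in progress */

struct lazy_mutex {
	volatile uint32_t magic;	/* 0 in a static initializer image */
	/* ... kernel-backed mutex state would follow ... */
};

static void issue_init_syscall(struct lazy_mutex *m)
{
	/* stand-in for the pthread_mutex_init syscall mentioned above */
	(void)m;
}

static int lazy_mutex_lock(struct lazy_mutex *m)
{
	if (m->magic != MUTEX_MAGIC) {
		/* exactly one thread wins the right to initialize */
		if (__sync_bool_compare_and_swap(&m->magic, 0, MUTEX_BUSY)) {
			issue_init_syscall(m);
			__sync_synchronize();	/* publish init before magic */
			m->magic = MUTEX_MAGIC;
		} else {
			while (m->magic != MUTEX_MAGIC)
				;	/* lose: wait for the initializer */
		}
	}
	/* ... regular fast-path locking would continue here ... */
	return 0;
}

[The slow path above is what makes the first pthread_mutex_lock() on a statically initialized mutex much heavier than locking an explicitly initialized one, and the wait loop is the concurrency handling alluded to.]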
Re: [Xenomai-core] xenomai-head compile failure
On 08/09/2011 02:51 PM, Daniele Nicolodi wrote: Hello, I'm compiling xenomai-head on i386 debian/testing. I found that the file src/skins/posix/wrappers.c is missing an include of signal.h for the definition of pthread_kill(). Fixed, thanks. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts
On 07/31/2011 06:49 PM, GIT version control wrote: +int rt_puts(const char *s) +{ + return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL); +} gcc for ARM chokes here: it says that NULL can not be converted to a va_list, however I try it. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts
On 2011-07-31 19:21, Gilles Chanteperdrix wrote:
 On 07/31/2011 06:49 PM, GIT version control wrote:
 +int rt_puts(const char *s)
 +{
 +	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +}
 gcc for ARM chokes here: it says that NULL can not be converted to a va_list, however I try it.

Hmm. Does this work?

diff --git a/src/skins/common/rt_print.c b/src/skins/common/rt_print.c
index 186de48..52538d8 100644
--- a/src/skins/common/rt_print.c
+++ b/src/skins/common/rt_print.c
@@ -243,7 +243,9 @@ int rt_printf(const char *format, ...)
 
 int rt_puts(const char *s)
 {
-	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
+	va_list dummy;
+
+	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, dummy);
 }
 
 void rt_syslog(int priority, const char *format, ...)

Not really beautiful as well, I know.

Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts
On 07/31/2011 07:42 PM, Jan Kiszka wrote:
 On 2011-07-31 19:21, Gilles Chanteperdrix wrote:
 On 07/31/2011 06:49 PM, GIT version control wrote:
 +int rt_puts(const char *s)
 +{
 +	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +}
 gcc for ARM chokes here: it says that NULL can not be converted to a va_list, however I try it.

 Hmm. Does this work?

 diff --git a/src/skins/common/rt_print.c b/src/skins/common/rt_print.c
 index 186de48..52538d8 100644
 --- a/src/skins/common/rt_print.c
 +++ b/src/skins/common/rt_print.c
 @@ -243,7 +243,9 @@ int rt_printf(const char *format, ...)
 
  int rt_puts(const char *s)
  {
 -	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +	va_list dummy;
 +
 +	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, dummy);
  }
 
  void rt_syslog(int priority, const char *format, ...)

 Not really beautiful as well, I know.

It seems to work now, but some later version of gcc may decide to warn us that this argument is used without being initialized...

-- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts
On 2011-07-31 19:46, Gilles Chanteperdrix wrote:
 On 07/31/2011 07:42 PM, Jan Kiszka wrote:
 On 2011-07-31 19:21, Gilles Chanteperdrix wrote:
 On 07/31/2011 06:49 PM, GIT version control wrote:
 +int rt_puts(const char *s)
 +{
 +	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +}
 gcc for ARM chokes here: it says that NULL can not be converted to a va_list, however I try it.

 Hmm. Does this work?

 diff --git a/src/skins/common/rt_print.c b/src/skins/common/rt_print.c
 index 186de48..52538d8 100644
 --- a/src/skins/common/rt_print.c
 +++ b/src/skins/common/rt_print.c
 @@ -243,7 +243,9 @@ int rt_printf(const char *format, ...)
 
  int rt_puts(const char *s)
  {
 -	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +	va_list dummy;
 +
 +	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, dummy);
  }
 
  void rt_syslog(int priority, const char *format, ...)

 Not really beautiful as well, I know.

 It seems to work now, but some later version of gcc may decide to warn us that this argument is used without being initialized...

Yes. I've pushed a cleaner version.

Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
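[The cleaner version itself is not shown in the thread; one plausible shape for it, assuming print_to_buffer()'s signature as visible in the patch above, is a variadic trampoline that always hands down a validly initialized va_list:]

#include <stdarg.h>
#include <stdio.h>

/* assumed from the patch context above, including RT_PRINT_MODE_PUTS */
int print_to_buffer(FILE *stream, int priority, int mode,
		    const char *format, va_list args);

static int forward_to_buffer(FILE *stream, int priority, int mode,
			     const char *format, ...)
{
	va_list args;
	int ret;

	va_start(args, format);	/* valid va_list, even with no variadic args */
	ret = print_to_buffer(stream, priority, mode, format, args);
	va_end(args);
	return ret;
}

int rt_puts(const char *s)
{
	return forward_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s);
}

[In puts mode the va_list is never consumed, so its emptiness is harmless, and no warning about uninitialized use can arise.]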
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-15 15:10, Jan Kiszka wrote:
 But... right now it looks like we found our primary regression: "nucleus/shadow: shorten the uninterruptible path to secondary mode". It opens a short window during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more detail.

Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: We reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks; the point is that Xenomai's design wrongly assumes there are no such things. We were lucky so far in that the values saved on this shared stack were apparently compatible, i.e. we were overwriting them with identical or harmless values. But that's no longer true when interrupts are hitting us in the xnpod_suspend_thread path of a relaxing shadow.

Likely the only possible fix is establishing a reschedule hook for Xenomai in the interrupt exit path after the original stack is restored - just like Linux works. Requires changes to both ipipe and Xenomai unfortunately.

Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-16 10:52, Philippe Gerum wrote: On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote: On 2011-07-15 15:10, Jan Kiszka wrote: But... right now it looks like we found our primary regression: nucleus/shadow: shorten the uninterruptible path to secondary mode. It opens a short windows during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more details. Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: We reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks, the point is Xenomai's design wrongly assumes there are no such things. Fortunately, no, this is not a design issue, no such assumption was ever made, but the Xenomai core expects this to be handled on a per-arch basis with the interrupt pipeline. And that's already the problem: If Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one unless you increase the pre-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads. As you pointed out, there is no way to handle this via some generic Xenomai-only support. ppc64 now has separate interrupt stacks, which is why I disabled IRQSTACKS which became the builtin default at some point. Blackfin goes through a Xenomai-defined irq tail handler as well, because it may not reschedule over nested interrupt stacks. How does this arch prevent that xnpod_schedule in the generic interrupt handler tail does its normal work? Fact is that such pending problem with x86_64 was overlooked since day #1 by /me. We were lucky so far that the values saved on this shared stack were apparently compatible, means we were overwriting them with identical or harmless values. But that's no longer true when interrupts are hitting us in the xnpod_suspend_thread path of a relaxing shadow. Makes sense. It would be better to find a solution that does not make the relax path uninterruptible again for a significant amount of time. On low end platforms we support (i.e. non-x86* mainly), this causes obvious latency spots. I agree. Conceptually, the interruptible relaxation should be safe now after recent fixes. Likely the only possible fix is establishing a reschedule hook for Xenomai in the interrupt exit path after the original stack is restored - - just like Linux works. Requires changes to both ipipe and Xenomai unfortunately. __ipipe_run_irqtail() is in the I-pipe core for such purpose. If instantiated properly for x86_64, and paired with xnarch_escalate() for that arch as well, it could be an option for running the rescheduling procedure when safe. Nope, that doesn't work. The stack is switched later in the return path in entry_64.S. We need a hook there, ideally a conditional one, controlled by some per-cpu variable that is set by Xenomai on return from its interrupt handlers to signal the rescheduling need. Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
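[A rough sketch of such a conditional hook; the per-CPU flag, the hook name and its call site are all invented here for illustration:]

/* kernel-side sketch; set from the tail of a Xenomai interrupt
 * handler instead of rescheduling right away over the IRQ stack */
static DEFINE_PER_CPU(int, xeno_resched_pending);

static inline void xeno_note_resched(void)
{
	per_cpu(xeno_resched_pending, raw_smp_processor_id()) = 1;
}

/* hypothetically called from the arch interrupt exit path (e.g.
 * entry_64.S), after the original task stack has been restored */
void ipipe_resched_hook(void)
{
	int cpu = raw_smp_processor_id();

	if (per_cpu(xeno_resched_pending, cpu)) {
		per_cpu(xeno_resched_pending, cpu) = 0;
		xnpod_schedule();	/* now safe: no shared IRQ stack */
	}
}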
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote: On 2011-07-16 10:52, Philippe Gerum wrote: On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote: On 2011-07-15 15:10, Jan Kiszka wrote: But... right now it looks like we found our primary regression: nucleus/shadow: shorten the uninterruptible path to secondary mode. It opens a short windows during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more details. Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: We reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks, the point is Xenomai's design wrongly assumes there are no such things. Fortunately, no, this is not a design issue, no such assumption was ever made, but the Xenomai core expects this to be handled on a per-arch basis with the interrupt pipeline. And that's already the problem: If Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one unless you increase the pre-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads. Don't assume too much. What was done for ppc64 was not meant as a general policy. Again, this is a per-arch decision. As you pointed out, there is no way to handle this via some generic Xenomai-only support. ppc64 now has separate interrupt stacks, which is why I disabled IRQSTACKS which became the builtin default at some point. Blackfin goes through a Xenomai-defined irq tail handler as well, because it may not reschedule over nested interrupt stacks. How does this arch prevent that xnpod_schedule in the generic interrupt handler tail does its normal work? It polls some hw status to know whether a rescheduling would be safe. See xnarch_escalate(). Fact is that such pending problem with x86_64 was overlooked since day #1 by /me. We were lucky so far that the values saved on this shared stack were apparently compatible, means we were overwriting them with identical or harmless values. But that's no longer true when interrupts are hitting us in the xnpod_suspend_thread path of a relaxing shadow. Makes sense. It would be better to find a solution that does not make the relax path uninterruptible again for a significant amount of time. On low end platforms we support (i.e. non-x86* mainly), this causes obvious latency spots. I agree. Conceptually, the interruptible relaxation should be safe now after recent fixes. Likely the only possible fix is establishing a reschedule hook for Xenomai in the interrupt exit path after the original stack is restored - - just like Linux works. Requires changes to both ipipe and Xenomai unfortunately. __ipipe_run_irqtail() is in the I-pipe core for such purpose. If instantiated properly for x86_64, and paired with xnarch_escalate() for that arch as well, it could be an option for running the rescheduling procedure when safe. Nope, that doesn't work. The stack is switched later in the return path in entry_64.S. We need a hook there, ideally a conditional one, controlled by some per-cpu variable that is set by Xenomai on return from its interrupt handlers to signal the rescheduling need. Yes, makes sense. 
The way to make it conditional without dragging bits of Xenomai logic into the kernel innards is not obvious though. It is probably time to officially introduce exo-kernel oriented bits into the Linux thread info. PTDs have too loose semantics to be practical if we want to avoid trashing the I-cache by calling probe hooks within the dual kernel, each time we want to check some basic condition (e.g. resched needed). A backlink to a foreign TCB there would help too. Which leads us to killing the ad hoc kernel threads (and stacks) at some point, which are an absolute pain. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-16 11:56, Philippe Gerum wrote: On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote: On 2011-07-16 10:52, Philippe Gerum wrote: On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote: On 2011-07-15 15:10, Jan Kiszka wrote: But... right now it looks like we found our primary regression: nucleus/shadow: shorten the uninterruptible path to secondary mode. It opens a short windows during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more details. Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: We reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks, the point is Xenomai's design wrongly assumes there are no such things. Fortunately, no, this is not a design issue, no such assumption was ever made, but the Xenomai core expects this to be handled on a per-arch basis with the interrupt pipeline. And that's already the problem: If Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one unless you increase the pre-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads. Don't assume too much. What was done for ppc64 was not meant as a general policy. Again, this is a per-arch decision. Actually, it was the right decision, not only for ppc64: Reusing Linux interrupt stacks for Xenomai does not work. If we interrupt Linux while it is already running over the interrupt stack, the stack becomes taboo on that CPU. From that point on, no RT IRQ must run over the Linux interrupt stack as it would smash it. But then the question is why we should try to use the interrupt stacks for Xenomai at all. It's better to increase the task kernel stacks and disable interrupt stacks when ipipe is enabled. That's what I'm heading for with x86-64 now (THREAD_ORDER 2, no stack switching). What we may do is introducing per-domain interrupt stacks. But that's at best Xenomai 3 / I-pipe 3 stuff. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
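[Roughly what "THREAD_ORDER 2, no stack switching" would amount to on x86-64; an illustrative sketch, the actual patch is not part of this thread:]

/* arch/x86/include/asm/page_64_types.h (sketch) */
#ifdef CONFIG_IPIPE
/* 16k task stacks, large enough to also absorb interrupts that
 * would otherwise run on the separate per-CPU interrupt stack */
#define THREAD_ORDER	2
#else
#define THREAD_ORDER	1
#endif

[With the entry code's irq_stack switch disabled under CONFIG_IPIPE, rescheduling from a Xenomai interrupt handler tail then always happens on the interrupted task's own stack.]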
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/14/2011 10:57 PM, Jan Kiszka wrote: On 2011-07-13 21:12, Gilles Chanteperdrix wrote: On 07/13/2011 09:04 PM, Jan Kiszka wrote: On 2011-07-13 20:39, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. The problem is that the gatekeeper tests the task state without holding the task's rq lock (which is not available to us without a kernel patch). That cannot work reliably as long as we accept signals. That's why I'm trying to move state change and test under nklock. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? TASK_UNINTERRUPTIBLE is not available without patching the kernel's scheduler for the reason mentioned in the comment (the scheduler becomes confused and may pick the wrong tasks, IIRC). Does not using down/up in the taskexit event handler risk to cause the same issue? Yes, and that means the first patch is incomplete without something like the second. But I would refrain from trying to improve the gatekeeper design. I've recently mentioned this to Philippe offlist: For Xenomai 3 with some ipipe v3, we must rather patch schedule() to enable zero-switch domain migration. 
Means: enter the scheduler, let it suspend current and pick another task, but then simply escalate to the RT domain before doing any context switch. That's much cheaper than the current design and hopefully also less error-prone.

So, do you want me to merge your for-upstream branch?

You may merge up to for-upstream^, ie. without any gatekeeper fixes. I strongly suspect that there are still more races in the migration path. The crashes we face even with all patches applied may be related to a shadow task being executed under Linux and Xenomai at the same time.

Maybe we could try the following patch instead?

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 01f4200..deb7620 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -1033,6 +1033,8 @@ redo:
 			xnpod_fatal
 			    ("xnshadow_harden() failed for thread %s[%d]",
 			     thread->name, xnthread_user_pid(thread));
+		down(&sched->gksync);
+		up(&sched->gksync);
 		return -ERESTARTSYS;
 	}

--
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-15 14:30, Gilles Chanteperdrix wrote: On 07/14/2011 10:57 PM, Jan Kiszka wrote: On 2011-07-13 21:12, Gilles Chanteperdrix wrote: On 07/13/2011 09:04 PM, Jan Kiszka wrote: On 2011-07-13 20:39, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } +xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. The problem is that the gatekeeper tests the task state without holding the task's rq lock (which is not available to us without a kernel patch). That cannot work reliably as long as we accept signals. That's why I'm trying to move state change and test under nklock. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? TASK_UNINTERRUPTIBLE is not available without patching the kernel's scheduler for the reason mentioned in the comment (the scheduler becomes confused and may pick the wrong tasks, IIRC). Does not using down/up in the taskexit event handler risk to cause the same issue? Yes, and that means the first patch is incomplete without something like the second. But I would refrain from trying to improve the gatekeeper design. 
I've recently mentioned this to Philippe offlist: For Xenomai 3 with some ipipe v3, we must rather patch schedule() to enable zero-switch domain migration. Means: enter the scheduler, let it suspend current and pick another task, but then simply escalate to the RT domain before doing any context switch. That's much cheaper than the current design and hopefully also less error-prone.

 So, do you want me to merge your for-upstream branch?

 You may merge up to for-upstream^, ie. without any gatekeeper fixes. I strongly suspect that there are still more races in the migration path. The crashes we face even with all patches applied may be related to a shadow task being executed under Linux and Xenomai at the same time.

 Maybe we could try the following patch instead?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..deb7620 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -1033,6 +1033,8 @@ redo:
  			xnpod_fatal
  			    ("xnshadow_harden() failed for thread %s[%d]",
  			     thread->name, xnthread_user_pid(thread));
 +		down(&sched->gksync);
 +		up(&sched->gksync);
  		return -ERESTARTSYS;
  	}

I don't think we need this. But
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-13 21:12, Gilles Chanteperdrix wrote: On 07/13/2011 09:04 PM, Jan Kiszka wrote: On 2011-07-13 20:39, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. The problem is that the gatekeeper tests the task state without holding the task's rq lock (which is not available to us without a kernel patch). That cannot work reliably as long as we accept signals. That's why I'm trying to move state change and test under nklock. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? TASK_UNINTERRUPTIBLE is not available without patching the kernel's scheduler for the reason mentioned in the comment (the scheduler becomes confused and may pick the wrong tasks, IIRC). Does not using down/up in the taskexit event handler risk to cause the same issue? Yes, and that means the first patch is incomplete without something like the second. But I would refrain from trying to improve the gatekeeper design. I've recently mentioned this to Philippe offlist: For Xenomai 3 with some ipipe v3, we must rather patch schedule() to enable zero-switch domain migration. 
Means: enter the scheduler, let it suspend current and pick another task, but then simply escalate to the RT domain before doing any context switch. That's much cheaper than the current design and hopefully also less error-prone. So, do you want me to merge your for-upstream branch? You may merge up to for-upstream^, ie. without any gatekeeper fixes. I strongly suspect that there are still more races in the migration path. The crashes we face even with all patches applied may be related to a shadow task being executed under Linux and Xenomai at the same time. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-13 20:39, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } +xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. The problem is that the gatekeeper tests the task state without holding the task's rq lock (which is not available to us without a kernel patch). That cannot work reliably as long as we accept signals. That's why I'm trying to move state change and test under nklock. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? TASK_UNINTERRUPTIBLE is not available without patching the kernel's scheduler for the reason mentioned in the comment (the scheduler becomes confused and may pick the wrong tasks, IIRC). But I would refrain from trying to improve the gatekeeper design. I've recently mentioned this to Philippe offlist: For Xenomai 3 with some ipipe v3, we must rather patch schedule() to enable zero-switch domain migration. Means: enter the scheduler, let it suspend current and pick another task, but then simply escalate to the RT domain before doing any context switch. That's much cheaper than the current design and hopefully also less error-prone. 
Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
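[As a purely conceptual sketch, the zero-switch migration idea would hook the scheduler's tail; every name below is invented, and a real implementation would have to respect rq locking and the pipeline state:]

/* inside a hypothetical schedule() tail, after pick_next_task()
 * but before any context switch */
	if (unlikely(task_wants_rt_domain(prev))) {
		/*
		 * prev just suspended itself in order to harden;
		 * instead of switching to next and letting the
		 * gatekeeper resume prev later, escalate to the RT
		 * domain right here, so prev continues under the
		 * Xenomai scheduler with zero Linux context switches.
		 */
		escalate_to_head_domain(prev);
		return;
	}
	context_switch(rq, prev, next);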
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/13/2011 09:04 PM, Jan Kiszka wrote: On 2011-07-13 20:39, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. The problem is that the gatekeeper tests the task state without holding the task's rq lock (which is not available to us without a kernel patch). That cannot work reliably as long as we accept signals. That's why I'm trying to move state change and test under nklock. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? TASK_UNINTERRUPTIBLE is not available without patching the kernel's scheduler for the reason mentioned in the comment (the scheduler becomes confused and may pick the wrong tasks, IIRC). Does not using down/up in the taskexit event handler risk to cause the same issue? But I would refrain from trying to improve the gatekeeper design. I've recently mentioned this to Philippe offlist: For Xenomai 3 with some ipipe v3, we must rather patch schedule() to enable zero-switch domain migration. Means: enter the scheduler, let it suspend current and pick another task, but then simply escalate to the RT domain before doing any context switch. 
That's much cheaper than the current design and hopefully also less error-prone. So, do you want me to merge your for-upstream branch? -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Wed, 2011-07-13 at 20:39 +0200, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. What worries me is the comment in xnshadow_harden: * gatekeeper sent us to primary mode. Since * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking * the runqueue's count of uniniterruptible tasks, we just * notice the issue and gracefully fail; the caller will have * to process this signal anyway. */ Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden? Second interpretation is correct. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:

 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
  	magic = xnthread_get_magic(thread);
 
  	xnlock_get_irqsave(&nklock, s);
 +
 +	gksched = thread->gksched;
 +	if (gksched) {
 +		xnlock_put_irqrestore(&nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption. And surely no nklock is held.

 Furthermore, I do not understand how we synchronize with the gatekeeper; how is the gatekeeper guaranteed to wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it, thus we wait for the gatekeeper to become idle again. While it is idle, we reset the queued reference - but I just realized that this may trample on other tasks' values. I need to add a check that the value to be null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed while gksync is zero - but then we won't get hold of it anyway and, thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand, what you want is to set gktarget to null if a task being hardened is destroyed. But by waiting for the semaphore, you actually wait for the harden to be complete, so setting to NULL is useless. Or am I missing something else?

 Setting to NULL is probably unneeded but still better than relying on the gatekeeper never waking up spuriously and then dereferencing a stale pointer. The key element of this fix is waiting on gksync, thus on the completion of the non-RT part of the hardening. Actually, this part usually fails as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test if (target ...) in the gatekeeper code will always be true, because at this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an updated gktarget, ie. spuriously. Better safe than sorry, this is hairy code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper is woken up by the thread which is going to be hardened (and this thread is the one which waits for the semaphore).

 All true. And what is the point?

The point being, would not something like this patch be sufficient?

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 01f4200..4742c02 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct task_struct *p)
 	magic = xnthread_get_magic(thread);
 
 	xnlock_get_irqsave(&nklock, s);
+	if (xnthread_test_info(thread, XNATOMIC)) {
+		struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));
+		xnlock_put_irqrestore(&nklock, s);
+
+		/* Thread is in flight to primary mode, wait for the
+		   gatekeeper to be done with it. */
+		down(&gksched->gksync);
+		up(&gksched->gksync);
+
+		xnlock_get_irqsave(&nklock, s);
+	}
+
 	/* Prevent wakeup call from xnshadow_unmap(). */
 	xnshadow_thrptd(p) = NULL;
 	xnthread_archtcb(thread)->user_task = NULL;

-- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:

 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
  	magic = xnthread_get_magic(thread);
 
  	xnlock_get_irqsave(&nklock, s);
 +
 +	gksched = thread->gksched;
 +	if (gksched) {
 +		xnlock_put_irqrestore(&nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption. And surely no nklock is held.

 Furthermore, I do not understand how we synchronize with the gatekeeper; how is the gatekeeper guaranteed to wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it, thus we wait for the gatekeeper to become idle again. While it is idle, we reset the queued reference - but I just realized that this may trample on other tasks' values. I need to add a check that the value to be null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed while gksync is zero - but then we won't get hold of it anyway and, thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand, what you want is to set gktarget to null if a task being hardened is destroyed. But by waiting for the semaphore, you actually wait for the harden to be complete, so setting to NULL is useless. Or am I missing something else?

 Setting to NULL is probably unneeded but still better than relying on the gatekeeper never waking up spuriously and then dereferencing a stale pointer. The key element of this fix is waiting on gksync, thus on the completion of the non-RT part of the hardening. Actually, this part usually fails as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test if (target ...) in the gatekeeper code will always be true, because at this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an updated gktarget, ie. spuriously. Better safe than sorry, this is hairy code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper is woken up by the thread which is going to be hardened (and this thread is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct task_struct *p)
  	magic = xnthread_get_magic(thread);
 
  	xnlock_get_irqsave(&nklock, s);
 +	if (xnthread_test_info(thread, XNATOMIC)) {
 +		struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

That's not reliable, the task might have been migrated by Linux in the meantime. We must use the stored gksched.

 +		xnlock_put_irqrestore(&nklock, s);
 +
 +		/* Thread is in flight to primary mode, wait for the
 +		   gatekeeper to be done with it. */
 +		down(&gksched->gksync);
 +		up(&gksched->gksync);
 +
 +		xnlock_get_irqsave(&nklock, s);
 +	}
 +
  	/* Prevent wakeup call from xnshadow_unmap(). */
  	xnshadow_thrptd(p) = NULL;
  	xnthread_archtcb(thread)->user_task = NULL;

Again, setting gktarget to NULL and testing for NULL is simply safer, and I see no gain in skipping that. But if you prefer the micro-optimization, I'll drop it.

Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 09:22 AM, Jan Kiszka wrote:
 On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:

 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
  	magic = xnthread_get_magic(thread);
 
  	xnlock_get_irqsave(&nklock, s);
 +
 +	gksched = thread->gksched;
 +	if (gksched) {
 +		xnlock_put_irqrestore(&nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption. And surely no nklock is held.

 Furthermore, I do not understand how we synchronize with the gatekeeper; how is the gatekeeper guaranteed to wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it, thus we wait for the gatekeeper to become idle again. While it is idle, we reset the queued reference - but I just realized that this may trample on other tasks' values. I need to add a check that the value to be null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed while gksync is zero - but then we won't get hold of it anyway and, thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand, what you want is to set gktarget to null if a task being hardened is destroyed. But by waiting for the semaphore, you actually wait for the harden to be complete, so setting to NULL is useless. Or am I missing something else?

 Setting to NULL is probably unneeded but still better than relying on the gatekeeper never waking up spuriously and then dereferencing a stale pointer. The key element of this fix is waiting on gksync, thus on the completion of the non-RT part of the hardening. Actually, this part usually fails as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test if (target ...) in the gatekeeper code will always be true, because at this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an updated gktarget, ie. spuriously. Better safe than sorry, this is hairy code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper is woken up by the thread which is going to be hardened (and this thread is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct task_struct *p)
  	magic = xnthread_get_magic(thread);
 
  	xnlock_get_irqsave(&nklock, s);
 +	if (xnthread_test_info(thread, XNATOMIC)) {
 +		struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

 That's not reliable, the task might have been migrated by Linux in the meantime. We must use the stored gksched.

 +		xnlock_put_irqrestore(&nklock, s);
 +
 +		/* Thread is in flight to primary mode, wait for the
 +		   gatekeeper to be done with it. */
 +		down(&gksched->gksync);
 +		up(&gksched->gksync);
 +
 +		xnlock_get_irqsave(&nklock, s);
 +	}
 +
  	/* Prevent wakeup call from xnshadow_unmap(). */
  	xnshadow_thrptd(p) = NULL;
  	xnthread_archtcb(thread)->user_task = NULL;

 Again, setting gktarget to NULL and testing for NULL is simply safer,

From my point of view, testing for NULL is misleading dead code, since it will never happen.

-- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 09:22 AM, Jan Kiszka wrote: [...] Again, setting gktarget to NULL and testing for NULL is simply safer, and I see no gain in skipping that. But if you prefer the micro-optimization, I'll drop it.

Could not we use an info bit instead of adding a pointer?

-- 
Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 12:59, Gilles Chanteperdrix wrote: [...] Could not we use an info bit instead of adding a pointer?

That's not reliable, the task might have been migrated by Linux in the meantime. We must use the stored gksched.

Jan
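To make "the stored gksched" concrete, here is a sketch of both sides under assumed names (gkwaitq and the wakeup call are illustrative, not the verbatim Xenomai API): the hardening path records which CPU's gatekeeper takes the reference, so the exit path never re-derives it from task_cpu(p), which Linux may have changed by migrating the task.

/* Harden side (sketch): record the gatekeeper before handing over. */
struct xnsched *sched = xnpod_current_sched();

down(&sched->gksync);		/* wait until that gatekeeper is idle */
thread->gksched = sched;	/* the "stored gksched" */
sched->gktarget = thread;
xnthread_set_info(thread, XNATOMIC);
wake_up_interruptible(&sched->gkwaitq);
schedule();			/* gatekeeper resumes us in primary mode */

/* Exit side (sketch): use the stored pointer, never task_cpu(p). */
gksched = thread->gksched;
if (gksched) {
	down(&gksched->gksync);	/* barrier: wait for gatekeeper idle */
	up(&gksched->gksync);
}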
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 01:00 PM, Jan Kiszka wrote: [...] That's not reliable, the task might have been migrated by Linux in the meantime. We must use the stored gksched.

I mean add another info bit to mean that the task is queued for wakeup by the gatekeeper. XNGKQ, or something.

-- 
Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 13:04, Gilles Chanteperdrix wrote: [...] I mean add another info bit to mean that the task is queued for wakeup by the gatekeeper. XNGKQ, or something.

What additional value does it provide to gksched != NULL? We need that pointer anyway to identify the gatekeeper that holds a reference.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 01:06 PM, Jan Kiszka wrote: [...] What additional value does it provide to gksched != NULL? We need that pointer anyway to identify the gatekeeper that holds a reference.

No, the scheduler which holds the reference is xnpod_sched_slot(task_cpu(p))

--
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 13:56, Jan Kiszka wrote: However, this parallel unsynchronized execution of the gatekeeper and its target thread leaves an increasingly bad feeling on my side. Did we really catch all corner cases now? I wouldn't guarantee that yet. Specifically as I still have an obscure crash of a Xenomai thread on Linux schedule() on my table.

What if the target thread woke up due to a signal, continued much further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the gatekeeper continued? I wish we could already eliminate this complexity and do the migration directly inside schedule()...

BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in the gatekeeper? What would happen if we included it (state == (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 01:58 PM, Jan Kiszka wrote: [...] BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in the gatekeeper? What would happen if we included it (state == (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

I would tend to think that what we should check is xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible state and the XNATOMIC info bit.

-- 
Gilles.
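Sketched, the combined test suggested here would make the gatekeeper's condition look roughly like this (whether these two tests alone are sufficient is exactly what the following messages question):

struct task_struct *p = xnthread_user_task(target);

/* Only complete the migration if the Linux task still sleeps
 * interruptibly (modulo the TASK_ATOMICSWITCH marker) AND is still
 * flagged as in flight to primary mode. */
if ((p->state & ~TASK_ATOMICSWITCH) == TASK_INTERRUPTIBLE &&
    xnthread_test_info(target, XNATOMIC)) {
	/* safe (modulo the races discussed below) to resume the shadow */
	xnpod_resume_thread(target, XNRELAX);
}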
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 14:06, Gilles Chanteperdrix wrote: [...] I would tend to think that what we should check is xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible state and the XNATOMIC info bit.

Actually, neither the info bits nor the task state is sufficiently synchronized against the gatekeeper yet. We need to hold a shared lock when testing and resetting the state. I'm not sure yet if that is fixable given the gatekeeper architecture.

Jan
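To spell out the window Jan means, here is a sketch of the racy interleaving, followed by the locked re-test that would close it on the gatekeeper side, assuming the signal path clears XNATOMIC under the same nklock:

/*
 * Racy interleaving without a shared lock (sketch):
 *
 *   gatekeeper CPU                     signal delivery CPU
 *   --------------                     -------------------
 *   reads task state: INTERRUPTIBLE
 *                                      signal wakes the target task;
 *                                      it runs on, maybe migrates,
 *                                      blocks INTERRUPTIBLE elsewhere
 *   xnpod_resume_thread(target)   <-   resumes a shadow whose Linux
 *                                      side is no longer parked in
 *                                      the hardening path
 */
xnlock_get_irqsave(&nklock, s);
/* Re-test under nklock: do_sigwake_event() clears XNATOMIC under the
 * same lock, so a signaled target can no longer be resumed here. */
if (xnthread_test_info(target, XNATOMIC))
	xnpod_resume_thread(target, XNRELAX);
xnlock_put_irqrestore(&nklock, s);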
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Tue, 2011-07-12 at 14:57 +0200, Jan Kiszka wrote: [...] Actually, neither the info bits nor the task state is sufficiently synchronized against the gatekeeper yet. We need to hold a shared lock when testing and resetting the state. I'm not sure yet if that is fixable given the gatekeeper architecture.

This may work (on top of the exit-race fix):

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 50dcf43..90feb16 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -913,20 +913,27 @@ static int gatekeeper_thread(void *data)
 		if ((xnthread_user_task(target)->state & ~TASK_ATOMICSWITCH)
 		    == TASK_INTERRUPTIBLE) {
 			rpi_pop(target);
 			xnlock_get_irqsave(&nklock, s);
-#ifdef CONFIG_SMP
+
 			/*
-			 * If the task changed its CPU while in
-			 * secondary mode, change the CPU of the
-			 * underlying Xenomai shadow too. We do not
-			 * migrate the thread timers here, it would
-			 * not work. For a full migration comprising
-			 * timers, using xnpod_migrate_thread is
-			 * required.
+			 * Recheck XNATOMIC to avoid waking the shadow if the
+			 * Linux task received a signal meanwhile.
 			 */
-			if (target->sched != sched)
-				xnsched_migrate_passive(target, sched);
+			if (xnthread_test_info(target, XNATOMIC)) {
+#ifdef CONFIG_SMP
+				/*
+				 * If the task changed its CPU while in
+				 * secondary mode, change the CPU of the
+				 * underlying Xenomai shadow too. We do not
+				 * migrate the thread timers here, it would
+				 * not work. For a full migration comprising
+				 * timers, using xnpod_migrate_thread is
+				 * required.
+				 */
+				if (target->sched != sched)
+					xnsched_migrate_passive(target, sched);
 #endif /* CONFIG_SMP */
-			xnpod_resume_thread(target, XNRELAX);
+				xnpod_resume_thread(target, XNRELAX);
+			}
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
@@ -1036,6 +1043,7 @@ redo:
 	 * to process this signal anyway.
 	 */
 	if (rthal_current_domain == rthal_root_domain) {
+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
 		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
 		    || this_task->state != TASK_RUNNING))
 			xnpod_fatal
@@ -1044,6 +1052,8 @@ redo:
 		return -ERESTARTSYS;
 	}
 
+	xnthread_clear_info(thread, XNATOMIC);
+
 	/* current is now running into the Xenomai domain. */
 	thread->gksched = NULL;
 	sched = xnsched_finish_unlocked_switch(thread->sched);
@@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct *p)
 
 	xnlock_get_irqsave(&nklock, s);
 
+	xnthread_clear_info(thread, XNATOMIC);
+
 	if ((p->ptrace & PT_PTRACED) &&
 	    !xnthread_test_state(thread, XNDEBUG)) {
 		sigset_t pending;

It totally ignores RPI and PREEMPT_RT for now.

RPI is broken anyway, I want to drop RPI in v3 for sure because it is misleading people. I'm still pondering whether we should do that earlier during the 2.6 timeframe. Ripping it out would allow to use solely XNATOMIC as condition in the gatekeeper. /me is now looking to get
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 17:48, Philippe Gerum wrote: [...] RPI is broken anyway, I want to drop RPI in v3 for sure because it is misleading people. I'm still pondering whether we should do that earlier during the 2.6 timeframe.

That would only leave us with XNATOMIC being used under PREEMPT-RT for signaling LO_GKWAKE_REQ on schedule out while my patch may clear it on signal
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 02:57 PM, Jan Kiszka wrote:

 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
@@ -1036,6 +1043,7 @@ redo:
 	 * to process this signal anyway.
 	 */
 	if (rthal_current_domain == rthal_root_domain) {
+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));

Misleading dead code again, XNATOMIC is cleared not ten lines above.

 		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
 		    || this_task->state != TASK_RUNNING))
 			xnpod_fatal
@@ -1044,6 +1052,8 @@ redo:
 		return -ERESTARTSYS;
 	}
 
+	xnthread_clear_info(thread, XNATOMIC);

Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is.

 	/* current is now running into the Xenomai domain. */
 	thread->gksched = NULL;
 	sched = xnsched_finish_unlocked_switch(thread->sched);
@@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct *p)
 
 	xnlock_get_irqsave(&nklock, s);
 
+	xnthread_clear_info(thread, XNATOMIC);
+

Ditto.

-- 
Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 19:31, Gilles Chanteperdrix wrote: [...]

+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));

Misleading dead code again, XNATOMIC is cleared not ten lines above.

Nope, I forgot to remove that line.

+	xnthread_clear_info(thread, XNATOMIC);

Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is.

Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (i.e. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/12/2011 07:34 PM, Jan Kiszka wrote: [...] Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (i.e. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated.

OK for adding the XNATOMIC test, because it improves the robustness, but why change the way XNATOMIC is set and cleared? Chances of breaking things while changing code in this area are really high...

-- 
Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-12 19:38, Gilles Chanteperdrix wrote: [...] OK for adding the XNATOMIC test, because it improves the robustness, but why change the way XNATOMIC is set and cleared? Chances of breaking things while changing code in this area are really high...

The current code is (most probably) broken as it does not properly synchronize the gatekeeper against a signaled and runaway target Linux task. We need an indication whether a Linux signal will wake up (or already has woken up) the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared.

Jan
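Reduced to its essence (surrounding ptrace handling elided), the signal-side half of that indication is the do_sigwake_event() hunk from the patch above:

static inline void do_sigwake_event(struct task_struct *p)
{
	struct xnthread *thread = xnshadow_thread(p);
	spl_t s;

	xnlock_get_irqsave(&nklock, s);

	/* A Linux signal is about to wake this task: invalidate any
	 * migration in flight so the gatekeeper's XNATOMIC re-test
	 * fails instead of resuming a runaway shadow. */
	xnthread_clear_info(thread, XNATOMIC);

	/* ... existing ptrace/XNDEBUG handling continues here ... */

	xnlock_put_irqrestore(&nklock, s);
}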
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/08/2011 06:29 PM, GIT version control wrote:

@@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
 	magic = xnthread_get_magic(thread);
 
 	xnlock_get_irqsave(&nklock, s);
+
+	gksched = thread->gksched;
+	if (gksched) {
+		xnlock_put_irqrestore(&nklock, s);

Are we sure irqs are on here? Are you sure that what is needed is not an xnlock_clear_irqon?

Furthermore, I do not understand how we synchronize with the gatekeeper; how is the gatekeeper guaranteed to wait for this assignment?

-- 
Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-11 20:53, Gilles Chanteperdrix wrote: [...] Are we sure irqs are on here? Are you sure that what is needed is not an xnlock_clear_irqon?

We are in the context of do_exit. Not only are IRQs on, but also preemption. And surely no nklock is held.

Furthermore, I do not understand how we synchronize with the gatekeeper; how is the gatekeeper guaranteed to wait for this assignment?

The gatekeeper holds the gksync token while it's active. We request it, thus we wait for the gatekeeper to become idle again. While it is idle, we reset the queued reference - but I just realized that this may trample on other tasks' values. I need to add a check that the value to be nulled is actually still ours.

Jan
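The down()+up() pair described here is the classic semaphore barrier idiom: take the token only to prove the worker is idle, then hand it straight back. A self-contained userspace illustration using POSIX semaphores (not the kernel semaphore API, and the sleep-based sequencing is purely for demonstration):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t gksync;		/* 1 = gatekeeper idle, 0 = busy */

static void *gatekeeper(void *arg)
{
	sem_wait(&gksync);	/* become active: hold the token */
	puts("gatekeeper: hardening target");
	sleep(1);		/* stands in for the migration work */
	puts("gatekeeper: idle again");
	sem_post(&gksync);	/* release the token */
	return NULL;
}

int main(void)
{
	pthread_t gk;

	sem_init(&gksync, 0, 1);
	pthread_create(&gk, NULL, gatekeeper, NULL);
	usleep(100000);		/* crude: let the gatekeeper grab the token */

	/* Exit-path barrier: down + up waits for the gatekeeper to go
	 * idle without holding it off afterwards. */
	sem_wait(&gksync);
	sem_post(&gksync);
	puts("exit path: gatekeeper is idle, safe to tear down");

	pthread_join(gk, NULL);
	return 0;
}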
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-11 21:10, Jan Kiszka wrote: [...] While it is idle, we reset the queued reference - but I just realized that this may trample on other tasks' values. I need to add a check that the value to be nulled is actually still ours.

Thinking again, that's actually not a problem: gktarget is only needed while gksync is zero - but then we won't get hold of it anyway and, thus, can't cause any damage.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/11/2011 09:16 PM, Jan Kiszka wrote: On 2011-07-11 21:10, Jan Kiszka wrote:

(...)

Thinking again, that's actually not a problem: gktarget is only needed while gksync is zero - but then we won't get hold of it anyway and, thus, can't cause any damage.

Well, you make it look like it does not work. From what I understand, what you want is to set gktarget to NULL if a task being hardened is destroyed. But by waiting for the semaphore, you actually wait for the hardening to be complete, so setting it to NULL is useless. Or am I missing something else?

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-11 21:51, Gilles Chanteperdrix wrote:

(...)

Well, you make it look like it does not work. From what I understand, what you want is to set gktarget to NULL if a task being hardened is destroyed. But by waiting for the semaphore, you actually wait for the hardening to be complete, so setting it to NULL is useless. Or am I missing something else?

Setting it to NULL is probably unneeded, but still better than relying on the gatekeeper never waking up spuriously and then dereferencing a stale pointer. The key element of this fix is waiting on gksync, thus on the completion of the non-RT part of the hardening. Actually, this part usually fails, as the target task received a termination signal at this point.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/11/2011 09:59 PM, Jan Kiszka wrote:

(...)

Setting it to NULL is probably unneeded, but still better than relying on the gatekeeper never waking up spuriously and then dereferencing a stale pointer. The key element of this fix is waiting on gksync, thus on the completion of the non-RT part of the hardening. Actually, this part usually fails, as the target task received a termination signal at this point.

Yes, but since you wait on the completion of the hardening, the test if (target ...) in the gatekeeper code will always be true, because at this point the cleanup code will still be waiting for the semaphore.

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-11 22:02, Gilles Chanteperdrix wrote:

(...)

Yes, but since you wait on the completion of the hardening, the test if (target ...) in the gatekeeper code will always be true, because at this point the cleanup code will still be waiting for the semaphore.

Yes, except if we ever wake up the gatekeeper later on without an updated gktarget, i.e. spuriously. Better safe than sorry, this is hairy code anyway (hopefully obsolete one day).

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 07/11/2011 10:06 PM, Jan Kiszka wrote:

(...)

Yes, except if we ever wake up the gatekeeper later on without an updated gktarget, i.e. spuriously. Better safe than sorry, this is hairy code anyway (hopefully obsolete one day).

The gatekeeper is not woken up by posting the semaphore; the gatekeeper is woken up by the thread which is going to be hardened (and this thread is the one which waits for the semaphore).

-- Gilles.
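Restated as code, a simplified model of the wakeup protocol as described in this thread; gkwaitq, try_migrate and the loop structure are assumptions made for illustration, and the real nucleus code orders these steps more carefully:

    /* The gatekeeper sleeps on its wait queue, not on gksync. */
    static int gatekeeper_thread(void *arg)
    {
            struct xnsched *sched = arg;

            for (;;) {
                    up(&sched->gksync);  /* idle: others may proceed */
                    wait_event_interruptible(sched->gkwaitq,
                                             sched->gktarget != NULL);
                    down(&sched->gksync); /* busy: others must wait */
                    try_migrate(sched->gktarget); /* hypothetical */
                    sched->gktarget = NULL;
            }
            return 0;
    }

    /* The to-be-hardened thread wakes the gatekeeper itself, then
     * schedules away; further gksync synchronization is elided. */
    static void harden(struct xnthread *thread, struct xnsched *sched)
    {
            sched->gktarget = thread;
            set_current_state(TASK_INTERRUPTIBLE);
            wake_up_interruptible_sync(&sched->gkwaitq);
            schedule(); /* resumes in primary mode on success */
    }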
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On 2011-07-11 22:09, Gilles Chanteperdrix wrote:

(...)

The gatekeeper is not woken up by posting the semaphore; the gatekeeper is woken up by the thread which is going to be hardened (and this thread is the one which waits for the semaphore).

All true. And what is the point?

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context
On 2011-06-28 23:29, Gilles Chanteperdrix wrote: On 06/28/2011 11:01 PM, GIT version control wrote:

Module: xenomai-jki Branch: for-upstream Commit: 5597470d84584846875e8a35309e6302c768addf URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5597470d84584846875e8a35309e6302c768addf Author: Jan Kiszka jan.kis...@siemens.com Date: Tue Jun 28 22:10:07 2011 +0200

nucleus: Allow drop_u_mode syscall from any context

xnshadow_sys_drop_u_mode already checks if the caller is a shadow. It does that without issuing a warning message if the check fails - in contrast to do_hisyscall_event. As user space may call this cleanup service even for non-shadow threads (e.g. after shadow creation failed), we had better silence this warning.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Jan, I have a branch here which allocates u_mode in the shared heap, so this syscall is about to become unnecessary. See: http://git.xenomai.org/?p=xenomai-gch.git;a=shortlog;h=refs/heads/u_mode

Even better. When do you plan to merge all this? I'd like to finally fix the various MPS fastlock breakages, specifically as they overlap with other issues there.

Jan
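The behaviour the commit relies on, sketched; the function body is paraphrased from the commit message rather than copied from the tree, and the return value is an assumption:

    static int xnshadow_sys_drop_u_mode(void)
    {
            struct xnthread *thread = xnshadow_thread(current);

            /*
             * Silently ignore callers that never became shadows,
             * e.g. user space cleaning up after a failed shadow
             * creation -- no warning, unlike do_hisyscall_event.
             */
            if (thread == NULL)
                    return -EPERM;

            /* ... drop the per-thread u_mode mapping ... */
            return 0;
    }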
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context
On 06/29/2011 09:06 AM, Jan Kiszka wrote: On 2011-06-28 23:29, Gilles Chanteperdrix wrote:

(...)

Jan, I have a branch here which allocates u_mode in the shared heap, so this syscall is about to become unnecessary. See: http://git.xenomai.org/?p=xenomai-gch.git;a=shortlog;h=refs/heads/u_mode

Even better. When do you plan to merge all this? I'd like to finally fix the various MPS fastlock breakages, specifically as they overlap with other issues there.

I would like to give Philippe a chance to have a look at it before it is merged (especially the commit using an Adeos ptd). Normally, the MPS fastlock issues are solved in this branch.

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context
On 2011-06-29 09:25, Gilles Chanteperdrix wrote:

(...)

I would like to give Philippe a chance to have a look at it before it is merged (especially the commit using an Adeos ptd).

OK.

Normally, the MPS fastlock issues are solved in this branch.

Only the previously discussed leak. MPS-disabled is still oopsing, and error cleanup is also broken - but not only for MPS. I'll base my fixes on top of your branch.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context
On 06/28/2011 11:01 PM, GIT version control wrote:

Module: xenomai-jki Branch: for-upstream Commit: 5597470d84584846875e8a35309e6302c768addf URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5597470d84584846875e8a35309e6302c768addf Author: Jan Kiszka jan.kis...@siemens.com Date: Tue Jun 28 22:10:07 2011 +0200

nucleus: Allow drop_u_mode syscall from any context

xnshadow_sys_drop_u_mode already checks if the caller is a shadow. It does that without issuing a warning message if the check fails - in contrast to do_hisyscall_event. As user space may call this cleanup service even for non-shadow threads (e.g. after shadow creation failed), we had better silence this warning.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Jan, I have a branch here which allocates u_mode in the shared heap, so this syscall is about to become unnecessary. See: http://git.xenomai.org/?p=xenomai-gch.git;a=shortlog;h=refs/heads/u_mode

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-19 17:41, Gilles Chanteperdrix wrote:

Merged your whole branch, but took the liberty to change it a bit (replacing the commit concerning unlocked context switches with comment changes only, and changing the commit about xntbase_tick).

What makes splmax() redundant for the unlocked context switch case? IMO that bug is still present. We can clean up xnintr_clock_handler a bit after the changes; I will follow up with a patch.

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 06/20/2011 06:43 PM, Jan Kiszka wrote:

(...)

What makes splmax() redundant for the unlocked context switch case? IMO that bug is still present.

No, the bug is between my keyboard and chair. On architectures with unlocked context switches, the Linux task switch still happens with irqs off; only the mm switch happens with irqs on.

-- Gilles.
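A sketch of the split Gilles describes, for an architecture with unlocked (preemptible) context switches; switch_mm is the regular kernel helper, while do_switch_registers and the wrapper itself are hypothetical names:

    static void unlocked_switch_sketch(struct task_struct *prev,
                                       struct task_struct *next)
    {
            /*
             * The expensive part -- switching the mm, with its cache
             * and TLB work on some architectures -- may run with hard
             * irqs enabled...
             */
            switch_mm(prev->active_mm, next->mm, next);

            /*
             * ...but the register/stack switch itself must not be
             * interrupted: registers are spilled relative to the SP,
             * then the SP is changed, non-atomically.
             */
            local_irq_disable_hw();
            do_switch_registers(prev, next); /* hypothetical */
            local_irq_enable_hw();
    }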
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-20 19:33, Gilles Chanteperdrix wrote:

(...)

No, the bug is between my keyboard and chair. On architectures with unlocked context switches, the Linux task switch still happens with irqs off; only the mm switch happens with irqs on.

Then why do we call xnlock_get_irqsave in xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are off anyway?

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 06/20/2011 09:38 PM, Jan Kiszka wrote:

(...)

Then why do we call xnlock_get_irqsave in xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are off anyway?

Because of the Xenomai task switch, not the Linux task switch.

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-20 21:41, Gilles Chanteperdrix wrote:

(...)

Because of the Xenomai task switch, not the Linux task switch.

--verbose please.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 06/20/2011 09:41 PM, Jan Kiszka wrote:

(...)

--verbose please.

There are two kinds of task switches: switches between Linux tasks, handled by the Linux kernel function/macro/inline asm switch_to(), and switches between Xenomai tasks, handled by the function/macro/inline asm xnarch_switch_to(). Since a Linux kernel context switch may still be interrupted by a (primary mode) interrupt which could decide to switch context, it cannot happen with interrupts enabled, due to the way it works (it spills the registers to a place relative to the SP, then changes the SP, non-atomically). Xenomai context switches run no such risk, so they may happen entirely with irqs on.

In case of relax, the two halves of the context switch are not of the same kind. The first half is a Xenomai switch, but the second half is the epilogue of a Linux context switch (which, by the way, is why we need to skip all the housekeeping in __xnpod_schedule in that case, and also why we go to all the pain of keeping the two kinds of context switch compatible). Hence, even on machines with unlocked context switches, irqs are off at this point.

Hope this is more clear.

-- Gilles.
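In other words, the switch tail cannot assume a fixed irq state, which is why _irqsave is used. A sketch of that decision only (the body is abridged, xnpod_current_sched is used as the scheduler accessor, and the caller is assumed to release nklock; not the exact nucleus code):

    static struct xnsched *finish_unlocked_switch_sketch(void)
    {
            spl_t s;

            /*
             * Reached either from a pure Xenomai switch tail (irqs
             * possibly still on) or from the Linux switch epilogue on
             * relax (irqs off): _irqsave covers both cases, a plain
             * xnlock_get would not.
             */
            xnlock_get_irqsave(&nklock, s);

            /* The CPU may have changed across the switch: re-read. */
            return xnpod_current_sched();
    }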
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-20 21:51, Gilles Chanteperdrix wrote:

(...)

There are two kinds of task switches (...) Hence, even on machines with unlocked context switches, irqs are off at this point. Hope this is more clear.

That's all clear. But it's unclear how this maps to our key question. Can you point out where in those paths irqs are disabled again (after entering xnarch_switch_to) and left off, for each of the unlocked switching archs? I'm still skeptical that the need for disabled irqs during the thread switch on some archs also leads to unconditionally disabled hard irqs when returning from xnarch_switch_to. But even if that is all the case today, we had better set this requirement in stone:

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index f2fc2ab..c4c5807 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2273,6 +2273,8 @@ reschedule:
    xnpod_switch_to(sched, prev, next);

+   XENO_BUGON(NUCLEUS, !irqs_disabled_hw());
+
 #ifdef CONFIG_XENO_OPT_PERVASIVE
    /*
    * Test whether we transitioned from primary mode to secondary

[ just demonstrating, would require some cleanup ]

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 06/20/2011 10:41 PM, Jan Kiszka wrote:

xnarch_switch_to is the central entry point for everyone. It may decide to branch to switch_to or __switch_to, or it simply handles everything on its own, depending on the arch.

No, the Linux kernel does not know anything about xnarch_switch_to, so the schedule() function continues to use switch_to happily. xnarch_switch_to is only used to switch from xnthread_t to xnthread_t, by __xnpod_schedule(). Now, that some architecture (namely x86) decides that xnarch_switch_to should use switch_to (or more likely an inner __switch_to) when the xnthread_t has a non-NULL user_task member is an implementation detail.

Can you point out where in those paths irqs are disabled again (after entering xnarch_switch_to)

They are not disabled again after xnarch_switch_to; they are disabled when starting switch_to.

and left off for each of the unlocked switching archs? (...) But even if that is all the case today, we had better set this requirement in stone: (...) + XENO_BUGON(NUCLEUS, !irqs_disabled_hw());

You misunderstand me: only after the second-half context switch in the case of xnshadow_relax are the interrupts disabled, because this second half-switch started as a switch_to and not an xnarch_switch_to, i.e. started as:

    #define switch_to(prev, next, last)                              \
    do {                                                             \
            local_irq_disable_hw_cond();                             \
            last = __switch_to(prev, task_thread_info(prev),         \
                               task_thread_info(next));              \
            local_irq_enable_hw_cond();                              \
    } while (0)

(on ARM, for instance). But that is true, we could assert this in the shadow epilogue case.

-- Gilles.
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-20 22:52, Gilles Chanteperdrix wrote:

(...)

No, the Linux kernel does not know anything about xnarch_switch_to, so the schedule() function continues to use switch_to happily. (...) They are not disabled again after xnarch_switch_to; they are disabled when starting switch_to. (...) You misunderstand me: only after the second-half context switch in the case of xnshadow_relax are the interrupts disabled, because this second half-switch started as a switch_to and not an xnarch_switch_to (...) (on ARM, for instance).

OK, that's now clear, thanks.

But that is true, we could assert this in the shadow epilogue case.

I've queued a patch.

Jan
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 06/18/2011 03:58 PM, Jan Kiszka wrote: On 2011-06-18 15:12, Gilles Chanteperdrix wrote: On 06/18/2011 03:07 PM, Jan Kiszka wrote: On 2011-06-18 14:56, Gilles Chanteperdrix wrote: On 06/18/2011 02:10 PM, Jan Kiszka wrote: On 2011-06-18 14:09, Gilles Chanteperdrix wrote: On 06/18/2011 12:21 PM, Jan Kiszka wrote: On 2011-06-17 20:55, Gilles Chanteperdrix wrote: On 06/17/2011 07:03 PM, Jan Kiszka wrote: On 2011-06-17 18:53, Gilles Chanteperdrix wrote: On 06/17/2011 04:38 PM, GIT version control wrote:

Module: xenomai-jki Branch: for-upstream Commit: 7203b1a66ca0825d5bcda1c3abab9ca048177914 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=7203b1a66ca0825d5bcda1c3abab9ca048177914 Author: Jan Kiszka jan.kis...@siemens.com Date: Fri Jun 17 09:46:19 2011 +0200

nucleus: Fix interrupt handler tails

Our current interrupt handlers assume that they leave over the same task and CPU they entered. But commit f6af9b831c broke this assumption: xnpod_schedule invoked from the handler tail can now actually trigger a domain migration, and that can also include a CPU migration. This causes subtle corruptions, as invalid xnstat_exectime_t objects may be restored and - even worse - we may improperly flush XNHTICK of the old CPU, leaving Linux timer-wise dead there (as happened to us). Fix this by moving the XNHTICK replay and exectime accounting before the scheduling point. Note that this introduces a tiny imprecision in the accounting.

I am not sure I understand why moving the XNHTICK replay is needed: if we switch to secondary mode, the HTICK is handled by xnpod_schedule anyway, or am I missing something?

The replay can work on an invalid sched (after CPU migration in secondary mode). We could reload the sched, but just moving the replay is simpler.

But does it not remove the purpose of this delayed replay?

Hmm, yes, in the corner case of a coalesced timed RT task wakeup and host tick over a root thread. Well, then we actually have to reload sched and keep the ordering to catch that as well.

Note that if you want to reload the sched, you also have to shut interrupts off, because upon return from xnpod_schedule after migration, interrupts are on.

That would be another severe bug if we left an interrupt handler with hard IRQs enabled - the interrupt tail code of ipipe would break. Fortunately, only xnpod_suspend_thread re-enables IRQs and returns. xnpod_schedule also re-enables but then terminates the context (in xnshadow_exit). So we are safe.

I do not think we are, at least on platforms where context switches happen with irqs on.

Can you sketch a problematic path?

On platforms with IPIPE_WANT_PREEMPTIBLE_SWITCH on, all context switches happen with irqs on. So, in particular, the context switch to a relaxed task happens with irqs on. In __xnpod_schedule, we then return from xnpod_switch_to with irqs on, and so return from __xnpod_schedule with irqs on.

    /* We are returning to xnshadow_relax via xnpod_suspend_thread,
       do nothing, xnpod_suspend_thread will re-enable interrupts. */

Looks like this comment is outdated. I think we had best fix this in __xnpod_schedule by disabling irqs there instead of forcing otherwise redundant disabling into all handler return paths.

I agree. I've queued a corresponding patch, and also one to clean up that special handshake between xnshadow_relax and xnpod_suspend_thread a bit. Consider both as RFC.
Merged your whole branch, but took the liberty to change it a bit (replacing the commit concerning unlocked context switches with comment changes only, and changing the commit about xntbase_tick).

-- Gilles.
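The direction agreed on in the quoted exchange - letting __xnpod_schedule itself restore a hard-irqs-off state before returning into an interrupt handler tail - could be sketched like this; irqs_disabled_hw and local_irq_disable_hw are the I-pipe primitives already referenced in this thread, and the actual queued patch may differ:

    /* Tail of __xnpod_schedule(), sketched. */
    xnpod_switch_to(sched, prev, next);

    /*
     * On IPIPE_WANT_PREEMPTIBLE_SWITCH architectures a purely
     * Xenomai switch may return here with hard irqs enabled, but
     * interrupt handler tails rely on them being off again.
     */
    if (!irqs_disabled_hw())
            local_irq_disable_hw();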
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails
On 2011-06-17 20:55, Gilles Chanteperdrix wrote: On 06/17/2011 07:03 PM, Jan Kiszka wrote: On 2011-06-17 18:53, Gilles Chanteperdrix wrote: On 06/17/2011 04:38 PM, GIT version control wrote:

Module: xenomai-jki Branch: for-upstream Commit: 7203b1a66ca0825d5bcda1c3abab9ca048177914 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=7203b1a66ca0825d5bcda1c3abab9ca048177914 Author: Jan Kiszka jan.kis...@siemens.com Date: Fri Jun 17 09:46:19 2011 +0200

nucleus: Fix interrupt handler tails

Our current interrupt handlers assume that they leave over the same task and CPU they entered. But commit f6af9b831c broke this assumption: xnpod_schedule invoked from the handler tail can now actually trigger a domain migration, and that can also include a CPU migration. This causes subtle corruptions, as invalid xnstat_exectime_t objects may be restored and - even worse - we may improperly flush XNHTICK of the old CPU, leaving Linux timer-wise dead there (as happened to us). Fix this by moving the XNHTICK replay and exectime accounting before the scheduling point. Note that this introduces a tiny imprecision in the accounting.

I am not sure I understand why moving the XNHTICK replay is needed: if we switch to secondary mode, the HTICK is handled by xnpod_schedule anyway, or am I missing something?

The replay can work on an invalid sched (after CPU migration in secondary mode). We could reload the sched, but just moving the replay is simpler.

But does it not remove the purpose of this delayed replay?

Hmm, yes, in the corner case of a coalesced timed RT task wakeup and host tick over a root thread. Well, then we actually have to reload sched and keep the ordering to catch that as well.

Note that if you want to reload the sched, you also have to shut interrupts off, because upon return from xnpod_schedule after migration, interrupts are on.

That would be another severe bug if we left an interrupt handler with hard IRQs enabled - the interrupt tail code of ipipe would break. Fortunately, only xnpod_suspend_thread re-enables IRQs and returns. xnpod_schedule also re-enables but then terminates the context (in xnshadow_exit). So we are safe.

Jan
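For reference, the reordering the commit performs in the clock interrupt tail, sketched. The statements are abridged, testbits/xnintr_host_tick/xnstat_exectime_switch are assumed helper names, and only the ordering constraint is the point:

    /* Tail of xnintr_clock_handler(), sketched. */

    /*
     * Replay the host tick and account exec time first, while sched
     * still refers to the CPU this handler entered on.
     */
    if (testbits(sched->status, XNHTICK))
            xnintr_host_tick(sched);
    xnstat_exectime_switch(sched, prev_account); /* illustrative */

    /*
     * Only then reschedule: this may migrate the current task to
     * another CPU (or domain), invalidating the sched reference.
     */
    xnpod_schedule();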