Re: [Xenomai-core] [PATCH] shared irqs v.3
Hi Jan,

As lighter may mean that reducing the structure size also reduces the number of used cache lines, it might be a good idea. The additional complexity for entry removal is negligible.

My current working version is already lighter when it comes to the size of additional data structures: it is implemented via a one-way linked list instead of xnqueue_t. This way it's 3 times lighter for UP and 2 times for SMP systems. I'll try to post it later today. The only remaining problem is a compilation issue that I should fix before posting, namely: it looks like some code in ksrc/nucleus (e.g. intr.c) is compiled both for kernel mode (of course) and for user mode (maybe for the UVM, though I haven't looked at it thoroughly yet). A link to ksrc/nucleus is created in the src/ directory. Both the IPIPE_NR_IRQS macro and the rthal_critical_enter/exit() calls are undefined when intr.c is compiled for the user-mode side. That's why it so far contains those __IPIPE_NR_IRQS and external int rthal_critical_enter/exit() definitions. I hope that also answers your other question later in this mail.

Believe it or not, I have considered different ways to guarantee that a passed cookie param is valid (xnintr_detach() has not deleted it) and remains so while xnintr_irq_handler() is running. And there are some obstacles there... I'll post them later if someone is interested, since I'm short of time now :)

... I'm interested...

Ok. So I will have at least one reader :) Actually, I still hope to find a solution that makes use of the recently extended ipipe interface as it was supposed to be used (then there is no need for any per-irq xnshirqs array in intr.c). Otherwise, I have to admit that my recent work on that ipipe extension (I can say it since I made it) is of no big avail. Maybe together we will find a solution. That code is also compiled for the user-mode side, and the originals are not available.
So consider it a temp solution for test purposes; I guess it's easily fixable. test/shirq.c is a test module. SHIRQ_VECTOR must be one used by Linux; e.g. I have used 12, which is used by the trackball device.

I haven't tried your code yet, but in preparing a real scenario I stumbled over a problem in my serial driver regarding IRQ sharing: in case you want to use xeno_16550A for ISA devices with shared IRQs, an iteration over the list of registered handlers would be required /until/ no device reports that it handled something. This is required so that the IRQ line gets released for a while and the system obtains a chance to detect a new /edge/-triggered IRQ - an ISA oddity. That's the way most serial drivers work, but they do it internally. So the question arose for me whether this edge-specific handling shouldn't be moved to the nucleus as well (so that I don't have to fix my 16550A ;)).

Brrr... frankly speaking, I haven't understood this clearly yet, so I don't want to make pure speculations. Probably I have to take a look at the xeno_16550A driver keeping your words in mind.

Another optimisation idea, which I once also realised in my own shared IRQ wrapper, is to use specialised trampolines at the nucleus level, i.e. to not apply the full sharing logic with its locking and list iterations for non-shared IRQs. What do you think? Worth it? Might be, when the ISA/edge handling adds further otherwise unneeded overhead.

Yep, maybe. But let's take something working first..

Jan

--
Best regards,
Dmitry Adamushko

___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] Scheduling while atomic
Hm. When I remove the output() from both tasks, all seems fine. Jeroen.
Re: [Xenomai-core] Scheduling while atomic
Hold on. Just crashed without the file access: please disregard last post. Jeroen.
[Xenomai-core] Scheduling while atomic
Gilles,

I cannot reproduce those messages after turning nucleus debugging on. Instead, I now either get relatively more failing mutexes or even hard lockups with the test program I sent to you. If the computer didn't crash, dmesg contains 3 Xenomai messages relating to a task being moved to the secondary domain after exception #14. As for when the computer crashes: I have written down the last kernel panic message on paper. Please tell me if you also want the addresses or (part of) the call stack. I'm still wondering if there's a programming error in the mutex test program. After I sent my previous message, and before I turned nucleus debugging on, I managed (by reducing the sleep times to max. 5.0e4) to fatally crash the computer, while it spewed out countless 'scheduling while atomic' messages. Is the mutex error reproducible? Tomorrow I'll try the patch.

lostage_handler + e/33a
rthal_apc_handler + 3b/46
lostage_handler + 190/33a
rthal_apc_handler + 3b/46
__ipipe_sync_stage + 2a1/2bc
mark_offset_tsc + c1/456
__ipipe_sync_stage + 2a9/2bc
ipipe_unstall_pipeline_from + 189/194 (might be 181/194)
xnpod_delete_thread + ba1/bc3
mcount + 23/2a
taskexit_event + 4f/6c
__ipipe_dispatch_event + 90/173
do_exit + 10f/604
sys_exit + 8/14
syscall_call + 7/b
next_thread + 0/15
syscall_call + 7/b
0
Kernel panic - not syncing: Fatal Exception in interrupt

Thanks for investigating,
Jeroen.
Re: [Xenomai-core] Scheduling while atomic
Jeroen Van den Keybus wrote: Gilles, I cannot reproduce those messages after turning nucleus debugging on. Instead, I now either get relatively more failing mutexes or even hard lockups with the test program I sent to you. If the computer didn't crash, dmesg contains 3 Xenomai messages relating to a task being moved to the secondary domain after exception #14. As for when the computer crashes: I have written down the last kernel panic message on paper. Please tell me if you also want the addresses or (part of) the call stack.

Could you try adding a call to mlockall(MCL_CURRENT|MCL_FUTURE)? Also note that you do not need to protect accesses to file descriptors with rt_mutexes. stdio streams are protected with pthread mutexes, and pthread mutex functions cause thread migration to secondary mode. And Unix file descriptors are passed to system calls, which also cause migration to secondary mode.

--
Gilles Chanteperdrix.
[Xenomai-core] Initialization of a nucleus pod
Hello,

I am trying to understand how a pod is initialized. I think it starts with xncore_attach() from core.c, then xnpod_init() from pod.c. But here (in xnpod_init) something is not clear to me about the root thread creation: it uses xnthread_init(&sched->rootcb, ...), but I don't see where sched->rootcb is initialized. Maybe I don't understand how it works: the pod doesn't have its own code to run, and this thread will in fact be replaced by the task to run?

I hope my question is clear enough.

Thanks,
Germain
[Xenomai-core] [PATCH] Shared irqs v.5
Hello Jan,

as I promised earlier today, here is the patch. hehe.. more comments later, together with the other explanations I promised. I have to go now since I have to make a trip and my bus is leaving in 45 minutes :)

--
Best regards,
Dmitry Adamushko

shirq-v4.patch Description: Binary data
[Xenomai-core] Re: --enable-linux-build
Gilles Chanteperdrix wrote: Jan Kiszka wrote: Hi Gilles, I just tested your new build option. Maybe I'm using it the wrong way, but I stumbled over two quirks:

o make install-nodev fails as it tries to install the kernel without being root. Actually, I only wanted to install the user-space part; how can I do this separately? Or is this rather a use case for the standard build?

I did not think about this case. Any idea of what would be better? Not installing the kernel when running make install-nodev? Creating install-nokernel and install-nokernel-nodev targets?

I would suggest make install-user instead of make install-nodev, combining both -nodev and -nokernel, i.e. excluding everything that requires root permissions.

o On every make, the prepare-kernel script is executed - intentionally? Maybe it would be better to provide a dedicated make target to trigger the update.

prepare-kernel should be executed whenever any file or directory is added in the ksrc and include dirs. On my own machine, prepare-kernel is much shorter than the kernel build. So, I did not see this as an issue, but I am ready to accept any better solution. Maybe it could depend on maintainer mode? Since user space will only pick up an added file or directory automatically if maintainer mode is enabled.

Yes, this looks good - as long as it is still run on the first make or during configure. The point is that I have a non-developer use case for your build mode in mind where you do not constantly add files to the xeno code base, but reconfigure your kernel from time to time: we have a mini distribution here which can optionally be built from source. In that case, you could soon decide to build

( ) Vanilla kernel
(*) Xenomai-extended kernel and libraries

instead of

( ) Vanilla kernel
(*) Xenomai-extended kernel
[*] Xenomai libraries

(not to speak about what currently happens in the background...) Could make life of the maintainers and users here a bit easier.
Thanks for caring, Jan
[Xenomai-core] Scheduling while atomic
Hello,

Apparently, the code I shared with Gilles never made it to this forum. Anyway, the issue I'm having here is really a problem, and it might be useful if some of you could try it out or comment on it. I might be making a silly programming error here, but the result is invariably erroneous operation or kernel crashes.

The program creates a file dump.txt and has two independent threads trying to access it and write a one or a zero there. Inside the writing routine, which is accessed by both threads, a check is made to see if the access is really locked. In my setup, I have tons of ALERTs popping up with this program, meaning that something is wrong with my use of the mutex. Could anyone please check a) whether it is correctly written and b) whether it fails as well on their machine? That would allow me to focus my efforts on the Xenomai setup (which I keep frozen for the moment, in order to keep a possible bug predictable) or on my own programming. A second example is also included, which tries to achieve the same goal with a semaphore (initialized to 1). That seems to work, but under heavy load (tmax = 1.0e7), the kernel crashes.

Kernel: 2.6.15
Adeos: 1.1-03
gcc: 4.0.2
Ipipe tracing enabled

TIA

Jeroen.

/* TEST_MUTEX.C */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <math.h>
#include <values.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/sem.h>

int fd, err;
RT_MUTEX m;
RT_SEM s;
float tmax = 1.0e7;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return r;
}

void output(char c)
{
    static int cnt = 0;
    int n;
    char buf[2];
    RT_MUTEX_INFO mutexinfo;

    buf[0] = c;
    if (cnt == 80) {
        buf[1] = '\n';
        n = 2;
        cnt = 0;
    } else {
        n = 1;
        cnt++;
    }
    CHECK(rt_mutex_inquire(&m, &mutexinfo));
    if (mutexinfo.lockcnt <= 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (lockcnt=%d) Offending task: %s\n",
                mutexinfo.lockcnt, taskinfo.name);
    }
    if (write(fd, buf, n) != n) {
        fprintf(stderr, "File write error.\n");
        CHECK(rt_sem_v(&s));
    }
}

void task0(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX));
        CHECK(rt_mutex_lock(&m, TM_INFINITE));
        output('0');
        CHECK(rt_mutex_unlock(&m));
    }
}

void task1(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX));
        CHECK(rt_mutex_lock(&m, TM_INFINITE));
        output('1');
        CHECK(rt_mutex_unlock(&m));
    }
}

void sighandler(int arg)
{
    CHECK(rt_sem_v(&s));
}

int main(int argc, char *argv[])
{
    RT_TASK t, t0, t1;

    if ((fd = open("dump.txt", O_CREAT | O_TRUNC | O_WRONLY, 0644)) < 0)
        fprintf(stderr, "File open error.\n");
    else {
        if (argc == 2) {
            tmax = atof(argv[1]);
            if (tmax == 0.0)
                tmax = 1.0e7;
        }
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            printf("mlockall() error.\n");
        CHECK(rt_task_shadow(&t, "main", 1, T_FPU));
        CHECK(rt_timer_start(TM_ONESHOT));
        CHECK(rt_mutex_create(&m, "mutex"));
        CHECK(rt_sem_create(&s, "sem", 0, S_PRIO));
        signal(SIGINT, sighandler);
        CHECK(rt_task_create(&t0, "task0", 0, 30, T_FPU));
        CHECK(rt_task_start(&t0, task0, NULL));
        CHECK(rt_task_create(&t1, "task1", 0, 29, T_FPU));
        CHECK(rt_task_start(&t1, task1, NULL));
        printf("Running for %.2f seconds.\n", (float)MAXLONG/1.0e9);
        CHECK(rt_sem_p(&s, MAXLONG));
        signal(SIGINT, SIG_IGN);
        CHECK(rt_task_delete(&t1));
        CHECK(rt_task_delete(&t0));
        CHECK(rt_sem_delete(&s));
        CHECK(rt_mutex_delete(&m));
        rt_timer_stop();
        close(fd);
    }
    return 0;
}

/* TEST_SEM.C */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <math.h>
#include <values.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/sem.h>

int fd, err;
RT_SEM s, m;
float tmax = 1.0e9;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return r;
}

void output(char c)
{
    static int cnt = 0;
    int n;
    char buf[2];
    RT_SEM_INFO seminfo;

    buf[0] = c;
    if (cnt == 80) {
        buf[1] = '\n';
        n = 2;
        cnt = 0;
    } else {
        n = 1;
        cnt++;
    }
    CHECK(rt_sem_inquire(&m, &seminfo));
    if (seminfo.count != 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (count=%ld) Offending task: %s\n",
                seminfo.count, taskinfo.name);
    }
    if (write(fd, buf, n) != n) {
        fprintf(stderr, "File write error.\n");
        CHECK(rt_sem_v(&s));
    }
}

void task0(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX));
        CHECK(rt_sem_p(&m, TM_INFINITE));
        output('0');
        CHECK(rt_sem_v(&m));
    }
}

void task1(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX));
        CHECK(rt_sem_p(&m, TM_INFINITE));
        output('1');
        CHECK(rt_sem_v(&m));
    }
}

void sighandler(int arg)
{
Re: [Xenomai-core] Scheduling while atomic
Jan Kiszka wrote: ... [Update] While writing this mail and letting your test run for a while, I *did* get a hard lock-up. Hold on, digging deeper... And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020 c012e564 0022 0246 c30d1a90 c4866ce0 0033 c482 c482a360 c4866ca0 c48293a4 c48524e1 0002
Call Trace:
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c482] e100_hw_init+0x3ad/0xa81 [e100]
 [c48524e1] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [c4856946] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [c4a09b29] rt_sem_p+0xa6/0x10a [xeno_native]
 [c4a03c62] __rt_sem_p+0x5d/0x66 [xeno_native]
 [c485b207] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c010b3ea] __ipipe_syscall_root+0x53/0xbe
 [c01029c0] system_call+0x20/0x41

Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082, sig=0, prev=gatekeeper/0[809])

CPU  PID  PRI  TIMEOUT  STAT      NAME
  0    0   30  0        00500080  ROOT
  0  864   30  0        00300180  task0
  0  865   29  0        00300288  task1
  0  863    1  0        00300082  main

Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500 c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001 c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80
Call Trace:
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c0266ed5] schedule+0x3ef/0x5ed
 [c485a27c] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [c485a316] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [c010dd8b] default_wake_function+0x0/0x12
 [c0124fbc] kthread+0x68/0x95
 [c0124f54] kthread+0x0/0x95
 [c0100d71] kernel_thread_helper+0x5/0xb

Any bells already ringing? Will try Gilles' patch now...

Jan
Re: [Xenomai-core] Scheduling while atomic
Jan Kiszka wrote: Jan Kiszka wrote: ... [Update] While writing this mail and letting your test run for a while, I *did* get a hard lock-up. Hold on, digging deeper... And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020 c012e564 0022 0246 c30d1a90 c4866ce0 0033 c482 c482a360 c4866ca0 c48293a4 c48524e1 0002
Call Trace:
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c482] e100_hw_init+0x3ad/0xa81 [e100]
 [c48524e1] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [c4856946] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [c4a09b29] rt_sem_p+0xa6/0x10a [xeno_native]
 [c4a03c62] __rt_sem_p+0x5d/0x66 [xeno_native]
 [c485b207] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c010b3ea] __ipipe_syscall_root+0x53/0xbe
 [c01029c0] system_call+0x20/0x41

Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082, sig=0, prev=gatekeeper/0[809])

CPU  PID  PRI  TIMEOUT  STAT      NAME
  0    0   30  0        00500080  ROOT
  0  864   30  0        00300180  task0
  0  865   29  0        00300288  task1
  0  863    1  0        00300082  main

Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500 c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001 c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80
Call Trace:
 [c012e564] __ipipe_dispatch_event+0x56/0xdd
 [c0266ed5] schedule+0x3ef/0x5ed
 [c485a27c] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [c485a316] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [c010dd8b] default_wake_function+0x0/0x12
 [c0124fbc] kthread+0x68/0x95
 [c0124f54] kthread+0x0/0x95
 [c0100d71] kernel_thread_helper+0x5/0xb

Any bells already ringing? Will try Gilles' patch now...

Nope, this didn't help. Ok, this is migration magic. Someone around who hacks this part blindly?

Jan
Re: [Xenomai-core] Scheduling while atomic
Jeroen Van den Keybus wrote: Interesting, when writing to 2 different files, I get the same crashes. Will test with only one task/fd.

File ops don't matter for me. I took them out of task0/1, and I still got the crashes. (BTW, this may explain the difference in the backtrace you reported privately.)

Jan - now really leaving...
Re: [Xenomai-core] Scheduling while atomic
Jan Kiszka wrote: [...] Do you (or anybody else) have a running 2.0.x installation? If so, please test that setup as well.

Sure :-)

# uname -r
2.6.13.4-adeos-xenomai
# cat /proc/xenomai/version
2.0
# ./mutex
Running for 2.15 seconds.
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
L121: Connection timed out.
# cat dump.txt
101001001010101011000110001[...]
# ./sem
Running for 2.15 seconds.
L119: Connection timed out.
# cat dump.txt
101001muon:/home/xenomai/atomic#

More tests?

Best regards,
Hannes.