Re: [Xenomai-core] Cannot end interrupt from user space: add rt_intr_end call ?
Hi Philippe,

thank you for the patch!

> xnarch_end_irq() basically calls the ->unmask() method of the interrupt chip descriptor, which is the same as calling rt_intr_enable().

If I read the sources correctly, for edge and simple interrupts xnarch_end_irq() calls nothing, and for percpu it calls ->eoi(). For level, fasteoi, and demux, it calls ->unmask(). So calling rt_intr_enable() from user space would not always do the same as xnarch_end_irq() from the kernel, would it?

Tomas

Philippe Gerum wrote:
> Tomas Kalibera wrote:
>> Hi, I think that when I handle interrupts from user space, I cannot correctly use I_NOAUTOENA. The thing is that this flag in fact means "do not call xnarch_end_irq automatically". The xnarch_end_irq call usually maps to unmasking the interrupt, but not always, depending on the interrupt type (sometimes it is an EOI, sometimes a no-op). I was thinking that it would be nice if I could call something like xnarch_end_irq (i.e. rt_intr_end) from user space, so that I could correctly use I_NOAUTOENA to control the flow of interrupts.
>
> What would this buy you? xnarch_end_irq() would still handle the unmasking logic depending on the interrupt type, because it knows how the interrupt was acknowledged in the first place -- in contrast, the application does not and should not.
>
> xnarch_end_irq() basically calls the ->unmask() method of the interrupt chip descriptor, which is the same as calling rt_intr_enable(). Before you do that, you may want to try the attached patch, which makes sure that rt_intr_enable/disable are eagerly routed to unmask/mask on x86 for post-2.6.18 kernels. That patch is expected to solve the rt_intr_disable() not masking IO-APIC interrupt issue we discussed earlier.
>
>> Cheers, Tomas

___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
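The per-type behaviour described in this message can be sketched in plain C. This is only a user-space model, not Xenomai or I-pipe code: the names `flow_type`, `end_action`, `model_end_irq()` and `enable_matches_end()` are hypothetical, and only the mapping itself (edge and simple do nothing, percpu issues an EOI, level, fasteoi and demux unmask) is taken from the message.

```c
/* Hypothetical flow types, named after the handler kinds listed in the thread. */
enum flow_type {
    FLOW_EDGE, FLOW_SIMPLE, FLOW_PERCPU,
    FLOW_LEVEL, FLOW_FASTEOI, FLOW_DEMUX
};

/* What "ending" an interrupt amounts to in this model. */
enum end_action { END_NOP, END_EOI, END_UNMASK };

/* Model of the dispatch as Tomas reads it from the sources. */
enum end_action model_end_irq(enum flow_type flow)
{
    switch (flow) {
    case FLOW_EDGE:
    case FLOW_SIMPLE:
        return END_NOP;     /* nothing is called */
    case FLOW_PERCPU:
        return END_EOI;     /* ->eoi() */
    case FLOW_LEVEL:
    case FLOW_FASTEOI:
    case FLOW_DEMUX:
        return END_UNMASK;  /* ->unmask() */
    }
    return END_NOP;
}

/* An unconditional "enable" always unmasks, so in this model it is
 * equivalent to ending the interrupt only for the unmask cases. */
int enable_matches_end(enum flow_type flow)
{
    return model_end_irq(flow) == END_UNMASK;
}
```

Under these assumptions, `enable_matches_end()` returns 0 for edge, simple and percpu interrupts, which is the mismatch the question is about.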
[Xenomai-core] Cannot end interrupt from user space: add rt_intr_end call ?
Hi,

I think that when I handle interrupts from user space, I cannot use I_NOAUTOENA correctly. This flag in fact means "do not call xnarch_end_irq automatically". The xnarch_end_irq call usually maps to unmasking the interrupt, but not always: depending on the interrupt type, it is sometimes an EOI and sometimes a no-op. It would be nice if I could call something like xnarch_end_irq (i.e. rt_intr_end) from user space, so that I could use I_NOAUTOENA correctly to control the flow of interrupts.

Cheers, Tomas
[Xenomai-core] xnintr* calls from user space tasks
Hi,

the API documentation says that xnintr* calls (like xnintr_init, _enable, _disable, and others) can be called from user-space tasks. Is this a bug in the documentation? Or is it somehow possible?

Thanks, Tomas
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:
> Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?
>
> Looking at the code, I think I found a bug, but I doubt it could cause a lockup. The definition of VM_PINNED in include/linux/mm.h collides with another bit used by Linux, so this definition should be changed from:
> #define VM_PINNED 0x0800
> to:
> #define VM_PINNED 0x1000
>
> Here comes a 6th patch for this bug (patch 6 includes patch 5).

I've tested the 6th patch; the lockup is still there. As far as I can observe, it behaves exactly as with the 5th patch.

Tomas
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hmm, I checked again and did not find a mistake in the experiment (I used neither the old binary nor the old sources). I'm doing a clean build from scratch again, so that we can be absolutely sure. I can then run memtest on the machine...

Tomas

Gilles Chanteperdrix wrote:
> On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera [EMAIL PROTECTED] wrote:
>> OK, no change with this patch compared to the previous situation. The system boots, but hangs without a stack trace when I run my Xenomai task. SysRq is blocked; now even SysRq-kill did not work, only SysRq-boot did.
>
> Are you sure you did not keep the stuff in highmem_32.c?
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hi Gilles,

I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.

Tomas

Gilles Chanteperdrix wrote:
> On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera [EMAIL PROTECTED] wrote:
>> OK, no change with this patch compared to the previous situation. The system boots, but hangs without a stack trace when I run my Xenomai task. SysRq is blocked; now even SysRq-kill did not work, only SysRq-boot did.
>
> Are you sure you did not keep the stuff in highmem_32.c?
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:
> Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?

The lock-up with patch 5 happens before fork is called, but after a real-time task is started by the program. I don't know more than that right now; I'd have to add more logging. If I run under strace, the lock-up does not happen. Thinking about that, it can be a bug in my program. If I understand the concept of Xenomai correctly, I can just write a real-time task that would starve the Linux kernel indefinitely, correct? My program definitely does have bugs, so I'll do more debugging. The lock-up does NOT happen with latency. But the bug in the kernel without patch 5 (the one that led to a stack trace after the call to fork) did not appear with latency, either.

> Looking at the code, I think I found a bug, but I doubt it could cause a lockup. The definition of VM_PINNED in include/linux/mm.h collides with another bit used by Linux, so this definition should be changed from:
> #define VM_PINNED 0x0800
> to:
> #define VM_PINNED 0x1000
>
> I will now try, if possible, to reproduce the bug on an x86 box of mine and will keep you informed.

Thanks! I'll indeed build a kernel with patch 6, test again, and test my application.

Tomas
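The kind of flag collision Gilles mentions can be checked mechanically. The sketch below is only an illustration: the two VM_PINNED values are the ones quoted in the thread, but `EXISTING_FLAGS`, the `_OLD`/`_NEW` suffixes and `flag_collides()` are hypothetical and do not reflect the real set of VM_* flags in include/linux/mm.h.

```c
/* Hypothetical mask of bits already in use; NOT the real VM_* flag set. */
#define EXISTING_FLAGS 0x0fffUL

#define VM_PINNED_OLD 0x0800UL  /* value quoted in the thread as colliding */
#define VM_PINNED_NEW 0x1000UL  /* value quoted in the thread as the fix */

/* Returns nonzero when 'flag' reuses a bit that is already set in 'used'. */
int flag_collides(unsigned long flag, unsigned long used)
{
    return (flag & used) != 0;
}

/* Compile-time guard (C11): the build fails if the new flag overlaps. */
_Static_assert((VM_PINNED_NEW & EXISTING_FLAGS) == 0,
               "new flag must not overlap bits already in use");
```

With the assumed mask, `flag_collides(VM_PINNED_OLD, EXISTING_FLAGS)` is nonzero and `flag_collides(VM_PINNED_NEW, EXISTING_FLAGS)` is zero.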
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
I added a missing underscore and retried, and none of the debug messages was printed. I added another one to make sure that there is not a problem with getting printk messages to the serial console. The resulting highmem_32.c and the output are attached.

T

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> Tomas Kalibera wrote:
>>> Crashed on the very same line as before
>>> Tomas
>>
>> Ok. Let us look for unbalanced kmap_atomics then. Try this patch instead.
>
> Just when I hit the reply button, I realize that I forgot something. So, try this one instead.

#include <linux/highmem.h>
#include <linux/module.h>

static struct {
	const char *file;
	unsigned line;
} last_km_user0 [NR_CPUS];

void *kmap(struct page *page)
{
	might_sleep();
	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);
}

void kunmap(struct page *page)
{
	if (in_interrupt())
		BUG();
	if (!PageHighMem(page))
		return;
	kunmap_high(page);
}

/*
 * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
 * no global lock is needed and because the kmap code must perform a global TLB
 * invalidation when the kmap pool wraps.
 *
 * However when holding an atomic kmap it is not legal to sleep, so atomic
 * kmaps are appropriate for short, tight code paths only.
 */
void *_kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot,
			const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
	pagefault_disable();

	if (!PageHighMem(page))
		return page_address(page);

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0) {
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		} else {
			printk("type is NOT KM_USER0\n");
		}
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}
	set_pte(kmap_pte-idx, mk_pte(page, prot));
	arch_flush_lazy_mmu_mode();

	return (void *)vaddr;
}

void *_kmap_atomic(struct page *page, enum km_type type,
		   const char *file, unsigned line)
{
	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
}

void kunmap_atomic(void *kvaddr, enum km_type type)
{
	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();

	/*
	 * Force other mappings to Oops if they'll try to access this pte
	 * without first remap it. Keeping stale mappings around is a bad idea
	 * also, in case the page changes cacheability attributes or becomes
	 * a protected page in a hypervisor.
	 */
	if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
		kpte_clear_flush(kmap_pte-idx, vaddr);
	else {
#ifdef CONFIG_DEBUG_HIGHMEM
		BUG_ON(vaddr < PAGE_OFFSET);
		BUG_ON(vaddr >= (unsigned long)high_memory);
#endif
	}

	if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = NULL;
		last_km_user0[smp_processor_id()].line = 0;
	}

	arch_flush_lazy_mmu_mode();
	pagefault_enable();
}

/* This is the same as kmap_atomic() but can map memory that doesn't
 * have a struct page associated with it.
 */
void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
		       const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0)
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}
	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
	arch_flush_lazy_mmu_mode();

	return (void*) vaddr;
}

struct page *kmap_atomic_to_page(void *ptr)
{
	unsigned long idx, vaddr = (unsigned long)ptr;
	pte_t *pte;

	if (vaddr < FIXADDR_START)
		return virt_to_page(ptr);

	idx = virt_to_fix(vaddr);
	pte = kmap_pte - (idx - FIX_KMAP_BEGIN);
	return pte_page(*pte);
}

EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(_kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
EXPORT_SYMBOL(kmap_atomic_to_page);

[ 255.285392] [ cut here ]
[ 255.289992] kernel BUG at arch/x86/mm/highmem_32.c:56!
[ 255.295107] invalid opcode: [#1] PREEMPT SMP
[ 255.299901] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[ 255.327057]
[ 255.328538] Pid: 4986, comm: ovmtask Not tainted (2.6.24.3xenomaip3 #2)
[ 255.335123] EIP: 0060:[c011a966] EFLAGS: 00010286 CPU: 0
[ 255.340588] EIP is at _kmap_atomic_prot+0xa6/0x120
[ 255.345356] EAX: 0027 EBX: c2b27520 ECX: EDX
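The debugging idea in the attached file, remembering the file and line of whoever last took a slot so that a second, unbalanced acquisition can name the earlier caller, can be shown in a small user-space model. All names here (`NSLOTS`, `slot_owner`, `slot_acquire_at()`, `slot_acquire()`, `slot_release()`) are hypothetical; the real patch works on per-CPU kmap slots and calls BUG() instead of returning a value.

```c
#include <stddef.h>

#define NSLOTS 4  /* hypothetical slot count, stands in for the per-CPU entries */

struct slot_owner {
    const char *file;  /* call site that currently holds the slot */
    unsigned line;
    int busy;
};

static struct slot_owner owners[NSLOTS];

/* Takes the slot and returns NULL, or returns the record of the current
 * holder if the slot is already busy. Caller must pass 0 <= slot < NSLOTS. */
const struct slot_owner *slot_acquire_at(int slot, const char *file, unsigned line)
{
    if (owners[slot].busy)
        return &owners[slot];
    owners[slot].file = file;
    owners[slot].line = line;
    owners[slot].busy = 1;
    return NULL;
}

/* Frees the slot and forgets the recorded call site. */
void slot_release(int slot)
{
    owners[slot].file = NULL;
    owners[slot].line = 0;
    owners[slot].busy = 0;
}

/* Wrapper macro so that every call site passes its own location. */
#define slot_acquire(slot) slot_acquire_at((slot), __FILE__, __LINE__)
```

A caller that receives a non-NULL result can print `file` and `line` from it, which is the information the debug printk in the patch is meant to provide.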
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Crashed on the very same line as before

Tomas

[ 189.558776] [ cut here ]
[ 189.563377] kernel BUG at arch/x86/mm/highmem_32.c:43!
[ 189.568491] invalid opcode: [#1] PREEMPT SMP
[ 189.573285] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 parport_pc lp parport sr_mod cdrom pcspkr iTCO_wdt iTCO_vendor_support shpchp pci_hotplug ipv6 evdev ext3 jbd mbcache sg sd_mod ata_piix usbhid hid floppy ata_generic ahci ohci1394 libata scsi_mod ieee1394 ehci_hcd tg3 uhci_hcd usbcore fuse
[ 189.600440]
[ 189.601924] Pid: 4960, comm: ovmtask Not tainted (2.6.24.3xenomaip1 #1)
[ 189.608508] EIP: 0060:[c011a908] EFLAGS: 00010286 CPU: 0
[ 189.613971] EIP is at kmap_atomic_prot+0xb8/0xc0
[ 189.618566] EAX: d91a8163 EBX: c2b23500 ECX: f000 EDX: c044fecc
[ 189.624804] ESI: 0007 EDI: 0163 EBP: 08003875 ESP: df673ea0
[ 189.631043] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 189.636416] Process ovmtask (pid: 4960, ti=df672000 task=df4d29e0 task.ti=df672000)0
[ 189.643865] I-pipe domain Linux
[ 189.647257] Stack: fffb2000 c2b2350c c01a96aa fffb7000 fffb6000 df66d278 dfb7a580
[ 189.655648]        dfb7ae40 df846084 df9b7084 08615000 0840 08615000 f7c3d740 c2b23520
[ 189.664039]        c2b2350c c2be8aac fffb3000 08614fff
[ 189.672430] Call Trace:
[ 189.675045] [c01a96aa] copy_page_range+0x13a/0x560
[ 189.680086] [c01224ef] copy_process+0x8df/0x1250
[ 189.684951] [c012309c] do_fork+0x4c/0x200
[ 189.689211] [c01022d2] sys_clone+0x32/0x40
[ 189.693556] [c01043a1] sysenter_past_esp+0x6e/0x72
[ 189.698595] ===
[ 189.702150] Code: 0c c1 fb 05 29 c1 c1 e3 0c 89 c8 09 fb 89 1a 5b 5e 5f c3 89 e0 25 00 e0 ff ff f7 40 14 ff ff ff ef 0f 85 69 ff ff ff 0f 0b eb fe 0f 0b eb fe 8d 74 26 00 8b 0d f4 b1 45 c0 e9 35 ff ff ff 90 8d
[ 189.721467] EIP: [c011a908] kmap_atomic_prot+0xb8/0xc0 SS:ESP 0068:df673ea0
[ 189.728669] ---[ end trace 7363976c5f0598cc ]---
[ 189.733269] note: ovmtask[4960] exited with preempt_count 1

Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
>> Hi Gilles, thanks for looking at it. Your analysis is correct: I indeed don't have a CONFIG_PREEMPT_RT kernel, but only CONFIG_PREEMPT; sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash
>
> After looking at the sources, it appears that kmap_atomic disables preemption and kunmap_atomic reenables it. In short, the bug should never happen. What could happen is that the preemption count is garbled, or that a call to kmap_atomic is not paired with a kunmap_atomic. To check if the problem comes from the preemption count, could you apply the following patch?
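The pairing rule Gilles describes (each map raises a nesting count, each unmap lowers it, and the count must be back at zero afterwards) can be modelled in a few lines of user-space C. This is a hedged sketch, not kernel code: `model_preempt_count`, `model_map()`, `model_unmap()` and `model_balanced()` are hypothetical names, and the real counter lives in the kernel's thread information.

```c
/* Hypothetical stand-in for the kernel's preemption nesting counter. */
static int model_preempt_count;

/* Model of the "map" side: raises the nesting count. */
void model_map(void)
{
    model_preempt_count++;
}

/* Model of the "unmap" side: lowers the count, or reports an unmap
 * without a matching map by returning -1 and leaving the count alone. */
int model_unmap(void)
{
    if (model_preempt_count == 0)
        return -1;
    model_preempt_count--;
    return 0;
}

/* Nonzero when every map so far has been matched by an unmap. */
int model_balanced(void)
{
    return model_preempt_count == 0;
}
```

In this model, a code path that maps without unmapping leaves `model_balanced()` at 0, which is the kind of leftover count that the line "exited with preempt_count 1" in the dump above reports.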
[Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hi,

I'm getting kernel crashes with my native-skin user-space Xenomai application. It looks like the crash happens after clone/fork. I'm using kernel 2.6.24.3, SMP, RT_PREEMPT (settings like 2.6.22-14-rt from Ubuntu 7.10), and Xenomai 2.4.2. The thread causing the crash is a Xenomai task running most of the time in the Linux domain. The application is very large, so extracting a short example that triggers the bug is unfortunately not realistic. The crash happens when running on real hardware (x86_64 with a 32-bit kernel and applications). The system is unusable after it happens and can only be rebooted; the dump is from the serial console. In VMWare on another x86_64 machine, it does not crash.

Is anyone getting a similar error? Any ideas where to look for the problem?

Thanks, Tomas

kernel crash dump

[ 139.814229] [ cut here ]
[ 139.818830] kernel BUG at arch/x86/mm/highmem_32.c:42!
[ 139.823945] invalid opcode: [#1] PREEMPT SMP
[ 139.828739] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 parport_pc lp parport sr_mod cdrom pcspkr iTCO_wdt iTCO_vendor_support ipv6 shpchp pci_hotplug evdev ext3 jbd mbcache sg sd_mod ata_piix usbhid hid floppy ata_generic ahci ohci1394 libata scsi_mod ieee1394 ehci_hcd tg3 uhci_hcd usbcore fuse
[ 139.855896]
[ 139.857378] Pid: 4959, comm: ovmtask Not tainted (2.6.24.3xenomai #1)
[ 139.863790] EIP: 0060:[c011a8d8] EFLAGS: 00010286 CPU: 0
[ 139.869255] EIP is at kmap_atomic_prot+0x98/0xa0
[ 139.873850] EAX: d91aa163 EBX: c2b23540 ECX: f000 EDX: c044fecc
[ 139.880088] ESI: 0007 EDI: 0163 EBP: 08003875 ESP: df68fea0
[ 139.886326] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 139.891699] Process ovmtask (pid: 4959, ti=df68e000 task=df685080 task.ti=df68e000)0
[ 139.899148] I-pipe domain Linux
[ 139.902539] Stack: fffb2000 c2b2354c c01a967a fffb7000 fffb6000 df89395c df4ad580
[ 139.910930]        df4ad900 dfaf5084 df9f5084 08615000 0840 08615000 f7c02ec0 c2b23560
[ 139.919323]        c2b2354c c2be8acc fffb3000 08614fff
[ 139.927714] Call Trace:
[ 139.930329] [c01a967a] copy_page_range+0x13a/0x560
[ 139.935368] [c01224bf] copy_process+0x8df/0x1250
[ 139.940235] [c012306c] do_fork+0x4c/0x200
[ 139.944495] [c01022d2] sys_clone+0x32/0x40
[ 139.948839] [c0104431] syscall_call+0x7/0xb
[ 139.953272] ===
[ 139.956828] Code: b5 00 00 00 00 29 c2 8b 02 85 c0 75 1e 2b 1d 80 0c 50 c0 8d 46 45 c1 e0 0c c1 fb 05 29 c1 c1 e3 0c 89 c8 09 fb 89 1a 5b 5e 5f c3 0f 0b eb fe 8d 74 26 00 8b 0d f4 b1 45 c0 e9 55 ff ff ff 90 8d
[ 139.976150] EIP: [c011a8d8] kmap_atomic_prot+0x98/0xa0 SS:ESP 0068:df68fea0
[ 139.983355] ---[ end trace 1cb0b5180594e9d9 ]---
[ 139.987956] note: ovmtask[4959] exited with preempt_count 1

end of strace output

4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
4959 rt_sigaction(SIGUSR1, NULL, {SIG_DFL}, 8) = 0
4959 rt_sigaction(SIGUSR1, {0x85ec4b0, [], SA_RESTART|SA_SIGINFO}, {SIG_DFL}, 8) = 0
4959 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 write(2, #, 2) = 2
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 write(2, executive, 9) = 9
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 write(2, , 2) = 2
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 write(2, [Testing , 9) = 9
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 write(2, AbstractInterpretation, 22) = 22
4959 fcntl64(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959 pipe([7, 8]) = 0
4959 fcntl64(7, F_GETFL) = 0 (flags O_RDONLY)
4959 fcntl64(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
4959 fcntl64(8, F_GETFL) = 0x1 (flags O_WRONLY)
4959 fcntl64(8, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
4959 clone( <unfinished ...>
4958 <... nanosleep resumed> NULL) = 0
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hi Gilles,

thanks for looking at it. Your analysis is correct: I indeed don't have a CONFIG_PREEMPT_RT kernel, but only CONFIG_PREEMPT; sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash

Thanks, Tomas

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> Tomas Kalibera wrote:
>>> Hi, I'm getting kernel crashes with my native-skin user-space Xenomai application. It looks like the crash happens after clone/fork. I'm using kernel 2.6.24.3, SMP, RT_PREEMPT (settings like 2.6.22-14-rt from Ubuntu 7.10), and Xenomai 2.4.2. The thread causing the crash is a Xenomai task running most of the time in the Linux domain. The application is very large, so extracting a short example that triggers the bug is unfortunately not realistic. The crash happens when running on real hardware (x86_64 with a 32-bit kernel and applications). The system is unusable after it happens and can only be rebooted; the dump is from the serial console. In VMWare on another x86_64 machine, it does not crash. Is anyone getting a similar error? Any ideas where to look for the problem?
>>
>> Looking at the kernel code, it seems that only one page may be mapped at a time with kmap_atomic using KM_USER0. So what probably happens is that for other invocations of cow_user_page than the one taking place in fork, a lock of some kind prevents concurrent invocation of cow_user_page. In our use of cow_user_page, we probably do not hold that lock. Looking at the code, I see that copy_pte_range holds a spinlock, which should disable preemption on a classical kernel. But who knows what happens with RT_PREEMPT enabled...
>
> There is something strange... Normally, when compiling with CONFIG_PREEMPT_RT, kmap_atomic_prot is replaced with kmap and the real kmap_atomic_prot is renamed __kmap_atomic_prot. Since cow_user_page uses kmap_atomic_prot, kmap is in fact called and the kmap_atomic_prot BUG_ON condition should in fact never occur.