Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi,
>
> I've tried a more defensive kernel setup and your patch (no. 6). The lockup is still there. It happens after a realtime task is started, though I was unable to track exactly when: it does not crash in a debugger, does not crash with strace, breaks SysRq, and printing log messages seems to be delayed (despite flushing).
>
> I tried changing the application code (like using more default flags when creating a task, etc.), but I did not find a workaround.
>
> I've put the kernel on the web again, including the config (the one that contains xenomaidp6). Maybe it might help to track down the bug... Maybe not.

Jumping late on this, I didn't find any (user space) test case for the observed bug in this thread. Can you provide something? The simpler, the better. It may even contain bugs itself, it just has to trigger the kernel oops reliably.

Then I saw in your .config that your kernel is optimized for AMD K6. In order to prepare exporting the bug, could you check that the more generic CONFIG_M586TSC makes no difference? Also, if you happen to have a second, different box (w.r.t. CPU type and speed) at hand, it would be nice to know whether the issue is present there as well. But the latter is also something we can try once a test case is available. My preferred target will be QEMU, because that one can quite nicely be debugged even if the box is hopelessly locked up.

Thanks,
Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
On Thu, Apr 3, 2008 at 1:46 AM, Tomas Kalibera [EMAIL PROTECTED] wrote:
> Gilles Chanteperdrix wrote:
> > Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?
>
> The lock-up with patch 5 happens before fork is called, but after a real-time task is started by the program. I don't know better now - I'd have to add more logging. If I run in strace, the lock-up does not happen.
>
> Thinking about that, it can be a bug in my program. If I understand the concept of Xenomai correctly, I can just write a real-time task that would starve the Linux kernel indefinitely, correct? My program definitely does have bugs. So I'll do more debugging.

Yes, you can starve Linux, but after 4 seconds the Xenomai watchdog should trigger. You can also starve Linux with a vanilla Linux application running with the SCHED_FIFO scheduling policy, but in this case it is the Linux soft lockup detector which should trigger. I see that you have the two options enabled, so the lockup is probably another bug.

-- 
Gilles
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:
> Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?
>
> Looking at the code, I think I found a bug, but I doubt it could cause a lockup. The definition of VM_PINNED in include/linux/mm.h collides with another bit used by Linux, so this definition should be changed from:
>
> #define VM_PINNED 0x0800
>
> to:
>
> #define VM_PINNED 0x1000
>
> Here comes a 6th patch for this bug (patch 6 includes patch 5).

I've tested the 6th patch, the lockup is still there. As far as I can observe, it behaves exactly like with the 5th patch.

Tomas
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera [EMAIL PROTECTED] wrote:
> OK, no change with this patch compared to the previous situation. The system boots, but hangs without a stacktrace when I run my Xenomai task. SysRq is blocked, now even SysRq-kill did not work, only SysRq-boot did.

Are you sure you did not keep the stuff in highmem_32.c?

-- 
Gilles
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hmm, I checked again and did not find a mistake in the experiment (neither using the old binary nor old sources). I'm doing a clean build from scratch again, so that we can be absolutely sure. I can then run memtest on the machine...

Tomas

Gilles Chanteperdrix wrote:
> On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera [EMAIL PROTECTED] wrote:
> > OK, no change with this patch compared to the previous situation. The system boots, but hangs without a stacktrace when I run my Xenomai task. SysRq is blocked, now even SysRq-kill did not work, only SysRq-boot did.
>
> Are you sure you did not keep the stuff in highmem_32.c?
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hi Gilles,

I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.

Tomas

Gilles Chanteperdrix wrote:
> On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera [EMAIL PROTECTED] wrote:
> > OK, no change with this patch compared to the previous situation. The system boots, but hangs without a stacktrace when I run my Xenomai task. SysRq is blocked, now even SysRq-kill did not work, only SysRq-boot did.
>
> Are you sure you did not keep the stuff in highmem_32.c?
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi Gilles,
>
> I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.
>
> Tomas

But... do you get the lock-up without the patch?

-- 
Gilles.
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi Gilles,
>
> I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.

You aren't using gcc-4.2 or later, are you? I've had problems with those for building and/or running kernels. On non-x86 targets, mind you, but maybe there's a connection...

b.g.

-- 
Bill Gatliff
[EMAIL PROTECTED]
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi Gilles,
>
> I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.

I get "permission denied" when trying to download the kernel configuration.

-- 
Gilles.
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Gilles Chanteperdrix wrote:
> > Tomas Kalibera wrote:
> > > Hi Gilles,
> > >
> > > I've recompiled the kernel again from scratch and got the same lock-up. Fix 5 does not help... If you want to inspect the exact kernel I used, it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with p5 in its name.
> > >
> > > Tomas
> >
> > But... do you get the lock-up without the patch?
>
> No. Or, more precisely, not the same one. With this patch (5), the system locks up as soon as the application starts. It does not print any stack trace. Without the patch, the system gets into an unusable state when the application calls clone/fork, and it does produce a stack trace (those I was sending you before). It seems to be more alive (processes start, but crash, because of garbled preempt_count). The crashes are perfectly repeatable on the system I have.
>
> So, the crashes make no sense to you, right? I can indeed try to go the defensive path and try an older kernel or something, but if there is a Xenomai bug, it would be nice to find it... The same for a kernel bug, indeed.

Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?

Looking at the code, I think I found a bug, but I doubt it could cause a lockup. The definition of VM_PINNED in include/linux/mm.h collides with another bit used by Linux, so this definition should be changed from:

#define VM_PINNED 0x0800

to:

#define VM_PINNED 0x1000

I will now try, if possible, to reproduce the bug on an x86 box of mine and will keep you informed.

-- 
Gilles.
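The kind of flag-bit collision described for VM_PINNED can be illustrated with a minimal user-space sketch. Everything named `DEMO_*` or `demo_*` below is hypothetical and chosen only for illustration; the values are not the kernel's real VM_* flag values, and the helper merely checks whether a candidate flag shares a bit with flags already in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only; these are NOT the
 * kernel's VM_* definitions. */
#define DEMO_FLAG_A           0x0400UL
#define DEMO_FLAG_B           0x0800UL
#define DEMO_FLAG_PINNED_BAD  0x0800UL  /* same bit as DEMO_FLAG_B */
#define DEMO_FLAG_PINNED_GOOD 0x1000UL  /* bit not used by A or B */

/* Returns true if `candidate` shares at least one bit with any of the
 * `n` flags listed in `in_use`. */
static bool demo_flag_collides(unsigned long candidate,
                               const unsigned long *in_use, size_t n)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= in_use[i];

    return (candidate & mask) != 0;
}
```

With flags A and B in use, `demo_flag_collides()` reports a collision for `DEMO_FLAG_PINNED_BAD` and none for `DEMO_FLAG_PINNED_GOOD`. A colliding bit means that code testing one flag can be fooled by code setting the other, which is why moving the definition to a free bit is the natural fix.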
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:
> Of course, we are looking for all bugs. But please tell me: do you get the lock-up even before fork is called? If not, could you verify that at least some Xenomai programs run correctly, for instance latency?

The lock-up with patch 5 happens before fork is called, but after a real-time task is started by the program. I don't know better now - I'd have to add more logging. If I run in strace, the lock-up does not happen.

Thinking about that, it can be a bug in my program. If I understand the concept of Xenomai correctly, I can just write a real-time task that would starve the Linux kernel indefinitely, correct? My program definitely does have bugs. So I'll do more debugging.

The lock-up does NOT happen with latency. But the bug in the kernel without patch 5 (the one that led to a stack trace after the call to fork) did not appear in latency, either.

> Looking at the code, I think I found a bug, but I doubt it could cause a lockup. The definition of VM_PINNED in include/linux/mm.h collides with another bit used by Linux, so this definition should be changed from:
>
> #define VM_PINNED 0x0800
>
> to:
>
> #define VM_PINNED 0x1000
>
> I will now try, if possible, to reproduce the bug on an x86 box of mine and will keep you informed.

Thanks! I'll indeed build a kernel with patch 6, test again, and test my application.

Tomas
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
On Tue, Apr 1, 2008 at 7:52 AM, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
> Tomas Kalibera wrote:
> > I added a missing underscore and re-tried, and none of the debug messages was printed. I added another one to make sure that there is not a problem with getting printk messages to the serial console. The resulting highmem_32.c and the output is attached.
> >
> > T
>
> The interesting part of the output is the printk which occurs right before the first bug, what happens afterwards is of little use. Do you get any output before the first bug?

There are other kmap_atomic calls in copy_pte_range than the kmap_atomic taking place in cow_user_page; they use KM_PTE0 and KM_PTE1 as the type value. So, we should track these types as well in highmem_32.c.

-- 
Gilles
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Crashed on the very same line as before
>
> Tomas

Ok. Let us look for unbalanced kmap_atomics then. Try this patch instead.

-- 
Gilles.

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..a78494e 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -1,6 +1,11 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 
+static struct {
+	const char *file;
+	unsigned line;
+} last_km_user0 [NR_CPUS];
+
 void *kmap(struct page *page)
 {
 	might_sleep();
@@ -26,7 +31,8 @@ void kunmap(struct page *page)
  * However when holding an atomic kmap is is not legal to sleep, so atomic
  * kmaps are appropriate for short, tight code paths only.
  */
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -39,7 +45,17 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
+
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
 	arch_flush_lazy_mmu_mode();
 
@@ -70,6 +86,10 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 		BUG_ON(vaddr >= (unsigned long)high_memory);
 #endif
 	}
+	if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = NULL;
+		last_km_user0[smp_processor_id()].line = 0;
+	}
 
 	arch_flush_lazy_mmu_mode();
 	pagefault_enable();
@@ -78,7 +98,8 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 /* This is the same as kmap_atomic() but can map memory that doesn't
  * have a struct page associated with it.
  */
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -87,6 +108,16 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
 	arch_flush_lazy_mmu_mode();
 
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index 13cdcd6..57b89f7 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -68,10 +68,16 @@ extern void FASTCALL(kunmap_high(struct page *page));
 
 void *kmap(struct page *page);
 void kunmap(struct page *page);
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line);
+#define kmap_atomic_prot(page, type, prot) \
+	_kmap_atomic_prot(page, type, prot, __FILE__, __LINE__)
 void *kmap_atomic(struct page *page, enum km_type type);
 void kunmap_atomic(void *kvaddr, enum km_type type);
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line);
+#define kmap_atomic_pfn(pfn, type) \
+	_kmap_atomic_pfn(pfn, type, __FILE__, __LINE__)
 struct page *kmap_atomic_to_page(void *ptr);
 
 #ifndef CONFIG_PARAVIRT
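The debugging technique used in the patch above (wrapping a function in a macro that passes `__FILE__` and `__LINE__`, so that a conflicting second mapping can report where the first one was taken) can be tried out in user space. The following is only a sketch under stated assumptions: the `demo_*` names and `DEMO_NR_SLOTS` are hypothetical, nothing here is kernel API, and where the kernel patch calls BUG() the sketch simply returns -1.

```c
#include <stdio.h>
#include <stddef.h>

#define DEMO_NR_SLOTS 4  /* hypothetical number of mapping slots */

/* Call site that currently owns a slot (file == NULL when free). */
struct demo_site {
    const char *file;
    unsigned line;
};

static struct demo_site demo_last_site[DEMO_NR_SLOTS];
static int demo_busy[DEMO_NR_SLOTS];

/* Returns 0 on success, -1 if the slot is invalid or already mapped.
 * On a double mapping, reports the call site of the first mapping. */
static int demo_map_at(unsigned slot, const char *file, unsigned line)
{
    if (slot >= DEMO_NR_SLOTS)
        return -1;
    if (demo_busy[slot]) {
        fprintf(stderr, "slot %u already mapped at %s:%u\n", slot,
                demo_last_site[slot].file, demo_last_site[slot].line);
        return -1;
    }
    demo_busy[slot] = 1;
    demo_last_site[slot].file = file;
    demo_last_site[slot].line = line;
    return 0;
}

/* Releases the slot and forgets the recorded call site. */
static void demo_unmap(unsigned slot)
{
    if (slot >= DEMO_NR_SLOTS)
        return;
    demo_busy[slot] = 0;
    demo_last_site[slot].file = NULL;
    demo_last_site[slot].line = 0;
}

/* Callers use the macro, so the recorded site is the caller's own
 * file and line rather than a location inside demo_map_at(). */
#define demo_map(slot) demo_map_at((slot), __FILE__, __LINE__)
```

Calling `demo_map(0)` twice without an intervening `demo_unmap(0)` makes the second call fail and print the location of the first one, which is the information the patch tries to obtain for KM_USER0.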
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
> > Crashed on the very same line as before
> >
> > Tomas
>
> Ok. Let us look for unbalanced kmap_atomics then. Try this patch instead.

Just when I hit the reply button, I realized that I forgot something. So, try this one instead.

-- 
Gilles.

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..97a5242 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -1,6 +1,11 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 
+static struct {
+	const char *file;
+	unsigned line;
+} last_km_user0 [NR_CPUS];
+
 void *kmap(struct page *page)
 {
 	might_sleep();
@@ -26,7 +31,8 @@ void kunmap(struct page *page)
  * However when holding an atomic kmap is is not legal to sleep, so atomic
  * kmaps are appropriate for short, tight code paths only.
  */
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -39,16 +45,27 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
+
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
 	arch_flush_lazy_mmu_mode();
 
 	return (void *)vaddr;
 }
 
-void *kmap_atomic(struct page *page, enum km_type type)
+void *_kmap_atomic(struct page *page, enum km_type type,
+		   const char *file, unsigned line)
 {
-	return kmap_atomic_prot(page, type, kmap_prot);
+	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
 }
 
 void kunmap_atomic(void *kvaddr, enum km_type type)
@@ -70,6 +87,10 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 		BUG_ON(vaddr >= (unsigned long)high_memory);
 #endif
 	}
+	if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = NULL;
+		last_km_user0[smp_processor_id()].line = 0;
+	}
 
 	arch_flush_lazy_mmu_mode();
 	pagefault_enable();
@@ -78,7 +99,8 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 /* This is the same as kmap_atomic() but can map memory that doesn't
  * have a struct page associated with it.
  */
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -87,6 +109,16 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
 	arch_flush_lazy_mmu_mode();
 
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index 13cdcd6..db09f27 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -68,10 +68,19 @@ extern void FASTCALL(kunmap_high(struct page *page));
 
 void *kmap(struct page *page);
 void kunmap(struct page *page);
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
-void *kmap_atomic(struct page *page, enum km_type type);
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line);
+#define kmap_atomic_prot(page, type, prot) \
+	_kmap_atomic_prot(page, type, prot, __FILE__, __LINE__)
+void *_kmap_atomic(struct page *page, enum km_type type,
+		   const char *file, unsigned line);
+#define kmap_atomic(page, type) \
+	_kmap_atomic(page, type, __FILE__, __LINE__)
 void kunmap_atomic(void *kvaddr, enum km_type type);
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
+void
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
I added a missing underscore and re-tried, and none of the debug messages was printed. I added another one to make sure that there is not a problem with getting printk messages to the serial console. The resulting highmem_32.c and the output are attached.

T

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
> > Tomas Kalibera wrote:
> > > Crashed on the very same line as before
> > >
> > > Tomas
> >
> > Ok. Let us look for unbalanced kmap_atomics then. Try this patch instead.
>
> Just when I hit the reply button, I realized that I forgot something. So, try this one instead.

#include <linux/highmem.h>
#include <linux/module.h>

static struct {
	const char *file;
	unsigned line;
} last_km_user0 [NR_CPUS];

void *kmap(struct page *page)
{
	might_sleep();
	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);
}

void kunmap(struct page *page)
{
	if (in_interrupt())
		BUG();
	if (!PageHighMem(page))
		return;
	kunmap_high(page);
}

/*
 * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
 * no global lock is needed and because the kmap code must perform a global TLB
 * invalidation when the kmap pool wraps.
 *
 * However when holding an atomic kmap is is not legal to sleep, so atomic
 * kmaps are appropriate for short, tight code paths only.
 */
void *_kmap_atomic_prot(struct page *page, enum km_type type,
			pgprot_t prot, const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
	pagefault_disable();

	if (!PageHighMem(page))
		return page_address(page);

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0) {
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		} else {
			printk("type is NOT KM_USER0\n");
		}
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}

	set_pte(kmap_pte-idx, mk_pte(page, prot));
	arch_flush_lazy_mmu_mode();

	return (void *)vaddr;
}

void *_kmap_atomic(struct page *page, enum km_type type,
		   const char *file, unsigned line)
{
	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
}

void kunmap_atomic(void *kvaddr, enum km_type type)
{
	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();

	/*
	 * Force other mappings to Oops if they'll try to access this pte
	 * without first remap it. Keeping stale mappings around is a bad idea
	 * also, in case the page changes cacheability attributes or becomes
	 * a protected page in a hypervisor.
	 */
	if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
		kpte_clear_flush(kmap_pte-idx, vaddr);
	else {
#ifdef CONFIG_DEBUG_HIGHMEM
		BUG_ON(vaddr < PAGE_OFFSET);
		BUG_ON(vaddr >= (unsigned long)high_memory);
#endif
	}
	if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = NULL;
		last_km_user0[smp_processor_id()].line = 0;
	}

	arch_flush_lazy_mmu_mode();
	pagefault_enable();
}

/* This is the same as kmap_atomic() but can map memory that doesn't
 * have a struct page associated with it.
 */
void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
		       const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0)
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}
	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
	arch_flush_lazy_mmu_mode();

	return (void*) vaddr;
}

struct page *kmap_atomic_to_page(void *ptr)
{
	unsigned long idx, vaddr = (unsigned long)ptr;
	pte_t *pte;

	if (vaddr < FIXADDR_START)
		return virt_to_page(ptr);

	idx = virt_to_fix(vaddr);
	pte = kmap_pte - (idx - FIX_KMAP_BEGIN);
	return pte_page(*pte);
}

EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(_kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
EXPORT_SYMBOL(kmap_atomic_to_page);

[ 255.285392] [ cut here ]
[ 255.289992] kernel BUG at arch/x86/mm/highmem_32.c:56!
[ 255.295107] invalid opcode: [#1] PREEMPT SMP
[ 255.299901] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[ 255.327057]
[ 255.328538] Pid: 4986, comm: ovmtask Not tainted (2.6.24.3xenomaip3 #2)
[ 255.335123] EIP: 0060:[c011a966] EFLAGS: 00010286 CPU: 0
[ 255.340588] EIP is at _kmap_atomic_prot+0xa6/0x120
[ 255.345356] EAX: 0027 EBX: c2b27520 ECX:
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> I added a missing underscore and re-tried, and none of the debug messages was printed. I added another one to make sure that there is not a problem with getting printk messages to the serial console. The resulting highmem_32.c and the output is attached.
>
> T

The interesting part of the output is the printk which occurs right before the first bug, what happens afterwards is of little use. Do you get any output before the first bug?

-- 
Gilles.
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi Gilles,
>
> thanks for looking at it. Your analysis is correct: I indeed don't have a CONFIG_PREEMPT_RT kernel, but only CONFIG_PREEMPT, sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash

After looking at the sources, it appears that kmap_atomic disables preemption and kunmap_atomic reenables it. In short, the bug should never happen. What could happen is that the preemption count is garbled, or that a call to kmap_atomic is not paired with a kunmap_atomic. To check if the problem comes from the preemption count, could you apply the following patch?

-- 
Gilles.

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..4bb9fc6 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -34,6 +34,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 
+	BUG_ON(type == KM_USER0 && !in_atomic());
 	if (!PageHighMem(page))
 		return page_address(page);
 
@@ -85,6 +86,7 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	pagefault_disable();
 
+	BUG_ON(type == KM_USER0 && !in_atomic());
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
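The pairing argument made in this message (a map raises the preemption count, the matching unmap lowers it, so an unpaired map leaves the count elevated) can be sketched in user space. This is a toy model under stated assumptions: the `demo_*` names are hypothetical, the counter is a plain global rather than the kernel's per-thread preempt_count, and no real preemption is involved.

```c
/* Toy stand-in for the preemption count; NOT the kernel's preempt_count. */
static int demo_preempt_count;

/* Models the map side: entering the atomic section raises the count. */
static void demo_map_atomic(void)
{
    demo_preempt_count++;
}

/* Models the unmap side: leaving the atomic section lowers the count. */
static void demo_unmap_atomic(void)
{
    demo_preempt_count--;
}

/* Non-zero while at least one atomic section is open. */
static int demo_in_atomic(void)
{
    return demo_preempt_count != 0;
}

/* Runs one code path and returns how much the count changed across it.
 * A balanced path returns 0; a path that forgets the unmap returns 1. */
static int demo_run_path(int forget_unmap)
{
    int before = demo_preempt_count;

    demo_map_atomic();
    if (!forget_unmap)
        demo_unmap_atomic();

    return demo_preempt_count - before;
}
```

In this model `demo_run_path(0)` leaves the count unchanged, while `demo_run_path(1)` leaves it one higher, after which `demo_in_atomic()` stays true. A leftover count of this kind is one way to read a message such as "exited with preempt_count 1" in an oops.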
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Crashed on the very same line as before

Tomas

[ 189.558776] [ cut here ]
[ 189.563377] kernel BUG at arch/x86/mm/highmem_32.c:43!
[ 189.568491] invalid opcode: [#1] PREEMPT SMP
[ 189.573285] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 parport_pc lp parport sr_mod cdrom pcspkr iTCO_wdt iTCO_vendor_support shpchp pci_hotplug ipv6 evdev ext3 jbd mbcache sg sd_mod ata_piix usbhid hid floppy ata_generic ahci ohci1394 libata scsi_mod ieee1394 ehci_hcd tg3 uhci_hcd usbcore fuse
[ 189.600440]
[ 189.601924] Pid: 4960, comm: ovmtask Not tainted (2.6.24.3xenomaip1 #1)
[ 189.608508] EIP: 0060:[c011a908] EFLAGS: 00010286 CPU: 0
[ 189.613971] EIP is at kmap_atomic_prot+0xb8/0xc0
[ 189.618566] EAX: d91a8163 EBX: c2b23500 ECX: f000 EDX: c044fecc
[ 189.624804] ESI: 0007 EDI: 0163 EBP: 08003875 ESP: df673ea0
[ 189.631043] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 189.636416] Process ovmtask (pid: 4960, ti=df672000 task=df4d29e0 task.ti=df672000)0
[ 189.643865] I-pipe domain Linux
[ 189.647257] Stack: fffb2000 c2b2350c c01a96aa fffb7000 fffb6000 df66d278 dfb7a580
[ 189.655648]        dfb7ae40 df846084 df9b7084 08615000 0840 08615000 f7c3d740 c2b23520
[ 189.664039]        c2b2350c c2be8aac fffb3000 08614fff
[ 189.672430] Call Trace:
[ 189.675045] [c01a96aa] copy_page_range+0x13a/0x560
[ 189.680086] [c01224ef] copy_process+0x8df/0x1250
[ 189.684951] [c012309c] do_fork+0x4c/0x200
[ 189.689211] [c01022d2] sys_clone+0x32/0x40
[ 189.693556] [c01043a1] sysenter_past_esp+0x6e/0x72
[ 189.698595] ===
[ 189.702150] Code: 0c c1 fb 05 29 c1 c1 e3 0c 89 c8 09 fb 89 1a 5b 5e 5f c3 89 e0 25 00 e0 ff ff f7 40 14 ff ff ff ef 0f 85 69 ff ff ff 0f 0b eb fe 0f 0b eb fe 8d 74 26 00 8b 0d f4 b1 45 c0 e9 35 ff ff ff 90 8d
[ 189.721467] EIP: [c011a908] kmap_atomic_prot+0xb8/0xc0 SS:ESP 0068:df673ea0
[ 189.728669] ---[ end trace 7363976c5f0598cc ]---
[ 189.733269] note: ovmtask[4960] exited with preempt_count 1

Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
> > Hi Gilles,
> >
> > thanks for looking at it. Your analysis is correct: I indeed don't have a CONFIG_PREEMPT_RT kernel, but only CONFIG_PREEMPT, sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash
>
> After looking at the sources, it appears that kmap_atomic disables preemption and kunmap_atomic reenables it. In short, the bug should never happen. What could happen is that the preemption count is garbled, or that a call to kmap_atomic is not paired with a kunmap_atomic. To check if the problem comes from the preemption count, could you apply the following patch?
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi Gilles,
>
> thanks for looking at it. Your analysis is correct: I indeed don't have a CONFIG_PREEMPT_RT kernel, but only CONFIG_PREEMPT, sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash

It looks like do_wp_page, the caller of cow_user_page, calls it with the spinlock unlocked. So nothing prevents a rescheduling from happening and scheduling in a real-time process, which can call fork. Now, I wonder what prevents do_wp_page from being called in the same conditions...

-- 
Gilles.
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Tomas Kalibera wrote:
> Hi,
>
> I'm getting kernel crashes with my native skin user-space Xenomai application. It looks like the crash happens after clone/fork. I'm using kernel 2.6.24.3, SMP, RT_PREEMPT (settings like 2.6.22-14-rt from Ubuntu 7.10), and Xenomai 2.4.2. The thread causing the crash is a Xenomai task, running most of the time in the Linux domain. The application is huge; extracting a short example that triggers the bug is unfortunately not realistic.
>
> The crash happens when running on real hardware (x86_64 with 32-bit kernel and applications). The system is unusable after it happens and can only be rebooted; the dump is from the serial console. In VMware on another x86_64 machine, it does not crash.
>
> Anyone getting a similar error? Any ideas where to look for the problem?

Looking at the kernel code, it seems that only one page may be mapped at a time with kmap_atomic using KM_USER0. So what probably happens is that for other invocations of cow_user_page than the one taking place in fork, a lock of some kind prevents concurrent invocation of cow_user_page. In our use of cow_user_page, we probably do not hold that lock. Looking at the code, I see that copy_pte_range holds a spinlock, which should disable preemption on a classical kernel. But who knows what happens with RT_PREEMPT enabled...

-- 
Gilles.
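Why only one page can be mapped with a given type at a time follows from how the slot is chosen: the patches in this thread compute it as `idx = type + KM_TYPE_NR*smp_processor_id()`, so each (CPU, type) pair owns exactly one slot. A minimal sketch of that arithmetic follows; the `DEMO_*` enum and its ordering are hypothetical and do not reproduce the kernel's real `enum km_type`.

```c
/* Hypothetical subset of mapping types; the real enum km_type has more
 * entries and a different order. DEMO_KM_TYPE_NR is the entry count. */
enum demo_km_type {
    DEMO_KM_USER0,
    DEMO_KM_USER1,
    DEMO_KM_PTE0,
    DEMO_KM_PTE1,
    DEMO_KM_TYPE_NR
};

/* Same arithmetic as idx = type + KM_TYPE_NR*smp_processor_id():
 * each CPU gets a block of DEMO_KM_TYPE_NR consecutive slots, one
 * slot per mapping type. */
static unsigned demo_slot_index(enum demo_km_type type, unsigned cpu)
{
    return (unsigned)type + (unsigned)DEMO_KM_TYPE_NR * cpu;
}
```

Two mappings of `DEMO_KM_USER0` on the same CPU resolve to the same index, so if the first is still in place when the second is attempted (for instance because the first user was preempted in between), the slot is found occupied. Different types, or the same type on different CPUs, never share a slot.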
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Gilles Chanteperdrix wrote:

> Tomas Kalibera wrote:
>
>> Hi, I'm getting kernel crashes with my native skin user-space Xenomai application. It looks like the crash happens after clone/fork. I'm using kernel 2.6.24.3, SMP, RT_PREEMPT (settings like 2.6.22-14-rt from Ubuntu 7.10). Xenomai 2.4.2. The thread causing the crash is a Xenomai task, running most of the time in the Linux domain. The application is very large; extracting a short example that leads to the bug is unfortunately not realistic. The crash happens when running on real hardware (x86_64 with 32 bit kernel and applications). The system is unusable after it happens and can only be rebooted; the dump is from the serial console. In VMWare on another x86_64 machine, it does not crash. Anyone getting a similar error? Any ideas where to look for the problem?
>
> Looking at the kernel code, it seems that only one page may be mapped at a time with kmap_atomic using KM_USER0. So what probably happens is that, for invocations of cow_user_page other than the one taking place in fork, a lock of some kind prevents concurrent invocation of cow_user_page. In our use of cow_user_page, we probably do not hold that lock. Looking at the code, I see that copy_pte_range holds a spinlock, which should disable preemption on a classical kernel. But who knows what happens with RT_PREEMPT enabled...

There is something strange... Normally, when compiling with CONFIG_PREEMPT_RT, kmap_atomic_prot is replaced with kmap and the real kmap_atomic_prot is renamed __kmap_atomic_prot. Since cow_user_page uses kmap_atomic_prot, kmap is what actually gets called, and the BUG_ON condition in kmap_atomic_prot should never occur.

-- Gilles.
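The redirection described above can be pictured with a small preprocessor sketch. This is a hypothetical userspace model, not the actual kernel headers: MODEL_PREEMPT_RT stands in for CONFIG_PREEMPT_RT, and the model_* functions only return a string recording which path was taken.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two mapping paths. */
static inline const char *model_kmap(void)
{
    return "kmap";                 /* sleeping path, no atomic slot */
}

static inline const char *model_real_kmap_atomic_prot(void)
{
    return "atomic";               /* path containing the BUG_ON check */
}

/* With the RT option set, the usual name is redirected to the kmap
 * path; otherwise it resolves to the real atomic implementation. */
#ifdef MODEL_PREEMPT_RT
#define model_kmap_atomic_prot() model_kmap()
#else
#define model_kmap_atomic_prot() model_real_kmap_atomic_prot()
#endif

/* The caller is written once and follows whatever the macro selects. */
static const char *model_cow_user_page(void)
{
    return model_kmap_atomic_prot();
}
```

Built as is, model_cow_user_page() takes the "atomic" path. Built with -DMODEL_PREEMPT_RT, it takes the "kmap" path, in which case the check in the atomic path would never be reached. So a crash on that check suggests the running kernel was not actually built with the RT option.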
Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
Hi Gilles,

thanks for looking at it. Your analysis is correct: I do not in fact have a CONFIG_PREEMPT_RT kernel, only CONFIG_PREEMPT. Sorry for the confusion. I've put the kernel config, sources, and binary on the web, so that you can be sure you're really looking at the kernel that is crashing: http://www.cs.purdue.edu/homes/tkaliber/crash

Thanks,
Tomas

Gilles Chanteperdrix wrote:

> Gilles Chanteperdrix wrote:
>
>> Tomas Kalibera wrote:
>>
>>> Hi, I'm getting kernel crashes with my native skin user-space Xenomai application. It looks like the crash happens after clone/fork. I'm using kernel 2.6.24.3, SMP, RT_PREEMPT (settings like 2.6.22-14-rt from Ubuntu 7.10). Xenomai 2.4.2. The thread causing the crash is a Xenomai task, running most of the time in the Linux domain. The application is very large; extracting a short example that leads to the bug is unfortunately not realistic. The crash happens when running on real hardware (x86_64 with 32 bit kernel and applications). The system is unusable after it happens and can only be rebooted; the dump is from the serial console. In VMWare on another x86_64 machine, it does not crash. Anyone getting a similar error? Any ideas where to look for the problem?
>>
>> Looking at the kernel code, it seems that only one page may be mapped at a time with kmap_atomic using KM_USER0. So what probably happens is that, for invocations of cow_user_page other than the one taking place in fork, a lock of some kind prevents concurrent invocation of cow_user_page. In our use of cow_user_page, we probably do not hold that lock. Looking at the code, I see that copy_pte_range holds a spinlock, which should disable preemption on a classical kernel. But who knows what happens with RT_PREEMPT enabled...
>
> There is something strange... Normally, when compiling with CONFIG_PREEMPT_RT, kmap_atomic_prot is replaced with kmap and the real kmap_atomic_prot is renamed __kmap_atomic_prot. Since cow_user_page uses kmap_atomic_prot, kmap is what actually gets called, and the BUG_ON condition in kmap_atomic_prot should never occur.